If there is no deadlock between threads, consider other reasons why threads are not carrying out useful work.
The third reason, where threads are waiting (for instance, on sockets) for I/O, requires further investigation. Has the process at the other end of the I/O failed? Do any network problems exist?
To see how the javadump tool is used to diagnose loops, see Threads and stack trace (THREADS). If you cannot diagnose the problem from the javadump, and the process still seems to be using processor cycles, either it has entered an infinite loop or it is suffering from very poor performance. Use ps -mp [process id] -o THREAD to monitor the individual threads in a particular process and determine which threads are using CPU time. If the process has entered an infinite loop, it is likely that a small number of threads will be consuming most of that time. For example:
$ ps -mp 43824 -o THREAD
    USER    PID   PPID     TID ST  CP PRI SC     WCHAN        F     TT BND COMMAND
  wsuser  43824  51762       -  A  66  60 77         *   200001  pts/4   - java ...
       -      -      -    4021  S   0  60  1  22c4d670   c00400      -   - -
       -      -      -   11343  S   0  60  1  e6002cbc  8400400      -   - -
       -      -      -   14289  S   0  60  1  22c4d670   c00400      -   - -
       -      -      -   14379  S   0  60  1  22c4d670   c00400      -   - -
...
       -      -      -   43187  S   0  60  1  701e6114   400400      -   - -
       -      -      -   43939  R  33  76  1  20039c88   c00000      -   - -
       -      -      -   50275  S   0  60  1  22c4d670   c00400      -   - -
       -      -      -   52477  S   0  60  1  e600ccbc  8400400      -   - -
...
       -      -      -   98911  S   0  60  1  7023d46c   400400      -   - -
       -      -      -   99345  R  33  76  0         -   400000      -   - -
       -      -      -   99877  S   0  60  1  22c4d670   c00400      -   - -
       -      -      -  100661  S   0  60  1  22c4d670   c00400      -   - -
       -      -      -  102599  S   0  60  1  22c4d670   c00400      -   - -
...
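Picking the runnable threads out of this output by eye becomes tedious for a process with many threads. A minimal sketch of scripting it, assuming the AIX ps -mp ... -o THREAD column layout shown above (on the per-thread rows, USER/PID/PPID print as '-', TID is field 4, and ST is field 5); the output is captured in a variable here only so the example is self-contained:

```shell
# Sample 'ps -mp <pid> -o THREAD' output; in practice pipe ps straight
# into awk:  ps -mp 43824 -o THREAD | awk '$1 == "-" && $5 == "R" { print $4 }'
ps_output='    USER    PID   PPID     TID ST  CP PRI SC     WCHAN        F     TT BND COMMAND
  wsuser  43824  51762       -  A  66  60 77         *   200001  pts/4   - java
       -      -      -   43939  R  33  76  1  20039c88   c00000      -   - -
       -      -      -   50275  S   0  60  1  22c4d670   c00400      -   - -
       -      -      -   99345  R  33  76  0         -   400000      -   - -'

# Keep only per-thread rows (field 1 is '-') whose state (field 5) is 'R',
# and print their kernel thread IDs (field 4).
printf '%s\n' "$ps_output" | awk '$1 == "-" && $5 == "R" { print $4 }'
```

For the sample data above this prints the two runnable TIDs, 43939 and 99345, which are the threads to investigate further with dbx.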
Those threads with the value 'R' under 'ST' are in the 'runnable' state, and therefore can accumulate processor time. What are these threads doing? The output from ps shows the TID (kernel thread ID) for each thread; this can be mapped to the Java thread ID by using dbx. The dbx thread command gives output of the form:
 thread  state-k     wchan      state-u    k-tid mode held scope function
 $t1      wait   0xe60196bc    blocked    104099    k   no   sys  _pthread_ksleep
>$t2      run                  blocked     68851    k   no   sys  _pthread_ksleep
 $t3      wait   0x2015a458    running     29871    k   no   sys  pthread_mutex_lock
...
 $t50     wait                 running     86077    k   no   sys  getLinkRegister
 $t51     run                  running     43939    u   no   sys  reverseHandle
 $t52     wait                 running     56273    k   no   sys  getLinkRegister
 $t53     wait                 running     37797    k   no   sys  getLinkRegister
 $t60     wait                 running      4021    k   no   sys  getLinkRegister
 $t61     wait                 running     18791    k   no   sys  getLinkRegister
 $t62     wait                 running     99345    k   no   sys  getLinkRegister
 $t63     wait                 running     20995    k   no   sys  getLinkRegister
By matching the TID value from ps to the k-tid value from the dbx thread command, you can see that the currently running functions in this case are reverseHandle and getLinkRegister.
Now you can use dbx to generate the C thread stack for these two threads, using the dbx thread command with the corresponding dbx thread numbers ($tx). To obtain the full stack trace, including Java frames, map the dbx thread number to the thread's pthread_t value, which is listed in the Javadump file and can also be obtained from the ExecEnv structure for each thread by using the Dump Viewer. Do this with the dbx command thread info [dbx thread number], which produces output of the form:
thread  state-k  wchan  state-u   k-tid mode held scope function
$t51     run            running   43939    u   no   sys  reverseHandle
general:
   pthread addr   = 0x220c2dc0    size   = 0x18c
   vp addr        = 0x22109f94    size   = 0x284
   thread errno   = 61
   start pc       = 0xf04b4e64
   joinable       = yes
   pthread_t      = 3233
scheduler:
   kernel         =
   user           = 1 (other)
event :
   event          = 0x0
cancel         = enabled, deferred, not pending
stack storage:
   base           = 0x220c8018    size   = 0x40000
   limit          = 0x22108018
   sp             = 0x22106930
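The whole dbx interaction can be sketched as a session transcript. This is a sketch only: the exact prompts and subcommand behavior can vary with the AIX and dbx level, and the use of thread current plus where to print a chosen thread's native stack is an assumption about your dbx version.

```
$ dbx -a 43824          # attach dbx to the running Java process
(dbx) thread            # list threads; match k-tid 43939 to $t51
(dbx) thread info 51    # show details for $t51, including its pthread_t
(dbx) thread current 51 # make $t51 the current thread
(dbx) where             # print the native (C) stack of the current thread
(dbx) detach            # detach, leaving the process running
```

Note that attaching dbx stops the process while you are at the dbx prompt, so detach promptly on a production system.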
This shows that the TID value from ps (k-tid in dbx) corresponds to dbx thread number 51, which has a pthread_t of 3233. Searching for that pthread_t in the Javadump file, you now have a full stack trace:
"Worker#31" (TID:0x36288b10, sys_thread_t:0x220c2db8) Native Thread State: ThreadID: 00003233 Reuse: 1 USER SUSPENDED Native Stack Data : base: 22107f80 pointer 22106390 used(7152) free(250896) ----- Monitors held ----- java.io.OutputStreamWriter@3636a930 com.ibm.servlet.engine.webapp.BufferedWriter@3636be78 com.ibm.servlet.engine.webapp.WebAppRequestDispatcher@3636c270 com.ibm.servlet.engine.srt.SRTOutputStream@36941820 com.ibm.servlet.engine.oselistener.nativeEntry.NativeServerConnection@36d84490 JNI pinning lock ----- Native stack ----- _spin_lock_global_common pthread_mutex_lock - blocked on Heap Lock sysMonitorEnterQuicker sysMonitorEnter unpin_object unpinObj jni_ReleaseScalarArrayElements jni_ReleaseByteArrayElements Java_com_ibm_servlet_engine_oselistener_nativeEntry_NativeServerConnection_nativeWrite ------ Java stack ------ () prio=5 com.ibm.servlet.engine.oselistener.nativeEntry.NativeServerConnection.write(Compiled Code) com.ibm.servlet.engine.srp.SRPConnection.write(Compiled Code) com.ibm.servlet.engine.srt.SRTOutputStream.write(Compiled Code) java.io.OutputStreamWriter.flushBuffer(Compiled Code) java.io.OutputStreamWriter.flush(Compiled Code) java.io.PrintWriter.flush(Compiled Code) com.ibm.servlet.engine.webapp.BufferedWriter.flushChars(Compiled Code) com.ibm.servlet.engine.webapp.BufferedWriter.write(Compiled Code) java.io.Writer.write(Compiled Code) java.io.PrintWriter.write(Compiled Code) java.io.PrintWriter.write(Compiled Code) java.io.PrintWriter.print(Compiled Code) java.io.PrintWriter.println(Compiled Code) pagecompile._identifycustomer_xjsp.service(Compiled Code) javax.servlet.http.HttpServlet.service(Compiled Code) com.ibm.servlet.jsp.http.pagecompile.JSPState.service(Compiled Code) com.ibm.servlet.jsp.http.pagecompile.PageCompileServlet.doService(Compiled Code) com.ibm.servlet.jsp.http.pagecompile.PageCompileServlet.doGet(Compiled Code) javax.servlet.http.HttpServlet.service(Compiled Code) javax.servlet.http.HttpServlet.service(Compiled 
Code)
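Finding the matching thread entry in a large Javadump can also be done from the command line with grep on the native ThreadID. A minimal sketch: the file name and the excerpt written below are purely illustrative stand-ins for a real Javadump, included only so the command is runnable.

```shell
# Write a two-line excerpt standing in for a real Javadump file.
cat > javacore.sample.txt <<'EOF'
"Worker#31" (TID:0x36288b10, sys_thread_t:0x220c2db8)
Native Thread State: ThreadID: 00003233 Reuse: 1 USER SUSPENDED
EOF

# Search for the ThreadID matching the pthread_t that dbx reported (3233),
# printing the preceding line as well so the Java thread name is visible.
grep -B1 'ThreadID: 00003233' javacore.sample.txt
```

This prints the thread's name line ("Worker#31" ...) along with its Native Thread State line, identifying which Java thread to examine in the full stack trace.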
Using the full stack trace, it should be possible to identify any infinite loop that might be occurring. The example above shows _spin_lock_global_common, which is a busy wait on a lock, hence the use of CPU time.