If vmstat has shown that the system is CPU-bound, the next stage is to determine which process is using the most CPU time.
> tprof -s -k -x sleep 60

This command outputs:

Mon Nov 28 12:40:11 2005
System: AIX 5.2 Node: voodoo Machine: 00455F1B4C00
Starting Command sleep 60
stopping trace collection
Generating sleep.prof

> cat sleep.prof

Process                 Freq   Total  Kernel   User  Shared  Other
=======                 ====   =====  ======   ====  ======  =====
./java                     5   59.39   24.28   0.00   35.11   0.00
wait                       4   40.33   40.33   0.00    0.00   0.00
/usr/bin/tprof             1    0.20    0.02   0.00    0.18   0.00
/etc/syncd                 3    0.05    0.05   0.00    0.00   0.00
/usr/bin/sh                2    0.01    0.00   0.00    0.00   0.00
gil                        2    0.01    0.01   0.00    0.00   0.00
afsd                       1    0.00    0.00   0.00    0.00   0.00
rpc.lockd                  1    0.00    0.00   0.00    0.00   0.00
swapper                    1    0.00    0.00   0.00    0.00   0.00
=======                 ====   =====  ======   ====  ======  =====
Total                     20  100.00   64.70   0.00   35.29   0.00

Process            PID      TID   Total  Kernel   User  Shared  Other
=======            ===      ===   =====  ======   ====  ======  =====
./java          467018   819317   16.68    5.55   0.00   11.13   0.00
./java          467018   766019   14.30    6.30   0.00    8.00   0.00
./java          467018   725211   14.28    6.24   0.00    8.04   0.00
./java          467018   712827   14.11    6.16   0.00    7.94   0.00
wait             20490    20491   10.24   10.24   0.00    0.00   0.00
wait              8196     8197   10.19   10.19   0.00    0.00   0.00
wait             12294    12295    9.98    9.98   0.00    0.00   0.00
wait             16392    16393    9.92    9.92   0.00    0.00   0.00
/usr/bin/tprof  421984   917717    0.20    0.02   0.00    0.18   0.00
/etc/syncd      118882   204949    0.04    0.04   0.00    0.00   0.00
./java          467018   843785    0.03    0.02   0.00    0.00   0.00
gil              53274    73765    0.00    0.00   0.00    0.00   0.00
gil              53274    61471    0.00    0.00   0.00    0.00   0.00
/usr/bin/sh     397320   839883    0.00    0.00   0.00    0.00   0.00
rpc.lockd       249982   434389    0.00    0.00   0.00    0.00   0.00
/usr/bin/sh     397318   839881    0.00    0.00   0.00    0.00   0.00
swapper              0        3    0.00    0.00   0.00    0.00   0.00
afsd             65776   274495    0.00    0.00   0.00    0.00   0.00
/etc/syncd      118882   258175    0.00    0.00   0.00    0.00   0.00
/etc/syncd      118882   196839    0.00    0.00   0.00    0.00   0.00
=======            ===      ===   =====  ======   ====  ======  =====
Total                            100.00   64.70   0.00   35.29   0.00

Total Samples = 24749     Total Elapsed Time = 61.88s

This output shows that the Java™ process with Process ID (PID) 467018 is using the majority of the CPU time.
You can also see that the CPU time is being shared among four threads inside that process (Thread IDs 819317, 766019, 725211, and 712827).
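On a busier system, the per-thread section can be filtered mechanically rather than read by eye. The sketch below assumes the whitespace-separated layout shown above; the embedded excerpt is a stand-in for the real sleep.prof:

```shell
# Rank the ./java threads by total CPU (column 4 of the per-thread
# section). The excerpt below is a hypothetical stand-in for sleep.prof.
cat > tprof_excerpt.txt <<'EOF'
./java 467018 819317 16.68 5.55 0.00 11.13 0.00
./java 467018 766019 14.30 6.30 0.00 8.00 0.00
wait 20490 20491 10.24 10.24 0.00 0.00 0.00
EOF
awk '$1 == "./java" { print $3, $4 }' tprof_excerpt.txt | sort -k2 -rn
# → 819317 16.68
#   766019 14.30
```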
From the above example, notice the Kernel and Shared values: these account for all of the CPU time used by this process, indicating that the Java process is spending its time doing work inside the JVM (or some other native code).
To understand what is being done during the Kernel and Shared times, the relevant sections of the tprof output can be analyzed.
The shared library section shows which shared libraries are being invoked:
Shared Object                                                               %
=============                                                          ======
/j9vmap3224-20071224/inst.images/rios_aix32_6/sdk/jre/bin/libj9gc24.so  17.42
/usr/lib/libc.a[shr.o]                                                   9.38
/usr/lib/libpthreads.a[shr_xpg5.o]                                       6.94
j9vmap3224-20071224/inst.images/rios_aix32_6/sdk/jre/bin/libj9thr24.so   1.03
j9vmap3224-20071224/inst.images/rios_aix32_6/sdk/jre/bin/libj9prt24.so   0.24
/j9vmap3224-20071224/inst.images/rios_aix32_6/sdk/jre/bin/libj9vm24.so   0.10
j9vmap3224-20071224/inst.images/rios_aix32_6/sdk/jre/bin/libj9ute24.so   0.06
j9vmap3224-20071224/inst.images/rios_aix32_6/sdk/jre/bin/libj9jit24.so   0.05
/usr/lib/libtrace.a[shr.o]                                               0.04
j9vmap3224-20071224/inst.images/rios_aix32_6/sdk/jre/bin/libj9trc24.so   0.02
p3224-20071224/inst.images/rios_aix32_6/sdk/jre/bin/libj9hookable24.so   0.01
In the example above, the CPU time is being spent inside the garbage collection (GC) implementation, implying either that there is a problem in GC or that GC is running almost continuously.
Again, you can obtain a more accurate understanding of what is occurring inside the libj9gc24.so library during the CPU time by analyzing the relevant section of the tprof output:
Profile: /work/j9vmap3224-20071224/inst.images/rios_aix32_6/sdk/jre/bin/libj9gc24.so

Total % For All Processes (/work/j9vmap3224-20071224/inst.images/rios_aix32_6/sdk/jre/bin/libj9gc24.so) = 17.42

Subroutine                                              %    Source
==========                                           ======  ======
Scheme::scanMixedObject(MM_Environment*,J9Object*)     2.67  MarkingScheme.cpp
MarkingScheme::scanClass(MM_Environment*,J9Class*)     2.54  MarkingScheme.cpp
.GC_ConstantPoolObjectSlotIterator::nextSlot()         1.96  jectSlotIterator.cpp
lelTask::handleNextWorkUnit(MM_EnvironmentModron*)     1.05  ParallelTask.cpp
orkPackets::getPacket(MM_Environment*,MM_Packet**)     0.70  WorkPackets.cpp
cheme::fixupRegion(J9Object*,J9Object*,bool,long&)     0.67  CompactScheme.cpp
WorkPackets::putPacket(MM_Environment*,MM_Packet*)     0.47  WorkPackets.cpp
rkingScheme::scanObject(MM_Environment*,J9Object*)     0.43  MarkingScheme.cpp
sweepChunk(MM_Environment*,MM_ParallelSweepChunk*)     0.42  allelSweepScheme.cpp
ment*,J9IndexableObject*,J9Object**,unsigned long)     0.38  MarkingScheme.cpp
M_CompactScheme::getForwardingPtr(J9Object*) const     0.36  CompactScheme.cpp
ObjectHeapIteratorAddressOrderedList::nextObject()     0.33  dressOrderedList.cpp
ckets::getInputPacketFromOverflow(MM_Environment*)     0.32  WorkPackets.cpp
.MM_WorkStack::popNoWait(MM_Environment*)              0.31  WorkStack.cpp
WorkPackets::getInputPacketNoWait(MM_Environment*)     0.29  WorkPackets.cpp
canReferenceMixedObject(MM_Environment*,J9Object*)     0.29  MarkingScheme.cpp
MarkingScheme::markClass(MM_Environment*,J9Class*)     0.27  MarkingScheme.cpp
._ptrgl                                                0.26  ptrgl.s
_MarkingScheme::initializeMarkMap(MM_Environment*)     0.25  MarkingScheme.cpp
.MM_HeapVirtualMemory::getHeapBase()                   0.23  eapVirtualMemory.cpp
This output shows that the most-used functions are:
MarkingScheme::scanMixedObject(MM_Environment*,J9Object*)          2.67  MarkingScheme.cpp
MarkingScheme::scanClass(MM_Environment*,J9Class*)                 2.54  MarkingScheme.cpp
ObjectSlotIterator.GC_ConstantPoolObjectSlotIterator::nextSlot()   1.96  ObjectSlotIterator.cpp
ParallelTask::handleNextWorkUnit(MM_EnvironmentModron*)            1.05  ParallelTask.cpp
The values show that the time is being spent during the Mark phase of GC. Because the output also contains references to the Compact and Sweep phases, the GC is probably completing, but is occurring almost continuously. You could confirm that likelihood by running with -verbosegc enabled.
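Verbose GC output can be enabled when the application is next launched. A minimal sketch, assuming an IBM JVM; the class name is a placeholder for your own application:

```shell
# Relaunch the application with verbose GC enabled so that the
# frequency and duration of collections can be inspected.
# "GCTest" is a placeholder for the real application class.
java -verbosegc GCTest
```

The verbose GC records show how often collections occur and how long each one takes, which distinguishes continuous-but-healthy GC from a collector that is failing to make progress.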
The same methodology shown above can be used for any case where the majority of the CPU time is shown to be in the Kernel and Shared columns. If, however, the CPU time is classed as being "Other", a different methodology is required because tprof does not contain a section that correctly details which Java methods are being run.
In the case of CPU time being attributed to "Other", you can use a Javadump to determine the stack trace for the TIDs shown to be taking the CPU time, and so get an idea of the work each thread is doing. Map the TID shown in the tprof output to the correct thread in the Javadump by taking the tprof TID, which is in decimal, and converting it to hexadecimal. The hexadecimal value is shown as the "native ID" in the Javadump.
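The decimal-to-hexadecimal conversion can be done with printf; a minimal sketch using the busiest TID from the report above (the Javadump file name is a placeholder):

```shell
# Convert a tprof TID (decimal) to the hexadecimal "native ID" form
# used in the Javadump.
TID=819317
NATIVE_ID=$(printf '%X' "$TID")
echo "$NATIVE_ID"   # → C8075
# The Javadump can then be searched for that thread, for example:
#   grep "native ID" javacore.txt | grep "$NATIVE_ID"
# ("javacore.txt" is a placeholder for the actual Javadump file name.)
```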
For the example above:
Process      PID      TID   Total  Kernel   User  Shared  Other
=======      ===      ===   =====  ======   ====  ======  =====
./java    467018   819317   16.68    5.55   0.00   11.13   0.00
This thread is the one using the most CPU; the TID in decimal is 819317. This value is C8075 in hexadecimal, which can be seen in the Javadump:
3XMTHREADINFO      "main" (TID:0x300E3500, sys_thread_t:0x30010734, state:R, native ID:0x000C8075) prio=5
4XESTACKTRACE          at java/lang/Runtime.gc(Native Method)
4XESTACKTRACE          at java/lang/System.gc(System.java:274)
4XESTACKTRACE          at GCTest.main(GCTest.java:5)
These entries show that, in this case, the thread is calling GC, which explains the time spent in the libj9gc24.so shared library.