|
|
Hi Everyone,
With my limited experience in the area of application performance analysis and improvement, mostly with Java-based applications, I would like to share my thoughts and experiences related to high CPU consumption in Java-based applications. I do not intend to exhaustively list all possible causes.
Possible Causes of high CPU:
Applications can suffer from high CPU utilization, and there are numerous reasons for high CPU consumption on a machine.
1. It may be related to the application's own processes.
a. A process spawning lots of threads under high load, with no throttling at all. This can cause unusually high CPU consumption simply because of the unexpected load on the application, and will generally consume all the CPU cycles available on the machine. The situation is better handled by implementing pools and throttling for incoming requests in web servers or application servers. Throttling can be implemented at the kernel level as well as the application level; how to do so is a separate topic in its own right that requires more detail. If the application is Java-based, high CPU can also occur due to a large number of garbage collections and the corresponding memory utilization. Thus, memory utilization can also indirectly affect CPU utilization.
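As a rough illustration of application-level throttling, here is a minimal Java sketch using a bounded thread pool; the pool size, queue bound, and task count are arbitrary assumptions for the example, not recommendations:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledPool {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        // Bounded pool: at most 4 worker threads and at most 100 queued requests,
        // so load spikes cannot spawn an unbounded number of threads.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(100),
                // When the queue is full, the submitting thread runs the task
                // itself, which naturally slows down (throttles) the producers.
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 1000; i++) {
            pool.execute(completed::incrementAndGet); // simulated unit of work
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("completed tasks: " + completed.get());
    }
}
```

Every task still runs (some on the submitting thread), but the worker count and queue depth stay bounded.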
b. An application can suffer from serious design flaws or coding defects. Examples include lots of unnecessary synchronized code or regions combined with many threads contending on the same mutexes (lots of threads in Object.wait()), and defective code with long-iterating loops; a memory leak or any other resource leak can also sometimes cause high CPU indirectly. It is also sometimes seen that the mix of compilation and interpretation of certain methods in Java-based applications leads to high CPU consumption: interpreted code is known to consume comparatively more CPU than compiled instructions. The HotSpot JVM has a reserved native memory area, known as the code cache, which stores the compiled instructions of hot methods. If this area is configured too small, more and more methods end up being interpreted, which can increase CPU as well. Often, checking native thread dumps repeatedly can reveal this problem. Increasing the code cache size (-XX:InitialCodeCacheSize and -XX:ReservedCodeCacheSize) has proven to improve CPU utilization.
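To make the synchronization point concrete, here is a small Java sketch contrasting a counter guarded by a single mutex (every thread serializes on one lock, a classic contention hot spot) with java.util.concurrent's LongAdder, which stripes updates across cells to reduce contention; the thread and iteration counts are arbitrary assumptions:

```java
import java.util.concurrent.atomic.LongAdder;

public class ContentionDemo {
    private static long syncCounter = 0;
    private static final Object LOCK = new Object();
    private static final LongAdder adder = new LongAdder();

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    // All 8 threads serialize on the same mutex here.
                    synchronized (LOCK) { syncCounter++; }
                    // LongAdder spreads updates over internal cells, so
                    // concurrent threads rarely touch the same memory word.
                    adder.increment();
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        System.out.println(syncCounter + " " + adder.sum());
    }
}
```

Both counters end at the same value; the difference under load is how much time threads spend blocked on the shared lock.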
c. As pointed out before, if many threads are contending on the same mutex, it is likely that these threads consume high CPU overall. This state can also be reached in very low-level tasks of the application, such as native memory allocation via malloc. Such calls can cause threads to contend for a chunk of memory from the heap area of the process. There are several configurations to take care of to make sure the application is not spending its time in native memory allocations (malloc). Malloc is a memory allocator that uses low-level OS system calls, i.e. mmap or brk, to expand or shrink the native heap area of a process. There are several implementations of malloc, and the most popular ones use the concept of arenas. To avoid threads contending while reserving or freeing memory, malloc has to implement all these operations in a thread-safe way. Thus, it is highly likely that this contention grows as the number of threads allocating or freeing memory grows, finally resulting in a CPU increase. Because of this problem, the malloc implementation is based on the arena concept. An arena is nothing but a chunk of heap to which a particular thread is made sticky for its future memory allocations. The number of arenas and the size of each arena configured for a process play a very important role in this behavior. The size of an arena is variable, depending on two factors:
1) the size of the arena expansion, and
2) the size of the malloc request.
The default expansion is 32*4096 bytes. This can be adjusted with the _M_ARENA_OPTS environment variable. If the malloc request is larger than the expansion size, the arena will be rounded up based on the malloc request.
The reason we have arenas is that the C heap is a global resource within the process. Its basic data structure is a Cartesian doubly linked list, and every malloc and free has to access this list to find where to allocate from or where to put a freed block. In a single-threaded app this is not a problem, but we need a way to control concurrent threads accessing this data structure. So the heap was divided among the threads: in effect, every thread whose id is equal modulo the number of arenas shares the same arena. This way multiple threads have the potential to access the heap at the same time, or only block while fewer threads access the same arena. This multiplexes the access and allows far higher throughput than a single point of access.
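The "thread mod number of arenas" idea above can be sketched as a toy model in Java; this is not glibc's or HP-UX's actual malloc code, and the arena count of 4 and the thread ids are assumptions for illustration:

```java
import java.util.concurrent.locks.ReentrantLock;

public class ArenaSketch {
    static final int NUM_ARENAS = 4;  // assumed arena count for the example
    static final ReentrantLock[] arenaLocks = new ReentrantLock[NUM_ARENAS];
    static final long[] arenaBytesAllocated = new long[NUM_ARENAS];
    static { for (int i = 0; i < NUM_ARENAS; i++) arenaLocks[i] = new ReentrantLock(); }

    // Each thread maps to one arena; threads mapped to different arenas
    // never contend on the same lock.
    static int arenaFor(long threadId) {
        return (int) (threadId % NUM_ARENAS);
    }

    static void allocate(long threadId, long bytes) {
        int a = arenaFor(threadId);
        arenaLocks[a].lock();  // contention is confined to this one arena
        try {
            arenaBytesAllocated[a] += bytes;  // stand-in for carving out heap
        } finally {
            arenaLocks[a].unlock();
        }
    }

    public static void main(String[] args) {
        allocate(0, 64);   // thread 0 -> arena 0
        allocate(4, 64);   // thread 4 -> arena 0 (4 mod 4 == 0): shares with thread 0
        allocate(1, 128);  // thread 1 -> arena 1: no contention with arena 0
        System.out.println(arenaBytesAllocated[0] + " " + arenaBytesAllocated[1]);
    }
}
```

Threads 0 and 4 land in the same arena and would contend with each other, while thread 1 proceeds independently, which is exactly the multiplexing effect described above.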
2. It may be related to other processes not at all related to the application's processes.
a. There may be a scenario where a particular application is not at all responsible for the high CPU utilization on a machine. It may be caused by totally unrelated processes like batch jobs, alerting jobs, cron jobs, etc. The details can be found using popular Unix utilities like top, sar, ps, glance, tusc, svmon, prstat, etc.
3. It may be related to other system/OS processes.
a. The operating system has scheduled jobs/processes that can consume high CPU for a particular time period. Generally these processes run with a higher priority (nice value) than regular application processes, which may lead to high CPU consumption.
4. Or it may be the combined effect of multiple unrelated processes.
a. This situation can easily arise when different processes with different priorities start interacting with each other. For example, an Oracle DB server spawns several DB processes. If some of those processes run at a lower priority than others, the situation can very well lead to high CPU consumption on the DB server. This is because, if a process running at lower priority acquires a lock on a shared resource and gets pre-empted by the kernel in a context switch in favour of higher-priority processes requiring the same resource, the lower-priority process gets swapped/paged out while still holding the lock. Once the higher-priority processes get scheduled to run and try to lock the same resource, they start spinning on it (waiting for it to be unlocked). The resource won't get unlocked until the lower-priority process is swapped back in and scheduled by the kernel to run. This classic priority-inversion situation can cause a cascading effect, with a large number of threads spinning on resources and causing high CPU.
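The spinning-while-the-lock-holder-sleeps pattern can be sketched in Java; this is a contrived, single-process model of the scenario (real priority inversion involves kernel scheduling of separate processes), and the priorities, sleep time, and latch are assumptions made to force the sequence deterministically:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

public class PriorityInversionSketch {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock resource = new ReentrantLock();
        CountDownLatch lockHeld = new CountDownLatch(1);
        AtomicLong spins = new AtomicLong();

        Thread low = new Thread(() -> {
            resource.lock();           // low-priority thread grabs the shared resource
            lockHeld.countDown();
            try { Thread.sleep(100); } // stands in for being pre-empted / paged out
            catch (InterruptedException ignored) { }
            finally { resource.unlock(); }
        });
        low.setPriority(Thread.MIN_PRIORITY);

        Thread high = new Thread(() -> {
            try { lockHeld.await(); } catch (InterruptedException ignored) { }
            // Busy-wait (spin) until the resource is free: this burns CPU the
            // whole time the lower-priority holder is off the CPU.
            while (!resource.tryLock()) spins.incrementAndGet();
            resource.unlock();
        });
        high.setPriority(Thread.MAX_PRIORITY);

        low.start(); high.start();
        low.join();  high.join();
        System.out.println("high-priority thread spun: " + (spins.get() > 0));
    }
}
```

While the low-priority holder is asleep, the high-priority thread accumulates a large spin count, which is the CPU burn the paragraph describes; multiply this by many threads and the cascade follows.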