Common heap behavior for Wildfly or application memory leak? - wildfly

We're running our application in Wildfly 14.0.1, with a -Xmx of 4096, running with OpenJDK 11.0.2. I've been using VisualVM 1.4.2 to monitor our heap since we previously were having OOM exceptions (because our -Xmx was only 512 which was incredibly bad).
While we are well within our memory allocation now, we have no more OOM exceptions happening, and even with a good amount of clients and processing happening we're nowhere near the -Xmx4096 (the servers have 16GB so memory isn't an issue), I'm seeing some strange heap behavior that I can't figure out where it's coming from.
Using VisualVM, Eclipse MemoryAnalyzer, as well as heaphero.io, I get summaries like the following:
Total Bytes: 460,447,623
Total Classes: 35,708
Total Instances: 2,660,155
Classloaders: 1,087
GC Roots: 4,200
Number of Objects Pending for Finalization: 0
However, in watching the Heap Monitor, I see the Used Heap over a 4 minute time period increase by about 450MB before the GC runs and drops back down only to spike again. Here's an image:
This is when no clients are connected and nothing is actively happening in our application. We do use Apache File IO to monitor remote directories, we have JMS topics, etc. so it's not like the application is completely idle, but there's zero logging and all that.
My largest objects are the well-known io.netty.buffer.PoolChunk, which in the heap dumps are about 60% of my memory usage, the total is still around 460MB so I'm confused why the heap monitor is going from ~425MB to ~900MB repeatedly, and no matter where I take my snapshots, I can't see any large increase of object counts or memory usage.
I'm just seeing a disconnect between the heap monitor, and .hprof analysis. So there doesn't see a way to tell what's causing the heap to hit that 900MB peak.
My question is if these heap spikes are totally expected when running within Wildfly, or is there something within our application that is spinning up a bunch of objects that then get GC'd? In doing a Component report, objects in our application's package structure make up an extremely small amount of the dump. Which doesn't clear us, we easily could be calling things without closing appropriately, etc.

Related

Identify source of increased PostgreSQL shared memory

After upgrading from PostgreSQL 9.6 to 10 and also updating the overlying application (Trend Micro Deep Security), we see an increase in the overall shared memory utilization by more than 300%.
Currently the shared memory is around 3GB, which aligns with the shared_buffer parameter, set to 3072MB. Using ps, top and pmap I was able to tell that almost the entire shared memory is used by postgres-related processes.
However, I would like to know what's the actual cause of that increase. Is there anyway to identify the real root cause?

why is kdb process showing high memory usage on system?

I am running into serious memory issues with my kdb process. Here is the architecture in brief.
The process runs in slave mode (4 slaves). It loads a ton of data from database into memory initially (total size of all variables loaded in memory calculated from -22! is approx 11G). Initially this matches .Q.w[] and close to unix process memory usage. This data set increases by very little in incremental operations. However, after a long operation, although the kdb internal memory stats (.Q.w[]) show expected memory usage (both used and heap) ~ 13 G, the process is consuming close to 25G on the system (unix /proc, top) eventually running out of physical memory.
Now, when I run garbage collection manually (.Q.gc[]), it frees up memory and brings unix process usage close to heap number displayed by .Q.w[].
I am running Q 2.7 version with -g 1 option to run garbage collection in immediate mode.
Why is unix process usage so significantly differently from kdb internal statistic -- where is the difference coming from? Why is "-g 1" option not working? When i run a simple example, it works fine. But in this case, it seems to leak a lot of memory.
I tried with 2.6 version which is supposed to have automated garbage collection. Suprisingly, there is still a huge difference between used and heap numbers from .Q.w when running with version 2.6 both in single threaded (each) and multi threaded modes (peach). Any ideas?
I am not sure of the concrete answer but this is my deduction based on following information (and some practical experiments) which is mentioned on wiki:
http://code.kx.com/q/ref/control/#peach
It says:
Memory Usage
Each slave thread has its own heap, a minimum of 64MB.
Since kdb 2.7 2011.09.21, .Q.gc[] in the main thread executes gc in the slave threads too.
Automatic garbage collection within each thread (triggered by a wsful, or hitting the artificial heap limit as specified with -w on the command line) is only executed for that particular thread, not across all threads.
Symbols are internalized from a single memory area common to all threads.
My observations:
Thread Specific Memory:
.Q.w[] only shows stats of main thread and not the summation of all the threads (total process memory). This could be tested by starting 'q' with 2 threads. Total memory in that case should be at least 128MB as per point 1 but .Q.w[] it still shows 64 MB.
That's why in your case at the start memory stats were close to unix stats as all the data was in main thread and nothing on other threads. After doing some operations some threads might have taken some memory (used/garbage) which is not shown by .Q.w[].
Garbage collector call
As mentioned on wiki, calling garbage collector on main thread calls GC on all threads. So that might have collected the garbage memory from threads and reduced the total memory usage which was reflected by reduced unix memory stats.

MongoDB, NUMA hardware, page faults but enough RAM for working set, touch command or vmtouch/dd does not load into memory

MongoDB 2.46 & 2.4.8
Use case:
Load up 100.000 documents on a collection with 2 indexes. Resident memory increases (mongostat), and no page faults happen.
Restart mongod. Resident memory is low (this is expected)
Try to 'preheat' mongo, with touch command db.runCommand({ touch: collection, data: true, index: true }) or other means (on OS, vmtouch / dd)
a) On this step, on my development machine (MacOS), I see in mongostat a lot of page faults trying to heat it up (expected) and the resident memory is raised. From that point on, any updates do not raise page faults
b) On a numa server (256 GB RAM), even though I started up mongo with this guide: http://docs.mongodb.org/manual/administration/production-notes/#mongodb-on-numa-hardware (note: I do not have superuser access. However, the 2nd step, echoing 0 in /proc/sys/vm/zone_reclaim_mode, is already 0 so I left it like that), I cannot seem to be able to pre-heat the memory with the 'touch' command. Nothing happens, even though it returns successfully. In mongostat, only 'mapped' and 'vsize' is getting higher, and resident memory is the same (35m). I even tried to load up the data files in OS's memory with vmtouch and dd commands. Only re-indexing the collection changed the resident memory.
The problem started a while after I began to load up data into the server. I do a lot of upserts and the performance was awesome in the beginning (3000 - 4000 upserts/sec). This was expected because the working set would be able to fit in memory. After 30.000.000 documents the process seems to make a lot of page faults and I do not know why. The data files are approx. 33GB and the performance is about 500 upserts/sec, with a lot of page faults. That should mean that the working set is not in memory. However, 256GB RAM should be more than enough. I tried the 'touch' command, but resident memory was low (I even restarted the mongod process, ran the touch command, and even though 'mapped' and 'vsize' skyrocketed to a lot of GB, resident memory kept low, 35m). I tried to reIndex the collection and voilĂ , resident memory went from 35m -> 20GB. However, again, I saw page faults. Then I tried to vmtouch the data files (or with dd). Again, a lot of page faults.
The problem is that I cannot have 'only' 500 upserts/sec. Should I change my application logic? I thought with 256GB memory my 'active' working set (expected 60GB) should fit in memory. I am in the middle (30GB) and it seems that I cannot do anything to fix this. Is it the numa hardware? Should I make any other changes?
Thanks in advance
I just wrote a pretty detailed answer over on ServerFault regarding resident memory, page faulting, and how to troubleshoot, tweak and tune etc. so I will not re-hash that here.
I will say that Sammaye's comment is correct, the touch (or dd, vmtouch etc.) command will not cause memory to be reported as resident agains the mongod process until the process actually accesses the data (until then it is just in the FS cache), and then you can hit the issue in SERVER-9415 which can cause resident memory to be under reported.
I think you are already looking at the key metrics here, and you should be able to achieve higher resident memory than you are reporting (or at least, get more data into memory without significant page faults being seen). The situation you are describing sounds like memory pressure from elsewhere but I am assuming you would have notices another process eating significant amounts of memory.
What I will note is that I have previously spent days (literally) attempting to make a particular AWS instance go above a 30% memory threshold without success.
When we finally gave up and tried on another instance, without changing a thing (we just added a new instance as a secondary and failed over to it) it instantly went to over 70% resident memory. Granted, that was on m2.4xlarge instances, so not at the same scale as yours, but it's always worth bearing in mind. If you can try it on another instance, I would recommend giving it a shot.

Everyday GC is running on same time

In my server everyday 3:00AM GC is running and Heapspace is filling in a flash.
This causing site outage. ANY inputs?
following are my JVM settings.I am using JBOSS server.
-Dprogram.name=run.sh -server -Xms1524m -Xmx1524m -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -XX:NewSize=512m -XX:MaxNewSize=512m -Djava.net.preferIPv4Stack=true -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -Djavax.net.ssl.trustStorePassword=changeit -Dcom.sun.management.jmxremote.port=8888 -Djava.rmi.server.hostname=192.168.100.140 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote
Any suggestions really helpful..
(This turned out somewhat long; there is an actual suggestion for a fix at the end.)
Very very briefly, garbage collection when you use -XX:+UseConcMarkSweepGC works like this:
All objects are allocated in the so-called young generation. This is typically a couple of hundred megs up to a gig in size, depending om VM settings, number of CPU:s and total heap size. The young generation is collected in a stop-the-world pause, followed by a parallel (multiple CPU) compacting (moving objects) collection. The young generation is sized so as to make this pause reasonably large.
When objects have survived (are still reachable) young gen they get promoted to "old-gen" (old generation).
The old generation is where -XX:+UseConcMarkSweepGC kicks in. In the default mode (without -XX:+UseConcMarkSweepGC) when the old generation becomes full the entire heap is collected and compacted (moving around, eliminating fragmentation) at once in a stop-the-world copy. That pause will typically be longer than young-gen pauses because the entire heap is involved, which is bigger.
With CMS (-XX:+UseConcMarkSweepGC) the work to compact the old generation is mostly concurrent (meaning, running in the background with the application not paused). This work is also not compacting; it works more like malloc()/free() and you are subject to fragmentation.
The main upside of CMS is that when things work well, you avoid long pause times that are linear in the size of the heap, because the main work is cone concurrently (there are some stop-the-world steps involved but they are supposed to usually be short).
The two primary downsides are that:
You are subject to fragmentation because old-gen is not compacted.
If you don't finish a concurrent collection cycle before old-gen fills up, or if fragmentation prevents allocation, the resulting full collection of the entire heap is not parallel as it is with the default collector. I..e, only one CPU is used. That means that when/if you do hit a full garbage collection, the pause will be longer than it would have been with the default collector.
Now... your logs. "Concurrent mode failure" is intended to convey that the concurrent mark/sweep work did not complete in time for another young-gen GC that needs to promote surviving objects into the old generation. The "promotion failed" is rather that during promotion from young-gen to old-gen, an object was unable to be allocated in old-gen due to fragmentation.
Unless you are hitting a true bug in the JVM, the sudden increase in heap usage is almost certainly from your application, JBoss, or some external entity acting on your application. So I can't really help with that. However, what is likely happening is a combination of two things:
The spike in activity is causing an increase in heap usage too quick for the concurrent collection to complete in time.
Old-gen is too fragmented, causing problems especially when the old-gen is almost full.
I should also point out now that the default behavior of CMS is to try to postpone concurrent collections as long as possible (yet not too long) for performance reasons. The later it happens, the more efficient (in terms of CPU usage) the collection is. However, a trade-off is that you're increasing the risk of not finishing in time (which again, will trigger a full GC and a long pause). It should also (I have not made empirical tests here, but it stands to reason) result in fragmentation being a greater concern; basically the more full old-gen is when an object is promoted, the greater is the likelyhood that the object's promotion will worsen fragmentation concerns (too long to go into details here).
In your case, I would do two things:
Keep figuring out what is causing the activity. I would say it's fairly unlikely that it is a GC/JVM bug.
Re-configure the JVM to trigger concurrent collection cycles earlier in order to avoid the heap every becoming so full that fragmentation becomes a particularly huge concern, and giving it more time to complete in time even during your sudden spikes of activity.
You can accomplish (2) most easily be using the JVM options
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
in order to explicitly force the JVM to kick start a CMS cycle at a certain level of heap usage (in this example 75% - you may need to change that; the lower the percentage, the earlier it will kick in).
Note that depending on what your live size is (the number of bytes that are in fact live and reachable) in your application, forcing an earlier CMS cycle may also require that you increase your heap size to avoid CMS running constantly (not a good use of CPU).

JBoss memory usage pattern

Taking a look at one of my applications running in a production server, I noticed that there is a "normal but strange" behavior of memory usage.
Let me explain: watching the execution of my JBoss 4.2.2 without any application deployed, I can see that is constantly grows and frees the used heap space, usually a few megabytes in a development server. When I deploy my application, the pattern is the same, but using more memory in average.
Well, in the production server I can see that my JBoss, even without any workload, has a minimum memory usage of 1.5GB. Still without any workload, the heap usage grows to 3.6GB, when a minor GC runs and the heap usage comes back to 1.5GB. Every 40s my JBoss grows its heap usage from 1,5GB to 3.6GB, and this pattern repeats indefinitely. When the workload grows, the difference is that the period for growing the memory usage falls to 8s.
So, my question is: is this normal?