I have noticed that my PySpark code is causing out-of-memory errors. Using VisualVM, I found the points where the heap size grows beyond the executor memory and changed the code accordingly. Now that I am trying to deploy the code with larger data on Dataproc, I am finding it hard to monitor the heap size. Is there a good way to monitor the runtime heap size? I think it would be easiest if I could print the runtime heap size via py4j or another library.
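As a rough illustration of the py4j idea mentioned above, here is a minimal sketch, assuming an existing SparkSession named spark; note that it reports the driver JVM's heap only, not the executors'.

# Minimal sketch: read the driver JVM's heap figures through py4j.
# This covers the driver process only, not the executor heaps.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
runtime = spark.sparkContext._jvm.java.lang.Runtime.getRuntime()

used_mib = (runtime.totalMemory() - runtime.freeMemory()) / (1024 * 1024)
max_mib = runtime.maxMemory() / (1024 * 1024)
print(f"driver heap: {used_mib:.0f} MiB used of {max_mib:.0f} MiB max")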
Related
We have two processes (p1 and p2) running in a JVM container (in Docker) on Kubernetes.
The resource limit (in the Helm chart) for the container is set to 1000 MiB.
We set -XX:MaxRAMPercentage to 50% (= 500 MiB). What will the heap distribution for each process look like?
Will p1 and p2 split it equally, so that each has 250 MiB that cannot be exceeded?
Or will they share the whole heap of 500 MiB that cannot be exceeded?
The heap is just a part of the memory consumed by the JVM: there are also stacks and native memory, and the runtime, the JIT, and the garbage collector need memory too. So a typical Java application run with -Xmx500m will need approximately 700-1000 MB of RAM (when using the full heap). The full memory usage depends heavily on what your application is doing and how it allocates and deallocates memory; some Java apps with 1 GB of heap can use 20 GB of RAM.
Back to your question: when you limit the container to 1000 MiB and run two same-sized, pretty standard Java web applications, I would size the JVMs with -Xmx300m (or, if you really want to use relative values, -XX:MaxRAMPercentage=30.0).
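To make the arithmetic behind that recommendation concrete, here is a minimal sketch; the 60% heap share is only an assumed rule of thumb derived from the overhead figures above, not a fixed rule.

# Rough sizing sketch: split a container memory limit between two JVMs,
# reserving room for non-heap memory (stacks, metaspace, JIT, GC, native).
# The 0.6 heap fraction is an illustrative assumption.
container_limit_mib = 1000
process_count = 2
heap_fraction = 0.6

heap_per_process_mib = int(container_limit_mib * heap_fraction / process_count)
print(f"-Xmx{heap_per_process_mib}m per JVM")  # prints: -Xmx300m per JVM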
For more information: this answer gives a good overview of Java memory.
We're running our application in WildFly 14.0.1 with -Xmx set to 4096 MB, on OpenJDK 11.0.2. I've been using VisualVM 1.4.2 to monitor our heap, since we previously had OOM exceptions (because our -Xmx was only 512 MB, which was far too low).
We are now well within our memory allocation, the OOM exceptions have stopped, and even with a good number of clients and plenty of processing we're nowhere near the 4096 MB limit (the servers have 16 GB, so memory isn't an issue). Still, I'm seeing some strange heap behavior whose source I can't figure out.
Using VisualVM, Eclipse MemoryAnalyzer, as well as heaphero.io, I get summaries like the following:
Total Bytes: 460,447,623
Total Classes: 35,708
Total Instances: 2,660,155
Classloaders: 1,087
GC Roots: 4,200
Number of Objects Pending for Finalization: 0
However, watching the heap monitor, I see the used heap increase by about 450 MB over a 4-minute period before the GC runs and drops it back down, only for it to spike again.
This happens when no clients are connected and nothing is actively happening in our application. We do use Apache File IO to monitor remote directories, we have JMS topics, and so on, so the application isn't completely idle, but there is essentially no logging or similar activity.
My largest objects are the well-known io.netty.buffer.PoolChunk instances, which account for about 60% of memory usage in the heap dumps. The total is still around 460 MB, so I'm confused why the heap monitor repeatedly goes from ~425 MB to ~900 MB, and no matter when I take my snapshots, I can't see any large increase in object counts or memory usage.
I'm just seeing a disconnect between the heap monitor and the .hprof analysis, so there doesn't seem to be a way to tell what is causing the heap to hit that 900 MB peak.
My question is whether these heap spikes are expected when running within WildFly, or whether something in our application is spinning up a lot of objects that then get GC'd. In a component report, objects in our application's package structure make up an extremely small portion of the dump. That doesn't clear us, though; we could easily be calling things without closing them appropriately.
I have an Apache Spark application that performs the following steps:
inputFile(#s3loc)
mapPartitions(mapper).groupByKey.mapPartitions(reducer).saveAsHadoopFile(params)
When I run this on a small data set it runs fine (around 100 files, each a gzipped file of about 4-5 MB). When the input is large (same file size but 14k files) I get a Java heap space error around message serialization and byte arrays, or something of the sort.
I experimented a bit with my cluster (EMR): for a cluster of 60 m2.2xlarge machines, each with 32 GB of RAM and 4 cores, I set spark.default.parallelism=960, i.e., 4 tasks per core. This threw the same error as above. When I changed the parallelism to 240 or 320 my tasks executed smoothly, but it was pretty slow. What is causing this heap overflow? Most places I have read recommend around 3-4 tasks per core, which should make 960 a good choice. How do I increase the number of tasks without causing a heap overflow?
Part of the logs (the latter end) can be found at : http://pastebin.ca/3078231
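For reference, here is a hedged PySpark sketch of the same pipeline shape with the parallelism set up front; mapper, reducer, and the S3 paths are placeholders, and saveAsTextFile stands in for the saveAsHadoopFile call.

# Illustrative PySpark version of the pipeline, with spark.default.parallelism
# set to one of the values that reportedly ran without heap errors (320).
# The mapper/reducer bodies and S3 paths are placeholders, not the real job.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("heap-tuning-sketch")
        .set("spark.default.parallelism", "320"))
sc = SparkContext(conf=conf)

def mapper(records):
    for record in records:
        yield (record[:8], record)  # placeholder key extraction

def reducer(groups):
    for key, values in groups:
        yield "%s\t%d" % (key, len(list(values)))

(sc.textFile("s3://bucket/input")
   .mapPartitions(mapper)
   .groupByKey()
   .mapPartitions(reducer)
   .saveAsTextFile("s3://bucket/output"))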
I tried updating server.xml and deleted dumps and temporary cache files from C:\Users\username\AppData\Local\javasharedresources,
and I am still not able to start the server.
This is the error message I get:
JVMDUMP010I Java dump written to C:\WAS8\profiles\AppSrv02\bin\javacore.20150210.094417.6468.0009.txt
JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
There could be multiple causes for the OutOfMemoryError exceptions. It could be that there is a memory leak in one of the applications that is loaded on startup, or the maximum heap size is not set high enough to support all of the components loaded on startup.
It is best to go through a troubleshooting exercise. I suggest you download the heap analyzer tool from here and analyze the javacore file to see where the potential leak, if any, might be.
If you can't find a memory leak, try increasing the JVM maximum heap size. Check that your host system has enough RAM to support the chosen maximum JVM heap size.
Earlier I hadn't updated server.xml with the correct arguments. Once I updated server.xml with genericJvmArguments="-Xms1024M -Xmx2048M" and InitialHeapSize="1024" maximumHeapSize="2048", I was able to start the server.
I'm still having some problems with this error in Talend. I already changed the VM Arguments to this:
Arguments:
-Xms1024m
-Xms1024m
And then I always get this error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
Any suggestions?
The -Xms option sets the initial and minimum Java heap size.
You increased the initial heap size, not the maximum.
You need to use -Xmx, as @David Tonhofer said.
If this is not enough, you should look at your memory management. Having too many lookups (large data volumes) in the same subjob or storing a large volume of data in tHash can lead to memory problems.
Additionally, I would suggest checking the -XX:MaxPermSize parameter as well. For bigger jobs I have needed to change it to -XX:MaxPermSize=512m.
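Putting those suggestions together, the VM arguments might look something like the following; the -Xmx and -XX:MaxPermSize values are only illustrative and need to be sized to your machine and job.
-Xms1024m
-Xmx4096m
-XX:MaxPermSize=512m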