Cannot grow BufferHolder; exceeds size limitation - scala

Any help with the Spark error "Cannot grow BufferHolder; exceeds size limitation"?
I have tried the Databricks-recommended solution at https://kb.databricks.com/en_US/sql/cannot-grow-bufferholder-exceeds-size, but with no success.
I have tried other solutions found online, also without success.
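For context, this error is raised when a single row grows past BufferHolder's limit of roughly 2 GB (Integer.MAX_VALUE bytes), most often because an aggregation such as collect_list, or a large string concatenation, builds one enormous value per key. Below is a minimal sketch of the usual workaround of splitting each key into smaller buckets so that no single row approaches the limit; the DataFrame, column names, and bucket count here are all hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder.getOrCreate()
    val df = spark.read.parquet("/path/to/input") // hypothetical input

    // Instead of one collect_list per key (which can build a single
    // row larger than the ~2 GB BufferHolder limit)...
    // df.groupBy("key").agg(collect_list("payload"))

    // ...salt each key into smaller buckets so every aggregated row
    // stays well under the limit, and process the chunks downstream.
    val bucketed = df
      .withColumn("bucket", pmod(hash(col("payload")), lit(32)))
      .groupBy(col("key"), col("bucket"))
      .agg(collect_list("payload").as("payload_chunk"))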

Related

How to monitor JVM heap size with PySpark / Dataproc

I have noticed my PySpark code is causing out-of-memory errors. Using VisualVM, I found the points where heap usage grows past the executor memory and changed the code accordingly. Now that I am deploying the code with bigger data on Dataproc, I am finding it hard to monitor the heap size. Is there a good way to monitor the runtime heap size? It would be easiest if I could print the runtime heap size via py4j or some other library.
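One low-tech option is to read java.lang.Runtime from inside a task, so the numbers show up in the executor logs where the memory is actually being used. Here is a sketch in Scala; from the PySpark driver the same Runtime API is reachable through py4j as spark.sparkContext._jvm.java.lang.Runtime.getRuntime():

    // Log the current JVM's heap usage; call it inside a task so the
    // output lands in the executor logs rather than on the driver.
    def logHeap(tag: String): Unit = {
      val rt = Runtime.getRuntime
      val usedMb = (rt.totalMemory - rt.freeMemory) / (1024 * 1024)
      val maxMb  = rt.maxMemory / (1024 * 1024)
      println(s"[$tag] heap used: ${usedMb}MB of max ${maxMb}MB")
    }

    // Example usage inside a job (rdd is a hypothetical RDD):
    // rdd.mapPartitions { it => logHeap("my-stage"); it }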

Postgres cursor calculation in Talend

Data was read from a Postgres table and written to a file using Talend. The table is 1.8 GB, with 1,050,000 records and around 125 columns.
I assigned the JVM -Xms256M -Xmx1024M. The job failed with an out-of-memory error. Postgres keeps the result set in memory until the query completes, so the entire JVM heap was occupied, leading to the out-of-memory issue. Please correct me if my understanding is wrong.
I then enabled the Cursor option, kept its value at 100,000, and left the JVM at -Xms256M -Xmx1024M. The job still failed with java.lang.OutOfMemoryError: Java heap space.
I don't understand the concept here. The cursor value denotes the fetch size in rows; in my case it was set to 100,000. So 100,000 rows should be fetched, held in memory, and pushed to the file; then the occupied memory should be released and the next batch fetched. Please correct me if I'm wrong.
In my case, 1,050,000 records occupy 1.8 GB, so each record takes roughly 1.8 KB. 100,000 * 1.8 KB = 180,000 KB, which is only about 175 MB. Why does the job not run with a 1 GB JVM? Can someone explain how this process works?
Also, some records were dropped after setting the cursor option in Talend, and I cannot trace the cause.
I had the same problem with a tPostgresInput component. Disabling the Auto Commit setting in the tPostgresConnection component fixed the problem for me!
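That fix matches how the PostgreSQL JDBC driver behaves: it only streams rows through a server-side cursor (honoring the fetch size) when auto-commit is off; with auto-commit on, the entire result set is materialized in the JVM heap regardless of the fetch size. That also explains why the arithmetic above does not hold: besides the full result set being loaded, a row's in-memory size is typically several times its on-disk size because of Java object overhead. A bare-JDBC sketch of the behavior Talend configures under the hood (the URL and credentials are placeholders):

    import java.sql.DriverManager

    val conn = DriverManager.getConnection(
      "jdbc:postgresql://localhost:5432/mydb", "user", "password")
    conn.setAutoCommit(false)    // required for cursor-based fetching
    val stmt = conn.createStatement()
    stmt.setFetchSize(100000)    // now only ~100,000 rows are held at a time
    val rs = stmt.executeQuery("SELECT * FROM big_table")
    while (rs.next()) {
      // write the row to the file; it can then be garbage-collected
    }
    conn.commit()
    conn.close()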

Getting an error when I want to fetch bulk records

I am getting the error below when I try to fetch more than 300,000 records.
I am using a link to fetch the records and am using multiple classes.
Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
Please let me know a solution for this.
Thanks
In your case, the memory allocated to the JVM is not sufficient.
You can try allocating more memory as follows:
Run --> Run Configurations --> select the "JRE" tab --> then enter -Xmx2048m
I believe you are running the program with the default VM arguments.
You can also work out the memory requirement by performing a heap dump analysis or using a memory analyzer.
Even though this may resolve your issue temporarily (depending on how much memory 300,000 records require), I would suggest changing your program to fetch the records in batches, as sketched after the link below.
I would suggest referring to this post:
How to deal with "java.lang.OutOfMemoryError: Java heap space" error (64MB heap size)
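As a sketch of the batching idea, assuming a JDBC source (the connection details, table, and column names are all hypothetical), keyset pagination keeps only one batch in memory at a time:

    import java.sql.DriverManager

    val conn = DriverManager.getConnection(
      "jdbc:postgresql://localhost:5432/mydb", "user", "password")
    val batchSize = 10000
    var lastId = 0L
    var keepGoing = true
    while (keepGoing) {
      // fetch the next batch of rows after the last id we saw
      val stmt = conn.prepareStatement(
        "SELECT id, payload FROM records WHERE id > ? ORDER BY id LIMIT ?")
      stmt.setLong(1, lastId)
      stmt.setInt(2, batchSize)
      val rs = stmt.executeQuery()
      var rows = 0
      while (rs.next()) {
        lastId = rs.getLong("id")
        // process one record here, then let it go out of scope
        rows += 1
      }
      stmt.close()
      keepGoing = rows == batchSize // a short batch means we are done
    }
    conn.close()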

Out of memory in Talend

I'm still having problems with this error in Talend. I have already changed the VM arguments to:
Arguments:
-Xms1024m
-Xms1024m
And then I always get this error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
Any suggestions?
The -Xms option sets the initial and minimum Java heap size.
You increased the initial heap size, not the maximum.
You need to use -Xmx, as @David Tonhofer said.
If this is not enough, you should review your memory management. Too large a lookup (in data volume) in the same subjob, or storing a big volume of data in a tHash component, can lead to memory problems.
Additionally, I would suggest checking the -XX:MaxPermSize parameter as well. For bigger jobs I need to raise it, e.g. to -XX:MaxPermSize=512m.
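Since mixing up -Xms and -Xmx is an easy mistake, a quick sanity check is to print the maximum heap the running JVM actually sees. If -Xmx took effect, maxMemory() will be close to the configured value (slightly less, since the JVM reserves part of the heap internally):

    object HeapCheck extends App {
      // maxMemory() reflects the effective -Xmx setting
      val maxMb = Runtime.getRuntime.maxMemory / (1024 * 1024)
      println(s"Max heap available to this JVM: ${maxMb}MB")
    }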

Mongodb Production - nssize, preallocDataFiles, managing large number of collections

I have a large number of collections being created during high bursts of traffic. I generally delete these collections once I am done processing the data in them, but during sudden bursts I sometimes run into namespace issues.
Can I increase nssize to handle this, and what values of nssize are OK? By default it is 16 MB. I increased it to 100 MB and still hit the issue. Can I keep increasing it without worrying?
Also, I have a lot of databases where the data is around 1 MB, but Mongo preallocates 64 MB of space. How do I fix this? If I run compact, does it hurt Mongo performance?
You can increase the namespace file size up to 2047 MB. Each namespace file is per database, and the default size should be fine for about 24,000 collections.
What are the issues you're seeing, exactly? Do you have log lines or error messages? The numbers don't look like they should be a problem.
For more about nsSize, see the docs.
As for your second question, please see the link in the first comment, as it has a good explanation and links to more info.