Error while fetching 100K records from MongoDB in Scala using Akka-HTTP - mongodb

I've written the following code to fetch records and perform an action on the akka-http route:
complete(
  mongoDB.getCollection(getCollectionName(user_id, list_id)).find()
    .getAllContacts(user_id, list_id)
    .map { line => validateNumber(line.phone, prefixTrim) }
    .toFuture()
    .map(_.size.toString)
)
On testing the API with wrk with 1000 connections and 10 threads, I'm getting the following errors:
java.lang.OutOfMemoryError: GC overhead limit exceeded
Error during processing of request: 'Boxed Error'.
Completing with 500 Internal Server Error response.
To change default exception handling behavior, provide a custom ExceptionHandler.
java.util.concurrent.ExecutionException: Boxed Error
Caused by: java.lang.OutOfMemoryError: Java heap space
Is there any way to resolve this issue without increasing the heap size?
UPDATED
The collection size is 33 MB, and:
java -XX:+PrintFlagsFinal -version | grep -iE 'heapsize|permsize|threadstacksize'
intx CompilerThreadStackSize = 0 {pd product}
uintx ErgoHeapSizeLimit = 0 {product}
uintx HeapSizePerGCThread = 87241520 {product}
uintx InitialHeapSize := 268435456 {product}
uintx LargePageHeapSizeThreshold = 134217728 {product}
uintx MaxHeapSize := 4294967296 {product}
intx ThreadStackSize = 1024 {pd product}
intx VMThreadStackSize = 1024 {pd product}
java version "1.8.0_131"
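In other words, the JVM starts with a 256 MB heap (268435456 bytes) and may grow it to a maximum of 4 GB (4294967296 bytes).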

If your heap cannot hold all the data, you will need to stream the results as you get them:
https://doc.akka.io/docs/akka-http/current/routing-dsl/source-streaming-support.html
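For illustration, here is a minimal sketch of that approach. It assumes the reactive-streams driver (mongodb-driver-reactivestreams), where collection.find() returns an org.reactivestreams.Publisher; the collection value and the "contacts" path are placeholders:

import akka.NotUsed
import akka.http.scaladsl.model.{ContentTypes, HttpEntity}
import akka.http.scaladsl.server.Directives._
import akka.stream.scaladsl.Source
import akka.util.ByteString

// Stream documents straight into a chunked HTTP response, so only a
// bounded window of records is in memory at any time.
val route =
  path("contacts") {
    val docs: Source[ByteString, NotUsed] =
      Source
        .fromPublisher(collection.find())          // pulls from MongoDB only as the client reads
        .map(doc => ByteString(doc.toJson + "\n")) // serialize one document at a time
    complete(HttpEntity(ContentTypes.`application/json`, docs))
  }

If only the count is needed, folding over the same source (for example Source.fromPublisher(collection.find()).runFold(0)((n, _) => n + 1)) keeps memory flat as well.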

Related

When a large amount of data is inserted into the GBase database, queries fail with a "heap usage has exceeded MemoryLimit" error

DETAIL:(GBA-01EX-700) Gbase general error:(gns_host:IP) Failed to append:
status is error, last error: thd 0xb17308000 id(7724267) BLK_TEMP:
return NULL in alloc(240050,73),
HeapUsed()
SystemUsed()
heap usage has exceeded MemoryLimit
SystemMemStatus:
memfree:
swapfree:
Solution: set the parallelism with set gbase_parallel_degree = 2;

MATLAB H5 files cannot exceed 2GB

If I create an h5 file larger than 2 GB, I get an error; files smaller than 2 GB work fine. I didn't think this file size was a problem for the h5 format. This does not seem to be a problem on my Windows 8.1 machine, but it is on my Ubuntu 20.04 machine.
MWE:
fname = 'tmp001.h5';
h5create(fname,'/DS1',[195 2048 1500],'DataType','single'); % doesn't work
h5write(fname,'/DS1',1,[1 1 1],[1 1 1]);
fname = 'tmp002.h5';
h5create(fname,'/DS1',[195 2048 1400],'DataType','single'); % doesn't work
h5write(fname,'/DS1',1,[1 1 1],[1 1 1]);
fname = 'tmp003.h5';
h5create(fname,'/DS1',[195 2048 1300],'DataType','single'); % does work
h5write(fname,'/DS1',1,[1 1 1],[1 1 1]);
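For scale: 195 × 2048 × 1500 singles is 599,040,000 elements × 4 bytes ≈ 2.23 GiB, 195 × 2048 × 1400 ≈ 2.08 GiB, and 195 × 2048 × 1300 ≈ 1.93 GiB, so the failing and working cases straddle the 2 GiB (2^31 bytes) boundary.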
This only seems to be a problem if I give it only a segment of the data set, and even then it works provided the size of the data is more than 2^14! See files 4 and 5:
fname = 'tmp004.h5';
h5create(fname,'/DS1',[536886273],'DataType','single'); % does work
h5write(fname,'/DS1',rand(536886273,1)); % filesize > 2gb but length > 2^14
fname = 'tmp005.h5';
h5create(fname,'/DS1',[536886273],'DataType','single'); % doesn't work
h5write(fname,'/DS1',1,1,1); % filesize > 2gb but length < 2^14
fname = 'tmp006.h5';
h5create(fname,'/DS1',[536886272],'DataType','single'); % does work
h5write(fname,'/DS1',1,1,1); % length > 2^14 but okay since filesize < 2gb
Error:
Warning: The following error was caught while executing 'H5ML.id' class destructor:
Error using hdf5lib2
The HDF5 library encountered an error and produced the following stack trace information:
H5FD_truncate driver truncate request failed
H5F_dest low level truncate failed
H5F_try_close problems closing file
H5O_close problem attempting file close
H5D_close unable to release object header
H5I_dec_app_ref can't decrement ID ref count
H5I_dec_app_ref_always_close can't decrement ID ref count
H5Dclose can't decrement count on dataset ID
Error in H5ML.id/close (line 47)
H5ML.hdf5lib2(obj.callback, obj.identifier);
Error in H5ML.id/delete (line 36)
obj.close();

Spark getting slow as time goes on

I have Scala code that runs on top of Spark 2.4.0 to compute the BFS of a graph, which is stored in a table as below:
src  dst  isVertex
1    1    1
2    2    1
...  ...  ...
1    2    0
2    4    0
...  ...  ...
At some point in the algorithm, I need to update the visited flag of the current vertex's neighbors. I am doing this with the following code. When I execute it, it works fine, but as time goes on it becomes slower and slower. It seems that the last nested loop is the problem:
//var vertices = schema(StructType(Seq(StructField("id", IntegerType), StructField("visited", IntegerType), StructField("qOrder", IntegerType))))
val neighbours = edges.filter($"src" === start).join(vertices, $"id" === $"dst").filter($"visited" === lit(0))
  .select($"dst".as("id")).withColumn("visited", lit(1)).withColumn("qOrder", lit(priorityCounter)).cache()
-----------------------------------------------------------------------
vertices.collect.foreach { x =>
  if (!neighbours.filter(col("id") === x(0)).head(1).isEmpty) {
    vertices = vertices.filter($"id" =!= x(0)).union(neighbours.filter(col("id") === x(0))).cache()
  }
}
-----------------------------------------------------------------------
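For reference, the per-row loop above is equivalent to a single anti-join plus union; a sketch, assuming the column names shown earlier, which keeps the query plan at a fixed size instead of growing it with every collected row:

// Replace every vertex that appears in neighbours with its refreshed row.
vertices = vertices
  .join(neighbours.select("id"), Seq("id"), "left_anti") // keep vertices with no match in neighbours
  .union(neighbours)                                     // append the updated rows
  .cache()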
When it becomes slow, it starts giving the following errors and warnings:
2021-06-08 19:55:08,998 [driver-heartbeater] WARN org.apache.spark.executor.Executor - Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: GC overhead limit exceeded
Does anyone have any idea about the problem?
I have set the spark parameters as follows:
spark.scheduler.listenerbus.eventqueue.capacity 100000000
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 2m
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 10000
spark.max.fetch.failures.per.stage 10
spark.rpc.io.serverThreads 64
spark.rpc.askTimeout 600s
spark.driver.memory 32g
spark.executor.memory 32g

Meaning of the benchmark variables in snakemake

I added a benchmark directive to some of the rules in my snakemake workflow, and the resulting files have the following header:
s h:m:s max_rss max_vms max_uss max_pss io_in io_out mean_load
The only documentation I've found mentions a "benchmark txt file (which will contain a tab-separated table of run times and memory usage in MiB)".
I can guess that columns 1 and 2 are two different ways of displaying the time taken to execute the rule (in seconds, and converted to hours, minutes and seconds).
io_in and io_out are likely related to disk read and write activity, but in what units are they measured?
What are the others? Is this documented somewhere?
Edit: Looking at the source code
I've found the following piece of code in /snakemake/benchmark.py, which might well be where the benchmark data come from:
def _update_record(self):
    """Perform the actual measurement"""
    # Memory measurements
    rss, vms, uss, pss = 0, 0, 0, 0
    # I/O measurements
    io_in, io_out = 0, 0
    # CPU seconds
    cpu_seconds = 0
    # Iterate over process and all children
    try:
        main = psutil.Process(self.pid)
        this_time = time.time()
        for proc in chain((main,), main.children(recursive=True)):
            meminfo = proc.memory_full_info()
            rss += meminfo.rss
            vms += meminfo.vms
            uss += meminfo.uss
            pss += meminfo.pss
            ioinfo = proc.io_counters()
            io_in += ioinfo.read_bytes
            io_out += ioinfo.write_bytes
            if self.bench_record.prev_time:
                cpu_seconds += proc.cpu_percent() / 100 * (
                    this_time - self.bench_record.prev_time)
        self.bench_record.prev_time = this_time
        if not self.bench_record.first_time:
            self.bench_record.prev_time = this_time
        rss /= 1024 * 1024
        vms /= 1024 * 1024
        uss /= 1024 * 1024
        pss /= 1024 * 1024
        io_in /= 1024 * 1024
        io_out /= 1024 * 1024
    except psutil.Error as e:
        return
    # Update benchmark record's RSS and VMS
    self.bench_record.max_rss = max(self.bench_record.max_rss or 0, rss)
    self.bench_record.max_vms = max(self.bench_record.max_vms or 0, vms)
    self.bench_record.max_uss = max(self.bench_record.max_uss or 0, uss)
    self.bench_record.max_pss = max(self.bench_record.max_pss or 0, pss)
    self.bench_record.io_in = io_in
    self.bench_record.io_out = io_out
    self.bench_record.cpu_seconds += cpu_seconds
So this seems to come from functionality provided by psutil.
I will just leave this here for future reference.
Reading through the snakemake >= 6.0.0 benchmark module and psutil's memory_info(), memory_full_info(), io_counters(), and cpu_times(), as previously suggested:
colname, type (unit): description
s, float (seconds): Running time in seconds.
h:m:s, string (-): Running time in hours:minutes:seconds format.
max_rss, float (MB): Maximum "Resident Set Size"; this is the non-swapped physical memory a process has used.
max_vms, float (MB): Maximum "Virtual Memory Size"; this is the total amount of virtual memory used by the process.
max_uss, float (MB): Maximum "Unique Set Size"; this is the memory which is unique to a process and which would be freed if the process were terminated right now.
max_pss, float (MB): Maximum "Proportional Set Size"; this is the amount of memory shared with other processes, accounted in a way that the amount is divided evenly between the processes that share it (Linux only).
io_in, float (MB): The number of MB read (cumulative).
io_out, float (MB): The number of MB written (cumulative).
mean_load, float (-): CPU usage over time, divided by the total running time (first row).
cpu_time, float (seconds): CPU time summed for user and system.
Benchmarking in snakemake could certainly be better documented, but psutil is documented here:
get_memory_info()
Return a tuple representing RSS (Resident Set Size) and VMS (Virtual Memory Size) in bytes.
On UNIX RSS and VMS are the same values shown by ps.
On Windows RSS and VMS refer to "Mem Usage" and "VM Size" columns of taskmgr.exe.
psutil.disk_io_counters(perdisk=False)
Return system disk I/O statistics as a namedtuple including the following attributes:
read_count: number of reads
write_count: number of writes
read_bytes: number of bytes read
write_bytes: number of bytes written
read_time: time spent reading from disk (in milliseconds)
write_time: time spent writing to disk (in milliseconds)
The code you found confirms that all the memory usage and IO counts are reported in MB (= bytes divided by 1024 * 1024).
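For example, a job that read 1,610,612,736 bytes (1.5 GiB) in total would be reported as io_in = 1536.0.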

orientdb 2.1.11 java process consuming too much memory

We're running OrientDB 2.1.11 (Community Edition) along with JDK 1.8.0.74.
We're noticing memory consumption by the 'orientdb' java process slowly creeping up, and in a few days the database becomes unresponsive (we have to stop/start OrientDB in order to release the memory).
We also noticed this kind of behavior within a few hours when we index the database.
The total size of the database is only 60 GB, with not more than 200 million records!
As you can see below, it already consumes VIRT (11.44 GB) and RES (8.62 GB).
We're running CentOS 7.1.x.
We even changed the heap from 512 MB to 256 MB and modified storage.diskCache.bufferSize to 8 GB:
MAXHEAP=-Xmx256m
# ORIENTDB MAXIMUM DISKCACHE IN MB, EXAMPLE, ENTER -Dstorage.diskCache.bufferSize=8192 FOR 8GB
MAXDISKCACHE="-Dstorage.diskCache.bufferSize=8192"
top output:
Tasks: 155 total, 1 running, 154 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.1 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16269052 total, 229492 free, 9510740 used, 6528820 buff/cache
KiB Swap: 8257532 total, 8155244 free, 102288 used. 6463744 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2367 nmx 20 0 11.774g 8.620g 14648 S 0.3 55.6 81:26.82 java
ps aux output:
nmx 2367 4.3 55.5 12345680 9038260 ? Sl May02 81:28 /bin/java
-server -Xmx256m -Djna.nosys=true -XX:+HeapDumpOnOutOfMemoryError
-Djava.awt.headless=true -Dfile.encoding=UTF8 -Drhino.opt.level=9
-Dprofiler.enabled=true -Dstorage.diskCache.bufferSize=8192
How do I control memory usage?
Is there a DB memory leak?
Could you set the following settings for the JVM: -XX:+UseLargePages -XX:LargePageSizeInBytes=2m.
This should solve your issue.
This page solved my issue. In a nutshell, add this configuration to your database:
static {
    OGlobalConfiguration.NON_TX_RECORD_UPDATE_SYNCH.setValue(true); // Executes a synch against the file-system at every record operation. This slows down record updates but guarantees reliability on unreliable drives
    OGlobalConfiguration.STORAGE_LOCK_TIMEOUT.setValue(300000);
    OGlobalConfiguration.RID_BAG_EMBEDDED_TO_SBTREEBONSAI_THRESHOLD.setValue(-1);
    OGlobalConfiguration.FILE_LOCK.setValue(false);
    OGlobalConfiguration.SBTREEBONSAI_LINKBAG_CACHE_SIZE.setValue(5000);
    OGlobalConfiguration.INDEX_IGNORE_NULL_VALUES_DEFAULT.setValue(true);
    OGlobalConfiguration.MEMORY_CHUNK_SIZE.setValue(32);

    long maxMemory = Runtime.getRuntime().maxMemory();
    long maxMemoryGB = (maxMemory / 1024L / 1024L / 1024L);
    maxMemoryGB = maxMemoryGB < 1 ? 1 : maxMemoryGB;
    long cacheSizeMb = 2 * 65536 * maxMemoryGB / 1024;
    long maxDirectMemoryMb = VM.maxDirectMemory() / 1024L / 1024L;
    String cacheProp = System.getProperty("storage.diskCache.bufferSize");
    if (cacheProp == null) {
        long maxDirectMemoryOrientMb = maxDirectMemoryMb / 3L;
        long cachSizeMb = cacheSizeMb > maxDirectMemoryOrientMb ? maxDirectMemoryOrientMb : cacheSizeMb;
        cacheSizeMb = (long) Math.pow(2, Math.ceil(Math.log(cachSizeMb) / Math.log(2)));
        System.setProperty("storage.diskCache.bufferSize", Long.toString(cacheSizeMb));
        // the command below generates a NullPointerException in Orient 2.2.15-snapshot
        // OGlobalConfiguration.DISK_CACHE_SIZE.setValue(cacheSizeMb);
    }
}
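In short, the snippet above derives a disk-cache size from the heap (roughly 128 MB per GB of heap), caps it at a third of the JVM's maximum direct memory, rounds that up to a power of two, and sets storage.diskCache.bufferSize only if the property was not already given.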