I was running a Foxx app under ArangoDB 3.3.11. A stress test was conducted in the following manner:
var traversal = require("@arangodb/graph/traversal"); // module providing Traverser
var traverser = new traversal.Traverser(config);
var result = []; // collects the vertices pushed by config.visitor
// Loop through a list of entities and run a traversal on each of them
BigArray.forEach(function (e) {
  var vertex = db._document(e);
  traverser.traverse(result, vertex);
});
The traversal itself is nothing special, except that its config.visitor is written to push the vertex only if certain conditions are satisfied.
config.visitor = function (config, result, vertex, path) {
  // Write the vertex to the result if conditions are right; vertices are normal-sized JSON objects
  if (hashTable[vertex._id]) {
    result.push(vertex);
  }
};
With this, memory slowly builds up until the app crashes and returns a canceled request:
{"error":true,"errorMessage":"canceled request","code":410,"errorNum":21}
together with a heap-size warning:
reached heap-size limit, interrupting V8 execution (heap size limit 3254779904, used 3140128304)
Is there any caveat to using a traversal inside a loop? With a small array the app still works, but with a complex and big enough array the error occurs. I always thought of each traversal as an independent function call, and that within each iteration the GC would sweep in and manage the memory by itself.
Please use AQL traversals for better performance and fewer limits around V8 execution.
V8 has a limit of 256 MB for strings, and it seems that for larger traversals this limit may be hit by the old traversal implementation; sadly, there is not much we can do about this.
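For illustration, the visitor-style traversal above can be expressed as a single AQL traversal that filters and returns matching vertices inside the query engine, so no JavaScript visitor runs in V8 at all. A minimal sketch using the arangodb-java-driver from Scala; the database name, the graph name 'myGraph', the depth bounds 1..10, and the bind-parameter contents are assumptions standing in for your entities and hashTable keys:

import com.arangodb.ArangoDB
import com.arangodb.entity.BaseDocument
import scala.jdk.CollectionConverters._

object AqlTraversalSketch {
  def main(args: Array[String]): Unit = {
    val arango = new ArangoDB.Builder().host("127.0.0.1", 8529).build()
    val db = arango.db("myDB") // hypothetical database name

    // One AQL traversal per start vertex, filtered server-side
    val query =
      """FOR startId IN @entities
        |  FOR v IN 1..10 OUTBOUND startId GRAPH 'myGraph'
        |    FILTER v._id IN @allowedIds
        |    RETURN v""".stripMargin

    val bindVars = new java.util.HashMap[String, AnyRef]()
    bindVars.put("entities", java.util.Arrays.asList("things/a", "things/b"))   // hypothetical start ids
    bindVars.put("allowedIds", java.util.Arrays.asList("things/x", "things/y")) // hypothetical hashTable keys

    val cursor = db.query(query, bindVars, null, classOf[BaseDocument])
    cursor.asScala.foreach(doc => println(doc.getId))
  }
}

Because the filtering happens inside the query engine, the result set never has to be materialized as one huge V8 string.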
Other users have had similar observations.
I am trying to compare the performance of a view before and after adding an index, so I am measuring it using the query below:
create table qtemp.ffs as (select * from psavlldsvw) with data
Statement ran successfully (1,932 ms = 1.932 sec)
The statement above is what I used, where psavlldsvw is the view name.
As you might guess, the idea is to measure how much time the above query takes to complete in both cases.
Can I please get some feedback on how good this method is for comparison?
The test is indeed meaningless...
First of all, the question is poorly worded: you cannot and are not testing a view. Views are performance-neutral on Db2 for i.
Running a statement, adding an index, and rerunning the statement is a meaningless test. Db2 for i has all kinds of tricks built in to improve the speed of a repeated statement, among them:
Input data cached in memory
Data access paths are left open
Starting from a fresh connection, you can ensure that no data is in memory by using SETOBJACC OBJ(YOURLIB/YOURFILE) OBJTYPE(*FILE) POOL(*PURGE) for each table referenced by your statement.
Now run the statement multiple times; at least 3 if the system defaults have not been changed. You should see that the first (few) iterations are slower than the last few. This is a result of the data access path being left open for a repeated statement.
Now add your index, disconnect/reconnect, clear the object(s) from memory and run your tests again.
Depending on the use case for the statement, you may want to focus on the first iteration performance or the later iterations.
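A minimal sketch of that procedure over JDBC from Scala (using the jt400 driver; the host, credentials, library and table names are assumptions, and QSYS2.QCMDEXC is used to issue the CL command):

import java.sql.DriverManager

object IndexBenchmarkSketch {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:as400://myibmi;naming=system", "user", "password") // hypothetical host/credentials

    // Purge the table from memory so the first iteration starts cold.
    val cs = conn.prepareCall("CALL QSYS2.QCMDEXC(?)")
    cs.setString(1, "SETOBJACC OBJ(MYLIB/MYTABLE) OBJTYPE(*FILE) POOL(*PURGE)") // hypothetical object
    cs.execute()
    cs.close()

    // Run the statement several times: expect the first iteration(s) to be
    // slower, then faster once the access path is left open.
    val sql = "SELECT COUNT(*) FROM (SELECT * FROM PSAVLLDSVW) T" // stand-in for the real statement
    for (i <- 1 to 5) {
      val t0 = System.nanoTime()
      val st = conn.createStatement()
      val rs = st.executeQuery(sql)
      rs.next()
      rs.close()
      st.close()
      println(s"iteration $i: ${(System.nanoTime() - t0) / 1e6} ms")
    }
    conn.close()
  }
}

Repeat the whole program (fresh connection, purge, iterations) after adding the index, and compare both the first-run and steady-state timings.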
Mao is correct in that using Visual Explain (VE) is the best way to see if an index is being used, or to otherwise understand how the query is performing.
Lastly, realize that load on the server affects how the query engine operates. The query engine optimizer will calculate your job's "fair share" of memory, and that value will affect whether or not some more efficient yet memory-intensive plans are used. So if you're testing in a non-prod environment that doesn't exactly match prod in terms of resources, data size, and load, the results are likely to differ when the query is moved to prod.
Performance tuning is part art, part science. Generally, use VE to ensure that you've got a decent query to start with. Then monitor actual production use to ensure that it's performing as expected.
I have to do a lot of little collect() operations in my application to send data through an HTTP call.
val payload = sparkSession.sql(s"select * from table where ID = $id").toJSON.collect().mkString("\n")
Is there a way to purge used objects to free some memory space in my driver between operations?
First off, I agree with @Luis Miguel Mejia Suarez here in that collects are generally bad practice and a bad code smell. I'd take a look at why you are doing collects and determine whether you can do this in a different way.
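For instance, if the endpoint is reachable from the executors, the collect can often be avoided entirely by sending the data from the workers. A minimal sketch, where the endpoint URL, batch size, and payload format are assumptions:

import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

val df = sparkSession.sql(s"select * from table where ID = $id")
df.toJSON.foreachPartition { (rows: Iterator[String]) =>
  // Runs on the executors: each partition posts its rows in batches,
  // so nothing is ever materialized on the driver.
  rows.grouped(100).foreach { batch =>
    val conn = new URL("http://example.com/ingest") // hypothetical endpoint
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setDoOutput(true)
    conn.setRequestMethod("POST")
    conn.getOutputStream.write(batch.mkString("\n").getBytes(StandardCharsets.UTF_8))
    conn.getResponseCode // forces the request; check and handle errors as needed
    conn.disconnect()
  }
}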
As for your actual question, the garbage collector will free any unreferenced memory once memory starts getting tight. The code snippet you showed above should be fine, since the output of collect is immediately operated on and then discarded, so that output should be removed during the next GC pause, while the mkString output would be kept. Make sure that this applies to the other collect statements you are using.
Additionally, if you are seeing long GC pauses, consider lowering your driver memory size, so that there's less memory to collect. You might also look into tuning your GC parameters. There's lots of documentation on that on the internet, and it is too intricate to describe in detail here.
Finally, you can ask the JVM to run garbage collection explicitly. You should be able to use System.gc() (https://docs.oracle.com/javase/7/docs/api/java/lang/System.html#gc()). This is a Java method, but Scala can call it as well.
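A small sketch of both points together: scoping the collected payload inside a method so it becomes unreachable as soon as the call returns, then hinting a GC between batches. Here postToService and ids are hypothetical stand-ins for your HTTP helper and ID list:

def sendBatch(id: String): Unit = {
  val payload = sparkSession.sql(s"select * from table where ID = $id")
    .toJSON.collect().mkString("\n")
  postToService(payload) // hypothetical HTTP helper
} // payload goes out of scope here and becomes collectable

ids.foreach(sendBatch)
System.gc() // only a hint; the JVM may ignore it, and frequent full GCs hurt throughput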
I have a C++ application which processes JSON and inserts it into a DB. It does a pretty simple insert: it parses the JSON, adds a couple of fields, and inserts as below.
I have also surrounded the call with a try/catch to handle any exceptions that might be thrown.
The application is probably processing around 360-400 records a minute.
It has been working quite well; however, it occasionally errors and crashes. Code and stack trace below.
Code
try {
    bsoncxx::builder::basic::document builder;
    builder.append(bsoncxx::builder::basic::concatenate(bsoncxx::from_json(jsonString)));
    builder.append(bsoncxx::builder::basic::kvp(std::string("created"),
        bsoncxx::types::b_date(std::chrono::system_clock::now())));
    bsoncxx::document::value doc = builder.extract();
    bsoncxx::stdx::optional<mongocxx::result::insert_one> result =
        m_client[m_dbName][std::string("Data")].insert_one(doc.view());
    if (!result) {
        m_log->warn(std::string("Insert failed."), label);
    }
} catch (...) {
    m_log->warn(std::string("Insert failed and threw exception."), label);
}
Stack
So I guess I have two questions. Any ideas as to why it is crashing? And is there some way I can catch and handle this error in such a way that it does not crash the application?
mongocxx version is: 3.3.0
mongo-c version is: 1.12.0
Any help appreciated.
Update 1
I did some profiling on the DB, and although it's doing a high volume of writes it is performing quite well, and all operations are using indexes. The most inefficient operation takes 120 ms, so I don't think it's a performance issue on that end.
Update 2
After profiling with valgrind I was unable to find any allocation errors. I then used perf to profile CPU usage and found that over time CPU usage builds up: typically starting at around 15% base load when the process first starts, then ramping up over the course of 4 or 5 hours until the crash occurs with CPU usage around 40-50%. During this whole time the number of DB operations per second remains consistent at 10 per second.
To rule out the possibility of other processing code causing this, I removed all DB handling and ran the process overnight; CPU usage remained flat-lined at 15% the entire time.
I'm trying a few other strategies to try and identify root cause now. Will update if I find anything.
Update 3
I believe I have discovered the cause of this issue. After the insert there is also an update which pushes two items onto an array using the $push operator. It does this for different records up to 18,000 times before that document is closed. The fact that it was crashing on the insert under high CPU load was a bit of a red herring.
The CPU usage build-up comes from that update taking longer to execute as the array in the document being pushed to gets longer. I re-architected to use Redis and only insert into MongoDB once all items to be pushed into the array have been received. This flat-lined CPU usage. Another way around this could be to insert each item into a temporary collection and use aggregation with $push to compile a single record with the array once all items have been received.
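For anyone hitting the same wall, here is a minimal sketch of that temporary-collection variant using the MongoDB Java driver from Scala; the URI, collection names, and the recordId/item field names are assumptions:

import com.mongodb.client.MongoClients
import com.mongodb.client.model.{Accumulators, Aggregates}
import java.util.Arrays

object CompileRecordSketch {
  def main(args: Array[String]): Unit = {
    val client = MongoClients.create("mongodb://localhost:27017") // hypothetical URI
    val temp = client.getDatabase("mydb").getCollection("DataItems") // staging: one doc per item

    // Build each record's array in a single aggregation pass instead of
    // issuing up to 18,000 incremental $push updates per document.
    temp.aggregate(Arrays.asList(
      Aggregates.group("$recordId", Accumulators.push("items", "$item")), // hypothetical field names
      Aggregates.out("Data") // write the compiled documents to the main collection
    )).toCollection() // runs the $out pipeline without iterating results

    client.close()
  }
}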
The graphs below illustrate the issue with using $push on a document as the array gets longer. The huge drops are when all records have been received and new documents are created for the next set of items.
In closing this issue, I'll take a look over on the MongoDB Jira and see if anyone has reported this issue with the $push operator, and if not report it myself, in case this is something that can be improved in future releases.
In my system I have the requirement that the number of edges on a node must be stored as an internal property on the vertex, as well as in a vertex-centric index on a specific outgoing edge. This naturally requires me to count the number of edges on the node after all the data has finished loading. I do so as follows:
long edgeCount = graph.getGraph().traversal().V(vertexId).bothE().count().next();
However, when I scale up my tests to the point where some of my nodes are "super" nodes, I get the following exception on the above line:
Caused by: com.netflix.astyanax.connectionpool.exceptions.TransportException: TransportException: [host=127.0.0.1(127.0.0.1):9160, latency=4792(4792), attempts=1]org.apache.thrift.transport.TTransportException: Frame size (70936735) larger than max length (62914560)!
at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:197) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:153) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119) ~[astyanax-core-3.8.0.jar!/:3.8.0]
at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352) ~[astyanax-core-3.8.0.jar!/:3.8.0]
at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:538) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:112) ~[titan-cassandra-1.0.0.jar!/:na]
What is the best way to fix this? Should I simply increase the frame size, or is there a better way to count the number of edges on the node?
Yes, you will need to increase the frame size. When you have a supernode, there is a really big row that needs to be read out of the storage backend, and this is even true in the OLAP case. I agree that if you are planning to calculate this on every vertex in the graph, this would be best done as an OLAP operation.
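For reference, a sketch of raising the frame size when opening the graph from Scala; the property name is taken from the Titan configuration reference and should be double-checked against your version, and the server side must allow at least as much via thrift_framed_transport_size_in_mb in cassandra.yaml:

import com.thinkaurelius.titan.core.TitanFactory

// Open Titan with a larger thrift frame size (value in MB); 128 is an
// assumed value, sized to exceed the ~70 MB frame from the stack trace.
val graph = TitanFactory.build()
  .set("storage.backend", "cassandra")
  .set("storage.hostname", "127.0.0.1")
  .set("storage.cassandra.thrift.frame_size_mb", 128) // assumed property name; verify for your version
  .open()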
This and several other good tips can be found in this Titan mailing list thread. Keep in mind that the link is pretty old; the concepts are still valid, but some of the Titan configuration property names may be different.
Such a task, which is OLAP by its nature, should be performed using a distributed system, not using a traversal.
There is a concept called GraphComputer in TinkerPop 3, which can be used to perform such a task.
It basically allows you to run Gremlin queries that will be evaluated on multiple machines.
For example, you can use SparkGraphComputer to run your queries on top of Apache Spark.
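A minimal sketch of the edge count as an OLAP job, written in Scala against the TinkerPop API; it assumes TinkerPop 3.2+ (which provides withComputer) and a Spark cluster configured for the graph:

import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

// The same count as before, but evaluated as a distributed job rather
// than a single OLTP read of one giant row.
val g = graph.getGraph().traversal().withComputer(classOf[SparkGraphComputer])
val edgeCount: Long = g.V(vertexId).bothE().count().next()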
I have written an application in Scala. Basically, the first step is to create an array of objects and then to initialise these objects from a CSV file. When running the application on the JVM it is really slow, and after some experimenting I found out that using the -J-Xincgc flag, which enables incremental garbage collection, speeds up the application by a factor of 4 (it's 4 times faster with the switch!). I wonder:
Why?
Did I use some inefficient coding, and if so, where should I start to find out what's going on?
Thanks!
I'll assume you're running this on hotspot.
The hotspot JVM has a whole zoo of garbage collectors, most of which also may have some sort of sub-modes or various command-line switches that significantly alter their behavior.
Which GC is used by default varies based on JVM version, operating system and 32/64bit VM.
So you basically changed whatever the default was to a specific algorithm that happened to perform "faster" for your workload.
But "faster" is a fuzzy measure. Wall time is not the same as CPU cycles spent if you consider multi-threading. And some collectors may simply choose to grow the heap more aggressively, thus deferring the cost of collection to a later point in time, which you might not have measured if your program didn't run long enough.
To make an accurate assessment, much more information would be needed:
what GC was used by default
your VM version
how many cores your CPU has
what kind of workload you have (multi/single-threaded, long/short-running, expected memory footprint, object allocation rate)
Oracle's GC tuning guide may prove useful for you.
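As a starting point for the first item, the JVM can report which collectors are actually in use. A minimal sketch using the standard management beans (Scala 2.13 stdlib only):

import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

object GcInfo extends App {
  // Lists the active collectors (e.g. "PS MarkSweep", "G1 Young Generation")
  // with how often they ran and how much time they took so far.
  ManagementFactory.getGarbageCollectorMXBeans.asScala.foreach { gc =>
    println(s"${gc.getName}: ${gc.getCollectionCount} collections, ${gc.getCollectionTime} ms")
  }
}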
In your case, -Xincgc translates to CMS in incremental mode, which is intended for single-core environments and has been deprecated as of Java 8. It probably just happened to be better than the default, but it's not necessarily an optimal choice.
If you get into a situation where you are running close to your heap-size limit, you can waste a lot of GC time, which can lead to a lot of false findings about performance. If that's your situation, first increase your heap-size limit before doing anything else. Consider using jvisualvm to eyeball the situation; it's trivially easy to get started with.