mongocxx crash when doing many inserts & catching exceptions in underlying C driver - mongodb

I have a C++ application which processes JSON and inserts it into a DB. It's doing a pretty simple insert: it parses the JSON, adds a couple of fields, and inserts as below.
I have also surrounded the call with a try/catch to handle any exceptions that might be thrown.
The application is probably processing around 360-400 records a minute.
It has been working quite well; however, it occasionally errors and crashes. Code and stack trace below.
Code
try {
    // Build the document: start from the parsed JSON, then add a "created" timestamp.
    bsoncxx::builder::basic::document builder;
    builder.append(bsoncxx::builder::basic::concatenate(bsoncxx::from_json(jsonString)));
    builder.append(bsoncxx::builder::basic::kvp(std::string("created"),
        bsoncxx::types::b_date(std::chrono::system_clock::now())));
    bsoncxx::document::value doc = builder.extract();

    // insert_one returns an empty optional only for unacknowledged writes.
    bsoncxx::stdx::optional<mongocxx::result::insert_one> result =
        m_client[m_dbName][std::string("Data")].insert_one(doc.view());
    if (!result) {
        m_log->warn(std::string("Insert failed."), label);
    }
} catch (...) {
    m_log->warn(std::string("Insert failed and threw exception."), label);
}
Stack
So I guess I have two questions. Any ideas as to why it is crashing? And is there some way I can catch and handle this error in such a way that it does not crash the application?
mongocxx version is: 3.3.0
mongo-c version is: 1.12.0
Any help appreciated.
Update 1
I did some profiling on the DB and, although it's doing a high volume of writes, it is performing quite well and all operations are using indexes. The most inefficient operation takes 120ms, so I don't think it's a performance issue on that end.
Update 2
After profiling with valgrind I was unable to find any allocation errors. I then used perf to profile CPU usage and found that CPU usage builds up over time: typically starting at around 15% base load when the process first starts, then ramping up over the course of 4 or 5 hours until the crash occurs with CPU usage around 40-50%. During this whole time the number of DB operations remains consistent at 10 per second.
To rule out the possibility of other processing code causing this, I removed all DB handling and ran the process overnight; CPU usage remained flatlined at 15% the entire time.
I'm trying a few other strategies to identify the root cause now. Will update if I find anything.
Update 3
I believe I have discovered the cause of this issue. After the insert there is also an update which pushes two items onto an array using the $push operator. It does this for different records up to 18,000 times before that document is closed. The fact that it was crashing on the insert under high CPU load was a bit of a red herring.
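For reference, the update being described was presumably something of this shape (a rough reconstruction only, not the actual code; the collection handle, field names, item type, and use of $each are assumptions):

#include <string>

#include <bsoncxx/builder/basic/array.hpp>
#include <bsoncxx/builder/basic/document.hpp>
#include <mongocxx/collection.hpp>

// Rough reconstruction of the repeated update; "recordId", "items" and the
// string item type are assumptions, not the actual schema.
void push_two_items(mongocxx::collection& coll, const std::string& record_id,
                    const std::string& item_a, const std::string& item_b) {
    using bsoncxx::builder::basic::kvp;
    using bsoncxx::builder::basic::make_array;
    using bsoncxx::builder::basic::make_document;

    auto filter = make_document(kvp("recordId", record_id));

    // $push with $each appends both items in a single update; the cost of this
    // grows as the target array grows, which is what showed up as rising CPU.
    auto each = make_document(kvp("$each", make_array(item_a, item_b)));
    auto update = make_document(kvp("$push", make_document(kvp("items", each.view()))));

    coll.update_one(filter.view(), update.view());
}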
The CPU usage build-up is that update taking longer to execute as the array being pushed to within the document gets longer. I re-architected to use Redis and only insert into MongoDB once all items to be pushed into the array have been received. This flatlined CPU usage. Another way around this could be to insert each item into a temporary collection and use aggregation with $push to compile a single record with the array once all items to be pushed have been received (a rough sketch of that variant follows below).
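The temporary-collection alternative could look roughly like the following sketch (assumptions: hypothetical collection and field names, one temporary document per item, and the C++ driver's match/group/out pipeline stages):

#include <string>

#include <bsoncxx/builder/basic/document.hpp>
#include <mongocxx/collection.hpp>
#include <mongocxx/pipeline.hpp>

// Illustrative only: once every item for a record has been buffered in a
// temporary collection (one document per item), compile them into a single
// document with an array. "Compiled", "recordId" and "item" are hypothetical names.
void compile_items(mongocxx::collection& temp, const std::string& record_id) {
    using bsoncxx::builder::basic::kvp;
    using bsoncxx::builder::basic::make_document;

    mongocxx::pipeline p;
    p.match(make_document(kvp("recordId", record_id)));

    // Group the buffered rows for this record and $push each item into one array.
    auto push_items = make_document(kvp("$push", "$item"));
    p.group(make_document(kvp("_id", "$recordId"), kvp("items", push_items.view())));

    // Write the compiled document to another collection via $out.
    p.out("Compiled");

    // The command is dispatched lazily; iterating the cursor makes it run.
    for (auto&& doc : temp.aggregate(p)) { (void)doc; }
}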
The graphs below illustrate the issue with using $push on a document as the array gets longer. The huge drops are when all records have been received and new documents are created for the next set of items.
In closing this issue, I'll take a look over on the MongoDB Jira to see if anyone has reported this issue with the $push operator, and if not report it myself in case this is something that can be improved in future releases.

Related

What could be causing high cpu in google spanner databases? (unresponsive table)

I'm facing an issue where 2 out of 10 Spanner databases are showing high CPU usage (above 40%) whereas the others are around 1% each, with almost identical or more data.
I noticed one of our tables has become "unresponsive": no queries work against it. We shut down all apps that connect to those DBs, and we also deleted all current sessions using gcloud sessions list and then gcloud session delete.
However, the table is still unresponsive. A simple select like select id from mytable where name = 'test' is not responding (when tested from an app, and also from the gcloud web interface). It only happens with that table, which has only a few columns with normal data and no more than 2000 records. We identified the query that could have been the source of the problem; however, the table seems to be locked (only count(*) without any where clause works).
I was wondering if there is any way to "unlock" the table, kill those "transactions" that might be causing the issue, or restart those specific Spanner databases, or in the worst-case scenario restart the Spanner instance.
I have seen the monitoring high CPU documentation, but even if we can identify that the CPU is high, we don't really know how to restart or bring it back to normal before reviewing the query/queries that could have caused the issue (if that was the case).
High CPU can be caused by user-initiated queries, from different types of operations. It is important to note that your instance is where you allocate resources to be used by the underlying Cloud Spanner databases. This means that if all of your databases are in the same instance and some of your databases are hogging the CPU, all your other databases will struggle.
In terms of a locked table, I would be very surprised if a deadlock is the problem here, since Spanner mitigates those issues using the "wound-wait" algorithm. What I suspect might be happening is that the query on that table takes a long time to perform and it times out. It would be nice to investigate your problem a bit more:
How long did you wait for your query on the problematic table (before you deemed it to be stuck)? It might be a problem where your query just takes a long time and you are timing out before getting the results. It might be a problem where there are hotspots in your table and transactions are getting aborted often, preventing you from getting results.
What error did you get when performing the query on the table? The error code can tell you more about what might be happening.
Have you tried doing a stale read on the table to see if any data is returned (a sketch follows this list)? If lock contention is the problem, this should succeed and return results faster for you. You can then further investigate the lock usage with the statistics table (as shown below).
Inspect query statistics: you can list the queries with the highest CPU usage, find the average latency for a query, and find the queries that time out the most. There is much more you can do, as seen here. You can also view the query statistics in the Cloud Console. What I would verify is: by reducing the overall CPU, does your query come back without any issues? If so, you might need more resources, or you might need to reduce hotspots in your database.
Inspect lock statistics: you can investigate lock conflicts in your rows and tables. I think an interesting query for your problem would be "Discovering which row keys and columns had long lock wait times during the selected period". You can then see if your query is reading those row keys and columns and act accordingly.
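To make the stale-read suggestion concrete, a single-use stale read with the Cloud Spanner C++ client might look roughly like this (a minimal sketch; the project/instance/database IDs, table and column names, the 15-second staleness, and the INT64 type of id are all assumptions):

#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <tuple>

#include "google/cloud/spanner/client.h"

int main() {
    namespace spanner = ::google::cloud::spanner;

    // Placeholder identifiers: substitute your own project, instance and database.
    spanner::Client client(spanner::MakeConnection(
        spanner::Database("my-project", "my-instance", "my-database")));

    // Single-use read-only transaction reading data as of 15 seconds ago, so it
    // does not need to wait on locks held by other transactions.
    auto stale = spanner::Transaction::SingleUseOptions(
        spanner::Transaction::ReadOnlyOptions(std::chrono::seconds(15)));

    auto rows = client.ExecuteQuery(
        stale,
        spanner::SqlStatement("SELECT id FROM mytable WHERE name = @name",
                              {{"name", spanner::Value(std::string("test"))}}));

    // Assumes `id` is an INT64 column.
    for (auto const& row : spanner::StreamOf<std::tuple<std::int64_t>>(rows)) {
        if (!row) {
            std::cerr << "read failed: " << row.status() << "\n";
            return 1;
        }
        std::cout << "id: " << std::get<0>(*row) << "\n";
    }
    return 0;
}

If the stale read returns promptly while the normal (strong) read hangs, that points at lock contention rather than missing data, and the lock statistics table is the place to look next.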

Bulk update and insert using Meteor method call in loop making high cpu usage

My application is on METEOR#1.6.0.1 and I am using reywood:publish-composite and matb33:collection-hooks for DB relations.
I need to insert a list of 400 people into a collection from an Excel file. Currently I am inserting from the client using a Meteor method inside a loop, but when I watch Galaxy during this, CPU usage is very high, 70-80% or sometimes 100%.
Once all the data is inserted, I need to send a mail and update each record, so I am sending the mail and updating using Meteor method calls one by one, which again pushes CPU to 70-80%.
How can I do the above task in a correct and efficient way? Please help.
Thanks.
I suspect that you are not using oplog tailing and that you are trying to insert while some other part of your app has subscriptions to publications open. Without oplog tailing, Meteor polls the collections and generates lots of slow queries at each document insert.
You can enable it by passing a URL to Meteor at startup. See https://docs.meteor.com/environment-variables.html#MONGO-OPLOG-URL for more info.
Having oplog tailing eases the strain on the server and should reduce the high CPU usage to a manageable level.
If you are still having issues then you may have to set up some tracing, e.g. Monti APM: https://docs.montiapm.com/introduction

Using find() with arguments that will result in not finding anything results in the RAM never getting "dropped"

I'm currently setting up MongoDB and so far I have a DB with 25 million objects and some indexes. When I use find() with 4 arguments and I know that one of the arguments is "wrong", or that the result of the query will be that nothing is found, what happens is that MongoDB takes all my RAM (8 GB) and the time to get the "nothing found" result varies a lot.
BUT the biggest problem is that once it's done it never lets go of the RAM and I need to restart MongoDB to get the RAM back.
I've tried many other tests, such as adding huge amounts of objects (1 million+), using find() for something that I know exists, and other operations, and they work fine.
Also worth noting: this works fine if I have a smaller DB, like 1 million objects.
I will keep on doing tests on this, but if anyone has anything they know right away that could help, that would be great.
EDIT: Tested some more with the 1M DB. The first failed find took about 4 sec and took some RAM; the subsequent fails took only 0.6 sec and didn't seem to affect the RAM. Does MongoDB store the DB in RAM when trying to find a document, and with a very huge DB it can't fit it all in the cache?
Btw, on the 25M DB it first took about 60 sec to not find it, but the 2nd try took 174 sec, and after that find it had taken all 8 GB of RAM.
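For what it's worth, the query shape being described (a find() whose filter has four conditions, one of which matches nothing) might look roughly like this with the MongoDB C++ driver; the field names and values are entirely hypothetical:

#include <cstddef>
#include <string>

#include <bsoncxx/builder/basic/document.hpp>
#include <mongocxx/collection.hpp>

// Purely illustrative: a find() whose filter has four conditions, where one of
// them ("status" here) is deliberately "wrong" and matches nothing.
std::size_t count_matches(mongocxx::collection& coll) {
    using bsoncxx::builder::basic::kvp;
    using bsoncxx::builder::basic::make_document;

    auto filter = make_document(kvp("country", "SE"),
                                kvp("year", 2018),
                                kvp("category", "books"),
                                kvp("status", "no-such-status"));

    std::size_t n = 0;
    for (auto&& doc : coll.find(filter.view())) {
        (void)doc;
        ++n;  // never reached if the "wrong" condition matches no documents
    }
    return n;
}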

Drools is very slow when we integrate with Talend ETL and process millions of records

We have used around 30 rules with multiple conditions in them. We are under the assumption that Drools takes one record and compares it against the other records and then gives the output for each one. So the time taken for processing 1 million records is around 4 hours. Can't we process the records in batches, I mean in large numbers, to reduce the processing time? Please help me with this issue. Thanks for the response.
Inserting 1M facts in one batch is a very bad strategy (unless you need to find combinations out of the lot). The documentation makes it clear that all work (at least in 5.x) is done during inserts and modifications. (6.x is reportedly different, but it's still bad practice to needlessly fill your memory up with objects galore.)
Simply insert, and after some suitable number, call fireAllRules() and process (transmit,...) the results. Make sure that no "dead stock" remains in Working Memory from such a batch - this would also slow you down.

Reading 16G from MongoDB

I have a MongoDB collection that is about 16GB and I'd like to read it into memory all at once and manipulate it. When reading the data, the code appears to freeze (I've waited 20+ minutes). I'm running the code from inside a Scala Play app on a machine with plenty of memory.
I've turned the MongoDB profiler on and when I repeatedly make the following query: >db.system.profile.find().sort({ts:-1})
I start by seeing documents containing the "getmore" operation but then only see my profiler queries (which leads me to believe that all of the records have been read but the function that wraps reading from Mongo hasn't "returned" yet, or reading has severely slowed down). If I run "top" I see that my app is using a ton of the CPU and all of the memory I've allotted to it (currently 24GB). If I only allocate 8GB to the app, it crashes with a GC error. I should also note that I can run the same code on the same machine just fine if I point it at a smaller DB (2GB). Also, I believe that this collection was originally created using Mongo 1.x and then migrated so that it was compatible with Mongo 2.x (I am not the creator of the collection).
Is my problem related to the size of my data? Is the data itself corrupted? How do I find out?
EDIT:
What I'm running is a bit obscured by in-house (de)serialization code:
// records is built lazily from the Mongo cursor (see below); fetch populates an entity.
val initialEntities = (for (record <- records) yield {
  /** INSTRUMENTATION **/
  println(...)
  val e = newEntity
  record.fetch(e)
  e
}).toSeq
records is a sequence of everything read from Mongo, built from a CursorIterator constructed via a DBCollection.find() (after importing com.mongodb). None of the instrumentation is ever printed. fetch uses each record to create an entity.