Reading 16GB from MongoDB

I have a MongoDB collection that is about 16GB and I'd like to read it into memory all at once and manipulate it. When reading the data, the code appears to freeze (I've waited 20+ minutes). I'm running the code from inside a Scala Play app on a machine with plenty of memory.
I've turned the MongoDB profiler on, and when I repeatedly run the following query:
>db.system.profile.find().sort({ts:-1})
I start by seeing documents containing the "getmore" operation, but then I only see my profiler queries (which leads me to believe that all of the records have been read but the function that wraps reading from Mongo hasn't "returned" yet, or that reading has severely slowed down). If I run "top" I see that my app is using a ton of CPU and all of the memory I've allotted to it (currently 24GB). If I only allocate 8GB to the app, it crashes with a GC error. I should also note that I can run the same code on the same machine just fine if I point it at a smaller DB (2GB). Also, I believe that this collection was originally created using Mongo 1.x and then migrated so that it was compatible with Mongo 2.x (I am not the creator of the collection).
Is my problem related to the size of my data? Is the data itself corrupted? How do I find out?
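For the corruption part, one check that exists is MongoDB's validate command; here is a rough sketch of running it from the legacy com.mongodb driver in Scala (the database and collection names are made up):
import com.mongodb.{BasicDBObject, MongoClient}

val client = new MongoClient("localhost", 27017)   // illustrative connection
val db     = client.getDB("mydb")                  // made-up database name

// The validate command walks a collection's data and indexes and reports
// structural problems; "full" is more thorough but slower on a 16GB collection.
val result = db.command(new BasicDBObject("validate", "mycollection").append("full", true))
println(result)

client.close()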
EDIT:
What I'm running is a bit obscured by in-house (de)serialization code:
val initialEntities = (for (record <- records) yield {
  /** INSTRUMENTATION **/
  println(...)
  val e = newEntity
  record.fetch(e)
  e
}).toSeq
records is a sequence of everything read from Mongo, built from a CursorIterator constructed via a DBCollection.find() (after importing com.mongodb). None of the instrumentation is ever printed. fetch uses each record to create an entity.
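For completeness, the way records gets built and consumed looks roughly like this (a sketch assuming the legacy com.mongodb driver; the connection details and collection name are made up, and my real code goes through the in-house (de)serialization layer):
import com.mongodb.{DBObject, MongoClient}
import scala.collection.JavaConverters._

val client     = new MongoClient("localhost", 27017)             // illustrative connection
val collection = client.getDB("mydb").getCollection("entities")  // made-up names

val cursor = collection.find()   // DBCursor: documents are fetched lazily, in batches

// Materializing the whole cursor into a strict collection forces every document
// onto the heap at once; iterating cursor.asScala directly would keep only one
// batch in memory at a time.
val records: Seq[DBObject] = cursor.asScala.toVector
println(s"read ${records.size} records")

cursor.close()
client.close()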

Related

Is there some reason MongoDB continues with write operations when I stop the service that writes?

I have some applications that read data from MongoDB and only one that writes massively (real-time data). These applications and MongoDB had been doing well for more than 3 months.
Today I saw that some applications were consuming a lot of memory and had bad performance, and I noticed a big delay in the data being inserted into MongoDB.
So I report some facts:
1. I stopped the service that writes to MongoDB.
2. I noticed that even with the service turned off, MongoDB kept writing data for more than a minute.
2.1. I saw it when new data was written to the collection (we have a replica set).
2.2. In the MongoDB Compass Performance tab, in the hottest-collections list, write operations were still at about 45% on the collection that the service writes to.
3. I executed some commands about concurrency/locks; everything is fine and there were no clients writing.
4. I executed a command to see MongoDB's memory usage and I don't see anything wrong or near the limits (the kind of checks I mean for points 3 and 4 are sketched just after this list).
5. MongoDB is set up on an EC2 machine with huge resources (I don't know the exact specs).
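For reference, the checks referred to in points 3 and 4 look roughly like this when run from a driver (a sketch using the legacy com.mongodb Java driver from Scala; the host name is made up, and these are not necessarily my exact commands):
import com.mongodb.MongoClient

val client = new MongoClient("my-ec2-host", 27017)   // made-up host
val admin  = client.getDB("admin")

// serverStatus reports memory, connection and lock metrics.
val status = admin.command("serverStatus")
println(status.get("mem"))
println(status.get("globalLock"))

// replSetGetStatus shows member states and optimes, useful for spotting
// replication lag (writes still being applied after the writer service stopped).
val replStatus = admin.command("replSetGetStatus")
println(replStatus.get("members"))

client.close()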
Any ideas on how to troubleshoot this?
Thanks in advance.

mongocxx crash when doing many inserts & catching exceptions in underlying C driver

I have a C++ application which processes JSON and inserts it into a DB. It's doing a pretty simple insert: it parses the JSON, adds a couple of fields, and inserts as below.
I have also surrounded the call with a try/catch to handle any exceptions that might be thrown.
The application is probably processing around 360-400 records a minute.
It has been working quite well; however, it occasionally errors and crashes. Code and stack trace below.
Code
try {
    bsoncxx::builder::basic::document builder;
    builder.append(bsoncxx::builder::basic::concatenate(bsoncxx::from_json(jsonString)));
    builder.append(bsoncxx::builder::basic::kvp(std::string("created"), bsoncxx::types::b_date(std::chrono::system_clock::now())));
    bsoncxx::document::value doc = builder.extract();
    bsoncxx::stdx::optional<mongocxx::result::insert_one> result = m_client[m_dbName][std::string("Data")].insert_one(doc.view());
    if (!result) {
        m_log->warn(std::string("Insert failed."), label);
    }
} catch (...) {
    m_log->warn(std::string("Insert failed and threw exception."), label);
}
Stack
So I guess I have two questions. Any ideas as to why it is crashing? And is there some way I can catch and handle this error in such a way that it does not crash the application?
mongocxx version is: 3.3.0
mongo-c version is: 1.12.0
Any help appreciated.
Update 1
I did some profiling on the DB, and although it's doing a high amount of writes it is performing quite well, and all operations are using indexes. The most inefficient operation takes 120ms, so I don't think it's a performance issue on that end.
Update 2
After profiling with valgrind I was unable to find any allocation errors. I then used perf to profile CPU usage and found that over time CPU usage builds up: typically starting at around a 15% base load when the process first starts and then ramping up over the course of 4 or 5 hours until the crash occurs, with CPU usage at around 40-50%. During this whole time the number of DB operations per second remains consistent at 10 per second.
To rule out the possibility of other processing code causing this, I removed all DB handling and ran the process overnight, and CPU usage remained flatlined at 15% the entire time.
I'm trying a few other strategies to identify the root cause now. Will update if I find anything.
Update 3
I believe I have discovered the cause of this issue. After the insert there is also an update which pushes two items onto an array using the $push operator. It does this for different records up to 18,000 times before that document is closed. The fact that it was crashing on the insert under high CPU load was a bit of a red herring.
The CPU usage build-up is that update taking longer to execute as the array in the document being pushed to gets longer. I re-architected to use Redis and only insert into MongoDB once all the items to be pushed into the array have been received; this flatlined CPU usage. Another way around this could be to insert each item into a temporary collection and use aggregation with $push to compile a single record with the array once all the items have been received.
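For illustration, the kind of update I'm describing looks roughly like this (sketched here with the legacy com.mongodb Java driver from Scala rather than mongocxx; the field and record names are made up):
import com.mongodb.{BasicDBObject, MongoClient}

val client     = new MongoClient("localhost", 27017)         // illustrative connection
val collection = client.getDB("mydb").getCollection("Data")  // "Data" as in the insert above

// Each call appends two items to the same document's array. Repeated up to
// 18,000 times per document, the array (and the cost of rewriting the document)
// keeps growing, which matches the CPU build-up described above.
val query  = new BasicDBObject("recordId", "some-record")    // made-up match criteria
val update = new BasicDBObject("$push",
  new BasicDBObject("items",                                 // made-up array field
    new BasicDBObject("$each", java.util.Arrays.asList("item-a", "item-b"))))
collection.update(query, update)

client.close()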
The graphs below illustrate the issue with using $push on a document as the array gets longer. The huge drops are when all records have been received and new documents are created for the next set of items.
In closing this issue I'll take a look over on the MongoDB Jira and see if anyone has reported this issue with the $push operator, and if not I'll report it myself in case this is something that can be improved in future releases.

Using find() with arguments that will result in not finding anything results in the RAM never getting "dropped"

I'm currently setting up MongoDB, and so far I have a DB with 25 million objects and some indexes. When I use find() with 4 arguments and I know that one of the arguments is "wrong", or that the result of the query will be nothing found, what happens is that MongoDB takes all my RAM (8GB) and the time to get the "nothing found" result varies a lot.
BUT the biggest problem is that once it's done it never lets go of the RAM, and I need to restart MongoDB to get the RAM back.
I've tried many other tests, such as adding huge amounts of objects (1 million+), using find() for something that I know exists, and other operations, and they work fine.
I also want to note that this works fine if I have a smaller DB, like 1 million objects.
I will keep on doing tests on this, but if anyone knows anything right away that could help, that would be great.
EDIT: Tested some more with the 1M DB. The first failed find took about 4 sec and took some RAM; the following fails took only 0.6 sec and didn't seem to affect the RAM. Does MongoDB load the DB into RAM when trying to find a document, and with a very huge DB it can't fit it all in the cache?
Btw, on the 25M DB it first took about 60 sec to not find it, but the 2nd try took 174 sec, and after that find it had taken all 8GB of RAM.
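For what it's worth, the kind of query I'm describing, plus explain() to see whether an index is used, looks roughly like this (a sketch with the legacy com.mongodb Java driver from Scala; the field names are made up, not my actual schema):
import com.mongodb.{BasicDBObject, MongoClient}

val client  = new MongoClient("localhost", 27017)            // illustrative connection
val objects = client.getDB("mydb").getCollection("objects")  // made-up names

// A four-field query where one value is deliberately "wrong" (matches nothing).
val query = new BasicDBObject("a", 1)
  .append("b", "x")
  .append("c", true)
  .append("d", "does-not-exist")

// explain() shows whether an index was used or the whole collection was scanned;
// a full scan over 25 million documents drags a lot of data through RAM even
// though nothing is returned.
println(objects.find(query).explain())

client.close()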

mongod: clean up memory used in RAM

I have a huge amount of data in my MongoDB. It's filled with tweets (50 GB) and my RAM is 8 GB. When querying, it retrieves all tweets and MongoDB starts filling the RAM; when it reaches 8 GB it starts moving files to disk. This is the part where it gets really slow. So I changed the query from skipping to using indexes. Now I have indexes and I query only 8 GB into my program, save the id of the last tweet used in a file, and the program stops. Then I restart the program and it gets the id of the tweet from the file. But the mongod server is still occupying the RAM with the first 8 GB, which will no longer be used, because I have an index to the last one. How can I clean the memory of the MongoDB server without restarting it?
(running on Windows)
I am a bit confused by your logic here.
So I changed the query from skipping to using indexes. Now I have indexes and I query only 8 GB into my program, save the id of the last tweet used in a file, and the program stops.
Using ranged queries will not help with the amount of data you have to page in (in fact it might worsen it because of the index); it merely makes the query faster server-side by using an index instead of a huge skip (like a 42K+ row skip). If you are doing the same as that skip() but via an index (without a covered index), then you are still paging in exactly the same data.
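To make that concrete, here is a rough sketch of skip-based paging versus a ranged query on _id (using the legacy com.mongodb Java driver from Scala; the database and collection names are made up):
import com.mongodb.{BasicDBObject, MongoClient}

val client = new MongoClient("localhost", 27017)           // illustrative connection
val tweets = client.getDB("mydb").getCollection("tweets")  // made-up names

// Skip-based paging: the server still has to walk past the skipped documents.
val skipPage = tweets.find()
  .sort(new BasicDBObject("_id", 1))
  .skip(42000)
  .limit(1000)

// Ranged paging: resume from the last _id already processed. In the question that
// id is saved to a file; for this sketch we just pull the _id at the same offset.
val lastId = tweets.find().sort(new BasicDBObject("_id", 1)).skip(41999).limit(1).next().get("_id")
val rangePage = tweets.find(new BasicDBObject("_id", new BasicDBObject("$gt", lastId)))
  .sort(new BasicDBObject("_id", 1))
  .limit(1000)

// Either way, the documents actually returned still have to be paged into RAM;
// the ranged query only saves the server the work of skipping.
println("skip page: " + skipPage.size() + " docs, ranged page: " + rangePage.size() + " docs")

client.close()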
It is slow due to memory mapping and your working set. You have more data than RAM, and not only that, you are using more of that data than you have RAM for, so you are probably page faulting all the time.
Restarting the program will not solve this, nor will clearing its data OS-side (with a restart or a specific command), because of your queries. You probably need to either:
Think about your queries so that your working set is more in line with your memory
Or shard your data across many servers so that you don't have to build up your primary server
Or get a bigger primary server (moar RAM!!!!!)
Edit
The LRU of your OS should be swapping out old data already since MongoDB is using its fully allocated lot, which means that if that 8GB isn't swapped it is because your working set is taking that full 8GB (most likely with some swap on the end).

Machine hangs when MongoDB db.copyDatabase(...) takes all available RAM

When I try to copy a database (about 100GB) from one MongoDB server to another, the mongod daemon process takes 99% of the available RAM (Windows 64-bit, 16GB). As a result the system becomes very slow and sometimes unstable.
Is there any way to avoid it?
MongoDB 2.0.6
Albert.
MongoDB is very much an "in RAM" application. Mongo has all of your database memory-mapped for usage, but normally only the most recently used data will be in RAM (called your working set), and Mongo will page in any data not already in RAM as needed. Normally Mongo's behaviour is to have only as much as it needs in RAM; however, when you do something like a DB copy, all of the data is needed, hence mongod consuming all your RAM.
There is no ideal solution to this, but if desperately needed you could use WSRM (Windows System Resource Manager, http://technet.microsoft.com/en-us/library/cc732553.aspx) to try to limit the amount of RAM consumed by the process. This will have the effect of making the copy take longer and may cause other issues.