Why does an update operation on one collection block reads from another? - mongodb

I have an issue with my MongoDB setup: I am running a single mongod instance on the server, with several collections in the database. The problem I'm running into is that when I run a long update operation (150k records) on the myRecords collection, any read query on myDetails is blocked until that long update completes.
This doesn't make much sense to me. I can see how reads from the same collection might be blocked during the update, but why would another collection be affected? Am I missing something?
More details:
- running Node.js and performing operations with JavaScript
- db version v3.0.11
- MMAPv1 storage engine
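One way to see what is actually blocking while the long update runs is to inspect in-progress operations from a second mongo shell. This is only a diagnostic sketch (the namespaces would be your own); db.currentOp() reports which operations are waiting for a lock, and db.serverStatus() exposes server-wide lock statistics:

// Run from a second mongo shell while the long update is in progress.
// Lists operations that are currently waiting for a lock, with their namespaces.
db.currentOp({ waitingForLock: true }).inprog.forEach(function (op) {
    print(op.opid, op.op, op.ns, JSON.stringify(op.locks));
});

// Server-wide lock statistics can also hint at where the contention is.
printjson(db.serverStatus().locks);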

Related

MongoDB closes connection on read operation

I run MongoDB 4.0 on WiredTiger under Ubuntu Server 16.04 to store complex documents. There is an issue with one of the collections: the documents have many images written as strings in base64. I understand this is bad practice, but I need some time to fix it.
Because of this, some find operations fail, but only those that have a non-empty filter or a skip. For example, db.collection('collection').find({}) runs fine, while db.collection('collection').find({category: 1}) just closes the connection after a timeout. It doesn't matter how many documents should be returned: if there's a filter, the error appears every time (even if the query should return 0 docs), while an empty query always executes well until the skip gets too big.
UPD: some skip values make queries fail. db.collection('collection').find({}).skip(5000).limit(1) runs fine, db.collection('collection').find({}).skip(9000).limit(1) takes much more time but still executes, while db.collection('collection').find({}).skip(10000).limit(1) fails every time. It looks like there's some kind of buffer where the DB stores query-related data, and at around 10000 docs it runs out of resources. The collection itself has ~10500 docs. Also, searching by _id runs fine. Unfortunately, I can't create new indexes because that operation fails just like the reads.
What temporary solution can I use before removing the base64 images from the collection?
This happens because such a problematic data scheme causes huge RAM usage. The more entities there are in the collection, the more RAM is needed not only to perform well but even to run a simple find.
Increasing MongoDB's default RAM usage with the storage.wiredTiger.engineConfig.cacheSizeGB config option allowed all the operations to run fine.
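For reference, a minimal mongod.conf excerpt with that option; the 4 GB value is only an illustration and should be sized to the host's RAM, and mongod needs a restart for it to take effect:

# mongod.conf excerpt: raise the WiredTiger cache size (value is an example)
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4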

MongoDB query for deleted documents

I will be running a nightly cron job to query a collection and then send results to another system.
I need to sync this collection between two systems.
Documents can be removed from the host and this deletion needs to be reflected on the client system.
So, my question is: is there a way to query for documents that have been recently deleted?
I'm looking for something like db.Collection.find({RECORDS_THAT_WERE_DELETED_YESTERDAY});
I was reading about parsing the oplog. However, I don't have one set up yet. Is that something you can introduce into an existing DB?
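The oplog only exists when mongod runs as a replica set member, but a single-node replica set (start mongod with --replSet and run rs.initiate() once) is enough to get one on an existing database. Once it exists, deletes are recorded in local.oplog.rs with op: 'd'. A rough shell sketch of the kind of query involved; the namespace and the 24-hour window are placeholders for "yesterday":

// Deletes appear in the oplog as op: 'd'; 'o' holds the _id of the removed document.
var oneDayAgo = Math.floor(Date.now() / 1000) - 24 * 60 * 60;   // assumed "yesterday" window
db.getSiblingDB('local').oplog.rs.find({
    op: 'd',                                 // delete operations only
    ns: 'mydb.myCollection',                 // placeholder namespace: <db>.<collection>
    ts: { $gt: Timestamp(oneDayAgo, 0) }     // only entries from the last 24 hours
}).forEach(function (entry) {
    printjson(entry.o._id);                  // _id of the deleted document
});

Keep in mind the oplog is a capped collection, so entries roll off after a while; the nightly job has to run before they do.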

SELECT FOR UPDATE SKIP LOCKED in MongoDB

I have 100+ worker threads which are going to poll the database, looking for new jobs.
To take a job, a thread needs to change the status of a bunch of documents from NEW to IN_PROGRESS, so that no other thread can pick up the same job.
This can be solved perfectly well in PostgreSQL with a SELECT ... FOR UPDATE SKIP LOCKED statement with WHERE status = 'NEW'.
Is there a way to do such an atomic update in MongoDB for a single document? For a batch?
There's a findAndModify method, which works exactly as you've described for a single document.
For a batch, it's not possible right now, as
In MongoDB, write operations, e.g. db.collection.update(), db.collection.findAndModify(), db.collection.remove(), are atomic on the level of a single document.
It will be possible in MongoDB 4.0 though, with transactions.
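For the single-document case, a minimal Node.js sketch of that claim step; this assumes it runs inside an async function, that db is a connected Db object from the MongoDB Node.js driver, and that jobs / workerId are placeholder names:

// Atomically claim one NEW job. The filter and the update are applied as a single
// document-level atomic operation, so two workers can never claim the same document.
const claimed = await db.collection('jobs').findOneAndUpdate(
    { status: 'NEW' },                                    // only unclaimed jobs
    { $set: { status: 'IN_PROGRESS', workerId: myId } },  // mark it as taken (workerId is illustrative)
    { returnOriginal: false }                             // hand back the updated document
);
if (claimed.value === null) {
    // no NEW jobs right now: back off and poll again later
}

A batch can be approximated by calling this in a loop until enough jobs are claimed, but each call is only atomic for its own document.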

How long will mongo's internal cache last?

I would like to know how long mongo's internal cache lasts. I have a scenario in which I have around one million records and I have to perform a search on them using the mongo-java driver.
The initial search takes a lot of time (nearly one minute), whereas consecutive runs of the same query take only a few seconds thanks to mongo's internal caching mechanism.
But I do not know how long this cache lasts: is it until the system reboots, until the collection undergoes any write operation, or something else?
Any help in understanding this is appreciated!
PS:
Regarding the fields on which the search is performed, some are indexed and some are not.
Mongo version used: 2.6.1
It will depend on a lot of factors, but the most prominent are the amount of memory in the server and how active the server is, since MongoDB leaves much of the caching to the OS (by memory-mapping its data files).
You need to take a long, hard look at your log files for the initial query and try to figure out why it takes nearly a minute.
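To act on that, note that operations slower than 100 ms are written to the mongod log by default, and the database profiler can additionally record them in a collection. A minimal shell sketch (the 100 ms threshold shown here is just the default, made explicit):

// Profile operations slower than 100 ms (level 1 = slow operations only).
db.setProfilingLevel(1, 100);

// After running the slow query once, inspect the most recent profile entries.
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty();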
In most cases there is some internal cache-invalidation mechanism that drops the cached data for your query when a write operation occurs. That is the simplest description of the process, just from my own experience.
But, as mentioned earlier, there are many factors besides simple invalidation that can play a role.
MongoDB automatically uses all free memory on the machine as its cache. It would be better to use MongoDB 3.0+ because it ships with two storage engines, MMAPv1 and WiredTiger.
The major difference between the two is write-lock granularity: MMAPv1 locks far more coarsely (per collection in 3.0, per database in older releases), whereas WiredTiger uses document-level concurrency control.
If you are using MongoDB 2.6, you can check query performance and execution time with the explain() method; from 3.0 on, use explain("executionStats") in the shell.
You should index the fields you query on to get results faster. A single collection cannot have more than 64 indexes, and the more indexes a collection has, the greater the performance impact on write/update operations.
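A small shell sketch of both suggestions; the collection and field names are placeholders:

// Index the field used in the query predicate (names are illustrative).
db.myCollection.createIndex({ searchField: 1 });

// Check whether the index is used and how long execution takes.
// (On 2.6, plain explain() gives similar information.)
db.myCollection.find({ searchField: "value" }).explain("executionStats");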

Replicating the MongoDB oplog to another MongoDB instance

Hi, we have a production MongoDB with a set of collections, and all write activity is recorded in the oplog. Now I want to write a script to watch this oplog so that whenever a new record is added to it, I write that record to a database on another dummy server. How can I go about this? I am new to mongo, so I'm unsure where to start with this; any ideas would be helpful. I am thinking of something along the lines of:
while (true)
{
    watch(oplog)
    OnNewEntry
    {
        AddToAnotherMongo(another.server.com, port, dbname, record)
    }
}
There are various oplog readers that can watch the oplog and replay it to a specific server. This is what replica sets do by default, with only one primary (writer). If you just want copies of your data, then a replica set is the best option, supported without any code (a minimal setup sketch follows the links below).
Here are some samples of code which read the oplog:
http://github.com/wordnik/wordnik-oss/
http://github.com/RedBeard0531/mongo-oplog-watcher/
http://github.com/mongodb/mongo-java-driver/blob/master/examples/ReadOplog.java
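If the replica-set route is enough, the setup itself is just a couple of shell commands once each mongod has been started with --replSet; the host names below are placeholders:

// Run once, in a shell connected to the member that should become primary.
rs.initiate({
    _id: "rs0",
    members: [
        { _id: 0, host: "prod.server.com:27017" },      // placeholder hosts
        { _id: 1, host: "another.server.com:27017" }
    ]
});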
I had a similar problem and found a quite easy solution, following your code example, in JavaScript to be executed in a mongo shell.
source code available here
By opening a tailable cursor on the oplog of the master server, each operation can be applied to another server (and of course you can filter by the namespace of the collections, or even the databases...).
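A rough mongo-shell sketch of that idea; the remote host, database, and namespace are placeholders, and only inserts are handled to keep it short:

// Run in a mongo shell connected to the master (must be a replica set member so the oplog exists).
var remote = new Mongo("another.server.com:27017").getDB("dbname");   // dummy server (placeholder)

var cursor = db.getSiblingDB("local").oplog.rs
               .find({ ns: "mydb.mycoll" })           // filter by namespace (placeholder)
               .addOption(DBQuery.Option.tailable)    // keep the cursor open
               .addOption(DBQuery.Option.awaitData);  // wait briefly for new entries

while (!cursor.isExhausted()) {
    if (cursor.hasNext()) {
        var entry = cursor.next();
        if (entry.op === "i") {                               // inserts only, for brevity
            remote.getCollection("mycoll").insert(entry.o);   // replay on the dummy server
        }
        // updates ("u") and deletes ("d") would need their own handling
    }
}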