mongo: finding missing documents by property value - mongodb

I have two collections with documents of identical schema. Let's call them collections A (with documents {a1, a2, ..., am}) and B (with documents {b1, b2, ..., bn}). Is there a way to construct a mongo query that returns the set of documents {a1, a2, ..., ak} from collection A whose value for a property p does not appear in B, i.e. ai[p] ∉ UNIQUE(bj[p]) for i ≤ m and j ≤ n?
In reality collection B has a few dozen million records and collection A has several hundred million records. I can't figure out how to do this efficiently.

Related

Find oldest for every joined document

I have two collections, A and B. Each document in collection A corresponds to many documents in collection B. Documents in collection B have two important properties, "AID", the ID of the corresponding collection A document, and "date", something I want to sort by.
I now have a find query on collection A, which returns many documents in that collection. For each of the returned documents, I now want to find the oldest corresponding document in collection B.
So far, I used a for-each on my find query in collection A, and then used a find({"AID": doc._id}).sort({"date": 1}).limit(1) on collection B.
This is obviously super inefficient, since I have to go through my enormous collection B every single time for each document in collection A. Can I somehow simplify this into two aggregated queries only, perhaps using some kind of aggregate pipeline? How can I only traverse collection B once?
Thanks!
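One way the per-document lookups could be collapsed into a single pass is a $match / $sort / $group pipeline on collection B. This is only a sketch: the aIds array is a hypothetical stand-in for the _id values returned by the find query on A, and it assumes an index on { AID: 1, date: 1 } exists so that the match and sort do not have to scan all of B.

```javascript
// Hypothetical: the _id values collected from the find query on collection A.
const aIds = [1, 2, 3];

// Keep only B documents that belong to those A documents, order them so the
// oldest comes first within each AID, then keep the first document per AID.
const oldestPerA = [
  { $match: { AID: { $in: aIds } } },
  { $sort: { AID: 1, date: 1 } },
  { $group: { _id: "$AID", oldest: { $first: "$$ROOT" } } }
];

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.B.aggregate(oldestPerA, { allowDiskUse: true }).forEach(printjson);
}
```

Each result has the A document's id in _id and the oldest matching B document in oldest.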

query in mongodb atlas to verify the existence of multiple specific documents in a collection

I have a mongodb collection called employeeInformation, in which I have two documents:
{"name1":"tutorial1"}, {"name2":"tutorial2"}
When I do db.employeeInformation.find(), I get both these documents displayed. My question is: is there a query that I can run to confirm that the collection contains only those two specified documents? I tried db.employeeInformation.find({"name1":"tutorial1"}, {"name2":"tutorial2"}) but I only got the id corresponding to the first object with key "name1". I know it's easy to do here with 2 documents just by looking at the results of .find(), but when I insert hundreds of documents into the collection, I want a way of verifying that the collection contains all and only those documents (note that I will always have the objects themselves as text). Ideally this query should work in the MongoDB Atlas console/interface as well.
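db.collection.count()
will give you the number of documents in the collection, which you can compare with the number of documents you inserted. Note that this checks the count only, not the contents.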
Thanks,
Neha
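A count alone does not show that the right documents are present. A rough sketch of an "all and only" check follows; containsExactly is a hypothetical helper, and it assumes the expected documents are distinct and that a plain field-equality match is strict enough (a stored document with extra fields would still match).

```javascript
// The documents that should be in the collection (taken from the question).
const expected = [{ name1: "tutorial1" }, { name2: "tutorial2" }];

// Hypothetical helper: true when the collection holds as many documents as
// expected and every expected document matches at least one stored document.
function containsExactly(coll, docs) {
  if (coll.countDocuments({}) !== docs.length) return false;
  return docs.every((doc) => coll.countDocuments(doc) >= 1);
}

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  print(containsExactly(db.employeeInformation, expected));
}
```

This issues one count per expected document, which should be acceptable for a few hundred documents.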

How does the limit() option work in mongodb?

Let's say you have a collection of 10,000 documents and I make a find query with the option limit(50). How will MongoDB choose which 50 documents to return?
Will it auto-sort them (maybe by their creation date) or not?
Will the query return the same documents every time it is called? How does the limit option work in mongodb?
Does MongoDB limit the documents after they are returned or as it queries them? Meaning, will MongoDB query all documents and then limit the results to 50 documents, or will it query the 50 documents only?
The first 50 documents of the result set will be returned.
If you do not sort the documents (or if the order is not well-defined, such as sorting by a field with values that occur multiple times in the result set), the order may change from one execution to the next.
Will it auto-sort them(maybe by their creation date) or not?
No.
Will the query return the same documents every time it is called?
The query may produce the same results for a while and then start producing different results if, for example, another document is inserted into the collection.
Meaning, will MongoDB query all documents and then limit the results to 50 documents, or will it query the 50 documents only?
Depends on the query. If an index is used, only the needed documents will be read from the storage engine. If a sort stage is used in the query execution, all documents will be read from storage, sorted, then the required number will be returned and the rest discarded.
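To make the answer above concrete, here is a small sketch of a limit query with a well-defined order. The collection name items and the field createdAt are assumptions for illustration; _id is added as a tie-breaker because it is unique, so two documents with the same createdAt still have a fixed relative order.

```javascript
// Hypothetical sort: newest first, with the unique _id breaking ties.
const newestFirst = { createdAt: -1, _id: 1 };
const pageSize = 50;

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.items.find({}).sort(newestFirst).limit(pageSize).forEach(printjson);
}
```

If an index on { createdAt: -1, _id: 1 } exists, the server should be able to read the first 50 entries from the index instead of sorting the whole collection.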

Duplicate a Mongodb Collection but with a `limit` applied

Is it possible to have a second Collection that is exactly the same as the first Collection, but with a limit and sort operator applied to it?
If the first Collection has 1000+ records and new records keep being added, the second Collection should receive all the new records but be limited to the N newest ones (sorted by a timestamp field).
The reason for doing this is to overcome a limitation in my database driver that does not have sort and limit implemented yet.
It sounds like what you want is a capped collection.
A capped collection is automatically limited to a fixed size. When it is full and you add a new document, the oldest document is deleted. All records are guaranteed to be in insertion order when you query the collection without an explicit sort. The biggest limitation is that documents in a capped collection cannot be updated when that would increase their size.
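Capped collections need to be created explicitly with the createCollection function. The size option is required and is given in bytes; the optional max option caps the number of documents. This shell command would create a capped collection limited to 1000 documents or 1 MiB, whichever is reached first:
db.createCollection( "your_collection_name", { capped: true, size: 1048576, max: 1000 } );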
When you want to convert an already existing collection to a capped collection, you can use the convertToCapped database command, where size is the maximum size in bytes:
db.runCommand({"convertToCapped": "your_existing_collection_name", size: 1048576});

Mongo: Fetching documents that aren't referenced in another collection

Let's assume that I have two collections in my Mongo database: A and B. Each A document may have a reference to a B document, but B documents don't have references back to A.
How can I efficiently find all documents in B that are not referenced by a document in A?
Is there a more effective approach than retrieving all documents in B and manually comparing against A documents? Can this be done with map reduce?
Should I consider adding references from B to A to support the query? Since Mongo doesn't support transactions, I had avoided any two way references to avoid any chance of an inconsistent state in the event of a failure.
Also, I need to be able to effectively page through these results, if that impacts the solution.
In pseudo-code:
// Get the set of B document ids that are referenced by A documents.
var bref_ids = db.A.distinct('b_id');
// Get the set of all other B documents.
var unreferenced_b_docs = db.B.find({_id: {$nin: bref_ids}});
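The distinct result has to fit in a single BSON document, so the approach above can fail when A references very many B documents. A possible alternative is a $lookup anti-join that also supports paging by _id. This is a sketch under assumptions: unreferencedPage is a hypothetical helper, b_id is the reference field used above, and an index on A.b_id is assumed so each lookup is an index probe.

```javascript
// Hypothetical helper: builds a pipeline that returns one page of B documents
// that no A document references. Pass null as lastId for the first page, then
// the _id of the last document of the previous page.
function unreferencedPage(lastId, pageSize) {
  const stages = [];
  if (lastId !== null) {
    stages.push({ $match: { _id: { $gt: lastId } } });
  }
  stages.push(
    { $sort: { _id: 1 } },
    // Attach the A documents that point at each B document.
    { $lookup: { from: "A", localField: "_id", foreignField: "b_id", as: "refs" } },
    // Keep only B documents with no referencing A document.
    { $match: { refs: { $size: 0 } } },
    { $limit: pageSize },
    { $project: { refs: 0 } }
  );
  return stages;
}

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.B.aggregate(unreferencedPage(null, 100)).forEach(printjson);
}
```

Because the $limit comes after the anti-join filter, the server keeps scanning B until it has found a full page of unreferenced documents.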