Query or command to find a Document, given an ObjectID but NOT a collection - mongodb

So I have a document that has references to foreign ObjectIDs that may point to other documents or collections.
For example this is the pseudo-structure of the document
{
_id: ObjectID(xxxxxxxx),
....
reference: ObjectID(yyyyyyyy)
}
I can't find any approach that doesn't require naming the collection. Since I don't know which collection to search, is there a way to look up ObjectID(yyyyyyyy) across the entire database and find out which collection it belongs to?

The only possible way to do this is by listing every collection in the database and performing a db.collection.find() on each one.
E.g. in the Mongo shell I would do something like
var result = [];
var collections = db.getCollectionNames();
for (var i = 0; i < collections.length; i++) {
    // look for the _id in each collection in turn
    var found = db.getCollection(collections[i]).findOne({ "_id" : ObjectId("yyyyyyyy") });
    if (found) {
        result.push(found);
    }
}
// printjson shows the documents' contents
printjson(result);

You need to run your query on all collections in your database.
db.getCollectionNames().forEach(function(collection){
    db[collection].find({ $or : [
        { _id : ObjectId("535372b537e6210c53005ee5") },
        { reference : ObjectId("535372b537e6210c53005ee5") }
    ]}).forEach(printjson);
});
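Neither snippet above records which collection a match came from, which is what the question asks for. A minimal sketch that does, written as a mongo shell function: findCollectionsById is a hypothetical name, and db is assumed to expose the shell's getCollectionNames() and getCollection().

```javascript
// Hypothetical helper: scans every collection for a document with the given _id
// and reports the collection name alongside the document.
function findCollectionsById(db, id) {
  var matches = [];
  db.getCollectionNames().forEach(function (name) {
    var doc = db.getCollection(name).findOne({ _id: id });
    if (doc) {
      matches.push({ collection: name, document: doc });
    }
  });
  return matches;
}

// In the mongo shell (the id is the sample value from the answer above):
// printjson(findCollectionsById(db, ObjectId("535372b537e6210c53005ee5")));
```

This still issues one query per collection, so it is only practical for occasional lookups.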

Related

MongoDB project nested element in _id field

I'm stuck on something very simple, but I can't figure it out on my own.
I'm using MongoDB v4.2 and have a collection with documents like this:
{"_id":{"A":"***","B":0}}, "some other fields"...
I'm working on top of the mongo-c driver and want to retrieve only the "_id.B" field, but I don't know how. I have tried:
"projection":{"_id.B":1}: returns the whole _id object, both _id.A and _id.B.
"projection":{"_id.A":0,"All other fields except _id.B":0}: returns the same as above.
"projection":{"_id.A":0,"_id.B":1}: returns nothing.
How can I get only some elements of an object when that object is inside the _id field? The first option works for objects that aren't inside the _id field, but not here.
Best regards, thanks for your time.
Héctor
You can do this with a $project stage in an aggregation pipeline. Alternatively, use $addFields to copy _id.B into a new top-level field while keeping all other fields in the document, and then project _id: 0.
Code:
var coll = localDb.GetCollection("yourCollectionName");
var project = new BsonDocument
{
{
"$project",
new BsonDocument
{
{ "_id.B", 1 }
}
}
};
var pipeline = new[] { project };
var result = coll.Aggregate(pipeline);
Test : MongoDB-Playground
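The $addFields variant mentioned in the answer could look like the following pipeline, shown here in mongo shell syntax rather than the answer's C#. The top-level field name B and the collection name in the usage comment are assumptions for illustration.

```javascript
// Copy the nested _id.B value into a top-level field named B (name chosen for
// illustration), keep every other field, then drop the compound _id.
var pipeline = [
  { $addFields: { B: "$_id.B" } },
  { $project: { _id: 0 } }
];

// In the mongo shell:
// db.yourCollectionName.aggregate(pipeline);
```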

extract only ids from a mongo collection

I only need the ids of all documents in a MongoDB collection. I am using Meteor. For now I am using a basic _.each loop; I bet a better way exists, but unfortunately it's not coming to me.
Below is my code :
var followedIds = Doubts.find({ch : chId, userId : userId}).fetch();
var d_ids = [];
_.each(followedIds, function(doubt){
d_ids.push(doubt._id)
});
A small change in projection can help you to fetch only _ids from collection:
var followedIds = Doubts.find({ch : chId, userId : userId},
{
fields:{
_id:1
}
}).fetch();
var d_ids = [];
_.each(followedIds, function(doubt){
d_ids.push(doubt._id)
});
You can use _.pluck if you only need the ids.
db.collection_name.find({}, {"_id": 1})
See Docs
{} matches all documents.
{"_id": 1} returns only the _id field and no other fields.
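The loop that collects the ids can be reduced to a single map call, which is what the _.pluck suggestion above amounts to. A small sketch on plain arrays; pluckIds is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the _id of every document in an array.
function pluckIds(docs) {
  return docs.map(function (doc) {
    return doc._id;
  });
}

// With Underscore loaded, _.pluck(docs, '_id') gives the same array.
// Meteor cursors also offer map(), so the fetch() step can be skipped:
// var d_ids = Doubts.find({ch: chId, userId: userId}, {fields: {_id: 1}})
//                   .map(function (doubt) { return doubt._id; });
```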

How to compare all documents in two collections with millions of doc and write the diff in a third collection in MongoDB

I have two collections (coll_1, coll_2) with a million documents each.
These two collections were created by running two versions of the same code against the same data source, so both have the same number of documents. A document in one collection may, however, be missing a field or sub-document, or hold different values, compared with its counterpart in the other. Matching documents share the same primary_key_id, which is indexed.
I have this javascript function saved on the db to get the diff
db.system.js.save({
_id: "diffJSON", value:
function(obj1, obj2) {
var result = {};
for (key in obj1) {
if (obj2[key] != obj1[key]) result[key] = obj2[key];
if (typeof obj2[key] == 'array' && typeof obj1[key] == 'array')
result[key] = arguments.callee(obj1[key], obj2[key]);
if (typeof obj2[key] == 'object' && typeof obj1[key] == 'object')
result[key] = arguments.callee(obj1[key], obj2[key]);
}
return result;
}
});
Which runs fine like this
diffJSON(testObj1, testObj2);
Question: how can I run diffJSON on coll1 and coll2 and write the result into coll3 along with the primary_key_id?
I am new to MongoDB, and I understand that joins don't work the way they do in an RDBMS, so I wonder whether I have to copy the two documents being compared into a single collection and then run the diffJSON function.
Also, most of the time (say 90%) the documents in the two collections will be identical; I only need to know about the 10% of documents that differ.
Here is a simple example document:
(but real doc is around 15k in size, just so you know the scale)
var testObj1 = { test:"1",test1: "2", tt:["td","ax"], tr:["Positive"] ,tft:{test:["a"]}};
var testObj2 = { test:"1",test1: "2", tt:["td","ax"], tr:["Negative"] };
If you know a better way to diff the documents, please feel free to suggest.
You can use a simple shell script to achieve this. First create a file named script.js and paste this code into it:
// load previously saved diffJSON() function
db.loadServerScripts();
// get all the document from collection coll1
var cursor = db.coll1.find();
if (cursor != null && cursor.hasNext()) {
// iterate over the cursor
while (cursor.hasNext()){
var doc1 = cursor.next();
// get the doc with the same _id from coll2
var id = doc1._id;
var doc2 = db.coll2.findOne({_id: id});
// compute the diff
var diff = diffJSON(doc2, doc1);
// if there is a difference between the two objects
if ( Object.keys(diff).length > 0 ) {
diff._id = id;
// insert the diff in coll3 with the same _id
db.coll3.insert(diff);
}
}
}
In this script I assume that your primary key is the _id field.
Then execute it from your shell like this:
mongo --host hostName --port portNumber databaseName < script.js
where databaseName is the name of the database containing the collections coll1 and coll2.
For these sample documents (I just added an _id field to your docs):
var testObj1 = { _id: 1, test:"1",test1: "2", tt:["td","ax"], tr:["Positive"] ,tft:{test:["a"]}};
var testObj2 = { _id: 1, test:"1",test1: "2", tt:["td","ax"], tr:["Negative"] };
The script will save the following document in coll3:
{ "_id" : 1, "tt" : { }, "tr" : { "0" : "Positive" } }
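The saved diffJSON only walks the keys of its first argument and compares arrays by reference, which is why the output above has an empty tt entry and no tft entry. A minimal sketch of a stricter variant follows. The name diffDocs and the { left, right } output shape are assumptions, and it only handles plain JSON-like values (strings, numbers, booleans, null, arrays, plain objects), not BSON types such as ObjectId or Date.

```javascript
// True for arrays and plain objects, which are compared field by field.
function isContainer(value) {
  return value !== null && typeof value === 'object';
}

// Hypothetical replacement for diffJSON: reports every key whose value differs,
// including keys that exist on only one side. An empty result means "no diff".
function diffDocs(a, b) {
  var result = {};
  var seen = Object.create(null);
  Object.keys(a).concat(Object.keys(b)).forEach(function (key) {
    if (seen[key]) return; // key already handled via the other document
    seen[key] = true;
    var left = a[key];
    var right = b[key];
    if (isContainer(left) && isContainer(right)) {
      // Recurse and keep the entry only if something inside really differs.
      var nested = diffDocs(left, right);
      if (Object.keys(nested).length > 0) result[key] = nested;
    } else if (left !== right) {
      result[key] = { left: left, right: right };
    }
  });
  return result;
}
```

For the sample documents, diffDocs(testObj1, testObj2) reports only tr and tft, and identical arrays such as tt no longer produce an entry.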
This solution builds upon the one proposed by felix (I don't have the necessary reputation to comment on his). I made a few small changes to his script that bring important performance improvements:
// load previously saved diffJSON() function
db.loadServerScripts();
// get all the document from collection coll1 and coll2
var cursor1 = db.coll1.find().sort({'_id': 1});
var cursor2 = db.coll2.find().sort({'_id': 1});
if (cursor1 != null && cursor1.hasNext() && cursor2 != null && cursor2.hasNext()) {
// iterate over the cursor
while (cursor1.hasNext() && cursor2.hasNext()){
var doc1 = cursor1.next();
var doc2 = cursor2.next();
var pk = doc1._id
// compute the diff
var diff = diffJSON(doc2, doc1);
// if there is a difference between the two objects
if ( Object.keys(diff).length > 0 ) {
diff._id = pk;
// insert the diff in coll3 with the same _id
db.coll3.insert(diff);
}
}
}
Two cursors are used for fetching all the entries in the database sorted by the primary key. This is a very important aspect and brings most of the performance improvement. By retrieving the documents sorted by primary key, we make sure we match them correctly by the primary key. This is based on the fact that the two collections hold the same data.
This way we avoid making a call to coll2 for each document in coll1. It might seem insignificant, but we're talking about 1 million calls, which put a lot of stress on the database.
Another important assumption is that the primary key field is _id. If that's not the case, it is crucial to have a unique index on the primary key field. Otherwise, the script might mismatch documents with the same primary key.
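The two-cursor loop pairs documents by position, so it relies on both collections holding exactly the same set of _id values. A sketch of a merge-style walk that also tolerates ids present on only one side is given below. Plain sorted arrays stand in for the two sorted cursors, pairSortedById is a hypothetical name, and the _id values are assumed to be comparable with < and === (numbers or strings).

```javascript
// Hypothetical helper: walks two arrays sorted by _id in lockstep and pairs
// documents with equal ids; ids found on one side only get null for the other.
function pairSortedById(docs1, docs2) {
  var pairs = [];
  var i = 0;
  var j = 0;
  while (i < docs1.length && j < docs2.length) {
    var id1 = docs1[i]._id;
    var id2 = docs2[j]._id;
    if (id1 === id2) {
      pairs.push({ _id: id1, doc1: docs1[i], doc2: docs2[j] });
      i++;
      j++;
    } else if (id1 < id2) {
      pairs.push({ _id: id1, doc1: docs1[i], doc2: null }); // missing in docs2
      i++;
    } else {
      pairs.push({ _id: id2, doc1: null, doc2: docs2[j] }); // missing in docs1
      j++;
    }
  }
  // Whatever is left over exists on one side only.
  for (; i < docs1.length; i++) pairs.push({ _id: docs1[i]._id, doc1: docs1[i], doc2: null });
  for (; j < docs2.length; j++) pairs.push({ _id: docs2[j]._id, doc1: null, doc2: docs2[j] });
  return pairs;
}
```

Each pair with both documents present can then be passed to the diff function, while pairs with a null side can be logged as missing.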

mongodb query result from multiple collections and save to one

For example:
1. Using find, test each collection:
var objIdMin = ObjectId(Math.floor((new Date('2016/05/01 00:00:00'))/1000).toString(16) + "0000000000000000");
var objIdMax = ObjectId(Math.floor((new Date('2016/05/11 00:00:00'))/1000).toString(16) + "0000000000000000");
db.getCollection('google').find({ _id:{$gt: objIdMin, $lt: objIdMax}, 'result.text':/phone/}).count();
google collection result count is 50.
db.getCollection('apple').find({ _id:{$gt: objIdMin, $lt: objIdMax}, 'result.text':/phone/}).count();
apple collection result count is 100.
2. Then I tried to achieve my goal:
var cols = db.getCollectionNames();
var objIdMin = ObjectId(Math.floor((new Date('2016/05/01 00:00:00'))/1000).toString(16) + "0000000000000000");
var objIdMax = ObjectId(Math.floor((new Date('2016/05/11 00:00:00'))/1000).toString(16) + "0000000000000000");
var cols_in = ['google', 'apple'];
for (var i=0; i<cols_in.length; i++){
db.getCollection(cols_in[i]).aggregate([ { $match: { _id:{$gt: objIdMin, $lt: objIdMax}, 'result.text':/phone/}}, { $out: "target" } ]);
};
The target collection ends up with 100 documents (the same as the apple collection), so each later collection overwrites the earlier one. How can I solve this?
Edit:
I find that is due to:
Replace Existing Collection
If the collection specified by the $out operation already exists, then
upon completion of the aggregation, the $out stage atomically replaces
the existing collection with the new results collection. The $out
operation does not change any indexes that existed on the previous
collection. If the aggregation fails, the $out operation makes no
changes to the pre-existing collection.
So is the only way to iterate over every record and insert it into another collection?
As noted in the comments, MongoDB has no UNION ALL that could merge the output of several queries into one logical result.
So your approach of iterating over the collections with a for loop is a good one, but every pass overwrites the output collection (target in your example).
To solve this, save each aggregation's output as an array inside the for loop and then insert it.
Please see the snippet below:
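for (var i = 0; i < cols_in.length; i++) {
    // collect the matching documents from this collection as an array
    var documents = db.getCollection(cols_in[i]).aggregate([{
        $match : {
            _id : {
                $gt : objIdMin,
                $lt : objIdMax
            },
            'result.text' : /phone/
        }
    }]).toArray();
    // append to target instead of replacing it; skip empty batches
    if (documents.length > 0) {
        db.target.insert(documents);
    }
}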
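On MongoDB 4.2 or later, the $merge stage is another option: unlike $out, it writes into an existing collection without replacing it, so the results of each pass accumulate. A sketch, assuming that server version and reusing the names from the question; buildPipeline is a hypothetical helper.

```javascript
// Hypothetical helper: builds the pipeline from the question, ending in $merge.
function buildPipeline(objIdMin, objIdMax) {
  return [
    { $match: { _id: { $gt: objIdMin, $lt: objIdMax }, 'result.text': /phone/ } },
    // $merge adds documents to "target" instead of replacing the collection.
    { $merge: { into: "target", whenMatched: "replace", whenNotMatched: "insert" } }
  ];
}

// In the mongo shell:
// cols_in.forEach(function (name) {
//   db.getCollection(name).aggregate(buildPipeline(objIdMin, objIdMax));
// });
```

This keeps the copying on the server, so the documents do not have to be pulled into the shell as an array first.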

Mongodb - return an array of _id of all updated documents

I need to update some documents in one collection, and send an array of the _ids of the updated documents to another collection.
Since update() returns the number of updated items not their ids, I've come up with the following to get the array:
var docsUpdated = [];
var cursor = myCollection.find(<myQuery>);
cursor.forEach(function(doc) {
myCollection.update({_id : doc._id}, <myUpdate>, function(error, response){
docsUpdated.push(doc._id);
});
});
Or I could do:
var docsUpdated = myCollection.distinct("_id", <myQuery>);
myCollection.update(<myQuery>, <myUpdate>, {multi : true});
I'm guessing the second version would be faster because it only calls the database twice. But both seem annoyingly inefficient - is there another way of doing this without multiple database calls? Or am I overcomplicating things?
I think you need the collection method aggregate():
db.orders.aggregate([
{ $group: { _id: "$_id"} }
])
Something along those lines returns the _id of every document in the collection.
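A sketch of the second variant from the question, tightened so the update targets exactly the ids that were collected. Otherwise a document that starts matching between the two calls would be updated but not reported; the flip side is that a document that stops matching in between is still updated. It still makes two database calls. updateAndCollectIds is a hypothetical name, and collection is assumed to offer the mongo shell's synchronous distinct() and update().

```javascript
// Hypothetical helper: collects the matching ids first, then updates by id.
function updateAndCollectIds(collection, query, update) {
  var ids = collection.distinct("_id", query);
  if (ids.length > 0) {
    // Target the collected ids rather than re-running the original query.
    collection.update({ _id: { $in: ids } }, update, { multi: true });
  }
  return ids;
}
```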