Projecting nested documents in Mongo - mongodb

I am trying to find and project from a nested structure. For example, I have the following document where each unit might have an embedded sub-unit:
{
"_id" : 1,
"unit" : {
"_id" : 2,
"unit" : {
"_id" : 3,
"unit" : {
"_id" : 4
}
}
}
}
And I want to get the id's of all the subunits under unit 1:
[{_id:2}, {_id:3}, {_id:4}]
$graphlookup does not seem to handle this kind of nested structure. As far as I understand, it works when the units are saved at a single level without nesting and each keep a reference to its parent unit.
What is the correct way to retrieve the desired result?

Firstly, $graphlookup isn't operator for your problem, because it's recursive search on a collection, not recursive in a document
$graphLookup Performs a recursive search on a collection, with options
for restricting the search by recursion depth and query filter.
Therefore, it didn't recursive search in your document, it only recursive search on a collection (includes multiple documents), it cannot handle your problem.
With your problem, I think it is not responsibility of Mongo, because you've retrieved your wanted document. You want to parse the retrieved document to array of sub-documents, you can do it in your language.
Example if you use JavaScript (Node.JS for backend), you can parse this document to array:
const a = {
"_id": 1,
"unit": {
"_id": 2,
"unit": {
"_id": 3,
"unit": {
"_id": 4
}
}
}
}
const parse = o => {
const { _id } = o;
if (!o.unit) return [{ _id }];
return [{ _id }, ...parse(o.unit) ];
}
console.log(parse(a.unit));

You can not do that from mongodb query. Mongodb will guarantee the document with id :1, and will not recursively search inside the document.
What you can do is: retrieve the document from mongodb, then parse it into a Map object and retrieve the information from that map, recursively.

Related

Converting older Mongo database references to DBRefs

I'm in the process of updating some legacy software that is still running on Mongo 2.4. The first step is to upgrade to the latest 2.6 then go from there.
Running the db.upgradeCheckAllDBs(); gives us the DollarPrefixedFieldName: $id is not valid for storage. errors and indeed we have some older records with legacy $id, $ref fields. We have a number of collections that look something like this:
{
"_id" : "1",
"someRef" : {"$id" : "42", "$ref" : "someRef"}
},
{
"_id" : "2",
"someRef" : DBRef("someRef", "42")
},
{
"_id" : "3",
"someRef" : DBRef("someRef", "42")
},
{
"_id" : "4",
"someRef" : {"$id" : "42", "$ref" : "someRef"}
}
I want to script this to convert the older {"$id" : "42", "$ref" : "someRef"} objects to DBRef("someRef", "42") objects but leave the existing DBRef objects untouched. Unfortunately, I haven't been able to differentiate between the two types of objects.
Using typeof and $type simply say they are objects.
Both have $id and $ref fields.
In our groovy console when you pull one of the old ones back and one of the new ones getClass() returns DBRef for both.
We have about 80k records with this legacy format out of millions of total records. I'd hate to have to brute force it and modify every record whether it needs it or not.
This script will do what I need it to do but the find() will basically return all the records in the collection.
var cursor = db.someCollection.find({"someRef.$id" : {$exists: true}});
while(cursor.hasNext()) {
var rec = cursor.next();
db.someCollection.update({"_id": rec._id}, {$set: {"someRef": DBRef(rec.someRef.$ref, rec.someRef.$id)}});
}
Is there another way that I am missing that can be used to find only the offending records?
Update
As described in the accepted answer the order matters which made all the difference. The script we went with that corrected our data:
var cursor = db.someCollection.find(
{
$where: "function() { return this.someRef != null &&
Object.keys(this.someRef)[0] == '$id'; }"
}
);
while(cursor.hasNext()) {
var rec = cursor.next();
db.someCollection.update(
{"_id": rec._id},
{$set: {"someRef": DBRef(rec.someRef.$ref, rec.someRef.$id)}}
);
}
We did have a collection with a larger number of records that needed to be corrected where the connection timed out. We just ran the script again and it got through the remaining records.
There's probably a better way to do this. I would be interested in hearing about a better approach. For now, this problem is solved.
DBRef is a client side thing. http://docs.mongodb.org/manual/reference/database-references/#dbrefs says it pretty clear:
The order of fields in the DBRef matters, and you must use the above sequence when using a DBRef.
The drivers benefit from the fact that order of fields in BSON is consistent to recognise DBRef, so you can do the same:
db.someCollection.find({ $expr: {
$let: {
vars: {firstKey: { $arrayElemAt: [ { $objectToArray: "$someRef" }, 0] } },
in: { $eq: [{ $substr: [ "$$firstKey.k", 1, 2 ] } , "id"]}
}
} } )
will return objects where order of the fields doesn't match driver's expectation.

Counting entries of subdocument in MongoDB documents

I have a document structure like so
{
"_id" : "3:/content/somepath/test.txt",
"_revisions" : {
"r152f47f1daf-0-2" : "c",
"r152f48413c1-0-2" : "c",
"r152f4851bf7-0-1" : "c"
}
}
My task is to find all documents with the following conditions:
The "_id" needs to start with "5:"
The number of revisions need to be exclusively greater then 3
The first part is easy, I have solved it with
db.nodes.find( {'_id': /^5:/} )
But I am struggling with the second part, am supposed to use $gt.
Since I am new to MongoDB, I was first looking at $size, but _revisions is not an array, it is a subdocument, right?.
Was also looking at $unwind and then counting the results, but that does not make sense either, since my result need to be the documents that match the above two conditions.
Any pointers highly appreciated.
Using the $where operator.
db.nodes.find(function() {
return (/^5:/.test(this._id) && Object.keys(this._revisions).length > 3 );
})
The problem with this as mentioned in the documentation is that:
$where evaluates JavaScript and cannot take advantage of indexes. Therefore, query performance improves when you express your query using the standard MongoDB operators (e.g., $gt, $in).
You should definitely consider to change the _revisions field to an array of sub-documents like this:
{
"_id" : "3:/content/somepath/test.txt",
"_revisions" : [
{
"rev": "r152f47f1daf-0-2",
"value": "c"
},
{
"rev": "r152f48413c1-0-2",
"value": "c"
},
{
"rev": "r152f4851bf7-0-1",
"value": "c"
}
]
}
And use the $exists operator.
db.nodes.find({ "_id": /^5:/, "_revisions.3": { "$exists": true } } )

Finding which elements in an array do not exist for a field value in MongoDB

I have a collection users as follows:
{ "_id" : ObjectId("51780f796ec4051a536015cf"), "userId" : "John" }
{ "_id" : ObjectId("51780f796ec4051a536015d0"), "userId" : "Sam" }
{ "_id" : ObjectId("51780f796ec4051a536015d1"), "userId" : "John1" }
{ "_id" : ObjectId("51780f796ec4051a536015d2"), "userId" : "john2" }
Now I am trying to write a code which can provides suggestions of a userId to user in case id provided by user already exists in DB. In same routine I just append values from 1 to 5 to the for example in case user have selected userId to be John, suggested user name array that needs to be checked for Id in database will look like this
[John,John1,John2,John3,John4,John5].
Now I just want to execute it against Db and to find out which of the suggested values do not exist in DB. So instead of selecting any document, I want to select values within suggested array which do not exist for users collection.
Any pointers are highly appreciated.
Your general approach here is you want to find the "distinct" values for the "userId's" that already exist in your collection. You then compare these to your input selection to see the difference.
var test = ["John","John1","John2","John3","John4","John5"];
var res = db.collection.aggregate([
{ "$match": { "userId": { "$in": test } }},
{ "$group": { "_id": "$userId" }}
]).toArray();
res.map(function(x){ return x._id });
test.filter(function(x){ return res.indexOf(x) == -1 })
The end result of this is the userId's that do not match in your initial input:
[ "John2", "John3", "John4", "John5" ]
The main operator there is $in which takes an array as the argument and compares those values against the specified field. The .aggregate() method is the best approach, there is a shorthand mapReduce wrapper in the .distinct() method which directly produces just the values in the array, removing the call to a function like .map() to strip out the values for a given key:
var res = db.collection.distinct("userId",{ "userId": { "$in": test } })
It should run notably slower though, especially on large collections.
Also do not forget to index your "userId" and likely you want this to be "unique" anyway so you really just want the $match or .find() result:
var res = db.collection.find({ "$in": test }).toArray()

Inline/combine other collection into one collection

I want to combine two mongodb collections.
Basically I have a collection containing documents that reference one document from another collection. Now I want to have this as a inline / nested field instead of a separate document.
So just to provide an example:
Collection A:
[{
"_id":"90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
"someValue": "foo"
},
{
"_id":"5F0BB248-E628-4B8F-A2F6-FECD79B78354",
"someValue": "bar"
}]
Collection B:
[{
"_id":"169099A4-5EB9-4D55-8118-53D30B8A2E1A",
"collectionAID":"90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
"some":"foo",
"andOther":"stuff"
},
{
"_id":"83B14A8B-86A8-49FF-8394-0A7F9E709C13",
"collectionAID":"90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
"some":"bar",
"andOther":"random"
}]
This should result in Collection A looking like this:
[{
"_id":"90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
"someValue": "foo",
"collectionB":[{
"some":"foo",
"andOther":"stuff"
},{
"some":"bar",
"andOther":"random"
}]
},
{
"_id":"5F0BB248-E628-4B8F-A2F6-FECD79B78354",
"someValue": "bar"
}]
I'd suggest something simple like this from the console:
db.collB.find().forEach(function(doc) {
var aid = doc.collectionAID;
if (typeof aid === 'undefined') { return; } // nothing
delete doc["_id"]; // remove property
delete doc["collectionAID"]; // remove property
db.collA.update({_id: aid}, /* match the ID from B */
{ $push : { collectionB : doc }});
});
It loops through each document in collectionB and if there is a field collectionAID defined, it removes the unnecessary properties (_id and collectionAID). Finally, it updates a matching document in collectionA by using the $push operator to add the document from B to the field collectionB. If the field doesn't exist, it is automatically created as an array with the newly inserted document. If it does exist as an array, it will be appended. (If it exists, but isn't an array, it will fail). Because the update call isn't using upsert, if the _id in the collectionB document doesn't exist, nothing will happen.
You can extend it to delete other fields as necessary or possibly add more robust error handling if for example a document from B doesn't match anything in A.
Running the code above on your data produces this:
{ "_id" : "5F0BB248-E628-4B8F-A2F6-FECD79B78354", "someValue" : "bar" }
{ "_id" : "90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
"collectionB" : [
{
"some" : "foo",
"andOther" : "stuff"
},
{
"some" : "bar",
"andOther" : "random"
}
],
"someValue" : "foo"
}
Sadly mapreduce can't produce full documents.
https://jira.mongodb.org/browse/SERVER-2517
No idea why despite all the attention, whining and upvotes they haven't changed it. So you'll have to do this manually in the language of your choice.
Hopefully you've indexed 'collectionAID' which should improve the speed of your queries. Just write something that goes through your A collection one document at a time, loading the _id and then adding the array from Collection B.
There is a much faster way than https://stackoverflow.com/a/22676205/1578508
You can do it the other way round and run through the collection you want to insert your documents in. (Far less executions!)
db.collA.find().forEach(function (x) {
var collBs = db.collB.find({"collectionAID":x._id},{"_id":0,"collectionA":0});
x.collectionB = collBs.toArray();
db.collA.save(x);
})

Nested documents and _id indexes in mongodb

I have a collection with nested documents in it. Each document also has an _id field.
Here's an example of a documents structure
{
"_id": ObjectId("top_level_doc"),
"title": "Cadernos",
"parent": "4fd55bbc5d1709793b000008",
"criterias": {
"0": {
"_id": ObjectId("a_nested_doc"),
"value": "caderno",
"operator": "contains",
"field": "design0"
}
}
}
I want to be able to find the nested document just by searching it's _id
With this query
{
"criterias._id" : ObjectId("a_nested_doc")
}
It returns the parent document (i just want the one that's nested).
Ideally I would do this
{
"_id" : ObjectId("a_nested_doc")
}
And it would return the document with that id (either its nested or not).
Ps. I edited the "_id" values for the sake of simplicity just for this example.
You may have to live with selecting criterias._id (without writing a wrapper around the query, at least), but you can select the document itself by simply retrieving a subset of the fields.
http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields
// The simplest case converted to your use case
db.collection.find( { criterias._id : ObjectId("a_nested_doc") }, { criterias : 1 } );