runCommand vs aggregate method to do aggregation - mongodb

To run an aggregation query, you can use either of these:
db.collectionName.aggregate(query1);
OR
db.runCommand(query2)
But I noticed something bizarre this morning. While this:
db.runCommand(
{
"aggregate":"collectionName",
allowDiskUse: true,
"pipeline":[
{
"$match":{
"field":param
}
}
]
});
fails with error:
{
"ok" : 0.0,
"errmsg" : "aggregation result exceeds maximum document size (16MB)",
"code" : 16389,
"codeName" : "Location16389"
}
This:
db.collectionName.aggregate([
{
$match: {
field: param
}
}
])
is working (gives the expected aggregation result).
How is this possible?

The difference is that the .aggregate() method returns a "cursor", whereas the options you are passing to runCommand() do not request one. That is the legacy form, which returns the response as a single BSON document with all its limitations, including the 16MB document size limit shown in your error. Cursors do not have that limitation, because the results come back in batches.
Of course, you can use the runCommand() method to "make your own cursor" in the shell, since after all that is exactly what the .aggregate() method is doing "under the covers". The same goes for all drivers, which essentially invoke the database command for everything.
With the shell, you can transform your request like this:
var cmdRes = db.runReadCommand({
"aggregate": "collectionName",
"allowDiskUse": true,
"pipeline":[
{
"$match":{
"field":param
}
}
],
"cursor": { "batchSize": 25 }
});
var cursor = new DBCommandCursor(db, cmdRes);
cursor.next(); // will actually iterate the cursor
If you really want to dig into it, type db.collectionName.aggregate without the parentheses () so that the shell prints the function definition. This shows some other function calls, and you can dig further into them until you eventually see what is effectively the lines shown above, amongst a lot of other stuff.
But the way you ran it, it's a "single BSON Document" response. Run it the way shown here, and you get the same "cursor" response.
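To make the difference concrete, here is a small simulation in plain JavaScript with no MongoDB involved. The fakeServer object, its data, and the drain helper are hypothetical; only the reply field names (cursor.firstBatch, cursor.nextBatch, cursor.id) mirror what the real aggregate and getMore commands return. It is a sketch of roughly what a cursor wrapper does: keep asking for the next batch until the server reports cursor id 0.

```javascript
// Hypothetical in-memory stand-in for the server; NOT a real MongoDB API.
const fakeServer = {
  docs: Array.from({ length: 7 }, (_, i) => ({ _id: i })),
  position: 0,
  // Hand out up to batchSize documents and report whether more remain.
  nextChunk(batchSize) {
    const batch = this.docs.slice(this.position, this.position + batchSize);
    this.position += batch.length;
    const id = this.position < this.docs.length ? 42 : 0; // 0 means exhausted
    return { batch, id };
  },
  runCommand(cmd) {
    if (cmd.aggregate) {
      const { batch, id } = this.nextChunk(cmd.cursor.batchSize);
      return { cursor: { firstBatch: batch, id, ns: "test." + cmd.aggregate }, ok: 1 };
    }
    if (cmd.getMore) {
      const { batch, id } = this.nextChunk(cmd.batchSize);
      return { cursor: { nextBatch: batch, id, ns: "test." + cmd.collection }, ok: 1 };
    }
    return { ok: 0 };
  },
};

// Drain a command reply the way a cursor wrapper would: batch by batch.
function drain(server, collection, batchSize) {
  let reply = server.runCommand({ aggregate: collection, pipeline: [], cursor: { batchSize } });
  const all = [...reply.cursor.firstBatch];
  let roundTrips = 1;
  while (reply.cursor.id !== 0) {
    reply = server.runCommand({ getMore: reply.cursor.id, collection, batchSize });
    all.push(...reply.cursor.nextBatch);
    roundTrips += 1;
  }
  return { all, roundTrips };
}

const drained = drain(fakeServer, "collectionName", 3);
console.log(drained.all.length, drained.roundTrips); // 7 3
```

Each reply carries at most batchSize documents, so no single reply has to hold the whole result. That is why the cursor form is not bound by the single-document size limit.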

Related

MongoDB- Query Embedded Document, but Projection Showing Path to key/value?

So basically I want
db.scoreFacts.find(
{"instrumentRanges.flute.minPitch": {$gte: 0, $lte:56}},
{"instrumentRanges.flute.minPitch": 1})
to return
{ "_id" : "Bach_Brandenburg5_Mov1.xml", "minPitch" : 50 }
but instead I get:
{ "_id" : "Bach_Brandenburg5_Mov1.xml", "instrumentRanges" : { "flute" : { "minPitch" : 50 } } }
Essentially the path to "minPitch" is returned, which is not what I need. How can I achieve my desired output with only .find() (no map, etc)? Thanks.
You can't do this with a standard .find() query. If you wish to alter the document structure, look into using an aggregate() call. You can then use projection to define the resulting field(s) you desire.
For example:
db.scoreFacts.aggregate([
{ $match: {"instrumentRanges.flute.minPitch": {$gte: 0, $lte:56}} },
{ $project: {"minPitch": "$instrumentRanges.flute.minPitch"} }
]);
For more information, please see the relevant documentation. Additionally, take a look at the prerequisite aggregation pipeline section.
Note: I have not tested the above query myself, so you may need to alter it somewhat to get the behavior you want.
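As a rough illustration of what that $project stage does to each document, here is a plain-JavaScript sketch that needs no database. The getPath and projectMinPitch helpers are hypothetical names made up for this example; they only mimic how a "$a.b.c" field path is read and lifted to a top-level field.

```javascript
// Resolve a dotted path such as "a.b.c" against a plain object.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Mimic { $project: { minPitch: "$instrumentRanges.flute.minPitch" } }:
// keep _id and add one top-level field read from the nested path.
function projectMinPitch(doc) {
  return { _id: doc._id, minPitch: getPath(doc, "instrumentRanges.flute.minPitch") };
}

const stored = {
  _id: "Bach_Brandenburg5_Mov1.xml",
  instrumentRanges: { flute: { minPitch: 50 } },
};
const flattened = projectMinPitch(stored);
console.log(JSON.stringify(flattened));
// {"_id":"Bach_Brandenburg5_Mov1.xml","minPitch":50}
```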

Aggregate not behaving in Meteor as in Mongo

This is the query that I'm trying to run. If I run it on the Mongo console I get
meteor:PRIMARY> db.keywords_o2m.aggregate({$match:{keyword:{$in:['sql']}}},{$unwind:'$synonym'},{$group:{_id:0,kw:{$addToSet:'$synonym'}}});
{ "_id" : 0, "kw" : [ "database" ] }
However, if I copy and paste it and try to run it in Meteor by calling Meteor.call('getAllKeywordSynonyms',kw,function(err,data){...}); with this code
if(Meteor.isServer){
Meteor.methods({
'getAllKeywordSynonyms':function(keyword){
console.log("keywordO2M aggregate");
console.log(keywordO2M.aggregate({$match:{keyword:{$in:['sql']}}},{$unwind:'$synonym'},{$group:{_id:0,kw:{$addToSet:'$synonym'}}}));
}
});
}
I get
I20151220-12:49:38.197(-8)? keywordO2M aggregate
I20151220-12:49:38.197(-8)? [ { _id: 5676fe5a17aeddb799dc4ef8,
I20151220-12:49:38.197(-8)? keyword: 'sql',
I20151220-12:49:38.197(-8)? synonym: 'database' } ]
It looks like it ran the $match and ignored the $unwind and $group.
I've tried using meteorhacks:aggregate and monbro:mongod-mapreduce-aggregation, but no difference.
What am I doing wrong?
Meteor is not as forgiving with notation.
With the aggregation function, the pipeline stages need to all be passed as a single parameter in an array, so the correct syntax would be
console.log(keywordO2M.aggregate([{$match:{keyword:{$in:['sql']}}},{$unwind:'$synonym'},{$group:{_id:0,kw:{$addToSet:'$synonym'}}}]));
Note the square brackets so that only one parameter gets passed to aggregate.
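Here is a minimal sketch of why only the $match appeared to run. It assumes a wrapper whose signature is aggregate(pipeline, options); the stagesSeen function below is hypothetical and is not Meteor's or any package's actual code. With such a signature, stages passed as separate arguments land in the options slot or are dropped entirely.

```javascript
// Hypothetical wrapper with the signature (pipeline, options).
// This is an assumption about how such a wrapper might behave.
function stagesSeen(pipeline, options) {
  // A lone stage object is tolerated by wrapping it in an array.
  const stages = Array.isArray(pipeline) ? pipeline : [pipeline];
  // Report the operator name of every stage that would be run.
  return stages.map((stage) => Object.keys(stage)[0]);
}

const match = { $match: { keyword: { $in: ["sql"] } } };
const unwind = { $unwind: "$synonym" };
const group = { $group: { _id: 0, kw: { $addToSet: "$synonym" } } };

// Separate arguments: only the first one is treated as the pipeline.
const separate = stagesSeen(match, unwind, group);
// One array argument: every stage is seen.
const asArray = stagesSeen([match, unwind, group]);
console.log(separate, asArray);
```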

Mongodb aggregation in mongo command prompt

I have the following code based upon this question
How to efficiently perform "distinct" with multiple keys?:
collection = db.products;
result = collection.aggregate(
[
{"$group": { "_id": { "P1 Connection": "$p1c", "P1 Size": "$p1s" } } },
{"$match" : {"parentGUID":ObjectId("5509b246c519ce4b900138a3")}}
]
)
printjson(result);
The printjson statement only prints a bunch of code, and not an object. I also tried result() but that got the following error:
> result()
2015-10-29T10:31:14.892-0400 TypeError: Property 'result' of object #<Object> is not a function
How do I get the results of this aggregation? It looks like it may be possible to do this if I put my code in a file and run that, but I am having a hard time believing that there is no quick and dirty way to run this query in the mongodb command prompt.
Move the $match stage to the very beginning of the pipeline. After $group, each document contains only the _id defined by the grouping, so a later $match on parentGUID has nothing left to match. Placing $match first filters the documents before they reach the $group stage, which then runs on the correct documents. Also, since MongoDB 2.6 the aggregate() method returns a cursor, so you need to iterate over the cursor using the forEach() method to access the documents, as in the following example:
var pipeline = [
{"$match" : {"parentGUID":ObjectId("5509b246c519ce4b900138a3")}},
{"$group": { "_id": { "P1 Connection": "$p1c", "P1 Size": "$p1s" } } }
];
var results = db.products.aggregate( pipeline );
results.forEach(printjson);
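The effect of the stage order can be sketched in plain JavaScript without a database. The sample documents and the matchGuid and groupDistinct helpers are made up for illustration; they only approximate the $match and $group stages above.

```javascript
// Sample documents; the values are invented for this example.
const products = [
  { parentGUID: "A", p1c: "NPT", p1s: "1/2" },
  { parentGUID: "A", p1c: "NPT", p1s: "1/2" },
  { parentGUID: "A", p1c: "BSP", p1s: "3/4" },
  { parentGUID: "B", p1c: "NPT", p1s: "1" },
];

// Rough equivalent of {$match: {parentGUID: guid}}.
const matchGuid = (docs, guid) => docs.filter((d) => d.parentGUID === guid);

// Rough equivalent of the $group stage: one output document per distinct
// (p1c, p1s) pair, carrying only the _id field.
function groupDistinct(docs) {
  const seen = new Map();
  for (const d of docs) {
    const id = { "P1 Connection": d.p1c, "P1 Size": d.p1s };
    seen.set(JSON.stringify(id), { _id: id });
  }
  return [...seen.values()];
}

// Grouping first discards parentGUID, so the later match finds nothing.
const groupThenMatch = matchGuid(groupDistinct(products), "A");
// Matching first keeps parentGUID available for the filter.
const matchThenGroup = groupDistinct(matchGuid(products, "A"));
console.log(groupThenMatch.length, matchThenGroup.length); // 0 2
```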

How can I get all the doc ids in MongoDB?

How can I get an array of all the doc ids in MongoDB? I only need a set of ids but not the doc contents.
You can do this in the Mongo shell by calling map on the cursor like this:
var a = db.c.find({}, {_id:1}).map(function(item){ return item._id; })
The result is that a is an array of just the _id values.
The way it works in Node is similar.
(This is MongoDB Node driver v2.2, and Node v6.7.0)
db.collection('...')
.find(...)
.project( {_id: 1} )
.map(x => x._id)
.toArray();
Remember to put map before toArray: this map is NOT JavaScript's Array.prototype.map but the cursor's own map method provided by the MongoDB driver, which transforms each document as it is read from the cursor.
One way is to simply use the runCommand API.
db.runCommand ( { distinct: "distinct", key: "_id" } )
which gives you something like this:
{
"values" : [
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
],
"stats" : {
"n" : 7,
"nscanned" : 7,
"nscannedObjects" : 0,
"timems" : 2,
"cursor" : "DistinctCursor"
},
"ok" : 1
}
However, there's an even nicer way using the actual distinct API:
var ids = db.distinct.distinct('_id', {}, {});
which just gives you an array of ids:
[
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
]
Not sure about the first version, but the latter is definitely supported in the Node.js driver (which I saw you mention you wanted to use). That would look something like this:
db.collection('c').distinct('_id', {}, {}, function (err, result) {
// result is your array of ids
})
I also was wondering how to do this with the MongoDB Node.JS driver, like #user2793120. Someone else said he should iterate through the results with .each which seemed highly inefficient to me. I used MongoDB's aggregation instead:
myCollection.aggregate([
{$match: {ANY SEARCHING CRITERIA FOLLOWING $match'S RULES} },
{$sort: {ANY SORTING CRITERIA, FOLLOWING $sort'S RULES}},
{$group: {_id:null, ids: {$addToSet: "$_id"}}}
]).exec()
The sorting stage is optional, and so is the match stage if you want all of the collection's _ids. If you console.log the result, you'd see something like:
[ { _id: null, ids: [ '56e05a832f3caaf218b57a90', '56e05a832f3caaf218b57a91', '56e05a832f3caaf218b57a92' ] } ]
Then just use the contents of result[0].ids somewhere else.
The key part here is the $group section. You must define a value of null for _id (otherwise, the aggregation will crash), and create a new array field with all the _ids. If you don't mind having duplicated ids (according to your search criteria used in the $match phase, and assuming you are grouping a field other than _id which also has another document _id), you can use $push instead of $addToSet.
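The difference between the two accumulators can be pictured with plain JavaScript arrays. This is only an analogy with made-up values, not how the server implements them.

```javascript
// Made-up input values standing in for the field being accumulated.
const values = ["a", "b", "a", "c", "b"];

// $push-like: keep every value, duplicates included.
const pushed = values.slice();

// $addToSet-like: keep each distinct value once.
// Note: MongoDB does not guarantee element order for $addToSet.
const addedToSet = [...new Set(values)];

console.log(pushed.length, addedToSet.length); // 5 3
```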
Another way to do this on mongo console could be:
var arr=[]
db.c.find({},{_id:1}).forEach(function(doc){arr.push(doc._id)})
printjson(arr)
Hope that helps!!!
Thanks!!!
I struggled with this for a long time, and I'm answering this because I've got an important hint. It seemed obvious that:
db.c.find({},{_id:1});
would be the answer.
It worked, sort of. It would find the first 101 documents and then the application would pause. I didn't let it keep going. This was both in Java using MongoOperations and also on the Mongo command line.
I looked at the mongo logs and saw it was doing a collection scan (COLLSCAN) on a big collection of big documents. I thought, crazy, I'm projecting the _id, which is always indexed, so why would it attempt a collection scan?
I have no idea why it would do that, but the solution is simple:
db.c.find({},{_id:1}).hint({_id:1});
or in Java:
query.withHint("{_id:1}");
Then it was able to proceed along as normal, using stream style:
createStreamFromIterator(mongoOperations.stream(query, MortgageDocument.class)).
map(MortgageDocument::getId).forEach(transformer);
Mongo can do some good things and it can also get stuck in really confusing ways. At least that's my experience so far.
Try with an aggregation pipeline, like this:
db.collection.aggregate([
{ $match: { deletedAt: null }},
{ $group: { _id: "$_id"}}
])
This returns an array of documents with this structure:
_id: ObjectId("5fc98977fda32e3458c97edd")
I had a similar requirement to get the ids for a collection with 50+ million rows. I tried many ways. The fastest way to get the ids turned out to be running mongoexport with just the ids.
One of the above examples worked for me, with a minor tweak. I left out the second object, as I tried using with my Mongoose schema.
// With await, do not also pass a callback: the awaited result is the array of ids.
const idArray = await Model.distinct('_id', {});

MongoDB MapReduce : use positional operator $ in map function

I have a collection with entries that look like this:
{"userid": 1, "contents": [ { "tag": "whatever", "value": 100 }, {"tag": "whatever2", "value": 110 } ] }
I'm performing a MapReduce on this collection with queries such as {"contents.tag": "whatever"}.
What I'd like to do in my map function is emit the field "value" of the entry in the array "contents" that matched the query, without having to iterate through the whole array. Under normal circumstances, I could do that using the $ positional operator with something like contents.$.value. But in the MapReduce case, it's not working.
To summarize, here is the code I have right now:
map=function(){
emit(this.userid, WHAT DO I WRITE HERE TO EMIT THE VALUE I WANT ?);
}
reduce=function(key,values){
return values[0]; //this reduce function does not make sense, just for the example
}
res=db.runCommand(
{
"mapreduce": "collection",
"query": {'contents.tag':'whatever'},
"map": map,
"reduce": reduce,
"out": "test_mr"
}
);
Any idea ?
Thanks !
This will not work without iterating over the whole array. In MongoDB a query is intended to match an entire document.
When dealing with Map / Reduce, the query is simply trimming the number of documents that are passed into the map function. However, the map function has no knowledge of the query that was run. The two are disconnected.
The source code around the M/R is here.
There is an upcoming aggregation feature that will more closely match this desire. But there's no timeline on this feature.
No way. I've had the same problem. Iterating over the array is necessary.
You could do this:
map=function() {
for(var i in this.contents) {
if(this.contents[i].tag == "whatever") {
emit(this.userid, this.contents[i].value);
}
}
}
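To try this logic outside the database, here is a small plain-JavaScript harness. The emit function and the emitted array are stand-ins written for this example, not MongoDB's own; the document is the sample from the question. The map function is called with the document bound to this, as map-reduce does.

```javascript
// Collect emitted pairs; a stand-in for MongoDB's emit(), for illustration only.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Map function: `this` is the current document, as in MongoDB map-reduce.
function map() {
  for (const entry of this.contents) {
    if (entry.tag === "whatever") {
      emit(this.userid, entry.value);
    }
  }
}

const doc = {
  userid: 1,
  contents: [
    { tag: "whatever", value: 100 },
    { tag: "whatever2", value: 110 },
  ],
};
map.call(doc);
console.log(JSON.stringify(emitted)); // [{"key":1,"value":100}]
```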