I would like to count the number of documents returned by an aggregation.
I'm sure my initial aggregation works, because I use it later in my program. For this I created a pipeline variable (called pipelineTest here; ask me if you want to see it in detail, but it's quite long, which is why I don't include it).
To count the number of documents returned, I push this stage onto my pipeline:
{$count: "totalCount"}
Now I would like to get (or log) the totalCount value. What should I do?
Here is the aggregation:
pipelineTest.push({$count: "totalCount"});
cursorTest = collection.aggregate(pipelineTest, options)
console.log(cursorTest.?)
Thanks for your help. I have read a lot of documentation about aggregation and I still don't understand how to read the result of an aggregation...
Assuming you're using async/await syntax, you need to await the result of the aggregation.
You can convert the cursor to an array, take the first element of that array, and read its totalCount field.
pipelineTest.push({$count: "totalCount"});
cursorTest = await collection.aggregate(pipelineTest, options).toArray();
console.log(cursorTest[0].totalCount);
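One caveat with the snippet above: when the earlier stages produce no documents, $count emits no document at all, so cursorTest[0] is undefined. Here is a minimal sketch of a helper that guards against this, assuming a Node.js driver collection whose aggregate() returns a cursor with toArray(). The name countPipelineResults is hypothetical and not part of the driver.

```javascript
// Hypothetical helper (not part of the driver): returns how many documents
// the given pipeline would produce, or 0 when it produces none.
async function countPipelineResults(collection, pipeline, options) {
  // Copy the pipeline so the caller's array is not mutated by the extra stage.
  const countingPipeline = [...pipeline, { $count: "totalCount" }];
  const rows = await collection.aggregate(countingPipeline, options).toArray();
  // $count emits no document for empty input, so rows may be an empty array.
  return rows.length > 0 ? rows[0].totalCount : 0;
}
```

It could be called as countPipelineResults(collection, pipelineTest, options) from inside an async function, which also leaves pipelineTest unchanged for later use.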
Aggregation
db.mycollection.aggregate([
{
$count: "totalCount"
}
])
Result
[ { totalCount: 3 } ]
Your Particulars
Try the following:
pipelineTest.push({$count: "totalCount"});
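// aggregate() returns a cursor, so read its documents before accessing totalCount
cursorTest = await collection.aggregate(pipelineTest, options).toArray();
console.log(cursorTest[0].totalCount);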
Related
Can somebody please tell me what the problem is with this simple aggregation command:
db.test.aggregate([
{
$group: {
_id: "$type",
numbers: { $sum: 1 }
}
}
]).pretty()
The collection has about 2 million documents and every one has a type field. But the shell shows only a few of the results, followed by the message "Type "it" for more". If I type "it", it returns the next part of the aggregation result, and so on until the end. But I want the whole aggregation in one result. What am I doing wrong?
Thanks.
The mongo shell won't print the whole result set at once, because it iterates the cursor in batches (20 documents at a time by default).
Otherwise, a very large result could exhaust the memory of your server or computer.
If you do want all of the data at once, it's better to collect it with a script.
You can write a script in your programming language that queries the database, iterates through the results, and stores them in a variable.
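The script suggested above can be as small as a loop over the cursor. Here is a minimal sketch for Node.js, assuming a recent driver whose cursors are async iterables. The function name drainCursor and the maxDocs safety limit are hypothetical, introduced only for illustration.

```javascript
// Hypothetical helper: collects every document of a cursor into an array.
// maxDocs is an assumed safety limit so a huge result cannot exhaust memory.
async function drainCursor(cursor, maxDocs = Infinity) {
  const docs = [];
  // The driver fetches documents from the server batch by batch as we iterate.
  for await (const doc of cursor) {
    if (docs.length >= maxDocs) {
      throw new Error(`Result exceeds ${maxDocs} documents`);
    }
    docs.push(doc);
  }
  return docs;
}
```

In the mongo shell itself, appending .toArray() to the aggregate() call returns every result at once instead of prompting for "it".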
Is there a way to push all the documents of a given collection into an array?
I did this, but is there a quicker way?
var ops = [];
db.getCollection('stock').find({}).forEach(function (stock) {
ops.push(stock);
})
PS: I use Mongo 3.4
You can just use the toArray function on the cursor that's returned from find, like this:
var ops = db.getCollection('stock').find({}).toArray();
Note: As with your original solution, this might suffer with performance if the stock collection contains millions of documents.
As an aside, you can use db.stock directly to shorten the query a little bit:
var ops = db.stock.find({}).toArray();
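If you are using Mongoose rather than the mongo shell, try the lean query option, which returns plain JavaScript objects instead of full Mongoose documents. In your case (Stock stands for your Mongoose model):
Stock.find({}).lean()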
You could also use $facet, which lets you create the array on the server side, provided the resulting document is no bigger than 16MB; beyond that limit you'll get an exception:
db.stock.aggregate([
  {
    $facet: {
      ops: [ { $match: {} } ]
    }
  }
])
In order to reduce the amount of data returned you could limit the number of returned fields in the above pipeline (instead of an empty $match stage - which is a hack anyway - you would then use $project).
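As a sketch of that suggestion, the sub-pipeline below swaps the empty $match for a $project. The field names name and qty are hypothetical (the question does not show the schema), so substitute the fields you actually need.

```javascript
// "name" and "qty" are hypothetical field names, used only for illustration.
const facetPipeline = [
  {
    $facet: {
      // $project replaces the empty $match and trims each document.
      ops: [{ $project: { _id: 0, name: 1, qty: 1 } }],
    },
  },
];

// In the mongo shell, the single result document holds the array:
// var ops = db.stock.aggregate(facetPipeline).toArray()[0].ops;
```

The 16MB limit still applies to the single output document, so projecting fewer fields only raises the number of documents that fit.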
Can anyone tell me how to add a $match stage to an aggregation pipeline to filter for where a field MATCHES a query, (and may have other data in it too), rather than limiting results to entries where the field EQUALS the query?
The query specification...
var query = {hello:"world"};
...can be used to retrieve the following documents using the find() operation of MongoDB's native node driver, where the query 'map' is interpreted as a match...
{hello:"world"}
{hello:"world", extra:"data"}
...like...
collection.find(query);
The same query map can also be interpreted as a match when used with $elemMatch to retrieve documents with matching entries contained in arrays like these documents...
{
greetings:[
{hello:"world"},
]
}
{
greetings:[
{hello:"world", extra:"data"},
]
}
{
greetings:[
{hello:"world"},
{aloha:"mars"},
]
}
...using an invocation like [PIPELINE1] ...
collection.aggregate([
{$match:{greetings:{$elemMatch:query}}},
]).toArray()
However, trying to get a list of the matching greetings with unwind [PIPELINE2] ...
collection.aggregate([
{$match:{greetings:{$elemMatch:query}}},
{$unwind:"$greetings"},
]).toArray()
...produces all the array entries inside the documents with any matching entries, including the entries which don't match (simplified result)...
[
{greetings:{hello:"world"}},
{greetings:{hello:"world", extra:"data"}},
{greetings:{hello:"world"}},
{greetings:{aloha:"mars"}},
]
I have been trying to add a second match stage, but I was surprised to find that it limited results only to those where the greetings field EQUALS the query, rather than where it MATCHES the query [PIPELINE3].
collection.aggregate([
{$match:{greetings:{$elemMatch:query}}},
{$unwind:"$greetings"},
{$match:{greetings:query}},
]).toArray()
Unfortunately PIPELINE3 produces only the following entries, excluding the matching hello world entry with the extra:"data", since that entry is not strictly 'equal' to the query (simplified result)...
[
{greetings:{hello:"world"}},
{greetings:{hello:"world"}},
]
...where what I need as the result is rather...
[
{greetings:{hello:"world"}},
{greetings:{hello:"world"}},
{greetings:{"hello":"world","extra":"data"}}
]
How can I add a second $match stage to PIPELINE2, to filter for where the greetings field MATCHES the query, (and may have other data in it too), rather than limiting results to entries where the greetings field EQUALS the query?
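What you're seeing in the results is correct: a $match of the form {greetings: query} compares the whole embedded document for exact equality, so entries with extra fields are excluded. To match on a field inside the embedded document, use dot notation instead: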
collection.aggregate([
{$match:{greetings:{$elemMatch:query}}},
{$unwind:"$greetings"},
{$match:{"greetings.hello":"world"}},
]).toArray()
With this, you should get the following output:
[
{greetings:{hello:"world"}},
{greetings:{hello:"world"}},
{greetings:{"hello":"world","extra":"data"}}
]
Whenever you build an aggregation pipeline in MongoDB and want it to yield the documents you expect, build it incrementally: start with only the first stage, then add one stage at a time and inspect the output after each addition.
The output of your $unwind stage would be:
[{
greetings:{hello:"world"}
},
{
greetings:{hello:"world", extra:"data"}
},
{
greetings:{hello:"world"}
},
{
greetings:{aloha:"mars"}
}]
Now if we include the third stage that you used, it matches only documents whose greetings value is exactly {hello:"world"}, and only two documents in the pipeline have that exact value. So you would only get:
{ "greetings" : { "hello" : "world" } }
{ "greetings" : { "hello" : "world" } }
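To apply the dotted-path $match from the answer above to an arbitrary query map instead of hard-coding "greetings.hello", the keys of the map can be prefixed programmatically. This is a minimal sketch that assumes the query is flat (field names mapped to values or operator expressions, with no top-level operators such as $or). The helper name prefixQuery is hypothetical.

```javascript
// Hypothetical helper: turns {hello: "world"} into {"greetings.hello": "world"}.
function prefixQuery(prefix, query) {
  const match = {};
  for (const [field, condition] of Object.entries(query)) {
    // Top-level operators ($or, $and, ...) would need recursive handling.
    if (field.startsWith("$")) {
      throw new Error(`Top-level operator ${field} is not supported`);
    }
    match[`${prefix}.${field}`] = condition;
  }
  return match;
}

const query = { hello: "world" };
const pipeline = [
  { $match: { greetings: { $elemMatch: query } } },
  { $unwind: "$greetings" },
  // Equivalent to {$match: {"greetings.hello": "world"}} for this query.
  { $match: prefixQuery("greetings", query) },
];
```

The resulting pipeline can be passed to collection.aggregate(pipeline).toArray() exactly as in the question.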
Let's say I have four documents in my collection:
{u'a': {u'time': 3}}
{u'a': {u'time': 5}}
{u'b': {u'time': 4}}
{u'b': {u'time': 2}}
Is it possible to sort them by the field 'time' which is common in both 'a' and 'b' documents?
Thank you
No, you should put your data into a common format so you can sort it on a common field. It can still be nested if you want but it would need to have the same path.
You can use aggregation; the following code has been tested.
db.test.aggregate([
  {
    $project: {
      time: {
        "$cond": [{ "$gt": ["$a.time", null] }, "$a.time", "$b.time"]
      }
    }
  },
  {
    $sort: { time: -1 }
  }
]);
Or if you also want the original fields returned back: gist
Alternatively, you can sort on the client once you get the results back, using a custom compare function (not tested, for illustration only). db.eval has been deprecated since MongoDB 3.0 and was removed in 4.2, so this version runs the sort directly in the shell:
db.mycollection.find().toArray().sort(function(doc1, doc2) {
  var time1 = doc1.a ? doc1.a.time : doc1.b.time,
      time2 = doc2.a ? doc2.a.time : doc2.b.time;
  return time1 - time2;
});
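The compare function can be checked without a database. Here is a minimal sketch in plain JavaScript, using the four documents from the question as in-memory objects. The names timeOf and sortByTime are hypothetical.

```javascript
// Returns the time stored under either the "a" or the "b" sub-document.
function timeOf(doc) {
  return doc.a ? doc.a.time : doc.b.time;
}

// Sorts ascending by time; copies first because Array.prototype.sort sorts in place.
function sortByTime(docs) {
  return [...docs].sort((doc1, doc2) => timeOf(doc1) - timeOf(doc2));
}

const docs = [
  { a: { time: 3 } },
  { a: { time: 5 } },
  { b: { time: 4 } },
  { b: { time: 2 } },
];
console.log(sortByTime(docs).map(timeOf)); // [ 2, 3, 4, 5 ]
```

Swapping the operands of the subtraction gives descending order, matching the time: -1 sort used in the aggregation version.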
You can, using the aggregation framework.
The trick here is to $project a common field to all the documents so that the $sort stage can use the value in that field to sort the documents.
The $ifNull operator checks whether a.time exists: if it does, the document is sorted by that value; otherwise it is sorted by b.time.
code:
db.t.aggregate([
{$project:{"a":1,"b":1,
"sortBy":{$ifNull:["$a.time","$b.time"]}}},
{$sort:{"sortBy":-1}},
{$project:{"a":1,"b":1}}
])
Consequences of this approach:
The sort in the aggregation pipeline cannot use any index you create, because it operates on a computed field.
Performance will be very poor for very large data sets.
What you could ideally do is to ask the source system that is sending you the data to standardize its format, something like:
{"a":1,"time":5}
{"b":1,"time":4}
That way your query can make use of the index if you create one on the time field.
db.t.createIndex({"time":-1}); // ensureIndex is the deprecated older name for createIndex
code:
db.t.find({}).sort({"time":-1});
How can I get an array of all the doc ids in MongoDB? I only need a set of ids but not the doc contents.
You can do this in the Mongo shell by calling map on the cursor like this:
var a = db.c.find({}, {_id:1}).map(function(item){ return item._id; })
The result is that a is an array of just the _id values.
The way it works in Node is similar.
(This is MongoDB Node driver v2.2, and Node v6.7.0)
db.collection('...')
.find(...)
.project( {_id: 1} )
.map(x => x._id)
.toArray();
Remember to put map before toArray: this map is NOT the JavaScript Array map function but the cursor method provided by the MongoDB driver, which applies the function to each document as the cursor is iterated.
One way is to simply use the runCommand API (in this example the collection itself is named distinct).
db.runCommand ( { distinct: "distinct", key: "_id" } )
which gives you something like this:
{
"values" : [
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
],
"stats" : {
"n" : 7,
"nscanned" : 7,
"nscannedObjects" : 0,
"timems" : 2,
"cursor" : "DistinctCursor"
},
"ok" : 1
}
However, there's an even nicer way using the actual distinct API:
var ids = db.distinct.distinct('_id', {}, {});
which just gives you an array of ids:
[
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
]
Not sure about the first version, but the latter is definitely supported in the Node.js driver (which I saw you mention you wanted to use). That would look something like this:
db.collection('c').distinct('_id', {}, {}, function (err, result) {
// result is your array of ids
})
I was also wondering how to do this with the MongoDB Node.js driver, like @user2793120. Someone else suggested iterating through the results with .each, which seemed highly inefficient to me. I used MongoDB's aggregation instead:
myCollection.aggregate([
{$match: {ANY SEARCHING CRITERIA FOLLOWING $match'S RULES} },
{$sort: {ANY SORTING CRITERIA, FOLLOWING $sort'S RULES}},
{$group: {_id:null, ids: {$addToSet: "$_id"}}}
]).exec()
The $sort stage is optional, and so is the $match stage if you want the _ids of the whole collection. If you console.log the result, you'd see something like:
[ { _id: null, ids: [ '56e05a832f3caaf218b57a90', '56e05a832f3caaf218b57a91', '56e05a832f3caaf218b57a92' ] } ]
Then just use the contents of result[0].ids somewhere else.
The key part here is the $group stage. It requires an _id expression; setting it to null places every document in a single group, and $addToSet collects all the _ids into a new array field. Note that $addToSet does not guarantee the order of the array, so the $sort stage only has an effect with $push. If you are collecting a field other than _id and don't mind duplicated values, you can use $push instead of $addToSet.
Another way to do this on mongo console could be:
var arr=[]
db.c.find({},{_id:1}).forEach(function(doc){arr.push(doc._id)})
printjson(arr)
Hope that helps!!!
Thanks!!!
I struggled with this for a long time, and I'm answering this because I've got an important hint. It seemed obvious that:
db.c.find({},{_id:1});
would be the answer.
It worked, sort of. It would find the first 101 documents and then the application would pause. I didn't let it keep going. This was both in Java using MongoOperations and also on the Mongo command line.
I looked at the mongo logs and saw it was doing a collection scan (COLLSCAN) on a big collection of big documents. I thought, crazy, I'm projecting the _id, which is always indexed, so why would it attempt a collection scan?
I have no idea why it would do that, but the solution is simple:
db.c.find({},{_id:1}).hint({_id:1});
or in Java:
query.withHint("{_id:1}");
Then it was able to proceed along as normal, using stream style:
createStreamFromIterator(mongoOperations.stream(query, MortgageDocument.class)).
map(MortgageDocument::getId).forEach(transformer);
Mongo can do some good things and it can also get stuck in really confusing ways. At least that's my experience so far.
Try an aggregation pipeline, like this:
db.collection.aggregate([
{ $match: { deletedAt: null }},
{ $group: { _id: "$_id"}}
])
This returns an array of documents with this structure:
_id: ObjectId("5fc98977fda32e3458c97edd")
I had a similar requirement to get the ids for a collection with 50+ million rows. I tried many ways. The fastest way to get the ids turned out to be running mongoexport with just the ids.
One of the above examples worked for me, with a minor tweak. I left out the second object, as I was using it with my Mongoose schema.
// Await the query directly; do not mix await with a callback.
const idArray = await Model.distinct('_id');
// idArray is your array of ids