MongoDB 3.6 aggregate returns firstBatch

In MongoDB 3.4 (and still in 3.6, according to the docs), the collection.aggregate([...]) method returns a cursor. We could then call collection.aggregate([...]).toArray() or .forEach() to get, or iterate over, the array of results.
Since 3.6 (apart from the fact that the cursor option is now required, which is not specified in the docs), the following command:
collection.aggregate(
    [...],
    { cursor: { batchSize: 10 } }
)
returns an object with this shape:
{
    "cursor": {
        "firstBatch": [...],
        "id": ...,
        "ns": ...
    },
    "ok": 1,
    "$clusterTime": {...},
    "operationTime": Timestamp(1525344553, 1)
}
It's no longer possible to iterate over the whole result set, and collection.aggregate([...]).toArray() is not a function.
I found a few issues about this, but nothing really relevant.
The docs are really outdated on this, and I'm not even sure this is the expected behaviour.
Note: I'm running this in the mongo shell, but I also encountered these issues with the latest Node driver.

The problem comes from mongo-hacker...
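For reference, this is the behaviour a stock shell gives you; a minimal sketch, assuming a trivial $match pipeline (the collection name is a placeholder):

// In an unpatched 3.6 mongo shell, aggregate() hands back a cursor:
var cursor = db.collection.aggregate(
    [ { $match: {} } ],
    { cursor: { batchSize: 10 } }
);
cursor.forEach(printjson);  // iterates over the whole result set
// and .toArray() works too:
// var results = db.collection.aggregate([ { $match: {} } ]).toArray();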

Related

Using a $match and/or reference fields in DocumentDB causes Aggregation not supported error

I have what I consider to be a pretty straightforward search. I am developing it on MongoDB locally but deploying it to DocumentDB on AWS. The function that I've written looks something like this:
if (searchCategory) {
    return [
        {
            $match: {
                $or: [
                    { title: params.searchString },
                    { description: params.searchString },
                    { workingNotes: params.searchString },
                ],
                active: true,
            },
        },
    ];
}
This query didn't use to have the $match stage around it and it worked fine, but we also wanted to use reference fields in our model so that linked items are searched as well. So we have something like this in our model:
appendix: [{type: Schema.Types.ObjectId, ref: "document"}],
Now when I fire off the search query, I get an error back saying "Aggregation stage not supported: '$lookup on multiple join conditions and uncorrelated subquery.'"
Adding the reference field and the $match were the only changes made to the functionality, which worked before we added those.
Based on the documentation I've read about DocumentDB, the $match operator is supported, so it's possible I am missing something else about how to structure the query or handle the reference fields, but I've been unable to determine what that might be.
Any help would be appreciated. Thanks!
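One hedged observation: the error names $lookup, not $match, so the populate on the appendix reference is the likely trigger. If it generates the let/pipeline (uncorrelated subquery) form of $lookup, DocumentDB tends to reject it, whereas the flat localField/foreignField form is generally accepted. A sketch under those assumptions (the collection and output field names are made up from the model above):

db.documents.aggregate([
    { $match: { active: true } },
    {
        $lookup: {
            from: "documents",       // the referenced collection from the ref
            localField: "appendix",  // array of ObjectId references
            foreignField: "_id",
            as: "appendixDocs"       // hypothetical output field
        }
    }
])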

Azure CosmosDB operation not supported when using $elemMatch and $in

I am doing a query like the following, which works fine with MongoDB, but sometimes fails with CosmosDB. I need it to work with both.
(XXX is a placeholder for any string value. All strings have unique values that are redacted for readability, and actual content should be of no significance.)
{
    server_index: {
        $elemMatch: {
            server: "XXX",
            index: "XXX",
            delete_time: { $exists: false },
            path: {
                $in: [ "XXX", "XXX", "XXX" ]
            }
        }
    }
}
The schema of a document is somewhat like this:
{
    ...,
    server_index: [
        {
            server: "XXX",
            index: "XXX",
            delete_time: ISODate(...), // optional
            path: "XXX"
        },
        {...}, // same as above
        ...
    ],
    ...
}
This query sometimes works as expected with CosmosDB as well, but sometimes I also get the following response:
{
    _t: "OKMongoResponse",
    ok: 0,
    code: 115,
    errmsg: "Command is not supported",
    $err: "Command is not supported"
}
What is especially strange is that the query seemingly succeeds: the response above is returned by a "valid" cursor as its first document, which then makes my document parser crash.
I am using the C++ legacy driver. Is this even supported by Cosmos DB?
(According to the developer I inherited this project from, it is, and as always when you inherit projects, it all worked fine according to the previous developer... So this may be due to a change in Cosmos DB, due to the nature of my test data, or who knows what...)
Side note: In MongoDB, there is a multi-key index on server_index, which looks like this:
{
    "server_index.delete_time" : 1,
    "server_index.server" : 1,
    "server_index.index" : 1,
    "server_index.path" : 1
}
Is this even supported in CosmosDB?
EDIT: Trying to add this index with Robo 3T silently fails, with no error message whatsoever. The index is simply not added. Nice!
(Please don't ask about the strange database schema. It is like it is for a reason, and believe me, I, too, would like to burn it all down and replace it with something else ... I am open for suggestions for alternative queries, though)
This was probably a server-side problem. It seemed wrong in the first place (an error status returned as part of the query result), and it disappeared after a couple of weeks without me changing anything.
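Even so, a defensive check for that error shape may be worth keeping. A sketch in shell/JavaScript terms, since the C++ legacy driver's API differs (the field names come from the response shown above; parseDocument is a hypothetical stand-in for the real parser):

cursor.forEach(function (doc) {
    if (doc._t === "OKMongoResponse" && doc.ok === 0) {
        throw new Error("Cosmos DB error " + doc.code + ": " + doc.errmsg);
    }
    parseDocument(doc); // hypothetical downstream parsing
});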

runCommand vs aggregate method to do aggregation

To run an aggregation query, it is possible to use either of these:
db.collectionName.aggregate(query1);
OR
db.runCommand(query2)
But I noticed something bizarre this morning. While this:
db.runCommand({
    "aggregate": "collectionName",
    allowDiskUse: true,
    "pipeline": [
        {
            "$match": {
                "field": param
            }
        }
    ]
});
fails with error:
{
    "ok" : 0.0,
    "errmsg" : "aggregation result exceeds maximum document size (16MB)",
    "code" : 16389,
    "codeName" : "Location16389"
}
This:
db.collectionName.aggregate([
    {
        $match: {
            field: param
        }
    }
])
is working (gives the expected aggregation result).
How is this possible?
Well, the difference is of course that the .aggregate() method returns a "cursor", whereas with the options you are providing to runCommand() you are not asking for one. That is actually the legacy form, which returned the response as a single BSON document with all its limitations, including the 16MB size cap you are hitting. Cursors, on the other hand, do not have that limitation.
Of course you can use the runCommand() method to "make your own cursor" with the shell, since, after all, that is exactly what the .aggregate() method is doing "under the covers". The same goes for all drivers, which essentially invoke the database command for everything.
With the shell, you can transform your request like this:
var cmdRes = db.runReadCommand({
    "aggregate": "collectionName",
    "allowDiskUse": true,
    "pipeline": [
        {
            "$match": {
                "field": param
            }
        }
    ],
    "cursor": { "batchSize": 25 }
});
var cursor = new DBCommandCursor(db, cmdRes);
cursor.next(); // will actually iterate the cursor
If you really want to dig into it, then type db.collectionName.aggregate without the parentheses (), so that you actually print the function definition. This will show you some other function calls; you can dig further into them and eventually see what is effectively the lines shown above, amongst a lot of other stuff.
But the way you ran it, you got a "single BSON document" response. Run it the way shown here, and you get the same "cursor" response.

Mongo Oplog - Extracting Specifics of Updates

Say I have an entry in the inventory collection that looks like
{ _id: 1, item: "polarizing_filter", tags: [ "electronics", "camera" ]}
and I issue the command
db.inventory.update(
    { _id: 1 },
    { $addToSet: { tags: "accessories" } }
)
I have an oplog tailer, and would like to know that, specifically, "accessories" has been added to this document's tags field. As far as I can tell, the oplog always normalizes commands to use $set and $unset to maintain idempotency. In this case, the field of the entry describing the update would show something like
{$set : { tags : ["electronics", "camera", "accessories"] } }
which makes it impossible to know which tags were actually added by this update. Is there any way to do this? I'm also curious about the analogous case in which fields are modified through deletion, e.g. through $pull. Solutions outside the realm of an oplog tailer are welcome, as are pointers to documentation of this command-normalization process; I can't find any.
Thanks!
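One workaround, sketched under the assumption that the tailer keeps (or can fetch) each document's previous state, is to diff the normalized $set value against it; previousTags is hypothetical bookkeeping:

var setOp = { $set: { tags: [ "electronics", "camera", "accessories" ] } };
var previousTags = [ "electronics", "camera" ];
var addedTags = setOp.$set.tags.filter(function (t) {
    return previousTags.indexOf(t) === -1;
});
// addedTags => [ "accessories" ]; a $pull would show up as the reverse diff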

How can I get all the doc ids in MongoDB?

How can I get an array of all the doc ids in MongoDB? I only need a set of ids but not the doc contents.
You can do this in the Mongo shell by calling map on the cursor like this:
var a = db.c.find({}, {_id:1}).map(function(item){ return item._id; })
The result is that a is an array of just the _id values.
The way it works in Node is similar.
(This is MongoDB Node driver v2.2, and Node v6.7.0)
db.collection('...')
    .find(...)
    .project({ _id: 1 })
    .map(x => x._id)
    .toArray();
Remember to put map before toArray: this map is NOT the JavaScript Array map function but the one provided by the MongoDB cursor, and it is applied to each document as the cursor is iterated, before toArray collects the results.
One way is to simply use the runCommand API (note that in this example the collection itself happens to be named distinct):
db.runCommand({ distinct: "distinct", key: "_id" })
which gives you something like this:
{
    "values" : [
        ObjectId("54cfcf93e2b8994c25077924"),
        ObjectId("54d672d819f899c704b21ef4"),
        ObjectId("54d6732319f899c704b21ef5"),
        ObjectId("54d6732319f899c704b21ef6"),
        ObjectId("54d6732319f899c704b21ef7"),
        ObjectId("54d6732319f899c704b21ef8"),
        ObjectId("54d6732319f899c704b21ef9")
    ],
    "stats" : {
        "n" : 7,
        "nscanned" : 7,
        "nscannedObjects" : 0,
        "timems" : 2,
        "cursor" : "DistinctCursor"
    },
    "ok" : 1
}
However, there's an even nicer way using the actual distinct API:
var ids = db.distinct.distinct('_id', {}, {});
which just gives you an array of ids:
[
    ObjectId("54cfcf93e2b8994c25077924"),
    ObjectId("54d672d819f899c704b21ef4"),
    ObjectId("54d6732319f899c704b21ef5"),
    ObjectId("54d6732319f899c704b21ef6"),
    ObjectId("54d6732319f899c704b21ef7"),
    ObjectId("54d6732319f899c704b21ef8"),
    ObjectId("54d6732319f899c704b21ef9")
]
Not sure about the first version, but the latter is definitely supported in the Node.js driver (which I saw you mention you wanted to use). That would look something like this:
db.collection('c').distinct('_id', {}, {}, function (err, result) {
    // result is your array of ids
})
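For what it's worth, newer versions of the Node driver also return a promise when no callback is passed, so something along these lines should work as well (inside an async function):

const ids = await db.collection('c').distinct('_id', {});
// ids is the same array of _id values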
I was also wondering how to do this with the MongoDB Node.js driver, like #user2793120. Someone else said he should iterate through the results with .each, which seemed highly inefficient to me. I used MongoDB's aggregation instead:
myCollection.aggregate([
    { $match: { /* any search criteria, following $match's rules */ } },
    { $sort: { /* any sort criteria, following $sort's rules */ } },
    { $group: { _id: null, ids: { $addToSet: "$_id" } } }
]).exec()
The sorting stage is optional, and so is the match one if you want all the collection's _ids. If you console.log the result, you'd see something like:
[ { _id: null, ids: [ '56e05a832f3caaf218b57a90', '56e05a832f3caaf218b57a91', '56e05a832f3caaf218b57a92' ] } ]
Then just use the contents of result[0].ids somewhere else.
The key part here is the $group stage. You must set _id to null (otherwise the aggregation will fail), and create a new array field that collects all the _ids. If you don't mind having duplicated ids (depending on the search criteria used in the $match stage, and assuming you group on a field other than _id that several documents share), you can use $push instead of $addToSet, as shown in the sketch below.
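For illustration, a hedged variant of that last point; the grouping key "$category" is a made-up field:

myCollection.aggregate([
    // $push keeps duplicates, $addToSet de-duplicates
    { $group: { _id: "$category", ids: { $push: "$_id" } } }
]).exec()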
Another way to do this on mongo console could be:
var arr = [];
db.c.find({}, { _id: 1 }).forEach(function (doc) { arr.push(doc._id); });
printjson(arr);
Hope that helps!
I struggled with this for a long time, and I'm answering this because I've got an important hint. It seemed obvious that:
db.c.find({},{_id:1});
would be the answer.
It worked, sort of. It would find the first 101 documents and then the application would pause. I didn't let it keep going. This happened both in Java using MongoOperations and also on the mongo command line.
I looked at the mongo logs and saw it was doing a collection scan (COLLSCAN) on a big collection of big documents. I thought: crazy, I'm projecting the _id, which is always indexed, so why would it attempt a collection scan?
I have no idea why it would do that, but the solution is simple:
db.c.find({},{_id:1}).hint({_id:1});
or in Java:
query.withHint("{_id:1}");
Then it was able to proceed along as normal, using stream style:
createStreamFromIterator(mongoOperations.stream(query, MortgageDocument.class))
    .map(MortgageDocument::getId).forEach(transformer);
Mongo can do some good things and it can also get stuck in really confusing ways. At least that's my experience so far.
Try with an aggregation pipeline, like this:
db.collection.aggregate([
    { $match: { deletedAt: null } },
    { $group: { _id: "$_id" } }
])
This is going to return an array of documents with this structure:
{ _id: ObjectId("5fc98977fda32e3458c97edd") }
I had a similar requirement: getting the ids for a collection with 50+ million documents. I tried many ways; the fastest turned out to be running mongoexport with just the ids.
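For example, something along these lines (the database and collection names are placeholders):

mongoexport --db=mydb --collection=c --type=csv --fields=_id --out=ids.csv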
One of the above examples worked for me, with a minor tweak: I left out the options object and the callback, since I was using it with my Mongoose schema and Mongoose's distinct returns a promise when no callback is passed.
const idArray = await Model.distinct('_id', {});
// idArray is your array of ids