How can I get all the doc ids in MongoDB?

How can I get an array of all the document ids in MongoDB? I only need the ids, not the document contents.

You can do this in the Mongo shell by calling map on the cursor like this:
var a = db.c.find({}, {_id:1}).map(function(item){ return item._id; })
The result is that a is an array of just the _id values.
It works similarly in Node (this is with MongoDB Node driver v2.2 and Node v6.7.0):
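db.collection('...')
.find(...)
.project( {_id: 1} )   // return only the _id field
.map(x => x._id)       // transform each document into its _id
.toArray();            // resolves to an array of _id values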
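Remember to put map before toArray. This map is NOT the JavaScript Array map; it is the cursor's map method provided by the MongoDB driver. It registers a transform that the driver applies to each document as the cursor is read on the client, so toArray() returns the transformed values (the _ids) rather than whole documents.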
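Why the order matters can be sketched without a database. A cursor's map only records a per-document transform, and toArray then applies that transform to each document as it reads it. The FakeCursor class below is a hypothetical in-memory stand-in, not the driver's cursor class, and it only mimics the chaining behaviour.

```javascript
// Hypothetical stand-in for a driver cursor. map() records a transform;
// toArray() applies it to each document as the document is read.
class FakeCursor {
  constructor(docs) {
    this.docs = docs;
    this.transform = (doc) => doc; // identity until map() is called
  }

  // Registers a transform and returns the cursor so calls can be chained.
  // Nothing is executed yet; successive map() calls are composed in order.
  map(fn) {
    const previous = this.transform;
    this.transform = (doc) => fn(previous(doc));
    return this;
  }

  // Reads every document and applies the registered transform to each one.
  async toArray() {
    const out = [];
    for (const doc of this.docs) {
      out.push(this.transform(doc));
    }
    return out;
  }
}
```

Calling `new FakeCursor(docs).map(d => d._id).toArray()` resolves to the ids. Calling toArray first would hand back whole documents, which you would then have to map yourself.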

One way is simply to use the runCommand API (in this example the collection itself happens to be named distinct):
db.runCommand({ distinct: "distinct", key: "_id" })
which gives you something like this:
{
"values" : [
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
],
"stats" : {
"n" : 7,
"nscanned" : 7,
"nscannedObjects" : 0,
"timems" : 2,
"cursor" : "DistinctCursor"
},
"ok" : 1
}
However, there's an even nicer way using the actual distinct API:
var ids = db.distinct.distinct('_id', {}, {});
which just gives you an array of ids:
[
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
]
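I'm not sure about the first version, but the second is definitely supported in the Node.js driver, which you mentioned wanting to use. It would look something like this (callback style, as in drivers of that era; newer driver versions no longer accept callbacks, so there you would await distinct() instead):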
db.collection('c').distinct('_id', {}, {}, function (err, result) {
// result is your array of ids
})
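Why distinct on _id returns every id can be seen with a small in-memory sketch. The helper distinctValues is hypothetical, not a driver API. It handles only top-level scalar fields and skips documents that lack the field. The real distinct also treats each element of an array-valued field as a separate value, which this sketch ignores. Because _id is unique, the result has one entry per document; on a non-unique field, duplicates collapse.

```javascript
// Hypothetical in-memory analogue of distinct(field) for a top-level field
// holding scalar values (strings, numbers). Documents without the field are
// skipped; array-valued fields are not unwound in this sketch.
function distinctValues(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    if (field in doc) seen.add(doc[field]); // a Set keeps each value only once
  }
  return [...seen];
}

const docs = [
  { _id: "54cfcf93e2b8994c25077924", color: "red" },
  { _id: "54d672d819f899c704b21ef4", color: "blue" },
  { _id: "54d6732319f899c704b21ef5", color: "red" },
];
```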

I was also wondering how to do this with the MongoDB Node.js driver, like @user2793120. Someone else suggested iterating through the results with .each, which seemed highly inefficient to me, so I used MongoDB's aggregation instead:
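myCollection.aggregate([
{ $match: { /* any search criteria, following $match's rules */ } },
{ $sort: { /* any sort criteria, following $sort's rules */ } },
{ $group: { _id: null, ids: { $addToSet: "$_id" } } }
]).exec() // Mongoose; with the native driver, call .toArray() on the returned cursor instead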
The $sort stage is optional, and so is $match if you want all of the collection's _ids. If you console.log the result, you'd see something like:
[ { _id: null, ids: [ '56e05a832f3caaf218b57a90', '56e05a832f3caaf218b57a91', '56e05a832f3caaf218b57a92' ] } ]
Then just use the contents of result[0].ids somewhere else.
The key part here is the $group stage. $group requires an _id, and setting it to null puts every matching document into a single group (leaving _id out makes the aggregation fail). That one output document gets a new array field containing all the _ids. If you don't mind duplicates, you can use $push instead of $addToSet. Because _id values are unique, both give the same ids here; duplicates can only appear if you collect a field other than _id that repeats across documents.
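The difference between $addToSet and $push in that stage can be sketched in memory. The function groupAllInto below is a hypothetical simplification of `{ $group: { _id: null, ids: { <op>: "$<field>" } } }` for a top-level field. It shows that the two operators agree on _id and only diverge on a field with repeated values. Unlike this sketch, the real $addToSet does not guarantee any particular order for the array.

```javascript
// Hypothetical simplification of a single-group $group stage:
//   { $group: { _id: null, ids: { <op>: "$<field>" } } }
// where <op> is "$push" (keep every value) or "$addToSet" (keep unique values).
function groupAllInto(docs, field, op) {
  const values = [];
  const seen = new Set();
  for (const doc of docs) {
    const value = doc[field];
    if (op === "$addToSet") {
      if (seen.has(value)) continue; // value already collected, skip it
      seen.add(value);
    }
    values.push(value);
  }
  // As with the real stage, the output is a single document with _id: null.
  return [{ _id: null, ids: values }];
}

const docs = [
  { _id: "a1", owner: "u1" },
  { _id: "a2", owner: "u2" },
  { _id: "a3", owner: "u1" },
];
```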

Another way to do this in the mongo shell:
var arr=[]
db.c.find({},{_id:1}).forEach(function(doc){arr.push(doc._id)})
printjson(arr)

I struggled with this for a long time, and I'm answering this because I've got an important hint. It seemed obvious that:
db.c.find({},{_id:1});
would be the answer.
It worked, sort of. It would find the first 101 documents and then the application would pause. I didn't let it keep going. This was both in Java using MongoOperations and also on the Mongo command line.
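I looked at the mongo logs and saw it was doing a COLLSCAN on a big collection of big documents. I thought: that's crazy, I'm only projecting _id, which is always indexed, so why would it do a collection scan?
I had no idea why at the time (the query planner chooses an index from the query filter and sort, not from the projection, so an empty filter with no sort falls back to a collection scan), but the solution is simple: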
db.c.find({},{_id:1}).hint({_id:1});
or in Java:
query.withHint("{_id:1}");
Then it was able to proceed normally, using stream style:
createStreamFromIterator(mongoOperations.stream(query, MortgageDocument.class)).
map(MortgageDocument::getId).forEach(transformer);
Mongo can do some good things and it can also get stuck in really confusing ways. At least that's my experience so far.
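The same stream-style idea, consuming ids one at a time instead of materializing every document first, can be sketched in JavaScript with async generators. fetchBatches is a hypothetical stand-in for a cursor that delivers results in batches; nothing here talks to a real database. Recent Node.js driver cursors can be consumed with `for await...of` in the same way, but treat that as an assumption to check against your driver version.

```javascript
// Hypothetical stand-in for a cursor that yields documents batch by batch.
async function* fetchBatches(docs, batchSize) {
  for (let i = 0; i < docs.length; i += batchSize) {
    yield docs.slice(i, i + batchSize);
  }
}

// Streams only the _id values, one at a time, from a batched source.
async function* streamIds(batches) {
  for await (const batch of batches) {
    for (const doc of batch) {
      yield doc._id;
    }
  }
}

// Applies a per-id callback (like the answer's `transformer`) and returns
// how many ids were processed.
async function forEachId(batches, onId) {
  let count = 0;
  for await (const id of streamIds(batches)) {
    onId(id);
    count += 1;
  }
  return count;
}
```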

Try an aggregation pipeline like this:
db.collection.aggregate([
{ $match: { deletedAt: null }},
{ $group: { _id: "$_id"}}
])
This returns an array of documents with this structure:
{ _id: ObjectId("5fc98977fda32e3458c97edd") }
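Two details of that pipeline can be sketched in memory. First, an equality match on null, as in `{ deletedAt: null }`, matches documents where the field is null and also those where it is missing. Second, because each _id is unique, grouping by "$_id" emits one `{ _id }` document per match, which you can flatten with map. The helpers below are hypothetical and only mimic top-level behaviour. The real $group does not guarantee output order.

```javascript
// Mimics { $match: { <field>: null } } for a top-level field:
// true when the field is null OR absent from the document.
function matchesNull(doc, field) {
  return doc[field] === null || doc[field] === undefined;
}

// Mimics [{ $match: { deletedAt: null } }, { $group: { _id: "$_id" } }]
// and then flattens the resulting [{ _id }, ...] documents into plain ids.
function activeIds(docs) {
  const grouped = docs
    .filter((doc) => matchesNull(doc, "deletedAt"))
    .map((doc) => ({ _id: doc._id })); // one group per unique _id
  return grouped.map((g) => g._id);
}
```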

I had a similar requirement: getting the ids from a collection with 50+ million documents. I tried many approaches, and the fastest way to get the ids turned out to be mongoexport with only the _id field.
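Single-array approaches (distinct, or $group with $addToSet) struggle at that scale because their result must fit in one BSON document, which MongoDB caps at 16 MiB. The sketch below gives a rough back-of-the-envelope estimate. It assumes the standard BSON array encoding: per element, 1 type byte, the decimal index as a NUL-terminated key, and a 12-byte ObjectId. It ignores the small overhead of the enclosing result document.

```javascript
// Bytes for element i of a BSON array holding ObjectIds:
// 1 type byte + key (decimal index + NUL terminator) + 12-byte ObjectId.
function elementSize(i) {
  return 1 + (String(i).length + 1) + 12;
}

// Size of the whole array: 4-byte length prefix + elements + 1 trailing NUL byte.
function bsonObjectIdArraySize(n) {
  let size = 4 + 1;
  for (let i = 0; i < n; i++) size += elementSize(i);
  return size;
}

const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16 MiB maximum document size

// Largest number of ObjectIds whose array still fits within the limit.
function maxObjectIdsPerDocument(limit = MAX_BSON_BYTES) {
  let n = 0;
  let size = 4 + 1;
  while (size + elementSize(n) <= limit) {
    size += elementSize(n);
    n += 1;
  }
  return n;
}
```

Under these assumptions, about 844,000 ObjectIds fit in one document, far short of 50 million.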

One of the examples above worked for me with a minor tweak: I left out the second {} (the options argument), since I was using it with my Mongoose model. When you await the query, no callback is needed:
const idArray = await Model.distinct('_id', {});
// idArray is your array of ids

Related

Mongodb aggregate $count

I would like to count the number of documents returned by an aggregation.
I'm sure my initial aggregation works, because I use it later in my program. To do so I created a pipeline variable (here called pipelineTest; ask me if you want to see it in detail, but it's quite long, which is why I don't include it here).
To count the number of documents returned, I append this stage to my pipeline:
{$count: "totalCount"}
Now I would like to get (or log) the totalCount value. What should I do?
Here is the aggregation :
pipelineTest.push({$count: "totalCount"});
cursorTest = collection.aggregate(pipelineTest, options)
console.log(cursorTest.?)
Thanks for your help. I've read a lot of documentation about aggregation and I still don't understand how to read the result of an aggregation...
Assuming you're using async/await syntax, you need to await the result of the aggregation.
You can convert the cursor to an array, get the first element of that array and access totalCount.
pipelineTest.push({$count: "totalCount"});
cursorTest = await collection.aggregate(pipelineTest, options).toArray();
console.log(cursorTest[0].totalCount);
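One edge case worth handling: when the pipeline produces no documents, $count emits no document at all rather than `{ totalCount: 0 }`, so `cursorTest[0]` would be undefined. A small sketch of a helper for reading the count from the toArray() result (readCount is a hypothetical name):

```javascript
// Reads the value written by a { $count: <fieldName> } stage from the array
// returned by aggregate(...).toArray(). $count outputs no document when its
// input is empty, so an empty array is treated as a count of 0.
function readCount(results, fieldName = "totalCount") {
  if (!Array.isArray(results) || results.length === 0) {
    return 0;
  }
  return results[0][fieldName];
}
```

Usage would look like `const total = readCount(await collection.aggregate(pipelineTest, options).toArray());`.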
Aggregation
db.mycollection.aggregate([
{
$count: "totalCount"
}
])
Result
[ { totalCount: 3 } ]
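For your code
Try the following (aggregate() returns a cursor, so read the first document from it):
pipelineTest.push({$count: "totalCount"});
cursorTest = collection.aggregate(pipelineTest, options)
const first = await cursorTest.next(); // null if the pipeline produced no documents
console.log(first ? first.totalCount : 0)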

How to fetch just the "_id" field from MongoDB find()

I want to return just the document ids from Mongo that match a find() query.
I know I can pass an object to exclude or include fields in the result set, but I cannot find a way to return just the _id field.
My reasoning is that returning just this bit of information will be far more efficient (my use case needs no other document data, just the ObjectId).
An example query that I expected to work was:
collection.find({}, { _id: 1 }).toArray(function(err, docs) {
...
});
However this returns the entire document and not just the _id field.
You just need to use a projection to get what you want.
collection.find({ /* filter criteria here */ }, {foo: 0, bar: 0, _id: 1});
Since I don't know what your documents look like, this is all I can do for you. foo: 0, for example, excludes that field. (With the 3.x and later Node.js driver, pass it as the options object { projection: { ... } } instead.)
I found that setting the projection on the cursor object directly works. In the mongodb package on npm (driver 3.x and later), the second argument to find() is an options object rather than a projection, so { _id: 1 } there is ignored and toArray() returns the entire document. (Passing { projection: { _id: 1 } } as that options object also works.) Below is a fixed working example that returns just the _id field.
Example document:
{
_id: new ObjectId(...),
test1: "hello",
test2: "world!"
}
Working Projection
var cursor = collection.find({});
cursor.project({
test1: 0,
test2: 0
});
cursor.toArray(function(err, docs) {
// Importantly the docs objects here only
// have the field _id
});
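Why that exclusion projection leaves only _id can be shown with a simplified in-memory sketch of top-level projection rules. applyProjection is hypothetical and ignores nested paths and projection operators. The rules it applies: in an inclusion projection, only the listed fields plus _id are kept; in an exclusion projection, every field except the listed ones is kept. _id is returned unless it is explicitly excluded, and mixing inclusion and exclusion on non-_id fields is rejected.

```javascript
// Simplified top-level projection, e.g. { _id: 1 } or { test1: 0, test2: 0 }.
// Ignores nested paths and projection operators.
function applyProjection(doc, spec) {
  const entries = Object.entries(spec);
  if (entries.length === 0) return { ...doc }; // empty projection: whole document

  const others = entries.filter(([key]) => key !== "_id");
  const included = others.filter(([, value]) => value).length;
  if (included > 0 && included < others.length) {
    throw new Error("cannot mix inclusion and exclusion (except for _id)");
  }
  // Exclusion if every non-_id field is 0, or if the spec is only { _id: 0 }.
  const isExclusion = others.length > 0 ? included === 0 : !spec._id;

  const result = {};
  if (isExclusion) {
    for (const [key, value] of Object.entries(doc)) {
      if (!(key in spec) || spec[key]) result[key] = value;
    }
  } else {
    // _id is kept by default unless explicitly excluded.
    if ((!("_id" in spec) || spec._id) && "_id" in doc) result._id = doc._id;
    for (const [key] of others) {
      if (key in doc) result[key] = doc[key];
    }
  }
  return result;
}
```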
Because _id is by definition unique, you can use distinct to get an array of the _id values of all documents as:
collection.distinct('_id', function(err, ids) {
...
});
With Mongoose, you can pass the projection as a field-name string:
const docs = await Model.find({}, '_id');
// each document in docs contains only the _id field

$group which _id equals null or Array.prototype.length?

After performing an aggregation on a Mongo collection, my last step is to get the length of the resulting array. I have two options:
Use one more $group stage whose _id is null:
db.col.aggregate([
// ...,
{
$group: {
_id: null,
length: { $sum: 1},
},
},
]);
Or use the .length property:
db.col.aggregate([
// ...
]).length;
Both of them work well and give me the expected result. I just wonder which way is better in terms of performance. What do you think?
I would use the .length property, since it's likely just an attribute of the JS Array object (this might depend on the JS engine your code runs on).
I believe using $group makes the MongoDB engine process all the data and then count how many documents it returns, which would be much slower.
As felix said, you can run a small benchmark and see which option is faster.

Updating multiple MongoDB records in Sails.js

I need to update multiple records in mongodb.
From the frontend logic, I get the array of ids below.
ids: [ [ '530ac94c9ff87b5215a0d6e6', '530ac89a7345edc214618b25' ] ]
I need to update the folder field for every record in that array.
I tried passing the ids to a MongoDB query as below, but it doesn't work.
Post.native(function(err, collection) {
collection.update({
_id : {
"$in" : ids
}
}, { folder : 'X'}, {
multi : true
}, function(err, result) {
console.log(result);
});
});
Please help.
There seem to be two possible problems.
1) Your ids array is not an array of ids. It's an array with a single element, which is itself an array of two elements. An array of ids would be [ 'idvalue1', 'idvalue2' ].
2) Your id values inside the arrays are strings. Is that how you store your "_id" values? If they are of type ObjectId, they are not strings: ObjectId("stringhere") is a different type and won't be equal to "stringhere".
There is no reason to use the native method in this case. Just do:
Post.update({id : ids}, {folder : 'X'}).exec(console.log);
Waterline automatically does an "in" query when you set a criteria property to an array, and Sails-Mongo automatically translates "id" to "_id" and handles ObjectId translation for you.
Those strings look like the string representation of MongoDB ObjectIds, so you probably want to convert them to ObjectIds before querying. Assuming you've fixed the extra level of nesting in the array, that is:
ids = ['530ac94c9ff87b5215a0d6e6', '530ac89a7345edc214618b25']
Then you want to do something like this:
var oids = ids.map(function (id) { return ObjectId(id); });
db.c.find({_id: {$in: oids}})
Does that fix your problem?
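The two fixes suggested in the answers above can be combined into one helper: flatten the extra level of nesting, then check each string before converting it to an ObjectId. normalizeIds is hypothetical. It only checks that each value is a 24-character hex string, the textual form of an ObjectId. The actual conversion would use the ObjectId class exported by the mongodb package; it is left in a comment so the sketch runs without dependencies.

```javascript
// A 24-character hexadecimal string is the textual form of a 12-byte ObjectId.
const OBJECT_ID_HEX = /^[0-9a-fA-F]{24}$/;

// Flattens nested arrays such as [[id1, id2]] and rejects anything that is not
// an ObjectId hex string, so a bad value fails loudly instead of matching nothing.
function normalizeIds(ids) {
  const flat = ids.flat(Infinity);
  for (const id of flat) {
    if (typeof id !== "string" || !OBJECT_ID_HEX.test(id)) {
      throw new Error(`not an ObjectId hex string: ${String(id)}`);
    }
  }
  return flat;
}

// With the mongodb package installed, the conversion would then be e.g.:
//   const { ObjectId } = require("mongodb");
//   const oids = normalizeIds(ids).map((id) => new ObjectId(id));
```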

Can the same MongoDB document show up more than once in a single cursor using a multikey index?

I'm considering bundling time-sequence data together in session documents. Inside each session, there would be an array of events. Each event would have a timestamp. I know that I can create a multikey index on the timestamp of those events, but I'm curious what mechanism MongoDB uses to prevent the same document from showing up twice in one query.
To clarify, imagine a collection of sessions with the following documents:
{
_id: 'A',
events: [
{time: '10:00'},
{time: '15:00'}
]
}
{
_id: 'B',
events: [
{time: '12:00'}
]
}
If I add a multikey index with db.sessions.ensureIndex({'events.time' : 1}), I would expect the b-tree of that index to look like this:
'10:00' => 'A'
'12:00' => 'B'
'15:00' => 'A'
If I query the collection with {'events.time': {$gte: '10:00'}}, MongoDB scans the b-tree and returns:
{ "_id" : "A", "events" : [ { "time" : "10:00" }, { "time" : "15:00" } ] }
{ "_id" : "B", "events" : [ { "time" : "12:00" } ] }
How does Mongo prevent document A from showing up a second time as the third result in the cursor? For small index scans, it could just keep track of which documents had already been seen, but what happens if the index is enormous? Is there ever a case where the same document would show up more than once in a single cursor?
My assumption is that it would not. Mongo could look at the document it is scanning and detect that it already would have matched earlier in the scan by inspecting earlier entries in the indexed array. However, I cannot find any mention of this behavior in the MongoDB documentation, and it is important to actually know what to expect.
(NOTE: I do know that it is possible for a document to show up in a single query more than once if the document is modified while the cursor is being scanned. That shouldn't pose a problem for queries on time-sequence data where timestamps are never edited. Even if a new event is added to a session during a scan, if Mongo uses something like the detection mechanism I mentioned above, it should be able to omit the moved document from query results.)
I cannot find any mention of this behavior in the MongoDB documentation, and it is important to actually know what to expect.
Implementation internals are seldom mentioned in the documentation, and in any case, what you describe is the expected behavior.
There is code to deduplicate a result set, and there are tests to make sure it works correctly. A multikey index isn't even the primary use case for that functionality: if your query has an $or clause, the results must be deduplicated as well.
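A simplified model of the deduplication the answer describes can be written against the question's sessions example. Scanning a multikey index can reach the same document more than once, and the scan keeps a set of already-returned ids so it can skip repeats. This is a hypothetical sketch, not MongoDB's actual implementation.

```javascript
// Builds a multikey index on events.time: one (key, id) entry per array
// element, sorted by key, like the b-tree sketched in the question.
function buildMultikeyIndex(docs) {
  const entries = [];
  for (const doc of docs) {
    for (const event of doc.events) {
      entries.push({ key: event.time, id: doc._id });
    }
  }
  entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return entries;
}

// Scans index entries with key >= lower and returns each matching document
// once, remembering the ids it has already returned.
function scanGte(index, docsById, lower) {
  const seen = new Set();
  const results = [];
  for (const entry of index) {
    if (entry.key < lower) continue;
    if (seen.has(entry.id)) continue; // another entry for an already-returned document
    seen.add(entry.id);
    results.push(docsById.get(entry.id));
  }
  return results;
}

const sessions = [
  { _id: "A", events: [{ time: "10:00" }, { time: "15:00" }] },
  { _id: "B", events: [{ time: "12:00" }] },
];
const docsById = new Map(sessions.map((d) => [d._id, d]));
```

In this sketch the set grows with the number of documents returned. It does not show how the real server bounds that cost.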