mongodb, pymongo, aggregate gives strange output (something about cursor) - mongodb

I am trying get a list of people with the most entries in my database.
print db.points.aggregate(
[
{
"$group":
{
"_id": "$created.user",
"count":{"$sum":1}
}
},
{
"$sort":
{"count":-1}
}
]
)
An entry looks like this :
{
u'id': u'342902',
u'_id': ObjectId('555af76a029d3b1b0ff9a4be'),
u'type': u'node',
u'pos': [48.9979746, 8.3719741],
u'created': {
u'changeset': u'7105928',
u'version': u'4',
u'uid': u'163673',
u'timestamp': u'2011-01-27T18:05:54Z',
u'user': u'Free_Jan'
}
}
I know that created.user exists and is otherwise accessible.
Still the output i get is:
<pymongo.command_cursor.CommandCursor object at 0x02ADD6B0>
Shouldn't I get a sorted list ?

The result of an aggregation query is a cursor, as for a regular find query. In case of pymongo the CommandCursor is iterable, thus you are able to do any of the following:
cursor = db.points.aggregate(...)
# Option 1
print(list(cursor))
# Option 2
for document in cursor:
print(document)
Note: as arun noticed, in both cases, i.e. after you create a list out of the cursor, or iterate in the for loop, you will not be able to re-iterate over the cursor. In that case the first option becomes better, if you want to use it in future, as you can use the obtained list as much as you want, because it is in the memory already.
The reason of not being able to reiterate is that the cursor is actually on the server, and it send the data chunk-by-chunk, and after it has sent you all the data (or the server terminates) the cursor gets destroyed.

Related

Mongodb use foreach on query and update results its results

I have a mongo collection where a field is supposed to point to another document's id in the same collection, but instead it is pointing to its "number". I need to perform an update on them but I'm having some problems on forming the query. Could you help me?
The structure of the document is like this:
{
"_id": "269410e2-cebf-40f1-a81f-fdce34185cdc",
"number": 1471,
"alternativeLocationId": "9871",
"locationType": "DUMMY"
},
{
"_id": "2945b24a-b82f-45a9-ad06-a884379b5597",
"number": 9871,
"locationType": "MAIN"
}
So as asked, I'd need to make document with number 1471 "alternativeLocationId" to be "2945b24a-b82f-45a9-ad06-a884379b5597" instead of 9871 (Note that the referenced documents are not locationType "DUMMY" nor have this alternativeLocationId field).
The query I've done so far goes like this, but when executed its not doing any changes:
db.location.find({alternativeLocationId: {$exists:true}}).forEach(
function (loc) {
var correctLocation = db.location.findOne({number: loc.alternativeLocationId});
db.location.update(
{_id: loc._id},
{$set: {alternativeLocationId: correctLocation._id} }
);
}
);
As mentioned by user20042973 the issue was the type mismatch between alternativeLocationId and locationNumber, after converting the value to int when looking for it, it works perfectly.

Mongo aggregaton does not return whole result only a part of it

Can somebody tell me please where is the problem in this simple aggregation command:
db.test.aggregate([
{
$group: {
_id: "$type",
numbers: { $sum: 1 }
}
}
]).pretty()
Collection has about 2 millions of documents and everyone has type field. But the result returns only few of them as result + message "Type "it" for more" If I type "it" it returns next partial aggregation result till the end. But I want to have the whole aggregation in one result. What am I doing wrong?
Thanks.
MongoDB won't return you whole bunch of data because it has built-in pagination.
In other case (2 mil of documents) would crash your server/computer as it runs out of memory.
But, if you'd like to get all bunch of data, it's better to store it with script.
You can write script with your programming language, request db, paginate through data and store in some variable.
Example

How to force MongoDB pullAll to disregard document order

I have a mongoDB document that has the following structure:
{
user:user_name,
streams:[
{user:user_a, name:name_a},
{user:user_b, name:name_b},
{user:user_c, name:name_c}
]
}
I want to use $pullAll to remove from the streams array, passing it an array of streams (the size of the array varies from 1 to N):
var streamsA = [{user:"user_a", name:"name_a"},{user:"user_b", name:"name_b"}]
var streamsB = [{name:"name_a", user:"user_a"},{name:"name_b", user:"user_b"}]
I use the following mongoDB command to perform the update operation:
db.streams.update({name:"user_name", {"$pullAll:{streams:streamsA}})
db.streams.update({name:"user_name", {"$pullAll:{streams:streamsB}})
Removing streamsA succeeds, whereas removing streamsB fails. After digging through the mongoDB manuals, I saw that the order of fields in streamsA and streamsB records has to match the order of fields in the database. For streamsB the order does not match, that's why it was not removed.
I can reorder the streams to the database document order prior to performing an update operation, but is there an easier and cleaner way to do this? Is there some flag that can be set to update and/or pullAll to ignore the order?
Thank You,
Gary
The $pullAll operator is really a "special case" that was mostly intended for single "scalar" array elements and not for sub-documents in the way you are using it.
Instead use $pull which will inspect each element and use an $or condition for the document lists:
db.streams.update(
{ "user": "user_name" },
{ "$pull": { "streams": { "$or": streamsB } }}
)
That way it does not matter which order the fields are in or indeed look for an "exact match" as the current $pullAll operation is actually doing.

Print document value in mongodb shell

I want to print the value for this JSON document in the mongo shell. Like a simple console output, without creating a new collection or document.
Thanks in advance
I found a solution, by using .forEach() to apply a JavaScript method:
db.widget.find(
{ id : "4" },
{quality_level: 1, _id:0}
).forEach(function(x) {
print(x.quality_level);
});
db.collection.find() returns a cursor.
There are a bunch of methods for the cursor object which can be used to get the cursor information, iterate, get and work with the individual documents from the query result (in the Mongo Shell). For example,
let result = db.widget.find( { id : "4" }, { quality_level: 1, _id: 0 } );
The methods print, printjson and tojson are useful for printing the query results assigned to a cursor and iterated over it, in the Mongo Shell.
In this particular example, each of the following statements will print the expected output:
result.forEach( doc => print( doc.quality_level ) );
result.forEach( doc => print( tojson(doc)) );
result.forEach( printjson );
Note the query must be run each time the cursor is to be iterated, as the cursor gets closed after iterating thru it. But, one can use the cursor.toArray() method to get a JavaScript array, and iterate the array and work with the query result documents without running the query again.
NOTES:
The db.collection.findOne() method returns a document (not a cursor).
Iterate a Cursor in the mongo Shell
Mongo Shell cursor methods
To print each document when returning a cursor:
db.widget.find().forEach(printjson);
db.widget.findOne({"id":4},{quality_level:1,_id:0}).quality_level
The projection is not necessary for this to work, but it reduces the data which is to be transferred in case of a sharded collection.

How can I get all the doc ids in MongoDB?

How can I get an array of all the doc ids in MongoDB? I only need a set of ids but not the doc contents.
You can do this in the Mongo shell by calling map on the cursor like this:
var a = db.c.find({}, {_id:1}).map(function(item){ return item._id; })
The result is that a is an array of just the _id values.
The way it works in Node is similar.
(This is MongoDB Node driver v2.2, and Node v6.7.0)
db.collection('...')
.find(...)
.project( {_id: 1} )
.map(x => x._id)
.toArray();
Remember to put map before toArray as this map is NOT the JavaScript map function, but it is the one provided by MongoDB and it runs within the database before the cursor is returned.
One way is to simply use the runCommand API.
db.runCommand ( { distinct: "distinct", key: "_id" } )
which gives you something like this:
{
"values" : [
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
],
"stats" : {
"n" : 7,
"nscanned" : 7,
"nscannedObjects" : 0,
"timems" : 2,
"cursor" : "DistinctCursor"
},
"ok" : 1
}
However, there's an even nicer way using the actual distinct API:
var ids = db.distinct.distinct('_id', {}, {});
which just gives you an array of ids:
[
ObjectId("54cfcf93e2b8994c25077924"),
ObjectId("54d672d819f899c704b21ef4"),
ObjectId("54d6732319f899c704b21ef5"),
ObjectId("54d6732319f899c704b21ef6"),
ObjectId("54d6732319f899c704b21ef7"),
ObjectId("54d6732319f899c704b21ef8"),
ObjectId("54d6732319f899c704b21ef9")
]
Not sure about the first version, but the latter is definitely supported in the Node.js driver (which I saw you mention you wanted to use). That would look something like this:
db.collection('c').distinct('_id', {}, {}, function (err, result) {
// result is your array of ids
})
I also was wondering how to do this with the MongoDB Node.JS driver, like #user2793120. Someone else said he should iterate through the results with .each which seemed highly inefficient to me. I used MongoDB's aggregation instead:
myCollection.aggregate([
{$match: {ANY SEARCHING CRITERIA FOLLOWING $match'S RULES} },
{$sort: {ANY SORTING CRITERIA, FOLLOWING $sort'S RULES}},
{$group: {_id:null, ids: {$addToSet: "$_id"}}}
]).exec()
The sorting phase is optional. The match one as well if you want all the collection's _ids. If you console.log the result, you'd see something like:
[ { _id: null, ids: [ '56e05a832f3caaf218b57a90', '56e05a832f3caaf218b57a91', '56e05a832f3caaf218b57a92' ] } ]
Then just use the contents of result[0].ids somewhere else.
The key part here is the $group section. You must define a value of null for _id (otherwise, the aggregation will crash), and create a new array field with all the _ids. If you don't mind having duplicated ids (according to your search criteria used in the $match phase, and assuming you are grouping a field other than _id which also has another document _id), you can use $push instead of $addToSet.
Another way to do this on mongo console could be:
var arr=[]
db.c.find({},{_id:1}).forEach(function(doc){arr.push(doc._id)})
printjson(arr)
Hope that helps!!!
Thanks!!!
I struggled with this for a long time, and I'm answering this because I've got an important hint. It seemed obvious that:
db.c.find({},{_id:1});
would be the answer.
It worked, sort of. It would find the first 101 documents and then the application would pause. I didn't let it keep going. This was both in Java using MongoOperations and also on the Mongo command line.
I looked at the mongo logs and saw it's doing a colscan, on a big collection of big documents. I thought, crazy, I'm projecting the _id which is always indexed so why would it attempt a colscan?
I have no idea why it would do that, but the solution is simple:
db.c.find({},{_id:1}).hint({_id:1});
or in Java:
query.withHint("{_id:1}");
Then it was able to proceed along as normal, using stream style:
createStreamFromIterator(mongoOperations.stream(query, MortgageDocument.class)).
map(MortgageDocument::getId).forEach(transformer);
Mongo can do some good things and it can also get stuck in really confusing ways. At least that's my experience so far.
Try with an agregation pipeline, like this:
db.collection.aggregate([
{ $match: { deletedAt: null }},
{ $group: { _id: "$_id"}}
])
this gona return a documents array with this structure
_id: ObjectId("5fc98977fda32e3458c97edd")
i had a similar requirement to get ids for a collection with 50+ million rows. I tried many ways. Fastest way to get the ids turned out to be to do mongoexport with just the ids.
One of the above examples worked for me, with a minor tweak. I left out the second object, as I tried using with my Mongoose schema.
const idArray = await Model.distinct('_id', {}, function (err, result) {
// result is your array of ids
return result;
});