MongoDB sorting returns null instead of data

My MongoDB documents look like this:
{
"_id" : ObjectId("5a27cc4783800a0b284c7f62"),
"action" : "1",
"silent" : "0",
"createdate" : ISODate("2017-12-06T10:53:59.664Z"),
"__v" : 0
}
Now I need to find the documents whose action value is 1 and silent value is 0, and all returned data must be in descending order of createdate.
My MongoDB query is:
db.collection.find({'action': 1, 'silent': 0}).sort({createdate: -1}).exec(function(err, post) {
console.log(post.length);
});
Earlier it worked fine, but now that the collection has 121,000 documents, it returns null.
I suspect the problem is related to .sort().
If I remove the sort, everything works. For example:
db.collection.find({'action': 1, 'silent': 0}).exec(function(err, post) {
console.log(post.length); // now it returns data, but not in descending order
});

MongoDB limits the amount of data it will attempt to sort without an index (32 MB in the MongoDB versions discussed on this page).
This is because MongoDB has to sort the data in memory or on disk, both of which can be expensive operations, particularly for queries run frequently.
In most cases, this can be alleviated by creating indexes on the fields you sort on.
You can create the index with:
db.myColl.createIndex( { createdate: 1 })
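To make the point about indexes concrete, here is a toy model in plain JavaScript (not MongoDB's implementation): a blocking sort has to buffer every matching document before it can return the first one, so its memory use grows with the collection until it hits a cap, while walking an index that is already ordered by createdate needs no buffer and can be read in either direction. The names blockingSort, buildIndex, indexedScan and the byte limit are hypothetical, and JSON.stringify length is only a rough stand-in for BSON size. When the real server rejects such a sort, a callback like the one in the question most likely receives the error in err with post left null, so checking err first would surface the actual message.

```javascript
// Toy model only: limitBytes stands in for the server's cap on in-memory sorts.
function blockingSort(docs, field, direction, limitBytes) {
  const buffer = [];
  let used = 0;
  for (const doc of docs) {
    used += JSON.stringify(doc).length; // rough size estimate, not real BSON size
    if (used > limitBytes) {
      throw new Error(`sort buffer of ${used} bytes exceeds limit of ${limitBytes} bytes`);
    }
    buffer.push(doc);
  }
  // Nothing can be returned until every matching document is buffered, then sorted.
  return buffer.sort((a, b) => direction * (a[field] - b[field]));
}

// A toy "index": [key, position] pairs kept sorted ascending by key.
function buildIndex(docs, field) {
  return docs.map((doc, pos) => [doc[field], pos]).sort((a, b) => a[0] - b[0]);
}

// Walking the index yields documents already in key order, one at a time, with no buffer.
// A descending sort just walks the same ascending index backwards.
function* indexedScan(docs, index, direction) {
  if (direction === 1) {
    for (let i = 0; i < index.length; i++) yield docs[index[i][1]];
  } else {
    for (let i = index.length - 1; i >= 0; i--) yield docs[index[i][1]];
  }
}

// Usage: 1000 small documents with increasing createdate values.
const posts = [];
for (let i = 0; i < 1000; i++) {
  posts.push({ n: i, action: "1", silent: "0", createdate: new Date(Date.UTC(2017, 11, 6) + i * 1000) });
}
const newestFirst = [...indexedScan(posts, buildIndex(posts, "createdate"), -1)];
console.log(newestFirst[0].n); // 999
try {
  blockingSort(posts, "createdate", -1, 10000);
} catch (err) {
  console.log(err.message);
}
```

This is also why a single-field index on createdate with direction 1 can serve sort({createdate: -1}): the same ordered structure is simply traversed in reverse.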

Related

Issue with Cosmos DB collection order

I'm trying to order my collection using the following query:
db.getCollection('trip').find().sort({'itinerary.0.timestamp': 1})
The result is not sorted correctly. However, when I exported the full collection to a local MongoDB database, the same query worked like a charm. To perform that sort in Cosmos DB, I had to create the index 'itinerary.0.timestamp'.
Data example:
{
"_id" : ObjectId("6087104ca68f171ce7715448"),
"tripId" : NumberLong(38533184),
"itinerary" : [
{
"transId" : NumberLong(39800097),
"timestamp" : NumberLong(1619372446291)
},
{
"transId" : NumberLong(39800576),
"timestamp" : NumberLong(1619372446321)
},
],
"results" : [],
"tripTimeSent" : ISODate("2021-04-29T14:44:53.253Z")
}
What am I missing?
The solution was to create a new top-level field, itiTimestamp, outside the array, holding the value of 'itinerary.0.timestamp', and then sort by itiTimestamp.
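The workaround can be sketched in plain JavaScript: copy the first itinerary element's timestamp into a top-level itiTimestamp field (the field name comes from the answer), then sort on that plain field. In the real collection the copy would have to be written back with an update and kept in sync whenever itinerary changes; this sketch only works on an in-memory array, and the helper names addItiTimestamp and sortByItiTimestamp are hypothetical.

```javascript
// Copy itinerary[0].timestamp into a top-level field so the sort key is not inside an array.
function addItiTimestamp(trip) {
  const first = Array.isArray(trip.itinerary) ? trip.itinerary[0] : undefined;
  return { ...trip, itiTimestamp: first ? first.timestamp : null };
}

// Sort ascending by the new top-level field; trips without an itinerary go last.
function sortByItiTimestamp(trips) {
  return trips
    .map(addItiTimestamp)
    .sort((a, b) => {
      if (a.itiTimestamp === null) return b.itiTimestamp === null ? 0 : 1;
      if (b.itiTimestamp === null) return -1;
      return a.itiTimestamp - b.itiTimestamp;
    });
}

const trips = [
  { tripId: 2, itinerary: [{ transId: 20, timestamp: 1619372446321 }] },
  { tripId: 1, itinerary: [{ transId: 10, timestamp: 1619372446291 }, { transId: 11, timestamp: 1619372446999 }] },
  { tripId: 3, itinerary: [] },
];
console.log(sortByItiTimestamp(trips).map((t) => t.tripId)); // [ 1, 2, 3 ]
```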
It's true that you need to create an index on the sort field. Here's the relevant documentation:
To apply a sort to a query, you must create an index on the fields used in the sort operation.
==========================================
I've tested this on my side: after creating a wildcard index on itinerary, the sort query executed but still did not sort correctly. I also tried the approaches from this answer (new BasicDBObject("labels.0.value", 1)) and this one (db.testCollection.find().sort({"someArray.0": 1})); none of them work for the data format the OP provided.
But when I added a property "score": [20, 55, 80] to each document in the collection, I found that sorting by score directly did sort by the first item.
I assume that this feature isn't supported yet.
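One possible explanation for the score observation, based on MongoDB's documented comparison rules rather than on anything tested here: when sorting on an array field, an ascending sort compares the arrays' smallest elements and a descending sort compares their largest elements. For an already ascending array like [20, 55, 80], the smallest element happens to be the first one, so it looks like a sort by the first element. Whether Cosmos DB's MongoDB API follows the same rule is an assumption. The sketch below models that rule for numeric arrays in plain JavaScript; arraySortKey and sortByArrayField are hypothetical names.

```javascript
// Models MongoDB's documented rule for sorting on an array field:
// an ascending sort compares minimum elements, a descending sort compares maximum elements.
function arraySortKey(value, direction) {
  if (!Array.isArray(value)) return value;
  return direction === 1 ? Math.min(...value) : Math.max(...value);
}

function sortByArrayField(docs, field, direction) {
  return [...docs].sort(
    (a, b) => direction * (arraySortKey(a[field], direction) - arraySortKey(b[field], direction))
  );
}

const docs = [
  { name: "x", score: [20, 55, 80] },
  { name: "y", score: [10, 90] },
  { name: "z", score: [40, 5] },
];
// Ascending uses minimums 20, 10, 5, giving z, y, x (a first-element sort would give y, x, z).
console.log(sortByArrayField(docs, "score", 1).map((d) => d.name));
// Descending uses maximums 80, 90, 40, giving y, x, z.
console.log(sortByArrayField(docs, "score", -1).map((d) => d.name));
```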

Overflow sort stage buffered data usage

We have a MongoDB 2.6.4 replica set running and are trying to diagnose this behavior. We are getting the error "Runner error: Overflow sort stage buffered data usage of 33598393 bytes exceeds internal limit of 33554432 bytes" when we expect that we would not. The collection has millions of records and has a compound index that includes the key being sorted. As an example,
the index looks like this:
{ from: 1, time : -1, otherA : 1, otherB : 1}
Our find is:
db.collection.find({ from : { $in : ["a", "b"] }, time : { $gte : timestamp },
otherA : {$in:[...]}, otherB : {$in:[...]}})
.sort({ time : -1 })
MongoDB splits this query into parallel clauses like this:
{ from : a }, { time : { $gte : timestamp }, ... }
{ from : b }, { time : { $gte : timestamp }, ... }
In the explain output, each clause reports scanAndOrder : false, which implies that the index was used to return the results in order. This all seems fine; however, the MongoDB client gets the "Runner error: Overflow sort stage buffered data usage" error, which seems to imply that the sort was done in memory. Is this because MongoDB does an in-memory merge sort of the clauses? Or is there some other reason this error could occur?
I was also facing the same problem of memory overflow.
I am using PHP with MongoDB to manipulate Documents.
When I access a collection that has a large set of documents, it throws an error.
According to the following link, MongoDB can sort only up to 32 MB of data in memory at a time:
http://docs.mongodb.org/manual/reference/limits/#Sorted-Documents
So, following the description of sorting in the MongoDB docs, I converted the MongoCursor object to an array and sorted it in PHP rather than with MongoDB's sort() method.
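The same idea in JavaScript rather than PHP, as a minimal sketch: fetch the matching documents without sort(), then sort the resulting array in the application. This trades the server's sort limit for client memory, since the whole result set has to fit in the application process. sortDescendingBy is a hypothetical helper, and the input array stands in for whatever the driver returns when the cursor is converted to an array.

```javascript
// Sort an array of documents in the application instead of with the server's sort().
// Returns a new array; the input is left untouched.
function sortDescendingBy(docs, field) {
  return [...docs].sort((a, b) => {
    if (a[field] > b[field]) return -1;
    if (a[field] < b[field]) return 1;
    return 0;
  });
}

// Stand-in for documents already fetched from the server.
const fetched = [
  { _id: 1, createdate: new Date("2017-12-06T10:53:59.664Z") },
  { _id: 2, createdate: new Date("2017-12-07T08:00:00.000Z") },
  { _id: 3, createdate: new Date("2017-12-05T23:59:59.000Z") },
];
console.log(sortDescendingBy(fetched, "createdate").map((d) => d._id)); // [ 2, 1, 3 ]
```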
Hope it helps.

MongoDB: phantom records in $unwind results under heavy load

I have a simple collection of documents like this:
{_id: n, xs: [...]}
I'm trying to count the total number of elements across all arrays:
db.testRace.aggregate([{ $unwind : "$xs" }, { $group : { _id : null, count : { $sum : 1 } } }])
It works great until I start doing massive updates on this collection. Under a heavy load of update operations I get a wrong total, slightly bigger than it should be.
It can be easily reproduced.
First, generate some test data:
for(var i = 1; i <= 1000000; i++) {
db.testRace.insert({_id: i, xs: [i]});
}
Then simulate a lot of updates:
while(true) {
var id = Math.floor((Math.random() * 1000000) + 1);
var obj = db.testRace.find({_id: id}).next();
obj.some="change";
db.testRace.update({_id: id}, obj);
}
While that is running, run the $unwind aggregation query.
Without load I get the right result, 1000000. But under heavy updates I get bigger numbers, like 1001456.
And if I run a query like this:
db.testRace.aggregate([{ $unwind : "$xs" }, {$group: {_id:"$xs", count:{$sum: 1}}}, { $sort : { count : -1 } }, { $limit : 2 }]);
I get:
"result" : [
{
"_id" : 996972,
"count" : 2
},
{
"_id" : 997789,
"count" : 2
}
],
So it seems the aggregation counts some records twice.
Is this expected behaviour, or am I doing the aggregation wrong?
I tested on a local MongoDB instance, version 2.4.9.
It's expected behavior due to the way MongoDB handles read isolation. When you have a long-running query (and an aggregation that reads every single document is a long-running query) and the data is updated during the query, the updates may affect whether or not the updated data is returned: depending on what happens when, you could miss a document, receive it once, or receive it twice.
From the source code:
Any data inserted, deleted, or modified during a yield that should be
returned by a query may or may not be returned by that query. The
query could return: nothing; the data before; the data after; or both
the data before and the data after.
In short, there is no isolation between a query and an
insert/delete/update. AKA, READ_UNCOMMITTED.
https://github.com/mongodb/mongo/blob/master/src/mongo/db/exec/plan_stage.h
Your aggregation query is yielding mid query, during which some of the data is updated. This impacts the results of the query.
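A toy simulation can make the "receive it twice" case concrete. It is a deliberately simplified model, not MongoDB's storage engine: documents sit in an array that a scan walks by position, the scan yields partway through, and an update that grows a document relocates it to the end of the array. That relocation is an assumption, loosely mirroring how older (MMAPv1-era) storage could move a document that outgrew its space, as the "some" field added in the question's update loop does. A document the scan has already counted is then met again, so the total comes out one too high. countWithConcurrentMove is a hypothetical name.

```javascript
// Toy model of a collection scan that yields midway while an update relocates a document.
// "storage" is an array walked by position; each document is { _id, xs }.
function countWithConcurrentMove(storage, yieldAt, movedId) {
  let count = 0;
  let yielded = false;
  for (let pos = 0; pos < storage.length; pos++) {
    if (!yielded && pos === yieldAt) {
      yielded = true;
      // While the scan is yielded, an update grows `movedId`; in this model a grown
      // document is removed from its old slot and appended at the end of storage.
      const oldPos = storage.findIndex((d) => d._id === movedId);
      if (oldPos !== -1) {
        const [doc] = storage.splice(oldPos, 1);
        storage.push({ ...doc, some: "change" });
        if (oldPos < pos) pos--; // later documents shifted one slot left
      }
    }
    count += storage[pos].xs.length; // same total as $unwind + $group count
  }
  return count;
}

const storage = [];
for (let i = 1; i <= 10; i++) storage.push({ _id: i, xs: [i] });
// Document 2 is counted before the yield at position 5, moved to the end, then counted again.
console.log(countWithConcurrentMove(storage, 5, 2)); // 11
```

Moving a document the scan has not reached yet (for example _id 8 with the same yield point) still counts it exactly once, which matches the "may or may not be returned" wording in the quoted source comment.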

MongoDB mapReduce doesn't return correct results in a sharded cluster

Interestingly, mapReduce works fine on a single instance, but not on a sharded collection. As shown below, I have a collection and wrote a simple map-reduce function:
mongos> db.tweets.findOne()
{
"_id" : ObjectId("5359771dbfe1a02a8cf1c906"),
"geometry" : {
"type" : "Point",
"coordinates" : [
131.71778292855996,
0.21856835860911106
]
},
"type" : "Feature",
"properties" : {
"isflu" : 1,
"cell_id" : 60079,
"user_id" : 35,
"time" : ISODate("2014-04-24T15:42:05.048Z")
}
}
mongos> db.tweets.find({"properties.user_id":35}).count()
44247
mongos> map_flow
function () { var key=this.properties.user_id; var value={ "cell_id":1}; emit(key,value); }
mongos> reduce2
function (key,values){ var ros={flows:[]}; values.forEach(function(v){ros.flows.push(v.cell_id);});return ros;}
mongos> db.tweets.mapReduce(map_flow,reduce2, { out:"flows2", sort:{"properties.user_id":1,"properties.time":1} })
But the results are not what I want:
mongos> db.flows2.find({"_id":35})
{ "_id" : 35, "value" : { "flows" : [ null, null, null ] } }
I get lots of nulls, and interestingly, every result has exactly three of them.
Does MongoDB mapReduce not work correctly on sharded collections?
The number one rule of MapReduce is:
thou shall emit the value of the same type as reduce function returneth
You broke this rule, so your MapReduce only works for a small collection where reduce is called only once per key (that's the second rule of MapReduce: the reduce function may be called zero, one, or many times).
Your map function emits exactly this value {cell_id:1} for each document.
How does your reduce function use this value? Well, you return a value which is a document with an array, into which you push the cell_id value. This is strange already, because that value was 1, so I'm not sure why you wouldn't just emit 1 (if you wanted to count).
But look what happens when multiple shards push a bunch of 1's into this flows array (whether or not it's what you intended, that's what your code is doing) and reduce is then called on several already reduced values:
reduce(key, [ {flows:[1,1,1,1]},{flows:[1,1,1,1,1,1,1,1,1]}, etc ] )
Your reduce function now tries to take each member of the values array (which is a document with a single field flows) and you push v.cell_id to your flows array. There is no cell_id field here, so of course you end up with null. And three nulls could be because you have three shards?
I would recommend that you articulate to yourself what exactly you are trying to aggregate in this code, and then rewrite your map and your reduce to comply with the rules that mapReduce in MongoDB expects your code to follow.
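The rules above can be checked outside MongoDB with a small plain-JavaScript harness. reduceInStages is a hypothetical stand-in for the server re-reducing partial results (for example, one partial result per shard); it is not the mapReduce API. The question's reduce loses its data once it is fed its own output, while a pair that emits and returns the same shape, { flows: [...] }, gives the same contents whether reduce runs once or in stages. Emitting this.properties.cell_id instead of the constant 1 is an assumption about what the question intended.

```javascript
// The question's reduce: assumes every value looks like { cell_id: ... }.
function reduceOriginal(key, values) {
  const ros = { flows: [] };
  values.forEach((v) => ros.flows.push(v.cell_id));
  return ros;
}

// A fixed pair: map emits { flows: [cell_id] } and reduce returns that same shape.
function emitValue(doc) {
  return { flows: [doc.properties.cell_id] };
}
function reduceFixed(key, values) {
  const ros = { flows: [] };
  values.forEach((v) => ros.flows.push(...v.flows));
  return ros;
}

// Hypothetical harness: reduce each shard's values, then reduce the partial results.
function reduceInStages(reduce, key, shards) {
  const partials = shards.map((values) => reduce(key, values));
  return reduce(key, partials);
}

// Original pair: map emits { cell_id: 1 }; three shards each return a partial result.
const origShards = [[{ cell_id: 1 }, { cell_id: 1 }], [{ cell_id: 1 }], [{ cell_id: 1 }]];
const broken = reduceInStages(reduceOriginal, 35, origShards);
// broken.flows holds three undefined entries (shown as null in the question's output).
console.log(broken.flows.length); // 3

// Fixed pair: the result is the same whether reduce runs once or in stages.
// MongoDB does not guarantee the order of values, so only the contents should be relied on.
const emitted = [60079, 60080, 60081, 60082].map((cell_id) =>
  emitValue({ properties: { user_id: 35, cell_id } })
);
const once = reduceFixed(35, emitted);
const staged = reduceInStages(reduceFixed, 35, [emitted.slice(0, 1), emitted.slice(1, 3), emitted.slice(3)]);
console.log(once.flows, staged.flows);
```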

Get position of selected document in collection [mongoDB]

How do I get the position (index) of a selected document in a MongoDB collection?
E.g.
this document: db.myCollection.find({"id":12345})
has index 3 in myCollection
myCollection:
id: 12340, name: 'G'
id: 12343, name: 'V'
id: 12345, name: 'A'
id: 12348, name: 'N'
If you need the position of the document irrespective of any order, that is not possible, as MongoDB does not store documents in any specific order.
However, if you want to know the index based on some field, say _id, you can use this method.
If your _id field strictly auto-increments, you can count all the documents whose value is less than that _id, say n; then n + 1 is the index of the document based on _id.
n = db.myCollection.find({"id": { "$lt" : 12345}}).count() ;
This would also be valid if documents are deleted from the collection.
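The counting idea can be sketched on an in-memory array: a document's 1-based position is the number of documents with a smaller key, plus one. This mirrors the find({"id": {"$lt": ...}}).count() + 1 approach above and shows why deletions don't break it, since the count is taken against whatever documents currently exist. positionByKey is a hypothetical helper, and keys are assumed to be unique.

```javascript
// 1-based position of `target` when documents are ordered ascending by `field`.
// Same idea as: db.myCollection.find({ id: { $lt: target } }).count() + 1
function positionByKey(docs, field, target) {
  const smaller = docs.filter((doc) => doc[field] < target).length;
  return smaller + 1;
}

const myCollection = [
  { id: 12340, name: "G" },
  { id: 12343, name: "V" },
  { id: 12345, name: "A" },
  { id: 12348, name: "N" },
];
console.log(positionByKey(myCollection, "id", 12345)); // 3

// After deleting id 12340, the same count now gives position 2.
const afterDelete = myCollection.filter((doc) => doc.id !== 12340);
console.log(positionByKey(afterDelete, "id", 12345)); // 2
```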
As far as I know, there is no single command to do this, and it is impossible in the general case (see Derick's answer). However, using count() on a query over an ordered id field seems to work. Warning: this assumes that there is a reliably ordered field, which is difficult to achieve with concurrent writers. In this example _id is used; however, this will only work with a single writer:
MongoDB shell version: 2.0.1
connecting to: test
> use so_test
switched to db so_test
> db.example.insert({name: 'A'})
> db.example.insert({name: 'B'})
> db.example.insert({name: 'C'})
> db.example.insert({name: 'D'})
> db.example.insert({name: 'E'})
> db.example.insert({name: 'F'})
> db.example.find()
{ "_id" : ObjectId("4fc5f040fb359c680edf1a7b"), "name" : "A" }
{ "_id" : ObjectId("4fc5f046fb359c680edf1a7c"), "name" : "B" }
{ "_id" : ObjectId("4fc5f04afb359c680edf1a7d"), "name" : "C" }
{ "_id" : ObjectId("4fc5f04dfb359c680edf1a7e"), "name" : "D" }
{ "_id" : ObjectId("4fc5f050fb359c680edf1a7f"), "name" : "E" }
{ "_id" : ObjectId("4fc5f053fb359c680edf1a80"), "name" : "F" }
> db.example.find({_id: ObjectId("4fc5f050fb359c680edf1a7f")})
{ "_id" : ObjectId("4fc5f050fb359c680edf1a7f"), "name" : "E" }
> db.example.find({_id: {$lte: ObjectId("4fc5f050fb359c680edf1a7f")}}).count()
5
>
This should also be fairly fast if the queried field is indexed. The example is in the mongo shell, but count() should be available in all driver libraries as well.
This method might be very slow, but it is straightforward. You can pass a normal query here. I simply loop over all the documents and check each one against a condition to find the record. Here I check the _id field, but you can use any other field or combination of fields.
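var docIndex = 0;
var found = false;
db.url_list.find({},{"_id":1}).forEach(function(doc){
// cursor.forEach cannot be stopped early (return false only exits this callback), so skip once found
if (found) return;
docIndex++;
if("5801ed58a8242ba30e8b46fa"==doc["_id"]){
print('document position is...' + docIndex);
found = true;
}
});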
There is no way for MongoDB to return this, as it does not keep documents in any order in the database, just as MySQL, for example, doesn't number its rows.
The ObjectId trick from jhonkola will only work if a single client creates new elements, as ObjectIds are generated on the client side, with the first part being a timestamp. There is no guaranteed order if different clients talk to the same server. Even so, I would not rely on this.
I also don't quite understand what you are trying to do, so perhaps you could explain that in your question? I can then update the answer.
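To see what the ObjectId trick relies on: the first 4 bytes (8 hex characters) of an ObjectId are a timestamp in seconds since the Unix epoch, and the remaining bytes identify the generating process (machine and process id in older versions, a random value in newer ones) followed by a counter. Two ids created in the same second by different clients are therefore ordered by those trailing bytes, not by insertion time. The sketch decodes the timestamps of two ids from jhonkola's shell session above; objectIdSeconds is a hypothetical helper, and the last two ids are made up for illustration. Comparing lowercase hex strings of equal length gives the same order as comparing the raw bytes.

```javascript
// Seconds since the Unix epoch, taken from the first 4 bytes (8 hex chars) of an ObjectId.
function objectIdSeconds(hex) {
  return parseInt(hex.slice(0, 8), 16);
}

// Ids of documents A and E from the shell session.
const idA = "4fc5f040fb359c680edf1a7b";
const idE = "4fc5f050fb359c680edf1a7f";
console.log(new Date(objectIdSeconds(idA) * 1000).toISOString()); // 2012-05-30T10:02:40.000Z
console.log(objectIdSeconds(idE) - objectIdSeconds(idA)); // 16

// Hypothetical ids from two clients in the same second: the one inserted later
// can still sort first, because only the trailing bytes differ.
const fromClient2First = "4fc5f050aaaaaaaaaa000001";
const fromClient1Later = "4fc5f0500000000000000002";
console.log(fromClient1Later < fromClient2First); // true
```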
Restructure your collection to include the position of each entry, i.e. {'id': 12340, 'name': 'G', 'position': 1}, and then query the collection (myCollection) using the desired position.
The queries I use that return the entire collection all use sort() to get a reproducible order; find().sort().forEach() works with the script above to get the correct index.
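A minimal sketch of the restructuring suggested above (a stored position field), done on an in-memory array: sort by the field that defines the order you care about (here id, as in the question), which also gives the reproducible order the last comment recommends, then store a 1-based position on each document so it can be queried directly. The position field name comes from that answer; assignPositions is hypothetical. Inserts or deletes in the middle of the order mean positions have to be recomputed.

```javascript
// Sort by `field` and store a 1-based `position` on each document.
function assignPositions(docs, field) {
  return [...docs]
    .sort((a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0))
    .map((doc, i) => ({ ...doc, position: i + 1 }));
}

const myCollection = [
  { id: 12348, name: "N" },
  { id: 12340, name: "G" },
  { id: 12345, name: "A" },
  { id: 12343, name: "V" },
];
const withPositions = assignPositions(myCollection, "id");
// Equivalent of querying { position: 3 } once the field is stored.
console.log(withPositions.find((doc) => doc.position === 3)); // { id: 12345, name: 'A', position: 3 }
```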