"too much data for sort()" on a small collection - mongodb

When trying to do a find and sort on a mongodb collection I get the error below. The collection is not large at all - I have only 28 documents and I start getting this error when I cross the limit of 23 records.
The special thing about that document is that it holds a large ArrayCollection inside but I am not fetching that specific field at all, I am only trying to get a DateTime field.
db.ANEpisodeBreakdown.find({creationDate: {$exists:true}}, {creationDate: true} ).limit(23).sort( { creationDate: 1}
{ "$err" : "too much data for sort() with no index. add an index or specify a smaller limit", "code" : 10128 }

So the problem here is a 32MB limit and you have no index that can be used as an "index only" or "covered" query to get to the result. Without that, your "big field" still gets loaded in the data to sort.
Easy to replicate;
var string = "";
for ( var n=0; n < 10000000; n++ ) {
string += 0;
}
for ( var x=0; x < 4; x++ ) {
db.large.insert({ "large": string, "date": new Date() });
sleep(1000);
}
So this query will blow up, unless you limit to 3:
db.large.find({},{ "date": 1 }).sort({ "date": -1 })
To overcome this:
Create an index on date (and other used fields) so the whole document is not loaded in your covered index query:
db.large.ensureIndex({ "date": 1 })
db.large.find({},{ "_id": 0, "date": 1 }).sort({ "date": -1 })
{ "date" : ISODate("2014-07-07T10:08:33.067Z") }
{ "date" : ISODate("2014-07-07T10:08:31.747Z") }
{ "date" : ISODate("2014-07-07T10:08:30.391Z") }
{ "date" : ISODate("2014-07-07T10:08:29.038Z") }
Don't index and use aggregate instead, as the $project there does not suffer the same limitations as the document actually gets altered before passing to $sort.
db.large.aggregate([
{ "$project": { "_id": 0, "date": 1 }},
{ "$sort": {"date": -1 }}
])
{ "date" : ISODate("2014-07-07T10:08:33.067Z") }
{ "date" : ISODate("2014-07-07T10:08:31.747Z") }
{ "date" : ISODate("2014-07-07T10:08:30.391Z") }
{ "date" : ISODate("2014-07-07T10:08:29.038Z") }
Either way gets you the results under the limit without modifying cursor limits in any way.

Without an index, the size you can use for a sort only extends over shellBatchSize which by default is 20.
DBQuery.shellBatchSize = 23;
This should do the trick.

The problem is that projection in this particular scenario still loads the entire document, it just sends it to your application without the large array field.
As such MongoDB is still sorting with too much data for its 32mb limit.

Related

Using $sum on a existent field returns a value of 0 [duplicate]

I have a collection students with documents in the following format:-
{
_id:"53fe74a866455060e003c2db",
name:"sam",
subject:"maths",
marks:"77"
}
{
_id:"53fe79cbef038fee879263d2",
name:"ryan",
subject:"bio",
marks:"82"
}
{
_id:"53fe74a866456060e003c2de",
name:"tony",
subject:"maths",
marks:"86"
}
I want to get the count of total marks of all the students with subject = "maths". So I should get 163 as sum.
db.students.aggregate([{ $match : { subject : "maths" } },
{ "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }])
Now I should get the following result-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":163}], "ok":1}
But I get-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":0}], "ok":1}
Can someone point out what I might be doing wrong here?
Your current schema has the marks field data type as string and you need an integer data type for your aggregation framework to work out the sum. On the other hand, you can use MapReduce to calculate the sum since it allows the use of native JavaScript methods like parseInt() on your object properties in its map functions. So overall you have two choices.
Option 1: Update Schema (Change Data Type)
The first would be to change the schema or add another field in your document that has the actual numerical value not the string representation. If your collection document size is relatively small, you could use a combination of the mongodb's cursor find(), forEach() and update() methods to change your marks schema:
db.student.find({ "marks": { "$type": 2 } }).snapshot().forEach(function(doc) {
db.student.update(
{ "_id": doc._id, "marks": { "$type": 2 } },
{ "$set": { "marks": parseInt(doc.marks) } }
);
});
For relatively large collection sizes, your db performance will be slow and it's recommended to use mongo bulk updates for this:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.student.initializeUnorderedBulkOp(),
counter = 0;
db.student.find({"marks": {"$exists": true, "$type": 2 }}).forEach(function (doc) {
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "marks": parseInt(doc.marks) }
});
counter++;
if (counter % 1000 === 0) {
// Execute per 1000 operations
bulk.execute();
// re-initialize every 1000 update statements
bulk = db.student.initializeUnorderedBulkOp();
}
})
// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
cursor = db.student.find({"marks": {"$exists": true, "$type": 2 }});
cursor.forEach(function (doc) {
ops.push({
"updateOne": {
"filter": { "_id": doc._id } ,
"update": { "$set": { "marks": parseInt(doc.marks) } }
}
});
if (ops.length === 1000) {
db.student.bulkWrite(ops);
ops = [];
}
});
if (ops.length > 0) db.student.bulkWrite(ops);
Option 2: Run MapReduce
The second approach would be to rewrite your query with MapReduce where you can use the JavaScript function parseInt().
In your MapReduce operation, define the map function that process each input document. This function maps the converted marks string value to the subject for each document, and emits the subject and converted marks pair. This is where the JavaScript native function parseInt() can be applied. Note: in the function, this refers to the document that the map-reduce operation is processing:
var mapper = function () {
var x = parseInt(this.marks);
emit(this.subject, x);
};
Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject.
The function reduces the valuesMarks array to the sum of its elements.
var reducer = function(keySubject, valuesMarks) {
return Array.sum(valuesMarks);
};
db.student.mapReduce(
mapper,
reducer,
{
out : "example_results",
query: { subject : "maths" }
}
);
With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:
/* 0 */
{
"_id" : "maths",
"value" : 163
}
Possible causes your sum is being returned 0 are :
The field you are summing up is not an integer but a string.
Make sure the field contains numeric values.
You are using wrong syntax of $sum.
db.c1.aggregate([{
$group: {
_id: "$item",
price: {
$sum: "$price"
},
count: {
$sum: 1
}
}
}])
Make sure you use "$price" and not "price".
One of the most silly mistake due to which this error occurs is:
Use of space or tab inside the quotes while specifying field name.
Example - "$price " won't work !!! But, "$price" would work.

Select data where the range between two different fields contains a given number

I want to make a find query on my database for documents that have an input value between or equal to these 2 fields, LOC_CEP_INI and LOC_CEP_FIM
Example: user input a number to the system with value : 69923994, then I use this input to search my database for all documents that have this value between the range of the fields LOC_CEP_INI and LOC_CEP_FIM.
One of my documents (in this example this document is selected by the query because the input is inside the range):
{
"_id" : ObjectId("570d57de457405a61b183ac6"),
"LOC_CEP_FIM" : 69923999, //this field is number
"LOC_CEP_INI" : 69900001, // this field is number
"LOC_NO" : "RIO BRANCO",
"LOC_NU" : "00000016",
"MUN_NU" : "1200401",
"UFE_SG" : "AC",
"create_date" : ISODate("2016-04-12T20:17:34.397Z"),
"__v" : 0
}
db.collection.find( { field: { $gt: value1, $lt: value2 } } );
https://docs.mongodb.com/v3.2/reference/method/db.collection.find/
refer this mongo provide range facility with $gt and $lt .
You have to invert your field names and query value.
db.zipcodes.find({
LOC_CEP_INI: {$gte: 69923997},
LOC_CEP_FIM: {$lte: 69923997}
});
For your query example to work, you would need your documents to hold an array property, and that each item in this prop hold a 69923997 prop. Mongo would then check that this 69923997 prop has a value that is both between "LOC_CEP_INI" and "LOC_CEP_FIM" for each item in your array prop.
Also I'm not sure whether you want LOC_CEP_INI <= 69923997 <= LOC_CEP_FIM or the contrary, so you might need to switch the $gte and $lte conditions.
db.zipcodes.find( {
"LOC_CEP_INI": { "$lte": 69900002 },
"LOC_CEP_FIM": { "$gte": 69900002 } })
Here is the logic use it as per the need:
Userdb.aggregate([
{ "$match": { _id: ObjectId(session._id)}},
{ $project: {
checkout_list: {
$filter: {
input: "$checkout_list",
as: "checkout_list",
cond: {
$and: [
{ $gte: [ "$$checkout_list.createdAt", new Date(date1) ] },
{ $lt: [ "$$checkout_list.createdAt", new Date(date2) ] }
]
}
}
}
}
}
Here i use filter, because of some reason data query on nested data is not gets succeed in mongodb

MongoDB :: Order Search result depend on search condition

I have a data
[{ "name":"BS",
"keyword":"key1",
"city":"xyz"
},
{ "name":"AGS",
"keyword":"Key2",
"city":"xyz1"
},
{ "name":"QQQ",
"keyword":"key3",
"city":"xyz"
},
{ "name":"BS",
"keyword":"Keyword",
"city":"city"
}]
and i need to search records which have name= "BS" OR keyword="key2" with the help of query
db.collection.find({"$OR" : [{"name":"BS"}, {"keyword":"Key2"}]});
These records i need in the sequence
[{ "name":"BS",
"keyword":"key1",
"city":"xyz"
},
{ "name":"BS",
"keyword":"Keyword",
"city":"city"
},
{ "name":"AGS",
"keyword":"Key2",
"city":"xyz1"
}]
but i am getting in following sequences:
[{ "name":"BS",
"keyword":"key1",
"city":"xyz"
},
{ "name":"AGS",
"keyword":"Key2",
"city":"xyz1"
},
{ "name":"BS",
"keyword":"Keyword",
"city":"city"
}]
Please provide some suggestion i am stuck with this problem since 2 days.
Thanks
The order of results returned by MongoDB is not guaranteed unless you explicitly sort your data using the sort function. For smaller datasets you maybe "lucky" in the sense that the results are always returned in the same order, however, for bigger datasets and in particular when you have sharded Mongo clusters this is very unlikely. As proposed by Yathish you need to explicitly order your results using the sort function. Based on the suggested output, it seems you want to sort by name in descending order so I have set the sorting flag to -1 for the field name.
db.collection.find({"$or" : [{"name":"BS"}, {"keyword":"Key2"}]}).sort({"name" : -1});
If you need a more complex sorting algorithm as specified in your comment, you can convert your results to a Javascript array and create a custom sort function. This sort function will first list documents with a name equal to "BS" and then documents containing the keyword "Key2"
db.data.find({
"$or": [{
"name": "BS"
}, {
"keyword": "Key2"
}]
}).toArray().sort(function(doc1, doc2) {
if (doc1.name == "BS" && doc2.keyword == "Key2") {
return -1
} else if (doc2.name == "BS" && doc1.keyword == "Key2") {
return 1
} else {
return doc1.name < doc2.name
}
});

Difference between documents returned from Two MongoDB queries

I am new to MongoDB and NoSQL and have the following query:
I have the following documents in a sample MongoDB Collection
Data Collection
{ "Name" : "A",
"date" : "2015-04-29"
},
{ "Name" : "B",
"date" : "2015-04-29"
},
{ "Name" : "A",
"date" : "2015-04-30"
}
I want to run a query on the comparing the dates trying to find out which name was not present on date = "2015-04-30" but was present on date = "2015-04-29".
The result of the above query would be :
{ "Name" : "B" }
Basically trying to compare results from two mongodb queries and then showing a result.
Please let me know if this would be possible to do.
I would achieve it with sets.
First take the elements that you are interested in:
var need = db.a.distinct('Name', {date: '2015-04-29'})
take the element that you are not interested:
var doNotNeed = db.a.distinct('Name', {date: '2015-04-30'})
Now if you subtract the set(need) with set(doNotNeed), you will get your answers. Mongoshell does not support sets, but you can use dictionaries:
var needSet = {}
for (var i=0; i < need.length; i++){
needSet[need[i]] = 1;
}
for (var i=0; i < doNotNeed.length; i++){
if (doNotNeed[i] in needSet){
delete needSet[doNotNeed[i]]
}
}
needSet would have all the elements that satisfy your query. You can get them by calling Object.keys(needSet).
You can do that using aggregation pipeline operators. First you will need to group your document by Name using $group. The $push is a accumulator operator that let you create and array of date field. Use $match to field document with given criteria.
db.data.aggregate(
[
{ "$group": { "_id": "$Name", "date": { "$push": "$date" }}},
{ "$match": { "date": { "$not": { "$all": ["2015-04-29", "2015-04-30" ]}}}},
]
)
I had similar issue, try this snippet
var arr=[];
db.Collection.find({"date":"2015-04-30"}).forEach(function(item){
var val=item.Name;
var found=db.Collection.findOne({"date":"2015-04-29","Name":val});
if (!found)
arr.push(val);
});
printjson(arr);

mongodb aggregate query isn't returning proper sum on using $sum

I have a collection students with documents in the following format:-
{
_id:"53fe74a866455060e003c2db",
name:"sam",
subject:"maths",
marks:"77"
}
{
_id:"53fe79cbef038fee879263d2",
name:"ryan",
subject:"bio",
marks:"82"
}
{
_id:"53fe74a866456060e003c2de",
name:"tony",
subject:"maths",
marks:"86"
}
I want to get the count of total marks of all the students with subject = "maths". So I should get 163 as sum.
db.students.aggregate([{ $match : { subject : "maths" } },
{ "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }])
Now I should get the following result-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":163}], "ok":1}
But I get-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":0}], "ok":1}
Can someone point out what I might be doing wrong here?
Your current schema has the marks field data type as string and you need an integer data type for your aggregation framework to work out the sum. On the other hand, you can use MapReduce to calculate the sum since it allows the use of native JavaScript methods like parseInt() on your object properties in its map functions. So overall you have two choices.
Option 1: Update Schema (Change Data Type)
The first would be to change the schema or add another field in your document that has the actual numerical value not the string representation. If your collection document size is relatively small, you could use a combination of the mongodb's cursor find(), forEach() and update() methods to change your marks schema:
db.student.find({ "marks": { "$type": 2 } }).snapshot().forEach(function(doc) {
db.student.update(
{ "_id": doc._id, "marks": { "$type": 2 } },
{ "$set": { "marks": parseInt(doc.marks) } }
);
});
For relatively large collection sizes, your db performance will be slow and it's recommended to use mongo bulk updates for this:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.student.initializeUnorderedBulkOp(),
counter = 0;
db.student.find({"marks": {"$exists": true, "$type": 2 }}).forEach(function (doc) {
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "marks": parseInt(doc.marks) }
});
counter++;
if (counter % 1000 === 0) {
// Execute per 1000 operations
bulk.execute();
// re-initialize every 1000 update statements
bulk = db.student.initializeUnorderedBulkOp();
}
})
// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
cursor = db.student.find({"marks": {"$exists": true, "$type": 2 }});
cursor.forEach(function (doc) {
ops.push({
"updateOne": {
"filter": { "_id": doc._id } ,
"update": { "$set": { "marks": parseInt(doc.marks) } }
}
});
if (ops.length === 1000) {
db.student.bulkWrite(ops);
ops = [];
}
});
if (ops.length > 0) db.student.bulkWrite(ops);
Option 2: Run MapReduce
The second approach would be to rewrite your query with MapReduce where you can use the JavaScript function parseInt().
In your MapReduce operation, define the map function that process each input document. This function maps the converted marks string value to the subject for each document, and emits the subject and converted marks pair. This is where the JavaScript native function parseInt() can be applied. Note: in the function, this refers to the document that the map-reduce operation is processing:
var mapper = function () {
var x = parseInt(this.marks);
emit(this.subject, x);
};
Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject.
The function reduces the valuesMarks array to the sum of its elements.
var reducer = function(keySubject, valuesMarks) {
return Array.sum(valuesMarks);
};
db.student.mapReduce(
mapper,
reducer,
{
out : "example_results",
query: { subject : "maths" }
}
);
With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:
/* 0 */
{
"_id" : "maths",
"value" : 163
}
Possible causes your sum is being returned 0 are :
The field you are summing up is not an integer but a string.
Make sure the field contains numeric values.
You are using wrong syntax of $sum.
db.c1.aggregate([{
$group: {
_id: "$item",
price: {
$sum: "$price"
},
count: {
$sum: 1
}
}
}])
Make sure you use "$price" and not "price".
One of the most silly mistake due to which this error occurs is:
Use of space or tab inside the quotes while specifying field name.
Example - "$price " won't work !!! But, "$price" would work.