MongoDB aggregation query

I am using MongoDB 2.6.4 and am still getting this error:
uncaught exception: aggregate failed: {
    "errmsg" : "exception: aggregation result exceeds maximum document size (16MB)",
    "code" : 16389,
    "ok" : 0,
    "$gleStats" : {
        "lastOpTime" : Timestamp(1422033698000, 105),
        "electionId" : ObjectId("542c2900de1d817b13c8d339")
    }
}
Reading various pieces of advice, I came across the suggestion of saving the result to another collection using $out. My query now looks like this:
db.audit.aggregate([
    { $match: { "date": { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                          $lt: ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind: "$data.items" },
    { $out: "tmp" }
])
But now I am getting a different error:
uncaught exception: aggregate failed: {
    "errmsg" : "exception: insert for $out failed: { lastOp: Timestamp 1422034172000|25, connectionId: 625789, err: \"insertDocument :: caused by :: 11000 E11000 duplicate key error index: duties_and_taxes.tmp.agg_out.5.$_id_ dup key: { : ObjectId('54c12d784c1b2a767b...\", code: 11000, n: 0, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1422034172000|25, electionId: ObjectId('542c2900de1d817b13c8d339') } }",
    "code" : 16996,
    "ok" : 0,
    "$gleStats" : {
        "lastOpTime" : Timestamp(1422034172000, 26),
        "electionId" : ObjectId("542c2900de1d817b13c8d339")
    }
}
Does anyone have a solution?

The error is caused by the $unwind step in your pipeline.
When you unwind an array field with n elements, n copies of the document are produced, all sharing the same _id; each copy holds one of the elements of the array that was unwound. See the demonstration below of the records after an unwind operation.
Sample demo:
> db.t.insert({"a":[1,2,3,4]})
WriteResult({ "nInserted" : 1 })
> db.t.aggregate([{$unwind:"$a"}])
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 1 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 2 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 3 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 4 }
>
Since all these documents share the same _id, you get a duplicate key exception (caused by the same value in the _id field for all the unwound documents) when they are inserted into the new collection named tmp.
The pipeline will fail to complete if the documents produced by the pipeline would violate any unique indexes, including the index on the _id field of the original output collection.
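If you want to keep the $out stage, one option (a sketch, not part of the original answer) is to drop the duplicated _id before writing, so that $out inserts documents with freshly generated _ids. The fields kept in the $project are an assumption about your schema:
db.audit.aggregate([
    { $match: { "date": { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                          $lt: ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind: "$data.items" },
    // exclude _id so $out generates a new ObjectId per unwound document;
    // "date" and "data" are assumed to be the fields you need to keep
    { $project: { _id: 0, date: 1, data: 1 } },
    { $out: "tmp" }
])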
To solve your original problem, you can set the allowDiskUse option to true. It allows the operation to use disk space whenever it needs to.
Optional. Enables writing to temporary files. When set to true, aggregation operations can write data to the _tmp subdirectory in the dbPath directory. See Perform Large Sort Operation with External Sort for an example.
as in:
db.audit.aggregate([
    { $match: { "date": { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                          $lt: ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind: "$data.items" }  // note: the pipeline ends here
],
{
    allowDiskUse: true
});


MongoDB: getting "The positional operator did not find the match needed from the query" on a simple $push

I have this simple update API invocation. This is my document:
{
    "_id" : ObjectId("577a5b9a89xxx32a1"),
    "oid" : {
        "a" : 0,
        "b" : 0,
        "c" : NumberLong("1260351143035")
    },
    "sessions" : [
        {
        }
    ]
}
Then I try to insert one element into the sessions array:
db.getCollection('CustomerInfo').update({"oid.c":1260351143035},{$push:{"sessions.$.asessionID":"test123"}})
but I'm getting this error:
WriteResult({
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
        "code" : 16837,
        "errmsg" : "The positional operator did not find the match needed from the query. Unexpanded update: sessions.$.asessionID"
    }
})
Using $set I'm getting the same error.
As the error implies,
"The positional operator did not find the match needed from the query. Unexpanded update: sessions.$.asessionID"
the positional operator only works if the array to be updated is also part of the query. In your case, the query only involves the embedded document oid. The update operator best suited to your case is $set.
You can include the sessions array in the query, for example:
db.getCollection('CustomerInfo').update(
    {
        "oid.c": 1260351143035,
        "sessions.0": {} // matches where the first element of sessions is an empty document
        /* "sessions.0": { "$exists": true } // matches where the first element of sessions exists */
    },
    {
        "$set": { "sessions.$.asessionID": "test123" }
    }
)
As the documentation says, you can do the following:
db.getCollection('CustomerInfo').update(
    { "oid.c": 1260351143035 },
    { $push: {
        "sessions": {
            "asessionID": "test123"
        }
    } }
)
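Note that this $push appends a new element to sessions rather than filling in the existing empty one. Assuming the original document above, the result would look roughly like this (illustrative):
{
    "_id" : ObjectId("577a5b9a89xxx32a1"),
    "oid" : { "a" : 0, "b" : 0, "c" : NumberLong("1260351143035") },
    "sessions" : [
        { },
        { "asessionID" : "test123" }
    ]
}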

MongoDB: updating the value of an inner BSON document

I have a document in MongoDB whose schema looks like this:
{
    "_id" : ObjectId("572f88424de8c74a69d4558c"),
    "storecode" : "ABC",
    "credit" : true,
    "group" : [
        {
            "group_name" : "Frequent_Buyer",
            "time" : NumberLong("1462732865712")
        }
    ]
}
I want to add an _id field to the first object in the group array so that it looks like this:
{
    "_id" : ObjectId("572f88424de8c74a69d4558c"),
    "storecode" : "ABC",
    "credit" : true,
    "group" : [
        {
            "group_name" : "Frequent_Buyer",
            "time" : NumberLong("1462732865712"),
            "_id" : "573216fee4430577cf35e885"
        }
    ]
}
When I try this code it fails:
db.customer.update(
    { "_id": ObjectId("572f88424de8c74a69d4558c") },
    { "$set": { "group.$._id": "573216fee4430577cf35e885" } }
)
WriteResult({
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
        "code" : 16837,
        "errmsg" : "The positional operator did not find the match needed from the query. Unexpanded update: group.$._id"
    }
})
However, if I adjust the code slightly and add an extra query criterion, it works:
db.customer.update(
    { "_id": ObjectId("572f88424de8c74a69d4558c"), "group.group_name": "Frequent_Buyer" },
    { "$set": { "group.$._id": "573216fee4430577cf35e885" } }
)
Results:
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
Why did the first command not work but the second command work?
This is the expected result. To use the positional $ update operator, the array field must appear as part of the query document as mentioned in the documentation.
When used with update operations, e.g. db.collection.update() and db.collection.findAndModify(),
the positional $ operator acts as a placeholder for the first element that matches the query document, and
the array field must appear as part of the query document.
For example, if you don't want to filter your documents using group_name, simply add group: { "$exists": true } or "group.0": { "$exists": true } to your query criteria. Your query will then look like this:
db.customer.updateOne(
    {
        "_id": ObjectId("572f88424de8c74a69d4558c"),
        "group.0": { "$exists": true }
    },
    { "$set": { "group.$._id": "573216fee4430577cf35e885" } }
)
Last but not least, you should be using updateOne or updateMany, because update is deprecated in the official language drivers.
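On MongoDB 3.6 and newer you can also avoid putting the array in the query filter altogether by using the filtered positional operator $[<identifier>] with arrayFilters. A minimal sketch (not from the original answers), reusing the document from this question:
// Requires MongoDB 3.6+: update the matching group element without
// referencing the array in the query filter.
db.customer.updateOne(
    { "_id": ObjectId("572f88424de8c74a69d4558c") },
    { "$set": { "group.$[g]._id": "573216fee4430577cf35e885" } },
    { arrayFilters: [ { "g.group_name": "Frequent_Buyer" } ] }
)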

Get back BulkWriteError from the Mongo shell

I want to store the writeErrors documents in another collection in MongoDB while performing a bulk.execute(). I am basically doing a bulk insert/update, but I want to capture all the errors into another collection alongside the bulk operation.
I can see the BulkWriteError object being returned in the Mongo shell, and I can also see the writeErrors array in that object. But how can I capture it?
According to https://github.com/mongodb/mongo/blob/master/src/mongo/shell/bulk_api.js (line 363):
// Bulk errors are basically bulk results with additional error information
BulkWriteResult.apply(this, arguments);
So you can use the BulkWriteResult.getWriteErrors() method.
try {
    bulk.execute();
    ...
} catch (err) {
    if ("name" in err && err.name == 'BulkWriteError') {
        var wErrors = err.getWriteErrors();
        wErrors.forEach(function (doc) {
            db.errlog.insert(doc);
        });
    }
}
"I can see the BulkWriteError object is returned in Mongo-Shell"
It is not returned: it is a raised exception. You need a try...catch block to get hold of it:
> bulk = db.w.initializeUnorderedBulkOp();
> bulk.insert({_id: 1})
> bulk.insert({_id: 1})
> try { result = bulk.execute() } catch(e) { err = e }
> err
BulkWriteError({
    "writeErrors" : [
        {
            "index" : 1,
            "code" : 11000,
            "errmsg" : "E11000 duplicate key error index: test.w.$_id_ dup key: { : 1.0 }",
            "op" : {
                "_id" : 1
            }
        }
    ],
    "writeConcernErrors" : [ ],
    "nInserted" : 1,
    "nUpserted" : 0,
    "nMatched" : 0,
    "nModified" : 0,
    "nRemoved" : 0,
    "upserted" : [ ]
})
Surprisingly enough, it is rather painful to store the BulkWriteError in a collection. One easy (though not necessarily elegant) way of doing it is to parse the JSON representation of the error to get back the field(s) that interest you.
> db.errlog.insert(JSON.parse(err.tojson()).writeErrors)
//                 ^^^^^^^^^^^^^^^^^^^^^^^^
//                 parse the JSON representation of `BulkWriteError`
That way, you get back the array of write errors, which insert will happily store in the collection:
> db.errlog.find().pretty()
{
    "_id" : ObjectId("55619737c0c8238aef6e21c5"),
    "index" : 0,
    "code" : 11000,
    "errmsg" : "E11000 duplicate key error index: test.w.$_id_ dup key: { : 1.0 }",
    "op" : {
        "_id" : 1
    }
}
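Combining the two answers, you could also build plain documents from getWriteErrors() inside the catch block instead of round-tripping through JSON. A sketch, with property accessors as defined in the shell's bulk_api.js:
try {
    bulk.execute();
} catch (e) {
    if (e.name === "BulkWriteError") {
        // map the shell's WriteError wrappers to plain documents before storing them
        var docs = e.getWriteErrors().map(function (we) {
            return { index: we.index, code: we.code, errmsg: we.errmsg, op: we.getOperation() };
        });
        if (docs.length > 0) db.errlog.insert(docs);
    }
}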

$group needs an array for _id but $out won't handle it

I need to count the number of occurrences of each trigram array and output the result to a collection, but when I try to use the $out keyword, it fails with "can't use an array for _id".
Is there any way to project the value of the _id field from the group stage into a new key and create a new _id?
db.djnNews_filtered.aggregate([
    { $unwind: "$processed_text.headline_trigrams" },
    { $group: { _id: "$processed_text.headline_trigrams", "num": { $sum: 1 } } },
    { $sort: { "num": -1 } }
])
{ "_id" : [ "Reports", "First", "Quarter" ], "num" : 279 }
{ "_id" : [ "ST", "upside", "prevails" ], "num" : 167 }
{ "_id" : [ "First", "Quarter", "Results" ], "num" : 160 }
{ "_id" : [ "Announces", "First", "Quarter" ], "num" : 155 }
db.djnNews_filtered.aggregate([
    { $unwind: "$processed_text.headline_trigrams" },
    { $group: { _id: "$processed_text.headline_trigrams", "num": { $sum: 1 } } },
    { $sort: { "num": -1 } },
    { $out: "new_collection" }
])
assert: command failed: {
    "errmsg" : "exception: insert for $out failed: { connectionId: 3, err: \"can't use an array for _id\", code: 2, n: 0, ok: 1.0 }",
    "code" : 16996,
    "ok" : 0
} : aggregate failed
In MongoDB, you can't have a document with an _id that is an array.
Can you simply $project the array to a different field?
db.djnNews_filtered.aggregate([
    { $unwind: "$processed_text.headline_trigrams" },
    { $group: { _id: "$processed_text.headline_trigrams", "num": { $sum: 1 } } },
    { $sort: { "num": -1 } },
    // _id must be excluded here; otherwise the array _id is kept
    // by default and $out still fails
    { $project: { _id: 0, trigram: "$_id", count: "$num" } },
    { $out: "new_collection" }
])
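With the _id excluded as above, $out can insert the documents and the server generates fresh ObjectIds. The contents of new_collection would then look roughly like this (illustrative):
{ "_id" : ObjectId("..."), "trigram" : [ "Reports", "First", "Quarter" ], "count" : 279 }
{ "_id" : ObjectId("..."), "trigram" : [ "ST", "upside", "prevails" ], "count" : 167 }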
Also, I'm not sure what your intention is with sorting the documents before inserting them into a collection. If the sort was only for inspecting the data before you decided to add it to a collection, you might want to consider removing that step.

MongoDB find near maxdistance

I am firing the following query in MongoDB:
db.acollection.find({
    "field.location": {
        "$near": [19.0723058, 73.00067739999997]
    },
    $maxDistance: 100000
}).count()
and getting the following error:
uncaught exception: count failed: {
    "shards" : {
    },
    "cause" : {
        "errmsg" : "exception: unknown top level operator: $maxDistance",
        "code" : 2,
        "ok" : 0
    },
    "code" : 2,
    "ok" : 0,
    "errmsg" : "failed on : Shard ShardA"
}
You did it wrong. The $maxDistance argument is a "child" of the $near operator:
db.acollection.find({
    "field.location": {
        "$near": [19.0723058, 73.00067739999997],
        "$maxDistance": 100000
    }
}).count()
It has to be within the same expression.
Also, look at GeoJSON when you are building a new application; it is the format you should be using for storing location data going forward.