I want to store the writeErrors documents in another collection in MongoDB while performing a bulk.execute(). I am doing a bulk insert/update and want to capture all the errors into a separate collection alongside the bulk operation.
I can see the BulkWriteError object returned in the mongo shell, and I can also see the writeErrors array inside it. But how can I capture it?
According to https://github.com/mongodb/mongo/blob/master/src/mongo/shell/bulk_api.js (line 363):
// Bulk errors are basically bulk results with additional error information
BulkWriteResult.apply(this, arguments);
So you can use the BulkWriteResult.getWriteErrors() method.
try {
    bulk.execute();
    ...
} catch (err) {
    if ("name" in err && err.name == 'BulkWriteError') {
        var wErrors = err.getWriteErrors();
        wErrors.forEach(function(doc) {
            db.errlog.insert(doc);
        });
    }
}
I can see the BulkWriteError object is returned in Mongo-Shell
It is not returned. This is a raised exception. You need a try...catch block to get it back:
> bulk = db.w.initializeUnorderedBulkOp();
> bulk.insert({_id:1})
> bulk.insert({_id:1})
> try { result = bulk.execute() } catch(e) { err = e }
> err
BulkWriteError({
    "writeErrors" : [
        {
            "index" : 1,
            "code" : 11000,
            "errmsg" : "E11000 duplicate key error index: test.w.$_id_ dup key: { : 1.0 }",
            "op" : {
                "_id" : 1
            }
        }
    ],
    "writeConcernErrors" : [ ],
    "nInserted" : 1,
    "nUpserted" : 0,
    "nMatched" : 0,
    "nModified" : 0,
    "nRemoved" : 0,
    "upserted" : [ ]
})
Surprisingly enough, it is rather painful to store the BulkWriteError in a collection. One easy way of doing it (not necessarily an elegant one, though) is to parse the JSON representation of the error to get back the field(s) that interest you.
> db.errlog.insert(JSON.parse(err.tojson()).writeErrors)
// ^^^^^^^^^^^^^^^^^^^^^^^^
// parse the JSON representation of `BulkWriteError`
That way, you get back the array of write errors, which insert will happily store in the collection:
> db.errlog.find().pretty()
{
    "_id" : ObjectId("55619737c0c8238aef6e21c5"),
    "index" : 0,
    "code" : 11000,
    "errmsg" : "E11000 duplicate key error index: test.w.$_id_ dup key: { : 1.0 }",
    "op" : {
        "_id" : 1
    }
}
I have this simple update API invocation. This is my document:
{
    "_id" : ObjectId("577a5b9a89xxx32a1"),
    "oid" : {
        "a" : 0,
        "b" : 0,
        "c" : NumberLong("1260351143035")
    },
    "sessions" : [
        {
        }
    ]
}
Then I try to insert one element into the sessions array:
db.getCollection('CustomerInfo').update({"oid.c":1260351143035},{$push:{"sessions.$.asessionID":"test123"}})
but I'm getting this error:
WriteResult({
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
        "code" : 16837,
        "errmsg" : "The positional operator did not find the match needed from the query. Unexpanded update: sessions.$.asessionID"
    }
})
Using $set I'm getting the same error.
As the error implies,
"The positional operator did not find the match needed from the query. Unexpanded update: sessions.$.asessionID"
the positional operator only works if the array being updated is also part of the query. In your case, the query only involves the embedded document oid. The better update operator to use here is $set instead of $push.
You can include the sessions array in the query, for example:
db.getCollection('CustomerInfo').update(
    {
        "oid.c": 1260351143035,
        "sessions.0": {} // query where the sessions array's first element is an empty document
        /* "sessions.0": { "$exists": true } // query where the sessions array's first element exists */
    },
    {
        "$set": { "sessions.$.asessionID": "test123" }
    }
)
As the documentation says, you can do the following:
db.getCollection('CustomerInfo').update(
    { "oid.c": 1260351143035 },
    { $push: {
        "sessions": {
            "asessionID": "test123"
        }
      }
    }
)
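For reference, the two approaches touch different parts of the array: the $set variant above fills in the existing (empty) element, while this $push variant appends a new element. After the $push, the document would look roughly like this (a sketch based on the sample document from the question):
{
    "_id" : ObjectId("577a5b9a89xxx32a1"),
    "oid" : { "a" : 0, "b" : 0, "c" : NumberLong("1260351143035") },
    "sessions" : [
        { },                             // the original empty element is left untouched
        { "asessionID" : "test123" }     // the newly pushed element
    ]
}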
I have this MongoDB data stored in this way:
db.orders.insert({
    _id: ObjectId().str,
    name: "admin",
    status: "online",
    catalog: [
        {
            "objectid" : ObjectId().str,
            "message" : "sold",
            "status" : "open"
        }
    ]
})
I am trying to update it in this manner:
db.orders.update({"_id":"5703b86df3d607cb5fa75ff3"},{$set: {"catalog.message": "added to cart"}})
and this is the error message I am getting:
> db.orders.update({"_id":"5703b86df3d607cb5fa75ff3"},{$set: {"catalog.message": "added to cart"}})
WriteResult({
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
        "code" : 16837,
        "errmsg" : "cannot use the part (catalog of catalog.message) to traverse the element ({catalog: [ { objectid: \"5703b86df3d607cb5fa75ff4\", message: \"sold\", status: \"open\" } ]})"
    }
})
How can I update this record?
According to MongoDB's examples, you should provide the index of the object in the array that you want to update. Since you want to update the first object, i.e. the object at index 0, use this:
db.orders.update(
{"_id":"5703b86df3d607cb5fa75ff3"},
{$set: {"catalog.0.message": "added to cart"}});
I am using MongoDB (v2.6.7) and the mongo (2.6.7) shell client.
I am trying to use the WriteResult object returned by the insert and update commands.
According to the MongoDB docs, in case of error it returns a WriteResult object with a writeError subdocument. But I am not able to access this subdocument in the shell or in a mongo JavaScript file.
Below is an illustration of my problem.
I insert an object and get the successful writeResult.
Then I am inserting again with the same _id, and the WriteResult is correctly printed on the screen with "writeError" set.
The writeResult object also seems to contain the writeError when I print it with the printjson() method.
But when I print it with JSON.stringify(), I am not able to see the "writeError".
Also, writeResult.writeError comes back as undefined, so I am not able to access the writeResult.writeError.code property.
Can someone please explain why this is happening and what the correct way is?
afv:PRIMARY>writeResult=db.sysinfo.insert({_id:"myid",otherData:"otherDataValue"})
WriteResult({ "nInserted" : 1 })
afv:PRIMARY>writeResult=db.sysinfo.insert({_id:"myid",otherData:"otherDataValue"})
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 11000,
        "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: afvadmin.sysinfo.$_id_ dup key: { : \"myid\" }"
    }
})
afv:PRIMARY> printjson(writeResult)
{
    "nInserted" : 0,
    "writeError" : {
        "code" : 11000,
        "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: afvadmin.sysinfo.$_id_ dup key: { : \"myid\" }"
    }
}
afv:PRIMARY> JSON.stringify(writeResult);
{"nInserted":0,"nUpserted":0,"nMatched":0,"nModified":0,"nRemoved":0}
afv:PRIMARY> writeResult.writeError
afv:PRIMARY> writeResult.nInserted
0
afv:PRIMARY> writeResult.writeError.code
2015-02-02T16:34:42.402+0530 TypeError: Cannot read property 'code' of undefined
afv:PRIMARY> writeResult["writeError"]
I don't know why it was implemented this way, but to access the contained writeError subdocument, you call getWriteError() on a WriteResult object:
> writeResult.getWriteError()
WriteError({
    "index": 0,
    "code": 11000,
    "errmsg": "insertDocument :: caused by :: 11000 E11000 duplicate key error index: test.test.$_id_ dup key: { : \"myid\" }",
    "op": {
        "_id": "myid",
        "otherData": "otherDataValue"
    }
})
> writeResult.getWriteError().code
11000
I couldn't find any documentation for this. I figured it out by hitting Tab twice after typing writeResult. in the shell to see what's available.
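Building on that, here is a small sketch (assuming the same sysinfo collection and the duplicate _id from above) of branching on the error programmatically in a shell script; WriteResult also exposes hasWriteError() to test for a write error before reading it:
var res = db.sysinfo.insert({ _id: "myid", otherData: "otherDataValue" });
if (res.hasWriteError()) {
    var we = res.getWriteError();
    if (we.code === 11000) {
        // duplicate key: inspect or store the offending operation as needed
        print("duplicate key, offending op: " + tojson(we.op));
    }
}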
In addition to using the methods to get the WriteError object as mentioned already, you can also get the raw response and work on that:
> writeResult.getRawResponse()
{
    "writeErrors" : [
        {
            "index" : 0,
            "code" : 11000,
            "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: foo.bar.$_id_ dup key: { : ObjectId('54cf8152733aa5e886f0e400') }",
            "op" : {
                "_id" : ObjectId("54cf8152733aa5e886f0e400"),
                "a" : 1
            }
        }
    ],
    "writeConcernErrors" : [ ],
    "nInserted" : 0,
    "nUpserted" : 0,
    "nMatched" : 0,
    "nModified" : 0,
    "nRemoved" : 0,
    "upserted" : [ ]
}
So, you can then do something like this and things are a bit more consistent:
> raw = writeResult.getRawResponse();
> raw.writeErrors[0].errmsg
insertDocument :: caused by :: 11000 E11000 duplicate key error index: foo.bar.$_id_ dup key: { : ObjectId('54cf8152733aa5e886f0e400') }
> printjson(raw.writeErrors[0])
{
    "index" : 0,
    "code" : 11000,
    "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: foo.bar.$_id_ dup key: { : ObjectId('54cf8152733aa5e886f0e400') }",
    "op" : {
        "_id" : ObjectId("54cf8152733aa5e886f0e400"),
        "a" : 1
    }
}
> JSON.stringify(raw.writeErrors[0])
{"code":11000,"index":0,"errmsg":"insertDocument :: caused by :: 11000 E11000 duplicate key error index: foo.bar.$_id_ dup key: { : ObjectId('54cf8152733aa5e886f0e400') }"}
I am using MongoDB 2.6.4 and am still getting an error:
uncaught exception: aggregate failed: {
    "errmsg" : "exception: aggregation result exceeds maximum document size (16MB)",
    "code" : 16389,
    "ok" : 0,
    "$gleStats" : {
        "lastOpTime" : Timestamp(1422033698000, 105),
        "electionId" : ObjectId("542c2900de1d817b13c8d339")
    }
}
Reading different advice, I came across saving the result in another collection using $out. My query looks like this now:
db.audit.aggregate([
    { $match: { "date": { $gte : ISODate("2015-01-22T00:00:00.000Z"),
                          $lt : ISODate("2015-01-23T00:00:00.000Z")
                        }
              }
    },
    { $unwind : "$data.items" },
    { $out : "tmp" }
])
But I am getting a different error:
uncaught exception: aggregate failed:
{
    "errmsg" : "exception: insert for $out failed: { lastOp: Timestamp 1422034172000|25, connectionId: 625789, err: \"insertDocument :: caused by :: 11000 E11000 duplicate key error index: duties_and_taxes.tmp.agg_out.5.$_id_ dup key: { : ObjectId('54c12d784c1b2a767b...\", code: 11000, n: 0, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1422034172000|25, electionId: ObjectId('542c2900de1d817b13c8d339') } }",
    "code" : 16996,
    "ok" : 0,
    "$gleStats" : {
        "lastOpTime" : Timestamp(1422034172000, 26),
        "electionId" : ObjectId("542c2900de1d817b13c8d339")
    }
}
Does someone have a solution?
The error is due to the $unwind step in your pipeline.
When you unwind a field with n elements, n copies of the same document are produced, all with the same _id, each copy containing one of the elements from the array that was unwound. See the demonstration below of the records after an unwind operation.
Sample demo:
> db.t.insert({"a":[1,2,3,4]})
WriteResult({ "nInserted" : 1 })
> db.t.aggregate([{$unwind:"$a"}])
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 1 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 2 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 3 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 4 }
>
Since all these documents have the same _id, you get a duplicate key exception (caused by the identical _id values of the unwound documents) when they are inserted into the new collection named tmp.
The pipeline will fail to complete if the documents produced by the
pipeline would violate any unique indexes, including the index on the
_id field of the original output collection.
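If you still want to materialize the unwound documents with $out, one possible workaround (a sketch, not part of the original answer and not verified against your data) is to drop _id in a $project stage so that each inserted document gets a freshly generated ObjectId:
db.audit.aggregate([
    { $match: { "date": { $gte : ISODate("2015-01-22T00:00:00.000Z"),
                          $lt : ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind : "$data.items" },
    { $project : { _id: 0, date: 1, data: 1 } },   // assumed field list; the point is excluding _id
    { $out : "tmp" }
])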
To solve your original problem, you could set the allowDiskUse option to true. It allows the aggregation to use disk space whenever it needs to.
Optional. Enables writing to temporary files. When set to true,
aggregation operations can write data to the _tmp subdirectory in the
dbPath directory. See Perform Large Sort Operation with External Sort
for an example.
as in:
db.audit.aggregate([
    { $match: { "date": { $gte : ISODate("2015-01-22T00:00:00.000Z"),
                          $lt : ISODate("2015-01-23T00:00:00.000Z")
                        }
              }
    },
    { $unwind : "$data.items" }   // note, the pipeline ends here
],
{
    allowDiskUse : true
});
I have a collection that stores search query logs. Its two main attributes are user_id and search_query. user_id is null for a logged-out user. I am trying to run a mapReduce job to find out the count and terms per user.
var map = function() {
    if (this.user_id !== null) {
        emit(this.user_id, this.search_query);
    }
}
var reduce = function(id, queries) {
    return Array.sum(queries + ",");
}
db.searchhistories.mapReduce(map,
    reduce,
    {
        query: { "time" : {
                     $gte : ISODate("2013-10-26T14:40:00.000Z"),
                     $lt : ISODate("2013-10-26T14:45:00.000Z")
                 }
        },
        out : "mr2"
    }
)
throws the following exception
Wed Nov 27 06:00:07 uncaught exception: map reduce failed: {
    "errmsg" : "exception: assertion src/mongo/db/commands/mr.cpp:760",
    "code" : 0,
    "ok" : 0
}
I looked at mr.cpp L#760 but could not gather any vital information. What could be causing this?
My collection has values like:
> db.searchhistories.find()
{ "_id" : ObjectId("5247a9e03815ef4a2a005d8b"), "results" : 82883, "response_time" : 0.86, "time" : ISODate("2013-09-29T04:17:36.768Z"), "type" : 0, "user_id" : null, "search_query" : "awareness campaign" }
{ "_id" : ObjectId("5247a9e0606c791838005cba"), "results" : 39545, "response_time" : 0.369, "time" : ISODate("2013-09-29T04:17:36.794Z"), "type" : 0, "user_id" : 34225174, "search_query" : "eficaz eficiencia efectividad" }
Looking at the docs, I could see that this is not possible on the slave; it will work perfectly fine on the master, though. If you still want to use the slave, then you have to use the following syntax:
db.searchhistories.mapReduce(map,
    reduce,
    {
        query: { "time" : {
                     $gte : ISODate("2013-10-26T14:40:00.000Z"),
                     $lt : ISODate("2013-10-26T14:45:00.000Z")
                 }
        },
        out : { inline : 1 }
    }
)
** Ensure that the output document size does not exceed the 16MB limit when using inline output.
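For reference, with inline output the reduced documents come back in the results field of the returned object rather than being written to a collection. A small sketch of reading them (assuming the map and reduce functions and the time range from the question):
var res = db.searchhistories.mapReduce(map,
    reduce,
    {
        query: { "time" : { $gte : ISODate("2013-10-26T14:40:00.000Z"),
                            $lt : ISODate("2013-10-26T14:45:00.000Z") } },
        out : { inline : 1 }
    });
res.results.forEach(printjson);   // each entry: { "_id" : <user_id>, "value" : <reduced search terms> }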