I have a collection with documents like this:
[
{ _id : ObjectId("xxxxxx") , u_id : 5 , name : "E" , comment : [1,2] },
{ _id : ObjectId("yyyyyy") , u_id : 4 , name : "D" , comment : [] },
{ _id : ObjectId("zzzzzz") , u_id : 3 , name : "C" , comment : [1,2] },
{ _id : ObjectId("aaaaaa") , u_id : 2 , name : "B" , comment : [1,2] },
{ _id : ObjectId("bbbbbb") , u_id : 1 , name : "A" , comment : [1] },
]
Now I have an array of documents prepared to insert or update into this collection:
var multi_document =
[
{ u_id : 8 , name : "H" , comment : [1,2] }, //Insert new document
{ u_id : 7 , name : "G" , comment : [] }, //Insert new document
{ u_id : 6 , name : "F" , comment : [1,2] }, //Insert new document
{ u_id : 5 , name : "E" , comment : [1,2,3] }, //update [1,2] to [1,2,3]
{ u_id : 4 , name : "DD" , comment : [1] }, //update D to DD and [] to [1]
{ u_id : 3 , name : "C" , comment : [1,2] }, //no change; same as the original
];
Can I use db.collection.update(multi_document); ? If not, what should I do?
This is the expected result:
[
{ _id : ObjectId("db_created") , u_id : 8 , name : "H" , comment : [1,2] },
{ _id : ObjectId("db_created") , u_id : 7 , name : "G" , comment : [] },
{ _id : ObjectId("db_created") , u_id : 6 , name : "F" , comment : [1,2] },
{ _id : ObjectId("xxxxxx") , u_id : 5 , name : "E" , comment : [1,2,3] },
{ _id : ObjectId("yyyyyy") , u_id : 4 , name : "DD" , comment : [1] },
{ _id : ObjectId("zzzzzz") , u_id : 3 , name : "C" , comment : [1,2] },
{ _id : ObjectId("aaaaaa") , u_id : 2 , name : "B" , comment : [1,2] },
{ _id : ObjectId("bbbbbb") , u_id : 1 , name : "A" , comment : [1] },
]
The best way to do this is to use the "Bulk" API.
First, loop over your multi_document array and, for each document, match the documents in your collection with the same u_id. To do that, chain the .upsert() method, which sets the upsert flag to true, onto bulk.find(), then call .update() with the fields to change, here name and comment. In the update document, use the $addToSet operator together with the $each modifier (because comment is an array) to add each element without creating duplicates.
var bulk = db.collection.initializeOrderedBulkOp(),
    count = 0;

multi_document.forEach(function (doc) {
    bulk.find({ "u_id": doc.u_id })
        .upsert()
        .update({
            "$set": { "name": doc.name },
            "$addToSet": { "comment": { "$each": doc.comment } }
        });
    count++;

    // Execute every 1000 operations and re-initialize.
    if (count % 1000 === 0) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
    }
});

// Flush any remaining queued operations.
if (count % 1000 !== 0)
    bulk.execute();
You could use Bulk() http://docs.mongodb.org/manual/reference/method/Bulk/
With an ordered operations list, MongoDB executes the write operations in the list serially. If an error occurs during the processing of one of the write operations, MongoDB will return without processing any remaining write operations in the list.
Interesting Blog Article about the Bulk API: http://blog.mongodb.org/post/84922794768/mongodbs-new-bulk-api
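If you are on MongoDB 3.2 or newer, the same upserts can also be expressed through bulkWrite(), which batches internally. Here is a sketch of how the operations array might be built; the collection name and the multi_document variable are assumed from the question:

```javascript
// Build one updateOne-with-upsert operation per input document.
// The shape of each entry follows MongoDB's bulkWrite() operation format.
function buildUpsertOps(docs) {
  return docs.map(function (doc) {
    return {
      updateOne: {
        filter: { u_id: doc.u_id },
        update: {
          $set: { name: doc.name },
          $addToSet: { comment: { $each: doc.comment } }
        },
        upsert: true
      }
    };
  });
}

// In the shell: db.collection.bulkWrite(buildUpsertOps(multi_document));
```

This keeps the same semantics (match on u_id, upsert, set the name, merge comment without duplicates) while letting the server handle batching.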
Here is my mongo document:
{
"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8"),
"subscriptions" : [
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8d"
},
{
"sub_id" : "5a56fd399dd78e33948c9b8e"
}
]
}
I want to update and upsert invoice_id into the last element of the sub-array.
I have tried:
sort: {$natural: -1},
subscription.$.invoice
What I want it to be is:
{
"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8"),
"subscriptions" : [
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8d"
},
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8f"
}
]
}
While there are ways to get the last array element, like Saravana shows in her answer, I don't recommend doing it that way because it introduces race conditions. For example, if two subs are added simultaneously, you can't depend on which one is 'last' in the array.
If an invoice_id has to be tied to a specific sub_id, then it's far better to query and find that specific element in the array, then add the invoice_id to it.
In the comments, the OP indicated that the current order of operations is 1) add sub_id, 2) insert the invoice record into the INVOICE collection and get the invoice_id, 3) add the invoice_id into the new subscription.
However, if you already have the sub_id, then it's better to re-order your operations this way: 1) insert the invoice record and get the invoice_id 2) add both sub_id and invoice_id with a single operation.
Doing this improves performance (eliminates the second update operation), but more importantly, eliminates race conditions because you're adding both sub_id and invoice_id at the same time.
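As a sketch of that reordering, the whole subscription can be appended in one $push once the invoice _id is in hand. The helper name and ids below are illustrative, not taken from a real run:

```javascript
// Build a single update document that appends sub_id and invoice_id
// together, so there is no separate "patch the last element" step.
function buildSubscriptionPush(subId, invoiceId) {
  return { $push: { subscriptions: { sub_id: subId, invoice_id: invoiceId } } };
}

// In the shell (ids illustrative):
// db.sub.update({ _id: ObjectId("...") },
//               buildSubscriptionPush("5a56fd399dd78e33948c9b8e",
//                                     "5a56fd399dd78e33948c9b8f"));
```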
We can fetch the document and update the last element by index:
> var doc = db.sub.findOne({"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8")})
> if ( doc.subscriptions.length - 1 >= 0 )
doc.subscriptions[doc.subscriptions.length-1].invoice_id="5a56fd399dd78e33948c9b8f"
> db.sub.update({_id:doc._id},doc)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
Or write an aggregation pipeline to form the document and use the result for the update:
db.sub.aggregate(
[
{$match : { "_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8") }},
{$addFields : { last : { $subtract : [{$size : "$subscriptions"},1]}}},
{$unwind : { path :"$subscriptions" , includeArrayIndex : "idx"}},
{$project : { "subscriptions.sub_id" : 1,
"subscriptions.invoice_id" : {
$cond : {
if: { $eq: [ "$idx", "$last" ] },
then: "5a56fd399dd78e33948c9b8f",
else: "$$REMOVE"
}
}
}
},
{$group : {_id : "$_id", subscriptions : {$push : "$subscriptions"}}}
]
).pretty()
The resulting document:
{
"_id" : ObjectId("5a69d0acb76d1c2e08e4ccd8"),
"subscriptions" : [
{
"sub_id" : "5a56fd399dd78e33948c9b8e"
},
{
"sub_id" : "5a56fd399dd78e33948c9b8e",
"invoice_id" : "5a56fd399dd78e33948c9b8f"
}
]
}
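The index-based mutation in the first approach is ordinary JavaScript, so its behavior can be checked on a plain object. A minimal sketch (the helper name is mine):

```javascript
// Set invoice_id on the last element of a subscriptions array, in place.
// Mirrors the findOne-then-update shell snippet above on a plain object.
function setLastInvoice(doc, invoiceId) {
  var subs = doc.subscriptions;
  if (subs.length - 1 >= 0) {
    subs[subs.length - 1].invoice_id = invoiceId;
  }
  return doc;
}
```

Note that between the findOne and the update another writer can append a new element, which is exactly the race condition the earlier answer warns about.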
I am very new to MongoDB. Please help me find a solution for this.
Below are the documents in MongoDB:
{ name : "sass", city : "banga" },
{ name : "pass", city : "banga" },
{ name : "kanga", city : "runga" },
{ name : "jass", city : "canga" },
{ name : "dass", city : "tunga" }
I want to query with city as 'banga', so that the result array should contain documents with city 'banga' first and then all other documents.
Thanks.
Please try this. The query below will give you the "banga" documents at the top and all other documents after that.
Note:-
Please note that the query doesn't sort the data by city. It just returns the "banga" documents at the top. The documents where "city" does not equal "banga" could be in any order after that.
Query:-
You may need to change the collection name accordingly in the query below.
db.collection.aggregate([
{ $project: {_id : 1, "name": 1, "city" : 1,
isRequiredCity: { $cond: { if: { $eq: [ "$city", "banga" ] }, then: 0, else: 1 } }} },
{$sort : {"isRequiredCity" : 1} }
]);
Output:-
/* 1 */
{
"_id" : ObjectId("58342ccaba41f1f22e600c67"),
"name" : "sass",
"city" : "banga",
"isRequiredCity" : 0
}
/* 2 */
{
"_id" : ObjectId("58342ccaba41f1f22e600c6a"),
"name" : "pass",
"city" : "banga",
"isRequiredCity" : 0
}
/* 3 */
{
"_id" : ObjectId("58342ccaba41f1f22e600c68"),
"name" : "kanga",
"city" : "runga",
"isRequiredCity" : 1
}
/* 4 */
{
"_id" : ObjectId("58342ccaba41f1f22e600c69"),
"name" : "jass",
"city" : "canga",
"isRequiredCity" : 1
}
/* 5 */
{
"_id" : ObjectId("58342ccaba41f1f22e600c6b"),
"name" : "dass",
"city" : "tunga",
"isRequiredCity" : 1
}
Another option is to add a marker field to all the documents. The new field identifies, by a distinct value, the documents that contain city : banga.
Then you can order the whole collection based on that value.
For example, let's add the marker field with two distinct values:
db.collection.update({'city':'banga'}, {$set:{"value":10}}, {upsert:false, multi:true});
db.collection.update({'city': { $ne: 'banga'}}, {$set:{"value":1}}, {upsert:false, multi:true});
Now, each document with field 'city'=='banga' has value = 10, and each document with field 'city'!='banga' has value = 1.
And now you can sort the documents in the collection using the value field.
db.collection.find({}).sort({'value': -1})
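The flag-then-sort idea is easy to sanity-check outside the database. Here is the same ordering logic sketched in plain JavaScript; the helper name is mine, and it relies on a stable sort, which ES2019 guarantees:

```javascript
// Put documents whose city matches first; all others come after,
// keeping no particular order among themselves (like the queries above).
function bangaFirst(docs, city) {
  return docs
    .map(function (d) {
      return Object.assign({ isRequiredCity: d.city === city ? 0 : 1 }, d);
    })
    .sort(function (a, b) { return a.isRequiredCity - b.isRequiredCity; });
}
```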
Hi, I have ~5 million documents in MongoDB (with replication), each with 43 fields. How do I remove duplicate documents? I tried:
db.testkdd.ensureIndex({
duration : 1 , protocol_type : 1 , service : 1 ,
flag : 1 , src_bytes : 1 , dst_bytes : 1 ,
land : 1 , wrong_fragment : 1 , urgent : 1 ,
hot : 1 , num_failed_logins : 1 , logged_in : 1 ,
num_compromised : 1 , root_shell : 1 , su_attempted : 1 ,
num_root : 1 , num_file_creations : 1 , num_shells : 1 ,
num_access_files : 1 , num_outbound_cmds : 1 , is_host_login : 1 ,
is_guest_login : 1 , count : 1 , srv_count : 1 ,
serror_rate : 1 , srv_serror_rate : 1 , rerror_rate : 1 ,
srv_rerror_rate : 1 , same_srv_rate : 1 , diff_srv_rate : 1 ,
srv_diff_host_rate : 1 , dst_host_count : 1 , dst_host_srv_count : 1 ,
dst_host_same_srv_rate : 1 , dst_host_diff_srv_rate : 1 ,
dst_host_same_src_port_rate : 1 , dst_host_srv_diff_host_rate : 1 ,
dst_host_serror_rate : 1 , dst_host_srv_serror_rate : 1 ,
dst_host_rerror_rate : 1 , dst_host_srv_rerror_rate : 1 , lable : 1
},
{unique: true, dropDups: true}
)
Running this code, I get the error "errmsg" : "namespace name generated from index ...":
{
"ok" : 0,
"errmsg" : "namespace name generated from index name \"project.testkdd.$duration_1_protocol_type_1_service_1_flag_1_src_bytes_1_dst_bytes_1_land_1_wrong_fragment_1_urgent_1_hot_1_num_failed_logins_1_logged_in_1_num_compromised_1_root_shell_1_su_attempted_1_num_root_1_num_file_creations_1_num_shells_1_num_access_files_1_num_outbound_cmds_1_is_host_login_1_is_guest_login_1_count_1_srv_count_1_serror_rate_1_srv_serror_rate_1_rerror_rate_1_srv_rerror_rate_1_same_srv_rate_1_diff_srv_rate_1_srv_diff_host_rate_1_dst_host_count_1_dst_host_srv_count_1_dst_host_same_srv_rate_1_dst_host_diff_srv_rate_1_dst_host_same_src_port_rate_1_dst_host_srv_diff_host_rate_1_dst_host_serror_rate_1_dst_host_srv_serror_rate_1_dst_host_rerror_rate_1_dst_host_srv_rerror_rate_1_lable_1\" is too long (127 byte max)",
"code" : 67
}
How can I solve the problem?
The "dropDups" syntax for index creation has been "deprecated" as of MongoDB 2.6 and removed in MongoDB 3.0. It is not a very good idea in most cases to use this as the "removal" is arbitrary and any "duplicate" could be removed. Which means what gets "removed" may not be what you really want removed.
Anyhow, you are running into an "index length" error because the index name generated from all those fields is longer than is allowed (127 bytes). Generally speaking, you are not "meant" to index 43 fields in any normal application.
If you want to remove the "duplicates" from a collection then your best bet is to run an aggregation query to determine which documents contain "duplicate" data and then cycle through that list removing "all but one" of the already "unique" _id values from the target collection. This can be done with "Bulk" operations for maximum efficiency.
NOTE: I do find it hard to believe that your documents actually contain 43 "unique" fields. It is likely that "all you need" is to simply identify only those fields that make the document "unique" and then follow the process as outlined below:
var bulk = db.testkdd.initializeOrderedBulkOp(),
    count = 0;

// List "all" fields that make a document "unique" in the `_id`.
// Only some are listed here for example purposes.
db.testkdd.aggregate([
    { "$group": {
        "_id": {
            "duration": "$duration",
            "protocol_type": "$protocol_type",
            "service": "$service",
            "flag": "$flag"
        },
        "ids": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }},
    { "$match": { "count": { "$gt": 1 } } }
], { "allowDiskUse": true }).forEach(function (doc) {
    doc.ids.shift();                                   // keep the first match
    bulk.find({ "_id": { "$in": doc.ids } }).remove(); // remove the rest
    count++;

    // Execute every 1000 operations and re-initialize.
    if (count % 1000 === 0) {
        bulk.execute();
        bulk = db.testkdd.initializeOrderedBulkOp();
    }
});

if (count % 1000 !== 0)
    bulk.execute();
If you have a MongoDB version "lower" than 2.6 and don't have bulk operations, then you can try a standard .remove() inside the loop instead. Also note that .aggregate() will not return a cursor here, so the looping must change to:
db.testkdd.aggregate([
// pipeline as above
]).result.forEach(function(doc) {
doc.ids.shift();
db.testkdd.remove({ "_id": { "$in": doc.ids } });
});
But do make sure to look at your documents closely and only include "just" the "unique" fields you expect to be part of the grouping _id. Otherwise you end up removing nothing at all, since there are no duplicates there.
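The grouping step is the crux of the pipeline. The same "keep the first _id per key, collect the rest" logic can be sketched in plain JavaScript; the helper and field names below are illustrative:

```javascript
// Group documents by the fields that define uniqueness and return the
// _ids of every duplicate beyond the first document in each group.
function findDuplicateIds(docs, keyFields) {
  var groups = {};
  docs.forEach(function (doc) {
    var key = JSON.stringify(keyFields.map(function (f) { return doc[f]; }));
    (groups[key] = groups[key] || []).push(doc._id);
  });
  var toRemove = [];
  Object.keys(groups).forEach(function (key) {
    toRemove = toRemove.concat(groups[key].slice(1)); // keep the first
  });
  return toRemove;
}
```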
I am new to MongoDB and started learning the basic syntax recently. I was trying operators with the find method, and I found a confusing case while trying implicit AND.
My collection mathtable, containing 400 documents, is as follows:
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4b2") , "index" : 1 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4b3") , "index" : 2 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4b4") , "index" : 3 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4b5") , "index" : 4 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4b6") , "index" : 5 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4b7") , "index" : 6 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4b8") , "index" : 7 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4b9") , "index" : 8 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4ba") , "index" : 9 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4bb") , "index" : 10 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4bc") , "index" : 11 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4bd") , "index" : 12 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4be") , "index" : 13 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4bf") , "index" : 14 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4c0") , "index" : 15 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4c1") , "index" : 16 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4c2") , "index" : 17 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4c3") , "index" : 18 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4c4") , "index" : 19 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4c5") , "index" : 20 }
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4d1") , "index" : 1 }
..
..
{ "_id" : ObjectId("540efc2bd8af78d9b0f5d4z5") , "index" : 20 }
There are 400 documents in the mathtable collection:
Values of index range from 1 to 20.
For each value of index there are 20 entries with different _id values.
I tried the operations below expecting the same results, considering that they should all be implicit AND cases.
The goal is to count the documents whose index is even and greater than 5.
Using classic explicit AND (results in 160 records):
db.mathtable.count({
$and: [
{ index: { $mod: [2,0] } },
{ index: { $gt: 5 } }
]
});
Using the field name only once (results in 160 records):
db.mathtable.count({
index : { $mod : [2,0] , $gt:5 }
});
Using the field name with every condition (results in 300 records):
db.mathtable.find({
index : { $mod : [2,0]} ,
index : {$gt:5}
});
Using the field name with every condition, with the conditions in the opposite order (results in 200 records):
db.mathtable.find({
index : {$gt:5} ,
index : { $mod : [2,0]}
});
There is no mention of implicit OR in the MongoDB documentation (or at least I did not find a direct reference like there is for implicit AND).
I was expecting the same count of records (160) in all cases and am unable to understand why the queries above behave differently.
Also, the order in which the conditions are specified changes the number of results. From observation, only the last condition specified in find was applied when the same field was specified multiple times. That seems weird and incorrect.
NOTE: I am using MongoDB 2.6 and the code is being executed in the mongo shell that comes with the distribution.
A JSON document, like an associative array or a map, cannot contain duplicate keys; the last occurrence wins:
db.mathtable.find({
index : { $mod : [2,0]} ,
index : {$gt:5}
});
The above will be considered equivalent to:
db.mathtable.find({
index : {$gt:5}
});
The first condition is overwritten by the second. Similarly,
db.mathtable.find({
index : {$gt:5} ,
index : { $mod : [2,0]}
});
is equivalent to:
db.mathtable.find({
index : { $mod : [2,0]}
});
However in the first case,
db.mathtable.count({
$and: [
{ index: { $mod: [2,0] } },
{ index: { $gt: 5 } }
]
});
the $and operator takes an array of two documents as input, so both conditions are applied and the query behaves as expected.
And in the second case, count takes a single document with no duplicate keys (both operators sit under the single index key), so it also behaves as expected:
db.mathtable.count({
index : { $mod : [2,0] , $gt:5 }
});
Hence the difference in the number of documents returned. Hope this helps.
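You can watch the overwrite happen in any JavaScript object literal; the mongo shell parses the query document the same way, before MongoDB ever sees it:

```javascript
// A later duplicate key silently replaces the earlier one when the
// object literal is evaluated, so only { $gt: 5 } survives here.
var query = { index: { $mod: [2, 0] }, index: { $gt: 5 } };
// query is now just { index: { $gt: 5 } }
```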
I am trying to find in a collection all of the documents that have the given key equal to one of the strings in an array.
Here's an example of the collection:
{
    roomId: 'room1',
    name: 'first'
},
{
    roomId: 'room2',
    name: 'second'
},
{
    roomId: 'room3',
    name: 'third'
}
And here's an example of the array to look through:
[ 'room2', 'room3' ]
What I thought would work is:
collection.find({ roomId : { $in : [ 'room2', 'room3' ]}}, function( e, r )
{
// r should return the second and third room
});
How can I achieve this?
One way this could be solved would be with a for loop:
var roomIds = [ 'room2', 'room3' ];
for ( var i=0; i < roomIds.length; i++ )
{
collection.find({ id : roomIds[ i ]})
}
But this is not ideal....
What you posted should work - no looping required. The $in operator does the job:
> db.Room.insert({ "_id" : 1, name: 'first'});
> db.Room.insert({ "_id" : 2, name: 'second'});
> db.Room.insert({ "_id" : 3, name: 'third'});
> // test w/ int
> db.Room.find({ "_id" : { $in : [1, 2] }});
{ "_id" : 1, "name" : "first" }
{ "_id" : 2, "name" : "second" }
> // test w/ strings
> db.Room.find({ "name" : { $in : ['first', 'third'] }});
{ "_id" : 1, "name" : "first" }
{ "_id" : 3, "name" : "third" }
Isn't that what you expect?
Tested w/ MongoDB 2.1.1
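For intuition, { roomId: { $in: ids } } behaves like a membership test on each document. A plain JavaScript equivalent (the helper name is mine):

```javascript
// Keep only the documents whose roomId appears in the given array,
// mirroring the query { roomId: { $in: ids } }.
function findByRoomIds(docs, ids) {
  return docs.filter(function (d) { return ids.indexOf(d.roomId) !== -1; });
}
```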