Mongo bulk find and update matched documents field in single query? - mongodb

I have an image gallery with metadata stored in Mongo. Every time a web site query matches some number of stored images, I would like each matched document's shown counter field to be incremented. Using findAndModify would work fine for a single document, but I can't see a nice way to match multiple documents and update them all.
Is this possible with the latest version of Mongo? Or are there any recommended best practices to achieve this?
thanks
fLo
The document format is very simple:
{
    "name" : "img name",
    "description" : "some more info",
    "size" : "img size in bytes",
    "shown" : "count of times the image was selected by query",
    "viewed" : "count of times the image was clicked"
}
And the query is a simple find; I then use the cursor to loop over the results and bump the shown count using the document id, i.e.
db.images.update(
    { _id: "xxxx" },
    { $inc: { shown: 1 } }
)
But I would prefer not to fetch 100 documents and then have to loop over each one to update it individually. I was hoping to perform the find and update in a single query.

For improved performance, take advantage of the Bulk() API to update the collection efficiently, since you send the operations to the server in batches (for example, a batch size of 500). This gives you much better performance because you make a round trip to the server only once per 500 operations rather than once per update, making your updates more efficient and quicker.
The following demonstrates this approach. The first example uses the Bulk() API, available in MongoDB versions >= 2.6 and < 3.2. It updates all the matched documents in the collection from a given array by incrementing the shown field by 1. It assumes the array of images has the structure:
var images = [
    { "_id": 1, "name": "img_1.png" },
    { "_id": 2, "name": "img_2.png" },
    { "_id": 3, "name": "img_3.png" },
    ...
    { "_id": n, "name": "img_n.png" }
]
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.images.initializeUnorderedBulkOp(),
    counter = 0;

images.forEach(function (doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$inc": { "shown": 1 }
    });
    counter++;

    if (counter % 500 === 0) {
        // Execute per 500 operations
        bulk.execute();
        // Re-initialize every 500 update statements
        bulk = db.images.initializeUnorderedBulkOp();
    }
})

// Clean up remaining queue
if (counter % 500 !== 0) { bulk.execute(); }
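As a side note, each bulk.execute() call returns a BulkWriteResult that you can capture if you want to verify what the batch actually did (a small illustrative sketch, not required for the update itself):
var result = bulk.execute();
printjson(result);          // full BulkWriteResult document
print(result.nModified);    // number of documents actually updated in this batch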
The next example applies to MongoDB version 3.2, which deprecates the Bulk() API and provides a newer set of APIs based on bulkWrite().
MongoDB version 3.2 and greater:
var ops = [];

images.forEach(function(doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$inc": { "shown": 1 }
            }
        }
    });

    if (ops.length === 500) {
        db.images.bulkWrite(ops);
        ops = [];
    }
})

if (ops.length > 0)
    db.images.bulkWrite(ops);
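Similarly, each db.images.bulkWrite(ops) call returns a result document, and it accepts an options argument if you prefer unordered execution (a small sketch for illustration):
var result = db.images.bulkWrite(ops, { ordered: false });
print(result.modifiedCount);   // documents actually updated by this batch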

Related

Using $sum on an existent field returns a value of 0 [duplicate]

I have a collection students with documents in the following format:
{
    _id: "53fe74a866455060e003c2db",
    name: "sam",
    subject: "maths",
    marks: "77"
}
{
    _id: "53fe79cbef038fee879263d2",
    name: "ryan",
    subject: "bio",
    marks: "82"
}
{
    _id: "53fe74a866456060e003c2de",
    name: "tony",
    subject: "maths",
    marks: "86"
}
I want to get the total marks of all the students with subject = "maths", so I should get 163 as the sum.
db.students.aggregate([
    { $match : { subject : "maths" } },
    { "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }
])
Now I should get the following result:
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":163}], "ok":1}
But I get:
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":0}], "ok":1}
Can someone point out what I might be doing wrong here?
Your current schema stores the marks field as a string, but the aggregation framework needs a numeric data type to work out the sum. On the other hand, you can use MapReduce to calculate the sum, since it allows the use of native JavaScript methods like parseInt() on your object properties in its map functions. So overall you have two choices.
Option 1: Update Schema (Change Data Type)
The first would be to change the schema, or to add another field to your documents that holds the actual numerical value rather than the string representation. If your collection is relatively small, you could use a combination of MongoDB's cursor find(), forEach() and update() methods to change your marks schema:
db.students.find({ "marks": { "$type": 2 } }).snapshot().forEach(function(doc) {
    db.students.update(
        { "_id": doc._id, "marks": { "$type": 2 } },
        { "$set": { "marks": parseInt(doc.marks) } }
    );
});
For relatively large collections, this will be slow, so it's recommended to use MongoDB bulk updates instead:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.students.initializeUnorderedBulkOp(),
    counter = 0;

db.students.find({ "marks": { "$exists": true, "$type": 2 } }).forEach(function (doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "marks": parseInt(doc.marks) }
    });
    counter++;

    if (counter % 1000 === 0) {
        // Execute per 1000 operations
        bulk.execute();
        // Re-initialize every 1000 update statements
        bulk = db.students.initializeUnorderedBulkOp();
    }
})

// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
    cursor = db.students.find({ "marks": { "$exists": true, "$type": 2 } });

cursor.forEach(function (doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "marks": parseInt(doc.marks) } }
        }
    });

    if (ops.length === 1000) {
        db.students.bulkWrite(ops);
        ops = [];
    }
});

if (ops.length > 0) db.students.bulkWrite(ops);
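Once the marks values are stored as integers, the original aggregation returns the expected sum:
db.students.aggregate([
    { "$match": { "subject": "maths" } },
    { "$group": { "_id": "$subject", "totalMarks": { "$sum": "$marks" } } }
])
// { "_id" : "maths", "totalMarks" : 163 }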
Option 2: Run MapReduce
The second approach would be to rewrite your query with MapReduce, where you can use the JavaScript function parseInt().
In your MapReduce operation, define the map function that processes each input document. This function maps the converted marks string value to the subject for each document, and emits the subject and converted marks pair. This is where the native JavaScript function parseInt() can be applied. Note: in the function, this refers to the document that the map-reduce operation is processing:
var mapper = function () {
    var x = parseInt(this.marks);
    emit(this.subject, x);
};
Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject.
The function reduces the valuesMarks array to the sum of its elements.
var reducer = function(keySubject, valuesMarks) {
    return Array.sum(valuesMarks);
};

db.students.mapReduce(
    mapper,
    reducer,
    {
        out: "example_results",
        query: { subject: "maths" }
    }
);
With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:
/* 0 */
{
    "_id" : "maths",
    "value" : 163
}
Possible causes for your sum being returned as 0 are:
The field you are summing up is not an integer but a string.
Make sure the field contains numeric values.
You are using the wrong syntax for $sum.
db.c1.aggregate([{
    $group: {
        _id: "$item",
        price: {
            $sum: "$price"
        },
        count: {
            $sum: 1
        }
    }
}])
Make sure you use "$price" and not "price".
One of the silliest mistakes that causes this error is using a space or tab inside the quotes when specifying the field name.
For example, "$price " won't work, but "$price" will.

How to change the datatype of a value of mongodb array?

Having the following document:
{
    "_id" : 1.0000000000000000,
    "l" : [
        114770288670819.0000000000000000,
        NumberLong(10150097174480584)
    ]
}
How can I convert the second element of the "l" array from NumberLong to floating-point (double), like the first element?
Well, as the other responses are either vague in execution or suggest writing a new collection or other "unsafe" operations, I suppose a response that you should follow is in order.
It's actually not a simple problem to solve ( though others have discounted it as such ) when you take into account considerations such as "keeping the order of elements" and "not possibly overwriting other changes to the document", which are important in a production system.
The only real "safe(ish)" way that I can see of doing this without possibly (mostly) blowing away your whole array and losing any "active" writes is with a construct like this ( using Bulk Operations for efficiency ):
var bulk = db.collection.initializeOrderedBulkOp(),
    count = 0;

db.collection.find({ "l": { "$type": 18 } }).forEach(function(doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$pull": { "l": { "$in": doc.l } }
    });
    count++;

    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
    }

    doc.l.map(function(l,idx) {
        return { "l": parseFloat(l.valueOf()), "idx": idx }
    }).forEach(function(l) {
        bulk.find({ "_id": doc._id }).updateOne({
            "$push": { "l": { "$each": [l.l], "$position": l.idx } }
        });
        count++;

        if ( count % 1000 == 0 ) {
            bulk.execute();
            bulk = db.collection.initializeOrderedBulkOp();
        }
    });
});

if ( count % 1000 != 0 )
    bulk.execute();
This is meant to "bulk" rewrite the data in your entire collection where needed, but on your single sample the result is:
{ "_id" : 1, "l" : [ 114770288670819, 10150097174480584 ] }
So the basics of this are that it will find any document in your collection whose array matches the $type for a NumberLong ( a 64-bit integer in BSON types ) and process the results in the following way.
First remove all the elements from the array with the values of the elements already there.
Convert all elements and add them back into the array at the same position they were found in.
Now I say "safe(ish)" as while this is mostly "safe" in considering that your documents could well be being updated and having new items appended to the array, or indeed other alterations happening to the document, there is one possible problem here.
If your array "values" here are not truly "unique" then there is no real way to "pull" the items that need to be changed and also re-insert them at the same position where they originally occurred. Not in an "atomic" and "safe" way at any rate.
So the "risk" this process as shown runs, is that "if" you happened to append a new array element that has the same value as another existing element to the array ( and only when this happens between the cursor read for the document and before the bulk execution ), then you run the "risk" of that new item being removed from the array entirely.
If however all values in your array are completely "unique" per document, then this will work "perfectly", and just transpose everything back into place with converted types, and no other "risk" to missing other document updates or important actions.
It's not a simple statement, because the logic demands you take the precaution. But it cannot fail within the constraints already mentioned. And it runs on the collection you have now, without producing a new one.
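Once the run has finished, you can verify that nothing was missed by repeating the same $type query used above; it should match nothing:
// Should print 0 once every NumberLong element has been rewritten as a double
print(db.collection.find({ "l": { "$type": 18 } }).count());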
Load the document in the mongo shell:
var doc = db.collection.findOne({ _id: 1.0000000000000000 });
Explicitly convert the field to a floating-point number using the parseFloat function ( valueOf() unwraps the NumberLong so parseFloat doesn't see its string form ):
doc.l[1] = parseFloat(doc.l[1].valueOf());
Save the document back into the database:
db.collection.save(doc);
For updating this value you should use either programming-language code or some workaround using JavaScript.
Otherwise, use aggregation with $out, which writes the documents into a new collection, like below:
db.collectionName.aggregate([
    {
        "$unwind": "$l"
    },
    {
        "$project": {
            "l1": {
                "$divide": ["$l", 1] // divide by one or multiply by one
            }
        }
    },
    {
        "$group": {
            "_id": "$_id",
            "l": {
                "$push": "$l1"
            }
        }
    },
    {
        "$out": "newCollectionName" // write data in new collection
    }
])
Once the above query has run, drop your old collection and use the new collection instead.
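If you want the converted data back under the original name, one way is to rename the output collection over the old one (a sketch; note that renameCollection cannot be used on sharded collections or across databases):
// The second argument drops the existing target collection before renaming
db.newCollectionName.renameCollection("collectionName", true);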

Rename a sub-document field within an Array

Considering the document below, how can I rename 'techId1' to 'techId'? I've tried different ways and can't get it to work.
{
    "_id" : ObjectId("55840f49e0b"),
    "__v" : 0,
    "accessCard" : "123456789",
    "checkouts" : [
        {
            "user" : ObjectId("5571e7619f"),
            "_id" : ObjectId("55840f49e0bf"),
            "date" : ISODate("2015-06-19T12:45:52.339Z"),
            "techId1" : ObjectId("553d9cbcaf")
        },
        {
            "user" : ObjectId("5571e7619f15"),
            "_id" : ObjectId("55880e8ee0bf"),
            "date" : ISODate("2015-06-22T13:01:51.672Z"),
            "techId1" : ObjectId("55b7db39989")
        }
    ],
    "created" : ISODate("2015-06-19T12:47:05.422Z"),
    "date" : ISODate("2015-06-19T12:45:52.339Z"),
    "location" : ObjectId("55743c8ddbda"),
    "model" : "model1",
    "order" : ObjectId("55840f49e0bf"),
    "rid" : "987654321",
    "serialNumber" : "AHSJSHSKSK",
    "user" : ObjectId("5571e7619f1"),
    "techId" : ObjectId("55b7db399")
}
In the mongo console I tried the following, which returns ok but nothing is actually updated.
collection.update({"checkouts._id":ObjectId("55840f49e0b")},{ $rename: { "techId1": "techId" } });
I also tried this, which gives me an error: "cannot use the part (checkouts of checkouts.techId1) to traverse the element".
collection.update({"checkouts._id":ObjectId("55856609e0b")},{ $rename: { "checkouts.techId1": "checkouts.techId" } })
In mongoose I have tried the following.
collection.findByIdAndUpdate(id, { $rename: { "checkouts.techId1": "checkouts.techId" } }, function (err, data) {});
and
collection.update({'checkouts._id': n1._id}, { $rename: { "checkouts.$.techId1": "checkouts.$.techId" } }, function (err, data) {});
Thanks in advance.
You were close at the end, but there are a few things missing. You cannot $rename when using the positional operator; instead you need to $set the new field and $unset the old one. But there is another restriction here: since they both belong to "checkouts" as a parent path, you cannot do both in the same update statement.
The other core line in your question is "traverse the element" and that is the one thing you cannot do in updating "all" of the array elements at once. Well, not safely and without possibly overwriting new data coming in anyway.
What you need to do is "iterate" each document and similarly iterate each array member in order to "safely" update. You cannot really iterate just the document and "save" the whole array back with alterations. Certainly not in the case where anything else is actively using the data.
I personally would run this sort of operation in the MongoDB shell if you can, as it is a "one off" ( hopefully ) thing and this saves the overhead of writing other API code. Also we're using the Bulk Operations API here to make this as efficient as possible. With mongoose it takes a bit more digging to implement, but still can be done. But here is the shell listing:
var bulk = db.collection.initializeOrderedBulkOp(),
    count = 0;

db.collection.find({ "checkouts.techId1": { "$exists": true } }).forEach(function(doc) {
    doc.checkouts.forEach(function(checkout) {
        if ( checkout.hasOwnProperty("techId1") ) {
            bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
                "$set": { "checkouts.$.techId": checkout.techId1 }
            });
            bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
                "$unset": { "checkouts.$.techId1": 1 }
            });
            count += 2;

            if ( count % 500 == 0 ) {
                bulk.execute();
                bulk = db.collection.initializeOrderedBulkOp();
            }
        }
    });
});

if ( count % 500 !== 0 )
    bulk.execute();
Since the $set and $unset operations happen in pairs, we keep the batch size to 500 operations per execution just to keep memory usage on the client down.
The loop simply looks for documents where the field to be renamed "exists" and then iterates each array element of each document and commits the two changes. As Bulk Operations, these are not sent to the server until the .execute() is called, where also a single response is returned for each call. This saves a lot of traffic.
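After the final .execute() you can confirm the rename took effect by checking that no checkout still carries the old field:
// Should return 0 once every embedded techId1 has been moved to techId
db.collection.find({ "checkouts.techId1": { "$exists": true } }).count()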
If you insist on coding with mongoose, be aware that a .collection accessor is required to get to the Bulk API methods from the core driver, like this:
var bulk = Model.collection.initializeOrderedBulkOp();
And the only thing that sends anything to the server is the .execute() method, so this is your only execution callback:
bulk.execute(function(err,response) {
    // code body and async iterator callback here
});
Also use async flow control, such as async.each, instead of .forEach().
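Putting those pieces together, a minimal mongoose sketch might look like the following (Equipment is a placeholder for your own model name, and the async package is assumed to be installed; this simply mirrors the shell listing above):
var async = require("async"),
    bulk = Equipment.collection.initializeOrderedBulkOp(),
    count = 0;

Equipment.find({ "checkouts.techId1": { "$exists": true } }).lean().exec(function(err, docs) {
    if (err) throw err;

    async.eachSeries(docs, function(doc, callback) {
        doc.checkouts.forEach(function(checkout) {
            if ( checkout.hasOwnProperty("techId1") ) {
                bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
                    "$set": { "checkouts.$.techId": checkout.techId1 }
                });
                bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
                    "$unset": { "checkouts.$.techId1": 1 }
                });
                count += 2;
            }
        });

        if ( count >= 500 ) {
            // Flush this batch before moving on to the next document
            bulk.execute(function(err) {
                bulk = Equipment.collection.initializeOrderedBulkOp();
                count = 0;
                callback(err);
            });
        } else {
            callback();
        }
    }, function(err) {
        if (err) throw err;
        // Send whatever is left in the queue
        if ( count > 0 ) bulk.execute(function(err, response) { /* done */ });
    });
});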
Also, if you do that, then be aware that as a raw driver method not governed by mongoose, you do not get the same database connection awareness as you do with mongoose methods. Unless you know for sure the database connection is already established, it is safer to put this code within an event callback for the server connection:
mongoose.connection.on("open", function(err) {
    // body of code
});
But otherwise those are the only real alterations ( apart from call syntax ) you need.
This worked for me. I created this query to perform this procedure and I am sharing it ( although I know it is not the most optimized way ):
First, run an aggregation that (1) uses $match to select the documents whose checkouts array field has techId1 as one of the keys of its sub-documents, (2) uses $unwind on the checkouts field (which deconstructs the array field from the input documents to output a document for each element), (3) adds the techId field (with $addFields), (4) removes the old techId1 field (with $project), (5) uses $group on _id to gather the checkout sub-documents back into an array, and (6) writes the result of this aggregation to a temporal collection (with $out).
const collection = 'yourCollection'

db[collection].aggregate([
    {
        $match: {
            'checkouts.techId1': { '$exists': true }
        }
    },
    {
        $unwind: {
            path: '$checkouts'
        }
    },
    {
        $addFields: {
            'checkouts.techId': '$checkouts.techId1'
        }
    },
    {
        $project: {
            'checkouts.techId1': 0
        }
    },
    {
        $group: {
            '_id': '$_id',
            'checkouts': { $push: '$checkouts' }
        }
    },
    {
        $out: 'temporal'
    }
])
Then, you can make another aggregate from this temporal collection to $merge the documents with the modified checkouts field to your original collection.
db.temporal.aggregate([
    {
        $merge: {
            into: collection,
            on: "_id",
            whenMatched: "merge",
            whenNotMatched: "insert"
        }
    }
])
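After the $merge has run, the temporal collection is no longer needed and can be dropped:
db.temporal.drop()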

Change schema of an object in mongodb with update

I made a CSV import from a huge MS Excel sheet. So the collection is one level deep. I need to change a few columns so they are two levels deep.
For example, this is from the CSV import:
{
    title: 'House A',
    energyElectricity: 55,
    energyHeat: 35,
    energyCooling: 45
}
This is not good. I want this in the following format:
{
    title: 'House A',
    energy: {
        electricity: 55,
        heat: 35,
        cooling: 45
    }
}
Is there any way to do this with an update query?
I tried some stuff but no luck.
Some pseudo code here:
db.consumers.update({}, {energy.electricity: energyElectricity, energy.heat:energyHeat}, {multi:true});
There really is no way to do this other than looping over the results, as it is presently not possible to refer to the existing fields of a document during an update operation.
So your basic construct needs to look something like ( in whatever language ):
db.collection.find({}).forEach(function(doc) {
    db.collection.update(
        { "_id": doc._id },
        {
            "title": doc.title,
            "energy": {
                "electricity": doc.energyElectricity,
                "heat": doc.energyHeat,
                "cooling": doc.energyCooling
            }
        }
    );
});
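Note that the update above replaces the whole document. If your real documents carry more fields than the ones shown, a variation that only sets the new sub-document and unsets the old flat fields leaves everything else intact (a sketch along the same lines, not what the snippet above does):
db.collection.find({ "energyElectricity": { "$exists": true } }).forEach(function(doc) {
    db.collection.update(
        { "_id": doc._id },
        {
            "$set": {
                "energy": {
                    "electricity": doc.energyElectricity,
                    "heat": doc.energyHeat,
                    "cooling": doc.energyCooling
                }
            },
            "$unset": {
                "energyElectricity": "",
                "energyHeat": "",
                "energyCooling": ""
            }
        }
    );
});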
You could do this a little more efficiently with "bulk updates" as available from MongoDB 2.6 and upwards:
var batch = [];
var count = 0;

db.collection.find({}).forEach(function(doc) {
    batch.push({
        "q": { "_id": doc._id },
        "u": {
            "title": doc.title,
            "energy": {
                "electricity": doc.energyElectricity,
                "heat": doc.energyHeat,
                "cooling": doc.energyCooling
            }
        }
    });
    count++;

    if ( count % 500 == 0 ) {
        db.runCommand({ "update": "collection", "updates": batch });
        batch = [];
    }
});

if ( batch.length > 0 ) {
    db.runCommand({ "update": "collection", "updates": batch });
}
So while all updates are still being done over the wire, this actually only sends them over the wire once per 500 items ( or however many you feel comfortably sits under the 16MB BSON limit ).
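On MongoDB 3.2 and newer, the same batching can be expressed with bulkWrite() and replaceOne operations (a sketch of the equivalent loop):
var ops = [];

db.collection.find({}).forEach(function(doc) {
    ops.push({
        "replaceOne": {
            "filter": { "_id": doc._id },
            "replacement": {
                "title": doc.title,
                "energy": {
                    "electricity": doc.energyElectricity,
                    "heat": doc.energyHeat,
                    "cooling": doc.energyCooling
                }
            }
        }
    });

    if (ops.length === 500) {
        db.collection.bulkWrite(ops);
        ops = [];
    }
});

if (ops.length > 0) db.collection.bulkWrite(ops);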
Of course, since you mention this came from a CSV import, you can always re-shape your input and import the collection again if that turns out to be a reasonable option.