I imported a CSV file from a huge MS Excel sheet, so every document in the collection is flat (one level). I need to change a few columns so that they are nested two levels deep.
For example, this is what the CSV import produced:
{
title: 'House A',
energyElectricity: 55,
energyHeat: 35,
energyCooling:45
}
This is not good. I want this in the following format:
{
title: 'House A',
energy: {
electricity: 55,
heat: 35,
cooling:45
}
}
Is there any way to do this with an update query?
I tried some stuff but no luck.
Some pseudo code here:
db.consumers.update({}, {energy.electricity: energyElectricity, energy.heat:energyHeat}, {multi:true});
There really is no other way to do this than looping over the results, because it is presently not possible to refer to a document's existing fields during an update operation.
So your basic construct needs to look something like this (in whatever language):
db.collection.find({}).forEach(function(doc) {
    // Replace the stored document with the reshaped version
    db.collection.update(
        { "_id": doc._id },
        {
            "title": doc.title,
            "energy": {
                "electricity": doc.energyElectricity,
                "heat": doc.energyHeat,
                "cooling": doc.energyCooling
            }
        }
    );
});
You could do this a little more efficiently with "bulk updates" as available from MongoDB 2.6 and upwards:
var batch = [];
var count = 0;
db.collection.find({}).forEach(function(doc) {
    // Queue one update statement per document
    batch.push({
        "q": { "_id": doc._id },
        "u": {
            "title": doc.title,
            "energy": {
                "electricity": doc.energyElectricity,
                "heat": doc.energyHeat,
                "cooling": doc.energyCooling
            }
        }
    });
    count++;
    // Send the queued statements every 500 documents
    if ( count % 500 == 0 ) {
        db.runCommand({ "update": "collection", "updates": batch });
        batch = [];
    }
});
// Send whatever is left over
if ( batch.length > 0 ) {
    db.runCommand({ "update": "collection", "updates": batch });
}
So while all updates are still sent over the wire, this only makes one round trip per 500 items (or however many you feel sits comfortably under the 16MB BSON limit).
Of course though, since you mention this came from a CSV import, you can always re-shape your input and import the collection again if that turns out to be a reasonable option.
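One caveat with the loops above: passing a plain document as the second argument replaces the stored document, so any column that is not listed explicitly is dropped. If the CSV had more columns than the four shown, a $set/$unset update may be safer. The following is only a sketch under that assumption; buildReshapeUpdate is a hypothetical helper name, and the field names are taken from the question.

```javascript
// Hypothetical helper: builds an update document that nests the three flat
// energy fields under "energy" and removes the originals, leaving every
// other field of the stored document untouched.
function buildReshapeUpdate(doc) {
  return {
    $set: {
      energy: {
        electricity: doc.energyElectricity,
        heat: doc.energyHeat,
        cooling: doc.energyCooling
      }
    },
    $unset: { energyElectricity: "", energyHeat: "", energyCooling: "" }
  };
}

const sample = { title: "House A", energyElectricity: 55, energyHeat: 35, energyCooling: 45 };
console.log(JSON.stringify(buildReshapeUpdate(sample), null, 2));
```

Inside the shell loop this would be passed as the update argument, for example db.collection.update({ "_id": doc._id }, buildReshapeUpdate(doc)).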
I have a document which is structured like this:
{
'item_id': '12345',
'total_score': 100,
'user_scores': {
'ABC': 40,
'DEF': 60
}
}
I'm using PyMongo, but documentation of MongoDB seems easily translatable across different distributions. With PyMongo, I could update user scores with:
collection.update_one(
{ 'item_id': '12345' },
{ '$set': { 'user_scores.GHI': 20 } },
upsert=True
)
Which results in this:
{
'item_id': '12345',
'total_score': 100,
'user_scores': {
'ABC': 40,
'DEF': 60,
'GHI': 20
}
}
The issue is of course that the total_score is now incorrect. I want that total score to update, so that in a future query, I can quickly ascertain the score of each result, and even sort by score.
One solution could be to find an existing document using find_one({'item_id': '12345'}) (creating it if it doesn't exist), then update it with the new scores and update the total score. The problem there is that I want to run thousands of these at the same time, and it's far more efficient to call bulk_write on a series of requests.
So, a better solution would be to do two sequential update requests:
request1 = UpdateOne(
{ 'item_id' : '12345' },
{ '$set': { 'user_scores.GHI': 20 } },
upsert = True
)
request2 = UpdateOne(
{ 'item_id' : '12345' },
{ '$set': { 'total_score': { '$sum': { '$values': 'user_scores' } } } },
upsert = True
)
The first request updates the user scores, same as before. In the second request there are two things going on. The syntax for this isn't correct, but here's what I'm trying to do:
I need to get the values from the user_scores dictionary. { '$values': 'user_scores' } is how I've tried to convey this.
That gives me an array of values. I know these are all numeric, so I now need to sum those, conveyed with { '$sum': { '$values': 'user_scores' } }.
I can run these batch updates consecutively, so there's no risk of summing the wrong thing. The danger with having a total_score field will always be that it isn't updated and thus doesn't contain the correct number. I'd imagine this is a common case with document-based models?
If you're using MongoDB 4.2 or later, a new feature is available: pipelined updates (updates that take an aggregation pipeline), meaning you can now do what you want in one go:
db.collection.updateOne({ 'item_id' : '12345' },
[
{ '$set': { 'user_scores.GHI': 20 } },
{ '$set': { 'total_score': { '$sum': [ "$user_scores.ABC", "$user_scores.DEF", "$user_scores.GHI"] } } }
]);
Unfortunately this is not possible in earlier MongoDB versions, so in that case you'll have to keep using your solution of splitting this into two actions.
EDIT:
For a dynamic update, where the keys under user_scores are not known in advance, we can use $map and $objectToArray like so:
db.collection.updateOne(
{'item_id': '12345'},
[
{'$set': {'user_scores.GHI': 20}},
{
'$set':
{
'total_score': {
'$sum': {
'$map': {
'input': {'$objectToArray': '$user_scores'},
'as': 'score',
'in': '$$score.v'
}
}
}
}
}
]);
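To make the second pipeline stage easier to follow, here is a plain JavaScript mirror of what the $objectToArray, $map and $sum combination computes. It is only an illustration that runs outside the database; totalScore is a hypothetical name.

```javascript
// Plain JavaScript mirror of the pipeline expression:
//   $objectToArray turns { ABC: 40 } into [{ k: "ABC", v: 40 }],
//   $map picks the v of each element, and $sum adds the values up.
function totalScore(userScores) {
  const pairs = Object.entries(userScores).map(([k, v]) => ({ k, v }));
  const values = pairs.map((pair) => pair.v);
  return values.reduce((sum, v) => sum + v, 0);
}

console.log(totalScore({ ABC: 40, DEF: 60, GHI: 20 })); // 120
```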
I have a collection students with documents in the following format:-
{
_id:"53fe74a866455060e003c2db",
name:"sam",
subject:"maths",
marks:"77"
}
{
_id:"53fe79cbef038fee879263d2",
name:"ryan",
subject:"bio",
marks:"82"
}
{
_id:"53fe74a866456060e003c2de",
name:"tony",
subject:"maths",
marks:"86"
}
I want to get the count of total marks of all the students with subject = "maths". So I should get 163 as sum.
db.students.aggregate([{ $match : { subject : "maths" } },
{ "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }])
Now I should get the following result-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":163}], "ok":1}
But I get-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":0}], "ok":1}
Can someone point out what I might be doing wrong here?
Your current schema stores the marks field as a string, and the aggregation framework needs a numeric data type to work out the sum. On the other hand, you can use MapReduce to calculate the sum, since it allows native JavaScript functions like parseInt() on your object properties in its map function. So overall you have two choices.
Option 1: Update Schema (Change Data Type)
The first would be to change the schema, or add another field to your documents, so that the actual numerical value is stored rather than its string representation. If your collection is relatively small, you could use a combination of the cursor's find(), forEach() and update() methods to change your marks schema:
// $type 2 matches fields stored as strings
db.students.find({ "marks": { "$type": 2 } }).snapshot().forEach(function(doc) {
    db.students.update(
        { "_id": doc._id, "marks": { "$type": 2 } },
        { "$set": { "marks": parseInt(doc.marks) } }
    );
});
For relatively large collections, updating documents one at a time will be slow, so it's recommended to use bulk updates instead:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.students.initializeUnorderedBulkOp(),
    counter = 0;

// $type 2 matches fields stored as strings
db.students.find({"marks": {"$exists": true, "$type": 2 }}).forEach(function (doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "marks": parseInt(doc.marks) }
    });

    counter++;
    if (counter % 1000 === 0) {
        // Execute per 1000 operations
        bulk.execute();
        // re-initialize every 1000 update statements
        bulk = db.students.initializeUnorderedBulkOp();
    }
})

// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
    cursor = db.students.find({"marks": {"$exists": true, "$type": 2 }});

cursor.forEach(function (doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "marks": parseInt(doc.marks) } }
        }
    });

    // Send the queued operations every 1000 documents
    if (ops.length === 1000) {
        db.students.bulkWrite(ops);
        ops = [];
    }
});

// Send whatever is left over
if (ops.length > 0) db.students.bulkWrite(ops);
Option 2: Run MapReduce
The second approach would be to rewrite your query with MapReduce where you can use the JavaScript function parseInt().
In your MapReduce operation, define the map function that processes each input document. This function converts the marks string to a number and emits the subject together with the converted marks for each document. This is where the native JavaScript function parseInt() can be applied. Note: in the function, this refers to the document that the map-reduce operation is processing:
var mapper = function () {
var x = parseInt(this.marks);
emit(this.subject, x);
};
Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject.
The function reduces the valuesMarks array to the sum of its elements.
var reducer = function(keySubject, valuesMarks) {
return Array.sum(valuesMarks);
};
db.students.mapReduce(
mapper,
reducer,
{
out : "example_results",
query: { subject : "maths" }
}
);
With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:
/* 0 */
{
"_id" : "maths",
"value" : 163
}
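The map and reduce steps above can be emulated on an in-memory array to check the logic before running it against the database. This is only a sketch in plain JavaScript; sumMarksBySubject is a hypothetical name, and the sample data is taken from the question.

```javascript
// Emulates the map and reduce steps on an in-memory array of documents.
function sumMarksBySubject(docs, subject) {
  const groups = {};
  docs
    .filter((doc) => doc.subject === subject) // plays the role of the "query" option
    .forEach((doc) => {
      // map step: emit(subject, parseInt(marks))
      if (!groups[doc.subject]) groups[doc.subject] = [];
      groups[doc.subject].push(parseInt(doc.marks, 10));
    });
  // reduce step: sum the values emitted for each key
  return Object.keys(groups).map((key) => ({
    _id: key,
    value: groups[key].reduce((a, b) => a + b, 0)
  }));
}

const students = [
  { name: "sam", subject: "maths", marks: "77" },
  { name: "ryan", subject: "bio", marks: "82" },
  { name: "tony", subject: "maths", marks: "86" }
];
console.log(JSON.stringify(sumMarksBySubject(students, "maths")));
// [{"_id":"maths","value":163}]
```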
Possible reasons your sum is returned as 0:
The field you are summing is a string rather than a number.
Make sure the field contains numeric values.
You are using the wrong syntax for $sum.
db.c1.aggregate([{
$group: {
_id: "$item",
price: {
$sum: "$price"
},
count: {
$sum: 1
}
}
}])
Make sure you use "$price" and not "price".
One of the silliest mistakes that causes this error is:
A space or tab inside the quotes when specifying the field name.
Example: "$price " won't work, but "$price" will.
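Why the string case yields 0 rather than an error: $sum skips non-numeric values, so a group whose field holds only strings adds up to 0. A rough model of that behaviour in plain JavaScript (sumLikeMongo is a hypothetical name, and this is a simplification rather than the server's implementation):

```javascript
// Rough model of $sum inside $group: non-numeric values are skipped,
// so a field that holds only strings sums to 0.
function sumLikeMongo(values) {
  return values
    .filter((v) => typeof v === "number")
    .reduce((total, v) => total + v, 0);
}

const marks = ["77", "86"];
console.log(sumLikeMongo(marks));                               // 0
console.log(sumLikeMongo(marks.map((s) => parseInt(s, 10))));   // 163
```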
I have an image gallery with metadata stored in MongoDB. Every time a website query matches some number of stored images, I would like the shown counter field of each matched document to be incremented. Using findAndModify would work OK for a single document, but I can't see a nice way to match multiple documents and update them all.
Is this possible with the latest version of MongoDB? Or is there a recommended best practice to achieve this?
thanks
fLo
The document format is very simple
{
"name" : "img name",
"description" : "some more info",
"size" : "img size in bytes",
"shown" : "count of times the image was selected by query",
"viewed" : "count of times the image was clicked"
}
And the query is a simple find; I then use the cursor to loop over the results and bump the shown count using the document id, i.e.
db.images.update(
{ _id: "xxxx" },
{ $inc: { shown: 1 } }
)
But I would prefer not to fetch 100 documents and then loop over each one to update it individually. I was hoping to perform the find and the update in a single query.
For improved performance, use the Bulk() API to send the operations to the server in batches (say, a batch size of 500). Instead of sending every request to the server individually, you send one request per 500 operations, which makes the updates more efficient and quicker.
The following demonstrates this approach. The first example uses the Bulk() API available in MongoDB versions >= 2.6 and < 3.2. It updates all the matched documents in the collection from a given array, incrementing the shown field by 1. It assumes the array of images has the following structure:
var images = [
{ "_id": 1, "name": "img_1.png" },
{ "_id": 2, "name": "img_2.png" },
{ "_id": 3, "name": "img_3.png" },
...
{ "_id": n, "name": "img_n.png" }
]
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.images.initializeUnorderedBulkOp(),
counter = 0;
images.forEach(function (doc) {
bulk.find({ "_id": doc._id }).updateOne({
"$inc": { "shown": 1 }
});
counter++;
if (counter % 500 === 0) {
// Execute per 500 operations
bulk.execute();
// re-initialize every 500 update statements
bulk = db.images.initializeUnorderedBulkOp();
}
})
// Clean up remaining queue
if (counter % 500 !== 0) { bulk.execute(); }
The next example applies to MongoDB version 3.2 and later, which deprecated the Bulk() API in favour of a newer set of APIs using bulkWrite().
MongoDB version 3.2 and greater:
var ops = [];
images.forEach(function(doc) {
ops.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$inc": { "shown": 1 }
}
}
});
if (ops.length === 500 ) {
db.images.bulkWrite(ops);
ops = [];
}
})
if (ops.length > 0)
db.images.bulkWrite(ops);
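If every matched image gets the same increment, an alternative to queuing one operation per document (not part of the answer above) is a single multi-document update filtered with $in. The sketch below only builds the filter and update documents; buildShownIncrement is a hypothetical helper, and it assumes the images array shown earlier and a server where updateMany is available (3.2 and later).

```javascript
// Hypothetical helper: builds the filter and update for one multi-document
// update that increments "shown" on every image in the array.
function buildShownIncrement(images) {
  return {
    filter: { _id: { $in: images.map((img) => img._id) } },
    update: { $inc: { shown: 1 } }
  };
}

const spec = buildShownIncrement([
  { _id: 1, name: "img_1.png" },
  { _id: 2, name: "img_2.png" }
]);
console.log(JSON.stringify(spec));
// In the mongo shell this would be applied as:
//   db.images.updateMany(spec.filter, spec.update);
```

For very large id lists the batching approach above may still be preferable, since the whole filter has to fit in one request.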
I have two mongodb databases.
Development DB
{
_id:"someid",
"parent1":{
"key1":"val1",
"key2":"val2",
"key3":"val3",
"key4":"val4",
"key5":"val5",
"key6":"val6"
}
}
Production DB
{
_id:"someid",
"parent1":{
"key1":"val1",
"key2":"val2",
"key3":"val3",
"key10":"val10",
"key11":"val11",
"key12":"val12"
}
}
I want to move my Development data to production data without losing newly added keys in production.
The output should become:
{
    _id:"someid",
    "parent1":{
        "key1":"val1",
        "key2":"val2",
        "key3":"val3",
        "key4":"val4",
        "key5":"val5",
        "key6":"val6",
        "key10":"val10",
        "key11":"val11",
        "key12":"val12"
    }
}
I can't update using db.collection.update( { _id:...} , { $set: { some_key.param2 : new_info } } ), as I can't add the parent to each and every key by hand.
Depending on your eventual needs there are a couple of approaches you can take to this:
Cycle Object keys and apply updates: Here you essentially "read" the source object and apply an individual update per key, taking the current state of that key in the target into account. Bulk operations help somewhat here:
var bulk = db.target.initializeOrderedBulkOp(),
count = 0;
db.source.find().forEach(function(doc) {
Object.keys(doc.parent1).forEach(function(key) {
var query = { "_id": doc._id };
query["parent1." + key] = { "$ne": doc.parent1[key] };
var update = { "$set": {} };
update.$set["parent1." + key] = doc.parent1[key];
bulk.find(query).updateOne(update);
query = { "_id": doc._id };
update = { "$setOnInsert": {} };
update.$setOnInsert["parent1." + key] = doc.parent1[key];
bulk.find(query).upsert().updateOne(update);
count++;
if ( count % 500 == 0 ) {
bulk.execute();
bulk = db.target.initializeOrderedBulkOp();
}
});
});
if ( count % 500 != 0 )
bulk.execute();
Use a utility to "merge" the results per key: for example, with the "lodash" library:
db.source.find().forEach(function(doc) {
var id = doc._id;
delete doc._id;
var result = db.target.findAndModify({
"query": { "_id": id },
"update": { "$setOnInsert": doc },
"upsert": true,
"new": true
});
var merged = _.merge(result,doc);
db.target.update({ "_id": merged._id }, merged );
});
The latter is generally heavier in update and communication load, though a bit lighter in overall code. You can also tweak this in API code, where the response can tell you whether the upsert actually inserted a document or the document was just found, in which case you can decide whether to do the merge or not.
Of course I am abstracting here, as in reality your source and target live in different databases and connections rather than just different collections as in the example. But these are the basic patterns to follow.
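The key idea in the first approach is that writing "parent1.key4" with dot notation touches only that sub-key, so keys that exist only in the target (key10 to key12) survive. A minimal sketch of building such an update from a sub-document follows; toDottedSet is a hypothetical helper name.

```javascript
// Hypothetical helper: turns a sub-document into a $set on dotted paths,
// e.g. { key4: "val4" } under "parent1" becomes
// { $set: { "parent1.key4": "val4" } }. Only the listed sub-keys are
// written; other keys under parent1 in the target are preserved.
function toDottedSet(parentName, subDoc) {
  const update = { $set: {} };
  Object.keys(subDoc).forEach((key) => {
    update.$set[parentName + "." + key] = subDoc[key];
  });
  return update;
}

console.log(JSON.stringify(toDottedSet("parent1", { key4: "val4", key5: "val5" })));
// {"$set":{"parent1.key4":"val4","parent1.key5":"val5"}}
```

Unlike the per-key loop above, this builds one update per document, which overwrites matching keys unconditionally; whether that is acceptable depends on which side should win when both databases changed the same key.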