Move mongodb data without losing destination data

I have two mongodb databases.
Development DB
{
_id:"someid",
"parent1":{
"key1":"val1",
"key2":"val2",
"key3":"val3",
"key4":"val4",
"key5":"val5",
"key6":"val6"
}
}
Production DB
{
_id:"someid",
"parent1":{
"key1":"val1",
"key2":"val2",
"key3":"val3",
"key10":"val10",
"key11":"val11",
"key12":"val12"
}
}
I want to move my Development data to Production without losing the newly added keys in Production.
The output should become:
{
_id:"someid",
"parent1":{
"key1":"val1",
"key2":"val2",
"key3":"val3",
"key4":"val4",
"key5":"val5",
"key6":"val6"
"key10":"val10",
"key11":"val11",
"key12":"val12"
}
}
I can't update by using db.collection.update( { _id:...}, { $set: { "some_key.param2": new_info } } ), as I can't add the parent prefix to each and every key by hand.

Depending on your eventual needs, there are a couple of approaches you can take to this:
Cycle the object keys and apply updates: here you essentially "read" the current object, take note of its current state, and apply individual updates per key. Bulk operations help somewhat here:
var bulk = db.target.initializeOrderedBulkOp(),
count = 0;
db.source.find().forEach(function(doc) {
Object.keys(doc.parent1).forEach(function(key) {
var query = { "_id": doc.id };
query["parent1." + key] = { "$ne": doc.parent1[key] };
var update = { "$set": {} };
update.$set["parent1." + key] = doc.parent1[key];
bulk.find(query).updateOne(update);
query = { "_id": doc._id };
update = { "$setOnInsert": {} };
update.$setOnInsert["parent1." + key] = doc.parent1[key];
bulk.find(query).upsert().updateOne(update);
count++;
if ( count % 500 == 0 ) {
bulk.execute();
bulk = db.target.initializeOrderedBulkOp();
}
});
});
if ( count % 500 != 0 )
bulk.execute();
Use a utility to "merge" the results per key: Such as with "lodash" library as in:
db.source.find().forEach(function(doc) {
var id = doc._id;
delete doc._id;
var result = db.target.findAndModify({
"query": { "_id": id },
"update": { "$setOnInsert": doc },
"upsert": true,
"new": true
});
var merged = _.merge(result,doc);
db.target.update({ "_id": merged._id }, merged );
});
The "latter" is generally heavier in "update" and communication load though a bit lighter in overall code. You can also "tweak" this in API code where you can in fact return if the "upsert" in fact resulted in such a thing or whether the document was actually just "found", in which case a decision can be made whether to do the "merge" or not.
Of course I am "abstracting" here, as in reality you source from different "databases" and "connections" rather than just collections as is given as an example. But these are the basic model patterns to follow.


Using $sum on an existing field returns a value of 0 [duplicate]

I have a collection students with documents in the following format:
{
_id:"53fe74a866455060e003c2db",
name:"sam",
subject:"maths",
marks:"77"
}
{
_id:"53fe79cbef038fee879263d2",
name:"ryan",
subject:"bio",
marks:"82"
}
{
_id:"53fe74a866456060e003c2de",
name:"tony",
subject:"maths",
marks:"86"
}
I want to get the total marks of all the students with subject = "maths", so I should get 163 as the sum.
db.students.aggregate([{ $match : { subject : "maths" } },
{ "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }])
I expect to get the following result:
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":163}], "ok":1}
But instead I get:
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":0}], "ok":1}
Can someone point out what I might be doing wrong here?
Your current schema has the marks field data type as string, and the aggregation framework needs a numeric data type to work out the sum. Alternatively, you can use MapReduce to calculate the sum, since it allows the use of native JavaScript methods like parseInt() on your object properties in its map function. So overall you have two choices.
Option 1: Update Schema (Change Data Type)
The first would be to change the schema, or add another field in your document that has the actual numerical value rather than the string representation. If your collection is relatively small, you could use a combination of MongoDB's cursor find(), forEach() and update() methods to change your marks schema:
db.student.find({ "marks": { "$type": 2 } }).snapshot().forEach(function(doc) {
db.student.update(
{ "_id": doc._id, "marks": { "$type": 2 } },
{ "$set": { "marks": parseInt(doc.marks) } }
);
});
For relatively large collections, this will be slow, and it's recommended to use MongoDB's Bulk API for the updates instead:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.student.initializeUnorderedBulkOp(),
counter = 0;
db.student.find({"marks": {"$exists": true, "$type": 2 }}).forEach(function (doc) {
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "marks": parseInt(doc.marks) }
});
counter++;
if (counter % 1000 === 0) {
// Execute per 1000 operations
bulk.execute();
// re-initialize every 1000 update statements
bulk = db.student.initializeUnorderedBulkOp();
}
})
// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
cursor = db.student.find({"marks": {"$exists": true, "$type": 2 }});
cursor.forEach(function (doc) {
ops.push({
"updateOne": {
"filter": { "_id": doc._id } ,
"update": { "$set": { "marks": parseInt(doc.marks) } }
}
});
if (ops.length === 1000) {
db.student.bulkWrite(ops);
ops = [];
}
});
if (ops.length > 0) db.student.bulkWrite(ops);
Option 2: Run MapReduce
The second approach would be to rewrite your query with MapReduce where you can use the JavaScript function parseInt().
In your MapReduce operation, define the map function that processes each input document. This function maps the converted marks string value to the subject for each document, and emits the subject and converted marks pair. This is where the JavaScript native function parseInt() can be applied. Note: in the function, this refers to the document that the map-reduce operation is processing:
var mapper = function () {
var x = parseInt(this.marks);
emit(this.subject, x);
};
Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject.
The function reduces the valuesMarks array to the sum of its elements.
var reducer = function(keySubject, valuesMarks) {
return Array.sum(valuesMarks);
};
db.student.mapReduce(
mapper,
reducer,
{
out : "example_results",
query: { subject : "maths" }
}
);
With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:
/* 0 */
{
"_id" : "maths",
"value" : 163
}
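As a side note (not part of the original answer): on MongoDB 4.0 or newer you could also leave the stored strings alone and convert them on the fly inside the aggregation itself with the $toInt operator. A minimal sketch against the same students collection:
db.students.aggregate([
    { "$match": { "subject": "maths" } },
    { "$group": {
        "_id": "$subject",
        // convert the string marks to an integer before summing
        "totalMarks": { "$sum": { "$toInt": "$marks" } }
    } }
])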
Possible causes for your sum being returned as 0 are:
The field you are summing up is not an integer but a string.
Make sure the field contains numeric values.
You are using the wrong syntax for $sum.
db.c1.aggregate([{
$group: {
_id: "$item",
price: {
$sum: "$price"
},
count: {
$sum: 1
}
}
}])
Make sure you use "$price" and not "price".
One of the silliest mistakes that causes this error is:
using a space or tab inside the quotes when specifying the field name.
For example, "$price " won't work, but "$price" will.

mongodb update a key to all documents using forEach

I want to update the 'order' field of all my documents in Mongo so they will be numbered 1..2..3..4....34.
After running this, they all have "order": "34".
What am I doing wrong?
var i = 1;
db.images.find().forEach(function() {
db.images.update(
{},
{ "$set": {"order": NumberInt(i)} },
{ multi: true }
);
i++;
})
multi : true means all documents matching the query will be updated. And your query is {}, which matches all the documents. So, basically you are updating the order of all the documents in every iteration.
Also, snapshot mode has to be enabled on the cursor to ensure that the same document isn't returned more than once.
You could try this:
var i = 1;
db.images.find().snapshot().forEach(function(image) {
db.images.update(
{"_id" : image._id},
{ "$set": {"order": NumberInt(i)} }
);
i++;
})
From a performance standpoint, it is better to use the bulk API, i.e. bulkWrite().
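For illustration, a rough sketch of the same sequential-numbering update done with bulkWrite() (assuming MongoDB 3.2+ for bulkWrite(); this is not part of the original answer):
var ops = [],
    i = 1;
db.images.find().snapshot().forEach(function(image) {
    ops.push({
        "updateOne": {
            "filter": { "_id": image._id },
            "update": { "$set": { "order": NumberInt(i) } }
        }
    });
    i++;
    // send the queued updates to the server every 1000 operations
    if (ops.length === 1000) {
        db.images.bulkWrite(ops);
        ops = [];
    }
});
// flush whatever is left in the queue
if (ops.length > 0) db.images.bulkWrite(ops);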

Add field if not exist to document in Mongo

Source Doc
{
"_id" : "12345",
"LastName" : "Smith",
"FirstName" : "Fred",
"ProfileCreated" : NumberLong(1447118831860),
"DropOut" : false,
}
New Doc
{
"_id" : "12345",
"LastName" : "Smith",
"FirstName" : "Fred",
"ProfileCreated" : NumberLong(1447118831860),
"DropOut" : true,
"LatestConsultation" : false,
}
I have two collections which share a lot of the same document IDs and fields, but over time the new documents will have fields added to them and/or completely new documents with new IDs will be created.
I think I know how to handle new documents using $setOnInsert and upsert = true, but I'm not sure how best to handle the addition of new fields. For documents that exist in both collections, matched on _id, the behavior I need is to add any new fields to the document without modifying the values of the other fields, even if they have changed (as in the example, where the DropOut value has changed). The resulting document I require is:
Result document
{
"_id" : "12345",
"LastName" : "Smith",
"FirstName" : "Fred",
"ProfileCreated" : NumberLong(1447118831860),
"DropOut" : false,
"LatestConsultation" : false,
}
What is the best and most performant way to achieve this? Also, if this could somehow be combined into a single statement that also handles documents that exist in the new collection but not in the source collection, that would be amazing :-)
PS. I am using Pymongo so a Pymongo example would be even better but I can translate a mongo shell example.
I'm not sure this is possible with a single atomic update. However, you could string together a few operations and tackle this by iterating the new collection and, for each document in the new collection:
Use the _id field to query the old collection, with the findOne() method returning the document from the old collection that matches the _id from the new collection.
Merge the two documents: take the old document and add to it any fields from the new document that do not already exist in it.
Update the new collection with this merged document.
The following basic mongo shell example demonstrates the algorithm above:
function merge(from, to) {
var obj = {};
if (!from) {
from = {};
} else {
obj = from;
}
for (var key in to) {
if (!from.hasOwnProperty(key)) {
obj[key] = to[key];
}
}
return obj;
}
db.new_collection.find({}).snapshot().forEach(function(doc){
var old_doc = db.old_collection.findOne({ "_id": doc._id }),
merged_doc = merge(old_doc, doc);
db.new_collection.update(
{ "_id": doc._id },
{ "$set": merged_doc }
);
});
For dealing with large collections, it's better to leverage the Bulk API, which gives better performance and more efficient updates by
sending the update requests to the server in batches rather than sending every update operation in its own request (which is slow). The method to use is bulkWrite(), which can be applied to the above example as:
function merge(from, to) {
var obj = {};
if (!from) {
from = {};
} else {
obj = from;
}
for (var key in to) {
if (!from.hasOwnProperty(key)) {
obj[key] = to[key];
}
}
return obj;
}
var ops = [];
db.new_collection.find({}).snapshot().forEach(function(doc){
var old_doc = db.old_collection.findOne({ "_id": doc._id }),
merged_doc = merge(old_doc, doc);
ops.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$set": merged_doc }
}
});
if (ops.length === 1000) {
db.new_collection.bulkWrite(ops);
ops = [];
}
});
if (ops.length > 0) db.new_collection.bulkWrite(ops);
Or for MongoDB 2.6.x and 3.0.x releases use this version of Bulk operations:
var bulk = db.new_collection.initializeUnorderedBulkOp(),
counter = 0;
db.new_collection.find({}).snapshot().forEach(function(doc){
var old_doc = db.old_collection.findOne({ "_id": doc._id }),
merged_doc = merge(old_doc, doc);
bulk.find({ "_id": doc._id }).updateOne({ "$set": merged_doc });
if (counter % 1000 === 0) {
bulk.execute();
bulk = db.new_collection.initializeUnorderedBulkOp();
}
});
if (counter % 1000 !== 0 ) bulk.execute();
In both cases, the Bulk operations API will help reduce the IO load on the server by sending the requests only once for every 1000 documents in the collection being processed.

Mongo : How to convert all entries using a long timeStamp to an ISODate?

I have a current Mongo database with the accumulated entries/fields
{
name: "Fred Flintstone",
age : 34,
timeStamp : NumberLong(14283454353543)
}
{
name: "Wilma Flintstone",
age : 33,
timeStamp : NumberLong(14283454359453)
}
And so on...
Question : I want to convert all entries in the database to their corresponding ISODate instead - How does one do this?
Desired Result :
{
name: "Fred Flintstone",
age : 34,
timeStamp : ISODate("2015-07-20T14:50:32.389Z")
}
{
name: "Wilma Flintstone",
age : 33,
timeStamp : ISODate("2015-07-20T14:50:32.389Z")
}
Things I've tried
>db.myCollection.find().forEach(function (document) {
document["timestamp"] = new Date(document["timestamp"])
//Not sure how to update this document from here
db.myCollection.update(document) //?
})
Using the aggregation pipeline for update operations (available from MongoDB 4.2), simply run the following update operation:
db.myCollection.updateMany(
{ },
[
{ $set: {
timeStamp: {
$toDate: '$timeStamp'
}
} },
]
)
With your initial attempt you were almost there; you just need to call the save() method on the modified document to update it, since the method uses either the insert or the update command. In the above instance the document contains an _id field, and thus the save() method is equivalent to an update() operation with the upsert option set to true and the query predicate on the _id field:
db.myCollection.find().snapshot().forEach(function (document) {
document["timestamp"] = new Date(document["timestamp"]);
db.myCollection.save(document)
})
The above is similar to explicitly calling the update() method as you had previously attempted:
db.myCollection.find().snapshot().forEach(function (document) {
var date = new Date(document["timeStamp"]);
var query = { "_id": document["_id"] }, /* query predicate */
update = { /* update document */
"$set": { "timestamp": date }
},
options = { "upsert": true };
db.myCollection.update(query, update, options);
})
For relatively large collections, this will be slow, and it's recommended to use MongoDB's Bulk API for the updates instead:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.myCollection.initializeUnorderedBulkOp(),
counter = 0;
db.myCollection.find({"timestamp": {"$not": {"$type": 9 }}}).forEach(function (doc) {
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "timestamp": new Date(doc.timestamp") }
});
counter++;
if (counter % 1000 === 0) {
// Execute per 1000 operations
bulk.execute();
// re-initialize every 1000 update statements
bulk = db.myCollection.initializeUnorderedBulkOp();
}
})
// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
cursor = db.myCollection.find({"timeStamp": {"$not": {"$type": 9 }}});
cursor.forEach(function (doc) {
ops.push({
"updateOne": {
"filter": { "_id": doc._id } ,
"update": { "$set": { "timestamp": new Date(doc.timestamp") } }
}
});
if (ops.length === 1000) {
db.myCollection.bulkWrite(ops);
ops = [];
}
});
if (ops.length > 0) db.myCollection.bulkWrite(ops);
It seems that there are some cumbersome things happening in mongo when trying to instantiate Date objects from NumberLong values, mainly because the NumberLong values are converted to wrong representations and the fallback to the current date is used.
I was fighting with mongo for 2 days and finally I found the solution. The key is to convert the NumberLong to a Double ... and pass the double value to the Date constructor.
Here is the solution that works for me, using the aggregation framework with $out ...
(lastIndexedTimestamp is the collection field that is migrated to an ISODate and stored in the lastIndexed field. A temporary collection is created, and it is renamed back to the original name at the end.)
db.annotation.aggregate( [
{ $project: {
_id: 1,
lastIndexedTimestamp: 1,
lastIndexed: { $add: [new Date(0), {$add: ["$lastIndexedTimestamp", 0]}]}
}
},
{ $out : "annotation_new" }
])
//drop annotation collection
db.annotation.drop();
//rename annotation_new to annotation
db.annotation_new.renameCollection("annotation");

Merge changeset documents in a query

I have recorded changes from an information system in a mongo database. Every time a set of values are set or changed, a record is saved in the mongo database.
The change collection is in the following form:
{ "user_id": 1, "timestamp": { "date" : "2010-09-22 09:28:02", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldA": "valueA", "fieldB": "valueB", "fieldC": "valueC" } }
{ "user_id": 1, "timestamp": { "date" : "2010-09-24 19:01:52", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldA": "new_valueA", "fieldB": null, "fieldD": "valueD" } }
{ "user_id": 1, "timestamp": { "date" : "2010-10-01 11:11:02", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldD": "new_valueD" } }
Of course there are thousands of records per user with different attributes, which represents millions of records. What I want to do is see a user's status at a given time. For example, the status of user_id 1 at 2010-09-30 would be:
fieldA: new_valueA
fieldC: valueC
fieldD: valueD
This means I need to flatten all the changes prior to a given date for a given user into a single record. Can I do that directly in mongo?
Edit: I am using MongoDB 2.0, hence I cannot benefit from the aggregation framework.
Edit: It sounds like I have found the answer to my question.
var mapTimeAndChangesByUserId = function() {
var key = this.user_id;
var value = { timestamp: this.timestamp.date, changes: this.changes };
emit(key, value);
}
var reduceMergeChanges = function(user_id, changeset) {
var mergeFunction = function(a, b) { for (var attr in b) a[attr] = b[attr]; };
var result = {};
changeset.forEach(function(e) { mergeFunction(result, e.changes); });
return { timestamp: changeset.pop().timestamp, changes: result };
}
The reduce function merges the changes in the order they come and returns the result.
db.user_change.mapReduce(
mapTimeAndChangesByUserId,
reduceMergeChanges,
{
out: { inline: 1 },
query: { user_id: 1, "timestamp.date": { $lt: "2010-09-30" } },
sort: { "timestamp.date": 1 }
});
"results" : [
{
"_id": 1,
"value": {
"timestamp": "2010-09-24 19:01:52",
"changes": {
"fieldA": "new_valueA",
"fieldB": null,
"fieldC": "valueC",
"fieldD": "valueD"
}
}
}
]
Which is fine to me.
You could write a MapReduce (MR) to do this.
Since the fields are a lot like tags, you can modify a nice cookbook example of counting tags here: http://cookbook.mongodb.org/patterns/count_tags/ . Of course, instead of counting you want the latest value applied for that field (an assumption, since this is not clear in your question).
So let's get our map function:
map = function() {
if (!this.changes) {
// If there were not changes for some reason lets bail this record
return;
}
// We iterate the changes
for (index in this.changes) {
emit(index /* We emit the field name */, this.changes[index] /* We emit the field value */);
}
}
And now for our reduce:
reduce = function(key, values){
// This part is dependent upon your input query. If you add a sort of
// date (ts) DESC then you will probably want the first index (0), not the last
// one as gathered here with values[values.length - 1]
return values[values.length - 1];
}
And this will output a single document per field, of the form:
{
_id: your_field_ie_fieldA,
value: whoop
}
You can then iterate the (most likely inline) output and, bam, you have your changes.
This is of course only one way of doing it, and it is not designed to be run completely inline with your app; however, that all depends on the size of the data you're working on. It could be run very close.
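As a rough sketch of that last step (not from the original answer), assuming the map and reduce functions above and the user_change collection from the question, you could run the MR inline and fold the per-field results into a single status object:
var res = db.user_change.mapReduce(map, reduce, {
    out: { inline: 1 },
    query: { user_id: 1, "timestamp.date": { $lt: "2010-09-30" } },
    // note: the sort option requires an index on the sorted field
    sort: { "timestamp.date": 1 }
});
var status = {};
res.results.forEach(function(r) {
    // r._id is the field name, r.value is its latest emitted value
    status[r._id] = r.value;
});
printjson(status);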
I am unsure whether group and distinct can run on this, but it looks like group might: http://docs.mongodb.org/manual/reference/method/db.collection.group/#db-collection-group . I should note that group is basically an MR wrapper, but you could do something like this (untested, just like the MR above):
db.col.group( {
key: { 'changes.fieldA': 1 /* ... and the rest of the fields */ },
cond: { 'timestamp.date': { $gt: new Date( '01/01/2012' ) } },
reduce: function ( curr, result ) { },
initial: { }
} )
But it does require you to define the keys instead of just iterating them programmatically (which may be a better way).