Suppose I have a single document in my mongo collection that looks like this:
{
    "_id": 123,
    "field_to_prune": {
        "keep_field_1": "some value",
        "random_field_1": "some value",
        "keep_field_2": "some value",
        "random_field_2": "some value",
        "random_field_3": "some value"
    }
}
I want to prune that document to look like this:
{
    "_id": 123,
    "field_to_prune": {
        "keep_field_1": "some value",
        "keep_field_2": "some value"
    }
}
However, my issue is that I don't know what the "random" field names are. In Mongo, how would I $unset all fields except a couple of known ones?
I can think of a couple of ways, but I don't know the syntax. I could select all field NAMES and then unset each one of those, kind of like this:
[Some query to find all field names under "field_to_prune" for id 123].forEach(function(i) {
    var key = "field_to_prune." + i;
    print("removing field: " + key);
    var mod = { "$unset": {} };
    mod["$unset"][key] = "";
    db.myCollection.update({ _id: 123 }, mod);
});
Another way I was thinking of doing it was to unset every field whose name is not in an array of strings that I defined. Not sure how to do that either. Any ideas?
If you don't care about atomicity then you may do it with save:
var doc = db.myCollection.findOne({ "_id": 123 });
for (var k in doc.field_to_prune) {
    if (k === 'keep_field_1') continue;
    if (k === 'keep_field_2') continue;
    delete doc.field_to_prune[k];
}
db.myCollection.save(doc);
The main problem with this solution is that it's not atomic: any update to the doc between findOne and save will be lost.
An alternative is to actually unset all unwanted fields instead of saving the whole doc:
var doc = db.myCollection.findOne({ "_id": 123 });
var unset = {};
for (var k in doc.field_to_prune) {
    if (k === 'keep_field_1') continue;
    if (k === 'keep_field_2') continue;
    unset['field_to_prune.' + k] = 1;
}
db.myCollection.update({ _id: doc._id }, { $unset: unset });
This solution is much better because MongoDB runs the update atomically, so no concurrent update will be lost. And you don't need another collection to do what you want.
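If you would rather drive this from a keep-list, as the question suggests, here is a minimal variation of the same atomic $unset approach (same collection and field names assumed):
var keep = ["keep_field_1", "keep_field_2"];
var doc = db.myCollection.findOne({ "_id": 123 });
var unset = {};
for (var k in doc.field_to_prune) {
    // Unset every field whose name is not in the keep-list
    if (keep.indexOf(k) === -1) {
        unset["field_to_prune." + k] = 1;
    }
}
db.myCollection.update({ _id: doc._id }, { $unset: unset });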
Actually the best way to do this is to iterate over a cursor and use the $unset update operator to remove those fields in the subdocument, except the known fields you want to keep. You also need to use "bulk" operations for maximum efficiency.
MongoDB 3.2 deprecates Bulk() and its associated methods, so you should use .bulkWrite():
var wantedFields = ["keep_field_1", "keep_field_2"];
var requests = [];
var count = 0;
db.myCollection.find().forEach(function(document) {
    var fieldToPrune = document.field_to_prune;
    var unsetOp = {};
    for (var key in fieldToPrune) {
        if (wantedFields.indexOf(key) === -1 && Object.prototype.hasOwnProperty.call(fieldToPrune, key)) {
            unsetOp["field_to_prune." + key] = "";
        }
    }
    requests.push({
        "updateOne": {
            "filter": { "_id": document._id },
            "update": { "$unset": unsetOp }
        }
    });
    count++;
    if (count % 1000 === 0) {
        // Execute per 1000 operations and re-init
        db.myCollection.bulkWrite(requests);
        requests = [];
    }
})
// Clean up queues
if (requests.length > 0) {
    db.myCollection.bulkWrite(requests);
}
From MongoDB 2.6 you can use the Bulk API.
var wantedFields = ["keep_field_1", "keep_field_2"];
var bulk = db.myCollection.initializeUnorderedBulkOp();
var count = 0;
db.myCollection.find().forEach(function(document) {
    var fieldToPrune = document.field_to_prune;
    var unsetOp = {};
    for (var key in fieldToPrune) {
        if (wantedFields.indexOf(key) === -1 && Object.prototype.hasOwnProperty.call(fieldToPrune, key)) {
            unsetOp["field_to_prune." + key] = "";
        }
    }
    bulk.find({ "_id": document._id }).updateOne({ "$unset": unsetOp });
    count++;
    if (count % 1000 === 0) {
        // Execute per 1000 operations and re-init
        bulk.execute();
        bulk = db.myCollection.initializeUnorderedBulkOp();
    }
})
// Clean up queues
if (count % 1000 !== 0) {
    bulk.execute();
}
I solved this with a temporary collection. I did the following:
db.myCollection.find({ "_id": 123 }).forEach(function(i) {
    db.temp.insert(i);
});
db.myCollection.update(
    { _id: 123 },
    { $unset: { "field_to_prune": "" } }
)
db.temp.find().forEach(function(i) {
    var key1 = "field_to_prune.keep_field_1";
    var key2 = "field_to_prune.keep_field_2";
    var mod = { "$set": {} };
    mod["$set"][key1] = i.field_to_prune.keep_field_1;
    mod["$set"][key2] = i.field_to_prune.keep_field_2;
    db.myCollection.update({ _id: 123 }, mod);
});
db.getCollection("temp").drop();
Unfortunately all the solutions presented so far are relying on script execution and some sort of forEach invocation, which will end up handling only one document at a time. If the collection to normalize is big this is going to be impractical and take way too long.
Also the functions passed to forEach are executed on the client, meaning that if the connection to the database is lost, the operation is going to be interrupted in the middle of the process, potentially leaving the collection in inconsistent state.
Performance issues could be mitigated by using bulk operations like the ones proposed by @styvane above. That's solid advice.
But we can do better. Update operations have supported aggregation pipeline syntax since MongoDB 4.2, which allows the data normalization to be achieved by simply creating a new temporary object containing only the desired fields, unsetting the old one, and putting the temporary one back in its place, all using the current values of the document as references:
db.theCollection.updateMany(
{field_to_prune: {$exists: true}},
[
{$set: {_temp: {
keep_field_1: '$field_to_prune.keep_field_1',
keep_field_2: '$field_to_prune.keep_field_2'
}}},
{$unset: 'field_to_prune'},
{$set: {field_to_prune: '$_temp'}},
{$unset: '_temp'}
]
)
Example:
> db.myColl.insertOne({
... _id: 123,
... field_to_prune: {
... keep_field_1: "some value",
... random_field_1: "some value",
... keep_field_2: "some value",
... random_field_2: "some value",
... random_field_3: "some value"
... }
... })
{ "acknowledged" : true, "insertedId" : 123 }
>
> db.myColl.insertOne({
... _id: 234,
... field_to_prune: {
... // keep_field_1 is absent
... random_field_1: "some value",
... keep_field_2: "some value",
... random_field_2: "some value",
... random_field_3: "some value"
... }
... })
{ "acknowledged" : true, "insertedId" : 234 }
>
> db.myColl.find()
{ "_id" : 123, "field_to_prune" : { "keep_field_1" : "some value", "random_field_1" : "some value", "keep_field_2" : "some value", "random_field_2" : "some value", "random_field_3" : "some value" } }
{ "_id" : 234, "field_to_prune" : { "random_field_1" : "some value", "keep_field_2" : "some value", "random_field_2" : "some value", "random_field_3" : "some value" } }
>
> db.myColl.updateMany(
... {field_to_prune: {$exists: true}},
... [
... {$set: {_temp: {
... keep_field_1: '$field_to_prune.keep_field_1',
... keep_field_2: '$field_to_prune.keep_field_2'
... }}},
... {$unset: 'field_to_prune'},
... {$set: {field_to_prune: '$_temp'}},
... {$unset: '_temp'}
... ]
...)
{ "acknowledged" : true, "matchedCount" : 2, "modifiedCount" : 2 }
>
> db.myColl.find()
{ "_id" : 123, "field_to_prune" : { "keep_field_1" : "some value", "keep_field_2" : "some value" } }
{ "_id" : 234, "field_to_prune" : { "keep_field_2" : "some value" } }
Here is my solution, which I think is easier than the others I read:
db.labels.find({ "_id": ObjectId("123") }).snapshot().forEach(
    function (elem) {
        db.labels.update({ _id: elem._id }, {
            field_to_prune: {
                keep_field_1: elem.field_to_prune.keep_field_1,
                keep_field_2: elem.field_to_prune.keep_field_2
            }
        });
    });
I'm deleting everything but the fields keep_field_1 and keep_field_2: the update has no operators, so it replaces the whole document (except _id). Note that the kept values must be nested as above, since dotted field names are not allowed in a replacement document.
Related
Following the suggestions in MongoDB: How to change the type of a field?, I tried to update my collection to change the type of a field and its value.
Here is the update query:
db.MyCollection.find({"ProjectID" : 44, "Cost": {$exists: true}}).forEach(function(doc){
if(doc.Cost.length > 0){
var newCost = doc.Cost.replace(/,/g, '').replace(/\$/g, '');
doc.Cost = parseFloat(newCost).toFixed(2);
db.MyCollection.save(doc);
} // End of If Condition
}) // End of foreach
Upon completion of the above query, when I run the following command
db.MyCollection.find({"ProjectID" : 44},{Cost:1})
I still have the Cost field as a string:
{
"_id" : ObjectId("576919b66bab3bfcb9ff0915"),
"Cost" : "11531.23"
}
/* 7 */
{
"_id" : ObjectId("576919b66bab3bfcb9ff0916"),
"Cost" : "13900.64"
}
/* 8 */
{
"_id" : ObjectId("576919b66bab3bfcb9ff0917"),
"Cost" : "15000.86"
}
What am I doing wrong here?
Here is the sample document
/* 2 */
{
"_id" : ObjectId("576919b66bab3bfcb9ff0911"),
"Cost" : "$7,100.00"
}
/* 3 */
{
"_id" : ObjectId("576919b66bab3bfcb9ff0912"),
"Cost" : "$14,500.00"
}
/* 4 */
{
"_id" : ObjectId("576919b66bab3bfcb9ff0913"),
"Cost" : "$12,619.00"
}
/* 5 */
{
"_id" : ObjectId("576919b66bab3bfcb9ff0914"),
"Cost" : "$9,250.00"
}
The problem is that toFixed returns a String, not a Number. You are just updating the document with a new, different String.
Example from Mongo Shell:
> number = 2.3431
2.3431
> number.toFixed(2)
2.34
> typeof number.toFixed(2)
string
If you want a number with 2 decimals, you must parse it again with something like:
db.MyCollection.find({"ProjectID" : 44, "Cost": {$exists: true}}).forEach(function(doc){
if(doc.Cost.length > 0){
var newCost = doc.Cost.replace(/,/g, '').replace(/\$/g, '');
var costString = parseFloat(newCost).toFixed(2);
doc.Cost = parseFloat(costString);
db.MyCollection.save(doc);
} // End of If Condition
}) // End of foreach
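To check the result afterwards, you can query by BSON type (1 is double, 2 is string):
// Count of converted documents
db.MyCollection.find({ "ProjectID": 44, "Cost": { "$type": 1 } }).count()
// Should be 0 once every Cost has been converted
db.MyCollection.find({ "ProjectID": 44, "Cost": { "$type": 2 } }).count()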
Follow this pattern to convert a currency field of string type to a float. You need to query all the documents in the collection that have the Cost field as a string. To do so, take advantage of the Bulk API for bulk updates: it offers better performance because you send operations to the server in batches of, say, 1000, instead of sending every request individually.
The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2, to update all the documents in the collection by changing the Cost fields to floating-point values:
var bulk = db.MyCollection.initializeUnorderedBulkOp(),
counter = 0;
db.MyCollection.find({
"Cost": { "$exists": true, "$type": 2 }
}).forEach(function (doc) {
var newCost = Number(doc.Cost.replace(/[^0-9\.]+/g,""));
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "Cost": newCost }
});
counter++;
if (counter % 1000 == 0) {
bulk.execute(); // Execute per 1000 operations
// re-initialize every 1000 update statements
bulk = db.MyCollection.initializeUnorderedBulkOp();
}
})
// Clean up remaining operations in queue
if (counter % 1000 != 0) { bulk.execute(); }
The next example applies to MongoDB 3.2, which has since deprecated the Bulk API and provided a newer set of APIs using bulkWrite().
It uses the same cursor as above, but builds the array of bulk operations with the same forEach() cursor method, pushing each bulk-write document into the array. Because write commands can accept no more than 1000 operations, you need to group your operations into batches of at most 1000 and re-initialise the array when the loop hits 1000 iterations:
var cursor = db.MyCollection.find({ "Cost": { "$exists": true, "$type": 2 } }),
bulkUpdateOps = [];
cursor.forEach(function(doc){
var newCost = Number(doc.Cost.replace(/[^0-9\.]+/g,""));
bulkUpdateOps.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$set": { "Cost": newCost } }
}
});
if (bulkUpdateOps.length == 1000) {
db.MyCollection.bulkWrite(bulkUpdateOps);
bulkUpdateOps = [];
}
});
if (bulkUpdateOps.length > 0) { db.MyCollection.bulkWrite(bulkUpdateOps); }
Since MongoDB version 4.2, it can be done entirely inside one MongoDB query using Updates with Aggregation Pipeline:
db.collection.updateMany(
{Cost: {$exists: true}},
[{$set: {
Cost: {
$toDouble: {
$reduce: {
input: {$split: [{$substr: ["$Cost", 1, {$strLenCP: "$Cost"}]}, ","]},
initialValue: "",
in: {$concat: ["$$value", "$$this"]}
}
}
}
}}]
)
See how it works on the playground example
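If some Cost values might not parse cleanly, a more defensive variant (a sketch, untested) swaps $toDouble for $convert with an onError fallback that keeps the original string:
db.collection.updateMany(
    { Cost: { $exists: true, $type: "string" } },
    [{ $set: {
        Cost: { $convert: {
            input: { $reduce: {
                input: { $split: [{ $substr: ["$Cost", 1, { $strLenCP: "$Cost" }] }, ","] },
                initialValue: "",
                in: { $concat: ["$$value", "$$this"] }
            } },
            to: "double",
            onError: "$Cost" // keep the original value if it cannot be parsed
        } }
    } }]
)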
I have a "mongodb colllenctions" and I'd like to remove the "empty strings"with keys from it.
From this:
{
    "_id" : ObjectId("56323d975134a77adac312c5"),
    "year" : "15",
    "year_comment" : ""
}
{
    "_id" : ObjectId("56323d975134a77adac312c5"),
    "year" : "",
    "year_comment" : "asd"
}
I'd like to gain this result:
{
    "_id" : ObjectId("56323d975134a77adac312c5"),
    "year" : "15"
}
{
    "_id" : ObjectId("56323d975134a77adac312c5"),
    "year_comment" : "asd"
}
How could I solve it?
Please try executing the following code snippet in the Mongo shell, which strips fields with empty or null values:
var result = new Array();
db.getCollection('test').find({}).forEach(function(data) {
    for (var i in data) {
        // Remove any key whose value is null or an empty string
        if (data[i] == null || data[i] == '') {
            delete data[i];
        }
    }
    result.push(data);
})
print(tojson(result))
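Note that the snippet above only prints a cleaned copy of each document; it does not modify the collection. A minimal sketch that actually persists the removal with $unset, under the same empty/null test:
db.getCollection('test').find({}).forEach(function(data) {
    var unset = {};
    for (var i in data) {
        if (data[i] == null || data[i] == '') {
            unset[i] = "";
        }
    }
    // Only issue an update when there is something to remove
    if (Object.keys(unset).length > 0) {
        db.getCollection('test').update({ _id: data._id }, { $unset: unset });
    }
});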
I would start by getting a distinct list of all the keys in the collection, use those keys as the query basis, and do an ordered bulk update using the Bulk API operations. The update statement uses the $unset operator to remove the fields.
The mechanism to get the distinct keys list that you need to assemble the query is possible through Map-Reduce. The following map-reduce operation will populate a separate collection with all the keys as the _id values:
mr = db.runCommand({
"mapreduce": "my_collection",
"map" : function() {
for (var key in this) { emit(key, null); }
},
"reduce" : function(key, stuff) { return null; },
"out": "my_collection" + "_keys"
})
To get a list of all the dynamic keys, run distinct on the resulting collection:
db[mr.result].distinct("_id")
// prints ["_id", "year", "year_comment", ...]
Now given the list above, you can assemble your query by creating an object that will have its properties set within a loop. Normally your query will have this structure:
var keysList = ["_id", "year", "year_comment"];
var query = keysList.reduce(function(obj, k) {
var q = {};
q[k] = "";
obj["$or"].push(q);
return obj;
}, { "$or": [] });
printjson(query); // prints {"$or":[{"_id":""},{"year":""},{"year_comment":""}]}
You can then use the Bulk API (available with MongoDB 2.6 and above) as a way of streamlining your updates for better performance with the query above. Overall, you should be able to have something working as:
var bulk = db.collection.initializeOrderedBulkOp(),
counter = 0,
query = {"$or":[{"_id":""},{"year":""},{"year_comment":""}]},
keysList = ["_id", "year", "year_comment"];
db.collection.find(query).forEach(function(doc){
var emptyKeys = keysList.filter(function(k) { // use filter to return an array of keys which have empty strings
return doc[k]==="";
}),
update = emptyKeys.reduce(function(obj, k) { // set the update object
obj[k] = "";
return obj;
}, { });
bulk.find({ "_id": doc._id }).updateOne({
"$unset": update // use the $unset operator to remove the fields
});
counter++;
if (counter % 1000 == 0) {
// Execute per 1000 operations and re-initialize every 1000 update statements
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp();
}
})
// Clean up remaining operations in the queue
if (counter % 1000 != 0) { bulk.execute(); }
If you need to update a single blank parameter, or you prefer to do it parameter by parameter, you can use Mongo's updateMany functionality:
db.comments.updateMany({year: ""}, { $unset : { year : 1 }})
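If you want to apply that parameter-by-parameter approach to several keys in one go, a small sketch (reusing the field names from this question):
["year", "year_comment"].forEach(function(k) {
    var filter = {}, unset = {};
    filter[k] = "";  // match documents where this key is an empty string
    unset[k] = 1;    // and remove exactly that key
    db.comments.updateMany(filter, { $unset: unset });
});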
I am new to MongoDB and am now using aggregate.
My problem is that I have two columns, let's call them column1 and column2, and I want to match by either column1 or column2 inside $match. Is that possible? I am getting stuck, please help.
db Structure:
{
"_id" : ObjectId("55794aa1be1f8fe822da139d"),
"transactionType" : "1",
"_store" : {
"storeLocation" : "Pitampura",
"storeName" : "Godown",
"_id" : "5576b5c5e414d90c03d1e330"
}
}
I am trying to filter by transactionType and storeName. I send these two params to the API; when storeName is sent as an empty string, I should filter only by transactionType, otherwise by both parameters. I don't want to use if-elseif.
Well, of course it can suit your query. You just handle it as follows:
// Initial data
var request = { "storeName": "", "transactionType": "1" };
// Transform to array
var conditions = Object.keys(request).map(function(key) {
var obj = {},
newKey = "";
if ( key == "storeName" ) {
newKey = "_store." + key;
} else {
newKey = key;
}
obj[newKey] = request[key];
return obj;
});
db.collection.find({ "$or": conditions });
Where the whole structure after transformation breaks down to:
db.collection.find({
"$or": [
{ "_store.storeName": "" },
{ "transactionType": "1" }
]
})
Which of course matches the document on the condition that "transactionType" is met.
So that is what $or does: it considers that at least one of the conditions in the query arguments matches data in the document.
The other thing here is that, since the data presented in the request is not a "direct match" for the data in the document, manipulation is done on the "key name" to use the correct "dot notation" form for accessing that element.
These are just basic queries, so the same rules apply to aggregation $match, which is just a query element itself:
db.collection.aggregate([
// Possibly other pipeline before
// Your match phase, which probably should be first
{ "$match": {
"$or": [
{ "_store.storeName": "" },
{ "transactionType": "1" }
]
}},
// Other aggregation pipeline stages
])
Is there a way to conditionally $addToSet based on a specific key field in a subdocument of an array?
Here's an example of what I mean - given the collection produced by the following sample bootstrap;
cls
db.so.remove();
db.so.insert({
"Name": "fruitBowl",
"pfms" : [
{
"n" : "apples"
}
]
});
n defines a unique document key. I only want one entry with the same n value in the array at any one time, so I want to be able to update the pfms array using n so that I end up with just this:
{
"Name": "fruitBowl",
"pfms" : [
{
"n" : "apples",
"mState": 1111234
}
]
}
Here's where I am at the moment:
db.so.update({
"Name": "fruitBowl",
},{
// not allowed to do this of course
// "$pull": {
// "pfms": { n: "apples" },
// },
"$addToSet": {
"pfms": {
"$each": [
{
"n": "apples",
"mState": 1111234
}
]
}
}
}
)
Unfortunately, this adds another array element:
db.so.find().toArray();
[
{
"Name" : "fruitBowl",
"_id" : ObjectId("53ecfef5baca2b1079b0f97c"),
"pfms" : [
{
"n" : "apples"
},
{
"n" : "apples",
"mState" : 1111234
}
]
}
]
I need to effectively upsert the apples document matching on n as the unique identifier and just set mState whether or not an entry already exists. It's a shame I can't do a $pull and $addToSet in the same document (I tried).
What I really need here is dictionary semantics, but that's not an option right now, nor is breaking out the document - can anyone come up with another way?
FWIW - the existing format is a result of language/driver serialization, I didn't choose it exactly.
Further
I've gotten a little further: in the case where I know the array element already exists, I can do this:
db.so.update({
"Name": "fruitBowl",
"pfms.n": "apples",
},{
$set: {
"pfms.$.mState": 1111234,
},
}
)
But of course that only works:
for a single array element
as long as I know it exists
The first limitation isn't a disaster, but if I can't effectively upsert or combine $addToSet with the previous $set (which of course I can't), then the only workarounds I can think of for now mean two DB round-trips.
The $addToSet operator of course requires that the "whole" document being "added to the set" is in fact unique, so you cannot change "part" of the document or otherwise consider it to be a "partial match".
You stumbled onto your best approach using $pull to remove any element with the "key" field that would result in "duplicates", but of course you cannot modify the same path with different update operators like that.
So the closest thing you will get is issuing separate operations, but also doing that with the "Bulk Operations API" introduced in MongoDB 2.6. This allows both to be sent to the server at the same time, for the closest thing to a "contiguous" operations list you will get:
var bulk = db.so.initializeOrderedBulkOp();
bulk.find({ "Name": "fruitBowl", "pfms.n": "apples" }).updateOne({
    "$pull": { "pfms": { "n": "apples" } }
});
bulk.find({ "Name": "fruitBowl" }).updateOne({
    "$push": { "pfms": { "n": "apples", "mState": 1111234 } }
});
bulk.execute();
That pretty much is your best approach if it is not possible or practical to move the elements to another collection and rely on "upserts" and $set in order to have the same functionality but on a collection rather than array.
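To illustrate that last alternative, here is a hypothetical sketch where each array element lives in its own collection (called pfms here, keyed by an owner name and n), so a single upsert replaces the $pull/$push pair:
// One document per former array element; the logical unique key is (owner, n)
db.pfms.update(
    { "owner": "fruitBowl", "n": "apples" },
    { "$set": { "mState": 1111234 } },
    { "upsert": true }
);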
I have faced the exact same scenario. I was inserting and removing likes from a post.
What I did was use the Mongoose findOneAndUpdate function (which is similar to the update or findAndModify functions in MongoDB).
The key concept is
Insert when the field is not present
Delete when the field is present
The insert is
findOneAndUpdate({ _id: theId, 'likes.userId': { $ne: theUserId }},
{ $push: { likes: { userId: theUserId, createdAt: new Date() }}},
{ 'new': true }, function(err, post) { // do the needful });
The delete is
findOneAndUpdate({ _id: theId, 'likes.userId': theUserId},
{ $pull: { likes: { userId: theUserId }}},
{ 'new': true }, function(err, post) { // do the needful });
This makes the whole operation atomic and there are no duplicates with respect to the userId field.
I hope this helps. If you have any questions, feel free to ask.
As far as I know, MongoDB (from v3.6) allows you to use arrayFilters in update operations, which is what the code below relies on.
A more or less elegant way to make it work (according to the question) looks like the following:
db.runCommand({
update: "your-collection-name",
updates: [
{
q: {},
u: {
$set: {
"pfms.$[elem]": {
"n":"apples",
"mState": NumberInt(1111234)
}
}
},
arrayFilters: [
{
"elem.n": {
$eq: "apples"
}
}
],
multi: true
}
]
})
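For what it's worth, the same arrayFilters update can be written more compactly with updateMany instead of the raw update command:
db.so.updateMany(
    {},
    { $set: { "pfms.$[elem]": { "n": "apples", "mState": NumberInt(1111234) } } },
    { arrayFilters: [{ "elem.n": "apples" }] }
)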
In my scenario, the data needs to be initialized when it doesn't exist, and the field updated when it does; the data is never deleted. If your data has these states, you might want to try the following method.
// Mongoose, but mostly the same as plain MongoDB
// Update the tag on the user, if one already exists
const user = await UserModel.findOneAndUpdate(
{
user: userId,
'tags.name': tag_name,
},
{
$set: {
'tags.$.description': tag_description,
},
}
)
.lean()
.exec();
// Add a default tag to user
if (user == null) {
await UserModel.findOneAndUpdate(
{
user: userId,
},
{
$push: {
tags: new Tag({
name: tag_name,
description: tag_description,
}),
},
}
);
}
This is the cleanest and fastest method for this scenario.
As a business analyst, I had the same problem, and hopefully I have a solution after hours of investigation.
// The customer document:
{
"id" : "1212",
"customerCodes" : [
{
"code" : "I"
},
{
"code" : "YK"
}
]
}
// The problem: I want to insert dateField "01.01.2016" into customer documents where the customerCodes list has an entry with code "YK" but no dateField. The final document must be as follows:
{
"id" : "1212",
"customerCodes" : [
{
"code" : "I"
},
{
"code" : "YK" ,
"dateField" : "01.01.2016"
}
]
}
// The solution is in three steps:
// PART 1 - Find the customers with customerCodes "YK" but without dateField
// PART 2 - Find the index of the subdocument with "YK" in the customerCodes list
// PART 3 - Insert the value into the document
// Here is the code
// PART 1
var myCursor = db.customers.find({ customerCodes:{$elemMatch:{code:"YK", dateField:{ $exists:false} }}});
// PART 2
myCursor.forEach(function(customer){
if(customer.customerCodes != null )
{
var size = customer.customerCodes.length;
if( size > 0 )
{
var iFoundTheIndexOfSubDocument= -1;
var index = 0;
customer.customerCodes.forEach( function(clazz)
{
if( clazz.code == "YK" && clazz.dateField == null )
{
iFoundTheIndexOfSubDocument = index;
}
index++;
})
// PART 3
// What happens here is: if I found the index of the
// "YK" subdocument, I create the "updates" document which
// corresponds to the new data to be inserted
//
if( iFoundTheIndexOfSubDocument != -1 )
{
var toSet = "customerCodes."+ iFoundTheIndexOfSubDocument +".dateField";
var updates = {};
updates[toSet] = "01.01.2016";
db.customers.update({ "id" : customer.id } , { $set: updates });
// This statement is actually interpreted like this:
// db.customers.update({ "id" : "1212" }, { $set: { "customerCodes.0.dateField" : "01.01.2016" } });
}
}
}
});
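Incidentally, on MongoDB 3.6 or newer the three parts above collapse into a single statement using arrayFilters (a sketch against the same customers collection):
db.customers.updateMany(
    { customerCodes: { $elemMatch: { code: "YK", dateField: { $exists: false } } } },
    { $set: { "customerCodes.$[elem].dateField": "01.01.2016" } },
    { arrayFilters: [{ "elem.code": "YK", "elem.dateField": { $exists: false } }] }
);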
Have a nice day!
I have recorded changes from an information system in a mongo database. Every time a set of values are set or changed, a record is saved in the mongo database.
The change collection is in the following form:
{ "user_id": 1, "timestamp": { "date" : "2010-09-22 09:28:02", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldA": "valueA", "fieldB": "valueB", "fieldC": "valueC" } }
{ "user_id": 1, "timestamp": { "date" : "2010-09-24 19:01:52", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldA": "new_valueA", "fieldB": null, "fieldD": "valueD" } }
{ "user_id": 1, "timestamp": { "date" : "2010-10-01 11:11:02", "timezone_type" : 3, "timezone" : "Europe/Paris" } }, "changes: { "fieldD": "new_valueD" } }
Of course there are thousands of records per user with different attributes, which represents millions of records. What I want to do is see a user's status at a given time. For example, user_id 1 at 2010-09-30 would be:
fieldA: new_valueA
fieldC: valueC
fieldD: valueD
This means I need to flatten all the changes prior to a given date for a given user into a single record. Can I do that directly in mongo ?
Edit: I am using the 2.0 version of mongodb hence cannot benefit from the aggregation framework.
Edit: It seems I have found the answer to my question.
var mapTimeAndChangesByUserId = function() {
var key = this.user_id;
var value = { timestamp: this.timestamp.date, changes: this.changes };
emit(key, value);
}
var reduceMergeChanges = function(user_id, changeset) {
var mergeFunction = function(a, b) { for (var attr in b) a[attr] = b[attr]; };
var result = {};
changeset.forEach(function(e) { mergeFunction(result, e.changes); });
return { timestamp: changeset.pop().timestamp, changes: result };
}
The reduce function merges the changes in the order they come and returns the result.
db.user_change.mapReduce(
mapTimeAndChangesByUserId,
reduceMergeChanges,
{
out: { inline: 1 },
query: { user_id: 1, "timestamp.date": { $lt: "2010-09-30" } },
sort: { "timestamp.date": 1 }
});
"results" : [
    {
        "_id" : 1,
        "value" : {
            "timestamp" : "2010-09-24 19:01:52",
            "changes" : {
                "fieldA" : "new_valueA",
                "fieldB" : null,
                "fieldC" : "valueC",
                "fieldD" : "valueD"
            }
        }
    }
]
Which is fine to me.
You could write an MR (map-reduce) to do this.
Since the fields are a lot like tags, you can adapt a nice cookbook example of counting tags: http://cookbook.mongodb.org/patterns/count_tags/. Of course, instead of counting, you want the latest value applied for each field (an assumption, since this is not clear in your question).
So let's get our map function:
map = function() {
    if (!this.changes) {
        // If there were no changes for some reason, skip this record
        return;
    }
    // We iterate the changes
    for (var index in this.changes) {
        emit(index /* we emit the field name */, this.changes[index] /* we emit the field value */);
    }
}
And now for our reduce:
reduce = function(values) {
    // This part depends on your input query. If you add a sort of
    // date (ts) DESC then you will probably want the first element (0),
    // not the last one taken here with values.length - 1
    return values[values.length - 1];
}
And this will output a single document per changed field, of the form:
{
_id: your_field_ie_fieldA,
value: whoop
}
You can then iterate the (most likely inline) output and, bam, you have your changes.
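For example, with inline output that iteration could look like this (a sketch, assuming the map and reduce functions above and the question's collection name):
var res = db.user_change.mapReduce(map, reduce, {
    out: { inline: 1 },
    query: { user_id: 1, "timestamp.date": { $lt: "2010-09-30" } },
    sort: { "timestamp.date": 1 }
});
res.results.forEach(function(r) {
    // r._id is the field name, r.value its latest value
    print(r._id + " = " + tojson(r.value));
});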
This is of course one way of doing it, and it is not designed to be run completely inline with your app; however, that all depends on the size of the data you're working on. It could be run very close.
I am unsure whether group and distinct can run on this, but it looks like group might: http://docs.mongodb.org/manual/reference/method/db.collection.group/#db-collection-group. However, I should note that group is basically an MR wrapper, but you could do something like (untested, just like the MR above):
db.col.group({
    key: { 'changes.fieldA': 1 /* ...and the rest of the fields */ },
    cond: { 'timestamp.date': { $gt: new Date('01/01/2012') } },
    reduce: function (curr, result) { },
    initial: { }
})
But it does require you to define the keys instead of just iterating them programmatically (maybe there is a better way).