MongoDB: Updating documents using data from the same document [duplicate]

This question already has answers here:
Update MongoDB field using value of another field
(12 answers)
Closed 5 years ago.
I have a list of documents, each with lat and lon properties (among others).
{ 'lat': 1, 'lon': 2, someotherdata [...] }
{ 'lat': 4, 'lon': 1, someotherdata [...] }
[...]
I want to modify it so that it looks like this:
{ 'coords': {'lat': 1, 'lon': 2}, someotherdata [...]}
{ 'coords': {'lat': 4, 'lon': 1}, someotherdata [...]}
[...]
So far I've got this:
db.events.update({}, {$set : {'coords': {'lat': db.events.lat, 'lon': db.events.lon}}}, false, true)
But it treats the db.events.lat and db.events.lon as strings. How can I reference the document's properties?
Cheers.

Update: If all you have to do is change the structure of a document without changing the values, see gipset's answer for a nice solution.
According to a (now unavailable) comment on the Update documentation page, you cannot reference the current document's properties from within an update().
You'll have to iterate through all the documents and update them like this:
db.events.find().snapshot().forEach(
    function (e) {
        // update document, using its own properties
        e.coords = { lat: e.lat, lon: e.lon };

        // remove old properties
        delete e.lat;
        delete e.lon;

        // save the updated document
        db.events.save(e);
    }
)
Such a function can also be used in a map-reduce job or a server-side db.eval() job, depending on your needs.
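A later note: MongoDB 4.2 added updates that accept an aggregation pipeline, which can reference the document's own fields, so on 4.2+ the same reshaping is possible in a single command. A minimal sketch, assuming MongoDB 4.2 or later:
db.events.updateMany(
    {},
    [
        // "$lat" and "$lon" refer to the document's own fields
        { $set: { coords: { lat: "$lat", lon: "$lon" } } },
        // drop the old top-level fields
        { $unset: ["lat", "lon"] }
    ]
)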

The $rename operator (introduced a month after this question was posted) makes it really easy to do these kinds of things where you don't need to modify the values.
Insert some test documents
db.events.insert({ 'lat': 1, 'lon': 2, someotherdata: [] })
db.events.insert({ 'lat': 4, 'lon': 1, someotherdata: [] })
Use the $rename operator
db.events.update({}, {$rename: {'lat': 'coords.lat', 'lon': 'coords.lon'}}, false, true)
Results
db.events.find()
{
    "_id" : ObjectId("5113c82dd28c4e8b79971add"),
    "coords" : {
        "lat" : 1,
        "lon" : 2
    },
    "someotherdata" : [ ]
}
{
    "_id" : ObjectId("5113c82ed28c4e8b79971ade"),
    "coords" : {
        "lat" : 4,
        "lon" : 1
    },
    "someotherdata" : [ ]
}

Regarding Neil's answer: just so people know, you cannot run this against a large database from a remote shell such as Robomongo; you will need to SSH into your actual server's mongo shell. You can also do it like this if you would rather use an update:
db.Collection.find({ /* possible query */ }).toArray().forEach(
    function (obj) {
        obj.item = obj.copiedItem;
        obj.otherItem = obj.copiedItem;
        obj.thirdItem = true;
        obj.fourthItem = "string";
        db.Collection.update({ _id: obj._id }, obj);
    }
);
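If the collection is large, batching the writes with bulkWrite can cut down on round-trips; a rough sketch of the same rewrite, assuming a shell and server with bulkWrite support (MongoDB 3.2+):
var ops = [];
db.Collection.find({ /* possible query */ }).forEach(function (obj) {
    ops.push({
        updateOne: {
            filter: { _id: obj._id },
            update: { $set: { item: obj.copiedItem, otherItem: obj.copiedItem, thirdItem: true, fourthItem: "string" } }
        }
    });
    // flush every 1000 documents to keep memory bounded
    if (ops.length === 1000) {
        db.Collection.bulkWrite(ops);
        ops = [];
    }
});
if (ops.length > 0) {
    db.Collection.bulkWrite(ops);
}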

We can use a Mongo script to manipulate data on the fly. It works for me!
I use this script to correct my address data.
Example of a current address: "No.12, FIFTH AVENUE,".
I want to remove the last redundant comma; the expected new address is "No.12, FIFTH AVENUE".
var cursor = db.myCollection.find().limit(100);
while (cursor.hasNext()) {
    var currentDocument = cursor.next();
    var address = currentDocument['address'];
    var lastPosition = address.length - 1;
    var lastChar = address.charAt(lastPosition);
    if (lastChar == ",") {
        var newAddress = address.slice(0, lastPosition);
        currentDocument['address'] = newAddress;
        db.myCollection.update({_id: currentDocument._id}, currentDocument);
    }
}
Hope this helps!
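As an aside, on newer servers the same cleanup can be done in one command; a sketch assuming MongoDB 4.2+, where updates accept an aggregation pipeline and $rtrim can strip trailing characters:
db.myCollection.updateMany(
    { address: /,$/ },  // only documents whose address ends with a comma
    [ { $set: { address: { $rtrim: { input: "$address", chars: "," } } } } ]
)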

As long as you are OK with creating a copy of the data, the aggregation framework can be used as an alternative here. You also have the option to do more to the data if you wish using other operators, but the only one you need is $project. It's somewhat wasteful in terms of space, but may be faster and more appropriate for some uses. To illustrate, I'll first insert some sample data into the foo collection:
db.foo.insert({ 'lat': 1, 'lon': 2, someotherdata : [1, 2, 3] })
db.foo.insert({ 'lat': 4, 'lon': 1, someotherdata : [4, 5, 6] })
Now, we just use $project to rework the lat and lon fields, then send them to the newfoo collection:
db.foo.aggregate([
    { $project : { _id : "$_id", "coords.lat" : "$lat", "coords.lon" : "$lon", "someotherdata" : "$someotherdata" } },
    { $out : "newfoo" }
])
Then check newfoo for our altered data:
db.newfoo.find()
{ "_id" : ObjectId("544548a71b5cf91c4893eb9a"), "someotherdata" : [ 1, 2, 3 ], "coords" : { "lat" : 1, "lon" : 2 } }
{ "_id" : ObjectId("544548a81b5cf91c4893eb9b"), "someotherdata" : [ 4, 5, 6 ], "coords" : { "lat" : 4, "lon" : 1 } }
Once you are happy with the new data, you can then use the renameCollection() command to drop the old data and use the new data under the old name:
> db.newfoo.renameCollection("foo", true)
{ "ok" : 1 }
> db.foo.find()
{ "_id" : ObjectId("544548a71b5cf91c4893eb9a"), "someotherdata" : [ 1, 2, 3 ], "coords" : { "lat" : 1, "lon" : 2 } }
{ "_id" : ObjectId("544548a81b5cf91c4893eb9b"), "someotherdata" : [ 4, 5, 6 ], "coords" : { "lat" : 4, "lon" : 1 } }
One last note: until SERVER-7944 is completed, you can't do the equivalent of a snapshot by hinting the _id index as suggested in this answer, so you can end up hitting a document more than once if activity elsewhere causes it to move. Since you are inserting the _id field in this example, any such occurrence would cause a unique key violation, so you will not end up with dupes, but you might have an "old" version of a document. As always, check your data thoroughly before dropping it, and preferably take a backup.

From the CLI? I think you have to pull the values out first, assign them to a variable, and then run your update command.
Or (I haven't tried) remove 'db' from the strings, i.e. events.lat and events.lon. If that works, you will still have multiple values: the old values for "lat" and "lon" and the new subdocument you created.
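A minimal sketch of that "read first, then update" approach (someId is a placeholder for the document you want to fix):
var doc = db.events.findOne({ _id: someId });
db.events.update(
    { _id: doc._id },
    { $set: { coords: { lat: doc.lat, lon: doc.lon } }, $unset: { lat: "", lon: "" } }
);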

Related

MongoDB Aggregation - Does $unwind order documents the same way as the nested array order

I am wondering whether using the $unwind operator in an aggregation pipeline on a document with a nested array will return the deconstructed documents in the same order as the items in the array.
Example:
Suppose I have the following documents
{ "_id" : 1, "item" : "foo", values: [ "foo", "foo2", "foo3"] }
{ "_id" : 2, "item" : "bar", values: [ "bar", "bar2", "bar3"] }
{ "_id" : 3, "item" : "baz", values: [ "baz", "baz2", "baz3"] }
I would like to use paging for all values in all documents in my application code. So, my idea is to use mongo aggregation framework to:
sort the documents by _id
use $unwind on values attribute to deconstruct the documents
use $skip and $limit to simulate paging
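Something like this, where pageNumber and pageSize are placeholders from my application code:
db.collection.aggregate([
    { $sort: { _id: 1 } },
    { $unwind: "$values" },
    { $skip: pageNumber * pageSize },
    { $limit: pageSize }
])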
So the question using the example described above is:
Is it guaranteed that the following aggregation pipeline:
[
{$sort: {"_id": 1}},
{$unwind: "$values"}
]
will always result to the following documents with exactly the same order?:
{ "_id" : 1, "item" : "foo", values: "foo" }
{ "_id" : 1, "item" : "foo", values: "foo2" }
{ "_id" : 1, "item" : "foo", values: "foo3" }
{ "_id" : 2, "item" : "bar", values: "bar" }
{ "_id" : 2, "item" : "bar", values: "bar2" }
{ "_id" : 2, "item" : "bar", values: "bar3" }
{ "_id" : 3, "item" : "baz", values: "baz" }
{ "_id" : 3, "item" : "baz", values: "baz2" }
{ "_id" : 3, "item" : "baz", values: "baz3" }
I also asked the same question in the MongoDB community forum. An answer confirming my assumption was posted by a member of MongoDB staff.
Briefly:
Yes, the order of the returned documents in the example above will always be the same. It follows the order from the array field.
In case you do run into issues with order, you could use includeArrayIndex to guarantee it:
[
    {$unwind: {
        path: '$values',
        includeArrayIndex: 'arrayIndex'
    }},
    {$sort: {
        _id: 1,
        arrayIndex: 1
    }},
    {$project: {
        arrayIndex: 0
    }}
]
From what I see at https://github.com/mongodb/mongo/blob/0cee67ce6909ca653462d4609e47edcc4ac5c1a9/src/mongo/db/pipeline/document_source_unwind.cpp
The cursor iterator uses the getNext() method to unwind an array:
DocumentSource::GetNextResult DocumentSourceUnwind::doGetNext() {
    auto nextOut = _unwinder->getNext();
    while (nextOut.isEOF()) {
        .....
        // Try to extract an output document from the new input document.
        _unwinder->resetDocument(nextInput.releaseDocument());
        nextOut = _unwinder->getNext();
    }
    return nextOut;
}
And the getNext() implementation relies on the array's index:
DocumentSource::GetNextResult DocumentSourceUnwind::Unwinder::getNext() {
    ....
    // Set field to be the next element in the array. If needed, this will automatically
    // clone all the documents along the field path so that the end values are not shared
    // across documents that have come out of this pipeline operator. This is a partial deep
    // clone. Because the value at the end will be replaced, everything along the path
    // leading to that will be replaced in order not to share that change with any other
    // clones (or the original).
    _output.setNestedField(_unwindPathFieldIndexes, _inputArray[_index]);
    indexForOutput = _index;
    _index++;
    _haveNext = _index < length;
    .....
    return _haveNext ? _output.peek() : _output.freeze();
}
So unless something upstream messes with the documents' order, the cursor should return the unwound docs in the same order as the subdocuments were stored in the array.
I don't recall how the merger works for sharded collections, and I imagine there might be a case where documents from other shards are returned between two consecutive unwound documents. What the snippet of code does guarantee is that an unwound document with the next item from the array will never be returned before an unwound document with the previous item from the array.
As a side note, having a million items in an array is quite an extreme design. Even 20-byte items in the array will exceed the 16MB document limit.

Querying with array of parameters in mongodb

I have the collection below in the DB, and I want to retrieve the data where the birth month equals one of two given months, let's say [1,2] or [4,5]:
{
    "_id" : ObjectId("55aa1e526fea82e9a4188f38"),
    "name" : "Nilmini",
    "birthDate" : 6,
    "birthMonth" : 1
},
{
    "_id" : ObjectId("55aa1e526fea82e9a4188f39"),
    "name" : "Ruwan",
    "birthDate" : 6,
    "birthMonth" : 1
},
{
    "_id" : ObjectId("55aa1e526fea82e9a4188f40"),
    "name" : "Malith",
    "birthDate" : 6,
    "birthMonth" : 1
},
{
    "_id" : ObjectId("55aa1e526fea82e9a4188f7569"),
    "name" : "Pradeep",
    "birthDate" : 6,
    "birthMonth" : 7
}
I use the query below to get the result set. I could get the result for a given single month; now I want to get results for multiple months.
var currentDay = moment().date();
var currentMonths = [];
var currentMonth = moment().month();
if (currentDay > 20) {
    currentMonths.push(moment().month());
    currentMonths.push(moment().month() + 1);
} else {
    currentMonths.push(currentMonth);
}
In the query below I am trying to pass the array to 'birthMonth'. I'm getting nothing when I pass the array to the query, so I think there should be another way to do this:
Employee.find(
    {
        "birthDate": { $gte: currentDay }, "birthMonth": currentMonths
    }, function (err, birthDays) {
        res.json(birthDays);
    });
I would really appreciate it if you could help me figure this out.
You can use the $in operator to match against multiple values in an array like currentMonths.
So your query would be:
Employee.find(
    {
        "birthDate": { $gte: currentDay }, "birthMonth": { $in: currentMonths }
    }, function (err, birthDays) {
        res.json(birthDays);
    });

Increment nested value

I create players the following way.
Players.insert({
    name: name,
    score: 0,
    items: [{'name': 0}, {'name2': 0}...]
});
How do I increment the score in a specific player and specific item name (upserting if necessary)?
Sorry for the terrible wording :p
Well, the answer is - as in life - to simplify the problem by breaking it up.
And to avoid arrays in mongoDB - after all, objects can have as many keys as you like. So, my structure became:
{
    "_id": <id>,
    "name": <name>,
    "score": <score>,
    "items": {}
}
And to increment a dynamic key in items:
// create your update skeleton first
var ud = { $inc: {} };
// fill it in (note the 'items.' prefix matches the structure above)
ud.$inc['items.' + key] = value;
// call it (the third argument true makes it an upsert)
db.Players.update(player, ud, true);
Works a charm :)
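For example, with a hypothetical key, value, and player filter, the skeleton expands to a plain $inc on a dotted path:
var key = "sword", value = 1;                    // hypothetical item key and increment
var ud = { $inc: {} };
ud.$inc['items.' + key] = value;                 // ud is now { $inc: { 'items.sword': 1 } }
db.Players.update({ name: "Alex" }, ud, true);   // upsert creates the player if missing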
Let's say you have:
{
    "_id" : ObjectId("5465332e6c3e2eeb66ef3683"),
    "name" : "Alex",
    "score" : 0,
    "items" : [
        {
            "food" : 0
        }
    ]
}
To update you can do:
db.Players.update({name: "Alex", "items.food": {$exists: true}},
                  {$inc: {score: 1, "items.$.food": 5}})
Result:
{
    "_id" : ObjectId("5465332e6c3e2eeb66ef3683"),
    "name" : "Alex",
    "score" : 1,
    "items" : [
        {
            "food" : 5
        }
    ]
}
I am not sure you can upsert if the document doesn't exist because of the positional operator needed to update the array.

Is there a way to project the type of a field

Suppose we had something like the following document, but we wanted to return only the fields that had numeric information:
{
    "_id" : ObjectId("52fac254f40ff600c10e56d4"),
    "name" : "Mikey",
    "list" : [ 1, 2, 3, 4, 5 ],
    "people" : [ "Fred", "Barney", "Wilma", "Betty" ],
    "status" : false,
    "created" : ISODate("2014-02-12T00:37:40.534Z"),
    "views" : 5
}
Now I know that we can query for fields that match a certain type by use of the $type operator. But I've yet to stumble upon a way to $project this as a field value. So if we looked at the document in the "unwound" form, you would see this:
{
    "_id" : ObjectId("52fac254f40ff600c10e56d4"),
    "name" : 2,
    "list" : 16,
    "people" : 2,
    "status" : 8,
    "created" : 9,
    "views" : 16
}
The final objective would be to list only the fields that match a certain type - say, comparing to get the numeric types and filtering out the other fields - after much document mangling, to produce a result as follows:
{
    "_id" : ObjectId("52fac254f40ff600c10e56d4"),
    "list" : [ 1, 2, 3, 4, 5 ],
    "views" : 5
}
Does anyone have an approach to handle this?
There are a few issues that make this not practical:
Since the query is a distinctive parameter from the ability to do a projection, this isn't possible from a single query alone, as the projection cannot be influenced by the results of the query
As there's no way with the aggregation framework to iterate fields and check type, that's also not an option
That being said, there's a slightly whacky way of using a Map-Reduce that does get similar answers, albeit in a Map-Reduce style output that's not awesome:
map = function () {
    function isNumber(n) {
        return !isNaN(parseFloat(n)) && isFinite(n);
    }
    var numerics = [];
    for (var fn in this) {
        if (isNumber(this[fn])) {
            numerics.push({ f: fn, v: this[fn] });
        }
        if (Array.isArray(this[fn])) {
            // example ... more complex logic needed
            if (isNumber(this[fn][0])) {
                numerics.push({ f: fn, v: this[fn] });
            }
        }
    }
    emit(this._id, { n: numerics });
};

reduce = function (key, values) {
    return values;
};
It's not complete, but the results are similar to what you wanted:
"_id" : ObjectId("52fac254f40ff600c10e56d4"),
"value" : {
"n" : [
{
"f" : "list",
"v" : [
1,
2,
3,
4,
5
]
},
{
"f" : "views",
"v" : 5
}
]
}
The map is just looking at each property and deciding whether it looks like a number ... and if so, adding it to an array that will be stored as an object, so that the map-reduce engine won't choke on array output. I've kept it simple in the example code -- you could improve the logic of the numeric and array checking for sure. :)
Of course, it's not live like a find or aggregation, but as MongoDB wasn't designed with this in mind, this may have to do if you really wanted this functionality.
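An aside for later versions: aggregation eventually gained field iteration via $objectToArray, so on MongoDB 3.6+ the scalar part of this can be done live; arrays would still need extra handling, as in the map function above. A sketch:
db.collection.aggregate([
    { $project: {
        // turn the document into an array of { k, v } pairs and keep numeric values
        fields: {
            $filter: {
                input: { $objectToArray: "$$ROOT" },
                cond: { $in: [ { $type: "$$this.v" }, [ "int", "long", "double", "decimal" ] ] }
            }
        }
    }},
    // rebuild a document from the kept pairs, re-attaching _id (an objectId, so filtered out above)
    { $replaceRoot: { newRoot: { $mergeObjects: [ { _id: "$_id" }, { $arrayToObject: "$fields" } ] } } }
])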

Upsert with pymongo and a custom _id field

I'm attempting to store pre-aggregated performance metrics in a sharded mongodb according to this document.
I'm trying to update the minute sub-documents in a record that may or may not exist with an upsert like so (self.collection is a pymongo collection instance):
self.collection.update(query, data, upsert=True)
query:
{ '_id': u'12345CHA-2RU020130304',
  'metadata': { 'adaptor_id': 'CHA-2RU',
                'array_serial': 12345,
                'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
                'processor_id': 0}
}
data:
{ 'minute': { '16': { '45': 1.6693091}}}
The problem is that the 'minute' subdocument only ever contains the last { hour: { minute: metric } } entry; it does not gain new entries for other hours, it just keeps overwriting the one entry.
I've also tried this with a $set style data entry:
{ '$set': { 'minute': { '16': { '45': 1.6693091}}}}
but it ends up being the same.
What am I doing wrong?
In both of the examples listed you are simply setting a field ('minute') to a particular value; the only reason it is an addition the first time you update is that the field itself does not exist and so must be created.
It's hard to determine exactly what you are shooting for here, but I think what you could do is alter your schema a little so that 'minute' is an array. Then you could use $push to add values regardless of whether they are already present or $addToSet if you don't want duplicates.
I had to alter your document a little to make it valid in the shell, so my _id (and some other fields) are slightly different to yours, but it should still be close enough to be illustrative:
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
}
}
Now let's add a minute field with an array of documents instead of a single document:
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '16': {'45': 1.6693091}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
}
]
}
Then, to illustrate the addition, add a slightly different entry (since I am using $addToSet, a different value is required for a new element to be added):
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '17': {'48': 1.6693391}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
},
{
"17" : {
"48" : 1.6693391
}
}
]
}
I ended up setting the fields like this:
query:
{ '_id': u'12345CHA-2RU020130304',
  'metadata': { 'adaptor_id': 'CHA-2RU',
                'array_serial': 12345,
                'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
                'processor_id': 0}
}
I'm setting the metrics like this:
data = {"$set": {}}
for metric in csv:
date_utc = metric['date'].astimezone(pytz.utc)
data["$set"]["minute.%d.%d" % (date_utc.hour,
date_utc.minute)] = float(metric['metric'])
which creates data like this:
{"$set": {'minute.16.45': 1.6693091,
'minute.16.46': 1.566343,
'minute.16.47': 1.22322}}
So that when self.collection.update(query, data, upsert=True) is run it updates those fields.
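For reference, a sketch of the equivalent upsert in the mongo shell, using the same dotted-path technique and the sample values from above:
db.foo.update(
    { _id: "u12345CHA-2RU020130304" },
    { $set: { "minute.16.45": 1.6693091, "minute.16.46": 1.566343, "minute.16.47": 1.22322 } },
    { upsert: true }
)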