I have lots of documents inside a collection.
The structure of each document in the collection is as follows:
{
"_id" : ObjectId(....),
"valor" : {
"AB" : {
"X" : 0.0,
"Y" : 142.6,
},
"FJ" : {
"X" : 0.2,
"Y" : 3.33
....
The collection currently has about 200 documents, and I have noticed that one of the keys inside valor has the wrong name. In this example, let's say "FJ" should be "JOF" in all the documents of the collection.
I'm pretty sure it is possible to change the key in all the documents using PyMongo's update function. The problem I am facing is that the online documentation at https://docs.mongodb.com/v3.0/reference/method/db.collection.update/ only explains how to change the values (which I want to keep as they are; I only want to change the keys).
This is what I have tried:
def multi_update(spec_key,key_updte):
rdo=col.update((valor.spec_key),{"$set":(valor.key_updte)},multi=True)
return rdo
print(multi_update('FJ','JOF'))
But it outputs: name 'valor' is not defined. I thought I should use valor.specific_key to access the corresponding JSON.
How can I rename only that key across all the documents of the collection?
You have two problems. First, valor is not an identifier in your Python code; it's a field name in a MongoDB document. You need to put it in single or double quotes to make it a Python string before you can use it in a PyMongo update expression.
Your second problem is that MongoDB's update command doesn't let you set one field to the value of another (at least in the 3.0 version you linked). It does have a $rename operator that renames a field in place. Alternatively, you can reshape all the documents in your collection using the aggregate command with a $project stage and store the results in a second collection using an $out stage.
Here's a complete example to play with:
from pymongo import MongoClient
db = MongoClient().test
collection = db.collection
collection.delete_many({})
collection.insert_one({
"valor" : {
"AB" : {
"X" : 0.0,
"Y" : 142.6,
},
"FJ" : {
"X" : 0.2,
"Y" : 3.33}}})
collection.aggregate([{
"$project": {
"valor": {
"AB": "$valor.AB",
"JOF": "$valor.FJ"
}
}
}, {
"$out": "collection2"
}])
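This is the dangerous part. First, check that "collection2" contains all the documents you want, in the desired shape (the $project stage above keeps only the keys it lists, so any other keys under valor must be added to it). Then: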
collection.drop()
db.collection2.rename("collection")
import pprint
pprint.pprint(collection.find_one())
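A lighter alternative that avoids the copy-and-drop step: MongoDB's $rename update operator renames a field in place and keeps its value, and it accepts dotted paths into embedded documents. The sketch below only builds the filter and update documents and previews the effect on a plain dict (the helper names are hypothetical); the PyMongo call at the end assumes PyMongo 3+ (update_many()) and is commented out because it needs a running server.

```python
def build_rename_filter(parent, old_key):
    # Match only documents that still have the old key, so re-running is harmless
    return {f"{parent}.{old_key}": {"$exists": True}}


def build_rename_update(parent, old_key, new_key):
    # $rename moves the value from the old dotted path to the new one
    return {"$rename": {f"{parent}.{old_key}": f"{parent}.{new_key}"}}


def preview_rename(doc, parent, old_key, new_key):
    # Local, in-memory preview of the effect on one plain dict (no server involved)
    sub = doc.get(parent)
    if isinstance(sub, dict) and old_key in sub:
        sub[new_key] = sub.pop(old_key)
    return doc


filt = build_rename_filter("valor", "FJ")
update = build_rename_update("valor", "FJ", "JOF")
print(filt)    # {'valor.FJ': {'$exists': True}}
print(update)  # {'$rename': {'valor.FJ': 'valor.JOF'}}

sample = {"valor": {"AB": {"X": 0.0, "Y": 142.6}, "FJ": {"X": 0.2, "Y": 3.33}}}
print(preview_rename(sample, "valor", "FJ", "JOF"))

# With PyMongo 3+ against a real server (not executed here):
# result = db.collection.update_many(filt, update)
# print(result.modified_count)
```

update_many() applies the rename to every matching document in one command, and the "X"/"Y" values are left untouched. The order of the keys inside valor may change after a $rename.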
MongoDB bulk operations have two options:
Bulk.find.updateOne()
Adds a single document update operation to a bulk operations list. The operation can either replace an existing document or update specific fields in an existing document.
Bulk.find.replaceOne()
Adds a single document replacement operation to a bulk operations list. Use the Bulk.find() method to specify the condition that determines which document to replace. The Bulk.find.replaceOne() method limits the replacement to a single document.
According to the documentation, both of these methods can replace a matching document. Do I understand correctly that updateOne() is the more general-purpose method, which can either replace the document exactly like replaceOne() does or just update specific fields?
With replaceOne() you can only replace the entire document, while updateOne() allows for updating fields.
Since replaceOne() replaces the entire document, fields in the old document that are not in the new one will be lost. With updateOne(), new fields can be added without losing the fields in the old document.
For example if you have the following document:
{
"_id" : ObjectId("0123456789abcdef01234567"),
"my_test_key3" : 3333
}
Using:
replaceOne({"_id" : ObjectId("0123456789abcdef01234567")}, { "my_test_key4" : 4})
results in:
{
"_id" : ObjectId("0123456789abcdef01234567"),
"my_test_key4" : 4.0
}
Using:
updateOne({"_id" : ObjectId("0123456789abcdef01234567")}, {$set: { "my_test_key4" : 4}})
results in:
{
"_id" : ObjectId("0123456789abcdef01234567"),
"my_test_key3" : 3333.0,
"my_test_key4" : 4.0
}
Note that with updateOne() you can use the update operators on documents.
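To make the difference concrete, here is a small in-memory Python sketch of the two semantics, using the document from the example above. The helper names are hypothetical, and they only mimic the server for this simple case: a replacement keeps _id and nothing else from the old document, while $set merges top-level fields (dotted paths are not handled).

```python
def apply_replace(old_doc, replacement):
    # replaceOne: the new document replaces everything except _id
    new_doc = {"_id": old_doc["_id"]}
    new_doc.update(replacement)
    return new_doc


def apply_set(old_doc, fields):
    # updateOne with $set: listed fields are added or overwritten, the rest is kept
    new_doc = dict(old_doc)
    new_doc.update(fields)
    return new_doc


doc = {"_id": "0123456789abcdef01234567", "my_test_key3": 3333}

replaced = apply_replace(doc, {"my_test_key4": 4})
updated = apply_set(doc, {"my_test_key4": 4})

print(replaced)  # {'_id': '0123456789abcdef01234567', 'my_test_key4': 4}
print(updated)   # {'_id': '0123456789abcdef01234567', 'my_test_key3': 3333, 'my_test_key4': 4}
```

In PyMongo 3+, the bulk equivalents are ReplaceOne(filter, replacement) and UpdateOne(filter, {"$set": {...}}), passed in a list to collection.bulk_write().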
replaceOne() replaces the entire document, while updateOne() allows for updating or adding fields. When using updateOne() you also have access to the update operators, which can reliably perform updates on documents. For example, two clients can "simultaneously" increment the same field in the same document and both increments will be captured, whereas with a replace one client may overwrite the other, losing one of the increments.
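A deterministic, in-memory Python simulation of the lost-update scenario described above (no real concurrency or server; the two "clients" are just two reads followed by two writes, and the names are illustrative). With read-modify-replace, the second write is based on a stale copy; with $inc semantics, each increment is applied to the value currently stored.

```python
import copy

stored = {"_id": 1, "counter": 0}

# Read-modify-replace: both clients read before either one writes
copy_a = copy.deepcopy(stored)
copy_b = copy.deepcopy(stored)
copy_a["counter"] += 1
copy_b["counter"] += 1
stored = copy_a  # client A replaces the document
stored = copy_b  # client B replaces it again, based on its stale copy
replace_result = stored["counter"]


def apply_inc(doc, field, amount):
    # $inc semantics: add to whatever value is currently stored
    doc[field] = doc.get(field, 0) + amount
    return doc


# $inc: each increment is applied to the current stored value
stored = {"_id": 1, "counter": 0}
apply_inc(stored, "counter", 1)  # client A
apply_inc(stored, "counter", 1)  # client B
inc_result = stored["counter"]

print(replace_result)  # 1 (one increment lost)
print(inc_result)      # 2 (both increments captured)
```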
db.collection.replaceOne() does exactly the same thing as db.collection.updateOne().
The main difference is that with db.collection.replaceOne() the whole document being edited has to go back and forth to the server, whereas db.collection.updateOne() sends only the selected fields, not the whole document!
I have recently changed one of my fields from object to array of objects.
In my production I have only 14 documents with this field, so I decided to change those fields.
Are there any best practices for doing that?
Since this is in production, I need to do it in the best way possible.
I have the _ids of those documents, like ['xxx','yyy','zzz',...]
My document structure is like this:
_id:"xxx",option1:{"op1":"value1","op2":"value2"},option2:"some value"
and I want to change it to this (converting the object to an array of objects):
_id:"xxx",option1:[{"op1":"value1","op2":"value2"},
{"op1":"value1","op2":"value2"}
],option2:"some value"
Can I use upsert? If so, how?
Since you need to create the new value of the field based on the old value, you should retrieve each document with a query like
db.collection.find({ "_id" : { "$in" : [<array of _id's>] } })
then iterate over the results and $set the value of the field to its new value:
db.collection.find({ "_id" : { "$in" : [<array of _id's>] } }).forEach(function(doc) {
    var oldVal = doc.option1
    // compute_newVal_from_oldVal is a placeholder: build the new array from the old object
    var newVal = compute_newVal_from_oldVal(oldVal)
    db.collection.update({ "_id" : doc._id }, { "$set" : { "option1" : newVal } })
})
The document structure is rather schematic, so I omitted putting in actual code to create newVal from oldVal.
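A Python sketch of the same idea with one concrete, assumed transformation: compute_new_value() (a hypothetical stand-in for compute_newVal_from_oldVal) wraps the existing option1 object in a single-element array, and documents whose option1 is already an array are skipped, so running the migration twice does no harm. It only builds the per-document filter/update pairs; the PyMongo calls are commented out because they need a server.

```python
def compute_new_value(old_val):
    # Assumption: the new array simply holds the old object as its only element
    if isinstance(old_val, list):
        return old_val  # already migrated
    return [old_val]


def build_updates(docs):
    """Return (filter, update) pairs that $set option1 to its new array value."""
    ops = []
    for doc in docs:
        old_val = doc.get("option1")
        if old_val is None or isinstance(old_val, list):
            continue  # nothing to migrate for this document
        ops.append(({"_id": doc["_id"]},
                    {"$set": {"option1": compute_new_value(old_val)}}))
    return ops


docs = [
    {"_id": "xxx", "option1": {"op1": "value1", "op2": "value2"}, "option2": "some value"},
    {"_id": "yyy", "option1": [{"op1": "value1", "op2": "value2"}], "option2": "some value"},
]
for filt, update in build_updates(docs):
    print(filt, update)

# With PyMongo 3+ (not executed here), ids being your list of _ids:
# docs = db.collection.find({"_id": {"$in": ids}})
# for filt, update in build_updates(docs):
#     db.collection.update_one(filt, update)
```

Upsert is not needed here: it only matters when the filter might match no document and you want one inserted, whereas these documents already exist.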
Since it is an embedded document, you could use a $push query:
db.collectionname.update({_id:"xxx"},{$push:{option1:{"op1":"value1","op2":"value2"}}})
This will create a document inside the embedded document. Hope it helps.
Is there a way to match a value against every array and subdocument inside a document in a MongoDB collection, and return the matching document?
{
"_id" : "2000001956",
"trimline1" : "abc",
"trimline2" : "xyz",
"subtitle" : "www",
"image" : {
"large" : 0,
"small" : 0,
"tiled" : 0,
"cropped" : false
},
"Kytrr" : {
"count" : 0,
"assigned" : 0
}
}
For example, if I search for "xyz", "ab", "xy", "z" or "0", the above document should be returned.
I actually have to do this on the back end using the C# driver, but a mongo shell query would also help greatly.
Please advise.
Thanks
You could probably do this using $where:
db.mycollection.find({$where: "JSON.stringify(this).indexOf('xyz') != -1"})
I'm converting the whole record to one big string and then checking whether your term occurs in it. It probably won't work as intended if xyz also appears in a field name, because field names are part of the string too!
You could instead iterate through the fields to build the big string from the values only, and then search that.
This isn't the most elegant way, and it will involve a full collection scan. It will be faster if you query the individual fields!
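A Python sketch of the "iterate through the fields" variant, done client-side on PyMongo-style dicts instead of server-side JavaScript (the function names are hypothetical). It walks the document recursively and builds strings from the values only, so a term that appears only in a field name is not matched. Like $where, it still has to look at every document, e.g. [d for d in collection.find() if matches_any_value(d, term)].

```python
def iter_values(value):
    """Yield every scalar value in a document as a string, skipping field names."""
    if isinstance(value, dict):
        for v in value.values():
            yield from iter_values(v)
    elif isinstance(value, list):
        for v in value:
            yield from iter_values(v)
    elif isinstance(value, bool):
        yield "true" if value else "false"  # JSON-style booleans
    elif value is None:
        yield "null"
    else:
        yield str(value)


def matches_any_value(doc, needle):
    # Substring match against values only
    return any(needle in s for s in iter_values(doc))


doc = {
    "_id": "2000001956",
    "trimline1": "abc",
    "trimline2": "xyz",
    "subtitle": "www",
    "image": {"large": 0, "small": 0, "tiled": 0, "cropped": False},
    "Kytrr": {"count": 0, "assigned": 0},
}

for needle in ["xyz", "ab", "xy", "z", "0", "Kytrr"]:
    print(needle, matches_any_value(doc, needle))
```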
While Malcolm's answer above would work, when your collection gets large or you have high traffic, you'll see it fall over pretty quickly, for two reasons. First, dropping down to JavaScript is expensive, and second, this will always be a full collection scan because $where can't use an index.
MongoDB 2.6 enabled text indexing by default (it was a beta feature in 2.4). With it, you can create a full-text index on all the string fields in the document. The documentation gives the following example, which creates a text index on every field using the $** wildcard and names the index "TextIndex".
db.collection.ensureIndex(
{ "$**": "text" },
{ name: "TextIndex" }
)
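The same index and a query that uses it, sketched in Python for PyMongo (the helper names are hypothetical; the string "text" is the value of pymongo.TEXT, used directly so the snippet runs without PyMongo installed). One caveat relevant to the question: text search matches whole words (with stemming), not arbitrary substrings, and only string fields are indexed, so "xyz" would match this document but "xy", "z" or "0" generally would not.

```python
TEXT = "text"  # same value as pymongo.TEXT


def wildcard_text_index_keys():
    # "$**" is the wildcard specifier: index every field that contains string data
    return [("$**", TEXT)]


def text_query(terms):
    # $text/$search uses the collection's text index; terms are matched as words
    return {"$text": {"$search": terms}}


print(wildcard_text_index_keys())  # [('$**', 'text')]
print(text_query("xyz"))           # {'$text': {'$search': 'xyz'}}

# With PyMongo against a real server (not executed here):
# collection.create_index(wildcard_text_index_keys(), name="TextIndex")
# for doc in collection.find(text_query("xyz")):
#     print(doc["_id"])
```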
I want to $set a field in every embedded document of every array inside an embedded document.
Basically, I want to do this:
{$set : {'documentname.*anyandallstrings*.*anyandallnum*.fieldname' : value}}
Here is a sample document:
{
"_id" : "abc123",
"documenttoset" : {
"arrayname" : [
{
"fieldname" : "fieldvalue"
//i want to add fields here,
},
{
"fieldname2" : "fieldvalue2",
"fieldname3" : "fieldvalue3"
//here,
}
],
"arrayname2" : [
{
"fieldname4" : "fieldvalue4",
"fieldname5" : "fieldvalue5",
"fieldname6" : "fieldvalue6",
//and here.
}
]
},
}
It should add the field in question to these nested documents, and must be scalable if there are more documents and more arrays.
I did not design the schema.
How is this done? I am not sure if it is even possible.
At the time of this answer, the only way to accomplish this is to iterate through the documents, modify each nested element, and save each document back.
import pymongo
c = pymongo.MongoClient(host='localhost')
db = c.local
for q in db.test.find():
    dic = q
    for f in dic:
        if f == '_id':
            continue
        field = dic[f]
        if not isinstance(field, dict):
            continue  # only embedded documents hold the arrays we want
        for a in field:
            array = field[a]
            if not isinstance(array, list):
                continue
            for d in array:
                if isinstance(d, dict):
                    d['newfield'] = 'value'
    # After you've done that, write the modified document back in place of the old one
    db.test.replace_one({'_id': dic['_id']}, dic)
You'll have to use a driver (obviously). I like using Python and PyMongo, but there are plenty of drivers out there.
If this is a DB admin job, I would recommend using PyMongo from the Python shell.
Hope it helps!!
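On MongoDB 3.6 or later (an assumption; earlier versions don't have it), the all-positional operator $[] can set a field in every element of an array without rewriting the whole document. You still have to name each array, so the sketch below (a hypothetical helper, no server needed) reads the array names from a fetched document and builds one $set covering all of them.

```python
def build_all_positional_set(doc, parent, field, value):
    """Build a $set that adds `field` to every element of every array under `parent`."""
    paths = {}
    for array_name, array in doc.get(parent, {}).items():
        if isinstance(array, list):
            # $[] means "every element of this array" (MongoDB 3.6+)
            paths[f"{parent}.{array_name}.$[].{field}"] = value
    return {"$set": paths}


doc = {
    "_id": "abc123",
    "documenttoset": {
        "arrayname": [{"fieldname": "fieldvalue"},
                      {"fieldname2": "fieldvalue2", "fieldname3": "fieldvalue3"}],
        "arrayname2": [{"fieldname4": "fieldvalue4", "fieldname5": "fieldvalue5",
                        "fieldname6": "fieldvalue6"}],
    },
}

update = build_all_positional_set(doc, "documenttoset", "newfield", "value")
print(update)

# With PyMongo 3.6+ against MongoDB 3.6+ (not executed here):
# for d in db.test.find({}, {"documenttoset": 1}):
#     upd = build_all_positional_set(d, "documenttoset", "newfield", "value")
#     if upd["$set"]:  # an empty $set is rejected by the server
#         db.test.update_one({"_id": d["_id"]}, upd)
```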
I have the following collection:
{
"Milestones" : [
{ "ActualDate" : null,
"Index": 0,
"Name" : "milestone1",
"TargetDate" : ISODate("2011-12-13T22:00:00Z"),
"_id" : ObjectId("4ee89ae7e60fc615c42e28d1")},
{ "ActualDate" : null,
"Index" : 0,
"Name" : "milestone2",
"TargetDate" : ISODate("2011-12-13T22:00:00Z"),
"_id" : ObjectId("4ee89ae7e60fc615c42e28d2") } ]
,
"Name" : "a", "_id" : ObjectId("4ee89ae7e60fc615c42e28ce")
}
I want to update specific array items: those in the document with a given _id whose Milestones._id is in a given list and whose ActualDate is null.
In .NET my code looks like this:
var query = Query.And(new[] { Query.EQ("_id", ObjectId.Parse(projectId)),
Query.In("Milestones._id", new BsonArray(values.Select(ObjectId.Parse))),
Query.EQ("Milestones.ActualDate", BsonNull.Value) });
var update = Update.Set("Milestones.$.ActualDate", DateTime.Now.Date);
Coll.Update(query, update, UpdateFlags.Multi, SafeMode.True);
Or in the mongo shell:
db.Projects.update({ "_id" : ObjectId("4ee89ae7e60fc615c42e28ce"), "Milestones._id" : { "$in" : [ObjectId("4ee89ae7e60fc615c42e28d1"), ObjectId("4ee89ae7e60fc615c42e28d2"), ObjectId("4ee8a648e60fc615c41d481e")] }, "Milestones.ActualDate" : null },{ "$set" : { "Milestones.$.ActualDate" : ISODate("2011-12-13T22:00:00Z") } }, false, true)
But only the first item is being updated.
This is not possible at the moment. The multi flag in update means updating multiple root documents, and the positional operator can match only one nested array item. There is a feature request for this in the MongoDB JIRA; you can vote it up and wait.
For now, the only solutions are to load the document, update it as you wish and save it back, or to run a separate atomic update for each nested array _id.
From documentation at mongodb.org:
"Currently the $ operator only applies to the first matched item in the query"
As answered by Andrew Orsich, this is not possible for the moment, at least not as you wish. But loading the document, modifying the array then saving it back will work. The risk is that some other process could modify the array in the meantime, so you would overwrite its changes. To avoid this, you can use optimistic locking, especially if the array is not modified every second.
load the document, including a new attribute: milestones_version
modify the array as needed
save back to mongodb, but now add a query constraint on the milestones_version, and increment it:
db.Projects.findAndModify({
query: {
_id: your_project_id,
milestones_version: expected_milestones_version
},
update: {
$set: {
Milestones: modified_milestones
},
$inc: {
milestones_version: 1
}
},
new: 1
})
If another process modified the milestones array (and hence the milestones_version) before we did, then this command will do nothing and simply return null. We just need to reload the document and try again. If the array is not modified every second, then this will be very rare and will not have any impact on performance.
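The retry loop described above, sketched in Python. To keep it runnable without a server, FakeProjects is a hypothetical in-memory stand-in that mimics just two PyMongo collection methods, find_one() and find_one_and_update() (returning the pre-update document, or None when the filter matches nothing), for equality filters only; with a real PyMongo collection you would pass that instead. set_actual_dates is likewise an illustrative name, and the sketch assumes every project already has a milestones_version field (e.g. initialized to 0).

```python
import copy
import datetime


class FakeProjects:
    """Hypothetical in-memory stand-in for a PyMongo collection (equality filters only)."""

    def __init__(self, docs):
        self.docs = {d["_id"]: copy.deepcopy(d) for d in docs}

    def _matches(self, doc, filt):
        return all(doc.get(k) == v for k, v in filt.items())

    def find_one(self, filt):
        doc = self.docs.get(filt["_id"])
        return copy.deepcopy(doc) if doc and self._matches(doc, filt) else None

    def find_one_and_update(self, filt, update):
        # Like PyMongo's default: return the pre-update document, or None if nothing matched
        doc = self.docs.get(filt["_id"])
        if doc is None or not self._matches(doc, filt):
            return None
        before = copy.deepcopy(doc)
        doc.update(copy.deepcopy(update.get("$set", {})))
        for key, amount in update.get("$inc", {}).items():
            doc[key] = doc.get(key, 0) + amount
        return before


def set_actual_dates(coll, project_id, milestone_ids, date, max_retries=5):
    """Set ActualDate on matching milestones, retrying if milestones_version changed."""
    for _ in range(max_retries):
        project = coll.find_one({"_id": project_id})
        if project is None:
            return False
        version = project["milestones_version"]  # assumed to exist on every project
        milestones = project["Milestones"]
        for m in milestones:  # modify the array locally
            if m["_id"] in milestone_ids and m["ActualDate"] is None:
                m["ActualDate"] = date
        saved = coll.find_one_and_update(
            {"_id": project_id, "milestones_version": version},
            {"$set": {"Milestones": milestones}, "$inc": {"milestones_version": 1}},
        )
        if saved is not None:
            return True
        # Another process bumped milestones_version first: reload and try again
    return False


day = datetime.datetime(2011, 12, 13, 22, 0)
projects = FakeProjects([{
    "_id": "p1", "Name": "a", "milestones_version": 0,
    "Milestones": [{"_id": "m1", "ActualDate": None}, {"_id": "m2", "ActualDate": None}],
}])
ok = set_actual_dates(projects, "p1", {"m1", "m2"}, day)
print(ok)
print(projects.find_one({"_id": "p1"}))
```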
The main problem with this solution is that you have to edit every project one by one (no multi: true). You could still write a JavaScript function and run it on the server, though.
According to their JIRA page "This new feature is available starting with the MongoDB 3.5.12 development version, and included in the MongoDB 3.6 production version"
https://jira.mongodb.org/browse/SERVER-1243
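With MongoDB 3.6+, the filtered positional operator $[<identifier>] together with arrayFilters can express the original request as a single update. A Python sketch that only builds the arguments (the helper name is hypothetical); array_filters is the PyMongo 3.6+ keyword for update_one()/update_many(), and the call is commented out because it needs a server.

```python
import datetime


def build_milestone_update(project_id, milestone_ids, date):
    """Return (filter, update, array_filters) for setting ActualDate on matching milestones."""
    filt = {"_id": project_id}
    # "m" is an arbitrary identifier; $[m] refers to every element matched by the filter below
    update = {"$set": {"Milestones.$[m].ActualDate": date}}
    array_filters = [{"m._id": {"$in": list(milestone_ids)}, "m.ActualDate": None}]
    return filt, update, array_filters


day = datetime.datetime(2011, 12, 13, 22, 0)
filt, update, array_filters = build_milestone_update("p1", ["m1", "m2"], day)
print(filt, update, array_filters)

# With PyMongo 3.6+ against MongoDB 3.6+ (not executed here; real ids would be ObjectIds):
# db.Projects.update_one(filt, update, array_filters=array_filters)
```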