PyMongo bulk_write UpdateOne only runs last operation - mongodb

Got a weird bug that I can't quite figure out.
I have some pymongo code that looks like this:
import pymongo
from pymongo import UpdateOne
client = pymongo.MongoClient()
...
def update_image_locations(user_key, dataset_key, preset_name,
                           keys_and_coords):
    db = docdb_client.db
    col = db.col
    operations = []
    query = {'ownerKey': user_key, 'imageInfo.datasetKey': dataset_key}
    for key_and_coords in keys_and_coords:
        query['key'] = key_and_coords['key']
        operations.append(
            pymongo.UpdateOne(
                query, {
                    '$set': {
                        'imageInfo.presets.%s.coords' % preset_name:
                            key_and_coords['coords']
                    }
                }))
    print(operations)
    if len(operations) > 0:
        print(col.bulk_write(operations, ordered=False).bulk_api_result)
    # This section fails with a KeyError.
    cursor = col.find({
        'ownerKey': user_key,
        'imageInfo.datasetKey': dataset_key
    }, {'imageInfo': 1}
    )
    for doc in cursor:
        print(doc['imageInfo']['presets'])
If I print out the bulk_write output, I get the following.
{'writeErrors': [], 'writeConcernErrors': [], 'nInserted': 0, 'nUpserted': 0, 'nMatched': 65, 'nModified': 65, 'nRemoved': 0, 'upserted': []}
which as far as I can tell is exactly what I expect.
However, I get KeyError failures for all but the last document in the collection when I try to iterate through the documents that should ostensibly have the new field. If I then go into the actual mongodb shell, I can confirm that only the last operation from the bulk_write seems to have actually gone off.
Based on the bulk_api_result I would expect that all of the documents would be updated, instead of only the last one. What's going on?
EDIT:
As requested, before and after queries. I'm not showing the full doc because there's a lot of vector embedding info that's going to muddle things.
Query:
> db.user_uploads.find({}, {'imageInfo.presets': 1})
Before:
{ "_id" : ObjectId("6074792104cc23375a8f979a"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f979b"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f979c"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f979d"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f979e"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f979f"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f97a0"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f97a1"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f97a2"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f97a3"), "imageInfo" : { } }
After:
{ "_id" : ObjectId("6074792104cc23375a8f979a"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f979b"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f979c"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f979d"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f979e"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f979f"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f97a0"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f97a1"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f97a2"), "imageInfo" : { } }
{ "_id" : ObjectId("6074792104cc23375a8f97a3"), "imageInfo" : { "presets" : { "preset_one" : { "coords" : [ 2.229365348815918, 1.4654869735240936 ] } } } }

Turns out the answer has to do with how the query is constructed. Specifically, this works:
for key_and_coords in keys_and_coords:
    query = {'key': key_and_coords['key']}
    operations.append(
        pymongo.UpdateOne(
            query, {
                '$set': {
                    'imageInfo.presets.%s.coords' % preset_name:
                        key_and_coords['coords']
                }
            }))
and this fails:
query = {}
for key_and_coords in keys_and_coords:
    query['key'] = key_and_coords['key']
    operations.append(
        pymongo.UpdateOne(
            query, {
                '$set': {
                    'imageInfo.presets.%s.coords' % preset_name:
                        key_and_coords['coords']
                }
            }))
What's happening here is ordinary Python object aliasing, not anything asynchronous. UpdateOne keeps a reference to the filter dict it is given rather than a copy, so in the failing version every operation in the list points to the same query object. Each loop iteration overwrites query['key'], so by the time bulk_write() sends the batch, all operations carry the key of the last document. That is why only the last object is updated, and also why nMatched and nModified still report 65: the same document was matched and modified 65 times. This was tough to catch because each query looks correct at the moment it is built; it only changes when a later iteration overwrites it. Creating a new dict on every iteration fixes it. Still, not really an issue with pymongo after all.
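A minimal pure-Python sketch of the aliasing, with no database involved. build_filters_shared and build_filters_fresh are hypothetical helper names: the first mirrors the failing loop (one dict mutated and appended repeatedly), the second builds a new dict per key while keeping the ownerKey/datasetKey conditions that the "works" version above dropped. The sample keys are made up for illustration.

```python
def build_filters_shared(base, keys):
    """Buggy pattern from the question: one dict is mutated and re-appended."""
    query = dict(base)          # a single dict object for the whole loop
    filters = []
    for key in keys:
        query['key'] = key      # overwrites the value seen by every earlier entry
        filters.append(query)   # appends a reference, not a snapshot
    return filters


def build_filters_fresh(base, keys):
    """Fixed pattern: a new dict per operation, keeping the base conditions."""
    return [{**base, 'key': key} for key in keys]


base = {'ownerKey': 'user-1', 'imageInfo.datasetKey': 'dataset-1'}
keys = ['img-a', 'img-b', 'img-c']
print([f['key'] for f in build_filters_shared(base, keys)])  # ['img-c', 'img-c', 'img-c']
print([f['key'] for f in build_filters_fresh(base, keys)])   # ['img-a', 'img-b', 'img-c']
```

Each dict returned by build_filters_fresh can then be passed as the filter of its own pymongo.UpdateOne(...), so no two operations share state.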
Thanks to everyone who responded!

Related

MongoDB query to update dates doesn't work on all documents

Basically I want to update all documents inside one collection. The update is just adding 2 hours to date fields present in each document.
The documents all follow a basic structure like this:
{
    code : 1,
    file : {
        dates : {
            start : 2018-05-27 22:00:00.000Z,
            end : 2018-05-27 22:00:00.000Z,
        },
        otherInfos : {
            ...
            ...
        }
    }
}
Here is my query:
var cursor = db.getCollection('files').find({});
while(cursor.hasNext()){
    e = cursor.next();
    let delta = 120*60*1000; //2 hours
    if(e.file.dates) {
        let fileStartDate = e.file.dates.start ? new Date(e.file.dates.start.getTime() + delta) : null;
        let fileEndDate = e.file.dates.end ? new Date(e.file.dates.end.getTime() + delta) : null;
        if(fileStartDate) {
            e.file.dates.start = fileStartDate;
        }
        if(fileEndDate) {
            e.file.dates.end = fileEndDate;
        }
    }
    print(e);
    db.getMongo().getDB('myDB').files.updateOne(
        {"code":e.code},
        {
            $set: {"file.dates.start": fileStartDate, "file.dates.end": fileEndDate}
        })
}
I am testing the query with around 20 documents. The first 10 are printed and updated with +2 hours exactly as expected, but for the second half the dates remain exactly the same as before (both in the print output and after the update).
All the documents have the same structure and the same Date type, so I don't understand why the update doesn't reach all of them.
EDIT:
Here is a document that was successfully updated:
{
    "_id" : ObjectId("5b36c7fdd515e80009e7cc84"),
    "code" : "1",
    "file" : {
        "dates" : {
            "start" : ISODate("2018-06-11T22:00:00.000Z"),
            "end" : ISODate("2018-06-11T22:00:00.000Z")
        }
    }
}
It became, as expected:
{
    "_id" : ObjectId("5b36c7fdd515e80009e7cc84"),
    "code" : "1",
    "file" : {
        "dates" : {
            "start" : ISODate("2018-06-12T00:00:00.000Z"),
            "end" : ISODate("2018-06-12T00:00:00.000Z")
        }
    }
}
But this document, for example:
{
    "_id" : ObjectId("5b36c7ffd515e80009e7cf03"),
    "code" : "15",
    "file" : {
        "dates" : {
            "start" : ISODate("2018-09-02T22:00:00.000Z"),
            "end" : ISODate("2019-09-26T22:00:00.000Z")
        }
    }
}
stayed exactly the same.
With MongoDB v4.2+, you can do an update with an aggregation pipeline. Use $add to add 2 hours * 60 minutes * 60 seconds * 1000 milliseconds = 7200000 ms to each date:
db.collection.update({},
    [
        {
            "$set": {
                "file.dates.start": {
                    $add: [
                        "$file.dates.start",
                        7200000
                    ]
                },
                "file.dates.end": {
                    $add: [
                        "$file.dates.end",
                        7200000
                    ]
                }
            }
        }
    ],
    {
        multi: true
    })
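For PyMongo users, a sketch of the same pipeline update, assuming PyMongo 3.9+ (where update_many() accepts a list of pipeline stages) and MongoDB 4.2+. shift_dates_pipeline and add_ms are hypothetical names; add_ms is only a local model of how $add treats a number added to a date (as milliseconds), used here to check the arithmetic against the document from the question.

```python
from datetime import datetime, timedelta, timezone

TWO_HOURS_MS = 2 * 60 * 60 * 1000  # 2 h * 60 min * 60 s * 1000 ms = 7200000


def shift_dates_pipeline(delta_ms=TWO_HOURS_MS):
    """The answer's update pipeline as Python data, ready for update_many()."""
    return [
        {
            "$set": {
                "file.dates.start": {"$add": ["$file.dates.start", delta_ms]},
                "file.dates.end": {"$add": ["$file.dates.end", delta_ms]},
            }
        }
    ]


def add_ms(dt, delta_ms=TWO_HOURS_MS):
    """Local model of $add on a date: the number is treated as milliseconds."""
    return dt + timedelta(milliseconds=delta_ms)


start = datetime(2018, 6, 11, 22, 0, tzinfo=timezone.utc)
print(add_ms(start).isoformat())  # 2018-06-12T00:00:00+00:00
```

With a collection handle col, something like col.update_many({}, shift_dates_pipeline()) would apply the shift server-side in one call, instead of reading and rewriting each document from the shell.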
db.getMongo().getDB('myDB').files.updateOne(
    {"code":e.code},
    {
        $set: {"file.dates.start": fileStartDate, "file.dates.end": fileEndDate}
    })
updateOne only updates a single document (the first one that matches the filter).
You should use updateMany() to update more than one document:
https://www.mongodb.com/docs/manual/reference/method/db.collection.updateMany/

MongoDB querying nested documents

I have records like:
{
    "_id" : ObjectId("5f99cede36fd08653a3d4e92"),
    "accessions" : {
        "sample_accessions" : {
            "5f99ce9636fd08653a3d4e86" : {
                "biosampleAccession" : "SAMEA7494329",
                "sraAccession" : "ERS5250977",
                "submissionAccession" : "ERA3032827",
                "status" : "accepted"
            },
            "5f99ce9636fd08653a3d4e87" : {
                "biosampleAccession" : "SAMEA7494330",
                "sraAccession" : "ERS5250978",
                "submissionAccession" : "ERA3032827",
                "status" : "accepted"
            }
        }
    }
}
How do I query by the mongo id in sample_accessions? I thought this should work but it doesn't. What should I be doing?
db.getCollection('collection').find({"accessions.sample_accessions":"5f99ce9636fd08653a3d4e86"})
The id is a key, not a value, so check whether that key exists with $exists, and use a projection to return only that object (aggregation expressions in a find() projection require MongoDB 4.4+):
db.getCollection('collection').find(
    {
        "accessions.sample_accessions.5f99ce9636fd08653a3d4e86": {
            $exists: true
        }
    },
    { sample_doc: "$accessions.sample_accessions.5f99ce9636fd08653a3d4e86" }
)
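The same filter and projection in PyMongo form, as a sketch. sample_accession_query and get_path are hypothetical helpers; get_path walks a dot-notation path through plain dicts, which illustrates why the original query fails: "accessions.sample_accessions" resolves to an embedded object, never to the id string, while the id only appears as the last segment of the path. The record below is trimmed from the sample in the question.

```python
def sample_accession_query(sample_id):
    """Filter + projection for a sample stored as a key under sample_accessions."""
    path = f"accessions.sample_accessions.{sample_id}"
    flt = {path: {"$exists": True}}
    proj = {"sample_doc": f"${path}"}  # expression projection (MongoDB 4.4+)
    return flt, proj


def get_path(doc, dotted):
    """Follow a dot-notation path through nested dicts; None if any key is missing."""
    cur = doc
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur


record = {
    "accessions": {
        "sample_accessions": {
            "5f99ce9636fd08653a3d4e86": {"biosampleAccession": "SAMEA7494329", "status": "accepted"},
        }
    }
}
sid = "5f99ce9636fd08653a3d4e86"
# The original query compared this whole sub-document to the id string:
print(get_path(record, "accessions.sample_accessions") == sid)           # False
print(get_path(record, f"accessions.sample_accessions.{sid}")["status"])  # accepted
```

With PyMongo, the pair could be used as col.find(*sample_accession_query(sid)).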

How to set value from different field or collection from specific document in Mongo shell

I have a collection "ttn_data" with this doc:
{
    "_id" : ObjectId("Some_different_ID"),
    "dev_id" : "e0e1e20102030405",
    "payload_fields" : {"temp_C" : 28.308}
}
and a collection "records" with this doc:
{
    "_id" : ObjectId("5ed8af72c377d5b209597981"),
    "temp_C_different" : ""
}
I would like to set temp_C_different to the value of temp_C from the ttn_data collection, so that after the update the document would be:
{
    "_id" : ObjectId("5ed8af72c377d5b209597981"),
    "temp_C_different" : "28.308"
}
I tried this method:
try {
    db.records.updateMany(
        { "_id" : ObjectId("5ed8af72c377d5b209597981") },
        { $set: { "temp_C_different" : db.ttn_data.temp_C.value } }
    );
} catch (e) {
    print(e);
}
but it sets the "temp_C_different" value to some metadata info from the database. What is the right way to do this kind of update?

Remove by _id inside a nested array, inside of a collection

This is my MongoDB footballers collection:
[
    {
        "_id" : ObjectId("5d83b4a7e5511f28847f1884"),
        "prenom" : "djalil",
        "pseudo" : "dja1000",
        "email" : "djalil#gmail.com",
        "selectionned" : [
            {
                "_id" : "5d83af3be5511f28847f187f",
                "role" : "footballeur",
                "prenom" : "Gilbert",
                "pseudo" : "Gilbert"
            },
            {
                "_id" : "5d83b3d5e5511f28847f1883",
                "role" : "footballeur",
                "prenom" : "Xavier",
                "pseudo" : "xav4544"
            }
        ]
    },
    {
        "_id" : ObjectId("5d83afa8e5511f28847f1880"),
        "prenom" : "Rolande",
        "pseudo" : "Rolande4000",
        "email" : "rolande#gmail.com",
        "selectionned" : [
            {
                "_id" : "5d83b3d5e5511f28847f1883",
                "role" : "footballeur",
                "prenom" : "Xavier",
                "pseudo" : "xav4544"
            }
        ]
    }
]
How can I remove every 'selectionned' entry whose _id is 5d83b3d5e5511f28847f1883, across the whole collection?
I need Xavier to disappear from every 'selectionned' array, just like a 'delete cascade' in SQL.
This is what I've tried, with no luck:
function delete_fb_from_all(fb){
    var ObjectId = require('mongodb').ObjectID; //working
    var idObj = ObjectId(fb._id); //working
    try {
        db.collection('footballers').remove( { "selectionned._id" : idObj } );
        console.log('All have been erased');
    } catch (e) {
        console.log(e);
    }
}
And this is not working either:
db.collection('footballers.selectionned').remove( { "_id" : idObj } );
I really don't know how to do this.
I'm trying this right now:
db.collection.update({'footballers.selectionned': idObj }, {$pull: {footballers:{ selectionned: idObj}}})
This is the error:
TypeError: db.collection.update is not a function
I think the solution may be here:
https://docs.mongodb.com/manual/reference/operator/update/pull/#pull-array-of-documents
EDIT 1
I'm currently trying this:
var ObjectId = require('mongodb').ObjectID; //working
var idObj = ObjectId(fb._id); //working
try {
    db.collection('footballers').update(
        { },
        { $pull: { selectionned: { _id: idObj } } },
        { multi: true }
    )
} catch (e) {
    console.log(e);
}
SOLVED:
Specifying the email, it is now working. I guess the problem was coming from the _id field:
try {
    db.collection('footballers').update(
        { },
        { $pull: { selectionned: { email: fb.email } } },
        { multi: true }
    )
} catch (e) {
    console.log(e);
}
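Object ID:
The issue may be in how you create the ObjectId. In your documents, selectionned._id is stored as a plain string, not an ObjectId, so there is no need to convert it; an ObjectId never matches a string value.
// No need
var ObjectId = require('mongodb').ObjectID;
var idObj = ObjectId(fb._id);
// Query with the plain string instead. remove() would delete every footballer
// document that contains the entry; $pull removes only the matching array element.
db.collection('footballers').updateMany({}, { $pull: { selectionned: { _id: fb._id } } });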
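A small in-memory model of the $pull update, as a sketch to show the matching rule rather than real driver code. pull_by_id is a hypothetical helper, it assumes the selectionned._id values are strings as in the sample documents (trimmed here to a few fields), and the tuple passed in the first call is only a stand-in for an id of a different type.

```python
def pull_by_id(docs, array_field, target_id):
    """In-memory model of updateMany({}, {$pull: {array_field: {_id: target_id}}}).

    Matching is plain equality, so an id of a different type never matches a
    string, similar to MongoDB never matching an ObjectId against a string.
    Returns the number of documents that changed.
    """
    modified = 0
    for doc in docs:
        items = doc.get(array_field, [])
        kept = [item for item in items if item.get("_id") != target_id]
        if len(kept) != len(items):
            doc[array_field] = kept
            modified += 1
    return modified


footballers = [
    {"pseudo": "dja1000", "selectionned": [
        {"_id": "5d83af3be5511f28847f187f", "prenom": "Gilbert"},
        {"_id": "5d83b3d5e5511f28847f1883", "prenom": "Xavier"},
    ]},
    {"pseudo": "Rolande4000", "selectionned": [
        {"_id": "5d83b3d5e5511f28847f1883", "prenom": "Xavier"},
    ]},
]
xavier = "5d83b3d5e5511f28847f1883"
print(pull_by_id(footballers, "selectionned", ("ObjectId", xavier)))  # 0
print(pull_by_id(footballers, "selectionned", xavier))                # 2
print([p["prenom"] for p in footballers[0]["selectionned"]])          # ['Gilbert']
```

Note that the parent documents stay in the list; only the matching array elements disappear, which is the "delete cascade" behaviour asked for.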

Collection modified within cursor.foreach() is removed after completion

I'm trying to iterate through a collection to build a new collection (hits_col) with counts of entries from the first collection. The code I've written so far appears to work as the iteration is happening, however, once the .forEach() method is finished the new collection (hits_col) gets removed.
RAW_COL.find({}, {fields: {created_time: 1}}).forEach(function (doc) {
    var date = moment.unix(doc.created_time).format("YYYYMMDD");
    var hitCOUNT = hits_COL.findOne({'_id': date});
    try {
        if (hitCOUNT === undefined) {
            hits_COL.insert({'_id': date, 'hits': 1}, function (err, id) {
                if (err == null) console.log("Entry " + id + " was created.");
                else console.log(err);
            });
        } else {
            hitCOUNT.hits = hitCOUNT.hits + 1;
            hits_COL.update({'_id': date}, {'hits': hitCOUNT.hits});
        }
    } catch (err) {throw err;}
});
While RAW_COL is iterating I can go to my collection and check the current entries and all is well.
meteor:PRIMARY> db.hits.find()
{ "_id" : "20160121", "hits" : 7887 }
{ "_id" : "20160120", "hits" : 7417 }
{ "_id" : "20160122", "hits" : 7533 }
{ "_id" : "20160124", "hits" : 8047 }
{ "_id" : "20160123", "hits" : 8262 }
{ "_id" : "20160125", "hits" : 7579 }
{ "_id" : "20160126", "hits" : 2111 }
{ "_id" : "20160119", "hits" : 7594 }
{ "_id" : "20160118", "hits" : 7788 }
{ "_id" : "20160117", "hits" : 7746 }
{ "_id" : "20160116", "hits" : 7609 }
{ "_id" : "20160115", "hits" : 3348 }
However, after the forEach() function is finished the collection is removed or something and the same mongo call returns nothing.
meteor:PRIMARY> db.hits.find()
What am I missing here?
Thanks for any and all help!
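The code above was preceded by
Meteor.startup(function () { hits_COL.remove({}); });
which is called after the forEach() call. Because Meteor.startup() defers its callback until the server process has finished starting, that remove({}) runs after the forEach() has filled hits_COL and empties it again.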
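As a sketch of the counting itself (independent of the Meteor.startup ordering issue), the per-day totals can be computed in one pass before anything is written. daily_hits is a hypothetical helper, and it assumes UTC days, whereas moment.unix() formats in the server's local time zone.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_hits(created_times):
    """Count Unix timestamps per UTC calendar day, keyed as YYYYMMDD strings."""
    return Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
        for ts in created_times
    )


# 2016-01-21 00:00 UTC, one hour later, and 2016-01-01 00:00 UTC
print(daily_hits([1453334400, 1453338000, 1451606400]))
# Counter({'20160121': 2, '20160101': 1})
```

Each total could then be written with a single upsert, for example in PyMongo hits.update_one({"_id": day}, {"$inc": {"hits": n}}, upsert=True) (hits being a hypothetical collection handle), instead of a findOne/insert/update sequence per document. Whichever way the counts are written, the remove({}) has to run before them, not in a startup callback that fires afterwards.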