Silent insertion failure in MongoDB

I have a MongoDB instance running and I need to change the '_id' field of each document.
To do that, I retrieve each document that needs to change, modify its '_id', insert the modified document, and delete the original. Every step reports success, but I cannot see the change in the database.
doc_id = doc['_id']                                            # id used for the delete below
inserted = db.candidates.insert_one(doc)                       # insert the modified document
deleted = db.candidates.delete_one({'_id': doc_id})            # intended: delete the original
res = list(db.candidates.find({'_id': inserted.inserted_id}))  # look up the new document
assert len(res) >= 1, 'could not retrieve inserted document'
The final assertion fails for every document. And I can still find the deleted document in the database, even though the DeleteResult reports that it was deleted:
{'n': 1, 'ok': 1.0}
I have been unable to find anything about this in the PyMongo documentation, or any similar problems online. Am I missing something obvious?
Modification of other collections and fields seems to work fine.

You're inserting the document and then deleting that same document: in the snippet, doc_id is read from doc after the '_id' has already been modified, so delete_one removes the document you just inserted, while the original (with the old '_id') is never touched. That's why your find doesn't return any records. Capture the original '_id' before you modify it.
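A minimal sketch of the intended flow, assuming the '_id' is being rewritten in place (new_id is a placeholder name, not from the question):

old_id = doc['_id']                         # capture the ORIGINAL _id first
doc['_id'] = new_id                         # then modify it
inserted = db.candidates.insert_one(doc)    # insert under the new _id
db.candidates.delete_one({'_id': old_id})   # delete the original by the old _id
assert db.candidates.find_one({'_id': inserted.inserted_id}) is not None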

Related

Pymongo how to do set on update but not on insert

I am trying to update the document if it is found, and otherwise insert it, as follows:
from pymongo import UpdateOne

upserts = [UpdateOne({'$and': [{'_id': x['_id']}, {'time': {'$lt': x['time']}}]},
                     {'$setOnInsert': x, '$set': {'time': x['time']}},
                     upsert=True)
           for x in batch]
collection.bulk_write(upserts)
However, I am getting the following error:
Updating the path 'time' would create a conflict at 'time'
I understand that this happens because the time key is updated in both $set and $setOnInsert. I cannot list specific fields in $setOnInsert because the keys are not fixed; if it were possible to exclude a field from $setOnInsert, I could exclude time there.
How can I work around this?
When the document is inserted, both the $set and $setOnInsert documents will be processed.
The query executor is refusing to update the same field twice in a single update.
You might try using a dictionary comprehension to remove the time field from the $setOnInsert, like:
'$setOnInsert': {i: x[i] for i in x if i != 'time'}
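Put together, the bulk write from the question would become something like this sketch (batch and collection are the names used above):

from pymongo import UpdateOne

upserts = [UpdateOne({'$and': [{'_id': x['_id']}, {'time': {'$lt': x['time']}}]},
                     {'$setOnInsert': {i: x[i] for i in x if i != 'time'},
                      '$set': {'time': x['time']}},
                     upsert=True)
           for x in batch]
collection.bulk_write(upserts)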

Get latest document inserted in MongoDb

How do I get the most recently inserted document across the existing collections of a standalone (no replica set) MongoDB?
And how do I get all documents inserted after this document?
This can be done only on a replica set, because it relies on the oplog. Please follow the tutorial to convert a standalone instance to a replica set.
You can get a reference to the last inserted document from oplog:
db.oplog.rs.find({op:"i"}).sort({$natural: -1}).limit(1);
The ns field contains the database and collection name, and o._id contains the identifier of the inserted document.
To get references to documents inserted after that one, use the ts field of the document you retrieved in the previous query:
db.oplog.rs.find({op:"i", ts: {$gt: last.ts}});
Note that this command can cause MongoDB to load a lot of data into memory; if oplog.rs is very big, memory usage will be high:
db.oplog.rs.find({op:"i"}).sort({$natural: -1}).limit(1);
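For completeness, the same two oplog queries can be issued from PyMongo. A sketch, where the connection URI and the replica-set name rs0 are assumptions:

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/?replicaSet=rs0')
oplog = client.local['oplog.rs']

# most recent insert entry, in reverse natural order
last = oplog.find_one({'op': 'i'}, sort=[('$natural', -1)])
print('%s %s' % (last['ns'], last['o']['_id']))

# all insert entries that came after it
for entry in oplog.find({'op': 'i', 'ts': {'$gt': last['ts']}}):
    print('%s %s' % (entry['ns'], entry['o']['_id']))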

Remove all fields except one from mongodb document

This Meteor server code needs to remove all fields except "fName" from a document found by a field, and if the document does not exist, create it.
Is there a way to do that in one go? Thanks.
myCol.update({fName: someName}, {fName: someName}); // works if the doc exists, fails if there is no doc
myCol.upsert({fName: someName}, {fName: someName}); // fails if the doc exists, works if it does not
You can add fName: {$exists: true} to your query part.
The update will then run only if fName is present.
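A different one-call approach, shown here as a PyMongo sketch for comparison (my_col and some_name are placeholder names, not from the question): replace_one with upsert=True both strips every field not in the replacement document and creates the document when it is missing.

result = db.my_col.replace_one(
    {'fName': some_name},    # find the document by fName
    {'fName': some_name},    # replacement keeps only fName (plus _id)
    upsert=True)             # insert it if no document matches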

Pymongo : insert_many + unique index

I want to insert_many() documents into my collection. Some of them may have the same key/value pair (screen_name in my example) as existing documents in the collection. I have a unique index set on this key, therefore I get an error.
my_collection.create_index("screen_name", unique=True)
my_collection.insert_one({"screen_name": "user1", "foobar": "lalala"})
# no problem
to_insert = [
    {"screen_name": "user1", "foobar": "foo"},
    {"screen_name": "user2", "foobar": "bar"}
]
my_collection.insert_many(to_insert)
# error :
# File "C:\Program Files\Python\Anaconda3\lib\site-packages\pymongo\bulk.py", line 331, in execute_command
# raise BulkWriteError(full_result)
#
# BulkWriteError: batch op errors occurred
I'd like to:
Not get an error
Not change the already existing documents (here {"screen_name":"user1", "foobar":"lalala"})
Insert all the non-already existing documents (here, {"screen_name":"user2", "foobar":"bar"})
Edit: as someone said in a comment, "this question is asking how to do a bulk insert and ignore unique-index errors, while still inserting the successful records; thus it's not a duplicate of the question how do I do a bulk insert". Please reopen it.
One solution could be to use the ordered parameter of insert_many and set it to False (default is True):
my_collection.insert_many(to_insert, ordered=False)
From the PyMongo documentation:
ordered (optional): If True (the default) documents will be inserted on the server serially, in the order provided. If an error occurs all remaining inserts are aborted. If False, documents will be inserted on the server in arbitrary order, possibly in parallel, and all document inserts will be attempted.
You would still have to handle the exception raised when not all of the documents could be inserted.
Depending on your use-case, you could decide to either pass, log a warning, or inspect the exception.
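For example, a sketch that treats duplicate-key errors (code 11000) as expected and re-raises anything else:

from pymongo.errors import BulkWriteError

try:
    my_collection.insert_many(to_insert, ordered=False)
except BulkWriteError as exc:
    # keep only the write errors that are NOT duplicate-key errors
    fatal = [e for e in exc.details['writeErrors'] if e['code'] != 11000]
    if fatal:
        raise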

Pymongo w=1 with continue_on_error

I have a collection of tweets, and I want to insert a list of tweets into it. The new list may contain some duplicate tweets, and I want to ensure that the duplicates do not get written while all the rest do. To achieve this, I'm using the following code.
mongoPayload = <list of tweets>
committedTweetIDs = db.tweets.insert(mongoPayload, w=1, continue_on_error=True)
print "%d documents committed" % len(committedTweetIDs)
I expected the snippet above to work. However, the second line raises a DuplicateKeyError. I don't understand why this happens, since I passed continue_on_error.
What I want in the end is for Mongo to commit all the non-duplicate documents and return to me (as acknowledgement) the tweetIDs of all the documents written to the journal.
Even with continue_on_error=True, PyMongo will raise a DuplicateKeyError if MongoDB tells it that you tried to insert a document with a duplicate _id. However, with continue_on_error=True, the server has attempted to insert all the documents in your list, instead of aborting the operation on the first error. The error_document attribute of the exception tells you the last duplicate _id in your list of documents.
Unfortunately you cannot determine how many documents succeeded and failed in total when you do a bulk insert. MongoDB 2.6 and PyMongo 2.7 will address this in the next release when we implement bulk write operations.
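Until then, a sketch of how the insert could be wrapped (Python 2 style, to match the snippet above; error_document is the attribute described in this answer):

from pymongo.errors import DuplicateKeyError

try:
    committedTweetIDs = db.tweets.insert(mongoPayload, w=1,
                                         continue_on_error=True)
except DuplicateKeyError as exc:
    # the server has still attempted every insert; this shows
    # the last duplicate the server complained about
    print "last duplicate reported: %r" % (exc.error_document,)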