Update document from inside another document - mongoengine - mongodb

I have the following code in a collection:
class Author(Agent):
    def foo(self):
        self.find_another_document_and_update_it(ids)
        self.processed = True
        self.save()

    def find_another_document_and_update_it(self, ids):
        for id in ids:
            documentA = Authors.objects(id=id)
            documentA.update(inc__mentions=1)
Inside find_another_document_and_update_it() I query the database and retrieve a document A, and then I increment a counter in A. Then in foo(), after calling find_another_document_and_update_it(), I also save the current document, let's say B. The problem is that although I can see that the counter in A is actually increased, when self.save() is called document A is reset to its old value. I guess the problem has to do with concurrency and how MongoDB deals with it. I appreciate your help.

In MongoEngine 0.5 save only updates fields that have changed. Before that it saved the whole document, which would have meant the previous update in find_another_document_and_update_it would have been overwritten. In general, as with all things Python, it's better to be explicit, so you might want to use update to update a document.
You should be able to update all mentions with a single update:
Authors.objects(id__in=ids).update(inc__mentions=1)
Regardless, the best way to update would be to call the global updates after self.save(). That way the mentions are only incremented after you've processed and saved any changes.
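To make the overwrite concrete, here is a toy in-memory model of the difference between a whole-document save and a save that writes only changed fields. It is plain Python, not MongoEngine itself; the dict `store` and the helpers `load`, `atomic_inc`, `save_whole` and `save_changed` are hypothetical names used only for illustration.

```python
# Toy model: a dict stands in for the collection (not MongoEngine).
store = {"a1": {"mentions": 0, "processed": False}}

def load(doc_id):
    # Return a private copy, like a document object held in memory.
    return dict(store[doc_id])

def atomic_inc(doc_id, field, amount=1):
    # Stands in for a server-side update such as update(inc__mentions=1).
    store[doc_id][field] += amount

def save_whole(doc_id, doc):
    # Whole-document save: every field, stale or not, is written back.
    store[doc_id] = dict(doc)

def save_changed(doc_id, doc, changed_fields):
    # Delta save: only the locally modified fields are written.
    for field in changed_fields:
        store[doc_id][field] = doc[field]

# Case 1: the in-memory copy is stale, and a whole save wipes the increment.
doc = load("a1")
atomic_inc("a1", "mentions")
doc["processed"] = True
save_whole("a1", doc)
mentions_after_whole_save = store["a1"]["mentions"]

# Case 2: same sequence, but only the changed field is written.
store["a1"] = {"mentions": 0, "processed": False}
doc = load("a1")
atomic_inc("a1", "mentions")
doc["processed"] = True
save_changed("a1", doc, ["processed"])
mentions_after_delta_save = store["a1"]["mentions"]
```

In this model the increment survives only in the second case, which matches the behaviour described for save from MongoEngine 0.5 onward.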

Related

create a document in mongodb with field that is max + 1

I want to create a document such as
{
increment: 12
}
Where the value of increment is the max of all values stored in that collection + 1.
You could separate this into a query and an update, but then you run the risk of a race condition if two separate calls are made to the method within an inopportunely short period of time.
Is there a way of doing this in a single, atomic call?
You can use findAndModify with the option {"new": true}. It:
updates the document
returns the updated document (before anyone else can change it)
Without the new option it would still update, but it would return the old document from before the update.
(If this is what you are asking for.)
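The difference the new option makes can be sketched with a toy in-memory collection. This is not a MongoDB driver: `ToyCollection` and its `find_and_modify` method are hypothetical, the lock only imitates the server doing the update and the read as one step, and creating the document when it is missing is an assumption made to keep the example short.

```python
import copy
import threading

class ToyCollection:
    """In-memory stand-in for a collection; not a MongoDB driver."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def find_and_modify(self, doc_id, inc, new=False):
        # The lock makes "update, then return" one indivisible step.
        with self._lock:
            # Assumption: create the document if it does not exist yet.
            doc = self._docs.setdefault(doc_id, {"_id": doc_id})
            before = copy.deepcopy(doc)
            for field, amount in inc.items():
                doc[field] = doc.get(field, 0) + amount
            # new=True returns the updated state, new=False the old one.
            return copy.deepcopy(doc) if new else before

counters = ToyCollection()
first = counters.find_and_modify("seq", {"increment": 1}, new=True)
second = counters.find_and_modify("seq", {"increment": 1}, new=True)
old = counters.find_and_modify("seq", {"increment": 1}, new=False)
```

One way this could be applied to the question is to keep a single counter document and use the value returned with new set to true as the next increment, so that no two callers receive the same number.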

Is it possible to delete a field with MongoEngine, without strict=False?

I've got a lot of data in MongoDB, which we access primarily via MongoEngine, and sometimes data first ended up in field F1, and then we later decided that field F2 is a better place for it, so we moved it over there, and stopped using F1.
That's convenient, but now we've got a bunch of stale (or useless) data in old F1 keys, and new documents are being created with empty F1 keys, for no reason.
While MongoDB being schemaless is convenient, I still appreciate the strict=True feature (which is on by default), and try to avoid turning it off except when absolutely necessary. I don't like turning off all the safety checks on a collection.
So is there any way to delete a field F1 from my MongoDB collection, without downtime, and without strict=False?
If I remove the field from my Document subclass first, MongoEngine will complain when it tries to load existing documents.
If I remove the field from my database first, MongoEngine will create it for any new records, until the model is updated.
Is there any way with MongoEngine to say "This is an old field. You can load it (or ignore it) if it's there, but don't create it for any new documents"?
"If I remove the field from my database first, MongoEngine will create it for any new records, until the model is updated"
That's only true if you explicitly write to that field or if the field has a default value set. Otherwise the field won't exist in MongoDB.
So as a first step I suggest removing the code that writes to that field and removing the default value (or setting it to None). Then it's safe to remove the field from the database.
Below is a small proof:
import mongoengine

class Foo(mongoengine.Document):
    a = mongoengine.IntField()
    b = mongoengine.ListField(default=None)

f = Foo().save()
type(f.a) # NoneType
type(f.b) # NoneType
And the database query:
> db.foo.findOne()
{ "_id" : ObjectId("56c49ae8ee8b341b4ea02fcb") }

File versioning with GridFS

I'm trying to store versioned content in MongoDB with GridFS. Therefore I add a version field to the metadata of the file I'm storing. This all works well. Now I want to get the latest version without knowing the version. Here: Find the latest version of a document stored in MongoDB - GridFs someone mentions that findOne always returns the youngest (latest) file matching the query, which is what I want. But when I try this, I always get the first (oldest) file from findOne(). I'm using spring-data-mongodb version 1.5.0.RELEASE.
Here is my current code:
public void storeFileToGridFs(ContentReference contentReference, InputStream content) {
    Integer nextVersion = findLatestVersion(contentReference) + 1;
    DBObject metadata = new BasicDBObject();
    metadata.put("version", nextVersion);
    metadata.put("definitionId", contentReference.getContentDefinitionId());
    gridOperations.store(content, contentReference.getContentId().getValue(), metadata);
}
and to find the latest version:
private Integer findLatestVersion(ContentReference contentReference) {
    Query query = new Query(GridFsCriteria.whereFilename().is(contentReference.getContentId().getValue()));
    GridFSDBFile latestVersionRecord = gridOperations.findOne(query);
    if (latestVersionRecord != null) {
        Integer version = (Integer) latestVersionRecord.getMetaData().get("version");
        return version;
    } else return 0;
}
But, as already mentioned, findLatestVersion() always returns 1 (except the first time, when it returns 0).
Once I have this running, is there a way to retrieve only the metadata of the document? In findLatestVersion() it's not necessary to load the file itself.
findOne returns exactly one result, more specifically the first one in the collection matching the query.
I am not too sure whether the latest version is returned when using findOne. Please try find instead.
A more manual approach would be filtering a result set from querying for the file name for the highest value of version.
In general, the version field only shows how often a document was changed. It is used for something which is called optimistic locking, which works by checking the current version of a document against the one the changed document has. If the version in the database is higher than the one in the document to be saved, another process has made changes to the document and an exception is raised.
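The optimistic locking check described above can be sketched as follows. This is a toy, single-threaded model in plain Python, not the Spring Data or MongoDB implementation; `StaleDocumentError`, `db`, `load` and `save` are hypothetical names.

```python
class StaleDocumentError(Exception):
    """Raised when the stored version is newer than the one being saved."""

# A dict stands in for the database.
db = {"doc1": {"version": 1, "text": "first draft"}}

def load(doc_id):
    # Each writer works on its own copy, including the version it saw.
    return dict(db[doc_id])

def save(doc_id, doc):
    # A real store has to do this compare-and-write atomically;
    # this toy runs in a single thread, so a plain check is enough.
    if db[doc_id]["version"] > doc["version"]:
        raise StaleDocumentError(doc_id)
    db[doc_id] = {**doc, "version": doc["version"] + 1}

writer_a = load("doc1")
writer_b = load("doc1")

writer_a["text"] = "edit from A"
save("doc1", writer_a)          # stored version is now 2

writer_b["text"] = "edit from B"
try:
    save("doc1", writer_b)      # still carries version 1
    conflict_detected = False
except StaleDocumentError:
    conflict_detected = True
```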
For storing versioned documents, git (via egit for example) might be a solution.
EDIT: After some quick research, here is how it works. File versioning should be done using the automatically set upload date from the metadata. Query for it, sort descending and use the first result. You do not need to set the version manually any more.
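The "query, sort descending, take the first result" step can be sketched in plain Python over records shaped like GridFS file documents. The sample data and the helper `find_latest` are made up for illustration; only the field names filename, uploadDate and metadata follow the GridFS files collection layout.

```python
from datetime import datetime

# Hypothetical records shaped like GridFS file documents.
files = [
    {"filename": "report", "uploadDate": datetime(2014, 5, 1), "metadata": {"version": 1}},
    {"filename": "report", "uploadDate": datetime(2014, 6, 1), "metadata": {"version": 2}},
    {"filename": "other", "uploadDate": datetime(2014, 7, 1), "metadata": {"version": 1}},
]

def find_latest(records, filename):
    # Filter by file name, sort newest first, and take the first hit.
    matches = [r for r in records if r["filename"] == filename]
    matches.sort(key=lambda r: r["uploadDate"], reverse=True)
    return matches[0] if matches else None

latest = find_latest(files, "report")
missing = find_latest(files, "does-not-exist")
```

Against a real database the filtering and sorting would be done by the query itself rather than in application code; the sketch only shows the selection logic.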
I know it's been a while since this question has been asked and I don't know whether the code has been the same back then, but I think this information may help future readers:
Looking at the source code shows that findOne completely ignores the sorting part defined in the query, while find actually makes use of it.
So you need to make a normal query with find and then select the first object found (refer to Markus W Mahlberg's answer for more information).
Try adding sorting to the query, like this:
GridFSDBFile latestVersionRecord = template.findOne(
    new Query(GridFsCriteria.whereFilename().is(filename))
        .with(new Sort(Sort.Direction.DESC, "version")));
Once you have the GridFSDBFile, you can easily retrieve the metadata without loading the whole file with this method:
DBObject metadata = latestVersionRecord.getMetaData();
Hope it helps!

How to move object from one collection to another without changing _id

I have a queue mechanism which I want to keep small, so all queued objects, after they are done processing are moved to a different collection called history, where they are not updated anymore and are only there for reference and info.
Is it possible to take an object from one collection, remove it and insert it into another collection without changing the _id ?
I now solved this by creating a second id in the schema which I transfer to the new object, but I'd rather keep referencing _id.
Also if you think I'm overlooking something and don't need a second collection to keep my queueing mechanism fast, I'd love to hear about it.
Here's how I currently do it (using step and underscore)
`
// moving data to a new process in the history collection
var data = _.omit(process.toObject(), '_id');
console.log("new history data", data);
var history = new db.History(data);
history.save(this.parallel());
db.Queue.remove({_id : process._id }, this.parallel());
`
You can copy/move a doc from one collection to another without changing its _id just fine. If you create a new doc that already has an _id then Mongoose/Mongo will use it. And _id values only need to be unique within a single collection, not between collections.
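As a rough illustration of that point, here is a toy model in plain Python where two dicts stand in for the two collections. It is not Mongoose or MongoDB code; `queue`, `history` and `move_to_history` are hypothetical names.

```python
# Two dicts stand in for the queue and history collections,
# each keyed by _id, so uniqueness is enforced per collection only.
queue = {"42": {"_id": "42", "job": "resize"}}
history = {}

def move_to_history(doc_id):
    doc = queue[doc_id]
    # Insert first, keeping the original _id ...
    history[doc["_id"]] = dict(doc)
    # ... then remove, so a failure in between leaves a duplicate
    # rather than losing the document.
    del queue[doc_id]
    return history[doc_id]

moved = move_to_history("42")
```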
I tried using delete object._id; in order to remove the property and allow MongoDB to assign one itself, but for some reason delete did not work.
This works for me:
obj['_id'] = undefined;
If you are not going to use the _id then this will fix your problem.

Global dictionary field for mongoengine documents

I want to dynamically update a global dictionary attribute for a collection in mongoengine. My task is to read a number of documents and annotate them with different descriptions. I want to update a global dictionary whenever a new description is added so that it is available for subsequent documents. How is this possible?
I hope that makes sense.
Since MongoDB is schema-less, you could store the global object in the collection and update it that way:
class NormalDoc(mongoengine.Document):
    attr1 = mongoengine.StringField()
    # global attribute hidden in the collection
    global_dict = mongoengine.DictField()
    is_global = mongoengine.BooleanField(default=False)
There are better ways to do this (like putting it in a separate collection), but that would work if I understand correctly.