I am new to Mongo. I wanted to atomically upsert a document based on multiple criteria. The document looks like the following:
{_id:..., hourOfTime:..., total:..., max:..., min:..., last:...}
This is basically an hourly aggregation of the number of clicks for an item identified by _id. The clicks for each item are flushed from the application to MongoDB every five seconds, so the document needs to be updated every five seconds.
Here is the case. Let's say at t=t0 we have {_id:"nike", total:123, max:10, min:3, last:9}, then at t=t1 I get the message {_id:"nike", count: 12}. Now, for _id="nike", I need to do the following:
Increment total by 12
if max < 12, update max=12
if min > 12, update min=12
update last=12
I want all of these operations to be atomic. I am unable to express this as one single query. Any help/hint is appreciated.
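For reference, this is roughly what I would have to run as separate shell operations for the message {_id:"nike", count: 12} (the collection name hourlyClicks is just an example); what I want is to do all of this in one atomic operation instead of four:

db.hourlyClicks.update({ _id: "nike" }, { $inc: { total: 12 } })
db.hourlyClicks.update({ _id: "nike", max: { $lt: 12 } }, { $set: { max: 12 } })
db.hourlyClicks.update({ _id: "nike", min: { $gt: 12 } }, { $set: { min: 12 } })
db.hourlyClicks.update({ _id: "nike" }, { $set: { last: 12 } })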
This cannot be done with a single query. Here is how I would do it:
Have a field on the document called locked. Run a findAndModify to grab the document if the locked field is false, and set the locked field to the Date() that it is locked at. This will stop other application instances from modifying the document, providing they also check for the locked field.
Modify the document application side, then atomically update it, setting locked to false.
As long as anything modifying the document runs a findAndModify on the locked field, this entire modification should be atomic.
Make sure to have a threshold at which a lock times out, so that documents do not become locked indefinitely if a machine blows up. This means that when updating the document the second time (and releasing the lock), the application should make sure that the date in the document is what it expects, to make sure it still has the lock.
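A rough sketch of that pattern in the shell (the collection name, field names, and the 30-second timeout below are just examples):

// 1. Try to acquire the lock: match only if the document is unlocked,
//    or if an existing lock is older than the timeout.
var lockTimeout = new Date(Date.now() - 30 * 1000)
var doc = db.hourlyClicks.findAndModify({
    query:  { _id: "nike", $or: [ { locked: false }, { locked: { $lt: lockTimeout } } ] },
    update: { $set: { locked: new Date() } },
    new: true
})

if (doc !== null) {
    // 2. Compute the new total/max/min/last application-side from doc ...
    // 3. ... then write them back and release the lock, but only if we still hold it
    //    (the locked date must still be the one we set).
    db.hourlyClicks.update(
        { _id: "nike", locked: doc.locked },
        { $set: { total: doc.total + 12, last: 12, locked: false } }
    )
}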
Related
I have an application that runs multiple instances and that needs to perform this:
Read the document
Check a value of the timestamp field
If an incoming document is newer, then update (including the timestamp field).
Even if I use transactions, two parallel operations would be able to perform 1 and 2 simultaneously and then both would write to the database, potentially writing the "older" document last (since each checks the timestamp of the original document, rather than the "new" one).
So what I am looking for is some kind of a read lock on the document or some other mechanism that will be able to solve this.
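Roughly, the flow looks like this in the shell (collection and field names are just examples); the race window is between the read and the write:

var incoming = { _id: "device-1", value: 42, ts: ISODate("2021-06-01T00:00:10Z") }

var existing = db.readings.findOne({ _id: incoming._id })                    // 1. read
if (existing === null || existing.ts < incoming.ts) {                        // 2. compare timestamps
    db.readings.update({ _id: incoming._id }, incoming, { upsert: true })    // 3. write
}

// Two instances can both pass step 2 before either reaches step 3,
// so the "older" document can still be written last.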
I'm confused about how MongoDB updates works.
The following docs, https://docs.mongodb.com/manual/core/write-operations-atomicity/, say:
In MongoDB, a write operation is atomic on the level of a single
document, even if the operation modifies multiple embedded documents
within a single document.
When a single write operation modifies multiple documents, the
modification of each document is atomic, but the operation as a whole
is not atomic and other operations may interleave.
I guess it means: if I'm updating all fields of a document I will be unable to see a partial update:
If I get the document before the update I will see it without any change
If I get the document after the update I will see it with all the changes
For multi-document updates, the same behavior applies to each document. I guess we could say there is a transaction for each document update instead of one big transaction for all of them.
But let's say there are a lot of documents in the multi-document update, and it takes a while to update all of them. What happens with queries by other threads during the update?
Will they see the old version? Or will they be blocked until the update finishes?
Are other updates to the same documents possible during this big update? If so, could such an intermediate update exclude some documents from the big update?
Will they see the old version? Or will they be blocked until the update finishes?
I guess other threads may see the old version of a document or the new one, depending on whether they query it before or after that document's update has finished, but they will never see a partial update on a document (i.e. one field changed and another not changed).
Are other updates to the same documents possible during this big update? If so, could such an intermediate update exclude some documents from the big update?
Instead of big or small updates, think of 2 threads doing an update on the same document. Thread 1 sets fields {a:1, b:2} and thread 2 sets {b:3, c:4}. If the original document is {a:0, b:0, c:0} then we can have two scenarios:
Update 1 is executed before update 2:
The document will finally be {a:1, b:3, c:4}.
Update 2 is executed before update 1:
The document will finally be {a:1, b:2, c:4}.
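For illustration, here is that example as shell commands (the collection name is just a placeholder):

db.things.insert({ _id: 1, a: 0, b: 0, c: 0 })

// thread 1:
db.things.update({ _id: 1 }, { $set: { a: 1, b: 2 } })
// thread 2:
db.things.update({ _id: 1 }, { $set: { b: 3, c: 4 } })

// thread 1 first, then thread 2:  { _id: 1, a: 1, b: 3, c: 4 }
// thread 2 first, then thread 1:  { _id: 1, a: 1, b: 2, c: 4 }
// In either order, no reader ever sees one of the $set operations half-applied.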
I heard that Mongo can do it, but I can't find how.
Can Mongo create collections which will be auto-removed in the future, at a time which I can set? Or can't Mongo do this magic?
MongoDB cannot auto-remove collections, but it can auto-remove BSON records. You just need to set a TTL (time to live) index on a date field that exists in the BSON record.
You can read more here: MongoDB: Expire Data from Collections by Setting TTL.
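For example (the collection and field names here are just placeholders), documents carrying a createdAt date are removed roughly one hour after that date:

db.records.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
db.records.insert({ payload: "...", createdAt: new Date() })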
Collections are auto created on the first write operation (insert, upsert, index creation). So this magic is covered.
If your removal is based on time, you could use cron or at to run this little script
mongo yourDBserver/yourDB --eval 'db.yourCollection.drop()'
As Sammaye pointed out, creating indices is a costly operation, so I would assume there is something wrong with your data model. To distinguish documents semantically, I'd rather create a field on them which does that, then either set an expiration date, or set a creation date together with a time frame in which the documents are valid, and use TTL indices to remove those documents.
To use an expiration date, you have to set a field to an ISODate and create a TTL index with expireAfterSeconds set to 0:
db.yourColl.ensureIndex({"yourExpirationDateField":1},{expireAfterSeconds:0})
In case you want the documents to be valid for, let's say, a week after they are created, you would use the following:
db.yourColl.ensureIndex({"yourCreationDate":1},{expireAfterSeconds:604800})
Either way, here is what happens: Once every minute a background thread called TTLMonitor wakes up, gets all TTL indices for the server and starts processing them. It scans the TTL index, looking for the date values, adds the value given for "expireAfterSeconds" and deletes all documents which it determined to be invalid by now. This process takes some time, so don't expect the documents to be deleted on the very second they expire.
The big advantage of that approach: you don't need any triggering logic to be maintained, the deletes are done automagically in the background and you don't put any load on your application. Plus, using an expiration date, you have very granular control over when a document expires.
The drawback is... well, if I have to find one, it is that you have to insert a creation date for every document, or calculate and insert an expiration date. And you have to send an administrative command to the mongod/mongos once in the application lifetime...
I'm building an application to handle ticket sales and expect to have really high demand. I want to try using MongoDB with multiple concurrent client nodes serving a node.js website (and gracefully handle failure of clients).
I've read "Limit the number of documents in a collection in mongodb" (which is completely unrelated) and "Is there a way to limit the number of records in certain collection" (but that talks about capped collections, where the new documents overwrite the oldest documents).
Is it possible to limit the number of documents in a collection to some maximum size, and have documents after that limit just be rejected? The simple example is adding ticket sales to the database, then failing if all the tickets are already sold out.
I considered having a NumberRemaining document, which I could atomically decrement until it reaches 0, but that leaves me with a problem if a node crashes between decrementing that number and saving the purchase of the ticket.
Store the tickets in a single MongoDB document. As updates to a single document are atomic, you shouldn't run into the cross-document dependency problems that would otherwise have to be solved with a traditional transactional database system.
As a document can be up to 16MB, by storing only a ticket_id in a master document, you should be able to store plenty of tickets without needing to do any extra complex document management. While it could introduce a hot spot, the document likely won't be very large. If it does get large, you could use more than one document (splitting the tickets into multiple documents: as one document "fills", activate another).
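One possible shape of that pattern (the names and the cap of 500 tickets are just examples): the update only matches while the array still has fewer than 500 entries, so a sold-out event atomically rejects the push.

var res = db.events.update(
    { _id: "concert-2024", "tickets.499": { $exists: false } },
    { $push: { tickets: { ticket_id: ObjectId(), buyer: "alice" } } }
)
if (res.nModified === 0) {
    print("sold out")
}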
If that doesn't work, 10gen has a pattern that might fit.
My only solution so far (I'm hoping someone can improve on this):
Insert documents into an un-capped collection as they arrive. Keep the implicit _id value of ObjectID, which can be sorted and will therefore order the documents by when they were added.
Run all queries ordered by _id and limited to the max number of documents.
To determine whether an insert was "successful", run an additional query that checks that the newly inserted document is within the maximum number of documents.
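A sketch of that approach (the collection name and the cap are just examples):

var MAX_TICKETS = 500
var doc = { _id: ObjectId(), buyer: "alice" }
db.tickets.insert(doc)

// The insert "succeeded" only if our document falls within the first MAX_TICKETS by _id.
var rank = db.tickets.count({ _id: { $lte: doc._id } })
if (rank > MAX_TICKETS) {
    db.tickets.remove({ _id: doc._id })   // over the limit -- undo our own insert
    print("sold out")
} else {
    print("ticket secured")
}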
My solution was to use an extra count variable in another collection. This collection has a validation rule that prevents the count from becoming negative; the count should always be a non-negative integer:
"count": { "$gte": 0 }
The algorithm is simple: decrement the count by one. If it succeeds, insert the document. If it fails, it means there is no space left.
Vice versa for deletion.
You can also use transactions to guard against partial failures (the count is decremented but the service fails just before the insert operation).
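A minimal sketch of that approach in the legacy shell (the collection and field names are made up):

// counter collection whose validator keeps count from going negative
db.createCollection("ticketCounters", { validator: { count: { $gte: 0 } } })
db.ticketCounters.insert({ _id: "concert-2024", count: 500 })

// decrement first; the validator rejects the update if count would drop below zero
var res = db.ticketCounters.update({ _id: "concert-2024" }, { $inc: { count: -1 } })
if (res.nModified === 1) {
    db.tickets.insert({ event: "concert-2024", buyer: "alice" })
} else {
    print("sold out")   // decrement failed, so there is no space left
}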
I have a collection of documents in MongoDB, with the expireAfterSeconds property set on a date-type index.
For the sake of argument, the documents are set to expire after one hour.
When I update a document in this collection, which one of the following will happen?
a) The document will expire one hour after the original creation time.
b) The document will expire one hour after the update time.
c) The document will expire one hour after the indexed variable's time, whatever that may be.
d) None of the above
I think that it's c, but cannot find the reference to confirm it. Am I correct? Where is this documented?
[edit]: To clarify, the situation is that I'm storing password reset codes (which should expire), and I want the old codes to stop working if a new code is requested. It's not very relevant though, since I can ensure the behaviour I want is always respected by simply deleting the old transaction. This question is not about my current problem, but about Mongo's behaviour.
The correct answer is c)
The expireAfterSeconds property always requires an index on a field which contains a BSON date, because the content of this date field is used to select entries for removal.
When you want an update of a document to reset the time-to-live, also update the indexed date field to the current time.
When you want an update to not affect the TTL, just don't update the date.
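For example (the collection and field names are assumptions), with a TTL index on createdAt, resetting that field on update pushes the expiry back by another hour, while leaving it untouched keeps the original expiry:

db.resetCodes.update(
    { user: "alice" },
    { $set: { code: "XYZ789", createdAt: new Date() } }
)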
However, keep in mind that expireAfterSeconds doesn't guarantee immediate deletion of the document. The deletions are done by a background job which runs every minute. This job is low-priority and can be postponed by MongoDB when the current load is high. So when it's important for your use-case that the expire times are respected accurately to the second, you should add an additional check on the application level.
This feature is documented here: http://docs.mongodb.org/manual/tutorial/expire-data/
If you don't want to rely on the mongod daemon process for expiring the documents, it is better to create an additional createdOn field on the collection and compare it with the current timestamp to decide whether to use a given document or not.
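A possible application-level guard along those lines (the names and the one-hour window are assumptions): only treat a document as valid if its createdOn is recent enough.

var oneHourAgo = new Date(Date.now() - 60 * 60 * 1000)
var code = db.resetCodes.findOne({ user: "alice", createdOn: { $gte: oneHourAgo } })
if (code === null) {
    print("expired or not found")
}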