MongoDB - Lock collection and insert record

I am trying to implement centralized coordination logic across multiple VMs.
I have one application running on, say, 5 VMs, and only 1 VM should be responsible for a given task.
To achieve that, I write that VM's host name to the database. But to write that name, I need some form of locking via the Java client API, since 2-3 VMs can come up at the same time.
How can I achieve that?
UPDATE:
I can use findAndModify, but my code looks like this:
if (collection.countDocuments(new Document("taskName", "Task1")) == 0) {
    // insert record -- I can use findAndModify here
}
But if two VMs come up at the same time, both will enter the if block, because the document does not exist yet.
I understand that findAndModify is atomic. So after one VM has issued the findAndModify command we will have one record with its hostname. But the next VM will perform the same operation and overwrite the record with its own hostname.
Please let me know if my question is unclear.

Updates to a single document in MongoDB are atomic, so you do not have to implement a locking mechanism on the client side. As the comment states, see here for your use case.
Update
On the query side you can check whether the document has a "hostname" field with $exists. If the document for Task1 has a hostname, there is already a VM responsible for the task. I do not know your whole use case, but say VM1 finishes the task: it can then update the document to remove the "hostname" field. Alternatively, use a default value such as '' for hostname and query for hostname '' together with taskName 'Task1'.
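Putting the two ideas together, here is a minimal sketch with the MongoDB Java driver (the class, task, and field names are illustrative; it also assumes a unique index on taskName, so that a losing upsert fails with a duplicate-key error instead of creating a second Task1 document). findOneAndUpdate filters on the hostname field being absent, so only the first VM's claim can succeed:

import com.mongodb.MongoCommandException;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import com.mongodb.client.model.Updates;
import org.bson.Document;

public class TaskClaim {
    // Returns true only for the single VM that wins the claim on Task1.
    static boolean claimTask(MongoCollection<Document> tasks, String host) {
        try {
            Document claimed = tasks.findOneAndUpdate(
                    Filters.and(Filters.eq("taskName", "Task1"),
                                Filters.exists("hostname", false)),
                    Updates.set("hostname", host),
                    new FindOneAndUpdateOptions()
                            .upsert(true)
                            .returnDocument(ReturnDocument.AFTER));
            return claimed != null;
        } catch (MongoCommandException e) {
            // Duplicate-key error (code 11000): another VM already claimed it.
            if (e.getErrorCode() == 11000) {
                return false;
            }
            throw e;
        }
    }
}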

Related

thread-safe increment of a value in the DB

I have come across a problem and am not sure how to implement it in the DB. I have Go on the application side.
I have a product table with a column last_port_used. I need to assign ports to services when someone hits an API; last_port_used needs to be incremented by 1 for the given product name.
One possible solution would have been to use a Redis server and sync this value there. Since we don't have Redis, I want to achieve the same with PostgreSQL.
I read more about locks and I think I need an ACCESS EXCLUSIVE lock. Is this the right way to do it?
product
    id
    name
    start_port      -- 11000
    end_port        -- 11999
    last_port_used  -- 11023
How do I handle this properly under concurrency?
You could simply do:
UPDATE products SET last_port_used = last_port_used+1
WHERE id=...
AND last_port_used < end_port
RETURNING *
This performs the update atomically (thread-safely), only if a port number is still available (last_port_used < end_port), and returns the row with the assigned port.
If you need to lock the row explicitly, you can also use SELECT ... FOR UPDATE.
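For completeness, a minimal application-side sketch using JDBC (table and column names as in the question; the class and method names are mine). Because the single UPDATE statement is atomic, no explicit lock is needed:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PortAllocator {
    // Returns the newly assigned port, or null if the range is exhausted.
    static Integer allocatePort(Connection conn, int productId) throws SQLException {
        String sql = "UPDATE product SET last_port_used = last_port_used + 1 "
                   + "WHERE id = ? AND last_port_used < end_port "
                   + "RETURNING last_port_used";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, productId);
            try (ResultSet rs = ps.executeQuery()) {
                // No row returned means no port was available.
                return rs.next() ? rs.getInt(1) : null;
            }
        }
    }
}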

Atomically query for all collection documents + watching for further changes

Our Java app saves its configuration in a MongoDB collection. When the app starts, it reads all the configurations from MongoDB and caches them in Maps. We would also like to use the change stream API to watch for updates to the configuration collection.
So, upon app startup, we would first like to get all configurations, and from then on watch for any further change.
Is there an easy way to execute the following atomically:
A find() that retrieves all configurations (documents)
Start a watch() that will send all further updates
By atomically I mean - without potentially missing any update (between 1 and 2 someone could update the collection with new configuration).
To make sure I lose no update notifications, I found that I can use watch().startAtOperationTime(serverTime) (for MongoDB 4.0 or later), as follows:
Query the MongoDB server for its current time, using a command such as Document hostInfoDoc = mongoTemplate.executeCommand(new Document("hostInfo", 1))
Query for all interesting documents: List<C> configList = mongoTemplate.findAll(clazz);
Extract the server time from hostInfoDoc: BsonTimestamp serverTime = (BsonTimestamp) hostInfoDoc.get("operationTime");
Start the change stream configured with the saved server time ChangeStreamIterable<Document> changes = eventCollection.watch().startAtOperationTime(serverTime);
Since step 1 ends before step 2 starts, we know that the documents returned by step 2 are at least as fresh as the ones at that server time. And any update that happens at or after this server time will be sent to us by the change stream. (I don't mind processing redundant updates again, because I use a map as the cache, so an extra add/remove makes no difference, as long as the last action arrives.)
I think I could also use watch().resumeAfter(_idOfLastAddedDoc) (I didn't try it). I did not use this approach because of the following scenario: the collection is empty, and the first document is added after getting all (zero) documents but before starting the watch(). In that scenario I have no previous document _id to use as a resume token.
Update
Instead of using "hostInfo" to get the server time, which couldn't be used in our production environment, I ended up using "dbStats", like this:
Document dbStats= mongoOperations.executeCommand(new Document("dbStats", 1));
BsonTimestamp serverTime = (BsonTimestamp) dbStats.get("operationTime");
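Putting the steps together, a condensed sketch (assuming a Spring Data MongoOperations named mongoOperations and a MongoCollection<Document> named eventCollection are already wired; the MyConfig entity class is illustrative):

import java.util.List;
import org.bson.BsonTimestamp;
import org.bson.Document;
import org.springframework.data.mongodb.core.MongoOperations;
import com.mongodb.client.ChangeStreamIterable;
import com.mongodb.client.MongoCollection;

public class ConfigLoader {
    void loadAndWatch(MongoOperations mongoOperations,
                      MongoCollection<Document> eventCollection) {
        // Steps 1 + 3: get the server time from the command response.
        Document dbStats = mongoOperations.executeCommand(new Document("dbStats", 1));
        BsonTimestamp serverTime = (BsonTimestamp) dbStats.get("operationTime");

        // Step 2: snapshot read -- everything visible as of (at least) serverTime.
        List<MyConfig> configList = mongoOperations.findAll(MyConfig.class);

        // Step 4: watch from serverTime; updates after the snapshot are replayed,
        // and re-applying one we already saw is harmless for a map-based cache.
        ChangeStreamIterable<Document> changes =
                eventCollection.watch().startAtOperationTime(serverTime);
        changes.forEach(change -> { /* update the cache map here */ });
    }
}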

Mongo transactions and updates

If I've got an environment with multiple instances of the same client connecting to a MongoDB server and I want a simple locking mechanism to ensure single client access for a short time, can I safely use my own lock object?
Say I have one object with a lockState that can be "locked" or "unlocked", and the plan is that everyone checks that it is "unlocked" before doing "stuff". To lock the system I say:
db.collection.update( { "lockState": "unlocked" }, { $set: { "lockState": "locked" } })
(aka UPDATE lockObj SET lockState = 'locked' WHERE lockState = 'unlocked')
If two clients try to lock the system at the same time, is it possible that both clients can end up thinking they "have the lock"?
Both clients find the record by the query parameter of the update
Client 1 updates the record (which is an atomic operation)
update returns success
Client 2 updates the document (it had already found it before client 1 modified it)
update returns success
I realize this is probably a very contrived case that would be very hard to reproduce, but is it possible, or does Mongo somehow make client 2's update fail?
Alternative approach
Use insert instead of update. insert is atomic and will fail if the document already exists (given a unique index, e.g. on someId, or using the value as _id).
To lock the system: db.locks.insert({someId: 27, state: "locked"}).
If the insert succeeds, I've got the lock, and since the insert was atomic, no one else can have it.
If the insert fails, someone else must have the lock.
If two clients try to lock the system at the same time, is it possible that both clients can end up thinking they "have the lock"?
No. Only one client at a time writes to the lock space (global, database, collection, or document, depending on your version and configuration), and operations on that lock space are sequential and exclusive per document (read or write, not both), so other connections will not mistakenly pick up a document in an in-between state and think that it is not locked by another client.
All operations on a single document are atomic, whether update or insert.
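To make the two approaches concrete, here is a minimal sketch with the MongoDB Java driver (collection, field, and value names are illustrative; the insert variant relies on a unique index, here the built-in one on _id):

import com.mongodb.ErrorCategory;
import com.mongodb.MongoWriteException;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

public class LockExamples {
    // Update-based lock: the filter and the modification are applied
    // atomically, so the losing client's update matches nothing and
    // reports modifiedCount == 0 -- both clients can never "win".
    static boolean tryLockByUpdate(MongoCollection<Document> locks) {
        return locks.updateOne(
                Filters.eq("lockState", "unlocked"),
                Updates.set("lockState", "locked"))
            .getModifiedCount() == 1;
    }

    // Insert-based lock: the unique _id makes the second insert fail
    // with a duplicate-key error.
    static boolean tryLockByInsert(MongoCollection<Document> locks) {
        try {
            locks.insertOne(new Document("_id", "lock-27").append("state", "locked"));
            return true;
        } catch (MongoWriteException e) {
            if (e.getError().getCategory() == ErrorCategory.DUPLICATE_KEY) {
                return false; // someone else holds the lock
            }
            throw e;
        }
    }
}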

paginated data with the help of mongo inbound adapter in spring integration

I am using the mongo inbound adapter for retrieving data from Mongo. Currently I am using the configuration below.
<int-mongo:inbound-channel-adapter
id="mongoInboundAdapter" collection-name="updates_IPMS_PRICING"
mongo-template="mongoTemplatePublisher" channel="ipmsPricingUpdateChannelSplitter"
query="{'flagged' : false}" entity-class="com.snapdeal.coms.publisher.bean.PublisherVendorProductUpdate">
<poller max-messages-per-poll="2" fixed-rate="10000"></poller>
</int-mongo:inbound-channel-adapter>
I have around 20 records in my database that match the query, and since I set max-messages-per-poll to 2, I expected to get at most 2 records per poll.
But I am getting all the records that match the query. Not sure what I am doing wrong.
Actually, I'd suggest raising a New Feature JIRA ticket to allow query-expression to accept an org.springframework.data.mongodb.core.query.Query builder, which has skip() and limit() options; with that, your issue could be fixed like:
<int-mongo:inbound-channel-adapter
query-expression="new BasicQuery('{\'flagged\' : false}').limit(2)"/>
The mongo adapter is designed to return a single message containing a collection of query results per poll, so max-messages-per-poll makes no difference here.
max-messages-per-poll is used to short-circuit the poller: in your case the second poll is done immediately rather than waiting another 10 seconds; after 2 polls, we wait again.
In order to implement paging, you will need to use a query-expression instead of query and maintain some state, somewhere, that can be included in the query on each poll.
For example, if the documents have some value that increments, you can store that value in a bean and use it in the next poll to get the next batch.
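For illustration only, a hypothetical state-holding bean (none of these names come from Spring Integration) that a SpEL query-expression could reference, e.g. query-expression="new BasicQuery('{seq: {$gt: ' + @pagingState.lastSeq + '}}').limit(2)", with a downstream handler calling update() after each batch:

public class PagingState {
    private volatile long lastSeq = -1; // highest "seq" value processed so far

    public long getLastSeq() {
        return lastSeq;
    }

    // Called from a downstream handler with the largest seq in the batch.
    public synchronized void update(long seq) {
        if (seq > lastSeq) {
            lastSeq = seq;
        }
    }
}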

Is using Mongo's shardCollection command from multiple clients safe?

I have multiple clients accessing a Mongo cluster. Sometimes they need to create new collections. They call ensureIndex before doing any inserts.
Now I want to shard these collections. I intended to make each client call shardCollection before inserting into a new collection. But the clients are not coordinated with one another, so several clients might call shardCollection on the same (new) collection at once. (They will check if the collection exists first, but there's an inevitable race condition.)
The Mongo shardCollection documentation says:
Warning: Do not run more than one shardCollection command on the same collection at the same time.
Does this mean I have to either coordinate the clients, or pre-create collections from a dedicated separate process? (The set of possible collections isn't finite, so pre-creating is hard.)
Or is there a way to make two parallel shardCollection calls safe? I can guarantee that:
The multiple calls to shardCollection will be identical (same shard key, etc).
Each app will wait for its own call to shardCollection to complete before doing any inserts.
Therefore, shardCollection will complete successfully at least once on an empty collection before any documents are inserted.
Finally, the Mongo shell command sh.shardCollection doesn't include the warning above. It's implemented in the Mongo shell, so my driver (reactivemongo) doesn't provide it. Does that mean it includes some logic I should duplicate?
Rationale: my collections are logically partitioned by date and other parameters. That is, the collection name specifies a day and those other parameters. I create each collection I need, and call ensureIndex, before the first insert. This allows me to efficiently drop/backup/restore old collections.
Assuming you pass all the relevant checks (not capped, shard key valid, not a system collection, etc.), then if you issue another shardCollection command you should just receive the message that the collection is already sharded (see here). If you guarantee that the commands will be identical (same shard key for each namespace), then you remove at least the competing-requests race condition.
The big question is whether there might be a problematic race condition whereby the initial shardCollection command has not completed when you issue another identical command, and what impact that might have; I think the only thing to do is test it realistically and see. You may simply have to implement a check before allowing such a command to run, to avoid the race in the first place.
As for running the command: if the driver has not implemented a helper for you, drivers usually provide a way to run a raw command. This is the case with reactivemongo (based on these docs), and if you look at the shell helper code (run it without parentheses), you will see that it is just some quick sanity checks on the arguments followed by the command call itself:
> sh.shardCollection
function ( fullName , key , unique ) {
    sh._checkFullName( fullName )
    assert( key , "need a key" )
    assert( typeof( key ) == "object" , "key needs to be an object" )
    var cmd = { shardCollection : fullName , key : key }
    if ( unique )
        cmd.unique = true;
    return sh._adminCommand( cmd );
}
The document stored in the cmd variable is the piece you want when constructing your own command (and note that it is then run against the admin database using the adminCommand helper).
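For example, with the plain Java driver (reactivemongo exposes an analogous raw-command facility), the equivalent of the shell helper is roughly the following; the connection string, database, collection, and shard-key names are made up:

import org.bson.Document;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

public class ShardIt {
    public static void main(String[] args) {
        try (MongoClient mongoClient = MongoClients.create("mongodb://localhost")) {
            // Same shape as the cmd document built by the shell helper.
            Document cmd = new Document("shardCollection", "mydb.events_2015_06_01")
                    .append("key", new Document("day", 1)); // hypothetical shard key
            Document result = mongoClient.getDatabase("admin").runCommand(cmd);
            System.out.println(result.toJson());
        }
    }
}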