I have multiple clients accessing a Mongo cluster. Sometimes they need to create new collections. They call ensureIndex before doing any inserts.
Now I want to shard these collections. I intended to make each client call shardCollection before inserting into a new collection. But the clients are not coordinated with one another, so several clients might call shardCollection on the same (new) collection at once. (They will check if the collection exists first, but there's an inevitable race condition.)
The Mongo shardCollection documentation says:
Warning: Do not run more than one shardCollection command on the same collection at the same time.
Does this mean I have to either coordinate the clients, or pre-create collections from a dedicated separate process? (The set of possible collections isn't finite, so pre-creating is hard.)
Or is there a way to make two parallel shardCollection calls safe? I can guarantee that:
The multiple calls to shardCollection will be identical (same shard key, etc).
Each app will wait for its own call to shardCollection to complete before doing any inserts.
Therefore, shardCollection will complete successfully at least once on an empty collection before any documents are inserted.
Finally, the Mongo shell command sh.shardCollection doesn't include the warning above. It's implemented in the Mongo shell, so my driver (reactivemongo) doesn't provide it. Does that mean it includes some logic I should duplicate?
Rationale: my collections are logically partitioned by date and other parameters. That is, the collection name specifies a day and those other parameters. I create each collection I need, and call ensureIndex, before the first insert. This allows me to efficiently drop/backup/restore old collections.
Assuming you pass all the relevant checks (the collection is not capped, the shard key is valid, it is not a system collection, etc.), then if you issue another shardCollection command you should just receive a message that the collection is already sharded (see here). If you can guarantee that the commands will be identical (same shard key for each namespace), then you remove at least the competing-requests race condition.
The big question is whether there might be a problematic race condition when the initial shardCollection command has not yet completed and you issue another identical command, and what impact that might have. I think the only realistic thing to do is test and see. You may simply have to implement a check before allowing such a command to run, to avoid the race in the first place.
As for running the command, if the driver has not implemented a helper for you, then they usually implement a way to run a raw command. This is the case with reactivemongo (based on these docs), and if you look at the shell helper code (run without parentheses), you will note that it is just some quick sanity checks on arguments followed by a command call itself:
> sh.shardCollection
function ( fullName , key , unique ) {
    sh._checkFullName( fullName )
    assert( key , "need a key" )
    assert( typeof( key ) == "object" , "key needs to be an object" )
    var cmd = { shardCollection : fullName , key : key }
    if ( unique )
        cmd.unique = true;
    return sh._adminCommand( cmd );
}
The document stored in the cmd variable is the piece you want when constructing your own command (and note that it is then run against the admin database using the _adminCommand helper).
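Since the shell helper is just argument checks plus a command document, you can rebuild that document yourself and hand it to whatever raw-command runner your driver exposes (reactivemongo's command API, or db.adminCommand in the shell). A minimal sketch of that wrapper logic in plain JavaScript; the function name is illustrative, not part of any driver:

```javascript
// Build the command document that sh.shardCollection ultimately runs
// against the admin database, mirroring the shell helper's sanity checks.
function buildShardCollectionCmd(fullName, key, unique) {
  if (!fullName) throw new Error("need a full namespace, e.g. 'mydb.mycoll'");
  if (!key || typeof key !== "object") throw new Error("key needs to be an object");
  const cmd = { shardCollection: fullName, key: key };
  if (unique) cmd.unique = true;
  return cmd;
}

// Example: a hashed shard key for one of the per-day collections.
const cmd = buildShardCollectionCmd("mydb.events_2014_01_31", { _id: "hashed" });
// Pass `cmd` to your driver's runCommand/adminCommand equivalent,
// targeting the admin database.
```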
Related
I've got a document that needs to be read and updated. Meanwhile, it's quite likely that another process is doing the same which would break the document update.
For example, if Process A reads document d, adds field a, and writes it back, while Process B reads d before A's write lands, adds field b, and writes it back, then whichever process writes second clobbers the other's change.
I've read this article and some other fairly involved transaction articles around Mongo. Can someone describe a simple solution to this? I have not yet come across anything that makes me comfortable.
https://www.mongodb.com/blog/post/how-to-select--for-update-inside-mongodb-transactions
[UPDATE]- In addition, I'm trying to augment a document that might not yet exist. I need to create the document if it doesn't exist. I also need to read it to analyze it. One key is "relatedIds" (an array). I push to that array if the id is not found in it. Another method I have that needs to create the document if it doesn't exist adds to a separate collection of objects.
[ANOTHER UPDATE x2] --> From what I've been reading and getting from various sources, the only way to properly create a transaction for this is to findOneAndUpdate the document to mark it as dirty, using some field that will definitely change, such as a "lock" field set to a new ObjectId (since that never results in a no-op; it always causes a change).
If another operation tries to write to it, Mongo can now detect that this record is already part of a transaction.
Thus anything else that writes to it will cause a writeError on that other operation. My transaction can then work on the record at its own pace, holding the lock; when it writes out and commits, that record has definitely not been touched by anything else. If there's no way to do this without a transaction for some reason, is this the simplest way to create one?
Using Mongo's transactions is the "proper" way to go, but I'll offer a simple solution that is sufficient, with some caveats.
The simplest solution is to use findOneAndUpdate to read the document and set a new field, let's call it status; since findOneAndUpdate is atomic, this works as a check-and-set.
The query would look like this:
const doc = await db.collection.findOneAndUpdate(
  {
    _id: docId,
    status: { $ne: 'processing' }
  },
  {
    $set: {
      status: 'processing'
    }
  }
);
So if doc.value is null, it means (assuming the document exists) that another process is processing it. When you finish processing, you just reset status to any other value.
Now, because you are effectively locking this document until the process finishes, you have to make sure you handle failure cases: an error thrown mid-process, an update failure, DB connection issues, etc.
Overall I would be cautious with this approach, as it only "locks" the document against cooperating queries (every single process needs to be updated to respect the status field), which can be a problem depending on your use case.
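One wrinkle with the status-field approach is releasing the lock when your processing throws. A hedged sketch of a wrapper that always resets status in a finally block; withDocLock and the "idle" value are my names, and I'm assuming a driver whose findOneAndUpdate resolves to an object with a value property:

```javascript
// Acquire the document via the atomic status flip, run the work,
// and always release the "lock" afterwards, even if work() throws.
async function withDocLock(collection, docId, work) {
  const res = await collection.findOneAndUpdate(
    { _id: docId, status: { $ne: "processing" } },
    { $set: { status: "processing" } }
  );
  if (!res.value) return null; // another process holds the document
  try {
    return await work(res.value);
  } finally {
    // Release even on error, so the document is not locked forever.
    await collection.updateOne({ _id: docId }, { $set: { status: "idle" } });
  }
}
```

Even with this, a process that crashes outright still leaves status stuck at 'processing', so a locked-at timestamp plus a timeout is a common addition.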
After working for a while with the C driver and reading the tutorials and the API, I am a little confused.
According to this tutorial: http://api.mongodb.org/c/current/executing-command.html
I can execute database and collection commands, which include the CRUD commands, and I can even get a document cursor if I don't use the "_simple" variants of the command API.
So why do I need to use, for example, the mongoc_collection_insert() API function? What are the differences? What is recommended?
Thanks
This question is similar to asking what the difference is between using the insert command and db.collection.insert() via the mongo shell.
mongoc_collection_insert() is a specific function written to insert a document into a collection, while mongoc_collection_command() is for executing any valid database command on a collection.
I would recommend using the API function (mongoc_collection_insert) whenever possible, for the following reasons:
The API functions were written as an abstraction layer with a specific purpose, so you don't have to deal with details unrelated to the operation.
For example, mongoc_collection_insert exposes exactly the parameters relevant to inserting, i.e. mongoc_write_concern_t and mongoc_insert_flags_t with respective default values. On the other hand, mongoc_collection_command has a broad range of parameters, such as mongoc_read_prefs_t, skip, or limit, which may not be relevant for inserting a document.
Any future changes to mongoc_collection_insert are more likely to be made with the correct context for insert in mind.
Especially for CRUD, try to avoid the generic command path: historically the wire protocol even used different request opcodes for commands (sent via OP_QUERY: 2004) and inserts (OP_INSERT: 2002).
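To make the difference concrete: the insert helper takes your document directly, while the generic command path requires you to assemble an insert command document by hand. A rough JavaScript illustration of what that command document looks like (field names follow the insert database command; the builder function is mine):

```javascript
// What you must assemble yourself when going through the generic
// command path instead of the insert helper.
function buildInsertCmd(collName, docs, ordered) {
  return {
    insert: collName,          // the insert database command
    documents: docs,           // array of documents to insert
    ordered: ordered !== false // server default is ordered inserts
  };
}

const insertCmd = buildInsertCmd("users", [{ name: "ada" }]);
// With the helper you would instead call something like
//   mongoc_collection_insert(collection, flags, doc, write_concern, &error);
// and never touch this document yourself.
```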
If I've got an environment with multiple instances of the same client connecting to a MongoDB server and I want a simple locking mechanism to ensure single client access for a short time, can I safely use my own lock object?
Say I have one object with a lockState that can be "locked" or "unlocked" and the plan is everyone checks that it is "unlocked" before doing "stuff". To lock the system I say:
db.collection.update( { "lockState": "unlocked" }, { "lockState": "locked" })
(aka UPDATE lockObj SET lockState = 'locked' WHERE lockState = 'unlocked')
If two clients try to lock the system at the same time, is it possible that both clients can end up thinking they "have the lock"?
Both clients find the record by the query parameter of the update
Client 1 updates the record (which is an atomic operation)
update returns success
Client 2 updates the document (it's already found it before client 1 modified it)
update returns success
I realize this is probably a very contrived case that would be very hard to reproduce, but is it possible or does mongo somehow make client 2's update fail?
Alternative approach
Use insert instead of update. insert is atomic and will fail with a duplicate-key error if a document with the same _id (or a value covered by a unique index) already exists.
To lock the system: db.locks.insert({someId: 27, state: "locked"}).
If the insert succeeds - I've got the lock, and since the insert was atomic, no one else can have it.
If the insert fails - someone else must have the lock.
If two clients try to lock the system at the same time, is it possible that both clients can end up thinking they "have the lock"?
No. Only one client at a time writes to the lock space (global, database, collection, or document, depending on your version and configuration), and operations on that lock space are sequential and exclusive (read or write, not both) per document, so other connections cannot pick up a document in an in-between state and mistakenly think it is not locked by another client.
All operations on a single document are atomic, whether update or insert.
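The insert-based lock works precisely because two inserts with the same _id cannot both succeed; the second one gets a duplicate-key error. A small in-memory simulation of that guarantee (the Map stands in for the unique _id index; tryLock is an illustrative name, not a MongoDB API):

```javascript
// Simulate the unique-_id guarantee: the first insert wins, the
// second fails, so at most one caller ends up "holding the lock".
function tryLock(locks, lockId, owner) {
  if (locks.has(lockId)) return false; // duplicate key -> insert fails
  locks.set(lockId, { _id: lockId, owner: owner, state: "locked" });
  return true;
}

const locks = new Map();
const client1 = tryLock(locks, 27, "client1"); // true: got the lock
const client2 = tryLock(locks, 27, "client2"); // false: already held
```

Against a real server, use the lock id as _id (or put a unique index on it) and treat a duplicate-key error (code 11000) as "someone else has the lock"; without a unique constraint, two inserts would simply create two documents.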
I am trying to implement centralized coordination logic across multiple VMs.
I have one application running on, say, 5 VMs, and only one VM should be responsible for a given task.
To achieve that, I write that VM's hostname to the database. But to update that name in the database I had to implement locking using the Java client API, as 2-3 VMs can come up at the same time.
How do I achieve that?
UPDATE :
I can use findAndModify. But my code looks like this:
if (collection.getCount({"taskName": "Task1"}) == 0) {
    // insert record ------ I can use findAndModify here
}
But if two VMs come up at the same time, both will enter the if block, since the document doesn't exist yet.
I understand that findAndModify is atomic, so after one VM issues the findAndModify command we will have one record with its hostname. But the next VM will do the same operation and overwrite the record with its own hostname.
Please let me know if I am not clear with my question.
Updates per document in MongoDB are atomic, so you do not have to implement a locking mechanism on the client side. As the comment states, check here for your use case.
Update
On the query side you can check whether the document has a hostname field with $exists. If the document for Task1 has a hostname, a VM is already responsible for the task. I don't know your whole use case, but say VM1 finishes the task: it can then update the document to remove the hostname field. Alternatively, use a default value like '' for hostname and query for hostname '' and taskName 'Task1'.
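Putting the two ideas together, the whole check-and-claim can be one findAndModify with a hostname $exists: false query and upsert (plus a unique index on taskName, so a losing upsert fails instead of inserting a second document), and the "if count == 0" race disappears. An in-memory sketch of why only the first VM's claim sticks; claimTask is my name, not a driver method:

```javascript
// Atomic "claim if nobody has claimed yet": mirrors
//   findAndModify({ query:  { taskName: t, hostname: { $exists: false } },
//                   update: { $set: { hostname: h } }, upsert: true })
// with a unique index on taskName.
function claimTask(store, taskName, hostname) {
  const doc = store.get(taskName);
  if (doc && doc.hostname !== undefined) return doc.hostname; // already claimed
  store.set(taskName, { taskName: taskName, hostname: hostname });
  return hostname;
}

const store = new Map();
const owner1 = claimTask(store, "Task1", "vm-1"); // "vm-1" claims the task
const owner2 = claimTask(store, "Task1", "vm-2"); // still "vm-1"
```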
I am running the following query with the purpose of updating a single element in all the existing documents in the collection. I am basically trying to clear their value to "0".
Here is the code:
MongoCollection collection = db.GetCollection(DataAccessConfiguration.Settings.CollectionName);
var query = Query.Exists("ElementName", true);
var update = Update.Set("ElementName", "0");
collection.Update(query, update);
It only updates a single document.
How can I update all elements at once?
Updates in MongoDB affect 0 or 1 documents by default (0 only if the query specifier doesn't match anything). To update all matching documents, you need to pass UpdateFlags.Multi as the third argument to Update. There is also a 4-argument overload of Update which accepts the "safe mode" flag as the fourth argument.
(Safe mode bundles a getLastError command with the update and causes the driver to wait until the server acknowledges that the write has succeeded. There are various safe-mode options that will wait for acknowledgement from multiple servers if you are using a replica set, wait only for a certain period of time and then return with an error, etc.)
Also be sure to see the C# driver documentation for details on the API.
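The default-vs-multi distinction is easy to see in miniature: a single-document update stops at the first match, while a multi update touches every match. An in-memory JavaScript sketch of the two behaviours (applyUpdate is illustrative, not a driver API):

```javascript
// Apply `change` to documents matching `pred`; stop after the first
// match unless `multi` is true (MongoDB's default is multi = false).
function applyUpdate(docs, pred, change, multi) {
  let updated = 0;
  for (const doc of docs) {
    if (!pred(doc)) continue;
    change(doc);
    updated += 1;
    if (!multi) break; // default: only one document is modified
  }
  return updated;
}

const docs = [{ ElementName: "5" }, { ElementName: "7" }, { other: 1 }];
const hasElement = d => "ElementName" in d;
const resetToZero = d => { d.ElementName = "0"; };

applyUpdate(docs, hasElement, resetToZero, false); // updates 1 document
applyUpdate(docs, hasElement, resetToZero, true);  // updates every match
```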