Consider the following scenario with MongoDB:
Three writers (A,B,C) insert a document into the same collection.
A inserts first, followed by B, followed by C.
How can we guarantee A retrieves the ObjectId of the document he inserted and not B's document or C's document? Do we need to serialize the writes (i.e., only permit B to write after A inserts and retrieves the ObjectId), or does MongoDB offer some native functionality for this scenario?
Thanks!
We're on Rails.
The normal pattern here is for the driver to allocate the ObjectId client-side, so you know what it is even before the server receives the insert.
You can generate the _id value in your client applications (writers) before inserting the document. This way you don't need to rely on the server generating the ObjectId and then retrieving the correct value. Most MongoDB language drivers will do this for you automatically if you leave the _id blank.
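For example, a minimal mongo shell sketch (the "events" collection and payload are made up for illustration; the same pattern applies with the Ruby/Rails driver):
// Allocate the ObjectId locally, then insert it explicitly. The driver does
// this for you when _id is omitted, but generating it yourself makes the
// value available before the round trip to the server.
var id = ObjectId();
db.events.insert({ _id: id, payload: "document written by A" });
print(id); // exactly the _id the server stored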
Related
Simple question: do arrays keep their order when stored in MongoDB?
Yep, MongoDB keeps the order of the array, just like JavaScript engines do.
Yes, in fact from a quick google search on the subject, it seems that it's rather difficult to re-order them: http://groups.google.com/group/mongodb-user/browse_thread/thread/1df1654889e664c1
I realise this is an old question, but the Mongo docs do now specify that all document properties retain their order as they are inserted. This naturally extends to arrays, too.
Document Field Order
MongoDB preserves the order of the document fields following write operations except for the following cases:
The _id field is always the first field in the document.
Updates that include renaming of field names may result in the reordering of fields in the document.
Changed in version 2.6: Starting in version 2.6, MongoDB actively attempts to preserve the field order in a document. Before version 2.6, MongoDB did not actively preserve the order of the fields in a document.
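As a quick sanity check on array order, a minimal shell sketch (the "test" collection is just an example):
// Insert an array and read it back: element order is the insertion order.
db.test.insert({ items: [3, 1, 2] });
db.test.findOne().items; // [ 3, 1, 2 ]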
Alright, so I was defining a document model for a MongoDB database and my colleagues told me that we shouldn't use a functional id for the "_id" field and that we should only use an auto-generated ObjectId.
I don't understand why, since I already have a unique id and another field to store a timestamp; from my point of view we are wasting time and space maintaining a useless index, because in our case the generated id would never be used.
But I want to be sure since I'm a NoSQL noob, so:
Do you know any problems that could arise by having a functional id as the "_id" of a MongoDB collection?
Are there any real advantages to using an auto-generated ObjectId instead of a functional id for the "_id" field?
In case we want to migrate from MongoDB to some other database later, can the ObjectId be an advantage or a disadvantage?
In order of appearance:
Do you know any problems that could arise by having a functional id as the "_id" of a MongoDB collection?
The _id is a unique field for every document inside a collection.
The only problem that could arise with a functional id as the _id field is duplicates and, consequently, failure at the moment of insertion.
For that reason I would suggest keeping an eye on how you generate the _id in your function to guarantee its uniqueness.
Please check the MongoDB documentation for further details about ObjectID, as your use case could benefit from using it to generate the _id field.
Are there any real advantages to using an auto-generated ObjectId instead of a functional id for the "_id" field?
This relates to your first question and there is a straightforward answer to this: an auto-generated ObjectID for any document inserted into a collection will lower the risk of duplicate entries compared with a function-generated key.
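To illustrate that duplicate risk, a small shell sketch (the "orders" collection and the key format are assumptions):
// Using a functional key as _id: a second insert with the same key fails.
db.orders.insert({ _id: "2015-06-01-customer-42", total: 10 });
db.orders.insert({ _id: "2015-06-01-customer-42", total: 99 });
// => E11000 duplicate key error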
In case we want to migrate from MongoDB to some other database later, can the ObjectId be an advantage or a disadvantage?
This depends on the nature of the database you are migrating to and its data model, so there is no straightforward answer to this one.
MongoDB guarantees that each document within a collection has a unique _id, and you should be aware of this fact at the moment of migration to another database system.
According to the MongoDB documentation, the _id field (if not specified) is automatically assigned a 12-byte ObjectId.
It says a unique index is created on this field when a collection is created, but what I want to know is how likely it is that two documents in different collections, but in the same database instance, will have the same ID, if that can even happen.
I want my application to be able to retrieve a document using just the _id field without knowing which collection it is in, but if I cannot guarantee uniqueness based on the way MongoDB generates one, I may need to look for a different way of generating Id's.
The short answer to your question is: yes, that's possible.
The post below on a similar topic will help you understand this better:
Possibility of duplicate Mongo ObjectId's being generated in two different collections?
You are not required to use a BSON ObjectId for the _id field. You could use a hash of a timestamp and some random number, or a field with extremely high cardinality (a US SSN, for example), in order to make it close to impossible that two objects in the world will share the same id.
The _id index requires the _id to be unique per collection. Much like in an RDBMS, two objects in two tables may very well have the same primary key when it's an auto-incremented integer.
You cannot retrieve a document solely by its _id. Every driver I am aware of requires you to explicitly name the collection.
My 2 cents: The only thing you could do is to manually iterate over the existing collections and query each one for the _id you are looking for, which is... inefficient, to put it politely. I'd rather semantically distinguish the documents in question by an additional field than by the collection they belong to. And remember, MongoDB uses dynamic schemas, so there is no reason to separate documents which semantically belong together but have a different set of fields. I'd guess there is something seriously, dramatically wrong with your schema. Please elaborate so that we can help you with that.
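For completeness, a rough shell sketch of that brute-force approach (the target _id is a made-up value):
// Look for one _id across every collection in the current database.
// This scans collection by collection and is as inefficient as it sounds.
var target = ObjectId("507f1f77bcf86cd799439011");
db.getCollectionNames().forEach(function (name) {
  var doc = db.getCollection(name).findOne({ _id: target });
  if (doc !== null) {
    print("found in " + name);
    printjson(doc);
  }
});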
What happens if two clients working with one MongoDB instance perform an insert operation at the same time without «forceServerObjectId: true»? Is it possible for equal ObjectIDs to be generated; could there be a conflict?
There is an implied unique index on the _id field of every collection which makes it impossible for two objects with the same _id to exist in the same collection.
When two objects with the same _id value are stored with collection.save, one document will replace the other.
When they are stored with collection.insert, one of the inserts will fail with a duplicate key error.
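A small shell sketch of those two behaviors (the "docs" collection is an assumption):
db.docs.insert({ _id: 1, v: "first" });
db.docs.save({ _id: 1, v: "second" });  // replaces the existing document
db.docs.insert({ _id: 1, v: "third" }); // fails with an E11000 duplicate key error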
But note that MongoDB ObjectIDs include a 24-bit machine id. This makes it impossible for two clients to generate the same ID unless they have the same machine id, and even then it's unlikely. That, of course, only applies when you let the MongoDB driver (or shell) auto-generate ObjectIDs. MongoDB allows you to use any value of any type for the _id field when you set it manually. When you do this (you shouldn't), it's your responsibility to ensure uniqueness.
I know that we can bulk update documents in mongodb with
db.collection.update( criteria, objNew, upsert, multi )
in one db call, but it's homogeneous, i.e. all the documents affected have to match a single criteria object. What I'd like to do instead is something like
db.collection.update([{criteria1, objNew1}, {criteria2, objNew2}, ...])
that is, send multiple update requests, possibly updating completely different documents or classes of documents, in a single db call.
What I want to do in my app is to insert/update a bunch of objects with a compound primary key: if the key already exists, update the document; insert it otherwise.
Can I do all of this in one combined call in MongoDB?
Those are two separate questions. On the first one: there is no MongoDB-native mechanism to bulk-send criteria/update pairs, although technically doing that in a loop yourself is bound to be about as efficient as any native bulk support would be.
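A minimal shell sketch of that loop (the "items" collection and the criteria/update pairs are assumptions):
// Apply each criteria/update pair in turn; each call is a separate round
// trip, which is roughly what a native bulk facility would do anyway.
var pairs = [
  { criteria: { sku: "A1" }, update: { $set: { status: "shipped" } } },
  { criteria: { sku: "B2" }, update: { $set: { status: "cancelled" } } }
];
pairs.forEach(function (p) {
  db.items.update(p.criteria, p.update);
});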
Checking for the existence of a document based on an embedded document (what you refer to as a compound key; in the interest of correct terminology and to avoid confusion it's better to use the Mongo name here) and inserting/updating depending on that existence check can be done with an upsert:
document A:
{
  _id: ObjectId(...),
  key: {
    name: "Will",
    age: 20
  }
}
db.users.update({name: "Will", age: 20}, {$set: {age: 21}}, true, false)
This upsert (update with insert if no document matches the criteria) will do one of two things depending on the existence of document A:
Exists: performs the update "$set:{age:21}" on the existing document.
Doesn't exist: creates a new document with fields "name" and "age" with values "Will" and "20" respectively (basically the criteria are copied into the new doc), and then the update is applied ($set:{age:21}). The end result is a document with "name"="Will" and "age"=21.
Hope that helps
We are seeing some benefits from the $in clause.
Our use case was to update the 'status' in the documents for a large number of records.
In our first cut, we were doing a for loop and issuing the updates one by one. But then we switched to using the $in clause and that made a huge improvement.
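A shell sketch of that switch (the "records" collection and field names are assumptions):
// Instead of one update per document, collect the ids and issue a single
// multi-update with $in.
var ids = db.records.find({ status: "pending" }, { _id: 1 }).toArray()
                    .map(function (d) { return d._id; });
db.records.update(
  { _id: { $in: ids } },
  { $set: { status: "processed" } },
  { multi: true }
);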
There is no real benefit from doing updates the way you suggest.
The reason that there is a bulk insert API and that it is faster is that Mongo can write all the new documents sequentially to memory, and update indexes and other bookkeeping in one operation.
A similar thing happens with updates that affect more than one document: the update will traverse the index only once and update objects as they are found.
Sending multiple criteria, each with its own update, cannot benefit from any of these optimizations. Each criteria means a separate query, just as if you issued each update separately. The only possible benefit would be sending slightly fewer bytes over the connection. The database would still have to do each query separately and update each document separately.
All that would happen is that Mongo would queue the updates internally and execute them sequentially (because only one update can happen at any one time); this is exactly the same as if all the updates were sent separately.
It's unlikely that the overhead of sending the queries separately would be significant; Mongo's global write lock will be the limiting factor anyway.