Update/Find by id limit - mongodb

When performing update/find only by _id should I specify $limit 1 or mongo already implicitly know that there will be only one record with specified id?

yes there will always be unique _id in every document of a collection. An _id is made from following and therefore it is always unique and you will only find 1 document corresponding to one _id
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.

Related

Best way to get the first inserted record in a collection of MongoDB

I need to fetch the first inserted record in a collection in MongoDB for which I am currently using the below query:
db.users.find({}).sort({"created_at":1}).limit(1);
But, this takes up a lot of memory. The collection has about 100K records.
What is the efficient way to do this?
MongoDB _id is unique identifier which is automatically generated upon insertion of document into MongoDB collection
_id field stores ObjectId value and is automatically indexed.
According to MongoDB documentation,
The 12-byte ObjectId value consists of:
4-byte value representing the seconds since the Unix epoch,
3-byte machine identifier,
2-byte process id, and
3-byte counter, starting with a random value.
According to description as mentioned into above question to fetch first inserted record please try executing following mongodb find operation into MongoDB shell.
db.users.find({}).sort({"_id":1}).limit(1);
In above query we have sorted result according to _id field since _id Object ID value consists of unix epoch timestamp
further to this you can add specific filters in query to get first record inserted for that criteria:
like suppose you collection contains data for storing employees from IT, ADMIN, FINANCE department and you want to look for the first document inserted for IT (i.e. first IT employee) then you can execute:
db.users.find({"Dept" : "IT"}).sort({"_id":1}).limit(1);
and similarly to find last employee:
db.users.find({"Dept" : "IT"}).sort({"_id":-1}).limit(1);
Note: for bigger collections/sharded collection it will take considerable time to get the result as it iterates entire _id field for ascending and descending criteria.

Is MongoDB _id (ObjectId) generated in an ascending order?

I know how the _id column contains a representation of timestamp when the document has been inserted into the collection. here is an online utility to convert it to timestamp: http://steveridout.github.io/mongo-object-time/
What I'm wondering is if the object id string itself is guaranteed maintain the ascending order or not? i.e. does this comparison always return true?
"newest object id" > "second newest object id"
No, there is no guarantee whatsoever. From the official documentation (at the time of the original answer):
The relationship between the order of ObjectId values and generation time is not strict within a single second. If multiple systems, or multiple processes or threads on a single system generate values, within a single second; ObjectId values do not represent a strict insertion order. Clock skew between clients can also result in non-strict ordering even for values, because client drivers generate ObjectId values, not the mongod process.
And from the latest docs
While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:
Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
Are generated by clients, which may have differing system clocks.
For mongo version >= 3.4, the Objectid generation is changed a little.
Its structs are:
a 4-byte value representing the seconds since the Unix epoch,
a 5-byte random value, and
a 3-byte counter, starting with a random value.
So the first 4 bytes are still the seconds since the Unix epoch, it is still almost ascending but not strictly.
https://docs.mongodb.com/manual/reference/bson-types/#objectid
_id: ObjectId(4 bytes timestamp, 3 bytes machine id, 2 bytes process id, 3 bytes incrementer)
This is the id structure. So only last 3 bytes will increment uniquely. So the answer of your question is yes.

Mongodb - must _id be globally unique when sharding

I want to shard a collection by their "foreign key" (userID) and not by their id field. I only need that the combination of userID and id is unique. But I am not sure if that is ok with mongodb.
Warning In any sharded collection where you are not sharding by the
_id field, you must ensure uniqueness of the _id field. The best way to ensure _id is always unique is to use ObjectId, or another
universally unique identifier (UUID.)
This is taken from: http://docs.mongodb.org/manual/tutorial/enforce-unique-keys-for-sharded-collections/#enforce-unique-keys-for-sharded-collections
Do I have to ensure that _id is unique? Or is it good enough if I always query by both userID and _id?
Unless you manually replace them, the auto-generated _id's are UUID's which, according to the documentation, consist of "a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter".
As you can see, an unique machine-ID is part of the UUID. That ensures that no two machines in the shard ever create the same UUID independently (unless they have the same machine-id - the likeliness for that is 1:16777215 and when it happens it can be easily verified). The only situation where you could theoretically have a duplicated UUID is when a single process creates more than 2^24 (over 16 million) UUIDs in a single second.
tl;dr: You don't have to worry about duplicate UUIDs - they are, as the documentation puts it, "designed to have a reasonably high probability of being unique when allocated".

Difference between "id" and "_id" fields in MongoDB

Is there any difference between using the field ID or _ID from a MongoDB document?
I am asking this, because I usually use "_id", however I saw this sort({id:-1}) in the documentation: http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs-Sortbyidtosortbyinsertiontime
EDIT
Turns out the docs were wrong.
I expect it's just a typo in the documentation. The _id field is primary key for every document. It's called _id and is also accessible via id. Attempting to use an id key may result in a illegal ObjectId format error.
That section is just indicating that the automatically generated ObjectIDs start with a timestamp so it's possible to sort your documents automatically. This is pretty cool since the _id is automatically indexed in every collection. See http://www.mongodb.org/display/DOCS/Object+IDs for more information. Specifically under "BSON ObjectID Specification".
A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. Note that the timestamp and counter fields must be stored big endian unlike the rest of BSON.
The _id field is the default field for Bson ObjectId's and it is,by default, indexed.
_id and id are not the same. You may also choose to add a field called id if you want, but it will not be index unless you add an index.
It is just a typo in the docs.
id is an alias for _id in mongoid.id would return the _id of the document.
https://github.com/mongodb/mongoid/blob/master/lib/mongoid/fields.rb#L47
if the _id field is not specified an ObjectedId is generated automatically.
My two cents:
The _id field
MongoDB assigns an _id field to each document and assigns primary index on it. There're ways by which we can apply secondary indices as well. By default, MongoDB creates values for the _id field of type ObjectID. This value is defined in BSON spec and it's structured this way:
ObjectID (12 bytes HEX string) = Date (4 bytes, a timestamp value representing number of seconds since the Unix epoch) + MAC address (3 bytes) + PID (2 bytes) + Counter (3 bytes)

How are MongoDB's ObjectIds generated?

Are they somewhat random?
I mean....would people be able to break them apart?
They are not random and can be easily predicted :
A BSON ObjectID is a 12-byte value
consisting of a 4-byte timestamp
(seconds since epoch), a 3-byte
machine id, a 2-byte process id, and a
3-byte counter
http://www.mongodb.org/display/DOCS/Object+IDs
Heres a javascript implementation of the MongoDB ObjectID (http://jsfiddle.net/icodeforlove/rN3zb/)
function ObjectIdDetails (id) {
return {
seconds: parseInt(id.slice(0, 8), 16),
machineIdentifier: parseInt(id.slice(8, 14), 16),
processId: parseInt(id.slice(14, 18), 16),
counter: parseInt(id.slice(18, 24), 16)
};
}
So if you have enough of them they leak quite a bit of information about your infrastructure. And you also know the object creation dates for everything.
IE: how many servers do you have, and how many processes each server is running.
Generation
They are usually generated on the client side by the driver itself. For example, in ruby, BSON::ObjectID can be used:
https://github.com/mongodb/bson-ruby/blob/master/lib/bson/object_id.rb#L369
You can also generate your own ObjectIds. This is particularly useful if you want to use business identifiers.
Breakability
When using driver generated ObjectIds, is low
When using own business Id, is slightly higher depending on their predictability (login, consecutives identifiers...)
MongoDB database drivers by default generate an ObjectID identifier that is assigned to the _id field of each document. In many cases the ObjectID may be used as a unique identifier in an application.
ObjectID is a 96-bit number which is composed as follows:
a 4-byte value representing the seconds since the Unix epoch (which will not run out of seconds until the year 2106)
a 3-byte machine identifier (usually derived from the MAC address),
a 2-byte process id, and
a 3-byte counter, starting with a random value.
From the MongoDB Official Document links
it shows :
ObjectId
ObjectIds are small, likely unique, fast to generate, and
ordered. ObjectId values consist of 12 bytes, where the first four
bytes are a timestamp that reflect the ObjectId’s creation.
Specifically:
a 4-byte value representing the seconds since the Unix epoch,
a
5-byte random value, and
a 3-byte counter, starting with a random
value.
In MongoDB, each document stored in a collection requires a
unique _id field that acts as a primary key. If an inserted document
omits the _id field, the MongoDB driver automatically generates an
ObjectId for the _id field.
MongoDB database drivers by default generate an ObjectID identifier that is assigned to the _id field of each document. In many cases the ObjectID may be used as a unique identifier in an application.
Total 12 bytes:
4-byte timestamp value representing the seconds since the Unix epoch (which will not run out of seconds until the year 2106)
5-byte random value, and
3-byte incrementing counter, starting with a random value.
Example from mongo-go-driver:
var objectId [12]byte
// 4 bytes unix time-stamp second (big endian)
binary.BigEndian.PutUint32(objectId[0:4], uint32(timestamp.Unix()))
// global random number generated by driver
copy(objectId[4:9], processUnique[:])
// global counter by driver
putUint24(objectId[9:12], atomic.AddUint32(&objectIDCounter, 1))