inserting into a mongodb collection with manually specified _id

inserting into a mongodb collection with manually specified _id - mongodb

Suppose I have a collection in mongodb and the objects in the collection have an _id that is an ObjectID that I selected in some random manner completely external to mongoDB, such as starting with ObjectID 0000 ... 0000 and incrementing by 10000, or maybe just used a random number generator to make the ObjectID's.
Suppose I then go to add another item to the collection, but I don't have an ObjectID in mind for the new object, and am satisfied with letting the system pick one. Would the system ever select a ObjectID that was already a part of the collection?
If it is relevant, I am using the java API and the python API to do this.

_id is always unique. There's an implicit unique primary index on _id field.
So rest assured that MongoDB will not choose an _id field that has been taken.
Also the _id field is a 12 byte hexadecimal string in which :
4 bytes is for the date timestamp.
3 bytes is the MAC Address.
2 bytes is the process id.
3 bytes is the counter.
So if MongoDB chooses an _id for your document in a collection, it's definitely going to be unique from the other _id fields in other documents in your collection.
Hope this helps.

Related

Best way to get the first inserted record in a collection of MongoDB

I need to fetch the first inserted record in a collection in MongoDB for which I am currently using the below query:
db.users.find({}).sort({"created_at":1}).limit(1);
But, this takes up a lot of memory. The collection has about 100K records.
What is the efficient way to do this?

MongoDB _id is unique identifier which is automatically generated upon insertion of document into MongoDB collection
_id field stores ObjectId value and is automatically indexed.
According to MongoDB documentation,
The 12-byte ObjectId value consists of:
4-byte value representing the seconds since the Unix epoch,
3-byte machine identifier,
2-byte process id, and
3-byte counter, starting with a random value.
According to description as mentioned into above question to fetch first inserted record please try executing following mongodb find operation into MongoDB shell.
db.users.find({}).sort({"_id":1}).limit(1);
In above query we have sorted result according to _id field since _id Object ID value consists of unix epoch timestamp

further to this you can add specific filters in query to get first record inserted for that criteria:
like suppose you collection contains data for storing employees from IT, ADMIN, FINANCE department and you want to look for the first document inserted for IT (i.e. first IT employee) then you can execute:
db.users.find({"Dept" : "IT"}).sort({"_id":1}).limit(1);
and similarly to find last employee:
db.users.find({"Dept" : "IT"}).sort({"_id":-1}).limit(1);
Note: for bigger collections/sharded collection it will take considerable time to get the result as it iterates entire _id field for ascending and descending criteria.

Collection ID length in MongoDB

i am new to mongodb and stack overflow.
I want to know why on mongodb collection ID is of 24 hex characters?
what is importance of that?

Why is the default _id a 24 character hex string?
The default unique identifier generated as the primary key (_id) for a MongoDB document is an ObjectId. This is a 12 byte binary value which is often represented as a 24 character hex string, and one of the standard field types supported by the MongoDB BSON specification.
The 12 bytes of an ObjectId are constructed using:
a 4 byte value representing the seconds since the Unix epoch
a 3 byte machine identifier
a 2 byte process id
a 3 byte counter (starting with a random value)
What is the importance of an ObjectId?
ObjectIds (or similar identifiers generated according to a GUID formula) allow unique identifiers to be independently generated in a distributed system.
The ability to independently generate a unique ID becomes very important as you scale up to multiple application servers (or perhaps multiple database nodes in a sharded cluster). You do not want to have a central coordination bottleneck like a sequence counter (eg. as you might have for an auto-incrementing primary key), and you will want to insert new documents without risk that a new identifier will turn out to be a duplicate.
An ObjectId is typically generated by your MongoDB client driver, but can also be generated on the MongoDB server if your client driver or application code or haven't already added an _id field.
Do I have to use the default ObjectId?
No. If you have a more suitable unique identifier to use, you can always provide your own value for _id. This can either be a single value or a composite value using multiple fields.
The main constraints on _id values are that they have to be unique for a collection and you cannot update or remove the _id for an existing document.

Now mongoDB current version is 4.2. ObjectId size is still 12 bytes but consist of 3 parts.
ObjectIds are small, likely unique, fast to generate, and ordered.
ObjectId values are 12 bytes in length, consisting of:
a 4-byte timestamp value, representing the ObjectId’s creation, measured in seconds since the Unix epoch
a 5-byte random value
a 3-byte incrementing counter, initialized to a random value
Create ObjectId and get timestamp from it
> x = ObjectId()
ObjectId("5fdedb7c25ab1352eef88f60")
> x.getTimestamp()
ISODate("2020-12-20T05:05:00Z")
Reference
Read MongoDB official doc

Does _id field change in MongoDB when copying data from one collection to another?

We are planning on using MongoDB _id as a key that we would provide to the client. Therefore, the requirement is that this key should not change if we ever need to move the data from one collection to another. The copy will be performed using db.copyDatabase() or mongoimport.
One of the ways in which data can be copied from one collection to another is iterating through the documents in the first collection(C1) and inserting these documents in the second collection(C2). In this case _id should remain the same(in C2) because it would be present in the documents(of C1) being inserted(same as the case in which we would provide an _id ourselves).
However, if there is an alternate way in which documents are copied, the _id might change since it depends on :
(1) The UNIX timestamp
(2) Machine identifier
(3) ProcessId
(**This should only happen if MongoDB while copying removes _id from documents in C1 and regenerated them while inserting into C2?)
We want the _id values to be same irrespective of the location of the destination collection:
(1)within same database
(2)different database - same machine
(3)different database - different machine)
Thanks

No, the _id numbers will not change.
A new ObjectId is generated when a document without an _id field is inserted into the database. When you insert a document which already has an _id field, MongoDB won't touch it.
The timestamp, machine identifier and processID refer to those where the ObjectID was generated. This can be a database server, but it can also be generated by the MongoDB driver on the application server. In that case MongoDB will not change it on its own.
By the way: The _id can be an auto-generated ObjectId, but it doesn't have to. You can also use any other value as _id, as long as you can guarantee that it's unique. So when your data already has a natural key, you can use this as _id when you want to.

Difference between "id" and "_id" fields in MongoDB

Is there any difference between using the field ID or _ID from a MongoDB document?
I am asking this, because I usually use "_id", however I saw this sort({id:-1}) in the documentation: http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs-Sortbyidtosortbyinsertiontime
EDIT
Turns out the docs were wrong.

I expect it's just a typo in the documentation. The _id field is primary key for every document. It's called _id and is also accessible via id. Attempting to use an id key may result in a illegal ObjectId format error.
That section is just indicating that the automatically generated ObjectIDs start with a timestamp so it's possible to sort your documents automatically. This is pretty cool since the _id is automatically indexed in every collection. See http://www.mongodb.org/display/DOCS/Object+IDs for more information. Specifically under "BSON ObjectID Specification".
A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. Note that the timestamp and counter fields must be stored big endian unlike the rest of BSON.

The _id field is the default field for Bson ObjectId's and it is,by default, indexed.
_id and id are not the same. You may also choose to add a field called id if you want, but it will not be index unless you add an index.
It is just a typo in the docs.

id is an alias for _id in mongoid.id would return the _id of the document.
https://github.com/mongodb/mongoid/blob/master/lib/mongoid/fields.rb#L47
if the _id field is not specified an ObjectedId is generated automatically.

My two cents:
The _id field
MongoDB assigns an _id field to each document and assigns primary index on it. There're ways by which we can apply secondary indices as well. By default, MongoDB creates values for the _id field of type ObjectID. This value is defined in BSON spec and it's structured this way:
ObjectID (12 bytes HEX string) = Date (4 bytes, a timestamp value representing number of seconds since the Unix epoch) + MAC address (3 bytes) + PID (2 bytes) + Counter (3 bytes)

Why mongoDB uses objectID?

{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
what exactly is the purpose of objectId? It's a big number that is generated using a timestamp.
If I see any nosql which is key-value, I query with key the value.
Here we use key and value in the as the data and use find () function.
So, I am trying to understand when we really need the objectid?
What are the reasons behind that giving access to the user to view the value of the object ID?
After reading the docs, one basic question is mongo DB as hash table type implementation?

After readying doc..one basic question is mongo DB as hash table type implementation?
MongoDB used BSON, a binary form of JSON. A JSON object is basically just a "hashtable" or a set of key / value pairs.
what exactly is the use of object id? that is a big number that is generated with time.
In MongoDB, each document you store must have an _id. If you do not set a value for _id, then MongoDB will automatically generate one for you. If you have a unique key when you are inserting the object, you can use that instead. For details on the ObjectId see here.
If I see any nosql which is key-value, I query with key the value.
MongoDB is not just key-value. MongoDB supports multiple indexes on a single collection, you can query on many different fields, not just the "key" or "id".

Object ID is similar to primary key in RDBMS
Whenever u insert a new document, mongodb will generate object ID.
Object ID is a 12 byte BSON Type.
First 4 Byte represents timestamp
next 3 byte unique machine identifier
next 2 byte process id
next 3 byte random increment counter
Returns the equivalent 16 digit hex

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse