Generation of _id vs. ObjectId autogeneration in MongoDB

I'm developing an application that creates permalinks. I'm not sure how to save the documents in MongoDB. I see two strategies:
ObjectId autogeneration
MongoDB autogenerates the _id. I need to create an index on the permalink field because I look documents up by permalink. I can also read the creation time from the ObjectId using the getTimestamp() method, so the datetime field seems redundant. But if I delete this field, I need two calls to MongoDB: one to fetch the information and another to fetch the timestamp.
{
  "_id": ObjectId("5210a64f846cb004b5000001"),
  "permalink": "ca8W7mc0ZUx43bxTuSGN",
  "data": "a lot of stuff",
  "datetime": ISODate("2013-08-18T11:47:43.460+0100")
}
Generate _id
I generate the _id with the permalink.
{
  "_id": "ca8W7mc0ZUx43bxTuSGN",
  "data": "a lot of stuff",
  "datetime": ISODate("2013-08-18T11:47:43.460+0100")
}
I don't see any advantage in using ObjectIds. Am I missing something?

ObjectIds are there for situations where you don't have a unique key for every document in a collection. They're unique, so you don't have to worry about conflicts, and they shard reasonably well in large deployments without too much worry (they have their pros and cons, read more here).
The ObjectId also contains the timestamp of the client where the ObjectId was generated (unless the DB server is configured to generate all keys). With that, as you noticed, you can use the timestamp to perform some date operations. However, if you plan on using the Aggregation Framework, you'll find that you currently can't use an ObjectId in any date operations (issue). If you want to use the AF, you'll need a second field that contains the date, which unfortunately stores it twice alongside the ObjectId's internal value.
If you can be assured that the _id you're generating is unique, then there's not much reason to use an ObjectId in your data structure.
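As a side note on the timestamp: it is encoded in the first 4 bytes of the ObjectId itself, so it can be read from an _id you have already fetched. A minimal sketch in plain JavaScript, assuming you have the ObjectId as its 24-character hex string (the helper name objectIdToDate is made up for this example):

```javascript
// Decode the creation time embedded in an ObjectId's 24-character hex form.
// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hexadecimal string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The ObjectId from the question above:
const created = objectIdToDate("5210a64f846cb004b5000001");
console.log(created.toISOString()); // 2013-08-18T10:47:43.000Z
```

The resolution is one second, so the milliseconds stored in a separate datetime field are not recoverable this way.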

Related

Define MongoDB compound keys as one key

Is it somehow possible to define one compound key consisting of two MongoDB ObjectIds or numeric types, so as to make one key out of them?
This is necessary because I have lots of participants creating documents that they save into one big collection together, so I cannot be sure that the MongoDB ObjectId for each document is distinct. So I wanted to add some additional key, maybe a user ID number or email or something similar...
maybe 2 ObjectID's

An ObjectId in MongoDB is a hexadecimal value.
ObjectId() returns a new ObjectId value. The 12-byte ObjectId value consists of:
4-byte value representing the seconds since the Unix epoch,
3-byte machine identifier,
2-byte process id, and
3-byte counter, starting with a random value.
https://docs.mongodb.com/manual/reference/method/ObjectId/
Hence, the ObjectId will be uniquely auto-generated when you insert a document.
However, you can supply a custom 24-character hexadecimal value of your own when you insert a document.
For example,
1DCD6500 -- this can be custom hex identifier
A98AC7 -- another custom hex identifier
2B67 -- another custom hex identifier
A981CE -- Incremental custom hex identifier
Now if you insert a document with _id set to 1DCD6500A98AC72B67A981CE, the document will be saved.
e.g. { "_id" : ObjectId("1DCD6500A98AC72B67A981CE"), "name" : "sample", "personid" : 39 }
So, based on the definition of the ObjectId, you can build a custom ObjectId.
But in that case you are responsible for making sure the ObjectId is unique; otherwise MongoDB will throw the error
"E11000 duplicate key error collection:
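The concatenation described above can be sketched in plain JavaScript. This only builds and validates the 24-character hex string; the helper name composeObjectIdHex is hypothetical, and uniqueness is still up to you:

```javascript
// Join custom hex segments into one 24-character string suitable for ObjectId("...").
// Throws if the result is not exactly 12 bytes (24 hex characters).
function composeObjectIdHex(parts) {
  const hex = parts.join("").toLowerCase();
  if (!/^[0-9a-f]{24}$/.test(hex)) {
    throw new Error("segments must add up to exactly 24 hex characters, got: " + hex);
  }
  return hex;
}

// The four segments from the example above (8 + 6 + 4 + 6 characters):
console.log(composeObjectIdHex(["1DCD6500", "A98AC7", "2B67", "A981CE"]));
// 1dcd6500a98ac72b67a981ce
```

In the shell the result would then be passed as ObjectId(hex) in the _id field of the inserted document.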

You can use anything for your _id field. So this is possible:
db.collection.insertOne({
  _id: {
    "first": new ObjectId(),
    "second": new ObjectId(),
  }
})
The default unique index on the _id field also guarantees uniqueness on this kind of field.
However, I would doubt that this is a good solution to your problem as it would probably just defer the underlying problem (which really doesn't exist - kindly see this answer, too: How to generate unique object id in mongodb). Instead, I would suggest you have your clients create documents without specifying an _id explicitly and let MongoDB create the _id (on the server side or on the client side depending on your driver and your settings where client-side generation should be preferred). This will guarantee uniqueness (even when you do sharding).
There always is a unique index on your _id field anyway so to be on the super safe side with respect to run-time behaviour you could put a retrying exception handler in place on the client side for the (pretty much impossible) case that you end up with two identical _ids and hence an exception.
Also see this answer: Mongodb - must _id be globally unique when sharding

Converting Parse objectId to Mongo ObjectId?

I'm trying to migrate data from Parse to a new project that uses Mongo as its database (without Parse/Parse Server). Since the schemas are different between the two projects, I'm manually writing a migration script to achieve this.
As I understand it, Parse appears to use 10-character-long IDs for their objects (combinations of digits, lower-case letters, and upper-case letters), while Mongo uses 24-character-long IDs (12 bytes represented as hex).
Right now, when migrating data for a document from the old project to the new one, I'm using a function that converts the Parse ID to a unique Mongo ObjectId (it converts each character to a 2-digit hex value, then pads the 20-character string with 4 zeroes).
Is this a good approach? I'm avoiding using Mongo's automatic ObjectId generation in case I ever need to re-migrate any of the old Parse documents and find the matching document in the new database. I know automatically generated ObjectIds in Mongo also embed some other information like creation dates, but I don't think this would be important and I can just use my custom ObjectId generator? However, I'm not sure about the implications for performance/if I'm just going about this migration the wrong way.
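For reference, the conversion described above could look roughly like this. It is a sketch, not the actual migration code: the function names are made up, and it assumes Parse IDs are exactly 10 ASCII letters or digits, as stated in the question.

```javascript
// Convert a 10-character Parse ID into a 24-character hex string:
// each character becomes its 2-digit hex character code (20 characters),
// followed by 4 zeroes of padding.
function parseIdToObjectIdHex(parseId) {
  if (!/^[0-9A-Za-z]{10}$/.test(parseId)) {
    throw new Error("expected a 10-character alphanumeric Parse ID");
  }
  let hex = "";
  for (const ch of parseId) {
    hex += ch.charCodeAt(0).toString(16).padStart(2, "0");
  }
  return hex + "0000";
}

// Reverse mapping, so a migrated document can be traced back to its Parse ID.
function objectIdHexToParseId(hex) {
  if (!/^[0-9a-fA-F]{20}0000$/.test(hex)) {
    throw new Error("not an id produced by parseIdToObjectIdHex");
  }
  let parseId = "";
  for (let i = 0; i < 20; i += 2) {
    parseId += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return parseId;
}

console.log(parseIdToObjectIdHex("1234567890")); // 313233343536373839300000
```

Because the mapping is reversible, re-running the migration finds the same document again. Note that the first 4 bytes no longer mean anything as a timestamp.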

The approach I recommend is letting Mongo auto-generate the ids and storing Parse's ids in a new field called parseId for future reference if needed.
For example:
PARSE DATA:
"_id": "1234567890",
"title": "Mongo Migrate",
"description": "Migrating from Parse to Mongo"
MONGO DATA:
"_id": ObjectId("..."), // generated by Mongo
"parseId": "1234567890",
"title": "Mongo Migrate",
"description": "Migrating from Parse to Mongo"
Then if you need to match a document between the two databases later, you can write a script that goes along the lines of Parse.find({"_id": Mongo.parseId}).....

MongoDB uses _id as the primary key by default. _id has to be unique to avoid collisions. The way you are generating a unique ObjectId for _id is fine. As long as the values are unique, you could even reduce the 20-character pad to save space.

How to overwrite object Id's in Mongo db while creating an App in Sails

I am new to Sails and MongoDB. Currently I am trying to implement a CRUD function using Sails where I want to save user details in MongoDB. In the model I have the following attributes:
"id": {
  type: 'Integer',
  min: 100,
  autoincrement: true
},
attributes: {
  name: {
    type: 'String',
    required: true,
    unique: true
  },
  email_id: {
    type: 'EMAIL',
    required: false,
    unique: false
  },
  age: {
    type: 'Integer',
    required: false,
    unique: false
  }
}
I want to ensure that the _id is overridden with my own values, starting from 100 and auto-incremented with each new entry. I am using the Waterline model, and when I call the API in DHC, I get the following output:
"name": "abc"
"age": 30
"email_id": "abc#gmail.com"
"id": "5587bb76ce83508409db1e57"
Here the id given is the ObjectId. Can somebody tell me how to override the ObjectId with an integer that starts from 100 and is auto-incremented with every new value?

Attention: a Mongo id should be as unique as possible in order to scale well. The default ObjectId consists of a timestamp, a machine ID, a process ID and an incrementing counter that starts from a random value. Leaving it with only the latter would make it collision-prone.
However, sometimes you badly want to prettify the never-ending ObjectId value (e.g. when it is shown in the URL after encoding). In that case, you should consider using an appropriate atomic increment strategy.
Overriding the _id example:
db.testSOF.insert({_id:"myUniqueValue", a:1, b:1})
Making an Auto-Incrementing Sequence:
Use Counters Collection: Basically a separate collection that keeps track of the last number of the sequence. Personally, I have found it more cohesive to store the findAndModify function in the system.js collection, although that means it is not under version control.
Optimistic Loop
Edit:
I've found an issue in which the owner of sails-mongo said:
MongoDb doesn't have an auto incrementing attribute because it doesn't
support it without doing some kind of manual sequence increment on a
separate collection or document. We don't currently do this in the
adapter but it could be added in the future or if someone wants to
submit a PR. We do something similar for sails-disk and sails-redis to
get support for autoIncremeting fields.
He mentions the first technique I added in this answer:
Use Counters Collection. In the same issue, lewins shows a workaround.
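A rough sketch of the counters-collection technique, written for the mongo shell. Assumptions: a collection named counters, a counter document keyed by a name such as "userid", and a helper called getNextSequence; none of these names come from sails-mongo. The db handle is passed in as a parameter so the function can also be exercised against a stand-in object.

```javascript
// Atomically increment and return the next value of a named sequence.
// findAndModify with new: true returns the document as it looks after the update;
// upsert: true creates the counter on first use (it then starts at 1).
function getNextSequence(db, name) {
  const counter = db.counters.findAndModify({
    query: { _id: name },
    update: { $inc: { seq: 1 } },
    new: true,
    upsert: true,
  });
  return counter.seq;
}

// Usage in the shell (not executed here):
//   db.counters.insertOne({ _id: "userid", seq: 99 });  // so the first id is 100
//   db.users.insertOne({ _id: getNextSequence(db, "userid"), name: "abc", age: 30 });
```

Every insert now costs an extra write to the counters collection, which is the price of getting short sequential ids.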

Should I use the timestamp in "_id"?

I need to track when records are created, for later querying and modification.
The first thing that came to mind was to give the document a "createDateTime" field with a default value of "new Date()". But the MongoDB documentation says the document's _id has a timestamp embedded in it, and the id is generated when the document is created, so adding a new field for that seems redundant.
I've often seen people set a "createDateTime" on their data, and I don't know whether they are aware of the details of MongoDB's _id.
Should I use the _id as a "createDateTime" field? What is the best practice,
and what are the pros and cons?
Thanks for any tips.

I'd actually say it depends on how you want to use the date.
For example, it's not actionable using the aggregation framework Date operators.
This will fail for example:
db.test.aggregate( { $group : { _id: { $year: "$_id" } } })
The following error occurs:
"errmsg" : "exception: can't convert from BSON type OID to Date"
(The date cannot be extracted from the ObjectId.)
So, operations that are normally simple date operations become much more complex if you wanted to do any sort of date math in an aggregation. It would be far easier to have a createDateTime stamp. Counting the number of documents created in a particular year and month would be simple using aggregation with a distinct createdDateTime field.
You can sort on an ObjectId, to some degree. The remaining 8 bytes of the ObjectId aren't sortable in a meaningful way. Most MongoDB drivers default to creating the ObjectId within the driver and not on the database. So, if you've got multiple clients (like web servers for example) creating new documents (and new ObjectIds), the time stamps will only be as accurate as the various servers.
Also, depending on the precision you need, an ISODate value is stored using 8 bytes (millisecond precision), rather than the 4 bytes (second precision) used for the timestamp in an ObjectId.
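For simple range queries (as opposed to aggregation date math), one common trick is to build a boundary ObjectId from a date and compare _id against it. A minimal sketch in plain JavaScript; the helper name objectIdBoundaryHex is made up, and the caveat above about client clocks still applies:

```javascript
// Build the hex form of the smallest ObjectId whose timestamp equals the given date
// (truncated to whole seconds): 4 timestamp bytes followed by 8 zero bytes.
function objectIdBoundaryHex(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  if (!Number.isFinite(seconds) || seconds < 0 || seconds > 0xffffffff) {
    throw new Error("date is outside the range an ObjectId timestamp can hold");
  }
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

console.log(objectIdBoundaryHex(new Date("2013-08-18T10:47:43Z")));
// 5210a64f0000000000000000

// Usage in the shell (not executed here):
//   db.test.find({ _id: { $gte: ObjectId(objectIdBoundaryHex(new Date("2013-08-01T00:00:00Z"))) } })
```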

Yes, you should. There is no reason not to do so, besides human readability when looking directly into the database. See also here and here.
If you want to use the aggregation framework to group by the date within _id, this is not possible yet as WiredPrairie correctly said. There is an open jira ticket for that, you might watch. But of course you can do this with Map-Reduce and ObjectID.getTimestamp(). An example for that can be found here.

MongoDB - Query embedded documents

I have a collection named Events. Each Event document has a collection of Participants as embedded documents.
Now my question: is there a way to query an Event and get all Participants that match a condition, e.g. Age > 18?

When you query a collection in MongoDB, by default it returns the entire document which matches the query. You could slice it and retrieve a single subdocument if you want.
If all you want is the Participants who are older than 18, it would probably be best to do one of two things:
Store them in a subdocument inside the event document called "Over18" or something. Insert them into that document (and possibly the other if you want) and then, when you query the collection, you can instruct the database to only return the "Over18" subdocument. The downside to this is that you store your participants in two different subdocuments and you will have to figure out their age before inserting. This may or may not be feasible depending on your application. If you need to be able to check on arbitrary ages (i.e. sometimes it's 18 but sometimes it's 21 or 25, etc.) then this will not work.
Query the collection and retrieve the Participants subdocument and then filter it in your application code. Despite what some people may believe, this isn't terrible, because you don't want your database to be doing too much work all the time. Offloading the computations to your application could actually benefit your database because it can now spend more time querying and less time filtering. It leads to better scalability in the long run.
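The second option (filtering in application code) is a one-liner once the event document has been fetched. A minimal sketch, assuming the embedded array is called participants and each entry has a numeric age field; both names are guesses at the schema:

```javascript
// Return only the participants of an already-fetched event who are older than minAge.
// An event without a participants array yields an empty list.
function participantsOlderThan(event, minAge) {
  return (event.participants || []).filter((p) => p.age > minAge);
}

// Example with an in-memory document shaped like the assumed schema:
const event = {
  _id: "event1",
  participants: [
    { firstName: "Ann", age: 17 },
    { firstName: "Bob", age: 18 },
    { firstName: "Cid", age: 25 },
  ],
};
console.log(participantsOlderThan(event, 18).map((p) => p.firstName)); // [ 'Cid' ]
```

Since the threshold is a parameter, this also covers the arbitrary-age case (21, 25, etc.) that the first option cannot handle.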

Short answer: no. I tried to do the same a couple of months back, but mongoDB does not support it (at least in version <= 1.8). The same question has been asked in their Google Group for sure. You can either store the participants as a separate collection or get the whole documents and then filter them on the client. Far from ideal, I know. I'm still trying to figure out the best way around this limitation.

For future reference: This will be possible in MongoDB 2.2 using the new aggregation framework, by aggregating like this:
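db.events.aggregate([
  // one output document per participant
  { $unwind: '$participants' },
  // after $unwind the age lives under the participants field
  { $match: { 'participants.age': { $gt: 18 } } },
  { $project: { participants: 1 } }
])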
This will return a list of n documents where n is the number of participants > 18 where each entry looks like this (note that the "participants" array field now holds a single entry instead):
{
_id: objectIdOfTheEvent,
participants: { firstName: 'only one', lastName: 'participant'}
}
It could probably even be flattened on the server to return a list of participants. See the official documentation for more information.