Is using custom _id in meteor risky? - mongodb

I planned to create a document with an _id before inserting it into the DB.
I wanted to generate this _id using Meteor.uuid() (which theoretically always return a unique id) but I felt on this following git issue
Thanks, good catch. The reason that this wasn't documented is that
we'd eventually like to move away from string _ids to native binary
Mongo _ids. Since it's used in an example though, I think we should go
ahead and document it and cross that bridge later. I'll do this
It seems to be a difference between a string id and a binary mongo one. Back to my question, is there a good reason I should then avoid using my custom _id ?

When you create a Mongo.Collection you have the option to select the way Meteor handles creating _id for documents which do not have and _id already. You can read more about it here.
The key point to take away is if it does not already have an _id. You are free to use whatever custom _id field you want. At this point this is no longer a Meteor question, but a Mongo question. Read up on the pros and cons of manually setting the _id field in Mongo.
Back to your question, with respect to Meteor there is no reason to avoid creating your own _id fields. If the custom _id you want to use uniquely identify's that document, then you are good to go.
And don't worry about Meteor.uuid(). It's no longer documented, so I'd imagine it will eventually disappear.

MongoDB ObjectID are guaranteed unique by the algorithm, so are totally conflict-safe. You could have the same safeness only actually having somewhere a sort of incremental counter shared by all your application servers and persisted, so actually reimplementing what already on you db server.
From my POV you should choose between accepting a little risk and generating a random large ID, or using MongoDB to bring them to you in advance.
With the latter I mean you could implement your custom ID generation as an empty save on your collection, that you could later user within the actual document save: you'll pay the price of an additional database roundtrip but if your function involve moving files I'm sure it would be negligible.
My personal advice is the latter solution.

Related

Is there a risk of saving document in Mongo with _id from other DB?

I want to save documents to designated Mongo collection from other 3rd party API that uses Mongo too. I want to keep those id's so I would be able to check if I'm not saving duplicates.
Is there any risk that those ids may collide one day?
Is it possible to have isolated ObjectID generator for a specific collection?
(a) The likelihood is very low, but I will advise against it.
(b) Yes, it is. I can think of modifying it in the pr-save hook of your schema definition. There might also be modules out there for this.

Looking up submissions by ID with MongoDB/Mongoose

So I'm used to looking up submissions by an auto increment primary ID with MySQL, but after using the MongoDB ORM wrapper, Mongoose, I'm finding that since Mongo stores data within collections differently, there is not really any concept of a traditional auto-increment ID.
I'm stuck trying to figure out how to grab a submission now because normally I'd structure my URL like so:
submission/34/category/slug-goes-here.
Since the 34 now becomes an ugly string based UUID with Mongo, I don't necessary want to display that in my URLs, but I want a unique URL in order to look up my submissions.
I'm thinking of maybe having a set method that when I insert the submission into my database, it generates some kind of 6 character hash e.g. zhXk40 and looks it up like that.
I'm wondering if I do it like this what the performance trade-offs would be. If I made constraints on the slug and then looked them up with the slug, and verified that the category matched, would that be more efficient? Either way I'm going to have to check if the category and slug match, but I'm not sure if an ID is even really necessary in this case.
What's the best practice for creating a route + looking up some piece of data from the db based on that route?
The first thing you should know is:
The _id property doesn't necessarily is that "ugly" ObjectId string.
Actually, the _id just need to be unique within its collection, so if you want to use auto incrementing IDs, there are no problems, however...
If you plan to use sharding within your database, then using auto incrementing field as the _id is overkill.
Why? Read the accepted answer here: Should I implement auto-incrementing in MongoDB?
In my application, as we're not going to shard it, then we use a indexed, numeric ID just for easier usability to the final user, and internally all references are ObjectIds.
Also, here's an good tuto for creating a auto incrementing field in MongoDB: http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
Sheesh! If we were able to truly know best practice here we'd all be better off. The talking heads are still talking and will be for some time.
How I would approach this is to go as pretty as I could. If I had a usable text string to make it semantic that'd be best case.
If I couldn't do that I'd go with the hash thing you suggested.
With both solutions the challenge will be to ensure that it remains unique. That means lookups before you save.
On performance, it's the same as SQL. Index what you use to look things up with. Mongo does good with composite indexes so category name and hash will look up pretty quick.

strategy for creating MongoDB short ids that scale

I want to have a friendlier facing ids (ie Youtube style: /posts/cxB6Ey6) than MongoDB's ObjectID.
I read that for scalability its best to leave _id as an ObjectID so I thought about two solutions:
1) add an indexed postid field to each document
2) create a mapping collection between _id and the postid
in both cases use something like https://github.com/dylang/shortid to generate the short id, and while generating make sure that the id is unique by querying the database.
(can this query-generate-insert be an atomic operation?)
will those solutions have a noticeable impact on performance ?
what's the best strategy for doing this ?
The normal method of doing this is to base64 encode a unique id but:
add an indexed postid field to each document
You definitely want to go for this method. Out of the two I would say this method is easily the most scalable and performant, for one it would only need one round trip to get a short URLs details where as the second option would take 2. Another consideration is the shortage of index overhead of maintaining an extra collection, this is a bit of a no-brainer.
I would not replace the _id field within the document either since the default ObjectId could still be useful in the foreseeable future.
So this limits it down to a separate field and index (unique key) for the short code of a URL.
The next thing is that you don't want an ID which forces you to query the database for uniqueness prior to every insert. This is where the ObjectId shines. The ObjectId is good at being made within the client application while being unique in the database without having to specifically query those assumptions.
Unique ids that do not require querying the database first are normally time based. In PHP ( http://php.net/manual/en/function.uniqid.php ) and in the MongoDB Drivers ( http://docs.mongodb.org/manual/core/object-id/ ) and even the plug-in you linked on github ( https://github.com/dylang/shortid/blob/master/lib/shortid.js#L50 ) they all use time as a basis for being unique.
Considering the plug-in you linked does not query the database to check its own IDs uniqueness I would say that this plug-in probably is quite performant and if you use it with the first solution you stated you should get a good benchmark out of it.
If you want to replace build-in ObjectID with custom user-friendly short id's then do it. You can either use build-in _id field or add a new unique-indexed field id for your custom ids. The benefit of using build-in ObjectID's is than they won't duplicate even if your database is extremely large. So, by replacing them with short id's you take the risk of id duplication.
Now about the performance. I think that the best solution is not to query DB for id's, because with properly adjusted ids length the probability of duplication is extremely small. So, the best way to handle ids duplication in this model is to check Mongo responses. If it responded with "duplicate key error" then you shall generate a new one.
And now about scaling. To scale your custom ids you can just add a few more symbols to it. "Duplicate key error" shall be a trigger for making that change. Normally there shall be no such errors. So, if they started to appear then its time to scale.
I don't think that generating ObjectId for _id field affect directly scalability or performance. Whereby this can be happen?
Main difference is that ObjectIds are created by MongoDB and you don't burden yourself with responsibility for this. Otherwise you must by yourself to determine optimal size of id and to ensure unique value for each _id field of documents stored in collection. It's required because _id used as primary key. This can be justified if you have not very big collection and custom value of identifier is need for you.
But you have such additional benefits with _id field that stores ObjectId values as opportunity to create object id's from time and use this fact to your advantage in queries. Also you can get timestamp of ObjectId’s creation with getTimestamp() method. And sorting on _id in this case is equivalent to sorting by creation time.
But if you're going to use ObjectId in URLs or HTML then for security concerns you can encrypt it. To prevent leakage of information and access to object's creation time. It may be security risk.
About your solutions:
1) I suppose this's very convenient and flexible solution. In this case you can specify any value in postId which doesn't depend directly on _id.
But little disadvantage of this solution is that you have to have extra field and to create extra index. While _id is automatically indexed.
2) I don't think this's good solution from the point of view of performance and philosophy of noSQL approach.

MongoDB embedded documents vs. referencing by unique ObjectIds for a system user profile

I'd like to code a web app where most of the sections are dependent on the user profile (for example different to-do lists per person etc) and I'd love to use MongoDB. I was thinking of creating about 10 embedabble documents for the main profile document and keep everything related to one user inside his own document.
I don't see a clear way of using foreign keys for mongodb, the only way would be to create a field to_do_id with the type of ObjectId for example, but they would be totally unrelated internally, just happen to have the same Ids I'd have to query for.
Is there a limit on the number of embedded document types inside a top level document that could degrade performance?
How do you guys solve the issue of having a central profile document that most of the documents have to relate to in presenting a view per person?
Do you use semi foreign keys inside MongoDb and have fields with ObjectId types that would have some other document's unique Id instead of embedding them?
I cannot feel what approach should be taken when. Thank you very much!
There is no special limit with respect to performance. The larger the document, though, the longer it takes to transmit over the wire. The whole document is always retrieved.
I do it with references. You can choose between simple manual references and the database DBRef as per this page: http://www.mongodb.org/display/DOCS/Database+References
The link above documents how to have references in a document in a semi-foreign key way. The DBRef might be good for what you are trying to do, but the simple manual reference is very efficient.
I am not sure a general rule of thumb exists for which reference approach is best. Since I use Java or Groovy mostly, I like the fact that I get a DBRef object returned. I can check for this datatype and use that to decide how to handle the reference in a generic way.
So I tend to use a simple manual reference for references to different documents in the same collection, and a DBRef for references across collections.
I hope that helps.

Mongodb: object id as short primary key within a collection

How to make better use of objectId generate by MongoDB. I am not an expert user, but so far i ended up creating seperate id for my object (userid, postid) etc because the object id is too long and makes the url ugly if use as the main ID. I keep the _id intact as it help indexing etc. I was wondering about any better strategy so that one can use mongo objectId as more url friendly and easy to remember key. I read the key was a combination of date etc, so any of the part can be used unique within a collection for this purpose.
thanks,
bsr/
If you have an existing ID (say from an existing data set), then it's perfectly OK to override _id with the one you have.
...keeo the _id intact as it help indexing etc
MongoDB indexes the _id field by default. If you start putting integers in the _id field, they will be indexed like everything else.
So most RDBMs provide an "auto-increment" ID. This is nice for small datasets, but really poor in terms of scalability. If you're trying to insert data to 20 servers at once, how do you keep the "auto-increment" intact?
The normal answer is that you don't. Instead, you end up using things like GUIDs for those IDs. In the case of MongoDB, the ObjectId is already provided.
I was wondering about any better strategy so that one can use mongo objectId as more url friendly and easy to remember key
So the problem here is that "easy to remember" ID doesn't really mesh with "highly scalable database". When you have a billion documents, the IDs are not really "easy to remember".
So you have to make the trade-off here. If you have a table that can get really big, I suggest using the ObjectId. If you have a table that's relatively small and doesn't get updated often, (like a "lookup" table) then you can build your own auto-increment.
The choice is really up to you.
You can overwrite the _id yourself. There is no obligation for using the auto-generated object id. What is the problem with overriding _id inside your app according to your own needs?