mongoDB - URL as document ID - mongodb

Considering I want to create mongoDB documents for a bunch of distinct URLs: what would be the pros and cons (if any) of using the actual URL as the documents _id value instead of the default BSON ObjectId. Thanks in advance!
Cheers,
Greg

An overview of the subject here: http://www.mongodb.org/display/DOCS/Object+IDs
It has to be unique, you could potentially put yourself in the position of having to resolve collisions yourself. Better to leave the default _id alone and simply query against a field you're storing in the document, just how God (10gen) intended.
From http://www.mongodb.org/display/DOCS/BSON
The element name "_id" is reserved for use as a primary key id, but
you can store anything that is unique in that field. The database
expects that drivers will prevent users from creating documents that
violate these constraints.
From #mongodb
stupid _id values will probably make querying slow, but that's about it
And another user from #mongodb
Tell him the collisions will result in garbage data

Related

Does length of indexed field matter while searching?

The chat app schema that I have is something like below.
1. conversations {participants[user_1, user_2], convsersation_id}
2. messages {sender: user_1, sonversation_id, timestamps}
I want to map this relationship using existing _id:ObjectId which is already indexed.
But if I want to get all conversation of user_1 I have to first search in which conversation that user is involed and get that conversation's _id and again search for the messages in messages using that conversation _id.
So my questions are -
Does length of indexed field (here _id) matters while searching?
Should I create another shorter indexed fields?.
Also if there is any better alternative schema please suggest.
I would suggest you to maintain the data as sub documents instead of array. The advantage you have is you can build another index (only) on conversation_id field, which you want to query to know the user's involvement
When you maintain it as array, you cannot index the converstaion_id field separately, instead you will have to build a multi key index, which indexes all the elements of the array (sender and timestamps fields) which you are never going to use for querying and it also increases the index size
Answering you questions:
Does length of indexed field (here _id) matters while searching? - Not really
Should I create another shorter indexed fields? - Create sub-document and index converstaion_id
Also if there is any better alternative schema please suggest. - Maintain the array fields as sub-documents

Mongodb id on bulk insert performance

I have a class/object that have a guid and i want to use that field as the _id object when it is saved to Mongodb. Is it possible to use other value instead of the ObjectId?
Is there any performance consideration when doing bulk insert when there is an _id field? Is _id an index? If i set the _id to different field, would it slow down the bulk insert? I'm inserting about 10 million records.
1) Yes you can use that field as the id. There is no mention of what API (if any) you are using for inserting the documents. So if you would do the insertion at the command line, the command would be:
db.collection.insert({_id : <BSONString_version_of_your_guid_value>, field1 : value1, ...});
It doesn't have to be BsonString. Change it to whatever Bson value is closest matching to your guid's original type (except the array type. Arrays aren't allowed as the value of _id field).
2) As far as i know, there IS effect on performance when db.collection.insert when you provide your own ids, especially in bulk, BUT if the id's are sorted etc., there shouldn't be a performance loss. The reason, i am quoting:
The structure of index is a B-tree. ObjectIds have an excellent
insertion order as far as the index tree is concerned: they are always
increasing, meaning they are always inserted at the right edge of
B-tree. This, in turn, means that MongoDB only has to keep the right
edge of the B-Tree in memory.
Conversely, a random value in the _id field means that _ids will be
inserted all over the tree. Then the machine must move a page of the
index into memory, update a tiny piece of it, then probably ignore it
until it slides out of memory again. This is less efficient.
:from the book `50 Tips and Tricks for MongoDB Developers`
The tip's title says - "Override _id when you have your own simple, unique id." Clearly it is better to use your id if you have one and you don't need the properties of an ObjectId. And it is best if your ids are increasing for the reason stated above.
3) There is a default index on _id field by MongoDB.
So...
Yes. It is possible to use other types than ObjectId, including GUID that will be saved as BinData.
Yes, there are considerations. It's better if your _id is always increasing (like a growing number, or ObjectId) otherwise the index needs to rebuild itself more often. If you plan on using sharding, the _id should also be hashed evenly.
_id indeed has an index automatically.
It depends on the type you choose. See section 2.
Conclusion: It's better to keep using ObjectId unless you have a good reason not to.

Change size of Objectid

In MongoDb ObjectId is a 12-byte BSON type.
Is there any way to reduce the size of objectID?
No. It's a BSON data type. It's like asking a 32-bit integer to shrink itself.
Every object must have _id property, but you are not restricted to ObjectId.
Every document in a MongoDB collection needs to have a unique _id but the value does not have to be an ObjectId. Therefore, if you are looking to reduce the size of documents in your collection you have two choices:
Pick one of the unique properties of your documents and use it as the _id field. For example, if you have an accounts collection where the account ID--provided externally--is part of your data model, you could store the account ID in the _id field.
Manage primary keys for the collection yourself. Many drivers support custom primary key factories. As #assylias suggests, going with an int will give you good space savings but, still, you will use more space than if you can use one of the fields in your model as the _id.
BTW, the value of an _id field can be composite: you can use an Object/hash/map/dictionary. See, for example, this SO question.
If you are using some type of object/model framework on top of Mongo, I'd be careful with (1). Some frameworks have a hard time with developers overriding id generation. For example, I've had bad experience with Mongoid in Ruby. In that case, (2) may be the safer way to go as the generation happens at the driver layer.

MongoDB: How to set another field (different from _id) as ID of mongo document?

For example, what if I would like to have Username as unique ID of my mongo document instead of having default "_id"?
I'd like to achieve this using mongo.exe console.
MongoDB requires the _id property as the unique primary key that it automatically indexes.
You have two options:
Use the _id property and set it with the username.
Create a username property, then add an index on that new property. You will still have the _id, but can query using the username.
_id is important in replication. You can create a collection without an _id index, but you will never be able to replicate the database. Replication requires the _id index on every collection.
I know, that it is possible to have such record like {username: "John", age: 12}. I mean, without _id. How to do this?
I am unsure how #Serges answer was not clear enough however, to re-iterate, there is no way. Technically you can still take out the _id in a capped collection in some versions of MongoDB however, other than that you cannot.
Also I am concerned by your use of username, it might not be the best thing for activities such as sharding and could create "hotspots" of high acitivty on clusters etc, it is all theoretical since I am not in your position with your data, however, it is best to consider this stuff.

Mongo _id for subdocument array

I wish to add an _id as property for objects in a mongo array.
Is this good practice ?
Are there any problems with indexing ?
I wish to add an _id as property for objects in a mongo array.
I assume:
{
g: [
{ _id: ObjectId(), property: '' },
// next
]
}
Type of structure for this question.
Is this good practice ?
Not normally. _ids are unique identifiers for entities. As such if you are looking to add _id within a sub-document object then you might not have normalised your data very well and it could be a sign of a fundamental flaw within your schema design.
Sub-documents are designed to contain repeating data for that document, i.e. the addresses or a user or something.
That being said _id is not always a bad thing to add. Take the example I just stated with addresses. Imagine you were to have a shopping cart system and (for some reason) you didn't replicate the address to the order document then you would use an _id or some other identifier to get that sub-document out.
Also you have to take into consideration linking documents. If that _id describes another document and the properties are custom attributes for that document in relation to that linked document then that's okay too.
Are there any problems with indexing ?
An ObjectId is still quite sizeable so that is something to take into consideration over a smaller, less unique id or not using an _id at all for sub-documents.
For indexes it doesn't really work any different to the standard _id field on the document itself and a unique index across the field should work across the collection (scenario dependant, test your queries).
NB: MongoDB will not add an _id to sub-documents for you.