Does length of indexed field matter while searching? - mongodb

The chat app schema that I have is something like below.
1. conversations {participants: [user_1, user_2], conversation_id}
2. messages {sender: user_1, conversation_id, timestamps}
I want to map this relationship using the existing _id: ObjectId, which is already indexed.
But if I want to get all conversations of user_1, I first have to search for the conversations that user is involved in, get each conversation's _id, and then search the messages collection again using that conversation _id.
So my questions are -
Does the length of an indexed field (here _id) matter while searching?
Should I create another, shorter indexed field?
Also, if there is any better alternative schema, please suggest it.

I would suggest you maintain the data as sub-documents instead of an array. The advantage is that you can build another index on (only) the conversation_id field, which is the field you want to query to know the user's involvement.
When you maintain it as an array, you cannot index the conversation_id field separately; instead you have to build a multikey index, which indexes all the elements of the array (the sender and timestamps fields) that you are never going to query on, and which also increases the index size.
Answering your questions:
Does the length of an indexed field (here _id) matter while searching? - Not really
Should I create another, shorter indexed field? - Create sub-documents and index conversation_id
Also, if there is any better alternative schema, please suggest. - Maintain the array fields as sub-documents
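
As a rough sketch of that layout in pymongo (the database name, field names, and sample values here are assumptions for illustration, not taken from the question), each message lives in its own document that references conversation_id, and only that field gets the extra index:

from pymongo import MongoClient, ASCENDING

client = MongoClient()                      # assumes a local mongod
db = client["chat_app"]                     # hypothetical database name

# One document per conversation; the existing ObjectId _id stays the primary key.
db.conversations.create_index([("participants", ASCENDING)])   # multikey index over the participants array
conv_id = db.conversations.insert_one({"participants": ["user_1", "user_2"]}).inserted_id

# Each message is its own document pointing back at its conversation.
db.messages.create_index([("conversation_id", ASCENDING)])
db.messages.insert_one({"sender": "user_1", "conversation_id": conv_id, "timestamp": 1700000000})

# Two-step lookup: which conversations involve user_1, then the messages in them.
conv_ids = [c["_id"] for c in db.conversations.find({"participants": "user_1"}, {"_id": 1})]
msgs = db.messages.find({"conversation_id": {"$in": conv_ids}})

The _id values stay as ObjectIds; what keeps the second lookup fast is the dedicated index on conversation_id, not the length of the key.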

Related

MongoDB: Sort if value is present in array or not

question: I have a document where a field picked_by_ids is an array of all user ids. Now I have to sort this collection such that if the provided id is present in picked_by_ids, that document should come at the top, and all other records should be listed after it.
Note: We may have a large collection and need to add indexing for this. We also need to add pagination.
Thanks.
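
One way this is commonly approached (a sketch only, assuming pymongo and a hypothetical records collection; none of these names come from the question) is to compute a membership flag in an aggregation, sort on it, and paginate with skip/limit:

from pymongo import MongoClient

client = MongoClient()                      # assumes a local mongod
db = client["mydb"]                         # hypothetical database name

user_id = "someUserId"                      # hypothetical id whose picks should sort first
page, page_size = 0, 20

pipeline = [
    # picked is True when user_id appears in the picked_by_ids array, False otherwise.
    {"$addFields": {"picked": {"$in": [user_id, {"$ifNull": ["$picked_by_ids", []]}]}}},
    {"$sort": {"picked": -1, "_id": 1}},    # picked documents first, stable tie-break on _id
    {"$skip": page * page_size},
    {"$limit": page_size},
]
results = list(db.records.aggregate(pipeline))

A multikey index on picked_by_ids helps when filtering on the array, but the computed sort key itself cannot use an index.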

create unique id in mongodb from last inserted id using pymongo

Is there a way I can find the last inserted document and its _id (or id) field, so that I can increment it and use it when inserting a new document?
The issue is that I generate my own id count but do not store it; now that I've deleted records, I cannot add new records because I end up attempting to reuse the same id.
There is no way to check insertion order in MongoDB, because the database does not keep any metadata in the collections regarding the documents.
If your _id field is generated server-side then you need to have a very good algorithm for this value in order to provide collision avoidance and uniqueness while at the same time following any sequential constraints that you might have.
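
A common pattern for sequential ids that does not depend on finding the last insert is a separate counters collection updated atomically. This is only a sketch (the database and collection names are made up, not from the question), using pymongo's find_one_and_update:

from pymongo import MongoClient, ReturnDocument

client = MongoClient()                      # assumes a local mongod
db = client["mydb"]                         # hypothetical database name

def next_sequence(name):
    # Atomically increment (and create on first use) a named counter document.
    counter = db.counters.find_one_and_update(
        {"_id": name},
        {"$inc": {"seq": 1}},
        upsert=True,
        return_document=ReturnDocument.AFTER,
    )
    return counter["seq"]

# The counter keeps growing even after deletes, so ids are never reused.
db.records.insert_one({"_id": next_sequence("records"), "payload": "example"})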

_id field compared to index

I'm planning to add a Collection to a mongodb database that will have a text field that should be unique for each Document. Lookups from this Collection will almost always be based on this field. This field can contain as many as 100+ chars.
My question is, should this field be the _id field, or should I just add an index for it? What would the performance impact for either approach be?
I suggest you use your unique text as the _id.
It will reduce the data size and eliminate an index. Here is the reference. The 9th page will guide you.
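
For comparison, both options look like this in pymongo (a sketch only; the collection names and sample key are made up):

from pymongo import MongoClient, ASCENDING

client = MongoClient()                      # assumes a local mongod
db = client["mydb"]                         # hypothetical database name

# Option 1: the unique text itself is the _id, so the mandatory _id index does the lookups.
db.items.insert_one({"_id": "some-fairly-long-unique-text-key", "value": 42})
doc = db.items.find_one({"_id": "some-fairly-long-unique-text-key"})

# Option 2: keep the default ObjectId _id and maintain a second, unique index on the text field.
db.items_alt.create_index([("key", ASCENDING)], unique=True)
db.items_alt.insert_one({"key": "some-fairly-long-unique-text-key", "value": 42})
doc = db.items_alt.find_one({"key": "some-fairly-long-unique-text-key"})

Option 2 stores both the ObjectId and the text and maintains two indexes instead of one, which is the extra size the answer refers to.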

The fastest way to show Documents with certain property first in MongoDB

I have collections with a huge number of Documents on which I need to do custom searches with various different queries.
Each Document has a boolean property. Let's call it "isInTop".
I need to show Documents which have this property first in all queries.
Yes, I can easily sort on this field like:
.sort( { isInTop: -1 } );
And create a proper index with the field "isInTop" as the last field in it. But this will work slowly, as indexes in mongo work best with unique fields.
So is there a solution to show Documents with the field "isInTop" at the top of each query?
I see two solutions here.
First: give the Documents which need to be on top an _id from the "future". As you know, an ObjectId contains a timestamp, so I can create an ObjectId with a timestamp from the future and use natural order.
Second: create a separate collection for the Documents which need to be on top, and do queries on it first.
Are there any other solutions for this problem? Which will work faster?
UPDATE
I have solved this by sorting on a custom field which represents a rank.
Using the _id trick you mention has the problem that at some point you will actually reach that "future" time, and you can't change the _id field (without inserting a new document and removing the old one).
Creating a special collection which just holds the ones you care about is probably the best option. It gives you the ability to logically (and to some extent, physically) separate the documents.
MongoDB also has newly introduced support for a "sparse" index, which may fulfill your needs as well. You could set the "isInTop" field only when you want a document to be special, and then create a sparse index on it, which avoids the problems you would normally have with an index on a single boolean field (in B-trees).
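
A minimal sketch of that sparse-index approach in pymongo (the database and collection names are assumed for illustration):

from pymongo import MongoClient, DESCENDING

client = MongoClient()                      # assumes a local mongod
db = client["mydb"]                         # hypothetical database name

# Set isInTop only on promoted documents; leave the field absent everywhere else.
db.docs.insert_one({"title": "promoted", "isInTop": True})
db.docs.insert_one({"title": "ordinary"})               # no isInTop field at all

# The sparse index only contains documents that actually carry the field.
db.docs.create_index([("isInTop", DESCENDING)], sparse=True)

# Fetch the promoted subset first (this filter can use the sparse index), then the rest.
top = list(db.docs.find({"isInTop": True}))
rest = db.docs.find({"isInTop": {"$exists": False}})

Sorting the whole collection on isInTop still works (documents missing the field sort after True in descending order), but only the query that filters on the field benefits from the sparse index.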

mongoDB - URL as document ID

Considering I want to create mongoDB documents for a bunch of distinct URLs: what would be the pros and cons (if any) of using the actual URL as the document's _id value instead of the default BSON ObjectId? Thanks in advance!
Cheers,
Greg
An overview of the subject here: http://www.mongodb.org/display/DOCS/Object+IDs
It has to be unique, so you could potentially put yourself in the position of having to resolve collisions yourself. Better to leave the default _id alone and simply query against a field you're storing in the document, just how God (10gen) intended.
From http://www.mongodb.org/display/DOCS/BSON
The element name "_id" is reserved for use as a primary key id, but
you can store anything that is unique in that field. The database
expects that drivers will prevent users from creating documents that
violate these constraints.
From #mongodb
stupid _id values will probably make querying slow, but that's about it
And another user from #mongodb
Tell him the collisions will result in garbage data
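
Following that advice, a hedged sketch of keeping the default _id and giving the URL its own uniquely indexed field in pymongo (all names here are illustrative, not from the question):

from pymongo import MongoClient, ASCENDING

client = MongoClient()                      # assumes a local mongod
db = client["mydb"]                         # hypothetical database name

# The default ObjectId _id stays; the URL gets its own unique index for lookups.
db.pages.create_index([("url", ASCENDING)], unique=True)
db.pages.insert_one({"url": "http://example.com/some/page", "fetched": False})

page = db.pages.find_one({"url": "http://example.com/some/page"})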