Firestore - If I use document IDs in alphabetical order, will it create hotspots/inefficiency?

I want to build a users collection whose documents have the IDs A, B, C, ... Z.
Each document will contain a subcollection holding the data of all users whose names start with that document's letter.
The Firestore documentation says:
Do not use monotonically increasing document IDs such as:
Customer1, Customer2, Customer3, ...
Product 1, Product 2, Product 3, ...
Such sequential IDs can lead to hotspots that impact latency.
So does using letters of the alphabet as document IDs cause the same hotspot issue, or some other inefficiency?
Thanks

Yes, you could observe the same effect. It's best to use fully randomized IDs, like the ones you get when you use add() to create a new document.
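As a rough illustration of what "fully randomized" means here, the sketch below generates IDs with no ordering relationship between consecutive values. It is not Firestore's actual generator; the name random_doc_id, the alphabet, and the 20-character length are assumptions chosen for the example.

```python
import secrets
import string

# Assumption: 20 alphanumeric characters, chosen only for this illustration.
ALPHABET = string.ascii_letters + string.digits
ID_LENGTH = 20


def random_doc_id(length: int = ID_LENGTH) -> str:
    """Return a random alphanumeric ID that is unrelated to any earlier ID."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Consecutive IDs do not sort next to each other, unlike A, B, C or Customer1, Customer2.
ids = [random_doc_id() for _ in range(5)]
```

Because each ID is drawn independently, writes that happen close together in time are spread across the key range instead of landing on neighbouring keys.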

Related

A Firestore document has an ID_number field set when it is created. If I want to create another document, how do I make its ID equal to the previous ID + 1?

I'm using Flutter and Firestore, and I want to create documents with an assigned ID field inside them, but I don't know how to make the new document's ID field equal to the last document's ID field + 1.
For example, if a document has a correlativeNumber field of 86, I want the next document's correlativeNumber to be that value + 1 = 87.
The best option is to store the last correlativeNumber in a document. Each time a new document is added, increment that number by 1. This way, you always know which number was used previously.
But there is something you should take into consideration. When it comes to document IDs, the official documentation says:
Do not use monotonically increasing document IDs such as:
Customer1, Customer2, Customer3, ...
Product 1, Product 2, Product 3, ...
Such sequential IDs can lead to hotspots that impact latency.
So it's best to use Firestore's auto-generated identifiers as document IDs, which are random and, for practical purposes, unique, and to keep the sequential number as a field inside the document.
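The two pieces of advice above can be combined: keep the running number in a counter document and store it as a field, while the document ID itself stays random. The following is a minimal in-memory sketch of that idea, not Firestore client code. CounterStore, add_document, and lastCorrelativeNumber are hypothetical names, and the lock only stands in for the transaction a real database would need.

```python
import secrets
import threading


class CounterStore:
    """In-memory stand-in for a collection plus a separate counter document."""

    def __init__(self, start: int = 0):
        self._lock = threading.Lock()  # plays the role of a database transaction
        self.counter_doc = {"lastCorrelativeNumber": start}
        self.documents = {}

    def add_document(self, data: dict) -> str:
        with self._lock:
            # Read, increment, and write back the counter as one atomic step.
            next_number = self.counter_doc["lastCorrelativeNumber"] + 1
            self.counter_doc["lastCorrelativeNumber"] = next_number
            # The ID is random; only the field inside the document is sequential.
            doc_id = secrets.token_hex(10)
            self.documents[doc_id] = {**data, "correlativeNumber": next_number}
            return doc_id


store = CounterStore(start=86)
new_id = store.add_document({"title": "example"})
```

Here store.documents[new_id]["correlativeNumber"] is 87, matching the example in the question, while new_id carries no ordering information.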

Sort documents in firestore collection chronologically

Is there a way to sort the documents in a collection chronologically, by when they were created? Currently, they are all over the place. For example, in a to-do app, when you add a new item to the collection, it should be displayed at the bottom, last, not somewhere in the middle.
You will need to define an order based on some data in the document, and order your queries based on that field.
The typical solution for time-based ordering is to make sure your documents all contain a timestamp field that you can use to sort them. When you call add() (or other methods to update data), you can tell Firestore to use the current time using FieldValue.serverTimestamp():
collection(...).add({
..., // your other fields
createdOn: FieldValue.serverTimestamp()
})
Then you can use that field to sort when querying with orderBy():
collection(...).orderBy('createdOn')
Try using DateTime.now() for the document ID. This should put the collection in chronological order.
For example:
Firestore.instance.collection('Posts').document(DateTime.now().toString()).setData({});

Does length of indexed field matter while searching?

The chat app schema that I have is something like below.
1. conversations {participants[user_1, user_2], conversation_id}
2. messages {sender: user_1, conversation_id, timestamps}
I want to map this relationship using existing _id:ObjectId which is already indexed.
But if I want to get all conversations of user_1, I first have to find which conversations that user is involved in, get those conversations' _id values, and then search messages using those _id values.
So my questions are -
Does the length of an indexed field (here _id) matter while searching?
Should I create another, shorter indexed field?
Also, if there is a better alternative schema, please suggest it.
I would suggest maintaining the data as sub-documents instead of an array. The advantage is that you can build another index (only) on the conversation_id field, which is the field you want to query to find the user's involvement.
When you maintain it as an array, you cannot index the conversation_id field separately. Instead you have to build a multikey index, which indexes all the elements of the array (the sender and timestamps fields) that you are never going to use for querying, and it also increases the index size.
Answering your questions:
Does the length of an indexed field (here _id) matter while searching? - Not really.
Should I create another, shorter indexed field? - Create a sub-document and index conversation_id.
Also, if there is a better alternative schema, please suggest it. - Maintain the array fields as sub-documents.

How would I fetch random pairs from mongodb

So I have an interesting use case that I'm stuck trying to find an efficient mongo query for.
To begin, I have 12,000 categories with 100,000 posts. I need to randomly select 100 pairs of posts from random categories. The pairs are randomly selected from categories, but both posts in a pair must belong to the same category.
Users look at each pair to rate and once they finish looking at the 100, they fetch another 100 random posts (preferably not any of the same pairs they've already seen).
So the requirements are:
Fetch 100 pairs of posts randomly from a random set of categories
Optional requirements:
Not to return the same pairs they've already rated
Mongo collections:
Users
Categories
Posts
  CategoryId
  Ratings (embedded collection in posts)
How would I do this in Mongo... should I move some of this data off of mongo to another db if it's easier?
Yes. Very interesting question. My suggestion is to put a randomVal field on your post documents. Then you can sort on {CategoryId: 1, randomVal: 1}. The result will be a cursor that groups all the posts by CategoryId but randomly within that grouping. If you conceptually think of this as an array, you can pick all the even indexed posts, and pair them with an odd neighbor to get unique random pairs within categories.
I think that how to select the random pairs from this list will take some experimentation, but my gut instinct is that the best approach would be to have a separate process that periodically caches a collection of pairs which are sorted by a separate randomVal2. The user facing queries would just increment through this pairs collection 100 at a time.
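The sort-then-pair idea can be sketched in plain Python, with an in-memory list standing in for the cursor. One detail the description leaves implicit is that pairing neighbours across the whole sorted list could join posts from two different categories at a boundary, so this sketch pairs neighbours inside each category group and drops a trailing unpaired post. The function name pair_within_categories and the field names category_id and random_val are assumptions for the example.

```python
import random
from itertools import groupby
from operator import itemgetter


def pair_within_categories(posts):
    """Sort by (category_id, random_val), then pair neighbours inside each category."""
    ordered = sorted(posts, key=itemgetter("category_id", "random_val"))
    pairs = []
    for _, group in groupby(ordered, key=itemgetter("category_id")):
        members = list(group)
        # Pair index 0 with 1, 2 with 3, ...; a leftover last post is skipped.
        for i in range(0, len(members) - 1, 2):
            pairs.append((members[i]["_id"], members[i + 1]["_id"]))
    return pairs


rng = random.Random(42)  # seeded only so the example is repeatable
posts = [
    {"_id": i, "category_id": i % 3, "random_val": rng.random()}
    for i in range(10)
]
pairs = pair_within_categories(posts)
```

With ten posts spread over three categories (4, 3, and 3 posts), this yields four pairs, each drawn from a single category. Re-randomizing random_val periodically would change which posts end up next to each other.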
I think you can achieve this in two queries. First, use the aggregation framework to run a map-reduce operation on the Posts collection. In the map phase, use the category ID as the key and emit post IDs to the reducer.
In the reduce phase, choose two random IDs from each category. At the end of the map-reduce you will have a list of post IDs. Then retrieve those posts from the Posts collection.
Add a ratedBy field to the Post document, and when a user rates a post, add his or her userName to the ratedBy field. Then use that field as a filter in the map-reduce command so that you don't return already rated documents to the user.
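The map and reduce phases described here can be imitated in plain Python to see the shape of the result. This is only a sketch of the logic, not a MongoDB command; pick_pairs, category_id, and the sample data are assumptions, while ratedBy follows the field suggested above.

```python
import random
from collections import defaultdict


def pick_pairs(posts, user_name, pair_count, rng):
    """Group unrated post IDs by category, then draw two random IDs per category."""
    # "Map" phase: key by category, emit post IDs, filtering out rated posts.
    by_category = defaultdict(list)
    for post in posts:
        if user_name not in post.get("ratedBy", []):
            by_category[post["category_id"]].append(post["_id"])
    # "Reduce" phase: keep categories that can form a pair, in random order.
    eligible = [ids for ids in by_category.values() if len(ids) >= 2]
    rng.shuffle(eligible)
    return [tuple(rng.sample(ids, 2)) for ids in eligible[:pair_count]]


posts = [
    {"_id": 1, "category_id": "a", "ratedBy": ["alice"]},
    {"_id": 2, "category_id": "a", "ratedBy": []},
    {"_id": 3, "category_id": "a", "ratedBy": []},
    {"_id": 4, "category_id": "b", "ratedBy": []},
    {"_id": 5, "category_id": "b", "ratedBy": []},
    {"_id": 6, "category_id": "c", "ratedBy": []},
]
pairs = pick_pairs(posts, "alice", pair_count=100, rng=random.Random(7))
```

For alice, post 1 is filtered out and category c has only one post, so the result contains one pair from category a (posts 2 and 3) and one from category b (posts 4 and 5). The second query would then fetch the full posts for these IDs.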
Good luck

Mongoid: retrieving documents whose _id exists in another collection

I am trying to fetch the documents from a collection based on the existence of a reference to these documents in another collection.
Let's say I have two collections Users and Courses and the models look like this:
User: {_id, name}
Course: {_id, name, user_id}
Note: this is just a hypothetical example and not the actual use case. So let's assume that duplicates are fine in the name field of Course. Let's think of Course as CourseRegistrations.
Here, I am maintaining a reference to User in Course, with user_id holding the _id of the User. Note that it's stored as a string.
Now I want to retrieve all users who are registered to a particular set of courses.
I know that it can be done with two queries. That is, first run a query to get the user_id field from the Course collection for the set of courses. Then query the User collection using $in with the user IDs retrieved in the previous query. But this may not be good if the number of documents is in the tens of thousands or more.
Is there a better way to do this in just one query?
What you are describing is a typical SQL join, which is not possible in a plain MongoDB query. As you already suggested, you can do it in two separate queries.
There is one more way to handle it. It's not exactly a solution, but it is a valid workaround in NoSQL databases: store the most frequently accessed fields inside the same collection.
You can store some of the user collection's fields inside the course collection as an embedded field.
Course : {
  _id : 'xx',
  name: 'yy',
  user: {
    fname : 'r',
    lname : 'v',
    pic: 's'
  }
}
This is a good approach if the subset of fields you intend to retrieve from the user collection is small. You might be wondering about the redundant user data stored in the course collection, but that's exactly what makes MongoDB powerful. It's a one-time insert, but your queries will be a lot faster.
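To make the trade-off concrete, here is a small in-memory sketch of the embedding approach: the user fields are copied into the course document once at insert time, and reads then touch only the courses. The helper names embed_user_summary and users_for_courses, and the extra email field, are hypothetical and used only for illustration.

```python
def embed_user_summary(course, user, fields=("fname", "lname", "pic")):
    """Return a copy of the course document with a small user sub-document embedded."""
    return {**course, "user": {name: user[name] for name in fields}}


def users_for_courses(courses, course_names):
    """Single pass over the courses: collect the embedded user summaries."""
    wanted = set(course_names)
    return [course["user"] for course in courses if course["name"] in wanted]


# The full user record has more fields than the embedded copy.
user = {"_id": "u1", "fname": "r", "lname": "v", "pic": "s", "email": "r@example.com"}
courses = [
    embed_user_summary({"_id": "xx", "name": "yy", "user_id": "u1"}, user),
    embed_user_summary({"_id": "zz", "name": "other", "user_id": "u1"}, user),
]
found = users_for_courses(courses, ["yy"])
```

The cost is that a change to a user's fname, lname, or pic has to be propagated to every course document that embeds it, which is why this suits fields that are read often and rarely change.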