How would I fetch random pairs from MongoDB?

So I have an interesting use case that I'm stuck trying to find an efficient Mongo query for.
To begin, I have 12,000 categories with 100,000 posts. I need to randomly select 100 pairs of posts from random categories. The pairs are randomly selected from categories, but each pair must have both posts belonging to the same category.
Users look at each pair to rate it, and once they finish looking at the 100, they fetch another 100 random pairs (preferably not any of the same pairs they've already seen).
So the requirements are:
Fetch 100 pairs of posts randomly from a random set of categories
Optional requirements:
Not to return the same pairs they've already rated
Mongo Collections
Ratings (embedded collection in posts)
How would I do this in Mongo... should I move some of this data off of mongo to another db if it's easier?

Yes. Very interesting question. My suggestion is to put a randomVal field on your post documents. Then you can sort on {CategoryId: 1, randomVal: 1}. The result will be a cursor that groups all the posts by CategoryId but randomly within that grouping. If you conceptually think of this as an array, you can pick all the even indexed posts, and pair them with an odd neighbor to get unique random pairs within categories.
I think that how to select the random pairs from this list will take some experimentation, but my gut instinct is that the best approach would be to have a separate process that periodically caches a collection of pairs which are sorted by a separate randomVal2. The user facing queries would just increment through this pairs collection 100 at a time.
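As a rough illustration of the randomVal idea, here is a minimal in-memory sketch in Python. It does not talk to MongoDB; the function name random_pairs and the sample posts are made up for the example, and a plain list stands in for the sorted cursor. Note that the pairing restarts inside each category, so a pair never straddles a category boundary.

```python
import random
from itertools import groupby


def random_pairs(posts, rng=random):
    """Pair up posts within each category after sorting on a random key."""
    # Give every post a random sort key (the randomVal field from the answer).
    keyed = [dict(p, randomVal=rng.random()) for p in posts]
    # In-memory equivalent of sort({CategoryId: 1, randomVal: 1}).
    keyed.sort(key=lambda p: (p["CategoryId"], p["randomVal"]))

    pairs = []
    for _, group in groupby(keyed, key=lambda p: p["CategoryId"]):
        group = list(group)
        # Pair index 0 with 1, 2 with 3, ...; a leftover odd post is dropped.
        for i in range(0, len(group) - 1, 2):
            pairs.append((group[i]["_id"], group[i + 1]["_id"]))
    # Shuffle the finished pairs (the role played by randomVal2 in the answer).
    rng.shuffle(pairs)
    return pairs


# Hypothetical sample data: 10 posts spread over 3 categories.
posts = [{"_id": i, "CategoryId": i % 3} for i in range(10)]
pairs = random_pairs(posts, random.Random(0))
first_batch = pairs[:100]  # user-facing queries would page through 100 at a time
```

In a real deployment the shuffled pairs would presumably be written to their own collection by the background process, so the user-facing query only has to page through it.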

I think you can achieve this in two queries. First you need to run a map-reduce operation (one of MongoDB's aggregation operations) on the Posts collection. In the map phase, use the category id as the key and emit post ids to the reducer.
In the reduce phase, choose two random ids from each category. At the end of the map-reduce you will have a list of post ids. Then retrieve those posts from the Posts collection.
Add a ratedBy field to the Post document, and when a user rates a post, add his or her userName to the ratedBy field. Then use that field as a filter in your map-reduce command in the first place, so that you don't return already rated documents to the user.
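To make the map and reduce phases concrete, here is a small pure-Python sketch of the same logic. It is not a MongoDB mapReduce call; the function name pick_pairs, the field name categoryId and the sample data are assumptions for illustration. It filters at the post level, as described above, which is stricter than the question's "same pair" requirement.

```python
import random
from collections import defaultdict


def pick_pairs(posts, user, n_pairs, rng=random):
    """Pick up to n_pairs same-category pairs of posts the user has not rated."""
    # "Map": key each post id by category, skipping posts already rated by the user.
    by_category = defaultdict(list)
    for post in posts:
        if user not in post.get("ratedBy", []):
            by_category[post["categoryId"]].append(post["_id"])

    # "Reduce": keep categories that can still form a pair, in random order,
    # and choose two random ids from each of the first n_pairs categories.
    eligible = [ids for ids in by_category.values() if len(ids) >= 2]
    rng.shuffle(eligible)
    return [tuple(rng.sample(ids, 2)) for ids in eligible[:n_pairs]]


# Hypothetical data: 5 categories of 4 posts; "alice" rated one post in each.
posts = [
    {"_id": i, "categoryId": i // 4, "ratedBy": ["alice"] if i % 4 == 0 else []}
    for i in range(20)
]
pairs = pick_pairs(posts, "alice", 3, random.Random(1))
```

The returned ids would then be used in the second query to fetch the actual post documents.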
Good luck


Firestore: Is it efficient to create a Firestore Collection per each user in your app? [duplicate]

I am making an app where users can follow each other. To decide how to model it in Firestore, I would like to know how collection size affects query performance.
I first thought of making it like this:
So I would have main collection relationships, then each document would represent one user and he would have two collections - following and followers, and in those collections I would store documents with data like id,name,email,..
Then when user1 wants to see his followers, I would get all documents under relationships/userId_1/followers, and if he would like to see who he follows I would get documents under relationships/userId_1/following
I also thought about doing it like this:
--------user1:"user5id" (field)
--------user2:"user4id" (field)
.........(other fields)
--------user1:"user4id" (field)
--------user2:"user5id" (field)
.........(other fields)
I would have one main collection, relationships, where each document represents one following relationship. The document name would be firstUserId_secondUserId (meaning firstUserId follows secondUserId), and I would also have two fields, user1 and user2, that store the ids of the two users, where user1 follows user2.
So if I am {myUserId} and I would like to get all the people who I follow I would do a query on relationships collection where user1 = myUserId
And if I would like to get all the people who follow me I would do a query on relationships collection where user2 = myUserId
since each document represents relation user1 follows user2.
So my question is which way would be more efficient with querying the data.
In the first case, each user would have a collection of his followers/following and I would just get the documents; in the second case, relationships would have many documents, each representing a user1->follows->user2 relation.
I know that I would be billed by the number of documents that the query returns, but how fast would it be if it needed to search through a large collection?
Collection size has no bearing on the performance or cost of a query. Both are determined entirely by the size of the result set (number of documents). So, a query for 10 documents out of 100 performs and costs the same as a query for 10 documents out of 100,000. The size of 10 is the only thing that matters here.
See also: Queries scale with the size of your result set, not the size of your data set
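A toy model may help show why an indexed lookup does not care how many unrelated documents exist. The Python sketch below is only an in-memory stand-in for the single relationships collection from the question; the class name RelationshipIndex and the "other" user ids are invented for the example, and it says nothing about how Firestore implements its indexes internally.

```python
from collections import defaultdict


class RelationshipIndex:
    """Toy in-memory stand-in for an indexed `relationships` collection."""

    def __init__(self):
        self.by_user1 = defaultdict(list)  # follower id -> ids they follow
        self.by_user2 = defaultdict(list)  # followed id -> ids of their followers

    def add(self, user1, user2):
        # One entry per "user1 follows user2" document.
        self.by_user1[user1].append(user2)
        self.by_user2[user2].append(user1)

    def following(self, user_id):
        # Equivalent of: where user1 == user_id
        return list(self.by_user1.get(user_id, []))

    def followers(self, user_id):
        # Equivalent of: where user2 == user_id
        return list(self.by_user2.get(user_id, []))


index = RelationshipIndex()
index.add("user5id", "user4id")
index.add("user4id", "user5id")
# Unrelated documents grow the collection without touching the lookups below.
for n in range(1000):
    index.add(f"other{n}", f"other{n + 1}")

print(index.following("user5id"))
print(index.followers("user5id"))
```

Each lookup goes straight to the matching entries, so the work done depends on how many results come back rather than on the 1,000 unrelated relationships.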

Firestore - If I give Document ID in Alphabetical Order will it create hotspots/inefficiency?

I want to build users collection which will have Documents with IDs as A,B,C,...Z.
Each Document will contain a subcollection which will contain data of all users with starting letter A,B,...Z depending on document ID.
In Firestore documentation it is mentioned
Do not use monotonically increasing document IDs such as:
Customer1, Customer2, Customer3, ...
Product 1, Product 2, Product 3, ...
Such sequential IDs can lead to hotspots that impact latency.
So does using letters of the alphabet as document IDs create the same issue and cause hotspots or some other inefficiencies?
Yes, you could observe the same effect. It's best to use fully randomized IDs like you get if you use add() to create a new document.
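If you ever need to create such an ID yourself rather than letting add() do it, something like the following Python sketch would work. The function name random_doc_id, the alphabet and the length of 20 are assumptions chosen for illustration, not a description of what the Firestore SDK does internally.

```python
import secrets
import string

# Assumed alphabet: upper- and lower-case letters plus digits.
ALPHABET = string.ascii_letters + string.digits


def random_doc_id(length=20):
    """Return a random alphanumeric ID with no ordering between successive calls."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


doc_id = random_doc_id()
```

Because consecutive IDs share no common prefix, writes are spread across the key range instead of piling up in one place.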

Does length of indexed field matter while searching?

The chat app schema that I have is something like below.
1. conversations {participants[user_1, user_2], conversation_id}
2. messages {sender: user_1, conversation_id, timestamps}
I want to map this relationship using the existing _id:ObjectId, which is already indexed.
But if I want to get all conversations of user_1, I first have to search for the conversations that user is involved in, get each conversation's _id, and then search messages using that conversation _id.
So my questions are -
Does the length of the indexed field (here _id) matter while searching?
Should I create another, shorter indexed field?
Also if there is any better alternative schema please suggest.
I would suggest you maintain the data as sub-documents instead of an array. The advantage is that you can build another index (only) on the conversation_id field, which you want to query to know the user's involvement.
When you maintain it as an array, you cannot index the conversation_id field separately; instead you will have to build a multikey index, which indexes all the elements of the array (the sender and timestamps fields), which you are never going to use for querying, and it also increases the index size.
Answering your questions:
Does the length of the indexed field (here _id) matter while searching? - Not really
Should I create another, shorter indexed field? - Create a sub-document and index conversation_id
Also, if there is any better alternative schema, please suggest. - Maintain the array fields as sub-documents
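For reference, the two-step lookup described in the question can be sketched in plain Python as follows. The lists stand in for the two collections, and the ids c1 to c3, user_3 and the timestamps are made-up sample values; with a real database each step would be one indexed query.

```python
# In-memory stand-ins for the two collections from the question.
conversations = [
    {"_id": "c1", "participants": ["user_1", "user_2"]},
    {"_id": "c2", "participants": ["user_2", "user_3"]},
    {"_id": "c3", "participants": ["user_1", "user_3"]},
]
messages = [
    {"sender": "user_1", "conversation_id": "c1", "timestamp": 1},
    {"sender": "user_2", "conversation_id": "c1", "timestamp": 2},
    {"sender": "user_3", "conversation_id": "c2", "timestamp": 3},
    {"sender": "user_3", "conversation_id": "c3", "timestamp": 4},
]


def messages_for_user(user):
    """Return every message in any conversation the user takes part in."""
    # Step 1: find the conversations the user participates in.
    conv_ids = {c["_id"] for c in conversations if user in c["participants"]}
    # Step 2: find the messages whose conversation_id is one of those ids.
    return [m for m in messages if m["conversation_id"] in conv_ids]


user_1_messages = messages_for_user("user_1")
```

The second step only ever compares conversation ids for equality, which is why the length of the id itself is not what drives the cost.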

Querying MongoDB: retrieve shops by name and by location with one single query

Hi folks!
I'm building a "search shops" application using MEAN Stack.
I store shops documents in MongoDB "location" collection like this:
_id: .....
name: ...//shop name
location : //...GEOJson
The UI provides users with one single input for searching shops. Basically, I would like to perform one single query that returns, in the same results array:
All shops near the user (eventually limit to x)
All shops named "like" the input value
On the logical side, I think this is an "$or-like" query
Based on this answer
Using full text search with geospatial index on Mongodb
assigning two special indexes (2dsphere and full text) to the collection is probably not the right way to achieve this. Anyway, I think this is a different case, because I really don't want to apply sequential filters to the results; I "simply" want to retrieve data with 2 distinct criteria.
If I do set indexes on my collection, the approach is of course to perform two distinct queries with two distinct methods ($near for locations and $text for name), and then merge the results with some server-side logic to remove duplicate documents and sort them in some way that is useful for the user experience. But I'm still wondering whether there is a method to achieve this result with one single query.
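The merge step of the two-query approach might look roughly like this in Python. The function name merge_results and the sample shops are hypothetical, and the two input lists stand for the results of the $near query and the $text query respectively.

```python
def merge_results(near_results, text_results, limit=None):
    """Merge two result lists, dropping duplicates by _id and keeping first-seen order."""
    seen = set()
    merged = []
    # Nearby shops come first, then name matches that were not already included.
    for shop in list(near_results) + list(text_results):
        if shop["_id"] not in seen:
            seen.add(shop["_id"])
            merged.append(shop)
    return merged if limit is None else merged[:limit]


# Hypothetical results of the two separate queries.
near = [{"_id": 1, "name": "Corner Cafe"}, {"_id": 2, "name": "Book Nook"}]
by_name = [{"_id": 2, "name": "Book Nook"}, {"_id": 3, "name": "Book Barn"}]
merged = merge_results(near, by_name)
```

Putting the nearby shops first is just one possible ordering; any other sort could be applied to the merged list afterwards.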
So, the question is: is this possible, or is this kind of approach outside MongoDB's purpose?
Hope this is clear, and hope that someone can teach me something today!

Sorting in Elasticsearch based on Multiple indices

I need to perform sorting on Elasticsearch documents...
I have one index created for the MongoDB collection 'products', which has price and product ratings in it.
I have another collection, 'product_hits', in which I save one record (product_id, IP, etc.) on every click on a particular product by a user. Now I want to sort product documents by considering the product hit count (which maybe I can get through aggregation), price and product rating.
In short, I want to rank all the products based on price and popularity, as other sites do.
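To pin down what the desired ranking means, here is a plain-Python sketch of doing it by hand: count the hits per product, attach the count to each product, then sort. The sample records and the ordering priority (most hits first, then higher rating, then lower price) are assumptions for illustration only, and nothing here uses Elasticsearch.

```python
from collections import Counter

# Hypothetical sample data standing in for the two collections.
products = [
    {"id": "p1", "price": 20.0, "rating": 4.5},
    {"id": "p2", "price": 15.0, "rating": 4.5},
    {"id": "p3", "price": 30.0, "rating": 3.0},
]
product_hits = [
    {"product_id": "p1"},
    {"product_id": "p3"},
    {"product_id": "p1"},
    {"product_id": "p2"},
    {"product_id": "p2"},
]

# Aggregate: one hit count per product_id.
hit_counts = Counter(hit["product_id"] for hit in product_hits)

# Attach the count to each product so all sort keys live on one record.
for product in products:
    product["hit_count"] = hit_counts.get(product["id"], 0)

# Assumed priority: most hits first, then higher rating, then lower price.
ranked = sorted(products, key=lambda p: (-p["hit_count"], -p["rating"], p["price"]))
print([p["id"] for p in ranked])  # prints ['p2', 'p1', 'p3']
```

Once the hit count sits on the product record itself, all three sort keys are available in one place, which is the part that is awkward when they live in separate indices.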
How can I achieve this in elasticsearch?
I went through Elasticsearch scripting and I am able to sort on price and product rating, but I didn't find anything useful for sorting based on multiple indices.
Is it possible? Or do I have to sort all the records on my own in code?
I am using play framework.
I hope this question can be understood. It's complex!