Create a sort index for millions of records - mongodb

I have an application that runs a MongoDB database. This database will store 5 million documents per user. The web application that uses this database will display 5 thousand of these documents on a page at any given time, selected from the 5 million. Each document must have a sort order (or rank) so that the user can sort their 5 thousand records by dragging items up or down as they see fit (a sortable list).
I have read articles about how Trello uses a double-precision floating-point number as the position of an item in a list and changes that value when the item is moved, but this seems to only allow for 50-odd worst-case sorts, so it will not accommodate the large number of items in the user's list. My question is: how do I do this?
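The "50-odd" figure can be checked with a short experiment. The following is a minimal Python sketch, not Trello's actual code: it assumes a new position is always the midpoint of its two neighbours, and that the worst case is repeatedly dropping an item into the same gap. The function name `count_midpoint_inserts` and the starting positions 1.0 and 2.0 are made up for illustration.

```python
def count_midpoint_inserts(lo: float, hi: float) -> int:
    """Count how many times an item can be inserted directly after `lo`
    before the midpoint can no longer be told apart from a neighbour."""
    count = 0
    while True:
        mid = (lo + hi) / 2
        if mid == lo or mid == hi:
            # Double precision is exhausted: no distinct rank fits in the gap.
            return count
        hi = mid  # the next item is dropped between `lo` and the new item
        count += 1


inserts = count_midpoint_inserts(1.0, 2.0)
print(inserts)
```

Between 1.0 and 2.0 a double has 52 fraction bits, so the count lands in the low fifties. Once a gap is exhausted, a midpoint scheme would need some form of renumbering for the affected items before further moves into that gap are possible.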

Related

MongoDB Large documents with arrays VS Small and many documents

We have a food delivery app. We used MySQL but we are migrating to MongoDB now.
We have tables like this: Restaurants, menus, orders, etc...
For the migration to MongoDB, I'm unsure how I should design the schema for the 'menus' collection.
Option 1)
Use the same approach as in MySQL: the 'menus' collection will hold many menus (documents) from different restaurants.
Option 2)
Each store will have one document in the 'menus' collection, with its menus embedded in that document.
To summarize, which one is best for MongoDB: 20,000 small documents, or 100 documents that each contain 100 or 200 array elements/objects?
I'm asking about performance, and about what happens if this app grows by tens of new restaurants.
Every menu document is almost 256 bytes. For the calculation, we can safely say each store will have between 50 and 200 menus. (So we will not exceed the 16MB document limit of MongoDB.)
Also, we need to use 'menus' in the app on two different pages. The first one: the 20 best foods from different restaurants near your location. The second one: each store will have its own page with its own menus.
Note: Once a restaurant has added all of its menus, they stay 98% the same for at least 6-8 months. Restaurants rarely modify them, apart from prices or small details.
In MongoDB, you store your data in the format you will want to read them later.
So you need to analyse your use cases:
20 best food from different restaurants for your near location
each store will have its own page with their own menus.
I'm not sure about the first use case, but it looks like you're always going to filter on restaurant (location) first, so in that case store the menu items with the restaurant.
If you want to filter directly on menu items, you would store them in a separate collection with dedicated indices. In your case, that could work too, if you add (denormalize) the geo location to each different menu item.
And you can also do both: store the menu items in the restaurant document and also as separate documents.
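The two layouts the answer describes can be sketched with plain Python dictionaries standing in for MongoDB documents. Every field name here (`name`, `location`, `menu_items`, `restaurant_id`, `price`) is an assumption made for illustration, as are the sample values and the helper `flatten_menu_items`; the original post gives no schema.

```python
from typing import Any

# Layout A: menu items embedded in the restaurant document (made-up sample).
restaurant: dict[str, Any] = {
    "_id": "r1",
    "name": "Example Diner",
    # GeoJSON-style point: [longitude, latitude]
    "location": {"type": "Point", "coordinates": [4.89, 52.37]},
    "menu_items": [
        {"name": "Soup", "price": 4.5},
        {"name": "Burger", "price": 9.0},
    ],
}


def flatten_menu_items(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Layout B: one document per menu item, with the restaurant's
    location copied (denormalized) onto every item."""
    return [
        {
            "restaurant_id": doc["_id"],
            "location": doc["location"],
            **item,
        }
        for item in doc["menu_items"]
    ]


menu_docs = flatten_menu_items(restaurant)
print(menu_docs[0])
```

Layout A serves the per-restaurant page with a single read. Layout B lets a query filter and index on menu items directly, at the cost of repeating the location on every item.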

Design Approaches to Storing Recently Used Items in MongoDB

I have a design problem with multiple solutions, and I wanted to see if anyone could give insights on the quantitative trade-offs between the approaches.
Problem: I have a list of items in MongoDB, somewhere between 10 and 100,000 (reasonably). I would like to be able to present my top 100 recently used items in order from most recently used to least recently used.
Solution 1: I add a timestamp to each of the MongoDB items. When I use an item, I update the timestamp for that item. To generate my top 100 recently used items, I query MongoDB, sort the items by last-used timestamp, and then present the top 100 sorted items. My thoughts are that this solution adds the least amount of change to the current implementation, but could be slow if it's querying and sorting, for example, 100,000 items (especially if multiple users are performing this query).
Solution 2: I create a new MongoDB collection storing just the top 100 items in an array. Every time I use an item, I add an element to the array, and if the array is greater than 100 items, I pop the last item. My thoughts are that this solution seems to be faster, since the query just returns everything from the recent collection.
In terms of performance (for the user), which approach is better? Or is there an even better alternative approach?
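Solution 2 can be sketched in plain Python, with a list standing in for the array field. This is a minimal sketch under stated assumptions: the function name `touch` is made up, the newest item goes at the front, and an item that is already in the list is moved rather than duplicated (the post does not say how repeats are handled).

```python
def touch(recent: list[str], item_id: str, cap: int = 100) -> list[str]:
    """Return a new most-recent-first list after `item_id` is used."""
    # Drop any earlier occurrence so each item appears at most once.
    without_item = [i for i in recent if i != item_id]
    # Put the item at the front, then cut the list back to `cap` entries.
    return ([item_id] + without_item)[:cap]


recent: list[str] = []
for used in ["a", "b", "c", "a"]:
    recent = touch(recent, used, cap=3)
print(recent)  # ['a', 'c', 'b']
```

In MongoDB itself, the `$push` update operator with its `$each`, `$position`, and `$slice` modifiers can keep such an array capped in a single update.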

never show same document to same user twice

I have a server storing 5,000 content documents. Let's say I have 1 million users who all query for 50 new documents at their own pace, until all content has been seen.
I want to make sure that each user only sees and interacts with the content once and never again, like Tinder.
My first thought was to tag each document with a list of the user-ids of the users who have seen it. However, this list would get really long, up to 1 million user-ids per document, and that sounds like it would really kill query performance.
Does anyone have any better ideas of how I can return content to users just once and never again?
P.S. I am planning on building this with MongoDB.
P.P.S. I thought about making a list of 'document-ids-seen' and attaching that to the user's document, and then with every query made by that user filtering out results that match 'document-ids-seen', but the same challenge applies here: the query length would grow linearly as the user keeps interacting and bringing in new content.
The solution depends on the exact meaning of "at their own pace".
Your second post suggests that the time schedule is up to the user, but she will be presented with the documents in an order determined by your application, e.g. getting news items in the order of their creation timestamp. In that case, your timestamp or auto-increment solution will work, and it has only a small impact on data volume and query complexity.
If, however, the user may also choose which documents to view, this won't work any more, as the documents already viewed may be scattered across the entire document set. A solution to handle this efficiently consists of two design ideas:
(a) Imagine whether most users, at a given point of time, will have viewed a small or a large part of the entire document set. If only a small selection of documents is expected to be of interest to a particular user, then the count of documents the user has viewed will be rather small. (E.g. assume the documents are about IT and one user only wants to look at MongoDB docs, another mainly at Linux docs.) If all users will be interested in most or all of documents, then the count of documents a particular user has not viewed will be small. (E.g. a set of news that everyone tries to follow.) Depending on which is the case, store only a small list of viewed/not viewed document ids with each user, which will also simplify the query for the documents still to be viewed.
(b) With each user, don't store a list of single document ids (viewed or not viewed), but a list of intervals of such ids. E.g., if you store ids of documents not yet viewed, and some documents get added to the database, then, when a user's record is opened, her highest interval will be updated from (someLowerId, formerHighestId) to (someLowerId, currentHighestId). When a user views a document, the interval containing its id gets split from (lowId, highId) into (lowId, viewedId - 1) and (viewedId + 1, highId), where one or both of these intervals may be empty. Including or excluding intervals like these will also simplify the queries compared with listing single ids.
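The interval bookkeeping in (b) can be sketched as follows. This is a minimal Python sketch assuming integer document ids and closed intervals `(low, high)` of ids not yet viewed. The function names `mark_viewed` and `extend_to` are made up for illustration, and the case where the user has already viewed the former highest id is handled in a way the answer does not spell out.

```python
Interval = tuple[int, int]  # closed interval (low, high) of unviewed ids


def mark_viewed(unviewed: list[Interval], viewed_id: int) -> list[Interval]:
    """Split the interval containing `viewed_id`, dropping empty halves."""
    result: list[Interval] = []
    for low, high in unviewed:
        if low <= viewed_id <= high:
            if low <= viewed_id - 1:
                result.append((low, viewed_id - 1))
            if viewed_id + 1 <= high:
                result.append((viewed_id + 1, high))
        else:
            result.append((low, high))
    return result


def extend_to(unviewed: list[Interval], current_highest_id: int,
              former_highest_id: int) -> list[Interval]:
    """Account for documents added since the user's last visit."""
    if current_highest_id <= former_highest_id:
        return list(unviewed)
    if unviewed and unviewed[-1][1] == former_highest_id:
        # The top interval reached the old maximum: just stretch it.
        return unviewed[:-1] + [(unviewed[-1][0], current_highest_id)]
    # Assumption: otherwise the new documents form an interval of their own.
    return list(unviewed) + [(former_highest_id + 1, current_highest_id)]


unviewed = [(1, 5000)]
unviewed = mark_viewed(unviewed, 42)      # [(1, 41), (43, 5000)]
unviewed = mark_viewed(unviewed, 1)       # [(2, 41), (43, 5000)]
unviewed = extend_to(unviewed, 5100, 5000)
print(unviewed)  # [(2, 41), (43, 5100)]
```

A user who views documents mostly in order keeps this list very short, which is what makes the resulting range queries cheaper than a long list of single ids.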
I just had the idea that I could avoid the many-to-many relationship of content-to-users' interaction altogether, if I put a time-stamp on each document, and therefore only queried for more documents after a particular time-stamp 'X'.
Where 'X' could be stored in my 'users' table.
So when opening the app, I would sync my 'users' table, then issue queries after time-stamp 'X', then when results are returned, I'd update my 'users' table again with my new time-stamp X.
Or 'X' need not be a time-stamp; 'X' could just be an auto-incrementing id.
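This watermark idea can be sketched in a few lines of Python. The list `documents`, the field name `seq`, and the function `next_batch` are assumptions for illustration. Here `seq` plays the role of 'X' as an auto-incrementing id; a time-stamp would work the same way as long as its values are unique and increasing.

```python
from typing import Any


def next_batch(documents: list[dict[str, Any]], last_seen: int,
               limit: int = 50) -> tuple[list[dict[str, Any]], int]:
    """Return up to `limit` documents with seq > last_seen, plus the new
    watermark to store back on the user's record."""
    fresh = sorted(
        (d for d in documents if d["seq"] > last_seen),
        key=lambda d: d["seq"],
    )[:limit]
    new_watermark = fresh[-1]["seq"] if fresh else last_seen
    return fresh, new_watermark


documents = [{"seq": n, "title": f"doc {n}"} for n in range(1, 121)]
batch, x = next_batch(documents, last_seen=0)    # docs 1..50, x == 50
batch, x = next_batch(documents, last_seen=x)    # docs 51..100, x == 100
batch, x = next_batch(documents, last_seen=x)    # docs 101..120, x == 120
print(len(batch), x)  # 20 120
```

Only one value per user has to be stored, so neither the user record nor the query grows as more content is viewed.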

How to use limit operator on two collections in MongoDB?

I want to display a user's feed that contains documents from both the post and activity collections. However, I only want to display a certain number of post and activity documents, and only show more when the user scrolls to the bottom of the page. The problem is I don't know how to use the limit operator on two separate collections.
One way to go about this (for example, if I want to limit the user page to 20 documents) is to query for 20 documents on post and 20 documents on activity and then sort them to get 20 ordered documents. But this introduces a performance issue.
How could you go about this situation?
All operations in MongoDB tend to work on one collection only, and so do queries, and hence limit. You will have to query 20 entries from each collection, and sort in your application.
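The "query 20 from each and sort in your application" step could look like this in Python. The field name `created_at` and the function `merge_feed` are assumptions; both input lists are assumed to arrive already sorted newest-first, as they would from a sorted, limited query on each collection.

```python
import heapq
from itertools import islice
from typing import Any


def merge_feed(posts: list[dict[str, Any]],
               activities: list[dict[str, Any]],
               limit: int = 20) -> list[dict[str, Any]]:
    """Merge two newest-first lists and keep the `limit` newest entries."""
    merged = heapq.merge(
        posts, activities,
        key=lambda d: d["created_at"],
        reverse=True,  # inputs are sorted from largest to smallest
    )
    return list(islice(merged, limit))


# Made-up sample entries; integers stand in for timestamps.
posts = [{"kind": "post", "created_at": t} for t in (90, 70, 50)]
activities = [{"kind": "activity", "created_at": t} for t in (80, 60, 40)]
feed = merge_feed(posts, activities, limit=4)
print([d["created_at"] for d in feed])  # [90, 80, 70, 60]
```

Fetching `limit` entries from each collection is enough, because in the worst case all of the newest `limit` entries come from a single collection.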

Data model advice for Mongo collection that contains complex objects

we store apple app data in a database (http://www.apple.com/itunes/affiliates/resources/documentation/itunes-enterprise-partner-feed.html).
we want to optimize for one type of query: find all apps that meet some criteria. criteria: (1) avg rating of app; (2) number of app ratings; (3) devices supported by app; (4) countries where app is sold; (5) current price of app; and (6) date when app went free. the query should be as fast as possible. example query: "find all apps that have > 600 ratings, average 5 stars, support iPads and iPhones, are sold in the US, and dropped their price to $0.00 two days ago."
based on the apple schema, there is price information for every country. assuming apple supports 100 countries, each app will have 100 prices -- one for each country. we also need to store the historical prices for each app, meaning an app with 10 price changes will have 1000 prices (assuming 100 countries).
three questions:
1) how do you advise we store the price data in mongo to make queries fast? right now, we're thinking of storing prices as an array of objects. each object consists of three elements: (1) date; (2) country; (3) price.
2) if we store price data as objects in an array, what do we need to do to make searches against price data very fast? again, the common price search is something like, "find all apps that dropped their price to $0.00 2 days ago in the USA store."
3) any gotchas we should be mindful of in storing the data?
Personally, I would have a separate collection for the daily price data -- 1 record per day per app (the compound natural key), with that day's set of 100 numbers for that app. This way the records will never need to grow or relocate -- that's a big win. With proper indexes, most any query against this collection can be made to perform well. Keep the field names small for more efficient storage.
I would keep a separate collection for the app "master data" -- 1 record per app. In those records you can memoize the most recent date the app went free, a snapshot of the most recent by-country price vector, and similar snapshot values of any other "summary" data that may form the selection criteria for an app search. Aggregations to compute and record such values, should they become costly, can then be performed in the background at convenient times.
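As a rough illustration of this two-collection layout, here is a minimal Python sketch with dictionaries standing in for documents. All field names (`app`, `day`, `p`, `latest`, `free_since`), the sample values, and the helper `update_master` are assumptions made for the example; the answer only specifies one record per day per app and a per-app summary record.

```python
from datetime import date
from typing import Any, Optional

# One record per (app, day): that day's price per country, short field names.
# Made-up sample data.
daily_prices: list[dict[str, Any]] = [
    {"app": 1, "day": date(2012, 3, 1), "p": {"US": 0.99, "GB": 0.69}},
    {"app": 1, "day": date(2012, 3, 2), "p": {"US": 0.0, "GB": 0.69}},
    {"app": 1, "day": date(2012, 3, 3), "p": {"US": 0.0, "GB": 0.0}},
]


def update_master(app_id: int, records: list[dict[str, Any]],
                  country: str) -> dict[str, Any]:
    """Build the per-app summary: latest price vector and the most recent
    day on which the price in `country` dropped from paid to free."""
    rows = sorted((r for r in records if r["app"] == app_id),
                  key=lambda r: r["day"])
    free_since: Optional[date] = None
    previous: Optional[float] = None
    for row in rows:
        price = row["p"].get(country)
        if price == 0.0 and previous is not None and previous > 0.0:
            free_since = row["day"]
        previous = price
    return {"app": app_id, "latest": rows[-1]["p"] if rows else {},
            "free_since": {country: free_since}}


master = update_master(1, daily_prices, "US")
print(master["free_since"]["US"])  # 2012-03-02
```

With the drop date memoized on the master record, a search such as "went free two days ago in the US" becomes a simple equality match on one field rather than a scan over price history.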
Hope that's a help! Great that you're asking these questions up front. :)