Any way in Firebase Firestore to sustainably compare data collections?

I'm fairly new to Firestore NoSQL but not databases or structuring. I know this is not a new question but I can't seem to wrap my head around some of the answers.
I have a simple react native rating style app that presents the items to be rated in a list format, most popular items first. My goal is to not present the user with the same item in the future or on refresh once they have already rated the item.
I have 3 collections: a standard 'user', a 'ratings', and a 'scores'. The item info and the overall calculated rating score are held in 'scores'. The 'ratings' collection keeps track of the user's actual score for each item so it can be searched and changed. The list is queried by pulling from 'scores' ordered by highest calculated score.
Here's where I'm stuck. Even though I technically have a reference to everything a user has voted on in the 'ratings' collection, all of the examples I can find suggest storing an array of reference IDs for each item voted on in the user's document in the 'user' collection. When I pull all the items to be scored, I check each item ID to see if it has already been scored before I push it into a data array. Here are the problems I see...
If the user has already scored 200 of the first 250 top results, Firestore will count 250 reads each refresh even though the user is only seeing 50 new items. This could scale to a huge cost/problem just to show a few new or extra items.
Firebase arrays, I believe, max out at 40,000 index entries per document. That's probably never an issue, since it's unlikely anyone will rate 40,000 different items, but it seems crazy to structure data with a dead end.
On the same concept of 40,000 entries, I have also played with storing user IDs on the items in the 'scores' collection that the user has voted on. I tried using 'not-in' from the Firestore docs.
const q = query(scoresRef, where('uids', 'not-in', [user.id]));
This seems to only work on single string values and seems to have no effect on arrays with more than one element.
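For what it's worth, 'not-in' compares the entire field value against a list of up to 10 values; it is not an "array-not-contains". A minimal sketch with the web SDK, reusing the scoresRef idea from above (db and user come from the surrounding app code):

import { collection, query, where } from 'firebase/firestore';

const scoresRef = collection(db, 'scores');

// 'not-in' matches documents whose whole 'uids' value differs from every listed value
// (max 10 values), so it does not test membership inside an array field.
const notSeenAttempt = query(scoresRef, where('uids', 'not-in', [user.id]));

// 'array-contains' is the membership test, but it only exists in the positive form;
// there is no 'array-not-contains', which is why this approach dead-ends.
const seenByUser = query(scoresRef, where('uids', 'array-contains', user.id));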
I don't know if I'm overlooking an obvious answer or if I'm trying to force the impossible. I just can't figure out how Firestore can serve as a backend for anything big without some relational data. Sorry for the long question; I have been coding for weeks and now I'm stuck. Any code, links, or suggestions would be appreciated. Also, if this is just impossible with Firestore, any suggestions for a low-cost cloud SQL backend that plays well with React Native Expo would also be appreciated.

Related

Firestore array-not-contains alternative solution

TL;DR
I have created a Flutter Firestore posts application. I want to present the user with only new posts, ones they haven't read yet.
How do I achieve that using a Firestore query?
The problem
Each time a user sees a post, their id is added to the post's views field.
Next time the user opens the app, I want to present only the posts they haven't read yet.
The problem is that an array-not-contains query is not supported. How do I achieve that functionality?
You're going to have a real hard time with this because Firestore can only give you documents where you know something about the contents of that document. That's how indexes work - by recording data present in the document. Indexes don't track data not present in a document because that's basically an infinite amount of data.
If you are trying to track documents seen by the user, you would think to mark the document as "seen" using a boolean per user, or tracking the document ID somewhere. But as you can see, you can only query for documents that the user has seen, because that's the data present in the system.
What you can do is query for all documents, then query for all the documents the user has seen, then subtract the seen documents from all documents in order to get the unseen documents. But this probably doesn't scale in a way you'd like. (It's essentially the same problem with Firestore indexes not being able to surface documents without some known data present. Firestore won't do the equivalent of a SQL table scan, since that would be a lot of reads you'd have to pay for.)
You can kind of fake it by making sure there is a creation timestamp in each document, and record for each user the timestamp of the most recent seen document. If you require that the user must view the documents in chronological order, then you can simply query for documents with a creation timestamp greater than the timestamp of the latest document seen by the user. This is really as good as it's going to get with Firestore, since you can't query for the absence of data.
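A minimal sketch of that timestamp approach with the JavaScript web SDK (the question is Flutter, but the query shape is the same; field names like createdAt and lastSeenCreatedAt are assumptions):

import { collection, query, where, orderBy, getDocs } from 'firebase/firestore';

// The user document stores lastSeenCreatedAt, the createdAt of the newest post already viewed.
async function fetchUnseenPosts(db, lastSeenCreatedAt) {
  const q = query(
    collection(db, 'posts'),
    where('createdAt', '>', lastSeenCreatedAt),
    orderBy('createdAt') // with a range filter, the first orderBy must be on the same field
  );
  const snap = await getDocs(q);
  return snap.docs.map((d) => ({ id: d.id, ...d.data() }));
}

After displaying the results, the app would write the newest createdAt it showed back to the user document so the next query starts from there.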

Suggested way to structure Firestore database for deeply nested set of spreadsheet like objects

My application is used for creating production budgets for complex projects (construction, media productions etc.)
The structure of the budget is as follows:
The budget contains "sections",
the sections contain "accounts"
the accounts contain "subaccounts"
the subaccounts contain line items.
Line items have a number of fields (units, rate, currency, tax etc.) and a calculated total.
Certain fields in line items may have alphanumeric codes which represent numeric values, which a user can use instead of a hard-coded number; e.g. a user can enter "=build-weeks" and define that with a formula that evaluates to, say, "7", which is then used in the calculation of a total.
Line items bubble up their totals, so subaccounts have a total equal to the sum of their line items,
accounts have a total equal to the sum of their subaccounts,
sections have a total equal to the sum of their accounts' totals,
and the budget total is the total of the section totals.
Or perhaps using Firestore to do these cascading calculations is the wrong approach? Should I just load a single complex budget document into my app, do all the calculations and updates on the client, and then write back the entire budget as a single document when the user presses "save budget"?
The question is how to aggregate this data into the documents comprising the budget.
Budgets may be sort of long, say 5,000 line items or more in total. Single accounts may have hundreds of line items.
Users will most likely look at all of the line items for a given account, so it occurred to me to make individual documents for sections, accounts and subaccounts, and make line items a map within a subaccount.
The main concern I have with this approach is that when the user changes, say, the exchange rate of a line item's currency, or changes the calculated value of a named value like "build-weeks", I will have to retrieve all the individual line items containing that currency or named value, recalculate the totals, and then bubble the changes up through the hierarchy.
This seems not that complicated if each line item is its own document: I can just search the collection for the presence of the code in question, recalculate the line item, and maybe use a cloud function to bubble up the changes (a rough sketch of that idea is below).
But if all the line items are contained in an array of maps within each subaccount map item, it seems like it will be quite tedious to find and change them when necessary.
On the other hand, keeping these documents so small seems like a lot of document reads when somebody is reviewing a budget or, say, printing it. If somebody just clicks on a bunch of accounts, it might be hundreds of reads per click to retrieve all the line items, and hundreds or a thousand writes when somebody changes the value of an often-used named value like "build-weeks".
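For what it's worth, here is a rough sketch of the cloud-function bubble-up idea mentioned above, assuming each line item is its own document under a hypothetical budgets/{budgetId}/subaccounts/{subaccountId}/lineItems path (the path and the 'total' field are assumptions, not a settled design):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.bubbleUpSubaccountTotal = functions.firestore
  .document('budgets/{budgetId}/subaccounts/{subaccountId}/lineItems/{lineItemId}')
  .onWrite(async (change, context) => {
    const { budgetId, subaccountId } = context.params;
    const db = admin.firestore();

    // Recompute the subaccount total from all of its line items.
    const items = await db
      .collection(`budgets/${budgetId}/subaccounts/${subaccountId}/lineItems`)
      .get();
    const total = items.docs.reduce((sum, d) => sum + (d.data().total || 0), 0);
    await db.doc(`budgets/${budgetId}/subaccounts/${subaccountId}`).update({ total });

    // Account, section and budget totals could be bubbled up by similar
    // triggers listening one level higher in the hierarchy.
  });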
Does anybody have any thoughts on the obvious "right" answer to this? Or does it just depend on what I want to optimize for: Firestore costs, responsiveness of the app, complexity of the code?
From my standpoint, there is no obvious answer to your problem, and indeed it does depend on what you want to optimize for.
However, there are a few points that you need to consider in your decision:
Documents in Firestore have a limit of 1 MB per document;
Documents in Firestore have a limit of 20,000 fields;
Queries are shallow, so you don't get data from subcollections on the same query;
For considerations 1 and 2, this means that if you choose to design your database as one big document containing everything, even though you said that your app will have lots of data, I doubt that it will exceed the limits mentioned; still, do consider them. Also, ask how necessary it is to get all the data at once, as this could create performance and user battery/data usage issues (if you are making a mobile app).
For consideration 3, it means that you would have to make many reads if you choose to split the data for your sections into subdocuments; this will mean more cost to you but better performance for users.
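As a small illustration of consideration 3, a sketch with the web SDK, assuming a hypothetical budgets/{budgetId}/sections/{sectionId}/accounts/... layout: because queries are shallow, each level of the tree costs its own query.

import { collection, getDocs } from 'firebase/firestore';

async function loadBudgetTree(db, budgetId) {
  // One query for the sections...
  const sections = await getDocs(collection(db, 'budgets', budgetId, 'sections'));
  for (const section of sections.docs) {
    // ...then one query per section for its accounts, and so on down to
    // subaccounts and line items; this is where the read count (and cost) adds up.
    const accounts = await getDocs(
      collection(db, 'budgets', budgetId, 'sections', section.id, 'accounts')
    );
  }
}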
To make the right call on this problem, I suggest that you talk to possible users of your solution and understand the problem that you are trying to fix and what they expect of the app. Also, it might be interesting to take a look at the How to Structure Your Data and Maps, Arrays and Subcollections videos, as they explain in a more visual way how Firestore behaves, and they could help you anticipate problems that the approach you choose could cause.
Hope I was able to help with these considerations.

Design Approaches to Storing Recently Used Items in MongoDB

I have a design problem with multiple solutions, I wanted to see if anyone could give insights on the quantitative trade offs to which approach is better.
Problem: I have a list of items in MongoDB, somewhere between 10 and 100,000 (reasonably). I would like to be able to present my top 100 recently used items, in order from most recently used to least recently used.
Solution 1: I add a timestamp to each of the MongoDB items. When I use an item, I update its timestamp. To generate my top 100 recently used items, I query MongoDB, sort the items by the last-used timestamp, and then present the top 100 sorted items. My thoughts are: this solution adds the least amount of change to the current implementation, but could be slow if it's querying and sorting, for example, 100,000 items (especially if multiple users are performing this query).
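A minimal sketch of Solution 1 with the Node.js driver, run inside an async function (collection and field names like 'items', 'lastUsedAt' and the itemId variable are assumptions):

const items = db.collection('items');

// Index so the sort does not have to scan and sort the whole collection in memory.
await items.createIndex({ lastUsedAt: -1 });

// Touch an item whenever it is used.
await items.updateOne({ _id: itemId }, { $set: { lastUsedAt: new Date() } });

// Top 100 most recently used items, newest first.
const recent = await items.find({}).sort({ lastUsedAt: -1 }).limit(100).toArray();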
Solution 2: I create a new MongoDB collection storing just the top 100 items in an array. Every time I use an item, I add an element to the array, and if the array has more than 100 items, I pop the last item. My thoughts are: this solution seems faster since the query just returns everything from the recent collection.
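And a sketch of Solution 2, keeping a single document with a capped array (itemId again stands for whatever item was just used; note this does not deduplicate an item that is used twice, which you would likely want to handle):

// $push with $each/$position/$slice keeps only the 100 newest entries.
await db.collection('recentItems').updateOne(
  { _id: 'recentlyUsed' },
  {
    $push: {
      items: {
        $each: [{ itemId, usedAt: new Date() }],
        $position: 0, // insert at the front, newest first
        $slice: 100   // trim anything beyond the first 100
      }
    }
  },
  { upsert: true }
);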
In terms of performance (for the user), which approach is better? Or is there an alternate even better approach?

Is it a good idea to store chat messages in a MongoDB collection?

I'm developing a chat app with Node.js, Redis, Socket.IO and MongoDB. MongoDB comes last and is used for persisting the messages.
My question is what would be the best approach for this last step?
I'm afraid a collection with all the messages like
{
id,
from,
to,
datetime,
message
}
can get too big too soon and is going to get very slow for reading purposes. What do you think?
Is there a better approach you already worked with?
In MongoDB, you store your data in the format you will want to read it in later.
If what you read from the database is a list of messages filtered on the 'to' field and with a dynamic datetime filter, then this schema is the perfect fit.
Don't forget to add an index on the fields you will be querying on; then it will be reasonably fast to query them, even over millions of records.
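For example, with the schema above and the Node.js driver (an assumption, since the question does not name a driver):

const messages = db.collection('messages');

// Compound index matching the query pattern: equality on 'to', range/sort on 'datetime'.
await messages.createIndex({ to: 1, datetime: -1 });

// Messages for a user within a dynamic datetime window, newest first.
const inbox = await messages
  .find({ to: userId, datetime: { $gte: since } })
  .sort({ datetime: -1 })
  .toArray();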
If you would, for example, always show a full history of a full day, you would store all messages for a single day in one document. If both types of queries occur a lot, you would even store your messages in both formats.
If storage is an issue, you could also use a capped collection (which limits the collection by size, discarding the oldest documents) or a TTL index, which will automatically delete messages older than, e.g., 1 year.
I think the db structure is fine, the way you mentioned in your question.
You may assign a unique id to the chat between each pair of users and keep it in each chat record. Retrieve messages based on that id when you want to show the chat.
Say 12 is the unique id for the chat between A and B; retrieval should then be based on 12 when you want to show the chat between A and B.
So your db structure can be like:
{
id,
from,
to,
datetime,
message,
uid
}
Remember, you can optimize retrieval if you set a limit (say 100 messages at a time). If the user scrolls beyond 100, retrieve another 100. That will save a lot of reads.
When using a limit, retrieve based on the creation date and use sort with the find query as well.
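A small sketch of that retrieval pattern (field names as in the schema above; 'before' is the datetime of the oldest message already shown):

async function fetchOlderMessages(db, conversationUid, before, pageSize = 100) {
  const filter = { uid: conversationUid };         // the chat between one pair of users
  if (before) filter.datetime = { $lt: before };   // only messages older than what is on screen
  return db.collection('messages')
    .find(filter)
    .sort({ datetime: -1 }) // newest first
    .limit(pageSize)        // e.g. 100 at a time
    .toArray();
}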
Just a thought here: are the messages plain text, or are you allowed to share images and videos as well?
If it's the latter, then storing all the chats for a single day in one document might not work out.
Actually, if image and video sharing is allowed, then you need to take the 16 MB document size restriction into account as well.

never show same document to same user twice

I have a server storing 5,000 content documents. Let's say I have 1 million users who all query for 50 new documents at their own pace, until all the content has been seen.
I want to make sure that each user only sees and interacts with the content once and never again, like Tinder.
My first thought was to tag each document with a list of user-ids of the users who have seen the document. However, this list would get really long... like a list of 1 million user-ids per document - but this sounds like it would really kill query performance.
Does anyone have any better ideas of how I can return content to users just once and never again?
P.S. I am planning on doing this build-out with MongoDB.
P.P.S. I thought about making a list of 'document-ids-seen' and attaching it to the user's document, and then, with every query made by that user, filtering out results that match 'document-ids-seen'. But the same challenge applies here: the query length would grow linearly as the user keeps interacting and bringing in new content.
The solution depends on the exact meaning of "at their own pace".
Your second post suggests that the time schedule is up to the user, but she will be presented with the documents in an order determined by your application, like e.g. getting news items in the order of the timestamp of news creation. In that case, your timestamp or auto increment solution will work, and it has only a small impact on data volume and query complexity.
If, however, the user may also choose which documents to view, this won't work any more, as the documents already viewed may be scattered across the entire document set. A solution to handle this efficiently consists of two design ideas:
(a) Imagine whether most users, at a given point of time, will have viewed a small or a large part of the entire document set. If only a small selection of documents is expected to be of interest to a particular user, then the count of documents the user has viewed will be rather small. (E.g. assume the documents are about IT and one user only wants to look at MongoDB docs, another mainly at Linux docs.) If all users will be interested in most or all of documents, then the count of documents a particular user has not viewed will be small. (E.g. a set of news that everyone tries to follow.) Depending on which is the case, store only a small list of viewed/not viewed document ids with each user, which will also simplify the query for the documents still to be viewed.
(b) With each user, don't store a list of single document ids (viewed or not viewed), but a list of intervals of such ids. E.g., if you store ids of documents not yet viewed, and some documents get added to the database, then, when a user is opened, her highest interval will be updated from (someLowerId, formerHighestId) to (someLowerId, currentHighestId). When a user views a document, the interval containing its id gets split from (lowId, highId) to (lowId, viewedId - 1), (viewedId + 1, highId), where one or both of these intervals may get empty. Including or excluding intervals like these will also simplify the queries as opposed to listing single ids.
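A minimal sketch of the interval splitting in (b), assuming numeric document ids and intervals stored as [low, high] pairs of ids not yet viewed:

// markViewed([[1, 10]], 4) -> [[1, 3], [5, 10]]
function markViewed(intervals, viewedId) {
  const result = [];
  for (const [low, high] of intervals) {
    if (viewedId < low || viewedId > high) {
      result.push([low, high]); // interval untouched
    } else {
      if (low <= viewedId - 1) result.push([low, viewedId - 1]);
      if (viewedId + 1 <= high) result.push([viewedId + 1, high]);
    }
  }
  return result;
}

The query for unseen documents can then be built as an OR over the remaining intervals instead of a list of thousands of individual ids.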
I just had the idea that I could avoid the many-to-many relationship of content-to-users' interaction altogether, if I put a time-stamp on each document, and therefore only queried for more documents after a particular time-stamp 'X'.
Where 'X' could be stored in my 'users' table.
So when opening the app, I would sync my 'users' table, then issue queries after time-stamp 'X', then when results are returned, I'd update my 'users' table again with my new time-stamp X.
Or 'X' doesn't have to be a timestamp; 'X' could just be an auto-incrementing id.
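A rough sketch of that cursor idea with MongoDB (the 'seq' and 'lastSeenSeq' field names are assumptions), run inside an async function:

// The user document stores the highest auto-incrementing id (or timestamp) already seen.
const user = await db.collection('users').findOne({ _id: userId });

const fresh = await db.collection('content')
  .find({ seq: { $gt: user.lastSeenSeq || 0 } }) // only documents newer than the cursor
  .sort({ seq: 1 })
  .limit(50)
  .toArray();

if (fresh.length) {
  await db.collection('users').updateOne(
    { _id: userId },
    { $set: { lastSeenSeq: fresh[fresh.length - 1].seq } } // advance the cursor
  );
}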