Skip specific documents when querying them from Firestore database - google-cloud-firestore

In my application, the user gets to pick specific documents from a list, for example 1, 5, 8 from a list containing documents 1, 2, 3, 4, 5, 6, 7, 8, 9. The next time the user logs in, I want to first fetch all of the chosen documents (with pagination, because the number of picked documents could be very high), and then start fetching the remaining documents as the user scrolls past the picked ones.
As it turns out, available Firestore querying methods are not capable of skipping the specific documents.
My current idea:
Make single document references for the user-specific documents and fetch them.
Make single document references for the documents between the range of user-specific documents (From the example that would be documents number: 2,3,4,6,7).
After that start making 'big queries' for the remaining documents.
This looks like a working solution, but I'm sure that there is a better way to accomplish the goal, since what I've done is not asynchronous and very slow. Help is appreciated!

Firestore doesn't have any way to exclude specific documents from queries. You may only include them using some existing field values. If you already know the documents to fetch, you can just get() them individually.
It sounds like you are already able to work around these requirements. I don't believe you have any alternatives.
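A minimal sketch of the ordering described in the question: picked documents first, then everything else, both paginated. It is plain Python and makes no Firestore calls. `all_ids` is a hypothetical stand-in for whatever a paged query over the whole collection returns, and the function names are illustrative only. Note that filtering on the client still reads the skipped documents.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def pages(items: Iterable[str], page_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most page_size items."""
    it = iter(items)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


def feed(picked_ids: List[str], all_ids: Iterable[str], page_size: int) -> Iterator[List[str]]:
    """Yield pages of picked ids first, then pages of the remaining ids."""
    picked = set(picked_ids)
    # Phase 1: the ids the user picked, which would be fetched one by one.
    yield from pages(picked_ids, page_size)
    # Phase 2: everything else, dropping picked ids on the client.
    yield from pages((i for i in all_ids if i not in picked), page_size)


all_ids = [str(n) for n in range(1, 10)]
result = list(feed(["1", "5", "8"], all_ids, page_size=4))
```

Because `feed` is a generator, the second phase does not start until the caller has consumed the picked pages, which matches the "load more on scroll" behaviour.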

Related

Firestore array-not-contains alternative solution

TL-DR
I have created a Flutter Firestore posts application. I want to present the user only new posts, which they didn't read yet.
How do I achieve that using Firestore query?
The problem
Each time a user sees a post, their id is added to the post views field.
Next time the user opens the app, I want to present only posts they didn't read yet.
The problem is that an array-not-contains query is not supported. How do I achieve that functionality?
You're going to have a real hard time with this because Firestore can only give you documents where you know something about the contents of that document. That's how indexes work - by recording data present in the document. Indexes don't track data not present in a document because that's basically an infinite amount of data.
If you are trying to track documents seen by the user, you would think to mark the document as "seen" using a boolean per user, or tracking the document ID somewhere. But as you can see, you can only query for documents that the user has seen, because that's the data present in the system.
What you can do is query for all documents, then query for all the documents the user has seen, then subtract the seen documents from all documents in order to get the unseen documents. But this probably doesn't scale in a way you'd like. (It's essentially the same problem with Firestore indexes not being able to surface documents without some known data present. Firestore won't do the equivalent of a SQL table scan, since that would be a lot of reads you'd have to pay for.)
You can kind of fake it by making sure there is a creation timestamp in each document, and record for each user the timestamp of the most recent seen document. If you require that the user must view the documents in chronological order, then you can simply query for documents with a creation timestamp greater than the timestamp of the latest document seen by the user. This is really as good as it's going to get with Firestore, since you can't query for the absence of data.
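The timestamp approach can be sketched in plain Python as follows. The field name `created_at`, the function names and the sample posts are assumptions for illustration. In a real app the filtering would be a range filter on the creation timestamp in the query itself, not a list comprehension.

```python
from datetime import datetime, timezone
from typing import Dict, List


def unseen_posts(posts: List[Dict], last_seen: datetime) -> List[Dict]:
    """Return posts created strictly after the user's watermark, oldest first."""
    newer = [p for p in posts if p["created_at"] > last_seen]
    return sorted(newer, key=lambda p: p["created_at"])


def advance_watermark(last_seen: datetime, viewed: List[Dict]) -> datetime:
    """Return the new per-user watermark after the given posts were viewed."""
    return max([last_seen] + [p["created_at"] for p in viewed])


# Hypothetical sample data.
posts = [
    {"id": "a", "created_at": datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {"id": "b", "created_at": datetime(2020, 1, 2, tzinfo=timezone.utc)},
    {"id": "c", "created_at": datetime(2020, 1, 3, tzinfo=timezone.utc)},
]
last_seen = datetime(2020, 1, 2, tzinfo=timezone.utc)
fresh = unseen_posts(posts, last_seen)
new_watermark = advance_watermark(last_seen, fresh)
```

Only one value per user is stored, which is why this scales, but it works only if posts are consumed in chronological order.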

Check the subcollection exists before query in firestore

I'm implementing a social media app, where I put a subcollection of "following" under each user. I want to check if the subcollection exists before I query the subcollection, or the app will crash for querying a nonexistent collection. Is there a way to check this?
Collections don't really "exist" in the way that you're thinking. They simply appear when the first document is created, and they disappear when the last document is removed. There is no operation to simply create or remove a collection like a folder in a filesystem, and there is no operation to check whether a collection "exists". A query against a collection with no documents will not fail (unless it was rejected by a security rule).
The only thing you can really do is query the collection to see if it has any documents at all. You can limit the query to 1 document if you want to minimize costs.
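A sketch of that check, assuming the google-cloud-firestore Python client, where a collection reference offers `limit(n)` and the resulting query offers `stream()`. The helper name is made up, and `FakeCollection` is a hypothetical in-memory stand-in so the snippet runs without a database.

```python
def collection_has_documents(collection_ref) -> bool:
    """Return True if the collection yields at least one document."""
    # limit(1) caps the query at a single returned document.
    for _ in collection_ref.limit(1).stream():
        return True
    return False


class FakeCollection:
    """Hypothetical stand-in that mimics limit() and stream() only."""

    def __init__(self, docs):
        self._docs = list(docs)

    def limit(self, n):
        return FakeCollection(self._docs[:n])

    def stream(self):
        return iter(self._docs)


empty = collection_has_documents(FakeCollection([]))
non_empty = collection_has_documents(FakeCollection([{"uid": "u1"}, {"uid": "u2"}]))
```

With a real client, the argument would be the reference to the user's "following" subcollection.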

never show same document to same user twice

I have a server storing 5,000 content documents. Let's say I have 1 million users who all query for 50 new documents at their own pace, until all content has been seen.
I want to make sure that each user only sees and interacts with the content once and never again, like Tinder.
My first thought was to tag each document with a list of user-ids of the users who have seen the document. However, this list would get really long, up to 1 million user-ids per document, and that sounds like it would really kill query performance.
Does anyone have any better ideas of how I can return content to users just once and never again.
P.S. I am planning to build this with MongoDB.
p.p.s i thought about making a list of 'document-ids-seen' and attaching that to the user's document, and then with every query made by that user 'filter' out results that match 'document-ids-seen', but same challenge here, the query length would grow linearly as the user keeps interacting and bringing in new content.
The solution depends on the exact meaning of "at their own pace".
Your second post suggests that the time schedule is up to the user, but she will be presented with the documents in an order determined by your application, like e.g. getting news items in the order of the timestamp of news creation. In that case, your timestamp or auto increment solution will work, and it has only a small impact on data volume and query complexity.
If, however, the user may also choose which documents to view, this won't work any more, as the documents already viewed may be scattered across the entire document set. A solution to handle this efficiently consists of two design ideas:
(a) Imagine whether most users, at a given point of time, will have viewed a small or a large part of the entire document set. If only a small selection of documents is expected to be of interest to a particular user, then the count of documents the user has viewed will be rather small. (E.g. assume the documents are about IT and one user only wants to look at MongoDB docs, another mainly at Linux docs.) If all users will be interested in most or all of documents, then the count of documents a particular user has not viewed will be small. (E.g. a set of news that everyone tries to follow.) Depending on which is the case, store only a small list of viewed/not viewed document ids with each user, which will also simplify the query for the documents still to be viewed.
(b) With each user, don't store a list of single document ids (viewed or not viewed), but a list of intervals of such ids. E.g., if you store ids of documents not yet viewed, and some documents get added to the database, then, when a user is opened, her highest interval will be updated from (someLowerId, formerHighestId) to (someLowerId, currentHighestId). When a user views a document, the interval containing its id gets split from (lowId, highId) to (lowId, viewedId - 1), (viewedId + 1, highId), where one or both of these intervals may get empty. Including or excluding intervals like these will also simplify the queries as opposed to listing single ids.
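The interval bookkeeping in (b) can be sketched like this. It assumes integer document ids and a sorted list of non-overlapping, inclusive `(low, high)` intervals of ids not yet viewed. The function names are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (low, high)


def mark_viewed(intervals: List[Interval], viewed_id: int) -> List[Interval]:
    """Split the interval containing viewed_id, dropping empty halves."""
    result = []
    for low, high in intervals:
        if low <= viewed_id <= high:
            if low <= viewed_id - 1:
                result.append((low, viewed_id - 1))
            if viewed_id + 1 <= high:
                result.append((viewed_id + 1, high))
        else:
            result.append((low, high))
    return result


def extend_unviewed(intervals: List[Interval], former_highest: int,
                    current_highest: int) -> List[Interval]:
    """Account for documents added since the user was last loaded."""
    if current_highest <= former_highest:
        return list(intervals)
    if intervals and intervals[-1][1] == former_highest:
        # The highest interval reached the old maximum: stretch it.
        return list(intervals[:-1]) + [(intervals[-1][0], current_highest)]
    # The old maximum was already viewed: the new ids form their own interval.
    return list(intervals) + [(former_highest + 1, current_highest)]


unviewed = mark_viewed([(1, 9)], 5)   # split in the middle
unviewed = mark_viewed(unviewed, 1)   # trim the lower edge
unviewed = extend_unviewed(unviewed, former_highest=9, current_highest=12)
```

Each interval maps directly onto a range condition on the id, so the query size grows with the number of gaps, not with the number of viewed documents.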
I just had the idea that I could avoid the many-to-many relationship of content-to-users' interaction altogether, if I put a time-stamp on each document, and therefore only queried for more documents after a particular time-stamp 'X'.
Where 'X' could be stored in my 'users' table.
So when opening the app, I would sync my 'users' table, then issue queries after time-stamp 'X', then when results are returned, I'd update my 'users' table again with my new time-stamp X.
Alternatively, 'X' need not be a time-stamp; it could just be an auto-incrementing id.

How to do global search in MongoDB?

What I mean as global search is searching for documents in specified collections, for example, searching for a name in both User and Organization collections and will return both user and organization documents that match the criteria.
Is it possible to simply copy the documents in User and Organization into another collection and do a search in it?
No, it is not possible to do a multi-collection search automatically. There's no reason however that you couldn't perform the same query on multiple collections and combine the results.
While you could duplicate the data into another collection for query purposes, if you need a guarantee that the source collections' values match the "index" collection exactly, you'll need to implement your own multi-phase transaction, as MongoDB doesn't have a multi-collection atomic commit. Or you can accept that the "index" collection may be out of sync; it could, of course, be updated periodically through custom code. Further, your working set grows because you're storing data twice. Also, if you then need to go back to the individual collections (to grab more of the source document), you've likely gained nothing and made things worse compared to doing multiple queries in the first place.
You could store related documents in the same collection and take advantage of the built-in indexing. Of course, this comes with the caveat that if your documents are now typed, you may find it more challenging to build efficient MongoDB indexes. Every changed or new document must go through the indexing pipeline, which may introduce significant overhead.
If it's only a few collections, and without understanding your requirements more deeply, I'd just do multiple searches. If not, the second-best option would be to combine the documents into a single collection. The last choice would be to copy the data.
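The first option, running the same search against each collection and merging the results, might look like the sketch below. It is plain Python: each entry in `searchers` is a hypothetical callable that runs the query against one collection (with a driver such as pymongo it could wrap that collection's `find`). `in_memory_searcher` and the `name` field exist only to make the example runnable.

```python
from typing import Callable, Dict, Iterable, List

Searcher = Callable[[str], Iterable[dict]]


def global_search(term: str, searchers: Dict[str, Searcher]) -> List[dict]:
    """Run the same search per collection and tag each hit with its source."""
    merged = []
    for collection_name, search in searchers.items():
        for doc in search(term):
            merged.append({"collection": collection_name, "document": doc})
    return merged


def in_memory_searcher(docs: List[dict]) -> Searcher:
    """Hypothetical stand-in for a per-collection query on a name field."""
    return lambda term: [d for d in docs if term.lower() in d["name"].lower()]


users = [{"name": "Acme Admin"}, {"name": "Bob"}]
organizations = [{"name": "Acme Corp"}]
hits = global_search("acme", {
    "User": in_memory_searcher(users),
    "Organization": in_memory_searcher(organizations),
})
```

Tagging each hit with its collection name lets the caller tell user documents from organization documents in the combined list.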

mongodb document structure

My database has users collection,
each user has multiple documents,
each document has multiple sections
each section has multiple works
Users work with works collection very often (add new work, update works, delete works). So my question is what structure of collections should I make? works collection is 100-200 records per section.
Should I make work collection for all users with user _id or there is best solution?
Depends on what kind of queries you have. The guideline is to arrange documents so that you can fetch all you need in ideally one query.
On the other hand, what you probably want to avoid is having mongo reallocate documents because there's not enough space for an in-place update. You can do that by preallocating enough space, or by extracting the frequently changing part into its own collection.
As you can read in MongoDB docs,
Generally, for "contains" relationships between entities, embedding should be chosen. Use linking when not using linking would result in duplication of data.
So if each user has only access to his documents, I think you're good. Just keep in mind there's a limitation on size (16MB I think) for documents which you should be careful about, since you're embedding lots of stuff.