Firestore strategy for excluding items the user has already seen

Imagine a Tinder-like app.
Users see a feed of items and rate some of them.
The feed is loaded in batches of 20 items.
Each batch should include the latest items, but exclude the ones the user has already rated.
So I first load the collection of the user's rated items, and then:
const itemsCollection = await firebase.firestore().collection('items')
.orderBy('id')
.where('id', 'not-in', userRatedItems)
.orderBy('publish_date', 'desc')
.limit(20)
.get();
The problem is that 'not-in' is limited to a list of 10 values, and the user's past ratings could number in the thousands.
I could theoretically load all rated items and remove them from the list of candidates, but:
I don't know how many items to query for, since I don't know how many will be excluded as already rated.
I have a feeling that there is a better strategy for what I am trying to do.
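For what it's worth, the "load candidates and filter client-side" idea can at least be made to terminate deterministically by paging with a cursor until a full batch is collected. A rough sketch, not a known best practice — the loadUnratedBatch helper and the ratedIds Set of rated item ids are my assumptions:
async function loadUnratedBatch(ratedIds, pageSize = 20) {
  const results = [];
  let cursor = null;
  while (results.length < pageSize) {
    // Over-fetch a page of candidates, newest first.
    let query = firebase.firestore().collection('items')
      .orderBy('publish_date', 'desc')
      .limit(pageSize);
    if (cursor) query = query.startAfter(cursor);
    const snap = await query.get();
    if (snap.empty) break; // no more items at all
    cursor = snap.docs[snap.docs.length - 1];
    // Drop anything the user has already rated.
    for (const doc of snap.docs) {
      if (!ratedIds.has(doc.id) && results.length < pageSize) {
        results.push(doc);
      }
    }
    if (snap.docs.length < pageSize) break; // reached the end of the collection
  }
  return results;
}
Note that the read count is unbounded for users who have rated nearly everything, which is exactly the concern raised above.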

Related

How to organize Firestore collections and documents for an app similar to BlaBlaCar rides

It's my first time working with Firestore. I'm working on a ridesharing app in Flutter that uses Firebase Auth, where users can create trips and offer rides similarly to BlaBlaCar, and other users can send requests to join a ride. I'm having difficulty deciding which collections and paths to use, and how to structure them.
For simplicity at this stage, I want any user to be able to see all trips created, but when they go to their “My Rides” page, they should only see the rides they've participated in. I would be grateful for any kind of feedback.
Here are the options I’ve considered:
Two collections, “Users” and “Trips”. The path would look something like this:
users/uid and trips/tripId with a created_by field
One collection of “Users” and a sub-collection of “Trips”. The path users/uid/trips/tripId seems to make more sense to me, but then I don't know how other users could access all the rides on their home feed.
I'm inclined to go with the first option of two collections. Also very open to any other suggestions or help. Thanks.
I want any user to be able to see all trips created, but when they go
to their “My Rides” page, they will only see the rides that they’ve
participated in
I make the assumption that participating in a ride is either being the author or being a passenger of the ride.
I would go for two collections: one for users and one for trips. In a trip document you add two fields:
createdBy: the uid of the creator
participants: an array where you store the author's uid and the uids of all the other participants (passengers)
This way you can easily query for:
All the rides
All the rides created by a user
All the rides for which a user is a participant, using arrayContains (see the sketch below).
(Regarding the 1 MiB maximum document size: I guess this is not a problem, because the number of passengers on a ride shouldn't be so large that the participants array pushes the document over 1 MiB!)
Note that the second approach with subcollections could also be used, since you can query with collection group queries, but based on the elements in your question I don't see any technical advantage.
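For illustration, the three queries with the JavaScript SDK might look as follows. Collection and field names follow the answer; the fetchRides helper is hypothetical and uid is assumed to come from Firebase Auth:
async function fetchRides(uid) {
  const db = firebase.firestore();
  // All the rides
  const allTrips = await db.collection('trips').get();
  // All the rides created by a user
  const createdByUser = await db.collection('trips')
    .where('createdBy', '==', uid)
    .get();
  // All the rides for which a user is a participant
  const participating = await db.collection('trips')
    .where('participants', 'array-contains', uid)
    .get();
  return { allTrips, createdByUser, participating };
}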

How to query Firebase with about 3,000 parameters. If any of the parameters exists, take action

I have over 3,000 contacts in my phone. My Firestore database will hold over 1,000,000 contacts.
I can pull contacts from Firestore to the client to compare, or push contacts from the client to Firestore to compare. But I don't consider either approach effective:
Pulling over 1,000,000 records from Firestore to check against 3,000 records is not efficient, as the online data may grow to over a billion.
Pushing 3,000 records at 10 per request will lead to too many requests to Firestore and incur unnecessary cost.
What is the best way to compare these two datasets and return the common elements?
If I were you, I would do it this way: dump the two databases and compare them in another database.
Add one more flag field to the 1,000,000 contacts to record which of them have a matching value in the database that holds the 3,000 contacts.
Upload them (the 1,000,000 contacts) to Firebase.
Set up a filter to get a QuerySnapshot (refer to Sample Code 1).
When you have a new contact list (new entries among the 3,000), use it to filter the 1,000,000-contact database and mark the matches with the 'contactExistingFlag' flag (refer to Sample Code 2).
Sample Code 1
QuerySnapshot querySnapshot = await _firestore
    .collection('contactTable')
    .where('contactExistingFlag', isEqualTo: 0)
    .get();
// isEqualTo: 0 means the contact is new
// isEqualTo: 1 means the contact already exists
Sample Code 2
QuerySnapshot querySnapshot = await _firestore
    .collection('contactTable')
    .where('contactName', arrayContainsAny: ["New Member1", "New Member2"])
    .get();
// Use the array-contains-any operator to combine up to 10 array-contains
// clauses on the same field with a logical OR.
Firestore (both Cloud & Firebase) does not have any built-in operator to compare two sets. The only option is to iterate over each number and check whether it has a match, as recommended in this Stack Overflow post. Searching for the phone contacts by sending them via Firestore queries seems like the better approach.
There are ways to design your application so that this kind of operation (comparing two sets of numbers between the address book and Firestore) is performed only once per user, at sign-up. In the future, if a new user is added to the Firestore database, we can do a reverse update, i.e. check for that newly added number in every user's address book and update that contact in the given user's app if a match is found.
Note: querying the Firestore database for a matching document (with a matching phone number) does not imply being billed for read operations against all documents. Please refer to the billing section of the Firestore queries documentation for more details.
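To make "sending them via Firestore queries" concrete: since in and array-contains-any accept at most 10 values per clause, the 3,000 local contacts would be sent in chunks. A rough JavaScript sketch — the phoneNumber field and the findCommonContacts helper are assumptions, not taken from the answers above:
async function findCommonContacts(localNumbers) {
  const db = firebase.firestore();
  const matches = [];
  // 'in' accepts at most 10 values per clause, so 3,000 local contacts
  // become 300 queries; each query only reads the documents that match.
  for (let i = 0; i < localNumbers.length; i += 10) {
    const chunk = localNumbers.slice(i, i + 10);
    const snap = await db.collection('contactTable')
      .where('phoneNumber', 'in', chunk) // 'phoneNumber' is an assumed field name
      .get();
    snap.docs.forEach((doc) => matches.push(doc.data()));
  }
  return matches;
}
This matches the billing note above: you pay for the matching documents read, not for scanning the whole collection.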

Implement a Firestore infinite scrolling list which updates on collection changes

What am I trying to accomplish?
I am currently facing a bunch of problems implementing a real-time updated infinite scrolling list with the Firestore backend.
In my application I want to display comments to the user (like in e.g. YouTube or other social media sites). Since the number of comments in a collection might be quite big, I see an option to paginate the collection while receiving real-time updates based on snapshots. So I initially load x comments, with the option to load up to x more items whenever the user presses a button. In the image accompanying the original post, x = 3.
The standard solution
Based on other SO questions I figured out that one is supposed to use the .limit() and the .startAfter() methods to implement such behaviour.
So the first page is loaded as:
query = this
    .collection
    .orderBy('date', descending: true)
    .limit(pageSize);
query.snapshots().map((QuerySnapshot snap) {
  lastVisible = snap.documents.last;
  // convert the DocumentSnapshot into a model object
});
All additional pages are loaded with the following code:
query = this.collection
    .orderBy('date', descending: true)
    .startAfterDocument(lastVisible)
    .limit(pageSize);
Furthermore, I'd like to add that this code is located in a repository class which is used with the BLoC pattern similar to the code shown in Felix Angelov's Flutter Todos Tutorial.
While Felix uses a simple flutter list to show the items, I have a list of pages showing comments based on the data provided by their BLoCs. Note that each BLoC accesses a shared repository (parts of the repository code is shown below).
The Problem with the standard solution
With the code shown above I see multiple problems:
If a comment is inserted in the middle of the ordered collection (how is not of importance), the added comment is shown because of the Stream provided by the snapshot. However, another comment that already existed is no longer shown because of the .limit() operator in the query. One could increase the limit by one, but I'm not sure how to edit a snapshot query. If editing a snapshot query is not possible, one could create a new (and bigger) query, but that would cost additional reads.
Similar to the first problem, if a comment in the middle is deleted, the snapshot will return a list which no longer contains the deleted comment; however, another comment (which is already covered by a different page) appears. E.g., in the scenario shown in the image above, 5 comments are loaded. Assuming comment 3 is deleted, comment 2 will show twice.
Improving the standard solution
Based on the two problems discussed above, I decided that this solution is not sufficient, so I implemented one which first loads x items to obtain two "interval" documents. Then a query is created which fetches the required items in the interval using .startAtDocument() and .endAtDocument(), eliminating the .limit() operator.
DocumentSnapshot pageStartDocument;
DocumentSnapshot pageEndDocument;

Future<Stream<List<Comment>>> comments() async {
  // This fetches the first and last document as initialization
  // (maybe this should be implemented in the constructor)
  if (pageStartDocument == null) {
    Query query = collection
        .orderBy('date', descending: true)
        .limit(pageSize);
    QuerySnapshot snap = await query.getDocuments();
    pageStartDocument = snap.documents.first;
    pageEndDocument = snap.documents.last;
  } else {
    Query query = collection
        .orderBy('date', descending: true)
        .startAfterDocument(pageEndDocument)
        .limit(pageSize);
    QuerySnapshot snap = await query.getDocuments();
    pageStartDocument = snap.documents.first;
    pageEndDocument = snap.documents.last;
  }

  // This fetches the elements of the collection between the two
  // interval documents, with the tradeoff of doubling the reads
  Query query = this
      .collection
      .orderBy('date', descending: true)
      .startAtDocument(pageStartDocument)
      .endAtDocument(pageEndDocument);

  return query.snapshots().asyncMap((QuerySnapshot snap) async {
    // convert the QuerySnapshot into model objects
  });
}
As noted in the code comments, this solution has the following drawback:
Since a query is required to obtain the pageStartDocument and pageEndDocument, the number of reads is doubled, because all the data is read again when the second query is created. The performance impact might be negligible, since I believe the data is cached, but having 2x the database read cost can be significant.
Question:
Since I am not only implementing pagination but also real-time updates (with collection insertions), the .limit() operator does not seem to work for my case.
How does one implement a pagination with real time updates (without double reads)?
Side Notes:
I watched Todd Kerpelman devour a massive gummy bear while explaining pagination, but in the video it seems to be not so trivial (and a point was made that a tradeoff might be necessary).
If further code from my side is required please say so in the comments.
For the scenario of comments, it does not really make sense for an item to be inserted into the middle of the (sorted) collection. However, I would like to understand how it should be implemented if the scenario requires such a feature.
This may come as a very late answer, and the OP probably won't need help anymore, but for anyone who stumbles on this, I wrote a tutorial with a solution that partly solves this:
the BLoC keeps a list of stream subscriptions to keep track of real-time updates to the list.
However, concerning the insertion problem: since the paginated streams are based on a document cursor, upon an insertion or deletion you necessarily need to reset your pagination stream subscriptions, unless it is the last page.
Hence my solution was to update the list when modifications occur, but to reset it when insertions or deletions occur.
Here is the link to the tutorial :
https://link.medium.com/2SPf2Qsbsgb
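A minimal sketch of that reset strategy, written with the Firebase JS SDK for brevity (the tutorial itself targets Flutter/BLoC); the comments collection name, the renderPage callback, and the page bookkeeping are my assumptions:
const db = firebase.firestore();
const pageSize = 20;
let loadedPages = 0;       // how many pages the user has opened so far
const unsubscribers = [];  // one snapshot listener per page
const pageCursors = [];    // last document of each page

function loadNextPage() {
  listenToPage(loadedPages);
  loadedPages += 1;
}

function listenToPage(pageIndex) {
  // Pages after the first start after the previous page's last document.
  let query = db.collection('comments')
    .orderBy('date', 'desc')
    .limit(pageSize);
  if (pageIndex > 0) {
    query = query.startAfter(pageCursors[pageIndex - 1]);
  }
  unsubscribers[pageIndex] = query.onSnapshot((snap) => {
    if (snap.empty) return;
    pageCursors[pageIndex] = snap.docs[snap.docs.length - 1];
    renderPage(pageIndex, snap.docs); // hypothetical UI callback
    // Insertions/deletions move this page's boundary and invalidate the
    // cursors of every later page; plain modifications do not.
    const boundaryMoved = snap.docChanges()
      .some((c) => c.type === 'added' || c.type === 'removed');
    if (boundaryMoved) resetPagesAfter(pageIndex);
  });
}

function resetPagesAfter(pageIndex) {
  // Cancel every stale listener past this page, then re-subscribe only the
  // next one: its first snapshot (all changes 'added') re-establishes its
  // cursor and cascades the reset to the page after it, and so on.
  for (let i = pageIndex + 1; i < loadedPages; i++) {
    if (unsubscribers[i]) { unsubscribers[i](); unsubscribers[i] = null; }
  }
  if (pageIndex + 1 < loadedPages) listenToPage(pageIndex + 1);
}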

MeteorJS + MongoDB: How should I set up my collections when users can have the same document?

I wasn't quite sure how to word my question in one line, but here's a more in depth description.
I'm building a Meteor app where users can "own" the same document. For example, a user has a list of movies they own, and of course multiple people can own the same movie. There are several ways I've thought of structuring my database/collections for this, but I'm not sure which would be best.
I should also note that the movie info comes from an external API, and I'm currently storing it in my own database as people find movies in my app, to speed up the next lookup.
Option 1 (My current config):
One collection (Movies) that stores all the movies and their info, and another collection whose documents each store a list of movie ids keyed by userId. On startup, I get the list of ids, find the movies in my database, and store them in local collections (there are 3 of them). The benefit I see is that I only have to store each movie once. The downside I've run into so far is the difficulty of keeping things in sync and properly loading on startup (waiting for the local collections to populate).
Option 2 :
A Movies collection that stores a list of movie objects for each user. This makes the initial lookup and updating very simple, but it means I'll be storing the same fairly large documents multiple times.
Option 3:
A Movies collection with an array of userids on each movie that own that movie. This sounds pretty good too, but when I update the movie with new info, will an upsert work and keep the userids safe?
Option 3 seems sensible. Some of the choice may depend on the scale of each collection or the number of links (will many users own the same movie, will users own many movies?).
Some helpful code snippets for using option 3:
Upsert a movie detail (does not affect any other fields on the document if it already exists):
Movies.upsert({name: "Jaws"}, {$set: {year: 1975}});
Set that a user owns a movie (also does not affect any other document fields. $addToSet will not add the value twice if it is already in the array while using $push instead would create duplicates):
Movies.update({_id: ~~some movie id~~}, {$addToSet: {userIds: ~~some user id~~}});
Set that a user no longer owns a movie:
Movies.update({_id: ~~some movie id~~}, {$pull: {userIds: ~~some user id~~}});
Find all movies that a user owns (mongo automatically searches the field's array value):
Movies.find({userIds: ~~some user id~~});
Find all movies that a user owns, but exclude the userIds field from the result (keeps the document small in case movie.userIds is a large array, and protects the privacy of other users' movie ownership). Note that Meteor expects the projection under the fields option:
Movies.find({userIds: ~~some user id~~}, {fields: {userIds: 0}});

MongoDB storing user-specific data on shared collection objects

I'm designing an application that processes RSS feeds using MongoDB. Currently my collections are as follows:
Entry
fields: content, feed_id, title, publish_date, url
Feed
fields: description, title, url
User
fields: email_address
subscriptions (embedded collection; fields: feed_id, tags)
A user can subscribe to feeds which are linked from the embedded subscription collection. From the subscriptions I can get a list of all the feeds a user should see and also the corresponding entries.
How should I store entry status information (isRead, isStarred, etc.) that is specific to a user? When a user views an entry I need to record isRead = 1. Two common queries I need to be able to perform are:
Find all entries for a specific feed where isRead = 0 or no status exists currently
For a specific user, mark all entries prior to a publish date with isRead = 1 (this could be hundreds or even thousands of records so it must be efficient)
Hmm, this is a tricky one!
It makes sense to me to store a record for entries that are unread, and delete them when they're read. I'm basing this on the assumption that there will be more read posts than unread for each individual user, so you might as well not have documents for all of those already-read entries sitting around in your DB forever. It also makes it easier to not have to worry about the 16MB document size limit if you're not having to drag around years of history with you everywhere.
For starred entries, I would simply add an array of Entry ObjectIds to User. No need to make these subscription-specific; it'll be much easier to pull a list of items a User has starred that way.
For unread entries, it's a little more complex. I'd still add it as an array, but to satisfy your requirement of being able to quickly mark as-read entries before a specific date, I would denormalize and save the publish-date alongside the Entry ObjectId, in a new 'UnreadEntry' document.
User
fields: email_address, starred_entries[]
subscriptions (embedded collection; fields: feed_id, tags, unread_entries[])
UnreadEntry
fields: id is Entry ObjectId, publish_date
You need to be conscious of the document limit, but 16MB is one hell of a lot of unread entries/feeds, so be realistic about whether that's a limit you really need to worry about. (If it is, it should be fairly straightforward to break out User.subscriptions to its own document.)
Both of your queries now become fairly easy to write:
All entries for a specific feed that are unread:
user.subscriptions.find(feedID).unread_entries
Mark all entries prior to a publish date read:
user.subscriptions.find(feedID).unread_entries.where(publish_date.lte => my_date).delete_all
And, of course, if you simply need to mark all entries in a feed as read, that's very easy:
user.subscriptions.find(feedID).unread_entries.delete_all
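For reference, here is roughly how those two operations could look in the raw MongoDB shell, assuming the embedded schema above (the userId, feedId, and myDate variables are placeholders, and the unread_entries element shape follows the UnreadEntry description):
// All unread entries for one of a user's feeds
// (positional projection returns just the matching subscription):
db.users.findOne(
  { _id: userId, "subscriptions.feed_id": feedId },
  { "subscriptions.$": 1 }
);

// For a specific user, mark all entries prior to a publish date as read,
// i.e. pull them out of the matched subscription's unread_entries array:
db.users.updateOne(
  { _id: userId, "subscriptions.feed_id": feedId },
  { $pull: { "subscriptions.$.unread_entries": { publish_date: { $lte: myDate } } } }
);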