Implement a Firestore infinite scrolling list which updates on collection changes - google-cloud-firestore

What am I trying to accomplish?
I am currently facing a bunch of problems implementing a real time updated infinite scrolling list with the firestore backend.
In my application I want to display comments (like in e.g. YouTube or other social media sites) to the user. Since the number of comments in a collection might be quite big, I see an option to paginate the collection, while receiving real time updates based on snapshots. So I initially load x comments with the option to load up to x more items whenever the user presses a button. In the image below x = 3.
The standard solution
Based on other SO questions I figured out that one is supposed to use the .limit() and the .startAfter() methods to implement such behaviour.
So the first page is loaded as:
query = this
.collection
.orderBy('date', descending: true)
.limit(pageSize);
query.snapshots().map((QuerySnapshot snap) {
lastVisible = snap.documents.last;
// convert the DocumentSnapshot into model object
});
All additional pages are loaded with the following code:
query = this.collection
.orderBy('date', descending: true)
.startAfterDocument(lastVisible)
.limit(pageSize);
Furthermore, I'd like to add that this code is located in a repository class which is used with the BLoC pattern similar to the code shown in Felix Angelov's Flutter Todos Tutorial.
While Felix uses a simple flutter list to show the items, I have a list of pages showing comments based on the data provided by their BLoCs. Note that each BLoC accesses a shared repository (parts of the repository code is shown below).
The Problem with the standard solution
With the code shown above I see multiple problems:
1. If a comment is inserted in the middle of the ordered collection (how is not important), the added comment is shown because of the Stream provided by the snapshot. However, another comment that already existed is no longer shown because of the .limit() operator in the query. One could increase the limit by one, but I'm not sure how to edit a snapshot query. If editing a snapshot query is not possible, one could create a new (and bigger) query, but that would cost additional reads.
2. Similar to 1., if a comment in the middle is deleted, the snapshot returns a list which no longer contains the deleted comment; however, another comment (which is already covered by a different page) appears. E.g., in the scenario shown in the image above, 5 comments are loaded. Assuming that comment 3 is deleted, comment 2 will show twice.
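Both effects can be reproduced without Firestore. The sketch below is a plain in-memory simulation in Python (not Dart, and not the cloud_firestore API); `live_page` is a hypothetical stand-in for a snapshot query that is re-evaluated after every change, and comments are represented only by their date.

```python
def live_page(comments, page_size, start_after=None):
    """Re-evaluate a 'date descending, limit N' query on the current data."""
    ordered = sorted(comments, reverse=True)
    if start_after is not None:
        ordered = [c for c in ordered if c < start_after]
    return ordered[:page_size]

comments = [10, 20, 30, 40, 50]            # comment dates; higher = newer
page1 = live_page(comments, 3)             # [50, 40, 30]
cursor = page1[-1]                         # lastVisible, captured once
page2 = live_page(comments, 3, cursor)     # [20, 10]

# Problem 1: an insert in the middle pushes an existing comment off page 1,
# while page 2 is still anchored after the old cursor.
after_insert = comments + [45]
page1_after_insert = live_page(after_insert, 3)
page2_after_insert = live_page(after_insert, 3, cursor)
missing = set(after_insert) - set(page1_after_insert) - set(page2_after_insert)

# Problem 2: a delete pulls a comment that page 2 already shows into page 1.
after_delete = [c for c in comments if c != 40]
page1_after_delete = live_page(after_delete, 3)
page2_after_delete = live_page(after_delete, 3, cursor)
duplicated = set(page1_after_delete) & set(page2_after_delete)

print("no longer shown:", missing, "shown twice:", duplicated)
```

In this toy data the comment dated 30 disappears after the insert, and the comment dated 20 appears on both pages after the delete.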
Improving the standard solution
Based on these two problems discussed above, I decided that the solution is not sufficient and I implemented a solution which first loads x items by obtaining two "interval" documents. Then a query which fetches the required items in an interval using .startAtDocument() and .endAtDocument() is created, which eliminates the .limit() operator.
DocumentSnapshot pageStartDocument;
DocumentSnapshot pageEndDocument;
Future<Stream<List<Comment>>> comments() async {
// This fetches the first and next Document as initialization
// (maybe should be implemented in constructor)
if (pageStartDocument == null) {
Query query = collection
.orderBy('date', descending: true)
.limit(pageSize);
QuerySnapshot snap = await query.getDocuments();
pageStartDocument = snap.documents.first;
pageEndDocument = snap.documents.last;
} else {
Query query = collection
.orderBy('date', descending: true)
.startAfterDocument(pageEndDocument)
.limit(pageSize);
QuerySnapshot snap = await query.getDocuments();
pageStartDocument = snap.documents.first;
pageEndDocument = snap.documents.last;
}
// This fetches a subcollection of elements from the collection
// with the tradeoff of double the reads
Query query = this
.collection
.orderBy('date', descending: true)
.startAtDocument(pageStartDocument)
.endAtDocument(pageEndDocument);
return query.snapshots().asyncMap((QuerySnapshot snap) async {
// convert the QuerySnapshot into model objects
});
}
As commented in the code, this solution has the following drawback:
Since a query is required to obtain the pageStartDocument and pageEndDocument, the number of reads is doubled, because all the data is read again when the second query is created. The performance impact might be negligible because I believe the data is cached; however, paying 2x the database read cost can be significant.
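The behaviour of the interval approach can be illustrated with a small in-memory simulation, again in Python rather than Dart. `range_page` is a hypothetical stand-in for a query bounded by start and end cursors, and the read count simply follows the assumption made above that the listener's first snapshot reads the same documents again.

```python
def range_page(dates, newest, oldest):
    """Stand-in for orderBy('date', descending) bounded by two cursors."""
    return [d for d in sorted(dates, reverse=True) if oldest <= d <= newest]

PAGE_SIZE = 3
dates = [10, 20, 30, 40, 50, 60]            # comment dates; higher = newer

# Pages are pinned to their boundary documents instead of to a limit.
page1_bounds = (60, 40)
page2_bounds = (30, 10)

dates.append(45)                             # insert in the middle of page 1
dates.remove(20)                             # delete in the middle of page 2

page1 = range_page(dates, *page1_bounds)     # page 1 grows by one
page2 = range_page(dates, *page2_bounds)     # page 2 shrinks by one
overlap = set(page1) & set(page2)            # nothing is shown twice

# Caveat of this sketch: a date strictly between the two pages' bounds
# (here between 40 and 30) matches neither range.
dates.append(35)
orphaned = 35 not in (range_page(dates, *page1_bounds)
                      + range_page(dates, *page2_bounds))

# Reads to open one page: the cursor lookup plus the listener's first snapshot.
reads_per_page = PAGE_SIZE + PAGE_SIZE
```

The pages no longer steal items from each other, at the price of roughly twice the reads per page and a possible gap between adjacent intervals.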
Question:
Since I am not only implementing pagination but also real time updates (with collection insertions), the .limit() operator does not seem to work in my case.
How does one implement a pagination with real time updates (without double reads)?
Side Notes:
I watched how Todd Kerpelman devours a massive gummy bear while explaining pagination, but in the video it does not seem so trivial (and a point was made that a tradeoff might be necessary).
If further code from my side is required please say so in the comments.
For the scenario of comments it does not really make sense that an item is inserted into the middle of the (sorted) collection. However, I would like to understand how it should be implemented if the scenario requires such a feature.

This may come as a very late answer. The OP probably won't need help anymore, but for anyone who stumbles on this, I wrote a tutorial with a solution that partly solves it:
The BLoC keeps a list of stream subscriptions to keep track of realtime updates to the list.
Concerning the insertion problem, however: because the paginated streams are based on a document cursor, an insertion or deletion necessarily requires resetting the pagination stream subscriptions, unless it happens on the last page.
Hence my solution was to update the list when modifications occur, but reset it when insertions or deletions occur.
Here is the link to the tutorial:
https://link.medium.com/2SPf2Qsbsgb
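The "update on modification, reset on insertion or deletion" rule can be sketched as below. This is one reading of the answer, written in Python for illustration, and is not the tutorial's code; `apply_changes` and the tuple layout are hypothetical, while the change kinds mirror Firestore's added / modified / removed document change types.

```python
def apply_changes(items, changes):
    """Apply a batch of changes to a page; return (items, needs_reset).

    items:   dict mapping document id -> data for the page currently shown
    changes: list of (kind, doc_id, data) tuples
    """
    # An insert or delete shifts every later cursor, so signal a reset
    # instead of trying to patch the page in place.
    if any(kind in ("added", "removed") for kind, _, _ in changes):
        return items, True
    updated = dict(items)
    for _, doc_id, data in changes:
        updated[doc_id] = data
    return updated, False

page = {"c1": "first!", "c2": "nice video"}
edited, reset_after_edit = apply_changes(page, [("modified", "c2", "great video")])
_, reset_after_insert = apply_changes(page, [("added", "c3", "hello")])
```

When `needs_reset` is true, the caller would cancel its page subscriptions and reload from the first page.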

Related

cloud_firestore package: different behaviour, equivalent queries

I am running a Flutter mobile app that queries data points from Firestore. Until very recently, I have been running the following query:
return firestore
.collection('organisations/$organisationId/alerts/$alertId/deviceTrails/$deviceTrailId/markers')
.where('deviceCreatedUtc', isGreaterThanOrEqualTo: timestamp)
.snapshots().handleError(handleFirestoreError);
I found, while running this query, that it would work well and provide a certain number of snapshots, but that it would stop generating snapshots without throwing any errors after a period of a few minutes. Changing the query to the following seemed to resolve this error (snapshots became more reliable):
return firestore
.collection('organisations/$organisationId/alerts/$alertId/deviceTrails/$deviceTrailId/markers')
.orderBy('deviceCreatedUtc')
.startAt([timestamp])
.snapshots().handleError(handleFirestoreError);
Other than the ordering (which is not strictly necessary in my case, since I am adding the points to my on-device database instead of using them directly), there does not appear to be much in the way of functional difference between these queries. But the former fails silently, while the latter is more reliable.
Is there any reason why this would happen? And is one of the queries intrinsically more efficient than the other?
As mentioned by Frank van Puffelen, the two snippets should do exactly the same thing (apart from the ordering, as you said). You can file a bug on the repo.
For more information, you can refer to the documentation on listening to a document with the onSnapshot() method and the documentation on working with lists of data in Flutter with Firebase.
index.js
const query = db.collection('cities').where('state', '==', 'CA');
const observer = query.onSnapshot(querySnapshot => {
console.log(`Received query snapshot of size ${querySnapshot.size}`);
// ...
}, err => {
console.log(`Encountered error: ${err}`);
});
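The claim that the two Dart queries from the question select the same documents can be checked on in-memory data. This is only a Python model of the query semantics (a filter versus an ordered scan from a start value), with made-up marker data; it says nothing about the listener behaviour the question reports.

```python
markers = [
    {"id": "m3", "deviceCreatedUtc": 300},
    {"id": "m1", "deviceCreatedUtc": 100},
    {"id": "m4", "deviceCreatedUtc": 400},
    {"id": "m2", "deviceCreatedUtc": 200},
]

def where_gte(docs, field, value):
    """Model of .where(field, isGreaterThanOrEqualTo: value)."""
    return [d for d in docs if d[field] >= value]

def order_by_start_at(docs, field, value):
    """Model of .orderBy(field).startAt([value])."""
    return [d for d in sorted(docs, key=lambda d: d[field]) if d[field] >= value]

filtered = where_gte(markers, "deviceCreatedUtc", 200)
ranged = order_by_start_at(markers, "deviceCreatedUtc", 200)

# Same documents; only the second form guarantees the order in this model.
same_documents = {d["id"] for d in filtered} == {d["id"] for d in ranged}
```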

query modified documents and load other from cache in firestore

To reduce the number of reads, a general technique is to maintain a timestamp of the last edit in each document and compare timestamps to load only modified documents.
Here is an example from firebase docs:
db.collection('groups')
.where('participants', 'array-contains', 'user123')
.where('lastUpdated', '>', lastFetchTimestamp)
.orderBy('lastUpdated', 'desc')
.limit(25)
They claim this would reduce the reads.
I tried implementing the use-case, I have a document as shown below:
I have sections in my app where I use scorecards to list top scorers. My query is as follows:
private void loadFriendScores(UserScorecard scorecard) {
Query friendScoreQuery=scorecardRef.whereIn("uid", scorecard.getFriendsList())
.whereGreaterThan("lastActive", scorecard.getLastActive()).limit(5);
FirestoreRecyclerOptions<UserScorecard> friends = new FirestoreRecyclerOptions
.Builder<UserScorecard>()
.setQuery(friendScoreQuery, UserScorecard.class)
.setLifecycleOwner(getViewLifecycleOwner())
.build();
TopScoresAdapter friendsAdapter = new TopScoresAdapter(friends, getContext(), this);
binding.topScorersFriendsRcv.setAdapter(friendsAdapter);
binding.topScorersFriendsRcv.setLayoutManager(new LinearLayoutManager(getContext()));
}
I assumed the query to load all modified changes along with others (from cache):
The screen on android is as follows:
While I expected it to load all of my friendlist (as I understood from docs).
I suppose they did not mention that you also need to fetch the cached list; there is a way to make a cache-only request in Firestore.
But I'm not sure this is reliable: the cache may have been cleared, in which case the cached request would come back empty.
In that case, you should save the last response yourself using a local storage library such as
@react-native-async-storage/async-storage
I'm struggling with the cost issue myself. The reads are way higher than 50 and I'm not sure how to count them properly, so I upvoted the question.
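The missing step, combining the locally stored list with the documents returned by the "modified since" query, can be sketched as follows. This is an illustrative Python model, not Android or React Native code; `merge_delta` and `next_fetch_timestamp` are hypothetical helpers, and the field names follow the question's scorecard example.

```python
def merge_delta(cached, modified):
    """Overlay freshly fetched documents on the locally stored ones."""
    merged = dict(cached)
    for doc in modified:
        merged[doc["uid"]] = doc
    return merged

def next_fetch_timestamp(previous, modified):
    """Newest lastActive seen so far; used as the bound of the next query."""
    return max([previous] + [doc["lastActive"] for doc in modified])

# Locally persisted result of an earlier full load.
cached = {
    "ann": {"uid": "ann", "score": 10, "lastActive": 100},
    "bob": {"uid": "bob", "score": 7, "lastActive": 90},
}
# Result of the 'lastActive > last fetch' query: only changed friends.
modified = [{"uid": "bob", "score": 12, "lastActive": 150}]

friends = merge_delta(cached, modified)
top_scorers = sorted(friends.values(), key=lambda d: d["score"], reverse=True)
last_fetch = next_fetch_timestamp(100, modified)
```

The query alone returns only the changed documents, which matches what the question observed; the full list comes from merging it with the stored copy.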

Firestore pagination by offset

I would like to create two queries with pagination. With the first one I would like to get the first ten records, and with the second one all the remaining records:
.startAt(0)
.limit(10)
.startAt(9)
.limit(null)
Can anyone confirm that the above code is correct for both cases?
Firestore does not support index or offset based pagination. Your query will not work with these values.
Please read the documentation on pagination carefully. Pagination requires that you provide a document reference (or field values in that document) that defines the next page to query. This means that your pagination will typically start at the beginning of the query results, then progress through them using the last document you see in the prior page.
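Cursor-based paging as described here can be modelled on an in-memory list. The sketch is in Python and does not call Firestore; `fetch_page` is a hypothetical stand-in for an ordered query with a limit and an optional start-after cursor, and the document id is used as a tie-breaker for equal dates.

```python
def fetch_page(docs, page_size, start_after=None):
    """Return the next page after the cursor document, in (date, id) order."""
    ordered = sorted(docs, key=lambda d: (d["date"], d["id"]))
    if start_after is not None:
        cursor = (start_after["date"], start_after["id"])
        ordered = [d for d in ordered if (d["date"], d["id"]) > cursor]
    return ordered[:page_size]

# 25 documents, two per date, to show that ties do not break the cursor.
docs = [{"id": f"doc{i:02d}", "date": i // 2} for i in range(25)]

pages = []
cursor = None
while True:
    page = fetch_page(docs, 10, cursor)
    if not page:
        break
    pages.append(page)
    cursor = page[-1]          # the last document seen defines the next page

page_sizes = [len(p) for p in pages]
seen_ids = [d["id"] for p in pages for d in p]
```

There is no numeric offset anywhere: each request is defined only by the last document of the previous page.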
From CollectionReference:
offset(offset) → {Query}
Specifies the offset of the returned results.
As Doug mentioned, Firestore does not support index/offset pagination, BUT you can get similar effects by combining what it does support.
Firestore has its own internal sort order (usually the document ID), but any query can be sorted with .orderBy(), and the first document will be relative to that sorting. Only an orderBy() query has a real concept of a "0" position.
Firestore also allows you to limit the number of documents returned with .limit(n).
.endAt(), .endBefore(), .startAt(), .startAfter() all need either values for the fields used in the orderBy, or a DocumentSnapshot, NOT an index.
What I would do is create a Query:
const MyOrderedQuery = FirebaseInstance.collection().orderBy()
Then first execute
MyOrderedQuery.limit(n).get()
or
MyOrderedQuery.limit(n).onSnapshot()
Either way you end up with a QuerySnapshot, which contains an array of DocumentSnapshots. Let's save that array:
let ArrayOfDocumentSnapshots = QuerySnapshot.docs;
Warning, Will Robinson! JavaScript assignment is usually by reference,
and even with the spread operator copies are pretty shallow - make sure your code actually
copies the full deep structure or that the reference is kept around!
Then to get the "rest" of the documents as you ask above, I would do:
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).get()
or
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).onSnapshot()
which will start AFTER the last returned document snapshot of the FIRST query. Note the re-use of the MyOrderedQuery
You can get something like a "pagination" by saving the ordered Query as above, then repeatedly use the returned Snapshot and the original query
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).limit(n).get() // page forward
MyOrderedQuery.endBefore(ArrayOfDocumentSnapshots[0]).limitToLast(n).get() // page back
This does make your state management more complex - you have to hold onto the ordered Query, and the last returned QuerySnapshot - but hey, now you're paginating.
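One detail of paging back is easy to get wrong: cutting the range off before the cursor and then taking the first n results returns the start of the whole list, not the page just before the cursor. The Python model below shows the difference on a list of integers; the three helper functions are hypothetical stand-ins for the query combinations, and taking the last n corresponds, to my knowledge, to limitToLast(n) in the JavaScript SDK.

```python
ORDERED = list(range(1, 31))               # 30 documents in query order

def start_after_limit(cursor, n):
    """Model of startAfter(cursor).limit(n): page forward."""
    return [x for x in ORDERED if x > cursor][:n]

def end_before_limit(cursor, n):
    """Model of endBefore(cursor).limit(n): first n of everything before."""
    return [x for x in ORDERED if x < cursor][:n]

def end_before_limit_to_last(cursor, n):
    """Model of endBefore(cursor) keeping the last n: the previous page."""
    return [x for x in ORDERED if x < cursor][-n:]

page1 = ORDERED[:10]
page2 = start_after_limit(page1[-1], 10)
page3 = start_after_limit(page2[-1], 10)

# Going back from page 3 should land on page 2.
back_with_first_n = end_before_limit(page3[0], 10)
back_with_last_n = end_before_limit_to_last(page3[0], 10)
```

Here `back_with_first_n` equals page 1 while `back_with_last_n` equals page 2, which is the page a "back" button is expected to show.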
BIG NOTE
This is not terribly efficient - setting up a listener is fairly "expensive" for Firestore, so you don't want to do it often. Depending on your document size(s), you may want to "listen" to larger sections of your collections and handle more of the paging locally (Redux or whatever). The Firestore documentation indicates you want your listeners to stay around for at least 30 seconds for efficiency. For some applications, even pages of 10 can be efficient; for others you may need 500 or more stored locally and paged in smaller chunks.

Contention-friendly database architecture for large documents and inner arrays

Context
I have a database with a collection of documents using this schema (shortened schema because some data is irrelevant to my problem):
{
title: string;
order: number;
...
...
...
modificationsHistory: HistoryEntry[];
items: ListRow[];
finalItems: ListRow[];
...
...
...
}
These documents can easily reach 100 or 200 kB, depending on the amount of items and finalItems that they hold. It's also very important that they are updated as fast as possible, with the smallest bandwidth usage possible.
This is inside a web application context, using Angular 9 and @angular/fire 6.0.0.
Problems
When the end user edits one item inside the object's items array, even just a single property, reflecting that in the database requires me to send the entire object. Firestore's update method doesn't support array indexes inside the field path; the only operations that can be done on arrays are adding or deleting an element, as described in the documentation.
However, updating an element of the items array by sending the entire document performs poorly for anyone without a good connection, which is the case for a lot of my users.
The second issue is that having everything in realtime inside one document makes collaboration hard in my case, because some of these elements can be edited by multiple users at the same time, which creates two issues:
- Some write operations may fail due to too much contention on the document if two updates are made in the same second.
- The updates are not atomic: we send the entire document at once, without transactions, to avoid using even more bandwidth.
Solutions I already tried
Subcollections
Description
This was a very simple solution: create a subcollection for items, finalItems and modificationsHistory arrays, making them easy to edit as they now have their own ID so it's easy to reach them to update them.
Why it didn't work
Having a list with 10 finalItems, 30 items and 50 entries inside modificationsHistory means that I need a total of 4 listeners open for one element to be listened to entirely. Considering that a user can have many of these elements open at once, listening to several dozen documents creates an equally bad performance situation, probably even worse in a full use case.
It also means that if I want to update a big element with 100 items and I want to update half of them, it'll cost me one write operation per item, not to mention the amount of read operations needed to check permissions, etc, probably 3 per write so 150 read + 50 write just to update 50 items in an array.
Cloud Function to update the document
const {
applyPatch
} = require('fast-json-patch');
function applyOffsets(data, entries) {
entries.forEach(customEntry => {
const explodedPath = customEntry.path.split('/');
explodedPath.shift();
let pointer = data;
for (let fragment of explodedPath.slice(0, -1)) {
pointer = pointer[fragment];
}
pointer[explodedPath[explodedPath.length - 1]] += customEntry.offset;
});
return data;
}
exports.updateList = functions.runWith(runtimeOpts).https.onCall((data, context) => {
const listRef = firestore.collection('lists').doc(data.uid);
return firestore.runTransaction(transaction => {
return transaction.get(listRef).then(listDoc => {
const list = listDoc.data();
try {
const [standard, custom] = JSON.parse(data.diff).reduce((acc, entry) => {
if (entry.custom) {
acc[1].push(entry);
} else {
acc[0].push(entry);
}
return acc;
}, [
[],
[]
]);
applyPatch(list, standard);
applyOffsets(list, custom);
transaction.set(listRef, list);
} catch (e) {
console.log(data.diff);
}
});
});
});
Description
Using a diff library, I was making a diff between previous document and the new updated one, and sending this diff to a GCF that was operating the update using the transaction API.
Benefits of this approach being that since transaction happens inside GCF, it's super fast and doesn't consume too much bandwidth, plus the update only requires a diff to be sent, not the entire document anymore.
Why it didn't work
In reality, the cloud function was really slow and some updates took over 2 seconds. They could also fail due to contention without the Firestore connector knowing it, so there was no way to ensure data integrity in this case.
I will edit this question accordingly to add more solutions if I find other things to try.
Question
I feel like I'm missing something, like if firestore had something I just didn't know at all that could solve my use case, but I can't figure out what it is, maybe my previously tested solutions were badly implemented or I missed something important. What did I miss? Is it even possible to achieve what I want to do? I am open to data remodeling, query changes, anything, as it's mostly for learning purpose.
You should be able to reduce the bandwidth required to update your documents by using Maps instead of Arrays to store your data. This would allow you to send only the item that is being updated using its key.
I don't know how involved this would be for you to change, but it sounds like less work than the other options.
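The bandwidth difference between the two layouts can be estimated with a small Python model. It does not call Firestore; `apply_update` is a hypothetical helper that mimics how an update addressed by a dotted field path touches only one nested value, and the item keys and payload sizes are made up for illustration.

```python
import json

def apply_update(document, update):
    """Apply {'a.b.c': value} style updates to a nested dict in place."""
    for path, value in update.items():
        *parents, leaf = path.split(".")
        target = document
        for key in parents:
            target = target[key]
        target[leaf] = value
    return document

# The same 100 items stored as an array and as a map keyed by item id.
array_doc = {"items": [{"id": f"item{i}", "amount": i} for i in range(100)]}
map_doc = {"items": {f"item{i}": {"amount": i} for i in range(100)}}

# Array layout: changing one element means resending the whole array.
new_items = [dict(item) for item in array_doc["items"]]
new_items[42]["amount"] = 999
array_update = {"items": new_items}

# Map layout: the update names only the field that changed.
map_update = {"items.item42.amount": 999}
apply_update(map_doc, map_update)

array_bytes = len(json.dumps(array_update))
map_bytes = len(json.dumps(map_update))
print(f"array update: {array_bytes} bytes, map update: {map_bytes} bytes")
```

Maps have no inherent order, so if the order of items matters each entry would need its own position field.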
You said that it's not impossible for your documents to reach 200 kB individually. It would be good to keep in mind that Firestore limits document size to 1 MiB. If you plan on supporting documents beyond that, you will need to find a way to fragment the data.
Regarding your contention issues: you might consider a system that "locks" the document and prevents it from receiving updates while another user is attempting to save. You could use a simple message system built with websockets or Firebase Cloud Messaging (FCM) to do this. A client would subscribe to the document's channel, and publish when it is attempting an update. Other clients would then receive a notice that the document is being updated and have to wait before they can save their own changes.
Also, I don't know what the contents of modificationsHistory look like, but that sounds to me like the type of data that you might keep in a subcollection instead.
Of the solutions you tried, the subcollection seems like the most scalable to me. You could look into the possibility of not using onSnapshot listeners and instead create your own event system to notify clients of changes. I suppose it could work similar to the "locking" system I mentioned above. A client sends an event when it updates an item belonging to a document. Other clients subscribed to that document's channel will know to check the database for the newest version.
Your diff-approach appeared mostly sensible, details aside.
You should store items inline, but defer modificationsHistory into a sub collection. For the entire root document, record which elements of modificationsHistory have been merged yet (by timestamp should suffice), and all elements not merged yet, you have to re-apply individually on each client, querying with aforementioned timestamp.
Each entry in modificationsHistory should not describe a single diff, but whenever possible a set of diffs.
Apply changes from modificationsHistory collections onto items in batch, deferred via GCF. You may defer this arbitrarily far, and you may want to exclude modifications performed only in the last few seconds, to account for not established consistency in Firestore. There is no risk of contention, that way.
Cleanup from the modificationsHistory collection has to be deferred even further, until you can be sure that no client has still access to an older revision of the root document. Especially if you consider that the client is not strictly required to update the root document when the listener is triggered.
You may need to reconstruct the patch stack on the client side if modificationsHistory changes in unexpected ways due to eventual consistency constraints. E.g. if you have a total order in the set of patches, you need to re-apply the patch stack from base image if the collection unexpectedly suddenly contains "older" patches unknown to the client before.
All in all, you should be able to avoid frequent updates altogether and limit writes solely to inserts into the modificationsHistory sub-collection, with bandwidth requirements not exceeding the cost of fetching the entire document once, plus streaming the collection of not-yet-applied patches. No contention expected.
You can tweak for how long clients may ignore hard updates to the root document, and how many changes they may batch client-side before submitting a new diff. The latter is also a tradeoff with regard to how many documents another client has to fetch initially, with regard to max-documents-per-query limits.
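The scheme described in this answer, a root document plus a stream of not-yet-merged patches, can be sketched as follows. This is a simplified Python model under several assumptions: items are a flat dict, a patch is a set of key/value overwrites rather than a JSON diff, and `mergedUpTo`, `ts` and `changes` are hypothetical field names.

```python
def apply_patches(items, patches):
    """Apply patches in timestamp order; later patches win."""
    result = dict(items)
    for patch in sorted(patches, key=lambda p: p["ts"]):
        result.update(patch["changes"])
    return result

def client_view(root, history):
    """What a client shows: root items plus every patch not merged yet."""
    pending = [p for p in history if p["ts"] > root["mergedUpTo"]]
    return apply_patches(root["items"], pending)

def merge_batch(root, history, cutoff):
    """Deferred server-side merge of all patches up to the cutoff."""
    due = [p for p in history if root["mergedUpTo"] < p["ts"] <= cutoff]
    return {
        "items": apply_patches(root["items"], due),
        "mergedUpTo": max([root["mergedUpTo"]] + [p["ts"] for p in due]),
    }

root = {"items": {"a": 1, "b": 2}, "mergedUpTo": 0}
history = [                                  # arrives in arbitrary order
    {"ts": 2, "changes": {"b": 20}},
    {"ts": 1, "changes": {"a": 10, "b": 5}},
    {"ts": 3, "changes": {"c": 3}},
]

before = client_view(root, history)
merged_root = merge_batch(root, history, cutoff=2)
after = client_view(merged_root, history)
```

Clients only ever insert patches, and the view is the same whether a patch has already been folded into the root or is still pending.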
If you require other information which are likely to suffer from contention, like list of users currently having a specific document open, that should go into sub-collections as well.
Should the latency for seeing changes by other users eventually turn out to be unacceptable, you may opt for an additional, real-time capable data channel for distributing patches on a specific document: ActiveMQ or some other message broker operated on dedicated resources, running independently from Firestore.

Swift and Cloud Firestore Transactions - getDocuments?

Transactions in Cloud Firestore support getting a document using transaction.getDocument, but even though there is a .getDocuments method, there doesn’t seem to be a .getDocuments for getting multiple documents that works with a transaction.
I have a Yelp-like app using a Cloud Firestore database with the following structure:
- Places to rate are called spots.
- Each spot has a document in the spots collection (identified by a unique documentID).
- Each spot can have a reviews collection containing all reviews for that spot.
- Each review is identified by its own unique documentID, and each review document contains a rating of the spot.
Below is an image of my Cloud Firestore setup with some data.
I’ve tried to create a transaction getting data for all of the reviews in a spot, with the hope that I could then make an updated calculation of average review & save this back out to a property of the spot document. I've tried using:
let db = Firestore.firestore()
db.runTransaction({ (transaction, errorPointer) -> Any? in
let ref = db.collection("spots").document(self.documentID).collection("reviews")
guard let document = try? transaction.getDocuments(ref) else {
print("*** ERROR trying to get document for ref = \(ref)")
return nil
}
…
Xcode states:
Value of type ‘Transaction’ has no member ‘getDocuments’.
There is a getDocument, which one can use to get a single document (see https://firebase.google.com/docs/firestore/manage-data/transactions).
Is it possible to get a collection of documents in a transaction? I wanted to do this because each place I'm rating (spot) has an averageRating, and whenever there's a change to one of the ratings, I want to call a function that:
- starts a transaction (done)
- reads in all of the current reviews for that spot (can't get to work)
- calculates the new averageRating
- updates the spot with the new averageRating value.
I know Google's FriendlyEats uses a technique where each change is applied to the current average rating value, but I'd prefer to make a precise re-calculation with each change to keep numerical precision (even if it's a bit more expensive w/an additional query).
Thanks for advice.
No. Client libraries do not allow you to make queries inside of transactions. You can only request specific documents inside of a transaction. You could do something really hacky, like run the query outside of the transaction, then request every individual document inside the transaction, but I would not recommend that.
What might be better is to run this on the server side. Like, say, with a Cloud Function, which does allow you to run queries inside transactions. More importantly, you no longer have to trust the client to update the average review score for a restaurant; trusting the client with that is a Bad Thing.
That said, I still might recommend using a Cloud Function that does some of the same logic that Friendly Eats does, where you say something along the lines of New average = Old average + (new review - Old average) / (new total number of reviews). It'll make sure you're not performing excessive reads if your app gets really popular.
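The running-average update can be checked against a full recalculation. The sketch below is plain Python rather than Swift or Cloud Functions code, and `add_rating` is a hypothetical helper; it uses the standard incremental mean, where the average moves toward the new rating by the difference divided by the new count.

```python
import math

def add_rating(average, count, rating):
    """Fold one new rating into a running average without rereading reviews."""
    new_count = count + 1
    new_average = average + (rating - average) / new_count
    return new_average, new_count

ratings = [4, 5, 3, 5, 1, 4]

average, count = 0.0, 0
for rating in ratings:
    average, count = add_rating(average, count, rating)

# Full recalculation over all reviews, for comparison.
exact = sum(ratings) / len(ratings)
print(f"running: {average:.6f}, recomputed: {exact:.6f}")
```

Only the stored average and the review count are needed per update, so the cost stays constant however many reviews a spot has. The two values agree up to floating-point rounding.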