Firebase/cloud firestore: onSnapshot() vs on()

I have been using onSnapshot successfully to alert my code to changes in underlying data, as in:
```javascript
// Set up to listen for changes to the "figures" collection, that is,
// someone has created a new figure that we will want to list on the screen.
setFiguresListener: function () {
  // `figuresCR` is a collection reference defined elsewhere
  return this.figuresCR.onSnapshot((iFigs) => {
    iFigs.forEach((fSnap) => {
      const aFigure = figureConverter.fromFirestore(fSnap, null);
      const dbid = aFigure.guts.dbid; // ID of the "figure" in the database
      nos2.theFigures[dbid] = aFigure; // update the local copy of the data
    });
    nos2.ui.update();
    console.log(` Listener gets ${iFigs.size} figures`);
  });
},
```
But now I've read about on() in the docs, which explains:
[The on() function] Listens for data changes at a particular location.
This is the primary way to read data from a Database. Your callback
will be triggered for the initial data and again whenever the data
changes. Use off() to stop receiving updates. See Retrieve Data on
the Web for more details.
The syntax is a bit different, and on() seems to do much the same as onSnapshot().
So what is the real difference? Should we be using on() instead of onSnapshot()?

on() is an operation for reading from Firebase Realtime Database. That's a completely different database with different APIs than Firestore. They have essentially no overlap. There is no on() operation with Firestore.
If you're working with Firestore, ignore all the documentation about Realtime Database, and stick to using onSnapshot() for getting realtime updates.
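As a sketch of the Firestore side, assuming the v8-style namespaced JavaScript SDK used in the question: there is no off(). Instead, onSnapshot() returns a function that detaches the listener when called, and snapshot.docChanges() lists only the documents that were added, modified or removed since the previous snapshot. The helper name listenForFigureChanges and the plain-object store are hypothetical, and doc.data() stands in for the question's figureConverter.

```javascript
// Hypothetical helper: attach a Firestore listener that only processes the
// documents that changed, and hand back the function that stops listening.
// `collectionRef` is assumed to be a Firestore CollectionReference or Query;
// `store` is a plain object keyed by document ID (standing in for nos2.theFigures).
function listenForFigureChanges(collectionRef, store, onUpdate) {
  // onSnapshot() returns an unsubscribe function; Firestore has no off().
  return collectionRef.onSnapshot((snapshot) => {
    snapshot.docChanges().forEach((change) => {
      if (change.type === "removed") {
        delete store[change.doc.id];
      } else {
        // "added" or "modified": keep the latest data for this document
        store[change.doc.id] = change.doc.data();
      }
    });
    onUpdate(snapshot.size); // e.g. refresh the UI
  });
}
```

Calling the returned function (for example when the view is torn down) is the Firestore counterpart of the Realtime Database's off().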

Other tyros who fall into this tar pit: in the API reference pages, you might think that since Firestore is a database under Firebase, you could look for help under firebase.database. But no: look only in the next section, firebase.firestore.

Related

Get stream of data from DB | Sembast | Flutter

I'm working with Sembast nowadays and was wondering if there's any way to create a stream of data that could give me all the values inside the DB. My requirement is to set up a listener on that stream so that whenever a data change is triggered, I can do something with it.
Documentation on Sembast is pretty limited and I'm not sure how I can do this. Usually I use the .find method to fetch all the values from my db. I've been using a stringMapFactory to store my records.
Can we do this? Any help would be really appreciated.

Sorry for the poor documentation.
It is quite similar to Firestore. You can listen to all changes in a store:
```dart
// Track every change in the store
var query = store.query();
var subscription = query.onSnapshots(db).listen((snapshots) {
  // snapshots always contains the list of all records
  // ...
});
```
Basically you have a query on the store (with or without filter) that you can query or listen for changes.
See https://pub.dev/documentation/sembast/latest/sembast/QueryRef/onSnapshots.html
https://github.com/tekartik/sembast.dart/blob/master/sembast/doc/change_listener.md
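To make those semantics concrete, here is an illustration written in JavaScript rather than Dart: a tiny hypothetical in-memory store (this is not sembast's API) whose query listeners receive the complete list of matching records on every change, which is the behaviour the answer describes for query.onSnapshots(db). Emitting the current result immediately on subscribe is a choice made for this sketch.

```javascript
// Illustration only: a tiny in-memory store, NOT sembast. Each query listener
// always receives the full list of matching records, never just a diff.
class TinyStore {
  #records = new Map();
  #listeners = new Set();

  put(key, value) {
    this.#records.set(key, value);
    this.#notify();
  }

  delete(key) {
    if (this.#records.delete(key)) this.#notify();
  }

  // Listen to a query (optionally filtered). In this sketch the callback fires
  // immediately with the current result and again after every change.
  onSnapshots(filter, callback) {
    const listener = { filter, callback };
    this.#listeners.add(listener);
    callback(this.#query(filter));
    return () => this.#listeners.delete(listener); // stop listening
  }

  #query(filter) {
    return [...this.#records]
      .filter(([, value]) => !filter || filter(value))
      .map(([key, value]) => ({ key, value }));
  }

  #notify() {
    for (const { filter, callback } of this.#listeners) {
      callback(this.#query(filter));
    }
  }
}
```

Because every emission carries the whole result set, the listener can simply redraw from it instead of tracking individual inserts and deletes.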

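If you use Hive as your database, you can use a Hive Box as a listenable:
```dart
// Requires the hive_flutter package; YOUR_BOX_MODEL is a placeholder for your model type
ValueListenableBuilder<Box<YOUR_BOX_MODEL>>(
  valueListenable: box.listenable(),
  builder: (context, currentBox, child) {
    // Rebuilds whenever the box changes; return the widget that shows its data
    return Text('${currentBox.length} items');
  },
)
```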

Sometimes my Cloud Function returns old data from Firestore. Is it a cache problem?

Client-side, I'm using a listener to detect if the "notifications" collection of the user changes. The App calls a Cloud Function that retrieves the last three unread notifications and the total number of unread notifications.
In my App, I have this:
Listener
```dart
firestore
    .collection("users")
    .doc(uid)
    .collection("notifications")
    .snapshots()
    .listen((QuerySnapshot querySnapshot) async {
  // `await` requires the callback to be marked async
  NotificationsPreviewModel notificationsPreview =
      await _cloudFunctionsService.getNotificationsPreview(doctor.id);
});
```
Cloud Function
```javascript
// 1st-gen API as used in the question (newer SDK versions expose it under "firebase-functions/v1")
const functions = require("firebase-functions");
const admin = require("firebase-admin");

admin.initializeApp();
const db = admin.firestore();

exports.getNotificationsPreview = functions.https.onCall(async (data, context) => {
  const userId = data.userId;
  const notificationsDocuments = await db
    .collection("users")
    .doc(userId)
    .collection("notifications")
    .orderBy("dateTime", "desc")
    .get();
  // Keep only unread notifications, newest first
  const unread = notificationsDocuments.docs
    .map((rawNotification) => rawNotification.data())
    .filter((element) => element.unread === true);
  return { notifications: unread.slice(0, 3), notificationsNumber: unread.length };
});
```
The Cloud Function gets called only when a change is detected, so it shouldn't return old data.
The error appears only the first time the Cloud Function is called from the App's start, but not always. The following calls don't generate the error.
How can I solve this? For now, I've added a delay of 500ms, and it works perfectly, but it's not a real solution.

Based on your description, it sounds like you see some form of latency while collecting the data from Firestore. Retrieving data from the Cloud takes time, and a delay of 500ms is not excessive.
I am not familiar enough with Flutter to comment on your code. However, according to the documentation for Java:
By default, get() attempts to provide up-to-date data when possible by waiting for data from the server, but it may return cached data or fail if you are offline and the server cannot be reached. This behavior can be altered via the Source parameter.
Source:
By providing a Source value, these methods can be configured to fetch results only from the server, only from the local cache, or attempt to fetch results from the server and fall back to the cache (which is the default).
If you are online, get() checks the server for the latest data, which can take between 300ms and 1500ms depending on several factors. For example, where is your Firestore instance located in comparison to your Cloud Function and client? Try adjusting the delay and see if you can identify the timing.
There are also some soft limits you should be aware of as this might also impact your timings for how quickly you can retrieve the data. There is a maximum sustained write rate to a document of 1 per second. Sustaining a write rate above once per second increases latency and causes contention errors.

According to the documentation:
When you set a listener, Cloud Firestore sends your listener an initial snapshot of the data, and then another snapshot each time the document changes.
It seems that you are initially receiving the snapshot of the data, and then the following updates, as expected.
You can check some possible solutions to this in this post.
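One possible mitigation, assuming the stale results come from the listener firing on a local write that the server has not committed yet (Firestore's latency compensation) or on cached data at startup, is to call the Cloud Function only for snapshots whose metadata says they are server-confirmed. This sketch uses the JavaScript SDK's onSnapshot options and SnapshotMetadata fields (hasPendingWrites, fromCache); FlutterFire's snapshots() appears to expose equivalents (includeMetadataChanges, metadata.hasPendingWrites, metadata.isFromCache), but check its documentation. The helper name is hypothetical.

```javascript
// Hypothetical helper: invoke `onCommitted` only for snapshots that reflect
// data the server has already committed.
// `queryRef` is assumed to be a Firestore Query or CollectionReference.
function listenForCommittedChanges(queryRef, onCommitted) {
  // includeMetadataChanges: true makes Firestore fire again when only the
  // metadata flips (e.g. hasPendingWrites true -> false), even if data is unchanged.
  return queryRef.onSnapshot({ includeMetadataChanges: true }, (snapshot) => {
    const { hasPendingWrites, fromCache } = snapshot.metadata;
    if (hasPendingWrites || fromCache) {
      return; // local or cached view: the server may not have this state yet
    }
    onCommitted(snapshot); // e.g. call getNotificationsPreview here
  });
}
```

Two trade-offs of this choice: while offline the callback never fires, and after a reconnect it can fire again for data it has already seen, so the handler should tolerate repeats.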

How to persist aggregate/read model from "EventStore" in a database?

Trying to implement Event Sourcing and CQRS for the first time, but got stuck when it came to persisting the aggregates.
This is where I'm at now:
I've set up "EventStore" and a stream, "foos"
Connected to it from node-eventstore-client
I subscribe to events with a catch-up subscription
This is all working fine.
With the help of the eventAppeared event handler function I can build the aggregate, whenever events occur. This is great, but what do I do with it?
Let's say I build an aggregate that is a list of Foos:
```javascript
[
  {
    id: 'some aggregate uuidv5 made from barId and bazId',
    barId: 'qwe',
    bazId: 'rty',
    isActive: true,
    history: [
      {
        id: 'some event uuid',
        data: {
          isActive: true,
        },
        timestamp: 123456788,
        eventType: 'IsActiveUpdated'
      },
      {
        id: 'some event uuid',
        data: {
          barId: 'qwe',
          bazId: 'rty',
        },
        timestamp: 123456789,
        eventType: 'FooCreated'
      }
    ]
  }
]
```
To follow CQRS I will build the above aggregate within a Read Model, right? But how do I store this aggregate in a database?
I guess a NoSQL database should be fine for this, but I definitely need a db, since I will put a gRPC API in front of this and other read models/aggregates.
But how do I get from having built the aggregate to persisting it in the db?
I once tried following this tutorial https://blog.insiderattack.net/implementing-event-sourcing-and-cqrs-pattern-with-mongodb-66991e7b72be which was super simple, since you'd use MongoDB both as the event store and just create a view for the aggregate and update that one when new events come in. It had its flaws and limitations (the aggregation pipeline), which is why I have now turned to "EventStore" for the event store part.
But how to persist the aggregate, which is currently just built and stored in code/memory from events in "EventStore"...?
I feel this may be a silly question but do I have to loop over each item in the array and insert each item in the db table/collection or do you somehow have a way to dump the whole array/aggregate there at once?
What happens after? Do you create a materialized view per aggregate and query against that?
I'm open to picking the best db for this, whether that is postgres/other rdbms, mongodb, cassandra, redis, table storage etc.
Last question. For now I'm just using a single stream "foos", but at this level I expect new events to happen quite frequently (every couple of seconds or so) but as I understand it you'd still persist it and update it using materialized views right?
So given that barId and bazId in combination can be used for grouping events, instead of a single stream I'd think more specialized streams such as foos-barId-bazId would be the way to go, to try and reduce the frequency of incoming new events to a point where recreating materialized views will make sense.
Is there a general rule of thumb saying not to recreate/update/refresh materialized views if the update frequency gets below a certain limit? Is the only other alternative then querying from a normal table/collection?
Edit:
In the end I'm trying to make a gRPC api that has just 2 rpcs - one for getting a single foo by id and one for getting all foos (with optional field for filtering by status - but that is not so important). The simplified proto would look something like this:
```proto
syntax = "proto3";

import "google/protobuf/any.proto";
import "google/protobuf/timestamp.proto";

// Service name assumed; the rpcs are the two described above
service FooService {
  rpc GetFoo(FooRequest) returns (Foo);
  rpc GetFoos(FoosRequest) returns (FoosResponse);
}

message FooRequest {
  string id = 1; // uuid
}

// If the optional status field is not specified, return all foos
message FoosRequest {
  // If this field is specified, only return the Foos whose isActive is true or false
  FooStatus status = 1;
  enum FooStatus {
    UNKNOWN = 0;
    ACTIVE = 1;
    INACTIVE = 2;
  }
}

message FoosResponse {
  repeated Foo foos = 1;
}

message Foo {
  string id = 1;     // uuid
  string bar_id = 2; // uuid
  string baz_id = 3; // uuid
  bool is_active = 4;
  repeated Event history = 5;
  google.protobuf.Timestamp last_updated = 6;
}

message Event {
  string id = 1; // uuid
  google.protobuf.Any data = 2;
  google.protobuf.Timestamp timestamp = 3;
  string event_type = 4;
}
```
The incoming events would look something like this:
```javascript
{
  id: 'some event uuid',
  barId: 'qwe',
  bazId: 'rty',
  timestamp: 123456789,
  eventType: 'FooCreated'
}
{
  id: 'some event uuid',
  isActive: true,
  timestamp: 123456788,
  eventType: 'IsActiveUpdated'
}
```
As you can see, there is no uuid that would make it possible to call GetFoo(uuid) in the gRPC API, which is why I'll generate a uuidv5 from the barId and bazId, which, combined, will be a valid uuid. I'm doing that in the projection/aggregate you see above.
Also, the GetFoos rpc will either return all foos (if the status field is left unset) or return only the foos whose isActive matches the status field (if specified).
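The query side of those two rpcs can be sketched independently of the storage choice. This assumes the read model is available as plain documents with an isActive flag (for example, rows loaded from whatever database the projection writes to); the function names are hypothetical and the enum values mirror the FooStatus enum in the proto.

```javascript
// Mirrors FooStatus from the proto. 0 (UNKNOWN) is proto3's default value,
// i.e. what the server sees when the client leaves `status` unset.
const FooStatus = Object.freeze({ UNKNOWN: 0, ACTIVE: 1, INACTIVE: 2 });

// GetFoos: `foos` is assumed to be an iterable of read-model documents.
function getFoos(foos, status = FooStatus.UNKNOWN) {
  const all = [...foos];
  if (status === FooStatus.UNKNOWN) return all; // no filter requested
  const wantActive = status === FooStatus.ACTIVE;
  return all.filter((foo) => foo.isActive === wantActive);
}

// GetFoo: `foosById` is assumed to be a Map keyed by the generated uuidv5.
function getFoo(foosById, id) {
  return foosById.get(id) ?? null; // null when no such Foo exists
}
```

Treating UNKNOWN as "no filter" relies on proto3 not distinguishing an unset enum from its zero value, which is why the zero value should not double as a real status.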
Yet I can't figure out how to continue from the catchup subscription handler.
I have the events stored in "EventStore" (https://eventstore.com/), using a subscription with catchup, I have built an aggregate/projection with an array of Foo's in the form that I want them, but to be able to get a single Foo by id from a gRPC API of mine, I guess I'll need to store this entire aggregate/projection in a database of some sort, so I can connect and fetch the data from the gRPC API? And every time a new event comes in I'll need to add that event to the database also or how is this working?
I think I've read every resource I can possibly find on the internet, but still I'm missing some key pieces of information to figure this out.
The gRPC part is not so important. It could be REST, I guess, but my big question is how to make the aggregated/projected data available to the API service (possibly more APIs will need it as well)? I guess I will need to store the aggregated/projected data with the generated uuid and history fields in a database to be able to fetch it by uuid from the API service, but which database, and how is this storing done from the catch-up event handler where I build the aggregate?

I know exactly how you feel! This is basically what happened to me when I first tried to do CQRS and ES.
I think you have a couple of gaps in your knowledge which I'm sure you will rapidly plug. You hydrate an aggregate from the event stream as you are doing. That IS your aggregate persisted. The read model is something different. Let me explain...
Your read model is the thing you use to run queries against and to provide data for display to a UI, for example. Your aggregates are not (directly) involved in that. In fact, they should be encapsulated, meaning that you can't 'see' their state from the outside, i.e. no getters or setters, with the exception of the aggregate ID, which would have a getter.
This article gives you a helpful overview of how it all fits together: CQRS + Event Sourcing – Step by Step
The idea is that when an aggregate changes state it can only do so via an event it generates. You store that event in the event store. That event is also published so that read models can be updated.
Also looking at your aggregate it looks more like a typical read model object or DTO. An aggregate is interested in functionality, not properties. So you would expect to see void public functions for issuing commands to the aggregate. But not public properties like isActive or history.
I hope that makes sense.
EDIT:
Here are some more practical suggestions.
"To follow CQRS I will build the above aggregate within a Read Model, right? "
You do not build aggregates in the read model. They are separate things on separate sides of the CQRS equation. Aggregates are on the command side. Queries are done against read models, which are different from aggregates.
Aggregates have public void functions and no getters or setters (with the exception of the aggregate ID). They are encapsulated. They generate events when their state changes as a result of a command being issued. These events are stored in an event store and are used to recover the state of an aggregate. In other words, that is how an aggregate is stored.
The events then go on to be published, so that event handlers and other processes can react to them and update the read model and/or trigger new cascading commands.
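A minimal sketch of such an encapsulated aggregate, assuming the question's Foo and its IsActiveUpdated event (the class and method names are hypothetical): state lives in private fields, commands return nothing and record events, and rehydration folds previously stored events back onto a fresh instance. A repository would append whatever takeUncommittedEvents() returns to the event store and then publish it; that part is omitted here.

```javascript
// Hypothetical event-sourced aggregate for the question's "Foo".
// The only public surface: the id getter, commands, and uncommitted events.
class FooAggregate {
  #id;
  #isActive = false;
  #uncommitted = [];

  constructor(id) {
    this.#id = id;
  }

  get id() {
    return this.#id;
  }

  // Rehydrate by folding previously stored events onto a fresh instance.
  static fromHistory(id, events) {
    const foo = new FooAggregate(id);
    for (const event of events) foo.#apply(event);
    return foo;
  }

  // Command: returns nothing; records an event only if state actually changes.
  setActive(isActive) {
    if (isActive === this.#isActive) return; // nothing to record
    this.#record({ eventType: "IsActiveUpdated", data: { isActive } });
  }

  // Events produced by commands, waiting to be appended to the event store.
  takeUncommittedEvents() {
    const events = this.#uncommitted;
    this.#uncommitted = [];
    return events;
  }

  #record(event) {
    this.#apply(event);
    this.#uncommitted.push(event);
  }

  // The only place state changes, used both for new and for replayed events.
  #apply(event) {
    if (event.eventType === "IsActiveUpdated") {
      this.#isActive = event.data.isActive;
    }
    // "FooCreated" and other event types would be handled here as well.
  }
}
```

Note that isActive is used only to decide whether a command produces an event; it is never exposed, so anything a UI needs has to come from a read model built from the published events.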
"Last question. For now I'm just using a single stream "foos", but at this level I expect new events to happen quite frequently (every couple of seconds or so) but as I understand it you'd still persist it and update it using materialized views right?"
Every couple of seconds is very likely to be fine. I'm more concerned about the "persist and update using materialised views" part. I don't know what you mean by that, but it doesn't sound like you have the right idea. Views should be very simple read models, with no need for complex relations like you find in an RDBMS, and are therefore highly optimised for fast reads.

There can be a lot of confusion about all the terminology and jargon used in DDD, CQRS and ES. I think in this case the confusion lies in what you think an aggregate is. You mention that you would like to persist your aggregate as a read model. As @Codescribler mentioned, at the sink end of your event stream there isn't a concept of an aggregate. Concretely, in ES, commands are applied to aggregates in your domain by loading the previous events pertaining to that aggregate, rehydrating the aggregate by folding each previous event onto it, and then applying the command, which generates more events to be persisted in the event store.
Downstream, a subscribing process receives all the events in order and builds a read model based on the events and the data contained within them. The confusion here is that this read model, at this end, is not an aggregate per se. It might very well look exactly like your aggregate at the domain end, or it could be a read model that doesn't use all the events and/or all of the event data.
For example, you may choose to use every bit of information and build a read model that looks exactly like the aggregate hydrated up to the newest event (likely your source of confusion). You may instead have another process that builds a read model that only tallies a specific type of event. You might even subscribe to multiple streams and "join" them into one big read model.
As for how to store it, this is really up to you. It seems to me that you are taking the events and rebuilding your aggregate plus a history of events in an in-memory structure. This, of course, doesn't scale, which is why you want to store it at rest in a database. I wouldn't use the in-memory structure, since you would need to do a lot of state diffing when you flush to the database. You should modify the database directly in response to each individual event. Ideally, you also store the stream position transactionally along with that modification, so you don't process the same event again in the case of a failure.
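A minimal sketch of that per-event approach, under stated assumptions: an in-memory class stands in for the real database (a Mongo collection, a Postgres table, etc.), its transaction() method is a hypothetical stand-in for a real database transaction, and every event is assumed to carry a fooId (for example the uuidv5 from the question, or one derived from the stream name). The handler would be called from the catch-up subscription's eventAppeared callback with the event's position in the stream.

```javascript
// Stand-in for the read-model database: foo documents plus the checkpoint.
class InMemoryReadStore {
  constructor() {
    this.foos = new Map(); // read model: foo id -> document
    this.checkpoint = -1;  // position of the last event applied
  }

  // Run `fn` against a copy and swap it in only if it completes, so the
  // read-model change and the checkpoint land together or not at all.
  transaction(fn) {
    const draft = { foos: new Map(this.foos), checkpoint: this.checkpoint };
    fn(draft);
    this.foos = draft.foos;
    this.checkpoint = draft.checkpoint;
  }
}

// Apply one event directly to the read model (no full in-memory aggregate).
function project(store, event, position) {
  if (position <= store.checkpoint) return; // already applied: skip replays
  store.transaction((draft) => {
    switch (event.eventType) {
      case "FooCreated":
        draft.foos.set(event.fooId, {
          id: event.fooId,
          barId: event.barId,
          bazId: event.bazId,
          isActive: false, // assumed initial state
          history: [event],
        });
        break;
      case "IsActiveUpdated": {
        const current = draft.foos.get(event.fooId);
        if (current) {
          // Build new objects rather than mutating the stored document
          draft.foos.set(event.fooId, {
            ...current,
            isActive: event.isActive,
            history: [...current.history, event],
          });
        }
        break;
      }
    }
    draft.checkpoint = position; // stored in the same "transaction"
  });
}
```

On restart, the catch-up subscription would resume from the stored checkpoint, and any event redelivered at or before it is ignored.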
Hope this helps a bit.

Meteor - using snychronised non-persistent / in-memory MongoDB on the server

In a Meteor app, real-time reactive updates between all connected clients are achieved by writing to collections and publishing and subscribing to the right data. Normally, this also means database writes.
But what if I would like to sync particular data that does not need to be persistent, and I would like to save the overhead of writing to the database? Is it possible to use Minimongo or another in-memory cache on the server while still preserving DDP synchronisation to all clients?
Example
In my app I have multiple collapsed threads, and I want to show which users currently have a particular thread expanded:
Viewed by: Mike, Johny, Steven ...
I can store the information in the threads collection, or make a separate viewers collection and publish the information to the clients. But there is actually no point in making this information persistent and having the overhead of database writes.
I am confused by the collections documentation, which states:
OPTIONS
connection Object
The server connection that will manage this collection. Uses the default connection if not specified. Pass the return value of calling DDP.connect to specify a different server. Pass null to specify no connection.
and
... when you pass a name, here’s what happens:
...
On the client (and on the server if you specify a connection), a Minimongo instance is created.
But if I create a new collection and pass the options object with connection: null,
```javascript
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';
import { check } from 'meteor/check';
// PRESENTATION_BY_MAP_ID and nonEmptyString are defined elsewhere in the app

// Creates a new Mongo collection and exports it
export const Presentations = new Mongo.Collection('presentations', { connection: null });

/**
 * Publications
 */
if (Meteor.isServer) {
  // This code only runs on the server
  Meteor.publish(PRESENTATION_BY_MAP_ID, (mapId) => {
    check(mapId, nonEmptyString);
    return Presentations.find({ matchingMapId: mapId });
  });
}
```
no data is being published to the clients.

TLDR: it's not possible.
There is no magic in Meteor that allows data to be synced between clients without it transiting through the MongoDB database. The whole sync process through publications and subscriptions is triggered by MongoDB writes. Hence, if you don't write to the database, you cannot sync data between clients (using the native pub/sub system available in Meteor).

After countless hours of trying everything possible, I found a way to do what I wanted:
```javascript
// On the server, connection: null makes this a local (Minimongo) collection that is never written to MongoDB
export const Presentations = new Mongo.Collection('presentations', Meteor.isServer ? { connection: null } : {});
```
I checked MongoDB and no presentations collection is created. Also, on every server restart the collection is empty. There is a small downside on the client: even when collectionHandle.ready() is truthy, findOne() first returns undefined and the data is synced afterwards.
I don't know if this is the right/preferred way, but it is the only one that has worked for me so far. I tried leaving {connection: null} in the client code as well, but wasn't able to achieve any sync even though I implemented the added/changed/removed methods.
Sadly, I wasn't able to get any further help even in the meteor forum here and here

Meteor Pub / Sub behaviour

I'm currently implementing a realtime search function in my app and I've come across some behaviour which I'm confused about.
The background is: I have two subscriptions to the same MongoDB collection on my server, named posts.
The first subscription subscribes to the latest 50 posts and sends the data to the Minimongo collection Posts.
The second subscription subscribes to the posts matching whatever search the user enters, and sends this data to the Minimongo collection PostsSearch, as shown below.
```javascript
// client
Posts = new Mongo.Collection('posts');
PostsSearch = new Mongo.Collection('postsSearch');

// server
Meteor.publish('postsPub', function (options, search) {
  return Posts.find(search, options);
});

Meteor.publish('postsSearchPub', function (options, search) {
  var self = this;
  var subHandle = Posts.find(search, options).observeChanges({
    added: function (id, fields) {
      self.added("postsSearch", id, fields);
    }
  });
  self.ready();
});
```
My question is, we know from the docs:
If you pass a name when you create the collection, then you are
declaring a persistent collection — one that is stored on the server
and seen by all users. Client code and server code can both access the
same collection using the same API.
However this isn't the case with PostsSearch. When a user starts searching on the client, the functionality works perfectly as expected - the correct cursors are sent to the client.
However I do not see a postsSearch in my MongoDB database and likewise, PostsSearch isn't populated on any other client other than my own.
How is this happening? What is self.added("postsSearch", id, fields); doing that lets it send documents down the wire to the client but not to the MongoDB database?

According to this doc, self.added("postsSearch", id, fields); informs the client side that a document has been added to the postsSearch collection.
And according to Meteor.publish:
Alternatively, a publish function can directly control its published record set by calling the functions added (to add a new document to the published record set), ...
So I'm guessing that self.added does both of these operations: Adds a document to the published record set, and informs the client (that has subscribed to the current publication) of this addition.
Now if you see Meteor.subscribe:
When you subscribe to a record set, it tells the server to send records to the client. The client stores these records in local Minimongo collections, with the same name as the collection argument used in the publish handler's added, changed, and removed callbacks. Meteor will queue incoming records until you declare the Mongo.Collection on the client with the matching collection name.
This suggests 2 things:
You have to subscribe in order to receive the data from the server-side database.
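Some kind of client-side code must exist in order to create a client-only postsSearch collection (because, as you said, this collection doesn't exist in the server-side database).
The 2nd point is already covered by your client code. Per the quote above, the client collection must be declared with the same name that the publication passes to added, so that it receives the records:
```javascript
if (Meteor.isClient) {
  // Must match the name used in self.added("postsSearch", ...);
  // a null-named collection would be purely local and would not receive them
  PostsSearch = new Mongo.Collection('postsSearch');
}
```
In the above example, the postsSearch collection exists only on the client (as Minimongo) and not in the server-side MongoDB database.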
And regarding the 1st, being subscribed to postsSearchPub will automatically send data for the postsSearch collection to the client (even if said collection doesn't exist in the server-side database. This is because of the explicit call to self.added).
Something to check out: According to this doc, self.ready(); calls the onReady callback of the subscription. It would be useful to see what is there in this callback, perhaps the client-only postsSearch collection is defined there?

From the doc:
this.added(collection, id, fields)
Call inside the publish function.
Informs the subscriber that a document has been added to the record set.
This means that the line self.added("postsSearch", id, fields); emulates the fact that an insert has been done to the PostsSearch collection although it's obviously not the case.
Concerning the absence of MongoDB collection, it could be related to Meteor laziness which creates the MongoDB collection at first insert, not sure though.
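The question's publication only forwards added, so edits and deletions never reach PostsSearch, and the observer is never stopped. A more complete version of the same low-level pattern is sketched below, using Meteor's subscription methods (added, changed, removed, ready, onStop) and the handle returned by observeChanges. The helper name publishUnderName is hypothetical; it takes the publish function's `this` as a parameter so it can be exercised outside Meteor.

```javascript
// Forward a cursor's changes to one subscriber under a chosen collection name.
// `sub` is the publish function's `this`; `cursor` is e.g. Posts.find(search, options).
function publishUnderName(sub, cursor, collectionName) {
  const handle = cursor.observeChanges({
    added: (id, fields) => sub.added(collectionName, id, fields),
    changed: (id, fields) => sub.changed(collectionName, id, fields),
    removed: (id) => sub.removed(collectionName, id),
  });
  sub.ready(); // initial documents have been sent
  // Stop observing when the client unsubscribes, otherwise the observer leaks.
  sub.onStop(() => handle.stop());
}

// Usage on the server:
// Meteor.publish('postsSearchPub', function (options, search) {
//   publishUnderName(this, Posts.find(search, options), 'postsSearch');
// });
```

Nothing here writes to MongoDB: the documents exist only in this subscriber's merge box and in the client's Minimongo, which is consistent with the behaviour described in the question.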
