Which Subject only emits items after subscribing? - rx-java2

In RxJava, I want a subject to only start emitting "new" items after the subscribe method is called. The closest I have found is PublishSubject, but the docs state the following:
PublishSubject emits to an observer only those items that are emitted by the source Observable(s) subsequent to the time of the subscription.
Note that a PublishSubject may begin emitting items immediately upon creation (unless you have taken steps to prevent this), and so there is a risk that one or more items may be lost between the time the Subject is created and the observer subscribes to it.
It isn't exactly clear what is meant by "unless you have taken steps to prevent this". Or is there a better subject type I can use?

You can use the publish operator:
it does not begin emitting items when it is subscribed to, but only when the Connect operator is applied to it. In this way you can prompt an Observable to begin emitting items at a time of your choosing.
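For illustration, here is a minimal sketch of publish()/connect() in RxJava 2; the interval source and the sleep are just example scaffolding. Nothing is emitted until connect() is called, so an observer registered beforehand misses nothing.
import io.reactivex.Observable;
import io.reactivex.observables.ConnectableObservable;
import java.util.concurrent.TimeUnit;

public class PublishConnectExample {
    public static void main(String[] args) throws InterruptedException {
        // publish() turns the source into a ConnectableObservable that stays idle...
        ConnectableObservable<Long> ticks =
                Observable.interval(100, TimeUnit.MILLISECONDS).publish();

        // ...so this observer is registered before any item is emitted.
        ticks.subscribe(i -> System.out.println("got " + i));

        // Emission starts only here; nothing was lost before this point.
        ticks.connect();

        Thread.sleep(500); // let a few items through before the JVM exits
    }
}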


How to subscribe to an entire collection and a document within it simultaneously without duplicating the amount of reads?

For learning purposes, I am writing a cross-platform to-do app with Flutter and Firestore. Currently, I have the following design, and I would like to know if there are better alternatives.
One of the main screens of the app shows a list of all tasks. It does this by subscribing to the corresponding Firestore collection, which we'll say is /tasks for simplicity.
FirebaseFirestore.instance.collection("tasks").snapshots()
Each tile in the ListView of tasks can be clicked. Clicking a tile opens a new screen (with Navigator.push) showing details about that specific task.
Importantly, this screen also needs to update in real-time, so it is not enough to just pass it the (local, immutable) task object from the main screen. Instead, this screen subscribes to the individual Firestore document corresponding to that task.
FirebaseFirestore.instance.collection("tasks").doc(taskId).snapshots()
This makes sense to me logically: the details page only needs to know about that specific document, so it subscribes only to that document to avoid receiving unnecessary updates.
The problem is that, since the collection-wide subscription for the main screen is still alive while the details screen is open, both listeners will trigger if the document /tasks/{taskId} gets updated. According to the answers in this, this and this question, this means I will get charged two (duplicate) reads for any single update to that document.
Furthermore, each task can have subtasks. This is reflected in Firestore as a tasks subcollection for each task. For example, a nested task could have the path /tasks/abc123/tasks/efg875/tasks/aay789. The main page could show all tasks regardless of nesting by using a collection group query on "tasks". The aforementioned details page also shows the task's subtasks by listening to the subcollection. This makes it possible to run complex queries on subtasks (filtering, ordering, etc.), but again the disadvantage is duplicate reads for every update to a subtask.
The alternative designs that occur to me are:
Only keep a single app-wide subscription to the entire set of tasks (be it a flat collection or a collection group query) and do any and all selection, filtering, etc. on the client. For example, the details page of a task would use the same collection-wide subscription and select the appropriate task out of the set every time. Any filtering and ordering of tasks/subtasks would be done on the client.
Advantages: no duplicate reads, minimizes the Firestore cost.
Disadvantages: might be more battery intensive for the client, and code would become more complex as I'd have to select the appropriate data out of the entire set of tasks in every situation.
Cancel the collection-wide subscription when opening the details page and re-start it when going back to the main screen. This means when the details page is open, only updates to that specific task will be received, and without being duplicated as two reads.
Advantages: no duplicate reads.
Disadvantages: re-starting the subscription when going back to the main screen means reading all of the documents in the first snapshot, i.e. one read per task, which might actually make the problem worse. Also, it could be quite complicated to code.
Does either of these designs seem best? Is there another, better alternative I'm missing?
Create a TaskService or something similar that handles listening to the FirebaseFirestore.instance.collection("tasks").snapshots() call, then subscribe in your app to updates from that service rather than from Firebase itself (you can create two Stream objects: one for global updates, one for specific updates).
Then you have only one subscription reading your Firebase collection; everything else is handled app-side.
Pseudo-code:
import 'dart:async';
import 'package:cloud_firestore/cloud_firestore.dart';

class TaskService {
  final List<Task> _tasks = [];
  final StreamController<List<Task>> _signalOnTasks = StreamController.broadcast();
  final StreamController<Task> _signalOnTask = StreamController.broadcast();

  List<Task> get allTasks => _tasks;
  Stream<List<Task>> get onTasks => _signalOnTasks.stream;
  Stream<Task> get onTask => _signalOnTask.stream;

  void init() {
    // The single app-wide subscription to the collection.
    FirebaseFirestore.instance.collection("tasks").snapshots().listen(_onData);
  }

  void _onData(QuerySnapshot snapshot) {
    // Get/update our tasks (maybe check for duplicates or whatever);
    // Task.fromSnapshot is a placeholder for your own mapping code.
    final tasks = snapshot.docs.map((doc) => Task.fromSnapshot(doc)).toList();
    _tasks
      ..clear()
      ..addAll(tasks);
    // Dispatch our signal streams.
    _signalOnTasks.add(tasks);
    for (final task in tasks) {
      _signalOnTask.add(task);
    }
  }
}
You can make TaskService an InheritedWidget to get access to it wherever you need it (or use the provider package), then add your listeners to whatever stream you're interested in. You just need to check in your listener to onTask that it's the correct task before doing anything with it.

KafkaStreams: forwarding only updated keys from Transformer

In my KafkaStreams app I have a registered local store (simple counters) that is updated in the transform method.
In the punctuate method I basically loop over the KV-store and push all the data to the output topic (even if a value hasn't been updated).
One idea is to store the update timestamp for every key and forward only records updated since the last punctuate call.
But I think there should be a more convenient solution for that.
How can I make this more performant and forward only the updated entries?
As indicated in the comments from Matthias, keeping track of the updated records is not supported at the moment.
Your approach of storing a timestamp in the value (or creating a "Value Wrapper" object that contains a timestamp you can modify) and checking whether an update has occurred since the last punctuate call is valid.
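For what it's worth, here is a rough sketch of that value-wrapper idea with a Transformer; the "counters" store name, the wrapper class, the Long-valued input, and the 30-second wall-clock punctuation are illustrative assumptions, and the store/serde/topology wiring is omitted.
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class DirtyOnlyTransformer implements Transformer<String, Long, KeyValue<String, Long>> {

    // Wrapper stored as the value: the counter plus the time it was last touched.
    public static class CounterWithTimestamp {
        public long count;
        public long lastUpdated;
    }

    private ProcessorContext context;
    private KeyValueStore<String, CounterWithTimestamp> store;
    private long lastPunctuate = 0L;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        this.store = (KeyValueStore<String, CounterWithTimestamp>) context.getStateStore("counters");
        context.schedule(Duration.ofSeconds(30), PunctuationType.WALL_CLOCK_TIME, this::punctuate);
    }

    @Override
    public KeyValue<String, Long> transform(String key, Long value) {
        CounterWithTimestamp counter = store.get(key);
        if (counter == null) {
            counter = new CounterWithTimestamp();
        }
        counter.count += value;
        counter.lastUpdated = System.currentTimeMillis(); // remember when this key was updated
        store.put(key, counter);
        return null; // nothing is emitted here; punctuate() does the forwarding
    }

    private void punctuate(long now) {
        try (KeyValueIterator<String, CounterWithTimestamp> it = store.all()) {
            while (it.hasNext()) {
                KeyValue<String, CounterWithTimestamp> entry = it.next();
                // Forward only the keys that were updated since the previous punctuation.
                if (entry.value.lastUpdated >= lastPunctuate) {
                    context.forward(entry.key, entry.value.count);
                }
            }
        }
        lastPunctuate = now;
    }

    @Override
    public void close() {
    }
}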
-Bill

CQRS and Passing Data

Suppose I have an aggregate containing some data, and when it reaches a certain state I'd like to take all that state and pass it to some outside service. For the sake of argument and simplicity, let's just say it is an aggregate that has a list, and when all items in that list are checked off, I'd like to send the entire state to some outside service. Now, when I'm handling the command for checking off the last item in the list, I'll know that I'm at the end, but it doesn't seem correct to send it to the outside system from the processing of the command. So given this scenario, what is the recommended approach if the outside system requires all of the state of the aggregate? Should the outside system build its own copy of the data based on the aggregate events, or is there some better approach?
Should the outside system build its own copy of the data based on the aggregate events?
Probably not -- it's almost never a good idea to share the responsibility of rehydrating an aggregate from its history. The service that owns the object should be responsible for rehydration.
The first key idea to understand is when, in the flow, the call to the outside service should happen.
First, the domain model processes the command arguments, computing the update to the event history, including the ChecklistCompleted event.
The application takes that history and saves it to the book of record.
The transaction completes successfully.
At this point, the application knows that the operation was successful, but the caller doesn't. So the usual answer is to be thinking of an asynchronous operation that will do the rest of the work.
Possibility one: the application takes the history that it just saved, and uses that history to schedule a task that rehydrates a read-only copy of the aggregate state and then sends that state to the external service.
Possibility two: you ditch the copy of the history that you have now, and fire off an asynchronous task that has enough information to load its own copy of the history from the book of record.
There are at least three ways that you might do this. First, you could have the command schedule the task as before.
Second, you could have an event handler listening for ChecklistCompleted events in the book of record, and have that handler schedule the task.
Third, you could read the ChecklistCompleted event from the book of record, and publish a representation of that event to a shared bus, and let the handler in the external service call you back for a copy of the state.
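For concreteness, here is a rough sketch of possibility two combined with the second scheduling option: an event handler notices ChecklistCompleted and fires off an asynchronous task that loads its own copy of the history and pushes the resulting state to the external service. The EventStore, ExternalService, and ChecklistState types below are hypothetical placeholders, not from any particular framework.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;

// All types below are made-up placeholders used only to illustrate the flow.
interface EventStore {
    List<Object> loadHistory(String aggregateId); // the "book of record"
}

interface ExternalService {
    void send(ChecklistState state);
}

record ChecklistCompleted(String checklistId) { }
record ChecklistState(String checklistId, List<String> items) { }

class ChecklistCompletedHandler {
    private final EventStore eventStore;
    private final ExternalService externalService;
    private final ExecutorService executor;

    ChecklistCompletedHandler(EventStore eventStore, ExternalService externalService, ExecutorService executor) {
        this.eventStore = eventStore;
        this.externalService = externalService;
        this.executor = executor;
    }

    // Invoked when a ChecklistCompleted event shows up in the book of record.
    void on(ChecklistCompleted event) {
        // The command-handling transaction has already committed; this runs asynchronously.
        executor.submit(() -> {
            List<Object> history = eventStore.loadHistory(event.checklistId());
            ChecklistState state = rehydrate(event.checklistId(), history);
            externalService.send(state); // push the read-only state to the outside service
        });
    }

    private ChecklistState rehydrate(String checklistId, List<Object> history) {
        // Fold the events into the representation the external service expects;
        // the real mapping depends on your actual event types.
        List<String> items = new ArrayList<>();
        for (Object pastEvent : history) {
            // ... inspect each event and update `items` accordingly
        }
        return new ChecklistState(checklistId, items);
    }
}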
I was under the impression that one bounded context should not reach out to get state from another bounded context but rather keep local copies of the data it needed.
From my experience, the key idea is that the services shouldn't block each other -- or more specifically, a call to service B should not block when service A is unavailable. Responding to events is fundamentally non blocking; does it really matter that we respond to an asynchronously delivered event by making an asynchronous blocking call?
What this buys you, however, is independent evolution of the two services - A broadcasts an event, B reacts to the event by calling A and asking for a representation of the aggregate that B understands, A -- being backwards compatible -- delivers the requested representation.
Compare this with requiring a new release of B every time the rehydration logic in A changes.
Udi Dahan raised a challenging idea - the notion that each piece of data belongs to a single technical authority. "Raw business data" should not be replicated between services.
A service is the technical authority for a specific business capability.
Any piece of data or rule must be owned by only one service.
So in Udi's approach, you'd start to investigate why B has any responsibility for data owned by A, and from there determine how to align that responsibility and the data into a single service. (Part of the trick: the physical view of a service can span process boundaries; in other words, a process may be composed from components that belong to more than one service).
Jeppe Cramon's series on microservices is nicely sourced, and touches on many of the points above.
You should never externalise your state. Reporting on that state is a function of the read side, as it produces reports and you'll need that data to call the service. The structure of your state is plastic, and you shouldn't have an external service that relies upon that structure, otherwise you'll have to update both in lockstep, which is a bad thing.
There is a blog that puts forward a strong argument that the process manager is the correct place to put this type of feature (calling an external service), because that's the appropriate place for orchestrating events.

Can I use Time as globally unique event version?

I found time to be the best value for an event version.
I can merge perfectly independent events of different event sources on different servers whenever needed, without worrying about read-side event order synchronization. I know which event (from server 1) happened before the other (from server 2) without needing a global sequential event id generator, which would make all read sides depend on it.
As long as time is a globally ever-increasing event version, different teams in companies can act as distributed event sources or event readers, and everyone can always rely on the contract.
The world's simplest notification from a write side to subscribed read sides, followed by a query pulling the recent changes from the underlying write side, can simplify everything.
Are there any side effects I'm not aware of?
Time is indeed increasing and you get a deterministic number; however, event versioning does not only serve the purpose of preventing conflicts. We always say that when we commit a new event to the event store, we send the new event version there as well, and it must match the expected version on the event store side, which must be the previous version plus exactly one. Whether there are a thousand or three million ticks between two events, I do not really care; that does not give me the information I need. But knowing whether I have missed an event along the way is critical. So I would not use anything other than an incremental counter, with events versioned per aggregate/stream.
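As a tiny illustration of that per-stream check (the class and method names below are made up, not from any particular event store): the append succeeds only when the expected version is exactly the current one, which is precisely the signal a wall-clock timestamp cannot give you.
import java.util.ArrayList;
import java.util.List;

// Minimal in-memory sketch of optimistic concurrency with an incremental, per-stream version.
class EventStream {
    private final List<Object> events = new ArrayList<>();

    // The current version is simply the number of committed events.
    synchronized long currentVersion() {
        return events.size();
    }

    synchronized void append(Object event, long expectedVersion) {
        if (expectedVersion != events.size()) {
            // Either we missed events or someone else wrote concurrently;
            // wall-clock "versions" cannot detect this reliably.
            throw new IllegalStateException(
                "expected version " + expectedVersion + " but stream is at " + events.size());
        }
        events.add(event); // the new event is now at version expectedVersion + 1
    }
}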

Akka Persistence: Where does the execution of the command go when it is not simply a state update

Just for clarification: where does the execution of a command go when the execution is not simply a state update (as in most examples found online)?
For instance, in my case,
The Command is FetchLastHistoryChangeSet, which consists of fetching the last history changeset from an external service based on where we left off last time, in other words the time of the newest change in the previously fetched history changeset.
The Event would be HistoryChangeSetFetched(changeSet, time). In line with what was said above, the time should be that of the newest change in the newly fetched history changeset (as per the command currently being handled).
Now in all the examples I see, it is always: (i) validate the command, then (ii) persist the event, and finally (iii) handle the event.
It is in handling the event that I have seen custom code added in addition to the updateState logic, usually after the updateState call. But this custom code is mostly about sending a message back to the sender, or broadcasting it to the event bus.
As per my example, it is clear that I need to do quite a few operations before I can actually call persist(HistoryChangeSetFetched(changeSet, time)). Indeed, I need the new changeset, and the time of its newest change.
The only way I see this being possible is to do the fetch while validating the command.
That is:
case FetchLastHistoryChangeSet =>
  // validateCommand fetches the new changeset (if any) and the time of its newest change
  validateCommand(FetchLastHistoryChangeSet).foreach { case (changeSet, time) =>
    persist(HistoryChangeSetFetched(changeSet, time))(updateState)
  }
Here validateCommand(FetchLastHistoryChangeSet) would read the time of the last changeset (the newest change in that changeset), fetch a new changeset based on it and, if one exists, get the time of its newest change and return the tuple.
My question is: is that how it is supposed to work? Can validating a command be something as complex as that, i.e. actually executing the command?
As it says in the documentation: "validation can mean anything, from simple inspection of a command message's fields up to a conversation with several external services"
So I think what you're trying to do is exactly right. Any interaction with an external service must be done at the command validation stage.