Can I use RefCount but also react to each subscribe call? - system.reactive

I am trying to create an observable that meets the following requirements:
1) When the first client subscribes then the observable needs to connect to some backend service, and push out an initial value
2) When successive clients subscribe then the observable should push out a new value
3) When the final client disposes then the observable should disconnect from the backend system.
4) The backend service also calls OnNext regularly with other messages
So far I have something like below. I can't work out how I can react to each subscribe call but only call the disposer on the final dispose.
var o = Observable.Create((IObserver<IModelBrokerEvent> observer) =>
{
observer.OnNext(newValue);
_backendThingy.Subscribe(observer.OnNext);
return Disposable.Create(() =>
{
_backendThingy.Unsubscribe(observer.OnNext);
});
});
_observable = Observable.Defer(() => o).Publish().RefCount();

There are several ways to do things similar to what you are describing, but without exact semantics I am limited to providing a few generic solutions...
Replay Subject
The simplest is as such:
var source = ...;
var n = 1;
var shared = source.Replay(n).RefCount();
The Replay operator ensures that each new subscription receives the latest n values from the source Observable. However, it does not re-invoke any subscription logic to the source Observable to achieve this. In effect, assuming the source stream has emitted values already, subsequent subscriptions will receive the previous n values synchronously upon subscription. RefCount does what you might think it should do: Connect the Replay upon the first subscription, and dispose of the connection upon the last unsubscribe.
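To make the Replay(1).RefCount() behaviour concrete, here is a minimal plain-Java sketch of the same idea: connect on the first subscription, hand the cached value to later subscribers, and disconnect on the last unsubscribe. The class RefCountedLatest and its methods are hypothetical stand-ins written for illustration, not part of any Rx library, and clearing the cache on disconnect is a choice made by this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical plain-Java stand-in for Replay(1).RefCount(); not an Rx API.
final class RefCountedLatest<T> {
    private final Runnable connect;     // run when the first subscriber arrives
    private final Runnable disconnect;  // run when the last subscriber leaves
    private final List<Consumer<T>> subscribers = new ArrayList<>();
    private T latest;
    private boolean hasLatest;

    RefCountedLatest(Runnable connect, Runnable disconnect) {
        this.connect = connect;
        this.disconnect = disconnect;
    }

    // Subscribes and returns a handle that unsubscribes when run.
    synchronized Runnable subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
        if (subscribers.size() == 1) {
            connect.run();                 // first subscriber: open the backend connection
        } else if (hasLatest) {
            subscriber.accept(latest);     // later subscribers: replay the cached value
        }
        return () -> unsubscribe(subscriber);
    }

    // Called by the backend for every new value.
    synchronized void publish(T value) {
        latest = value;
        hasLatest = true;
        for (Consumer<T> s : new ArrayList<>(subscribers)) {
            s.accept(value);
        }
    }

    private synchronized void unsubscribe(Consumer<T> subscriber) {
        if (subscribers.remove(subscriber) && subscribers.isEmpty()) {
            hasLatest = false;             // sketch choice: drop the cache on disconnect
            latest = null;
            disconnect.run();              // last subscriber gone: close the connection
        }
    }

    public static void main(String[] args) {
        RefCountedLatest<String> shared = new RefCountedLatest<>(
                () -> System.out.println("connect"),
                () -> System.out.println("disconnect"));
        Runnable first = shared.subscribe(v -> System.out.println("A got " + v));
        shared.publish("v1");
        Runnable second = shared.subscribe(v -> System.out.println("B got " + v));
        first.run();
        second.run();
    }
}
```

Note that the second subscriber receives the cached value without the connect callback running again, which is the point made above about Replay not re-invoking the subscription logic.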
Bidirectional Communication via Proxy
Replay solves the most common use case, in that the source stream is capable of keeping itself up-to-date relatively well. However, if the source stream is updated only periodically, and new subscriptions should constitute an immediate update, then you may want to get more fancy:
var service = ...;
var source = service.Publish().RefCount();
var forceUpdate = Observable.Defer(() => Observable.Start(service.PingForUpdate));
var shared = Observable.Merge(source, forceUpdate);
Where the subscription to the server constitutes a new connection, and the PingForUpdate method indicates to the service that its consumer would like to have a new value ASAP (which then forces the service to output said value).
Merging Periodic Updates with Initial Latest Value
Using the bidirectional communication method means that all consumers of this service will receive the latest value from the source upon any new subscription. However, we may want the latest value to go only to the new subscriber, while all other consumers keep receiving values on the periodic basis.
For this, we can change the code a bit.
var service = ...;
var source = service.Publish().RefCount();
var latest = Observable.Defer(() => service.GetLatestAsObservable());
var shared = Observable.Merge(source, latest);
Where the subscription to the server constitutes a new connection, and the GetLatestAsObservable method simply retrieves the latest value from the service asynchronously via an Observable.
However, with this method you must also choose how to handle race conditions, for example when you request the latest value but a newer value is yielded before the response to that request arrives.
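One possible way to handle that race, sketched in plain Java: if every value carries a monotonically increasing version, a late "latest" response can be dropped when a newer value has already been delivered. The version number is an assumption of this sketch (the service described above may not provide one), and StaleSnapshotFilter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical filter: assumes each value comes with an increasing version number.
final class StaleSnapshotFilter<T> {
    private final Consumer<T> downstream;
    private long lastVersion = Long.MIN_VALUE;

    StaleSnapshotFilter(Consumer<T> downstream) {
        this.downstream = downstream;
    }

    // Forwards the value only if its version is newer than anything seen so far.
    // Returns true if the value was forwarded, false if it was dropped as stale.
    synchronized boolean offer(long version, T value) {
        if (version <= lastVersion) {
            return false;
        }
        lastVersion = version;
        downstream.accept(value);
        return true;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        StaleSnapshotFilter<String> filter = new StaleSnapshotFilter<>(seen::add);
        filter.offer(2, "periodic update v2");   // arrives first
        filter.offer(1, "late snapshot v1");     // arrives later, dropped as stale
        System.out.println(seen);
    }
}
```

Both the periodic stream and the one-off latest request would feed the same filter for a given subscriber, so whichever arrives second with an older version is discarded.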

Related

Maintaining cold observable semantics with a hot observable

I have a requirement to read items from an external queue, and persist them to a JDBC store. The items must be processed one-by-one, and the next item must only be read from the external queue once the previous item has been successfully persisted. At any given time there may or may not be an item available to read, and if not the application must block until the next item is available.
In order to enforce the one-by-one semantics, I decided to use a cold Observable using the generate method:
return Observable.generate(emitter -> {
final Future<Message> receivedFuture = ...;
final Message message = receivedFuture.get();
emitter.onNext(message);
});
This seems to work as expected for the receiving side.
In order to persist the data to the database, I decided to make use of the Vertx JDBCPool library.
messageObservable
.flatMapSingle(message ->
jdbcPool.prepareQuery("...")
.rxExecute(Tuple.of(...)) // produces a hot observable
)
According to the Vertx docs, the JDBCPool RX methods all produce hot observables.
The problem here seems to be that the flatmap to the JDBCPool method causes the entire chain to become hot. This has the undesirable consequence that messages are read from the queue before the previous message was persisted.
In other words, instead of
Read message 1
Write message 1
Read message 2
Write message 2
I now get
Read message 1
Read message 2
Read message 3
Write message 1
Read message 4
Write message 2
The only solution I have at the moment is to do a very undesirable thing and put the JDBCPool query in its own chain:
messageObservable
.flatMapSingle(message ->
Single.just(
jdbcPool.prepareQuery("...")
.rxExecute(Tuple.of(...))
.blockingGet()
)
)
I want to know if there is a way to combine the one-by-one semantics of a cold observable stream with a hot observable operation, while keeping the chain intact.
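For reference, the ordering being asked for (read 1, write 1, read 2, write 2, ...) can be sketched in plain Java with CompletableFuture: the next read is only issued from the completion of the previous write. This is an illustration of the desired semantics under stated assumptions, not RxJava or Vert.x code; OneByOnePipeline, read and write are hypothetical names, and a null from read is used here to mean "no more items".

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper illustrating strict read-then-write ordering.
final class OneByOnePipeline {
    // Reads one item, waits for its asynchronous write to finish, then reads the next.
    // Stops when `read` returns null.
    static <T> CompletableFuture<Void> run(Supplier<T> read,
                                           Function<T, CompletableFuture<Void>> write) {
        T item = read.get();
        if (item == null) {
            return CompletableFuture.completedFuture(null);
        }
        // The next read happens only inside the continuation of the previous write.
        return write.apply(item).thenComposeAsync(done -> run(read, write));
    }

    public static void main(String[] args) {
        int[] next = {1};
        OneByOnePipeline.<Integer>run(
            () -> {
                if (next[0] > 3) {
                    return null;
                }
                System.out.println("Read message " + next[0]);
                return next[0]++;
            },
            n -> CompletableFuture.runAsync(() -> System.out.println("Write message " + n))
        ).join();
    }
}
```

The write itself still runs asynchronously; only the decision to read the next item is deferred until it completes.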

kafka asynchronous send not really asynchronous?

I am using KafkaProducer from the kafka-client 1.0.0 library. According to the documentation, the method Future<RecordMetadata> send(ProducerRecord<K, V> record) returns immediately, but in practice it does not seem to. This method calls another method, doSend (see the snippet below), in the same class, and inside that method it waits for the metadata of the topic, which I think is necessary because it relates to partitions and so on.
/**
* Implementation of asynchronously send a record to a topic.
*/
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
TopicPartition tp = null;
try {
// first make sure the metadata for the topic is available
ClusterAndWaitTime clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);
long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
Cluster cluster = clusterAndWaitTime.cluster;
Are there any other options that are fully asynchronous? The reason I want it to be fully asynchronous is that if some of the servers in bootstrap.servers are not responding, send will wait for up to max.block.ms. I don't want it to wait; I just want it to return.
The documentation where I saw that it returns immediately:
KafkaProducer java doc
The send is asynchronous and this method will return immediately once
the record has been stored in the buffer of records waiting to be
sent. This allows sending many records in parallel without blocking to
wait for the response after each one.
Your analysis is correct: Kafka has a (sometimes) blocking "non-blocking" API.
This has been brought up before (https://cwiki.apache.org/confluence/display/KAFKA/KIP-286%3A+producer.send%28%29+should+not+block+on+metadata+update) but was never prioritized.
It's as asynchronous as it can be. Kafka maintains a cache of metadata that gets updated occasionally to keep it current and in your scenario you only wait if that cache is stale or not initialized. Once the cache is initialized there's no wait.
If your code has a single upcoming send() that must be executed as quickly as possible, you might try making a preparatory partitionsFor() call on the producer to force the cache to update if needed.
Aside from that, there will always be the potential, occasional wait for the metadata cache to be refreshed.
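A workaround that is not mentioned in the answers above, shown only as a hedged sketch: run the possibly blocking call on a dedicated thread so that the calling thread gets a future back at once. The plain-Java class below uses a generic Supplier as a stand-in for the producer call; NonBlockingCaller is a hypothetical name and nothing here is Kafka API. A single worker thread keeps submissions in order, but the executor's queue is unbounded, so work can pile up while the call is blocked.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical wrapper: moves a call that may block onto its own thread.
final class NonBlockingCaller {
    // One daemon worker thread, so calls run in submission order.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "possibly-blocking-caller");
        t.setDaemon(true);
        return t;
    });

    // Returns immediately; the supplied call runs (and may block) on the worker thread.
    <R> CompletableFuture<R> submit(Supplier<R> possiblyBlockingCall) {
        return CompletableFuture.supplyAsync(possiblyBlockingCall, executor);
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        NonBlockingCaller caller = new NonBlockingCaller();
        CountDownLatch metadataReady = new CountDownLatch(1); // stands in for the metadata wait
        CompletableFuture<String> result = caller.submit(() -> {
            try {
                metadataReady.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "sent";
        });
        System.out.println("caller continues, done=" + result.isDone());
        metadataReady.countDown();
        System.out.println(result.join());
        caller.shutdown();
    }
}
```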

how to get result of Kafka streams aggregate task and send the data to another service?

I use Kafka streams to process the real-time data and I need to do some aggregate operations for data of a windowed time.
I have two questions about the aggregate operation.
How to get the aggregated data? I need to send it to a 3rd service.
After the aggregate operation, I can't send message to a 3rd service, the code doesn't run.
Here is my code:
stream = builder.stream("topic");
windowedKStream = stream.map(XXXXX).groupByKey().windowedBy("5mins");
ktable = windowedKStream.aggregate(()->"", new Aggregator(K,V,result));
// my data is stored in 'result' variable, but I can't get it at the end of the 5 mins window.
// I need to send the 'result' to a 3rd service. But I don't know where to temporarily store it and then how to get it.
// below is the code to call a 3rd service, but the code can't be executed (it is unreachable).
// I think it should be executed every 5 mins when the window is over. But it isn't.
result = httpclient.execute('result');
I guess you might want to do something like:
ktable.toStream().foreach((k,v) -> httpclient.execute(v));
Each time the KTable is updated (with caching disabled), the update record will be sent downstream, and foreach will be executed with v being the current aggregation result.
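To illustrate what "sent downstream on every update" means, here is a plain-Java sketch of a windowed aggregate that calls a sink each time an aggregate changes, rather than once when the window closes. It only models the behaviour described above; WindowedConcat, the "key@windowStart" naming and the sink callback are hypothetical and are not Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical model of a windowed aggregation that emits every update.
final class WindowedConcat {
    private final long windowMs;
    private final BiConsumer<String, String> sink;   // receives (windowedKey, currentAggregate)
    private final Map<String, String> state = new HashMap<>();

    WindowedConcat(long windowMs, BiConsumer<String, String> sink) {
        this.windowMs = windowMs;
        this.sink = sink;
    }

    // Adds a value to the aggregate of its key and window, then forwards the new aggregate.
    void accept(String key, String value, long timestampMs) {
        long windowStart = timestampMs - Math.floorMod(timestampMs, windowMs);
        String windowedKey = key + "@" + windowStart;
        String updated = state.merge(windowedKey, value, String::concat);
        sink.accept(windowedKey, updated);            // one downstream record per update
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        WindowedConcat agg = new WindowedConcat(5 * 60 * 1000L, (k, v) -> sent.add(k + "=" + v));
        agg.accept("sensor", "a", 0L);
        agg.accept("sensor", "b", 1_000L);
        agg.accept("sensor", "c", 5 * 60 * 1000L);    // falls into the next window
        System.out.println(sent);
    }
}
```

In this model the sink sees the intermediate aggregate after every input, so a downstream HTTP call would fire once per update and not once per window.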

How to do Kafka stream transformations (map / flatMap) taking into account values in a Key/Value store?

My task is the following:
I am monitoring time synchronization events from a third-party measuring device. This time synchronization is a bit flaky so I want to detect when synchronization stops and issue an alarm.
For this, I am producing the synchronization events to a Kafka topic. I have three different events going on:
Synchronization request
Synchronization successful
Synchronization failed because other device did not respond
So, what I want to do:
When request is received, and nothing is received after a certain amount of time, I want to issue a "timeout" alarm
When request is received, and within the timeout period, a success event arrives, I want to issue a "timeout" if no request arrives after the timeout time
When a failure event arrives, I want to issue the "other device did not respond" alarm
I am currently in the process of setting up a Kafka-Streams application, and I need to store the state in case this application crashes (it should not, but I want to be sure), so I set this up as follows:
val builder = new StreamsBuilder
val storeBuilder = Stores.
keyValueStoreBuilder(Stores.persistentKeyValueStore("timesync-alarms"),
Serdes.String(),
logEntrySerde)
builder.addStateStore(storeBuilder)
val eventStream = builder.stream(sourceTopic, Consumed.`with`(Serdes.String(), logEntrySerde))
Now, I am stuck. What I basically think I need to do is have a flatMap function on the eventStream that, whenever an event arrives:
1) Queries the store for the last event that was processed
2) Decides if an alarm is to be raised
3) Updates the store with the currently-received event
4) Produces the alarm, if any
So, how do I achieve steps 1 and 3 here? Or am I conceptually wrong and have to do it differently?
I think you don't need to use state store directly. You can create two streams - one with sync request events, the second one with sync responses (success, fail) and join them:
requestStream.outerJoin(responseStream, (leftVal, rightVal) -> ...,
JoinWindows.of(timeout), ...);
In the case of timeout rightVal is null.
If you want to send alarms to a separate topic, you can simply filter the joined stream and write all failures (error responses and timeouts) to the topic. Otherwise you can use peek() method and trigger some action inside. Here is a simple example: https://github.com/djarza/football-events/blob/master/football-ui/src/main/java/org/djar/football/ui/projection/StatisticsPublisher.java
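The decision made inside the join's value joiner could look roughly like the plain-Java sketch below, where a null response stands for "nothing arrived within the join window". The SyncAlarms class and its enums are hypothetical names invented for this illustration; the real event types would come from the application.

```java
// Hypothetical alarm classification for the joined (request, response) pair.
final class SyncAlarms {
    enum Response { SUCCESS, FAILED }
    enum Alarm { NONE, TIMEOUT, OTHER_DEVICE_DID_NOT_RESPOND }

    // `response` is null when no response was joined to the request in time.
    static Alarm classify(Response response) {
        if (response == null) {
            return Alarm.TIMEOUT;
        }
        return response == Response.FAILED
                ? Alarm.OTHER_DEVICE_DID_NOT_RESPOND
                : Alarm.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify(null));
        System.out.println(classify(Response.FAILED));
        System.out.println(classify(Response.SUCCESS));
    }
}
```

Filtering the joined stream for anything other than NONE would then give the stream of alarms to write to a separate topic.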

Service Fabric reliable queue long operation

I'm trying to understand some best practices for service fabric.
If I have a queue that is added to by a web service or some other mechanism and a back end task to process that queue what is the best approach to handle long running operations in the background.
Use TryPeekAsync in one transaction, process and then if successful use TryDequeueAsync to finally dequeue.
Use TryDequeueAsync to remove an item, put it into a dictionary and then remove from the dictionary when complete. On startup of the service, check the dictionary for anything pending before the queue.
Both ways feel slightly wrong, but I can't work out if there is a better way.
One option is to process the queue in RunAsync, something along the lines of this:
protected override async Task RunAsync(CancellationToken cancellationToken)
{
var store = await StateManager.GetOrAddAsync<IReliableQueue<T>>("MyStore").ConfigureAwait(false);
while (!cancellationToken.IsCancellationRequested)
{
using (var tx = StateManager.CreateTransaction())
{
var itemFromQueue = await store.TryDequeueAsync(tx).ConfigureAwait(false);
if (!itemFromQueue.HasValue)
{
await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken).ConfigureAwait(false);
continue;
}
// Process item here
// Remember to clone the dequeued item if it is a custom type and you are going to mutate it.
// If success, await tx.CommitAsync();
// If failure to process, either let it run out of the Using transaction scope, or call tx.Abort();
}
}
}
Regarding the comment about cloning the dequeued item if you are to mutate it, look under the "Recommendations" part here:
https://azure.microsoft.com/en-us/documentation/articles/service-fabric-reliable-services-reliable-collections/
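The effect of the transaction in the loop above (the item only leaves the queue if processing succeeds) can be modelled with a small in-memory Java sketch. This is not Service Fabric code and has none of its durability; PeekThenRemoveQueue is a hypothetical name, and the sketch assumes a single consumer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical in-memory model: an item is removed only after successful processing.
final class PeekThenRemoveQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();

    synchronized void enqueue(T item) {
        items.addLast(item);
    }

    synchronized int size() {
        return items.size();
    }

    // Processes the head item. Returns true and removes it on success; returns false
    // if the queue is empty or processing failed, in which case the item stays queued.
    // Assumes a single consumer, so the head cannot change between peek and poll.
    boolean processNext(Predicate<T> processor) {
        T head;
        synchronized (this) {
            head = items.peekFirst();
        }
        if (head == null) {
            return false;
        }
        boolean ok;
        try {
            ok = processor.test(head);      // the long-running work happens outside the lock
        } catch (RuntimeException e) {
            ok = false;                     // comparable to aborting the transaction
        }
        if (ok) {
            synchronized (this) {
                items.pollFirst();          // comparable to committing the dequeue
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        PeekThenRemoveQueue<String> queue = new PeekThenRemoveQueue<>();
        queue.enqueue("job-1");
        System.out.println(queue.processNext(job -> false) + ", size=" + queue.size());
        System.out.println(queue.processNext(job -> true) + ", size=" + queue.size());
    }
}
```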
One limitation with Reliable Collections (both Queue and Dictionary), is that you only have parallelism of 1 per partition. So for high activity queues it might not be the best solution. This might be the issue you're running into.
What we've been doing is to use ReliableQueues for situations where the write amount is very low. For higher throughput queues, where we need durability and scale, we're using ServiceBus Topics. That also gives us the advantage that if a service was Stateful only due to having the ReliableQueue, it can now be made stateless. Though this adds a dependency on a 3rd party service (in this case ServiceBus), and that might not be an option for you.
Another option would be to create a durable pub/sub implementation to act as the queue. I've done tests before with using actors for this, and it seemed to be a viable option, though I didn't spend too much time on it, since we didn't have any issues depending on ServiceBus. Here is another SO question about that: Pub/sub pattern in Azure Service Fabric
If processing is very slow, use two queues: a fast one where you store the work without interruptions, and a slow one to process it. RunAsync is used to move messages from the fast queue to the slow one.