How to use/control RxJava Observable.cache - reactive-programming

I am trying to use the RxJava caching mechanism (RxJava 2), but I can't seem to grasp how it works or how I can control the cached contents, since the only thing available is the cache operator.
I want to verify the cached data against some conditions before emitting new data.
For example:
someObservable
    .repeat()
    .filter { it.age < maxAge }
    .map { it.name }
    .cache()
How can I check and filter the cached value, emit it if it passes, and request a new value if it does not?
Since the value changes periodically, I need to verify that the cache is still valid before requesting a new one.
There is also an ObservableCache<T> class, but I can't find any resources on how to use it.
Any help would be much appreciated. Thanks.

This is not how replay/cache works. Please read the #replay/#cache documentation first.
replay
This operator returns a ConnectableObservable, which has some methods (#refCount/#connect/#autoConnect) for connecting to the source.
When #replay is applied without an overload, the source subscription is multicast and all values emitted since connection will be replayed. The source subscription is lazy and connects to the source via #refCount/#connect/#autoConnect.
Returns a ConnectableObservable that shares a single subscription to the underlying ObservableSource that will replay all of its items and notifications to any future Observer.
Applying #replay without any connect method (#refCount/#connect/#autoConnect) will not emit any values on subscription:
A Connectable ObservableSource resembles an ordinary ObservableSource, except that it does not begin emitting items when it is subscribed to, but only when its connect method is called.
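A minimal RxJava 2 sketch of that lazy behavior (illustrative only):

import io.reactivex.Observable;
import io.reactivex.observables.ConnectableObservable;

public class ConnectDemo {
    public static void main(String[] args) {
        ConnectableObservable<Integer> replayed = Observable.just(1, 2, 3).replay();

        // Nothing is printed yet: the source is not subscribed until connect().
        replayed.subscribe(v -> System.out.println("got " + v));

        // Now the source runs and values flow: got 1, got 2, got 3
        replayed.connect();
    }
}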
replay(1)#autoConnect(-1) / #refCount(1) / #connect
Applying replay(1) will cache the last value and emit the cached value to each new subscriber. #autoConnect will open a connection immediately and keep it open until a terminal event (onComplete, onError) happens. #refCount is similar, but will disconnect from the source when all subscribers disappear. The #connect operator can be used when you need to wait until all subscribers have subscribed to the observable, so that no values are missed.
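For illustration, a hedged sketch of replay(1) with an immediate autoConnect (RxJava 2; names are made up):

import io.reactivex.Observable;
import io.reactivex.subjects.PublishSubject;

public class Replay1Demo {
    public static void main(String[] args) {
        PublishSubject<Integer> source = PublishSubject.create();

        // A non-positive argument to autoConnect means "connect immediately,
        // without waiting for subscribers"; the connection stays open until
        // a terminal event.
        Observable<Integer> lastValue = source.replay(1).autoConnect(0);

        source.onNext(1);
        source.onNext(2);

        // A late subscriber receives only the cached last value: 2
        lastValue.subscribe(v -> System.out.println("late subscriber: " + v));
    }
}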
usage
#replay(1) -- most of the time it should be used at the end of the observable chain.
sourceObs
    .filter(...)
    .map(...)
    .replay(bufferSize)
    .refCount(connectWhenXSubscribersSubscribed)
caution
Applying #replay without a buffer limit or expiration will lead to memory leaks when your observable is infinite.
cache / cacheWithInitialCapacity
These operators are similar to #replay with autoConnect(1). They will cache every value and replay them to each subscriber.
The operator subscribes only when the first downstream subscriber subscribes and maintains a single subscription towards this ObservableSource. In contrast, the operator family of replay() that return a ConnectableObservable require an explicit call to ConnectableObservable.connect().
Note: You sacrifice the ability to dispose the origin when you use the cache Observer so be careful not to use this Observer on ObservableSources that emit an infinite or very large number of items that will use up memory. A possible workaround is to apply takeUntil with a predicate or another source before (and perhaps after) the application of cache().
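A hedged sketch of that takeUntil workaround (the threshold is arbitrary):

import io.reactivex.Observable;

public class CacheGuardDemo {
    public static void main(String[] args) {
        // takeUntil(predicate) completes the stream after the first matching
        // item, so the cache can never grow past that point.
        Observable<Integer> guarded = Observable.range(1, 1_000_000)
                .takeUntil((Integer v) -> v >= 100)
                .cache();

        guarded.test().assertValueCount(100); // values 1..100, then onComplete
    }
}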
example
import io.reactivex.subjects.PublishSubject
import org.junit.Test

@Test
fun cacheReplaysAllValuesToEachSubscriber() {
    val source = PublishSubject.create<Int>()
    val cached = source.cacheWithInitialCapacity(1)

    // The first subscription connects the cache to the source.
    cached.subscribe()
    source.onNext(1)
    source.onNext(2)
    source.onNext(3)

    // Every later subscriber receives all values emitted so far.
    cached.test().assertValues(1, 2, 3)
    cached.test().assertValues(1, 2, 3)
}
usage
Use the cache operator when you cannot control the connect phase:
This is useful when you want an ObservableSource to cache responses and you can't control the subscribe/dispose behavior of all the Observers.
caution
As with replay(), the cache is unbounded and can lead to memory leaks.
Note: The capacity hint is not an upper bound on cache size. For that, consider replay(int) in combination with ConnectableObservable.autoConnect() or similar.
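A minimal sketch of that bounded alternative (RxJava 2):

import io.reactivex.Observable;

public class BoundedReplayDemo {
    public static void main(String[] args) {
        // replay(1).autoConnect() keeps at most one value for late
        // subscribers, unlike cache(), whose buffer is unbounded.
        Observable<Integer> bounded = Observable.just(1, 2, 3)
                .replay(1)
                .autoConnect();

        bounded.subscribe(v -> System.out.println("first: " + v)); // 1, 2, 3
        bounded.subscribe(v -> System.out.println("late: " + v));  // 3
    }
}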
further reading
https://blog.danlew.net/2018/09/25/connectable-observables-so-hot-right-now/
https://blog.danlew.net/2016/06/13/multicasting-in-rxjava/

If your event source (Observable) performs an expensive operation, such as reading from a database, you shouldn't use a Subject to observe the events, since that would repeat the expensive operation for each subscriber. Caching can also be risky with infinite streams, since it can lead to OutOfMemoryError. A more appropriate solution may be ConnectableObservable, which performs the source operation only once and broadcasts each value to all subscribers.
Here is a code sample. I didn't bother creating an infinite periodic stream or including error handling to keep the example simple. Let me know if it does what you need.
import java.util.Random;

import io.reactivex.Observable;
import io.reactivex.observables.ConnectableObservable;
import io.reactivex.schedulers.Schedulers;

import static java.lang.Thread.sleep;

class RxJavaTest {

    private final int maxValue = 50;

    private final ConnectableObservable<Integer> source =
            Observable.<Integer>create(
                    subscriber -> {
                        log("Starting Event Source");
                        subscriber.onNext(readFromDatabase());
                        subscriber.onNext(readFromDatabase());
                        subscriber.onNext(readFromDatabase());
                        subscriber.onComplete();
                        log("Event Source Terminated");
                    })
                    .subscribeOn(Schedulers.io())
                    .filter(value -> value < maxValue)
                    .publish();

    void run() throws InterruptedException {
        log("Starting Application");
        log("Subscribing");
        source.subscribe(value -> log("Subscriber 1: " + value));
        source.subscribe(value -> log("Subscriber 2: " + value));
        log("Connecting");
        source.connect();
        log("Application Terminated");
        // Sleep to give the event source enough time to complete
        sleep(4000);
    }

    private Integer readFromDatabase() throws InterruptedException {
        // Emulate long database read time
        log("Reading data from database...");
        sleep(1000);
        int randomValue = new Random().nextInt(2 * maxValue) + 1;
        log(String.format("Read value: %d", randomValue));
        return randomValue;
    }

    private static void log(Object message) {
        System.out.println(
                Thread.currentThread().getName() + " >> " + message
        );
    }
}
Here's the output:
main >> Starting Application
main >> Subscribing
main >> Connecting
main >> Application Terminated
RxCachedThreadScheduler-1 >> Starting Event Source
RxCachedThreadScheduler-1 >> Reading data from database...
RxCachedThreadScheduler-1 >> Read value: 88
RxCachedThreadScheduler-1 >> Reading data from database...
RxCachedThreadScheduler-1 >> Read value: 42
RxCachedThreadScheduler-1 >> Subscriber 1: 42
RxCachedThreadScheduler-1 >> Subscriber 2: 42
RxCachedThreadScheduler-1 >> Reading data from database...
RxCachedThreadScheduler-1 >> Read value: 37
RxCachedThreadScheduler-1 >> Subscriber 1: 37
RxCachedThreadScheduler-1 >> Subscriber 2: 37
RxCachedThreadScheduler-1 >> Event Source Terminated
Note the following:
Events only start firing once connect() is called on the source, not when observers subscribe to it.
Database calls are made only once per event update.
Filtered values are not emitted to subscribers.
All subscribers are notified on the same thread.
The application "terminates" before the events are processed, because the source runs asynchronously. Normally your app runs in an event loop, so it remains responsive during slow operations.

Related

SubscribeOn does not change the thread pool for the whole chain

I want to trigger a long-running operation via a REST request with WebFlux. The call should just return an acknowledgment that the operation has started. I want to run the long-running operation on a different scheduler (e.g. Schedulers.single()). To achieve that I used subscribeOn:
Mono<RecalculationRequested> recalculateAll() {
    return provider.size()
            .doOnNext(size -> log.info("Size: {}", size))
            .doOnNext(size -> recalculate(size))
            .map(RecalculationRequested::new);
}

private void recalculate(int toRecalculateSize) {
    Mono.just(toRecalculateSize)
            .flatMapMany(this::toPages)
            .flatMap(page -> recalculate(page))
            .reduce(new RecalculationResult(), RecalculationResult::increment)
            .subscribeOn(Schedulers.single())
            .subscribe(result -> log.info("Result of recalculation - success:{}, failed: {}",
                    result.getSuccess(), result.getFailed()));
}

private Mono<RecalculationResult> recalculate(RecalculationPage pageToRecalculate) {
    return provider.findElementsToRecalculate(pageToRecalculate.getPageNumber(), pageToRecalculate.getPageSize())
            .flatMap(this::recalculateSingle)
            .reduce(new RecalculationResult(), RecalculationResult::increment);
}

private Mono<RecalculationResult> recalculateSingle(ElementToRecalculate elementToRecalculate) {
    return recalculationTrigger.recalculate(elementToRecalculate)
            .doOnNext(result -> log.info("Finished recalculation for element: {}", elementToRecalculate))
            .doOnError(error -> log.error("Error during recalculation for element: {}", elementToRecalculate, error));
}
From the above I want to call:
private void recalculate(int toRecalculateSize)
in a different thread. However, it does not run on the single-thread scheduler; it uses a different thread pool. I would expect subscribeOn to change the scheduler for the whole chain. What should I change, and why, to execute it on the single-thread scheduler?
Just to mention - method:
provider.findElementsToRecalculate(...)
uses WebClient to get elements.
One caveat of subscribeOn is that it does what it says: it runs the act of "subscribing" on the provided Scheduler. Subscription flows from bottom to top (the Subscriber subscribes to its parent Publisher) at runtime.
Usually you see in documentation and presentations that subscribeOn affects the whole chain. That is because most operators and sources do not themselves change threads, and by default start sending onNext/onComplete/onError signals from the thread from which they were subscribed.
But as soon as one operator switches threads in that top-to-bottom data path, the reach of subscribeOn stops there. The typical example is a publishOn in the chain.
The source of data in this case is reactor-netty and Netty, which operate on their own threads and thus act as if there were a publishOn at the source.
For WebFlux, I'd say favor publishOn in the main chain of operators, or alternatively use subscribeOn inside inner chains, like inside a flatMap.
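To make that boundary visible, here is a small self-contained sketch (Reactor; the scheduler choices are arbitrary):

import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;

public class SubscribeOnVsPublishOn {
    public static void main(String[] args) throws InterruptedException {
        Flux.range(1, 3)
                // Runs on the subscribeOn scheduler (single).
                .doOnNext(i -> System.out.println("before publishOn: " + Thread.currentThread().getName()))
                .publishOn(Schedulers.parallel())
                // From here on, signals run on the parallel scheduler:
                // the reach of subscribeOn stops at publishOn.
                .doOnNext(i -> System.out.println("after publishOn: " + Thread.currentThread().getName()))
                .subscribeOn(Schedulers.single())
                .subscribe();

        Thread.sleep(500); // give the asynchronous pipeline time to finish
    }
}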
As per the documentation, all operators prefixed with doOn are sometimes referred to as having a "side effect". They let you peek inside the sequence's events without modifying them.
If you want to chain the recalculate step after provider.size(), do it with flatMap, as sketched below.
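As a hedged illustration of that advice (the recalculation stand-in is hypothetical):

import reactor.core.publisher.Mono;

public class FlatMapVsDoOnNext {
    // Hypothetical stand-in for the long-running recalculation.
    static Mono<String> recalculate(int size) {
        return Mono.fromCallable(() -> "recalculated " + size + " elements");
    }

    public static void main(String[] args) {
        Mono.just(100)
                // doOnNext(size -> recalculate(size)) would merely create the
                // inner Mono and drop it, never subscribing to it.
                .flatMap(FlatMapVsDoOnNext::recalculate) // part of the pipeline
                .subscribe(System.out::println);
    }
}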

Function not executing properly after subscribe

I have a Mono object on which I have subscribed with doOnSuccess. In that callback I save the data to the DB (Couchbase, using ReactiveCouchbaseRepository). After that, I am not getting any logs for LINE 1 and LINE 2.
This works fine if I do not save the object; then I do get the log for LINE 2.
Mono<User> result = context.getPayload(User.class);
result.doOnSuccess(user -> {
    System.out.println("############I got the user" + user);
    userRepository.save(user).doOnSuccess(user2 -> {
        System.out.println("user saved"); // LINE 1
    }).subscribe();
    System.out.println("############" + user); // LINE 2
}).subscribe();
Your code snippet is breaking a few rules you should follow closely:
You should not call subscribe from within a method or lambda that returns a reactive type such as Mono or Flux; this decouples that execution from the main task while both still operate on shared state. It often leads to issues because two consumers end up reading the same stream. It's a bit like creating two separate threads that both try to read from the same output stream.
You should not do I/O operations in doOnXYZ operators. Those are "side-effect" operators, meaning they are useful for logging, incrementing counters, and the like.
What you should do instead is chain Reactor operators to create a single reactive pipeline, and return the reactive type for the final client to subscribe to. In a Spring WebFlux application, the HTTP clients (through the WebFlux engine) do the subscribing.
Your code snippet could look like this:
Mono<User> result = context.getPayload(User.class)
        .doOnSuccess(user -> System.out.println("############Received user " + user))
        .flatMap(user -> userRepository.save(user))
        .doOnSuccess(user -> System.out.println("############ Saved " + user));
return result;

Spring Reactor | Batching the input without mutating

I'm trying to batch records constantly emitted from a streaming source (Kafka) and call my service with batches of 100.
What I get as input is a single record. I'm trying to find the best way to achieve this reactively using Spring Reactor, without mutation and locking outside the pipeline.
Here is my naive attempt, which simply reflects my sequential way of thinking:
Mono.just(input)
    .subscribe(i -> {
        batches.add(input);
        if (batches.size() >= 100) {
            // Invoke another reactive pipeline.
            // Clear the batch (requires locking in order to be thread safe).
        }
    });
What's the best way to achieve batching on a streaming source using Reactor?
.buffer(100) or bufferTimeout(100, Duration.ofSeconds(xxx)) come to the rescue.
Using Flux.buffer or Flux.bufferTimeout you can gather a fixed number of elements into a List:
StepVerifier.create(
        Flux.range(0, 1000)
            .buffer(100)
    )
    .expectNextCount(10)
    .expectComplete()
    .verify();
Update for the use case
In the case where the input is a single value, for example an invocation of a method with a parameter:
public void invokeMe(String element);
you may adopt the UnicastProcessor technique and transfer all data to that processor; it will then take care of the batching:
class Batcher {
    final UnicastProcessor<String> processor = UnicastProcessor.create();

    public void invokeMe(String element) {
        processor.sink().next(element);
        // or Mono.just(element).subscribe(processor);
    }

    public Flux<List<String>> listen() {
        return processor.bufferTimeout(100, Duration.ofSeconds(5));
    }
}
Batcher batcher = new Batcher();

StepVerifier.create(batcher.listen())
    .then(() -> Flux.range(0, 1000)
        .subscribe(i -> batcher.invokeMe("" + i)))
    .expectNextCount(10)
    .thenCancel()
    .verify();
From that example, we can learn how to provide a single point for receiving events and then listen to the results of the batching process.
Please note that UnicastProcessor allows only one subscriber, so it is useful for the model where one party is interested in the batching results and there are many data producers. In the case where you have as many subscribers as producers, you may want to use one of the other processors: DirectProcessor, TopicProcessor, or WorkQueueProcessor. To learn more about Reactor processors, see the Reactor reference documentation.
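A small sketch of that single-subscriber restriction (Reactor; illustrative only):

import reactor.core.publisher.UnicastProcessor;

public class UnicastDemo {
    public static void main(String[] args) {
        UnicastProcessor<String> processor = UnicastProcessor.create();

        processor.subscribe(v -> System.out.println("first: " + v));

        // A second subscriber is rejected with an onError signal.
        processor.subscribe(
                v -> System.out.println("second: " + v),
                e -> System.out.println("second subscriber error: " + e));

        processor.sink().next("hello"); // only the first subscriber sees this
    }
}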

Sharing cold and hot observables

I'm confused by the behavior of a shared stream created using Rx.Observable.just.
For example:
var log = function(x) { console.log(x); };

var cold = Rx.Observable
    .just({ foo: 'cold' });

cold.subscribe(log); // <-- Logs three times
cold.subscribe(log);
cold.subscribe(log);

var coldShare = Rx.Observable
    .just({ foo: 'cold share' })
    .share();

coldShare.subscribe(log); // <-- Only logs once
coldShare.subscribe(log);
coldShare.subscribe(log);
Both streams emit only one value, yet the un-shared one logs it three times, once per subscription. Why is this?
I need to "fork" a stream but share its value (and then combine the forked streams).
How can I share the value of a stream but also subscribe to it multiple times?
I realize that this is probably related to the concept of "cold" and "hot" observables. However:
Is the stream created by Rx.Observable.just() cold or hot?
How is one supposed to determine the answer to the previous question?
Is the stream created by Rx.Observable.just() cold or hot?
Cold.
How is one supposed to determine the answer to the previous question?
I guess the documentation is the only guide.
How can I share the value of a stream but also subscribe to it multiple times?
You are looking for the idea of a connectable observable. For example:
var log = function(x) { console.log(x); };

var coldShare = Rx.Observable
    .just({ foo: 'cold share' })
    .publish();

coldShare.subscribe(log); // Does nothing
coldShare.subscribe(log); // Does nothing
coldShare.subscribe(log); // Does nothing
coldShare.connect(); // Emits one value to its three subscribers (logs three times)
var log = function(x) {
    document.write(JSON.stringify(x));
    document.write("<br>");
};

var coldShare = Rx.Observable
    .just({ foo: 'cold share' })
    .publish();

coldShare.subscribe(log); // Nothing is logged yet
coldShare.subscribe(log);
coldShare.subscribe(log);
coldShare.connect(); // Logs three times, once per subscriber

<script src="https://cdnjs.cloudflare.com/ajax/libs/rxjs/4.0.7/rx.all.min.js"></script>
The example above logs three times. Using publish and connect, you essentially "pause" the observable until the call to connect.
See also:
How do I share an observable with publish and connect?
Are there 'hot' and 'cold' operators?
I don't understand your first question, but about the last one, which I have had trouble with too:
RxJS's implementation of Observables/Observers is based on the observer pattern, which is similar to the good old callback mechanism.
To exemplify, here is the basic form of creating an observable (taken from the doc at https://github.com/Reactive-Extensions/RxJS/blob/master/doc/api/core/operators/create.md)
var source = Rx.Observable.create(function (observer) {
    observer.onNext(42);
    observer.onCompleted();
    // Note that this is optional; you do not have to return this if you require no cleanup
    return function () {
        console.log('disposed');
    };
});
Rx.Observable.create takes as its argument a function (say factory_fn, to be original) which takes an observer. Your values are generated by a computation of your choice in the body of factory_fn, and because you have the observer as a parameter you can process/push the generated values when you see fit. BUT factory_fn is not executed immediately; it is just registered (like a callback would be). It is called every time there is a subscribe(observer) on the related observable (i.e. the one returned by Rx.Observable.create(factory_fn)).
Once subscription is done (the creation callback has been called), values flow to your observer according to the logic in the factory function, and it remains that way until your observable completes or the observer unsubscribes (supposing you implemented an action to cancel the value flow as the return value of factory_fn).
What that basically means is that, by default, Rx.Observables are cold.
My conclusion after using the library quite a bit is that unless it is duly documented, the only way to know FOR SURE the temperature of an observable is to eye the source code. Or add a side effect somewhere, subscribe twice, and see if the side effect happens twice or only once (which is what you did). That, or ask on Stack Overflow.
For instance, Rx.fromEvent produces hot observables, as you can see from the last line of its code (return new EventObservable(element, eventName, selector).publish().refCount();) (code here: https://github.com/Reactive-Extensions/RxJS/blob/master/src/core/linq/observable/fromevent.js). The publish operator is among those that turn a cold observable into a hot one. How that works is out of scope, so I won't detail it here.
But Rx.DOM.fromWebSocket does not produce hot observables (https://github.com/Reactive-Extensions/RxJS-DOM/blob/master/src/dom/websocket.js). Cf. How to buffer stream using fromWebSocket Subject.
Confusion often arises, I think, from the fact that we conflate the actual source (say, a stream of button clicks) and its representation (Rx.Observable). It is unfortunate when that happens, but what we imagine as hot sources can end up being represented by cold Rx.Observables.
So, yes, Rx.Observable.just creates cold observables.

Play 1.2.3 framework - Right way to commit transaction

We have an HTTP endpoint that takes a long time to run and can be called concurrently by users. As part of this request, we update the model inside a synchronized block so that other (possibly concurrent) requests pick up that change. E.g.:
MyModel m = null;
synchronized (lockObject) {
    m = MyModel.findById(id);
    if (m.status == PENDING) {
        m.status = ACTIVE;
    } else {
        // render a response back to the user that the operation is not allowed
    }
    m.save(); // Is not expected to be called unless we set m.status = ACTIVE
}
// Long-running operation continues here. It can involve further changes to instance "m"
The reason for the synchronized block is to ensure that even concurrent requests pick up the latest status. However, the underlying JPA does not commit my changes (m.save()) until the request is complete. Since this is a long-running request, I do not want to wait until the request completes, and I still want to ensure that other callers are notified of the change in status. I tried calling "m.em().flush(); JPA.em().getTransaction().commit();" after m.save(), but that makes the transaction unavailable for the subsequent actions in the same request. Can I just call "JPA.em().getTransaction().begin();" and let Play handle the transaction from then on? If not, what is the best way to handle this use case?
UPDATE:
Based on the response, I modified my code as follows:
MyModel m = null;
synchronized (lockObject) {
    m = MyModel.findById(id);
    if (m.status == PENDING) {
        m.status = ACTIVE;
    } else {
        // render a response back to the user that the operation is not allowed
    }
    m.save(); // Is not expected to be called unless we set m.status = ACTIVE
}
new MyModelUpdateJob(m.id).now();
And in my job, I have the following line:
public void doJob() {
    MyModel m = MyModel.findById(id);
    System.out.println(m.status); // This still prints the old status, as if m.save() had no effect...
}
What am I missing?
Put your update code in a job and call
new MyModelUpdateJob(id).now().get();
Thus the update will be done in another transaction that is committed at the end of the job.
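A hedged sketch of what such a job might look like in Play 1.x (MyModel, ACTIVE, and the id field come from the question; the job body runs in its own transaction, committed when doJob returns):

import play.jobs.Job;

public class MyModelUpdateJob extends Job {

    private final Long id;

    public MyModelUpdateJob(Long id) {
        this.id = id;
    }

    @Override
    public void doJob() {
        // Runs outside the caller's request transaction.
        MyModel m = MyModel.findById(id);
        m.status = ACTIVE; // status constant as used in the question
        m.save();          // committed when this job's transaction ends
    }
}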
Ouch, as soon as you add more Play servers, you will be in trouble. You may want to try optimistic locking in your example, or (and I advise against it) pessimistic locking... ick.
HOWEVER, looking at your code, maybe read the article Building on Quicksand. I am not sure you need a synchronized block in that case at all... try to go for being idempotent.
In your case, if user 1 and user 2 both call that method while the status is PENDING, then it goes to ACTIVE (idempotent). Whether user 1 or user 2 wins, the result is the same as if you had the synchronized block anyway.
I am sure, however, that you have a more complex scenario not shown here, BUT READ that article Building on Quicksand, as it really changes the traditional way of thinking and is how Google, Amazon, and other very large-scale systems operate.
Another option for distributed transactions across Play servers is ZooKeeper, which the big NoSQL guys use, BUT only as a last resort ;) ;)
later,
Dean