Function not executing properly after subscribe - reactive-programming

I have a Mono object on which I have subscribed using doOnSuccess. Inside that callback I save the data to the DB (Couchbase, using ReactiveCouchbaseRepository). After that, I do not get any logs for LINE 1 or LINE 2.
It works fine if I do not save the object, meaning I do get the log for LINE 2.
Mono<User> result = context.getPayload(User.class);
result.doOnSuccess( user -> {
System.out.println("############I got the user"+user);
userRepository.save(user).doOnSuccess(user2->{
System.out.println("user saved"); // LINE 1
}).subscribe();
System.out.println("############"+user); // LINE2
}).subscribe();

Your code snippet is breaking a few rules you should follow closely:
You should not call subscribe from within a method/lambda that returns a reactive type such as Mono or Flux; this decouples the execution from the main task while both still operate on the same shared state. This often leads to issues because two things are trying to read the same stream. It's a bit like creating two separate threads that both try to read from the same output stream.
You should not do I/O operations in doOnXYZ operators. Those are "side-effect" operators, meaning they are useful for things like logging or incrementing counters.
What you should try instead is to chain Reactor operators into a single reactive pipeline and return the reactive type so the final client can subscribe to it. In a Spring WebFlux application, the HTTP clients (through the WebFlux engine) are the ones subscribing.
Your code snippet could look like this:
Mono<User> result = context.getPayload(User.class)
.doOnSuccess(user -> System.out.println("############Received user "+user))
.flatMap(user -> userRepository.save(user))
.doOnSuccess(user -> System.out.println("############ Saved "+user));
return result;
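The difference between chaining and a nested subscribe can also be seen without Reactor. The following is only a rough analogy using the JDK's CompletableFuture, where thenCompose plays the role of flatMap; the class name and the save helper are hypothetical stand-ins for the repository call, not part of the original code.

```java
import java.util.concurrent.CompletableFuture;

final class ChainInsteadOfNestedSubscribe {
    // Hypothetical stand-in for userRepository.save(user): an async call
    // that completes with the saved value.
    static CompletableFuture<String> save(String user) {
        return CompletableFuture.supplyAsync(() -> user + ":saved");
    }

    // One pipeline: the save is chained with thenCompose (roughly what
    // flatMap does in Reactor), so the returned future completes only
    // after the save itself has completed.
    static CompletableFuture<String> handle(CompletableFuture<String> payload) {
        return payload
                .thenApply(user -> {
                    System.out.println("Received " + user); // side effect: logging only
                    return user;
                })
                .thenCompose(ChainInsteadOfNestedSubscribe::save)
                .thenApply(saved -> {
                    System.out.println("Saved " + saved); // side effect: logging only
                    return saved;
                });
    }

    public static void main(String[] args) {
        // The caller decides when to wait for the whole chain.
        System.out.println(handle(CompletableFuture.completedFuture("alice")).join());
    }
}
```

Because the save is part of the returned chain, whoever consumes the result also observes the completion (or failure) of the save, which is what the nested subscribe version loses.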

Related

How to use/control RxJava Observable.cache

I am trying to use the RxJava caching mechanism (RxJava2), but I can't work out how it works or how I can control the cached contents, given that there is a cache operator.
I want to verify the cached data against some conditions before emitting new data.
For example:
someObservable.
repeat().
filter { it.age < maxAge }.
map { it.name }.
cache()
How can I check and filter the cached value, emit it if the check succeeds, and otherwise request a new value?
Since the value changes periodically, I need to verify that the cache is still valid before I request a new one.
There is also the ObservableCache<T> class, but I can't find any resources on using it.
Any help would be much appreciated. Thanks.
This is not how replay/cache works. Please read the #replay/#cache documentation first.
replay
This operator returns a ConnectableObservable, which has some methods (#refCount/ #connect/ #autoConnect) for connecting to the source.
When #replay is applied without an overload, the source subscription is multicast and all values emitted since connection will be replayed. The source subscription is lazy; it is connected to the source via #refCount/#connect/#autoConnect.
Returns a ConnectableObservable that shares a single subscription to the underlying ObservableSource that will replay all of its items and notifications to any future Observer.
Applying #replay without any connect method (#refCount/#connect/#autoConnect) will not emit any values on subscription.
A Connectable ObservableSource resembles an ordinary ObservableSource, except that it does not begin emitting items when it is subscribed to, but only when its connect method is called.
replay(1)#autoConnect(-1) / #refCount(1) / #connect
Applying replay(1) will cache the last value and emit the cached value on each subscription. #autoConnect will open a connection immediately and keep it open until a terminal event (onComplete, onError) happens. #refCount is similar, but will disconnect from the source when all subscribers disappear. The #connect operator can be used when you need to wait until all subscriptions to the observable have been made, in order not to miss values.
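To make the "cache the last value and replay it to late subscribers" behaviour concrete, here is a hand-rolled sketch in plain Java. It is only an illustration of the idea behind a buffer of size 1; the ReplayLast class is hypothetical and is not how RxJava implements replay (no connection handling, terminal events, or disposal).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustration only: a buffer of size 1 that is multicast to subscribers.
final class ReplayLast<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();
    private T last;          // the single buffered value
    private boolean hasLast; // false until the first value arrives

    // A new subscriber first receives the buffered value, if there is one,
    // and then every value emitted afterwards.
    synchronized void subscribe(Consumer<T> subscriber) {
        if (hasLast) {
            subscriber.accept(last);
        }
        subscribers.add(subscriber);
    }

    // Each new value overwrites the buffer and is delivered to all
    // current subscribers.
    synchronized void onNext(T value) {
        last = value;
        hasLast = true;
        for (Consumer<T> subscriber : subscribers) {
            subscriber.accept(value);
        }
    }
}
```

A subscriber that joins after values 1 and 2 were emitted sees only 2 (the buffered value) followed by anything emitted later, while an early subscriber sees everything.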
usage
#replay(1) -- most of the time it should be used at the end of the observable chain.
sourceObs
.filter()
.map()
.replay(bufferSize)
.refCount(connectWhenXSubscribersSubscribed)
caution
Applying #replay without a buffer limit or expiration time will lead to memory leaks when your observable is infinite.
cache / cacheWithInitialCapacity
These operators are similar to #replay with autoConnect(1). They cache every value and replay it on each subscription.
The operator subscribes only when the first downstream subscriber subscribes and maintains a single subscription towards this ObservableSource. In contrast, the operator family of replay() that return a ConnectableObservable require an explicit call to ConnectableObservable.connect().
Note: You sacrifice the ability to dispose the origin when you use the cache Observer so be careful not to use this Observer on ObservableSources that emit an infinite or very large number of items that will use up memory. A possible workaround is to apply takeUntil with a predicate or another source before (and perhaps after) the application of cache().
example
@Test
fun skfdsfkds() {
val create = PublishSubject.create<Int>()
val cacheWithInitialCapacity = create
.cacheWithInitialCapacity(1)
cacheWithInitialCapacity.subscribe()
create.onNext(1)
create.onNext(2)
create.onNext(3)
cacheWithInitialCapacity.test().assertValues(1, 2, 3)
cacheWithInitialCapacity.test().assertValues(1, 2, 3)
}
usage
Use the cache operator when you cannot control the connect phase.
This is useful when you want an ObservableSource to cache responses and you can't control the subscribe/dispose behavior of all the Observers.
caution
As with replay() the cache is unbounded and could lead to memory-leaks.
Note: The capacity hint is not an upper bound on cache size. For that, consider replay(int) in combination with ConnectableObservable.autoConnect() or similar.
further reading
https://blog.danlew.net/2018/09/25/connectable-observables-so-hot-right-now/
https://blog.danlew.net/2016/06/13/multicasting-in-rxjava/
If your event source (Observable) is an expensive operation, such as reading from a database, you shouldn't use Subject to observe the events, since that will repeat the expensive operation for each subscriber. Caching can also be risky with infinite streams due to "OutOfMemory" exceptions. A more appropriate solution may be ConnectableObservable, which only performs the source operation once, and broadcasts the updated value to all subscribers.
Here is a code sample. I didn't bother creating an infinite periodic stream or including error handling to keep the example simple. Let me know if it does what you need.
class RxJavaTest {
private final int maxValue = 50;
private final ConnectableObservable<Integer> source =
Observable.<Integer>create(
subscriber -> {
log("Starting Event Source");
subscriber.onNext(readFromDatabase());
subscriber.onNext(readFromDatabase());
subscriber.onNext(readFromDatabase());
subscriber.onComplete();
log("Event Source Terminated");
})
.subscribeOn(Schedulers.io())
.filter(value -> value < maxValue)
.publish();
void run() throws InterruptedException {
log("Starting Application");
log("Subscribing");
source.subscribe(value -> log("Subscriber 1: " + value));
source.subscribe(value -> log("Subscriber 2: " + value));
log("Connecting");
source.connect();
log("Application Terminated");
// Sleep to give the event source enough time to complete
sleep(4000);
}
private Integer readFromDatabase() throws InterruptedException {
// Emulate long database read time
log("Reading data from database...");
sleep(1000);
int randomValue = new Random().nextInt(2 * maxValue) + 1;
log(String.format("Read value: %d", randomValue));
return randomValue;
}
private static void log(Object message) {
System.out.println(
Thread.currentThread().getName() + " >> " + message
);
}
}
Here's the output:
main >> Starting Application
main >> Subscribing
main >> Connecting
main >> Application Terminated
RxCachedThreadScheduler-1 >> Starting Event Source
RxCachedThreadScheduler-1 >> Reading data from database...
RxCachedThreadScheduler-1 >> Read value: 88
RxCachedThreadScheduler-1 >> Reading data from database...
RxCachedThreadScheduler-1 >> Read value: 42
RxCachedThreadScheduler-1 >> Subscriber 1: 42
RxCachedThreadScheduler-1 >> Subscriber 2: 42
RxCachedThreadScheduler-1 >> Reading data from database...
RxCachedThreadScheduler-1 >> Read value: 37
RxCachedThreadScheduler-1 >> Subscriber 1: 37
RxCachedThreadScheduler-1 >> Subscriber 2: 37
RxCachedThreadScheduler-1 >> Event Source Terminated
Note the following:
Events only start firing once connect() is called on the source, not when observers subscribe to the source.
Database calls are only made once per event update
Filtered values are not emitted to subscribers
All subscribers are executed in the same thread
Application terminates before the events are processed due to concurrency. Normally your app will run in an event loop, so your app will remain responsive during slow operations.

SubscribeOn does not change the thread pool for the whole chain

I want to trigger a long-running operation via a REST request and WebFlux. The call should just return a response saying that the operation has started. I want to run the long-running operation on a different scheduler (e.g. Schedulers.single()). To achieve that I used subscribeOn:
Mono<RecalculationRequested> recalculateAll() {
return provider.size()
.doOnNext(size -> log.info("Size: {}", size))
.doOnNext(size -> recalculate(size))
.map(RecalculationRequested::new);
}
private void recalculate(int toRecalculateSize) {
Mono.just(toRecalculateSize)
.flatMapMany(this::toPages)
.flatMap(page -> recalculate(page))
.reduce(new RecalculationResult(), RecalculationResult::increment)
.subscribeOn(Schedulers.single())
.subscribe(result -> log.info("Result of recalculation - success:{}, failed: {}",
result.getSuccess(), result.getFailed()));
}
private Mono<RecalculationResult> recalculate(RecalculationPage pageToRecalculate) {
return provider.findElementsToRecalculate(pageToRecalculate.getPageNumber(), pageToRecalculate.getPageSize())
.flatMap(this::recalculateSingle)
.reduce(new RecalculationResult(), RecalculationResult::increment);
}
private Mono<RecalculationResult> recalculateSingle(ElementToRecalculate elementToRecalculate) {
return recalculationTrigger.recalculate(elementToRecalculate)
.doOnNext(result -> {
log.info("Finished recalculation for element: {}", elementToRecalculate);
})
.doOnError(error -> {
log.error("Error during recalculation for element: {}", elementToRecalculate, error);
});
}
From the above I want to call:
private void recalculate(int toRecalculateSize)
on a different thread. However, it does not run on the single scheduler's thread; it uses a different thread pool. I would expect subscribeOn to change it for the whole chain. What should I change, and why, so that it executes on the single thread pool?
Just to mention - method:
provider.findElementsToRecalculate(...)
uses WebClient to get elements.
One caveat of subscribeOn is it does what it says: it runs the act of "subscribing" on the provided Scheduler. Subscribing flows from bottom to top (the Subscriber subscribes to its parent Publisher), at runtime.
Usually you see in documentation and presentations that subscribeOn affects the whole chain. That is because most operators / sources will not themselves change threads, and by default will start sending onNext/onComplete/onError signals from the thread from which they were subscribed to.
But as soon as one operator switches threads in that top-to-bottom data path, the reach of subscribeOn stops there. Typical example is when there is a publishOn in the chain.
The source of data in this case is reactor-netty and netty, which operate on their own threads and thus act as if there was a publishOn at the source.
For WebFlux, I'd say favor using publishOn in the main chain of operators, or alternatively use subscribeOn inside of inner chains, like inside flatMap.
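The point that signals arrive on whatever thread the source uses, regardless of where the subscription was made, can be reproduced with plain JDK classes. This is only an analogy using CompletableFuture rather than Reactor or Netty; the class name and the "source-thread" name are made up for the illustration.

```java
import java.util.concurrent.CompletableFuture;

final class CallbackThreadDemo {
    // Returns the name of the thread that ran the downstream callback.
    static String callbackThread() {
        CompletableFuture<Integer> source = new CompletableFuture<>();

        // The callback is registered on the calling thread, before any
        // value exists...
        CompletableFuture<String> downstream =
                source.thenApply(value -> Thread.currentThread().getName());

        // ...but the value is produced by a thread the source owns,
        // similar to Netty's event loop delivering a WebClient response.
        Thread producer = new Thread(() -> source.complete(42), "source-thread");
        producer.start();
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return downstream.join();
    }

    public static void main(String[] args) {
        System.out.println(callbackThread());
    }
}
```

The callback runs on the producing thread, not on the thread that registered it. In the same way, once the WebClient response arrives on a Netty thread, the rest of the chain continues there unless an operator such as publishOn moves it elsewhere.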
As per the documentation, all operators prefixed with doOn are sometimes referred to as having a “side-effect”. They let you peek inside the sequence’s events without modifying them.
If you want to chain the 'recalculate' step after 'provider.size()' do it with flatMap.

Flux subscribeOn(elastic(), true) with flatMap not executing on different threads

I want to process the elements of the Flux asynchronously on different threads,
but they are not being executed on different threads. Am I missing something?
Below is the code.
public Mono<Map<Object, Object>> execute(List<Empolyee> empolyeeList) {
return Flux.fromIterable(empolyeeList).subscribeOn(elastic(), true).flatMap(empolyee -> {
return empolyeeService.getDepts(empolyee).flatMap(result -> {
// ---
// ---
// ---
return Mono.just(result);
});
}).collectMap(result -> result.getName().trim(), result -> fieldResult.getValue());
}
Taken from the documentation:
subscribeOn applies to the subscription process, when that backward
chain is constructed. As a consequence, no matter where you place the
subscribeOn in the chain, it always affects the context of the source
emission.
It does not work the way you think. It applies when someone subscribes: their entire request will be placed on its own thread. So there is an absolute guarantee that no two requests will end up on the same thread.
The subscribeOn method
I converted the Flux to a ParallelFlux and used runOn(elastic()). It is working as expected.
// Convert the Flux to a ParallelFlux (ParallelFlux can also be used directly)
Flux.fromIterable(empolyeeList).parallel()
// Run the rails on the elastic scheduler
.runOn(elastic()).flatMap(empolyee -> {
// ... same body as in the question
});

Spring Reactor | Batching the input without mutating

I'm trying to batch the records constantly emitted from a streaming source (Kafka) and call my service in a batch of 100.
What I get as the input is a single record. I'm trying to find the best way to achieve this in a reactive way using Spring Reactor, without needing mutation and locking outside the pipeline.
Here is my naive attempt which simply reflects my sequential way of thinking:
Mono.just(input)
.subscribe(i -> {
batches.add(input);
if(batches.size() >= 100) {
// Invoke another reactive pipeline.
// Clear the batch (requires locking in order to be thread safe).
}
});
What's the best way to achieve batching on a streaming source using Reactor?
.buffer(100) or .bufferTimeout(100, Duration.ofSeconds(xxx)) comes to the rescue.
Using Flux.buffer or Flux.bufferTimeout, you can gather a fixed number of elements into a List:
StepVerifier.create(
Flux.range(0, 1000)
.buffer(100)
)
.expectNextCount(10)
.expectComplete()
.verify();
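For readers who want to see what the fixed-size grouping amounts to, here is a plain-Java sketch of the same splitting logic applied to an in-memory list. It is only an illustration of the semantics (consecutive groups, last one possibly shorter), not Reactor's implementation; the BufferSketch class and its buffer method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class BufferSketch {
    // Splits items into consecutive lists of at most `size` elements.
    // The last list may be shorter when the input does not divide evenly.
    static <T> List<List<T>> buffer(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            // Copy the slice so each batch is independent of the input list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With 1000 input elements and a size of 100 this yields 10 lists, which matches the expectNextCount(10) in the StepVerifier example. The reactive operator does the same grouping incrementally as elements arrive, so no shared mutable list or locking is needed in user code.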
Update for the use case
In the case where the input is a single value, for example a method invoked with a parameter:
public void invokeMe(String element);
You can use a UnicastProcessor and push all the data into that processor, which then takes care of the batching:
class Batcher {
final UnicastProcessor<String> processor = UnicastProcessor.create();
public void invokeMe(String element) {
processor.sink().next(element);
// or Mono.just(element).subscribe(processor);
}
public Flux<List<String>> listen() {
return processor.bufferTimeout(100, Duration.ofSeconds(5));
}
}
Batcher batcher = new Batcher();
StepVerifier.create(
batcher.listen()
)
.then(() -> Flux.range(0, 1000)
.subscribe(i -> batcher.invokeMe("" + i)))
.expectNextCount(10)
.thenCancel()
.verify();
This example shows how to provide a single point for receiving events and then listen to the results of the batching process.
Please note that UnicastProcessor allows only one subscriber, so it is useful when there is one party interested in the batching results and many data producers. If you have as many subscribers as producers, you may want to use one of the following processors instead: DirectProcessor, TopicProcessor, WorkQueueProcessor. To learn more about Reactor Processors follow the link

Cancelling an Entity Framework Query

I'm in the process of writing a query manager for a WinForms application that, among other things, needs to be able to deliver real-time search results to the user as they're entering a query (think Google's live results, though obviously in a thick client environment rather than the web). Since the results need to start arriving as the user types, the search will get more and more specific, so I'd like to be able to cancel a query if it's still executing while the user has entered more specific information (since the results would simply be discarded, anyway).
If this were ordinary ADO.NET, I could obviously just use the DbCommand.Cancel function and be done with it, but we're using EF4 for our data access and there doesn't appear to be an obvious way to cancel a query. Additionally, opening System.Data.Entity in Reflector and looking at EntityCommand.Cancel shows a discouragingly empty method body, despite the docs claiming that calling this would pass it on to the provider command's corresponding Cancel function.
I have considered simply letting the existing query run and spinning up a new context to execute the new search (and just disposing of the existing query once it finishes), but I don't like the idea of a single client having a multitude of open database connections running parallel queries when I'm only interested in the results of the most recent one.
All of this is leading me to believe that there's simply no way to cancel an EF query once it's been dispatched to the database, but I'm hoping that someone here might be able to point out something I've overlooked.
TL/DR Version: Is it possible to cancel an EF4 query that's currently executing?
Looks like you have found a bug in EF, but when you report it to MS it will probably be considered a bug in the documentation. Anyway, I don't like the idea of interacting directly with EntityCommand. Here is my example of how to kill the current query:
var thread = new Thread((param) =>
{
var currentString = param as string;
if (currentString == null)
{
// TODO OMG exception
throw new Exception();
}
AdventureWorks2008R2Entities entities = null;
try // Don't use using because it can cause race condition
{
entities = new AdventureWorks2008R2Entities();
ObjectQuery<Person> query = entities.People
.Include("Password")
.Include("PersonPhone")
.Include("EmailAddress")
.Include("BusinessEntity")
.Include("BusinessEntityContact");
// Improves performance of readonly query where
// objects do not have to be tracked by context
// Edit: But it doesn't work for this query because of includes
// query.MergeOption = MergeOption.NoTracking;
foreach (var record in query
.Where(p => p.LastName.StartsWith(currentString)))
{
// TODO fill some buffer and invoke UI update
}
}
finally
{
if (entities != null)
{
entities.Dispose();
}
}
});
thread.Start("P");
// Just for test
Thread.Sleep(500);
thread.Abort();
This is the result of about 30 minutes of playing with it, so it is probably not something that should be considered a final solution. I'm posting it to at least get some feedback on possible problems caused by this solution. The main points are:
Context is handled inside the thread
Result is not tracked by context
If you kill the thread, the query is terminated and the context is disposed (connection released)
If you kill the thread before you start a new one, you should still be using only one connection.
I checked that query is started and terminated in SQL profiler.
Edit:
Btw., another approach is to simply stop the current query inside the enumeration:
public IEnumerable<T> ExecuteQuery<T>(IQueryable<T> query)
{
foreach (T record in query)
{
// Handle stop condition somehow
if (ShouldStop())
{
// Once you close enumerator, query is terminated
yield break;
}
yield return record;
}
}