How to find out when all the data in a Flux (parent) has been processed by its inner non-blocking Flux or Mono (child)?

I have an aggregator utility class, where I have to join data from more than one Cassandra table. My production code looks like the code below, but not exactly the same.
@Autowired FollowersRepository followersRepository;
@Autowired TopicRepository topicRepository;

@GetMapping("/info")
public Flux<FullDetails> getData() {
    return Flux.create(emitter -> {
        followersRepository.findAll()
            .doOnNext(data -> {
                List<String> all = data.getTopiclist(); // will get the list of topic ids
                List<Alltopics> processedList = new ArrayList<Alltopics>();
                all.forEach(action -> {
                    topicRepository.findById(action) // will get the full details of the topic
                        .doOnSuccess(topic -> {
                            processedList.add(topic);
                            if (processedList.size() >= all.size()) {
                                FullDetails fulldetails = new FullDetails(action, processedList);
                                emitter.next(fulldetails);
                                //emitter.complete();
                            }
                        })
                        .subscribe();
                });
            })
            .doOnComplete(() -> {
                System.out.println("All the data are processed !!!");
                //emitter.complete(); // executes once all rows are emitted from the database, without waiting for the doOnNext processing to complete
            })
            .subscribe();
    });
}
For more details, refer to the code here: CodeLink.
I have tried doOnComplete and doFinally on the outer Flux; neither waits for all the inner non-blocking calls to complete.
I want onComplete to be emitted only after all the nested non-blocking Mono/Flux requests inside the outer Flux have been processed.
With nested blocking Flux/Mono calls, the outer Flux's doOnComplete does execute after the inner Flux/Mono completes.
PS:
In the example below, I am not able to find where to place emitter.complete(),
because doOnComplete() is called before all the inner Monos have completed.
Request body:
[{ "content":"Intro to React and operators", "author":"Josh Long", "name":"Spring WebFlux" },{ "content":"Intro to Flux", "author":"Josh Long", "name":"Spring WebFlux" },{ "content":"Intro to Mono", "author":"Josh Long", "name":"Spring WebFlux" }]
My REST controller:
#PostMapping("/topics")
public Flux<?> loadTopic(#RequestBody Flux<Alltopics> data)
{
return Flux.create(emitter ->{
data
.map(topic -> {
topic.setTopicid(null ==topic.getTopicid() || topic.getTopicid().isEmpty()?UUID.randomUUID().toString():topic.getTopicid());
return topic;
})
.doOnNext(topic -> {
topicRepository.save(topic).doOnSuccess(persistedTopic ->{
emitter.next(persistedTopic);
//emitter.complete();
}).subscribe();
})
.doOnComplete(() -> {
//emitter.complete();
System.out.println(" all the data are processed!!!");
}).subscribe();
});
}

Here are a few rules that you should follow when writing a reactive pipeline:
doOnXYZ operators should never be used to do lots of I/O, latency-heavy operations, or any reactive operation. They should be used for "side-effect" operations, such as logging, metrics, and so on.
you should never subscribe from within a pipeline or a method that returns a reactive type. This decouples the processing of this operation from the main pipeline, meaning there's no guarantee you'll get the expected result at the right time nor that the complete/error signals will be known to your application.
you should never block from within a pipeline or a method that returns a reactive type. This will create critical issues to your application at runtime.
Now because your code snippet is quite convoluted, I'll just give you the general direction to follow with another code snippet.
#GetMapping("/info")
public Flux<FullDetails> getData(){
return followersRepository.findAll()
.flatMap(follower -> {
Mono<List<Alltopics>> topics = topicRepository.findAllById(follower.getTopiclist()).collectList();
return topics.map(topiclist -> new FullDetails(follower.getId(), topiclist));
});
}
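The same rules applied to the POST handler from the question give a similar shape. Here is a minimal sketch, assuming topicRepository is a reactive repository whose save returns Mono<Alltopics>:

@PostMapping("/topics")
public Flux<Alltopics> loadTopic(@RequestBody Flux<Alltopics> data) {
    return data
        .map(topic -> {
            // assign an id only when one is missing
            if (topic.getTopicid() == null || topic.getTopicid().isEmpty()) {
                topic.setTopicid(UUID.randomUUID().toString());
            }
            return topic;
        })
        // flatMap keeps the save inside the pipeline, so the returned Flux
        // completes only after every save has finished
        .flatMap(topicRepository::save);
}

Because nothing is subscribed to manually and there is no Flux.create, the complete signal is emitted by the pipeline itself once all the inner saves are done.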

Related

How to return the processing result of a signal?

Especially if the signal processing needs to invoke one or more activities, how can I achieve that?
I tried returning data or throwing an exception, but it doesn't work.
Data cannot be returned from a signal method, and throwing an exception will block workflow execution.
Common mistakes
It's wrong to return data from a signal method, or to throw an exception, because a signal method is meant to be asynchronous. The processing must work like Kafka consumers processing messages: you can't return the result through the method's return value.
So the code below will NOT work:
public class SampleWorkflow {
    public Result mySignalMethod(SignalRequest req) {
        Result result = activityStub.execute(req);
        if (...) {
            throw new RuntimeException(...);
        }
        return result;
    }
}
What should you do
What you must do:
Make sure the signal method doesn't return anything.
Use a query method to return the results.
In the signal method, store the results in workflow state so that the query can return them.
As a bonus, use the design pattern of storing signal requests in a queue and letting the workflow method process the signals. This gives you some benefits:
It guarantees FIFO ordering of signal processing.
It makes sure resetting the workflow won't run into issues: after a reset, signals are preserved and moved to an earlier position in the workflow history, and sometimes the workflow is not initialized enough to replay the signals.
It also makes exception handling easier.
See this design pattern in the sample code: Cadence Java sample / Temporal Java sample.
If we apply all of the above, the sample code should look like this:
public class SampleWorkflow {
    private Queue<SignalRequest> queue = new LinkedList<>();
    private Response<Result> lastSignalResponse;

    public void myWorkflowMethod() {
        Async.procedure(
            () -> {
                while (true) {
                    Workflow.await(() -> !queue.isEmpty());
                    final SignalRequest req = queue.poll();
                    // alternatively, you can use async to start an activity:
                    Result result = null;
                    try {
                        result = activityStub.execute(req);
                    } catch (ActivityException e) {
                        lastSignalResponse = new Response(e);
                    }
                    if (...) {
                        lastSignalResponse = new Response(new RuntimeException(...));
                    } else {
                        lastSignalResponse = new Response(result);
                    }
                }
            });
        ...
    }

    public Response myQueryMethod() {
        return lastSignalResponse;
    }

    public void mySignalMethod(SignalRequest req) {
        queue.add(req);
    }
}
And in the application code, you should signal and then query the workflow to get the result:
workflowStub.mySignalMethod(req);
Response response = workflowStub.myQueryMethod();
Follow this sample (Cadence / Temporal) if you want to use an async activity.
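Note that the signal is processed asynchronously by a later workflow task, so a query issued immediately after the signal may not see the response yet. A minimal sketch of the application side, polling until the response appears (the timeout and interval values are illustrative assumptions, and InterruptedException handling is elided):

workflowStub.mySignalMethod(req);

// poll the query until the workflow has stored a response, or give up
Response response = workflowStub.myQueryMethod();
long deadline = System.currentTimeMillis() + 30_000; // assumed 30s budget
while (response == null && System.currentTimeMillis() < deadline) {
    Thread.sleep(500); // assumed polling interval
    response = workflowStub.myQueryMethod();
}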
Why
A signal is executed via a workflow decision task (a workflow task in Temporal). A decision task cannot return a result, and in the current design there is no mechanism to let a decision task return a result to application code.
Throwing an exception in workflow code will either block the decision task or fail the workflow.
A query method is designed to return a result; however, a query cannot schedule activities or modify workflow state.
Letting application code make a synchronous API call that updates state and returns data is a missing piece; it needs a complicated design: https://github.com/temporalio/proposals/pull/53

WebFlux/Reactor: checking conditions before+after Flux execution with doOnComplete

I'm already querying some external resource with Flux.using(). Now I want to implement a kind of optimistic locking: read some state before the query starts to execute, and check whether it was updated after the query has finished. If so, throw an exception to break the HTTP request handling.
I've achieved this by using doOnComplete:
final AtomicReference<String> initialState = new AtomicReference<>();
return Flux.just("some", "constant", "data")
    .doOnComplete(() -> initialState.set(getState()))
    .concatWith(Flux.using(...)) // actual data query
    .doOnComplete(() -> { if (!initialState.get().equals(getState())) throw new RuntimeException(); })
    .concatWithValues("another", "constant", "data");
My questions:
Is it correct? Is it guaranteed that the 1st doOnComplete lambda finishes before Flux.using() runs, and is it guaranteed that the 2nd doOnComplete lambda executes strictly after it?
Does a more elegant solution exist?
The first doOnComplete is executed after Flux.just("some", "constant", "data") emits all of its elements, and the second one after the Publisher defined in concatWith completes successfully. This works because both publishers have a finite number of elements.
With the proposed approach, however, the pre-/postconditions of a particular operation are handled outside of the operation, at a higher level. In other words, the condition check belonging to the operation leaks into the Flux definition.
Suggestion, pushing the condition check down to the operation:
var otherElements = Flux.using( // actual data query
    () -> "other",
    x -> {
        var initialState = getState();
        return Flux.just(x).doOnComplete(() -> {
            if (!initialState.equals(getState())) throw new IllegalStateException();
        });
    },
    x -> { });

Flux.just("some", "constant", "data")
    .concatWith(otherElements)
    .concatWith(Mono.just("another")); // "constant", "data" ...

How to stop sending to a Kafka topic when control goes to the catch block (functional Kafka, Spring)

Could you please advise how I can stop sending to my third Kafka topic when control reaches the catch block? Currently the message is sent both to the error topic and to the topic it should be sent to in the case of normal processing. A snippet of the code is below:
@Component
public class Abc {
    private final StreamBridge streamBridge;

    public Abc(StreamBridge streamBridge) {
        this.streamBridge = streamBridge;
    }

    @Bean
    public Function<KStream<String, KafkaClass>, KStream<String, KafkaClass>> hiProcess() {
        return input -> input.map((key, value) -> {
            KafkaClass stream = null;
            try {
                stream = processFunction();
            } catch (Exception e) {
                Message<KafkaClass> mess = MessageBuilder.withPayload(value).build();
                streamBridge.send("errProcess-out-0", mess);
            }
            return new KeyValue<>(key, stream);
        });
    }
}
This can be implemented using the following pattern:
KafkaClass stream;

return input -> input
    .branch((k, v) -> {
            try {
                stream = processFunction();
                return true;
            } catch (Exception e) {
                Message<KafkaClass> mess = MessageBuilder.withPayload(v).build();
                streamBridge.send("errProcess-out-0", mess);
                return false;
            }
        },
        (k, v) -> true)[0]
    .map((k, v) -> new KeyValue<>(k, stream));
Here, we are using the branching feature (API) of KStream to split your input into two paths: the normal flow and the one causing the errors. This is accomplished by providing two predicates to the branch method call. The first predicate is the normal flow, in which you call the processFunction method and get a response back. If we don't get an exception, the predicate returns true, and the result of the branch operation is captured in the first element of the output array ([0]), which is processed downstream by the map operation that sends the final result to the outbound topic.
On the other hand, if processFunction throws an exception, the predicate sends whatever is necessary to the error topic using StreamBridge and returns false. Since the downstream map operation is only applied to the first element of the array from branching ([0]), nothing is sent outbound. When the first predicate returns false, the record goes to the second predicate, which always returns true; this is a no-op branch whose results are completely ignored.
One downside of this particular implementation is that you need to store the response from processFunction in an instance field and mutate it on each incoming KStream record so that you can access its value in the final map method where you send the output. However, for this particular use case, this may not be an issue. A variation that avoids the shared field is sketched below.
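Assuming a null value may stand in for a failed record, one such variation processes each record inside a single map, carries the result (or null) in the record itself, and filters out failures instead of branching:

return input -> input
    .map((key, value) -> {
        try {
            // carry the processed result in the record itself
            return new KeyValue<>(key, processFunction());
        } catch (Exception e) {
            Message<KafkaClass> mess = MessageBuilder.withPayload(value).build();
            streamBridge.send("errProcess-out-0", mess);
            return new KeyValue<String, KafkaClass>(key, null); // mark as failed
        }
    })
    // drop the failed records so nothing is sent outbound for them
    .filter((key, processed) -> processed != null);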

How to do Async Http Call with Apache Beam (Java)?

The input PCollection is HTTP requests, which is a bounded dataset. I want to make an async HTTP call (Java) in a ParDo, parse the response, and put the results into an output PCollection. My code is below, and I'm getting the following exception.
I couldn't figure out the reason and need some guidance.
java.util.concurrent.CompletionException: java.lang.IllegalStateException: Can't add element ValueInGlobalWindow{value=streaming.mapserver.backfill.EnrichedPoint#2c59e, pane=PaneInfo.NO_FIRING} to committed bundle in PCollection Call Map Server With Rate Throttle/ParMultiDo(ProcessRequests).output [PCollection]
Code:
public class ProcessRequestsFn extends DoFn<PreparedRequest, EnrichedPoint> {
    private static AsyncHttpClient _HttpClientAsync;
    private static ExecutorService _ExecutorService;

    static {
        AsyncHttpClientConfig cg = config()
            .setKeepAlive(true)
            .setDisableHttpsEndpointIdentificationAlgorithm(true)
            .setUseInsecureTrustManager(true)
            .addRequestFilter(new RateLimitedThrottleRequestFilter(100, 1000))
            .build();
        _HttpClientAsync = asyncHttpClient(cg);
        _ExecutorService = Executors.newCachedThreadPool();
    }

    @DoFn.ProcessElement
    public void processElement(ProcessContext c) {
        PreparedRequest request = c.element();
        if (request == null)
            return;
        _HttpClientAsync.prepareGet(request.getRequest())
            .execute()
            .toCompletableFuture()
            .thenApply(response -> {
                if (response.getStatusCode() == HttpStatusCodes.STATUS_CODE_OK) {
                    return response.getResponseBody();
                }
                return null;
            })
            .thenApply(responseBody -> {
                List<EnrichedPoint> resList = new ArrayList<>();
                /*some process logic here*/
                System.out.printf("%d enriched points back\n", resList.size());
                return resList;
            })
            .thenAccept(resList -> {
                for (EnrichedPoint enrichedPoint : resList) {
                    c.output(enrichedPoint);
                }
            })
            .exceptionally(ex -> {
                System.out.println(ex);
                return null;
            });
    }
}
The Scio library implements a DoFn which deals with asynchronous operations. The BaseAsyncDoFn might provide the handling you need. Since you're dealing with CompletableFuture, also take a look at the JavaAsyncDoFn.
Please note that you don't necessarily need to use the Scio library; you can take the main idea of BaseAsyncDoFn, since it's independent of the rest of Scio.
The issue you're hitting is that you're outputting outside the context of a processElement or finishBundle call.
You'll want to gather all your outputs in memory, output them eagerly during future processElement calls, and, at the end within finishBundle, block until all your calls finish.
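Here is a minimal sketch of that idea under some assumptions: the same static _HttpClientAsync from the question, a hypothetical parseResponse helper standing in for your parsing logic, and elements living in the global window. The key point is that results are only emitted from inside processElement or finishBundle:

public class AsyncProcessRequestsFn extends DoFn<PreparedRequest, EnrichedPoint> {
    // results completed so far; drained only inside processElement/finishBundle
    private transient ConcurrentLinkedQueue<EnrichedPoint> completed;
    private transient List<CompletableFuture<Void>> pending;

    @StartBundle
    public void startBundle() {
        completed = new ConcurrentLinkedQueue<>();
        pending = new ArrayList<>();
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        // flush whatever finished since the last call
        EnrichedPoint done;
        while ((done = completed.poll()) != null) {
            c.output(done);
        }
        PreparedRequest request = c.element();
        pending.add(_HttpClientAsync.prepareGet(request.getRequest())
            .execute()
            .toCompletableFuture()
            // parseResponse is a hypothetical helper wrapping your parsing logic
            .thenAccept(response -> completed.addAll(parseResponse(response))));
    }

    @FinishBundle
    public void finishBundle(FinishBundleContext c) {
        // block until every outstanding call finishes, then emit the remainder
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        EnrichedPoint done;
        while ((done = completed.poll()) != null) {
            c.output(done, Instant.now(), GlobalWindow.INSTANCE);
        }
    }
}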

RxJava: user-retry observable with .cache operator?

I have an observable that I create with the following code:
Observable.create(new Observable.OnSubscribe<ReturnType>() {
    @Override
    public void call(Subscriber<? super ReturnType> subscriber) {
        try {
            if (!subscriber.isUnsubscribed()) {
                subscriber.onNext(performRequest());
            }
            subscriber.onCompleted();
        } catch (Exception e) {
            subscriber.onError(e);
        }
    }
});
performRequest() will perform a long-running task, as you might expect.
Now, since I might be launching the same Observable twice or more in a very short amount of time, I decided to write the following transformer:
protected Observable.Transformer<ReturnType, ReturnType> attachToRunningTaskIfAvailable() {
    return origObservable -> {
        synchronized (mapOfRunningTasks) {
            // If not in the map
            if (!mapOfRunningTasks.containsKey(getCacheKey())) {
                Timber.d("Cache miss for %s", getCacheKey());
                mapOfRunningTasks.put(
                    getCacheKey(),
                    origObservable
                        .doOnTerminate(() -> {
                            Timber.d("Removed from tasks %s", getCacheKey());
                            synchronized (mapOfRunningTasks) {
                                mapOfRunningTasks.remove(getCacheKey());
                            }
                        })
                        .cache()
                );
            } else {
                Timber.d("Cache hit for %s", getCacheKey());
            }
            return mapOfRunningTasks.get(getCacheKey());
        }
    };
}
This basically puts the original .cache()'d observable in a HashMap<String, Observable>.
It disallows multiple requests with the same getCacheKey() (for example, login) from calling performRequest() in parallel. Instead, if a second login request arrives while another is in progress, the second request's observable gets "discarded" and the already-running one is used instead, so all the calls to onNext are cached and sent to both subscribers while actually hitting my backend only once.
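To illustrate the intended behavior, a hedged usage sketch (createLoginObservable is an assumed factory around performRequest()):

// both subscriptions share a single performRequest() execution
Observable<UserInfo> login = createLoginObservable()
        .compose(attachToRunningTaskIfAvailable());
login.subscribe(info -> Timber.d("first subscriber"));
login.subscribe(info -> Timber.d("second subscriber"));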
Now, suppose this code:
// Observable loginTask
public void doLogin(Observable<UserInfo> loginTask) {
    loginTask.subscribe(
        (userInfo) -> {},
        (throwable) -> {
            if (userWantsToRetry()) {
                doLogin(loginTask);
            }
        }
    );
}
Here loginTask was composed with the previous transformer. When an error occurs (it might be connectivity) and userWantsToRetry() is true, I basically re-call the method with the same observable. Unfortunately, that observable has been cached, and I'll receive the same error without hitting performRequest() again, since the cached sequence gets replayed.
Is there a way I could have both the "same requests grouping" behavior that the transformer provides me AND the retry button?
Your question has a lot going on, and it's hard to put it into direct terms. I can make a couple of recommendations, though. Firstly, your Observable.create can be simplified by using Observable.defer(Func0<Observable<T>>). This will run the function every time a new subscriber subscribes, and it will catch and channel any exceptions to the subscriber's onError.
Observable.defer(() -> {
    return Observable.just(performRequest());
});
Next, you can use observable.repeatWhen(Func1<Observable<Void>, Observable<?>>) to decide when you want to resubscribe. Repeat operators re-subscribe to the observable after an onCompleted event. This particular overload sends an event to a subject when an onCompleted event is received, and the function you provide receives that subject. Your function should call something like takeWhile(predicate), and complete when you do not want to repeat again.
Observable.just(1, 2, 3).flatMap((Integer num) -> {
    final AtomicInteger tryCount = new AtomicInteger(0);
    return Observable.just(num)
        .repeatWhen((Observable<? extends Void> notifications) ->
            notifications.takeWhile((x) -> num == 2 && tryCount.incrementAndGet() != 3));
})
.subscribe(System.out::println);
Output:
1
2
2
2
3
The above example shows that repeats are allowed while the event equals 2, up to a maximum of 2 repeats. If you switch to a retryWhen, then the flatMap would contain your decision whether to use a cached observable or the real-work observable. Hope this helps!
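Putting the pieces together for the login case, a hedged sketch (createLoginObservable is an assumed factory around performRequest()): because the transformer's doOnTerminate evicts the cache entry when the observable terminates with an error, re-resolving the cache through defer on every subscription means a user-driven retry performs real work again.

// defer re-resolves the cache on every subscription; after a failure the
// transformer's doOnTerminate has already evicted the dead entry, so the
// retry gets a fresh observable instead of the cached error
public Observable<UserInfo> loginTask() {
    return Observable.defer(() ->
            createLoginObservable()
                .compose(attachToRunningTaskIfAvailable()));
}

public void doLogin() {
    loginTask().subscribe(
        (userInfo) -> {},
        (throwable) -> {
            if (userWantsToRetry()) {
                doLogin(); // re-subscribes via defer, triggering a fresh performRequest()
            }
        });
}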