How to make an async HTTP call with Apache Beam (Java)?

The input PCollection contains HTTP requests and is a bounded dataset. I want to make an async HTTP call (Java) in a ParDo, parse the response, and put the results into the output PCollection. My code is below. It throws the following exception,
and I couldn't figure out the reason. I need some guidance.
java.util.concurrent.CompletionException: java.lang.IllegalStateException: Can't add element ValueInGlobalWindow{value=streaming.mapserver.backfill.EnrichedPoint@2c59e, pane=PaneInfo.NO_FIRING} to committed bundle in PCollection Call Map Server With Rate Throttle/ParMultiDo(ProcessRequests).output [PCollection]
Code:
public class ProcessRequestsFn extends DoFn<PreparedRequest,EnrichedPoint> {
private static AsyncHttpClient _HttpClientAsync;
private static ExecutorService _ExecutorService;
static{
AsyncHttpClientConfig cg = config()
.setKeepAlive(true)
.setDisableHttpsEndpointIdentificationAlgorithm(true)
.setUseInsecureTrustManager(true)
.addRequestFilter(new RateLimitedThrottleRequestFilter(100,1000))
.build();
_HttpClientAsync = asyncHttpClient(cg);
_ExecutorService = Executors.newCachedThreadPool();
}
@DoFn.ProcessElement
public void processElement(ProcessContext c) {
PreparedRequest request = c.element();
if(request == null)
return;
_HttpClientAsync.prepareGet((request.getRequest()))
.execute()
.toCompletableFuture()
.thenApply(response -> { if(response.getStatusCode() == HttpStatusCodes.STATUS_CODE_OK){
return response.getResponseBody();
} return null; } )
.thenApply(responseBody->
{
List<EnrichedPoint> resList = new ArrayList<>();
/*some process logic here*/
System.out.printf("%d enriched points back\n", result.length());
}
return resList;
})
.thenAccept(resList -> {
for (EnrichedPoint enrichedPoint : resList) {
c.output(enrichedPoint);
}
})
.exceptionally(ex->{
System.out.println(ex);
return null;
});
}
}

The Scio library implements a DoFn that deals with asynchronous operations. BaseAsyncDoFn might provide the handling you need. Since you're dealing with CompletableFuture, also take a look at JavaAsyncDoFn.
Please note that you don't necessarily need to use the Scio library: you can take the main idea of BaseAsyncDoFn, since it's independent of the rest of Scio.

The issue you're hitting is that you're outputting outside the context of a processElement or finishBundle call.
You'll want to gather all your outputs in memory, output them eagerly during later processElement calls, and output the rest within finishBundle by blocking until all your calls finish.
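The buffering idea described above can be sketched without any Beam dependency. This is a minimal illustration, not Beam's API: BufferingAsyncFn, its two methods, and the Consumer used as the output are hypothetical stand-ins for a DoFn, @ProcessElement, @FinishBundle, and the output receiver. In a real DoFn, output emitted from finishBundle also needs a timestamp and a window, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Hypothetical stand-in for a DoFn: async results are buffered as futures and
 * are only emitted from processElement or finishBundle, never from a callback thread.
 */
class BufferingAsyncFn<I, O> {
    private final Function<I, CompletableFuture<List<O>>> asyncCall;
    // Only touched by the thread that calls processElement/finishBundle.
    private final List<CompletableFuture<List<O>>> pending = new ArrayList<>();

    BufferingAsyncFn(Function<I, CompletableFuture<List<O>>> asyncCall) {
        this.asyncCall = asyncCall;
    }

    /** Mirrors @ProcessElement: start the call, then emit whatever has already finished. */
    void processElement(I element, Consumer<O> output) {
        pending.add(asyncCall.apply(element));
        drain(output, false);
    }

    /** Mirrors @FinishBundle: block until every outstanding call is done, then emit. */
    void finishBundle(Consumer<O> output) {
        drain(output, true);
    }

    private void drain(Consumer<O> output, boolean block) {
        Iterator<CompletableFuture<List<O>>> it = pending.iterator();
        while (it.hasNext()) {
            CompletableFuture<List<O>> future = it.next();
            if (block || future.isDone()) {
                // join() returns on the caller's thread, so the output happens in a
                // valid context. A failed call surfaces here as a CompletionException.
                future.join().forEach(output);
                it.remove();
            }
        }
    }
}
```

The point of the design is that the future's callback never touches the output. It only completes the future, and the thread that owns the bundle decides when to emit.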


How to return the processing result of a signal?

In particular, if the signal processing needs to invoke one or more activities, how can I achieve that?
I tried to return data or throw an exception, but it doesn't work.
Data cannot be returned from a signal method, and throwing an exception will block workflow execution.
Common mistakes
It is wrong to return data from a signal method or to throw an exception from it, because a signal method is meant to be asynchronous. The processing resembles consuming messages from Kafka: the result cannot be delivered through the method's return value.
So the code below will NOT work:
public class SampleWorkflow {
    // Wrong: a signal method must not return a value or throw.
    public Result mySignalMethod(SignalRequest req) {
        Result result = activityStub.execute(req);
        if (...) {
            throw new RuntimeException(...);
        }
        return result;
    }
}
What you should do
Make sure the signal method doesn't return anything.
Use a query method to return the results.
In the signal processing, store the results in workflow state so that the query can return them.
As a bonus, use the design pattern of storing signal requests in a queue and letting the workflow method process them. This gives you several benefits:
It guarantees FIFO ordering of signal processing.
It makes sure that resetting the workflow won't run into issues. After a reset, signals are preserved and moved to an earlier position in the workflow history, and the workflow may not be initialized yet when those signals are replayed.
It makes exception handling easier.
See this design pattern in the sample code: Cadence Java sample/Temporal java sample
If we apply all of the above, the sample code looks like this:
public class SampleWorkflow {
    // Queue is an interface, so a concrete implementation is needed.
    private final Queue<SignalRequest> queue = new ArrayDeque<>();
    private Response<Result> lastSignalResponse;

    public void myWorkflowMethod() {
        Async.procedure(
            () -> {
                while (true) {
                    Workflow.await(() -> !queue.isEmpty());
                    final SignalRequest req = queue.poll();
                    // alternatively, you can use async to start an activity:
                    try {
                        Result result = activityStub.execute(req);
                        if (...) {
                            lastSignalResponse = new Response<>(new RuntimeException(...));
                        } else {
                            lastSignalResponse = new Response<>(result);
                        }
                    } catch (ActivityException e) {
                        lastSignalResponse = new Response<>(e);
                    }
                }
            });
        ...
    }

    public Response<Result> myQueryMethod() {
        return lastSignalResponse;
    }

    // A signal method must not return anything; it only enqueues the request.
    public void mySignalMethod(SignalRequest req) {
        queue.add(req);
    }
}
And in the application code, signal the workflow and then query it to get the result:
workflowStub.mySignalMethod(req);
Response response = workflowStub.myQueryMethod();
Follow this sample-Cadence / sample-Temporal if you want to use an async activity.
Why
A signal is executed via a workflow decision task (a workflow task in Temporal). A decision task cannot return a result: in the current design, there is no mechanism that lets a decision task return a result to application code.
Throwing an exception in workflow code will either block the decision task or fail the workflow.
A query method is designed to return a result. However, a query cannot schedule activities or modify workflow state.
What is missing is a way for application code to make a synchronous API call that both updates and returns data. That needs a complicated design: https://github.com/temporalio/proposals/pull/53

In Spring WebClient used with block(), get body on error

I am using WebClient from Spring WebFlux to communicate with a REST API backend from a Spring client.
When this REST API backend throws an exception, it answers with a specific format (ErrorDTO) that I would like to collect from my client.
What I have tried to do is make my client throw a GestionUtilisateursErrorException(ErreurDTO) containing this body once the server answers with a 5xx HTTP status code.
I have tried several options:
I/ onStatus
@Autowired
WebClient gestionUtilisateursRestClient;
gestionUtilisateursRestClient
.post()
.uri(profilUri)
.body(Mono.just(utilisateur), UtilisateurDTO.class)
.retrieve()
.onStatus(HttpStatus::is5xxServerError,
response -> {
ErreurDTO erreur = response.bodyToMono(ErreurDTO.class).block();
return Mono.error(new GestionUtilisateursErrorException(erreur));
}
)
.bodyToMono(Void.class)
.timeout(Duration.ofMillis(5000))
.block();
This method doesn't work because WebClient doesn't allow me to call block() inside onStatus. I only get a Mono, and I can't go further from there.
It seems that onStatus can't be used this way with a blocking WebClient call, which means I can throw a custom exception, but I can't populate it with the data from the response body.
II/ ExchangeFilterFunction
@Bean
WebClient gestionUtilisateursRestClient() {
return WebClient.builder()
.baseUrl(gestionUtilisateursApiUrl)
.defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
.filter(ExchangeFilterFunction.ofResponseProcessor(this::gestionUtilisateursExceptionFilter))
.build();
}
private Mono<ClientResponse> gestionUtilisateursExceptionFilter(ClientResponse clientResponse) {
if(clientResponse.statusCode().isError()){
return clientResponse.bodyToMono(ErreurDTO.class)
.flatMap(erreurDto -> Mono.error(new GestionUtilisateursErrorException(
erreurDto
)));
}
return Mono.just(clientResponse);
}
This method works but throws a reactor.core.Exceptions$ReactiveException that I am struggling to catch properly (reactor.core.Exceptions is not catchable, and ReactiveException is private).
This exception contains as its cause the exception I need to catch (GestionUtilisateursErrorException), but I need a way to catch it properly.
I also tried the onErrorMap and onErrorResume methods, but neither worked the way I needed.
Edit 1:
I am now using the following workaround, even though it feels like a dirty way to do what I need:
try {
gestionUtilisateursRestClient
.post()
.uri(profilUri)
.body(Mono.just(utilisateur), UtilisateurDTO.class)
.retrieve()
.onStatus(h -> h.is5xxServerError(),
response -> {
return response.bodyToMono(ErreurDTO.class).handle((erreur, handler) -> {
LOGGER.error(erreur.getMessage());
handler.error(new GestionUtilisateursErrorException(erreur));
});
}
)
.bodyToMono(String.class)
.timeout(Duration.ofMillis(5000))
.block();
}
catch(Exception e) {
LOGGER.debug("Erreur lors de l'appel vers l'API GestionUtilisateur (...)");
// Rethrow the same object that was type-checked; e.getCause() is not always the root cause.
Throwable rootCause = ExceptionUtils.getRootCause(e);
if(rootCause instanceof GestionUtilisateursErrorException) {
throw (GestionUtilisateursErrorException) rootCause;
}
else {
throw e;
}
}
Here, it throws the expected GestionUtilisateursErrorException that I can handle synchronously.
I might implement this in a global handler to avoid writing this code around each call to my API.
Thank you.
Kevin
I've encountered a similar case of needing to access the response body, and the Mono.handle() method might be of use to you (see https://projectreactor.io/docs/core/release/api/index.html?reactor/core/publisher/Mono.html).
Here handler is a SynchronousSink (see https://projectreactor.io/docs/core/release/api/reactor/core/publisher/SynchronousSink.html), which can call next(T) at most once, and either complete() or error().
In this case, I call handler.error() with a new GestionUtilisateursErrorException constructed from erreur.
.onStatus(h -> h.is5xxServerError(),
response -> {
return response.bodyToMono(ErreurDTO.class).handle((erreur, handler) -> {
// Do something with erreur e.g.
log.error(erreur.getErrorMessage());
// The sink accepts at most one handler.next(), plus either handler.error() or handler.complete()
handler.error(new GestionUtilisateursErrorException(erreur));
});
}
)

Non-blocking functional methods with Reactive Mongo and Web client

I have a micro service which reads objects from a database using a ReactiveMongoRepository interface.
The goal is to take each of those objects and push it to an AWS Lambda function (after converting it to a DTO). If the result of that Lambda function is in the 200 range, mark the object as a success; otherwise ignore it.
In the old days of a simple Mongo repository and a RestTemplate, this would be a trivial task. However, I'm trying to understand this reactive approach and avoid blocking.
Here is the code I've come up with. I know I'm blocking on the webClient, but how do I avoid that?
@Override
public Flux<Video> index() {
return videoRepository.findAllByIndexedIsFalse().flatMap(video -> {
final SearchDTO searchDTO = SearchDTO.builder()
.name(video.getName())
.canonicalPath(video.getCanonicalPath())
.objectID(video.getObjectID())
.userId(video.getUserId())
.build();
// Blocking call
final HttpStatus httpStatus = webClient.post()
.uri(URI.create(LAMBDA_ENDPOINT))
.body(BodyInserters.fromObject(searchDTO)).exchange()
.block()
.statusCode();
if (httpStatus.is2xxSuccessful()) {
video.setIndexed(true);
}
return videoRepository.save(video);
});
}
I'm calling the above from a scheduled task, and I don't really care about the actual result of the index() method, just what happens along the way.
@Scheduled(fixedDelay = 60000)
public void indexTask() {
indexService
.index()
.log()
.subscribe();
}
I've read a bunch of blog posts etc on the subject but they're all just simple CRUD operations without anything happening in the middle so don't really give me a full picture of how to implement these things.
Any help?
Your solution is actually quite close.
In those cases, you should try and decompose the reactive chain in steps and not hesitate to turn bits into independent methods for clarity.
@Override
public Flux<Video> index() {
Flux<Video> unindexedVideos = videoRepository.findAllByIndexedIsFalse();
return unindexedVideos.flatMap(video -> {
final SearchDTO searchDTO = SearchDTO.builder()
.name(video.getName())
.canonicalPath(video.getCanonicalPath())
.objectID(video.getObjectID())
.userId(video.getUserId())
.build();
Mono<ClientResponse> indexedResponse = webClient.post()
.uri(URI.create(LAMBDA_ENDPOINT))
.body(BodyInserters.fromObject(searchDTO)).exchange()
.filter(res -> res.statusCode().is2xxSuccessful());
return indexedResponse.flatMap(response -> {
video.setIndexed(true);
return videoRepository.save(video);
});
});
}
Here is my approach, which is maybe a little more readable. I admit I didn't run it, so I can't guarantee 100% that it works.
public Flux<Video> index() {
return videoRepository.findAllByIndexedIsFalse()
.flatMap(this::callLambda)
.flatMap(videoRepository::save);
}
private Mono<Video> callLambda(final Video video) {
SearchDTO searchDTO = new SearchDTO(video);
return webClient.post()
.uri(URI.create(LAMBDA_ENDPOINT))
.body(BodyInserters.fromObject(searchDTO))
.exchange()
.map(ClientResponse::statusCode)
.filter(HttpStatus::is2xxSuccessful)
.map(t -> {
video.setIndexed(true);
return video;
});
}

Vert.x: How to wait for a future to complete

Is there a way to wait for a future to complete without blocking the event loop?
An example of a use case with querying Mongo:
Future<Result> dbFut = Future.future();
mongo.findOne("myusers", myQuery, new JsonObject(), res -> {
if(res.succeeded()) {
...
dbFut.complete(res.result());
}
else {
...
dbFut.fail(res.cause());
}
});
// Here I need the result of the DB query
if(dbFut.succeeded()) {
doSomethingWith(dbFut.result());
}
else {
error();
}
I know that doSomethingWith(dbFut.result()); can be moved into the handler, but if it's long, the code becomes unreadable (callback hell?). Is that the right solution? Is it the only solution without additional libraries?
I'm aware that rxJava simplifies the code, but as I don't know it, learning Vert.x and rxJava is just too much.
I also wanted to give a try to vertx-sync. I put the dependency in the pom.xml; everything got downloaded fine but when I started my app, I got the following error
maurice@mickey> java \
-javaagent:~/.m2/repository/co/paralleluniverse/quasar-core/0.7.5/quasar-core-0.7.5-jdk8.jar \
-jar target/app-dev-0.1-fat.jar \
-conf conf/config.json
Error opening zip file or JAR manifest missing : ~/.m2/repository/co/paralleluniverse/quasar-core/0.7.5/quasar-core-0.7.5-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
I know what the error means in general, but not in this context. I tried to google it but didn't find any clear explanation of which manifest to put where. And as before, unless it's mandatory, I prefer to learn one thing at a time.
So, back to the question: is there a way with "basic" Vert.x to wait for a future without disturbing the event loop?
You can set a handler for the future to be executed upon completion or failure:
Future<Result> dbFut = Future.future();
mongo.findOne("myusers", myQuery, new JsonObject(), res -> {
if(res.succeeded()) {
...
dbFut.complete(res.result());
}
else {
...
dbFut.fail(res.cause());
}
});
dbFut.setHandler(asyncResult -> {
if(asyncResult.succeeded()) {
// your logic here
}
});
This is a pure Vert.x way that doesn't block the event loop.
I agree that you should not block in the Vert.x processing pipeline, but I make one exception to that rule: start-up. By design, I want to block while my HTTP server is initialising.
This code might help you:
/**
 * @return null when waiting on {@code Future<Void>}
 */
@Nullable
public static <T> T awaitComplete(Future<T> f) throws Throwable {
    final Object lock = new Object();
    final AtomicReference<AsyncResult<T>> resultRef = new AtomicReference<>(null);
    synchronized (lock) {
        // We *must* be locked before registering a callback.
        // If the result is ready, the callback is called immediately!
        f.onComplete(
            (AsyncResult<T> result) -> {
                resultRef.set(result);
                synchronized (lock) {
                    lock.notify();
                }
            });
        // Check before waiting: if the callback already ran, waiting first would block forever.
        // The loop also guards against spurious wake-ups.
        // Ref: https://stackoverflow.com/a/249907/257299
        while (null == resultRef.get()) {
            // @Blocking
            lock.wait();
        }
    }
    final AsyncResult<T> result = resultRef.get();
    @Nullable final Throwable t = result.cause();
    if (null != t) {
        throw t;
    }
    @Nullable final T x = result.result();
    return x;
}
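The same blocking wait can also be written with a CountDownLatch, which avoids manual wait/notify handling. The sketch below uses only the JDK. BlockingAwait and its register parameter are hypothetical names: register is an adapter that attaches a (result, error) callback to whatever asynchronous operation is being awaited. As with the helper above, it must only be called from a thread that is allowed to block, never from the event loop.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

final class BlockingAwait {
    private BlockingAwait() {}

    /**
     * Blocks the calling thread until the registered callback fires.
     * Works whether the callback runs immediately or later on another thread,
     * because countDown() before await() is remembered by the latch.
     */
    static <T> T await(Consumer<BiConsumer<T, Throwable>> register)
            throws InterruptedException, ExecutionException {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<T> value = new AtomicReference<>();
        AtomicReference<Throwable> error = new AtomicReference<>();

        register.accept((result, cause) -> {
            value.set(result);
            error.set(cause);
            done.countDown();
        });

        done.await(); // blocking call
        Throwable cause = error.get();
        if (cause != null) {
            throw new ExecutionException(cause);
        }
        return value.get();
    }
}
```

With a Vert.x future f, the adapter could look like BlockingAwait.<Result>await(cb -> f.onComplete(ar -> cb.accept(ar.result(), ar.cause()))), assuming the onComplete method used in the helper above.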

How to find out when all the data in a Flux (parent) has been processed by its inner non-blocking Flux or Mono (child)?

I have an aggregator utility class where I have to join data from more than one Cassandra table. My production code looks like the code below, but is not exactly the same.
@Autowired FollowersRepository followersRepository;
@Autowired TopicRepository topicRepository;
@GetMapping("/info")
public Flux<FullDetails> getData(){
return Flux.create(emitter ->{
followersRepository.findAll()
.doOnNext(data -> {
List<String> all = data.getTopiclist(); //will get list of topic id
List<Alltopics> processedList = new ArrayList<Alltopics>();
all.forEach(action -> {
topicRepository.findById(action) //will get full detail about topic
.doOnSuccess(topic ->{
processedList.add(topic);
if (processedList.size() >= all.size()) {
FullDetails fulldetails = new FullDetails(action,processedList);
emitter.next(fulldetails);
//emitter.complete();
}
})
.subscribe();
});
})
.doOnComplete(() ->{
System.out.println("All the data are processed !!!");
//emitter.complete(); // executing if all the data are pushed from database not waiting for doOnNext method to complete.
})
.subscribe();
});
}
For more details, refer to the code here: CodeLink.
I have tried doOnComplete and doOnFinally on the outer Flux, but it does not wait for all the inner non-blocking calls to complete.
I want to call onComplete after all the nested (non-blocking) Mono/Flux requests inside the Flux have been processed.
For a nested blocking Flux/Mono, the outer Flux's doOnComplete method executes after the inner Flux/Mono completes.
PS:
In the example below, I can't find where to place emitter.complete(),
because doOnComplete() is called before all the inner Monos complete.
Request Body:-
[{ "content":"Intro to React and operators", "author":"Josh Long", "name":"Spring WebFlux" },{ "content":"Intro to Flux", "author":"Josh Long", "name":"Spring WebFlux" },{ "content":"Intro to Mono", "author":"Josh Long", "name":"Spring WebFlux" }]
My Rest Controller:-
@PostMapping("/topics")
public Flux<?> loadTopic(@RequestBody Flux<Alltopics> data)
{
return Flux.create(emitter ->{
data
.map(topic -> {
topic.setTopicid(null ==topic.getTopicid() || topic.getTopicid().isEmpty()?UUID.randomUUID().toString():topic.getTopicid());
return topic;
})
.doOnNext(topic -> {
topicRepository.save(topic).doOnSuccess(persistedTopic ->{
emitter.next(persistedTopic);
//emitter.complete();
}).subscribe();
})
.doOnComplete(() -> {
//emitter.complete();
System.out.println(" all the data are processed!!!");
}).subscribe();
});
}
Here are a few rules that you should follow when writing a reactive pipeline:
doOnXYZ operators should never be used for heavy I/O, latency-bound operations, or any reactive operation. They are meant for side-effect operations such as logging and metrics.
You should never subscribe from within a pipeline or a method that returns a reactive type. Doing so decouples the processing of that operation from the main pipeline, so there is no guarantee that you'll get the expected result at the right time, nor that the complete/error signals will reach your application.
You should never block from within a pipeline or a method that returns a reactive type. This will cause critical issues in your application at runtime.
Now, because your code snippet is quite convoluted, I'll just give you the general direction to follow with another code snippet.
@GetMapping("/info")
public Flux<FullDetails> getData(){
return followersRepository.findAll()
.flatMap(follower -> {
Mono<List<Alltopics>> topics = topicRepository.findAllById(follower.getTopiclist()).collectList();
return topics.map(topiclist -> new FullDetails(follower.getId(), topiclist));
});
}