Non-blocking functional methods with Reactive Mongo and Web client - mongodb

I have a micro service which reads objects from a database using a ReactiveMongoRepository interface.
The goal is to take each of those objects and push it to an AWS Lambda function (after converting it to a DTO). If the result of that Lambda function is in the 200 range, mark the object as a success; otherwise, ignore it.
In the old days of a simple Mongo repository and a RestTemplate this would be a trivial task. However, I'm trying to understand this Reactive deal and avoid blocking.
Here is the code I've come up with. I know I'm blocking on the webClient, but how do I avoid that?
@Override
public Flux<Video> index() {
return videoRepository.findAllByIndexedIsFalse().flatMap(video -> {
final SearchDTO searchDTO = SearchDTO.builder()
.name(video.getName())
.canonicalPath(video.getCanonicalPath())
.objectID(video.getObjectID())
.userId(video.getUserId())
.build();
// Blocking call
final HttpStatus httpStatus = webClient.post()
.uri(URI.create(LAMBDA_ENDPOINT))
.body(BodyInserters.fromObject(searchDTO)).exchange()
.block()
.statusCode();
if (httpStatus.is2xxSuccessful()) {
video.setIndexed(true);
}
return videoRepository.save(video);
});
}
I'm calling the above from a scheduled task, and I don't really care about the actual result of the index() method, just what happens during it.
@Scheduled(fixedDelay = 60000)
public void indexTask() {
indexService
.index()
.log()
.subscribe();
}
I've read a bunch of blog posts etc on the subject but they're all just simple CRUD operations without anything happening in the middle so don't really give me a full picture of how to implement these things.
Any help?

Your solution is actually quite close.
In those cases, you should try and decompose the reactive chain in steps and not hesitate to turn bits into independent methods for clarity.
@Override
public Flux<Video> index() {
Flux<Video> unindexedVideos = videoRepository.findAllByIndexedIsFalse();
return unindexedVideos.flatMap(video -> {
final SearchDTO searchDTO = SearchDTO.builder()
.name(video.getName())
.canonicalPath(video.getCanonicalPath())
.objectID(video.getObjectID())
.userId(video.getUserId())
.build();
Mono<ClientResponse> indexedResponse = webClient.post()
.uri(URI.create(LAMBDA_ENDPOINT))
.body(BodyInserters.fromObject(searchDTO)).exchange()
.filter(res -> res.statusCode().is2xxSuccessful());
return indexedResponse.flatMap(response -> {
video.setIndexed(true);
return videoRepository.save(video);
});
});
}

My approach is maybe a little more readable. I admit I didn't run it, so I can't guarantee 100% that it will work.
public Flux<Video> index() {
return videoRepository.findAllByIndexedIsFalse()
.flatMap(this::callLambda)
.flatMap(videoRepository::save);
}
private Mono<Video> callLambda(final Video video) {
SearchDTO searchDTO = new SearchDTO(video);
return webClient.post()
.uri(URI.create(LAMBDA_ENDPOINT))
.body(BodyInserters.fromObject(searchDTO))
.exchange()
.map(ClientResponse::statusCode)
.filter(HttpStatus::is2xxSuccessful)
.map(t -> {
video.setIndexed(true);
return video;
});
}

Related

How to merge the responses of a WebClient call after calling it 5 times and save the complete response in the DB

I have a scenario like this:
I have to check whether an entry is available in a DB table. If it is, I need to call the same external API n times using WebClient, collect all the responses, and save them in the DB. If the entry is not in the DB, I call the old flow.
Here is my implementation. I'd like suggestions to improve it, without forEach.
public Mono<List<ResponseObject>> getdata(String id, Req obj) {
return isEntryInDB(id) //checking the entry in DB
.flatMap(
x -> {
final List<Mono<ResponseObject>> responseList = new ArrayList<>();
IntStream.range(0, obj.getQuantity()) // quantity decides how many times the API call happens
.forEach(
i -> {
Mono<ResponseObject> responseMono =
webClientCall(
id,
obj.getType())
.map(
res ->
MapperForMappingDataToDesriedFormat(res));
responseList.add(responseMono);
});
return saveToDb(responseList);
})
.switchIfEmpty(oldFlow(id, obj)); // if the DB entry is not there, take the existing flow
}
Any suggestions for improving this without forEach?
I would avoid IntStream here and use Reactor's native Flux operators instead.
You can replace IntStream.range with Flux.range. Something like this:
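return isEntryPresent("123")
    // collectList() emits an empty list even for an empty source, so it stays inside
    // the flatMap; otherwise switchIfEmpty below would never fire.
    .flatMap(s -> Flux.range(0, obj.getQuantity())
        .flatMap(this::callApi)
        .collectList()
        .flatMap(this::saveToDb))
    .switchIfEmpty(Mono.defer(() -> oldFlow(id, req)));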
private Mono<Object> saveToDb(List<String> stringList){
return Mono.just("done");
}
private Mono<String> callApi(int id) {
return Mono.just("iterating" + id);
}
private Mono<String> isEntryPresent(String id) {
return Mono.just("string");
}
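The "call n times, collect everything, then continue" shape can also be tried out with nothing but the JDK. The sketch below is not Reactor: it swaps in CompletableFuture so that it runs without extra dependencies, and callApi is a hypothetical stand-in for the real web client call.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FanOutSketch {

    // Hypothetical stand-in for the external API call.
    static CompletableFuture<String> callApi(int i) {
        return CompletableFuture.supplyAsync(() -> "iterating" + i);
    }

    // Starts `quantity` calls and completes with all results in call order.
    static CompletableFuture<List<String>> callAll(int quantity) {
        List<CompletableFuture<String>> calls = IntStream.range(0, quantity)
                .mapToObj(FanOutSketch::callApi)
                .collect(Collectors.toList());
        return CompletableFuture.allOf(calls.toArray(new CompletableFuture[0]))
                // Every future is complete at this point, so join() does not block.
                .thenApply(done -> calls.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(callAll(3).join()); // prints [iterating0, iterating1, iterating2]
    }
}
```

As in the Flux.range version, no mutable list is filled from inside a forEach; the list only exists once every call has finished.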

Vertx delayed batch process

How can I process a list of delayed jobs in Vertx (actually hundreds of HTTP GET requests to a limited API that bans hosts that send requests too quickly)? Right now I am using this code, and it gets blocked because Vertx starts all requests at once. I would like to process the requests with a 5-second delay between each one.
public void getInstrumnetDailyInfo(Instrument instrument,
Handler<AsyncResult<OptionInstrument>> handler) {
webClient
.get("/Loader")
.addQueryParam("i", instrument.getId())
.timeout(30000)
.send(
ar -> {
if (ar.succeeded()) {
String html = ar.result().bodyAsString();
Integer thatData = processHTML(html);
instrument.setThatData(thatData);
handler.handle(Future.succeededFuture(instrument));
} else {
// error
handler.handle(Future.failedFuture("error " +ar.cause()));
}
});
}
public void start(){
List<Instrument> instruments = loadInstrumentsList();
instruments.forEach(
instrument -> {
webClient.getInstrumnetDailyInfo(instrument,
async -> {
if(async.succeeded()){
instrumentMap.put(instrument.getId(), instrument);
}else {
log.warn("getInstrumnetDailyInfo: ", async.cause());
}
});
});
}
You can consider using a timer to fire events (rather than all at startup).
There are two variants in Vertx:
1. .setTimer() fires a specific event once after a delay
vertx.setTimer(interval, new Handler<Long>() {});
and
2. .setPeriodic() fires every time a specified period of time has passed.
vertx.setPeriodic(interval, new Handler<Long>() {});
setPeriodic seems to be what you are looking for.
You can get more info from the documentation
For more sophisticated Vertx scheduling use-cases, you can have a look at Chime or other schedulers or this module
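One way to picture the timer idea is to give each job its own start delay, so job i starts i intervals after the first. In Vert.x that would be one timer per job with a growing delay; the sketch below uses the JDK's ScheduledExecutorService as a stand-in so it runs without Vert.x. The class name, runStaggered, and the 50 ms interval are assumptions for illustration only.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class StaggeredJobs {

    // Schedules job i to start i * delayMillis from now, then waits for all jobs.
    static <T> void runStaggered(List<T> items, long delayMillis, Consumer<T> job) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(items.size());
        try {
            for (int i = 0; i < items.size(); i++) {
                T item = items.get(i);
                scheduler.schedule(() -> {
                    try {
                        job.accept(item);
                    } finally {
                        done.countDown();
                    }
                }, i * delayMillis, TimeUnit.MILLISECONDS);
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> started = new CopyOnWriteArrayList<>();
        // 5000 ms would match the question; 50 ms keeps the demo short.
        runStaggered(List.of("a", "b", "c"), 50, started::add);
        System.out.println(started); // prints [a, b, c]
    }
}
```

Blocking on the latch is only there so the demo can print a result; inside a verticle the completion would be signalled through a handler instead.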
You could use any out of the box rate limiter function and adapt it for async use.
An example with the RateLimiter from Guava:
// Make permits available at a rate of one every 5 seconds
private RateLimiter limiter = RateLimiter.create(1 / 5.0);
// A vert.x future that completes when it obtains a throttle permit
public Future<Double> throttle() {
return vertx.executeBlocking(p -> p.complete(limiter.acquire()), true);
}
Then...
throttle()
.compose(d -> {
System.out.printf("Waited %.2f before running job\n", d);
return runJob(); // runJob returns a Future result
});

Vert.x: How to wait for a future to complete

Is there a way to wait for a future to complete without blocking the event loop?
An example of a use case with querying Mongo:
Future<Result> dbFut = Future.future();
mongo.findOne("myusers", myQuery, new JsonObject(), res -> {
if(res.succeeded()) {
...
dbFut.complete(res.result());
}
else {
...
dbFut.fail(res.cause());
}
});
// Here I need the result of the DB query
if(dbFut.succeeded()) {
doSomethingWith(dbFut.result());
}
else {
error();
}
I know the doSomethingWith(dbFut.result()); call can be moved to the handler, yet if it's long, the code will become unreadable (callback hell?). Is that the right solution? Is that the only solution without additional libraries?
I'm aware that rxJava simplifies the code, but as I don't know it, learning Vert.x and rxJava is just too much.
I also wanted to give a try to vertx-sync. I put the dependency in the pom.xml; everything got downloaded fine but when I started my app, I got the following error
maurice@mickey> java \
-javaagent:~/.m2/repository/co/paralleluniverse/quasar-core/0.7.5/quasar-core-0.7.5-jdk8.jar \
-jar target/app-dev-0.1-fat.jar \
-conf conf/config.json
Error opening zip file or JAR manifest missing : ~/.m2/repository/co/paralleluniverse/quasar-core/0.7.5/quasar-core-0.7.5-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
I know what the error means in general, but I don't know in that context... I tried to google for it but didn't find any clear explanation about which manifest to put where. And as previously, unless mandatory, I prefer to learn one thing at a time.
So, back to the question : is there a way with "basic" Vert.x to wait for a future without perturbation on the event loop ?
You can set a handler for the future to be executed upon completion or failure:
Future<Result> dbFut = Future.future();
mongo.findOne("myusers", myQuery, new JsonObject(), res -> {
if(res.succeeded()) {
...
dbFut.complete(res.result());
}
else {
...
dbFut.fail(res.cause());
}
});
dbFut.setHandler(asyncResult -> {
if(asyncResult.succeeded()) {
// your logic here
}
});
This is a pure Vert.x way that doesn't block the event loop
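When the follow-up logic is long, it can be attached as a chain of steps instead of being nested inside one handler, which is what keeps callback hell away. Vert.x futures can be chained with compose; the sketch below shows the same shape with the JDK's CompletableFuture so it runs without Vert.x. findUser and countOrders are hypothetical stand-ins for database calls.

```java
import java.util.concurrent.CompletableFuture;

public class ChainSketch {

    // Hypothetical async lookup standing in for a database query.
    static CompletableFuture<String> findUser(String id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }

    // Hypothetical second query that depends on the first result.
    static CompletableFuture<Integer> countOrders(String user) {
        return CompletableFuture.supplyAsync(() -> user.length());
    }

    // Each step runs when the previous one completes; nothing waits in between.
    static CompletableFuture<String> describe(String id) {
        return findUser(id)
                .thenCompose(ChainSketch::countOrders)
                .thenApply(count -> "orders: " + count)
                .exceptionally(err -> "failed: " + err.getMessage());
    }

    public static void main(String[] args) {
        // join() is used here only to keep the demo process alive until the result is in.
        System.out.println(describe("42").join()); // prints orders: 7
    }
}
```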
I agree that you should not block in the Vertx processing pipeline, but I make one exception to that rule: Start-up. By design, I want to block while my HTTP server is initialising.
This code might help you:
/**
* @return null when waiting on {@code Future<Void>}
*/
@Nullable
public static <T>
T awaitComplete(Future<T> f)
throws Throwable
{
final Object lock = new Object();
final AtomicReference<AsyncResult<T>> resultRef = new AtomicReference<>(null);
synchronized (lock)
{
// We *must* be locked before registering a callback.
// If result is ready, the callback is called immediately!
f.onComplete(
(AsyncResult<T> result) ->
{
resultRef.set(result);
synchronized (lock) {
lock.notify();
}
});
// Check before waiting: if the future was already complete, the callback ran synchronously above
// and no further notify() will arrive. The loop also guards against spurious wake-ups.
// Ref: https://stackoverflow.com/a/249907/257299
while (null == resultRef.get()) {
// @Blocking
lock.wait();
}
}
final AsyncResult<T> result = resultRef.get();
@Nullable
final Throwable t = result.cause();
if (null != t) {
throw t;
}
@Nullable
final T x = result.result();
return x;
}

Repeat Single based on onSuccess() value

I want to repeat a Single based on the single value emitted in onSuccess(). Here is a working example
import org.reactivestreams.Publisher;
import io.reactivex.Flowable;
import io.reactivex.Single;
import io.reactivex.functions.Function;
public class Temp {
void main() {
Job job = new Job();
Single.just(job)
.map(this::processJob)
.repeatWhen(new Function<Flowable<Object>, Publisher<?>>() {
@Override
public Publisher<?> apply(Flowable<Object> objectFlowable) throws Exception {
// TODO repeat when Single emits false
return null;
}
})
.subscribe();
}
/**
* returns true if process succeeded, false if failed
*/
boolean processJob(Job job) {
return true;
}
class Job {
}
}
I understand how repeatWhen works for Observables by relying on the "complete" notification. However since Single doesn't receive that notification I'm not sure what the Flowable<Object> is really giving me. Also why do I need to return a Publisher from this function?
Instead of relying on a boolean value, you could make your job throw an exception when it fails:
class Job {
var isSuccess: Boolean = false
}
fun processJob(job: Job): String {
if (job.isSuccess) {
return "job succeeds"
} else {
throw Exception("job failed")
}
}
val job = Job()
Single.just(job)
.map { processJob(it) }
.retry() // will resubscribe until your job succeeds
.subscribe(
{ value -> print(value) },
{ error -> print(error) }
)
i saw a small discrepancy in the latest docs and your code, so i did a little digging...
(side note - i think the semantics of retryWhen seem like the more appropriate operator for your case, so i've substituted it in for your usage of repeatWhen. but i think the root of your problem remains the same in either case).
the signature for retryWhen is:
retryWhen(Function<? super Flowable<Throwable>,? extends Publisher<?>> handler)
that parameter is a factory function whose input is a source that emits anytime onError is called upstream, giving you the ability to insert custom retry logic that may be influenced through interrogation of the underlying Throwable. this begins to answer your first question of "I'm not sure what the Flowable<Object> is really giving me" - it shouldn't be Flowable<Object> to begin with, it should be Flowable<Throwable> (for the reason i just described).
so where did Flowable<Object> come from? i managed to reproduce IntelliJ's generation of this code through its auto-complete feature using RxJava version 2.1.17. upgrading to 2.2.0, however, produces the correct result of Flowable<Throwable>. so, see if upgrading to the latest version generates the correct result for you as well.
as for your second question of "Also why do I need to return a Publisher from this function?" - this is used to determine if re-subscription should happen. if the factory function returns a Publisher that emits a terminal state (ie calls onError() or onComplete()) re-subscription will not happen. however, if onNext() is called, it will. (this also explains why the Publisher isn't typed - the type doesn't matter. the only thing that does matter is what kind of notification it publishes).
another way to rewrite this, incorporating the above, might be as follows:
// just some type to use as a signal to retry
private class SpecialException extends RuntimeException {}
// job processing results in a Completable that either completes or
// doesn't (by way of an exception)
private Completable rxProcessJob(Job job) {
return Completable.complete();
// return Completable.error(new SpecialException());
}
...
rxProcessJob(new Job())
.retryWhen(errors -> {
return errors.flatMap(throwable -> {
if(throwable instanceof SpecialException) {
return Flowable.just(1);
}
return Flowable.error(throwable);
});
})
.subscribe(
() -> {
System.out.println("## onComplete()");
},
error -> {
System.out.println("## onError(" + error.getMessage() + ")");
}
);
i hope that helps!
The accepted answer would work, but is hackish. You don't need to throw an error; simply filter the output of processJob which converts the Single to a Maybe, and then use the repeatWhen handler to decide how many times, or with what delay, you may want to resubscribe. See Kotlin code below from a working example, you should be able to easily translate this to Java.
.filter { it }
.repeatWhen { handler ->
handler.zipWith(1..3) { _, i -> i }
.flatMap { retryCount -> Flowable.timer(retryDelay.toDouble().pow(retryCount).toLong(), TimeUnit.SECONDS) }
.doOnNext { log.warn("Retrying...") }
}
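To see what the delay expression in that snippet produces, the arithmetic can be worked through on its own. The sketch below mirrors retryDelay.toDouble().pow(retryCount).toLong() for retry counts 1 to 3; the value retryDelay = 2 is an assumption, since the answer does not say what it is set to.

```java
import java.util.ArrayList;
import java.util.List;

public class BackoffSchedule {

    // Computes retryDelay^retryCount (in seconds) for retryCount = 1..maxRepeats.
    static List<Long> delaysSeconds(long retryDelay, int maxRepeats) {
        List<Long> delays = new ArrayList<>();
        for (int retryCount = 1; retryCount <= maxRepeats; retryCount++) {
            delays.add((long) Math.pow(retryDelay, retryCount));
        }
        return delays;
    }

    public static void main(String[] args) {
        // Assumed base delay of 2 seconds and the three repeats from zipWith(1..3).
        System.out.println(delaysSeconds(2, 3)); // prints [2, 4, 8]
    }
}
```

With that assumed base, the repeats would fire after 2, 4, and 8 seconds, and the range 1..3 caps the number of repeats at three.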

How to do Async Http Call with Apache Beam (Java)?

The input PCollection is a bounded dataset of HTTP requests. I want to make async HTTP calls (Java) in a ParDo, parse the responses, and put the results into an output PCollection. My code is below. I'm getting the following exception.
I couldn't figure out the reason. I need some guidance.
java.util.concurrent.CompletionException: java.lang.IllegalStateException: Can't add element ValueInGlobalWindow{value=streaming.mapserver.backfill.EnrichedPoint@2c59e, pane=PaneInfo.NO_FIRING} to committed bundle in PCollection Call Map Server With Rate Throttle/ParMultiDo(ProcessRequests).output [PCollection]
Code:
public class ProcessRequestsFn extends DoFn<PreparedRequest,EnrichedPoint> {
private static AsyncHttpClient _HttpClientAsync;
private static ExecutorService _ExecutorService;
static{
AsyncHttpClientConfig cg = config()
.setKeepAlive(true)
.setDisableHttpsEndpointIdentificationAlgorithm(true)
.setUseInsecureTrustManager(true)
.addRequestFilter(new RateLimitedThrottleRequestFilter(100,1000))
.build();
_HttpClientAsync = asyncHttpClient(cg);
_ExecutorService = Executors.newCachedThreadPool();
}
@DoFn.ProcessElement
public void processElement(ProcessContext c) {
PreparedRequest request = c.element();
if(request == null)
return;
_HttpClientAsync.prepareGet((request.getRequest()))
.execute()
.toCompletableFuture()
.thenApply(response -> { if(response.getStatusCode() == HttpStatusCodes.STATUS_CODE_OK){
return response.getResponseBody();
} return null; } )
.thenApply(responseBody->
{
List<EnrichedPoint> resList = new ArrayList<>();
/*some process logic here*/
System.out.printf("%d enriched points back\n", result.length());
}
return resList;
})
.thenAccept(resList -> {
for (EnrichedPoint enrichedPoint : resList) {
c.output(enrichedPoint);
}
})
.exceptionally(ex->{
System.out.println(ex);
return null;
});
}
}
The Scio library implements a DoFn that deals with asynchronous operations. The BaseAsyncDoFn might provide the handling you need. Since you're dealing with CompletableFuture, also take a look at the JavaAsyncDoFn.
Note that you don't necessarily need to use the Scio library; you can take the main idea of the BaseAsyncDoFn, since it's independent of the rest of Scio.
The issue you're hitting is that you're outputting outside the context of a processElement or finishBundle call.
You'll want to gather all your outputs in memory, output them eagerly during later processElement calls, and output the rest in finishBundle by blocking until all your calls finish.
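The buffering idea can be modelled in plain Java to make the threading rule visible. The sketch below does not use the Beam API: processElement and finishBundle are ordinary methods named after their Beam counterparts, the Consumer stands in for the output call, and the class name is hypothetical. Completed results go into a queue from the async callbacks, and only the caller's thread ever emits them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-JDK model of the pattern; it does not use the Beam API.
public class BufferedAsyncSketch<I, O> {
    private final Function<I, CompletableFuture<O>> asyncCall;
    private final Queue<O> ready = new ConcurrentLinkedQueue<>();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

    BufferedAsyncSketch(Function<I, CompletableFuture<O>> asyncCall) {
        this.asyncCall = asyncCall;
    }

    // Analogue of processElement: start the call, then emit whatever has finished so far.
    void processElement(I input, Consumer<O> output) {
        inFlight.add(asyncCall.apply(input).thenAccept(ready::add));
        drain(output);
    }

    // Analogue of finishBundle: block until every call is done, then emit the rest.
    void finishBundle(Consumer<O> output) {
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        inFlight.clear();
        drain(output);
    }

    // Only ever called on the caller's thread, never from a future callback.
    private void drain(Consumer<O> output) {
        for (O result; (result = ready.poll()) != null; ) {
            output.accept(result);
        }
    }

    public static void main(String[] args) {
        BufferedAsyncSketch<Integer, String> fn = new BufferedAsyncSketch<>(
                n -> CompletableFuture.supplyAsync(() -> "point-" + n));
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            fn.processElement(i, emitted::add);
        }
        fn.finishBundle(emitted::add);
        System.out.println(emitted.size()); // prints 5
    }
}
```

The order of emitted results is not guaranteed, and a failed call makes finishBundle throw from join(); a real DoFn would need to decide how to handle both.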