Schedule running a reactive stream for every 1 min - apache-kafka

I have a reactive stream that fetches some data, loops over it, processes it, and finally writes it to Kafka:
public Flux<M> sendData() {
    // Return the pipeline instead of subscribing here, so the caller decides when it runs.
    return Flux.fromIterable(o.getC()).publishOn(Schedulers.boundedElastic())
        .flatMap(id ->
            Flux.fromIterable(getM(id)).publishOn(Schedulers.boundedElastic())
                .flatMap(n ->
                    Flux.fromIterable(o.getD()).publishOn(Schedulers.boundedElastic())
                        .flatMap(d -> Flux.just(sendToKafka))))
        .doOnError(throwable ->
            log.debug("Error while reading data : {} ", throwable.getMessage()));
}
public void run(String... args) {
    sendData().subscribe();
}
I want this workflow to run every minute. Can someone help me understand how to schedule this within the stream?

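To run the pipeline once a minute, you can do something like this. Note that sendData() has to return its Flux without subscribing to it, because flatMap subscribes to it on every tick.
Flux.interval(Duration.ofMinutes(1))
    .onBackpressureDrop()
    .flatMap(n -> sendData())
    .subscribeOn(Schedulers.boundedElastic())
    .subscribe();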
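For comparison, here is a minimal plain-JDK sketch of the same fixed-rate trigger, without Reactor. It is not the approach from the answer above: it swaps Flux.interval for a ScheduledExecutorService. PeriodicRunner, start and stop are hypothetical names chosen for illustration.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a job at a fixed rate on one timer thread.
final class PeriodicRunner {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    /** Runs the job immediately, then once per period, until stop() is called. */
    ScheduledFuture<?> start(Duration period, Runnable job) {
        Runnable guarded = () -> {
            try {
                job.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel all later runs,
                // so report it and keep the schedule alive.
                System.err.println("run failed: " + e.getMessage());
            }
        };
        return timer.scheduleAtFixedRate(
                guarded, 0, period.toMillis(), TimeUnit.MILLISECONDS);
    }

    /** Stops the timer thread; no further runs are started. */
    void stop() {
        timer.shutdownNow();
    }
}
```

Assuming sendData() returns its Flux, a call such as new PeriodicRunner().start(Duration.ofMinutes(1), () -> sendData().blockLast()) would run one full pass per minute. Because the timer has a single thread, a pass that takes longer than a minute delays the next one instead of overlapping it.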

Related

Vertx delayed batch process

How can I process a list of delayed jobs in Vertx (actually hundreds of HTTP GET requests to a rate-limited API that bans hosts that send requests too fast)? Right now I am using this code, and it gets blocked because Vertx starts all requests at once. I would like a 5-second delay between consecutive requests.
public void getInstrumnetDailyInfo(Instrument instrument,
Handler<AsyncResult<OptionInstrument>> handler) {
webClient
.get("/Loader")
.addQueryParam("i", instrument.getId())
.timeout(30000)
.send(
ar -> {
if (ar.succeeded()) {
String html = ar.result().bodyAsString();
Integer thatData = processHTML(html);
instrument.setThatData(thatData);
handler.handle(Future.succeededFuture(instrument));
} else {
// error
handler.handle(Future.failedFuture("error " +ar.cause()));
}
});
}
public void start(){
List<Instrument> instruments = loadInstrumentsList();
instruments.forEach(
instrument -> {
webClient.getInstrumnetDailyInfo(instrument,
async -> {
if(async.succeeded()){
instrumentMap.put(instrument.getId(), instrument);
}else {
log.warn("getInstrumnetDailyInfo: ", async.cause());
}
});
});
}
You can consider using a timer to fire the requests one at a time (rather than all at startup).
There are two variants in Vertx:
1. .setTimer() fires a one-shot event after a delay
vertx.setTimer(delay, id -> { /* handle the event */ });
and
2. .setPeriodic() fires every time the specified period has passed.
vertx.setPeriodic(interval, id -> { /* handle the event */ });
setPeriodic seems to be what you are looking for.
You can get more info from the documentation.
For more sophisticated Vertx scheduling use cases, you can have a look at Chime, other schedulers, or this module.
You could use any out of the box rate limiter function and adapt it for async use.
An example with the RateLimiter from Guava:
// Make permits available at a rate of one every 5 seconds
private RateLimiter limiter = RateLimiter.create(1 / 5.0);
// A vert.x future that completes when it obtains a throttle permit
public Future<Double> throttle() {
return vertx.executeBlocking(p -> p.complete(limiter.acquire()), true);
}
Then...
throttle()
.compose(d -> {
System.out.printf("Waited %.2f before running job\n", d);
return runJob(); // runJob returns a Future result
});
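The idea of spacing the requests out can also be sketched without any rate limiter, by giving the i-th job a start offset of i times the gap. This is a plain-JDK illustration, not Vert.x code; SpacedJobs and scheduleSpaced are hypothetical names. In Vert.x the same offsets could presumably be passed to vertx.setTimer, keeping in mind that it expects a delay of at least 1 ms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: starts queued jobs a fixed gap apart.
final class SpacedJobs {
    /**
     * Schedules jobs.get(i) to start i * gapMillis from now.
     * Returns the handles so that pending jobs can be cancelled.
     */
    static List<ScheduledFuture<?>> scheduleSpaced(
            ScheduledExecutorService timer, List<Runnable> jobs, long gapMillis) {
        List<ScheduledFuture<?>> handles = new ArrayList<>();
        for (int i = 0; i < jobs.size(); i++) {
            long offset = i * gapMillis; // 0, gap, 2 * gap, ...
            handles.add(timer.schedule(jobs.get(i), offset, TimeUnit.MILLISECONDS));
        }
        return handles;
    }
}
```

With gapMillis = 5000 and one job per instrument, each job would fire one HTTP request. Note that this spaces the start times only; it does not wait for the previous response before starting the next request.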

How to perform an action if nothing is emitted from the observable after X seconds?

I am subscribed to an observable, but I only want to listen to it for, say, 3 seconds. If nothing is emitted after 3 seconds, I want to perform a particular action. I have looked in a lot of places, but can't seem to find an answer.
An example of it is listening for a web response for 3 seconds. If nothing is received after 3 seconds, I want to print out "Timed out" etc.
How can one achieve this in RxJava?
Have you tried the timeout operator?
public static void main(String[] args) throws InterruptedException {
requestToApi()
.timeout(3, TimeUnit.SECONDS)
.subscribe(
System.out::println,
error -> System.out.println("Timed out")
);
Thread.sleep(5000);
}
private static Observable<String> requestToApi() {
// simulate a request to an API that will take 5 seconds to return a response
return Observable.just("response").delay(5, TimeUnit.SECONDS);
}
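If the action on timeout is simply to substitute a default value, RxJava's timeout also has an overload that takes a fallback source. The same "value or fallback after a deadline" idea can be sketched with the JDK alone (Java 9 or later). This is an analogue for illustration, not RxJava code; TimeoutFallback and its methods are hypothetical names, and the delays are parameters so that the example runs quickly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical JDK-only analogue of "timeout with a fallback value".
final class TimeoutFallback {
    /** Simulates a request that produces "response" after delayMillis. */
    static CompletableFuture<String> requestToApi(long delayMillis) {
        return CompletableFuture.supplyAsync(
                () -> "response",
                CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS));
    }

    /** Returns the response, or "Timed out" if none arrives within timeoutMillis. */
    static String awaitOrTimeout(long delayMillis, long timeoutMillis) {
        return requestToApi(delayMillis)
                .completeOnTimeout("Timed out", timeoutMillis, TimeUnit.MILLISECONDS)
                .join();
    }
}
```

Calling TimeoutFallback.awaitOrTimeout(5000, 3000) mirrors the scenario above: the simulated response needs 5 seconds, the deadline is 3 seconds, so the fallback string is returned.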

RxJava: complete after retryWhen on a Completable

I'm using the retryWhen operator on a Completable. Is there a way to tell it to complete from the retry Flowable?
Something like this:
PublishSubject<?> retrySubject = PublishSubject.create();
public void someFunction() {
someCompletable.retryWhen(new Function<Flowable<Throwable>, Publisher<?>>() {
@Override
public Publisher<?> apply(Flowable<Throwable> throwableFlowable) throws Exception {
return throwableFlowable.flatMap(throwable -> retrySubject.toFlowable(BackpressureStrategy.MISSING));
}
}).subscribe();
}
public void ignoreError(){
retrySubject.onComplete();
}
You can't stop a flatMap by having one of its inner sources complete empty. Also, every error subscribes yet another observer to that subject, which leaks memory.
Use takeUntil to stop a sequence via the help of another source:
PublishProcessor<Throwable> stopProcessor = PublishProcessor.create();
source.retryWhen(errors ->
errors.takeUntil(
stopProcessor
)
.flatMap(error -> Flowable.timer(1, TimeUnit.SECONDS))
)
stopProcessor.onComplete();
Edit: If you want to reuse the same subject, you can suppress items on the stop path:
PublishProcessor<Integer> stopProcessor = PublishProcessor.create();
source.retryWhen(errors ->
errors.takeUntil(
stopProcessor.ignoreElements().toFlowable()
)
.flatMap(error -> stopProcessor)
)
// retry
stopProcessor.onNext(1);
// stop
stopProcessor.onComplete();

Combining groupBy and flatMap(maxConcurrent, ...) in RxJava/RxScala

I have incoming processing requests, and I do not want too many of them processed concurrently, because that depletes shared resources. I would also prefer that requests sharing the same key are not executed concurrently:
def process(request: Request): Observable[Answer] = ???
requestsStream
.groupBy(request => request.key)
.flatMap(maxConcurrentProcessing, { case (key, requestsForKey) =>
requestsForKey
.flatMap(1, process)
})
However, the above doesn't work because the observable per key never completes. What is the correct way to achieve this?
What doesn't work:
.flatMap(maxConcurrentProcessing, { case (key, requestsForKey) =>
// Take(1) unsubscribes after the first, causing groupBy to create a new observable, causing the next request to execute concurrently
requestsForKey.take(1)
.flatMap(1, process)
})
.flatMap(maxConcurrentProcessing, { case (key, requestsForKey) =>
// The idea was to unsubscribe after 100 milliseconds to "free up" maxConcurrentProcessing
// This discards all requests after the first if processing takes more than 100 milliseconds
requestsForKey.timeout(100.millis, Observable.empty)
.flatMap(1, process)
})
Here's how I managed to achieve this. For each unique key I assign a dedicated single-threaded scheduler (so that messages with the same key are processed in order):
@Test
public void groupBy() throws InterruptedException {
final int NUM_GROUPS = 10;
Observable.interval(1, TimeUnit.MILLISECONDS)
.map(v -> {
logger.info("received {}", v);
return v;
})
.groupBy(v -> v % NUM_GROUPS)
.flatMap(grouped -> {
long key = grouped.getKey();
logger.info("selecting scheduler for key {}", key);
return grouped
.observeOn(assignScheduler(key))
.map(v -> {
String threadName = Thread.currentThread().getName();
Assert.assertEquals("proc-" + key, threadName);
logger.info("processing {} on {}", v, threadName);
return v;
})
.observeOn(Schedulers.single()); // re-schedule
})
.subscribe(v -> logger.info("got {}", v));
Thread.sleep(1000);
}
In my case the number of keys (NUM_GROUPS) is small, so I create a dedicated scheduler for each key:
Scheduler assignScheduler(long key) {
return Schedulers.from(Executors.newSingleThreadExecutor(
r -> new Thread(r, "proc-" + key)));
}
In case the number of keys is infinite or too large to dedicate a thread for each one, you can create a pool of schedulers and reuse them like this:
Scheduler assignScheduler(long key) {
// assign randomly
return poolOfSchedulers[random.nextInt(SIZE_OF_POOL)];
}
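As an alternative to random assignment, the key can be mapped to a pool slot deterministically, so the same key always lands on the same thread even if its group is recreated. The following is a minimal sketch using plain JDK executors; KeyedExecutors, slotFor and executorFor are hypothetical names, and each executor could be wrapped with Schedulers.from(...) as in the answer above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical fixed pool of single-threaded executors, selected by key.
final class KeyedExecutors {
    private final ExecutorService[] pool;

    KeyedExecutors(int size) {
        pool = new ExecutorService[size];
        for (int i = 0; i < size; i++) {
            final int slot = i;
            pool[i] = Executors.newSingleThreadExecutor(
                    r -> new Thread(r, "proc-" + slot));
        }
    }

    /** Maps a key to a slot in [0, size); floorMod keeps negative keys in range. */
    int slotFor(long key) {
        return (int) Math.floorMod(key, (long) pool.length);
    }

    /** The same key always yields the same single-threaded executor. */
    ExecutorService executorFor(long key) {
        return pool[slotFor(key)];
    }

    void shutdown() {
        for (ExecutorService executor : pool) {
            executor.shutdown();
        }
    }
}
```

Keys that share a slot are serialized with each other as well, which costs some parallelism but keeps the number of threads bounded.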

Rxjava User-Retry observable with .cache operator?

I have an observable that I create with the following code.
Observable.create(new Observable.OnSubscribe<ReturnType>() {
@Override
public void call(Subscriber<? super ReturnType> subscriber) {
try {
if (!subscriber.isUnsubscribed()) {
subscriber.onNext(performRequest());
}
subscriber.onCompleted();
} catch (Exception e) {
subscriber.onError(e);
}
}
});
performRequest() performs a long-running task, as you might expect.
Now, since I might be launching the same Observable twice or more within a very short time, I decided to write this transformer:
protected Observable.Transformer<ReturnType, ReturnType> attachToRunningTaskIfAvailable() {
return origObservable -> {
synchronized (mapOfRunningTasks) {
// If not in maps
if ( ! mapOfRunningTasks.containsKey(getCacheKey()) ) {
Timber.d("Cache miss for %s", getCacheKey());
mapOfRunningTasks.put(
getCacheKey(),
origObservable
.doOnTerminate(() -> {
Timber.d("Removed from tasks %s", getCacheKey());
synchronized (mapOfRunningTasks) {
mapOfRunningTasks.remove(getCacheKey());
}
})
.cache()
);
} else {
Timber.d("Cache Hit for %s", getCacheKey());
}
return mapOfRunningTasks.get(getCacheKey());
}
};
}
This basically puts the original observable, with .cache() applied, in a HashMap<String, Observable>.
It prevents multiple requests with the same getCacheKey() (for example, login) from calling performRequest() in parallel. Instead, if a second login request arrives while another is in progress, the second request's observable is discarded and the already-running one is used. All onNext calls are cached and delivered to both subscribers, so my backend is hit only once.
Now, suppose this code:
// Observable loginTask
public void doLogin(Observable<UserInfo> loginTask) {
loginTask.subscribe(
(userInfo) -> {},
(throwable) -> {
if (userWantsToRetry()) {
doLogin(loginTask);
}
}
);
}
Here loginTask was composed with the previous transformer. When an error occurs (possibly a connectivity problem) and userWantsToRetry() returns true, I call the method again with the same observable. Unfortunately, that observable has been cached, so I receive the same error without hitting performRequest() again, since the sequence is replayed.
Is there a way I could have both the "same requests grouping" behavior that the transformer provides me AND the retry button?
Your question has a lot going on and it's hard to put it into direct terms. I can make a couple of recommendations though. Firstly, your Observable.create can be simplified by using Observable.defer(Func0<Observable<T>>). This will run the function every time a new subscriber subscribes, and catch and channel any exceptions to the subscriber's onError.
Observable.defer(() -> {
return Observable.just(performRequest());
});
Next, you can use observable.repeatWhen(Func1<Observable<Void>, Observable<?>>) to decide when you want to retry. Repeat operators re-subscribe to the observable after an onCompleted event. This particular overload sends an event to a subject each time onCompleted is received, and the function you provide receives this subject. Your function should apply something like takeWhile(predicate) so that the returned observable completes when you do not want to retry again.
Observable.just(1,2,3).flatMap((Integer num) -> {
final AtomicInteger tryCount = new AtomicInteger(0);
return Observable.just(num)
.repeatWhen((Observable<? extends Void> notifications) ->
notifications.takeWhile((x) -> num == 2 && tryCount.incrementAndGet() != 3));
})
.subscribe(System.out::println);
Output:
1
2
2
2
3
The above example shows that repeats are allowed only when the event is 2, up to a maximum of 2 repeats (three emissions in total). If you switch to repeatWhen, then the flatMap would contain your decision whether to use a cached observable or the realWork observable. Hope this helps!
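The behaviour the question asks for, attaching to a running task while still allowing a fresh attempt after a failure, can be sketched with the JDK alone. This is an illustration of the idea using CompletableFuture rather than the RxJava 1 transformer from the question, and InFlightRequests and attachOrStart are hypothetical names. The key point is that the map entry is removed before the shared result is published, so a retry after an error starts a new request instead of replaying the cached error.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical registry that shares one in-flight request per key.
final class InFlightRequests<T> {
    private final ConcurrentHashMap<String, CompletableFuture<T>> running =
            new ConcurrentHashMap<>();

    /** Joins the in-flight request for key, or starts one if none is running. */
    CompletableFuture<T> attachOrStart(String key, Supplier<T> request) {
        CompletableFuture<T> shared = new CompletableFuture<>();
        CompletableFuture<T> existing = running.putIfAbsent(key, shared);
        if (existing != null) {
            return existing; // a request for this key is already running
        }
        CompletableFuture.supplyAsync(request).whenComplete((value, error) -> {
            // Forget the task first, so later callers (including retries
            // after a failure) trigger a new request.
            running.remove(key, shared);
            if (error != null) {
                shared.completeExceptionally(error);
            } else {
                shared.complete(value);
            }
        });
        return shared;
    }
}
```

A retry button would then simply call attachOrStart again with the same key. While a request is running, every caller gets the same future; once it has finished, successfully or not, the next call performs the request again.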