Combining groupBy and flatMap(maxConcurrent, ...) in RxJava/RxScala - scala

I have incoming processing requests and I don't want too many of them to be processed concurrently, because that depletes shared resources. I would also prefer that requests sharing the same unique key are not executed concurrently:
def process(request: Request): Observable[Answer] = ???

requestsStream
  .groupBy(request => request.key)
  .flatMap(maxConcurrentProcessing, { case (key, requestsForKey) =>
    requestsForKey
      .flatMap(1, process)
  })
However, the above doesn't work because the observable per key never completes. What is the correct way to achieve this?
What doesn't work:
.flatMap(maxConcurrentProcessing, { case (key, requestsForKey) =>
  // take(1) unsubscribes after the first element, causing groupBy to create a new
  // observable for the key, causing the next request to execute concurrently
  requestsForKey.take(1)
    .flatMap(1, process)
})

.flatMap(maxConcurrentProcessing, { case (key, requestsForKey) =>
  // The idea was to unsubscribe after 100 milliseconds to "free up" maxConcurrentProcessing.
  // This discards all requests after the first if processing takes more than 100 milliseconds.
  requestsForKey.timeout(100.millis, Observable.empty)
    .flatMap(1, process)
})

Here's how I managed to achieve this. For each unique key I assign a dedicated single-thread scheduler, so that messages with the same key are processed in order:
@Test
public void groupBy() throws InterruptedException {
    final int NUM_GROUPS = 10;

    Observable.interval(1, TimeUnit.MILLISECONDS)
        .map(v -> {
            logger.info("received {}", v);
            return v;
        })
        .groupBy(v -> v % NUM_GROUPS)
        .flatMap(grouped -> {
            long key = grouped.getKey();
            logger.info("selecting scheduler for key {}", key);
            return grouped
                .observeOn(assignScheduler(key))
                .map(v -> {
                    String threadName = Thread.currentThread().getName();
                    Assert.assertEquals("proc-" + key, threadName);
                    logger.info("processing {} on {}", v, threadName);
                    return v;
                })
                .observeOn(Schedulers.single()); // re-schedule
        })
        .subscribe(v -> logger.info("got {}", v));

    Thread.sleep(1000);
}
In my case the number of keys (NUM_GROUPS) is small, so I create a dedicated scheduler for each key:
Scheduler assignScheduler(long key) {
    return Schedulers.from(Executors.newSingleThreadExecutor(
        r -> new Thread(r, "proc-" + key)));
}
If the number of keys is unbounded or too large to dedicate a thread to each one, you can create a pool of schedulers and reuse them like this:
Scheduler assignScheduler(long key) {
    // assign randomly
    return poolOfSchedulers[random.nextInt(SIZE_OF_POOL)];
}
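The answer doesn't show how the pool itself is built; here is a minimal sketch assuming a fixed-size array of single-thread schedulers created up front (SIZE_OF_POOL, poolOfSchedulers and the thread naming are assumptions, not from the original answer). Since assignScheduler is only called once per group, even the random pick keeps per-key ordering; a hash-based pick additionally keeps a key on the same thread if its group is ever recreated:

static final int SIZE_OF_POOL = 8; // assumed pool size
static final Scheduler[] poolOfSchedulers = new Scheduler[SIZE_OF_POOL];

static {
    // one single-thread executor per slot, created up front and reused
    for (int i = 0; i < SIZE_OF_POOL; i++) {
        final String name = "pool-proc-" + i;
        poolOfSchedulers[i] = Schedulers.from(
            Executors.newSingleThreadExecutor(r -> new Thread(r, name)));
    }
}

// Deterministic alternative to the random pick: the same key always maps
// to the same single-thread scheduler.
Scheduler assignScheduler(long key) {
    return poolOfSchedulers[(int) Math.floorMod(key, (long) SIZE_OF_POOL)];
}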

Related

What would cause SingleEmitter.onSuccess() to generate a NoSuchElement exception?

I have a Single flow organized like this:
getSomething() // returns Single<>
    .flatMap(something -> {
        // various things
        return Single.defer(() -> {
                // various other things
                return Single.<SomeType>create(emitter -> {
                    // some more stuff
                    someCallbackApi(result -> {
                        if (result.isError()) {
                            emitter.onError(result.getCause());
                        } else {
                            // guaranteed non-null data
                            emitter.onSuccess(result.getData()); // this generates NoSuchElement
                        }
                    });
                });
            })
            .retryWhen( ... )
            .flatMap(data -> handle(data))
            .retryWhen( ... );
    })
    .retryWhen( ... )
    .onErrorResumeNext(error -> process(error))
    .subscribe(data -> handleSuccess(data), error -> handleError(error));
In test cases, the callback API Single successfully retries a number of times (determined by the test case), and every time, on the last retry, the call to emitter.onSuccess() generates the exception below. What is going on? I haven't been able to restructure or change the downstream operators or subscribers to avoid the problem.
java.util.NoSuchElementException: null
at io.reactivex.internal.operators.flowable.FlowableSingleSingle$SingleElementSubscriber.onComplete(FlowableSingleSingle.java:116)
at io.reactivex.subscribers.SerializedSubscriber.onComplete(SerializedSubscriber.java:168)
at io.reactivex.internal.operators.flowable.FlowableRepeatWhen$WhenReceiver.onComplete(FlowableRepeatWhen.java:118)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:426)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onComplete(FlowableFlatMap.java:338)
at io.reactivex.internal.operators.flowable.FlowableZip$ZipCoordinator.drain(FlowableZip.java:210)
at io.reactivex.internal.operators.flowable.FlowableZip$ZipSubscriber.onNext(FlowableZip.java:381)
at io.reactivex.processors.UnicastProcessor.drainFused(UnicastProcessor.java:363)
at io.reactivex.processors.UnicastProcessor.drain(UnicastProcessor.java:396)
at io.reactivex.processors.UnicastProcessor.onNext(UnicastProcessor.java:458)
at io.reactivex.processors.SerializedProcessor.onNext(SerializedProcessor.java:103)
at io.reactivex.internal.operators.flowable.FlowableRepeatWhen$WhenSourceSubscriber.again(FlowableRepeatWhen.java:171)
at io.reactivex.internal.operators.flowable.FlowableRetryWhen$RetryWhenSubscriber.onError(FlowableRetryWhen.java:76)
at io.reactivex.internal.operators.single.SingleToFlowable$SingleToFlowableObserver.onError(SingleToFlowable.java:67)
at io.reactivex.internal.operators.single.SingleFlatMap$SingleFlatMapCallback$FlatMapSingleObserver.onError(SingleFlatMap.java:116)
at io.reactivex.internal.operators.flowable.FlowableSingleSingle$SingleElementSubscriber.onError(FlowableSingleSingle.java:97)
at io.reactivex.subscribers.SerializedSubscriber.onError(SerializedSubscriber.java:142)
at io.reactivex.internal.operators.flowable.FlowableRepeatWhen$WhenReceiver.onError(FlowableRepeatWhen.java:112)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.checkTerminate(FlowableFlatMap.java:567)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:374)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.innerError(FlowableFlatMap.java:606)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onError(FlowableFlatMap.java:672)
at io.reactivex.internal.subscriptions.EmptySubscription.error(EmptySubscription.java:55)
at io.reactivex.internal.operators.flowable.FlowableError.subscribeActual(FlowableError.java:40)
at io.reactivex.Flowable.subscribe(Flowable.java:14918)
at io.reactivex.Flowable.subscribe(Flowable.java:14865)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:163)
at io.reactivex.internal.operators.flowable.FlowableZip$ZipCoordinator.drain(FlowableZip.java:249)
at io.reactivex.internal.operators.flowable.FlowableZip$ZipSubscriber.onNext(FlowableZip.java:381)
at io.reactivex.processors.UnicastProcessor.drainFused(UnicastProcessor.java:363)
at io.reactivex.processors.UnicastProcessor.drain(UnicastProcessor.java:396)
at io.reactivex.processors.UnicastProcessor.onNext(UnicastProcessor.java:458)
at io.reactivex.processors.SerializedProcessor.onNext(SerializedProcessor.java:103)
at io.reactivex.internal.operators.flowable.FlowableRepeatWhen$WhenSourceSubscriber.again(FlowableRepeatWhen.java:171)
at io.reactivex.internal.operators.flowable.FlowableRetryWhen$RetryWhenSubscriber.onError(FlowableRetryWhen.java:76)
at io.reactivex.internal.operators.single.SingleToFlowable$SingleToFlowableObserver.onError(SingleToFlowable.java:67)
at io.reactivex.internal.operators.single.SingleFlatMap$SingleFlatMapCallback$FlatMapSingleObserver.onError(SingleFlatMap.java:116)
at io.reactivex.internal.disposables.EmptyDisposable.error(EmptyDisposable.java:78)
at io.reactivex.internal.operators.single.SingleError.subscribeActual(SingleError.java:42)
at io.reactivex.Single.subscribe(Single.java:3603)
at io.reactivex.internal.operators.single.SingleFlatMap$SingleFlatMapCallback.onSuccess(SingleFlatMap.java:84)
at io.reactivex.internal.operators.flowable.FlowableSingleSingle$SingleElementSubscriber.onComplete(FlowableSingleSingle.java:114)
at io.reactivex.subscribers.SerializedSubscriber.onComplete(SerializedSubscriber.java:168)
at io.reactivex.internal.operators.flowable.FlowableRetryWhen$RetryWhenSubscriber.onComplete(FlowableRetryWhen.java:82)
at io.reactivex.internal.subscriptions.DeferredScalarSubscription.complete(DeferredScalarSubscription.java:134)
at io.reactivex.internal.operators.single.SingleToFlowable$SingleToFlowableObserver.onSuccess(SingleToFlowable.java:62)
at io.reactivex.internal.operators.single.SingleCreate$Emitter.onSuccess(SingleCreate.java:67)
Solved:
Many thanks to @dano for pointing out the retryWhen behavior when used with Single. In this case, the outermost retryWhen operator had a bad terminating condition, roughly like:
.retryWhen(errors -> errors.zipWith(Flowable.range(1, maxRetries), ...)
    .flatMap(zipped -> {
        if (zipped.retryCount() <= maxRetries) {
            return Flowable.just(0L);
        }
        return Flowable.error(new Exception());
    })
Flowable.range() will complete when it has generated the last number, and that completion causes the Single to signal NoSuchElementException. Just bumping the count argument to Flowable.range() by one is enough to fix the problem:
.retryWhen(errors -> errors.zipWith(Flowable.range(1, maxRetries + 1), ...)
    .flatMap(zipped -> {
        if (zipped.retryCount() <= maxRetries) {
            return Flowable.just(0L);
        }
        return Flowable.error(new Exception());
    })
This is happening because of the way you implemented the callback you passed to retryWhen. The retryWhen documentation states (emphasis mine):
Re-subscribes to the current Single if and when the Publisher returned
by the handler function signals a value.
If the Publisher signals an onComplete, the resulting Single will
signal a NoSuchElementException.
One of the Flowable instances you're returning inside of the calls to retryWhen is emitting onComplete, which leads to the NoSuchElementException.
Here's a very simple example that produces the same error:
Single.error(new Exception("hey"))
    .retryWhen(e -> Flowable.just(1))
    .subscribe(System.out::println, e -> e.printStackTrace());
The stacktrace this produces starts with this, same as yours:
java.util.NoSuchElementException
at io.reactivex.internal.operators.flowable.FlowableSingleSingle$SingleElementSubscriber.onComplete(FlowableSingleSingle.java:116)
at io.reactivex.subscribers.SerializedSubscriber.onComplete(SerializedSubscriber.java:168)
at io.reactivex.internal.operators.flowable.FlowableRepeatWhen$WhenReceiver.onComplete(FlowableRepeatWhen.java:118)
You don't include any of your code from inside the retryWhen calls, so I can't say exactly what you did wrong, but generally you want to chain whatever you do to the Flowable that is passed in. So my example above would look like this, if we really wanted to retry forever:
Single.error(new Exception("hey"))
    .retryWhen(e -> e.flatMap(ign -> Flowable.just(1)))
    .subscribe(System.out::println, e -> e.printStackTrace());
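For completeness, a minimal sketch of the bounded variant described in the Solved section above (maxRetries and the attempt bookkeeping are illustrative, not from the original post): the handler never completes on its own, and once the budget is exhausted it re-emits the original error, so the Single fails with that error instead of a NoSuchElementException.

final int maxRetries = 3; // assumed retry budget

Single.error(new Exception("hey"))
    .retryWhen(errors -> errors
        // pair each failure with an attempt number; the range is one longer
        // than the budget so the handler itself never runs dry and completes
        .zipWith(Flowable.range(1, maxRetries + 1), (err, attempt) ->
            attempt <= maxRetries
                ? Flowable.just(attempt)        // emit a value -> re-subscribe
                : Flowable.<Integer>error(err)) // budget exhausted -> propagate error
        .flatMap(signal -> signal))
    .subscribe(System.out::println, Throwable::printStackTrace);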

Schedule running a reactive stream for every 1 min

I have a reactive stream that gets some data, loops through the data, processes it, and finally writes it to Kafka:
public Flux<M> sendData() {
    return Flux.fromIterable(o.getC())
        .publishOn(Schedulers.boundedElastic())
        .flatMap(id ->
            Flux.fromIterable(getM(id))
                .publishOn(Schedulers.boundedElastic())
                .flatMap(n ->
                    Flux.fromIterable(o.getD())
                        .publishOn(Schedulers.boundedElastic())
                        .flatMap(d -> Flux.just(sendToKafka))))
        .doOnError(throwable ->
            log.debug("Error while reading data : {} ", throwable.getMessage()));
}

public void run(String... args) {
    sendData().subscribe();
}
I want this workflow to run every minute. Can someone help me understand how to schedule this within the stream?
You can do something like this if you want to run something per minute.
Flux.interval(Duration.ofMinutes(1))
    .onBackpressureDrop()
    .flatMap(n -> sendData())
    .subscribeOn(Schedulers.boundedElastic())
    .subscribe();
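If overlapping runs are a concern (say sendData() occasionally takes longer than a minute), a sequential variant is possible; this is a sketch using concatMap, not part of the original answer:

// Sketch: concatMap runs at most one sendData() pipeline at a time, so a slow
// run delays the next tick's work instead of overlapping with it.
Flux.interval(Duration.ofMinutes(1))
    .onBackpressureDrop()
    .concatMap(tick -> sendData())
    .subscribeOn(Schedulers.boundedElastic())
    .subscribe();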

Kafka Streams: action on n-th event

I'm trying to find the best way to perform an action on every n-th event in Kafka Streams.
My case: I have an input stream with some Events. I have to filter them by eventType == login and, on each n-th login (let's say, the fifth) for the same accountId, send this Event to the output stream.
After some investigation and different tries, I have the version of the code below (I'm using Kotlin).
data class Event(
    val payload: Any = {},
    val accountId: String,
    val eventType: String = ""
)

// intermediate class to keep the key and value of the original event
data class LoginEvent(
    val eventKey: String,
    val eventValue: Event
)

fun process() {
    val userLoginsStoreBuilder = Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("logins"),
        Serdes.String(),
        Serdes.Integer()
    )
    val streamsBuilder = StreamsBuilder().addStateStore(userLoginsStoreBuilder)
    val inputStream = streamsBuilder.stream<String, String>(inputTopic)

    inputStream.map { key, event ->
        KeyValue(key, json.readValue<Event>(event))
    }.filter { _, event -> event.eventType == "login" }
        .map { key, event -> KeyValue(event.accountId, LoginEvent(key, event)) }
        .transform(
            UserLoginsTransformer("logins", 5),
            "logins"
        )
        .filter { _, value -> value }
        .map { key, _ -> KeyValue(key.eventKey, json.writeValueAsString(key.eventValue)) }
        .to("fifth_login", Produced.with(Serdes.String(), Serdes.String()))
    ...
}
class UserLoginsTransformer(private val storeName: String, private val loginsThreshold: Int = 5) :
    TransformerSupplier<String, LoginEvent, KeyValue<LoginEvent, Boolean>> {

    override fun get(): Transformer<String, LoginEvent, KeyValue<LoginEvent, Boolean>> {
        return object : Transformer<String, LoginEvent, KeyValue<LoginEvent, Boolean>> {
            private lateinit var store: KeyValueStore<String, Int>

            @Suppress("UNCHECKED_CAST")
            override fun init(context: ProcessorContext) {
                store = context.getStateStore(storeName) as KeyValueStore<String, Int>
            }

            override fun transform(key: String, value: LoginEvent): KeyValue<LoginEvent, Boolean> {
                val counter = (store.get(key) ?: 0) + 1
                return if (counter == loginsThreshold) {
                    store.delete(key)
                    KeyValue(value, true)
                } else {
                    store.put(key, counter)
                    KeyValue(value, false)
                }
            }

            override fun close() {
            }
        }
    }
}
My biggest concern is that the transform function is not thread-safe in my case. I've checked the implementation of the KV store used here and it is a RocksDB store (non-transactional), so the value may be updated between the read and the comparison, and the wrong event would be sent to the output.
My other ideas:
Use materialized views as a store without a transformer, but I'm stuck on the implementation.
Create a custom persistent KV store that uses TransactionalRocksDB (not sure if it is worth it).
Create a custom persistent KV store that uses a ConcurrentHashMap inside (it may lead to high memory consumption given the many users we are expecting).
One more note: I'm using Spring Cloud Stream, so maybe this framework has a built-in solution for my case, but I didn't find one.
I would appreciate any suggestions. Thanks in advance.
My biggest concern is that the transform function is not thread-safe in my case. I've checked the implementation of the KV store used here and it is a RocksDB store (non-transactional), so the value may be updated between the read and the comparison, and the wrong event would be sent to the output.
There is no reason to be concerned. If you run with multiple threads, each thread will have its own RocksDB that stores one shard of the overall data (note that the overall state is sharded based on input topic partitions, and a single shard is never processed by different threads). Hence, your code will work correctly. The only thing you need to ensure is that the data is partitioned by accountId, such that login events of a single account go to the same shard.
If your input data is already partitioned by accountId when written into your input topic, you don't need to do anything. If not, and you can control the upstream application, it might be simplest to use a custom partitioner in the upstream application's producer to get the partitioning you need. If you can't change the upstream application, you would need to repartition the data after you have set accountId as the new key, i.e., by doing a through() before you call transform().
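Not part of the original answer, but roughly what that re-keying and repartitioning step could look like in the Java DSL (the topic name and the isLogin/extractAccountId helpers are assumed for illustration); the Kotlin pipeline above would be adjusted the same way:

// Sketch: set accountId as the key and force a repartition through an
// intermediate topic, so all logins of one account end up in the same
// task / state-store shard before the stateful transform() runs.
KStream<String, String> logins = builder.<String, String>stream(inputTopic)
    .filter((key, value) -> isLogin(value))               // assumed predicate
    .selectKey((key, value) -> extractAccountId(value));  // assumed key extractor

KStream<String, String> repartitioned = logins.through(
    "logins-by-account",                                  // assumed repartition topic
    Produced.with(Serdes.String(), Serdes.String()));
// ...then continue with the map to LoginEvent and .transform(...) as in the question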

Rxjava User-Retry observable with .cache operator?

I have an observable that I create with the following code:
Observable.create(new Observable.OnSubscribe<ReturnType>() {
    @Override
    public void call(Subscriber<? super ReturnType> subscriber) {
        try {
            if (!subscriber.isUnsubscribed()) {
                subscriber.onNext(performRequest());
            }
            subscriber.onCompleted();
        } catch (Exception e) {
            subscriber.onError(e);
        }
    }
});
performRequest() will perform a long running task as you might expect.
Now, since I might be launching the same Observable twice or more in a very short amount of time, I decided to write this transformer:
protected Observable.Transformer<ReturnType, ReturnType> attachToRunningTaskIfAvailable() {
    return origObservable -> {
        synchronized (mapOfRunningTasks) {
            // If not in the map
            if (!mapOfRunningTasks.containsKey(getCacheKey())) {
                Timber.d("Cache miss for %s", getCacheKey());
                mapOfRunningTasks.put(
                    getCacheKey(),
                    origObservable
                        .doOnTerminate(() -> {
                            Timber.d("Removed from tasks %s", getCacheKey());
                            synchronized (mapOfRunningTasks) {
                                mapOfRunningTasks.remove(getCacheKey());
                            }
                        })
                        .cache()
                );
            } else {
                Timber.d("Cache Hit for %s", getCacheKey());
            }
            return mapOfRunningTasks.get(getCacheKey());
        }
    };
}
This basically puts the original observable, with .cache() applied, into a HashMap<String, Observable>.
It disallows multiple requests with the same getCacheKey() (for example, login) from calling performRequest() in parallel. Instead, if a second login request arrives while another is in progress, the second request's observable gets "discarded" and the already-running one is used instead. As a result, all calls to onNext are cached and delivered to both subscribers while hitting my backend only once.
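For context, a minimal sketch of how such a transformer is typically applied (createLoginRequestObservable() is a hypothetical factory, not from the question):

// Hypothetical usage sketch: compose() applies the transformer, so concurrent
// subscriptions with the same cache key share one in-flight, cached observable.
Observable<UserInfo> loginTask =
    createLoginRequestObservable()                 // assumed raw request factory
        .compose(attachToRunningTaskIfAvailable());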
Now, suppose this code:
// Observable loginTask
public void doLogin(Observable<UserInfo> loginTask) {
    loginTask.subscribe(
        (userInfo) -> {},
        (throwable) -> {
            if (userWantsToRetry()) {
                doLogin(loginTask);
            }
        }
    );
}
Here loginTask was composed with the previous transformer. Well, when an error occurs (it might be connectivity) and userWantsToRetry() is true, I basically re-call the method with the same observable. Unfortunately, that observable has been cached, so I receive the same error without hitting performRequest() again, since the sequence just gets replayed.
Is there a way I could have both the "same requests grouping" behavior that the transformer provides me AND the retry button?
Your question has a lot going on and it's hard to put it into direct terms. I can make a couple of recommendations though. Firstly, your Observable.create can be simplified by using Observable.defer(Func0<Observable<T>>). This will run the func every time a new subscriber subscribes, and it will catch and channel any exceptions to the subscriber's onError.
Observable.defer(() -> {
    return Observable.just(performRequest());
});
Next, you can use observable.repeatWhen(Func1<Observable<Void>, Observable<?>>) to decide when you want to retry. Repeat operators will re-subscribe to the observable after an onComplete event. This particular overload will send an event to a subject when an onComplete event is received. The function you provide will receive this subject. Your function should call something like takeWhile(predicate) and onComplete when you do not want to retry again.
Observable.just(1, 2, 3).flatMap((Integer num) -> {
    final AtomicInteger tryCount = new AtomicInteger(0);
    return Observable.just(num)
        .repeatWhen((Observable<? extends Void> notifications) ->
            notifications.takeWhile((x) -> num == 2 && tryCount.incrementAndGet() != 3));
})
.subscribe(System.out::println);
Output:
1
2
2
2
3
The above example shows that repeats only happen while the value is 2, and only twice, so 2 is emitted three times in total. If you switch to a repeatWhen, the flatMap would contain your decision whether to use the cached observable or the real-work observable. Hope this helps!
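That last suggestion isn't spelled out in code; one hedged reading of it, tied back to the question's transformer (rawRequestObservable() is hypothetical), is to wrap the composition in defer so every re-subscription re-runs the cache lookup, and to put the retry decision outside:

// Sketch: defer re-evaluates the transformer on every subscription. A failed
// task is removed from mapOfRunningTasks by doOnTerminate, so a retry starts a
// fresh request instead of replaying the cached error.
Observable<UserInfo> loginTask = Observable.defer(() ->
    rawRequestObservable()                         // hypothetical raw request
        .compose(attachToRunningTaskIfAvailable()));

loginTask
    .retry((attempt, error) -> userWantsToRetry()) // or repeatWhen/retryWhen as shown above
    .subscribe(userInfo -> { }, throwable -> { });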

API Observable with dynamic caching

An API I'm polling has a field, cachedUntil, that defines how long a value is cached. The goal is to create an Observable that polls and emits an event every time the cache has expired. What distinguishes this case is that the caching interval is not regular, i.e. Observable.interval does not apply.
In what ways is it possible to implement an Observable that has this behaviour?
The following snippet gives a function that polls the API, emits the requested events, and returns the cachedUntil delay until the next call:
def getContracts(subscriber: Subscriber[Set[EveContract]]): Option[Long] = {
  logger.debug("Fetching new contracts")
  try {
    val response = parser.getResponse(auth)
    if (response == null) {
      subscriber.onError(new RuntimeException("Unable to fetch contracts from EVE servers"))
      None
    } else if (response.hasError) {
      logger.error(response.getError.toString)
      subscriber.onError(new RuntimeException(response.getError.toString))
      None
    } else {
      subscriber.onNext(response.getAll.toSet) // Emit new polled data
      Some(response.getCachedUntil.getTime - new Date().getTime) // Return the cache delay
    }
  } catch {
    case aex: ApiException ⇒
      logger.error("An error occurred when querying the EVE API.")
      logger.debug("ApiException: ", aex)
      subscriber.onError(aex)
      None
  }
}
It is possible to use Scheduler workers to reschedule a call to getContracts:
Observable[Set[EveContract]](observer ⇒ {
  val worker = Schedulers.newThread().createWorker()

  def scheduleContracts(delay: Long) {
    worker.schedule(new Action0 {
      override def call() {
        if (!observer.isUnsubscribed) {
          val delay = getContracts(observer)
          delay match {
            // Reschedule a contract fetch after time d has passed.
            case Some(d) ⇒
              logger.debug(s"Rescheduling contract fetch in: ${d / 1000} s")
              scheduleContracts(d)
            case _ ⇒
              // Otherwise do nothing
              logger.debug("Not rescheduling contract fetch, an error has occurred.")
          }
        } else {
          logger.trace("Subscriber has unsubscribed.")
        }
      }
    }, delay, TimeUnit.MILLISECONDS)
  }

  scheduleContracts(0L)
})
However, I'm very interested in possible other solutions.