Suppose we want a Flux pipeline to process all messages supplied from several threads. Consider the code below:
@Test
public void testFluxCreate() throws InterruptedException {
    EmitterProcessor<String> processor = EmitterProcessor.create();
    CountDownLatch latch = new CountDownLatch(1);
    AtomicLong counter = new AtomicLong();
    AtomicLong batch = new AtomicLong();
    Flux<List<String>> flux = processor
            .doOnSubscribe(ss -> System.out.println(nm() + " : subscribing to " + ss))
            .onBackpressureError()
            .buffer(7)
            .publishOn(Schedulers.immediate())
            .doOnNext(it -> {
                counter.addAndGet(it.size());
                System.out.println(batch.incrementAndGet() + " : " + nm() + " Batch: " + it.size());
            });
    CompletableFuture<Void> producer = CompletableFuture.runAsync(() -> {
        IntStream.range(1, 1001).forEach(it -> {
            //sleep();
            processor.onNext("Message-" + it);
        });
    });
    CompletableFuture<Void> producer2 = CompletableFuture.runAsync(() -> {
        IntStream.range(1, 1001).forEach(it -> {
            //sleep();
            processor.onNext("Message2-" + it);
        });
    });
    CompletableFuture<Void> future = CompletableFuture.allOf(producer, producer2)
            .thenAccept(it -> processor.onComplete());
    flux.doOnComplete(latch::countDown).subscribe();
    future.join();
    latch.await();
    System.out.println("Total: " + counter);
}
The counter shows that the number of messages actually processed differs each time we run this code.
What's wrong with this implementation?
How can we ensure that all the messages were processed before the program ends?
What's wrong with this implementation?
When I run the code I get the following in the logs early after start:
18:39:12.590 [ForkJoinPool.commonPool-worker-1] DEBUG reactor.core.publisher.Operators - Duplicate Subscription has been detected
java.lang.IllegalStateException: Spec. Rule 2.12 - Subscriber.onSubscribe MUST NOT be called more than once (based on object equality)
at reactor.core.Exceptions.duplicateOnSubscribeException(Exceptions.java:162)
at reactor.core.publisher.Operators.reportSubscriptionSet(Operators.java:502)
at reactor.core.publisher.Operators.setOnce(Operators.java:607)
at reactor.core.publisher.EmitterProcessor.onNext(EmitterProcessor.java:245)
at de.schauder.reactivethreads.demo.StackoverflowQuicky.lambda$null$2(StackoverflowQuicky.java:54)
at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
at java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:557)
at de.schauder.reactivethreads.demo.StackoverflowQuicky.lambda$main$3(StackoverflowQuicky.java:52)
I'm not familiar with EmitterProcessor, but it seems onNext is not thread-safe, and I strongly suspect that this is the cause of the missing events.
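A quick way to test this theory is to serialize the concurrent onNext calls ourselves, since the Reactive Streams spec requires signals to a Subscriber to be serialized. A minimal sketch, keeping the question's shared processor but guarding emission with a lock (the lock object is my addition):
Object lock = new Object();
CompletableFuture<Void> producer = CompletableFuture.runAsync(() ->
        IntStream.range(1, 1001).forEach(it -> {
            synchronized (lock) { // only one thread may signal onNext at a time
                processor.onNext("Message-" + it);
            }
        }));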
How can we ensure that all the messages were processed before the program ends?
I'd use two separate producers and merge those. Also, I don't think you need the CountDownLatch.
public static void main(String[] args) {
    AtomicLong counter = new AtomicLong();
    AtomicLong batch = new AtomicLong();
    EmitterProcessor<String> processor1 = EmitterProcessor.create();
    EmitterProcessor<String> processor2 = EmitterProcessor.create();
    Thread thread1 = constructThread(processor1, "Message-");
    Thread thread2 = constructThread(processor2, "Message2-");
    Flux<List<String>> flux = processor1.mergeWith(processor2)
            .buffer(7)
            .onBackpressureError()
            .publishOn(Schedulers.immediate())
            .doOnNext(it -> {
                counter.addAndGet(it.size());
                System.out.println(batch.incrementAndGet() + " : Batch: " + it.size());
            }).doOnComplete(() -> {
                System.out.println("Total count: " + counter.get());
            });
    thread1.start();
    thread2.start();
    flux.blockLast();
}

// prefix distinguishes the two producers in the output
private static Thread constructThread(EmitterProcessor<String> processor, String prefix) {
    return new Thread(() -> {
        IntStream.range(1, 1001).forEach(it -> processor.onNext(prefix + it));
        processor.onComplete();
    });
}
Note about my comment:
onBackpressureError() causes the Flux to emit an error when the subscriber can't handle all the events fast enough, so this could explain the mismatch, but you'd see an exception.
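If failing on overflow is not acceptable, buffering is an alternative; a sketch against the merged flux above (onBackpressureBuffer() queues overflowing events instead of erroring, at the cost of potentially unbounded memory):
Flux<List<String>> buffered = processor1.mergeWith(processor2)
        .onBackpressureBuffer() // queue events the subscriber can't take yet
        .buffer(7);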
Related
I have a reactive stream that gets some data, loops through it, processes it, and finally writes it to Kafka:
public Flux<M> sendData() {
    // return the Flux instead of subscribing here, so callers can compose it
    return Flux.fromIterable(o.getC()).publishOn(Schedulers.boundedElastic())
            .flatMap(id ->
                    Flux.fromIterable(getM(id)).publishOn(Schedulers.boundedElastic())
                            .flatMap(n ->
                                    Flux.fromIterable(o.getD()).publishOn(Schedulers.boundedElastic())
                                            .flatMap(d -> Flux.just(sendToKafka(d)))))
            .doOnError(throwable ->
                    log.debug("Error while reading data : {} ", throwable.getMessage()));
}

public void run(String... args) {
    sendData().subscribe();
}
I want this workflow to be run every minute. Can some one help me understand how to schedule this within the stream?
You can do something like this if you want to run something every minute:
Flux.interval(Duration.ofMinutes(1))
    .onBackpressureDrop()
    .flatMap(n -> sendData())
    .subscribeOn(Schedulers.boundedElastic())
    .subscribe();
I am using Apache Curator Leader Election Recipe : https://curator.apache.org/curator-recipes/leader-election.html in my application.
ZooKeeper version: 3.5.7
Curator: 4.0.1
Below are the sequence of steps:
1. Whenever my Tomcat server instance starts up, I create a single CuratorFramework instance (one per Tomcat server) and start it:
CuratorFramework client = CuratorFrameworkFactory.newClient(connectionString, retryPolicy);
client.start();
if (!client.blockUntilConnected(10, TimeUnit.MINUTES)) {
    LOGGER.error("Zookeeper connection could not establish!");
    throw new RuntimeException("Zookeeper connection could not establish");
}
2. Create an instance of LSAdapter and start it:
LSAdapter adapter = new LSAdapter(client, <some_metadata>);
adapter.start();
Below is my LSAdapter class :
public class LSAdapter extends LeaderSelectorListenerAdapter implements Closeable {
    // <class instance variables defined>

    public LSAdapter(CuratorFramework client, <some_metadata>) {
        leaderSelector = new LeaderSelector(client, <path_to_be_used_for_leader_election>, this);
        leaderSelector.autoRequeue();
    }

    public void start() throws IOException {
        leaderSelector.start();
    }

    @Override
    public void close() throws IOException {
        leaderSelector.close();
    }

    @Override
    public void takeLeadership(CuratorFramework client) throws Exception {
        final int waitSeconds = (int) (5 * Math.random()) + 1;
        LOGGER.info(name + " is now the leader. Waiting " + waitSeconds + " seconds...");
        LOGGER.debug(name + " has been leader " + leaderCount.getAndIncrement() + " time(s) before.");
        while (true) {
            try {
                Thread.sleep(TimeUnit.SECONDS.toMillis(waitSeconds));
                //do leader tasks
            } catch (InterruptedException e) {
                LOGGER.error(name + " was interrupted.");
                //cleanup
                Thread.currentThread().interrupt();
            } finally {
            }
        }
    }
}
3. When the server instance shuts down, close the LSAdapter instance the application is using, then close the CuratorFramework client that was created:
CloseableUtils.closeQuietly(lsAdapter);
curatorFrameworkClient.close();
The issue I am facing is that at times, when the server is restarted, no leader gets elected. I checked that by tracing the log inside takeLeadership(). I have two Tomcat server instances with the above code, connecting to the same ZooKeeper quorum; most of the time one of the instances becomes leader, but when this issue happens, both of them become followers. Please suggest what I am doing wrong.
As I answered on Curator's Jira, you are swallowing the InterruptedException. When you get an InterruptedException you must exit takeLeadership(). In your code example, you are merely restoring the interrupted state and continuing the loop; this will cause an infinite loop of InterruptedExceptions, by the way. After calling Thread.currentThread().interrupt(), you should exit the while loop.
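A minimal sketch of the corrected takeLeadership(), reusing the names from the question's class:
@Override
public void takeLeadership(CuratorFramework client) throws Exception {
    final int waitSeconds = (int) (5 * Math.random()) + 1;
    try {
        while (!Thread.currentThread().isInterrupted()) {
            Thread.sleep(TimeUnit.SECONDS.toMillis(waitSeconds));
            // do leader tasks
        }
    } catch (InterruptedException e) {
        LOGGER.error(name + " was interrupted.");
        Thread.currentThread().interrupt(); // restore the flag for the caller
    }
    // returning relinquishes leadership; with autoRequeue() the selector
    // re-enters the election queue afterwards
}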
When I run the following snippet I don't see the backpressure.
public static void main(String[] args) throws InterruptedException {
    MyFileProcessor pro = new MyFileProcessor();
    Timer t = new Timer();
    t.start();
    Disposable x = pro
            .generateFlowable(new File("path\\to\\file.raw"))
            .subscribeOn(Schedulers.io(), false)
            .observeOn(Schedulers.io())
            .map(y -> {
                System.out.println(Thread.currentThread().getName() + " xxx");
                return y;
            })
            .subscribe(onNext -> {
                System.out.println(Thread.currentThread().getName() + " " + new String(onNext));
                Thread.sleep(100);
            }, Throwable::printStackTrace, () -> {
                System.out.println("Done");
                t.end();
                System.out.println(t.getTotalTime());
            });
    Thread.sleep(1000000000);
}
When I run the class above I get alternating lines of
RxCachedThreadScheduler-1 xxx
RxCachedThreadScheduler-1 Line1
....
It's using the same thread.
Now when I move the observeOn to just before the subscribe, I see a bunch of
RxCachedThreadScheduler-1 xxx
Followed by a bunch of
RxCachedThreadScheduler-1 Line1
I am assuming this is backpressure, but the thread used is still the same.
Why am I seeing this behavior?
Why is only one thread being utilized?
There is no operator as such for the observeOn to operate on, so why am I seeing this behavior?
[edit]
public Flowable<byte[]> generateFlowable(File file) {
    return Flowable.generate(
            () -> new BufferedInputStream(new FileInputStream(file)),
            (bufferedIs, output) -> {
                try {
                    byte[] data = getMessageRawData(bufferedIs);
                    if (data != null)
                        output.onNext(data);
                    else
                        output.onComplete();
                } catch (Exception e) {
                    output.onError(e);
                }
                return bufferedIs;
            },
            bufferedIs -> {
                try {
                    bufferedIs.close();
                } catch (IOException ex) {
                    RxJavaPlugins.onError(ex);
                }
            });
}
Why is only one thread being utilized?
It works correctly: you check the running thread after observeOn, so you are supposed to see the same thread there and below it, no matter what happens above. subscribeOn affects generateFlowable where, I suppose, you don't print the current thread, and thus you don't see that it runs on a different IO thread.
Now when I move the observeOn to just before the subscribe
There shouldn't be any difference unless something odd happens in generateFlowable.
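To actually see the producing IO thread, print it before the observeOn boundary; a sketch reusing the question's pipeline (the two doOnNext calls are my additions):
pro.generateFlowable(new File("path\\to\\file.raw"))
        .subscribeOn(Schedulers.io(), false)
        .doOnNext(b -> System.out.println(Thread.currentThread().getName() + " produced")) // upstream thread
        .observeOn(Schedulers.io())
        .doOnNext(b -> System.out.println(Thread.currentThread().getName() + " observed")) // downstream thread
        .subscribe(bytes -> System.out.println(new String(bytes)));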
I'm fairly new to RxJava and struggling with a use case that seems quite common to me:
Gather multiple requests from different parts of the application, aggregate them, make a single resource call and dispatch the results to each subscriber.
I've tried a lot of different approaches, using subjects, connectable observables, deferred observables... none did the trick so far.
I was quite optimistic about this approach, but it turns out it fails just like the others:
//(...)
static HashMap<String, String> requests = new HashMap<>();
//(...)
@Test
public void myTest() throws InterruptedException {
    TestScheduler scheduler = new TestScheduler();
    Observable<String> interval = Observable.interval(10, TimeUnit.MILLISECONDS, scheduler)
            .doOnSubscribe(() -> System.out.println("new subscriber!"))
            .doOnUnsubscribe(() -> System.out.println("unsubscribed"))
            .filter(l -> !requests.isEmpty())
            .doOnNext(aLong -> System.out.println(requests.size() + " requests to send"))
            .flatMap(aLong -> {
                System.out.println("requests " + requests);
                return Observable.from(requests.keySet()).take(10).distinct().toList();
            })
            .doOnNext(strings -> System.out.println("calling aggregate for " + strings + " (from " + requests + ")"))
            .flatMap(Observable::from)
            .doOnNext(s -> {
                System.out.println("----");
                System.out.println("removing " + s);
                requests.remove(s);
            })
            .doOnNext(s -> System.out.println("remaining " + requests));
    TestSubscriber<String> ts1 = new TestSubscriber<>();
    TestSubscriber<String> ts2 = new TestSubscriber<>();
    TestSubscriber<String> ts3 = new TestSubscriber<>();
    TestSubscriber<String> ts4 = new TestSubscriber<>();
    Observable<String> defer = buildObservable(interval, "1");
    defer.subscribe(ts1);
    Observable<String> defer2 = buildObservable(interval, "2");
    defer2.subscribe(ts2);
    Observable<String> defer3 = buildObservable(interval, "3");
    defer3.subscribe(ts3);
    scheduler.advanceTimeBy(200, TimeUnit.MILLISECONDS);
    Observable<String> defer4 = buildObservable(interval, "4");
    defer4.subscribe(ts4);
    scheduler.advanceTimeBy(100, TimeUnit.MILLISECONDS);
    ts1.awaitTerminalEvent(1, TimeUnit.SECONDS);
    ts2.awaitTerminalEvent(1, TimeUnit.SECONDS);
    ts3.awaitTerminalEvent(1, TimeUnit.SECONDS);
    ts4.awaitTerminalEvent(1, TimeUnit.SECONDS);
    ts1.assertValue("1");
    ts2.assertValue("2"); //fails (test stops here)
    ts3.assertValue("3"); //fails
    ts4.assertValue("4"); //fails
}

public Observable<String> buildObservable(Observable<String> interval, String key) {
    return Observable.defer(() -> {
        System.out.printf("creating observable for key " + key);
        return Observable.create(subscriber -> {
            requests.put(key, "xxx");
            interval.doOnNext(s -> System.out.println("filtering : key/val " + key + "/" + s))
                    .filter(s1 -> s1.equals(key))
                    .doOnError(subscriber::onError)
                    .subscribe(s -> {
                        System.out.println("intern " + s);
                        subscriber.onNext(s);
                        subscriber.onCompleted();
                        subscriber.unsubscribe();
                    });
        });
    });
}
Output :
creating observable for key 1new subscriber!
creating observable for key 2new subscriber!
creating observable for key 3new subscriber!
3 requests to send
requests {3=xxx, 2=xxx, 1=xxx}
calling aggregate for [3, 2, 1] (from {3=xxx, 2=xxx, 1=xxx})
----
removing 3
remaining {2=xxx, 1=xxx}
filtering : key/val 1/3
----
removing 2
remaining {1=xxx}
filtering : key/val 1/2
----
removing 1
remaining {}
filtering : key/val 1/1
intern 1
creating observable for key 4new subscriber!
1 requests to send
requests {4=xxx}
calling aggregate for [4] (from {4=xxx})
----
removing 4
remaining {}
filtering : key/val 1/4
The test fails at the second assertion (ts2 not receiving "2").
Turns out the pseudo-aggregation works as expected, but the values are not dispatched to the corresponding subscribers (only the first subscriber receives them).
Any idea why?
Also, I feel like I'm missing the obvious here. If you think of a better approach, I'm more than willing to hear about it.
EDIT: Adding some context regarding what I want to achieve.
I have a REST API exposing data via multiple endpoints (e.g. user/{userid}). This API also makes it possible to aggregate requests (e.g. user/user1 & user/user2) and get the corresponding data in one single HTTP request instead of two.
My goal is to be able to automatically aggregate the requests made from different parts of my application in a given time frame (say 10ms) with a max batch size (say 10), make one aggregate HTTP request, then dispatch the results to the corresponding subscribers.
Something like this :
// NOTE: those calls can be fired from anywhere in the app, and randomly combined. The timing and order is completely unpredictable
//ts : 0ms
api.call(userProfileRequest1).subscribe(this::show);
api.call(userProfileRequest2).subscribe(this::show);
//--> after 10ms, should fire one single http aggregate request with those 2 calls, map the response items & send them to the corresponding subscribers (that will show the right user profile)
//ts : 20ms
api.call(userProfileRequest3).subscribe(this::show);
api.call(userProfileRequest4).subscribe(this::show);
api.call(userProfileRequest5).subscribe(this::show);
api.call(userProfileRequest6).subscribe(this::show);
api.call(userProfileRequest7).subscribe(this::show);
api.call(userProfileRequest8).subscribe(this::show);
api.call(userProfileRequest9).subscribe(this::show);
api.call(userProfileRequest10).subscribe(this::show);
api.call(userProfileRequest11).subscribe(this::show);
api.call(userProfileRequest12).subscribe(this::show);
//--> should fire a single http aggregate request RIGHT AWAY (we hit the max batch size) with the 10 items, map the response items & send them to the corresponding subscribers (that will show the right user profile)
The test code I wrote (with just strings) and pasted at the top of this question is meant to be a proof of concept for my final implementation.
Your Observable is not well constructed:
public Observable<String> buildObservable(Observable<String> interval, String key) {
    return interval
            .doOnSubscribe(() -> System.out.printf("creating observable for key " + key))
            .doOnSubscribe(() -> requests.put(key, "xxx"))
            .doOnNext(s -> System.out.println("filtering : key/val " + key + "/" + s))
            .filter(s1 -> s1.equals(key));
}
When you subscribe inside a subscriber, it's often a sign of bad design.
I'm not sure I understand what you want to achieve, but I think my code should be pretty close to yours.
Please note that, for all side effects, I use the do-methods (like doOnNext, doOnSubscribe) to show explicitly that I want to perform a side effect.
I replaced your defer call by returning the interval directly: since you want to emit all interval events in the custom observable built in your defer call, returning the interval observable is better.
Please note that you are filtering your interval Observable:
Observable<String> interval = Observable.interval(10, TimeUnit.MILLISECONDS, scheduler)
        .filter(l -> !requests.isEmpty())
        // ...
So the interval will not emit anything until you put something into the requests map, and it will stop emitting again as soon as the map becomes empty.
I don't understand what you want to achieve with the requests map, but please note that you may want to avoid side effects, and updating this map is clearly a side effect.
Update regarding comments
You may want to use the buffer operator to aggregate requests and then perform the calls in bulk:
PublishSubject<String> subject = PublishSubject.create();
TestScheduler scheduler = new TestScheduler();
Observable<Pair> broker = subject.buffer(100, TimeUnit.MILLISECONDS, 10, scheduler)
        .flatMapIterable(list -> list) // you can bulk the calls here
        .flatMap(id -> Observable.fromCallable(() -> api.call(id)).map(response -> Pair.of(id, response)));
TestSubscriber<Object> ts1 = new TestSubscriber<>();
TestSubscriber<Object> ts2 = new TestSubscriber<>();
TestSubscriber<Object> ts3 = new TestSubscriber<>();
TestSubscriber<Object> ts4 = new TestSubscriber<>();
broker.filter(pair -> pair.id.equals("1")).take(1).map(pair -> pair.response).subscribe(ts1);
broker.filter(pair -> pair.id.equals("2")).take(1).map(pair -> pair.response).subscribe(ts2);
broker.filter(pair -> pair.id.equals("3")).take(1).map(pair -> pair.response).subscribe(ts3);
broker.filter(pair -> pair.id.equals("4")).take(1).map(pair -> pair.response).subscribe(ts4);
subject.onNext("1");
subject.onNext("2");
subject.onNext("3");
scheduler.advanceTimeBy(1, TimeUnit.SECONDS);
ts1.assertValue("resp1");
ts2.assertValue("resp2");
ts3.assertValue("resp3");
ts4.assertNotCompleted();
subject.onNext("4");
scheduler.advanceTimeBy(1, TimeUnit.SECONDS);
ts4.assertValue("resp4");
ts4.assertCompleted();
If you want to perform network request collapsing, you may want to check out Hystrix: https://github.com/Netflix/Hystrix
I am having an issue using RxJava backpressure. Basically, I have one producer that produces more items than the consumer can handle, and I want a buffer queue so that I only take the items I can deal with, requesting more as I complete them, as in this example:
object Tester extends App {
  Observable[Int] { subscriber =>
    (1 to 100).foreach { e =>
      subscriber.onNext(e)
      Thread.sleep(100)
      println("produced " + e + "(" + Thread.currentThread().getName + Thread.currentThread().getId + ")")
    }
  }
    .subscribeOn(NewThreadScheduler())
    .observeOn(ComputationScheduler())
    .subscribe(new Subscriber[Int]() {
      override def onStart(): Unit = {
        request(2)
      }

      override def onNext(value: Int): Unit = {
        Thread.sleep(1000)
        println("consumed " + value + "(" + Thread.currentThread().getName + Thread.currentThread().getId + ")")
        request(1)
      }

      override def onCompleted(): Unit = {
        println("finished ")
      }
    })

  Thread.sleep(100000)
}
I expect to get output like
produced 1(RxNewThreadScheduler-113)
consumed 1(RxComputationThreadPool-312)
produced 2(RxNewThreadScheduler-113)
consumed 2(RxComputationThreadPool-312)
produced 3(RxNewThreadScheduler-113)
consumed 3(RxComputationThreadPool-312)
......
but instead, I get
produced 1(RxNewThreadScheduler-113)
produced 2(RxNewThreadScheduler-113)
produced 3(RxNewThreadScheduler-113)
produced 4(RxNewThreadScheduler-113)
produced 5(RxNewThreadScheduler-113)
produced 6(RxNewThreadScheduler-113)
produced 7(RxNewThreadScheduler-113)
produced 8(RxNewThreadScheduler-113)
produced 9(RxNewThreadScheduler-113)
consumed 1(RxComputationThreadPool-312)
produced 10(RxNewThreadScheduler-113)
produced 11(RxNewThreadScheduler-113)
produced 12(RxNewThreadScheduler-113)
produced 13(RxNewThreadScheduler-113)
.....
When you implement your Observable using Observable.create, it is up to you to manage backpressure (which is not a simple task). Here your observable simply ignores reactive pull requests: you just iterate, without waiting for a request before calling the iterator's next() method.
If possible, try to use Observable factory methods like range, etc., composing with map/flatMap to obtain the desired source Observable, as those respect backpressure.
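For instance, a built-in source already honors the consumer's request(2)/request(1) pacing (a sketch in RxJava 1's Java API, equivalent to the hand-rolled source above):
Observable<Integer> source = Observable.range(1, 100) // backpressure-aware source
        .subscribeOn(Schedulers.newThread())
        .observeOn(Schedulers.computation()); // requests are propagated upstream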
Otherwise, have a look at the experimental utility classes introduced recently for correctly managing backpressure in an OnSubscribe implementation: AsyncOnSubscribe and SyncOnSubscribe.
Here is a quite naïve example:
Observable<Integer> backpressuredObservable =
        Observable.create(SyncOnSubscribe.createStateful(
                () -> 0, // start the state at 0
                (state, obs) -> {
                    int i = state + 1; // first i is 1, as desired
                    obs.onNext(i);
                    if (i == 100) { // maximum is 100, stop there
                        obs.onCompleted();
                    }
                    return i; // the value returned becomes the next state
                }));