Threads still alive after subscription - reactive-programming

We are currently migrating from RxJava2 to Project Reactor. To parallelize work, we create a new Thread via Schedulers.newThread() for every parallel HTTP request. We cannot reuse Threads because Spring's SecurityContext is bound to a ThreadLocal.
Recently we ran into OutOfMemoryErrors: after some time, the created Threads were never disposed, leaving thousands of parked Threads in the JVM. With RxJava the Threads were disposed after a successful HTTP request.
In RxJava the code looks like the following; a breakpoint on the last line shows no additional active Threads:
Observable<String> deferred = Observable.fromCallable(() -> "1")
.subscribeOn(io.reactivex.schedulers.Schedulers.newThread());
Observable<String> deferred2 = Observable.fromCallable(() -> "2")
.subscribeOn(io.reactivex.schedulers.Schedulers.newThread());
System.out.println("obs: " + deferred.blockingSingle());
System.out.println("obs2: " + deferred2.blockingSingle());
With Project Reactor, however, both Threads are still alive after both println calls:
Mono<String> mono1 = Mono.fromSupplier(() -> "1").subscribeOn(Schedulers.newSingle("single"));
Mono<String> mono2 = Mono.fromSupplier(() -> "2").subscribeOn(Schedulers.newSingle("single"));
System.out.println("Mono1: " + mono1.block());
System.out.println("Mono2: " + mono2.block());
A solution for this would be to tear down the scheduler manually in doFinally:
Scheduler newSingle = Schedulers.newSingle("single");
Mono<String> doFinally = Mono.defer(() -> Mono.fromSupplier(() -> "1"))
        .subscribeOn(newSingle)
        .doFinally(s -> {
            newSingle.dispose();
        });
However, is this really necessary, or is there a way to establish the same behavior as in RxJava2?
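Not from the original question, but one possible sketch: tie the Scheduler's lifetime to each subscription with Mono.using, so every subscription creates its own single-threaded Scheduler and disposes it on termination or cancellation (assuming a fresh Thread per request really is required):
Mono<String> mono = Mono.using(
        () -> Schedulers.newSingle("single"),                       // fresh Scheduler per subscription
        scheduler -> Mono.fromSupplier(() -> "1").subscribeOn(scheduler),
        Scheduler::dispose);                                        // torn down when the Mono terminates or is cancelled
System.out.println("Mono: " + mono.block());                        // no parked Thread should remain afterwards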

Related

How to create a Bulk consumer in Spring webflux and Kafka

I need to poll Kafka and process events in bulk. Since Reactor Kafka is a streaming API, I am getting events as a stream. Is there a way to combine them and get batches of a fixed maximum size?
This is what I am doing currently:
final Flux<Flux<ConsumerRecord<String, String>>> receive = KafkaReceiver.create(eventReceiverOptions)
        .receiveAutoAck();
receive
        .concatMap(r -> r)
        .doOnEach(listSignal -> log.info("got one message"))
        .map(consumerRecords -> consumerRecords.value())
        .collectList()
        .flatMap(strings -> {
            log.info("Read messages of size {}", strings.size());
            return processBulkMessage(strings)
                    .doOnSuccess(aBoolean -> log.info("Processed records"))
                    .thenReturn(strings);
        }).subscribe();
But the code just hangs after collectList and never reaches the last flatMap.
Thanks in advance.
Your plain .concatMap(r -> r) just does a "flattening", so you fully eliminate the batching originally built by that receiveAutoAck(). To get a stream of lists for your processBulkMessage() to process, move the batching logic into that concatMap():
.concatMap(batch -> batch
        .doOnEach(listSignal -> log.info("got one message"))
        .map(ConsumerRecord::value)
        .collectList())
.flatMap(strings -> {
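For reference, a minimal sketch of the full pipeline with the batching moved inside concatMap, reusing the names from the question (a sketch, not a verified implementation):
KafkaReceiver.create(eventReceiverOptions)
        .receiveAutoAck()
        .concatMap(batch -> batch                      // batch is a Flux<ConsumerRecord<String, String>>
                .map(ConsumerRecord::value)
                .collectList())                        // one List<String> per auto-acked batch
        .flatMap(strings -> {
            log.info("Read messages of size {}", strings.size());
            return processBulkMessage(strings)
                    .doOnSuccess(aBoolean -> log.info("Processed records"))
                    .thenReturn(strings);
        })
        .subscribe();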

subscribeOn(Schedulers.parallel()) is not working

I am learning reactor core and following this https://www.baeldung.com/reactor-core
ArrayList<Integer> arrList = new ArrayList<Integer>();
System.out.println("Before: " + arrList);
Flux.just(1, 2, 3, 4)
.log()
.map(i -> i * 2)
.subscribeOn(Schedulers.parallel())
.subscribe(arrList::add);
System.out.println("After: " + arrList);
When I execute the above code, it prints:
Before: []
[DEBUG] (main) Using Console logging
After: []
The code above should start executing on another thread, but it is not working at all.
Can somebody help me with this?
As mentioned in the Reactor documentation for the various subscribe methods:
Keep in mind that since the sequence can be asynchronous, this will
immediately return control to the calling thread. This can give the
impression the consumer is not invoked when executing in a main thread
or a unit test for instance.
This means that the end of the main method is reached, and thus the main thread exits before any thread is able to subscribe to the Reactive chain, as mentioned by Piotr.
What you want to do is wait till the entire chain completes before printing the contents of the array.
The naive way of doing this is:
ArrayList<Integer> arrList = new ArrayList<>();
System.out.println("Before: " + arrList);
Flux.just(1, 2, 3, 4)
.log()
.map(i -> i * 2)
.subscribeOn(Schedulers.parallel())
.doOnNext(arrList::add)
.blockLast();
System.out.println("After: " + arrList);
Here, you block execution on the main thread until the last element on the Flux is processed. Thus the last System.out will not execute until your ArrayList is fully populated.
Remember that code runs a little differently in a console application vs. a server environment like Netty. The only way to make a console application wait for all subscriptions to complete is to block.
Blocking, however, is not permitted on Reactor's parallel threads, so this approach would not work in, say, a Netty environment. There the server keeps running until explicitly shut down, so a plain subscribe would be fine.
However, in the above code snippet you are blocking not just to prevent the application from exiting, but also to wait before you read the data that has been populated.
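For illustration, a minimal sketch (my own, assuming Reactor 3.2+) of Reactor rejecting a blocking call on one of its NonBlocking threads:
Mono.just(1)
        .publishOn(Schedulers.parallel())
        // block() on a parallel (NonBlocking) thread fails with an
        // IllegalStateException ("block()/blockFirst()/blockLast() are blocking...")
        .map(i -> Mono.delay(Duration.ofMillis(10)).block())
        .subscribe(System.out::println, Throwable::printStackTrace);
Thread.sleep(100); // give the parallel thread time to run before main exits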
An improvement to the blockLast() snippet above would be as follows:
ArrayList<Integer> arrList = new ArrayList<>();
System.out.println("Before: " + arrList);
Flux.just(1, 2, 3, 4)
.log()
.map(i -> i * 2)
.subscribeOn(Schedulers.parallel())
.doOnNext(arrList::add)
.doOnComplete(() -> System.out.println("After: " + arrList))
.blockLast();
Even here, the doOnComplete accesses data from outside the reactive chain. To prevent this, you would collect the elements of the Flux in the chain itself, like this:
System.out.println("Before.");
Flux.just(1, 2, 3, 4)
.log()
.map(i -> i * 2)
.subscribeOn(Schedulers.parallel())
.collectList()
.doOnSuccess(list -> System.out.println("After: " + list))
.block();
Again, remember that when running in Netty (say, a Spring WebFlux application), the above code would end in a subscribe().
Note, though, that switching from a Flux to a List (or any Collection) means you are switching out of the reactive paradigm into imperative programming. You should be able to implement any functionality within the Reactive paradigm itself.
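As an aside (not part of the original answer), the idiomatic way to wait for and assert on an asynchronous sequence in tests is StepVerifier from the reactor-test module; a minimal sketch:
StepVerifier.create(
        Flux.just(1, 2, 3, 4)
                .map(i -> i * 2)
                .subscribeOn(Schedulers.parallel()))
        .expectNext(2, 4, 6, 8) // asserts each element, in order
        .verifyComplete();      // blocks until the sequence completes or fails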
I think there is some confusion. When you call subscribeOn(Schedulers.parallel()), you specify that you want to receive items on a different thread. You also have to slow your code down so the subscription can actually kick in (that is why I added Thread.sleep(100)). If you run the code I have pasted below, it works. You see, there is no magic synchronization mechanism in Reactor.
ArrayList<Integer> arrList = new ArrayList<Integer>();
Flux.just(1, 2, 3, 4)
        .log()
        .map(i -> i * 2)
        .subscribeOn(Schedulers.parallel())
        .subscribe(t -> {
            System.out.println(t + " thread id: " + Thread.currentThread().getId());
            arrList.add(t);
        });
System.out.println("size of arrList(before the wait): " + arrList.size());
System.out.println("Thread id: " + Thread.currentThread().getId() + ": id of main thread");
Thread.sleep(100);
System.out.println("size of arrList(after the wait): " + arrList.size());
If you want to add your items to the list in parallel, Reactor is not a good choice. It is better to use parallel streams from Java 8:
List<Integer> collect = Stream.of(1, 2, 3, 4)
.parallel()
.map(i -> i * 2)
.collect(Collectors.toList());
The tutorial you posted is not very precise when it comes to the concurrency part. To the author's credit, he/she says that more articles are to come, but I don't think that particular example should have been posted at all, as it creates confusion. I suggest not trusting resources on the internet too much :)

How to dispatch incoming NetSocket handlers into different event loop threads?

I'm trying to use Vertx to implement a TCP server, accepting incoming connections and then handling different sockets. Since each socket can be handled independently, the handlers belonging to different sockets are supposed to run in different event loop threads concurrently.
According to Vert.x document,
Standard verticles are assigned an event loop thread when they are created and the start method is called with that event loop. When you call any other methods that takes a handler on a core API from an event loop then Vert.x will guarantee that those handlers, when called, will be executed on the same event loop.
I thought this code snippet would print different thread names:
Vertx vertx = Vertx.vertx(); // The number of event loop threads is 2*cores.
vertx.createNetServer().connectHandler(socket -> {
    vertx.deployVerticle(new AbstractVerticle() {
        @Override
        public void start() throws Exception {
            socket.handler(buffer -> {
                log.trace(socket.toString() + ": Socket Message");
                socket.close();
            });
        }
    });
}).listen(port);
But unfortunately, all handlers ran on the same thread:
23:59:42.359 [vert.x-eventloop-thread-1] TRACE Server - io.vertx.core.net.impl.NetSocketImpl@253fa4f2: Socket Message
23:59:42.364 [vert.x-eventloop-thread-1] TRACE Server - io.vertx.core.net.impl.NetSocketImpl@465f1533: Socket Message
23:59:42.365 [vert.x-eventloop-thread-1] TRACE Server - io.vertx.core.net.impl.NetSocketImpl@5ab8dac: Socket Message
23:59:42.366 [vert.x-eventloop-thread-1] TRACE Server - io.vertx.core.net.impl.NetSocketImpl@5fc72993: Socket Message
23:59:42.367 [vert.x-eventloop-thread-1] TRACE Server - io.vertx.core.net.impl.NetSocketImpl@38ee66d7: Socket Message
23:59:42.368 [vert.x-eventloop-thread-1] TRACE Server - io.vertx.core.net.impl.NetSocketImpl@6a60a74: Socket Message
23:59:42.369 [vert.x-eventloop-thread-1] TRACE Server - io.vertx.core.net.impl.NetSocketImpl@5f3921e1: Socket Message
23:59:42.370 [vert.x-eventloop-thread-1] TRACE Server - io.vertx.core.net.impl.NetSocketImpl@39d41024: Socket Message
... more than 100+ lines ...
A contrasting example is this echo server written in Boost.Asio: the handlers run in different event loop threads if a thread pool is used to execute io_service::run().
So, my question is how to run these handlers concurrently?
Actually, you are doing something entirely different from what you intend: each time you receive a connection on your socket, you launch a new actor (verticle).
The simplest way to prove that:
Vertx vertx = Vertx.vertx(); // The number of event loop threads is 2*cores.
vertx.createHttpServer().requestHandler(request -> {
    vertx.deployVerticle(new AbstractVerticle() {
        String uuid = UUID.randomUUID().toString(); // Some random unique number
        @Override
        public void start() throws Exception {
            request.response().end(uuid + " " + Thread.currentThread().getName());
        }
    });
}).listen(8888);
vertx.setPeriodic(1000, r -> {
    System.out.println(vertx.deploymentIDs().size()); // Print verticle count every second
});
I'm using httpServer just because it's easier to check in browser.
As wrong as this approach may be, you'll still see that you do receive different threads:
fe931b18-89cc-4c6a-9d6a-8565bb1f1c12 vert.x-eventloop-thread-9
277330da-4df8-4e91-bd8f-82c0f62156d0 vert.x-eventloop-thread-11
bbd3207c-80a4-41d8-9be5-b40727badc84 vert.x-eventloop-thread-13
Now to how you should do it:
// We create 10 workers
for (int i = 0; i < 10; i++) {
    vertx.deployVerticle(new AbstractVerticle() {
        @Override
        public void start() {
            vertx.eventBus().consumer("processMessage", (request) -> {
                // Do something smart
                // Reply
                request.reply("I'm on thread " + Thread.currentThread().getName());
            });
        }
    });
}
// This is your handler
vertx.createHttpServer().requestHandler(request -> {
    // Only one server, that should dispatch events to workers as quickly as possible
    vertx.eventBus().send("processMessage", null, (response) -> {
        if (response.succeeded()) {
            request.response().end("Request :" + response.result().body().toString());
        }
        // Handle errors
    });
}).listen(8888);
vertx.setPeriodic(1000, r -> {
    System.out.println(vertx.deploymentIDs().size()); // Notice that the number of workers doesn't change
});
It's not possible to determine which event loop Vert.x will assign to each of your verticles without more details (the number of cores of your test machine, for example).
Anyway, it is not a good idea to deploy a verticle per incoming connection. Verticles are units of deployment in Vert.x; you would typically create one per "functionality".
Back to your use case: the purpose of event-driven programming is precisely to avoid using a thread per connection. You can handle a lot of concurrent connections with a single event loop. If you have multiple cores on your machine, you can deploy multiple instances of your verticle to use them all (one event loop per core).
int processors = Runtime.getRuntime().availableProcessors();
Vertx vertx = Vertx.vertx();
vertx.deployVerticle(TCPServerVerticle.class.getName(), new DeploymentOptions().setInstances(processors));

public class TCPServerVerticle extends AbstractVerticle {
    @Override
    public void start(Future<Void> startFuture) throws Exception {
        vertx.createNetServer().connectHandler(socket -> {
            socket.handler(buffer -> {
                log.trace(socket.toString() + ": Socket Message");
                socket.close();
            });
        }).listen(port, ar -> {
            if (ar.succeeded()) {
                startFuture.complete();
            } else {
                startFuture.fail(ar.cause());
            }
        });
    }
}
With Vert.x TCP server sharing, the connect handlers will be invoked in a round-robin fashion.

Aggregate resource requests & dispatch responses to each subscriber

I'm fairly new to RxJava and struggling with a use case that seems quite common to me:
Gather multiple requests from different parts of the application, aggregate them, make a single resource call and dispatch the results to each subscriber.
I've tried a lot of different approaches, using subjects, connectable observables, deferred observables... none did the trick so far.
I was quite optimistic about this approach, but it turns out it fails just like the others:
//(...)
static HashMap<String, String> requests = new HashMap<>();
//(...)
@Test
public void myTest() throws InterruptedException {
    TestScheduler scheduler = new TestScheduler();
    Observable<String> interval = Observable.interval(10, TimeUnit.MILLISECONDS, scheduler)
            .doOnSubscribe(() -> System.out.println("new subscriber!"))
            .doOnUnsubscribe(() -> System.out.println("unsubscribed"))
            .filter(l -> !requests.isEmpty())
            .doOnNext(aLong -> System.out.println(requests.size() + " requests to send"))
            .flatMap(aLong -> {
                System.out.println("requests " + requests);
                return Observable.from(requests.keySet()).take(10).distinct().toList();
            })
            .doOnNext(strings -> System.out.println("calling aggregate for " + strings + " (from " + requests + ")"))
            .flatMap(Observable::from)
            .doOnNext(s -> {
                System.out.println("----");
                System.out.println("removing " + s);
                requests.remove(s);
            })
            .doOnNext(s -> System.out.println("remaining " + requests));
    TestSubscriber<String> ts1 = new TestSubscriber<>();
    TestSubscriber<String> ts2 = new TestSubscriber<>();
    TestSubscriber<String> ts3 = new TestSubscriber<>();
    TestSubscriber<String> ts4 = new TestSubscriber<>();
    Observable<String> defer = buildObservable(interval, "1");
    defer.subscribe(ts1);
    Observable<String> defer2 = buildObservable(interval, "2");
    defer2.subscribe(ts2);
    Observable<String> defer3 = buildObservable(interval, "3");
    defer3.subscribe(ts3);
    scheduler.advanceTimeBy(200, TimeUnit.MILLISECONDS);
    Observable<String> defer4 = buildObservable(interval, "4");
    defer4.subscribe(ts4);
    scheduler.advanceTimeBy(100, TimeUnit.MILLISECONDS);
    ts1.awaitTerminalEvent(1, TimeUnit.SECONDS);
    ts2.awaitTerminalEvent(1, TimeUnit.SECONDS);
    ts3.awaitTerminalEvent(1, TimeUnit.SECONDS);
    ts4.awaitTerminalEvent(1, TimeUnit.SECONDS);
    ts1.assertValue("1");
    ts2.assertValue("2"); // fails (test stops here)
    ts3.assertValue("3"); // fails
    ts4.assertValue("4"); // fails
}
public Observable<String> buildObservable(Observable<String> interval, String key) {
    return Observable.defer(() -> {
        System.out.printf("creating observable for key " + key);
        return Observable.create(subscriber -> {
            requests.put(key, "xxx");
            interval.doOnNext(s -> System.out.println("filtering : key/val " + key + "/" + s))
                    .filter(s1 -> s1.equals(key))
                    .doOnError(subscriber::onError)
                    .subscribe(s -> {
                        System.out.println("intern " + s);
                        subscriber.onNext(s);
                        subscriber.onCompleted();
                        subscriber.unsubscribe();
                    });
        });
    });
}
Output :
creating observable for key 1new subscriber!
creating observable for key 2new subscriber!
creating observable for key 3new subscriber!
3 requests to send
requests {3=xxx, 2=xxx, 1=xxx}
calling aggregate for [3, 2, 1] (from {3=xxx, 2=xxx, 1=xxx})
----
removing 3
remaining {2=xxx, 1=xxx}
filtering : key/val 1/3
----
removing 2
remaining {1=xxx}
filtering : key/val 1/2
----
removing 1
remaining {}
filtering : key/val 1/1
intern 1
creating observable for key 4new subscriber!
1 requests to send
requests {4=xxx}
calling aggregate for [4] (from {4=xxx})
----
removing 4
remaining {}
filtering : key/val 1/4
The test fails at the second assertion (ts2 not receiving "2")
It turns out the pseudo-aggregation works as expected, but the values are not dispatched to the corresponding subscribers (only the first subscriber receives them).
Any idea why?
Also, I feel like I'm missing the obvious here. If you think of a better approach, I'm more than willing to hear about it.
EDIT: Adding some context regarding what I want to achieve.
I have a REST API exposing data via multiple endpoints (e.g. user/{userid}). This API also makes it possible to aggregate requests (e.g. user/user1 & user/user2) and get the corresponding data in a single HTTP request instead of two.
My goal is to be able to automatically aggregate requests made from different parts of my application in a given time frame (say 10 ms) with a max batch size (say 10), make one aggregate HTTP request, then dispatch the results to the corresponding subscribers.
Something like this :
// NOTE: those calls can be fired from anywhere in the app, and randomly combined. The timing and order is completely unpredictable
//ts : 0ms
api.call(userProfileRequest1).subscribe(this::show);
api.call(userProfileRequest2).subscribe(this::show);
//--> after 10ms, should fire one single http aggregate request with those 2 calls, map the response items & send them to the corresponding subscribers (that will show the right user profile)
//ts : 20ms
api.call(userProfileRequest3).subscribe(this::show);
api.call(userProfileRequest4).subscribe(this::show);
api.call(userProfileRequest5).subscribe(this::show);
api.call(userProfileRequest6).subscribe(this::show);
api.call(userProfileRequest7).subscribe(this::show);
api.call(userProfileRequest8).subscribe(this::show);
api.call(userProfileRequest9).subscribe(this::show);
api.call(userProfileRequest10).subscribe(this::show);
api.call(userProfileRequest11).subscribe(this::show);
api.call(userProfileRequest12).subscribe(this::show);
//--> should fire a single http aggregate request RIGHT AWAY (we hit the max batch size) with the 10 items, map the response items & send them to the corresponding subscribers (that will show the right user profile)
The test code I wrote (with just strings) and pasted at the top of this question is meant to be a proof of concept for my final implementation.
Your Observable is not well constructed
public Observable<String> buildObservable(Observable<String> interval, String key) {
    return interval.doOnSubscribe(() -> System.out.printf("creating observable for key " + key))
            .doOnSubscribe(() -> requests.put(key, "xxx"))
            .doOnNext(s -> System.out.println("filtering : key/val " + key + "/" + s))
            .filter(s1 -> s1.equals(key));
}
Subscribing inside a subscriber is often bad design.
I'm not sure I understand what you want to achieve, but I think my code should be pretty close to yours.
Please note that, for all side effects, I use the do-operators (like doOnNext and doOnSubscribe) to explicitly show that I want to perform a side effect.
I replaced your defer call by returning the interval directly: since you want to emit all interval events from the custom observable built in your defer call, returning the interval observable is better.
Please note that you are filtering your interval Observable:
Observable<String> interval = Observable.interval(10, TimeUnit.MILLISECONDS, scheduler)
        .filter(l -> !requests.isEmpty())
        // ...
So, as soon as the requests map becomes empty, the interval will stop emitting.
I don't understand what you want to achieve with the requests map, but please note that you may want to avoid side effects, and updating this map is clearly a side effect.
Update regarding comments
You may want to use the buffer operator to aggregate requests and then perform the call in bulk:
PublishSubject<String> subject = PublishSubject.create();
TestScheduler scheduler = new TestScheduler();
Observable<Pair> broker = subject.buffer(100, TimeUnit.MILLISECONDS, 10, scheduler)
.flatMapIterable(list -> list) // you can bulk calls here
.flatMap(id -> Observable.fromCallable(() -> api.call(id)).map(response -> Pair.of(id, response)));
TestSubscriber<Object> ts1 = new TestSubscriber<>();
TestSubscriber<Object> ts2 = new TestSubscriber<>();
TestSubscriber<Object> ts3 = new TestSubscriber<>();
TestSubscriber<Object> ts4 = new TestSubscriber<>();
broker.filter(pair -> pair.id.equals("1")).take(1).map(pair -> pair.response).subscribe(ts1);
broker.filter(pair -> pair.id.equals("2")).take(1).map(pair -> pair.response).subscribe(ts2);
broker.filter(pair -> pair.id.equals("3")).take(1).map(pair -> pair.response).subscribe(ts3);
broker.filter(pair -> pair.id.equals("4")).take(1).map(pair -> pair.response).subscribe(ts4);
subject.onNext("1");
subject.onNext("2");
subject.onNext("3");
scheduler.advanceTimeBy(1, TimeUnit.SECONDS);
ts1.assertValue("resp1");
ts2.assertValue("resp2");
ts3.assertValue("resp3");
ts4.assertNotCompleted();
subject.onNext("4");
scheduler.advanceTimeBy(1, TimeUnit.SECONDS);
ts4.assertValue("resp4");
ts4.assertCompleted();
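Note that the take(1) after each filter is what lets every TestSubscriber complete once its own response arrives, which is exactly what ts4.assertCompleted() verifies at the end.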
If you want to perform network request collapsing, you may want to check out Hystrix: https://github.com/Netflix/Hystrix

Rate-limiting multiple observables created by multiple threads using RxJava

I'm developing a simple REST application that leverages RxJava to send requests to a remote server (1). For each incoming request to the REST API, a request is sent (using RxJava and RxNetty) to (1). Everything works fine, but now I have a new use case:
In order not to bombard (1) with too many requests, I need to implement rate limiting. One way to solve this (I assume) would be to add each Observable created when sending a request to (1) into another Observable (2) that does the actual rate limiting. (2) would then act more or less like a queue and process the outbound requests as fast as possible (but not faster than the rate limit). Here's some pseudo-code:
Observable<MyResponse> r1 = createRequestToExternalServer() // In thread 1
Observable<MyResponse> r2 = createRequestToExternalServer() // In thread 2
// Somehow send r1 and r2 to the "rate limiter" observable, (2)
rateLimiterObservable.sample(1 / rate, TimeUnit.MILLISECONDS)
How would I use Rx/RxJava to solve this?
I'd use a hot timer along with an atomic counter that keeps track of the remaining calls allowed in the given interval:
int rate = 5;
long interval = 1000;
AtomicInteger remaining = new AtomicInteger(rate);
ConnectableObservable<Long> timer = Observable
        .interval(interval, TimeUnit.MILLISECONDS)
        .doOnNext(e -> remaining.set(rate))
        .publish();
timer.connect();
Observable<Integer> networkCall = Observable.just(1).delay(150, TimeUnit.MILLISECONDS);
Observable<Integer> limitedNetworkCall = Observable
        .defer(() -> {
            if (remaining.getAndDecrement() != 0) {
                return networkCall;
            }
            return Observable.error(new RuntimeException("Rate exceeded"));
        });
Observable.interval(100, TimeUnit.MILLISECONDS)
        .flatMap(t -> limitedNetworkCall.onErrorReturn(e -> -1))
        .take(20)
        .toBlocking()
        .forEach(System.out::println);
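If requests over the limit should be queued rather than rejected, one alternative sketch (my own, in the same RxJava 1.x style as above) is to zip the work stream with an interval, releasing at most one item per tick. Note that zip buffers the faster source, so this only suits bounded or backpressure-aware streams:
Observable<Integer> work = Observable.range(1, 10);
Observable.zip(
        work,
        Observable.interval(200, TimeUnit.MILLISECONDS), // one permit per 200 ms
        (item, tick) -> item)                            // keep the work item, drop the tick
        .toBlocking()
        .forEach(System.out::println);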