KafkaReceiver not running parallel for each partition? - apache-kafka

I'm trying to read messages concurrently while preserving per-partition ordering. I followed an example and added a 10-second delay to mimic the workload. Although multiple threads are used, the processing is not concurrent: messages are handled one at a time.
My topic has 20 partitions, so I expect at least 20 threads to run in parallel, each waiting 10 seconds (the delay I added) before handling its next message, and so on.
private fun onPayment(it: ReceiverRecord<String, Any>): Mono<ReceiverRecord<String, Any>> {
    println(
        "SMF message [${Thread.currentThread().name}], Offset - ${
            it.receiverOffset().offset()
        } , Partition ${
            it.receiverOffset().topicPartition()
        }, Value - ${it.value()} ${SimpleDateFormat("hh:mm:ss").format(Date())}"
    )
    return Mono.just(it).delayElement(Duration.ofMillis(10000))
}

fun doSubscribe(consumer: (ReceiverRecord<String, Any>) -> Mono<ReceiverRecord<String, Any>>) {
    val reOption = KafkaConfig().kafkaConsumerConfig().subscription(Collections.singleton("payTopic"))
        .commitInterval(Duration.ZERO)
        .commitBatchSize(0)
    val receiver = KafkaReceiver.create(reOption)
    Flux.defer(receiver::receive).groupBy { m ->
        m.receiverOffset().topicPartition()
    }.flatMap { grpFlux ->
        grpFlux.publishOn(schd).concatMap { m ->
            consumer(m).map {
                m.receiverOffset().commit()
            }
        }
    }.subscribe()
}
Instead, it prints sequentially, one message at a time, and consumes from only one partition.
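For reference, the behaviour being asked for (concurrency across partitions, strict order within each partition) can be sketched without Reactor or Kafka. This is only an illustration of the goal, not a fix for the code above: `PartitionLanesSketch`, `Rec` and `process` are hypothetical names, and one single-threaded executor per partition stands in for the grouped flux.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PartitionLanesSketch {

    /** Hypothetical stand-in for a Kafka record: only partition and offset. */
    static final class Rec {
        final int partition;
        final long offset;

        Rec(int partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }
    }

    /** Handles records concurrently across partitions, in arrival order within each partition. */
    static Map<Integer, List<Long>> process(List<Rec> records) {
        Map<Integer, ExecutorService> lanes = new HashMap<>();
        Map<Integer, List<Long>> handled = new ConcurrentHashMap<>();
        for (Rec r : records) {
            // One single-threaded lane per partition preserves that partition's order.
            ExecutorService lane =
                    lanes.computeIfAbsent(r.partition, p -> Executors.newSingleThreadExecutor());
            lane.execute(() -> {
                // The real work (the 10-second delay in the question) would go here.
                handled.computeIfAbsent(r.partition, p -> new CopyOnWriteArrayList<>())
                        .add(r.offset);
            });
        }
        // Wait for every lane to drain before returning the per-partition results.
        for (ExecutorService lane : lanes.values()) {
            lane.shutdown();
            try {
                lane.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        List<Rec> records = new ArrayList<>();
        records.add(new Rec(0, 0));
        records.add(new Rec(1, 0));
        records.add(new Rec(0, 1));
        System.out.println(process(records));
    }
}
```

With 20 partitions this would create 20 lanes, so up to 20 records could be in flight at once while each partition still sees its offsets in order.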

Related

Rxjava takeuntil wait until first stream finishes

I have a simulator class that emits items. Emission can be controlled through:
Simulator.emitting -> Observable: true means it started emitting, false means it stopped emitting
The items are exposed as: Simulator.items -> Observable
The items are then processed; processing is much slower than emitting (and may occur on another thread).
I am trying to get an observable that signals when "emitting + processing" starts and ends, so:
from: start emitting, 1, 2, 3, end emitting
to: start emitting and processing, 1------, 2-----, 3-----, end emitting and processing
How can I get the emitting+processing observable? I tried using
simulator.items.map { process(it) }.takeUntil(simulator.emitting.filter { it == false }) but this stops before processing finishes.
It turns out this is a trivial problem when using the zip operator:
val stoppedEmitting = simulator.emitting.filter { it == false }
val emitted = simulator.items.takeUntil(stoppedEmitting )
val processed = emitted.map { item -> process(item) }
Then the "zip" operator waits until the last item has been processed:
val processingFlow = emitted.zipWith(processed) { item, processedItem -> ... }
processingFlow.subscribe { }

Where's the bottleneck when I wait for a Kafka message then return a value in Actix Web?

I am trying to communicate between 2 microservices written in Rust and Node.js using Kafka.
I'm using actix-web as web framework and rdkafka as Kafka client for Rust. On the Node.js side, it queries stuff from the database and returns it as JSON to the Rust server via Kafka.
The flow:
Request -> Actix Web -> Kafka -> Node -> Kafka -> Actix Web -> Response
The logic: a request hits an endpoint on Actix Web, which creates a message asking another microservice for something, waits until that service replies (matched by the Kafka message key), and returns the reply to the user as the HTTP response.
I got it to work, but it performs very slowly (I am stress-testing with wrk).
I'm not sure why it's slow, but while digging I found that if I add a 5-second delay on the Node.js side and send 2 requests to actix-web one second apart, they respond after 5 and 10 seconds respectively.
The benchmark is around 3k requests per second, using the following command:
wrk http://localhost:8080 -d 20s -t 2 -c 200
This makes me guess that something might be blocking the thread for each request.
Here is the source code and the repo:
use std::{
    sync::Arc,
    time::{
        Duration,
        Instant
    }
};
use actix_web::{
    App,
    HttpServer,
    get,
    rt,
    web::Data
};
use futures::TryStreamExt;
use tokio::time::sleep;
use num_cpus;
use rand::{
    distributions::Alphanumeric,
    Rng
};
use rdkafka::{
    ClientConfig,
    Message,
    consumer::{
        Consumer,
        StreamConsumer
    },
    producer::{
        FutureProducer,
        FutureRecord
    }
};

const TOPIC: &'static str = "exp-queue_general-5";

#[derive(Clone)]
pub struct AppState {
    pub producer: Arc<FutureProducer>,
    pub receiver: flume::Receiver<String>
}

fn generate_key() -> String {
    rand::thread_rng()
        .sample_iter(&Alphanumeric)
        .take(8)
        .map(char::from)
        .collect()
}

#[get("/")]
async fn landing(state: Data<AppState>) -> String {
    let key = generate_key();
    let t1 = Instant::now();
    let producer = &state.producer;
    let receiver = &state.receiver;
    producer
        .send(
            FutureRecord::to(&format!("{}-forth", TOPIC))
                .key(&key)
                .payload("Hello From Rust"),
            Duration::from_secs(8)
        )
        .await
        .expect("Unable to send message");
    println!("Producer take {} ms", t1.elapsed().as_millis());
    let t2 = Instant::now();
    let value = receiver
        .recv()
        .unwrap_or("".to_owned());
    println!("Receiver take {} ms", t2.elapsed().as_millis());
    println!("Process take {} ms\n", t1.elapsed().as_millis());
    value
}

#[get("/status")]
async fn heartbeat() -> &'static str {
    // ? Concurrency delay check
    sleep(Duration::from_secs(1)).await;
    "Working"
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // ? Assume that the whole node is just Rust instance
    let mut cpus = num_cpus::get() / 2 - 1;
    if cpus < 1 {
        cpus = 1;
    }
    println!("Cpus {}", cpus);
    let producer: FutureProducer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("linger.ms", "25")
        .set("queue.buffering.max.messages", "1000000")
        .set("queue.buffering.max.ms", "25")
        .set("compression.type", "lz4")
        .set("retries", "40000")
        .set("retries", "0")
        .set("message.timeout.ms", "8000")
        .create()
        .expect("Kafka config");
    let (tx, rx) = flume::unbounded::<String>();
    rt::spawn(async move {
        let consumer: StreamConsumer = ClientConfig::new()
            .set("bootstrap.servers", "localhost:9092")
            .set("group.id", &format!("{}-back", TOPIC))
            .set("queued.min.messages", "200000")
            .set("fetch.error.backoff.ms", "250")
            .set("socket.blocking.max.ms", "500")
            .create()
            .expect("Kafka config");
        consumer
            .subscribe(&vec![format!("{}-back", TOPIC).as_ref()])
            .expect("Can't subscribe");
        consumer
            .stream()
            .try_for_each_concurrent(
                cpus,
                |message| {
                    let txx = tx.clone();
                    async move {
                        let result = String::from_utf8_lossy(
                            message
                                .payload()
                                .unwrap_or("Error serializing".as_bytes())
                        ).to_string();
                        txx.send(result).expect("Tx not sending");
                        Ok(())
                    }
                }
            )
            .await
            .expect("Error reading stream");
    });
    let state = AppState {
        producer: Arc::new(producer),
        receiver: rx
    };
    HttpServer::new(move || {
        App::new()
            .app_data(Data::new(state.clone()))
            .service(landing)
            .service(heartbeat)
    })
    .workers(cpus)
    .bind("0.0.0.0:8080")?
    .run()
    .await
}
I found some solved issues on GitHub that recommended using actors instead, which I also tried on a separate branch.
That performs worse than the main branch, at around 200-300 requests per second.
I don't know where the bottleneck is or what is blocking the requests.

RxJava2: 2 separate observable outputs and the merged output of the same observables differ

With Snippet1, I can see the sysout from both subscribers.
With Snippet2, I don't see output from the second observable.
Why is the merge not working for me?
Snippet1
x = createQ2Flowable().subscribeOn(Schedulers.computation())
        .observeOn(Schedulers.io())
        .filter(predicate -> !predicate.toString().contains("<log realm=\"\""))
        .subscribe(onNext -> System.out.println("Q2->" + onNext));
y = createMetricsFlowable().subscribeOn(Schedulers.computation())
        .observeOn(Schedulers.io())
        .subscribe(onNext -> System.out.println("metrics->" + onNext));
Snippet2
createQ2Flowable().mergeWith(createMetricsFlowable())
        .subscribeOn(Schedulers.computation())
        .subscribe(onNext -> System.out.println(onNext));
[edit]: Added flowable creators
private Flowable<String> createMetricsFlowable() {
    return Flowable.create(source -> {
        Space sp = SpaceFactory.getSpace("rxObservableFeeder");
        while (running()) {
            String line = (String) sp.in("RXTmFeeder");
            source.onNext(line);
        }
    }, BackpressureStrategy.BUFFER);
}

private Flowable<String> createQ2Flowable() {
    return Flowable.create(source -> {
        Space sp = SpaceFactory.getSpace("LoggerSpace");
        while (running()) {
            LogEvent line = (LogEvent) sp.in("rxLoggingKey");
            source.onNext(line.toString());
        }
    }, BackpressureStrategy.BUFFER);
}
From the comments:
try
createQ2Flowable()
    .subscribeOn(Schedulers.computation()) // <-------------------------
    .mergeWith(createMetricsFlowable()
        .subscribeOn(Schedulers.computation()) // <-------------------------
    )
Now I need to know why it happened
Given the detailed implementation, you have two synchronous Flowables. When you merge them, the first Flowable is subscribed to, starts emitting immediately, and never gives control back to mergeWith, so the second Flowable is never subscribed to.
The subscribeOn after mergeWith is not equivalent to the solution provided above. You have to explicitly subscribe both Flowables on a background thread. Once the synchronous loop has been moved off the thread that mergeWith uses for subscribing to its sources, mergeWith can go on to subscribe to the second Flowable.
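The effect can be reproduced without RxJava. The following is a minimal plain-Java analogy, not RxJava code: `emitBlocking` is a hypothetical stand-in for the synchronous `Flowable.create` bodies, and the two methods mimic "subscribe both sources on the caller's thread" versus "give each source its own background thread".

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class MergeSubscriptionSketch {

    /** Hypothetical synchronous source: loops on whichever thread calls it. */
    static void emitBlocking(String name, int count, Consumer<String> sink) {
        for (int i = 1; i <= count; i++) {
            sink.accept(name + i);
        }
    }

    /** Both sources are started one after another on the caller's thread. */
    static List<String> subscribeSequentially(int count) {
        List<String> out = new CopyOnWriteArrayList<>();
        // This call does not return until its loop ends. With an endless loop
        // such as while (running()), the next line would never be reached.
        emitBlocking("Q2-", count, out::add);
        emitBlocking("metrics-", count, out::add);
        return out;
    }

    /** Each source gets its own thread, so starting one cannot block the other. */
    static List<String> subscribeOnOwnThreads(int count) {
        List<String> out = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> emitBlocking("Q2-", count, out::add));
        pool.execute(() -> emitBlocking("metrics-", count, out::add));
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(subscribeSequentially(3));
        System.out.println(subscribeOnOwnThreads(3));
    }
}
```

In the sequential variant every "Q2-" item precedes every "metrics-" item, which is the finite version of "the second Flowable is never subscribed to". In the threaded variant the interleaving is not deterministic, but both sources always get to run.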

Using range in zipWith also emits all items from range sequence before zipper function applied

The question is about RxJava2.
I noticed that zipping the Throwable that comes from retryWhen with range emits all items from Observable.range before the zipper function has been applied. Also, range emits its sequence even if zipWith wasn't called. For example, this source code
Observable.create<String> {
    println("subscribing")
    it.onError(RuntimeException("always fails"))
}
    .retryWhen {
        it.zipWith(Observable.range(1, 3).doOnNext { println("range $it") },
            BiFunction { t: Throwable, i: Int -> i })
            .flatMap {
                System.out.println("delay retry by $it + second(s)")
                Observable.timer(it.toLong(), TimeUnit.SECONDS)
            }
    }./*subscribe*/
gives the following result
range 1
range 2
range 3
subscribing
delay retry by 1 + second(s)
subscribing
delay retry by 2 + second(s)
subscribing
delay retry by 3 + second(s)
subscribing
onComplete
Replacing onError in the observable creation also doesn't stop the range items from being emitted. So the question is why this happens, given that range is cold.
Observables in 2.x don't have backpressure thus a range operator will emit all its items as soon as it can. Your case, however, can use a normal counter incremented along the error notification of the retry handler:
source.retryWhen(e -> {
    int[] counter = { 0 };
    return e.takeWhile(v -> ++counter[0] < 4)
            .flatMap(v -> Observable.timer(counter[0], TimeUnit.SECONDS));
})
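The counter idea does not depend on RxJava. Below is a library-free sketch of the same logic, assuming a hypothetical `retryWithCounter` helper that reports each back-off to a callback instead of actually waiting. It is an illustration of "increment a counter per error, stop after the limit", not a replacement for the retryWhen snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class RetryCounterSketch {

    /**
     * Runs the action and retries it after each failure, up to maxRetries times.
     * Each back-off (in seconds) is passed to onDelay; returns the number of attempts.
     */
    static int retryWithCounter(Runnable action, int maxRetries, IntConsumer onDelay) {
        int counter = 0;   // incremented once per failure, like counter[0] in the snippet
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                action.run();
                return attempts;            // success: no more retries
            } catch (RuntimeException e) {
                if (++counter > maxRetries) {
                    return attempts;        // limit reached: give up
                }
                onDelay.accept(counter);    // real code would wait `counter` seconds here
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> delays = new ArrayList<>();
        int attempts = retryWithCounter(() -> {
            throw new RuntimeException("always fails");
        }, 3, delays::add);
        System.out.println(attempts + " attempts, delays " + delays);
    }
}
```

Because the delay is derived from the counter at the moment an error arrives, nothing is produced ahead of time, which is the difference from zipping with an eagerly emitting range.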

Embedded Kafka throws exception when producing within multiple ScalaTest suites

Here is how my test suite is configured.
"test payments" should {
  "Add 100 credits" in {
    runTeamTest { team =>
      withRunningKafka {
        val addCreditsRequest = AddCreditsRequest(team.id.stringify, member1Email, 100)
        TestCommon.makeRequestAndCheck(
          member1Email,
          TeamApiGenerated.addCredits().url,
          Helpers.POST,
          Json.toJson(addCreditsRequest),
          OK
        )
        val foundTeam = TestCommon.waitForFuture(TeamDao.findOneById(team.id))
        foundTeam.get.credits mustEqual initialCreditAmount + 100
      }
    }
  }
  "deduct 100 credits" in {
    runTeamTest { team =>
      withRunningKafka {
        val deductCreditsRequest = DeductCreditsRequest(team.id.stringify, member1Email, 100)
        TestCommon.makeRequestAndCheck(
          member1Email,
          TeamApiGenerated.deductCredits().url,
          Helpers.POST,
          Json.toJson(deductCreditsRequest),
          OK
        )
        val foundTeam = TestCommon.waitForFuture(TeamDao.findOneById(team.id))
        foundTeam.get.credits mustEqual initialCreditAmount - 100
      }
    }
  }
}
Within ScalaTest, the overarching suite name is "test payments", and the tests inside it have issues after the first one has run. If I run each of the two tests individually, they succeed, but if I run the entire suite, the first succeeds and the second fails with an org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. exception. The code above doesn't show the controllers being tested, but within the controller I have a Kafka consumer that is constantly polling, and close() isn't called on it within the tests.
I'd suggest you use companion object methods EmbeddedKafka.start() and EmbeddedKafka.stop() in the beforeAll and afterAll sections. This way you also avoid stopping / starting Kafka again for a single test class.
Also try to make sure you're not trying to start 2 or more instances of Kafka on the same port at the same time.