Kafka Streams stops if I use persistentKeyValueStore but works fine with inMemoryKeyValueStore (running in Docker container) - apache-kafka

I'm obviously a beginner with Kafka/Kafka Streams. I just need to read given messages from a few topics, given their id. While our actual topology is fairly complex, this Streams app just needs to achieve this single simple goal.
This is how a store is created:
final StreamsBuilder streamsBuilder = new StreamsBuilder();
streamsBuilder.table(
        topic,
        Materialized.<String, String>as(persistentKeyValueStore(storeNameOf(topic)))
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String())
                .withCachingDisabled());
        // Materialized.<String, String>as(inMemoryKeyValueStore(storeNameOf(topic)))
        //         .withKeySerde(Serdes.String())
        //         .withValueSerde(Serdes.String())
        //         .withCachingDisabled());
KafkaStreams kafkaStreams = new KafkaStreams(streamsBuilder.build(), new Properties() {{ /* config items go here */ }});
kafkaStreams.start();
// logic for awaiting kafkaStreams to reach `RUNNING` state, as well as InvalidStateStoreException handling (by retrying), is omitted for simplicity:
ReadOnlyKeyValueStore<String, String> replyStore = kafkaStreams.store(storeNameOf(topicName), QueryableStoreTypes.keyValueStore());
So, when using the commented-out inMemoryKeyValueStore materialization, replyStore is successfully created and I can query the values within without a problem.
With persistentKeyValueStore the last line fails with java.lang.IllegalStateException: KafkaStreams is not running. State is ERROR. Note that I do check that KafkaStreams is in state RUNNING before the store call; the ERROR state is somehow reached within the call itself.
Do you think I might have missed anything when setting up the persistent store? Debugging hints would also greatly help; I'm quite stuck here, I must confess.
Thanks!
Edit: The execution happens in a Docker container. This is quite relevant, but I omitted to mention it initially.

As Matthias J. Sax pointed out in a comment, registering an uncaughtExceptionHandler helped greatly in debugging the problem.
The actual issue was an incompatibility between RocksDB and the Docker image I was using (so I changed from openjdk:8-jdk-alpine to anapsix/alpine-java:8).
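For reference, the handler registration is roughly this (a minimal sketch using the Thread.UncaughtExceptionHandler overload available in this Kafka Streams version; the logging is just illustrative):
// Register before start(); this surfaces the underlying RocksDB/UnsatisfiedLinkError
// that otherwise only shows up as the KafkaStreams instance switching to ERROR.
kafkaStreams.setUncaughtExceptionHandler((thread, throwable) ->
        System.err.println("Stream thread " + thread.getName() + " died: " + throwable));
kafkaStreams.setStateListener((newState, oldState) ->
        System.err.println("KafkaStreams state changed: " + oldState + " -> " + newState));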
Related:
https://issues.apache.org/jira/browse/KAFKA-4988
UnsatisfiedLinkError: /tmp/snappy-1.1.4-libsnappyjava.so Error loading shared library ld-linux-x86-64.so.2: No such file or directory

Related

R2DBC MSSQL r2dbc.mssql.client.ReactorNettyClient : Connection has been closed by peer

I have started to work on a Spring WebFlux and R2DBC project. Mostly, my code works fine.
But after processing some elements I receive this warning:
r2dbc.mssql.client.ReactorNettyClient : Connection has been closed by peer
After this warning I get this exception, and the program stops reading from the Flux whose source is the R2DBC driver:
ReactorNettyClient$MssqlConnectionClosedException: Connection unexpectedly closed
My main pipeline looks like this:
Sinks.Empty<Void> completionSink = Sinks.empty();

Flux<Event> events = service.getPairs(
        taskProperties.A,
        taskProperties.B);

events
        .flatMap(some operation)
        .doOnComplete(() -> {
            log.info("Finished Job");
            completionSink.emitEmpty(Sinks.EmitFailureHandler.FAIL_FAST);
        })
        .subscribe();

completionSink.asMono().block();
After the run starts, flatMap requests 256 elements by default, then after each fetch calls request(1) for the next signal.
Somewhere between the 280th and 320th element it gets the above error. It is not deterministic: sometimes it reads 280 elements, sometimes 303, 315, etc.
I think it may be network-related? But I am not sure and cannot find the reason. Do I need a pool or something different?
Sorry if I missed anything; if you want, I will try to update here.
Thank you in advance.
I have tried changing the request size of flatMap to unbounded, adding a scheduler, and the default R2DBC pool, but for now I don't have any clue.
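(For reference, a minimal sketch of what explicit pool wiring with io.r2dbc.pool would look like; the connection URL and pool sizes below are illustrative placeholders, not my actual configuration.)
ConnectionFactory factory = ConnectionFactories.get("r2dbc:mssql://localhost:1433/mydb"); // placeholder URL
ConnectionPoolConfiguration poolConfig = ConnectionPoolConfiguration.builder(factory)
        .initialSize(5)                        // placeholder sizes
        .maxSize(10)
        .maxIdleTime(Duration.ofMinutes(30))
        .build();
ConnectionPool pool = new ConnectionPool(poolConfig); // ConnectionPool is itself a ConnectionFactory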

Alpakka Kafka stream never getting terminated

We are using Alpakka Kafka streams for consuming events from Kafka. Here is how the stream is defined:
ConsumerSettings<GenericKafkaKey, GenericKafkaMessage> consumerSettings =
        ConsumerSettings
                .create(actorSystem, new KafkaJacksonSerializer<>(GenericKafkaKey.class),
                        new KafkaJacksonSerializer<>(GenericKafkaMessage.class))
                .withBootstrapServers(servers)
                .withGroupId(groupId)
                .withClientId(clientId)
                .withProperties(clientConfigs.defaultConsumerConfig());

CommitterSettings committerSettings = CommitterSettings.create(actorSystem)
        .withMaxBatch(20)
        .withMaxInterval(Duration.ofSeconds(30));

Consumer.DrainingControl<Done> control =
        Consumer.committableSource(consumerSettings, Subscriptions.topics(topics))
                .mapAsync(props.getMessageParallelism(), msg ->
                        CompletableFuture.supplyAsync(() -> consumeMessage(msg), actorSystem.dispatcher())
                                .thenCompose(param -> CompletableFuture.supplyAsync(() -> msg.committableOffset())))
                .toMat(Committer.sink(committerSettings), Keep.both())
                .mapMaterializedValue(Consumer::createDrainingControl)
                .run(materializer);
Here is the piece of code that is shutting down the stream:
CompletionStage<Done> completionStage = control.drainAndShutdown(actorSystem.dispatcher());
completionStage.toCompletableFuture().join();
I tried doing a get on the completable future too, but neither join nor get ever returns. Has anyone else faced a similar problem? Is there something I am doing wrong here?
If you want to control stream termination from outside the stream, you need to use a KillSwitch: https://doc.akka.io/docs/akka/current/stream/stream-dynamic.html
Your usage looks correct and I can't identify anything that would hinder draining.
A common thing to miss with Alpakka Kafka consumers is the stop-timeout, which defaults to 30 seconds.
When using the DrainingControl you can safely set it to 0 seconds.
See https://doc.akka.io/docs/alpakka-kafka/current/consumer.html#draining-control
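If that is the issue here, the timeout can be set directly on the ConsumerSettings; a minimal sketch based on the settings from the question (withStopTimeout with Duration.ZERO disables the extra wait):
ConsumerSettings<GenericKafkaKey, GenericKafkaMessage> consumerSettings =
        ConsumerSettings
                .create(actorSystem, new KafkaJacksonSerializer<>(GenericKafkaKey.class),
                        new KafkaJacksonSerializer<>(GenericKafkaMessage.class))
                .withBootstrapServers(servers)
                .withGroupId(groupId)
                .withStopTimeout(Duration.ZERO); // safe when the materialized value is a DrainingControl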

Use kafka to detect changes on values

I have a streaming application that continuously takes in a stream of coordinates along with some custom metadata, which also includes a bit string. This stream is produced onto a Kafka topic using the producer API. Now another application needs to process this stream [Streams API], store a specific bit from the bit string, and generate alerts when this bit changes.
Below is the continuous stream of messages that need to be processed:
{"device_id":"1","status_bit":"0"}
{"device_id":"2","status_bit":"1"}
{"device_id":"1","status_bit":"0"}
{"device_id":"3","status_bit":"1"}
{"device_id":"1","status_bit":"1"} // need to generate alert with change: 0->1
{"device_id":"3","status_bits":"1"}
{"device_id":"2","status_bit":"1"}
{"device_id":"3","status_bits":"0"} // need to generate alert with change 1->0
Now I would like to write these alerts to another Kafka topic, like:
{"device_id":1,"init":0,"final":1,"timestamp":"somets"}
{"device_id":3,"init":1,"final":0,"timestamp":"somets"}
I can save the current bit in the state store using something like:
streamsBuilder
        .stream("my-topic")
        .mapValues((key, value) -> value.getStatusBit())
        .groupByKey()
        .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
        .reduce((oldAggValue, newMessageValue) -> newMessageValue, Materialized.as("bit-temp-store"));
but I am unable to understand how I can detect the change from the existing bit. Do I need to query the state store somehow inside the processor topology? If yes, how? If no, what else could be done?
Any suggestions/ideas that I can try (maybe completely different from what I am thinking) are also appreciated. I am new to Kafka, and thinking in terms of event-driven streams is still eluding me.
Thanks in advance.
I am not sure this is the best approach, but in a similar task I used an intermediate entity to capture the state change. In your case it would be something like:
streamsBuilder.stream("my-topic").groupByKey()
.aggregate(DeviceState::new, new Aggregator<String, Device, DeviceState>() {
public DeviceState apply(String key, Device newValue, DeviceState state) {
if(!newValue.getStatusBit().equals(state.getStatusBit())){
state.setChanged(true);
}
state.setStatusBit(newValue.getStatusBit());
state.setDeviceId(newValue.getDeviceId());
state.setKey(key);
return state;
}
}, TimeWindows.of(…) …).filter((s, t) -> (t.changed())).toStream();
In the resulting topic you will have the changes. You can also add some attributes to DeviceState to initialise it first, depending on whether you want to send an event when the first device record arrives, etc.
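For completeness, a minimal sketch of what such a DeviceState holder could look like; the fields and accessors below are inferred from the aggregator snippet, not taken from the original question:
public class DeviceState {
    private String key;
    private String deviceId;
    private String statusBit;
    private boolean changed;        // set once the incoming bit differs from the stored one

    public String getStatusBit() { return statusBit; }
    public void setStatusBit(String statusBit) { this.statusBit = statusBit; }
    public void setDeviceId(String deviceId) { this.deviceId = deviceId; }
    public void setKey(String key) { this.key = key; }
    public void setChanged(boolean changed) { this.changed = changed; }
    public boolean changed() { return changed; }
}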

How to run periodic tasks in an Apache Storm topology?

I have an Apache Storm topology and would like to perform a certain action every once in a while. I'm not sure how to approach this in a way which would be natural and elegant.
Should it be a Bolt or a Spout using ScheduledExecutorService, or something else?
Tick tuples are a decent option: https://kitmenke.com/blog/2014/08/04/tick-tuples-within-storm/
Edit: Here's the essential code for your bolt:
@Override
public Map<String, Object> getComponentConfiguration() {
    // configure how often a tick tuple will be sent to our bolt
    Config conf = new Config();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);
    return conf;
}
Then you can use TupleUtils.isTick(tuple) in execute to check whether the received tuple is a tick tuple.
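For example, a minimal sketch of such an execute method (runPeriodicTask and process are hypothetical placeholders for your own logic; TupleUtils is org.apache.storm.utils.TupleUtils):
@Override
public void execute(Tuple tuple) {
    if (TupleUtils.isTick(tuple)) {
        // fires roughly every TOPOLOGY_TICK_TUPLE_FREQ_SECS seconds
        runPeriodicTask();          // hypothetical periodic action
    } else {
        process(tuple);             // hypothetical normal tuple handling
        collector.ack(tuple);
    }
}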
I don't know if this is the correct approach, but it seems to be working fine:
At the end of the prepare method of a Bolt, I added a call to initScheduler(), which contains the following code:
Calendar calendar = Calendar.getInstance();
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(
        new PeriodicAction(),          // class implementing Runnable
        millisToFullHour(calendar),    // want to start at the top of the hour
        60 * 60 * 1000,                // run every hour
        TimeUnit.MILLISECONDS);
This needs to be used with caution though, because the bolt can have multiple instances depending on your setup.

Kafka Streams application issues with version 0.10.2.0

I had an existing Kafka Streams application which was working fine with 0.10.1.1. I updated to the new 0.10.2.0 Kafka Streams library along with the new broker as well (although the new library is backward compatible with 0.10.1.1). Quick background:
I have a REST API built on top of the interactive-query metadata API.
It searches the local store and also queries remote stores (using the StreamsMetadata obtained by the KafkaStreams.allMetadataForStore method).
It uses the application.server config param in order for this to work (see the config sketch below).
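For reference, the relevant piece of configuration looks roughly like this (the host/port and application id are placeholders):
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");       // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");        // placeholder
// advertises this instance's endpoint so allMetadataForStore() can route remote queries
props.put(StreamsConfig.APPLICATION_SERVER_CONFIG, "this-host:8080");    // placeholder host:port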
The application works fine with a single instance. As soon as I start another instance and query the store via the REST API, I experience the following issues.
If I execute the search on the node which started first, the local store search fails with this exception:
SEVERE: Error - the state store, my-store, may have migrated to another instance.
org.apache.kafka.streams.errors.InvalidStateStoreException: the state store, in-memory-avg-store, may have migrated to another instance.
at org.apache.kafka.streams.state.internals.StreamThreadStateStoreProvider.stores(StreamThreadStateStoreProvider.java:49)
at org.apache.kafka.streams.state.internals.QueryableStoreProvider.getStore(QueryableStoreProvider.java:55)
at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:699)
at mycode.getLocalMetrics(myclass.java:121)
at mycode.remote(myclass.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
If I execute the search on the new node, the local store search is fine, but I see a NullPointerException at the .forEach(new Consumer<StreamsMetadata>() line:
ks.allMetadataForStore(storeName)
        .stream()
        .filter(sm -> !(sm.host().equals(thisInstance.host()) && sm.port() == thisInstance.port())) // only query remote node stores
        .forEach(new Consumer<StreamsMetadata>() {
            @Override
            public void accept(StreamsMetadata t) {
                // some logic
            }
        });
What could I be missing?
Maybe you need to add a little sleep time for the creation of the StreamsMetadata. When I comment out the Thread.sleep(1000L);, I get the same error.
KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();
Thread.sleep(1000L); // if commented out, the error occurs
System.out.println(streams.allMetadataForStore("statestore").size());
If you start multiple instances, a store might migrate from one instance to another. If you collected the metadata about a store's location before the store migrated, your metadata is not correct anymore. Thus, you need to refresh your metadata (i.e., collect it again).
The exception org.apache.kafka.streams.errors.InvalidStateStoreException: the state store, in-memory-avg-store, may have migrated to another instance indicates that your metadata is stale.
So you can catch this exception and retry to recover from it.
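A minimal sketch of such a retry loop, assuming the store() overload used in this Kafka Streams version (the value type, store name, and sleep interval are illustrative):
private ReadOnlyKeyValueStore<String, Long> waitForStore(KafkaStreams streams, String storeName)
        throws InterruptedException {
    while (true) {
        try {
            return streams.store(storeName, QueryableStoreTypes.<String, Long>keyValueStore());
        } catch (InvalidStateStoreException e) {
            // store migrated or is still initializing; back off briefly and retry
            Thread.sleep(100);
        }
    }
}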
Based on Matthias J. Sax's comments, one can reload the store metadata inside a Streams state listener, like below:
streams.setStateListener((newState, oldState) -> {
    if (newState == State.RUNNING && oldState == State.REBALANCING) {
        System.out.println(streams.allMetadataForStore("statestore").size());
    }
});