I have a Kafka Streams application where I am joining a KStream that reads
from "topic1" with a GlobalKTable that reads from "topic2" and then with
another GlobalKTable that reads from "topic3".
When I push messages to all three topics at the same time, I get the following exception:
org.apache.kafka.streams.errors.InvalidStateStoreException
If I push messages to these topics one by one, i.e. first to topic2, then to topic3, and then to topic1, I do not get this exception.
I have also added a StateListener before I start KafkaStreams:
KafkaStreams.StateListener stateListener = new KafkaStreams.StateListener() {
@Override
public void onChange (KafkaStreams.State newState, KafkaStreams.State oldState) {
if(newState == KafkaStreams.State.REBALANCING) {
try {
Thread.sleep(1000);
}
catch (InterruptedException e) {
e.printStackTrace();
}
}
}
};
streams.setStateListener(stateListener);
streams.start();
I also wait until the store is queryable after the stream has started, by calling the following method:
public static <T> T waitUntilStoreIsQueryable(final String storeName,
final QueryableStoreType<T> queryableStoreType,
final KafkaStreams streams) throws InterruptedException {
while (true) {
try {
return streams.store(storeName, queryableStoreType);
} catch (final InvalidStateStoreException ignored) {
// store not yet ready for querying
Thread.sleep(100);
}
}
}
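For reference, after streams.start() this gets called once per global store, along these lines (the call site is sketched here as an assumption; the store names come from the Materialized definitions below):
ReadOnlyKeyValueStore<String, GenericRecord> topic2Store =
        waitUntilStoreIsQueryable("topic2-global-store", QueryableStoreTypes.keyValueStore(), streams);
ReadOnlyKeyValueStore<String, GenericRecord> topic3Store =
        waitUntilStoreIsQueryable("topic3-global-store", QueryableStoreTypes.keyValueStore(), streams);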
Following is the Kafka Streams and GlobalKTable join code:
KStream<String, GenericRecord> topic1KStream =
builder.stream(
"topic1",
Consumed.with(Serdes.String(), genericRecordSerde)
);
GlobalKTable<String, GenericRecord> topic2KTable =
builder.globalTable(
"topic2",
Consumed.with(Serdes.String(), genericRecordSerde),
Materialized.<String, GenericRecord, KeyValueStore<Bytes, byte[]>>as("topic2-global-store")
.withKeySerde(Serdes.String())
.withValueSerde(genericRecordSerde)
);
GlobalKTable<String, GenericRecord> topic3KTable =
builder.globalTable(
"topic3",
Consumed.with(Serdes.String(), genericRecordSerde),
Materialized.<String, GenericRecord, KeyValueStore<Bytes, byte[]>>as("topic3-global-store")
.withKeySerde(Serdes.String())
.withValueSerde(genericRecordSerde)
);
KStream<String, MergedObj> stream_topic1_topic2 = topic1KStream.join(
topic2KTable,
(topic1Key, topic1Obj) -> topic1Obj.get("id").toString(),
(topic1Obj, topic2Obj) -> new MergedObj(topic1Obj, topic2Obj)
);
final KStream<String, GenericRecord> enrichedStream =
stream_topic1_topic2.join(
topic3KTable,
(topic1Key, mergedObj) -> mergedObj.topic3Id(),
(mergedObj, topic3Obj) -> new Enriched(
mergedObj.topic1Obj,
mergedObj.topic2Obj,
topic3Obj
).enrich()
);
enrichedStream.to("enrichedStreamTopic", Produced.with(Serdes.String(),getGenericRecordSerde()));
The above code is very similar to this.
When I push messages to all three topics at the same time, I get the following exception:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_1, processor=KSTREAM-SOURCE-0000000000, topic=topic1, partition=1, offset=61465,
stacktrace=org.apache.kafka.streams.errors.InvalidStateStoreException: Store topic2-global-store is currently closed.
    at org.apache.kafka.streams.state.internals.WrappedStateStore.validateStoreOpen(WrappedStateStore.java:66)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(CachingKeyValueStore.java:150)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(CachingKeyValueStore.java:37)
    at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.get(MeteredKeyValueStore.java:135)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl$KeyValueStoreReadOnlyDecorator.get(ProcessorContextImpl.java:245)
    at org.apache.kafka.streams.kstream.internals.KTableSourceValueGetterSupplier$KTableSourceValueGetter.get(KTableSourceValueGetterSupplier.java:49)
    at org.apache.kafka.streams.kstream.internals.KStreamKTableJoinProcessor.process(KStreamKTableJoinProcessor.java:71)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:117)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:183)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:162)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:122)
    at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:87)
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:364)
    at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:199)
    at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:420)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:890)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
I fixed the issue.
In my code I had auto.register.schemas=false, because I had manually registered the schemas for all my topics.
After I set auto.register.schemas=true and re-ran the streams application, it worked fine. I think it needs this flag for its internal topics.
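For reference, this flag is part of the Confluent Avro serde configuration; the change amounted to roughly the following (a sketch only; the registry URL is a placeholder):
// Serde config handed to the Avro serde via Serde.configure(configs, isKey).
// auto.register.schemas defaults to true; it had been explicitly set to false before.
Map<String, Object> serdeConfig = new HashMap<>();
serdeConfig.put("schema.registry.url", "http://localhost:8081"); // placeholder
serdeConfig.put("auto.register.schemas", true);                  // was false
genericRecordSerde.configure(serdeConfig, false);                // false = configure as a value serde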
I have a reactive Kafka application that reads data from a topic, transforms the message, and writes to another topic. I have multiple partitions in the topic, so I am creating multiple consumers to read from the topic in parallel. Each consumer runs on a different thread, but it looks like the Kafka send runs on the same thread even though it is called from different consumers.
I tested by logging the thread name to understand the thread workflow: the receive thread name is different for each consumer, but on the Kafka send [kafkaProducerTemplate.send] the thread name [Thread name: producer-1] is the same for all the consumers. I don't understand how that works; I would expect it to be different for each consumer on send as well. Can someone help me understand how this works?
@Bean
public ReceiverOptions<String, String> kafkaReceiverOptions(String topic, KafkaProperties kafkaProperties) {
ReceiverOptions<String, String> basicReceiverOptions = ReceiverOptions.create(kafkaProperties.buildConsumerProperties());
return basicReceiverOptions.subscription(Collections.singletonList(topic))
.addAssignListener(receiverPartitions -> log.debug("onPartitionAssigned {}", receiverPartitions))
.addRevokeListener(receiverPartitions -> log.debug("onPartitionsRevoked {}", receiverPartitions));
}
@Bean
public ReactiveKafkaConsumerTemplate<String, String> kafkaConsumerTemplate(ReceiverOptions<String, String> kafkaReceiverOptions) {
return new ReactiveKafkaConsumerTemplate<String, String>(kafkaReceiverOptions);
}
@Bean
public ReactiveKafkaProducerTemplate<String, List<Object>> kafkaProducerTemplate(
KafkaProperties properties) {
Map<String, Object> props = properties.buildProducerProperties();
return new ReactiveKafkaProducerTemplate<String, List<Object>>(SenderOptions.create(props));
}
public void run(String... args) {
    for (int i = 0; i < topicPartitionsCount; i++) {
        readWrite(destinationTopic).subscribe();
    }
}
public Flux<String> readWrite(String destTopic) {
return kafkaConsumerTemplate
.receiveAutoAck()
.doOnNext(consumerRecord -> log.info("received key={}, value={} from topic={}, offset={}",
consumerRecord.key(),
consumerRecord.value(),
consumerRecord.topic(),
consumerRecord.offset())
)
.doOnNext(consumerRecord -> log.info("Record received from partition {} in thread {}", consumerRecord.partition(),Thread.currentThread().getName()))
.doOnNext(s-> sendToKafka(s,destTopic))
.map(ConsumerRecord::value)
.onErrorContinue((exception,errorConsumer)->{
log.error("Error while consuming : {}", exception.getMessage());
});
}
public void sendToKafka(ConsumerRecord<String, String> consumerRecord, String destTopic){
kafkaProducerTemplate.send(destTopic, consumerRecord.key(), transformRecord(consumerRecord))
.doOnNext(senderResult -> log.info("Record received from partition {} in thread {}", consumerRecord.partition(),Thread.currentThread().getName()))
.doOnSuccess(senderResult -> {
log.debug("Sent {} offset : {}", metrics, senderResult.recordMetadata().offset());
})
.doOnError(exception -> {
log.error("Error while sending message to destination topic : {}", exception.getMessage());
})
.subscribe();
}
All sends for a producer are run on a single-threaded Scheduler (via .publishOn()).
See DefaultKafkaSender.doSend().
You should create a sender for each consumer.
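A minimal sketch of that suggestion, applied to the run() loop from the question (here senderOptions stands for the SenderOptions<String, List<Object>> built from the producer properties, and readWrite()/sendToKafka() are assumed to take the per-consumer template as an extra parameter):
// One ReactiveKafkaProducerTemplate (and therefore one "producer-N" send thread) per consumer flow.
public void run(String... args) {
    for (int i = 0; i < topicPartitionsCount; i++) {
        ReactiveKafkaProducerTemplate<String, List<Object>> producerTemplate =
                new ReactiveKafkaProducerTemplate<>(senderOptions);
        readWrite(destinationTopic, producerTemplate).subscribe();
    }
}
Each template wraps its own KafkaSender and hence its own underlying producer, so the sends should no longer all funnel through the single producer-1 thread.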
I have an application where I process a stream and convert it into another. Here is a sample:
public void run(final String... args) {
final Serde<Event> eventSerde = new EventSerde();
final Properties props = streamingConfig.getProperties(
applicationName,
concurrency,
Serdes.String(),
eventSerde
);
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, EXACTLY_ONCE);
props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, EventTimestampExtractor.class);
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, Event> eventStream = builder.stream(inputStream);
final Serde<Device> deviceSerde = new DeviceSerde();
eventStream
.map((key, event) -> {
final Device device = modelMapper.map(event, Device.class);
return new KeyValue<>(key, device);
})
.to("device_topic", Produced.with(Serdes.String(), deviceSerde));
final Topology topology = builder.build();
final KafkaStreams streams = new KafkaStreams(topology, props);
streams.start();
}
Here are some details about the app:
Spring Boot 1.5.17
Kafka 2.1.0
Kafka Streams 2.1.0
Spring Kafka 1.3.6
Although a timestamp is set in the messages inside the input stream, I also plug in an implementation of TimestampExtractor to make sure that a proper timestamp is attached to all messages (as other producers may send messages to the same topic).
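(For reference, a minimal sketch of what such an extractor can look like; the getTimestamp() accessor on Event is a placeholder for however the event exposes its own time:)
public class EventTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(final ConsumerRecord<Object, Object> record, final long previousTimestamp) {
        final Object value = record.value();
        if (value instanceof Event) {
            // use the timestamp carried inside the event payload
            return ((Event) value).getTimestamp(); // placeholder accessor
        }
        // records from other producers fall back to the record's own timestamp
        return record.timestamp();
    }
}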
Within the code, I receive a stream of events and I basically convert them into different objects and eventually route those objects into different streams.
I'm trying to understand whether the initial timestamp I set is still attached to the messages published into device_topic in this particular case.
The receiving end (of device stream) is like this:
@KafkaListener(topics = "device_topic")
public void onDeviceReceive(final Device device, @Header(KafkaHeaders.RECEIVED_TIMESTAMP) final long timestamp) {
log.trace("[{}] Received device: {}", timestamp, device);
}
Unfortunately, the printed timestamp seems to be wall-clock time. Is this the expected behaviour or am I missing something?
Spring Kafka 1.3.x uses a very old 0.11 client; perhaps it doesn't propagate the timestamp. I just tested with Boot 2.1.3 and Spring Kafka 2.2.4 and the timestamp is propagated ok...
@SpringBootApplication
@EnableKafkaStreams
public class So54771130Application {
public static void main(String[] args) {
SpringApplication.run(So54771130Application.class, args);
}
@Bean
public ApplicationRunner runner(KafkaTemplate<String, String> template) {
return args -> {
template.send("so54771130", 0, 42L, null, "baz");
};
}
@Bean
public KStream<String, String> stream(StreamsBuilder builder) {
KStream<String, String> stream = builder.stream("so54771130");
stream
.map((k, v) -> {
System.out.println("Mapping:" + v);
return new KeyValue<>(null, "bar");
})
.to("so54771130-1");
return stream;
}
@Bean
public NewTopic topic1() {
return new NewTopic("so54771130", 1, (short) 1);
}
@Bean
public NewTopic topic2() {
return new NewTopic("so54771130-1", 1, (short) 1);
}
@KafkaListener(id = "so54771130", topics = "so54771130-1")
public void listen(String in, @Header(KafkaHeaders.RECEIVED_TIMESTAMP) long ts) {
System.out.println(in + "#" + ts);
}
}
and
Mapping:baz
bar#42
I am fairly new to Kafka. I have created a sample producer and consumer in Java. Using the producer, I was able to send data to a Kafka topic, but I am not able to get the number of records in the topic using the following consumer code.
public class ConsumerTests {
public static void main(String[] args) throws Exception {
BasicConfigurator.configure();
String topicName = "MobileData";
String groupId = "TestGroup";
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("group.id", groupId);
properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
kafkaConsumer.subscribe(Arrays.asList(topicName));
try {
while (true) {
ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(100);
System.out.println("Record count is " + consumerRecords.count());
}
} catch (WakeupException e) {
// ignore for shutdown
} finally {
kafkaConsumer.close();
}
}
}
I don't get any exception in the console, but consumerRecords.count() always returns 0, even though there are messages in the topic. Please let me know if I am missing something.
The poll(...) call should normally be in a loop. It's always possible for the initial poll(...) to return no data (depending on the timeout) while the partition assignment is in progress. Here's an example:
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
System.out.println("Record count is " + records.count());
}
} catch (WakeupException e) {
// ignore for shutdown
} finally {
consumer.close();
}
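If the goal is really just the total number of records currently in the topic (rather than consuming them), a different approach, not covered by the loop above, is to compare beginning and end offsets per partition; a rough sketch:
// Sum of (endOffset - beginningOffset) over all partitions of the topic.
// For compacted or partially deleted topics this is only an upper bound.
List<TopicPartition> partitions = kafkaConsumer.partitionsFor(topicName).stream()
        .map(info -> new TopicPartition(info.topic(), info.partition()))
        .collect(Collectors.toList());
Map<TopicPartition, Long> beginning = kafkaConsumer.beginningOffsets(partitions);
Map<TopicPartition, Long> end = kafkaConsumer.endOffsets(partitions);
long total = partitions.stream().mapToLong(tp -> end.get(tp) - beginning.get(tp)).sum();
System.out.println("Approximate record count: " + total);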
For more info see this relevant article:
My @KafkaListener consumer commits only once a specific condition is met. Let us say a topic gets the following data from a producer:
"Message 0" at offset[0]
"Message 1" at offset[1]
They are received at the consumer and committed with the help of acknowledgment.acknowledge().
Then the messages below arrive at the topic:
"Message 2" at offset[2]
"Message 3" at offset[3]
The running consumer receives the above data. Here the condition fails and the above offsets are not committed.
Even if new data arrives at the topic, "Message 2" and "Message 3" should still be picked up by a consumer from the same consumer group, since they were not committed. But this is not happening; the consumer picks up the new message instead.
When I restart my consumer, I get back "Message 2" and "Message 3". This should have happened while the consumers were running.
The code is as follows:
KafkaConsumerConfig file
@Configuration
@EnableKafka
public class KafkaConsumerConfig {
@Bean
KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setConcurrency(3);
factory.setBatchListener(true);
factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL_IMMEDIATE);
factory.getContainerProperties().setSyncCommits(true);
return factory;
}
@Bean
public ConsumerFactory<String, String> consumerFactory() {
return new DefaultKafkaConsumerFactory<>(consumerConfigs());
}
@Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> propsMap = new HashMap<>();
propsMap.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
propsMap.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
propsMap.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "100");
propsMap.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "15000");
propsMap.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
propsMap.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
propsMap.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
propsMap.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
propsMap.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG,"1");
return propsMap;
}
@Bean
public Listener listener() {
return new Listener();
}
}
Listener class
public class Listener {
public CountDownLatch countDownLatch0 = new CountDownLatch(3);
private Logger LOGGER = LoggerFactory.getLogger(Listener.class);
static int count0 =0;
@KafkaListener(topics = "abcdefghi", group = "group1", containerFactory = "kafkaListenerContainerFactory")
public void listenPartition0(String data, @Header(KafkaHeaders.RECEIVED_PARTITION_ID) List<Integer> partitions,
@Header(KafkaHeaders.OFFSET) List<Long> offsets, Acknowledgment acknowledgment) throws InterruptedException {
count0 = count0 + 1;
LOGGER.info("start consumer 0");
LOGGER.info("received message via consumer 0='{}' with partition-offset='{}'", data, partitions + "-" + offsets);
if (count0%2 ==0)
acknowledgment.acknowledge();
LOGGER.info("end of consumer 0");
}
}
How can I achieve my desired result?
That's correct. The offset is a number that is pretty easy to keep track of in memory on the consumer instance; committed offsets are only needed for consumers that newly arrive in the group for the same partitions. That's why it works as expected when you restart the application or when a rebalance happens for the group.
To make it work as you would like, you should consider implementing ConsumerSeekAware in your listener and calling ConsumerSeekCallback.seek() for the offset you would like to start consuming from on the next poll cycle.
http://docs.spring.io/spring-kafka/docs/2.0.0.M2/reference/html/_reference.html#seek:
public class Listener implements ConsumerSeekAware {
private final ThreadLocal<ConsumerSeekCallback> seekCallBack = new ThreadLocal<>();
@Override
public void registerSeekCallback(ConsumerSeekCallback callback) {
this.seekCallBack.set(callback);
}
@KafkaListener()
public void listen(...) {
this.seekCallBack.get().seek(topic, partition, 0);
}
}
In my sample program I publish a file and try to consume it immediately, but my consumer iterator returns null.
Any idea what I'm doing wrong?
Test
main() {
KafkaMessageProducer producer = new KafkaMessageProducer(topic, file);
producer.generateMessgaes();
MessageListener listener = new MessageListener(topic);
listener.start();
}
MessageListener
public void start() {
Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(topic, new Integer(CoreConstants.THREAD_SIZE));
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumerConnector
.createMessageStreams(topicCountMap);
List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);
executor = Executors.newFixedThreadPool(CoreConstants.THREAD_SIZE);
for (KafkaStream<byte[], byte[]> stream : streams) {
System.out.println("The stream is --"+ stream.iterator().makeNext().topic());
executor.submit(new ListenerThread(stream));
}
try { // without this wait the subsequent shutdown happens immediately before any messages are delivered
Thread.sleep(10000);
} catch (InterruptedException ie) {
}
if (consumerConnector != null) {
consumerConnector.shutdown();
}
if (executor != null) {
executor.shutdown();
}
}
ListenerThread
public class ListenerThread implements Runnable {
private KafkaStream<byte[], byte[]> stream;
public ListenerThread(KafkaStream<byte[], byte[]> msgStream) {
this.stream = msgStream;
System.out.println("----------" + stream.iterator().makeNext().topic());
}
public void run() {
try {
ConsumerIterator<byte[], byte[]> it = stream.iterator();
while (it.hasNext()) {
// MessageAndMetadata<byte[], byte[]> messageAndMetadata =
// it.makeNext();
// String topic = messageAndMetadata.topic();
// byte[] message = messageAndMetadata.message();
System.out.println("111111111111111111111111111");
FileProcessor processor = new FileProcessor();
processor.processFile("LOB_TOPIC", it.next().message());
}
In the above code, execution never enters the while loop, since the iterator is null. But I am sure I am publishing a single message to the same topic, and the consumer listens to that topic.
Any help would be appreciated.
I was having this same issue yesterday. After trying to work with it for a while, I couldn't get it to read from my current topic, so I took the following steps:
a. Stopped my consumer
b. Stopped the producer
c. Stopped the Kafka server
bin/kafka-server-stop.sh
d. Stopped ZooKeeper
bin/zookeeper-server-stop.sh config/zookeeper.properties
After that I deleted my topic:
bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic test
I also deleted the files that were created by following the "Setting up a multi-broker cluster" guide, but I don't think that caused the issue.
a. Started ZooKeeper
b. Started Kafka
c. Started the producer and sent some messages to Kafka
It started to work again. I am not sure if this will help you or not, but it seems that somehow my producer must have gotten disconnected from the consumer. Hope this helps.