Using Kafka Streams to segregate messages

I have a setup where each Kafka message contains a "sender" field, and all of these messages are sent to a single topic.
Is there a way to segregate these messages on the consumer side? I would like a sender-specific consumer that reads only the messages pertaining to that sender.
Should I be using Kafka Streams to achieve this? I am new to Kafka Streams, so any advice or guidance would be helpful.
public class KafkaStreams3 {

    public static void main(String[] args) throws JSONException {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafkastreams1");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        final Serde<String> stringSerde = Serdes.String();

        Properties kafkaProperties = new Properties();
        kafkaProperties.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        kafkaProperties.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        kafkaProperties.put("bootstrap.servers", "localhost:9092");

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(kafkaProperties);

        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> source = builder.stream(stringSerde, stringSerde, "topic1");

        // re-key each record by its "sender" field
        KStream<String, String> s1 = source.map(new KeyValueMapper<String, String, KeyValue<String, String>>() {
            @Override
            public KeyValue<String, String> apply(String dummy, String record) {
                JSONObject jsonObject;
                try {
                    jsonObject = new JSONObject(record);
                    return new KeyValue<String, String>(jsonObject.get("sender").toString(), record);
                } catch (JSONException e) {
                    e.printStackTrace();
                    return new KeyValue<>(record, record);
                }
            }
        });

        s1.print();

        // forward each record to a topic named after its sender (the key is used as the topic name)
        s1.foreach(new ForeachAction<String, String>() {
            @Override
            public void apply(String key, String value) {
                ProducerRecord<String, String> data1 = new ProducerRecord<String, String>(key, key, value);
                producer.send(data1);
            }
        });

        KafkaStreams streams = new KafkaStreams(builder, props);
        streams.start();

        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            @Override
            public void run() {
                streams.close();
                producer.close();
            }
        }));
    }
}

I believe the simplest way to achieve this is to use your "sender" field as the message key and have a single topic partitioned by "sender". This gives you locality and ordering per "sender" (a stronger ordering guarantee per sender), and you can connect clients that consume from specific partitions.
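On the producer side that just means using the sender as the record key; the default partitioner then hashes the key, so all messages from one sender land on the same partition. A minimal sketch, reusing the "topic1" topic from the question (sender and payload stand for your JSON field and message string):

// keyed send: the sender becomes the record key, so one sender always maps to one partition
producer.send(new ProducerRecord<String, String>("topic1", sender, payload));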
Another possibility is to stream the messages from the initial topic into other topics, keyed by "sender", so that you end up with one topic per "sender".
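A minimal sketch of that second approach with the current Streams DSL, assuming the input topic "topic1" from the question and per-sender output topics named "sender-&lt;key&gt;" (which must already exist, or broker-side topic auto-creation must be enabled):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.json.JSONException;
import org.json.JSONObject;

public class SenderRouter {

    public static StreamsBuilder buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source =
                builder.stream("topic1", Consumed.with(Serdes.String(), Serdes.String()));

        source
            // re-key each record by its "sender" field
            .selectKey((ignoredKey, value) -> {
                try {
                    return new JSONObject(value).getString("sender");
                } catch (JSONException e) {
                    return "unknown"; // fall back for malformed records
                }
            })
            // route each record to a topic derived from its key, e.g. "sender-alice"
            .to((key, value, recordContext) -> "sender-" + key,
                Produced.with(Serdes.String(), Serdes.String()));

        return builder;
    }
}

From there you build and start a KafkaStreams instance as usual; a consumer interested in one sender then simply subscribes to that sender's topic, with no need to drive a separate KafkaProducer from foreach.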
Here's a fragment of code for a producer and then the stream, using JSON serializers and deserializers.
Producer:
private Properties kafkaClientProperties() {
    Properties properties = new Properties();
    final Serializer<JsonNode> jsonSerializer = new JsonSerializer();

    properties.put("bootstrap.servers", config.getHost());
    properties.put("client.id", clientId);
    properties.put("key.serializer", StringSerializer.class);
    properties.put("value.serializer", jsonSerializer.getClass());

    return properties;
}

public Future<RecordMetadata> send(String topic, String key, Object instance) {
    ObjectMapper objectMapper = new ObjectMapper();
    JsonNode jsonNode = objectMapper.convertValue(instance, JsonNode.class);

    return kafkaProducer.send(new ProducerRecord<>(topic, key, jsonNode));
}
The stream:
log.info("loading kafka stream configuration");
final Serializer<JsonNode> jsonSerializer = new JsonSerializer();
final Deserializer<JsonNode> jsonDeserializer = new JsonDeserializer();
final Serde<JsonNode> jsonSerde = Serdes.serdeFrom(jsonSerializer, jsonDeserializer);

KStreamBuilder kStreamBuilder = new KStreamBuilder();
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, config.getStreamEnrichProduce().getId());
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, hosts);

// stream from topic...
KStream<String, JsonNode> stockQuoteRawStream =
        kStreamBuilder.stream(Serdes.String(), jsonSerde, config.getStockQuote().getTopic());

Map<String, Map> exchanges = stockExchangeMaps.getExchanges();
ObjectMapper objectMapper = new ObjectMapper();
kafkaProducer.configure(config.getStreamEnrichProduce().getTopic());

// enrich each stock quote with its stock details before producing to the new topic
stockQuoteRawStream.foreach((key, jsonNode) -> {
    StockQuote stockQuote = null;
    StockDetail stockDetail;
    try {
        stockQuote = objectMapper.treeToValue(jsonNode, StockQuote.class);
    } catch (JsonProcessingException e) {
        e.printStackTrace();
    }

    JsonNode exchangeNode = jsonNode.get("exchange");
    // get the stockDetail that matches the quote currently being processed
    Map<String, StockDetail> stockDetailMap = exchanges.get(exchangeNode.toString().replace("\"", ""));
    stockDetail = stockDetailMap.get(key);
    stockQuote.setStockDetail(stockDetail);

    kafkaProducer.send(config.getStreamEnrichProduce().getTopic(), null, stockQuote);
});

return new KafkaStreams(kStreamBuilder, props);
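The returned KafkaStreams instance still has to be started, and ideally closed on shutdown. A minimal usage sketch, with the enclosing method name assumed:

KafkaStreams streams = buildEnrichStream();   // the method shown above; the name is an assumption
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));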

Related

When I seek to a timestamp using the offsetsForTimes method, it gives me a timeout exception

I created a class called ConsumerConfig and a service class that contains a function that gets records from a topic.
Locally it works just fine, but when I add the remote brokers it stops working and I get a timeout exception, and it takes a long time. Can anybody help me?
Here is the code of the ConsumerConfig class and the function that gets records from a specific topic:
@Component
public class ConsumerConfig {

    private static Integer numberOfConsumer = 0;
    private KafkaConsumer consumer;

    private Map<String, Object> buildDefaultConfig() {
        final Map<String, Object> defaultClientConfig = new HashMap<>();
        defaultClientConfig.put("bootstrap.servers", (String) getbrokersConfigurationFromFile().get("spring.kafka.bootstrap-servers"));
        defaultClientConfig.put("client.id", "test-consumer-id" + (++numberOfConsumer));
        defaultClientConfig.put("request.timeout.ms", 60000);
        return defaultClientConfig;
    }

    @Bean
    @RequestScope
    public <K, V> KafkaConsumer<K, V> getKafkaConsumer() {
        // Build config
        final Map<String, Object> kafkaConsumerConfig = buildDefaultConfig();
        kafkaConsumerConfig.put("key.deserializer", StringDeserializer.class);
        kafkaConsumerConfig.put("value.deserializer", StringDeserializer.class);
        kafkaConsumerConfig.put("auto.offset.reset", "earliest");
        kafkaConsumerConfig.put("max.poll.records", 5);
        kafkaConsumerConfig.put("default.api.timeout.ms", 6000000);
        kafkaConsumerConfig.put("max.block.ms ", 60000000);
        kafkaConsumerConfig.put("auto.offset.reset", "earliest");
        kafkaConsumerConfig.put("enable.partition.eof", "false");
        kafkaConsumerConfig.put("enable.auto.commit", "true");
        kafkaConsumerConfig.put("auto.commit.interval.ms", "1000");
        kafkaConsumerConfig.put("session.timeout.ms", "30000");
        // fetch.max.bytes: the maximum amount of data the server should return for a fetch request.

        // Create and return the consumer.
        return consumer = new KafkaConsumer<K, V>(kafkaConsumerConfig);
    }
}
/**
 * Read from a specific offset timestamp.
 */
public <K, V> List<Record> consumeFromTime(String topicName, Long timestampMs) {
    // Get the list of partitions
    List<PartitionInfo> partitionInfos = consumer.partitionsFor(topicName);

    // Transform PartitionInfo into TopicPartition
    List<TopicPartition> topicPartitionList = partitionInfos
            .stream()
            .map(info -> new TopicPartition(topicName, info.partition()))
            .collect(Collectors.toList());

    // Assign the consumer to these partitions
    consumer.assign(topicPartitionList);

    // Look up offsets based on timestamp
    Map<TopicPartition, Long> partitionTimestampMap = topicPartitionList.stream()
            .collect(Collectors.toMap(tp -> tp, tp -> System.currentTimeMillis() - timestampMs * 1000));
    Map<TopicPartition, OffsetAndTimestamp> partitionOffsetMap = consumer.offsetsForTimes(partitionTimestampMap);

    // Seek the consumer to those offsets and poll
    List<ConsumerRecord<K, V>> allRecords = new ArrayList<>();
    partitionOffsetMap.forEach((tp, offsetAndTimestamp) -> {
        if (!Objects.isNull(offsetAndTimestamp)) {
            consumer.seek(tp, offsetAndTimestamp.offset());
            ConsumerRecords<K, V> records = consumer.poll(Duration.ofMillis(100L));
            records.forEach(allRecords::add);
        }
    });

    return allRecords.stream()
            .map(v -> Record.builder().values(v.value()).offset(v.offset()).partition(v.partition()).build())
            .collect(Collectors.toList());
}
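For reference, given the timestamp arithmetic above (now minus timestampMs * 1000, i.e. the parameter is treated as seconds back from now), the method is called along these lines; the topic name is an assumption:

// hypothetical call: ask for offsets starting roughly one hour ago on an assumed topic
List<Record> lastHour = consumeFromTime("my-topic", 3600L);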

How to Commit Kafka Offsets Manually in Flink

I have a Flink job that consumes a Kafka topic and sinks it to another topic. The job is set up with auto-commit at a 3-minute interval (checkpointing disabled), but on the monitoring side there is a 3-minute lag. We want to monitor the processing in real time without that lag, so we would like the FlinkKafkaConsumer to commit the offset immediately after the sink function.
Is there a way to achieve this goal within the Flink framework?
Or any other options?
Near the end of main() below, I am trying to create a KafkaConsumer instance and call commitSync() to make this work, but it does not.
public class CEPJobTest {

    private final static String TOPIC = "test";
    private final static String BOOTSTRAP_SERVERS = "localhost:9092";

    public static void main(String[] args) throws Exception {
        System.out.println("start cep test job...");
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("zookeeper.connect", "localhost:2181");
        properties.setProperty("group.id", "console-consumer-cep");
        properties.setProperty("enable.auto.commit", "false");
        // offset interval
        // properties.setProperty("auto.commit.interval.ms", "500");

        FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<String>("test", new SimpleStringSchema(), properties);
        // set commit offset by checkpoint
        consumer.setCommitOffsetsOnCheckpoints(false);
        System.out.println("checkpoint enabled:" + consumer.getEnableCommitOnCheckpoints());

        DataStream<String> stream = env.addSource(consumer);
        stream.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) throws Exception {
                return new Date().toString() + ": " + value;
            }
        }).print();

        // here, I want to commit the offset manually after processing each message...
        KafkaConsumer<?, ?> kafkaConsumer = new KafkaConsumer(properties);
        kafkaConsumer.commitSync();

        env.execute("Flink Streaming");
    }

    private static Consumer<Long, String> createConsumer() {
        final Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "KafkaExampleConsumer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        final Consumer<Long, String> consumer = new KafkaConsumer<>(props);
        return consumer;
    }
}
This does not work the way your code assumes.
env.execute submits the job to the cluster; only then does execution start. The code before that line merely builds the job graph rather than executing anything.
To commit after the sink, you should put the commit inside your sink function:
class mySink extends RichSinkFunction {
  override def invoke(...) = {
    val kafkaConsumer = new KafkaConsumer(properties);
    kafkaConsumer.commitSync();
  }
}
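A Java flavor of the same idea, mirroring the sketch above (the consumer properties and the String element type are assumptions):

import java.util.Properties;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CommittingSink extends RichSinkFunction<String> {

    private final Properties properties;                       // must include bootstrap.servers, group.id, deserializers
    private transient KafkaConsumer<String, String> kafkaConsumer;

    public CommittingSink(Properties properties) {
        this.properties = properties;
    }

    @Override
    public void open(Configuration parameters) {
        kafkaConsumer = new KafkaConsumer<>(properties);
    }

    @Override
    public void invoke(String value, Context context) {
        // ... write "value" to the downstream system here ...
        kafkaConsumer.commitSync(); // commit only after the record has been handled
    }

    @Override
    public void close() {
        if (kafkaConsumer != null) {
            kafkaConsumer.close();
        }
    }
}

Note that this standalone consumer only commits what it has itself consumed, so treat it as a sketch of where the commit call belongs rather than a drop-in replacement for Flink's own offset handling.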

Why is windowing not working for Kafka Streams?

I am running a simple Kafka Streams program in Eclipse. It runs successfully, but the windowing does not seem to take effect.
I want to process all the messages received in a 5-second window and send them to the output topic. I googled and understand that I need to implement the tumbling-window concept. However, I see that the output is sent to the output topic instantly.
What am I doing wrong here? Below is the main method that I am running:
public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    final StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> source = builder.stream("wc-input");

    @SuppressWarnings("deprecation")
    KTable<Windowed<String>, Long> counts = source
            .flatMapValues(new ValueMapper<String, Iterable<String>>() {
                @Override
                public Iterable<String> apply(String value) {
                    return Arrays.asList(value.toLowerCase(Locale.getDefault()).split(" "));
                }
            })
            .groupBy(new KeyValueMapper<String, String, String>() {
                @Override
                public String apply(String key, String value) {
                    return value;
                }
            })
            .count(TimeWindows.of(10000L).until(10000L), "Counts");

    // need to override value serde to Long type
    counts.to("wc-output");

    final Topology topology = builder.build();
    final KafkaStreams streams = new KafkaStreams(topology, props);
    final CountDownLatch latch = new CountDownLatch(1);

    // attach shutdown handler to catch control-c
    Runtime.getRuntime().addShutdownHook(new Thread("streams-wordcount-shutdown-hook") {
        @Override
        public void run() {
            streams.close();
            latch.countDown();
        }
    });

    try {
        streams.start();
        long windowSizeMs = TimeUnit.MINUTES.toMillis(50000); // 5 * 60 * 1000L
        TimeWindows.of(windowSizeMs);
        TimeWindows.of(windowSizeMs).advanceBy(windowSizeMs);
        latch.await();
    } catch (Throwable e) {
        System.exit(1);
    }
    System.exit(0);
}
Windowing does not mean "one output per window". If you want only one output per window, you want to use suppress() on the result KTable.
Compare this article: https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/
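A minimal sketch of the word count with suppress(), not your exact code: the topic names match your program, while the 5-second window, the Duration-based window API (Kafka Streams 2.1+), and the String output serde are assumptions:

import java.time.Duration;
import java.util.Arrays;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class SuppressedWordCount {

    public static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("wc-input");

        KTable<Windowed<String>, Long> counts = source
                .flatMapValues(value -> Arrays.asList(value.toLowerCase().split(" ")))
                .groupBy((key, word) -> word)
                // 5-second tumbling windows; grace(0) closes a window as soon as stream time passes its end
                .windowedBy(TimeWindows.of(Duration.ofSeconds(5)).grace(Duration.ZERO))
                .count();

        counts
                // hold back intermediate updates so each window emits exactly one final result
                .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
                .toStream()
                .map((windowedKey, count) -> KeyValue.pair(windowedKey.key(), count.toString()))
                .to("wc-output", Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }
}

Keep in mind that suppress() works on stream time, so the final result for a window is only emitted once a newer record arrives and moves stream time past the window end.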

Error to serialize message when sending to kafka topic

I need to test a message which contains headers, so I need to use MessageBuilder, but I cannot serialize it.
I tried adding the serialization settings on the producer props, but it did not work.
Can someone help me?
This is the error:
org.apache.kafka.common.errors.SerializationException: Can't convert value of class org.springframework.messaging.support.GenericMessage to class org.apache.kafka.common.serialization.StringSerializer specified in value.serializer
My test class:
public class TransactionMastercardAdapterTest extends AbstractTest {

    @Autowired
    private KafkaTemplate<String, Message<String>> template;

    @ClassRule
    public static KafkaEmbedded embeddedKafka = new KafkaEmbedded(1);

    @BeforeClass
    public static void setUp() {
        System.setProperty("spring.kafka.bootstrap-servers", embeddedKafka.getBrokersAsString());
        System.setProperty("spring.cloud.stream.kafka.binder.zkNodes", embeddedKafka.getZookeeperConnectionString());
    }

    @Test
    public void sendTransactionCommandTest() {
        String payload = "{\"o2oTransactionId\" : \"" + UUID.randomUUID().toString().toUpperCase() + "\","
                + "\"cardId\" : \"11\","
                + "\"transactionId\" : \"20110405123456\","
                + "\"amount\" : 200.59,"
                + "\"partnerId\" : \"11\"}";

        Map<String, Object> props = KafkaTestUtils.producerProps(embeddedKafka);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, Message<String>> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<String, Message<String>>("notification_topic", MessageBuilder.withPayload(payload)
                .setHeader("status", "RECEIVED")
                .setHeader("service", "MASTERCARD")
                .build()));

        Map<String, Object> configs = KafkaTestUtils.consumerProps("test1", "false", embeddedKafka);
        configs.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        ConsumerFactory<byte[], byte[]> cf = new DefaultKafkaConsumerFactory<>(configs);
        Consumer<byte[], byte[]> consumer = cf.createConsumer();
        consumer.subscribe(Collections.singleton("transaction_topic"));
        ConsumerRecords<byte[], byte[]> records = consumer.poll(10_000);
        consumer.commitSync();
        assertThat(records.count()).isEqualTo(1);
    }
}
I'd say the error is self-explanatory:
Can't convert value of class org.springframework.messaging.support.GenericMessage to class org.apache.kafka.common.serialization.StringSerializer specified in value.serializer
Your value is a GenericMessage, but StringSerializer can only work with strings.
What you need is a JavaSerializer, which does not exist out of the box but is not difficult to write:
public class JavaSerializer implements Serializer<Object> {

    @Override
    public byte[] serialize(String topic, Object data) {
        try {
            ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
            ObjectOutputStream objectStream = new ObjectOutputStream(byteStream);
            objectStream.writeObject(data);
            objectStream.flush();
            objectStream.close();
            return byteStream.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("Can't serialize object: " + data, e);
        }
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
    }

    @Override
    public void close() {
    }
}
And configure it for that value.serializer property.
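For example, the producer config would then look something like this (a minimal sketch; the broker address is an assumption, and the consumer side would need a matching deserializer):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");                 // assumed broker address
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", JavaSerializer.class.getName());    // the class shown above
Producer<String, Object> producer = new KafkaProducer<>(props);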
private void configureProducer() {
    Properties props = new Properties();
    props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
    producer = new KafkaProducer<String, byte[]>(props);
}
This will do the job.
In my case I am using Spring Cloud and had not added the property below to the properties file:
spring.cloud.stream.kafka.binder.configuration.value.serializer=org.apache.kafka.common.serialization.StringSerializer
This is what I used and it worked for me:
Map<String, Object> configProps = new HashMap<>();
configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringSerializer.class);
configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, org.springframework.kafka.support.serializer.JsonSerializer.class);
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.IntegerSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
Annotate the JSON class with @XmlRootElement.

Why is my Kafka consumer sometimes not receiving messages?

I use the older Kafka consumer, version 0.8.
Steps:
Start the listener.
Send 10 messages. The listener receives only around 4 of them.
Send a single message. The listener does not receive it.
Publish another single message. The listener is still not receiving.
Can anyone explain this behaviour?
Consumer
public class KafkaMessageListenerThread implements Runnable {

    private KafkaStream<byte[], byte[]> stream;
    private final KafkaMessageListener baseConsumer;

    public KafkaMessageListenerThread(KafkaMessageListener consumer, KafkaStream<byte[], byte[]> stream) {
        this.baseConsumer = consumer;
        this.stream = stream;
    }

    public void run() {
        ConsumerIterator<byte[], byte[]> itr = stream.iterator();
        System.out.println("listens....");
        while (itr.hasNext()) {
            // note: itr.next() is called twice per iteration (here and on the next line),
            // so the printed record and the one passed to receiveData are different messages
            System.out.println("kafka record : " + itr.next());
            byte[] data = itr.next().message();
            baseConsumer.receiveData(data);
        }
    }
}
BaseConsumer
public void start() {
    try {
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, CoreConstants.THREAD_SIZE);
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumerConnector
                .createMessageStreams(topicCountMap);
        List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);

        executor = Executors.newFixedThreadPool(CoreConstants.THREAD_SIZE);
        for (final KafkaStream stream : streams) {
            executor.submit(new KafkaMessageListenerThread(this, stream));
        }
    } catch (Exception e) {
        System.out.println("error in polling");
    }
}
consumer properties
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=com.xx.RawFileSerializer
zookeeper.connect=zookeeper.xx\:2181
serializer.class=com.xx.RawFileEncoderDecoder
bootstrap.servers=kafka.xx\:9092
group.id=test
consumer.timeout.ms=-1
fetch.min.bytes=1