Spring Cloud Kafka Streams processor API: Kafka record key is null, while the record value is correct

...
@StreamListener(Processor.INPUT)
@SendTo(Processor.OUTPUT)
public Event receive(@Payload String record) {
String key = UUID.randomUUID().toString();
Event event = composeEvent(); //generates an event
return event;
}
Since @SendTo sends the returned event value directly to the OUTPUT Kafka topic destination, I cannot find an example that sets the key before the value is pushed to the output topic. The key ends up null in the Kafka topic.
Any working code is highly appreciated. Thanks a lot.

Related

Kafka KTable changelog (using toStream()) is missing some KTable updates when several messages with the same key arrive at the same time

I have an input stream, and I use it to create a KTable. Then I create an output stream from the KTable changelog, using the toStream() method. The problem is that the stream created by toStream() does not contain all the messages from the input stream that updated my KTable. Here is my code:
final KTable<String, event> KTable = inputStream.groupByKey().aggregate(() -> null,
aggregateKtableMethod,
storageConf);
KStream<String, event> outputStream = KTable.toStream();
I would like to get one message in the outputStream for each message in inputStream. For most messages it works well, but I lose some events in one particular case: if I receive 2 messages with the same key within a short interval (less than 5 seconds), I only receive the second event in the outputStream.
I think it is because the KTable updates are applied in some kind of batch operation, but I can't find any configuration or documentation related to it. Is this the reason for the missing events, and do you know how to change the configuration so that I do not lose any messages?
I found the solution. The issue was in the "storageConf" I used to create my KTable: caching was enabled. The KTable cache keeps only the latest value per key until it is flushed, so two quick updates to the same key were forwarded as one. I just had to disable it with:
storageConf.withCachingDisabled();
final KTable<String, event> KTable = inputStream.groupByKey().aggregate(() -> null,
aggregateKtableMethod,
storageConf);
Now I have all my events in the output stream.
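To see why disabling the cache helped, here is a small, library-free Java model of a KTable record cache. It is a sketch, not Kafka Streams code, and the class and method names are hypothetical. With caching enabled, updates are held per key and only the latest value for each key is forwarded when the cache is flushed (in Kafka Streams this happens on commit or when the cache is full), so two quick updates to the same key reach toStream() as one; with caching disabled, every update is forwarded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a KTable record cache (hypothetical, not Kafka Streams code).
class CachedTableModel {
    private final boolean cachingEnabled;
    // pending updates, at most one entry per key
    private final Map<String, String> cache = new LinkedHashMap<>();
    // what toStream() would see, recorded as "key=value" strings
    private final List<String> downstream = new ArrayList<>();

    CachedTableModel(boolean cachingEnabled) {
        this.cachingEnabled = cachingEnabled;
    }

    void update(String key, String value) {
        if (cachingEnabled) {
            // a second update before the next flush overwrites the first one
            cache.put(key, value);
        } else {
            // without caching, every update is forwarded immediately
            downstream.add(key + "=" + value);
        }
    }

    // stands in for a commit or a cache eviction
    void flush() {
        for (Map.Entry<String, String> e : cache.entrySet()) {
            downstream.add(e.getKey() + "=" + e.getValue());
        }
        cache.clear();
    }

    List<String> downstream() {
        return downstream;
    }
}
```

Feeding the same two updates for one key into both variants shows the difference: the cached model emits only the second value, the uncached model emits both.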

Rollback mechanism in the Kafka Processor API?

I am using the Kafka Processor API (not the DSL):
public class StreamProcessor implements Processor<String, String>
{
    private ProcessorContext context;
    private KeyValueStore<String, String> stateStore; // statestore initialized with key,value

    public void init(ProcessorContext context)
    {
        this.context = context;
        context.commit();
        //statestore initialized with key,value
    }

    public void process(String key, String val)
    {
        // split() takes a regex, so "|" must be escaped
        String[] topicList = stateStore.get(key).split("\\|");
        for (String topic : topicList)
        {
            // To.child() expects the name of a child (sink) node of this processor
            context.forward(key, val, To.child(topic));
        } // forward same message to list of topics (1..n topics), rollback if write to some topics failed?
    }

    public void close() {}
}
Scenario: we are reading data from a source topic, and the stream processor writes the data to multiple sink topics (topicList above).
Question: How can I implement a rollback mechanism using the Kafka Streams Processor API when one or more of the topics in topicList fails to receive the message?
What I understand is that the Processor API has a rollback mechanism for each record it failed to send. Can a rollback for an entire batch of failed messages be achieved as well? Since the process method of the Processor interface is called per record rather than per batch, I would surmise it can only be done per record. Is this a correct assumption? If not, please suggest how to achieve per-record and per-batch rollbacks for failed topics using the Processor API.
You would need to implement it yourself. For example, you could use two stores, a main store and a "buffer" store: first update only the buffer store, then call context.forward() to make sure all writes are in the output topics, and afterward merge the buffer store into the main store.
If you need to roll back, you drop the content of the buffer store.
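A minimal sketch of the two-store idea, with plain HashMaps standing in for the state stores and a hypothetical forwardTo predicate standing in for a write to one output topic (true means the write succeeded). Note that it only rolls back the state: records already forwarded to the topics that succeeded are not retracted, which in Kafka Streams would need something like exactly-once processing (transactions).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Library-free model of the main-store / buffer-store pattern (names are hypothetical).
class BufferedStore {
    private final Map<String, String> mainStore = new HashMap<>();
    private final Map<String, String> bufferStore = new HashMap<>();

    boolean process(String key, String value, List<String> topics, Predicate<String> forwardTo) {
        // 1. update only the buffer store
        bufferStore.put(key, value);
        // 2. forward to every output topic
        for (String topic : topics) {
            if (!forwardTo.test(topic)) {
                // failure: roll back by dropping the buffered changes
                bufferStore.clear();
                return false;
            }
        }
        // 3. all forwards succeeded: merge the buffer into the main store
        mainStore.putAll(bufferStore);
        bufferStore.clear();
        return true;
    }

    String get(String key) {
        return mainStore.get(key);
    }
}
```

A batch rollback would follow the same pattern: keep staging into the buffer for every record of the batch and merge or clear it once at the end.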

How to check messages for uniqueness in Kafka?

There is a topic Users with partitions.
Each partition has messages with user data.
How can I avoid duplicates, for example, not allow inserting the same user's name?
If I understand correctly, I should create a separate topic Usernames and append all requested usernames.
Then, before adding a new user to topic Users, I check that there are no duplicates in topic Usernames, right?
Accordingly, using streams.
I assume you are talking about a scenario where you publish events to a Kafka topic from a microservice.
Also, I assume you want to publish user profiles with the username as the key and the user profile as the value.
There are two deduplication issues here:
1.) Your service might receive the same username at different times and publish it to the topic each time.
2.) Duplicate message processing: during a broker failure (ack not received) or a Kafka client failure, the same message can be re-sent, because the client does not have the ack info.
This can be taken care of by enabling idempotence on the Kafka producer and using atomic transactions (refer to exactly-once processing).
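For issue 2.), a sketch of the relevant producer settings. It uses the standard Kafka producer config names as plain strings so that the example needs no Kafka dependency; the bootstrap address and transactional ID values are hypothetical. Note that idempotence only removes duplicates caused by producer retries; it does not stop your service from sending the same username twice.

```java
import java.util.Properties;

// Producer settings for idempotent, transactional writes (values are hypothetical).
class ExactlyOnceProducerConfig {
    static Properties build(String bootstrapServers, String transactionalId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // the broker discards duplicates caused by this producer's retries
        props.setProperty("enable.idempotence", "true");
        // idempotence requires acknowledgement from all in-sync replicas
        props.setProperty("acks", "all");
        // enables transactions (atomic writes across partitions/topics)
        props.setProperty("transactional.id", transactionalId);
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```

These properties would then be passed to new KafkaProducer<>(props), calling initTransactions() once and wrapping each group of sends in beginTransaction() and commitTransaction(), or abortTransaction() on error.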
I believe your question is about 1.), where your service receives duplicate messages.
Solution 1:
If you are using a microservice, you can keep an in-memory cache/DB of usernames and publish to Kafka only if no duplicate is found.
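A minimal sketch of Solution 1, assuming a single service instance: a concurrent set remembers usernames, and a hypothetical publisher callback (for example, a wrapper around KafkaProducer.send) is only called for new names. The set lives in memory, so it is lost on restart and is not shared between instances; a shared DB or the Streams-based Solution 2 avoids that.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// In-memory username guard for a single service instance (names are hypothetical).
class UsernameGuard {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    // hypothetical hook that actually publishes to Kafka
    private final Consumer<String> publisher;

    UsernameGuard(Consumer<String> publisher) {
        this.publisher = publisher;
    }

    boolean register(String username) {
        // add() is atomic on a concurrent set and returns false if the name is already present
        if (!seen.add(username)) {
            return false; // duplicate: do not publish
        }
        publisher.accept(username);
        return true;
    }
}
```

If publishing can fail, the name stays reserved in this sketch; a real service would have to remove it again or retry.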
Solution 2: (handle it in Kafka itself using streams)
Input topic: users
Build a Kafka Streams client with a state store (KeyValueStore) and a transformer that implements your dedupe logic.
Your Kafka Streams client consumes the events from the users topic, transforms them in UserDedupeTransformer (which holds the dedupe logic), and then produces them to the output topic (as per your requirement).
StoreBuilder<KeyValueStore<String, String>> storeBuilder = Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("userDedupeStore"),
        Serdes.String(),
        Serdes.String())
    .withCachingEnabled();
builder.addStateStore(storeBuilder)
    .stream("users-topic", Consumed.with(Serdes.String(), Serdes.String()))
    // the store name passed here must match the name the store was built with
    .transform(() -> new UserDedupeTransformer(), "userDedupeStore")
    .to("destination-topic", Produced.with(Serdes.String(), Serdes.String()));
In UserDedupeTransformer, look up the dedupe store in init() and override the transform method:
public void init(ProcessorContext context) {
    this.context = context;
    dedupeStore = (KeyValueStore<String, String>) context.getStateStore("userDedupeStore");
}

public KeyValue<String, String> transform(String key, String value) {
    // drop records without a username key, and usernames that were already seen
    if (key == null || dedupeStore.get(key) != null) {
        return null; // returning null forwards nothing downstream
    }
    dedupeStore.put(key, value); // remember this username
    return KeyValue.pair(key, value);
}

public void close() {}
This dedupe store can be configured as in-memory or persisted using RocksDB.

How to identify and merge the messages of different queues in Kafka

Background:
We previously used Hibernate Search, Lucene, and a JBoss HornetQ queue for indexing.
Our application is the producer and sends the metadata (unique data identifying a record in the database) to HornetQ.
The consumer receives this metadata and queries the database to fetch the complete record details (including child objects).
This is a very database-centric approach.
Now we want to eliminate the database-centric approach for indexing. We have decided to use Kafka rather than HornetQ.
There is no issue when a user creates data.
We see a potential problem when the user edits the data (say, a parent entity with two child objects). When the data is pulled from the database for display,
we push the same data to Kafka topic1. When the user modifies the data (say, parent-level data) and submits, we get only the parent-level data (not the child objects' data), and we push the changed data to topic2. Now we have to merge the message in topic1 (child objects) with the corresponding message in topic2 (parent-level data).
Note: We have to take this route because, as you know, there is no update in indexing; it is a delete followed by an insert.
Questions:
If I go with the above approach, how can I map the specific
message in topic1 to the specific message in topic2? Is
there a way to give the messages in topic1 and topic2 the same message IDs?
Is there any way to resolve this issue if I use a single topic?
Is there any better design/approach to resolve the above issue?
Thanks in advance.
If I go with the above approach, how can I map the specific message in topic1 to the specific message in topic2? Is there a way to give the messages in topic1 and topic2 the same message IDs?
To map or join specific messages between topics in the same Kafka cluster, Kafka Streams and KSQL may be a good direction. You can find the reference here.
There are many ways to make an object unique; I suggest using the parent entity ID as the key when you send messages to topic1 and topic2. Sample Java code follows:
// key: parent entity ID
ProducerRecord<String, ParentEntity> parentRecord = new ProducerRecord<>(topic1,
        parentEntity.getId(), parentEntity);
ListenableFuture<SendResult<String, ParentEntity>> parentFuture =
        kafkaTemplate.send(parentRecord);
parentFuture.addCallback(new ListenableFutureCallback<SendResult<String, ParentEntity>>() {
    @Override
    public void onSuccess(SendResult<String, ParentEntity> result) {}
    @Override
    public void onFailure(Throwable ex) {
        //print out error log
    }
});
// same key: the ID of the parent this child belongs to
ProducerRecord<String, ChildEntity> childRecord = new ProducerRecord<>(topic2,
        childEntity.getParentEntityId(), childEntity);
ListenableFuture<SendResult<String, ChildEntity>> childFuture =
        kafkaTemplate.send(childRecord);
childFuture.addCallback(new ListenableFutureCallback<SendResult<String, ChildEntity>>() {
    @Override
    public void onSuccess(SendResult<String, ChildEntity> result) {}
    @Override
    public void onFailure(Throwable ex) {
        //print out error log
    }
});
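Keying both topics by the parent entity ID is what makes the later merge possible. Below is a library-free model of that merge as an inner join by key (roughly what a KTable-KTable join expresses in Kafka Streams); the class name and the string format of the merged result are hypothetical placeholders for combining the real entities.

```java
import java.util.HashMap;
import java.util.Map;

// Library-free model of joining the parent-level and child-object messages by parent ID.
class ParentChildJoiner {
    static Map<String, String> joinByParentId(Map<String, String> childrenByParentId,
                                              Map<String, String> parentByParentId) {
        Map<String, String> merged = new HashMap<>();
        for (Map.Entry<String, String> e : parentByParentId.entrySet()) {
            String children = childrenByParentId.get(e.getKey());
            // inner join: only parent IDs present on both sides produce a merged document
            if (children != null) {
                merged.put(e.getKey(), e.getValue() + " + " + children);
            }
        }
        return merged;
    }
}
```

Each merged entry is then a complete document that can be deleted and re-inserted into the index.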
Is there any way to resolve this issue if I use a single topic?
You can create a new table (say, A) in the database to store the full message to be sent for indexing. Every time a user creates or updates data, the message is also inserted or updated in table A. Finally, your Kafka client pulls message objects from table A and produces them to a single topic in the Kafka cluster.
Is there any better design/approach to resolve the above issue?
You can try Kafka Streams and KSQL, as I mentioned above.

Apache Flink dynamic number of Sinks

I am using Apache Flink and the KafkaConsumer to read some values from a Kafka topic.
I also have a stream obtained from reading a file.
Depending on the received values, I would like to write this stream to different Kafka topics.
Basically, I have a network with a leader linked to many children. For each child, the leader needs to write the stream it read to a child-specific Kafka topic, so that the child can read it.
When a child starts, it registers itself in the Kafka topic read by the leader.
The problem is that I don't know a priori how many children I have.
For example, if I read 1 from the Kafka topic, I want to write the stream to just one Kafka topic named Topic1.
If I read 1-2, I want to write to two Kafka topics (Topic1 and Topic2).
I don't know if this is possible, because to write to a topic I am using the Kafka producer with the addSink method, and to my understanding (and from my attempts) Flink seems to require knowing the number of sinks a priori.
But then, is there no way to obtain such behavior?
If I understood your problem correctly, I think you can solve it with a single sink, since you can choose the Kafka topic based on the record being processed. It also seems that one element from the source might be written to more than one topic, in which case you would need a FlatMapFunction to replicate each source record N times (one for each output topic). I would recommend outputting a pair (aka Tuple2) of (topic, record).
DataStream<Tuple2<String, MyValue>> stream = input.flatMap(new FlatMapFunction<MyValue, Tuple2<String, MyValue>>() {
    @Override
    public void flatMap(MyValue value, Collector<Tuple2<String, MyValue>> out) {
        // emit one (topic, value) pair per output topic
        for (String topic : topics) {
            out.collect(Tuple2.of(topic, value));
        }
    }
});
Then you can use the previously computed topic by creating the FlinkKafkaProducer with a KeyedSerializationSchema whose getTargetTopic returns the first element of the pair.
stream.addSink(new FlinkKafkaProducer010<>(
    "default-topic",
    new KeyedSerializationSchema<Tuple2<String, MyValue>>() {
        @Override
        public String getTargetTopic(Tuple2<String, MyValue> element) {
            return element.f0; // the topic computed in the flatMap
        }
        ...
    },
    kafkaProperties)
);
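A plain-Java model of the routing logic, outside Flink, to show what the flatMap produces: each record becomes one (topic, record) pair per target topic, and the sink then only has to read the topic from the pair, so a single sink covers any number of children. The Topic&lt;N&gt; naming follows the question's example; the class and helper names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of the fan-out step (not Flink code; names are hypothetical).
class TopicFanOut {
    // naming scheme from the question: child 1 -> "Topic1", child 2 -> "Topic2"
    static List<String> topicsForChildren(List<Integer> childIds) {
        List<String> topics = new ArrayList<>();
        for (int id : childIds) {
            topics.add("Topic" + id);
        }
        return topics;
    }

    // what the flatMap emits: one (topic, value) pair per target topic
    static <V> List<Map.Entry<String, V>> fanOut(V value, List<String> topics) {
        List<Map.Entry<String, V>> out = new ArrayList<>();
        for (String topic : topics) {
            out.add(new SimpleEntry<>(topic, value));
        }
        return out;
    }
}
```

Because the topic list is computed per record, new children only change the data, not the job topology.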
KeyedSerializationSchema is now deprecated. Instead, you have to use KafkaSerializationSchema.
The same can be achieved by overriding its serialize method and setting the target topic in the returned ProducerRecord:
@Override
public ProducerRecord<byte[], byte[]> serialize(
        String inputString, @Nullable Long aLong) {
    // customTopicName and key are determined elsewhere, per record
    return new ProducerRecord<>(customTopicName,
            key.getBytes(StandardCharsets.UTF_8), inputString.getBytes(StandardCharsets.UTF_8));
}