Publish vs Produce in MassTransit with Kafka - apache-kafka

I want to implement a state machine integrated with Kafka topics. Whenever a message is produced to a topic, I want the state machine to react by changing its state. I have two questions:
Is producing a message on a topic identical to publishing an event?
How do I wire things up properly? A simple code example would be welcome.

Produce is not the same as Publish. You can use Produce to send messages to a topic in Kafka from a state machine:
Initially(
    When(Started)
        .Produce(x => x.Init<KafkaMessage>(new { Text = "text" }))
        .TransitionTo(Active));
There are unit tests that show how it works; I don't believe it is documented yet. It was added in this commit.

Related

Kafka Consumer and Producer

Can I have the consumer act as a producer (publisher) as well? I have a use case where a consumer (C1) polls a topic and pulls messages. After processing a message and performing a commit, it needs to notify another process to carry on the remaining work. Given this use case, is it a valid design for consumer C1 to publish a message to a different topic, i.e. for C1 to also act as a producer?
Yes, this is a valid use case. We have many production applications that do the same: consuming events from a source topic, performing data enrichment/transformation, and publishing the output to another topic for further processing.
Again, the implementation pattern depends on which tech stack you are using. But if you are after a Spring Boot application, you can have a look at https://medium.com/geekculture/implementing-a-kafka-consumer-and-kafka-producer-with-spring-boot-60aca7ef7551
Totally valid scenario; for example, you can have a source connector or a producer which simply pushes raw data to a topic.
The receiver is loosely coupled to your publisher, so they cannot communicate with each other directly.
Then you need C1 (a mediator) to consume messages from the source, transform the data, and publish the new data format to a different topic.
If you're using a JVM-based client, this is precisely the use case for Kafka Streams rather than the base Consumer/Producer API.
Kafka Streams applications must consume from an initial topic, then can transform (map), filter, aggregate, split, etc., into other topics.
https://kafka.apache.org/documentation/streams/
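If you go the Kafka Streams route, the consume-transform-produce loop collapses into a few lines. A minimal sketch; the topic names, application id, and the mapValues transformation are placeholders for your own setup and logic:

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class EnrichAndForward {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "c1-enricher");        // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // adjust to your cluster

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("source-topic", Consumed.with(Serdes.String(), Serdes.String()))
               // stand-in for your enrichment/transformation logic
               .mapValues(value -> value.toUpperCase())
               .to("enriched-topic", Produced.with(Serdes.String(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}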

Suddenly Kafka Streams (scala) app falls into the rejoining process (with no obvious reason) and never completes it

I am messing around with Kafka Streams handled by K8s.
It goes more or less fine so far, yet weird behaviour is observed on the test environment:
[Consumer clientId=dbe-livestream-kafka-streams-77185a88-71a7-40cd-8774-aeecc04054e1-StreamThread-1-consumer, groupId=dbe-livestream-kafka-streams] We received an assignment [_livestream.dbe.tradingcore.sporteventmappings-table-0, _livestream.dbe.tradingcore.sporteventmappings-table-2, _livestream.dbe.tradingcore.sporteventmappings-table-4, _livestream.dbe.tradingcore.sporteventmappings-table-6, livestream.dbe.tennis.results-table-0, livestream.dbe.tennis.results-table-2, livestream.dbe.tennis.results-table-4, livestream.dbe.tennis.results-table-6, _livestream.dbe.betmanager.sporteventmappings-table-0, _livestream.dbe.betmanager.sporteventmappings-table-2, _livestream.dbe.betmanager.sporteventmappings-table-4, _livestream.dbe.betmanager.sporteventmappings-table-6] that doesn't match our current subscription Subscribe(_livestream.dbe.betmanager.sporteventmappings-table|_livestream.dbe.trading_states|_livestream.dbe.tradingcore.sporteventmappings-table|livestream.dbe.tennis.markets|livestream.dbe.tennis.markets-table); it is likely that the subscription has changed since we joined the group. Will try re-join the group with current subscription
As far as I understand, the internal state somehow got broken, and the Streams app's source of truth conflicts with the broker/ZooKeeper's one. This behaviour never terminates: I let it hang for a few days while busy with other stuff, and it is still there, reported at the WARN level. More than that: no ERRORs were reported during this time.
I did not change anything; did not deploy new instances; did not manipulate the Kafka brokers in any way that might affect the above-mentioned Kafka Streams app. Any ideas what's wrong?
The error message itself indicates that something is wrong with your subscription. This may happen if you have two Kafka Streams instances using the same application.id, but the two don't subscribe to the exact same topics.
In your case, the subscription does not contain livestream.dbe.tennis.results-table, but the corresponding partitions are assigned.
Note that Kafka Streams requires all instances with the same application.id to execute the exact same Topology and thus subscribe to the exact same topics.

KStream.through() not creating intermediate topic, and impacting other processor nodes in topology

I am trying out the sample Kafka Streams code from Chapter 4 of the book Kafka Streams in Action. I pretty much copied the code from GitHub - https://github.com/bbejeck/kafka-streams-in-action/blob/master/src/main/java/bbejeck/chapter_4/ZMartKafkaStreamsAddStateApp.java This is an example using a StateStore. When I run the code as is, no data flows through the topology. I verified that mock data is being generated, as I can see the offsets in the input topic, transactions, go up. However, nothing appears in the output topics, and nothing is printed to the console.
However, when I comment out lines 81-88 (https://github.com/bbejeck/kafka-streams-in-action/blob/master/src/main/java/bbejeck/chapter_4/ZMartKafkaStreamsAddStateApp.java#L81-L88), basically avoiding creating the through() processor node, the code works. I see data being generated to the "patterns" topics, and output generated in the console.
I am using Kafka broker and client version 2.4. Would appreciate any help or pointers to debug the issue.
Thank you,
Ahmed.
It is well documented that you need to create the intermediate topic that you use via through() manually and upfront, before you start your application. Intermediate topics, similar to input and output topics, are not managed by Kafka Streams; it is the user's responsibility to manage them.
Cf: https://docs.confluent.io/current/streams/developer-guide/manage-topics.html
Btw: there is work in progress to add a new repartition() operator that allows you to repartition via a topic that will be managed by Kafka Streams (cf. https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+DSL+with+Connecting+Topic+Creation+and+Repartition+Hint)
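For example, the intermediate topic can be pre-created with the AdminClient (or the kafka-topics CLI) before the application starts. A minimal sketch; the topic name, partition count, and replication factor here are placeholders and must match what your topology expects:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateIntermediateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // adjust to your cluster

        try (AdminClient admin = AdminClient.create(props)) {
            // placeholder name; partition count must line up with the upstream topic
            NewTopic intermediate = new NewTopic("my-intermediate-topic", 4, (short) 1);
            admin.createTopics(Collections.singleton(intermediate)).all().get();
        }
    }
}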

Kafka topic to multiple kafka topics dispatcher (same cluster)

My use-case is as follows:
I have a Kafka topic A with messages "logically" belonging to different "services"; I don't control the system sending the messages to A either.
I want to read such messages from A and dispatch them to a per-service set of topics on the same cluster (let's call them A_1, ..., A_n), based on one column describing the service (the format is CSV-style, but it doesn't matter).
The set of services is static, I don't have to handle addition/removal at the moment.
I was hoping to use Kafka Connect to perform this task but, surprisingly, there are no Kafka source/sink connectors (I cannot find the tickets, but they have been rejected).
I have seen MirrorMaker2, but it looks like overkill for my (simple) use-case.
I also know Kafka Streams, but I'd rather not write and maintain code just for that.
My question is: is there a way to achieve this topic dispatching with kafka native tools without writing a kafka-consumer/producer myself?
PS: if anybody thinks that MirrorMaker2 could be a good fit I am interested too, I don't know the tool very well.
To my knowledge, there is no straightforward way to branch incoming topic messages to a list of topics based on the message content. You need to write custom code to achieve this:
Use the Processor API (refer here).
Pass the list of target topics into the processor.
Use your own logic to identify which topic the message needs to be branched to.
Use context.forward to publish the message to the selected topic:
context.forward(key, value, To.child("selected topic"))
A sketch of this approach is shown below.
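A rough sketch of that Processor API approach, assuming the service name is the first CSV column, the target topics follow an A_<service> naming scheme, and the set of services is static; all of these are assumptions to adapt:

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.To;

public class DispatcherTopology {

    // Routes each record to a per-service sink node based on a CSV column.
    static class DispatchProcessor extends AbstractProcessor<String, String> {
        @Override
        public void process(String key, String value) {
            String service = value.split(",")[0];               // assumption: service name is the first column
            context().forward(key, value, To.child("sink-" + service));
        }
    }

    public static Topology build(String[] services) {
        Topology topology = new Topology();
        topology.addSource("source", Serdes.String().deserializer(), Serdes.String().deserializer(), "A");
        topology.addProcessor("dispatcher", DispatchProcessor::new, "source");
        for (String service : services) {
            // one sink (child node) per target topic A_<service>
            topology.addSink("sink-" + service, "A_" + service,
                    Serdes.String().serializer(), Serdes.String().serializer(), "dispatcher");
        }
        return topology;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topic-dispatcher");   // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // adjust to your cluster
        KafkaStreams streams = new KafkaStreams(build(new String[]{"svc1", "svc2"}), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}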
Mirror Maker is for doing ... mirroring. It's useful when you want to mirror one cluster from one data center to another, with the same topics. Your use case is different.
Kafka Connect is for syncing different systems (data from databases, for example) through Kafka topics, but I don't see it fitting this use case either.
I would use a Kafka Streams application for that.
All the other answers are right; at the time of writing I did not find any "config-only" solution in the Kafka toolset.
What finally did the trick was to use Logstash, as its kafka output plugin supports field reference variables in the topic_id parameter.
So once you have the target topic name available in a field (say service_name), it's as simple as this:
output {
  kafka {
    id => "sink"
    codec => [...]
    bootstrap_servers => [...]
    topic_id => "%{[service_name]}"
    [...]
  }
}

Storm KafkaBolt push to multiple Kafka Topics

I have a use-case where I have a message which has to be pushed to a number of Kafka topics.
Currently at a high level, that method looks like this:
void pushToTopics(String msg) {
    pushToTopicA(msg);
    pushToTopicB(msg);
    pushToTopicC(msg);
    ...
    pushToTopicN(msg);
}
Every pushToTopicX(msg) has a condition which, when fulfilled, leads to the message being published to the corresponding topic. Right now, all of this logic sits in the terminal bolt, and we use KafkaProducer directly to push the messages.
I was looking at ways to break this down into topic-specific bolts and, more importantly, to use KafkaBolt to push messages.
Is this possible with Storm (v1.2.2)? I saw that very recently a PR was merged which lets one create custom callbacks, but we don't have that.
The KafkaBolt can decide which topic to send to based on the tuple. You could just use a splitter bolt to split your message into N messages, each with a different destination topic, and then send all of them to the KafkaBolt.
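For instance, with the storm-kafka-client KafkaBolt you can plug in a topic selector that picks the destination topic from a tuple field. A sketch, assuming the upstream splitter bolt emits "topic", "key", and "message" fields; the field names, fallback topic, and producer properties are placeholders:

import java.util.Properties;

import org.apache.storm.kafka.bolt.KafkaBolt;
import org.apache.storm.kafka.bolt.mapper.FieldNameBasedTupleToKafkaMapper;
import org.apache.storm.kafka.bolt.selector.FieldNameTopicSelector;

public class KafkaSinkBolts {

    // Builds a KafkaBolt that reads the destination topic from a tuple field.
    public static KafkaBolt<String, String> buildKafkaBolt() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // adjust to your cluster
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        return new KafkaBolt<String, String>()
                .withProducerProperties(props)
                // destination topic comes from the "topic" field; unknown records fall back to "unrouted"
                .withTopicSelector(new FieldNameTopicSelector("topic", "unrouted"))
                // map the "key" and "message" tuple fields to the Kafka record key and value
                .withTupleToKafkaMapper(new FieldNameBasedTupleToKafkaMapper<String, String>("key", "message"));
    }
}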
The way I eventually solved it was to create separate streams, each one bound to a destination topic. Then, via collector.emit on specific streams, I was able to fan the messages out across various bolts, which eventually push to Kafka using KafkaBolt.
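A rough sketch of such a fan-out splitter bolt, with hypothetical stream names, field names, and routing conditions; each declared stream would then be wired to its own KafkaBolt when building the topology:

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Fans a message out onto named streams; each stream feeds the bolt for one destination topic.
public class SplitterBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String msg = input.getStringByField("message"); // assumed input field name
        if (shouldGoToTopicA(msg)) {
            collector.emit("streamA", input, new Values(msg));
        }
        if (shouldGoToTopicB(msg)) {
            collector.emit("streamB", input, new Values(msg));
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream("streamA", new Fields("message"));
        declarer.declareStream("streamB", new Fields("message"));
    }

    private boolean shouldGoToTopicA(String msg) { return true; }  // placeholder condition
    private boolean shouldGoToTopicB(String msg) { return false; } // placeholder condition
}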