ZeroMQ Topic Filtering with Pub/Sub (Java binding)

How can I get ZeroMQ to support topics and pattern matching?
e.g.
stocks.*
stocks.appl
From my understanding of ZeroMQ, the topic will be part of the message, so I need some way of separating the topic from the actual message in the subscriber.
What's the best way of separating the topic and message? Do you need a special character (e.g. SOH)?

You need to use a pub/sub envelope:
Pub-Sub Message Envelopes

Quoting http://zeromq.org/area:faq:
Can I subscribe to messages using regex or wildcards?
No. Prefix matching only.
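
Because only prefix matching is available, a subscription to "stocks." plays the role that "stocks.*" was meant to play. The following is a minimal sketch in plain Java, without any ZeroMQ binding, of how a byte-wise prefix match behaves and how a two-part envelope keeps the topic apart from the payload. The class and method names are hypothetical; with a real binding the two parts would be sent as separate frames of one multipart message.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: mimics ZeroMQ-style prefix matching in plain Java.
// No ZeroMQ binding is involved; all names here are hypothetical.
final class EnvelopeSketch {

    // True when the topic starts with the subscription, compared byte by byte.
    static boolean matches(String subscription, String topic) {
        byte[] sub = subscription.getBytes(StandardCharsets.UTF_8);
        byte[] top = topic.getBytes(StandardCharsets.UTF_8);
        if (sub.length > top.length) {
            return false;
        }
        for (int i = 0; i < sub.length; i++) {
            if (sub[i] != top[i]) {
                return false;
            }
        }
        return true;
    }

    // Builds a two-part envelope: part 0 is the topic, part 1 is the payload.
    static byte[][] envelope(String topic, String payload) {
        return new byte[][] {
            topic.getBytes(StandardCharsets.UTF_8),
            payload.getBytes(StandardCharsets.UTF_8)
        };
    }

    // Reads the topic back from part 0 of the envelope.
    static String topicOf(byte[][] parts) {
        return new String(parts[0], StandardCharsets.UTF_8);
    }

    // Reads the payload back from part 1 of the envelope.
    static String payloadOf(byte[][] parts) {
        return new String(parts[1], StandardCharsets.UTF_8);
    }
}
```

With the topic and payload in separate parts, no special separator character such as SOH is needed. Note that a literal "stocks.*" subscription would not match "stocks.appl", since the asterisk is compared as an ordinary byte.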

Kafka topic to multiple kafka topics dispatcher (same cluster)

My use-case is as follows:
I have a Kafka topic A with messages "logically" belonging to different "services". I have no control over the system sending the messages to A.
I want to read such messages from A and dispatch them to a per-service set of topics on the same cluster (let's call them A_1, ..., A_n), based on one column describing the service (the format is CSV-style, but it doesn't matter).
The set of services is static, I don't have to handle addition/removal at the moment.
I was hoping to use Kafka Connect to perform this task but, surprisingly, there are no Kafka source/sink connectors (I cannot find the tickets, but they have been rejected).
I have seen MirrorMaker2 but it looks like overkill for my (simple) use-case.
I also know KafkaStreams but I'd rather not write and maintain code just for that.
My question is: is there a way to achieve this topic dispatching with kafka native tools without writing a kafka-consumer/producer myself?
PS: if anybody thinks that MirrorMaker2 could be a good fit I am interested too, I don't know the tool very well.
As far as I know, there is no straightforward way to branch incoming messages to a list of topics based on their content. You need to write custom code to achieve this:
1. Use the Processor API (refer here).
2. Pass the list of topics into the Processor.
3. Use your own logic to identify which topic each message should be branched to.
4. Use context.forward to send the message towards that topic. Note that To.child takes the name of a child node in the topology (for example, a sink node that writes to the selected topic), not a topic name as such:
context.forward(key, value, To.child("selected topic"))
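
The "logic to identify topics" step can be sketched independently of Kafka. The following is a minimal sketch in plain Java, assuming a static service-to-topic map and a fixed CSV column holding the service name; the class name, the column index and the service names in the usage are hypothetical. The result would then be used to pick the child sink (or the producer's target topic).

```java
import java.util.Map;
import java.util.Optional;

// Illustration only: decides the target topic for one CSV-style message.
// The service-to-topic map is static, matching the fixed set of services.
final class ServiceRouter {
    private final int serviceColumn;
    private final Map<String, String> topicByService;

    ServiceRouter(int serviceColumn, Map<String, String> topicByService) {
        this.serviceColumn = serviceColumn;
        this.topicByService = topicByService;
    }

    // Returns the target topic, or empty when the column is missing
    // or the service is not one of the known ones.
    Optional<String> targetTopic(String csvLine) {
        // Limit -1 keeps trailing empty columns so indexes stay stable.
        String[] columns = csvLine.split(",", -1);
        if (serviceColumn < 0 || serviceColumn >= columns.length) {
            return Optional.empty();
        }
        String service = columns[serviceColumn].trim();
        return Optional.ofNullable(topicByService.get(service));
    }
}
```

This naive split does not handle quoted commas; a real CSV parser would be needed if the format allows them.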
Mirror Maker is for doing ... mirroring. It's useful when you want to mirror one cluster from one data center to the other with the same topics. Your use case is different.
Kafka Connect is for syncing different systems (data from Databases for example) through Kafka topics but I don't see it for this use case either.
I would use a Kafka Streams application for that.
All the other answers are right: at the time of writing I did not find any "config-only" solution in the Kafka toolset.
What finally did the trick was Logstash, as its Kafka output plugin supports sprintf-style field references in the topic_id parameter.
So once you have the "target topic name" available in a field (say service_name), it's as simple as this:
output {
  kafka {
    id => "sink"
    codec => [...]
    bootstrap_servers => [...]
    topic_id => "%{[service_name]}"
    [...]
  }
}

Is Kafka message headers the right place to put event type name?

In a scenario where multiple event types from a single domain are produced to a single topic and only a subset of event types is consumed by each consumer, I need a good way to read the event type before taking action.
I see 2 options:
Put event type (example "ORDER_PUBLISHED") into message body (payload) itself which would be like broker agnostic approach and have other advantages. But would involve parsing of every message just to know the event type.
Utilize Kafka message headers which would allow to consume messages without extra payload parsing.
The context is event-sourcing. Small commands, small payloads. There are no huge bodies to parse. Golang. All messages are protobufs. gRPC.
What is the typical workflow in such a scenario?
I tried to google this topic, but didn't find much on use cases and good practices for headers.
It would be great to hear when and how to use Kafka message headers, and when not to use them.
Clearly the same topic should be used for different event types that apply to the same entity/aggregate (reference). Example: BookingCreated, BookingConfirmed, BookingCancelled, etc. should all go to the same topic in order to (excuse the pun) guarantee ordering of delivery (in this case the booking ID is the message key).
When the consumer gets one of these events, it needs to identify the event type, parse the payload, and route to the processing logic accordingly. The event type is the piece of message metadata that allows this identification.
Thus, I think a custom Kafka message header is the best place to indicate the type of event. I'm not alone:
Felipe Dutra: "Kafka allow you to put meta-data as header of your message. So use it to put information about the message, version, type, a correlationId. If you have chain of events, you can also add the correlationId of opentracing"
This GE ERP system has a header labeled "event-type" to show "The type of the event that is published" to a kafka topic (e.g., "ProcessOrderEvent").
This other solution mentions that "A header 'event' with the event type is included in each message" in their Kafka integration.
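
To make the idea concrete, here is a minimal sketch in plain Java of a consumer that reads the event type from a header and only hands the payload to a handler when it is subscribed to that type. Headers are modelled as a simple map from header name to bytes rather than with the Kafka client classes; the header name "event-type" follows the example quoted above, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Illustration only: routes a message by its "event-type" header
// without touching the payload of event types nobody subscribed to.
final class HeaderDispatcher {
    static final String EVENT_TYPE_HEADER = "event-type";

    private final Map<String, Consumer<byte[]>> handlers = new HashMap<>();

    // Registers the handler that parses and processes one event type.
    void on(String eventType, Consumer<byte[]> handler) {
        handlers.put(eventType, handler);
    }

    // Returns true when a handler processed the payload,
    // false when the message was skipped unparsed.
    boolean dispatch(Map<String, byte[]> headers, byte[] payload) {
        byte[] raw = headers.get(EVENT_TYPE_HEADER);
        if (raw == null) {
            return false;
        }
        String eventType = new String(raw, StandardCharsets.UTF_8);
        Consumer<byte[]> handler = handlers.get(eventType);
        if (handler == null) {
            return false;
        }
        handler.accept(payload);
        return true;
    }
}
```

The payload (a protobuf in the question's setup) is only decoded inside the handler, so unwanted event types cost one header lookup.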
Headers are a relatively recent addition to Kafka (they arrived in version 0.11). Also, as far as I've seen, Kafka books focus on the 17 thousand Kafka configuration options and Kafka topology. Unfortunately, it is hard to find much on how an event-driven architecture can be mapped with the proper semantics onto elements of the Kafka message broker.

SQS: How to forward message to subscriber based on a certain key

I have a validation service which takes in validation requests and publishes them to an SQS queue. Now, based on the type of validation request, I want to forward the message to a specific service.
So basically, I have one producer and multiple consumers, but each message is to be consumed by only one consumer.
What approach should I use? Should I have a different SQS queue for each service, or can I do this using a single queue based on message type?
As I see it, you have three options:
The first option, like you say, is to have a separate queue and consumer for each message type. This is the approach we use, and we have thousands of queues and many different message types.
The second option would be to decorate the message being pushed onto SQS with something that indicates its desired consumer, then have a generic consumer in your application that forwards the message on to the right consumer. This approach is generally seen as an anti-pattern, though, and I would personally agree.
Thirdly, you could take advantage of SNS filtering, but that only makes sense if you already use SNS; otherwise you'd have to invest some time to set it up and make it work.
Hope that helps!
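
The first option can be illustrated with a minimal in-memory sketch in plain Java, without the AWS SDK: the producer picks the queue from the validation type, and each consumer polls only its own queue, so a message reaches exactly one consumer. The class name and the validation type names in the usage are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Illustration only: one in-memory queue per validation type,
// standing in for one SQS queue per consuming service.
final class QueuePerTypeSketch {
    private final Map<String, Queue<String>> queues = new HashMap<>();

    // Producer side: the validation type selects the queue.
    void publish(String validationType, String body) {
        queues.computeIfAbsent(validationType, t -> new ArrayDeque<>()).add(body);
    }

    // Consumer side: each service polls only its own queue.
    // Returns null when there is nothing to consume.
    String poll(String validationType) {
        Queue<String> queue = queues.get(validationType);
        return queue == null ? null : queue.poll();
    }
}
```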

How are Kafka KStream and Spring @KafkaListener different?

I am a bit new to Kafka and am reading through the documentation. The official Kafka site has an example on KStream, where the application is bound to a topic and each message is processed as soon as it arrives. The results are posted back to a topic or to a database.
The Spring Kafka annotation @KafkaListener provides the same functionality. For example, I tried my hand at a KafkaListener application. Here as well, we listen to a topic and process a message when something is posted.
So I was curious to know:
1. How are these 2 different?
2. Which one should be preferred in which scenario?
Please note that this is a very limited explanation. Refer to the docs.
To answer your question 1, "How are these 2 different?": both KafkaListener and KStream consume messages from Kafka topics. However, they differ in the way they maintain state. KafkaListener does not maintain state; it consumes messages as they come. KStream reads the topic as a continuous stream of messages.
Let's assume that a topic carries lines of text and we maintain a count of each word. So after we send these 2 lines to the topic,
Hello good morning,
Hello thanks
We will have the word counts - Hello 2, good 1, morning 1 & thanks 1.
KafkaListener can be used to keep this word count manually: the developer can store the words in a static HashMap and keep the count. KStream will do it naturally because it reads the topic as a stream:
it is designed to operate on an infinite, unbounded stream of data
The KStream example explains this in good detail.
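
The manual approach described for KafkaListener could look roughly like the following minimal sketch in plain Java. The onMessage method stands in for the listener callback, and the class name is hypothetical. The state lives only in memory, so it is lost on restart and not shared between instances, which is the bookkeeping a Kafka Streams state store would otherwise take care of.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustration only: hand-rolled word count state, as a listener
// callback would have to maintain it. Not thread-safe.
final class WordCountSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    // Called once per consumed message (one line of text).
    void onMessage(String line) {
        // Split on anything that is not a letter, so "morning," counts as "morning".
        for (String word : line.split("[^\\p{L}]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    // Read-only view of the current counts.
    Map<String, Integer> counts() {
        return Collections.unmodifiableMap(counts);
    }
}
```

Feeding it the two lines from the example yields Hello 2, good 1, morning 1 and thanks 1.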
To answer your question 2, "Which one to prefer in which scenario?": use KafkaListener if you need to consume messages without maintaining state, like a pipeline that takes data from a source to a sink. Use KStream if your messages are related to each other, for example to find the total count of a particular word across all the messages (roughly similar to a GROUP BY in SQL).
@KafkaListener does not use KStream (the Streams API). @KafkaListener is an annotation from spring-kafka that uses the Consumer API internally. KStream is not part of the Consumer API; it belongs to the Streams API.
For differences between the Streams and Consumer APIs, check out the question linked in the comment on your question. Just remember one thing: the spring-kafka library wraps the Kafka libraries, so you have four APIs available: the Streams API wrapped by spring-kafka, the Consumer API wrapped by spring-kafka, the plain Streams API and the plain Consumer API. The two examples you mentioned are the plain Streams API and the Consumer API wrapped by spring-kafka.

Implement filtering for Kafka messages

I have started using Kafka recently and am evaluating it for a few use cases.
If we wanted to provide the capability of filtering messages for consumers (subscribers) based on message content, what is the best approach for doing this?
Say a topic named "Trades" is exposed by a producer and carries different trade details such as market name, creation date, price etc.
Some consumers are interested in trades for specific markets and others are interested in trades after a certain date etc. (content-based filtering)
As filtering is not possible on the broker side, what is the best possible approach for implementing the cases below:
1. If the filtering criteria are specific to a consumer, should we use a ConsumerInterceptor (though interceptors are suggested for logging purposes as per the documentation)?
2. If the filtering criteria (content-based filtering) are common among consumers, what should the approach be? Listen to the topic, filter the messages locally and write them to a new topic (using either an interceptor or Streams)?
If I understand your question correctly, you have one topic and different consumers which are interested in specific parts of the topic. At the same time, you do not own those consumers and want to avoid having them read the whole topic and do the filtering by themselves?
For this, the only way to go is to build a new application that reads the whole topic, does the filtering (or actually splitting) and writes the data back into two (or more) different topics. The external consumers would consume from those new topics and only receive the data they are interested in.
Using Kafka Streams for this purpose would be a very good way to go. The DSL should offer everything you need.
As an alternative, you can just write your own application using KafkaConsumer and KafkaProducer to do the filtering/splitting manually in your user code. This would not be much different from using Kafka Streams, as a Kafka Streams application would do the exact same thing internally. However, with Streams your effort to get it done would be way less.
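
Whichever of the two is used, the filtering/splitting logic itself is small. The following is a minimal sketch in plain Java, without Kafka classes, that maps each trade to the output topics whose rule it satisfies; with KafkaConsumer and KafkaProducer this would run once per consumed record before producing. The Trade fields follow the question (market name, creation date, price); the class names and the topic names in the usage are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustration only: content-based splitting of trades into output topics.
final class TradeSplitter {

    // Minimal trade model based on the fields named in the question.
    static final class Trade {
        final String market;
        final LocalDate created;
        final double price;

        Trade(String market, LocalDate created, double price) {
            this.market = market;
            this.created = created;
            this.price = price;
        }
    }

    // Insertion-ordered so the result order is predictable.
    private final Map<String, Predicate<Trade>> rules = new LinkedHashMap<>();

    // Registers an output topic together with its filter rule.
    void route(String topic, Predicate<Trade> rule) {
        rules.put(topic, rule);
    }

    // Returns every output topic whose rule accepts the trade.
    List<String> targets(Trade trade) {
        List<String> topics = new ArrayList<>();
        for (Map.Entry<String, Predicate<Trade>> rule : rules.entrySet()) {
            if (rule.getValue().test(trade)) {
                topics.add(rule.getKey());
            }
        }
        return topics;
    }
}
```

A trade can match several rules, in which case it would be written to each matching topic; a trade matching none is simply dropped.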
I would not use interceptors for this. Even if this would work, it does not seem to be good software design for your use case.
Create your own interceptor class that implements org.apache.kafka.clients.consumer.ConsumerInterceptor, implement your logic in the 'onConsume' method, and then set the 'interceptor.classes' config for the consumer.