Create Kafka Topic using vertx - apache-kafka

How can we create a producer for different topics using Vert.x? The Vert.x Zanox module does a decent job, but it seems to be limited to one topic. There is no way to send messages to a chosen topic; it sticks to the single topic given in the config file.

If you are using the Zanox module (as I am), one workaround is to deploy one module instance per topic you need to produce to, passing each a different configuration file (with the appropriate topic name).
Not the most efficient approach, I agree, but short of writing your own Kafka module/integration, this is the only option available.
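To illustrate the one-deployment-per-topic workaround, here is a minimal sketch that derives one configuration per topic from a shared base. The keys `topic` and `address` and the `kafka.producer.` prefix are hypothetical placeholders, not the Zanox module's actual configuration schema; adapt them to whatever your module expects.

```python
import copy


def per_topic_configs(base_config, topics):
    """Return one deployment config per topic, derived from a shared base config."""
    configs = {}
    for topic in topics:
        cfg = copy.deepcopy(base_config)  # never mutate the shared base
        cfg["topic"] = topic  # hypothetical key: topic this instance produces to
        cfg["address"] = "kafka.producer." + topic  # hypothetical event-bus address
        configs[topic] = cfg
    return configs
```

Each resulting dictionary could then be written out as the configuration file for one module deployment, so that senders choose a topic by choosing which instance to address.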

Related

Does it make sense to use kafka-connect to transform kafka messages?

We have the Confluent Platform in our infrastructure. At its core, we use Kafka brokers to distribute events. Dozens of devices produce events to Kafka topics (there is one topic for each type of event), and the events are serialized with Google's protobuf. We use Confluent's Schema Registry to keep track of the protobuf schemas.
For several of these events, we need to apply a transformation and then publish the output to another Kafka topic. Of course, Kafka Streams is one way to accomplish that, like in this example. However, we don't want a separate Java application for each transformation (which increases the complexity of the project and the development/deployment effort), and it doesn't feel right to put all the streams in one application (modifying one would require stopping all the streams and starting them again).
At this point, we thought that Confluent's Kafka Connect might be a better approach. We could have several workers and deploy them into one Kafka Connect instance or cluster. The question is:
Does it make sense to use Kafka Connect to read messages from one Kafka topic and send them to another Kafka topic? All the use cases and examples aim to get data from outside (database, file, etc.) into Kafka, or from Kafka to outside.
To clarify, Kafka Connect is not "Confluent's", it's part of Apache Kafka.
While you could use MirrorMaker2/Confluent Replicator with transforms, it honestly wouldn't be much different than extracting the transformation logic into a shared library, then bundling a deployable Kafka Streams application that accepts configuration parameters for input and output topics with the transformation in-between.
You make a good point about a single point of administration, but that's also a single point of failure. If you use Connect, changing your transform plugin will also require you to stop and restart the Connect server. And if all topics are part of the same connector, any task failure would stop some percentage of the topic transformations.
Kafka Streams (or KSQL) is preferred for intra-cluster translations anyway.
You could also look at solutions like Apache NiFi for more complex event management and routing.

Kafka topic to multiple kafka topics dispatcher (same cluster)

My use-case is as follows:
I have a Kafka topic A with messages "logically" belonging to different "services". I do not control the system that sends the messages to A.
I want to read such messages from A and dispatch them to a per-service set of topics on the same cluster (let's call them A_1, ..., A_n), based on one column describing the service (the format is CSV-style, but it doesn't matter).
The set of services is static, I don't have to handle addition/removal at the moment.
I was hoping to use Kafka Connect to perform this task but, surprisingly, there are no Kafka source/sink connectors (I cannot find the tickets, but they have been rejected).
I have seen MirrorMaker2 but it looks like an overkill for my (simple) use-case.
I also know KafkaStreams but I'd rather not write and maintain code just for that.
My question is: is there a way to achieve this topic dispatching with kafka native tools without writing a kafka-consumer/producer myself?
PS: if anybody thinks that MirrorMaker2 could be a good fit I am interested too, I don't know the tool very well.
To my knowledge, there is no straightforward way to branch incoming messages to a list of topics based on their content. You need to write custom code to achieve this:
Use the Processor API (refer here).
Pass the list of topics to the Processor.
Use your own logic to identify which topic each message should be branched to.
Use context.forward to send the message on to the child node for the selected topic:
context.forward(key, value, To.child("selected topic"))
Mirror Maker is for doing ... mirroring. It's useful when you want to mirror one cluster from one data center to the other with the same topics. Your use case is different.
Kafka Connect is for syncing different systems (data from Databases for example) through Kafka topics but I don't see it for this use case either.
I would use a Kafka Streams application for that.
All the other answers are right: at the time of writing I did not find any "config-only" solution in the Kafka toolset.
What finally did the trick was to use Logstash, as its "kafka output plugin" supports sprintf-style field references (%{...}) in the topic_id parameter.
So once you have the "target topic name" available in a field (say service_name) it's as simple as this:
output {
  kafka {
    id => "sink"
    codec => [...]
    bootstrap_servers => [...]
    topic_id => "%{[service_name]}"
    [...]
  }
}
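The routing decision itself (read one column, map it to a per-service topic) is small, whichever tool performs it. A minimal sketch of that decision in Python, assuming the service name is in the first CSV column, the target topics are named `A_<service>`, and unknown services go to a catch-all topic; the service names and the `A_unknown` topic are hypothetical:

```python
import csv

# Hypothetical static set of services; the question says the set does not change.
SERVICES = frozenset({"billing", "auth", "search"})


def target_topic(line, service_column=0, prefix="A_", services=SERVICES):
    """Return the destination topic for one CSV-style record."""
    rows = list(csv.reader([line]))
    row = rows[0] if rows else []
    if len(row) <= service_column:
        return prefix + "unknown"  # malformed record: no service column
    service = row[service_column].strip()
    if service not in services:
        return prefix + "unknown"  # service outside the static set
    return prefix + service
```

For example, `target_topic("billing,42,ok")` yields `A_billing`. A Logstash filter, a Kafka Streams topic extractor, or a plain consumer/producer loop would all apply the same kind of mapping before writing the record.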

Dynamically create and change Kafka topics with Flink

I'm using Flink to read and write data from different Kafka topics.
Specifically, I'm using the FlinkKafkaConsumer and FlinkKafkaProducer.
I'd like to know if it is possible to change the Kafka topics I'm reading from and writing to 'on the fly' based on either logic within my program, or the contents of the records themselves.
For example, if a record with a new field is read, I'd like to create a new topic and start diverting records with that field to the new topic.
Thanks.
If your topics follow a generic naming pattern, for example "topic-n*", your Flink Kafka consumer can automatically read from "topic-n1", "topic-n2", and so on as they are added to Kafka.
Flink 1.5 (FlinkKafkaConsumer09) added support for dynamic partition discovery & topic discovery based on regex. This means that the Flink-Kafka consumer can pick up new Kafka partitions without needing to restart the job and while maintaining exactly-once guarantees.
Consumer constructor that accepts subscriptionPattern: link.
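One detail worth checking is that the subscription pattern is a regular expression, not a shell glob. The sketch below uses Python's `re` module as a stand-in for the Java `Pattern` the consumer takes, and assumes the whole topic name must match; the topic names are made up for illustration.

```python
import re


def matching_topics(pattern, topic_names):
    """Return the topic names that the pattern matches in full, sorted."""
    regex = re.compile(pattern)
    return sorted(name for name in topic_names if regex.fullmatch(name))


topics = ["topic-n", "topic-n1", "topic-n2", "topic-m1"]

# As a regex, "topic-n*" means "topic-" followed by zero or more "n" characters.
glob_style = matching_topics(r"topic-n*", topics)

# "topic-n.*" is the regex that matches any name starting with "topic-n".
regex_style = matching_topics(r"topic-n.*", topics)
```

Under that full-match assumption, the glob-style pattern selects only `topic-n`, while `topic-n.*` also picks up `topic-n1` and `topic-n2`.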
Thinking more about the requirement,
1st step: you start from one topic (for simplicity), create more topics at runtime based on the incoming data, and direct the matching messages to those topics. This is entirely possible and does not need complicated code. Use the ZkClient API to check whether the topic name exists; if it does not, create a new topic with that name and start pushing messages into it through a new producer tied to that topic. You don't need to restart the job to produce messages to a specific topic.
Your initial consumer becomes a producer (for the new topics) plus a consumer (of the old topic).
2nd step: you want to consume messages from the new topic. One way is to spawn an entirely new job. You can do this by creating a thread pool up front and supplying arguments to its threads.
Again, be careful with this: more automation can overload the cluster if there is a looping bug. Think about the possibility of too many topics being created over time if the input data is not controlled or is simply dirty. There could be better architectural approaches, as mentioned above in the comments.
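One way to guard against runaway topic creation is to put a hard cap on how many new topics the job may create. A minimal sketch, with an in-memory set standing in for the cluster's topic list; the class name and the cap are hypothetical, and a real job would call an admin API where the comment indicates:

```python
class TopicLimiter:
    """Tracks known topics and refuses to create more than a fixed number of new ones."""

    def __init__(self, existing_topics, max_new_topics):
        self.known = set(existing_topics)
        self.max_new_topics = max_new_topics
        self.created = []

    def ensure(self, topic):
        """Return True if the topic exists or was created, False if the cap is reached."""
        if topic in self.known:
            return True
        if len(self.created) >= self.max_new_topics:
            return False  # caller should route to a fallback topic or raise an alert
        # A real job would create the topic on the cluster here.
        self.known.add(topic)
        self.created.append(topic)
        return True
```

When `ensure` returns False, the record could be sent to a fixed fallback topic instead, so that dirty input fills one topic rather than creating thousands.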

Using Kafka to Transfer Files between two clients

I have a Kafka cluster set up between two machines (machine#1 and machine#2), and the configuration is the following:
1) Each machine is configured to have one broker and one ZooKeeper running.
2) Server and ZooKeeper properties are configured for a multi-broker, multi-node ZooKeeper setup.
I currently have the following understanding of KafkaProducer and KafkaConsumer:
1) If I send a file from machine#1 to machine#2, it's broken down in lines using some default delimiter (LF or \n).
2) Therefore, if machine#1 publishes 2 different files to the same topic, that doesn't mean that machine#2 will receive the two files as such. Instead, every line will be appended to the topic's log partitions, and machine#2 will read the lines from the log partitions in order of arrival, i.e. the order is not necessarily
file1-line1
file1-line2
end-of-file1
file2-line1
file2-line2
end-of-file2
but it might be something like:
file1-line1
file2-line1
file1-line2
end-of-file1
file2-line2
end-of-file2
Assuming that the above is correct (I'm happy to be wrong), I believe simple producer/consumer usage is not the correct approach for transferring files (the Connect API is probably the solution here). Since the Kafka website says that "Log Aggregation" is a very popular use case, I was wondering if someone has any example projects or websites that demonstrate file exchange using Kafka.
P.S. I know that by definition Connect API says that this is for reliable data exchange between kafka and "Other" systems - but I don't see why the other system cannot have kafka. So I am hoping that my question doesn't have to focus on "Other" non-kafka systems.
Your understanding is correct; however, if you want the same order, you can use just 1 partition for that topic.
The order in which machine#2 reads will then be the same as the order in which you sent.
However, this will be inefficient and will lack the parallelism for which Kafka is widely used.
Kafka guarantees ordering within a partition. Quoting the documentation: "Kafka only provides a total order over records within a partition, not between different partitions in a topic."
To send all the lines from a file to only one partition, send each line with a key (for example, the file name). The producer hashes the key, so all messages with the same key go to the same partition.
This will make sure you receive the events from one file in the same order on machine#2. If you have any questions feel free to ask, as we use Kafka for ordering guarantee of events generated from multiple sources in production which is basically your use case as well.
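The effect of keying by file can be sketched without a broker. The example below uses `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses a different hash function, so actual partition numbers will differ) and assumes the file name is used as the key:

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition; the same key always gives the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(records, num_partitions):
    """Append (file_name, line) records to per-partition logs in arrival order."""
    partitions = {p: [] for p in range(num_partitions)}
    for file_name, line in records:
        partitions[partition_for(file_name, num_partitions)].append((file_name, line))
    return partitions


# Lines from two files arrive interleaved, as in the question.
interleaved = [
    ("file1", "line1"),
    ("file2", "line1"),
    ("file1", "line2"),
    ("file2", "line2"),
]
logs = assign_partitions(interleaved, num_partitions=3)
```

Lines of different files may still interleave inside one partition, but every line of a given file lands in a single partition in the order it was sent, so the reader can reassemble each file by key.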

Kafka topic alias

Is it possible to create an alias of a topic name?
Or, put another way...
If a user writes to topic examplea, is it possible to override that at the broker so they actually write to topic exampleb?
Alternatively, could the topic actually be written as examplea while the consumer refers to it as exampleb?
I'm thinking it could probably be achieved using small hack at the broker where it replies to metadata requests, but I'd rather not if it can be done in some standard way.
Aliases are not natively supported in Kafka.
One workaround could be to produce to examplea and have a consumer/producer pair that consumes from examplea and produces to exampleb. The consumer/producer pair could be written with Kafka clients, as a connector in Connect, as a MirrorMaker instance (though you'll need to modify it to change the topic name), or as a Kafka Streams job. Note that the messages will appear in exampleb slightly after examplea because they're being copied after being written.
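The shape of such a consumer/producer pair can be sketched with an in-memory stand-in for the cluster. `InMemoryLog` and `forward` are hypothetical names used only for illustration; a real implementation would use a Kafka consumer and producer and commit offsets instead of returning them.

```python
from collections import defaultdict


class InMemoryLog:
    """Stand-in for a cluster: maps a topic name to an append-only list of records."""

    def __init__(self):
        self.topics = defaultdict(list)

    def produce(self, topic, record):
        self.topics[topic].append(record)

    def read_from(self, topic, offset):
        return list(self.topics[topic][offset:])


def forward(log, source, target, offset=0):
    """Copy records at or after offset from source to target; return the next offset."""
    batch = log.read_from(source, offset)
    for record in batch:
        log.produce(target, record)
    return offset + len(batch)
```

Calling `forward(log, "examplea", "exampleb", offset)` repeatedly, each time with the offset returned by the previous call, copies only the new records, which is also why exampleb always lags slightly behind examplea.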
How are you writing to the Kafka topic? If it's via a REST proxy, you should be able to rewrite the topic portion of the URL using NGINX or a similar reverse proxy.