Create new Producer from Kafka consumer? - apache-kafka

How do I create a new Kafka Producer from an existing Consumer in Java?

You can't create a KafkaProducer from a KafkaConsumer instance.
You have to explicitly create a KafkaProducer, using the same connection settings as your consumer.
Considering the use case you mentioned (copying data from one topic to another), I'd recommend using Kafka Streams. There's actually an example in Kafka that does exactly that: https://github.com/apache/kafka/blob/trunk/streams/examples/src/main/java/org/apache/kafka/streams/examples/pipe/PipeDemo.java
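For reference, a minimal Kafka Streams sketch for copying one topic to another (the broker address and topic names below are placeholders) could look like this:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class PipeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topic-copy-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read every record from the source topic and write it unchanged to the destination topic
        builder.stream("source-topic").to("destination-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}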

I would recommend using the Kafka Streams library. It reads data from Kafka topics, does some processing, and writes the results back to other topics.
That could be a simpler approach for you.
https://kafka.apache.org/documentation/streams/
A current limitation is that the source and destination clusters must be the same with Kafka Streams; otherwise you would need to wire in your own producer (for example, via the Processor API) to write to a different destination cluster.
Another approach is to simply define a producer in the consumer program. Wherever your rule matches (based on offset or any other condition), call the producer.send() method, as in the sketch below.
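A minimal sketch of that last approach (the topic names, broker address, and the even-offset rule are placeholders for your own configuration and condition):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ConditionalForwarder {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "forwarder");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("input-topic"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    // Placeholder rule: forward only records with an even offset
                    if (record.offset() % 2 == 0) {
                        producer.send(new ProducerRecord<>("output-topic", record.key(), record.value()));
                    }
                }
            }
        }
    }
}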

Related

How to send data to kafka topic with kafka sink?

Currently I have a sink connector which gets data from topic A and sends it to an external service.
Now I have a use case where, based on some logic, I should send it to topic B instead of the service.
This logic depends on the response of the target service, which returns a response based on the data.
Because the data should be sent to the target system every time, I couldn't use the Streams API.
Is that feasible somehow?
Or should I add a Kafka producer manually to my sink? If so, are there any drawbacks?
The first option is to create a custom Kafka Connect Single Message Transform (SMT) that implements the desired logic, possibly combined with ExtractTopic (depending on what your custom SMT looks like).
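A custom SMT is a class implementing Kafka Connect's Transformation interface; a rough skeleton (the topic name and the routing rule are placeholders, not anything prescribed by Connect) might look like this:

import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

public class RouteToTopicB<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public R apply(R record) {
        // Placeholder rule: rewrite the destination topic for matching records, pass the rest through
        if (shouldGoToTopicB(record.value())) {
            return record.newRecord("topic_B", record.kafkaPartition(),
                    record.keySchema(), record.key(),
                    record.valueSchema(), record.value(),
                    record.timestamp());
        }
        return record;
    }

    private boolean shouldGoToTopicB(Object value) {
        // Hypothetical business rule; replace with your own logic
        return value != null && value.toString().contains("route-to-b");
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }

    @Override
    public void close() {
    }
}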
The second option is to build your own consumer. For example:
Step 1: Create one more topic on top of topic A
Create one more topic, say topic_a_to_target_system
Step 2: Implement your custom consumer
Implement a Kafka Consumer that consumes all the messages from topic topic_a.
At this point, you need to instantiate a Kafka Producer and, based on the logic, decide whether each message needs to be forwarded to topic_B or to the target system (via topic_a_to_target_system); see the sketch after these steps.
Step 3: Start Sink connector on topic_a_to_target_system
Finally start your sink connector so that it sinks the data from topic topic_a_to_target_system to your target system.
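Putting step 2 into code, the routing consumer could look roughly like this (shouldGoToTopicB() is a placeholder for the logic driven by the target service's response, and the consumer/producer properties files are assumed to exist with the usual bootstrap.servers, group.id, and serializer settings):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TopicARouter {
    public static void main(String[] args) throws Exception {
        Properties consumerProps = new Properties();
        consumerProps.load(TopicARouter.class.getResourceAsStream("/consumer.properties"));
        Properties producerProps = new Properties();
        producerProps.load(TopicARouter.class.getResourceAsStream("/producer.properties"));

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("topic_a"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // Route each message either to topic_B or to the topic the sink connector reads
                    String destination = shouldGoToTopicB(record.value())
                            ? "topic_B"
                            : "topic_a_to_target_system";
                    producer.send(new ProducerRecord<>(destination, record.key(), record.value()));
                }
            }
        }
    }

    private static boolean shouldGoToTopicB(String value) {
        // Hypothetical rule; in the real use case this depends on the target service's response
        return value != null && value.contains("send-to-b");
    }
}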

Producer-consumer processing pattern for Kafka processing

I'm implementing a streaming pipeline that resembles the illustration below:
*K-topic1* ---> processor1 ---> *K-topic2* ---> processor2 -->
*K-topic3* ---> processor3 --> *K-topic4*
The K-topic components represent Kafka topics, and the processor components represent code (Python/Java).
For the processor component, the intention is to read/consume data from the topic, perform some processing/ETL on it, and persist the results to the next topic in the chain as well as persistent store such as S3.
I have a question regarding the design approach.
The way I see it, each processor component should encapsulate both consumer and producer functionality.
Would the best approach be to have a Processor module/class that contains both KafkaConsumer and KafkaProducer instances? To date, most examples I've seen have separate consumer and producer components which are run separately; that would entail running double the number of components, as opposed to encapsulating producers and consumers within each Processor object.
Any suggestions/references are welcome.
This question is different from
Designing a component both producer and consumer in Kafka
as that question specifically mentions using Samza which is not the case here.
the intention is to read/consume data from the topic, perform some processing/ETL on it, and persist the results to the next topic in the chain
This is exactly the strength of Kafka Streams and/or KSQL. You could use the Processor API, but from what you describe, I think you'll only need the Streams DSL.
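For illustration, one stage of such a chain in the Streams DSL could look roughly like this (the topic names match the diagram above, but the mapValues step is only a placeholder for your actual ETL logic):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class Processor1 {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "processor1");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("K-topic1");
        // Placeholder ETL step: clean up the value, then forward it to the next topic in the chain
        input.mapValues(value -> value.trim().toUpperCase())
             .to("K-topic2");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}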
persist the results to the next topic in the chain as well as persistent store such as S3.
From the above topic, you can use a Kafka Connect Sink for getting the topic data into these other external systems. There is no need to write a consumer to do this for you.
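As an illustration, assuming the Confluent S3 sink connector (a separate plugin, not part of Apache Kafka), the sink is configured rather than coded; the bucket, region, and sizes below are placeholders:

name=s3-sink-k-topic2
connector.class=io.confluent.connect.s3.S3SinkConnector
topics=K-topic2
s3.bucket.name=my-pipeline-bucket
s3.region=us-east-1
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
flush.size=1000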

Where to run the processing code in Kafka?

I am trying to setup a data pipeline using Kafka.
Data goes in (with producers), gets processed, enriched, and cleaned, and moves out to different databases or storage (with consumers or Kafka Connect).
But where do you run the actual pipeline processing code to enrich and clean the data? Should it be part of the producers or the consumers? I think I missed something.
In the use case of a data pipeline the Kafka clients could serve both as a consumer and producer.
For example, if you have raw data being streamed into ClientA where it is being cleaned before being passed to ClientB for enrichment then ClientA is serving as a consumer (listening to a topic for raw data) and a producer (publishing cleaned data to a topic).
Where you draw those boundaries is a separate question.
It can be part of either the producer or the consumer.
Or you could set up an environment dedicated to something like Kafka Streams processes or a KSQL cluster.
It is possible either way. Consider all the options and choose the one that suits you best. Let's assume you have a source of raw data, in CSV or some database (Oracle), and you want to do your ETL and load it back into different datastores.
1) Use Kafka Connect to produce your data to Kafka topics.
Have a consumer that consumes off these topics (it could be Kafka Streams, KSQL, Akka, or Spark).
Produce back to a Kafka topic for further use, or to some datastore (any sink, basically).
This has the benefit of ingesting your data with little or no code, since Kafka Connect source connectors are easy to set up.
2) Write custom producers and do your transformations in the producers before writing to a Kafka topic, or write directly to a sink unless you want to reuse the produced data for further processing.
Then read from the Kafka topic, do further processing, and write the results back to a persistent store.
It all boils down to your design choice, the throughput you need from the system, and how complicated your data structure is.

User topic management using Kafka Stream Processor API

I have just started getting my hands dirty with Kafka. I have gone through this. It only covers data/topic management for the Kafka Streams DSL. Can anyone share a link for the same sort of data management for the Processor API of Kafka Streams? I am especially interested in user and internal topic management with the Processor API.
TopologyBuilder builder = new TopologyBuilder();
// add the source processor node that takes Kafka topic "source-topic" as input
builder.addSource("Source", "source-topic");
From where do I populate this source topic with input data before the stream processor starts consuming it?
In short, can we write to the Kafka "Source" topic using Streams, the way a producer writes to a topic? Or is Streams only for parallel consumption of a topic?
I believe we should be able to, as "Kafka's Streams API is built on top of Kafka's producer and consumer clients".
Yes, you have to use a KafkaProducer to generate inputs for the source topics that feed the KStream.
But the intermediate topics can be populated via
KStream#to
KStream#through
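For example, a plain producer that seeds the "source-topic" referenced by addSource() above might look roughly like this (the broker address and sample records are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SourceTopicSeeder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                // Write sample records into the topic the stream processor will consume
                producer.send(new ProducerRecord<>("source-topic", "key-" + i, "value-" + i));
            }
        }
    }
}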
You can use JXL (Java Excel API) to write a producer which writes to a Kafka topic from an Excel file.
Then create a Kafka Streams application to consume that topic and produce to another topic.
You can use context.topic() to get the topic from which the processor is receiving.
Then use if statements inside the process() function to call the processing logic for that topic.
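A rough sketch of that branching, using the older Processor API that matches the TopologyBuilder snippet in the question (topic names and transformations are placeholders):

import org.apache.kafka.streams.processor.AbstractProcessor;

public class MultiTopicProcessor extends AbstractProcessor<String, String> {

    @Override
    public void process(String key, String value) {
        // ProcessorContext#topic() reports which topic the current record came from
        String topic = context().topic();
        if ("source-topic".equals(topic)) {
            // Placeholder logic for records from source-topic
            context().forward(key, value.toUpperCase());
        } else if ("other-topic".equals(topic)) {
            // Placeholder logic for records from other-topic
            context().forward(key, value.toLowerCase());
        }
    }
}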

Kafka topic alias

Is it possible to create an alias of a topic name?
Or, put another way...
If a user writes to topic examplea is it possible to override that at the broker so they actually write to topic exampleb?
Alternatively, if the topic was actually written as examplea, could the consumer refer to it as exampleb?
I'm thinking it could probably be achieved using small hack at the broker where it replies to metadata requests, but I'd rather not if it can be done in some standard way.
Aliases are not natively supported in Kafka.
One workaround could be to produce to examplea and have a consumer/producer pair that consumes from examplea and produces to exampleb. The consumer/producer pair could be written with Kafka clients, as a connector in Connect, as a MirrorMaker instance (though you'll need to modify it to change the topic name), or as a Kafka Streams job. Note that the messages will appear in exampleb slightly after examplea because they're being copied after being written.
How are you writing to the Kafka topic? If it's via a REST proxy, you should be able to rewrite the topic portion of the URL using NGINX or a similar reverse-proxy intermediary.