Kafka exactly-once with other destinations - apache-kafka

I am using Kafka 2, and it looks like exactly-once is possible with:
Kafka Streams
Kafka read/transform/write transactional producer
Kafka Connect
All of the above work between topics (the source and the destination are both topics).
Is it possible to have exactly-once with other destinations?

Sources and destinations (sinks) in Connect are not limited to topics, but the connector you use determines the delivery semantics; not all of them are exactly-once.
For example, a JDBC source connector polling a database might miss some records.
Sink connectors reading out of Kafka will send every message from a topic, but it is up to the downstream system to acknowledge that delivery.
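For reference, the "read/transform/write transactional producer" pattern mentioned in the question looks roughly like the sketch below. It is a minimal, non-authoritative example: the broker address, topic names, group id, transactional id, and the trivial uppercase transform are all placeholders. Note that the guarantee still only covers the Kafka-to-Kafka path; whatever reads the output topic afterwards has its own delivery semantics.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalCopy {
    public static void main(String[] args) {
        Properties cProps = new Properties();
        cProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        cProps.put(ConsumerConfig.GROUP_ID_CONFIG, "copy-group");
        cProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed"); // only see committed writes
        cProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");       // offsets go into the transaction
        cProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        cProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Properties pProps = new Properties();
        pProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "copy-tx-1");     // enables idempotence + transactions
        pProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        pProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {

            consumer.subscribe(Collections.singletonList("input-topic"));
            producer.initTransactions();

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) continue;

                producer.beginTransaction();
                for (ConsumerRecord<String, String> r : records) {
                    // "transform" step: placeholder uppercase transform
                    producer.send(new ProducerRecord<>("output-topic", r.key(), r.value().toUpperCase()));
                }
                // Commit the consumed offsets inside the same transaction, so the
                // read and the write either both happen or neither does.
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (TopicPartition tp : records.partitions()) {
                    List<ConsumerRecord<String, String>> partRecords = records.records(tp);
                    long next = partRecords.get(partRecords.size() - 1).offset() + 1;
                    offsets.put(tp, new OffsetAndMetadata(next));
                }
                producer.sendOffsetsToTransaction(offsets, "copy-group");
                producer.commitTransaction();
            }
        }
    }
}
```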

Related

Kafka Connector To read from a Topic and write to a topic

I want to build a Kafka connector that reads from a Kafka topic, makes a call to a gRPC service to get some data, and writes the whole result into another Kafka topic.
I have written a Kafka sink connector which reads from a topic and calls a gRPC service, but I am not sure how to redirect this data into another Kafka topic.
Kafka Streams can read from topics, call external services as necessary, then forward this data to a new topic in the same cluster.
MirrorMaker2 can be used between different clusters, but using Connect transforms is generally not recommended with external services.
Or you could make your gRPC service into a Kafka producer.
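A minimal sketch of that Kafka Streams approach, assuming String keys and values and a local broker; the topic names and the callGrpcService placeholder are illustrative only, and a real gRPC client stub would replace the placeholder method:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class EnrichViaGrpc {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "grpc-enricher");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");

        // Call the external service for every record and forward the enriched value.
        input.mapValues(value -> callGrpcService(value))
             .to("output-topic");

        new KafkaStreams(builder.build(), props).start();
    }

    // Placeholder for a blocking gRPC call; a real client stub would go here.
    private static String callGrpcService(String value) {
        return value + "-enriched";
    }
}
```

Keeping the call inside mapValues is the simplest wiring, but a blocking RPC there ties the topology's throughput to the gRPC latency.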

Kafka JMS Source Connector: write messages to multiple topics

I have an ActiveMQ Artemis JMS queue and a Kafka source connector reading from it. I want to write messages from this queue to multiple topics in parallel. I found that a Single Message Transform could be the solution; I tried to configure RegexRouter, but it seems I can only change the name of the topic.
I also tried to create three connector instances, but only one receives each message. I guess that is because a message is deleted from the queue on the first read.

Apache Kafka Consumer-Producer Confusion

I know what a Producer and a Consumer are, but the official documentation says:
It is a streaming platform.
It is an enterprise messaging system.
Kafka has connectors which import and export data from databases and other systems as well.
What does that mean?
I know Producers are client applications which send data to the Kafka broker, and Consumers are client applications which read data from the Kafka broker.
But my question is, can a Consumer push data into the Kafka broker?
As per my understanding, if a Consumer wants to push data into the Kafka broker, it becomes a Producer. Is that correct?
1. It is a streaming platform.
It is used for distributing data on a publish-subscribe model, with a storage layer and a processing layer.
2. It is an enterprise messaging system.
Much of the big data infrastructure is open source, yet the big data market is worth roughly $40B per year and keeps growing, with a large share of that spend going to hardware. Despite the open-source nature of much of this software, there is a lot of money to be made.
3. Kafka has connectors which import and export data from databases and other systems.
Kafka Connect provides connectors, e.g. source connectors, sink connectors, and the JDBC connector. It provides a facility for importing data from sources and exporting it to multiple targets.
Producers: they can only push data to a Kafka broker; in other words, they publish data.
Consumers: they can only pull data from the Kafka broker.
A producer produces/puts/publishes messages and a consumer consumes/gets/reads messages.
A consumer can only read; when you want to write, you need a producer. A consumer cannot become a producer.
A producer only pushes data to a Kafka broker.
A consumer only pulls data from a Kafka broker.
However, you can have a single program that is both a producer and a consumer, as sketched below.
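A minimal sketch of such a program, assuming a local broker and an existing demo-topic (all names are placeholders); it publishes one record as a producer, then reads it back as a consumer in the same process:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerAndConsumerInOneApp {
    public static void main(String[] args) throws Exception {
        Properties pProps = new Properties();
        pProps.put("bootstrap.servers", "localhost:9092");
        pProps.put("key.serializer", StringSerializer.class.getName());
        pProps.put("value.serializer", StringSerializer.class.getName());

        Properties cProps = new Properties();
        cProps.put("bootstrap.servers", "localhost:9092");
        cProps.put("group.id", "demo-group");
        cProps.put("auto.offset.reset", "earliest");
        cProps.put("key.deserializer", StringDeserializer.class.getName());
        cProps.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(pProps);
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps)) {

            // Producer role: publish a record to the broker.
            producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello")).get();

            // Consumer role: pull records back from the same broker.
            // Note: the very first poll may return nothing while the group is still joining.
            consumer.subscribe(Collections.singletonList("demo-topic"));
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                System.out.printf("read back: %s=%s%n", record.key(), record.value());
            }
        }
    }
}
```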

How to enable Kafka sink connector to insert data from topics to tables as and when sink is up

I have developed a Kafka sink connector (using confluent-oss-3.2.0-2.11 and the Connect framework) for my data store (Amppol ADS), which stores data from Kafka topics into corresponding tables in my store.
Everything works as expected as long as the Kafka servers and the ADS servers are up and running.
I need help/suggestions for a specific use case where events are being ingested into Kafka topics while the underlying sink component (ADS) is down.
The expectation is that whenever the sink servers come back up, records that were ingested earlier into the Kafka topics should be inserted into the tables.
Kindly advise how to handle such a case.
Is there any support available in the Connect framework for this, or at least some references that would help?
Sink connector offsets are maintained in the __consumer_offsets topic on Kafka under your connector's consumer group (named after the connector), and when the sink connector restarts it will pick up messages from the previous offset it had committed to that topic.
So you don't have to worry about managing offsets; it is all done by the workers in the Connect framework. In your scenario, just restart your sink connector. As long as the messages are in Kafka (for example, pushed there by your source connector), the sink connector can be started or restarted at any time and will resume from where it left off.
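If you also want a running connector to ride out an ADS outage without losing data, the Connect sink API lets a task signal a transient failure so that the worker retries the batch and does not commit offsets past unwritten records. A rough sketch (the ADS write itself is only a placeholder, not your actual client code):

```java
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class AdsSinkTask extends SinkTask {

    @Override
    public void start(Map<String, String> props) {
        // Open the connection to the ADS store here (omitted).
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            try {
                writeToAds(record);
            } catch (Exception e) {
                // Throwing RetriableException makes the Connect worker retry this
                // batch later without committing its offsets, so nothing is lost
                // while the store is down.
                throw new RetriableException("ADS store unavailable, retrying batch", e);
            }
        }
    }

    // Placeholder for the actual insert into the ADS table for record.topic().
    private void writeToAds(SinkRecord record) throws Exception {
        // e.g. INSERT INTO <table for record.topic()> VALUES (record.value())
    }

    @Override
    public void stop() {
        // Close the ADS connection (omitted).
    }

    @Override
    public String version() {
        return "0.1";
    }
}
```

The worker keeps redelivering the same batch until put() succeeds, so once ADS comes back the backlog in the topics is drained into the tables.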

Kafka source vs Avro source for reading and writing data into a Kafka channel using Flume

In Flume, I have a Kafka channel from which I can read and write data.
What is the difference in read and write performance for the Kafka channel if I replace the Kafka source and Kafka sink with an Avro source and an Avro sink?
In my opinion, by replacing the Kafka source with an Avro source, I will be unable to read data in parallel from multiple partitions of the Kafka broker, as there is no consumer group specified for an Avro source. Please correct me if I am wrong.
In Flume, the Avro RPC source binds to a specified TCP port of a network interface, so only one Avro source of one of the Flume agents running on a single machine can ever receive events sent to this port.
Avro source is meant to connect two or more Flume agents together: one or more Avro sinks connect to a single Avro source.
As you point out, using Kafka as a source allows for events to be received by several consumer groups. However, my experience with Flume 1.6.0 is that it is faster to push events from one Flume agent to another on a remote host through Avro RPC rather than through Kafka.
So I ended up with the following setup for log data collection:
[Flume agent on remote collected node] =Avro RPC=> [Flume agent in central cluster] =Kafka=> [multiple consumer groups in central cluster]
This way, I got better log ingestion and processing throughput, and I could also encrypt and compress log data between the remote sites and the central cluster. This may change, however, when Flume adds support for the new protocol introduced by Kafka 0.9.0, possibly making Kafka more usable as the front interface of the central cluster for remote data collection nodes.