How to enable Kafka sink connector to insert data from topics to tables as and when sink is up - apache-kafka

I have developed a Kafka sink connector (using confluent-oss-3.2.0-2.11 and the Connect framework) for my data store (Amppol ADS); it writes data from Kafka topics to the corresponding tables in the store.
Everything works as expected as long as the Kafka servers and ADS servers are up and running.
I need help or suggestions for a specific use case: events are being ingested into Kafka topics while the underlying sink component (ADS) is down.
The expectation is that whenever the sink servers come back up, the records that were ingested into the Kafka topics earlier are inserted into the tables.
Kindly advise how to handle such a case.
Is there any support for this in the Connect framework? At least some references would be a great help.

Sink connector offsets are maintained in the __consumer_offsets topic on Kafka under the connector's consumer group (named connect-<connector name> by default), and when the sink connector restarts it resumes reading from the last offset committed there.
So you don't have to worry about managing offsets; it is all done by the workers in the Connect framework. In your scenario, just restart your sink connector. As long as the messages pushed to Kafka by your source connector are still available in Kafka (that is, within the topic's retention), the sink connector can be started or restarted at any time.
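
The resume-from-committed-offset behaviour can be illustrated with a small in-memory model. This is only a sketch in plain Python, not the Connect API: `Topic`, `SinkStore`, and `run_sink_once` are hypothetical names, and the store simply stands in for ADS. In an actual `SinkTask`, throwing `org.apache.kafka.connect.errors.RetriableException` from `put()` asks the framework to retry the same batch later, which has a similar effect without a manual restart.

```python
class Topic:
    """In-memory stand-in for one Kafka topic partition (an append-only log)."""

    def __init__(self):
        self.log = []

    def append(self, record):
        self.log.append(record)


class SinkStore:
    """Hypothetical stand-in for the downstream store (for example ADS)."""

    def __init__(self):
        self.up = True
        self.rows = []

    def insert(self, record):
        if not self.up:
            raise ConnectionError("sink is down")
        self.rows.append(record)


def run_sink_once(topic, store, committed_offset):
    """Deliver records starting at committed_offset and return the new offset.

    The offset advances only after a successful insert, so a failure leaves
    it pointing at the first record that has not been delivered yet.
    """
    offset = committed_offset
    while offset < len(topic.log):
        try:
            store.insert(topic.log[offset])
        except ConnectionError:
            break  # stop here; the next run retries from the same offset
        offset += 1
    return offset


topic = Topic()
store = SinkStore()
committed = 0

topic.append("event-1")
committed = run_sink_once(topic, store, committed)  # delivered while the sink is up

store.up = False
topic.append("event-2")
topic.append("event-3")
committed = run_sink_once(topic, store, committed)  # nothing delivered, offset unchanged

store.up = True
committed = run_sink_once(topic, store, committed)  # the backlog is drained
print(store.rows, committed)
```

The records produced while the store was down stay in the log and are delivered on the first run after it comes back, because the committed offset never moved past them.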

Related

MongoDB Atlas Source Connector Single Topic

I am using the Confluent MongoDB Atlas Source Connector to pull data from a MongoDB collection into Kafka. I have noticed that the connector is creating multiple topics in the Kafka cluster. I need the data to be available on one topic so that the consumer application can consume it from that topic. How can I do this?
Besides, why is the Kafka connector creating so many topics? Isn't it difficult for consumer applications to retrieve the data with that approach?
Kafka Connect creates 3 internal topics for the whole cluster for managing its own workload. You should never need or want external consumers to use these.
In addition to that, connectors can create their own topics. Debezium, for example, creates a "database history topic", and again, this shouldn't be read outside of the Connect framework.
Most connectors only need to create one topic for the source to pull data into, which is the one consumers should actually care about.
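
If the extra topics turn out to be data topics (for example one per collection) rather than Connect's internal ones, one possible approach is the `RegexRouter` single message transform that ships with Kafka Connect, which renames the destination topic of each record. The sketch below builds such a transform configuration and models its matching rule in plain Python. The topic names, the `singleTopic` alias, and the regex are assumptions for illustration, and the `route` helper is only a model of the transform, not the connector's code.

```python
import json
import re

# Transform settings to merge into the source connector's configuration.
# The alias "singleTopic" and the topic names are hypothetical.
router_config = {
    "transforms": "singleTopic",
    "transforms.singleTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.singleTopic.regex": r"mongo\.mydb\..*",
    "transforms.singleTopic.replacement": "mongo.mydb.all",
}


def route(topic, regex, replacement):
    """Model of RegexRouter: rename only when the whole topic name matches.

    This model ignores group references in the replacement, which the
    example configuration above does not use.
    """
    if re.fullmatch(regex, topic):
        return replacement
    return topic


regex = router_config["transforms.singleTopic.regex"]
replacement = router_config["transforms.singleTopic.replacement"]

for name in ["mongo.mydb.orders", "mongo.mydb.customers", "connect-offsets"]:
    print(name, "->", route(name, regex, replacement))

print(json.dumps(router_config, indent=2))
```

Both data topics map to the same destination, while a topic that does not match the pattern keeps its name.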

How to make a Data Pipeline from MQTT to KAFKA Broker to MongoDB?

How can I make a data pipeline? I am sending data from MQTT to a Kafka topic using a source connector, and on the other side I have connected the Kafka broker to MongoDB using a sink connector. I am having trouble making a data pipeline that goes from MQTT to Kafka and then to MongoDB. Both connectors work properly individually. How can I integrate them?
[Screenshots omitted. Captions: "MQTT Connector"; "Node 1 MQTT Connector"; "Message Published from MQTT"; "Kafka Consumer"; "Node 2 MongoDB Connector"; "MongoDB"; "MongoDB Connector"]
It is hard to tell exactly what the problem is without more logs. Please provide your connect.config as well, and check the /status of your connectors. I still do not understand what issue you are facing: you say that the MQTT source connector sends messages to the Kafka topic successfully, and that your MongoDB sink connector reads this Kafka topic and writes to your MongoDB successfully. That is your pipeline, so where is the error? Are both connectors using the same Kafka, or separate Kafka clusters? Both seem to be on localhost, but is it the same machine?
Please elaborate and explain what you are expecting. What does "pipeline" mean in your words?
Both connectors need to share the same Kafka cluster. What do node1 and node2 mean? Are they separate Kafka instances? Your connectors need to connect to the same Kafka "node"/cluster in order to share the data in the Kafka topic, with one connector writing to it and the other reading from it. Share your bootstrap server parameters, and the server.properties of your Kafka as well.
In order to run two different Connect clusters against the same Kafka, you need to set different internal topics for each Connect cluster:
config.storage.topic
offset.storage.topic
status.storage.topic
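
As a rough illustration, the snippet below generates the relevant distributed worker settings for two Connect clusters that share one Kafka cluster and checks that none of the isolating settings collide. The cluster names and the `localhost:9092` address are assumptions, and `group.id` is included because separate Connect clusters also need distinct group ids; the helper functions are hypothetical.

```python
ISOLATED_KEYS = [
    "group.id",
    "config.storage.topic",
    "offset.storage.topic",
    "status.storage.topic",
]


def worker_config(cluster_name, bootstrap_servers):
    """Build the settings that must differ per Connect cluster."""
    return {
        "bootstrap.servers": bootstrap_servers,  # same Kafka for both clusters
        "group.id": cluster_name,
        "config.storage.topic": f"{cluster_name}-configs",
        "offset.storage.topic": f"{cluster_name}-offsets",
        "status.storage.topic": f"{cluster_name}-status",
    }


def to_properties(config):
    """Render the settings in key=value form, as used in a worker properties file."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


def shared_settings(a, b):
    """Return the isolating keys whose values collide between two workers."""
    return [key for key in ISOLATED_KEYS if a[key] == b[key]]


# Hypothetical cluster names and broker address.
mqtt_cluster = worker_config("connect-mqtt", "localhost:9092")
mongo_cluster = worker_config("connect-mongo", "localhost:9092")

print(to_properties(mqtt_cluster))
print()
print(to_properties(mongo_cluster))
print("collisions:", shared_settings(mqtt_cluster, mongo_cluster))
```

If both connectors run in a single Connect cluster instead, one set of these settings is enough; the separation only matters when two independent Connect clusters use the same brokers.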

Kafka Connect writes data to non-existing topic

Does Kafka Connect create the topic on the fly if it doesn't exist (but is provided as a destination), or does it fail to copy messages to it?
I need to create such topics on the fly, or at least programmatically (Java API), not manually using scripts.
I searched for this information, but it seems topics have to be created before the migration.
Kafka Connect doesn't really control this.
There's a setting in Kafka that enables/disables automatic topic creation.
If it is turned on, the topics Kafka Connect writes to are created on demand; if not, you have to create them yourself.
The setting is auto.create.topics.enable in the Kafka broker configuration file (server.properties). With auto.create.topics.enable=true, which is the default in Apache Kafka although some deployments turn it off, the broker creates a missing topic the first time a client produces to it or requests its metadata.
Once you turn on this feature Kafka will automatically create topics on the fly. When an application tries to connect to a non-existing topic, Kafka will create that topic automatically.
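
Apart from the broker setting, Kafka Connect 2.6 and later can create the topics a source connector writes to, using `topic.creation.*` properties in the connector configuration (the worker setting `topic.creation.enable` must not be disabled). The sketch below only builds the JSON body that would be posted to the Connect REST API's `/connectors` endpoint. The connector name, the connector class, and the partition and replication values are placeholders, not values from the question.

```python
import json

# Hypothetical connector; only the topic.creation.* keys are the point here.
connector = {
    "name": "my-source-connector",
    "config": {
        "connector.class": "com.example.MySourceConnector",  # placeholder class
        "tasks.max": "1",
        # Defaults applied to any topic this source connector needs created.
        "topic.creation.default.partitions": "3",
        "topic.creation.default.replication.factor": "1",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

This applies to source connectors only; a sink connector reads from topics that already exist.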

Kafka exactly once with other destination

I am using Kafka 2, and it looks like exactly-once is possible with:
Kafka Streams
Kafka read/transform/write transactional producer
Kafka connect
Here, all of the above work between topics (the source and the destination are both topics).
Is it possible to have exactly once with other destinations?
The sources and destinations (sinks) of Connect are not only topics, and the connector you use determines the delivery semantics; not all of them are exactly-once.
For example, a JDBC Source Connector polling a database might miss some records.
Sink connectors reading out of Kafka will send every message from a topic, but it's up to the downstream system to acknowledge that it received them.
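
One common way to get effectively-once results in an external destination is to make the writes idempotent, so that a redelivered record overwrites itself instead of creating a duplicate. The sketch below is a plain Python model of that idea, keyed by topic, partition, and offset; `IdempotentSink` is a hypothetical class, not part of any connector.

```python
class IdempotentSink:
    """Hypothetical destination that upserts rows by (topic, partition, offset)."""

    def __init__(self):
        self.rows = {}

    def write(self, topic, partition, offset, value):
        # Writing the same coordinates twice replaces the row rather than adding one.
        self.rows[(topic, partition, offset)] = value


# At-least-once delivery: the record at offset 1 arrives twice after a retry.
deliveries = [
    ("orders", 0, 0, "a"),
    ("orders", 0, 1, "b"),
    ("orders", 0, 1, "b"),
    ("orders", 0, 2, "c"),
]

sink = IdempotentSink()
for topic, partition, offset, value in deliveries:
    sink.write(topic, partition, offset, value)

print(len(sink.rows))  # 3 rows despite 4 deliveries
```

Whether this is possible depends on the destination: it needs some form of keyed upsert or a transaction that stores the offset together with the data.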

Apache Kafka Consumer-Producer Confusion

I know what a Producer and a Consumer are. But the official documentation says:
It is a streaming platform.
It is an enterprise messaging system.
Kafka has connectors which import and export data from databases and other systems.
What does it mean?
I know Producers are client applications which send data to a Kafka broker, and Consumers are client applications which read data from a Kafka broker.
But my question is, can a Consumer push data into a Kafka broker?
As far as I know, if a Consumer wants to push data into a Kafka broker, it becomes a Producer. Is that correct?
1. It is a streaming platform.
It is used to distribute data on a publish-subscribe model, with a storage layer and a processing layer.
2. It is an enterprise messaging system.
Much Big Data infrastructure is open source, yet the big data market is worth approximately $40B per year and keeps growing. A lot of that comes down to hardware. Despite the open-source nature of much of the software, there is a lot of money to be made.
3. Kafka has connectors which import and export data from databases and other systems.
Kafka Connect provides connectors, i.e. source connectors and sink connectors (the JDBC connector is one example). It provides a facility for importing data from sources and exporting it to multiple targets.
Producers: They can only push data to a Kafka broker, or in other words publish data.
Consumers: They can only pull data from the Kafka broker.
A producer produces/puts/publishes messages and a consumer consumes/gets/reads messages.
A consumer can only read; when you want to write, you need a producer. A consumer cannot become a producer.
A producer only pushes data to a Kafka broker.
A consumer only pulls data from a Kafka broker.
However, you can have a program that is both a producer and a consumer.
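
A program that is both can be pictured as a consume-transform-produce loop: it holds a consumer for one topic and a separate producer for another. The following is an in-memory sketch in plain Python, not a Kafka client; `InMemoryBroker`, `Producer`, `Consumer`, and `uppercase_relay` are hypothetical names used only to show the two roles side by side.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy broker: each topic is just an append-only list."""

    def __init__(self):
        self.topics = defaultdict(list)


class Producer:
    """Can only write to the broker."""

    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, value):
        self.broker.topics[topic].append(value)


class Consumer:
    """Can only read from the broker, tracking its own position."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.position = 0

    def poll(self):
        records = self.broker.topics[self.topic][self.position:]
        self.position += len(records)
        return records


def uppercase_relay(consumer, producer, output_topic):
    """One program using both roles: read, transform, write."""
    count = 0
    for value in consumer.poll():
        producer.send(output_topic, value.upper())
        count += 1
    return count


broker = InMemoryBroker()
upstream = Producer(broker)
upstream.send("input", "hello")
upstream.send("input", "kafka")

relayed = uppercase_relay(Consumer(broker, "input"), Producer(broker), "output")
print(relayed, broker.topics["output"])
```

The consumer object never writes and the producer object never reads; the application simply owns one of each.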