We have implemented a Kafka sink connector for our product, Ampool ADS, which ingests data from Kafka topics into the corresponding Ampool tables. Topics and tables are mapped by their names.
I need to handle each individual topic (ingestion from topic -> table) in a dedicated sink task.
So, for example, if my config contains 3 different topics (topic1, topic2, topic3), the sink connector should create 3 different sink tasks, each performing dedicated ingestion into its respective mapped table, in parallel.
NOTE: The reason for handling each individual topic in a dedicated sink task is that it makes the RetriableException mechanism simple to use when a specific table is offline or not yet created: only that individual topic/table's records get replayed after the configured time interval.
Is this possible with the Kafka Connect framework, and if so, how?
If you set the number of tasks equal to the number of partitions (and I think you can do this from the connector code, when creating the task configurations), then each task will get exactly one partition.
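To illustrate the RetriableException behaviour described in the question, a per-topic sink task could look roughly like the sketch below. The class name and the tableExists()/writeToTable() helpers are hypothetical stand-ins for the actual Ampool ADS client calls; throwing RetriableException makes Connect redeliver the same batch to this task only, after the timeout set on the context.

```java
// Minimal sketch of a per-topic sink task (hypothetical class name; the
// tableExists()/writeToTable() helpers stand in for the Ampool ADS client API).
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

import java.util.Collection;
import java.util.Map;

public class PerTopicSinkTask extends SinkTask {
    private String tableName;

    @Override
    public void start(Map<String, String> props) {
        // Assumes one topic per task and a "topic name == table name" convention.
        tableName = props.get("topics");
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        if (records.isEmpty()) {
            return;
        }
        if (!tableExists(tableName)) {
            // Back off for 30s and ask Connect to redeliver the same batch;
            // only this task's topic/table is replayed, the other tasks keep running.
            context.timeout(30_000L);
            throw new RetriableException("Table " + tableName + " is not available yet");
        }
        for (SinkRecord record : records) {
            writeToTable(tableName, record);
        }
    }

    @Override
    public void stop() { }

    @Override
    public String version() {
        return "1.0";
    }

    // Hypothetical placeholders for the actual Ampool ADS client calls.
    private boolean tableExists(String table) { return true; }
    private void writeToTable(String table, SinkRecord record) { }
}
```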
Related
I'm using Kafka to send data from multiple collections into a single topic using the MongoDB source connector, and to upsert the data into different Oracle tables using the JDBC sink connector.
In the MongoDB source connector, we append the respective collection name to every record so that the sink side can process the information based on the collection name.
Is that possible using the JDBC sink connector? Or can we do this via a Node.js/Spring Boot consumer application that splits the topic's messages and writes them into the different tables?
EX: Collection A, Collection B, Collection C - MongoDB source connector
Table A, Table B, Table C - JDBC sink connector
Collection A's data has to map to Table A, and likewise for the rest.
By default, the JDBC sink will only write to a table named after the topic. You'd need to rename the topic at runtime to write data from one topic into other tables.
Can we do this via a Node.js/Spring Boot consumer application to split the topic's messages and write them into different tables?
Sure, you can use Kafka Streams (or Spring Cloud Stream) to branch the data into different topics before the sink connector reads them.
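A minimal plain Kafka Streams sketch of that branching step is below. The topic names, the "collection" field carrying the collection name, and JSON-string values are assumptions; the per-collection target topics must already exist so the JDBC sink can map each one to its table by name.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class CollectionRouter {

    public static void main(String[] args) {
        ObjectMapper mapper = new ObjectMapper();
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("mongo-combined-topic", Consumed.with(Serdes.String(), Serdes.String()))
               // Route every record to a topic named after its collection
               // (e.g. "TABLE_A"), so the JDBC sink writes it to that table.
               .to((key, value, recordContext) -> {
                   try {
                       JsonNode json = mapper.readTree(value);
                       return json.get("collection").asText();
                   } catch (Exception e) {
                       return "unrouted-records"; // fallback topic for unparseable records
                   }
               }, Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "collection-router");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        new KafkaStreams(builder.build(), props).start();
    }
}
```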
I am using the Schema Registry with the RecordNameStrategy naming policy, so I have events with totally different Avro schemas in the same Kafka topic.
I am doing that because I want to group logically related events that may have different data structures under the same topic, to preserve ordering for this data.
For instance:
A user_created event and a user_mail_confirmed event might have different schemas, but it's important to keep them in the same topic partition to guarantee ordering for consumers.
I am trying to sink this data, coming from a single topic, into GCS under multiple paths (one path for each schema).
Does anyone know whether the Confluent Kafka Connect GCS sink connector (or any other connector) provides that feature?
I haven't used the GCS connector, but I suppose this is not possible with Confluent connectors in general.
You should probably copy your source topic, which mixes different data structures, into a new set of topics in which each topic has a common data structure. This is possible with ksqlDB (check an example) or a Kafka Streams application (see the sketch below). Then you can create connectors for these topics.
Alternatively, you can use the RegexRouter transformation with a set of predicates based on the message headers.
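For the Kafka Streams route, a sketch along these lines could fan the mixed topic out to one topic per Avro record type; the topic names, application id, and Schema Registry URL are assumptions, and it relies on Confluent's GenericAvroSerde.

```java
import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class SchemaRouter {

    public static void main(String[] args) {
        // Generic Avro serde: the deserializer looks schemas up by ID, so a topic
        // written with RecordNameStrategy (mixed record types) is fine to read.
        Map<String, String> serdeConfig =
                Collections.singletonMap("schema.registry.url", "http://localhost:8081");
        GenericAvroSerde valueSerde = new GenericAvroSerde();
        valueSerde.configure(serdeConfig, false); // false = this is a value serde

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("user-events", Consumed.with(Serdes.String(), valueSerde))
               // Route each record to a topic named after its Avro record type,
               // e.g. user_created and user_mail_confirmed end up in separate
               // topics, and therefore in separate GCS paths.
               .to((key, value, ctx) -> value.getSchema().getName(),
                   Produced.with(Serdes.String(), valueSerde));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "schema-router");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        new KafkaStreams(builder.build(), props).start();
    }
}
```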
I was looking for documentation about where KSQL stores queries and tables. For example, since KSQL was built to work with Kafka, when I create a table from a topic, or when I write a query, where are the tables or the query results stored? More specifically, does KSQL use some kind of pointers to events inside segments inside the topic partitions, or does it duplicate the events when I create a table from a topic, for example?
The queries that have been run, or are active, are persisted back into a Kafka topic.
A SELECT statement has no persistent state; it acts as a consumer.
A CREATE STREAM/TABLE command will potentially create many topics, resulting in duplication, manipulation, and filtering of the input topic out to a given destination topic. For any stateful operations, results would be stored in a RocksDB instance on the KSQL server(s).
Since KSQL is built on Kafka Streams, you can refer to the wiki on Kafka Streams Internal Data Management.
I want to stream data from a particular Kafka topic into two distinct databases (MySQL and SQL Server). All of the data should be sent to the tables in both databases. What configuration is required in the sink connectors to achieve this?
Create two JDBC sink connectors that use the same source topic. They'll function independently, and each will send the messages from that topic to its target RDBMS.
I have tried to send the data from a Kafka Connect instance running in distributed mode with one worker to a specific topic. I have the topic name in the "archive.properties" file that I use when I launch the instance.
But when I launch five or more instances, I see the messages merged across all the topics.
The "solution" I thought of was to build a map storing the relation between ID and topic, but it didn't work.
Is there a specific Kafka Connect implementation to do this?
Thanks.
First, details on how you are running Connect and which connector you are using would be very helpful.
Some connectors support sending data to more than one topic. For example, the Confluent JDBC source connector will send each table to a separate topic. So this could be a limitation of the connector you are using.
Whether you need to run more than one connector also depends on the connector and your use case. With the JDBC connector, you need one connector per database, and it will handle all the tables. If you run two connectors against the same database and the same tables, you'll get duplicates.
In short, hopefully your connector has helpful documentation.
In the next release of Apache Kafka we are adding Single Message Transformations (SMTs). One of the transformations can modify the target topic based on data in the event, so you can use a transformation to perform event routing.
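As an illustration of the idea (not one of the built-in SMTs), a custom transformation that routes each record to a topic named by a field in its value might look like the sketch below; the class name and the "field" configuration key are hypothetical.

```java
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.transforms.Transformation;

import java.util.Map;

public class FieldBasedRouter<R extends ConnectRecord<R>> implements Transformation<R> {
    private String fieldName;

    @Override
    public void configure(Map<String, ?> configs) {
        // e.g. "transforms.route.field=eventType" in the connector config
        fieldName = (String) configs.get("field");
    }

    @Override
    public R apply(R record) {
        // Derive the destination topic from a field in the record's value and
        // rewrite the record with that topic, leaving everything else intact.
        String destination = ((Struct) record.value()).getString(fieldName);
        return record.newRecord(destination, record.kafkaPartition(),
                record.keySchema(), record.key(),
                record.valueSchema(), record.value(),
                record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef().define("field", ConfigDef.Type.STRING,
                ConfigDef.Importance.HIGH, "Name of the value field that holds the destination topic");
    }

    @Override
    public void close() { }
}
```

It would then be wired into a connector with the usual transforms properties (transforms=route, transforms.route.type set to this class, transforms.route.field set to the chosen field).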