Read from a single Kafka topic and upsert into multiple Oracle tables - apache-kafka

I'm using Kafka to send data from multiple collections into a single topic using the MongoDB source connector, and to upsert the data into different Oracle tables using the JDBC Sink connector.
In the MongoDB source connector, we are appending the respective collection name to every record so the information can be processed at the sink side based on the collection name.
Is that possible using the JDBC Sink connector? Or can we do this via a Node.js / Spring Boot consumer application that splits the topic's messages and writes them into the different tables?
E.g.: Collection A, Collection B, Collection C - MongoDB source connector
Table A, Table B, Table C - JDBC Sink connector
Collection A's data has to map to Table A, and likewise for the rest.

By default, the JDBC Sink connector only writes to a table named after the topic. You'd need to rename the topic at runtime (e.g. with a Single Message Transform) to write data from one topic to other tables.
Can we do this via a Node.js / Spring Boot consumer application to split the topic messages and write them into different tables?
Sure, you can use Kafka Streams (or Spring Cloud Stream) to branch the data into different topics before the sink connector reads them.
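A minimal Kafka Streams sketch of that branching, assuming the source connector writes JSON strings to a single topic and appends the collection name to each record; every topic, field, and class name below is made up for illustration, not taken from the question:

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

// Sketch: route records from one topic into per-collection topics, so the
// JDBC Sink connector (which maps topic name -> table name) can write each
// collection to its own Oracle table.
public class CollectionSplitter {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "collection-splitter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> all = builder.stream("mongo.all-collections");

        // Dynamic routing: the destination topic name is derived per record from
        // the collection name the source connector appended to the document.
        all.to((key, value, recordContext) -> "TABLE_" + extractCollection(value));

        new KafkaStreams(builder.build(), props).start();
    }

    // Placeholder: a real implementation would parse the JSON value and read
    // the appended collection-name field.
    private static String extractCollection(String value) {
        return "A";
    }
}

Each resulting topic (TABLE_A, TABLE_B, ...) can then be listed in the JDBC Sink connector's topics setting so it upserts into the matching table.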

Related

Is it a good practice to use the existing topic for multiple connectors?

I am using the Debezium PostgreSQL connector to get the users table into a Kafka Topic.
I have a JDBC Sink connector that then reads the data from the topic and pushes it into its own database.
Now, I need a subset of the data for another Microservice Database. So I am planning to write another JDBC Sink Connector.
The question: is it a good practice to use the existing users table topic? If yes, then how can I make sure that the new JDBC connector gets a snapshot of the entire users table?
 
If Debezium snapshotted the table and data hasn't been lost in the topic due to retention, then that's what any sink or other consumer will read.
Each uniquely named sink connector tracks its own offsets on the topic. Nothing bad happens when multiple consumers read the same topic; this is how Kafka is intended to be used.
You may need to ensure consumer.auto.offset.reset=earliest for Connect to read from the start of the topic.
To get a subset of fields, you'll need to use the ReplaceField transform - https://docs.confluent.io/platform/current/connect/transforms/replacefield.html#replacefield
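A rough sketch of what such a second sink connector could look like; the topic name, connection details, and field list are hypothetical, the per-connector consumer override requires the worker's connector.client.config.override.policy to permit it, and older Connect versions use whitelist rather than include for ReplaceField:

name=users-subset-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
topics=dbserver1.public.users
connection.url=jdbc:postgresql://other-db:5432/microservice_db
connection.user=...
connection.password=...
# Read the topic from the beginning so the Debezium snapshot is included
consumer.override.auto.offset.reset=earliest
# Keep only the fields this microservice needs
transforms=subset
transforms.subset.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.subset.include=id,email,created_at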

KSQL query and table storage

I was looking for documentation about where KSQL stores queries and tables. For example, since KSQL was built to work with Kafka, when I create a table from a topic, or when I write a query, where are the tables or the query results stored? More specifically, does KSQL use some kind of pointers to events inside the segments of the topic partitions, or does it duplicate the events when I create a table from a topic, for example?
The queries that have been run, or are active, are persisted back into a Kafka topic.
A SELECT statement has no persistent state; it acts as a consumer.
A CREATE STREAM/TABLE command will create potentially many topics, resulting in duplication, manipulation, and filtering of the input topic out to a given destination topic. For any stateful operations, results would be stored in a RocksDB instance on the KSQL server(s).
Since KSQL is built on Kafka Streams, you can refer to the wiki on Kafka Streams Internal Data Management.
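As a rough KSQL illustration (the stream and column names are invented), a persistent CREATE TABLE ... AS SELECT query writes its derived results to a new output topic, while its aggregation state lives in RocksDB backed by an internal changelog topic:

-- Assumes a stream 'pageviews' is already registered over an existing topic.
-- This query creates a new output topic (PAGEVIEW_COUNTS by default), so the
-- derived data is duplicated out of the input topic; the COUNT state is kept
-- in RocksDB on the KSQL server and backed by an internal changelog topic.
CREATE TABLE pageview_counts AS
  SELECT user_id, COUNT(*) AS views
  FROM pageviews
  GROUP BY user_id;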

How to set up multiple Kafka JDBC sink connectors for a single topic

I want to stream data from a particular Kafka topic into two distinct databases (MySQL and SQL Server). Every record should be written to tables in both databases. What sink connector configuration is required to achieve this?
Create two JDBC Sink connectors, using the same source topic. They'll function independently, and each will send the messages from the specified topic to its target RDBMS.
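For example (the connector names, topic, and connection details below are placeholders), the two configurations differ essentially only in their JDBC connection settings:

# mysql-sink
name=jdbc-sink-mysql
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
topics=my-topic
connection.url=jdbc:mysql://mysql-host:3306/mydb
connection.user=...
connection.password=...
auto.create=true

# sqlserver-sink
name=jdbc-sink-sqlserver
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
topics=my-topic
connection.url=jdbc:sqlserver://mssql-host:1433;databaseName=mydb
connection.user=...
connection.password=...
auto.create=true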

Using Kafka Streams to create a table based on Elasticsearch events

Is it possible to use Kafka Streams to create a pipeline that reads JSON from a Kafka topic, applies some logic to it, and sends the results to another Kafka topic or somewhere else?
For example, I populate my topic using logs from Elasticsearch. That is pretty easy using a simple Logstash pipeline.
Once I have my logs in the Kafka topic, I want to extract some pieces of information from each log and put them into a sort of "table" with N columns (is Kafka capable of this?), and then put the table somewhere else (another topic or a DB).
I didn't find any example that satisfies my criteria.
thanks
Yes, it's possible.
There is no concept of columns in Kafka or Kafka Streams. However, you typically just define a plain old Java object of your choice, with the fields that you want (fields being the equivalent of columns in this case). You produce the output in that format to an output topic (using an appropriately chosen serializer). Finally, if you want to store the result in a relational database, you map the fields into columns, typically using a Kafka Connect JDBC Sink:
http://docs.confluent.io/current/connect/connect-jdbc/docs/sink_connector.html
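A small Kafka Streams sketch of that approach, under assumed topic and field names (every name here is hypothetical, and Jackson is used for the JSON handling): read the JSON log events, keep a few "columns", and write the reduced records to an output topic that a JDBC sink or another consumer could persist.

import java.util.Properties;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

// Sketch: project a handful of fields out of each JSON log event and forward
// the reduced record to an output topic.
public class LogFieldExtractor {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "log-field-extractor");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("es-logs")
               .mapValues(LogFieldExtractor::extractFields)
               .to("log-summary");

        new KafkaStreams(builder.build(), props).start();
    }

    // The "columns" we keep; the field names are invented for the example.
    private static String extractFields(String json) {
        try {
            JsonNode log = MAPPER.readTree(json);
            ObjectNode row = MAPPER.createObjectNode();
            row.put("timestamp", log.path("@timestamp").asText());
            row.put("level", log.path("level").asText());
            row.put("message", log.path("message").asText());
            return MAPPER.writeValueAsString(row);
        } catch (Exception e) {
            // A real application would route unparseable records to a dead-letter topic.
            return null;
        }
    }
}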

Kafka Sink Connect - How to create one task per topic (table)

We have implemented a Kafka sink connector for our product, Ampool ADS, which ingests data from Kafka topics into corresponding Ampool tables. Topics and tables are mapped by their names.
I need to handle each individual topic (ingestion from topic -> table) in a dedicated sink task.
So, for example, if my config contains 3 different topics (topic1, topic2, topic3), the sink connector should create 3 different sink tasks, each doing dedicated (per-table) ingestion into its respective/mapped table in parallel.
NOTE: The reason for handling an individual topic in a dedicated sink task is that it is simpler to use the RetriableException mechanism if a specific table is offline or not yet created; only that individual topic's/table's records will get replayed after the configured time interval.
Is this possible with the Kafka Connect framework, and if so, how?
If you set the number of tasks equal to the number of partitions (and I think you can do this from the connector code, when creating the task configurations), then each task will get exactly one partition.
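As a sketch of that suggestion (the connector name, class, and topics are hypothetical placeholders):

# With three single-partition topics, tasks.max=3 asks the framework to start
# three sink tasks, so the partitions can be spread one per task.
name=ampool-sink
connector.class=io.ampool.kafka.connect.AmpoolSinkConnector
topics=topic1,topic2,topic3
tasks.max=3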