Oracle GoldenGate for Big Data Kafka adapter grouping data to Kafka - apache-kafka

Source: Oracle Database
Target: Kafka
I am moving data from source to target with the Oracle GoldenGate for Big Data Kafka adapter. The data moves fine, but when I insert 5 records they arrive as one message in the topic.
I want them split up: if I make 5 inserts, I need five separate entries in the Kafka topic.
Kafka handler, GoldenGate for Big Data version 12.3.1.
I insert five records in the source, and in Kafka I get all the inserts together, like below:
{"table":"MYSCHEMATOPIC.ELASTIC_TEST","op_type":"I","op_ts":"2017-10-24 08:52:01.000000","current_ts":"2017-10-24T12:52:04.960000","pos":"00000000030000001263","after":{"TEST_ID":2,"TEST_NAME":"Francis","TEST_NAME_AR":"Francis"}}
{"table":"MYSCHEMATOPIC.ELASTIC_TEST","op_type":"I","op_ts":"2017-10-24 08:52:01.000000","current_ts":"2017-10-24T12:52:04.961000","pos":"00000000030000001437","after":{"TEST_ID":3,"TEST_NAME":"Ashfak","TEST_NAME_AR":"Ashfak"}}
{"table":"MYSCHEMATOPIC.ELASTIC_TEST","op_type":"U","op_ts":"2017-10-24 08:55:04.000000","current_ts":"2017-10-24T12:55:07.252000","pos":"00000000030000001734","before":{"TEST_ID":null,"TEST_NAME":"Francis"},"after":{"TEST_ID":null,"TEST_NAME":"updatefrancis"}}
{"table":"MYSCHEMATOPIC.ELASTIC_TEST","op_type":"D","op_ts":"2017-10-24 08:56:11.000000","current_ts":"2017-10-24T12:56:14.365000","pos":"00000000030000001865","before":{"TEST_ID":2}}
{"table":"MYSCHEMATOPIC.ELASTIC_TEST","op_type":"U","op_ts":"2017-10-24 08:57:43.000000","current_ts":"2017-10-24T12:57:45.817000","pos":"00000000030000002152","before":{"TEST_ID":3},"after":{"TEST_ID":4}}

I would recommend using the Kafka Connect Handler, since it then registers the data's schema with the Confluent Schema Registry, making it much easier to stream onwards to targets such as Elasticsearch (using Kafka Connect).
In Kafka each record from Oracle will be one Kafka message.

I made the change below in the .props file:
gg.handler.kafkahandler.mode=op
And it worked!
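For reference, a minimal consumer sketch that prints one line per Kafka message; with gg.handler.kafkahandler.mode=op, each source insert/update/delete should show up as its own line. The topic name and broker address are assumptions.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class OpModeCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "op-mode-check");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("ELASTIC_TEST")); // topic name is an assumption
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // in op mode, each insert/update/delete on the source arrives as its own message
                    System.out.printf("offset %d: %s%n", record.offset(), record.value());
                }
            }
        }
    }
}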

Related

Streaming database (MariaDB) changes into another database table using apache-kafka and the Debezium connector

What I am aiming to do is stream data changes into a new database table using apache-kafka along with debezium-connectors, but I don't have the slightest idea how to achieve it. I know how to start Kafka and ZooKeeper, create topics, and subscribe to a topic, but I am unfamiliar with all the next steps. How do I achieve data streaming and capture that data into a new database table using Change Data Capture (CDC)?
Debezium only sources data into Kafka; it won't read from Kafka or write to a new database.
You can refer to an older blog post of theirs that uses the JDBC Sink Kafka Connector to write to a new server:
https://debezium.io/blog/2017/09/25/streaming-to-another-database/
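For illustration only, here is a rough sketch of what the sink side of that pipeline does if you hand-roll it instead of using the JDBC Sink connector from the blog post (the connector remains the recommended path). The topic name, JDBC URL, target table, and column names are all assumptions, and upserts/deletes are skipped for brevity.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ChangesToNewTable {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "cdc-to-new-table");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        ObjectMapper mapper = new ObjectMapper();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             // target JDBC URL and credentials are assumptions
             Connection db = DriverManager.getConnection("jdbc:mariadb://localhost:3306/target", "user", "pass")) {
            consumer.subscribe(Collections.singletonList("dbserver1.mydb.customers")); // Debezium topic name is an assumption
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    if (record.value() == null) continue;                  // tombstone for a deleted key
                    JsonNode root = mapper.readTree(record.value());
                    JsonNode event = root.has("payload") ? root.get("payload") : root; // depends on converter settings
                    JsonNode after = event.path("after");
                    if (after.isMissingNode() || after.isNull()) continue; // delete events skipped in this sketch
                    try (PreparedStatement ps = db.prepareStatement(
                            "INSERT INTO customers_copy (id, name) VALUES (?, ?)")) {
                        ps.setLong(1, after.path("id").asLong());
                        ps.setString(2, after.path("name").asText());
                        ps.executeUpdate();
                    }
                }
            }
        }
    }
}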

How do we check the number of records loaded so far onto the DB from a Kafka topic?

I'm trying to load data from a Kafka topic into Postgres using the JDBC sink connector. How do we know the number of records loaded so far into Postgres? As of now I keep checking the number of records in the DB using a SQL query. Is there any other way to know?
Kafka Connect doesn't track this. I see nothing wrong with SELECT COUNT(*) on the table; however, this doesn't exclude other processes writing to that table as well.
It is not possible in Kafka itself, because once you have sinked the records into the target DB, Kafka has already done its job. But you can track the number of records you are writing by counting the SinkRecord collections handed to the sink task, writing the running total to a local file or into a Kafka state store.
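A minimal sketch of that second suggestion: counting the SinkRecord collections passed to a custom sink task's put() callback. Note this counts what Connect hands to the task, not what has already been committed to the DB; the class name is hypothetical and the actual JDBC write logic is omitted.

import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

import java.util.Collection;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class CountingSinkTask extends SinkTask {
    private final AtomicLong delivered = new AtomicLong();

    @Override
    public void start(Map<String, String> props) {
        // configuration of the real write path would happen here; omitted in this sketch
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        long total = delivered.addAndGet(records.size());
        // expose the running total somewhere observable: a log line, a local file, or a metrics gauge
        System.out.printf("records handed to the sink so far: %d%n", total);
        // ... forward the records to the actual JDBC write logic ...
    }

    @Override
    public void stop() {
    }

    @Override
    public String version() {
        return "0.1";
    }
}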

How can I publish and consume changes from thousands of tables in an RDBMS with Apache NiFi

You have an RDBMS with thousands of tables (~9,000). Changes in these tables should be published to other consumers through Apache Kafka. I know Apache NiFi as a tool for routing and transforming data from nearly any source, and I picture a graphical interface where I can select a table/object from the source system to publish its records to Apache Kafka. But what would Change Data Capture of so many tables look like in Apache NiFi, and also within Apache Kafka? Would I have to create ~9,000 processors (one for each table) and publish each table to a dedicated topic? Or is there a more elegant way to do this?

Streaming database data to Kafka topic without using a connector

I have a use case where I have to push all my MySQL database data to a Kafka topic. Now, I know I can get this up and running using a Kafka connector, but I want to understand how it all works internally without using a connector. In my Spring Boot project I have already created a Kafka producer class where I set all my configuration, create a producer record, and so on.
Has anyone tried this approach before? Can anyone throw some light on this?
Create entities for the tables using Spring JPA and send the data to the topic using findAll(). Use a scheduler for fetching the data and sending it to the topic. You can add your own logic for fetching from the DB and separate logic for sending to the Kafka topic: fetch using an auto-increment column, fetch using a last-updated timestamp, or do a bulk fetch. The same logic as the JDBC connector's can be implemented.
Kafka Connect will do it in an optimized way.
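A minimal sketch of the scheduled poll-and-publish approach described above, assuming spring-kafka's KafkaTemplate and @EnableScheduling on the application class. The Customer entity, repository, column names, and topic name are all hypothetical.

import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.util.List;

// Hypothetical entity mapped to the MySQL table being pushed to Kafka.
@Entity
class Customer {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String name;

    public Long getId() { return id; }
    public String getName() { return name; }
}

// Spring Data derives the incremental query from the method name.
interface CustomerRepository extends JpaRepository<Customer, Long> {
    List<Customer> findByIdGreaterThanOrderByIdAsc(Long id);
}

// Scheduled poll-and-publish: the "same logic as the JDBC connector" idea above.
@Component
class CustomerPublisher {
    private final CustomerRepository repository;
    private final KafkaTemplate<String, Customer> kafka; // JSON value serializer configured in application properties
    private Long lastSeenId = 0L;                         // incrementing-column mode; a last-updated timestamp works too

    CustomerPublisher(CustomerRepository repository, KafkaTemplate<String, Customer> kafka) {
        this.repository = repository;
        this.kafka = kafka;
    }

    @Scheduled(fixedDelay = 10_000) // poll every 10 seconds
    public void publishNewRows() {
        for (Customer c : repository.findByIdGreaterThanOrderByIdAsc(lastSeenId)) {
            kafka.send("customers-topic", String.valueOf(c.getId()), c); // topic name is an assumption
            lastSeenId = c.getId();
        }
    }
}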

Build a solution for Kafka+Spark for RDBMS data

My current project is on mainframes with DB2 as its database. We have 70 databases with nearly 60 tables in each of them. Our architect proposed a plan of using Kafka with Spark Streaming for processing data. How good is Kafka at reading RDBMS tables for data? Do we read the data directly from the tables using Kafka, or is there another way to get the data from the RDBMS into Kafka?
If there is a better solution, your suggestions would help a lot.
Do not read directly from the database; it will create additional load. I would suggest two approaches:
Send new data both to the database and to Kafka, or send it to Kafka first and then consume it for processing.
Read the data from the database write-ahead log (I know this is possible for MySQL with Maxwell, but I am not sure about DB2) and send it to Kafka for further processing.
You can use Spark Streaming or Kafka Streams depending on your needs.
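As one possible shape of the "land the changes in Kafka, then process them with Spark" leg, here is a minimal Spark Structured Streaming sketch (Java API) that reads the change topic. The topic name, broker address, and checkpoint path are assumptions; replace the console sink with whatever processing or target the pipeline needs.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class Db2ChangesFromKafka {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("db2-changes-from-kafka")
                .getOrCreate();

        // Each Kafka message is one change event published by the upstream CDC/producer step.
        Dataset<Row> changes = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")   // assumption
                .option("subscribe", "db2.changes")                  // assumption
                .load()
                .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");

        // Console sink for demonstration; a real job would aggregate, join, or write to a target store.
        StreamingQuery query = changes.writeStream()
                .format("console")
                .option("checkpointLocation", "/tmp/db2-changes-checkpoint")
                .start();

        query.awaitTermination();
    }
}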