Kafka Connect Confluent JDBC does not control session pool in MSSQL database

I am working with Kafka Connect and the Confluent JDBC connector. I integrated a source connector with MSSQL, and a few days ago the operations team warned us that there is a high number of sessions in the "sleeping" state in the database. I need to control those sessions, but apparently the connector (Confluent JDBC) doesn't expose those properties in its configuration.
Do you have any ideas to correct this problem?

Kafka Connect will run a minimum of one task per connector. Each connector is isolated from the others; beyond sharing a runtime environment, they have nothing in common.
Therefore, if you have 27 connectors sourcing from the same database, you will have a minimum of 27 connections to the database.
If you can't reduce the number of connectors (e.g. by having one connector pull from multiple tables, as sketched below), then the only option I think you have is to speak to your DBA about enforcing some kind of resource management on the RDBMS side. For example, on Oracle the Database Resource Manager can be used for this.
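For illustration, one Confluent JDBC source connector polling several tables might look like the following sketch (the host, credentials, tables, and timestamp column are hypothetical):

```json
{
  "name": "mssql-multi-table-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:sqlserver://mssql-host:1433;databaseName=mydb",
    "connection.user": "kafka_connect",
    "connection.password": "********",
    "mode": "timestamp",
    "timestamp.column.name": "updated_at",
    "table.whitelist": "orders,customers,invoices",
    "topic.prefix": "mssql-",
    "poll.interval.ms": "5000"
  }
}
```

A single connector like this replaces three single-table connectors, which cuts the number of sessions the database sees accordingly.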

Related

Custom Connector for Apache Kafka

I am looking to write a custom connector for Apache Kafka to connect to a SQL database to get CDC data. I would like to write a custom connector so I can connect to multiple databases using one connector, because all the marketplace connectors only offer one database per connector.
First question: Is it possible to connect to multiple databases using one custom connector? Also, in that custom connector, can I define which topics the data should go to?
Second question: Can I write a custom connector in .NET, or does it have to be Java? Is there an example I can look at of a custom CDC connector for a database in .NET?
There are no .NET examples. The Kafka Connect API is Java only, and not specific to Confluent.
Source is here - https://github.com/apache/kafka/tree/trunk/connect
Dependency here - https://search.maven.org/artifact/org.apache.kafka/connect-api
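To give a feel for the Java-only API, here is a minimal source connector skeleton (a sketch only: the class names and version string are illustrative, and real work-splitting and polling logic are omitted):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class MySourceConnector extends SourceConnector {
    private Map<String, String> props;

    @Override public void start(Map<String, String> props) { this.props = props; }

    @Override public Class<? extends Task> taskClass() { return MySourceTask.class; }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Every task gets the same config here; dividing work between
        // tasks is entirely the connector author's responsibility.
        return Collections.nCopies(maxTasks, props);
    }

    @Override public void stop() { }
    @Override public ConfigDef config() { return new ConfigDef(); }
    @Override public String version() { return "0.1.0"; }

    public static class MySourceTask extends SourceTask {
        @Override public void start(Map<String, String> props) { }

        @Override
        public List<SourceRecord> poll() throws InterruptedException {
            // Query the external system here and return SourceRecords;
            // each SourceRecord names the topic it should be written to,
            // which is how a connector decides where data goes.
            Thread.sleep(1000);
            return null; // null means "nothing new yet"
        }

        @Override public void stop() { }
        @Override public String version() { return "0.1.0"; }
    }
}
```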
looking to write a custom connector ... to connect to SQL database to get CDC data
You could extend or contribute to Debezium, if you really wanted this feature.
connect to multiple databases using one custom connector
If you mean database servers, then not really, no. Your URL would have to be unique per connector task, and there isn't an API to map a task number to a config value. If you mean one server and multiple database schemas, then I also don't think it is really possible to properly "distribute" those within a single connector with multiple tasks (which is why the database.names config in Debezium currently supports only one name).
We explored Debezium, but it won't work for us: we have a microservices architecture with more than 1000 databases for many clients, and Debezium creates one topic for each table, which means it is going to be a massive architecture.
Kafka can handle thousands of topics fine. If you run the connector processes in Kubernetes, as an example, then they're centrally deployable, scalable, and configurable from there.
However, I still have concerns over you needing all databases to capture CDC events.
It was also previously suggested to use Maxwell.

Kafka Connect instead of Flume Ingestion

I have been looking into the concepts and applications of Kafka Connect, and I even worked on one project based on it during an internship. In my current job, I am considering replacing the architecture of our real-time data ingestion platform, which is currently based on Flume -> Kafka, with Kafka Connect and Kafka.
The reasons I am considering the switch are mainly these:
If we use Flume, we need to install an agent on each remote machine, which generates a lot of ongoing devops workload, especially where I work: access to machines is managed rigidly, so maintaining utilities on machines belonging to other departments is difficult.
Another reason is that the machines' OS environments vary. If we install Flume on a variety of machines, some with different OSes and JDKs (I have met some with the IBM JDK), Flume just cannot work well, which in the worst case can result in zero data ingestion.
It looks like Kafka Connect can be deployed in a centralized way alongside our Kafka cluster, so the devops cost can go down. Besides, we can avoid installing Flume on machines belonging to others and avoid the risk of incompatible environments, ensuring stable ingestion of data from every remote machine.
Besides, the main ingestion scenario is only to ingest log text files written in real time on remote machines (on Linux and Unix file systems) into Kafka topics, that is it. So I won't need advanced connectors that are not supported in the Apache version of Kafka.
But I am not sure if I am understanding the usage or scenario of Kafka Connect the right way. I am also wondering whether Kafka Connect should be deployed on the same machine as the data sources, or whether it is OK for them to reside on different machines. If they can be different, then why does Flume require the agent to run on the same machine as the data source? I hope someone more experienced can shed some light on this.
Is Kafka Connect appropriate for ingesting data into Kafka? Yes.
Does Kafka Connect run local to the data source? Only if it has to (e.g. reading a local file with the Kafka Connect spooldir plugin, the FilePulse plugin, etc.); see the config sketch below.
Should you rip out something that works and replace it with Kafka Connect? Not unless it's fixing a problem that you have.
If you're not using either yet, should you use Kafka Connect instead of Flume? Quite possibly.
Learn more about Kafka Connect here: https://dev.to/rmoff/crunchconf-2019-from-zero-to-hero-with-kafka-connect-81o
For file ingest alone there are other tools too, like Filebeat.
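As a sketch of the centralized, file-based case mentioned above, a spooldir source connector reading line-delimited log files might be configured like this (assuming the kafka-connect-spooldir plugin is installed; the paths, file pattern, topic, and connector name below are hypothetical):

```json
{
  "name": "log-file-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirLineDelimitedSourceConnector",
    "topic": "server-logs",
    "input.path": "/var/log/app/incoming",
    "finished.path": "/var/log/app/finished",
    "error.path": "/var/log/app/error",
    "input.file.pattern": ".*\\.log"
  }
}
```

Note that spooldir reads from the Connect worker's local file system, so files written on remote machines would still need to reach the worker (e.g. via a mounted share) or be shipped by a tool like Filebeat.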

What are the proper properties for Kafka connector when using Oracle DB

I am learning about Kafka Connect and would like to use Oracle as my database.
I am having trouble with the properties.
Is there any setting/property that I am missing in order to fix this error?
According to the docs:
The JDBC source and sink connectors use the Java Database Connectivity (JDBC) API that enables applications to connect to and use a wide range of database systems. In order for this to work, the connectors must have a JDBC Driver for the particular database systems you will use.
So in your case you have to install the Oracle Database JDBC driver.
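For example, once the driver JAR (e.g. ojdbc8.jar) is placed in the connector's plugin path, a JDBC source config might look like the following sketch (the host, service name, credentials, table, and column below are hypothetical):

```json
{
  "name": "oracle-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1",
    "connection.user": "kafka_connect",
    "connection.password": "********",
    "mode": "incrementing",
    "incrementing.column.name": "ID",
    "table.whitelist": "MY_TABLE",
    "topic.prefix": "oracle-"
  }
}
```

Without the driver on the classpath, JDBC typically fails with a "No suitable driver" error from java.sql.DriverManager.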

Is there a way to connect to multiple databases in multiple hosts using Kafka Connect?

I need to get data from an Informix database using Kafka Connect. The scenario is this: I have 50 Informix databases residing on 50 hosts. What I have understood from reading about Kafka Connect is that we need to install Kafka Connect on each host to get the data from the database residing on that host. My question is this: is there a way I can create the connectors centrally for these 50 hosts, instead of installing onto each of them, and pull data from the databases?
Kafka Connect JDBC does not have to run on the database host, just as other JDBC clients don't, so you can have a Kafka Connect cluster that is larger or smaller than your database pool.
Informix seems to have a thing called "CDC Replication Engine for Kafka", however, which might be worth looking into, as CDC overall causes less load on the database.
You don't need any additional software installed on the system where the Informix server is running. I am not fully clear about the question or the type of operation you plan to do. If you are planning to set up a real-time replication type of scenario, then you may have to invoke the CDC API; a one-time setup of the CDC API at the server is needed, and then these APIs can be invoked using any Informix database driver. If you plan to read existing data from table(s) and pump it into a Kafka topic, then no additional setup is needed at the server side: you could connect to all 50 database servers from a single program (remotely) and then pump those records to the Kafka topic(s), as sketched below. Based on the programming language you are using, you can choose an Informix database driver.
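As an illustration of that last approach, a single Java program could read from each remote database over JDBC and produce into Kafka (a sketch, assuming the Informix JDBC driver and kafka-clients are on the classpath; the hosts, credentials, table, and topic names are hypothetical, and error handling is omitted):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class InformixToKafka {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // hypothetical broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Hypothetical list of remote Informix servers; in practice this
        // would likely come from a config file covering all 50 hosts.
        List<String> jdbcUrls = List.of(
            "jdbc:informix-sqli://host1:9088/mydb:INFORMIXSERVER=srv1",
            "jdbc:informix-sqli://host2:9088/mydb:INFORMIXSERVER=srv2");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String url : jdbcUrls) {
                try (Connection conn = DriverManager.getConnection(url, "user", "password");
                     Statement stmt = conn.createStatement();
                     ResultSet rs = stmt.executeQuery("SELECT id, payload FROM events")) {
                    while (rs.next()) {
                        // One Kafka record per database row, keyed by row id.
                        producer.send(new ProducerRecord<>(
                            "informix-events", rs.getString("id"), rs.getString("payload")));
                    }
                }
            }
        }
    }
}
```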

CouchDB changes to Apache Kafka

I want to have all of the changes to a CouchDB database in Kafka at application run time, as they arrive. Is there any reliable existing tool for that?
You may try the Kafka Connect tool. Also, Confluent Platform provides a long list of different connectors for Kafka Connect.
I'm not a CouchDB user, but you may choose one of the applicable source connectors here, or create your own Kafka CouchDB source connector.
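If no ready-made connector fits, a small standalone bridge over CouchDB's continuous _changes feed is one option. A minimal sketch in Java (the broker address, database name, and topic are hypothetical; error handling and checkpointing of the "since" sequence are omitted):

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CouchDbChangesToKafka {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // hypothetical broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // CouchDB's continuous _changes feed emits one JSON change per line.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://couchdb:5984/mydb/_changes?feed=continuous&include_docs=true&since=now"))
            .build();

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            HttpResponse<InputStream> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofInputStream());
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(response.body()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (!line.isBlank()) {
                        // Forward each change event verbatim to a Kafka topic.
                        producer.send(new ProducerRecord<>("couchdb-changes", line));
                    }
                }
            }
        }
    }
}
```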