Not able to use Kafka's JdbcSourceConnector to read data from Oracle DB into a Kafka topic - apache-kafka

I am trying to write a standalone Java program using the Kafka Connect JDBC API to stream data from an Oracle table to a Kafka topic.
API used: I'm currently trying to use Kafka Connect, the JdbcSourceConnector class to be precise.
Constraint: use the Confluent Java API rather than the CLI or the provided shell scripts.
What I did: created an instance of the JdbcSourceConnector class and called its start(Map) method, passing a configuration map as a parameter. This map holds the database connection properties, table whitelist, topic prefix, etc.
After starting it, I'm unable to read any data from the "topic-prefix-tablename" topic. I am not sure how to pass the Kafka broker details to JdbcSourceConnector. Calling start() on JdbcSourceConnector starts a thread, but it doesn't appear to do anything.
Is there a simple Java API tutorial page or example code I can refer to? All the examples I see use the CLI or shell scripts.
Any help is appreciated.
Code:
public static void main(String[] args) {
    Map<String, String> jdbcConnectorConfig = new HashMap<String, String>();
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.CONNECTION_URL_CONFIG, "<DATABASE_URL>");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.CONNECTION_USER_CONFIG, "<DATABASE_USER>");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.CONNECTION_PASSWORD_CONFIG, "<DATABASE_PASSWORD>");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.POLL_INTERVAL_MS_CONFIG, "300000");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.BATCH_MAX_ROWS_CONFIG, "10");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.MODE_CONFIG, "timestamp");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.TABLE_WHITELIST_CONFIG, "<TABLE_NAME>");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.TIMESTAMP_COLUMN_NAME_CONFIG, "<TABLE_COLUMN_NAME>");
    jdbcConnectorConfig.put(JdbcSourceConnectorConfig.TOPIC_PREFIX_CONFIG, "test-oracle-jdbc-");

    JdbcSourceConnector jdbcSourceConnector = new JdbcSourceConnector();
    jdbcSourceConnector.start(jdbcConnectorConfig);
}

Assuming you are trying to do this in standalone mode:
In your application's run configuration, the main class should be "org.apache.kafka.connect.cli.ConnectStandalone", and you need to pass two property files as program arguments.
Your custom JdbcSourceConnector class should also extend the "org.apache.kafka.connect.source.SourceConnector" class.
Main Class: org.apache.kafka.connect.cli.ConnectStandalone
Program Arguments: .\path-to-config\connect-standalone.conf .\path-to-config\connector.properties
The "connect-standalone.conf" file contains all the Kafka broker details.
// Example connect-standalone.conf
bootstrap.servers=<comma separated broker list here>
group.id=some_local_group_id
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=connect.offset
offset.flush.interval.ms=100
offset.flush.timeout.ms=180000
buffer.memory=67108864
batch.size=128000
producer.acks=1
"connector.properties" file will contain all details required to create and start connector
// Example connector.properties
name=some-local-connector-name
connector.class=your-custom-JdbcSourceConnector
tasks.max=3
topic=output-topic
fetchsize=10000
More info here : https://docs.confluent.io/current/connect/devguide.html#connector-example
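Since the original question asked for a plain Java program rather than a shell script, one option is to call the standalone entry point directly from your own main method. This is only a minimal sketch, assuming the two property files above exist at the given (placeholder) paths; the broker details stay in connect-standalone.conf, which is how the worker, rather than the connector, learns about the Kafka cluster:
import org.apache.kafka.connect.cli.ConnectStandalone;

public class ConnectStandaloneRunner {
    public static void main(String[] args) throws Exception {
        // Delegate to the Connect standalone CLI entry point:
        // first argument = worker config (broker details),
        // second argument = connector config.
        ConnectStandalone.main(new String[] {
                "path-to-config/connect-standalone.conf",
                "path-to-config/connector.properties"
        });
    }
}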

Related

Messages are not getting consumed

I have added the configuration below to the application.properties file of a Spring Boot application with a Camel implementation, but the messages are not getting consumed. Am I missing any configuration? Any pointers on implementing a consumer for Azure Event Hubs using the Kafka protocol and Camel?
bootstrap.servers=NAMESPACENAME.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";
The route looks like this:
from("kafka:{{topicName}}?brokers=NAMESPACENAME.servicebus.windows.net:9093" )
.log("Message received from Kafka : ${body}");
I found the solution for this issue. Since I was using Spring Boot auto-configuration (camel-kafka-starter), the entries in the application.properties file had to be modified as given below:
camel.component.kafka.brokers=NAMESPACENAME.servicebus.windows.net:9093
camel.component.kafka.security-protocol=SASL_SSL
camel.component.kafka.sasl-mechanism=PLAIN
camel.component.kafka.sasl-jaas-config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";
The consumer route for the Azure Event Hub with the Kafka protocol will look like this:
from("kafka:{{topicName}}")
    .log("Message received from Kafka : ${body}");
Hope this solution helps anyone consuming events from Azure Event Hubs in Camel using the Kafka protocol.

How to consume Kafka messages with a protobuf definition in Apache Beam?

I'm using the KafkaIO unbounded source in an Apache Beam pipeline running on Dataflow. The following configuration works for me:
Map<String, Object> kafkaConsumerConfig = new HashMap<String, Object>() {{
    put("auto.offset.reset", "earliest");
    put("group.id", "my.group.id");
}};

p.apply(KafkaIO.<String, String>read()
    .withBootstrapServers("ip1:9092,ip2:9092,ip3:9092")
    .withConsumerConfigUpdates(kafkaConsumerConfig)
    .withTopic("my.topic")
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    .withMaxNumRecords(10)
    .withoutMetadata())
// do something
Now, as I have a protobuf definition for the messages in my topic, I would like to use it to convert the Kafka records into Java objects.
The following configuration doesn't work and requires a Coder:
p.apply(KafkaIO.<String, Bytes>read()
    .withBootstrapServers("ip1:9092,ip2:9092,ip3:9092")
    .withConsumerConfigUpdates(kafkaConsumerConfig)
    .withTopic("my.topic")
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(BytesDeserializer.class)
    .withMaxNumRecords(10)
    .withoutMetadata())
Unfortunately, I cannot figure out the right value deserializer + Coder combination, and I cannot find similar examples in the documentation. Do you have any working examples of using protobuf with a Kafka source in Apache Beam?
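Not an authoritative answer, but one approach that sidesteps the Coder problem is to read the values as plain byte[] (Beam has a built-in coder for byte arrays) and parse the protobuf in a subsequent ParDo. A rough sketch, where MyMessage stands in for a hypothetical class generated from your .proto file and ProtoCoder comes from the beam-sdks-java-extensions-protobuf module:
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

p.apply(KafkaIO.<String, byte[]>read()
        .withBootstrapServers("ip1:9092,ip2:9092,ip3:9092")
        .withConsumerConfigUpdates(kafkaConsumerConfig)
        .withTopic("my.topic")
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(ByteArrayDeserializer.class) // byte[] needs no custom Coder
        .withMaxNumRecords(10)
        .withoutMetadata())
 .apply(ParDo.of(new DoFn<KV<String, byte[]>, MyMessage>() {
     @ProcessElement
     public void processElement(ProcessContext c) throws Exception {
         // MyMessage is a hypothetical protobuf-generated class
         c.output(MyMessage.parseFrom(c.element().getValue()));
     }
 }))
 .setCoder(ProtoCoder.of(MyMessage.class)); // Coder for the parsed protobuf messages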

Snowflake Kafka connector config issue

I'm following the steps in this guide: Snowflake Connector for Kafka.
The error message I'm getting is
BadRequestException: Connector config {.....} contains no connector type
I am running the command as
sh kafka_2.12-2.3.0/bin/connect-standalone.sh connect-standalone.properties snowflake_kafka_config.json
my config files are
connect-standalone.properties
bootstrap.servers=localhost:9092
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/Users/kafka_test/kafka
The jar file snowflake-kafka-connector-0.5.1.jar is in plugin.path.
snowflake_kafka_config.json
{
  "name": "Kafka_Test",
  "Config": {
    "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
    "tasks.max": "8",
    "topics": "test",
    "snowflake.topic2table.map": "",
    "buffer.count.records": "1",
    "buffer.flush.time": "60",
    "buffer.size.bytes": "65536",
    "snowflake.url.name": "<url>",
    "snowflake.user.name": "<user_name>",
    "snowflake.private.key": "<private_key>",
    "snowflake.private.key.passphrase": "<pass_phrase>",
    "snowflake.database.name": "<db>",
    "snowflake.schema.name": "<schema>",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter",
    "value.converter.schema.registry.url": "",
    "value.converter.basic.auth.credentials.source": "",
    "value.converter.basic.auth.user.info": ""
  }
}
Kafka is running locally; I have a producer and a consumer up and can see the data flowing.
This is the same question I answered over on the Confluent community Slack, but I'll post it here for reference too :-)
The Connect worker log shows that the connector JAR itself is being loaded, so the 'contains no connector type' error is because your config formatting is fubar.
You're running in standalone mode but passing in a JSON file, which won't work. My personal opinion is to always use distributed mode, even if it's just a single node. Check this out if you need a recap on standalone vs distributed: http://rmoff.dev/ksldn19-kafka-connect
If you must use standalone mode, then you need your connector config (snowflake_kafka_config.json) to be a properties file like this:
param1=argument1
param2=argument2
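As a rough sketch (not verified against the connector), flattening your snowflake_kafka_config.json into that format would look something like this, using the same placeholder values:
name=Kafka_Test
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
tasks.max=8
topics=test
buffer.count.records=1
buffer.flush.time=60
buffer.size.bytes=65536
snowflake.url.name=<url>
snowflake.user.name=<user_name>
snowflake.private.key=<private_key>
snowflake.private.key.passphrase=<pass_phrase>
snowflake.database.name=<db>
snowflake.schema.name=<schema>
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.snowflake.kafka.connector.records.SnowflakeJsonConverter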
You can see valid JSON examples (if you use distributed mode) here: https://github.com/confluentinc/demo-scene/blob/master/kafka-connect-zero-to-hero/demo_zero-to-hero-with-kafka-connect.adoc#stream-data-from-kafka-to-elasticsearch

There's no avro data in hdfs using kafka connect

I am using Kafka Connect in distributed mode.
The command is: bin/connect-distributed etc/schema-registry/connect-avro-distributed.properties
The worker configuration is:
bootstrap.servers=kafka1:9092,kafka2:9092,kafka3:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
Kafka Connect starts up with no errors!
The topics connect-configs, connect-offsets, and connect-statuses have been created.
The topic mysiteview has been created.
Then I create the Kafka connector using the REST API like this:
curl -X POST -H "Content-Type: application/json" --data '{"name":"hdfs-sink-mysiteview","config":{"connector.class":"io.confluent.connect.hdfs.HdfsSinkConnector","tasks.max":"3","topics":"mysiteview","hdfs.url":"hdfs://master1:8020","topics.dir":"/kafka/topics","logs.dir":"/kafka/logs","format.class":"io.confluent.connect.hdfs.avro.AvroFormat","flush.size":"1000","rotate.interval.ms":"1000","partitioner.class":"io.confluent.connect.hdfs.partitioner.DailyPartitioner","path.format":"YYYY-MM-dd","schema.compatibility":"BACKWARD","locale":"zh_CN","timezone":"Asia/Shanghai"}}' http://kafka1:8083/connectors
And then I produce data to the topic "mysiteview", something like this:
{"f1":"192.168.1.1","f2":"aa.example.com"}
The Java code is as follows:
Properties props = new Properties();
props.put("bootstrap.servers", "kafka1:9092");
props.put("acks", "all");
props.put("retries", 3);
props.put("batch.size", 16384);
props.put("linger.ms", 30);
props.put("buffer.memory", 33554432);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<String, String>(props);
Random rnd = new Random();
for (long nEvents = 0; nEvents < events; nEvents++) {
    long runtime = new Date().getTime();
    String site = "www.example.com";
    String ipString = "192.168.2." + rnd.nextInt(255);
    String key = "" + rnd.nextInt(255);
    User u = new User();
    u.setF1(ipString);
    u.setF2(site + " " + rnd.nextInt(255));
    System.out.println(JSON.toJSONString(u));
    producer.send(new ProducerRecord<String, String>("mysiteview", JSON.toJSONString(u)));
    Thread.sleep(50);
}
producer.flush();
producer.close();
Then something weird occurred.
I can see the data in the Kafka logs, but there is no data in HDFS (no topic directory).
I check the connector with:
curl -X GET http://kafka1:8083/connectors/hdfs-sink-mysiteview/status
output is:
{"name":"hdfs-sink-mysiteview","connector":{"state":"RUNNING","worker_id":"10.255.223.178:8083"},"tasks":[{"state":"RUNNING","id":0,"worker_id":"10.255.223.178:8083"},{"state":"RUNNING","id":1,"worker_id":"10.255.223.178:8083"},{"state":"RUNNING","id":2,"worker_id":"10.255.223.178:8083"}]}
But when I inspect the task status using the following command:
curl -X GET http://kafka1:8083/connectors/hdfs-sink-mysiteview/hdfs-sink-siteview-1
I get the result "Error 404". All three tasks give the same error!
What's going wrong?
Without seeing the worker's log, I'm not sure exactly which exception your HDFS connector instances are failing with when you use the settings described above. However, I can spot a few issues with the configuration:
You mention that you start your Connect worker with: bin/connect-distributed etc/schema-registry/connect-avro-distributed.properties. These properties default to having key and value converters set to AvroConverter and require you to run the schema-registry service. If indeed you've edited the configuration in connect-avro-distributed.properties to use the JsonConverter instead, your HDFS connector will probably fail during the conversion of Kafka records to Connect's SinkRecord data type, just before it tries to export your data to HDFS.
Until recently, the HDFS connector was able to export only Avro records, to files of Avro or Parquet format. And that requires using the AvroConverter as mentioned above. The capability to export records to text files as JSON was added recently, and will appear in version 4.0.0 of the connector (you may try this capability by checking-out and building the connector from source).
At this point, my first suggestion would be to try and import your data with bin/kafka-avro-console-producer. Define their schema, confirm that the data are imported successfully with bin/kafka-avro-console-consumer and then set your HDFS Connector to use AvroFormat as above. The quickstart at the connector's page describes a very similar process, and maybe it would be a great starting point for your use case.
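For example, a sketch of that test with the console tools (broker and Schema Registry addresses are placeholders; the schema matches the f1/f2 fields of your sample record):
bin/kafka-avro-console-producer \
  --broker-list kafka1:9092 \
  --topic mysiteview \
  --property schema.registry.url=http://localhost:8081 \
  --property value.schema='{"type":"record","name":"mysiteview","fields":[{"name":"f1","type":"string"},{"name":"f2","type":"string"}]}'
Then paste records such as {"f1":"192.168.1.1","f2":"aa.example.com"}, confirm them with bin/kafka-avro-console-consumer, and let the HDFS connector (with AvroFormat) pick them up from there.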
Maybe you are just using the REST API wrong.
According to the documentation, the call should be:
/connectors/:connector_name/tasks/:task_id
https://docs.confluent.io/3.3.1/connect/restapi.html#get--connectors-(string-name)-tasks-(int-taskid)-status
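So for the connector above, checking task 0 for example would look like:
curl -X GET http://kafka1:8083/connectors/hdfs-sink-mysiteview/tasks/0/status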

Getting exception while instantiating KafkaProducer

I am using the IBM Bluemix implementation of the Kafka broker.
I am creating the KafkaProducer with the following properties:
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
bootstrap.servers=xxxx.xxxxxx.xxxxxx.xxxxxx.bluemix.net:xxxx
client.id=messagehub
acks=-1
security.protocol=SASL_SSL
ssl.protocol=TLSv1.2
ssl.enabled.protocols=TLSv1.2
ssl.truststore.location=xxxxxxxxxxxxxxxxx
ssl.truststore.password=xxxxxxxx
ssl.truststore.type=JKS
ssl.endpoint.identification.algorithm=HTTPS
KafkaProducer<byte[], byte[]> kafkaProducer =
    new KafkaProducer<byte[], byte[]>(props);
With this I got the following exception:
org.apache.kafka.common.KafkaException: org.apache.kafka.clients.producer.internals.DefaultPartitioner is not an instance of org.apache.kafka.clients.producer.Partitioner
After reading the following blog post: http://blog.rocana.com/kafkas-defaultpartitioner-and-byte-arrays I added the following line to my properties file, even though I was using the new API:
partitioner.class=kafka.producer.ByteArrayPartitioner
Now I am getting this exception:
org.apache.kafka.common.KafkaException: Could not instantiate class kafka.producer.ByteArrayPartitioner Does it have a public no-argument constructor?
It looks like ByteArrayPartitioner does not have a default constructor.
Any idea what I am missing here?
Thanks
Madhu
As I was using the KafkaProducer API, I did not need the
partitioner.class=kafka.producer.ByteArrayPartitioner
property. The issue was that there were two copies of the kafka-clients jar. We have configured our installation such that all library jar files are in an external shared directory, but due to a POM configuration error the war file also had a copy of the kafka client in its lib directory. Once I fixed this issue, it worked fine.
Madhu