We are trying to use the Confluent Kinesis source connector with Kafka Connect (the Kafka cluster is Confluent Cloud; Kafka Connect is deployed on Kubernetes).
We are on the trial version; no license has been added yet.
We get the following error:
org.apache.kafka.common.errors.TimeoutException: License topic could not be created
WorkerConnector{id=kinessis-connector} Error while starting connector (THREAD: connector-thread-kinessis-for-connector, SRC: org.apache.kafka.connect.runtime.WorkerConnector): org.apache.kafka.common.errors.TimeoutException: License topic could not be created
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=createTopics, deadlineMs=1626952177722, tries=1, nextAllowedTryMs=1626952177823) timed out at 1626952177723 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics
[Worker clientId=connect-1, groupId=kafkaconnect-platform-dev] Failed to start connector 'kinessis-connector' (SRC: org.apache.kafka.connect.runtime.distributed.DistributedHerder)
org.apache.kafka.connect.errors.ConnectException: Failed to start connector: kinessis-connector
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.lambda$startConnector$5(DistributedHerder.java:1305)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:335)
at org.apache.kafka.connect.runtime.WorkerConnector.doRun(WorkerConnector.java:140)
at org.apache.kafka.connect.runtime.WorkerConnector.run(WorkerConnector.java:117)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
The connector configuration is:
{
"connector.class": "io.confluent.connect.kinesis.KinesisSourceConnector",
"key.converter.schemas.enable": "false",
"confluent.topic.bootstrap.servers": "*****.eu-west-1.aws.confluent.cloud:9092",
"tasks.max": "1",
"kinesis.region": "eu-west-1",
"value.converter.schemas.enable": "false",
"name": "kinessis-connector",
"kafka.topic": "kinessis_events",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"kinesis.stream": "kinessis-events",
"key.converter": "org.apache.kafka.connect.json.JsonConverter"
}
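For context: Confluent's commercial connectors create and read a license topic (_confluent-command by default) with an admin client built from the confluent.topic.* properties, and the createTopics timeout above is what that client reports when it cannot reach the cluster. Since the bootstrap server points at Confluent Cloud, the license client almost certainly needs SASL_SSL credentials as well; they are not inherited from the worker. A minimal sketch of the additional properties, assuming API-key authentication (the key and secret are placeholders):
{
"confluent.topic.replication.factor": "3",
"confluent.topic.security.protocol": "SASL_SSL",
"confluent.topic.sasl.mechanism": "PLAIN",
"confluent.topic.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"<API_KEY>\" password=\"<API_SECRET>\";"
}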
Related
I need to send the data stored in my PostgreSQL database to Kafka (running in my Docker setup).
I tried to create the connector with these settings:
{
"name": "Connessione",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.storage.StringConverter",
"topics": "Bill",
"connection.url": "jdbc:postgresql://localhost:5432/Kafka_Example?sslmode=require",
"connection.user": "postgres",
"connection.password": "**********",
"table.whitelist": "messaggio",
"mode": "timestamp",
"max.retries": "4",
"timestamp.column.name":"modified_at,created_at",
"poll.interval.ms": "2000",
"topic.prefix": "pg_source_"
}
}
but I got this when uploading my config file in Control Center.
These are my logs from Docker:
org.apache.kafka.connect.errors.ConnectException: org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at io.confluent.connect.jdbc.util.CachedConnectionProvider.getConnection(CachedConnectionProvider.java:59)
at io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:64)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:84)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:563)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:326)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:229)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:235)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:319)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:223)
at org.postgresql.Driver.makeConnection(Driver.java:400)
at org.postgresql.Driver.connect(Driver.java:259)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:677)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
at io.confluent.connect.jdbc.dialect.GenericDatabaseDialect.getConnection(GenericDatabaseDialect.java:250)
at io.confluent.connect.jdbc.dialect.PostgreSqlDatabaseDialect.getConnection(PostgreSqlDatabaseDialect.java:103)
at io.confluent.connect.jdbc.util.CachedConnectionProvider.newConnection(CachedConnectionProvider.java:80)
at io.confluent.connect.jdbc.util.CachedConnectionProvider.getConnection(CachedConnectionProvider.java:52)
... 13 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.postgresql.core.PGStream.createSocket(PGStream.java:241)
at org.postgresql.core.PGStream.<init>(PGStream.java:98)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:109)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:235)
... 23 more
[2022-11-09 16:30:00,218] ERROR WorkerSinkTask{id=Connessione-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:591)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:326)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:229)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:235)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.ConnectException: org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at io.confluent.connect.jdbc.util.CachedConnectionProvider.getConnection(CachedConnectionProvider.java:59)
at io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:64)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:84)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:563)
... 10 more
(My database is running on localhost:5432)
Can anybody tell me what I am doing wrong? And is there an easy way to do this from my Spring Boot application, where I already have both Kafka and the database?
Assuming you've started Connect from Docker, then localhost refers to the Connect container itself, not your database.
https://docs.docker.com/network/bridge/
Similarly, your Spring app isn't running Kafka or a database; those are external services, and Connect framework doesn't embed into Spring.
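As a concrete illustration, assuming Postgres runs as a sibling container named postgres on the same Docker network (the service name here is hypothetical), the JDBC URL should reference that hostname instead of localhost:
"connection.url": "jdbc:postgresql://postgres:5432/Kafka_Example?sslmode=require"
If Postgres runs on the host machine itself, host.docker.internal resolves to the host from inside containers on Docker Desktop.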
I run Kafka and Kafka Connect on different servers (let's say serverA and serverB).
serverA, for Kafka Connect:
# vi /home/kafka/config/connect-distributed.properties
bootstrap.servers=serverB:9092
rest.host.name=localhost
rest.port=8083
serverB, for Kafka:
# vi server.properties
broker.id=1
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://serverA:9092
delete.topic.enable = true
But when I run Kafka Connect on serverA, I get an error:
[2020-04-30 16:59:37,053] ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectDistributed:84)
org.apache.kafka.connect.errors.ConnectException: Failed to connect to and describe Kafka cluster. Check worker's broker connection and security properties.
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:64)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:45)
at org.apache.kafka.connect.cli.ConnectDistributed.startConnect(ConnectDistributed.java:95)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:78)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=listNodes, deadlineMs=1588233577048) timed out at 1588233577049 after 1 attempt(s)
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:58)
... 3 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=listNodes, deadlineMs=1588233577048) timed out at 1588233577049 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
FYI, if I run Kafka Connect on the Kafka server (serverB), it works; but I want to run them on different servers.
How can I connect Kafka Connect to Kafka?
In your server.properties you have
advertised.listeners=PLAINTEXT://serverA:9092
but Kafka Connect uses
bootstrap.servers=serverB:9092
instead of
bootstrap.servers=serverA:9092
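Either side of the mismatch can be changed, but since the broker actually runs on serverB, the usual fix is to advertise serverB, so that the metadata returned after bootstrapping points at a reachable host. A consistent sketch, assuming the broker really does live on serverB:
# serverB: server.properties
advertised.listeners=PLAINTEXT://serverB:9092
# serverA: connect-distributed.properties
bootstrap.servers=serverB:9092
This also explains "Timed out waiting for a node assignment": the initial bootstrap connection to serverB succeeds, but every follow-up request goes to the advertised address, where no broker is listening.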
I have installed the new version of Confluent (5.4), and since then I have been unable to connect to Confluent; my Schema Registry also gets terminated unexpectedly.
Today, when I started Confluent and tried to produce data, I received the following error:
2020-03-05 12:25:00,453] ERROR Failed to send HTTP request to endpoint: http://localhost:8081/subjects/avro-key/versions (io.confluent.kafka.schemaregistry.client.rest.RestService:245)
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.Socket.connect(Socket.java:609)
at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:341)
at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:362)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1248)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1015)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1362)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1337)
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:241)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:322)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:422)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:414)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:400)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:140)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:196)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:172)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:71)
at io.confluent.kafka.formatter.AvroMessageReader.readMessage(AvroMessageReader.java:199)
at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:55)
at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)
Updated the question with the Schema-registry logs:
INFO Logging initialized #865ms to org.eclipse.jetty.util.log.Slf4jLog (org.eclipse.jetty.util.log:169)
[2020-03-09 12:35:51,851] INFO Adding listener: http://0.0.0.0:8081 (io.confluent.rest.ApplicationServer:316)
[2020-03-09 12:35:52,366] INFO Created schema registry namespace localhost:2181 /schema_registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryConfig:709)
[2020-03-09 12:35:53,329] INFO Initializing KafkaStore with broker endpoints: PLAINTEXT://LAP-LIN-897:9092 (io.confluent.kafka.schemaregistry.storage.KafkaStore:108)
[2020-03-09 12:38:03,215] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication:77)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: Error initializing kafka store while initializing schema registry
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:248)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:75)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:90)
at io.confluent.rest.Application.configureHandler(Application.java:217)
at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:185)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreInitializationException: Timed out trying to create or validate schema topic configuration
at io.confluent.kafka.schemaregistry.storage.KafkaStore.createOrVerifySchemaTopic(KafkaStore.java:177)
at io.confluent.kafka.schemaregistry.storage.KafkaStore.init(KafkaStore.java:119)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:246)
... 6 more
Caused by: java.util.concurrent.TimeoutException
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at io.confluent.kafka.schemaregistry.storage.KafkaStore.createOrVerifySchemaTopic(KafkaStore.java:170)
... 8 more
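The Schema Registry log above shows the KafkaStore timing out while creating or validating its internal topic against PLAINTEXT://LAP-LIN-897:9092, so the first thing to verify is that a broker is actually reachable at that address. A quick check, assuming the Kafka CLI tools are on the PATH:
# should return the topic list if the broker is reachable
kafka-topics --bootstrap-server LAP-LIN-897:9092 --list
If this also hangs, the problem is broker availability (or its advertised listener), not the Schema Registry itself.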
I am getting the following exception while running Debezium with Confluent Enterprise.
org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
at io.debezium.connector.base.ChangeEventQueue.throwProducerFailureIfPresent(ChangeEventQueue.java:171)
at io.debezium.connector.base.ChangeEventQueue.poll(ChangeEventQueue.java:151)
at io.debezium.connector.oracle.OracleConnectorTask.poll(OracleConnectorTask.java:110)
at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:259)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:226)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:oracle:oci:@orclnode:1527/mydb
at io.debezium.relational.RelationalSnapshotChangeEventSource.execute(RelationalSnapshotChangeEventSource.java:108)
at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:87)
at io.debezium.pipeline.ChangeEventSourceCoordinator$$Lambda$485/1478798798.run(Unknown Source)
... 5 more
Caused by: java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:oracle:oci:@orclnode:1527/mydb
at io.debezium.connector.oracle.OracleConnection.setSessionToPdb(OracleConnection.java:51)
at io.debezium.connector.oracle.OracleSnapshotChangeEventSource.prepare(OracleSnapshotChangeEventSource.java:72)
at io.debezium.relational.RelationalSnapshotChangeEventSource.execute(RelationalSnapshotChangeEventSource.java:104)
... 7 more
Caused by: java.sql.SQLException: No suitable driver found for jdbc:oracle:oci:@orclnode:1527/mydb
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at io.debezium.connector.oracle.OracleConnectionFactory.connect(OracleConnectionFactory.java:25)
at io.debezium.jdbc.JdbcConnection.connection(JdbcConnection.java:768)
at io.debezium.jdbc.JdbcConnection.connection(JdbcConnection.java:763)
at io.debezium.connector.oracle.OracleConnection.setSessionToPdb(OracleConnection.java:47)
... 9 more
[2020-01-15 12:15:25,447] ERROR WorkerSourceTask{id=test-debezium-1-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:180)
Now, at first glance this looks like a minor issue, but I have tried different drivers and it is not working for me. Debezium uses OCI drivers, and this is the first time I have set one up, but I followed the instructions from the Oracle website to set up the driver.
My Oracle version is:
SELECT * FROM v$version;
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
"CORE 11.2.0.4.0 Production"
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production
I have tried the following drivers, among others:
[user@nodexxx oracle]$ ls -l
total 8
drwxr-xr-x. 2 root root 271 Jan 10 10:45 instantclient_11_2
drwxr-xr-x. 2 user root 4096 Jan 15 11:36 instantclient_12_2
drwxr-xr-x. 3 user root 4096 Jan 8 16:54 instantclient_19_5
[user@nodexxx oracle]$ echo $LD_LIBRARY_PATH
/opt/oracle/instantclient_12_2:
[user@nodexxx oracle]$ echo $PATH
/opt/oracle/instantclient_12_2::/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/app/confluent-5.3.1/bin:/usr/java/jdk1.8.0_231/bin:/home/user/.local/bin:/home/user/bin
[user@nodexxx oracle]$
I am installing the Instant Client on my local machine, and the Oracle DB is running on a remote host. Do I need to add any other files apart from the downloaded Instant Client zip?
My Linux distribution is as shown below:
[user@nodexxx oracle]$ sudo lsb_release -a
[sudo] password for user:
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.6 (Maipo)
Release: 7.6
Codename: Maipo
My connector's deployed configuration is as shown below (database.server.name is just a logical name):
{
"name": "test-debezium-1",
"config": {
"connector.class": "io.debezium.connector.oracle.OracleConnector",
"tasks.max": "1",
"database.tablename.case.insensitive": "true",
"database.oracle.version": "11",
"database.server.name": "deb-linux",#its just a logical name
"database.hostname": "orclnode",
"database.port": "1527",
"database.user": "xstream",
"database.password": "xstream",
"database.dbname": "mydb",
"database.pdb.name": "",
"database.out.server.name": "dbzxout",
"database.history.kafka.bootstrap.servers": "kafka1:9092,kafka2:9092,kafka3:9092",
"database.history.kafka.topic": "debezium-inventory-topic",
"snapshot.mode": "initial",
"table.whitelist": "orcl\\.debezium\\.(.*)",
"name": "test-debezium-1"
},
"tasks": [
{
"connector": "test-debezium-1",
"task": 0
}
],
"type": "source"
}
I am stuck and have no clue how to resolve this issue.
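"No suitable driver found" means that none of the JDBC drivers visible to the Connect worker accepted the URL, which usually comes down to the Oracle JDBC jar not being on the worker's classpath: the Instant Client provides the native OCI libraries via LD_LIBRARY_PATH, but ojdbc8.jar still has to sit next to the connector's jars to be loaded. A sketch, where the plugin directory name is an assumption (use whichever directory actually holds the Debezium Oracle connector jars):
# ojdbc8.jar ships inside the Instant Client directory but is not
# loaded from LD_LIBRARY_PATH; it must be on the worker's classpath
cp /opt/oracle/instantclient_12_2/ojdbc8.jar \
   /app/confluent-5.3.1/share/java/debezium-connector-oracle/
# restart the Connect worker afterwards so the driver is registered
Since the configuration above uses the XStream adapter (database.out.server.name), Debezium also needs xstreams.jar alongside the driver.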
I am using Kafka Connect for Hive integration, to create Hive tables along with partitions on S3. After starting the Connect distributed process and making a POST call to listen to a topic, as soon as there is some data in the topic, I can see in the logs that data is being committed to S3, as shown below.
2017-07-13 06:59:37 INFO AbstractCoordinator:434 - Successfully joined group connect-hive-int-1 with generation 2
2017-07-13 06:59:37 INFO ConsumerCoordinator:219 - Setting newly assigned partitions [test_hive_int_1-0] for group connect-hive-int-1
2017-07-13 06:59:37 INFO TopicPartitionWriter:213 - Started recovery for topic partition test_hive_int_1-0
2017-07-13 06:59:38 INFO TopicPartitionWriter:228 - Finished recovery for topic partition test_hive_int_1-0
2017-07-13 06:59:38 INFO NativeS3FileSystem:246 - OutputStream for key 'ashishs/topics/+tmp/test_hive_int_1/year=2017/month=07/day=13/hour=06/minute=58/97a5b3f2-e9c2-41b4-b344-eb080d048052_tmp.avro' writing to tempfile '/tmp/hadoop-root/s3/output-2343236621771119424.tmp'
2017-07-13 06:59:38 WARN HiveMetaStore:150 - Hive database already exists: default
2017-07-13 06:59:38 INFO TopicPartitionWriter:302 - Starting commit and rotation for topic partition test_hive_int_1-0 with start offsets {year=2017/month=07/day=13/hour=06/minute=58/=0} and end offsets {year=2017/month=07/day=13/hour=06/minute=58/=1}
2017-07-13 06:59:38 INFO NativeS3FileSystem:280 - OutputStream for key 'ashishs/topics/+tmp/test_hive_int_1/year=2017/month=07/day=13/hour=06/minute=58/97a5b3f2-e9c2-41b4-b344-eb080d048052_tmp.avro' closed. Now beginning upload
2017-07-13 06:59:38 INFO NativeS3FileSystem:292 - OutputStream for key 'ashishs/topics/+tmp/test_hive_int_1/year=2017/month=07/day=13/hour=06/minute=58/97a5b3f2-e9c2-41b4-b344-eb080d048052_tmp.avro' upload complete
2017-07-13 06:59:39 INFO TopicPartitionWriter:638 - Committed s3://dev.canopydata.com/ashishs//topics/test_hive_int_1/year=2017/month=07/day=13/hour=06/minute=58/test_hive_int_1+0+0000000000+0000000001.avro for test_hive_int_1-0
But immediately after the first commit, I get the following exception:
2017-07-13 06:59:39 INFO TopicPartitionWriter:638 - Committed s3://dev.canopydata.com/ashishs//topics/test_hive_int_1/year=2017/month=07/day=13/hour=06/minute=58/test_hive_int_1+0+0000000000+0000000001.avro for test_hive_int_1-0
2017-07-13 06:59:39 INFO WorkerSinkTask:244 - WorkerSinkTask{id=hive-int-1-0} Committing offsets
2017-07-13 06:59:39 INFO TopicPartitionWriter:531 - Ignoring stale out-of-order record in test_hive_int_1-0. Has offset 0 instead of expected offset 4
2017-07-13 06:59:49 ERROR WorkerSinkTask:390 - Task hive-int-1-0 threw an uncaught and unrecoverable exception
java.lang.RuntimeException: java.util.concurrent.ExecutionException: io.confluent.connect.hdfs.errors.HiveMetaStoreException: Hive MetaStore exception
at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:229)
at io.confluent.connect.hdfs.HdfsSinkTask.put(HdfsSinkTask.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:370)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:227)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:170)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:142)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: io.confluent.connect.hdfs.errors.HiveMetaStoreException: Hive MetaStore exception
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:223)
... 12 more
Caused by: io.confluent.connect.hdfs.errors.HiveMetaStoreException: Hive MetaStore exception
at io.confluent.connect.hdfs.hive.HiveMetaStore.alterTable(HiveMetaStore.java:226)
at io.confluent.connect.hdfs.avro.AvroHiveUtil.alterSchema(AvroHiveUtil.java:58)
at io.confluent.connect.hdfs.TopicPartitionWriter$2.call(TopicPartitionWriter.java:664)
at io.confluent.connect.hdfs.TopicPartitionWriter$2.call(TopicPartitionWriter.java:661)
... 4 more
Caused by: MetaException(message:org.datanucleus.exceptions.NucleusDataStoreException: Clear request failed : DELETE FROM `PARTITION_KEYS` WHERE `TBL_ID`=?)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_environment_context_result$alter_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:39803)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_environment_context_result$alter_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:39780)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_environment_context_result.read(ThriftHiveMetastore.java:39722)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_environment_context(ThriftHiveMetastore.java:1345)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_environment_context(ThriftHiveMetastore.java:1329)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:345)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:334)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
at com.sun.proxy.$Proxy48.alter_table(Unknown Source)
at io.confluent.connect.hdfs.hive.HiveMetaStore$6.call(HiveMetaStore.java:212)
at io.confluent.connect.hdfs.hive.HiveMetaStore$6.call(HiveMetaStore.java:209)
at io.confluent.connect.hdfs.hive.HiveMetaStore.doAction(HiveMetaStore.java:87)
at io.confluent.connect.hdfs.hive.HiveMetaStore.alterTable(HiveMetaStore.java:218)
... 7 more
2017-07-13 06:59:49 ERROR WorkerSinkTask:391 - Task is being killed and will not recover until manually restarted
One weird observation: if I delete this particular job and submit it again with the same configuration, further data in the topic gets committed to S3 without any exception; it's only after the first commit that I see this exception.
The payload I am using in my POST call is:
{
"name": "hive-int-1",
"config": {
"connector.class": "com.qubole.streamx.s3.S3SinkConnector",
"format.class": "io.confluent.connect.hdfs.avro.AvroFormat",
"tasks.max": "1",
"topics": "test_hive_int_1",
"flush.size": "2",
"s3.url": "s3://dev.canopydata.com/ashishs/",
"hadoop.conf.dir": "/usr/local/streamx/config/hadoop-conf",
"rotate.interval.ms": "60000",
"hive.integration":"true",
"hive.metastore.uris":"thrift://<host_fqdn>:10000",
"schema.compatibility":"BACKWARD",
"partitioner.class": "io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner",
"partition.duration.ms": "120000",
"locale": "en",
"path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/'minute'=mm/",
"timezone": "GMT"
}
}
Any pointers on what I am doing wrong, or on whether I am missing something, would be helpful.
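Unrelated to the exception, but visible in the logs above: the committed keys contain a double slash (ashishs//topics) because s3.url ends with a trailing slash that is then concatenated with the topics directory. Dropping the trailing slash gives clean keys:
"s3.url": "s3://dev.canopydata.com/ashishs"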