Kafka connect (sink file not updated)

Kafka connect (sink file not updated) - apache-kafka

I have updated below properties file according to my requirement
connect-standalone.properties
connect-file-source.properties
connect-file-sink.properties
The Kafka Connect process start and the source connector read lines and write these as messages to the test_topic but messages are not written in test.sink.txt
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties

Related

Kafka Debezium Connector Working But not ingesting data into PostgreSQL

I have successfully connect postgresql to kafka using debezium-connector-postgres-1.8.0.Final-plugin connector.Below is my Standalone.properties file:
rest.port=8084
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/home/abc/debezium-connector-postgres-1.8.0.Final-plugin/debezium-connector-postgres/debezium-connector-postgres-1.8.0.Final.jar,
############################# Zookeeper #############################
zookeeper.connect=localhost:2181
My connector file is given below:
name= inventory_db-connector
connector.class= io.debezium.connector.postgresql.PostgresConnector
tasks.max= 1
database.hostname= localhost
database.port= 5432
database.user= user
database.password= pass
database.dbname = testdb
database.server.name= dbserver1
database.whitelist= testdb
database.history.kafka.bootstrap.servers= kafka=9092
database.history.kafka.topic= schema-changes.inventory
topics=inventory
topic.inventory.inventory.conditions.mapping=time=value.time,device_id=value.device_id,temperature=value.temperature,humidity=value.humidity
It is running successfully on the terminal through the following command kafka/bin/connect-standalone.sh kafka/config/connect-standalone.properties /home/abc/inventory_db-connector.
My inventory.txt contains the data which i want to ingest in the database in the JSON format.
Also I have checked on kafka-consumer for inventory topic and it shows that data is loaded:
{"time":"2022-02-08 16:18:00-05", "device_id":"weather-pro001000", "temperature":"200", "humidity":"200"}
Now we are trying to ingest the data into postgres using the following command cat inventory.txt | kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic demo --property "parse.key=true" --property "key.separator=:";
But now the problem is that data is not inserted in the database and the command does
not shows any error and my log is also empty.
I cannot seem to figure out this problem.

Debezium connector is a source connector i.e it is used to read data from postgresSQL into kafka.
If you want to ingest data into postgres sql try using JDBC sink connector
https://docs.confluent.io/kafka-connect-jdbc/current/sink-connector/index.html

Snowflake Kafka connector config issue

I'm following the steps in this guide Snowflake Connector for Kafka
The error message I'm getting is
BadRequestException: Connector config {.....} contains no connector type
I am running the command as
sh kafka_2.12-2.3.0/bin/connect-standalone.sh connect-standalone.properties snowflake_kafka_config.json
my config files are
connect-standalone.properties
bootstrap.servers=localhost:9092
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/Users/kafka_test/kafka
jar file snowflake-kafka-connector-0.5.1.jar is in plugin.path
snowflake_kafka_config.json
{
"name":"Kafka_Test",
"Config":{
"connector.class":"com.snowflake.kafka.connector.SnowflakeSinkConnector",
"tasks.max":"8",
"topics":"test",
"snowflake.topic2table.map": "",
"buffer.count.records":"1",
"buffer.flush.time":"60",
"buffer.size.bytes":"65536",
"snowflake.url.name":"<url>",
"snowflake.user.name":"<user_name>",
"snowflake.private.key":"<private_key>",
"snowflake.private.key.passphrase":"<pass_phrase>",
"snowflake.database.name":"<db>",
"snowflake.schema.name":"<schema>",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"com.snowflake.kafka.connector.records.SnowflakeJsonConverter",
"value.converter.schema.registry.url":"",
"value.converter.basic.auth.credentials.source":"",
"value.converter.basic.auth.user.info":""
}
}
Kafka is running on local, I have a producer and consumer up, can see the data flowing.

This is the same question I answered over on the Confluent community Slack, but I'll post it here for reference too :-)
The connect worker log shows that the connector JAR itself is being loaded, so the 'contains no connector type` is because your config formatting is fubar.
You're running in Standalone mode, but passing in a JSON file which won't. My personal opinion is always use distributed, even if just a single node of it. Check this out if you need a recap on standalone vs distributed : http://rmoff.dev/ksldn19-kafka-connect
If you must use standalone then you need your connector config (snowflake_kafka_config.json) to be a properties file like this:
param1=argument1
param2=argument2
You can see valid JSON examples (if you use distributed mode) here: https://github.com/confluentinc/demo-scene/blob/master/kafka-connect-zero-to-hero/demo_zero-to-hero-with-kafka-connect.adoc#stream-data-from-kafka-to-elasticsearch

Kafka Connect FileStreamSource ignores appended lines

I'm working on an application to process logs with Spark and I thought to use Kafka as a way to stream the data from the log file. Basically I have a single log file (on the local file system) which is continuously updated with new logs, and Kafka Connect seems to be the perfect solution to get the data from the file along with the new appended lines.
I'm starting the servers with their default configurations with the following commands:
Zookeeper server:
zookeeper-server-start.sh config/zookeeper.properties
zookeeper.properties
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0
Kafka server:
kafka-server-start.sh config/server.properties
server.properties
broker.id=0
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
[...]
Then I created the topic 'connect-test':
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic connect-test
And finally I run the Kafka Connector:
connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties
connect-standalone.properties
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
connect-file-source.properties
name=my-file-connector
connector.class=FileStreamSource
tasks.max=1
file=/data/users/zamara/suivi_prod/app/data/logs.txt
topic=connect-test
At first I tested the connector by running a simple console consumer:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
Everything was working perfectly, the consumer was receiving the logs from the file and as I was adding logs the consumer kept updating with the new ones.
(Then I tried Spark as a "consumer" following this guide: https://spark.apache.org/docs/2.2.0/streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers and it was still fine)
After this, I removed some of the logs from the log file and changed the topic (I deleted the 'connect-test' topic, created another one and edited the connect-file-source.properties with the new topic).
But now the connector doesn't work the same way anymore. When using the console consumer, I only get the logs that were already in the file and every new line I add is ignored. Maybe changing the topic (and/or modifying the data from the log file) without changing the connector name broke something in Kafka...
This is what Kafka Connect does with my topic 'new-topic' and connector 'new-file-connector', :
[2018-05-16 15:06:42,454] INFO Created connector new-file-connector (org.apache.kafka.connect.cli.ConnectStandalone:104)
[2018-05-16 15:06:42,487] INFO Cluster ID: qjm74WJOSomos3pakXb2hA (org.apache.kafka.clients.Metadata:265)
[2018-05-16 15:06:42,522] INFO Updated PartitionLeaderEpoch. New: {epoch:0, offset:0}, Current: {epoch:-1, offset:-1} for Partition: new-topic-0. Cache now contains 0 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-05-16 15:06:52,453] INFO WorkerSourceTask{id=new-file-connector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:328)
[2018-05-16 15:06:52,453] INFO WorkerSourceTask{id=new-file-connector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:345)
[2018-05-16 15:06:52,458] INFO WorkerSourceTask{id=new-file-connector-0} Finished commitOffsets successfully in 5 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:427)
[2018-05-16 15:07:02,459] INFO WorkerSourceTask{id=new-file-connector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:328)
[2018-05-16 15:07:02,459] INFO WorkerSourceTask{id=new-file-connector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:345)
[2018-05-16 15:07:12,459] INFO WorkerSourceTask{id=new-file-connector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:328)
[2018-05-16 15:07:12,460] INFO WorkerSourceTask{id=new-file-connector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:345)
(it keeps flushing 0 outstanding messages even after appending new lines to the file)
So I tried to start over: I deleted the /tmp/kafka-logs directory, the /tmp/connect.offset file, and used a brand new topic name, connector name and log file, just in case. But still, the connector ignores new logs... I even tried to delete my kafka, re-extract it from the archive and run the whole process again (in case something changed in Kafka), but no success.
I'm confused as to where the problem is, any help would be appreciated !

Per docs:
The FileStream Connector examples are intended to show how a simple connector runs for those first getting started with Kafka Connect as either a user or developer. It is not recommended for production use.
I would use something like Filebeat (with its Kafka output) instead for ingesting logs into Kafka. Or kafka-connect-spooldir if your logs are not appended to directly but are standalone files placed in a folder for ingest.

Kafka Connect does not "watch" or "tail" a file. I don't believe it is documented anywhere that it does do that.
I would say it is even less useful for reading active logs than using Spark Streaming to watch a folder. Spark will "recognize" newly created files. Kafka Connect FileStreamSource must point at a single pre-existing, immutable file.
To get Spark to work with active logs, you would need something that does "log rotation" - that is, when the file reaches a max size or a condition such as the end of a time period (say a day), then this process would move the active log to the directory Spark is watching, then it handles starting a new log file for your application to continue to write to.
If you want files to be actively watched and ingested into Kafka then Filebeat, Fluentd, or Apache Flume can be used.

Connector config contains no connector type

I'm trying to use JDBC Connector to connect to a PostgreSQL database on my cluster (the database is not directly managed by the cluster).
I've been calling the Kafka Connect with the following command:
connect-standalone.sh worker.properties jdbc-connector.properties
This is the content of the worker.propertiesfile:
class=io.confluent.connect.jdbc.JdbcSourceConnector
name=test-postgres-1
tasks.max=1
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/home/user/offest
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter=org.apache.kafka.connect.json.JsonConverter
connection.url=jdbc:postgresql://database-server.url:port/database?user=user&password=password
And this are the content of the jdbc-connector.properties:
mode=incrementing
incrementing.column.name=id
topic.prefix=test-postgres-jdbc-
When I try to launch the connector with the above command it crashes with the following error:
[2018-04-16 11:39:08,164] ERROR Failed to create job for jdbc.properties (org.apache.kafka.connect.cli.ConnectStandalone:88)
[2018-04-16 11:39:08,166] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:99)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector config {mode=incrementing, incrementing.column.name=pdv, topic.prefix=test-postgres-jdbc-} contains no connector type
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:80)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:67)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:96)
Caused by: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector config {mode=incrementing, incrementing.column.name=id, topic.prefix=test-postgres-jdbc-} contains no connector type
at org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:233)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:158)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:93)
After noting that the connector causing the error displayed only informations from jdbc-connector.properties I've tried merging the two files together, but then the command terminates abruptly (without creating a topic or an offset file) with the following output:
[SLF4J infos...]
[2018-04-16 11:48:54,620] INFO Usage: ConnectStandalone worker.properties connector1.properties [connector2.properties ...] (org.apache.kafka.connect.cli.ConnectStandalone:59)

You need to have most of those properties in the jdbc-connector.properties, not the worker.properties. See https://docs.confluent.io/current/connect/connect-jdbc/docs/source_config_options.html for a full list of config options that go in the connector configuration (jdbc-connector.properties in your example).
Try this:
worker.properties:
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/home/user/offest
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter=org.apache.kafka.connect.json.JsonConverter
jdbc-connector.properties:
class=io.confluent.connect.jdbc.JdbcSourceConnector
name=test-postgres-1
tasks.max=1
mode=incrementing
incrementing.column.name=id
topic.prefix=test-postgres-jdbc-
connection.url=jdbc:postgresql://database-server.url:port/database?user=user&password=password
You can see some more examples with Kafka Connect here:
https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
https://www.confluent.io/blog/blogthe-simplest-useful-kafka-connect-data-pipeline-in-the-world-or-thereabouts-part-2/
https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-3/

Apache Kafka config/producer.properties

I want to copy kafka data to hadoop. How can i do configuration of producer.properties file. Kafka data format is .avro file.
metadata.broker.list
request.required.acks
producer.type
serializer.class