I have been using Kafka Connect for the Confluent Platform, following this guide:
Kafka connect quickstart
But it doesn't update the sink file anymore; changes in the source file are no longer written to the Kafka topic.
I have already deleted all the tmp files, but nothing changed.
Thanks in advance
Start up a new file source connector with a new location for storing the offsets. This connector is meant as a demo and really doesn't handle anything except a simple file that only gets appended to. You shouldn't be using this connector for anything other than a simple demo; have a look at the connector hub if you need something for production.
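In standalone mode the "location for storing the offsets" is the offset.storage.file.filename worker setting; a minimal sketch of that change, assuming the stock config/connect-standalone.properties (the new path below is just an example):

# config/connect-standalone.properties
# point the worker at a fresh offsets file so the file source starts over
offset.storage.file.filename=/tmp/connect-demo.offsets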
To OP: I had this same issue about 5 minutes ago, but when I restarted the connector it was fine; both test.sink.txt and the consumer started getting the new lines. So in a nutshell, just restart your connector.
If the FileStreamSource/Sink stops working after it had been working fine, and you've already restarted ZooKeeper, the Kafka server and the connector but it still doesn't work, then the problem is with the connect.offsets file in the Kafka directory.
You should delete it and create a new empty one.
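A minimal sketch of that reset, assuming the default offset.storage.file.filename of /tmp/connect.offsets from connect-standalone.properties (adjust the path if your worker config points elsewhere), with the worker stopped first:

rm /tmp/connect.offsets
touch /tmp/connect.offsets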
I faced the same problem before, but correcting the paths of the input and output files in the properties files as shown below worked for me. After that it streamed from the input file (test.txt) to the output file (test.sink.txt).
connect-file-source.properties:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/home/mypath/kafka/test.txt
topic=connect-test

connect-file-sink.properties:
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/home/mypath/kafka/test.sink.txt
topics=connect-test
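For reference, with those two properties files in place, the quickstart runs the standalone worker and checks the topic roughly like this (paths assume the stock Kafka distribution layout):

bin/connect-standalone.sh config/connect-standalone.properties \
    config/connect-file-source.properties config/connect-file-sink.properties

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic connect-test --from-beginning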
We've got a managed Kafka setup (Confluent Platform, Kafka Connect 5.5.1) streaming data from ~40 topics across 8 to 10 connectors. A few weeks ago I noticed that some of those topics don't have any consumers assigned. The consumers that should be reading from or writing to those topics are ones our org has written, and they haven't changed in months.
Looking through our connector hosts (AWS EC2 instances), I actually cannot see where our connector JAR files live, which surprises me a lot. All the other connectors are there, and when I used Confluent Hub to install the BigQuery connector it got put under /usr/share/java as one would expect.
Where should home-grown connectors live on the filesystem?
For the record, when I query :8083 using the appropriate calls I can see the connector and it does have an allegedly-running task.
They are picked up from the Java CLASSPATH and from the plugin.path setting in the Connect worker config.
As for where they should live: anywhere that the user account running the Connect process has permission to read.
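As a hedged sketch, assuming a distributed worker config and a home-grown connector JAR dropped under /opt/connectors (both paths are only examples):

# config/connect-distributed.properties
# each entry is a directory whose JARs (or subdirectories of JARs) are loaded as plugins
plugin.path=/usr/share/java,/opt/connectors

# after restarting the worker, confirm the plugin was discovered
curl http://localhost:8083/connector-plugins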
I want to send my Google Chrome history to Kafka.
My basic idea is to use my local data located in
C:/Users/master/AppData/Local/Google/Chrome/User Data/Default/history
To do so, I want to use Kafka file source connector.
But how can I send newly added Chrome history entries after I start the Kafka source connector?
Is there any way to track changes to the source file so the Kafka broker can pick them up?
Indeed, you can use the FileStreamSourceConnector to achieve that. You do not need to do anything else.
Once you start the FileStreamSourceConnector, it will hook onto the specified file. So whenever new data is appended to the file, your connector will automatically produce it to the topic.
From the link that I shared above:
This connector will read only one file and send the data within that file to Kafka. It will then watch the file for appended updates only. Any modification of file lines already sent to Kafka will not be reprocessed.
This may help you: Read File Data with Connect
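As a sketch only, a source connector config for that file would follow the same pattern as the quickstart properties; the connector and topic names below are just examples, and note that FileStreamSource expects a plain text file that only receives appended lines:

name=chrome-history-source
connector.class=FileStreamSource
tasks.max=1
file=C:/Users/master/AppData/Local/Google/Chrome/User Data/Default/history
topic=chrome-history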
I set everything up as recommended for the quickstart and used a text file containing one sentence as the source (producer). When I launch a consumer console for the first time, I am able to read the sentence from the file (in JSON format), but when I add something to the file it doesn't show up in the consumer console. When I use the producer console to add something to the topic, it shows up right away in the consumer console. What could be the problem?
ZooKeeper UP
Connector UP
Consumer UP
Producer UP
Kafka UP
Kafka doesn't watch files for changes. You would need to write your own code to detect file modifications on disk and then restart the producer thread to pick up those changes.
Alternatively, use the kafka-connect-spooldir connector, available on GitHub.
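As a quick sketch of the "detect modifications yourself" route (separate from the spooldir option), you could pipe appended lines into the console producer; the file path below is a placeholder, and older Kafka versions use --broker-list instead of --bootstrap-server:

tail -F /path/to/source/file.txt | \
    bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic connect-test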
I had created a new topic and placed the file at the wrong path, so I had to edit these files:
bin/connect-standalone.sh
config/connect-standalone.properties
config/connect-file-source.properties
config/connect-file-sink.properties
and edit these lines:
topic=my_created_topic
file=PATH_TO_MY_SOURCE_FILE
Everything is working perfectly now, yay!
We are using the Kafka engine to connect to a Kafka topic, and then a MATERIALIZED VIEW to store the data.
But from time to time some data is not consumed by the Kafka engine (we also use Flume to put the data into HDFS files, and the missing data can be found there).
Is there any other way to find the relevant logs and locate the problem, other than upgrading the ClickHouse server version (we are in the process of upgrading already)?
You can enable rdkafka logs by adding a <kafka><debug>all</debug></kafka> fragment to the ClickHouse config.xml file.
Logs will be written to the /var/log/clickhouse-server/stderr file (in Docker, to the docker logs).
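A minimal sketch of where that fragment sits, assuming a recent ClickHouse where the root element of config.xml is <clickhouse> (older releases use <yandex>):

<clickhouse>
    <!-- verbose librdkafka logging for the Kafka table engine -->
    <kafka>
        <debug>all</debug>
    </kafka>
</clickhouse>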
I'm configuring my first Kafka network. I can't seem to find any support for saving a configured topic. I know I can configure a topic following the quickstart guide here, but how do I save it? I thought I could add the topic info to a .properties file inside the config dir, but I don't see any support for that.
If I shut down my machine, my topic is deleted. How do I save the configuration?
Could the topic be deleted because you are using the default broker config? With the default config, Kafka logs are stored under the /tmp folder, which gets wiped out during a machine reboot. You could change the broker config and pick another location for the Kafka logs.
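A minimal sketch of that change in config/server.properties, assuming /var/lib/kafka-logs as the new location (any directory that survives reboots will do):

# default is log.dirs=/tmp/kafka-logs, which is lost when /tmp is wiped on reboot
log.dirs=/var/lib/kafka-logs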