Kafka Connect tutorial stopped working - apache-kafka

I was following step #7 (Use Kafka Connect to import/export data) at this link:
http://kafka.apache.org/documentation.html#quickstart
It was working well until I deleted the 'test.txt' file. I did that mainly because that's how log4j files work: after a certain time the file gets rotated, i.e. it is renamed and a new file with the same name starts getting written to.
But after I deleted 'test.txt', the connector stopped working. I restarted the connector, broker, ZooKeeper etc., but new lines written to 'test.txt' are not going to the 'connect-test' topic and therefore are not going to the 'test.sink.txt' file.
How can I fix this?

The connector keeps track of the last location it has read from the file, so that if it crashes while reading, it can continue where it left off.
The problem is that you deleted the file without resetting that offset to 0, so the connector doesn't see any new data: it is waiting for data to appear starting at a specific position (character count) from the beginning of the file.
The work-around is to reset the offsets. If you are using Connect in standalone mode, the offsets are stored in /tmp/connect.offsets by default; just delete that file.
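For example, assuming the quickstart defaults (standalone mode, offset file at /tmp/connect.offsets, and the stock properties files), a reset looks roughly like this:
# stop the running connect-standalone process first
rm /tmp/connect.offsets
# then start it again with the same properties files
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
The offset file location itself is set by offset.storage.file.filename in config/connect-standalone.properties, so check there if yours is not under /tmp.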
In the long term, we need a better file connector :)

Related

Is there any way to send chrome history logs to kafka?

I want to send my google chrome history to kafka.
My basic idea is to use my local data located in
C:/Users/master/AppData/Local/Google/Chrome/User Data/Default/history
To do so, I want to use Kafka file source connector.
But how can I send newly added Chrome history entries after I start the Kafka source connector?
Is there any way to track changes to the source file so the Kafka broker can pick them up?
Indeed you can use FileStreamSourceConnector to achieve that. You do not need to do anything else.
Once you start FileStreamSourceConnector, it will hook onto the specified file, so whenever new data is appended to the file, your connector will automatically produce it to the topic.
From the link that I shared above:
This connector will read only one file and send the data within that file to Kafka. It will then watch the file for appended updates only. Any modification of file lines already sent to Kafka will not be reprocessed.
This may help you: Read File Data with Connect
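As a rough sketch (the connector alias and property names are the same ones the quickstart uses in connect-file-source.properties; the topic name here is just an example), a standalone source config for that file could look like:
name=chrome-history-source
connector.class=FileStreamSource
tasks.max=1
file=C:/Users/master/AppData/Local/Google/Chrome/User Data/Default/history
topic=chrome-history
Run it with connect-standalone (the .bat script under bin\windows on Windows) together with config/connect-standalone.properties, as in the quickstart.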

kafka file stream with limitation

I am trying to stream string input from a file. I have the file stream connector running throughout the day, but I am facing the issues below:
1. The file has more than 20k records (20k lines). When a new file is recognized, the connector posts messages up to about 10k and then goes idle. Does the connector have any limitation/configuration that limits the data? I am checking this count with a console consumer on the topic (which is configured in source.properties). (As soon as the job started, I saw another window consuming the messages with a different consumer group.)
2. My file connector keeps running throughout the day and the file's (abc.txt) data is published to the topic. When I try to replace the file (remove the current file from the location and place a file with the same name but a different data set), the running job gets an exception; but when I append the new data set to the existing file, it runs fine. Is this expected behavior?
Any help is really appreciated.

Why doesn't my consumer console sync up with my text file?

I set up everything as recommended for the quick start, and used a text file containing one sentence as the source/producer. When I launch a consumer console for the first time I am able to read the sentence (in JSON format) from the file, but when I add something to the file it doesn't show up in the consumer console; when I use the producer console to add something to the topic, it shows up right away in the consumer console. What could be the problem?
zookeeper UP
Connector UP
consumer UP
producer UP
Kafka UP
Kafka doesn't watch files for changes. You would need to program your own code to detect file modifications on disk and then restart the producer thread to pick up those changes.
Alternatively, use the kafka-connect-spooldir connector, available on GitHub.
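Only as a sketch, and the connector class and property names below are from memory rather than from this thread, so check the kafka-connect-spooldir README for the authoritative list; a spooldir source config is roughly of this shape:
name=spooldir-csv-source
connector.class=com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector
tasks.max=1
topic=my-topic
input.path=/data/incoming
finished.path=/data/finished
error.path=/data/error
input.file.pattern=.*\.csv
schema.generation.enabled=true
Unlike the FileStreamSource demo connector, spooldir picks up whole new files dropped into input.path rather than tailing a single file, which fits the "file gets replaced" workflow better.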
In my case I had created a new topic and pointed the connector at the wrong file path, so I had to edit these files:
bin/connect-standalone.sh
config/connect-standalone.properties
config/connect-file-source.properties
config/connect-file-sink.properties
and edit these lines in them:
topic=my_created_topic
file=PATH_TO_MY_SOURCE_FILE
Everything is working perfectly, yah!!!!!!!!!!

Why is the size of the ZooKeeper log block not 64M?

I started a ZooKeeper cluster on my computer; it includes three instances. By default the size of the log file should be 64M, but I found something strange.
Can anyone explain what happened with ZooKeeper?
Here is the content of the log file:
The FileTxnLog is truncated, which is implemented by FileTxnSnapLog.truncateLog.
This scenario happens when there is a new election and a follower has some transactions that were not committed on the leader.
This can be verified by checking whether a log line like:
Truncating log to get in sync with ...
exists in zookeeper.out or the log file you specified.
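For example (assuming the default zookeeper.out log file name), a quick check is:
grep "Truncating log to get in sync" zookeeper.out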

Kafka connect not working for file streaming

I have been using Kafka Connect on the Confluent Platform, following this guide:
Kafka connect quickstart
But it doesn't update the sink file anymore; changes in the source file are not written to the Kafka topic.
I have already deleted all the tmp files, but no change.
Thanks in advance
Start up a new file source connector with a new location for storing the offsets. This connector is meant as a demo and really doesn't handle anything except a simple file that only gets appended to. You shouldn't use this connector for anything other than a simple demo; have a look at the connector hub if you need something for production.
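In standalone mode the offset location is controlled by offset.storage.file.filename in config/connect-standalone.properties, so a quick way to get a clean slate is to point it at a fresh file (the path below is just an example) and restart the connector:
offset.storage.file.filename=/tmp/connect-new.offsets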
To OP: I had this happen about 5 minutes ago, but when I restarted the connector it was fine; both test.sink.txt and the consumer are getting the new lines. So, in a nutshell, just restart your connector.
If FileStreamSource/Sink stops working after having worked fine, and you've already restarted ZooKeeper, the Kafka server and the connector but it still does not work, then the problem is with the connect.offsets file in the Kafka directory.
You should delete it and create a new empty one.
I faced the same problem before, but correcting the paths of the input and output files in the properties files, as below, worked for me; it then streamed from the input file (test.txt) to the output file (test.sink.txt).
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/home/mypath/kafka/test.txt
topic=connect-test

name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/home/mypath/kafka/test.sink.txt
topics=connect-test