Kafka connector logs - apache-kafka

I am working with Kafka connectors, and over time the number of connectors keeps growing. My log file is getting really messy, so I was wondering if it is possible to have a separate log file for each connector.

You can use the grep command to view only the relevant log lines:
command: tail -f /var/log/kafka/connect.log | grep 'phrase to search'
Your path to the log file may be different.

What you are trying to achieve is not possible in the current Kafka Connect implementation, but there is a KIP under discussion that may help once it is accepted and implemented: https://cwiki.apache.org/confluence/display/KAFKA/KIP-449%3A+Add+connector+contexts+to+Connect+worker+logs

Related

Unable to change default /etc/kafka/connect-log4j.properties location for different kafka connectors

I am using multiple Kafka connectors, but every connector writes its log to the same connect.log file. I want each connector to write to a different log file. For that, during startup I need to change the default /etc/kafka/connect-log4j.properties file, but I am unable to change it.
Sample Start Script:
/usr/bin/connect-standalone ../properties/sample-worker.properties ../properties/sample-connector.properties > /dev/null 2>&1 &
Is there any way to change the default /etc/kafka/connect-log4j.properties file during the startup of connectors?
Kafka uses log4j and has an environment variable for overriding the configuration file:
export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:///some/other/log4j.properties"
connect-standalone.sh ...
Generally, it would be best to run connect-distributed and use a log aggregation tool such as the ELK stack to filter log events per connector.
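As a sketch of the override above (the file names, log path, and appender settings here are assumptions, not Kafka defaults): give each standalone worker its own log4j properties file and point KAFKA_LOG4J_OPTS at it before launching.

```shell
# Sketch: one log4j config per standalone worker, so each connector's worker
# writes to its own file. File names and paths below are assumptions.
cat > connector-a-log4j.properties <<'EOF'
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/var/log/kafka/connector-a.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=[%d] %p %m (%c)%n
EOF

# The kafka-run-class launcher reads this variable and passes it to the JVM.
export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file://$PWD/connector-a-log4j.properties"
# connect-standalone ../properties/sample-worker.properties ../properties/sample-connector.properties &
```

Repeat with a second properties file (e.g. connector-b-log4j.properties) for each additional worker; each process then logs to its own file.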

Mongodb Kafka messages not seen by topic

I found that my topic, despite running and operating, doesn't register events occurring in my MongoDB.
Every time I insert/modify a record, I no longer get any logs from the kafka-console-consumer command.
Is there a way to clear Kafka's cache/offsets, maybe?
The source and sink connections are up and running, and the entire cluster is healthy. Everything worked as usual, but I see this come back every couple of weeks, or when I log into my Mongo cloud from another location.
The --partition 0 parameter didn't help, nor did changing retention.ms to 1.
I checked both connectors' status and got RUNNING:
curl localhost:8083/connectors | jq
curl localhost:8083/connectors/monit_people/status | jq
Running docker-compose logs connect I found:
WARN Failed to resume change stream: Resume of change stream was not possible, as the resume point may no longer be in the oplog.
If the resume token is no longer available then there is the potential for data loss.
Saved resume tokens are managed by Kafka and stored with the offset data.
When running Connect in standalone mode offsets are configured using the:
`offset.storage.file.filename` configuration.
When running Connect in distributed mode the offsets are stored in a topic.
Use the `kafka-consumer-groups.sh` tool with the `--reset-offsets` flag to reset offsets.
Resetting the offset will allow the connector to resume from the latest resume token.
Using `copy.existing=true` ensures that all data will be outputted by the connector but it will duplicate existing data.
Future releases will support a configurable `errors.tolerance` level for the source connector and make use of the `postBatchResumeToken`.
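As the warning says, the resume token is stored with the Connect offset data, which in distributed mode lives in a topic. A minimal sketch for peeking at it, assuming the common default offsets topic name connect-offsets and a broker on localhost (the command is guarded so it does nothing when the Kafka CLI tools are not on the PATH):

```shell
# Sketch: peek at the stored source-connector offsets (which include the
# MongoDB resume token) in distributed mode. Topic/broker are assumptions.
OFFSETS_TOPIC=connect-offsets
BROKER=localhost:9092
if command -v kafka-console-consumer.sh >/dev/null 2>&1; then
  # the consumer exits non-zero on the read timeout; fine for a one-off peek
  kafka-console-consumer.sh --bootstrap-server "$BROKER" \
    --topic "$OFFSETS_TOPIC" --from-beginning \
    --property print.key=true --timeout-ms 10000 || true
fi
```

The keys identify the connector and the values carry its offset JSON, so you can confirm which resume token Connect will hand back to the Mongo source.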
The issue requires more practice with the Confluent Platform, so for now I rebuilt the entire environment by removing all containers with:
docker system prune -a -f --volumes
docker container stop $(docker container ls -a -q -f "label=io.confluent.docker")
After running docker-compose up -d, everything is up and working.

How to stop/terminate confluent JDBC source connector?

I am running the Confluent JDBC source connector to read from a DB table and publish to a Kafka topic. The connector is started by a job scheduler, and I need to stop it after it has published all the rows from the DB table. Any idea how to stop it gracefully?
You can use the REST API to pause (or delete) a connector
PUT /connectors/:name/pause
There is no "notification" to know when all records have been loaded, though. With the JDBC source you can also schedule bulk mode with a long poll interval (say, a whole week), and then schedule the connector's deletion.
To pause it, run this from a command shell (one that has curl installed):
curl -X PUT <host>:8083/connectors/<connector_name>/pause
To resume it again:
curl -X PUT <host>:8083/connectors/<connector_name>/resume
To see whether it is paused or not:
curl <host>:8083/connectors/<connector_name>/status | jq
The jq part makes the output more readable.
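To follow the scheduled-deletion idea, a sketch (the host and connector name are assumptions; the live calls are shown commented so the snippet is safe to run anywhere):

```shell
# Sketch: pause and then delete a connector over the Connect REST API.
# HOST and NAME are assumptions; adjust them to your environment.
HOST=localhost:8083
NAME=jdbc-source-example

PAUSE_URL="http://$HOST/connectors/$NAME/pause"
DELETE_URL="http://$HOST/connectors/$NAME"

# Against a live worker:
#   curl -X PUT "$PAUSE_URL"      # stop polling but keep the config
#   curl -X DELETE "$DELETE_URL"  # remove the connector entirely
```

Deleting removes the connector configuration from the worker, which is what a scheduler would invoke once the weekly bulk load window has passed.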

Is it possible to log all incoming messages in Apache Kafka

I need to know if it is possible to configure logging for the Apache Kafka broker to write all produced/consumed topics and their messages.
I've been looking at log4j.properties, but none of the suggested properties seems to do what I need.
Thanks in advance.
Looking at the log files generated by Kafka, none of them seems to contain the messages written to the different topics.
UPDATE:
Not exactly what I was looking for, but for anyone looking something similar I found: https://github.com/kafka-lens/kafka-lens which provides a friendly GUI to view messages on different topics.
I feel like there's some confusion with the word "log".
As you're talking about log4j, I assume you mean what I'd call "application logs". Kafka does not write the records it handles to its application/log4j logs. In Kafka, log4j logs are only used to trace errors and give some context about the work the brokers are doing.
On the other hand, Kafka writes/reads records into/from its "log", the Kafka log. These files are stored under the path specified by log.dirs (/tmp/kafka-logs by default) and are not directly readable. You can use the DumpLogSegments tool to read these files (with --print-data-log to decode the record payloads), for example:
bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
--files /tmp/kafka-logs/topic-0/00000000000000000000.log --print-data-log

How to read only new changes from a file using kafka producer

I am currently using a Windows machine and am able to read a whole file through the command prompt using a Kafka producer and consumer. I need to get only the recent changes in a file and use them as input for Apache Flink. I tried using this link, but due to a Kafka client jar mismatch issue I was not able to use it.
In my current approach, each time I call my producer it loads the whole file, and I need to run it every time to see the changes made to the file. I thought of using threads and comparing file differences in Java code, but is there any way of doing this with Kafka alone?
I had a similar problem recently (but on Linux) and solved it the following way:
tail -f somefile.log | kafka-console-producer.sh ...
In your case you can try one of the Windows alternatives to Linux's tail: 13 Ways to Tail a Log File on Windows & Linux
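To illustrate the "only new changes" behavior, a small sketch (file and topic names are made up for the example). tail -n 0 -F skips everything already in the file and emits only lines appended afterwards; PowerShell's Get-Content -Wait -Tail 0 behaves similarly on Windows:

```shell
# Sketch: tail only the content appended after the tail starts.
# File/topic names here are examples, not from the question.
echo "old line" > somefile.log
tail -n 0 somefile.log > new-only.txt        # skips existing content entirely
echo "new line" >> somefile.log
tail -n 1 somefile.log >> new-only.txt       # picks up only the appended line
# Real pipeline (Windows: Get-Content somefile.log -Wait -Tail 0 | ...):
#   tail -n 0 -F somefile.log | kafka-console-producer.sh \
#     --bootstrap-server localhost:9092 --topic file-changes
```

Using -F rather than -f makes the tail survive log rotation, which matters for a producer that runs continuously.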