Redirect only error to log file from console consumer - apache-kafka

I am using the Kafka console consumer and want to redirect errors thrown by the Kafka process to a log file. When the process runs without errors, I redirect the output of my topic to a file, using a command like:
kafka-console-consumer --topic example --group mygroup --bootstrap-server broker:9092 --from-beginning --property print.key=true
If there is an error, say while connecting to the topic or a permission problem on the group id, I want that error (whether thrown by Kafka or a system error) to go to a file such as /tmp/myerrors.txt. If there are no issues and I start receiving data from the topic, the data should go to another file, /home/abcd/data.txt. Is this feasible? I want the error and data files kept separate, so that a problem in the middle of the day does not write errors into the file meant for collecting data.
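Keeping the two files separate is feasible as long as the consumer writes records to standard output and its errors to standard error. The sketch below assumes that is the case for your installation (Kafka's command-line tools normally log to stderr through their tools log4j configuration, but this is worth verifying). The helper name run_split is hypothetical, and the file paths are the ones from the question.

```python
import subprocess


def run_split(command, data_path, error_path):
    """Run command, appending its stdout to data_path and its stderr to error_path.

    Returns the exit code of the command.
    """
    with open(data_path, "ab") as data_file, open(error_path, "ab") as error_file:
        completed = subprocess.run(command, stdout=data_file, stderr=error_file)
    return completed.returncode


# Hypothetical usage (requires the Kafka CLI on PATH):
# run_split(
#     ["kafka-console-consumer", "--topic", "example",
#      "--bootstrap-server", "broker:9092", "--from-beginning"],
#     "/home/abcd/data.txt",
#     "/tmp/myerrors.txt",
# )
```

From a shell, the same split is the usual pair of redirections appended to the consumer command: > /home/abcd/data.txt 2> /tmp/myerrors.txt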

Related

unable to read avro message via kafka-avro-console-consumer (end goal read it via spark streaming)

My end goal is to read Avro data out of the Confluent Platform using Spark Streaming, as some have described here: Integrating Spark Structured Streaming with the Confluent Schema Registry
Before trying that, I'd like to verify whether I can read the messages with the command below:
$ kafka-avro-console-consumer \
> --topic my-topic-produced-using-file-pulse-xml \
> --from-beginning \
> --bootstrap-server localhost:9092 \
> --property schema.registry.url=http://localhost:8081
I receive this error message, "Unknown magic byte":
Processed a total of 1 messages
[2020-09-10 12:59:54,795] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
[2020-09-10 12:59:54,795] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Note that the message can be read like this (using kafka-console-consumer instead of kafka-avro-console-consumer):
kafka-console-consumer \
--bootstrap-server localhost:9092 --group my-group-console \
--from-beginning \
--topic my-topic-produced-using-file-pulse-xml
The message was produced using Confluent Connect with file-pulse (1.5.2), reading an XML file (streamthoughts/kafka-connect-file-pulse).
Please help here:
Did I use kafka-avro-console-consumer incorrectly?
I tried the "deserializer" property options described here: https://stackoverflow.com/a/57703102/4582240, but they did not help.
I have not yet been brave enough to start Spark Streaming to read the data.
The file-pulse 1.5.2 properties I used are below (added 11/09/2020 for completeness).
name=connect-file-pulse-xml
connector.class=io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector
topic=my-topic-produced-using-file-pulse-xml
tasks.max=1
# File types
fs.scan.filters=io.streamthoughts.kafka.connect.filepulse.scanner.local.filter.RegexFileListFilter
file.filter.regex.pattern=.*\\.xml$
task.reader.class=io.streamthoughts.kafka.connect.filepulse.reader.XMLFileInputReader
force.array.on.fields=sometagNameInXml
# File scanning
fs.cleanup.policy.class=io.streamthoughts.kafka.connect.filepulse.clean.LogCleanupPolicy
fs.scanner.class=io.streamthoughts.kafka.connect.filepulse.scanner.local.LocalFSDirectoryWalker
fs.scan.directory.path=/tmp/kafka-connect/xml/
fs.scan.interval.ms=10000
# Internal Reporting
internal.kafka.reporter.bootstrap.servers=localhost:9092
internal.kafka.reporter.id=connect-file-pulse-xml
internal.kafka.reporter.topic=connect-file-pulse-status
# Track file by name
offset.strategy=name
If you are getting Unknown Magic Byte with the consumer, then the producer didn't use the Confluent AvroSerializer, and might have pushed Avro data that doesn't use the Schema Registry.
Without seeing the Producer code or consuming and inspecting the data in binary format, it is difficult to know which is the case.
You wrote: "The message was produced using confluent connect file-pulse"
Did you use value.converter with the AvroConverter class?
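To inspect the data in binary form, it helps to know that records written by the Confluent Avro serializer start with a magic byte of 0 followed by a 4-byte big-endian schema id, and only then the Avro payload. The sketch below checks the first bytes of a raw message value; describe_payload is a hypothetical helper, and how you obtain the raw bytes (for example from a byte-array consumer) is left to you.

```python
import struct

MAGIC_BYTE = 0  # first byte of the Confluent Schema Registry wire format


def describe_payload(payload: bytes) -> str:
    """Report whether payload starts with the Confluent framing header."""
    if len(payload) < 5:
        return "too short for Confluent framing"
    # 1 signed byte (magic) + 4-byte unsigned big-endian integer (schema id)
    magic, schema_id = struct.unpack(">bI", payload[:5])
    if magic != MAGIC_BYTE:
        return f"unknown magic byte {payload[0]:#04x}"
    return f"Confluent framing, schema id {schema_id}"
```

A value that begins with a printable character such as { or < is most likely plain JSON or XML text rather than Schema Registry Avro, which would explain why the plain console consumer can read it.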

Produce/Consume to Remote Kafka Does not Work

I have set up an AWS EC2 instance running Apache Kafka 0.8 via a Bitnami AMI image. The server properties are pretty much default (Kafka at localhost:9092 and ZooKeeper at localhost:2181).
When I SSH into the machine, I can produce/consume data using the scripts provided by Kafka, located at kafka/bin.
To produce I run the following command:
./kafka-console-producer.sh --broker-list localhost:9092 --topic test
To Consume:
./kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
This works correctly, so I have determined that Kafka itself is functioning. Next I attempted to produce/consume from my own machine, using the Python library pykafka:
import datetime
import sys
from pykafka import KafkaClient
# KAFKA_HOST holds the "host:port" of the remote broker
client = KafkaClient(hosts=KAFKA_HOST)
topic = client.topics[sys.argv[1]]
try:
    with topic.get_producer(max_queued_messages=1, auto_start=True) as producer:
        while True:
            for i in range(10):
                # Timestamped test message, sent as UTF-8 bytes
                message = "Test message sent on: " + datetime.datetime.now().strftime("%I:%M%p on %B %d, %Y")
                encoded_message = message.encode("utf-8")
                mess = producer.produce(encoded_message)
except Exception as error:
    print('Something went wrong; printing exception:')
    print(error)
And I consume as follows:
import sys
from pykafka import KafkaClient
# KAFKA_HOST holds the "host:port" of the remote broker
client = KafkaClient(hosts=KAFKA_HOST)
topic = client.topics[sys.argv[1]]
try:
    while True:
        consumer = topic.get_simple_consumer(auto_start=True)
        # Iterating the consumer blocks until messages arrive
        for message in consumer:
            if message is not None:
                print(message.offset, message.value)
except Exception as error:
    print('Something went wrong; printing exception:')
    print(error)
These snippets run without errors or exceptions, but no messages are produced or consumed, not even the ones created via the local scripts.
I have confirmed that both ports 9092 and 2181 are open via telnet.
My questions are as follows:
Is there a way to debug such problems and find the root cause? I would expect the library to throw an exception if there are connectivity issues.
What is going on?
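On the debugging question, one low-cost step is to turn on the client library's own log output before creating the KafkaClient. The sketch below assumes pykafka logs through Python's standard logging module under logger names that start with "pykafka" (verify this against your version); enable_client_debug_logging is a hypothetical helper.

```python
import logging


def enable_client_debug_logging(logger_name: str = "pykafka") -> logging.Logger:
    """Print DEBUG-level records from the named library logger to stderr."""
    # basicConfig attaches a stderr handler to the root logger (if none exists yet).
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
    logger = logging.getLogger(logger_name)
    # Child loggers such as "pykafka.cluster" inherit this level.
    logger.setLevel(logging.DEBUG)
    return logger
```

Calling enable_client_debug_logging() at the top of either script should surface whatever the library logs about broker discovery and connection attempts, which is more informative than the silent behaviour described above.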

Error getting kafka consumer lag with kafka-consumer-groups.sh

I’m having an issue with using the Kafka command line tools to get consumer lag for a given group/topic. Currently, I'm trying to use kafka-consumer-groups.sh as mentioned in countless online resources. The following command works just fine: kafka-consumer-groups.sh --bootstrap-server $BROKERS --list
However, if I use kafka-consumer-groups.sh --bootstrap-server $BROKERS --describe --group $GROUP, I get the following output and error:
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
[2018-07-24 00:19:09,139] ERROR admin-client-network-thread exited (kafka.admin.AdminClient)
java.lang.NullPointerException
at org.apache.kafka.common.utils.Utils.join(Utils.java:399)
at org.apache.kafka.common.requests.OffsetFetchRequest$Builder.toString(OffsetFetchRequest.java:74)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at org.apache.kafka.clients.ClientRequest.toString(ClientRequest.java:63)
at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:374)
at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:332)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:409)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:252)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:199)
at kafka.admin.AdminClient$$anon$1.run(AdminClient.scala:61)
at java.lang.Thread.run(Thread.java:748)
Error: Executing consumer group command failed due to The server experienced an unexpected error when processing the request
I've tried setting GROUP to each and every group available and I've used KafkaTool to confirm that these groups exist and are working properly (consuming messages from various topics). I've tried placing strings in the command directly instead of using an environment variable.
Why am I getting this error and what else can I do to debug?

Missing required argument "[zookeeper]"

I'm trying to start a consumer using Apache Kafka. It used to work well, but I had to format my PC and reinstall everything, and now when I try to run this:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
this is what I'm getting:
Missing required argument "[zookeeper]"
Option                                  Description
------                                  -----------
--blacklist <blacklist>                 Blacklist of topics to exclude from
                                          consumption.
--bootstrap-server <server to connect
  to>
--consumer.config <config file>         Consumer config properties file.
--csv-reporter-enabled                  If set, the CSV metrics reporter will
                                          be enabled
--delete-consumer-offsets               If specified, the consumer path in
                                          zookeeper is deleted when starting up
--formatter <class>                     The name of a class to use for
                                          formatting kafka messages for
                                          display. (default: kafka.tools.
                                          DefaultMessageFormatter)
--from-beginning                        If the consumer does not already have
                                          an established offset to consume
                                          from, start with the earliest
                                          message present in the log rather
                                          than the latest message.
--key-deserializer <deserializer for
  key>
--max-messages <Integer: num_messages>  The maximum number of messages to
                                          consume before exiting. If not set,
                                          consumption is continual.
--metrics-dir <metrics directory>       If csv-reporter-enable is set, and
                                          this parameter isset, the csv
                                          metrics will be outputed here
--new-consumer                          Use the new consumer implementation.
--property <prop>
--skip-message-on-error                 If there is an error when processing a
                                          message, skip it instead of halt.
--timeout-ms <Integer: timeout_ms>      If specified, exit if no message is
                                          available for consumption for the
                                          specified interval.
--topic <topic>                         The topic id to consume on.
--value-deserializer <deserializer for
  values>
--whitelist <whitelist>                 Whitelist of topics to include for
                                          consumption.
--zookeeper <urls>                      REQUIRED: The connection string for
                                          the zookeeper connection in the form
                                          host:port. Multiple URLS can be
                                          given to allow fail-over.
My guess is that there is some kind of problem with the ZooKeeper connection port, because it is telling me to specify the port that ZooKeeper uses to connect to Kafka. I'm not sure about this, though, and I don't know how to find out which port to specify if that is the problem. Any suggestions?
Thanks in advance for the help.
It looks like you are using an old version of the Kafka tools that requires you to set --new-consumer if you want to connect directly to the brokers.
I'd recommend picking a recent version of Kafka so that you only need to specify --bootstrap-server, as in your example: http://kafka.apache.org/downloads

Kafka console producer skipping messages

I'm trying to send a file to a topic using:
cat myfile | kafka-console-producer.sh --broker-list $BROKER_URL --topic mytopic
When I check the count of messages on the topic, I see a few hundred fewer messages than the file contains.
During the write I see a message:
[2017-11-15 14:05:26,864] WARN Error while fetching metadata with correlation id 0 : {abc123=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
I have correctly set the advertised hostname and listeners.
What confuses me is this: if the leader is not available, how does it manage to put any messages into the topic at all? Furthermore, the warning appears only intermittently; sometimes it does not appear.
How can I debug this?
As pointed out by vahid in the comments, this is a known issue.
The workaround is to pass --request-required-acks 1 to the console producer.
The random occurrence of LEADER_NOT_AVAILABLE happens when I write to a new topic without explicitly creating it first. (Thanks to amethystic)