kafka-avro-console-producer Quick Start fails - confluent-platform

I'm using kafka-avro-console-producer from confluent-3.0.0, and an error occurs when I execute the following:
./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic test1234 --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/tonydao/dev/bin/confluent-3.0.0/share/java/kafka-serde-tools/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/tonydao/dev/bin/confluent-3.0.0/share/java/confluent-common/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/tonydao/dev/bin/confluent-3.0.0/share/java/schema-registry/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
{"f1":"value1"}
{"f1":"value2"}
org.apache.kafka.common.errors.SerializationException: Error deserializing json to Avro of schema {"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}
Caused by: java.io.EOFException
at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:138)
at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:219)
at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:214)
at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:363)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:355)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:157)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
at io.confluent.kafka.formatter.AvroMessageReader.jsonToAvro(AvroMessageReader.java:189)
at io.confluent.kafka.formatter.AvroMessageReader.readMessage(AvroMessageReader.java:157)
at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:55)
at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)

Make sure that you are running all the required services (ZooKeeper, the Kafka server, and Schema Registry) from the Confluent Kafka package only.
You might have used some other version of Kafka earlier on the same server and might need to clean the logs directory (/tmp/kafka is the default one).
Make sure you are not hitting Enter without providing data, as an empty line is treated as null and results in an exception.
Try with a completely new topic (see the example below).
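For example, a minimal sketch assuming ZooKeeper on localhost:2181 (the topic name is just an example):
# create a brand-new topic and confirm it exists
./bin/kafka-topics --zookeeper localhost:2181 --create --topic test1234-fresh --partitions 1 --replication-factor 1
./bin/kafka-topics --zookeeper localhost:2181 --list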

This occurs when you enter a null value in the producer message; it looks like it can't convert null from JSON to Avro. You need to just enter the JSON and press Enter, then Ctrl+D when complete.
I noticed my Avro producer didn't show a '>' character to indicate it was accepting messages, so I was pressing Enter to get some response from the script when I hit this.

First, when you run this command, add escape characters to the double quotes as follows and press Enter once:
./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic test1234 --property value.schema='{\"type\":\"record\",\"name\":\"myrecord\",\"fields\":[{\"name\":\"f1\",\"type\":\"string\"}]}'
Once you have run this and pressed Enter once, type in a JSON object without escape characters for the double quotes, as follows:
{"f1":"value1"}
After each JSON object, press Enter only once. If you press it twice, the empty line is taken as null for the next JSON object, which is what caused that error. There is no acknowledgement after entering a JSON object, but it will already have been sent to Kafka; wait a minute or two and check in Kafka Control Center, and the messages should be there in the given topic. Press Ctrl+C to exit the producer console.
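To double-check that the records actually arrived, one option is to read the topic back with kafka-avro-console-consumer. A sketch, assuming Schema Registry at http://localhost:8081:
# read the records back; these flags assume a recent Confluent Platform
# (older releases may need --zookeeper localhost:2181 instead of --bootstrap-server)
./bin/kafka-avro-console-consumer --topic test1234 --from-beginning \
  --bootstrap-server localhost:9092 \
  --property schema.registry.url=http://localhost:8081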

Related

How to send 1 Kafka message with a key?

I am using this command: bin/kafka-console-producer.sh --bootstrap-server url --topic Notifications --property "parse.key=true" --property "key.separator=:"
I run this command, and then enter:
1: {"producer": 4, "message": {"created_date": "2020-08-04 15:03:19.063196"}}
Then it immediately expects a second line. But I only want to send 1 message, so I hit enter again, and get
org.apache.kafka.common.KafkaException: No key found on line 2: q
at kafka.tools.ConsoleProducer$LineMessageReader.readMessage(ConsoleProducer.scala:289)
at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:51)
at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)
I think that the message is actually sent. kafka-console-producer accepts input line by line: every line (terminated by the newline you enter when hitting [Enter]) submits a message to the topic. After that, kafka-console-producer continues to wait for the next input, should you want to send more messages. However, because the next input you attempt to send is an empty string, it indeed doesn't parse, and the error is produced. This error, though, applies to the second entry only; the first message should normally have already been sent by that time.
As Patrick Kelly suggests in the comments, you may confirm this assumption by running a consumer on the topic in question and seeing what's in there. For example, run kafka-console-consumer while entering the messages at the kafka-console-producer prompt, as in the sketch below.
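A minimal sketch of that check, reusing the placeholder broker URL and topic from the question and printing the key so the 1: prefix is visible:
# consume the topic and print keys to confirm the first keyed line was sent
bin/kafka-console-consumer.sh --bootstrap-server url --topic Notifications \
  --from-beginning \
  --property print.key=true --property "key.separator=:"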

unable to read avro message via kafka-avro-console-consumer (end goal read it via spark streaming)

(End goal) Before trying out whether I could eventually read Avro data out of the Confluent Platform using Spark Structured Streaming, like some described here: Integrating Spark Structured Streaming with the Confluent Schema Registry,
I'd like to verify whether I can use the command below to read the messages:
$ kafka-avro-console-consumer \
> --topic my-topic-produced-using-file-pulse-xml \
> --from-beginning \
> --bootstrap-server localhost:9092 \
> --property schema.registry.url=http://localhost:8081
I receive this "Unknown magic byte" error message:
Processed a total of 1 messages
[2020-09-10 12:59:54,795] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
[2020-09-10 12:59:54,795] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Note: the message can be read like this (using console-consumer instead of avro-console-consumer):
kafka-console-consumer \
--bootstrap-server localhost:9092 --group my-group-console \
--from-beginning \
--topic my-topic-produced-using-file-pulse-xml
The message was produced using Confluent Connect File Pulse (1.5.2) reading an XML file (streamthoughts/kafka-connect-file-pulse).
Please help here:
Did I use kafka-avro-console-consumer wrong?
I tried the "deserializer" properties options described here: https://stackoverflow.com/a/57703102/4582240, but they did not help.
I have not yet been brave enough to start Spark streaming to read the data.
The File Pulse 1.5.2 properties I used are below (added 11/09/2020 for completeness):
name=connect-file-pulse-xml
connector.class=io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector
topic= my-topic-produced-using-file-pulse-xml
tasks.max=1
# File types
fs.scan.filters=io.streamthoughts.kafka.connect.filepulse.scanner.local.filter.RegexFileListFilter
file.filter.regex.pattern=.*\\.xml$
task.reader.class=io.streamthoughts.kafka.connect.filepulse.reader.XMLFileInputReader
force.array.on.fields=sometagNameInXml
# File scanning
fs.cleanup.policy.class=io.streamthoughts.kafka.connect.filepulse.clean.LogCleanupPolicy
fs.scanner.class=io.streamthoughts.kafka.connect.filepulse.scanner.local.LocalFSDirectoryWalker
fs.scan.directory.path=/tmp/kafka-connect/xml/
fs.scan.interval.ms=10000
# Internal Reporting
internal.kafka.reporter.bootstrap.servers=localhost:9092
internal.kafka.reporter.id=connect-file-pulse-xml
internal.kafka.reporter.topic=connect-file-pulse-status
# Track file by name
offset.strategy=name
If you are getting Unknown Magic Byte with the consumer, then the producer didn't use the Confluent AvroSerializer, and might have pushed Avro data that doesn't use the Schema Registry.
Without seeing the Producer code or consuming and inspecting the data in binary format, it is difficult to know which is the case.
The message was produced using confluent connect file-pulse
Did you use value.converter with the AvroConverter class?
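For reference, a minimal sketch of the relevant converter settings in the connector (or worker) configuration, assuming Schema Registry is at http://localhost:8081:
# have Connect write Avro with the Confluent AvroConverter (Schema Registry URL is an assumption)
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081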

Kafka Connect FileStreamSink connector removes quotation marks and changes colon to equal sign for JSON message

Summary
When I stream this with the console producer
{"id":1337,"status":"example_topic_1 success"}
I get this from my FileStreamSink connector:
/data/example_topic_1.txt
{id=1337, status=example_topic_1 success}
This is a major problem for me, because the original JSON message cannot be recovered without making assumptions about where the quotes used to be. How can I output the messages to a file, while preserving the quotation marks?
Details
First, I start my file sink connector.
# sh bin/connect-standalone.sh \
> config/worker.properties \
> config/connect-file-sink-example_topic_1.properties
Second, I start the console consumer (also built into Kafka) so that I have easy visual confirmation that the messages are coming through correctly.
# sh bin/kafka-console-consumer.sh \
> --bootstrap-server kafka_broker:9092 \
> --topic example_topic_1
Finally, I start a console producer for sending messages, and I enter a message.
# sh bin/kafka-console-producer.sh \
> --broker-list kafka_broker:9092 \
> --topic example_topic_1
From the console consumer, the message pops out correctly, with quotes.
{"id":1337,"status":"example_topic_1 success"}
But I get this from the FileStreamSink connector:
/data/example_topic_1.txt
{id=1337, status=example_topic_1 success}
My Configuration
config/worker.properties
offset.storage.file.filename=/tmp/example.offsets
bootstrap.servers=kafka_broker:9092
offset.flush.interval.ms=10000
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
config/connect-file-sink-example_topic_1.properties
name=file-sink-example_topic_1
connector.class=FileStreamSink
tasks.max=1
file=/data/example_topic_1.txt
topics=example_topic_1
Since you don't actually want to parse the JSON data, but just pass it straight through as a lump of text, you need to use the StringConverter:
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
This article explains more about the nuances of converters: https://rmoff.net/2019/05/08/when-a-kafka-connect-converter-is-not-a-converter/. It shows an example of what you're trying to do, although it uses kafkacat in place of the console producer/consumer.
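For reference, once both converters are set to StringConverter and the connector is restarted, the sink file should contain the original message verbatim:
/data/example_topic_1.txt
{"id":1337,"status":"example_topic_1 success"}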

Kafka console producer skipping messages

I'm trying to send a file to a topic using:
cat myfile | kafka-console-producer.sh --broker-list $BROKER_URL --topic mytopic
When I check the count of messages on the topic, I see a few hundred messages fewer than expected.
During the write I see a message:
[2017-11-15 14:05:26,864] WARN Error while fetching metadata with correlation id 0 : {abc123=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
I have correctly set the advertised hostname and listeners.
What confuses me is that if the leader is not available, how does it manage to put any messages into the topic? Furthermore, the warning appears randomly; sometimes it doesn't appear at all.
How can I debug this?
As pointed out by vahid in the comments, this is a known issue.
The workaround is to specify --request-required-acks 1 to the console producer (see the sketch below).
The random occurrence of LEADER_NOT_AVAILABLE happens when I write to a new topic without explicitly creating it first. (Thanks to amethystic)
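A sketch of both suggestions, reusing the $BROKER_URL and topic from the question (the ZooKeeper address is a placeholder):
# pre-create the topic so the first writes don't hit leader election
kafka-topics.sh --zookeeper zookeeper:2181 --create --topic mytopic \
  --partitions 1 --replication-factor 1
# produce with the workaround flag
cat myfile | kafka-console-producer.sh --broker-list $BROKER_URL --topic mytopic \
  --request-required-acks 1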

Checking whether Kafka data is compressed

The documentation says to add the line compression.codec=gzip in producer.properties to make the messages compressed.
However, when I open a data file such as 0000000000000.log, the data does not look like it is compressed. How should I check whether the data in Kafka is already compressed?
P.S.: For every test I stop the Kafka cluster and ZooKeeper, delete all of the data in kafka-logs and ZooKeeper, then start the servers again and create a new topic.
The Java ProducerConfig class has changed the name of this config.
public static final String COMPRESSION_TYPE_CONFIG = "compression.type";
I've successfully produced messages with the java client (0.8.2.1) using the ProducerConfig.COMPRESSION_TYPE_CONFIG and it works fine.
Example:
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
Or set compression.type=gzip at the broker level in your server.properties file.
Update for the CLI tool
Read the usage for the command-line tool:
chrisblack:kafka:% ./bin/kafka-console-producer.sh
...
--compression-codec [compression-codec]  The compression codec: either 'none',
                                         'gzip', 'snappy', or 'lz4'. If
                                         specified without value, then it
                                         defaults to 'gzip'
...
--new-producer                           Use the new producer implementation.
--producer-property <producer_prop>      A mechanism to pass user-defined
                                         properties in the form key=value to
                                         the producer.
--property <prop>                        A mechanism to pass user-defined
                                         properties in the form key=value to
                                         the message reader. This allows
                                         custom configuration for a user-
                                         defined message reader.
...
Try running a similar command from the shell:
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test_compression --compression-codec
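To answer the original question of how to tell whether the data on disk is compressed, one option (a sketch; the segment path assumes the default log directory and partition 0, so adjust it for your setup) is to dump the segment with the DumpLogSegments tool and look at the compression codec reported for each batch:
# inspect a log segment; compressed batches report a codec such as gzip in the output
./bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log \
  --files /tmp/kafka-logs/test_compression-0/00000000000000000000.log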