Error while consuming AVRO Kafka Topic from KSQL Stream

Error while consuming AVRO Kafka Topic from KSQL Stream - apache-kafka

I created some dummydata as a Stream in KSQLDB with
VALUE_FORMAT='JSON' TOPIC='MYTOPIC'
The Setup is over Docker-compose. I am running a Kafka Broker, Schema-registry, ksqldbcli, ksqldb-server, zookeeper
Now I want to consume these records from the topic.
My first and last approach was over the commandline with following command
docker run --net=host --rm confluentinc/cp-schema-registry:5.0.0 kafka-avro-console-consumer
--bootstrap-server localhost:29092 --topic DXT --from-beginning --max-messages 10
--property print.key=true --property print.value=true
--value-deserializer io.confluent.kafka.serializers.KafkaAvroDeserializer
--key-deserializer org.apache.kafka.common.serialization.StringDeserializer
But that just returns the error
[2021-04-22 21:45:42,926] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
I also tried it with different use cases in Java Spring but with no prevail. I just cannot consume the created topics.
If I would need to define my own schema, where should I do that and what would be the easiest way because I just created a stream in Ksqldb?
Is there an easy to follow example. I did not specifiy anything else when I created the stream like in the quickstart example on Ksqldb.io. (I added the schema-registry in my deployment)
As I am a noob that is sitting here for almost 10 hours any help would be appreciated.
Edit: I found that pure JSON does not need the Schema-registry with ksqldb. Here.
But how to deserialize it?

If you've written JSON data to the topic then you can read it with the kafka-console-consumer.
The error you're getting (Error deserializing Avro message for id -1…Unknown magic byte!) is because you're using the kafka-avro-console-consumer which attempts to deserialise the topic data as Avro - which it isn't, hence the error.
You can also use PRINT DXT; from within ksqlDB.

Related

Kafka connect MongoDB sink connector using kafka-avro-console-producer

I'm trying to write some documents to MongoDB using the Kafka connect MongoDB connector. I've managed to set up all the components required and start up the connector but when I send the message to Kafka using the kafka-avro-console-producer, Kafka connect is giving me the following error:
org.apache.kafka.connect.errors.DataException: Error: `operationType` field is doc is missing.
I've tried to add this field to the message but then kafka connect is asking me to include a documentKey field. It seems like I need to include some extra fields apart from the payload defined in my schema but I can't find a comprehensive documentation. Does anyone have an example of a kafka message payload (using kafka-avro-console-producer) that goes through a Kafka -> Kafka connect -> MongoDB pipeline?
See following an example of one of the messages I'm sending to Kafka (btw, kafka-avro-console-consumer is able to consume the messages):
./kafka-avro-console-producer --broker-list kafka:9093 --topic sampledata --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"field1","type":"string"}]}'
{"field1": "value1"}
And see also following the configuration of the sink connector:
{"name": "mongo-sink",
"config": {
"connector.class":"com.mongodb.kafka.connect.MongoSinkConnector",
"value.converter":"io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url":"http://schemaregistry:8081",
"connection.uri":"mongodb://cadb:27017",
"database":"cognitive_assistant",
"collection":"topicData",
"topics":"sampledata6",
"change.data.capture.handler": "com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler"
}
}

I've just managed to make the connector work. I deleted the change.data.capture.handler property from the connector configuration and it works now.

unable to read avro message via kafka-avro-console-consumer (end goal read it via spark streaming)

(end goal) before trying out whether i could eventually read avro data, usng spark stream, out of the Confluent Platform like some described here: Integrating Spark Structured Streaming with the Confluent Schema Registry
I'd to verify whether I could use below command to read them:
$ kafka-avro-console-consumer \
> --topic my-topic-produced-using-file-pulse-xml \
> --from-beginning \
> --bootstrap-server localhost:9092 \
> --property schema.registry.url=http://localhost:8081
I receive this error message, Unknown magic byte
Processed a total of 1 messages
[2020-09-10 12:59:54,795] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
[2020-09-10 12:59:54,795] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
note, The message can be read like this (using console-consumer instead of avro-console-consumer):
kafka-console-consumer \
--bootstrap-server localhost:9092 --group my-group-console \
--from-beginning \
--topic my-topic-produced-using-file-pulse-xml
The message was produced using confluent connect file-pulse (1.5.2) reading xml file (streamthoughts/kafka-connect-file-pulse)
Please help here:
Did I use the kafka-avro-console-consumer wrong?
I tried "deserializer" properties options described here: https://stackoverflow.com/a/57703102/4582240, did not help
I did not want to be brave to start the spark streaming to read the data yet.
the file-pulse 1.5.2 properties i used are like below added 11/09/2020 for completion.
name=connect-file-pulse-xml
connector.class=io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector
topic= my-topic-produced-using-file-pulse-xml
tasks.max=1
# File types
fs.scan.filters=io.streamthoughts.kafka.connect.filepulse.scanner.local.filter.RegexFileListFilter
file.filter.regex.pattern=.*\\.xml$
task.reader.class=io.streamthoughts.kafka.connect.filepulse.reader.XMLFileInputReader
force.array.on.fields=sometagNameInXml
# File scanning
fs.cleanup.policy.class=io.streamthoughts.kafka.connect.filepulse.clean.LogCleanupPolicy
fs.scanner.class=io.streamthoughts.kafka.connect.filepulse.scanner.local.LocalFSDirectoryWalker
fs.scan.directory.path=/tmp/kafka-connect/xml/
fs.scan.interval.ms=10000
# Internal Reporting
internal.kafka.reporter.bootstrap.servers=localhost:9092
internal.kafka.reporter.id=connect-file-pulse-xml
internal.kafka.reporter.topic=connect-file-pulse-status
# Track file by name
offset.strategy=name

If you are getting Unknown Magic Byte with the consumer, then the producer didn't use the Confluent AvroSerializer, and might have pushed Avro data that doesn't use the Schema Registry.
Without seeing the Producer code or consuming and inspecting the data in binary format, it is difficult to know which is the case.
The message was produced using confluent connect file-pulse
Did you use value.converter with the AvroConverter class?

Configure Apache Kafka sink jdbc connector

I want to send the data sent to the topic to a postgresql-database. So I follow this guide and have configured the properties-file like this:
name=transaction-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=transactions
connection.url=jdbc:postgresql://localhost:5432/db
connection.user=db-user
connection.password=
auto.create=true
insert.mode=insert
table.name.format=transaction
pk.mode=none
I start the connector with
./bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties etc/kafka-connect-jdbc/sink-quickstart-postgresql.properties
The sink-connector is created but does not start due to this error:
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
The schema is in avro-format and registered and I can send (produce) messages to the topic and read (consume) from it. But I can't seem to sent it to the database.
This is my ./etc/schema-registry/connect-avro-standalone.properties
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
This is a producer feeding the topic using the java-api:
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
properties.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
try (KafkaProducer<String, Transaction> producer = new KafkaProducer<>(properties)) {
Transaction transaction = new Transaction();
transaction.setFoo("foo");
transaction.setBar("bar");
UUID uuid = UUID.randomUUID();
final ProducerRecord<String, Transaction> record = new ProducerRecord<>(TOPIC, uuid.toString(), transaction);
producer.send(record);
}
I'm verifying data is properly serialized and deserialized using
./bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 \
--property schema.registry.url=http://localhost:8081 \
--topic transactions \
--from-beginning --max-messages 1
The database is up and running.

This is not correct:
The unknown magic byte can be due to a id-field not part of the schema
What that error means that the message on the topic was not serialised using the Schema Registry Avro serialiser.
How are you putting data on the topic?
Maybe all the messages have the problem, maybe only some—but by default this will halt the Kafka Connect task.
You can set
"errors.tolerance":"all",
to get it to ignore messages that it can't deserialise. But if all of them are not correctly Avro serialised this won't help and you need to serialise them correctly, or choose a different Converter (e.g. if they're actually JSON, use the JSONConverter).
These references should help you more:
https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained
https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues
http://rmoff.dev/ksldn19-kafka-connect
Edit :
If you are serialising the key with StringSerializer then you need to use this in your Connect config:
key.converter=org.apache.kafka.connect.storage.StringConverter
You can set it at the worker (global property, applies to all connectors that you run on it), or just for this connector (i.e. put it in the connector properties itself, it will override the worker settings)

Missing required argument "[zookeeper]"

i'm trying to start a consumer using Apache Kafka, it used to work well, but i had to format my pc and reinstall everything again, and now when trying to run this:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
this is what i'm getting:
Missing required argument "[zookeeper]"
Option Description
------ -----------
--blacklist <blacklist> Blacklist of topics to exclude from
consumption.
--bootstrap-server <server to connect
to>
--consumer.config <config file> Consumer config properties file.
--csv-reporter-enabled If set, the CSV metrics reporter will
be enabled
--delete-consumer-offsets If specified, the consumer path in
zookeeper is deleted when starting up
--formatter <class> The name of a class to use for
formatting kafka messages for
display. (default: kafka.tools.
DefaultMessageFormatter)
--from-beginning If the consumer does not already have
an established offset to consume
from, start with the earliest
message present in the log rather
than the latest message.
--key-deserializer <deserializer for
key>
--max-messages <Integer: num_messages> The maximum number of messages to
consume before exiting. If not set,
consumption is continual.
--metrics-dir <metrics directory> If csv-reporter-enable is set, and
this parameter isset, the csv
metrics will be outputed here
--new-consumer Use the new consumer implementation.
--property <prop>
--skip-message-on-error If there is an error when processing a
message, skip it instead of halt.
--timeout-ms <Integer: timeout_ms> If specified, exit if no message is
available for consumption for the
specified interval.
--topic <topic> The topic id to consume on.
--value-deserializer <deserializer for
values>
--whitelist <whitelist> Whitelist of topics to include for
consumption.
--zookeeper <urls> REQUIRED: The connection string for
the zookeeper connection in the form
host:port. Multiple URLS can be
given to allow fail-over.
my guess is that there's some kind of problem with the zookeeper connection port, because it's telling me to specify the port which zookeeper has to use to get connected to kafka. I'm not sure of this though, and don't know how to figure out the port to specify if this was the problem. Any suggestions??
Thanks in advance for the help

It looks like you are using an old version of the Kafka tools that requires to set --new-consumer if you want to directly connect to the brokers.
I'd recommend picking a recent version of Kafka so you only need to specify --bootstrap-server like in your example: http://kafka.apache.org/downloads

Error retrieving Avro schema for id 1, Subject not found.; error code: 40401

Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 1
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject not found.; error code: 40401
Confluent Version 4.1.0
I am consuming data from a couple of topics(topic_1, topic_2) using KTable, joining the data and then pushing the data onto another topic(topic_out) using KStream. (Ktable.toStream())
The data is in avro format
When I check the schema by using
curl -X GET http://localhost:8081/subjects/
I find
topic_1-value
topic_1-key
topic_2-value
topic_2-key
topic_out-value
but there is no subject with topic_out-key. Why is it not created?
output from topic_out:
kafka-avro-console-consumer --bootstrap-server localhost:9092 --from-beginning --property print.key=true --topic topic_out
"code1 " {"code":{"string":"code1 "},"personid":{"string":"=NA="},"agentoffice":{"string":"lic1 "},"status":{"string":"a"},"sourcesystem":{"string":"ILS"},"lastupdate":{"long":1527240990138}}
I can see the key being generated, but no subject for key.
Why is subject with key required?
I am feeding this topic to another connector (hdfs-sink) to push the data to hdfs but it fails with below error
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema for id 5\nCaused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject not found.; error code: 40401
when I look at the schema-registry.logs, I can see:
[2018-05-24 15:40:06,230] INFO 127.0.0.1 - -
[24/May/2018:15:40:06 +0530] "POST /subjects/topic_out-key?deleted=true HTTP/1.1" 404 51 9 (io.confluent.rest-utils.requests:77)
any idea why the subject topic_out-key not being created?

any idea why the subject topic_out-key not being created
Because the Key of your Kafka Streams output is a String, not an Avro encoded string.
You can verify that using kafka-console-consumer instead and adding --property print.value=false and not seeing any special characters compared to the same command when you do print the value (this is showing the data is binary Avro)
From Kafka Connect, you must use Kafka's StringConverter class for key.converter property rather than the Confluent Avro one