Kafka console consumer to read Avro messages in HDP 3

I'm trying to consume Avro messages with the Kafka console consumer and I'm not exactly sure how to deserialize them.
sh /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server localhost:6667 --topic test --consumer.config /home/user/kafka.consumer.properties --from-beginning --value-deserializer ByteArrayDeserializer
The Avro Schema in Schema Registry for the test topic is:
{
  "type": "record",
  "namespace": "test",
  "name": "TestRecord",
  "fields": [
    {
      "name": "Name",
      "type": "string",
      "default": "null"
    },
    {
      "name": "Age",
      "type": "int",
      "default": -1
    }
  ]
}
We are using HDP 3.1 and kafka-clients 2.0.0.3.1.0.0-78.
Could someone tell me which deserializer is required to read Avro messages from the console?

Use kafka-avro-console-consumer
e.g.
sh /usr/hdp/current/kafka-broker/bin/kafka-avro-console-consumer.sh \
--bootstrap-server localhost:6667 \
--topic test \
--from-beginning \
--property schema.registry.url=http://localhost:8081
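If you want to double-check which schema is registered for the topic and your registry exposes the Confluent-compatible REST API (an assumption on my side; the HDP Schema Registry has its own endpoints), a quick sketch is:
# assumes a Confluent-compatible Schema Registry on localhost:8081 and the default <topic>-value subject naming
curl http://localhost:8081/subjects/test-value/versions/latest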

Related

Publishing an AVRO message to Kafka via Kafka REST

We have deployed Confluent Platform 6.0 in our Kubernetes cluster. I have created a Kafka topic "test-topic-1" via the Kafka REST API. Now I'm trying to publish a simple AVRO message to this topic.
curl --location --request POST 'https://kafka-rest-master.k8s.hip.com.au/topics/test-topic-1' \
--header 'Content-Type: application/vnd.kafka.avro.v2+json' \
--header 'Accept: application/vnd.kafka.v2+json' \
--data-raw '{"value_schema":{"type":"record","name":"User","fields":[{"name":"name","type":"string"}]},"records":[{"value":{"name":"testUser"}}]}'
I get a 500 error response for this request:
{"error_code":500,"message":"Internal Server Error"}
When I check the logs of the Kafka REST pod, I can see the following error:
ERROR Request Failed with exception
(io.confluent.rest.exceptions.DebuggableExceptionMapper)
com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot
deserialize instance of java.lang.String out of START_OBJECT token
at [Source:
(org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream);
line: 1, column: 17] (through reference chain:
io.confluent.kafkarest.entities.v2.SchemaTopicProduceRequest["value_schema"])
at com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:59)
at com.fasterxml.jackson.databind.DeserializationContext.reportInputMismatch(DeserializationContext.java:1445)
Am I following the correct steps to publish an AVRO message to a newly created Kafka topic? If so, what could be the problem here?
Your AVRO schema definition is wrong. It should be defined as follows:
{
  "type": "record",
  "name": "recordName",
  "namespace": "namespace",
  "doc": "description",
  "fields": [
    {
      "name": "key",
      "type": {
        "type": "string",
        "avro.java.string": "String"
      }
    }
  ]
}
OR
"fields": [
{
"name": "fiedName",
"type": "string"
},
]
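Independent of the field layout, note what the stack trace in your question points at: SchemaTopicProduceRequest["value_schema"] is expected to be a java.lang.String, so the REST Proxy wants value_schema passed as a JSON-encoded string rather than a nested object. A sketch of the same request with the schema escaped into a string (endpoint and payload taken from your question, otherwise unverified):
# value_schema is a string containing the Avro schema, not a nested JSON object
curl --location --request POST 'https://kafka-rest-master.k8s.hip.com.au/topics/test-topic-1' \
--header 'Content-Type: application/vnd.kafka.avro.v2+json' \
--header 'Accept: application/vnd.kafka.v2+json' \
--data-raw '{"value_schema": "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}", "records": [{"value": {"name": "testUser"}}]}'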

Clean Kafka topic in a cluster

I know I can clean a Kafka topic on a broker by either deleting the logs under /data/kafka-logs/topic/* or by setting the retention.ms config to 1000. I want to know how I can clean topics in a multi-node cluster. Should I stop the Kafka process on each broker, delete the logs and start Kafka again, or would doing this on the leader broker alone suffice? If I want to clean by setting retention.ms to 1000, do I need to set it on each broker?
To delete all messages in a specific topic, you can run kafka-delete-records.sh
For example, I have a topic called test, which has 4 partitions.
Create a JSON file, for example j.json:
{
  "partitions": [
    {
      "topic": "test",
      "partition": 0,
      "offset": -1
    },
    {
      "topic": "test",
      "partition": 1,
      "offset": -1
    },
    {
      "topic": "test",
      "partition": 2,
      "offset": -1
    },
    {
      "topic": "test",
      "partition": 3,
      "offset": -1
    }
  ],
  "version": 1
}
Now delete all messages with this command:
/opt/kafka/confluent-4.1.1/bin/kafka-delete-records --bootstrap-server 192.168.XX.XX:9092 --offset-json-file j.json
After executing the command, this message will be displayed:
Records delete operation completed:
partition: test-0 low_watermark: 7
partition: test-1 low_watermark: 7
partition: test-2 low_watermark: 7
partition: test-3 low_watermark: 7
If you want to delete a topic entirely, you can use kafka-topics.
For example, to delete the test topic:
/opt/kafka/confluent-4.0.0/bin/kafka-topics --zookeeper 109.XXX.XX.XX:2181 --delete --topic test
You do not need to restart Kafka
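As for the retention.ms approach from your question: retention.ms is a topic-level config stored in the cluster, so you set it once rather than on each broker. A sketch using kafka-configs (the Zookeeper address is a placeholder; with Confluent 4.x / Kafka 1.x the --zookeeper form shown here applies):
# temporarily shrink retention so the brokers purge old segments
/opt/kafka/confluent-4.1.1/bin/kafka-configs --zookeeper 192.168.XX.XX:2181 --alter \
--entity-type topics --entity-name test --add-config retention.ms=1000
# once the data is gone, remove the override to fall back to the broker default
/opt/kafka/confluent-4.1.1/bin/kafka-configs --zookeeper 192.168.XX.XX:2181 --alter \
--entity-type topics --entity-name test --delete-config retention.ms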

Kafka consumer not able to consume messages using bootstrap server name

I am facing an issue while consuming messages using the bootstrap server (i.e. the Kafka server). Any idea why it is not able to consume messages without Zookeeper?
Kafka Version: kafka_2.11-1.0.0
Zookeeper Version: kafka_2.11-1.0.0
Zookeeper Host and port: zkp02.mp.com:2181
Kafka Host and port: kfk03.mp.com:9092
Producing some messages:
[kfk03.mp.com ~]$ /bnsf/kafka/bin/kafka-console-producer.sh --broker-list kfk03.mp.com:9092 --topic test
>hi
>hi
The consumer is not able to consume messages if I give --bootstrap-server:
[kfk03.mp.com ~]$
/bnsf/kafka/bin/kafka-console-consumer.sh --bootstrap-server kfk03.mp.com:9092 --topic test --from-beginning
The consumer is able to consume messages when --zookeeper is given instead of --bootstrap-server:
[kfk03.mp.com ~]$ /bnsf/kafka/bin/kafka-console-consumer.sh --zookeeper zkp02.mp.com:2181 --topic test --from-beginning
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
hi
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
hi
hi
uttam
hi
hi
hi
hello
hi
^CProcessed a total of 17 messages
While consuming messages from Kafka using the bootstrap-server parameter, the connection happens via the Kafka broker instead of Zookeeper. The broker stores consumer offsets in the __consumer_offsets topic.
Check whether __consumer_offsets is present in your topic list. If it's not there, check the Kafka logs to find the reason.
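To check, you can list the topics with kafka-topics; the path and Zookeeper host below are taken from the question, so adjust them for your install:
/bnsf/kafka/bin/kafka-topics.sh --zookeeper zkp02.mp.com:2181 --list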
We faced a similar issue. In our case the __consumer_offsets topic was not created because of the following error:
ERROR [KafkaApi-1001] Number of alive brokers '1' does not meet the required replication factor '3' for the offsets topic (configured via 'offsets.topic.replication.factor').
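If that is what your logs show and you are running a single-broker cluster (an assumption, not something stated in the question), the usual fix is to lower the replication factor of the internal offsets topic before __consumer_offsets is first created and then restart the broker. A sketch, assuming the config lives under /bnsf/kafka/config:
# edit the existing line if the property is already set, otherwise append it; only sensible on a single-broker/test cluster
echo "offsets.topic.replication.factor=1" >> /bnsf/kafka/config/server.properties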

How do I delete/clean Kafka queued messages without deleting Topic

Is there any way to delete queued messages without deleting the Kafka topic?
I want to delete the queued messages when activating the consumer.
I know there are several ways like:
Resetting retention time
$ ./bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic MyTopic --config retention.ms=1000
Deleting kafka files
$ rm -rf /data/kafka-logs/<topic/Partition_name>
In 0.11 or higher you can run the bin/kafka-delete-records.sh command to mark messages for deletion.
https://github.com/apache/kafka/blob/trunk/bin/kafka-delete-records.sh
For example, publish 100 messages:
seq 100 | ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic mytest
then delete the first 90 of those 100 messages with the new kafka-delete-records.sh command-line tool:
./bin/kafka-delete-records.sh --bootstrap-server localhost:9092 --offset-json-file ./offsetfile.json
where offsetfile.json contains
{"partitions": [{"topic": "mytest", "partition": 0, "offset": 90}], "version":1 }
and then consume the messages from the beginning to verify that the first 90 of the 100 messages are indeed marked as deleted:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytest --from-beginning
91
92
93
94
95
96
97
98
99
100
To delete all messages in a specific topic, you can run kafka-delete-records.sh
For example, I have a topic called test, which has 4 partitions.
Create a JSON file, for example j.json:
{
  "partitions": [
    {
      "topic": "test",
      "partition": 0,
      "offset": -1
    },
    {
      "topic": "test",
      "partition": 1,
      "offset": -1
    },
    {
      "topic": "test",
      "partition": 2,
      "offset": -1
    },
    {
      "topic": "test",
      "partition": 3,
      "offset": -1
    }
  ],
  "version": 1
}
Now delete all messages with this command:
/opt/kafka/confluent-4.1.1/bin/kafka-delete-records --bootstrap-server 192.168.XX.XX:9092 --offset-json-file j.json
After executing the command, this message will be displayed:
Records delete operation completed:
partition: test-0 low_watermark: 7
partition: test-1 low_watermark: 7
partition: test-2 low_watermark: 7
partition: test-3 low_watermark: 7

Kafka Connect JDBC sink connector not working

I am trying to use the Kafka Connect JDBC sink connector to insert data into Oracle, but it is throwing an error. I have tried all the possible schema configurations; the examples are below.
Please suggest if I am missing anything. Below are my configuration files and errors.
Case 1 - First configuration:
internal.value.converter.schemas.enable=false
With this configuration I am getting the following:
[2017-08-28 16:16:26,119] INFO Sink task WorkerSinkTask{id=oracle_sink-0} finished initialization and start (org.apache.kafka.connect.runtime.WorkerSinkTask:233)
[2017-08-28 16:16:26,606] INFO Discovered coordinator dfw-appblx097-01.prod.walmart.com:9092 (id: 2147483647 rack: null) for group connect-oracle_sink. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:597)
[2017-08-28 16:16:26,608] INFO Revoking previously assigned partitions [] for group connect-oracle_sink (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:419)
[2017-08-28 16:16:26,609] INFO (Re-)joining group connect-oracle_sink (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:432)
[2017-08-28 16:16:27,174] INFO Successfully joined group connect-oracle_sink with generation 26 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:399)
[2017-08-28 16:16:27,176] INFO Setting newly assigned partitions [DJ-7, DJ-6, DJ-5, DJ-4, DJ-3, DJ-2, DJ-1, DJ-0, DJ-9, DJ-8] for group connect-oracle_sink (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:262)
[2017-08-28 16:16:28,580] ERROR Task oracle_sink-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerSinkTask:455)
org.apache.kafka.connect.errors.ConnectException: No fields found using key and value schemas for table: DJ
at io.confluent.connect.jdbc.sink.metadata.FieldsMetadata.extract(FieldsMetadata.java:190)
at io.confluent.connect.jdbc.sink.metadata.FieldsMetadata.extract(FieldsMetadata.java:58)
at io.confluent.connect.jdbc.sink.BufferedRecords.add(BufferedRecords.java:65)
at io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:62)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:66)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:435)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:251)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:180)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:148)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Case 2 - Second configuration:
internal.key.converter.schemas.enable=true
internal.value.converter.schemas.enable=true
Log:
[2017-08-28 16:23:50,993] INFO Revoking previously assigned partitions [] for group connect-oracle_sink (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:419)
[2017-08-28 16:23:50,993] INFO (Re-)joining group connect-oracle_sink (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:432)
[2017-08-28 16:23:51,260] INFO (Re-)joining group connect-oracle_sink (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:432)
[2017-08-28 16:23:51,381] INFO Successfully joined group connect-oracle_sink with generation 29 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:399)
[2017-08-28 16:23:51,384] INFO Setting newly assigned partitions [DJ-7, DJ-6, DJ-5, DJ-4, DJ-3, DJ-2, DJ-1, DJ-0, DJ-9, DJ-8] for group connect-oracle_sink (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:262)
[2017-08-28 16:23:51,727] ERROR Task oracle_sink-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:148)
org.apache.kafka.connect.errors.DataException: JsonConverter with schemas.enable requires "schema" and "payload" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:308)
My Oracle connector.properties looks like this:
name=oracle_sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=DJ
connection.url=jdbc:oracle:thin:@hostname:port:sid
connection.user=username
connection.password=password
#key.converter=org.apache.kafka.connect.json.JsonConverter
#value.converter=org.apache.kafka.connect.json.JsonConverter
auto.create=true
auto.evolve=true
Connect-Standalone.properties
My JSON looks like -
{"Item":"12","Sourcing Reason":"corr","Postal Code":"l45","OrderNum":"10023","Intended Node Distance":1125.8,"Chosen Node":"34556","Quantity":1,"Order Date":1503808765201,"Intended Node":"001","Chosen Node Distance":315.8,"Sourcing Logic":"reducesplits"}
Per the documentation
The sink connector requires knowledge of schemas, so you should use a suitable converter e.g. the Avro converter that comes with the schema registry, or the JSON converter with schemas enabled.
So if your data is JSON you would have the following configuration:
[...]
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
[...]
The error you see in the second instance is pertinent -- JsonConverter with schemas.enable requires "schema" and "payload" fields - the JSON you share does not meet this required format.
Here's a simple example of a valid JSON message with schema and payload:
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int32",
        "optional": true,
        "field": "c1"
      },
      {
        "type": "string",
        "optional": true,
        "field": "c2"
      },
      {
        "type": "int64",
        "optional": false,
        "name": "org.apache.kafka.connect.data.Timestamp",
        "version": 1,
        "field": "create_ts"
      },
      {
        "type": "int64",
        "optional": false,
        "name": "org.apache.kafka.connect.data.Timestamp",
        "version": 1,
        "field": "update_ts"
      }
    ],
    "optional": false,
    "name": "foobar"
  },
  "payload": {
    "c1": 10000,
    "c2": "bar",
    "create_ts": 1501834166000,
    "update_ts": 1501834166000
  }
}
What's your source for the data that you're trying to land in Oracle? If it's Kafka Connect inbound, then simply using the same converter configuration (Avro + Confluent Schema Registry) would be easier and more efficient. If it's a custom application, you'll need to get it to either (a) use the Confluent Avro serialiser or (b) write the JSON in the required format above, providing the schema of the payload inline with the message.
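For instance, to smoke-test the connector you could hand-produce one message in that envelope to the DJ topic with the console producer. This is only an illustration: the broker address and the struct name are placeholders, and the schema below declares just the Item field from your JSON:
# "dj_record" and localhost:9092 are hypothetical; extend the fields array with the rest of your columns
echo '{"schema":{"type":"struct","fields":[{"type":"string","optional":true,"field":"Item"}],"optional":false,"name":"dj_record"},"payload":{"Item":"12"}}' | \
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic DJ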
I had the same problem; after reading this post, I resolved it with the JDBC sink and MySQL.
Below is my Kafka Connect configuration, as additional information:
curl --location --request POST 'http://localhost:8083/connectors/' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "jdbc-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "ttib-transactions",
"connection.url": "jdbc:mysql://172.17.0.1:6603/tt-tran?verifyServerCertificate=true&useSSL=false",
"connection.user": "root",
"connection.password": "*******",
"value.converter.schema.registry.url": "https://psrc-j55zm.us-central1.gcp.confluent.cloud",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": "false",
"insert.mode": "insert",
"batch.size":"0",
"table.name.format": "${topic}",
"pk.fields" :"id"
}
}'
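After POSTing the config, you can ask the Kafka Connect REST API whether the connector and its task actually started (same localhost:8083 endpoint as above; jdbc-sink is the connector name from the config):
curl http://localhost:8083/connectors/jdbc-sink/status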