Kafka consumer not able to consume messages using bootstrap server name - apache-kafka

I am facing an issue while consuming messages using the bootstrap server (i.e. the Kafka broker). Any idea why the consumer is not able to consume messages without ZooKeeper?
Kafka Version: kafka_2.11-1.0.0
Zookeeper Version: kafka_2.11-1.0.0
Zookeeper Host and port: zkp02.mp.com:2181
Kafka Host and port: kfk03.mp.com:9092
Producing some messages:
[kfk03.mp.com ~]$ /bnsf/kafka/bin/kafka-console-producer.sh --broker-list kfk03.mp.com:9092 --topic test
>hi
>hi
The consumer is not able to consume messages if I give --bootstrap-server:
[kfk03.mp.com ~]$
/bnsf/kafka/bin/kafka-console-consumer.sh --bootstrap-server kfk03.mp.com:9092 --topic test --from-beginning
The consumer is able to consume messages when --zookeeper is given instead of --bootstrap-server:
[kfk03.mp.com ~]$ /bnsf/kafka/bin/kafka-console-consumer.sh --zookeeper zkp02.mp.com:2181 --topic test --from-beginning
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
hi
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
hi
hi
uttam
hi
hi
hi
hello
hi
^CProcessed a total of 17 messages

When consuming messages with the --bootstrap-server parameter, the consumer connects via the Kafka broker instead of ZooKeeper, and the broker stores consumer offsets in the internal __consumer_offsets topic.
Check whether __consumer_offsets is present in your topic list. If it is not there, check the Kafka logs to find the reason.
We faced a similar issue. In our case the __consumer_offsets topic was not created because of the following error:
ERROR [KafkaApi-1001] Number of alive brokers '1' does not meet the required replication factor '3' for the offsets topic (configured via 'offsets.topic.replication.factor').
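If that is the cause, one possible fix (a sketch, assuming a single-broker cluster like the one above; adjust the factor to the number of live brokers) is to lower the offsets topic replication factor in server.properties before the internal topic is first created, restart the broker, and then confirm the topic exists:
# server.properties on the broker (takes effect only before __consumer_offsets is first created)
offsets.topic.replication.factor=1
# after restarting the broker, verify the internal topic is there
/bnsf/kafka/bin/kafka-topics.sh --zookeeper zkp02.mp.com:2181 --list | grep __consumer_offsets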

Related

Kafka console consumer to read Avro messages in HDP 3

I am trying to consume Kafka Avro messages with the console consumer and am not exactly sure how to deserialize them.
sh /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server localhost:6667 --topic test --consumer.config /home/user/kafka.consumer.properties --from-beginning --value-deserializer ByteArrayDeserializer
The Avro Schema in Schema Registry for the test topic is:
{
  "type": "record",
  "namespace": "test",
  "name": "TestRecord",
  "fields": [
    {
      "name": "Name",
      "type": "string",
      "default": "null"
    },
    {
      "name": "Age",
      "type": "int",
      "default": -1
    }
  ]
}
I am using HDP 3.1 and kafka-clients 2.0.0.3.1.0.0-78.
Could someone tell me which deserializer is required to read Avro messages from the console?
Use kafka-avro-console-consumer, e.g.:
sh /usr/hdp/current/kafka-broker/bin/kafka-avro-console-consumer.sh \
--bootstrap-server localhost:6667 \
--topic test \
--from-beginning \
--property schema.registry.url=http://localhost:8081
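If the consumer still prints nothing useful, it is worth confirming that a schema is actually registered for the topic. A quick check, assuming a Confluent-compatible Schema Registry at localhost:8081 as in the command above (with the default topic-name subject strategy, the value subject would be test-value):
# list all registered subjects
curl http://localhost:8081/subjects
# fetch the latest schema registered for the value of the test topic
curl http://localhost:8081/subjects/test-value/versions/latest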

Forwarding messages from Kafka to Elasticsearch and PostgreSQL

I am trying to build an infrastructure in which I need to forward messages from one Kafka topic to both Elasticsearch and PostgreSQL, all running on the same host. Logstash does some anonymization and a few mutates, and sends the document back to Kafka as JSON. Kafka should then forward the message to PostgreSQL and Elasticsearch.
Everything works fine except the connection to PostgreSQL, which is giving me some trouble.
My config files look as follows:
sink-quickstart-sqlite.properties
name=jdbc-test-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
#table.name.format=${topic}
topics=processed
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable:true
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable:true
connection.url=jdbc:postgresql://localhost:5432/postgres
connection.user=postgres
connection.password=luka
insert.mode=upsert
pk.mode=kafka
pk_fields=__connect_topic,__connect_partition,__connect_offset
fields.whitelist=ident,auth,response,request,clientip
auto.create=true
auto.evolve=true
confluent-distributed.properties
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
config.storage.topic=connect-configs
config.storage.replication.factor=1
status.storage.topic=connect-status
status.storage.replication.factor=1
offset.flush.interval.ms=10000
plugin.path=/usr/share/java
quicstart-elasticsearch.properties
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
#topics=test-elasticsearch-sink
topics=processed
key.ignore=true
connection.url=http://localhost:9200
type.name=kafka-connect
schema.ignore=true
schemas.enable=false
The confluent-schema-registry service is running.
I'm getting the following error after running curl http://localhost:8083/connectors/jdbc-sink/status | jq:
{
  "name": "jdbc-sink",
  "connector": {
    "state": "RUNNING",
    "worker_id": "192.168.50.37:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "FAILED",
      "worker_id": "192.168.50.37:8083",
      "trace": "org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:488)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:465)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
      Caused by: org.apache.kafka.connect.errors.DataException: JsonConverter with schemas.enable requires \"schema\" and \"payload\" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
        at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:359)
        at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:86)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$2(WorkerSinkTask.java:488)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
        ... 13 more
      "
    }
  ],
  "type": "sink"
}
This is what a message in my "processed" topic looks like (the message in the topic is a one-liner; this is just formatted):
{
  "ROWTIME": 1587134287569,
  "ROWKEY": "null",
  "bytes": "4050",
  "input": {
    "type": "log"
  },
  "clientip": "156.226.170.95",
  "#timestamp": "2020-04-17T14:38:06.346Z",
  "timestamp": "17/Apr/2020:14:37:57 +0000",
  "#version": "1",
  "request": "/lists",
  "ident": "536e605f097a92cb07c2a0a81f809f481c5af00d13305f0094057907792ce65e2a62b8ab8ba036f95a840504c3e2f05a",
  "httpversion": "1.1",
  "auth": "33a7f4a829adfaa60085eca1b84e0ae8f0aa2835d206ac765c22ad440e50d7ae462adafb13306aecfdcd6bd80b52b03f",
  "agent": {
    "ephemeral_id": "053b9c29-9038-4a89-a2b3-a5d8362460fe",
    "version": "7.6.2",
    "id": "50e21169-5aa0-496f-b792-3936e9c8de04",
    "hostname": "HOSTNAME_OF_MY_AWS_INSTANCE",
    "type": "filebeat"
  },
  "log": {
    "offset": 707943,
    "file": {
      "path": "/home/ec2-user/log/apache.log"
    }
  },
  "host": {
    "name": "HOSTNAME_OF_MY_AWS_INSTANCE"
  },
  "verb": "DELETE",
  "ecs": {
    "version": "1.4.0"
  },
  "response": "503"
}
Please let me know if you need some more information.
Your error is here:
DataException: JsonConverter with schemas.enable requires \"schema\" and \"payload\" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
Since this is the JDBC sink, you must provide a schema with your data. If you have the option, I would suggest you use Avro. If not, you must structure your JSON data as required by Kafka Connect.
More info: https://www.youtube.com/watch?v=b-3qN_tlYR4&t=981s
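For illustration only, this is roughly the envelope the JsonConverter expects when schemas.enable=true; the field list below is a hypothetical subset of the whitelisted columns, not your actual schema:
{
  "schema": {
    "type": "struct",
    "name": "processed",
    "optional": false,
    "fields": [
      { "field": "clientip", "type": "string", "optional": true },
      { "field": "request", "type": "string", "optional": true },
      { "field": "response", "type": "string", "optional": true }
    ]
  },
  "payload": {
    "clientip": "156.226.170.95",
    "request": "/lists",
    "response": "503"
  }
}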

Clean Kafka topic in a cluster

I know I can clean a Kafka topic on a broker by either deleting the logs under /data/kafka-logs/topic/* or by setting the retention.ms config to 1000. I want to know how to clean topics in a multi-node cluster. Should I stop the Kafka process on each broker, delete the logs, and start Kafka again, or would doing it on the leader broker suffice? If I want to clean by setting retention.ms to 1000, do I need to set it on each broker?
To delete all messages in a specific topic, you can run kafka-delete-records.sh.
For example, I have a topic called test, which has 4 partitions.
Create a JSON file, for example j.json:
{
  "partitions": [
    {
      "topic": "test",
      "partition": 0,
      "offset": -1
    },
    {
      "topic": "test",
      "partition": 1,
      "offset": -1
    },
    {
      "topic": "test",
      "partition": 2,
      "offset": -1
    },
    {
      "topic": "test",
      "partition": 3,
      "offset": -1
    }
  ],
  "version": 1
}
Now delete all messages with this command:
/opt/kafka/confluent-4.1.1/bin/kafka-delete-records --bootstrap-server 192.168.XX.XX:9092 --offset-json-file j.json
After executing the command, this message will be displayed:
Records delete operation completed:
partition: test-0 low_watermark: 7
partition: test-1 low_watermark: 7
partition: test-2 low_watermark: 7
partition: test-3 low_watermark: 7
If you want to delete a whole topic, you can use kafka-topics.
For example, to delete the test topic:
/opt/kafka/confluent-4.0.0/bin/kafka-topics --zookeeper 109.XXX.XX.XX:2181 --delete --topic test
You do not need to restart Kafka.
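As for the retention.ms part of the question: retention.ms is a topic-level configuration, so you set it once per topic rather than on every broker, and the whole cluster honours it. A sketch, assuming the same ZooKeeper address and topic as above:
# temporarily lower retention so old segments are purged
/opt/kafka/confluent-4.0.0/bin/kafka-configs --zookeeper 109.XXX.XX.XX:2181 --alter --entity-type topics --entity-name test --add-config retention.ms=1000
# once the data is gone, remove the override to fall back to the broker default
/opt/kafka/confluent-4.0.0/bin/kafka-configs --zookeeper 109.XXX.XX.XX:2181 --alter --entity-type topics --entity-name test --delete-config retention.ms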

How do I delete/clean Kafka queued messages without deleting Topic

Is there any way to delete queue messages without deleting Kafka topics?
I want to delete queue messages when activating the consumer.
I know there are several ways like:
Resetting retention time
$ ./bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic MyTopic --config retention.ms=1000
Deleting kafka files
$ rm -rf /data/kafka-logs/<topic/Partition_name>
In 0.11 or higher you can run the bin/kafka-delete-records.sh command to mark messages for deletion.
https://github.com/apache/kafka/blob/trunk/bin/kafka-delete-records.sh
For example, publish 100 messages
seq 100 | ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic mytest
Then delete the first 90 of those 100 messages with the new kafka-delete-records.sh command-line tool:
./bin/kafka-delete-records.sh --bootstrap-server localhost:9092 --offset-json-file ./offsetfile.json
where offsetfile.json contains
{"partitions": [{"topic": "mytest", "partition": 0, "offset": 90}], "version":1 }
Then consume the messages from the beginning to verify that 90 of the 100 messages are indeed marked as deleted:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytest --from-beginning
91
92
93
94
95
96
97
98
99
100
To delete all messages in a specific topic, you can run kafka-delete-records.sh.
For example, I have a topic called test, which has 4 partitions.
Create a JSON file, for example j.json:
{
  "partitions": [
    {
      "topic": "test",
      "partition": 0,
      "offset": -1
    },
    {
      "topic": "test",
      "partition": 1,
      "offset": -1
    },
    {
      "topic": "test",
      "partition": 2,
      "offset": -1
    },
    {
      "topic": "test",
      "partition": 3,
      "offset": -1
    }
  ],
  "version": 1
}
Now delete all messages with this command:
/opt/kafka/confluent-4.1.1/bin/kafka-delete-records --bootstrap-server 192.168.XX.XX:9092 --offset-json-file j.json
After executing the command, this message will be displayed:
Records delete operation completed:
partition: test-0 low_watermark: 7
partition: test-1 low_watermark: 7
partition: test-2 low_watermark: 7
partition: test-3 low_watermark: 7
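To confirm the deletion took effect, one option (a sketch using the standard GetOffsetShell tool) is to query the earliest available offset for each partition; it should match the low_watermark values printed above:
# --time -2 asks for the earliest offset still available on each partition
/opt/kafka/confluent-4.1.1/bin/kafka-run-class kafka.tools.GetOffsetShell --broker-list 192.168.XX.XX:9092 --topic test --time -2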

Increasing Replication Factor in Kafka gives error - "There is an existing assignment running"

I am trying to increase the replication factor of a topic in Apache Kafka. In order to do so, I am using the command:
kafka-reassign-partitions --zookeeper ${zookeeperid} --reassignment-json-file ${aFile} --execute
Initially my topic has a replication factor of 1 and 5 partitions; I am trying to increase its replication factor to 3. There are quite a few messages in the topic. When I run the above command, the error is: "There is an existing assignment running".
My JSON file looks like this:
{
  "version": 1,
  "partitions": [
    {
      "topic": "IncreaseReplicationTopic",
      "partition": 0,
      "replicas": [2, 4, 0]
    },
    {
      "topic": "IncreaseReplicationTopic",
      "partition": 1,
      "replicas": [3, 2, 1]
    },
    {
      "topic": "IncreaseReplicationTopic",
      "partition": 2,
      "replicas": [4, 1, 0]
    },
    {
      "topic": "IncreaseReplicationTopic",
      "partition": 3,
      "replicas": [0, 1, 3]
    },
    {
      "topic": "IncreaseReplicationTopic",
      "partition": 4,
      "replicas": [1, 4, 2]
    }
  ]
}
I am not able to figure out where I am going wrong. Any pointers will be greatly appreciated.
This message means that another partition reassignment (for any topic) is already being executed.
Wait for it to finish and try again; you won't see this message then.
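One way to check what is still in flight (a sketch; in ZooKeeper-based Kafka versions the in-progress reassignment is tracked in the /admin/reassign_partitions znode) is to inspect ZooKeeper directly and re-run the --execute step only once that node is empty:
# shows the reassignment that is still running; an empty or missing node means none is in progress
zookeeper-shell ${zookeeperid} get /admin/reassign_partitions
# once it is clear, retry
kafka-reassign-partitions --zookeeper ${zookeeperid} --reassignment-json-file ${aFile} --execute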