How to list producers writing to a certain Kafka topic using the Kafka CLI?
There is no command line tool available that can list all producers for a certain topic.
This would require a central place in Kafka where all producers and their metadata are stored, which is not the case (as opposed to consumers and their consumer groups).
Related
I am trying to plot an overall topology for my Kafka cluster (i.e., producers-->topics-->consumers).
For the mapping from topics to consumers, I'm able to obtain it using the kafka-consumer-groups.sh script.
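(For reference, I'm doing this with something like the following; the broker addresses and group name are placeholders:)
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-consumer-group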
However, for the mapping from producers to topics, I understand there is no equivalent script in vanilla Kafka.
Question:
Does the Schema Registry allow us to associate metadata with producers and/or topics or otherwise create a mapping of all producers producing to a particular topic?
Schema Registry has no such functionality.
The closest I've seen to something like this is using distributed tracing (the Brave library) or Cloudera's SMM tool, which requires authorized Kafka clients so that it can trace requests, mapping producer client.id values to topics and then consumer instances to groups.
There's also the Stream Registry project. I helped with the initial version, whose vision was managing client state/discovery, but I think the project took a different direction and its documentation is no longer maintained.
I run a system comprising an InfluxDB, a Kafka broker and data sources (sensors) producing time series data. The purpose of the broker is to protect the database from inbound event overload and to act as a format-agnostic platform for ingesting data. The data is transferred from Kafka to InfluxDB via Apache Camel routes.
I would like to use Kafka as an intermediate message buffer in case a Camel route crashes or becomes unavailable, which is the most frequent error in the system. So far, I haven't managed to configure Kafka so that inbound messages remain available for later consumption.
How do I configure it properly?
Messages are retained in a Kafka topic according to its retention policies (you can choose between time and byte-size limits), as described in the Topic Configurations. With
cleanup.policy=delete
retention.ms=-1
the messages in a Kafka topic will never be deleted.
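For example, a minimal sketch of applying these settings to an existing topic with the stock kafka-configs tool (the topic name and connection string are placeholders; older brokers take --zookeeper instead of --bootstrap-server):
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name sensor-data --alter --add-config cleanup.policy=delete,retention.ms=-1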
Then your Camel consumer will be able to re-read all messages (offsets) if you use a new consumer group or reset the offsets of the existing consumer group. Otherwise, your Camel consumer might auto-commit the messages (check the corresponding consumer configuration), and it will not be possible to re-read those offsets again with the same consumer group.
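For instance, resetting an existing group to the beginning of a topic can be done with the consumer groups tool (available since Kafka 0.11; the group and topic names are placeholders):
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-camel-group --topic sensor-data --reset-offsets --to-earliest --execute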
To limit the consumption rate of the Camel consumer, you may adjust configurations like maxPollRecords or fetchMaxBytes, which are described in the docs.
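A minimal Camel route sketch with those options set (the endpoint parameters and names are illustrative, not a verified configuration):
import org.apache.camel.builder.RouteBuilder;

public class KafkaBufferRoute extends RouteBuilder {
    @Override
    public void configure() {
        // maxPollRecords / fetchMaxBytes throttle how much each poll pulls from Kafka.
        from("kafka:sensor-data?brokers=localhost:9092&groupId=influx-writer"
                + "&maxPollRecords=100&fetchMaxBytes=1048576")
            .to("log:ingest"); // downstream processing (e.g., your InfluxDB route) goes here
    }
}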
I know what a Producer and a Consumer are. But the official documentation says
It is a streaming platform.
It is an enterprise messaging system.
Kafka has connectors which import and export data from databases and other systems.
What does it mean?
I know Producers are client applications that send data to a Kafka broker, and Consumers are client applications that read data from a Kafka broker.
But my question is: can a Consumer push data into a Kafka broker?
And as far as I understand, if a Consumer wants to push data into a Kafka broker, it becomes a Producer. Is that correct?
1. It is a streaming platform.
It is used for distributing data on a publish-subscribe model, with a storage layer and a processing layer.
2. It is an enterprise messaging system.
Much of the Big Data infrastructure is open source, yet the big data market is worth approximately $40B per year and growing day by day, much of it spent on hardware and hosting. Despite the open-source nature of much of this software, there's a lot of money to be made.
3. Kafka has connectors which import and export data from databases and other systems.
Kafka Connect provides connectors, e.g. source connectors, sink connectors, and the JDBC connector. It provides a facility for importing data from sources and exporting it to multiple targets; see the sample sink configuration below.
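For illustration, a minimal sink-connector configuration in the style of the stock FileStreamSink example that ships with Kafka (the file and topic names are placeholders):
name=local-file-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
file=/tmp/my-topic-sink.txt
topics=my-topic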
Producers: they can only push (publish) data to a Kafka broker.
Consumers: they can only pull data from the Kafka broker.
A producer produces/puts/publishes messages, and a consumer consumes/gets/reads messages.
A consumer can only read; when you want to write, you need a producer. A consumer cannot become a producer.
A producer only pushes data to a Kafka broker.
A consumer only pulls data from a Kafka broker.
However, you can have a program that is both a producer and a consumer, as in the sketch below.
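A minimal sketch of such a program using the Java client (the broker address and topic names are placeholders): it consumes records from one topic and produces what it read to another.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ConsumeAndProduce {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "relay");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("input-topic"));
            while (true) {
                // Act as a consumer: pull records from "input-topic".
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Act as a producer: push each record on to "output-topic".
                    producer.send(new ProducerRecord<>("output-topic", record.key(), record.value()));
                }
            }
        }
    }
}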
I am working on setting up a Kafka Connect distributed-mode application which will be a Kafka-to-S3 pipeline. I am using Kafka 0.10.1.0-1 and Kafka Connect 3.1.1-1. So far things are going smoothly, but one aspect that is important to the larger system I am working with requires knowing the offset information of the Kafka -> filesystem pipeline. According to the documentation, the offset.storage.topic configuration is the location the distributed-mode application uses for storing offset information. This makes sense, given how the 'new' consumer stores offsets in Kafka itself. However, after doing some testing with the FileStreamSinkConnector, nothing is being written to my offset.storage.topic, which has the default value connect-offsets.
To be specific, I am using a Python Kafka producer to push data to a topic and using Kafka Connect with the FileStreamSinkConnector to output the data from the topic to a file. This works and behaves as I expect the connector to behave. Additionally, when I stop and restart the connector, the application remembers its position in the topic and there is no data duplication. However, when I go to the offset.storage.topic to see what offset metadata is stored, there is nothing in the topic.
This is the command that I use:
kafka-console-consumer --bootstrap-server kafka1:9092,kafka2:9092,kafka3:9092 --topic connect-offsets --from-beginning
I receive this message after letting this command run for a minute or so:
Processed a total of 0 messages
So to summarize, I have 2 questions:
Why is offset metadata not being written to the topic that should be storing this even though my distributed application is keeping state correctly?
How do I access offset metadata information for a Kafka Connect distributed mode application? This is 100% necessary for my team's Lambda Architecture implementation of our system.
Thanks for the help.
Liju is correct: connect-offsets is used to track offsets for source connectors (which have a producer but not a consumer). Sink connectors have a consumer and track offsets the usual way, in the __consumer_offsets topic.
The best way to look at last committed offsets is with the consumer group tool:
bin/kafka-consumer-groups.sh --group connect-elastic-login-connector --bootstrap-server localhost:9092 --describe
The group name is always "connect-" followed by the connector name (in my case, elastic-login-connector). This will show the latest offset committed by the group, which essentially acknowledges that all messages up to that offset were written to Elastic.
The offsets might be committed to the Kafka default offset commit topic, i.e. __consumer_offsets.
The new S3 connector released by Confluent might be of interest to you.
From what you describe, it may significantly simplify your goal of exporting records from Kafka to your S3 buckets.
I'm a new user of Apache Kafka and I'm still getting to know the internals.
In my use case, I need to increase the number of partitions of a topic dynamically from the Kafka Producer client.
I found other similar questions about increasing the number of partitions, but they rely on the ZooKeeper configuration. My KafkaProducer has only the Kafka broker config, not the ZooKeeper config.
Is there any way I can increase the number of partitions of a topic from the Producer side? I'm running Kafka version 0.10.0.0.
As of Kafka 0.10.0.1 (latest release): as Manav said, it is not possible to increase the number of partitions from the Producer client.
Looking ahead (next releases): in an upcoming version of Kafka, clients will be able to perform some topic management actions, as outlined in KIP-4. A lot of the KIP-4 functionality is already completed and available in Kafka's trunk; the code in trunk as of today allows clients to create and delete topics. But unfortunately, for your use case, increasing the number of partitions is still not possible -- it is in scope for KIP-4 (see Alter Topics Request) but not completed yet.
TL;DR: The next versions of Kafka will allow you to increase the number of partitions of a Kafka topic, but this functionality is not yet available.
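For readers on later releases: this eventually shipped via the Java AdminClient (createPartitions, Kafka 1.0+). A minimal sketch, with a placeholder topic name:
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewPartitions;

public class AddPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Grow "my-topic" to a total of 4 partitions (partition counts can never shrink).
            admin.createPartitions(
                    Collections.singletonMap("my-topic", NewPartitions.increaseTo(4)))
                 .all().get();
        }
    }
}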
It is not possible to increase the number of partitions from the Producer client.
Is there a specific reason why you cannot use the broker to achieve this (see the example after this answer)?
My KafkaProducer has only the Kafka broker config, not the ZooKeeper config.
I don't think any client will let you change the broker config. At most, you can read the server-side config.
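For reference, the broker-side way to do this in 0.10.x is the topics tool (the ZooKeeper address and topic name are placeholders):
kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --partitions 4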
Your producer can provide different keys for its ProducerRecords. The producer's partitioner hashes each key to one of the topic's existing partitions, so records with keys "abc" and "xyz" will likely land in different partitions. Note that keys only distribute records across partitions that already exist; they don't change the partition count.
This can be done in version 0.9 as well.
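A minimal sketch of keyed sends, assuming a KafkaProducer<String, String> named producer is already configured as usual (the topic, keys, and values are placeholders):
// The partitioner hashes each key to one of the topic's existing partitions,
// so these two records will usually land in different partitions.
producer.send(new ProducerRecord<>("my-topic", "abc", "value-1"));
producer.send(new ProducerRecord<>("my-topic", "xyz", "value-2"));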