Reading all Kafka offsets in Python - apache-kafka

I am trying to read all the messages in a Kafka topic. I am using the Confluent Cloud service, so I don't run Kafka on my localhost. I set the configuration as: 'enable.auto.commit': 'True', 'auto.offset.reset': 'earliest', 'default.topic.config': {'auto.offset.reset': 'smallest'}. However, it gives me no messages, or if I send a message from the producer at the same time, it gives only that message and not the messages at all earlier offsets.
How can I read the messages at all offsets in Python?

I haven't used the Confluent Cloud service, but if you want to consume all offsets there are a few things you should pay attention to (see the sketch after this list):
use a new consumer group id that has not consumed any data yet
set 'auto.offset.reset=earliest' or 'auto.offset.reset=smallest'; which value applies depends on the Kafka version
pay attention to the topic's retention (automatic expiration) settings, since expired messages can no longer be read
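For example, a minimal sketch with the confluent-kafka Python client, assuming a Confluent Cloud SASL/SSL setup; the broker address, credentials, topic name, and group id below are placeholders:

from confluent_kafka import Consumer

conf = {
    'bootstrap.servers': '<broker>',      # Confluent Cloud bootstrap server (placeholder)
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'PLAIN',
    'sasl.username': '<api-key>',         # placeholder credentials
    'sasl.password': '<api-secret>',
    'group.id': 'fresh-group-1',          # a group id that has never committed offsets
    'auto.offset.reset': 'earliest',      # start from the beginning when no committed offset exists
    'enable.auto.commit': False,          # avoid committing while testing re-reads
}

consumer = Consumer(conf)
consumer.subscribe(['my-topic'])          # placeholder topic name

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(msg.error())
            continue
        print(f"{msg.topic()}[{msg.partition()}]@{msg.offset()}: {msg.value()}")
finally:
    consumer.close()

Each run with a brand-new group id (or after resetting the group's offsets) replays the topic from the earliest retained offset.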

Related

Can Kafka publish messages to AWS Lambda

I have to publish messages from a Kafka topic to Lambda in order to process them and store them in a database, using a Spring Boot application. I did some research and found something to consume messages from Kafka:
public Function<KStream<String, String>, KStream<String, String>> process(){}
However, I'm not sure if this is only used to publish the consumed messages to another Kafka topic, or whether it can be used as an event source for Lambda. I need some guidance on consuming the Kafka messages and converting them into an event source.
Brokers do not push. Consumers always poll.
The code shown is for the Kafka Streams API, which primarily writes to new Kafka topics. While you could fire HTTP events from a stream to invoke a Lambda, that's not recommended.
Alternatively, Kafka is already supported as an event source. You don't need to write any consumer code.
https://aws.amazon.com/about-aws/whats-new/2020/12/aws-lambda-now-supports-self-managed-apache-kafka-as-an-event-source/
This is possible from MSK or a self-managed Kafka cluster.
process them and store in a database
Your Lambda could process the data and send it to a new Kafka topic using a producer. You can then use MSK Connect or run your own Kafka Connect cluster elsewhere to dump records into a database. No Spring/Java code would be necessary.
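For reference, a rough sketch of what such a Lambda handler might look like once the Kafka event source mapping is configured; per the documented event shape, records arrive grouped by topic-partition with base64-encoded values. The handler name, the assumption of JSON-encoded messages, and save_to_db() are hypothetical placeholders:

import base64
import json

def handler(event, context):
    # The event source mapping delivers batches grouped by "topic-partition".
    for topic_partition, records in event.get("records", {}).items():
        for record in records:
            value = base64.b64decode(record["value"]).decode("utf-8")  # values are base64-encoded
            payload = json.loads(value)                                # assuming JSON messages
            save_to_db(payload)                                        # hypothetical persistence step

def save_to_db(payload):
    # Placeholder: write the record to your database here.
    print(payload)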

Apache Kafka: how to configure message buffering properly

I run a system comprising an InfluxDB, a Kafka broker, and data sources (sensors) producing time-series data. The purpose of the broker is to protect the database from inbound event overload and to serve as a format-agnostic platform for ingesting data. The data is transferred from Kafka to InfluxDB via Apache Camel routes.
I would like to use Kafka as an intermediate message buffer in case a Camel route crashes or becomes unavailable, which is the most frequent error in the system. So far, I have not managed to configure Kafka so that inbound messages remain available for later consumption.
How do I configure it properly?
Messages are retained in Kafka topics according to the topic's retention policies (you can choose between time and byte-size limits), as described in the Topic Configurations. With
cleanup.policy=delete
retention.ms=-1
the messages in a Kafka topic will never be deleted.
Your Camel consumer will then be able to re-read all messages (offsets) if you use a new consumer group or reset the offsets of the existing consumer group. Otherwise, your Camel consumer might auto-commit the offsets (check the corresponding consumer configuration), and it will not be possible to re-read those offsets again within the same consumer group.
To limit the consumption rate of the Camel consumer you may adjust configurations like maxPollRecords or fetchMaxBytes, which are described in the docs.
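As a sketch, the retention settings above could be applied when creating the buffer topic, here with the confluent-kafka Python AdminClient; the broker address, topic name, partition count, and replication factor are assumptions:

from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({'bootstrap.servers': 'localhost:9092'})   # assumed broker address

# Buffer topic whose messages are never deleted by age, so a restarted
# Camel route can still consume everything that arrived while it was down.
topic = NewTopic(
    'sensor-data',                    # hypothetical topic name
    num_partitions=3,
    replication_factor=1,
    config={
        'cleanup.policy': 'delete',
        'retention.ms': '-1',         # no time-based deletion
        # 'retention.bytes': '-1',    # optionally also unlimited by size
    },
)

futures = admin.create_topics([topic])
futures['sensor-data'].result()       # raises if the creation failed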

Google Pubsub vs Kafka comparison on the restart of pipeline

I am trying to write an ingestion application on GCP using Apache Beam. It should be a streaming pipeline that reads data from Kafka or Pub/Sub topics and then ingests the data into a data store.
While it seems straightforward to write it with Pub/Sub and Apache Beam, my question is: what happens if my ingestion fails or has to be restarted? Does it read all data again from the start of the Pub/Sub topic, or can it, like Kafka, resume from the latest committed offsets in the topic?
Pub/Sub messages are persisted until they are delivered and acknowledged by the subscribers that receive pending messages from their subscription. Once a message is acknowledged, it is removed from the subscription's queue.
For more information regarding the message flow, check this document
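To illustrate the acknowledgement behaviour, here is a minimal sketch with the google-cloud-pubsub client; the project, subscription, and process() step are hypothetical. Messages that were not acknowledged before a crash are redelivered when the pipeline restarts, so you do not re-read the whole topic from the beginning:

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path('my-project', 'my-subscription')  # placeholders

def callback(message):
    process(message.data)   # hypothetical processing/ingestion step
    message.ack()           # only acked messages are removed; unacked ones are redelivered

def process(data):
    # Placeholder: hand the payload to your ingestion logic here.
    print(data)

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull_future.result()   # blocks and keeps pulling until cancelled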
Hope it helps.

Apache Kafka Consumer-Producer Confusion

I know what a Producer and a Consumer are. But the official documentation says:
It is a streaming platform.
It is an enterprise messaging system.
Kafka has connectors which import and export data from databases and other systems as well.
What does it mean?
I know Producers are client applications that send data to the Kafka broker, and Consumers are client applications that read data from the Kafka broker.
But my question is: can a Consumer push data into the Kafka broker?
And as far as I know, if a Consumer wants to push data into the Kafka broker, it becomes a Producer. Is that correct?
1. It is a streaming platform.
It is used for distributing data on a publish-subscribe model, with a storage layer and a processing layer.
2. It is an enterprise messaging system.
Much of the Big Data infrastructure is open source, yet the big data market is worth roughly $40B per year and keeps growing, with a lot of that spent on hardware and hosting. Despite the open-source nature of much of this software, there is a lot of money to be made.
3. Kafka has connectors which import and export data from databases and other systems as well.
Kafka Connect provides connectors, e.g. source connectors, sink connectors, and the JDBC connector. It provides a facility for importing data from sources and exporting it to multiple targets.
Producers: they can only push data to a Kafka broker, i.e. publish data.
Consumers: they can only pull data from the Kafka broker.
A producer produces/puts/publishes messages and a consumer consumes/gets/reads messages.
A consumer can only read, when you want to write you need a producer. A consumer cannot become a producer.
A producer only pushes data to a Kafka broker.
A consumer only pulls data from a Kafka broker.
However, you can have a program that is both a producer and a consumer.
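For example, a minimal sketch of a pass-through application that acts as both, using the confluent-kafka Python client; the broker address and topic names are placeholders:

from confluent_kafka import Consumer, Producer

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'relay-app',
    'auto.offset.reset': 'earliest',
})
producer = Producer({'bootstrap.servers': 'localhost:9092'})

consumer.subscribe(['input-topic'])

try:
    while True:
        msg = consumer.poll(1.0)       # consumer side: pull from the broker
        if msg is None or msg.error():
            continue
        # producer side: push the same record to another topic
        producer.produce('output-topic', key=msg.key(), value=msg.value())
        producer.poll(0)               # serve delivery callbacks
finally:
    producer.flush()
    consumer.close()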

Kafka message monitoring to show the actual messages published or consumed

I have Kafka installed on my local server, and through some other application running on the server, producers are publishing messages to the brokers inside my Kafka server. Through ZooKeeper I can easily see the health of my Kafka setup, which shows all the topics created inside my Kafka server, the offsets inside the topics, etc. The only thing ZooKeeper is not able to show is the messages that are inside the individual topics. Someone recommended the kafka-manager tool; I installed and ran it, it worked fine, and it showed a lot of information from my Kafka server, but it was still not able to show the real messages that are published or consumed by the respective consumers inside my Kafka server. So my question is: is there a way/tool/code to find out the messages published or consumed, in addition to kafka-manager, or do I have to install some plugins inside the same kafka-manager so that it will also show the respective messages? Thanks in advance!!
A Kafka broker cannot tell you how many messages have been consumed by a given consumer on a given topic. The only thing a Kafka broker knows about is the current log offset of the consumer and the current max offset of the log. It cannot, however, tell you how many messages before the current offset the consumer actually received, as it keeps no counters around this, and the consumer defines its own initial position (as well as being able to seek to various places in the log).
You can get both of these numbers using the $KAFKA_HOME/bin/kafka-consumer-offset-checker.sh script.
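If you prefer to read those two numbers programmatically (the group's committed offset and the log-end offset), here is a sketch with the confluent-kafka Python client; the broker address, group id, topic, and partition count are assumptions:

from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'my-consumer-group',   # the group whose progress you want to inspect
})

partitions = [TopicPartition('my-topic', p) for p in range(3)]   # assumed topic/partition count

committed = consumer.committed(partitions, timeout=10)
for tp in committed:
    low, high = consumer.get_watermark_offsets(tp, timeout=10)
    lag = high - tp.offset if tp.offset >= 0 else None           # offset is negative if never committed
    print(f"{tp.topic}[{tp.partition}] committed={tp.offset} log-end={high} lag={lag}")

consumer.close()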