How to check the progress status of the messages in kafka? - apache-kafka

I have designed a REST POST API in Java that publishes a message to a particular Kafka topic, let's say "ProductTopic".
In the background, a microservice listens to "ProductTopic", consumes the messages, and saves them to a DB. Now I would like to write a GET REST API that reports the progress of the job: how many messages have been consumed successfully and how many are still pending, so that the end user has an idea of what is happening.
Is there a way to achieve this? I searched a lot on Google, but all I found were command-line queries for checking message consumption, and no Java implementation example from Confluent. Any help would be appreciated.

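You should check the consumer lag for your service's consumer group. For each partition, lag is approximately endOffset - currentOffset, where currentOffset is the group's last committed offset; summing the lag over all partitions of the topic gives the number of pending messages. You can find examples here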
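The lag arithmetic above can be sketched in plain Java. This is a minimal sketch, not a full implementation: the class and method names are hypothetical, and the two maps are assumed to have been filled elsewhere, for example from `AdminClient.listConsumerGroupOffsets(groupId)` (committed offsets) and `KafkaConsumer.endOffsets(partitions)` (end offsets) in the Java client.

```java
import java.util.Map;

/** Sketch: compute consumer lag from offsets that were fetched elsewhere. */
final class ConsumerLag {

    /** Lag for one partition; clamped at 0 because the two offsets may be read at slightly different times. */
    static long partitionLag(long endOffset, long committedOffset) {
        return Math.max(0L, endOffset - committedOffset);
    }

    /**
     * Sums lag over all partitions. Map keys are partition numbers.
     * A partition with no committed offset is treated as starting at 0
     * (an assumption; the real starting point depends on auto.offset.reset).
     */
    static long totalLag(Map<Integer, Long> endOffsets, Map<Integer, Long> committedOffsets) {
        long total = 0L;
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            total += partitionLag(e.getValue(), committed);
        }
        return total;
    }
}
```

A GET endpoint could return `totalLag(...)` as the "pending" count. Note that a committed offset only says the consumer has moved past a message, so "consumed successfully" is accurate only if the service commits after the DB write succeeds.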

Related

How to ensure exactly once processing in Kafka for older version 0.10.0

I have a scenario where I need to call an API on the notification service, which will send an email to the user.
Based on the messages that arrive on the topic, I need to consume them and call the API.
But if the consumer fails midway, the at-least-once configuration of the consumer makes it hard to avoid duplicate processing.
Sorry for the newbie question. I tried to find blogs on this, but most of them use a newer version of Kafka, while our infrastructure still uses an older version.
How can we achieve this behavior?
Links that I referred earlier :
If exactly-once semantics are impossible, what theoretical constraint is Kafka relaxing?
I also saw in their documentation that they recommend storing the processing result and the offset in the same storage. But my scenario is an API call: if the consumer calls the API and dies before committing the offset, the next poll will return that message again and the API will be called again. In this way the email will be sent twice.
Any help is appreciated.
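The duplicate-delivery scenario described above is often mitigated by remembering which records have already been handled. The following is a minimal sketch under stated assumptions, not something the question's sources prescribe: the class name is hypothetical, the in-memory set stands in for a durable store (such as a DB table with a unique key), and the record is identified by its topic, partition, and offset.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

/** Sketch: skip records whose (topic, partition, offset) was already handled. */
final class DedupingHandler {
    // In-memory stand-in for a durable store; it would not survive a restart as written.
    private final Set<String> processed = new HashSet<>();
    private final Consumer<String> sideEffect;

    DedupingHandler(Consumer<String> sideEffect) {
        this.sideEffect = sideEffect;
    }

    /** Returns true if the side effect ran, false if the record was a duplicate. */
    boolean handle(String topic, int partition, long offset, String payload) {
        String id = topic + "-" + partition + "-" + offset;
        if (processed.contains(id)) {
            return false; // redelivered after a crash or rebalance: skip
        }
        sideEffect.accept(payload); // e.g. call the notification API
        processed.add(id);          // recorded only after the call returned
        return true;
    }
}
```

This narrows the window but does not close it: a crash between the API call and the write to the store still causes a resend. Closing it fully would require the notification service itself to deduplicate, for example on the same ID passed along with the request.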

Throttling of messages on consumer side

I am a beginner with Kafka and have developed a consumer for Kafka messages, which looks good so far.
However, a requirement came up while testing the consumer: some throttling of messages may be needed on the consumer side.
The consumer (.NET Core, using the Confluent client) calls an API after receiving each message, and the API processes the message. As part of this processing, it performs a few reads and writes against the database.
The consumer may receive millions, or at least a few thousand, messages daily, and processing them puts load on the DB.
So I am thinking of throttling the messages received by the Kafka consumer so that the DB is not overloaded. I have looked at the options for poll, but they do not seem to be quite what I want.
For example, the consumer should receive at most 100k messages within 10 minutes. Something like that.
Could anybody please suggest how to implement throttling of messages in a Kafka consumer, or whether there is a better way to handle this?
I investigated further and learned from an expert that "throttling on consumer side is not easy to implement, since kafka consumer is implemented in such way to read and process messages as soon as they are available in kafka topic. So, speed is a benefit in kafka world :)"
It seems I cannot do much on the Kafka consumer side. I am thinking of looking at the other side instead: separating reads (to a replica) from writes to the database may help.
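The "at most 100k messages per 10 minutes" requirement can be expressed as a fixed-window counter. This is a minimal sketch in Java (the question uses .NET, so treat it as an illustration of the idea only); the class name and its methods are hypothetical, and the clock is injected so the behavior can be tested without waiting.

```java
import java.util.function.LongSupplier;

/** Sketch: allow at most maxPerWindow messages per fixed time window. */
final class WindowThrottle {
    private final long maxPerWindow;
    private final long windowMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in production
    private long windowStart;
    private long count;

    WindowThrottle(long maxPerWindow, long windowMillis, LongSupplier clock) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Returns true if one more message may be processed now. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            windowStart = now; // start a new window
            count = 0;
        }
        if (count < maxPerWindow) {
            count++;
            return true;
        }
        return false;
    }

    /** Milliseconds until the current window ends (0 if it already has). */
    synchronized long millisUntilReset() {
        return Math.max(0L, windowStart + windowMillis - clock.getAsLong());
    }
}
```

With the Java client, one way to apply this is to call `pause(...)` on the assigned partitions when `tryAcquire()` returns false, keep polling so the consumer stays in its group, and call `resume(...)` once the window resets. The Confluent .NET client is assumed here to offer equivalent pause and resume calls.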

Process messages pushed through Kafka

I haven't used Kafka before and wanted to know: if messages are published through Kafka, what are the possible ways to capture that information?
Is Kafka "Consumers" the only way to receive that info, or can REST APIs also be used here?
While reading up on it, I found that Kafka needs ZooKeeper running too.
I don't need to publish information, just process data received from a Kafka publisher.
Any pointers will help.
Kafka is a distributed streaming platform that allows you to process streams of records in near real-time.
Producers publish records/messages to Topics in the cluster.
Consumers subscribe to Topics and process those messages as they are available.
The Kafka docs are an excellent place to get up to speed on the core concepts: https://kafka.apache.org/intro
Is Kafka "Consumers" the only way to receive that info, or can REST APIs also be used here?
Kafka uses its own TCP-based protocol and has no native HTTP interface (assuming that's what you actually mean by REST).
Consumers are the only way to get and subsequently process data. However, plenty of external tooling exists, so you don't really have to write any code to work with that data if you don't want to.

Mixing communication methods for microservices

I am working on a project that will be a better version of an old project. We want it to be scalable so that it can handle high load, so we decided to go with microservices instead of a monolith. I then started researching microservices: how they communicate, common design patterns, and so on. Since I want my services to be scalable, event-based communication made sense to me, so I decided to use Kafka for this purpose.
We have many more services in the system, but to simplify my question, let's say I have two types of services: work-node and master-node. I want both to be scalable. For now they communicate over Kafka.
My question: in one case I want to publish an event (produce a message on a topic) from master-node and receive that event (consume from the topic) on all work-nodes. In another case I need to send a message to a specific work-node. To cover the first case, all my work-nodes have different group IDs in Kafka, so when a message is published on a topic they all get it. I know that I cannot send a message to a specific consumer with Kafka. Since my nodes are scalable and their number can increase or decrease depending on the load, creating a topic for each node does not seem like a good idea. My first solution was to add the work-node ID to the message so that other work-nodes can ignore it. It works, but I don't think it is a good solution. My second solution is to send an HTTP request whenever I need to message a specific node, but I don't know whether mixing two communication methods is a good idea.
What do you think about this problem? Is there a better solution that I am missing? Or is my whole design going wrong?
Kafka is not an appropriate technology for the use case you describe. I would recommend using Cadence Workflow which natively supports routing tasks to specific nodes as well as dozens of other features that messaging systems lack.
Feel free to join Cadence Workflow slack channel if you have specific questions.
I think you should be able to. Consider the regular Kafka flow: you have some consumer groups subscribed to the topic, and the producer doesn't send a message to a specific partition unless you specify one.
Now think about a scenario in which you produce messages to specific partitions based on your own algorithm:
Message received from A
Some kind of algorithm, such as a hash code, always generates 0 for A
Message sent to Partition 0
Consumer 1 is connected to Partition 0
Only Consumer 1 gets the message coming from A
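The routing steps above can be sketched as a small deterministic function. This is a minimal sketch with hypothetical names; the hash used here is a simple stand-in, not the hash Kafka's default partitioner applies to keys, so the partition numbers it yields will not match what the Kafka client would compute.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch: map a key to a partition so that the same key always lands on the same partition. */
final class KeyRouter {

    /** Returns a partition in [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Stand-in hash over the key bytes; floorMod keeps the result non-negative.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }
}
```

In the Java client, the chosen partition can then be passed explicitly when building a `ProducerRecord` (topic, partition, key, value). For "Only Consumer 1 gets the message" to hold, that consumer has to own the partition, and the partition-to-node mapping changes whenever work-nodes are added or removed.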

Using a Kafka consumer in order for a message to be consumed by exactly once semantics

I am new to Kafka and I am seeking guidance on how to use Kafka in order to implement the following message pattern:
First, I want the message to be asynchronous and furthermore it needs to be "consumed" i.e. a single consumer should consume it and other consumers won't be able to consume it thereafter.
A use case of this message pattern is when you have multiple instances of a "delivery service" and you want only one of these instances to consume the message (this assumes one cannot leverage idempotency for some reason).
Can someone please advise how to configure the Kafka Consumer in order to achieve the above?
I think you're essentially looking to use Kafka as a traditional message queue (e.g. RabbitMQ), wherein a message gets removed after consumption. There has been quite a lot of debate on this. As is always the case, there are merits and demerits on both sides of the fence.
The answers on this post are more or less against the idea ...
However...
This article talks about an approach on how you could possibly try and make it work. The messages won't really be deleted but the approach is quite similar. It is a fairly comprehensive post that covers the overhead and the optimisations that you could explore to make it more efficient.
I hope this helps!
Great question, and it's something a lot of us struggle with when deploying and using Kafka. In fact, a number of projects I have worked on tried to use Kafka for the use case you describe, with very little success.
In a nutshell, there are a few Message Exchange Patterns that you come across when dealing with messaging:
Request->Reply
Publish/Subscribe
Queuing (which is what you are trying to do)
Without digging too deep into why, Kafka was really built simply for Publish/Subscribe. There are other products that implement the other features separately and one that actually does all three.
So a question I have for you is would you be open to using something other than Kafka for this project?
You may use Spring Kafka to do this. Spring Kafka takes care of a lot of configuration and boilerplate code. Check the example here: https://www.baeldung.com/spring-kafka. This should get you started.
Also, you may need to read up on how Kafka actually works. The messages that you publish to topics in Kafka are natively asynchronous. Your producers don't worry about who consumes them or what happens to the messages once published.
The consumers in your delivery services should then subscribe to the topics. If you want your delivery services to consume a message only once, the consumers for your delivery services should be in the same group (same group ID). Kafka makes sure that a message consumed by one of the consumers in a group won't be delivered to the other consumers in that group.
The default message retention period is seven days, which is configurable in Kafka.
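The "same group ID" setup described above comes down to consumer configuration. This is a minimal sketch using plain `java.util.Properties`; the class name, the broker address, and the group name used in the usage note are hypothetical, while the property keys are standard Kafka consumer settings.

```java
import java.util.Properties;

/** Sketch: consumer settings shared by every instance of the delivery service. */
final class DeliveryConsumerConfig {

    static Properties forInstance(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // All instances use the same group.id, so each partition is read by only one of them.
        props.setProperty("group.id", groupId);
        // Commit offsets manually, after the delivery work has succeeded.
        props.setProperty("enable.auto.commit", "false");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

Every instance would call something like `forInstance("localhost:9092", "delivery-service")` and pass the result to its consumer. Note that a group gives one-consumer-per-message delivery under normal operation, but a message can still be redelivered to another instance if the first one fails before committing its offset.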