It is observed that when an external user sends data through the API to this real-time portal, they get a success response, but when we try to view the data in the reports, no data is found. I am trying to identify this issue.
In the infra there is a Kafka server with only one broker. When I try to see the list of consumers & producers, I can't find any file for the consumer groups. Can anyone suggest where to look for that, or offer any other suggestion?
Consumer groups are stored in the __consumer_offsets topic.
Topics have files in the Kafka data directory, but they're not directly readable from there.
There's nothing built in to Kafka that'll allow you to see active producers.
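For listing the groups, the kafka-consumer-groups.sh tool that ships with Kafka (`kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list`) is the usual route. As a rough sketch in Python, the kafka-python admin client can do the same; the broker address and group name below are assumptions:

```python
# Minimal sketch using kafka-python (assumes a broker at localhost:9092).
from kafka.admin import KafkaAdminClient

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# List all consumer groups known to the broker.
for group_id, protocol_type in admin.list_consumer_groups():
    print(group_id, protocol_type)

# Committed offsets for one group ("my-consumer-group" is a placeholder name).
offsets = admin.list_consumer_group_offsets("my-consumer-group")
for tp, meta in offsets.items():
    print(tp.topic, tp.partition, meta.offset)
```

If no groups show up at all, that would be consistent with nothing ever consuming the topic, which would explain why the reports stay empty even though the producer gets a success response.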
I am interested in monitoring the consuming behavior. In particular, I would like to know when which messages were read by which consumer group. Is there an offset or consumer history that I can access?
If it helps, I use Confluent Cloud for setting up the topics, etc.
If I understand your question correctly, you would like to know when events were processed by your consumer?
In that case, you should just add logging to your consumer code, then use a log-collection tool like Elasticsearch or Splunk, as you would for tracking logs/history across any other service.
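For example, here's a minimal sketch with the kafka-python client that logs each record's partition, offset, and timestamp; the broker address, topic, and group id are placeholders, and a Confluent Cloud connection would additionally need API-key/SASL settings:

```python
# Minimal consumer-side logging sketch (topic name and group id are placeholders).
import logging
from kafka import KafkaConsumer

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("my-consumer")

consumer = KafkaConsumer(
    "my-topic",
    bootstrap_servers="localhost:9092",
    group_id="my-consumer-group",
    enable_auto_commit=True,
)

for msg in consumer:
    # Log partition, offset, and the record's broker-side timestamp so a
    # log-collection tool can reconstruct when each message was read.
    log.info(
        "consumed partition=%d offset=%d timestamp=%d key=%s",
        msg.partition, msg.offset, msg.timestamp, msg.key,
    )
```

The committed offsets themselves are also visible per group via `kafka-consumer-groups.sh --describe`, but they only tell you the current position, not a history of when each message was read; that's why application-level logging is the practical answer here.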
We've implemented some resilience in our kafka consumer by having a main topic, a retry topic and an error topic as outlined in this blog.
I'm wondering what patterns teams are using out there to redrive events in the error topic back into the retry topic for reprocessing. Do you use some kind of GUI to help do this redrive? I foresee a need to potentially append all events from the error topic into the retry topic, but also to selectively skip certain events in the error topic if they can't be reprocessed.
Two patterns I've seen:
redeploy the app with a new topic config (via environment variables or other external config),
or use a scheduled task within the code that checks the upstream DLQ topic(s).
If you want to use a GUI, that's fine, but it seems like more work for little gain, as there's no tooling already built around that. For the redrive itself, a rough sketch is below.
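Here's a rough sketch with the kafka-python client that copies records from the error topic back into the retry topic while skipping ones you've decided can't be reprocessed; the topic names, broker address, and skip rule are all assumptions you'd replace:

```python
# Rough redrive sketch: error topic -> retry topic, with selective skipping.
from kafka import KafkaConsumer, KafkaProducer

ERROR_TOPIC = "orders-error"   # hypothetical topic names
RETRY_TOPIC = "orders-retry"

consumer = KafkaConsumer(
    ERROR_TOPIC,
    bootstrap_servers="localhost:9092",
    group_id="error-redrive",
    auto_offset_reset="earliest",
    enable_auto_commit=False,
    consumer_timeout_ms=5000,  # stop iterating once the error topic is drained
)
producer = KafkaProducer(bootstrap_servers="localhost:9092")

def should_skip(record):
    # Placeholder rule: skip records flagged as poison via a header.
    return ("poison", b"true") in (record.headers or [])

for record in consumer:
    if should_skip(record):
        consumer.commit()  # mark it handled without re-publishing
        continue
    # Re-publish synchronously, then commit so the error record isn't redriven twice.
    producer.send(RETRY_TOPIC, key=record.key, value=record.value,
                  headers=record.headers).get(timeout=10)
    consumer.commit()

producer.flush()
```

Running something like this as a one-off job (or on a schedule, as in the second pattern above) tends to be simpler to operate than a GUI-driven redrive.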
I want to process log data streamed from Kafka in PySpark and save it to Parquet files, but I don't know how to get the data into Spark. Please help, thanks.
My answer is at a high level. You need to use Spark Streaming and have some basic understanding of messaging systems like Kafka.
The application that sends data into Kafka (or any messaging system) is called a "producer", and the application that receives data from Kafka is called a "consumer". When a producer sends data, it sends it to a specific "topic". Multiple producers can send data to the Kafka layer under different topics.
You basically need to create a consumer application. To do that, first you need to identify the topic you are going to consume data from.
You can find many sample programs online. The following page can help you build your first application:
https://www.rittmanmead.com/blog/2017/01/getting-started-with-spark-streaming-with-python-and-kafka/
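Below is a rough sketch of what that consumer side can look like with Spark Structured Streaming (rather than the older DStream API used in the linked post); the broker address, topic name, package version, and output paths are all assumptions you'd replace:

```python
# Sketch: read a Kafka topic as a stream and append it to Parquet files.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("kafka-logs-to-parquet")
    # The spark-sql-kafka package must match your Spark/Scala version.
    .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0")
    .getOrCreate()
)

# Kafka delivers the value column as bytes; cast it to string for log lines.
logs = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "logs")
    .option("startingOffsets", "latest")
    .load()
    .select(col("value").cast("string").alias("line"), col("timestamp"))
)

# Continuously append the stream to Parquet; the checkpoint tracks progress.
query = (
    logs.writeStream
    .format("parquet")
    .option("path", "/data/logs_parquet")
    .option("checkpointLocation", "/data/logs_checkpoint")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```

From there you would add whatever parsing of the log lines you need before the write, but this is the basic shape of getting Kafka data into Spark and out to Parquet.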
I haven't used Kafka before and wanted to know if messages are published through Kafka what are the possible ways to capture that info?
Is Kafka only way to receive that info via "Consumers" or can Rest APIs be also used here?
While reading up, I did find that Kafka needs ZooKeeper running too.
I don't need to publish info just process data received from Kafka publisher.
Any pointers will help.
Kafka is a distributed streaming platform that allows you to process streams of records in near real-time.
Producers publish records/messages to Topics in the cluster.
Consumers subscribe to Topics and process those messages as they are available.
The Kafka docs are an excellent place to get up to speed on the core concepts: https://kafka.apache.org/intro
Is Kafka only way to receive that info via "Consumers" or can Rest APIs be also used here?
Kafka uses its own TCP-based protocol; it does not natively expose an HTTP interface (assuming that's what you actually mean by REST).
Consumers are the only way to get and subsequently process data; however, plenty of external tooling exists so that you don't have to write much (or any) code yourself in order to work with that data.
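To give a concrete sense of what "subscribe and process" looks like, here's a minimal consumer sketch using the kafka-python client; the broker address and topic name are assumptions:

```python
# Minimal consumer sketch: subscribe to a topic and process records as they arrive.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",                            # hypothetical topic name
    bootstrap_servers="localhost:9092",  # assumed broker address
    group_id="event-processor",
    auto_offset_reset="earliest",
)

# The client polls the broker; each record carries its topic, partition, offset, and payload.
for msg in consumer:
    print(msg.topic, msg.partition, msg.offset, msg.value)
```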
I have connected a producer to IBM Message Hub in Bluemix. How could I get a view of a topic and its depth in Message Hub? Is there a web console where I can see the message count?
Thanks
Raj
There isn't really a concept of topic depth in Kafka, because Kafka is a commit log. You're not reading messages off a queue; you're reading messages from a point in the log. You can specify where in the log you start, and reading a message doesn't remove it from the log. So the number of messages available is not affected by a read operation, but an individual consumer's position in the log moves.
Partitions per topic can be found in the Message Hub UI that lists the topic names, as well as the retention policy for each topic.
Input/Output rates for each topic are available in the Grafana tool.
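If you just want an approximate count of the messages currently retained, you can compare the beginning and end offsets per partition. Here's a rough sketch with the kafka-python client; the broker address and topic name are placeholders, and a real Message Hub connection would also need the SASL/TLS credentials, which are omitted here:

```python
# Approximate "depth": end offset minus beginning offset, per partition.
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")

topic = "my-topic"  # hypothetical topic name
partitions = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]

beginnings = consumer.beginning_offsets(partitions)
ends = consumer.end_offsets(partitions)

for tp in partitions:
    # This counts messages currently retained in the partition. It is not a
    # queue depth: consuming does not shrink it, only the retention policy does.
    print(tp.partition, ends[tp] - beginnings[tp])
```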