I have come across the Kafka WorkItem for jBPM (7.18.0) for publishing messages to a Kafka topic.
But how can I trigger a particular workflow when there is a message in the topic?
Or how can the process resume when there is a message in the topic?
As far as I can tell, the only code in there is a producer, not a consumer:
https://github.com/kiegroup/jbpm-work-items/tree/master/kafka-workitem/src/main/java/org/jbpm/process/workitem/kafka
I suppose the workaround for this would be to write your own consumer and put it inside a generic Java action, but it would need to run in an infinite polling loop, ideally in a separate thread within the process itself, so I'm not sure how well that will work for triggering later actions.
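For what it's worth, a minimal sketch of that workaround might look like the following, assuming you have a KieSession in scope and a topic named process-triggers (the topic, group id, signal name, and process id are all placeholders); the consumer runs in its own thread and either signals a waiting process instance or starts a new one when a record arrives:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.kie.api.runtime.KieSession;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaProcessTrigger implements Runnable {

    private final KieSession ksession;

    public KafkaProcessTrigger(KieSession ksession) {
        this.ksession = ksession;
    }

    @Override
    public void run() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "jbpm-trigger");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("process-triggers")); // placeholder topic name
            while (!Thread.currentThread().isInterrupted()) {                  // the infinite polling loop
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Either signal a process instance that is waiting on an event...
                    ksession.signalEvent("kafka-message", record.value());
                    // ...or start a new process instance instead:
                    // ksession.startProcess("com.example.MyProcess",
                    //         Collections.singletonMap("payload", record.value()));
                }
            }
        }
    }

    // Typically you'd start this in its own thread, e.g.:
    // new Thread(new KafkaProcessTrigger(ksession)).start();
}
```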
Related
Is there a way to have an event delivered to every subscriber of a topic, regardless of consumer group? Think of a "refresh your local cache" kind of scenario.
As far as Kafka is concerned, you cannot subscribe to a topic without a consumer group.
Out of the box, this isn't how a Kafka consumer group works; there isn't a way to make all consumers in one group read all messages from all partitions. Within a group, each partition is assigned to at most one consumer (so there can't be more active consumers than partitions, which makes "fan out" hard), and each message is delivered to only one consumer before its offset gets committed and the entire consumer group seeks those offsets forward to consume later events.
You'd need a layer above the consumer to decouple yourself from the consumer-group limitations.
For example, with Interactive Queries, you'd consume a topic and build a StateStore from the data that comes in, effectively building a distributed cache. On top of that, you can add an RPC layer (mentioned in those docs) that allows external applications to query and poll that data over a protocol of your choice (e.g. HTTP). From an application that is polling the data, you then have the option of forwarding "notification events" via any method of your choice.
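A rough sketch of that idea with Kafka Streams might look like this (the cache-updates topic and cache-store names are made up for the example); the application materializes the topic into a local key-value store that an RPC layer could later query:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

import java.util.Properties;

public class CacheStoreApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cache-builder");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Materialize the topic into a local, queryable state store (topic and store names are placeholders)
        builder.table("cache-updates",
                Consumed.with(Serdes.String(), Serdes.String()),
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("cache-store"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Once the instance reaches the RUNNING state, an RPC layer (e.g. an HTTP endpoint)
        // could serve lookups against the store like this:
        ReadOnlyKeyValueStore<String, String> store = streams.store(
                StoreQueryParameters.fromNameAndType("cache-store", QueryableStoreTypes.keyValueStore()));
        System.out.println("Cached value for some-key: " + store.get("some-key"));
    }
}
```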
As for a framework that already exposes most of this out of the box, check out Azkarra Streams (I have no affiliation).
Or you can use alternative solutions such as Kafka Connect and write data to Slack, Telegram, etc. "message boards" instead, where many people explicitly subscribe to those channel notifications.
Is there a way to notify a consumer about new events being published to Kafka topics it has subscribed to while the consumer is not actively listening? I know the question itself seems confusing, but I was wondering whether it is really necessary to have one process running continuously in order to consume messages. I think it would make the consumer process easier if we knew when a message is available to read.
Consumers read messages by polling the topic, so fundamentally, you must have a process running continuously. If the consumer does not poll within the value of the property max.poll.interval.ms, the consumer will leave the group. A hallmark feature of event-driven architectures is that consumers and producers are decoupled; the consumer does not know whether the producer even exists. Therefore, there is no way to know when a message is available to read without actively polling.
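For reference, a bare-bones poll loop looks something like this (the topic and group names are just examples); the repeated poll() call is what keeps the consumer alive in its group:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class PollLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // If poll() is not called within max.poll.interval.ms (default 5 minutes),
        // the consumer is considered dead and leaves the group.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events")); // example topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```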
We've implemented some resilience in our Kafka consumer by having a main topic, a retry topic, and an error topic, as outlined in this blog.
I'm wondering what patterns teams are using out there to redrive events in the error topic back into the retry topic for reprocessing. Do you use some kind of GUI to help do this redrive? I foresee a need to potentially append all events from the error topic into the retry topic, but also to selectively skip certain events in the error topic if they can't be reprocessed.
Two patterns I've seen:
- Redeploy the app with a new topic config (via environment variables or other external config).
- Use a scheduled task within the code that checks the upstream DLQ topic(s).
If you want to use a GUI, that's fine, but it seems like more work for little gain, as there's no existing tooling built around that.
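As an illustration of the scheduled-redrive idea, a small standalone job might look roughly like this (the orders-error and orders-retry topic names, the group id, and the skip condition are all placeholders); it drains the error topic, skips anything that can't be reprocessed, and republishes the rest to the retry topic:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ErrorTopicRedriver {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "error-redrive");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            consumer.subscribe(Collections.singletonList("orders-error")); // placeholder error topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                // Selectively skip events that can't be reprocessed (this predicate is a placeholder)
                if (record.value().contains("UNRECOVERABLE")) {
                    continue;
                }
                producer.send(new ProducerRecord<>("orders-retry", record.key(), record.value()));
            }
            producer.flush();
            consumer.commitSync(); // only advance the error-topic offsets after everything is republished
        }
    }
}
```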
I have a producer application which sends data to a Kafka topic, but only once in a while, as and when it receives data from a source. I also have a consumer application (Spark) which keeps running all the time and receives data from Kafka whenever the producer sends it.
Since the consumer keeps running all the time, resources are wasted at times. Moreover, because my producer sends data only once in a while, is there any way to trigger the consumer application only when a Kafka topic gets new data?
Sounds like you shouldn't be using Spark, and would rather run some serverless solution that can be triggered to run code on Kafka events.
Otherwise, run a cron job that checks consumer lag. Define a threshold, and only submit your code to batch read from Kafka once the lag exceeds it. A rough sketch of that lag check is below.
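For example, using the Kafka AdminClient, the cron-driven check could look something like this (the consumer group id and threshold are just examples); if the total lag crosses the threshold, you'd kick off your batch job however you normally submit it:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class LagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Committed offsets for the (example) consumer group
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("spark-batch-group")
                         .partitionsToOffsetAndMetadata().get();

            // End offsets for the same partitions
            Map<TopicPartition, OffsetSpec> latestSpec = committed.keySet().stream()
                    .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                    admin.listOffsets(latestSpec).all().get();

            // Lag = end offset minus committed offset, summed over all partitions
            long totalLag = committed.entrySet().stream()
                    .mapToLong(e -> latest.get(e.getKey()).offset() - e.getValue().offset())
                    .sum();

            long threshold = 1000; // example threshold
            if (totalLag >= threshold) {
                // Trigger the batch job however you like, e.g. shell out to spark-submit
                System.out.println("Lag " + totalLag + " exceeds threshold, submitting batch job");
            }
        }
    }
}
```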
I have started a NiFi process (ConsumeKafka) and connected it to a topic. It is running, but I am not able to figure out where I can view the messages.
The ConsumeKafka processor runs and generates a flowfile for each message. Only when you connect the processor to other components, like another processor or an output port, will you be able to visualize the data being moved through.
For starters you can try this:
1. Connect ConsumeKafka with LogAttribute or any other processor, for that matter.
2. Stop or disable the LogAttribute processor.
3. Now when you start ConsumeKafka, all the received messages from the configured Kafka topic will be queued up in the form of flowfiles.
4. Right click the relationship where the flowfiles are queued up and click List Queue to access the queue.
5. Click any item in the queue; a context menu will come up. Click the View button and you can see the data.
This whole explanation of "viewing" the Kafka message is just to help you debug and get started with NiFi. Ideally you would use other NiFi processors to work out your use case.
Example
You receive messages from Kafka and want to write them to MongoDB, so you can have the flow as: ConsumeKafka -> PutMongo.
Note:
There are record-based processors like ConsumeKafkaRecord and PutMongoRecord, but they do basically the same thing with more enhancements. Since you're new to this, I have suggested a simple flow. You can find details about the record-based processors here and try those.
You might need to consume messages with --from-beginning if those messages have been consumed before (and their offsets have therefore been committed).
On the GetKafka processor, there is an Auto Offset Reset property, which should be set to smallest; this is the equivalent of --from-beginning in the Kafka console consumer.