Consuming messages from a Hazelcast Queue only once in a distributed environment

Consuming messages from a Hazelcast Queue only once in a distributed environment - queue

I have a similar question as this post:
Consume message only once from Topic per listeners running in cluster
When I tried using a queue to publish messages and added an item listener in two different JVMs, I am receiving the messages twice in both of them. I want to receive the message only once in a clustered/distributed environments.
Here's my code snippet:
Publishing of the message:
getQueue().add("some sample message");
I have the same listener configured in two different JVMs which goes like this:
public HazelcastQueueListener(){
HazelcastInstance instance = HazelcastClient.newHazelcastClient(HazelClientConfig.getClientConfig());
IQueue<String> queue1 = instance.getQueue("SAMPLEQUEUE");
queue1.addItemListener(this, false);
}
public static void main(String args[]){
HazelcastQueueListener listener = new HazelcastQueueListener();
}
#Override
public void itemAdded(ItemEvent<String> arg0) {
// TODO Auto-generated method stub
if(arg0!=null){
System.out.println("Item coming out of queue 1" +arg0);
}
else{
System.out.println("null");
}
}

You have to poll the queue, like a standard java BlockingQueue in order to consume an item only once.
String item = queue1.take()
AFAIK, Hazelcast doesn't support asynchronous operation on queue. The ItemListener doesn't consume the item, it only notifies that an item is available.

Related

Spring Cloud Stream Kafka Commit Failed since the group is rebalanced

I have got the CommitFailedException for some time-consuming Spring Cloud Stream applications. I know to fix this issue I need to set the max.poll.records and max.poll.interval.ms to match my expectations for the time it takes to process the batch. However, I am not quite sure how to set it for consumers in Spring Cloud Stream.
Exception:
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records. at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:808) at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:691) at
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1416) at
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1377) at
org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:1554) at
org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:1418) at
org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:739) at
org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:700) at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
java.lang.Thread.run(Thread.java:748)
Moreover, how can I ensure this situation won't happen at all? Or alternatively, how can I inject some sort of roll-back in the case of this exception? The reason is I am doing some other external works and once it is finished I publish the output message accordingly. Therefore, if the message cannot get published due to any issues after the work was done on the external system, I have to revert it back (some sort of atomic transaction over Kafka publish and other external systems).

You can set arbitrary Kafka properties either at the binder level documentation here
spring.cloud.stream.kafka.binder.consumerProperties
Key/Value map of arbitrary Kafka client consumer properties. In addition to support known Kafka consumer properties, unknown consumer properties are allowed here as well. Properties here supersede any properties set in boot and in the configuration property above.
Default: Empty map.
e.g. spring.cloud.stream.kafka.binder.consumerProperties.max.poll.records=10
Or at the binding level documentation here.
spring.cloud.stream.kafka.bindings.<channelName>.consumer.configuration
Map with a key/value pair containing generic Kafka consumer properties. In addition to having Kafka consumer properties, other configuration properties can be passed here. For example some properties needed by the application such as spring.cloud.stream.kafka.bindings.input.consumer.configuration.foo=bar.
Default: Empty map.
e.g. spring.cloud.stream.kafka.bindings.input.consumer.configuration.max.poll.records=10
You can get notified of commit failures by adding an OffsetCommitCallback to the listener container's ContainerProperties and setting syncCommits to false. To customize the container and its properties, add a ListenerContainerCustomizer bean to the application.
EDIT
Async commit callback...
#SpringBootApplication
#EnableBinding(Sink.class)
public class So57970152Application {
public static void main(String[] args) {
SpringApplication.run(So57970152Application.class, args);
}
#Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<byte[], byte[]>> customizer() {
return (container, dest, group) -> {
container.getContainerProperties().setAckMode(AckMode.RECORD);
container.getContainerProperties().setSyncCommits(false);
container.getContainerProperties().setCommitCallback((map, ex) -> {
if (ex == null) {
System.out.println("Successful commit for " + map);
}
else {
System.out.println("Commit failed for " + map + ": " + ex.getMessage());
}
});
container.getContainerProperties().setClientId("so57970152");
};
}
#StreamListener(Sink.INPUT)
public void listen(String in) {
System.out.println(in);
}
#Bean
public ApplicationRunner runner(KafkaTemplate<byte[], byte[]> template) {
return args -> {
template.send("input", "foo".getBytes());
};
}
}
Manual commits (sync)...
#SpringBootApplication
#EnableBinding(Sink.class)
public class So57970152Application {
public static void main(String[] args) {
SpringApplication.run(So57970152Application.class, args);
}
#Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<byte[], byte[]>> customizer() {
return (container, dest, group) -> {
container.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
container.getContainerProperties().setClientId("so57970152");
};
}
#StreamListener(Sink.INPUT)
public void listen(String in, #Header(KafkaHeaders.ACKNOWLEDGMENT) Acknowledgment ack) {
System.out.println(in);
try {
ack.acknowledge(); // MUST USE MANUAL_IMMEDIATE for this to work.
System.out.println("Commit successful");
}
catch (Exception e) {
System.out.println("Commit failed " + e.getMessage());
}
}
#Bean
public ApplicationRunner runner(KafkaTemplate<byte[], byte[]> template) {
return args -> {
template.send("input", "foo".getBytes());
};
}
}

Set you heartbeat interval to less that 1/3rd of your session timeout. If the broker cannot determine if your consumer is alive, it will initiate a partition rebalance among the remaining consumers. So you have a heartbeat thread to inform the broker that the consumer is alive in case the application is taking a bit longer to process. Change these in your consumer configs:
heartbeat.interval.ms
session.timeout.ms
Try increasing the session timeout if it does not work. You have to fiddle around with these values.

How to handle backpressure when using Reactor Kafka?

I'm using Reactor Kafka to both consume and produce Kafka events. In the case of consuming events my consumer is slow and therefor I need to handle backpressure.
However, I experience that no matter what I call Subscription.request() with, the publisher will publish all events from the topic immediately, therefor overwhelming the consumer.
I'm using a custom Subscriber, setting a small number of initial request by calling Subscription.request(), when I subscribe to KafkaReceiver.receive() to do this. To my understanding this is how I tell the publisher how many events my consumer initially wants.
My subscriber:
public class KafkaEventSubscriber extends BaseSubscriber {
private final int numberOfItemsToRequestOnSubscribe;
private final int numberOfItemsToRequestOnNext;
public KafkaEventSubscriber(int numberOfItemsToRequestOnSubscribe,
int numberOfItemsToRequestOnNext) {
this.numberOfItemsToRequestOnSubscribe = numberOfItemsToRequestOnSubscribe;
this.numberOfItemsToRequestOnNext = numberOfItemsToRequestOnNext;
}
#Override
protected void hookOnSubscribe(Subscription subscription) {
subscription.request(numberOfItemsToRequestOnSubscribe);
}
#Override
protected void hookOnNext(EnrichedMetadata value) {
request(numberOfItemsToRequestOnNext);
}
}
How I use the subscriber:
kafkaReceiver.receive().map(ReceiverRecord::value).map(KafkaConsumer::acknowledge).subscribe(new KafkaEventSubscriber(10, 1));
I expect the KafkaReceiver to output 10 events before any call to the subscribers onNext() method is done, but the KafkaReceiver outputs all events that are not already ACK:ed from the topic.
I experience that no matter what we call Subscription.request() with, the publisher will publish all events from the topic immediately, not respecting the backpressure measures I've been taking.

How process() method in kafka-stream-processor is called automatically?

I am learning kafka streams and have written a simple app, snippet below:
MainApp:
Topology topology = new Topology();
topology.addSource("SOURCE", "source-topic");
topology.addProcessor("Processor1", () -> new Processor1(), "SOURCE");
topology.addProcessor("Processor2", () -> new Processor2(), "Processor1");
topology.addProcessor("Processor3", () -> new Processor3(), "Processor2");
topology.addSink("SINK", "sink-topic", "Processor3");
KafkaStreams streams = new KafkaStreams(topology, config);
streams.start();
Snippet of Individual stream proccesor:
public class Processor1 implements Processor<String, String> {
// Rest of code
#Override
public void process(String key, String value) {
System.out.println("Inside Processor1#process() method");
context.forward(key, value);
}
I understood that we need to create Topology and then to initiate it, we invoke streams.start();
I am not able to understand how process() method is being invoked automatically and who calls it?

Processor process() method invoked by ProcessorContextImpl class automatically on each incoming message for specific topology node.
For your built topology, when a message arrived at the incoming topic, SOURCE node consumes it and forwards (propagates) message to child node by internally calling forward method (you could debug/take a look at code from class ProcessorContextImpl). In your case, SOURCE node forwards key and value to child node Processor1. After that, process() method from class Processor1 triggered. When code reaches context.forward(), message forwards to the next child node, Processor2. After that message propagates to Processor3 and SINK nodes in a similar manner, and finally, message produced to outbound topic. Such pipeline for specific message executes on a single thread (and if you have a default value for config num.stream.threads = 1, all messages will be processed on a single thread per app instance).

Samza: Delay processing of messages until timestamp

I'm processing messages from a Kafka topic with Samza. Some of the messages come with a timestamp in the future and I'd like to postpone the processing until after that timestamp. In the meantime, I'd like to keep processing other incoming messages.
What I tried to do is make my Task queue the messages and implement the WindowableTask to periodically check the messages if their timestamp allows to process them. The basic idea looks like this:
public class MyTask implements StreamTask, WindowableTask {
private HashSet<MyMessage> waitingMessages = new HashSet<>();
#Override
public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector, TaskCoordinator taskCoordinator) {
byte[] message = (byte[]) incomingMessageEnvelope.getMessage();
MyMessage parsedMessage = MyMessage.parseFrom(message);
if (parsedMessage.getValidFromDateTime().isBeforeNow()) {
// Do the processing
} else {
waitingMessages.add(parsedMessage);
}
}
#Override
public void window(MessageCollector messageCollector, TaskCoordinator taskCoordinator) {
for (MyMessage message : waitingMessages) {
if (message.getValidFromDateTime().isBeforeNow()) {
// Do the processing and remove the message from the set
}
}
}
}
This obviously has some downsides. I'd be losing my waiting messages in memory when I redeploy my task. So I'd like to know the best practice for delaying the processing of messages with Samza. Do I need to reemit the messages to the same topic again and again until I can finally process them? We're talking about delaying the processing for a few minutes up to 1-2 hours here.

It's important to keep in mind, when dealing with message queues, is that they perform a very specific function in a system: they hold messages while the processor(s) are busy processing preceding messages. It is expected that a properly-functioning message queue will deliver messages on demand. What this implies is that as soon as a message reaches the head of the queue, the next pull on the queue will yield the message.
Notice that delay is not a configurable part of the equation. Instead, delay is an output variable of a system with a queue. In fact, Little's Law offers some interesting insights into this.
So, in a system where a delay is necessary (for example, to join/wait for a parallel operation to complete), you should be looking at other methods. Typically a queryable database would make sense in this particular instance. If you find yourself keeping messages in a queue for a pre-set period of time, you're actually using the message queue as a database - a function it was not designed to provide. Not only is this risky, but it also has a high likelihood of hurting the performance of your message broker.

I think you could use key-value store of Samza to keep state of your task instance instead of in-memory Set.
It should look something like:
public class MyTask implements StreamTask, WindowableTask, InitableTask {
private KeyValueStore<String, MyMessage> waitingMessages;
#SuppressWarnings("unchecked")
#Override
public void init(Config config, TaskContext context) throws Exception {
this.waitingMessages = (KeyValueStore<String, MyMessage>) context.getStore("messages-store");
}
#Override
public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector,
TaskCoordinator taskCoordinator) {
byte[] message = (byte[]) incomingMessageEnvelope.getMessage();
MyMessage parsedMessage = MyMessage.parseFrom(message);
if (parsedMessage.getValidFromDateTime().isBefore(LocalDate.now())) {
// Do the processing
} else {
waitingMessages.put(parsedMessage.getId(), parsedMessage);
}
}
#Override
public void window(MessageCollector messageCollector, TaskCoordinator taskCoordinator) {
KeyValueIterator<String, MyMessage> all = waitingMessages.all();
while(all.hasNext()) {
MyMessage message = all.next().getValue();
// Do the processing and remove the message from the set
}
}
}
If you redeploy you task Samza should recreate state of key-value store (Samza keeps values in special kafka topic related to key-value store). You need of course provide some extra configuration of your store (in above example for messages-store).
You could read about key-value store here (for the latest Samza version):
https://samza.apache.org/learn/documentation/0.14/container/state-management.html

Java server framework to listen to PostgreSQL NOTIFY statements

I need to write a server which listens to PostgreSQL NOTIFY statements and considers each notification as a request to serve (actually, more like a task to process). My main requirements are:
1) A mechanism to poll on PGConnection (Ideally this would be a listener, but in the PgJDBC implementation, we are required to poll for pending notifications. Reference)
2) Execute a callback based on the "request" (using channel name in the NOTIFY notification), on a separate thread.
3) Has thread management stuff built in. (create/delete threads when a task is processed/finished, put on a queue when too many tasks being concurrently processed etc.)
Requirements 1 and 2 are something which are easy for me to implement myself. But I would prefer not to write thread management myself.
Is there an existing framework meeting this requirements? An added advantage would be if the framework automatically generates request statistics.

To be honest, requirement 3 could probably be easily satistied just using standard ExecutorService implementations from Executors, which will allow you to, for example, get a fixed-size thread pool and submit work to them in the form of Runnable or Callable implementations. They will deal with the gory details of creating threads up to the limit etc.. You can then have your listener implement a thin layer of Runnable to collect statistics etc.
Something like:
private final ExecutorService threadPool = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
private final NotificationCallback callback;
private int waiting, executing, succeeded, failed;
public void pollAndDispatch() {
Notification notification;
while ((notification = pollDatabase()) != null) {
final Notification ourNotification = notification;
incrementWaitingCount();
threadPool.submit(new Runnable() {
public void run() {
waitingToExecuting();
try {
callback.processNotification(ourNotification);
executionCompleted();
} catch (Exception e) {
executionFailed();
LOG.error("Exeception thrown while processing notification: " + ourNotification, e);
}
}
});
}
}
// check PGconn for notification and return it, or null if none received
protected Notification pollDatabase() { ... }
// maintain statistics
private synchronized void incrementWaitingCount() { ++waiting; }
private synchronized void waitingToExecuting() { --waiting; ++executing; }
private synchronized void executionCompleted() { --executing; ++succeeded; }
private synchronized void executionFailed() { --executing; ++failed; }
If you want to be fancy, put the notifications onto a JMS queue and use its infrastructure to listen for new items and process them.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Consuming messages from a Hazelcast Queue only once in a distributed environment - queue

You have to poll the queue, like a standard java BlockingQueue in order to consume an item only once. String item = queue1.take() AFAIK, Hazelcast doesn't support asynchronous operation on queue. The ItemListener doesn't consume the item, it only notifies that an item is available.

Related

Spring Cloud Stream Kafka Commit Failed since the group is rebalanced

How to handle backpressure when using Reactor Kafka?

How process() method in kafka-stream-processor is called automatically?

Samza: Delay processing of messages until timestamp

Java server framework to listen to PostgreSQL NOTIFY statements

Categories

Resources