Persistent Akka Mailboxes and Losslessness - queue

In Akka, when an actor dies while processing a message (inside onReceive(...) { ... }), that message is lost. Is there a way to guarantee losslessness? Is there a way to configure Akka to always persist messages before sending them to onReceive, so that they can be recovered and replayed if the actor does die?
Perhaps something like a persistent mailbox?

Yes, take a look at Akka Persistence, in particular AtLeastOnceDelivery. This stores messages on the sender side in order to also cover losses during the delivery process, because otherwise the message might not ever reach the destination mailbox.
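To illustrate the idea of storing messages on the sender side until they are confirmed, here is a minimal, language-agnostic sketch in Python. It is not Akka's API: the Sender, Receiver, and lossy_transport names are hypothetical, and an in-memory dict stands in for what would be a persistent journal in a real system.

```python
class Sender:
    """Keeps every unconfirmed message so it can be redelivered."""

    def __init__(self, transport):
        self.transport = transport   # hypothetical callable: (delivery_id, payload) -> None
        self.unconfirmed = {}        # delivery_id -> payload; a real system would persist this
        self.next_id = 0

    def deliver(self, payload):
        self.next_id += 1
        self.unconfirmed[self.next_id] = payload   # record the message before sending it
        self.transport(self.next_id, payload)
        return self.next_id

    def confirm(self, delivery_id):
        self.unconfirmed.pop(delivery_id, None)

    def redeliver(self):
        # Resend everything that has not been confirmed yet.
        for delivery_id, payload in sorted(self.unconfirmed.items()):
            self.transport(delivery_id, payload)


class Receiver:
    """Deduplicates by delivery id, because redelivery can produce duplicates."""

    def __init__(self):
        self.seen = set()
        self.processed = []

    def handle(self, delivery_id, payload):
        if delivery_id not in self.seen:
            self.seen.add(delivery_id)
            self.processed.append(payload)
        return delivery_id   # acts as the acknowledgement


def demo():
    receiver = Receiver()
    dropped = []

    def lossy_transport(delivery_id, payload):
        if not dropped:
            dropped.append(delivery_id)   # simulate losing the very first send
            return
        sender.confirm(receiver.handle(delivery_id, payload))

    sender = Sender(lossy_transport)
    sender.deliver("a")   # lost in transit, stays unconfirmed
    sender.deliver("b")   # arrives and is confirmed
    sender.redeliver()    # "a" is sent again and confirmed
    return receiver.processed, sender.unconfirmed
```

Note that redelivery gives at-least-once semantics only: the receiver may see a message more than once and out of order, which is why this sketch deduplicates by delivery id.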

Related

Message order issue in single consumer connected to ActiveMQ Artemis queue

Is there any possibility of a message order issue with a single queue consumer and multiple producers?
producer1 publishes message m1 at 2021-06-27 02:57:44.513 and producer2 publishes message m2 at 2021-06-27 02:57:44.514 on the same queue, worker_consumer_queue. Client code connected to the queue and configured as a single consumer should receive the messages in order, m1 first and then m2, correct? Sometimes the messages are received in the wrong order. The version is ActiveMQ Artemis 2.17.0.
Even though I mentioned multiple producers, the messages are published one after another from the same thread, using the property blockOnDurableSend=false.
I create and close a producer for each message published. Within the same JVM, my assumption is that the queue preserves the order of published messages, whether they come from the same thread or from different threads, even with async sends. The timestamp is from getJMSTimestamp(). Does async publishing also maintain an internal queue that preserves order?
If you use blockOnDurableSend=false you're basically saying you don't strictly care about the order or even if the message makes it to the broker at all. Using blockOnDurableSend=false basically means "fire and forget."
Furthermore, the JMSTimestamp is not when the message is actually sent, as noted in the javax.jms.Message JavaDoc:
The JMSTimestamp header field contains the time a message was handed off to a provider to be sent. It is not the time the message was actually transmitted, because the actual send may occur later due to transactions or other client-side queueing of messages.
With more than one producer there is no guarantee that the messages will be processed in order.
Multiple producers, ActiveMQ Artemis, and one consumer form a distributed system, and the lack of a global clock is a significant characteristic of distributed systems.
Even if the producers and ActiveMQ Artemis were on the same machine and used the same clock, ActiveMQ Artemis would not necessarily receive the messages in the same order in which the producers created and sent them, because the time to create a message and the time to send it both include variable latencies.
The easiest solution is to trust the order in which ActiveMQ Artemis receives the messages, either by adding a timestamp with an interceptor or by enabling the ingress timestamp; see ARTEMIS-2919 for further details.
If the easiest solution doesn't work, the distributed solution is to implement a total ordering algorithm for distributed systems, such as Lamport timestamps.
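For readers unfamiliar with Lamport timestamps, the following is a minimal Python sketch of the idea, independent of ActiveMQ Artemis. The LamportClock class and the total_order helper are hypothetical names chosen for illustration: each process keeps a counter, stamps outgoing messages with it, and advances past any timestamp it receives.

```python
class LamportClock:
    """Logical clock: a counter that only moves forward."""

    def __init__(self):
        self.time = 0

    def send(self):
        # Increment before stamping an outgoing message.
        self.time += 1
        return self.time

    def receive(self, message_time):
        # Jump past the sender's timestamp so causality is preserved.
        self.time = max(self.time, message_time) + 1
        return self.time


def total_order(events):
    """Sort (lamport_time, process_id, label) tuples.

    Ties on the timestamp are broken by process id, which turns the
    partial causal order into a total order.
    """
    return sorted(events, key=lambda event: (event[0], event[1]))


def demo():
    p1, p2 = LamportClock(), LamportClock()
    t1 = p1.send()     # p1 sends m1
    p2.receive(t1)     # p2 learns about m1
    t2 = p2.send()     # p2 sends m2, causally after m1
    events = [(t2, "p2", "m2"), (t1, "p1", "m1")]   # arrived out of order
    return [label for _, _, label in total_order(events)]
```

The ordering this produces respects causality, not wall-clock time: two messages sent independently within the same millisecond may be ordered either way, but consistently for every consumer that applies the same rule.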
Well, as it seems, this is not a bug within Artemis; with a one-millisecond difference it is more likely network lag or something similar.
As a workaround, you could create an algorithm in which a received message waits for ~100ms before it is actually processed (whatever you want to do with the message), and check whether there is another message that your application received afterwards but that was sent earlier. So basically, have your own receiver queue with a delay.
If there is a message that was sent earlier, you could simply move it up in your own algorithm. You could also consider rejecting the first message back to the bus; depending on your queue and topic settings, you would be able to receive it again afterwards.
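A receiver-side queue with a delay, as described in this answer, could be sketched as follows. This is an illustrative Python sketch rather than Artemis or JMS code: the ReorderBuffer class is hypothetical, timestamps are plain millisecond integers passed in by the caller, and the 100 ms hold time is the value suggested above.

```python
import heapq


class ReorderBuffer:
    """Holds messages briefly and releases them ordered by send timestamp."""

    def __init__(self, delay_ms=100):
        self.delay_ms = delay_ms
        self._heap = []   # entries: (sent_ts, sequence, arrival_ts, payload)
        self._sequence = 0

    def add(self, sent_ts, payload, now_ms):
        # The sequence number keeps entries with equal timestamps comparable.
        heapq.heappush(self._heap, (sent_ts, self._sequence, now_ms, payload))
        self._sequence += 1

    def release(self, now_ms):
        """Return payloads whose earliest-sent entry has waited long enough."""
        ready = []
        while self._heap and now_ms - self._heap[0][2] >= self.delay_ms:
            ready.append(heapq.heappop(self._heap)[3])
        return ready


def demo():
    buffer = ReorderBuffer(delay_ms=100)
    buffer.add(514, "m2", now_ms=0)    # m2 arrives first
    buffer.add(513, "m1", now_ms=50)   # m1 was sent earlier but arrives later
    early = buffer.release(now_ms=100)   # m1 has only waited 50 ms
    late = buffer.release(now_ms=150)    # both have waited long enough
    return early, late
```

The trade-off is added latency on every message, and the buffer only corrects reordering that falls within the hold window; a message delayed longer than that still arrives out of order.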

Is batch message sending possible with Sink.actorRefWithAck?

I'm using Akka Streams and I came across Sink.actorRefWithAck. I understand that it sends a message and only tries pulling in another element from the stream when an acknowledgement for the previous message has been received. Is there a way to batch-process messages with this sink? Example: pull five messages and only pull the next five once the first five have been acknowledged. I've thought about something like
source.grouped(5).to(Sink.actorRefWithAck(...))
But that would require the receiver to change to work with sequences, which let's assume is out of the question.
No, that is not possible with Sink.actorRefWithAck() while keeping individual messages being queued in the actor mailbox rather than the entire batch.
One idea to queue up messages in the actor's mailbox more eagerly would be to use source.mapAsync(n)(ask-actor).to(Sink.ignore). This would send n messages to the actor and then, as soon as the first one gets a response from the actor, pull and enqueue a new element.
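The "at most n outstanding requests" behaviour can be pictured with a small asyncio sketch in Python. This is an analogy rather than Akka Streams code: send_windowed and ask are hypothetical names, and a semaphore plays the role of the parallelism limit n.

```python
import asyncio


async def send_windowed(elements, ask, n):
    """Send elements to `ask`, keeping at most n requests in flight."""
    semaphore = asyncio.Semaphore(n)

    async def one(element):
        try:
            return await ask(element)
        finally:
            semaphore.release()   # a reply frees a slot for the next element

    tasks = []
    for element in elements:
        await semaphore.acquire()   # wait while n requests are outstanding
        tasks.append(asyncio.create_task(one(element)))
    return await asyncio.gather(*tasks)   # results keep the input order


def demo():
    state = {"in_flight": 0, "peak": 0}

    async def ask(element):
        # Hypothetical stand-in for asking an actor and awaiting its reply.
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0)
        state["in_flight"] -= 1
        return element * 2

    results = asyncio.run(send_windowed(range(6), ask, 3))
    return results, state["peak"]
```

As in the answer above, a new element is pulled as soon as any slot frees up, so this is a sliding window of n rather than strict batches of n.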

multiplexing consumer and producer in kafka

In my Kafka consumer threads (high-level consumer), after I consume a message I apply some business logic to it and forward it to a web service (WS). But this web service may be down sometimes, and since I have already consumed the object from Kafka and the offset has moved forward, I would lose this object.
One way to get rid of this problem is to disable autocommit in zookeeper and commit the offset programmatically, but I expect this to be a very costly operation. I will be producing to Kafka at about 2000 tps, and this may increase later.
Another way, which I am not sure is a good idea, is to produce the consumed object to Kafka again if I face any problem, but I didn't see any post about this in all my googling. Is this something that should not even be considered?
Can you please give me some insight into handling this situation?
Thanks
You can post back the failed message to the same topic or another of your choice.
If you use the same topic, you will push the messages to the end of the topic and they will be picked up after the others (so if order matters to you, don't do this). Also, if the action that you perform before sending the message is not idempotent, you will have to do something to identify these records so they don't perform the action twice.
If you use a failed_topic, you can push the messages that you can't send to this topic and when the WS is healthy again you need to create a consumer that consumes all the messages there and sends them to the WS.
Hope it helps!
Moving such messages to an error queue and retrying them later is a well known approach.
See Dead letter channel
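The failed-topic approach from the answers above can be sketched without any Kafka client. In this illustrative Python sketch, deques stand in for topics, and process, retry_failed, and send_to_ws are hypothetical names; a real implementation would use a Kafka producer and consumer instead.

```python
from collections import deque


def process(main_topic, failed_topic, send_to_ws):
    """Consume main_topic; park messages the web service cannot accept."""
    while main_topic:
        message = main_topic.popleft()
        try:
            send_to_ws(message)
        except ConnectionError:
            failed_topic.append(message)   # re-publish instead of losing it


def retry_failed(failed_topic, send_to_ws):
    """Make one pass over failed_topic once the web service is healthy again."""
    for _ in range(len(failed_topic)):
        message = failed_topic.popleft()
        try:
            send_to_ws(message)
        except ConnectionError:
            failed_topic.append(message)   # still down: keep it for a later pass


def demo():
    delivered = []
    ws = {"up": False}

    def send_to_ws(message):
        # Hypothetical web service call that fails while the service is down.
        if not ws["up"]:
            raise ConnectionError("web service unavailable")
        delivered.append(message)

    main_topic, failed_topic = deque(["m1", "m2"]), deque()
    process(main_topic, failed_topic, send_to_ws)   # both messages fail
    parked = list(failed_topic)
    ws["up"] = True
    retry_failed(failed_topic, send_to_ws)
    return parked, delivered, list(failed_topic)
```

Because a message can be sent to the web service more than once across retries, the idempotency caveat mentioned above applies here as well.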

Email notification when Kafka producer and consumer goes down

I have developed a data pipeline using Kafka. Right now I have one type of producer and two types of consumers set up in the cluster.
Producer: gets the message from a Windows server
Consumer: Consumer A uses Spark Streaming to transform and present a real-time view. Consumer B stores the raw data, which might be useful for building the schema at a later stage.
For various reasons, starting with the network, the consumers may not receive any data, and it is also possible that the consumer process dies in case of a system failure.
I would be interested in knowing if there is a way to implement something that sends an email notification when the consumer stops receiving messages or the consumer thread dies altogether. Do Kafka or Zookeeper provide a way of doing this?
Right now I am thinking of checking whether the target system is receiving messages. But in the future, if the number of targets increases, it will be really complex to write email notification systems for individual targets.

How to prevent actor mailbox growing in Scala?

As far as I know, the mailboxes of Scala actors have no size limit. So, if an actor reads messages from its mailbox more slowly than others send messages to that mailbox, the mailbox grows without bound, which effectively creates a memory leak.
How can we make sure that this does not happen? Should we limit the mailbox size anyway? What are the best practices to prevent the mailbox from growing?
Instead of having a push strategy where producers send messages directly to consumers, you could use a pull strategy, where consumers request messages from producers.
To be sure that the reply is almost instantaneous, producers can produce a limited amount of data in advance. When they receive a request, they first send one of the pregenerated items, then generate a new one.
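A minimal Python sketch of this pull strategy with pregenerated data might look like the following. The PullProducer class and its prefetch parameter are hypothetical names; the point is only that the amount of buffered data stays fixed, because a new item is generated only when one is requested.

```python
import itertools
from collections import deque


class PullProducer:
    """Produces a bounded number of items in advance and refills on request."""

    def __init__(self, generate, prefetch=2):
        self._generate = generate
        # Pregenerate a limited number of items so replies are immediate.
        self._ready = deque(generate() for _ in range(prefetch))

    @property
    def buffered(self):
        return len(self._ready)

    def request(self):
        item = self._ready.popleft()           # first send a pregenerated item
        self._ready.append(self._generate())   # then generate a replacement
        return item


def demo():
    counter = itertools.count()
    producer = PullProducer(lambda: next(counter), prefetch=2)
    pulled = [producer.request() for _ in range(3)]
    return pulled, producer.buffered
```

Since the consumer drives the pace, a slow consumer simply requests less often and nothing accumulates beyond the prefetch size.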
You could also use Akka actors, which provide bounded mailboxes.
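To show what a bounded mailbox means in practice, here is a small Python sketch built on the standard library queue. It is not Akka's implementation: the BoundedMailbox class and its offer and poll methods are hypothetical, and rejecting the message when full is only one possible overflow policy (blocking the sender is another).

```python
import queue


class BoundedMailbox:
    """Mailbox with a fixed capacity that rejects messages when full."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def offer(self, message):
        """Enqueue without blocking; return False if the mailbox is full."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            self.dropped += 1   # the sender learns the consumer is overloaded
            return False

    def poll(self):
        """Dequeue without blocking; return None if the mailbox is empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


def demo():
    mailbox = BoundedMailbox(capacity=2)
    accepted = [mailbox.offer(i) for i in (1, 2, 3)]   # third one is rejected
    first = mailbox.poll()                             # frees one slot
    accepted_after = mailbox.offer(3)
    return accepted, first, accepted_after, mailbox.dropped
```

Memory use is capped at the capacity, at the cost of senders having to handle rejection, for example by retrying later or slowing down.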