Clearing messages from a persistent Akka journal - Scala

I am testing a persistent Akka actor, using in-memory persistence for this. When the test starts, I see the actor recovering previously persisted messages. I tried the following:
I send the actor a message that makes it trigger deleteMessages(LastMessage), hoping this would cause the journal to be cleared.
The actor never seems to process this message: the messages being recovered had previously run into an exception, so recovery throws that exception again and the actor does not proceed to process the new message.
How can I clear the persisted journal?
I had also assumed that the in-memory persistence would not recover messages from previous tests' journals.

For a more capable in-memory journal implementation to use in tests, I'd recommend using https://github.com/dnvriend/akka-persistence-inmemory.
It supports clearing the journal and snapshots (https://github.com/dnvriend/akka-persistence-inmemory#clearing-journal-and-snapshot-messages), as well as a ReadJournal for Akka Persistence Query.
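As a rough illustration, the plugin's README describes a test hook along these lines for wiping both stores between tests (exact extension and message names may differ between plugin versions, so treat this as a sketch):

    import akka.actor.{ ActorSystem, Status }
    import akka.persistence.inmemory.extension.{ InMemoryJournalStorage, InMemorySnapshotStorage, StorageExtension }
    import akka.testkit.TestProbe
    import org.scalatest.{ BeforeAndAfterEach, Suite }

    // Mix this into a test suite so every test starts with an empty journal and snapshot store.
    trait InMemoryCleanup extends BeforeAndAfterEach { _: Suite =>

      implicit def system: ActorSystem

      override protected def beforeEach(): Unit = {
        val probe = TestProbe()
        probe.send(StorageExtension(system).journalStorage, InMemoryJournalStorage.ClearJournal)
        probe.expectMsg(Status.Success(""))
        probe.send(StorageExtension(system).snapshotStorage, InMemorySnapshotStorage.ClearSnapshots)
        probe.expectMsg(Status.Success(""))
        super.beforeEach()
      }
    }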

Related

Avoid Data Loss While Processing Messages from Kafka

I'm looking for the best approach to designing my Kafka consumer. Basically, I want to know the best way to avoid data loss in case there are any exceptions or errors while processing the messages.
My use case is as follows.
a) The reason I am using a SERVICE to process the messages is that in the future I plan to write an ERROR PROCESSOR application which would run at the end of the day and try to process the failed messages again (not all messages, only those that failed because of a missing dependency, such as a missing parent).
b) I want to make sure there is zero message loss, so I save the message to a file in case there are any issues while saving it to the DB.
c) In the production environment there can be multiple instances of the consumer and the service running, so there is a high chance that multiple applications try to write to the same file.
Q-1) Is writing to a file the only option to avoid data loss?
Q-2) If it is the only option, how do I make sure multiple applications can write to and read from the same file at the same time? Please consider that once the error processor is built, it might be reading messages from the same file while another application is trying to write to it.
ERROR PROCESSOR - Our source follows an event-driven mechanism, and there is a high chance that the dependent event (for example, the parent entity of something) sometimes gets delayed by a couple of days. In that case, I want my ERROR PROCESSOR to be able to process the same messages multiple times.
I've run into something similar before. So, diving straight into your questions:
Not necessarily; you could perhaps send those messages back to Kafka on a new topic (let's say error-topic). Then, when your error processor is ready, it can just listen to this error-topic and consume those messages as they come in (see the sketch below).
I think this question is addressed by the answer to the first one. Instead of writing to and reading from a file and opening multiple file handles to do this concurrently, Kafka is a better choice, as it is designed for exactly such problems.
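A minimal sketch of that "publish failures to an error topic" idea; the topic name, broker address and the ErrorTopicPublisher helper are illustrative, not something from the question:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{ KafkaProducer, ProducerRecord }
    import org.apache.kafka.common.serialization.StringSerializer

    // Hypothetical helper the consumer/service could call whenever processing a message fails.
    object ErrorTopicPublisher {
      private val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092")
      props.put("key.serializer", classOf[StringSerializer].getName)
      props.put("value.serializer", classOf[StringSerializer].getName)

      private val producer = new KafkaProducer[String, String](props)

      // Re-publish the original payload so the ERROR PROCESSOR can retry it later.
      def publishFailed(key: String, payload: String): Unit =
        producer.send(new ProducerRecord[String, String]("error-topic", key, payload))
    }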
Note: The following point is just some food for thought based on my limited understanding of your problem domain. So, you may just choose to ignore this safely.
One more point worth considering in the design of your service component: you might as well merge points 4 and 5 by sending all the error messages back to Kafka. That will let you process all error messages in a consistent way, as opposed to putting some messages in the error DB and some in Kafka.
EDIT: Based on the additional information on the ERROR PROCESSOR requirement, here's a diagrammatic representation of the solution design.
I've deliberately kept the output of the ERROR PROCESSOR abstract for now just to keep it generic.
I hope this helps!
If you don't commit the consumed offset before writing to the database, then nothing is lost for as long as Kafka retains the message. The tradeoff is that if the consumer did write to the database but the Kafka offset commit fails or times out, you'd end up consuming those records again and potentially processing duplicates in your service (see the sketch below).
Even if you did write to a file, you wouldn't be guaranteed ordering unless you opened one file per partition and ensured all consumers only ran on a single machine (because you'd be keeping state there, which isn't fault-tolerant). Deduplication would still need to be handled as well.
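A rough sketch of that commit-after-write approach; the topic, group id and writeToDb placeholder are made up for illustration:

    import java.time.Duration
    import java.util.{ Collections, Properties }
    import scala.jdk.CollectionConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer

    object DbWritingConsumer extends App {
      val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092")
      props.put("group.id", "db-writer")
      props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      props.put("enable.auto.commit", "false") // commit only after the DB write has succeeded

      val consumer = new KafkaConsumer[String, String](props)
      consumer.subscribe(Collections.singletonList("incoming-messages"))

      // Placeholder for the real persistence call.
      def writeToDb(value: String): Unit = println(s"writing: $value")

      while (true) {
        val records = consumer.poll(Duration.ofMillis(500)).asScala
        records.foreach(r => writeToDb(r.value()))
        // If this commit fails or times out, the batch is re-consumed after a restart,
        // so the DB write should be idempotent (upserts, unique keys, ...).
        consumer.commitSync()
      }
    }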
Also, rather than writing your own consumer that saves to a database, you could look into the Kafka Connect framework. For validating messages, you can similarly deploy a Kafka Streams application to filter the bad messages out of the input topic into a separate topic, leaving a clean topic to send to the DB.
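For example, a small Kafka Streams application could route valid and invalid records to separate topics. This is only a sketch under assumed topic names and a toy validation rule; import paths for the Scala DSL also vary slightly between Kafka versions:

    import java.util.Properties
    import org.apache.kafka.streams.{ KafkaStreams, StreamsConfig }
    import org.apache.kafka.streams.scala.StreamsBuilder
    import org.apache.kafka.streams.scala.ImplicitConversions._
    import org.apache.kafka.streams.scala.Serdes._ // org.apache.kafka.streams.scala.serialization.Serdes in newer versions

    object MessageValidator extends App {
      val inputTopic = "incoming-messages"
      val validTopic = "valid-messages" // consumed by the DB writer / Kafka Connect sink
      val errorTopic = "error-topic"    // picked up later by the ERROR PROCESSOR

      // Toy validation rule: require a non-empty, JSON-looking payload.
      def isValid(value: String): Boolean = value != null && value.trim.startsWith("{")

      val builder  = new StreamsBuilder()
      val messages = builder.stream[String, String](inputTopic)

      messages.filter((_, v) => isValid(v)).to(validTopic)
      messages.filterNot((_, v) => isValid(v)).to(errorTopic)

      val props = new Properties()
      props.put(StreamsConfig.APPLICATION_ID_CONFIG, "message-validator")
      props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

      val streams = new KafkaStreams(builder.build(), props)
      streams.start()
      sys.addShutdownHook(streams.close())
    }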

Akka save/persist only last state

I'm implementing a directory watcher with the Akka framework: it simply creates an actor for each new file created in the directory and materializes a stream with FileTailSource that sends the content of the file to a Kafka topic.
I'd like to be able to recover from a crash without re-reading the same lines of a file, so the actor's state stores the line offset for each file.
I use the Akka Persistence journal to persist the state, and snapshots to reduce the number of records in the journal (with deleteMessages).
It really feels like overkill, as I only need to store the last state, unlike an event-sourced architecture. The LevelDB journal is also not an option, as it is not supported on the AIX platform where the application runs, so I'm stuck with the in-memory journal and its limitations.
Is there a lighter alternative to this architecture? I could probably just serialize the state to disk myself, but I'm curious whether this is supported by Akka Persistence.
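For reference, the journal-plus-snapshot pattern described above looks roughly like this with the classic Akka Persistence API; the actor, event and persistence-id names here are illustrative only:

    import akka.persistence.{ PersistentActor, SaveSnapshotSuccess, SnapshotOffer }

    // Hypothetical event recording that the line at the given offset has been shipped.
    final case class LineRead(offset: Long)

    class FileOffsetActor(filePath: String) extends PersistentActor {
      override def persistenceId: String = s"file-offset-$filePath"

      private var offset: Long = 0L

      override def receiveRecover: Receive = {
        case SnapshotOffer(_, snapshot: Long) => offset = snapshot
        case LineRead(o)                      => offset = o
      }

      override def receiveCommand: Receive = {
        case LineRead(o) =>
          persist(LineRead(o)) { ev =>
            offset = ev.offset
            saveSnapshot(offset) // only the latest offset matters here
          }
        case SaveSnapshotSuccess(metadata) =>
          // Drop the journal entries covered by the snapshot so the journal stays small.
          deleteMessages(metadata.sequenceNr)
      }
    }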

AKKA.NET Journals and Snapshot Store

Since I have not seen any example of using the Akka.NET journals and snapshot store, I assume I have to use both types of actors to implement an event store and CQRS.
Is the snapshot store expected to be updated every time the actor state changes, or should it be updated on a schedule, say every 10 seconds?
Should the snapshot store actors talk only to the journal actors, so that the actors holding the state do not talk to the journal and the snapshot store at the same time? I'm thinking along the lines of separation of concerns.
Assume I have to shut the server down and bring it back up. A user tries to access a product (like computers) through a web UI. At that point, the product actor does not exist in the actor system. To retrieve the state of the product, shouldn't I go to the snapshot store instead of replaying all the journal entries to recreate the state?
In Akka.Persistence, both the Journal and the SnapshotStore are in fact actors used to abstract your actors from a particular persistence provider. You will almost never have to use them directly - PersistentView and PersistentActor use them automatically under the hood.
Snapshot stores are only a way to speed up the recovery of an actor that has a lot of events to recover from. In a distributed environment, snapshotting without event sourcing is not a means of achieving persistence. A good idea is to keep a counter and produce a snapshot after every X events processed by the persistent actor (see the sketch below). Time-based snapshots make little sense - in many cases the actor probably hasn't changed over the specified interval, and performance is also worse (lots of unnecessary cycles).
SnapshotStores and Journals are unaware of each other. Akka.Persistence persistent actors have a built-in recovery mechanism that handles the actor's state recovery from SnapshotStores and Journals and exposes methods for communicating with them.
As I said, you probably don't want to communicate with snapshot stores and journals directly; that is what persistent actors/persistent views are for. Of course, you could just read the actor's state directly from the backend storage, but then you would have to check whether any events were written after the latest saved snapshot, etc. Recreating the persistent actor/view on a different worker node is, in my opinion, a better solution.
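A minimal sketch of that counter-based snapshotting idea, written against the Scala Akka Persistence API (the Akka.NET C# API is analogous); the event type, state type and threshold are made up for illustration:

    import akka.persistence.{ PersistentActor, SaveSnapshotSuccess, SnapshotOffer }

    final case class ItemAdded(name: String)                // hypothetical event
    final case class CartState(items: List[String] = Nil) { // hypothetical state
      def updated(e: ItemAdded): CartState = copy(items = e.name :: items)
    }

    class CartActor(id: String) extends PersistentActor {
      override def persistenceId: String = s"cart-$id"

      private val snapshotEvery = 100 // take a snapshot after every X persisted events
      private var state = CartState()

      override def receiveRecover: Receive = {
        case SnapshotOffer(_, s: CartState) => state = s                // start from the latest snapshot...
        case e: ItemAdded                   => state = state.updated(e) // ...then replay only the newer events
      }

      override def receiveCommand: Receive = {
        case e: ItemAdded =>
          persist(e) { ev =>
            state = state.updated(ev)
            if (lastSequenceNr % snapshotEvery == 0) saveSnapshot(state)
          }
        case SaveSnapshotSuccess(_) => // optionally delete older events/snapshots here
      }
    }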

Must MSMQ queues be transactional?

I've just recently gotten into using Rebus, and noticed that it always creates transactional MSMQ queues, resulting in heavy traffic to the HDD (0.5-5 MB/sec). Is this intentional, and can something be done to avoid it?
It's correctly observed that Rebus (with its default MsmqMessageQueue transport) always creates transactional MSMQ queues. It will also refuse to work with non-transactional input queues, throwing an error at startup if you created a non-transactional queue yourself and attempt to use it.
This is because the philosophy in Rebus revolves around the idea that messages are important data, just as important as the data in your SQL Server or whichever database you're using.
And yes, the way MSMQ implements durability is that messages are written to disk when they're sent, so that probably explains the disk activity you're seeing.
If you have a different opinion as to how you'd like your system to treat its messages, there's nothing that prevents you from replacing Rebus' transport with something that can work with non-transactional MSMQ. Keep in mind though, that all of Rebus' delivery guarantees will be void if you do so ;)
We made the very same observation; the annoying part is that we see 300-500 KB/sec of disk writes even when there are no messages on the queue. It seems that merely polling the queue causes constant writes to disk.
Gian Maria.

Akka Slick and ThreadLocal

I'm using Slick to store data in a database, and I use the threadLocalSession to store the sessions.
The repositories handle the CRUD operations, and I have an Akka service layer that accesses the Slick repositories.
I found this link, where Adam Gent asks something close to what I'm asking here: Akka and Java libraries that use ThreadLocals
My concern is how Akka processes messages: since I store the database session in a ThreadLocal, can two messages be processed at the same time on the same thread?
Let's say two add-user messages (A and B) are sent to the user service, message A is partially processed and then paused, and message B starts being processed on the same thread that message A started on - which session will that thread's ThreadLocal hold?
Each actor processes its messages one at a time, in the order it received them*. Therefore, if you send messages A, B to the same actor, then they are never processed concurrently (of course the situation is different if you send each of the messages to different actors).
The problem with the use of ThreadLocals is that in general it is not guaranteed that an actor processes each of its messages on the same thread.
So if you send a message M1 and then a message M2 to actor A, it is guaranteed that M1 is processed before M2. What is not guaranteed is that M2 is processed on the same thread as M1.
In general, you should avoid using ThreadLocals, as the whole point of actors is that they are a unit of consistency, and you are safe to modify their internal state via message passing. If you really need more control on the threads which execute the processing of messages, look into the documentation of dispatchers: http://doc.akka.io/docs/akka/2.1.0/java/dispatchers.html
*Except if you change their mailbox implementation, but that's a non-default behavior.
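To illustrate the dispatcher point above: the only standard setup in which an actor keeps a dedicated thread (and thus where ThreadLocal state reliably stays with it) is a pinned dispatcher. This is a sketch with a hypothetical dispatcher name; the config keys themselves are standard Akka:

    import akka.actor.{ Actor, ActorSystem, Props }
    import com.typesafe.config.ConfigFactory

    // Prints which dispatcher thread handles each message.
    class ThreadReporter extends Actor {
      def receive: Receive = {
        case msg => println(s"$msg handled on ${Thread.currentThread().getName}")
      }
    }

    object DispatcherDemo extends App {
      // A PinnedDispatcher dedicates a single thread to each actor that uses it.
      val config = ConfigFactory.parseString(
        """
        pinned-dispatcher {
          type = PinnedDispatcher
          executor = thread-pool-executor
        }
        """).withFallback(ConfigFactory.load())

      val system = ActorSystem("demo", config)

      val default = system.actorOf(Props(new ThreadReporter), "default")
      val pinned  = system.actorOf(Props(new ThreadReporter).withDispatcher("pinned-dispatcher"), "pinned")

      // The default actor may hop between dispatcher threads over time;
      // the pinned one always runs on its own dedicated thread.
      (1 to 5).foreach { i => default ! s"default-$i"; pinned ! s"pinned-$i" }
    }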