Akka Persistence – Force replay - scala

My application needs to log all messages processed by an actor and replay messages between minSequenceNr and maxSequenceNr sometimes.
Is akka-persistence a good for this use case? If yes, How can I force replay messages from journal? I can use Persistence(actorSystem).journalFor("x") to get a journal's ActorRef but I can't send JournalProtocol.ReplayMessages to it because JournalProtocol is private for akka.persistence.

This question was asked and answered on akka-user already: https://groups.google.com/forum/#!topic/akka-user/AJjdIt_bztM
On Akka 2.3.x (very old version)
Have you read the docs about recovery http://doc.akka.io/docs/akka/2.3.4/scala/persistence.html#recovery ?
You can start recovery by sending an Recover(toSequenceNr: Long) message to yourself.
We do not support ranged (as in “from 200 to 400”) playback, skipping events (the “from N”) does not match eventsourcing philosophy very well.
On the other hand, you can easily issue an replay “to 400”, and simply in your actor choose to ignore any event with seqNr lower than 200,
which achieves the same end result you’re after.
On Akka 2.4.x
Akka Persistence since entering the stable release in 2.4 disallows randomly replaying in the middle of your lifetime. We found it caused more bugs than benefit to people. Please read http://doc.akka.io/docs/akka/2.4.5/scala/persistence.html
I hope this helps, happy hakking!

Related

Handling dead actors in Akka Typed

I have some actors that kill themselves when idle or other system constraints require them to. The actors that have ActorRefs to them are watching for their Terminated(ref), but there is a race condition of messages meant for the actors being sent before the termination arrives and I'm trying to figure out a clean way to handle that.
I was considering subscribing to DeadLetter and using that to signal the sender that their ref is stale and that they need to get or spawn a new target ActorRef.
However, in Akka Typed, I cannot find any way to get to dead letters other than using the untyped co-existence path, so I figure I'm likely approaching this wrong.
Is there a better pattern for dealing dead downstream refs and re-directing messages to a new downstream refs, short of requiring some kind of ack hand-shake for every message?
Consider dead letters as a debugging tool rather something to use to implement delivery guarantees with (true for both Akka typed and untyped).
If an actor needs to be certain that a message was delivered the message protocol will need to include an an ack. To do resending the actor will also need to keep a buffer for in-flight/not yet acknowledged messages to be able to resend.
We have some ideas on an abstraction for different levels of reliability for message delivery, we'll see if that fits in Akka 2.6 or happens later though, prototyped in: https://github.com/akka/akka/pull/25099

How do I keep the RDMS and Kafka in sync?

We want to introduce a Kafka Event Bus which will contain some events like EntityCreated or EntityModified into our application so other parts of our system can consume from it. The main application uses an RDMS (i.e. postgres) under the hood to store the entities and their relationship.
Now the issue is how you make sure that you only send out EntityCreated events on Kafka if you successfully saved to the RDMS. If you don't make sure that this is the case, you end up with inconsistencies on the consumers.
I saw three solutions, of which none is convincing:
Don't care: Very dangerous, there can be something going wrong when inserting into an RDMS.
When saving the entity, also save the message which should be sent into a own table. Then have a separate process which consumes from this table and publishes to Kafka and after a success deleted from this table. This is quiet complex to implement and also looks like an anti-pattern.
Insert into the RDMS, keep the (SQL-) Transaction open until you wrote successfully to Kafka and only then commit. The problem is that you potentially keep the RDMS transaction open for some time. Don't know how big the problem is.
Do real CQRS which means that you don't save at all to the RDMS but construct the RDMS out of the Kafka queue. That seems like the ideal way but is difficult to retrofit to a service. Also there are problems with inconsistencies due to latencies.
I had difficulties finding good solutions on the internet.
Maybe this question is to broad, feel free to point me somewhere it fits better.
When saving the entity, also save the message which should be sent into a own table. Then have a separate process which consumes from this table and publishes to Kafka and after a success deleted from this table. This is quiet complex to implement and also looks like an anti-pattern.
This is, in fact, the solution described by Udi Dahan in his talk: Reliable Messaging without Distributed Transactions. It's actually pretty close to a "best practice"; so it may be worth exploring why you think it is an anti-pattern.
Do real CQRS which means that you don't save at all to the RDMS but construct the RDMS out of the Kafka queue.
Noooo! That's where the monster is hiding! (see below).
If you were doing "real CQRS", your primary use case would be that your writers make events durable in your book of record, and the consumers would periodically poll for updates. Think "Atom Feed", with the additional constraint that the entries, and the order of entries, is immutable; you can share events, and pages of events; cache invalidation isn't a concern because, since the state doesn't change, the event representations are valid "forever".
This also has the benefit that your consumers don't need to worry about message ordering; the consumers are reading documents of well ordered events with pointers to the prior and subsequent documents.
Furthermore, you've additionally gotten a solution to a versioning story: rather than broadcasting N different representations of the same event, you send out one representation, and then negotiate the content when the consumer polls you.
Now, polling does have latency issues; you can reduce the latency by broadcasting an announcement of the update, and notifying the consumers that new events are available.
If you want to reduce the rate of false polling (waking up a consumer for an event that they don't care about), then you can start adding more information into the notification, so that the consumer can judge whether to pull an update.
Notice that "wake up and maybe poll" is a process that is triggered by a single event in isolation. "Wake up and poll just this message" is another variation on the same idea. We broadcast a thin version of EmailDeliveryScheduled; and the service responsible for that calls back to ask for the email/an enhanced version of the event with the details needed to construct the email.
These are specializations of "wake up and consume the notification". If you have a use case where you can't afford the additional latency required to poll, you can use the state in the representation of the isolated event.
But trying to reproduce an ordered sequence of events when that information is already exposed as a sharable, cacheable document... That's a pretty unusual use case right there. I wouldn't worry about it as a general problem to solve -- my guess is that these cases are rare, and not easily generalized.
Note that all of the above is about messaging, not about Kafka. Notice that messaging and event sourcing are documented as different use cases. Jay Kreps wrote (2013)
I use the term "log" here instead of "messaging system" or "pub sub" because it is a lot more specific about semantics and a much closer description of what you need in a practical implementation to support data replication.
You can think of the log as acting as a kind of messaging system with durability guarantees and strong ordering semantics
The book of record should be the sole authority for the order of event messages. Any consumer that cares about order should be reading ordered documents from the book of record, rather than reading unordered documents and reconstructing the order.
In your current design....
Now the issue is how you make sure that you only send out EntityCreated events on Kafka if you successfully saved to the RDMS.
If the RDBMS is the book of record (the source of "truth"), then the Kafka log isn't (yet).
You can get there from here, over a number of gentle steps; roughly, you add events into the existing database, you read from the existing database to write into kafka's log; you use kafka's log as a (time delayed) source of truth to build a replica of the existing RDBMS, you migrate your read use cases to the replica, you migrate your write use cases to kafka, and you decommission the legacy database.
Kafka's log may or may not be the book of record you want. Greg Young has been developing Get Event Store for quite some time, and has enumerated some of the tradeoffs (2016). Horses for courses - I wouldn't expect it to be too difficult to switch the log from one of these to the other with a well written code base, but I can't speak at all to the additional coupling that might occur.
There is no perfect way to do this if your requirement is look SQL & kafka as a single node. So the question should be: "What bad things(power failure, hardware failure) I can afford if it happen? What the changes(programming, architecture) I can take if it must apply to my applications?"
For those points you mentioned:
What if the node fail after insert to kafka before delete from sql?
What if the node fail after insert to kafka before commit the sql transaction?
What if the node fail after insert to sql before commit the kafka offset?
All of them will facing the risk of data inconsistency(4 is slightly better if the data insert to sql can not success more than once such as they has a non database generated pk).
From the viewpoint of changes, 3 is smallest, however, it will decrease sql throughput. 4 is biggest due to your business logic model will facing two kinds of database when you coding(write to kafka by a data encoder, read from sql by sql sentence), it has more coupling than others.
So the choice is depend on what your business is. There is no generic way.

Scala Replacement for Akka Transactors

In version 2.3 Akka dropped support for transactors.
A quote from Akka's 2.0 documentation about transactors:
When you really need composable message flows across many actors updating their internal local state but need them to do that atomically in one big transaction. Might not be often but when you do need this then you are screwed without it.
What is the best way to manage STM transactions across multiple actors in the absence of transactors?
Is it a bad idea using STM in the first place?

akka persistence or not

Current application uses Akka eventstream and its publish/subscribe for a use case which imports a lot of data and upon receiving data for each row it publishes and event and there is an subscriber to it. this design is running into risk of losing events if something goes wrong with either publisher/subscriber as such.
I am wondering if using Akka persistence makes sense here, for a few reasons
1)Persist events
2)Audit history
3)Recreate scenario with snapshot
note there isn't a shared/global state (generally described as a use case in almost all Akka persistence blogs/examples) in the system.
Does Akka persistence make sense here?
If I understand your scenario correctly, I'd say no for 1), yes for 2), no for 3):
1) If the message is lost due to a problem with the pub/sub mediator (which you don't really control), it will never reach your persistent actors and therefore will never be saved in the event stream, thus never replayed.
2) Recorded message will be lookable upon during audit.
3) If your actors are stateless processors, what scenario are you going to recreate/save in the snapshot?
I'd suggest you can work around 1 by using a confirmation/retry mechanism in which you resend the message at regular intervals until you receive an ack from the consumer.

How to get Acknowledgement from Kafka

How to I exactly get the acknowledgement from Kafka once the message is consumed or processed. Might sound stupid but is there any way to know the start and end offset of that message for which the ack has been received ?
What I found so far is in 0.8 they have introduced the following way to choose from the offset for reading ..
kafka.api.OffsetRequest.EarliestTime() finds the beginning of the data in the logs and starts streaming from there, kafka.api.OffsetRequest.LatestTime() will only stream new messages.
example code
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
Still not sure about the acknowledgement part
Kafka isn't really structured to do this. To understand why, review the design documentation here.
In order to provide an exactly-once acknowledgement, you would need to create some external tracking system for your application, where you explicitly write acknowledgements and implement locks over the transaction id's in order to ensure things are only ever processed once. The computational cost of implementing such as system is extraordinarily high, and is one of the main reasons that large transactional systems require comparatively exotic hardware and have arguably lower scalability than systems such as Kafka.
If you do not require strong durability semantics, you can use the groups API to keep rough track of when the last message was read. This ensures that every message is read at least once. Note that since the groups API does not provide you the ability to explicitly track your applications own processing logic, that your actual processing guarantees are fairly weak in this scenario. Schemes that rely on idempotent processing are common in this environment.
Alternatively, you may use the poorly-named SimpleConsumer API (it is quite complex to use), which enables you to explicitly track timestamps within your application. This is the highest level of processing guarantee that can be achieved through the native Kafka API's since it enables you to track your applications own processing of the data that is read from the queue.