How do I warm up an actor's state from database when starting up? - persistence

My requirement is to start a long-running process that tags all the products that have expired. It runs every night at 1:00 AM. Customers may be accessing some of the products on the website, so those products already have actor instances around the time the job runs. The others live only in the persistent store and have no instances yet, because no customers are accessing them.
Where should I hook up the logic that reads the latest state of an actor from the persistent store and creates a brand new actor? Should I make that call in the PreStart override? If so, how can I tell the ProductActor that a new actor is being created?
Or should I send the ProductActor a message like LoadMeFromAzureTable, which loads the state from the persistent store after the actor is created?

There are different ways to do it depending on what you need, as opposed to there being precisely one "right" answer.
You could use a Persistent Actor to recover state from a durable store automatically on startup (or to recover after a crash). Or, if you don't want to use that module (still in beta as of July 2015), you could do it yourself in one of two ways:
1) Load your state in PreStart. I'd only go with this if you can make the operation async via your database client and use the PipeTo pattern to send the results back to yourself incrementally (a rough sketch of this appears after the FSM example below). But if you need ALL the state resident in memory before you start doing work, then you need to...
2) Make a finite state machine using behavior switching. Start in a gated state, send yourself a message to load your data, and stash everything that comes in. Then switch to a receiving state and unstash all messages when your state is done loading. This is the approach I prefer.
Example (just mocking the DB load with a Task):
public class ProductActor : ReceiveActor, IWithUnboundedStash
{
    public IStash Stash { get; set; }

    public ProductActor()
    {
        // begin in gated state
        BecomeLoading();
    }

    private void BecomeLoading()
    {
        Become(Loading);
        LoadInitialState();
    }

    private void Loading()
    {
        Receive<DoneLoading>(done =>
        {
            BecomeReady();
        });

        // stash any messages that come in until we're done loading
        ReceiveAny(o =>
        {
            Stash.Stash();
        });
    }

    private void LoadInitialState()
    {
        // load your state here async & send back to self via PipeTo
        Task.Run(() =>
        {
            // database loading task here
            return new Object();
        }).ContinueWith(tr =>
        {
            // do whatever (e.g. error handling)
            return new DoneLoading();
        }).PipeTo(Self);
    }

    private void BecomeReady()
    {
        Become(Ready);
        // our state is ready! put all those stashed messages back in the mailbox
        Stash.UnstashAll();
    }

    private void Ready()
    {
        // handle those unstashed + new messages...
        ReceiveAny(o =>
        {
            // do whatever you need to do...
        });
    }
}

/// <summary>
/// Marker message type indicating that the initial load has finished.
/// </summary>
public class DoneLoading {}
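For completeness, here is a minimal sketch of option 1 (the names ProductActorPreStart, InitialState and LoadStateFromAzureTableAsync are placeholders, not part of the answer above): kick off the asynchronous load in PreStart and PipeTo the result back to Self. Note that nothing is gated here, so any messages that arrive before the state does must still be handled or deliberately ignored.
public class ProductActorPreStart : ReceiveActor
{
    public ProductActorPreStart()
    {
        Receive<InitialState>(state =>
        {
            // apply the loaded state here, then begin normal processing
        });
    }

    protected override void PreStart()
    {
        // hypothetical async read from the persistent store; the Task result
        // is piped back to this actor as an InitialState message
        LoadStateFromAzureTableAsync().PipeTo(Self);
    }

    private Task<InitialState> LoadStateFromAzureTableAsync()
    {
        // stand-in for a real async database call
        return Task.FromResult(new InitialState());
    }
}

public class InitialState { }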

Related

Kafka Streams - Transformers with State in Fields and Task / Threading Model

I have a Transformer with a state store that uses punctuate to operate on said state store.
After a few iterations of punctuate, the operation may have finished, so I'd like to cancel the punctuate -- but only for the Task that has actually finished the operation on the partition's respective state store. The punctuate operations for the Tasks that are not done yet should keep running. To that purpose my transformer keeps a reference to the Cancellable returned by schedule().
As far as I can tell, every Task gets its own isolated Transformer instance, and every Task gets its own isolated scheduled punctuate() within that instance. Is that correct?
However, since this is effectively state, but not inside a stateStore, I'm not sure how safe this is. For instance, are there certain scenarios in which one transformer instance might be shared across tasks (and therefore absolutely no state must be kept outside of StateStores)?
public class CoolTransformer implements Transformer {

    private KeyValueStore stateStore;
    private Cancellable taskPunctuate; // <----- Will this lead to conflicts between tasks?

    @Override
    public void init(ProcessorContext context) {
        this.stateStore = context.getStateStore(...);
        this.taskPunctuate = context.schedule(Duration.ofMillis(...), PunctuationType.WALL_CLOCK_TIME, this::scheduledOperation);
    }

    private void scheduledOperation(long timestamp) {
        stateStore.get(...);
        // do stuff...
        if (done) {
            this.taskPunctuate.cancel(); // <----- Will this lead to conflicts between tasks?
        }
    }

    @Override
    public KeyValue transform(Object key, Object value) {
        // do stuff
        stateStore.put(key, value);
        return null; // or return a transformed KeyValue
    }

    @Override
    public void close() {
        taskPunctuate.cancel();
    }
}
You might want to look at TransformerSupplier, specifically TransformerSupplier#get(): the supplier must return a new Transformer each time it is called, which ensures each task gets its own instance when they should be kept independent (see the sketch below). Transformer instances should also not share objects, so be careful about this with your Cancellable taskPunctuate. If either of these rules is violated you will typically see errors like org.apache.kafka.streams.errors.StreamsException: Current node is unknown, ConcurrentModificationException or InstanceAlreadyExistsException.
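A minimal sketch of such a supplier (the class name is illustrative); the important point is that get() constructs a fresh Transformer on every call, so each task ends up with its own state store reference and its own Cancellable:
public class CoolTransformerSupplier implements TransformerSupplier {

    @Override
    public Transformer get() {
        // never hand out a cached or shared instance here
        return new CoolTransformer();
    }
}
You would then pass the supplier itself to stream.transform(...) rather than a pre-built Transformer instance.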

Samza: Delay processing of messages until timestamp

I'm processing messages from a Kafka topic with Samza. Some of the messages come with a timestamp in the future and I'd like to postpone the processing until after that timestamp. In the meantime, I'd like to keep processing other incoming messages.
What I tried to do is have my Task queue the messages and implement WindowableTask to periodically check whether their timestamps allow processing them. The basic idea looks like this:
public class MyTask implements StreamTask, WindowableTask {

    private HashSet<MyMessage> waitingMessages = new HashSet<>();

    @Override
    public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector, TaskCoordinator taskCoordinator) {
        byte[] message = (byte[]) incomingMessageEnvelope.getMessage();
        MyMessage parsedMessage = MyMessage.parseFrom(message);
        if (parsedMessage.getValidFromDateTime().isBeforeNow()) {
            // Do the processing
        } else {
            waitingMessages.add(parsedMessage);
        }
    }

    @Override
    public void window(MessageCollector messageCollector, TaskCoordinator taskCoordinator) {
        for (MyMessage message : waitingMessages) {
            if (message.getValidFromDateTime().isBeforeNow()) {
                // Do the processing and remove the message from the set
            }
        }
    }
}
This obviously has some downsides. I'd be losing my waiting messages in memory when I redeploy my task. So I'd like to know the best practice for delaying the processing of messages with Samza. Do I need to reemit the messages to the same topic again and again until I can finally process them? We're talking about delaying the processing for a few minutes up to 1-2 hours here.
Something important to keep in mind when dealing with message queues is that they perform a very specific function in a system: they hold messages while the processor(s) are busy processing preceding messages. It is expected that a properly functioning message queue will deliver messages on demand, which implies that as soon as a message reaches the head of the queue, the next pull on the queue will yield that message.
Notice that delay is not a configurable part of the equation. Instead, delay is an output variable of a system with a queue. In fact, Little's Law offers some interesting insights into this.
So, in a system where a delay is necessary (for example, to join/wait for a parallel operation to complete), you should be looking at other methods. Typically a queryable database would make sense in this particular instance. If you find yourself keeping messages in a queue for a pre-set period of time, you're actually using the message queue as a database - a function it was not designed to provide. Not only is this risky, but it also has a high likelihood of hurting the performance of your message broker.
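As a rough, purely illustrative application of Little's Law (L = λW): with an arrival rate of λ = 10 messages/second and an intended delay of W = 2 hours (7200 s), the topic would be holding on average L = λW = 72,000 unprocessed messages at any moment, which is exactly the "queue as database" situation described above.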
I think you could use Samza's key-value store to keep your task instance's state, instead of an in-memory Set.
It should look something like:
public class MyTask implements StreamTask, WindowableTask, InitableTask {

    private KeyValueStore<String, MyMessage> waitingMessages;

    @SuppressWarnings("unchecked")
    @Override
    public void init(Config config, TaskContext context) throws Exception {
        this.waitingMessages = (KeyValueStore<String, MyMessage>) context.getStore("messages-store");
    }

    @Override
    public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector,
                        TaskCoordinator taskCoordinator) {
        byte[] message = (byte[]) incomingMessageEnvelope.getMessage();
        MyMessage parsedMessage = MyMessage.parseFrom(message);
        if (parsedMessage.getValidFromDateTime().isBeforeNow()) {
            // Do the processing
        } else {
            waitingMessages.put(parsedMessage.getId(), parsedMessage);
        }
    }

    @Override
    public void window(MessageCollector messageCollector, TaskCoordinator taskCoordinator) {
        KeyValueIterator<String, MyMessage> all = waitingMessages.all();
        try {
            while (all.hasNext()) {
                MyMessage message = all.next().getValue();
                // Do the processing and delete the message from the store
            }
        } finally {
            all.close(); // the iterator should be closed to release its resources
        }
    }
}
If you redeploy your task, Samza should recreate the state of the key-value store (Samza keeps the values in a special changelog topic in Kafka associated with the store). You do of course need to provide some extra configuration for your store (in the above example, for messages-store); a rough sketch follows after the link below.
You could read about key-value store here (for the latest Samza version):
https://samza.apache.org/learn/documentation/0.14/container/state-management.html
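A rough sketch of what that configuration might look like (the property keys follow the Samza state-management documentation linked above, but the factory class, changelog topic name and serdes are assumptions to be checked against your Samza version; MyMessageSerdeFactory in particular is a hypothetical serde you would implement for MyMessage):
stores.messages-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.messages-store.changelog=kafka.messages-store-changelog
stores.messages-store.key.serde=string
stores.messages-store.msg.serde=mymessage
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
serializers.registry.mymessage.class=com.example.MyMessageSerdeFactory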

Using Reactive Extensions to stream model changes

I am working on a server component which is responsible for caching models in memory and then stream any changes to interested clients.
When the first client requests a model (well, a model key; each model has a key to identify it), the model is created (along with any subscriptions to downstream systems) and then sent to the client, followed by a stream of updates (generated by downstream systems). Any subsequent clients should get this cached (updated) model, again with the stream of updates. When the last client unsubscribes from the model, the downstream subscriptions should be destroyed and the cached model destroyed.
Could anyone point me in the right direction as regards how Rx could help here? I guess what isn't clear to me at the moment is how I synchronize the state (of the object) and the stream of changes. Would I have two separate IObservables for the model and the updates?
Update: here's what I have so far:
Model model = null;
return Observable.Create((IObserver<ModelUpdate> observer) =>
{
    model = _modelFactory.GetModel(key);
    _backendThing.Subscribe(model, observer.OnNext);
    return Disposable.Create(() =>
    {
        _backendThing.Unsubscribe(model);
    });
})
.Do((u) => model.MergeUpdate(u))
.Buffer(_bufferLength)
.Select(inp => new ModelEvent(inp))
.Publish()
.RefCount()
.StartWith(new ModelEvent(model));
If I understood the problem correctly, there are Models coming in dynamically, and at any point in your application's lifetime the number of Models is unknown.
For that purpose an IObservable<IEnumerable<Model>> looks like the way to go. Each time a new Model is added or an existing one removed, the updated IEnumerable<Model> would be streamed. Ideally it would preserve the existing objects rather than recreating every Model on each update, unless there is a good reason to do so.
As for updates to each Model object's state, such as a field or property value changing, I would look into Paul Betts' ReactiveUI project; it has something called ReactiveObject. ReactiveObject helps you get change notifications easily, although that library is mainly designed for WPF MVVM applications.
Here is how a Model's state update would go with ReactiveObject
public class Model : ReactiveObject
{
    int _currentPressure;
    public int CurrentPressure
    {
        get { return _currentPressure; }
        set { this.RaiseAndSetIfChanged(ref _currentPressure, value); }
    }
}
Now, anywhere you have a Model object in your application, you can easily get an observable that notifies you of updates to the object's pressure using the WhenAny or WhenAnyValue extension methods, for example:
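A small usage sketch (assuming ReactiveUI is referenced; the subscription body is just illustrative):
var model = new Model();
model.WhenAnyValue(m => m.CurrentPressure)
     .Subscribe(pressure => Console.WriteLine("Pressure changed to " + pressure));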
You could, however, skip ReactiveUI and simply expose a plain IObservable that fires whenever a state change occurs.
Something like this may work, though your requirements aren't exactly clear to me.
private static readonly ConcurrentDictionary<Key, IObservable<Model>> cache = new...

...

public IObservable<Model> GetModel(Key key)
{
    return cache.GetOrAdd(key, CreateModelWithUpdates);
}

private IObservable<Model> CreateModelWithUpdates(Key key)
{
    return Observable.Using(() => new Model(key), model => GetUpdates(model).StartWith(model))
        .Publish((Model)null)
        .RefCount()
        .Where(model => model != null);
}

private IObservable<Model> GetUpdates(Model model) { ... }

...

public class Model : IDisposable
{
    ...
}
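Hypothetical usage (Render is just a placeholder): each client simply subscribes to GetModel; Publish/RefCount shares a single underlying Model per key, and when the last subscription for a key is disposed, Observable.Using disposes the Model and tears down the upstream subscription.
IDisposable subscription = GetModel(key).Subscribe(model => Render(model));
// ... later, when this client is no longer interested:
subscription.Dispose(); // the last disposal for this key destroys the cached model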

How to get OutArguments from a WorkflowApplication while the workflow is waiting for a response (bookmark or idle) and has not completed

How can I access the Out arguments of a WorkflowApplication while the workflow is waiting for a response (bookmark or idle) and has not completed?
I also used Tracking to retrieve the values, but instead of saving them to a database I came up with the following solution.
Make a TrackingParticipant and collect the data from an activity.
You can fine-tune the tracking participant's profile with a specific tracking query.
I have added a public property, Outputs, that is set from the data on the record.
public class CustomTrackingParticipant : TrackingParticipant
{
    //TODO: Fine tune the profile with the correct query.
    public IDictionary<String, object> Outputs { get; set; }

    protected override void Track(TrackingRecord record, TimeSpan timeout)
    {
        if (record != null)
        {
            if (record is CustomTrackingRecord)
            {
                var customTrackingRecord = record as CustomTrackingRecord;
                Outputs = customTrackingRecord.Data;
            }
        }
    }
}
In your custom activity you can set the values you want to expose for tracking with a CustomTrackingRecord.
Here is a sample to give you an idea.
protected override void Execute(NativeActivityContext context)
{
    var customRecord = new CustomTrackingRecord("QuestionActivityRecord");
    customRecord.Data.Add("Question", Question.Get(context));
    customRecord.Data.Add("Answers", Answers.Get(context).ToList());
    context.Track(customRecord);

    //This will create a bookmark with the display name and the workflow will go idle.
    context.CreateBookmark(DisplayName, Callback, BookmarkOptions.None);
}
On the WorkflowApplication instance you can add the tracking participant to the extensions.
workflowApplication.Extensions.Add(new CustomTrackingParticipant());
I subscribed to the PersistableIdle event of the workflowApplication instance with the following method.
In the method I get the tracking participant from the extensions.
Because we have set the outputs in the public property we can access them and set them in a member outside the workflow. See the following example.
private PersistableIdleAction PersistableIdle(WorkflowApplicationIdleEventArgs workflowApplicationIdleEventArgs)
{
    var ex = workflowApplicationIdleEventArgs.GetInstanceExtensions<CustomTrackingParticipant>();
    Outputs = ex.First().Outputs;
    return PersistableIdleAction.Unload;
}
I hope this example helped.
Even simpler: Use another workflow activity to store the value you are looking for somewhere (database, file, ...) before starting to wait for a response!
You could use Tracking.
The required steps would be:
Define a tracking profile which queries ActivityStates with the state Closed (a rough sketch of such a profile follows below)
Implement a TrackingParticipant to save the OutArgument in process memory, a database or a file on disk
Hook everything together
The link contains all the information you will need to do this.
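A rough sketch of such a profile (the types come from System.Activities.Tracking; the profile name is a placeholder, and "*" means "all activities / all arguments"):
var trackingProfile = new TrackingProfile
{
    Name = "OutArgumentProfile", // placeholder name
    Queries =
    {
        new ActivityStateQuery
        {
            ActivityName = "*",                 // or the name of the specific activity
            States = { ActivityStates.Closed }, // only track activities that have closed
            Arguments = { "*" }                 // capture all of the activity's arguments
        }
    }
};

// Attach the participant (with its profile) to the extensions as before.
workflowApplication.Extensions.Add(new CustomTrackingParticipant { TrackingProfile = trackingProfile });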

Java server framework to listen to PostgreSQL NOTIFY statements

I need to write a server which listens to PostgreSQL NOTIFY statements and considers each notification as a request to serve (actually, more like a task to process). My main requirements are:
1) A mechanism to poll on PGConnection (Ideally this would be a listener, but in the PgJDBC implementation, we are required to poll for pending notifications. Reference)
2) Execute a callback based on the "request" (using the channel name in the NOTIFY notification), on a separate thread.
3) Has thread management built in (create/delete threads when a task starts/finishes, queue tasks when too many are being processed concurrently, etc.).
Requirements 1 and 2 are easy for me to implement myself, but I would prefer not to write the thread management myself.
Is there an existing framework that meets these requirements? An added advantage would be if the framework automatically generated request statistics.
To be honest, requirement 3 could probably be satisfied just by using the standard ExecutorService implementations from Executors, which allow you to, for example, get a fixed-size thread pool and submit work to it in the form of Runnable or Callable implementations. They deal with the gory details of creating threads up to the limit, etc. You can then have your listener wrap each notification in a thin Runnable layer that also collects statistics.
Something like:
private final ExecutorService threadPool = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
private final NotificationCallback callback;
private int waiting, executing, succeeded, failed;

public void pollAndDispatch() {
    Notification notification;
    while ((notification = pollDatabase()) != null) {
        final Notification ourNotification = notification;
        incrementWaitingCount();
        threadPool.submit(new Runnable() {
            public void run() {
                waitingToExecuting();
                try {
                    callback.processNotification(ourNotification);
                    executionCompleted();
                } catch (Exception e) {
                    executionFailed();
                    LOG.error("Exception thrown while processing notification: " + ourNotification, e);
                }
            }
        });
    }
}

// check PGconn for a notification and return it, or null if none received
protected Notification pollDatabase() { ... }

// maintain statistics
private synchronized void incrementWaitingCount() { ++waiting; }
private synchronized void waitingToExecuting() { --waiting; ++executing; }
private synchronized void executionCompleted() { --executing; ++succeeded; }
private synchronized void executionFailed() { --executing; ++failed; }
If you want to be fancy, put the notifications onto a JMS queue and use its infrastructure to listen for new items and process them.
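For requirement 1, here is one possible sketch of what pollDatabase() could delegate to using PgJDBC. PGConnection, PGNotification and getNotifications() are the actual PgJDBC API; the class name, the channel handling and the "SELECT 1" poke (a common way to make the driver read any pending messages) are illustrative:
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class NotificationPoller {
    private final Connection connection;      // must stay open; LISTEN is per-connection
    private final PGConnection pgConnection;

    public NotificationPoller(Connection connection, String channel) throws SQLException {
        this.connection = connection;
        this.pgConnection = connection.unwrap(PGConnection.class);
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("LISTEN " + channel); // subscribe this connection to the channel
        }
    }

    // Returns all pending notifications, or null if none have arrived yet.
    public PGNotification[] poll() throws SQLException {
        // Issue a lightweight query so the driver processes any incoming messages.
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("SELECT 1");
        }
        return pgConnection.getNotifications();
    }
}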