How to implement "trigger" for redis datastore? - triggers

I have a program that polls a certain key in the Redis datastore and does something when the value satisfies a certain condition.
However, periodically polling Redis seems quite inefficient, so I'm wondering whether there is a "trigger" mechanism for Redis: when the value changes and satisfies the condition, the trigger is invoked. The trigger might be an RPC call, an HTTP message, or something else, so that I no longer need to poll, much like the difference between polling and interrupts.
Is this possible?

You can use the Pub/Sub feature of Redis. It's exactly what you need for the circumstances you describe.
Essentially, you SUBSCRIBE to a "channel", and the other part of your application writes (PUBLISH) the changed value to that channel. Your subscriber (the consumer, i.e. the client that wants to know about the change) gets notified in virtually real time.
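As a minimal sketch with a Java client such as Jedis (the channel name "value-changes", host, and port are illustrative assumptions), the subscriber side could look like this:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class ConditionWatcher {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Blocks this thread and invokes onMessage for every published value.
            jedis.subscribe(new JedisPubSub() {
                @Override
                public void onMessage(String channel, String message) {
                    // "message" is the value the publisher sent; check your condition here.
                    System.out.println("Received on " + channel + ": " + message);
                }
            }, "value-changes"); // hypothetical channel name
        }
    }
}

The writing side of your application would then call jedis.publish("value-changes", newValue) whenever it changes the value.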

Since Redis 2.8 (released 22 Nov 2013), there is now a feature called Keyspace Notifications which lets clients subscribe to special Pub/Sub channels for keyspace events, which you can use as a trigger on a certain key.
The feature is disabled by default because "while not very sensible the feature uses some CPU power." To enable it, use the CONFIG SET command. For example, the following command enables keyspace events for String commands:
> CONFIG SET notify-keyspace-events K$
OK
Next, use the regular pubsub SUBSCRIBE command to subscribe to the specially-named channel. For example, to listen to keyspace events on the mykey key in DB 0:
> SUBSCRIBE __keyspace@0__:mykey
Reading messages... (press Ctrl-C to quit)
Test out the feature by setting the key's value from another client:
> SET mykey myvalue
OK
You should receive a message in the subscribed client:
1) "message"
2) "__keyspace#0__:mykey"
3) "set"
After receiving the event, you can fetch the updated value and see if it satisfies the condition in your application code.
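From application code, the same flow might look like this sketch with Jedis (the key mykey, DB 0, and the K$ flag follow the CLI example above; the condition check and connection details are assumptions):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class KeyspaceWatcher {
    public static void main(String[] args) {
        // A second connection is needed for reads: a subscribed connection
        // cannot issue regular commands like GET.
        Jedis reader = new Jedis("localhost", 6379);
        Jedis subscriber = new Jedis("localhost", 6379);

        subscriber.configSet("notify-keyspace-events", "K$"); // same as the CONFIG SET above

        subscriber.subscribe(new JedisPubSub() {
            @Override
            public void onMessage(String channel, String event) {
                if ("set".equals(event)) {
                    String value = reader.get("mykey");    // fetch the updated value
                    if ("myvalue".equals(value)) {         // your condition goes here
                        System.out.println("Condition met: " + value);
                    }
                }
            }
        }, "__keyspace@0__:mykey"); // blocks and dispatches keyspace events
    }
}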

If you can use Pub/Sub, that's best. If for some reason that doesn't work, you could also use the (performance-impacting) MONITOR command, which will send you every command the server receives. That's probably not a good idea.
Specifically for lists, you have BLPOP, which will block the connection until a new item is available to pop from a list.
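A BLPOP-based consumer, again sketched with Jedis (the queue name "work-queue" is made up):

import java.util.List;
import redis.clients.jedis.Jedis;

public class QueueWorker {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            while (true) {
                // Blocks until something is LPUSH/RPUSHed to "work-queue"
                // (timeout 0 = wait forever). Returns [key, value].
                List<String> popped = jedis.blpop(0, "work-queue");
                System.out.println("Got item: " + popped.get(1));
            }
        }
    }
}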

What about using a message box to deal with this? For example, two messages (an AND operation) could trigger another message; I think that could be useful. Something like jBPM, but not as complex as that.

Related

How do I make sure that I process at most one message at a time?

I am wondering how to process one message at a time using Google's Pub/Sub functionality in Go. I am using the official library for this, https://pkg.go.dev/cloud.google.com/go/pubsub#section-readme. The event is consumed by a service that runs with multiple instances, so any in-memory locking mechanism will not work.
I realise that it's an anti-pattern to do this, so let me explain my use case. Using MongoDB, I store an array of objects as an embedded document for each entity. The event being published modifies parts of this array and saves it. If I receive more than one event at a time and they start processing at exactly the same time, one of the saves will overwrite the other. So I was thinking that a solution for this is to make sure that only one message is processed at a time, and it would be nice to use any built-in functionality in Cloud Pub/Sub to do so. Otherwise I was thinking of implementing some locking mechanism in the DB, but I'd like to avoid that.
Any help would be appreciated.
You can consider two things:
You can use an ordering key in Pub/Sub. That way, all the messages related to the same object will be delivered in order, one by one (see the sketch after this list).
You can use a push subscription in Pub/Sub, to push to Cloud Run or Cloud Functions. With Cloud Run, set the concurrency to 1 (that's the default with Cloud Functions gen1), and set the max instances to 1 as well. That way you can process only one message at a time; all the other messages will be rejected (HTTP 429 error code) and requeued by Pub/Sub. The problem is that you can no longer parallelize the processing as you can with an ordering key.
A similar thing, and simpler to implement, is to use Cloud Tasks instead of Pub/Sub. With Cloud Tasks you can set a rate limit on a queue, and set maxConcurrentDispatches to 1 (and you don't have to do the same with Cloud Functions max instances or Cloud Run max instances and concurrency).
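The question uses the Go client, but the ordering key is a per-message field that works the same way across clients; here is a rough sketch with the Java Pub/Sub client (the project, topic, and key names are made up, and the subscription must also be created with message ordering enabled):

import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;

public class OrderedPublisher {
    public static void main(String[] args) throws Exception {
        Publisher publisher = Publisher.newBuilder(TopicName.of("my-project", "entity-updates"))
                .setEnableMessageOrdering(true)   // required for ordering keys to take effect
                .build();

        PubsubMessage message = PubsubMessage.newBuilder()
                .setData(ByteString.copyFromUtf8("update array of entity 42"))
                .setOrderingKey("entity-42")      // messages with this key are delivered in order
                .build();

        publisher.publish(message).get();         // wait for the publish to complete
        publisher.shutdown();
    }
}

Messages that share an ordering key are delivered to the subscriber one at a time and in order, while messages with different keys can still be processed in parallel.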

External processing using Kafka Streams

There are several questions regarding message enrichment using external data, and the recommendation is almost always the same: ingest external data using Kafka Connect and then join the records using state stores. Although it fits in most cases, there are several other use cases in which it does not, such as IP to location and user agent detection, to name a few.
Enriching a message with an IP-based location usually requires a lookup by a range of IPs, but currently there is no built-in state store that provides such a capability. For user-agent analysis, if you rely on a third-party service, you have no choice other than to perform external calls.
We spent some time thinking about it and came up with the idea of implementing a custom state store on top of a database that supports range queries, like Postgres. We could also abstract an external HTTP or gRPC service behind a state store, but we're not sure whether that is the right way.
In that sense, what is the recommended approach when you cannot avoid querying an external service during stream processing but still must guarantee fault tolerance? What happens when an error occurs while the state store is retrieving data (a request fails, for instance)? Does Kafka Streams retry processing the message?
Generally, KeyValueStore#range(fromKey, toKey) is supported by the built-in stores, so it would be good to understand how the range queries you are trying to do would be done. Also note that, internally, everything is stored as byte[] arrays, and RocksDB (the default storage engine) sorts data accordingly; hence, you can actually implement quite sophisticated range queries if you start to reason about the byte layout and pass corresponding "prefix keys" into #range().
If you really need to call an external service, you have "two" options to avoid losing data. The first is: if an external call fails, throw an exception and let Kafka Streams die. This is obviously not a real option. However, if you instead swallow the error from the external lookup, you would "skip" the input message and it would remain unprocessed. Kafka Streams cannot know that processing "failed" (it does not know what your code does) and will not "retry", but will consider the message completed (similar to what happens if you filter it out).
Hence, to make it work, if the external call fails you would need to put all the data you use to trigger the lookup into a state store and retry later (i.e., do a lookup into the store to find unprocessed data and retry). This retry can either be a "side task" when you process the next input message, or you can schedule a punctuation to implement the retry (see the sketch below). Note that this mechanism changes the order in which records are processed, which might or might not be OK for your use case.
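A rough sketch of the punctuation-based retry, using the Kafka Streams Processor API (the store name "pending-lookups", the ExternalLookup client, and the 30-second retry interval are assumptions; the store must be registered in the topology and connected to this processor):

import java.time.Duration;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class EnrichingProcessor implements Processor<String, String, String, String> {

    /** Hypothetical external service client; lookup() is assumed to throw on failure. */
    public interface ExternalLookup {
        String lookup(String value) throws Exception;
    }

    private final ExternalLookup externalLookup;
    private ProcessorContext<String, String> context;
    private KeyValueStore<String, String> pending;

    public EnrichingProcessor(ExternalLookup externalLookup) {
        this.externalLookup = externalLookup;
    }

    @Override
    public void init(ProcessorContext<String, String> context) {
        this.context = context;
        this.pending = context.getStateStore("pending-lookups");
        // Retry parked records every 30 seconds of wall-clock time.
        context.schedule(Duration.ofSeconds(30), PunctuationType.WALL_CLOCK_TIME, ts -> retryPending());
    }

    @Override
    public void process(Record<String, String> record) {
        try {
            context.forward(record.withValue(externalLookup.lookup(record.value())));
        } catch (Exception e) {
            // External call failed: park the record instead of dropping it.
            pending.put(record.key(), record.value());
        }
    }

    private void retryPending() {
        try (KeyValueIterator<String, String> it = pending.all()) {
            while (it.hasNext()) {
                var entry = it.next();
                try {
                    String enriched = externalLookup.lookup(entry.value);
                    context.forward(new Record<>(entry.key, enriched, System.currentTimeMillis()));
                    pending.delete(entry.key);
                } catch (Exception e) {
                    // Still failing; leave it in the store for the next punctuation.
                }
            }
        }
    }
}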

Batching the send operation on outbound adapter in Spring Integration

I have an outbound channel adapter (in this case SFTP, but it would be the same for JMS or WS) at the end of a Spring Integration flow. Because direct channels are used, every time a message flows through, it is sent out synchronously.
Now I need to process messages all the way until they reach the outbound adapter, but wait for a predetermined interval before sending them out. In other words, batch the send operation.
I know the Spring Batch project might offer a solution to this, but I need to find a solution with Spring Integration components (in the int-* namespaces).
What would be a typical pattern to achieve this?
The Aggregator pattern is what you need.
In your particular case I'd call it something like a window, because you don't have any specific correlation to group messages by, but just need to build a batch, as you call it.
So I think your aggregator config may look like this:
<int:aggregator input-channel="input" output-channel="output"
                correlation-strategy-expression="1"
                release-strategy-expression="size() == 10"
                expire-groups-upon-completion="true"
                send-partial-result-on-expiry="true"/>
correlation-strategy-expression="1" means group any incoming messages
release-strategy-expression="size() == 10" allows to form and release batches by 10 messages
expire-groups-upon-completion="true" says to aggregator to remove the releases group from it store. That allow to for a new group for the same correlationKey (1 in our case)
send-partial-result-on-expiry="true" specifies that normal release operation (send to the output-channel) must be done on expire function when we don't have enough messages to build a whole batch (size 10 in our case). For these options, please, follow with documentation mentioned above.

NEventStore: Sagas, Commands and not Losing Them

NEventStore: 5.1
Simple setup: WebApp (Asp.NET 4.5) == command-side
I'm searching for the "right" way for not losing commands, with an eye on sagas/process-managers which maybe would wait endlessly for an event produced from a command that was actually never handled.
Old: Dispatchers
I initially used sync commands, but with an eye on sagas/process managers I thought it would be safer to first store them and then dispatch them through SyncDispatcher (or AsyncDispatcher). Otherwise, and that's my concern, if a saga tried to send a command and the command didn't finish due to an app crash/power loss/..., it would be lost and no one would know.
So I created a command stream and appended each command to it. The IsDispatched flag showed whether that command had already been handled.
That worked.
PollingClient and Command-Stream
Now that the dispatchers are obsolete, I switched to PollingClient. What I lost is the Dispatched information.
A startup issue arose:
I naively started polling from the current latest checkpoint going forward, but when the application restarted there was a chance that commands had been stored but not executed before the crash, and were therefore lost (that actually happened).
I just came across the idea:
store the basic outcome of commands as (non-domain-)events in another stream.
This stream would contain CommandSucceeded and CommandFailed events.
Whenever the application starts, the latest command-id or command-checkpoint-number gets extracted and used to load the commands right after that one...
Questions
Is my concern wrong that sync command handling carries the danger of losing a saga-generated command? If so, why?
Is this generally a good idea: one big command stream?
Is this generally a good idea: store generic command-outcome-events in a stream?
You can:
Store your command in a command queue | persistent log
Use the command id (GUID) as the Commit Id on NEventStore
Mark your command as executed in your Command Handler | Pipeline Hook | Polling Client
NEventStore gives you idempotency on the same AggregateId (stream id) + CommitId, so if your app crashes before the command is marked as processed and you replay your command, the resulting commits are automatically discarded by NES.
AFAIK, NEventStore is meant to be the storage for event sourcing, i.e. storing domain objects as a stream of events. Commands and sagas have nothing to do with it. It's your service bus that should take care of durability and saga management.
Personally, I treat the event store simply as a repository detail. The application service (command handler) dispatches the generated events after they've been persisted.
If the app crashes and the service bus is durable (not an in-memory one), then the event/command will be handled again automatically, because the service bus should detect that a message wasn't successfully handled. Of course, your message handlers should be idempotent for that reason.

How to deal with stale persisted subscriptions?

Let's say I have deployed an NSB endpoint that subscribes to events A, B, and C.
6 months later, version 1.1 of the endpoint adds a handler for event D, but the handler for event B is removed. What is a sensible process for removing the persisted subscription record for event B? I presume there is no automagic way for this to happen, and my choices would be:
Delete the entire contents of the subscription table and restart all endpoints.
Delete selectively based on what I know about the delta
Have some shutdown mode where my subscriber would call Unsubscribe on all its message types on the way down (and therefore would start with a clean slate on the way up)
Has anyone implemented any of these strategies, or am I missing some alternative?
The best solution would probably be option 1. The operational overhead involved in this would be fairly small:
Shut down publisher host
Clear down subscriptions db
Bounce all subscribers
Start up publisher host
Option 3 would also be possible but would involve making an unsubscribe call from every subscriber, which is IMO much higher overhead (plus it would require a redeployment if the unsubscribe call is not already implemented, and then a shutdown to trigger the call).
Option 2 seems a bit hacky but would be the lowest cost, as you can just run a SQL statement against the publisher DB and Bob's your mother's brother.
I would recommend option 1.