Redis Streams data exchange between services - service

I'm working on microservices with Redis Streams.
Basically, the service A is sending jobs to service B using Redis Streams (XADD).
Once service B is receiving a job through Redis Streams, this service execute an HTTP request to an external API, parse the result and XACK the job. During this time service A is waiting for data coming from service B.
What is the right way to send the parsed result of the HTTP request from service B to service A which is waiting for data ?
Possible answers I think about:
Use another Redis Streams (XREAD or XREADGROUP) on service A which is waiting for incomming data sent from B but I need to provide a stream name where to sent data into the message sent from A to B.
Use another Redis data type like lists which can use blocking requests (BLPOP, BRPOP...) but how to know if the data comming into this list is the right message ? Maybe I need to use a unique ID somewhere.
Use pub/sub but I don't really like this solution as there is no history or persistent data.
I hope I was clear enough.

Related

Artemis REST API (through Jolokia) for clearing all messages on all queues

Is there any API that can help me clear messages in all queues withing Artemis?
mbean: "org.apache.activemq.artemis:broker=\"1.1.1.1\",component=addresses,address=\"myaddressName\",subcomponent=queues,routing-type=\"multicast\",queue=\"myqueueName\""
operation: "removeAllMessages()"
type: "exec"
There's no programmatic way to simply delete all the data from every queue on the broker. However, you can combine a few management operations (e.g. in a script) to get the same result. You can use the getQueueNames method to get the name of every queue and then pass those names to the destroyQueue(String) method.
However, the simplest way to clear all the data would probably be to simply stop the broker, clear the data directory, and then restart the broker.

Architecture for ML jobs platform

I'm building a platform to run ML jobs.
Jobs will be started from an interface.
I'm making a service for each type of jobs. Some times, a service S1 might require to first make a request to another service S2 and get its output before running its own job.
Each service is split into 2 Kubernetes deployment:
one that will pull the message from a topic, check it and persist it to a database (D1)
one that will read request from the database, run the actual job, update the request state in the database and then answer to the client (D2)
Here is the flow:
interface generates a PubSub message to a topic T1
D1 pulls message from T1 and persist a request to a database
D2 sees the new request in the database and runs it then update its state in the database and answer to the client
To answer to the client, D2 has 2 options:
push a message to a pubsub topic T2 that will continiously be checked by the client. An id is passed in both request and response so that only the client can pull it from the topic.
use a callback provided by the client to make a POST request
What do you think abouut this architecture ? Does the usage of PubSub makes sense ? Also does it make sense to split each service into 2 deployment (1 that deals with request, 1 that runs the actual job ) ?
interface generates a PubSub message to a topic T1 D1 pulls message
from T1 and persist a request to a database
If there's only one database, I'm not sure I see much advantage in using a topic (implying pub/sub). Another approach would be to use a queue: the interface creates jobs into the queue, then you can have any number of workers processing it. Depending on the situation you may not even need the database at all - if all the data needed can be in the message in the queue.
use a callback provided by the client to make a POST request
That's better if you can do it, on the assumption that there's only one consumer for the event; pub/sub is more for broadcasting out to multiple consumers. Polling works but is really inefficient and has limits on how much it can scale.
Also does it make sense to split each service into 2 deployment (1
that deals with request, 1 that runs the actual job ) ?
Having separate deployables make sense if they are built by different teams and have a different release cadence or if you need to scale them out independently, otherwise it may not be necessary.

Scala and playframework shared cache between nodes

I have a complex problem and I can't figure out which one is the best solution to solve it.
this is the scenario:
I have N servers under a single load balancer and a Database.
All the servers connect to the database
All the servers run the same identical application
I want to implement a Cache in order to decrease the response time and reduce to the minimum the HTTP calls Server -> Database
I implemented it and works like a charm on a single server...but I need to find a mechanism to update all the other caches in the other servers when the data is not valid anymore.
example:
I have server A and server B, both have their own cache.
At the first request from the outside, for example, get user information, replies server A.
his cache is empty so he needs to get the information from the database.
the second request goes to B, also here server B cache is empty, so he needs to get information from the database.
the third request, again on server A, now the data is in the cache, it replies immediately without database request.
the fourth request, on server B, is a write request (for example change user name), server B can make the changes on the database and update his own cache, invalidating the old user.
but server A still has the old invalid user.
So I need a mechanism for server B to communicate to server A (or N other servers) to invalidate/update the data in the cache.
whats is the best way to do this, in scala play framework?
Also, consider that in the future servers can be in geo-redundancy, so in different geographical locations, in a different network, served by a different ISP.
would be great also to update all the other caches when one user is loaded (one server request from database update all the servers caches), this way all the servers are ready for future request.
Hope I have been clear.
Thanks
Since you're using Play, which under the hood, already uses Akka, I suggest using Akka Cluster Sharding. With this, the instances of your Play service would form a cluster (including failure detection, etc.) at startup, and organize between themselves which instance owns a particular user's information.
So proceeding through your requests, the first request to GET /userinfo/:uid hits server A. The request handler hashes uid (e.g. with murmur3: consistent hashing is important) and resolves it to, e.g., shard 27. Since the instances started, this is the first time we've had a request involving a user in shard 27, so shard 27 is created and let's say it gets owned by server A. We send a message (e.g. GetUserInfoFor(uid)) to a new UserInfoActor which loads the required data from the DB, stores it in its state, and replies. The Play API handler receives the reply and generates a response to the HTTP request.
For the second request, it's for the same uid, but hits server B. The handler resolves it to shard 27 and its cluster sharding knows that A owns that shard, so it sends a message to the UserInfoActor on A for that uid which has the data in memory. It replies with the info and the Play API handler generates a response to the HTTP request from the reply.
In this way, all subsequent requests (e.g. the third, the same GET hitting server A) for the user info will not touch the DB, no matter which server they hit.
For the fourth request, which let's say is POST /userinfo/:uid and hits server B, the request handler again hashes the uid to shard 27 but this time, we send, e.g., an UpdateUserInfoFor(uid, newInfo) message to that UserInfoActor on server A. The actor receives the message, updates the DB, updates its in-memory user info and replies (either something simple like Done or the new info). The request handler generates a response from that reply.
This works really well: I've personally seen systems using cluster sharding keep terabytes in memory and operate with consistent single-digit millisecond latency for streaming analytics with interactive queries. Servers crash, and the actors running on the servers get rebalanced to surviving instances.
It's important to note that anything matching your requirements is a distributed system and you're requiring strong consistency, i.e. you're requiring that it be unavailable under a network partition (if B is unable to communicate an update to A, it has no choice but to fail the request). Once you start talking about geo-redundancy and multiple ISPs, you're going to see partitions pretty regularly. The only way to get availability under a network partition is to relax the consistency demand and accept that sometimes the GET will not incorporate the latest PUT/POST/DELETE.
This is probably not something that you want to build yourself. But there are plenty of distributed caches out there that you can use, such as Ehcache or InfiniSpan. I suggest you look into one of those two.

Pessimistic locking mechanism with IReliableQueue in Azure Service Fabric

I understand locking is scoped per transaction for IReliableQueue in Service Fabric. I have a requirement where once the data is read from the ReliableQueue within a transaction, I need to pass the data back to my client and preserve the lock on that data for a certain duration and if the processing fails in client, then write the data back to queue (preferably at the head so that it is picked first in next iteration).
Service Fabric doesn't support this. I recommend you look into using an external queuing mechanism for this. For example, Azure Service Bus Queues provides the functionality you describe.
You can use this package to receive SB messages within your services.
preserve the lock on that data for a certain duration
We made that once or twice too in other contexts with success using modifiable-lists and a document-field LockedUntillUtc (initialized to mininimum or null, or using a different reliable collection of locked keys (sorted on LockedUntillUtc?) - which best suites your needs?).
If you can't trust your clients to adhere to such a lock-request and write/un-lock-request contract, consider an ETag pattern - only returned on a successfull lock-request...

Flink kafka to S3 with retry

I am a new starter in Flink, I have a requirement to read data from Kafka, enrich those data conditionally (if a record belongs to category X) by using some API and write to S3.
I made a hello world Flink application with the above logic which works like a charm.
But, the API which I am using to enrich doesn't have 100% uptime SLA, so I need to design something with retry logic.
Following are the options that I found,
Option 1) Make an exponential retry until I get a response from API, but this will block the queue, so I don't like this
Option 2) Use one more topic (called topic-failure) and publish it to topic-failure if the API is down. In this way it won't block the actual main queue. I will need one more worker to process the data from the queue topic-failure. Again, this queue has to be used as a circular queue if the API is down for a long time. For example, read a message from queue topic-failure try to enrich if it fails to push to the same queue called topic-failure and consume the next message from the queue topic-failure.
I prefer option 2, but it looks like not an easy task to accomplish this. Is there is any standard Flink approach available to implement option 2?
This is a rather common problem that occurs when migrating away from microservices. The proper solution would be to have the lookup data also in Kafka or some DB that could be integrated in the same Flink application as an additional source.
If you cannot do it (for example, API is external or data cannot be mapped easily to a data storage), both approaches are viable and they have different advantages.
1) Will allow you to retain the order of input events. If your downstream application expects orderness, then you need to retry.
2) The common term is dead letter queue (although more often used on invalid records). There are two easy ways to integrate that in Flink, either have a separate source or use a topic pattern/list with one source.
Your topology would look like this:
Kafka Source -\ Async IO /-> Filter good -> S3 sink
+-> Union -> with timeout -+
Kafka Source dead -/ (for API call!) \-> Filter bad -> Kafka sink dead