store cancellation tokens in service fabric services - azure-service-fabric

I am trying to implement a cancel-task feature in Service Fabric stateful services.
The plan is to use a cancellation token to propagate the cancellation notice to the linked threads/Tasks.
The problem is that, while these long-running tasks and threads are waiting for this signal, I am not sure how to find the right cancellation token when the cancel request arrives through a separate Web API call.
I was thinking of using a Reliable Dictionary, but even before trying it out, I assume this will hit a dead end because a CancellationToken can't be serialized/deserialized.
What would be a good solution to this problem?
Update (I didn't want to create a new thread and lose some of the important context mentioned in this one, so I am updating this post.)
I confirmed that the link below describes how Reliable Service and Reliable Actor methods can support a cancellation token. However, the typical use case there is receiving a cancellation request directly via the Web API, triggered by the user, for example by clicking refresh or navigating to another page. In that scenario, the exact same endpoint needs to receive the request while the previous HTTP request is still lingering with some long-running or stuck task. That is not the scenario in this thread.
From link: https://blogs.msdn.microsoft.com/azureservicefabric/2016/02/23/service-fabric-sdk-v1-5-175-and-the-adoption-of-virtual-machine-scale-sets/
CancellationToken support for IService/IActor
Reliable Service and Reliable Actor methods now support a cancellation token that can be remoted via ActorProxy and ServiceProxy, allowing you to implement cooperative cancellation. Clients that want to cancel a long running service or actor method can signal the cancellation token and that cancellation intent will be propagated to the actor/service method. That method can then determine when to stop execution by looking at the state of its cancellation token argument.
For example, an actor’s contract that has a possibly long-running method can be modelled as shown below:
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Actors;

public interface IPrimeNumberActorInterface : IActor
{
    Task<ulong> FindNextPrimeNumberAsync(ulong previous, CancellationToken cancellationToken);
}
The client code that wishes to cancel the method execution can communicate its intent by canceling the cancellation token.
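To make the cooperative part concrete, here is a minimal local sketch with no Service Fabric remoting involved: a method with the same signature that checks its token on every iteration, and a caller that signals cancellation through the CancellationTokenSource that owns the token. `PrimeNumberFinder` and `IsPrime` are illustrative names, not part of the blog post; in a real actor the method would implement `IPrimeNumberActorInterface` and be called through `ActorProxy`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative local stand-in for the actor method; it is not an actor and is not remoted.
public sealed class PrimeNumberFinder
{
    public Task<ulong> FindNextPrimeNumberAsync(ulong previous, CancellationToken cancellationToken)
    {
        return Task.Run(() =>
        {
            for (ulong candidate = previous + 1; ; candidate++)
            {
                // Cooperative cancellation: the method itself decides when to look at the token.
                cancellationToken.ThrowIfCancellationRequested();
                if (IsPrime(candidate))
                {
                    return candidate;
                }
            }
        }, cancellationToken);
    }

    private static bool IsPrime(ulong n)
    {
        if (n < 2) return false;
        for (ulong d = 2; d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }
}
```

On the calling side, `cts.Cancel()` or `cts.CancelAfter(TimeSpan.FromSeconds(5))` on the source whose token was passed in would signal the same intent.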

CancellationToken and CancellationTokenSource are not serializable and do not flow across service calls or data replication in SF. They can only be used to tell a handler within the same process that an operation has been cancelled and should stop any processing, or ignore any continuation if a response is received.
If you want to be able to start and cancel an operation in another service, you should split the operation in two calls.
The first call generates an operation ID to return to the client and creates a CancellationTokenSource for this operation; the CancellationToken it generates is passed to the Task/Thread running in the background.
The second call receives an operation ID, checks whether a CancellationTokenSource exists for it, and cancels it, so that the Task/Thread holding the token can stop processing if it has not already completed or been cancelled.
You could simply store these as a Dictionary<Guid, CancellationTokenSource> in the process/partition running the task.
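A minimal sketch of the two-call approach inside one process, assuming the background work is expressed as a `Func<CancellationToken, Task>`. `OperationRegistry`, `StartOperation` and `CancelOperation` are hypothetical names, and exposing them over HTTP or remoting is left out. A `ConcurrentDictionary` is used instead of a plain `Dictionary` because the start and cancel calls can arrive on different threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Lives in the process/partition that runs the work; hypothetical names throughout.
public sealed class OperationRegistry
{
    private readonly ConcurrentDictionary<Guid, CancellationTokenSource> _operations =
        new ConcurrentDictionary<Guid, CancellationTokenSource>();

    // First call: create the source, start the work with its token, return the operation ID.
    public Guid StartOperation(Func<CancellationToken, Task> work)
    {
        var id = Guid.NewGuid();
        var cts = new CancellationTokenSource();
        _operations[id] = cts;

        _ = Task.Run(async () =>
        {
            try
            {
                await work(cts.Token);
            }
            catch (OperationCanceledException)
            {
                // Expected after CancelOperation. Other exceptions fault the discarded task;
                // a real service should log them.
            }
            finally
            {
                // Completed or cancelled: drop the entry so the ID is no longer cancellable.
                if (_operations.TryRemove(id, out var finished)) finished.Dispose();
            }
        });

        return id;
    }

    // Second call: find the source for the ID and cancel it; false if unknown or already finished.
    public bool CancelOperation(Guid id)
    {
        if (!_operations.TryGetValue(id, out var cts)) return false;
        try
        {
            cts.Cancel();
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false; // the operation finished between the lookup and the cancel
        }
    }

    public bool IsRunning(Guid id) => _operations.ContainsKey(id);
}
```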
If you are running these tasks in multiple partitions in SF and are planning to store the sources in a Reliable Dictionary, that is not a good idea because, as said previously, the cancellation token can't be serialized to other partitions.
In this case, you can store the operation ID together with the partition ID, so that all partitions know where an operation is running. When any partition receives a cancellation call, the service looks up in this Reliable Dictionary where the operation is running and forwards the cancellation to the right partition.
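A sketch of the lookup-and-forward step, under stated assumptions: `IOperationLocationStore` stands in for the Reliable Dictionary of operation ID to partition ID, `IPartitionCanceller` stands in for whatever call (remoting or HTTP) reaches the owning partition, and partition IDs are modelled as `Guid`. None of these types are Service Fabric APIs, and the in-memory store is only for illustration. Only the owning partition holds the CancellationTokenSource; every other partition just knows where to send the request.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a Reliable Dictionary mapping operation ID -> partition ID.
public interface IOperationLocationStore
{
    Task SetAsync(Guid operationId, Guid partitionId);
    Task<Guid?> TryGetAsync(Guid operationId);
}

// Hypothetical call that reaches one specific partition (remoting or HTTP in practice).
public interface IPartitionCanceller
{
    Task<bool> CancelOnPartitionAsync(Guid partitionId, Guid operationId);
}

// In-memory store for illustration only; it is not replicated like a Reliable Collection.
public sealed class InMemoryLocationStore : IOperationLocationStore
{
    private readonly ConcurrentDictionary<Guid, Guid> _map = new ConcurrentDictionary<Guid, Guid>();

    public Task SetAsync(Guid operationId, Guid partitionId)
    {
        _map[operationId] = partitionId;
        return Task.CompletedTask;
    }

    public Task<Guid?> TryGetAsync(Guid operationId) =>
        Task.FromResult(_map.TryGetValue(operationId, out var partitionId) ? partitionId : (Guid?)null);
}

// Runs on whichever partition receives the cancel request.
public sealed class CancellationRouter
{
    private readonly IOperationLocationStore _locations;
    private readonly IPartitionCanceller _canceller;

    public CancellationRouter(IOperationLocationStore locations, IPartitionCanceller canceller)
    {
        _locations = locations;
        _canceller = canceller;
    }

    public async Task<bool> CancelAsync(Guid operationId)
    {
        Guid? owner = await _locations.TryGetAsync(operationId);
        if (owner is null) return false; // unknown operation ID
        // Only the owning partition holds the CancellationTokenSource, so forward there.
        return await _canceller.CancelOnPartitionAsync(owner.Value, operationId);
    }
}
```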

Related

Should users await for a response after the http request in a saga pattern architecture?

I am designing a microservice architecture, using a database per service pattern.
Following the example of Order Service and Shipping Service, when a user makes an HTTP REST request to the Order Service, this one fires an event to notify shipping service. All this happens asynchronously. So, what happens with the user experience? I mean, the user needs an immediate response from the HTTP request. How can I handle this scenario?
Respond as soon as you have stored the request.
Part of the point of microservices is that you have a system composed of independently deployable elements that do not require coordination.
If you want a system that is reliable even though the services don't have 100% uptime, then you need to have some form of durable message storage so that the sender and the receiver don't need to be running at the same time.
Therefore, your basic pattern for data from the outside is that the information from the incoming HTTP request is copied, not directly into a running service, but instead into the message store, to be processed by the service at some later time.
In other words, your REST API is a facade in front of your storage, not in front of the service itself.
The actor model may be a useful analogy: information moves around by copying messages into different inboxes, where they are later consumed by the subscribing actor.
From the perspective of the client, the HTTP response is an acknowledgement that the request has been received and recognized as valid. Think "thank you for your order, we'll send you an email when your purchase is ready for pick up."
On the web, we would include in the response links to other useful resources; click here to see the status of your order, click there to see your history of recent orders, and so on.
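A minimal sketch of "respond as soon as you have stored the request", assuming an in-memory `ConcurrentQueue` as a stand-in for durable message storage. `OrderFacade`, `OrderWorker`, `AcceptedResponse` and the `/orders/{id}/status` route are hypothetical. The facade only copies the request and acknowledges it with a link to a status resource; the service consumes the inbox later, whenever it is running.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical message and acknowledgement shapes.
public sealed record OrderMessage(Guid OrderId, string Payload);
public sealed record AcceptedResponse(int StatusCode, string StatusLink);

public sealed class OrderFacade
{
    // Stand-in for durable message storage (a queue or table in a real system).
    private readonly ConcurrentQueue<OrderMessage> _inbox;

    public OrderFacade(ConcurrentQueue<OrderMessage> inbox) => _inbox = inbox;

    // Copy the request into storage and acknowledge it; no downstream service is called here.
    public AcceptedResponse Submit(string payload)
    {
        var orderId = Guid.NewGuid();
        _inbox.Enqueue(new OrderMessage(orderId, payload));
        return new AcceptedResponse(202, $"/orders/{orderId}/status");
    }
}

// The subscribing service drains the inbox later, at its own pace.
public sealed class OrderWorker
{
    private readonly ConcurrentQueue<OrderMessage> _inbox;

    public OrderWorker(ConcurrentQueue<OrderMessage> inbox) => _inbox = inbox;

    public int DrainOnce(Action<OrderMessage> handle)
    {
        int processed = 0;
        while (_inbox.TryDequeue(out var message))
        {
            handle(message);
            processed++;
        }
        return processed;
    }
}
```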

Idempotent Keys From Client's Perspective

Suppose I have an API that calls a downstream service's API called /charge (POST). Suppose that during the charge a timeout happened at the reverse proxy and I got a 5xx, but the charge actually happened.
In this case, I would respond with a 5xx to my consumer. Now, if the consumer calls with the same idempotent key, then his request can succeed as the downstream service would return a cached copy of the response. But if he uses a different idempotent key while calling my API, he would keep getting 409s as the payment was already charged.
Here are my two questions:
How does the client know when to retry with the same idempotentId or initiate a new request altogether?
(Augmenting the previous question) How does the UI make the decision to use different idempotent Ids? Does each new request contain a new Id and only the retry logic reuses the same Id?
Basically, I am trying to understand idempotent keys from the client's perspective.
A timeout should be retried automatically a few times before returning a failure response to the user. Thus if the error is transient, the user wouldn't notice any issue (except possibly a negligible delay in response).
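A sketch of the client-side rule this implies: generate one idempotency key per logical request, and reuse it for every automatic retry of that request. The `AttemptOutcome` classification and the `sendWithKey` delegate are assumptions standing in for a real HTTP call; a real client would classify from the response, for example treating timeouts and 5xx as transient.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical classification of one attempt; a real client would derive it from the HTTP response.
public enum AttemptOutcome { Success, TransientFailure, PermanentFailure }

public static class IdempotentRetry
{
    // One logical request => one idempotency key, reused for every retry of that request.
    public static async Task<(AttemptOutcome Outcome, string Key, int Attempts)> SendAsync(
        Func<string, Task<AttemptOutcome>> sendWithKey, int maxAttempts = 3, int baseDelayMs = 100)
    {
        string key = Guid.NewGuid().ToString(); // generated once, before the first attempt
        AttemptOutcome outcome = AttemptOutcome.TransientFailure;
        int attempt = 0;

        while (attempt < maxAttempts)
        {
            attempt++;
            outcome = await sendWithKey(key);
            if (outcome != AttemptOutcome.TransientFailure) break; // success or not retryable

            if (attempt < maxAttempts)
            {
                // Exponential backoff between retries: base, 2 x base, 4 x base, ...
                await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
            }
        }

        return (outcome, key, attempt);
    }
}
```

A brand-new user action calls `SendAsync` again and therefore gets a fresh key; only the retries inside one call share an ID.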
The request originating system should maintain a log of all requests with their status. Thus if the glitch persists for a longer duration, the system can retry failed requests periodically as well as provide a detailed UI view of the submitted requests to the user. This eliminates the need for the user to ever retry a request. The system will do that on user's behalf.
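A sketch of such a request log, assuming the idempotency key doubles as the log entry ID so that periodic retries reuse it. `RequestLog`, `RequestStatus` and the `send` delegate are hypothetical, and the in-memory dictionary stands in for persistent storage.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public enum RequestStatus { Pending, Succeeded, Failed }

// Hypothetical log kept by the request-originating system (in-memory stand-in for a table).
// The entry key is also the idempotency key, so later retries reuse it.
public sealed class RequestLog
{
    private readonly ConcurrentDictionary<string, RequestStatus> _entries =
        new ConcurrentDictionary<string, RequestStatus>();

    public string Record()
    {
        string key = Guid.NewGuid().ToString();
        _entries[key] = RequestStatus.Pending;
        return key;
    }

    public void Mark(string key, RequestStatus status) => _entries[key] = status;

    public RequestStatus StatusOf(string key) => _entries[key];

    // Run periodically (e.g. from a timer): resend failed requests with their original key.
    public async Task<int> RetryFailedAsync(Func<string, Task<bool>> send)
    {
        int recovered = 0;
        List<string> failed = _entries.Where(e => e.Value == RequestStatus.Failed).Select(e => e.Key).ToList();
        foreach (string key in failed)
        {
            if (await send(key))
            {
                _entries[key] = RequestStatus.Succeeded;
                recovered++;
            }
            // A failed resend leaves the entry as Failed for the next run.
        }
        return recovered;
    }

    // Data for a "my submitted requests" view in the UI.
    public IReadOnlyDictionary<string, RequestStatus> Snapshot() =>
        new Dictionary<string, RequestStatus>(_entries);
}
```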

Invoke Cancel manually from client side in RunAsync(CancellationToken)

In the RunAsync(CancellationToken) background method, the current code spawns a new thread and also runs some long-running tasks inside a while(true) loop.
A Web API controller is exposed to accept cancel requests, and these requests are passed via enqueue/dequeue to the while(true) loop in RunAsync(cancellationToken).
I just can't make the connection between the Web API controller receiving this cancel request and the CancellationToken passed down to the thread running inside the RunAsync method.
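protected override Task RunAsync(CancellationToken cancellationToken)
{
    while (true)
    {
        // DoWork stands for the long-running work; every thread gets the same service token
        new Thread(() => DoWork(cancellationToken)).Start();
    }
}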
One thing I am pretty sure of is that there is no connection between the cancel request invoked by the user and the cancellationToken argument of RunAsync(), as shown in the code above; they are not connected. We don't want to exit the forever loop in the RunAsync() background method on a user cancel request, which is meant only for one specific thread run.
Please point me in the right direction for designing a cancel request that terminates the thread.
As suggested by Peter Bons, the cancellation token passed to RunAsync is created and managed by Service Fabric to tell a service that it is being shut down. You should watch for this cancellation to shut your services down gracefully when Service Fabric wants to upgrade them or move them between nodes.
The other point is that you don't cancel a CancellationToken, you cancel a CancellationTokenSource. So in this case, any thread created by your code should get its own CancellationTokenSource, so that each thread can be cancelled individually, and the token generated by this CancellationTokenSource must be provided to the thread so it knows when it has been cancelled.
Another point: if you want to make it smooth, you should create a linked CancellationTokenSource using CancellationTokenSource.CreateLinkedTokenSource(SFTokenPassedOnRunAsync), so that when Service Fabric shuts down the service, cancelling its token also cancels every child operation; otherwise you have to handle that in your code.
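A small sketch of the linked-source behaviour, where `serviceToken` plays the role of the token Service Fabric passes to RunAsync. `LinkedTokens`, `CreateOperationSource` and `StartWorker` are illustrative names. Cancelling one child source stops only its own worker thread, while cancelling the parent cancels every linked child.

```csharp
using System;
using System.Threading;

public static class LinkedTokens
{
    // One source per operation, linked to the service-wide token passed to RunAsync.
    public static CancellationTokenSource CreateOperationSource(CancellationToken serviceToken) =>
        CancellationTokenSource.CreateLinkedTokenSource(serviceToken);

    // Starts a background thread that sees only its own operation token.
    public static Thread StartWorker(CancellationToken operationToken, Action<CancellationToken> body)
    {
        var thread = new Thread(() => body(operationToken)) { IsBackground = true };
        thread.Start();
        return thread;
    }
}
```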
Regarding the main question,
You can only cancel an operation within the same process that created the CancellationTokenSource. The easiest way is to expose an endpoint in the service (via remoting or via a REST API) that receives a call, finds the token source and cancels the operation.
It would look something like this:
The service creates the CancellationTokenSource and starts the new thread with the token it generates.
The CancellationTokenSource is stored in a static variable visible within the same process, so that the API can see it.
The API call gets this CancellationTokenSource and calls Cancel().
If there is a list of running operations (multiple threads), you can store the CTS objects in a Dictionary and give each operation an ID; then you can find the CTS based on the operation ID.
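A sketch of these steps, assuming string operation IDs and a static `ConcurrentDictionary` shared by the RunAsync side and the API side of the same process. `RunningOperations` and its methods are hypothetical names. Each source is linked to the RunAsync token, so a Service Fabric shutdown still cancels everything.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Static so that the RunAsync loop and the API controller in the same process share it.
public static class RunningOperations
{
    private static readonly ConcurrentDictionary<string, CancellationTokenSource> Sources =
        new ConcurrentDictionary<string, CancellationTokenSource>();

    // RunAsync side: create a per-operation source linked to the Service Fabric token
    // and hand its token to the new thread.
    public static CancellationToken Register(string operationId, CancellationToken serviceToken)
    {
        var cts = CancellationTokenSource.CreateLinkedTokenSource(serviceToken);
        if (!Sources.TryAdd(operationId, cts))
        {
            cts.Dispose();
            throw new InvalidOperationException($"Operation '{operationId}' is already running.");
        }
        return cts.Token;
    }

    // API side: find the source by operation ID and call Cancel(); false if unknown or finished.
    public static bool Cancel(string operationId)
    {
        if (!Sources.TryGetValue(operationId, out var cts)) return false;
        try
        {
            cts.Cancel();
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false; // the operation completed concurrently
        }
    }

    // Worker side: remove the entry when the thread finishes, so stale IDs are not kept.
    public static void Complete(string operationId)
    {
        if (Sources.TryRemove(operationId, out var cts)) cts.Dispose();
    }
}
```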

POST/PUT response REST in a CQRS/ES system

I'm implementing a CQRS/ES based system with a RESTful interface which is used by a webapp.
When performing certain actions e.g. creating a new profile I need to be able to check certain conditions, such as uniqueness of the profile ID, or that the person has the right to create a resource under a group. Which means I have a couple of options:
Context: POST /profiles { "email": "unique@example.com" }
From my REST API, return 202 from my service with the location of the new resource, which my client can poll. In this case, however, how do I handle errors, since in effect the view does not and will never exist?
Create a saga on the initial request, then dispatch the event. Once my service creates the view or finds the error, the result is written to the saga. When the saga has completed, return the result to the user.
Of these two options, the second seems more reasonable to me, albeit more complex. Is this a viable option for building RESTful request/response models on a CQRS/ES event-sourced backend?
Yes, the second solution seems to better fit the business.
From what I understand of your case, from the DDD point of view, the creation of a user profile is a business process with more than one step (verifying the uniqueness of the profile, creating the profile, and recovering from a duplicate-profile situation). This process acts like an entity: it starts, runs and ends with a result (success or error). Being an entity, it has an ID and it can be viewed as a REST resource. A Saga will be responsible for executing it.
So, in response to the client's request you send the URI of the process resource where the client can poll for the status. In case of error, it reads the error message. In case of success, it gets the URI of its profile.
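A sketch of the process resource, assuming an in-memory store and a `/profile-processes/{id}` route (both hypothetical). The saga would call `Succeed` or `Fail`, and the polling client reads whatever `Get` returns: the error message on failure, or the URI of the new profile on success.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

public enum ProcessState { Running, Succeeded, Failed }

// The process is an entity with its own ID, exposed as a REST resource.
public sealed record ProcessResource(Guid Id, ProcessState State, string? Error, string? ProfileUri);

public sealed class ProfileCreationProcesses
{
    private readonly ConcurrentDictionary<Guid, ProcessResource> _processes =
        new ConcurrentDictionary<Guid, ProcessResource>();

    // Handling POST /profiles: start the process, return its ID and the URI to poll.
    public (Guid Id, string Location) Start()
    {
        var id = Guid.NewGuid();
        _processes[id] = new ProcessResource(id, ProcessState.Running, null, null);
        return (id, $"/profile-processes/{id}");
    }

    // Called by the saga when the profile has been created.
    public void Succeed(Guid id, string profileUri) =>
        _processes[id] = _processes[id] with { State = ProcessState.Succeeded, ProfileUri = profileUri };

    // Called by the saga when it hits an error, e.g. a duplicate profile.
    public void Fail(Guid id, string error) =>
        _processes[id] = _processes[id] with { State = ProcessState.Failed, Error = error };

    // Handling GET on the process URI; null means 404.
    public ProcessResource? Get(Guid id) => _processes.TryGetValue(id, out var p) ? p : null;
}
```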
The first solution can still be used if the use-case is simpler, if the command can be executed synchronously and the client gets the final result (error or success) as an immediate response.
From my REST API, return 202 from my service with the location of the new resource, which my client can poll. In this case, however, how do I handle errors, since in effect the view does not and will never exist?
The usual answer here is that, as part of the 202 Accepted response, you include monitoring information
The representation sent with this response ought to describe the request's current status and point to (or embed) a status monitor that can provide the user with an estimate of when the request will be fulfilled.
In other words, a link to a resource that will change when the accepted request is finally run.
So in describing the protocol, in addition to the resource that you create, you'll also need to document the representation used when you defer the work for later, and the representation used by the monitor.
When the saga has been completed return the result to the user.
Depending on the work, that may be overkill.
Which is to say, you are raising two different questions here; one of those is whether the request should be handled synchronously (don't respond until the work is done) or asynchronously (return right away, but give the client the means to monitor progress).
The other question is how the work looks from the business layer. If you are going to need multiple transactions to make the change, and if you may need to "revert" previously committed transactions in some variants of the process, then a saga (or a process manager) makes sense.
Set Validation (the broader term for enforcing an invariant like "uniqueness") is awkward. Make sure you study it, and ensure that you and the business understand the impact of a failure.

Long running REST API with queues

We are implementing a REST API, which will kick off multiple long running backend tasks. I have been reading the RESTful Web Services Cookbook and the recommendation is to return HTTP 202 / Accepted with a Content-Location header pointing to the task being processed. (e.g. http://www.example.org/orders/tasks/1234), and have the client poll this URI for an update on the long running task.
The idea is to have the REST API immediately post a message to a queue, with a background worker role picking up the message from the queue and spinning up multiple backend tasks, also using queues. The problem I see with this approach is how to assign a unique ID to the task and subsequently let the client request a status of the task by issuing a GET to the Content-Location URI.
If the REST API immediately posts to a queue, then it could generate a GUID and attach that as an attribute on the message being added to the queue, but fetching the status of the request becomes awkward.
Another option would be to have the REST API immediately add an entry to the database (let's say an order, with a new order id) with an initial status, and then put a message on the queue to kick off the background tasks, which would subsequently update that database record. The API would return this new order ID in the URI of the Content-Location header, for the client to use when checking the status of the task.
Somehow adding the database entry first, then adding the message to the queue seems backwards, but only adding the request to the queue makes it hard to track progress.
What would be the recommended approach?
Thanks a lot for your insights.
I assume your system looks like the following. You have a REST service, which receives requests from the client. It converts the requests into commands which the business logic can understand. You put these commands into a queue. You have a single or multiple workers which can process and remove these commands from the queue and send the results to the REST service, which can respond to the client.
Your problem is that, because the tasks are long running, the client connection times out, so you cannot send a response. What you can do is send a 202 Accepted after you put the commands into the queue and add a polling link, so the client can poll for changes. Your tasks have multiple subtasks, so there is progress, not just pending and complete status changes.
If you want to stick with polling, you should create a new REST resource that contains the actual state and progress of the long-running task. This means you have to store this info in a database, so the REST service can respond to requests like GET /tasks/23461/status. It also means your worker has to update the database when it completes a subtask or the whole task.
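A sketch of the task resource behind GET /tasks/23461/status, assuming an in-memory store in place of the database and a known subtask count per task; `TaskStatusStore` and its status strings are hypothetical. The REST service creates the record before queueing the work (which also answers the ordering doubt in the question: the ID is pollable immediately), the worker reports each finished subtask, and GET reads the progress.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// In-memory stand-in for the database rows behind GET /tasks/{id}/status.
public sealed class TaskStatusStore
{
    private sealed class Entry
    {
        public int Total;
        public int Completed;
    }

    private readonly ConcurrentDictionary<string, Entry> _tasks = new ConcurrentDictionary<string, Entry>();

    // REST service: create the record (with its subtask count) before queueing the work.
    public void Create(string taskId, int subtaskCount) =>
        _tasks[taskId] = new Entry { Total = subtaskCount };

    // Worker: report each finished subtask; the increment is atomic across worker threads.
    public void CompleteSubtask(string taskId) =>
        Interlocked.Increment(ref _tasks[taskId].Completed);

    // REST service: representation returned by GET /tasks/{id}/status.
    public string GetStatus(string taskId)
    {
        if (!_tasks.TryGetValue(taskId, out var entry)) return "not found";
        int done = Volatile.Read(ref entry.Completed);
        return done >= entry.Total ? "complete" : $"in progress ({done}/{entry.Total})";
    }
}
```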
If your REST service is running as a daemon, then the worker can notify it of progress, so storing the task status in the database won't be the worker's responsibility. This kind of REST service can keep the info in memory as well.
If you decide to use websockets to notify the client, you can create a notification service. Over REST you respond with a task id. The client then sends this task id back on the websocket connection, so the notification service knows which websocket connection subscribed to the events of a given task. After that you don't need the REST service: you can send the progress through the websocket connection as long as the client does not close it.
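A sketch of the notification service's bookkeeping, assuming a hypothetical `IClientConnection` abstraction over a websocket. It only maps task IDs to subscribed connections and knows nothing else about the tasks, which is what keeps it independent of the REST service and the workers.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical abstraction over one client's websocket connection.
public interface IClientConnection
{
    void Send(string message);
}

// Knows task IDs and subscribers only; nothing about what the tasks do.
public sealed class NotificationService
{
    private readonly ConcurrentDictionary<string, ConcurrentDictionary<IClientConnection, byte>> _subscribers =
        new ConcurrentDictionary<string, ConcurrentDictionary<IClientConnection, byte>>();

    // The client sends the task ID it received from the REST call over its websocket.
    public void Subscribe(string taskId, IClientConnection connection) =>
        _subscribers.GetOrAdd(taskId, _ => new ConcurrentDictionary<IClientConnection, byte>())[connection] = 0;

    public void Unsubscribe(string taskId, IClientConnection connection)
    {
        if (_subscribers.TryGetValue(taskId, out var connections)) connections.TryRemove(connection, out _);
    }

    // Push a progress message (or a status link) to every subscriber of the task.
    public int Notify(string taskId, string message)
    {
        if (!_subscribers.TryGetValue(taskId, out var connections)) return 0;
        int sent = 0;
        foreach (IClientConnection connection in connections.Keys)
        {
            connection.Send(message);
            sent++;
        }
        return sent;
    }
}
```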
You can combine these solutions as follows. Let your REST service create a task resource, so the progress is accessible through a polling link. Send back an identifier with the 202, which the client then sends back over the websocket connection, so you can use a notification service to notify the client. On progress, your worker notifies the REST service, which creates a link like GET /tasks/23461/status and sends that link to the client through the notification service. The client can then use the link to update its status.
I think the last one is the best solution if your REST service runs as a daemon, because you can move the notification responsibility to a dedicated notification service, which can use websockets, polling, SSE, or whatever you want. It can fail without taking down the REST service, so the REST service stays stable and fast. If you also send back a manual update link with the 202, the client can update manually (assuming a human-controlled client), so you get something like graceful degradation when the notification service is not available. The notification service needs little maintenance because it won't know anything about the tasks; it just sends data to the clients. Your worker won't need to know anything about sending notifications or creating hyperlinks. The client code will be easier to maintain too, since it will be almost a pure REST client. The only extra feature will be the subscription for the notification links, which does not change frequently.