Service architecture using service fabric - azure-service-fabric

I am designing a stateless service which essentially processes a stream of information and then based on conditions sends emails. I want to host this in service fabric, with more than one active in case of failure, however how do I limit the email to be sent from only the "primary".
Is active/active only valid for stateful services which are partitioned?
If the services have to be active/passive then how does the service know when it is now the active one?

There's no built-in mechanism for leader election (that you can use) inside SF. You could use a blob lease.
The leader will be the one who acquires the lease, and needs to refresh it while it's 'alive'. If it crashes, it will lose the lease and another instance can get it.
This does introduce an external dependency, lowering the overall availability % of your system.
You could also create a Stateful service that does something similar.

I will go with Stateful service for couple reasons:
You only want one "primary" to handle the email.
You want a
backup/replica in case the primary went down. This is default by
stateful service
Its difficult with multiple instance of stateless
service. When you have stream of information that handled by multiple
instance. What if the condition for sending email is not happening on
"primary" node. You then have to a separate mechanism to transfer
that data/state to the "primary" node.

Another option is to have a pool of stateless workers that process your data stream, and then whenever it wants to send an email, it'll notify another service (through ServiceRemoting/Rest/ServiceBus/other communication channel) and this service will handle the actual sending of emails.
If this email sending service is stateful, it can then handle duplicates if that's one concern you have.

Related

What is common strategy for synchronized communication between replica's of same PODS?

Lets say we have following apps ,
API app : Responsible for serving the user requests.
Backend app: Responsible for handling the user requests which are long running tasks. It updates the progress to database (postgres) and distributed cache (Redis).
Both apps are scalable service. Single Backend app handles multiple tenants e.g. Customer here but one customer is assigned to single backend app only.
I have a usecase where I need API layer to connect to specific replica which is handling that customer. Do we have a common Pattern for this ?
Few strategies in mind
Pub/Sub: Problem is we want sync guranteed response , probably using Redis
gRPC : Using POD IP to connect to specific pod is not a standard way
Creating a Service at runtime by adding labels to the replicas and use those. -- Looks promising
Do let me know if there is common pattern or example architecture of this or standard way of doing this?
Note :[Above is a simulation of production usecase, names and actual use case is changed]
You should aim to keep your services stateless, in a Kubernetes environment there is no telling when one pod might be replaced by another due to worker node maintenance.
If you have long running task that cannot be completed during the configured grace period for pods to shutdown during a worked node drain/evacuation you need to implement some kind of persistent work queue as your are think about in option 1. I suggest you look into the saga pattern.
Another pattern we usually employ is to let the worker service write the current state of the job into the database and let the client pull the status every few seconds. This does however require some way of handling half finished jobs that might be abandoned by pods that are forced to shutdown.

Wrap event based system with REST API

I'm designing a system that uses a microservices architecture with event-based communication (using Google Cloud Pub/Sub).
Each of the services is listening and publishing messages so between the services everything is excellent.
On top of that, I want to provide a REST API that users can use without breaking the event-based approach. However, if I have an endpoint that triggers event X, how will I send the response to the user? Does it make sense to create a subscriber for a "ProcessXComplete" event and than return 200 OK?
For example:
I have the following microservices:
Service A
Service B
Frontend Service - REST Endpoints
I'm want to send this request "POST /posts" - this request sent to the frontend service.
The frontend service should trigger "NewPostEvent."
Both Service A and Service B will listen to this event and do something.
So far, so good, but here is where things are starting to get messy for me.
Now I want to return the user that made the request a valid response that the operation completed.
How can I know that all services finished their tasks, and how to create the handler to return this response?
Does it even make sense to go this way or is there a better design to implement both event-based communications between services and providing a REST API
What you're describing is absolutely one of the challenges of event-based programming and how eventual-consistency (and lack of atomicity) coordinates with essentially synchronous UI/UX.
It generally does make sense to have an EventXComplete event. Our microservices publish events on completion of anything that could potentially fail. So, there are lots of ServiceA.EventXSuccess events flowing through the queues. I'm not familiar with Google Cloud PubSub specifically, but in general in Messaging systems there is little extra cost to publishing messages with few (or no) subscribers to require compute power. So, we tend to over-articulate service status by default; it's easy to come back later and tone down messaging as needed. In fact, some of our newer services have Messaging Verbosity configurable via an Admin API.
The Frontend Service (which here is probably considered a Gateway Service or Facade Layer) has taken on the responsibility of being a responsive backing for your UI, so it needs to, in fact, BE responsive. In this example, I'd expect it to persist the User's POST request, return a 200 response and then update its local copy of the request based on events it's subscribed to from ServiceA and ServiceB. It also needs to provide a mechanism (events, email, webhook, gRPC, etc.) to communicate from the Frontend Service back to any UI if failure happens (maybe even if success happens). Which communication you use depends on how important and time-sensitive the notification is. A good example of this is getting an email from Amazon saying billing has failed on an Order you placed. They let you know via email within a few minutes, but they don't make you wait for the ExecuteOrderBilling message to get processed in the UI.
Connecting Microservices to the UI has been one of the most challenging aspects of our particular journey; avoiding tight coupling of models/data structures, UI workflows that are independent of microservice process flows, and perhaps the toughest one for us: authorization. These are the hidden dark-sides of this distributed architecture pattern, but they too can be overcome. Some experimentation with your particular system is likely required.
It really depends on your business case. If the REST svc is dropping message in message queue , then after dropping the message we simply return the reference ID that client can poll to check the progress.
E.g. flight search where your system has to calls 100s of backend services to show you flight deals . Search api will drop the message in the queue and save the same in the database with some reference ID and you return same id to client. Once worker are done with the message they will update the reference in DB with results and meanwhile your client will be polling (or web sockets preferably) to update the UI with results.
The idea is you can't block the request and keep everything async , this will make system scaleable.

Communication between microservices - request data

I am dealing with communication between microservices.
For example (fictive example, just for the illustration):
Microservice A - Store Users (getUser, etc.)
Microservice B - Store Orders (createOrder, etc.)
Now if I want to add new Order from the Client app, I need to know user address. So the request would be like this:
Client -> Microservice B (createOrder for userId 5) -> Microservice A (getUser with id 5)
The microservice B will create order with details (address) from the User Microservice.
PROBLEM TO SOLVE: How effectively deal with communication between microservice A and microservice B, as we have to wait until the response come back?
OPTIONS:
Use RestAPI,
Use AMQP, like RabbitMQ and deal with this issue via RPC. (https://www.rabbitmq.com/tutorials/tutorial-six-dotnet.html)
I don't know what will be better for the performance. Is call faster via RabbitMQ, or RestAPI? What is the best solution for microservice architecture?
In your case using direct REST calls should be fine.
Option 1 Use Rest API :
When you need synchronous communication. For example, your case. This option is suitable.
Option 2 Use AMQP :
When you need asynchronous communication. For example when your order service creates order you may want to notify product service to reduce the product quantity. Or you may want to nofity user service that order for user is successfully placed.
I highly recommend having a look at http://microservices.io/patterns/index.html
It all depends on your service's communication behaviour to choose between REST APIs and Event-Based design Or Both.
What you do is based on your requirement you can choose REST APIs where you see synchronous behaviour between services
and go with Event based design where you find services needs asynchronous behaviour, there is no harm combining both also.
Ideally for inter-process communication protocol it is better to go with messaging and for client-service REST APIs are best fitted.
Check the Communication style in microservices.io
REST based Architecture
Advantage
Request/Response is easy and best fitted when you need synchronous environments.
Simpler system since there in no intermediate broker
Promotes orchestration i.e Service can take action based on response of other service.
Drawback
Services needs to discover locations of service instances.
One to one Mapping between services.
Rest used HTTP which is general purpose protocol built on top of TCP/IP which adds enormous amount of overhead when using it to pass messages.
Event Driven Architecture
Advantage
Event-driven architectures are appealing to API developers because they function very well in asynchronous environments.
Loose coupling since it decouples services as on a event of once service multiple services can take action based on application requirement. it is easy to plug-in any new consumer to producer.
Improved availability since the message broker buffers messages until the consumer is able to process them.
Drawback
Additional complexity of message broker, which must be highly available
Debugging an event request is not that easy.
Personally I am not a fan of using a message broker for RPC. It adds unnecessary complexity and overhead.
How do you host your long-lived RabbitMQ consumer in your Users web service? If you make it some static singleton, in your web service how do you deal with scaling and concurrency? Or do you make it a stand-alone daemon process? Now you have two User applications instead of one. What happens if your Users consumer slows down, by the time it consumes the request message the Orders service context might have timed-out and sent another message or given up.
For RPC I would suggest simple HTTP.
There is a pattern involving a message broker that can avoid the need for a synchronous network call. The pattern is for services to consume events from other services and store that data locally in their own database. Then when the time comes when the Orders service needs a user record it can access it from its own database.
In your case, your Users app doesn't need to know anything about orders, but your Orders app needs to know some details about your users. So every time a user is added, modified, removed etc, the Users service emits an event (UserCreated, UserModified, UserRemoved). The Orders service can subscribe to those events and store only the data it needs, such as the user address.
The benefit is that is that at request time, your Orders service has one less synchronous dependency on another service. Testing the service is easier as you have fewer request time dependencies. There are also drawbacks however such as some latency between user record changes occuring and being received by the Orders app. Something to consider.
UPDATE
If you do go with RabbitMQ for RPC then remember to make use of the message TTL feature. If the client will timeout, then set the message expiration to that period. This will help avoid wasted work on the part of the consumer and avoid a queue getting backed up under load. One issue with RPC over a message broker is that once a queue fills up it can add long latencies that take a while to recover from. Setting your message expiration to your client timeout helps avoid that.
Regarding RabbitMQ for RPC. Normally we use a message broker for decoupling and durability. Seeing as RPC is a synchronous communication, that is, we are waiting for a response, then durability is not a consideration. That leaves us decoupling. The question is does that decoupling buy you anything over the decoupling you can do with HTTP via a gateway or Docker service names?

Service Fabric dynamic partitioning

So I am doing some research into using Service Fabric for a very large application. One thing I need to have is a service that is partitioned by name, which seems fairly trivial at the application manifest level.
However, I really would like to be able to add and remove named partitions on the fly without having to republish the application.
Each partition represents our equivalent of a tenant, and we want to have a backend management app to add new tenants.
Each partition will be a long-running application that fires up a TCP server that uses a custom protocol, and I'll need to be able to query for the address by name from the cluster.
Is this possible with Service Fabric, and if so is there any documentation on this, or something I should be looking for?
Each partition represents our equivalent of a tenant, and we want to have a backend management app to add new tenants.
You need to rethink your model. Partitioning is for distributing data so it accessible fast, for read and write. But within the same logical container.
If you want to do some multitenant in Service Fabric you can deploy an Application multiple times to the cluster.
From Visual Studio it seems you can only have one instance of an Application. This is because in the ApplicationManifest.xml there are DefaultServices defined. This is okay for developing on the local Service Fabric cluster. For production you might want to consider deploying the application with powershell, this will open up the possibility to deploy the same application multiple times with settings for each instance(like: tenant name, security, ... )
And not only Applications can be deployed multiple times, stateful/stateless services as well. So you could have one application and for each tenant you deploy a service of a certain type. Services are findable via the naming service inside Service Fabric, see the FabricClient class for more info on that.
It is not possible to change the partition count for an existing application.
From https://azure.microsoft.com/en-us/documentation/articles/service-fabric-concepts-partitioning/#plan-for-partitioning (emphasis mine):
In rare cases, you may end up needing more partitions than you have initially chosen. As you cannot change the partition count after the fact, you would need to apply some advanced partition approaches, such as creating a new service instance of the same service type. You would also need to implement some client-side logic that routes the requests to the correct service instance, based on client-side knowledge that your client code must maintain.
You are encouraged to do up-front capacity planning to determine the maximum number of partitions you will need - and if you end up needing more, you'll need to implement some special client side handling to cope.
We had the same problem and ended up creating an instance of the service for each tenant. This is pretty easy to do and will scale to any number of tenants.

Rolling Over Streaming Connections During Upgrades

I am working on an application that uses Amazon Kinesis, and one of the things I was wondering about is how you can roll over an application during an upgrade without data loss on streams. I have heard about things like blue/green deployments and such, but I was wondering what is the best practice for upgrading a data streaming service so you don't loose data from your streams.
For example, my application has an HTTP endpoint that ingests data as a series of POST operations. If I want to replace the service with a newer version, how do I manage existing application streaming to my endpoint?
One common method is having a software load balancer (LB) with a virtual IP; behind this LB there would be at least two HTTP ingestion endpoints during normal operation. During upgrade, each endpoint is announced out and upgraded in turn. The LB ensures that no traffic is forwarded to an announced out endpoint.
(The endpoints themselves can be on separate VMs, Docker containers or physical nodes).
Of course, the stream needs to be finite; the TCP socket/HTTP stream is owned by one of the endpoints. However, as long as the stream can be stopped gracefully, the following flow works, assuming endpoint A owns the current ingestion:
Tell endpoint A not to accept new streams. All new streams will be redirected only to endpoint B by the LB.
Gracefully stop existing streams on endpoint A.
Upgrade A.
Announce A back in.
Rinse and repeat with endpoint B.
As a side point, you would need two endpoints with a load balanced (or master/slave) set-up if you require any reasonable uptime and reliability guarantees.
There are more bespoke methods which allow hot code swap on the same endpoint, but they are more bespoke and rely on specific internal design (e.g. separate process between networking and processing stack connected by IPC).