Service Fabric Strategies for Bi-Directional Communication with External Devices - azure-service-fabric

My company is interested in using a stand-alone Service Fabric cluster to manage communications with robots. In our scenario, each robot would host its own rosbridge server, and our Service Fabric application would maintain WebSocket clients to each robot. I envision a stateful service, partitioned by device ID, that opens connections on startup. It should monitor connection health via heartbeats, pass messages from the robots to a protocol gateway service, and listen to other services for messages to pass to the robots.
I have not seen discussion of this style of external communications in the Service Fabric documentation - I cannot tell if this is because:
There are no special considerations for managing WebSockets (or any two-way network protocol) this way from Service Fabric. I've seen no discussion of restrictions and see no reason, conceptually, why I can't do this. I originally thought replication would be problematic (duplicate messages?), but since only one replica can be primary at any time this appears to be a non-issue.
Service Fabric is not well-suited to bi-directional communication with external devices
I would appreciate some guidance on whether this architecture is feasible. If not, discussion on why it won't work will be helpful. General discussion of limitations around bi-directional communication between Service Fabric services and external devices is welcome. I would prefer if we could keep discussion to stand-alone clusters - we have no plans to use Azure services at this time.
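To make it concrete, here is roughly the shape of the per-robot connection handler I have in mind, sketched with the JDK's built-in java.net.http WebSocket client; the rosbridge URI, device IDs, heartbeat interval and gateway hand-off are placeholders, not a working design.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.time.Duration;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of one robot connection: connect to the robot's rosbridge server,
// send a periodic heartbeat, and hand incoming messages to a gateway callback.
public class RobotConnection implements WebSocket.Listener {

    private final String deviceId;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private volatile WebSocket webSocket;

    public RobotConnection(String deviceId) {
        this.deviceId = deviceId;
    }

    public void connect(URI rosbridgeUri) {
        HttpClient client = HttpClient.newHttpClient();
        client.newWebSocketBuilder()
              .connectTimeout(Duration.ofSeconds(10))
              .buildAsync(rosbridgeUri, this)
              .thenAccept(ws -> {
                  this.webSocket = ws;
                  // Heartbeat: a WebSocket ping every 15 seconds (interval is arbitrary).
                  scheduler.scheduleAtFixedRate(
                      () -> ws.sendPing(java.nio.ByteBuffer.allocate(0)),
                      15, 15, TimeUnit.SECONDS);
              });
    }

    @Override
    public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
        // Forward the robot's message to the protocol gateway (left abstract here).
        System.out.println("[" + deviceId + "] received: " + data);
        ws.request(1); // ask for the next message
        return null;
    }

    @Override
    public void onError(WebSocket ws, Throwable error) {
        // A real implementation would mark the connection unhealthy and schedule a reconnect.
        System.err.println("[" + deviceId + "] connection error: " + error.getMessage());
    }

    public void close() {
        scheduler.shutdownNow();
        if (webSocket != null) {
            webSocket.sendClose(WebSocket.NORMAL_CLOSURE, "shutting down");
        }
    }
}
```

Each partition would own a set of these, keyed by device ID, with reconnect logic hanging off the error callback.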

Any particular reason you want SF to host the client and not the other way around?
Doing it the way you suggest, I think you will face big challenges making SF find these devices on your network and keep track of them (firewalls, IP addresses, NAT, planned maintenance, failures, connection issues), unless you plan to handle all of that by hand.
From the brief description in the rosbridge server docs you provided, my understanding is that you would normally host it on a server (like you would a Service Fabric service) and your devices would connect to it; in that case, your devices would have ROS installed to handle this communication.
Regarding your concerns about the communication: Service Fabric services are just executable programs like the ones you would normally run on your local machine. If it works there, it will likely work in an on-premises Service Fabric environment. The only extra care you need to take is around external access to the cluster (Azure settings or network configuration) and service discovery.
From my point of view, you should use SF as the central point of communication and have each device connect to SF services.
The other approach would be to use Azure IoT Hub to bridge the communication between the two. There is a nice IoT Hub + Service Fabric sample that might be suitable for your needs.
Because you want to avoid Azure, you could replace IoT Hub with another messaging platform, or implement rosbridge in your service to handle the calls.

I hope I understood everything right.
About the obstacles:
I think the major issue here is that the bi-directional connection is established between a specific service replica and the robot.
This has two major problems:
Only the primary replica has write access, i.e. only one replica is able to modify state. This can be mitigated by creating a separate partition for each robot (but remember that you can't change the partition count after the service has been created) or by creating a separate service instance for each robot (which would let you dynamically add or remove robots but would require additional logic around service discoverability).
A replica can be shut down (terminated), moved to another node (shutdown and start of a new replica) or even demoted (the primary replica gets demoted to secondary and a secondary replica gets promoted to primary) for various reasons, so both the service code and the robot communication code should be able to handle this (see the sketch below).
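A conceptual sketch of what that handling could look like, reusing the RobotConnection sketch from the question above; the hook names below are hypothetical and would be wired up to the stateful service's role-change and close callbacks, they are not part of any SDK:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: owns the outbound robot connections of one partition and
// opens/closes them as the hosting replica is promoted, demoted, or shut down.
public class PartitionConnectionManager {

    private final Map<String, RobotConnection> connections = new ConcurrentHashMap<>();

    // Call this from the replica's "became primary" callback.
    public void onBecamePrimary(Map<String, java.net.URI> robotsInThisPartition) {
        robotsInThisPartition.forEach((deviceId, uri) -> {
            RobotConnection conn = new RobotConnection(deviceId);
            conn.connect(uri);
            connections.put(deviceId, conn);
        });
    }

    // Call this from the "demoted to secondary" and "replica closing" callbacks.
    // Secondaries must not hold robot connections, otherwise two replicas could
    // talk to the same robot during a failover.
    public void onDemotedOrClosing() {
        connections.values().forEach(RobotConnection::close);
        connections.clear();
    }
}
```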
About WebSockets
This looks possible by implementing a custom ICommunicationListener and related plumbing using WebSockets.
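For reference, the ICommunicationListener contract boils down to OpenAsync/CloseAsync/Abort, where OpenAsync returns the address the listener publishes to the naming service. A shape-only sketch (the interface below is a hand-written stand-in for illustration, not the SDK type):

```java
import java.util.concurrent.CompletableFuture;

// Hand-written stand-in that mirrors the shape of ICommunicationListener
// (OpenAsync / CloseAsync / Abort); not an SDK type.
interface CommunicationListenerShape {
    CompletableFuture<String> openAsync();   // returns the published listening address
    CompletableFuture<Void> closeAsync();
    void abort();
}

// A listener that would accept WebSocket (or plain HTTP) traffic from the
// protocol gateway and other services. The actual server wiring is elided.
class WebSocketCommunicationListener implements CommunicationListenerShape {

    private final String host;
    private final int port;

    WebSocketCommunicationListener(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public CompletableFuture<String> openAsync() {
        // Start your embedded WebSocket/HTTP server here, then return its address
        // so Service Fabric can register it with the naming service.
        return CompletableFuture.completedFuture("ws://" + host + ":" + port + "/");
    }

    @Override
    public CompletableFuture<Void> closeAsync() {
        // Stop accepting new connections and drain existing ones.
        return CompletableFuture.completedFuture(null);
    }

    @Override
    public void abort() {
        // Tear everything down immediately; called on ungraceful shutdown.
    }
}
```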

Related

General guidelines to self-organized task allocation within a Microservice

I have some general problems/questions regarding self-managed microservices (in Kubernetes).
The Situation:
I have a provider (Discord API) for my desired state, which tells me the count (or multiples of the count) of sharded connections (WebSocket, so stateful in some way) I should establish with the provider.
Currently I have a "monolithic" microservice (it can't be deployed as an autoscaling service and has to be stateful), which determines the count of connections I should have and a factor based on the currently active pods that can establish a connection to this API.
It further manages the state of every pod (by heartbeating and updating the connection target of all those pods) and achieves this target configuration.
It also handles a pod being removed from the service and a change of target configuration, by rolling out the updated target and discontinuing the old connections only after the target has been updated.
The Cons:
This does not in any way resemble a good microservice architecture
A failure of the manager (even when persisting the current state in a cache or DB of some sort) means the target from the target provider is no longer achieved, and a pod might fail without the manager there to handle it gracefully
The Pros:
It's "easy" to understand and maintain a centrally managed system
There is no case (assuming a running manager) where a pod can fail and it won't be handled: the connection is resumed on another pod
My Plan:
I would like these WebSocket connection pods to manage themselves in some way.
Theoretically there has to be a way in which a "swarm" (swarm here is just a descriptive word for the pods within a service) can determine a swarm-wide accepted target.
The tasks to achieve this target (or change of target) should then be allocated across the swarm by the swarm itself.
Every failure of a member of the swarm has to be recognized, and the now unhandled tasks (in my case websocket connections) have to be resumed on different members of the swarm.
Also, updates of the target have to be rolled out across the swarm in a distinct manner, retaining the tasks for the old target until all tasks for the new target are handled.
My ideas so far:
As a general syncing point, a cache like Redis or a DB like MongoDB could be used.
Here the current target (and the old target, for the smooth target changes mentioned earlier) could be stored, along with all tasks that have to be handled to achieve this desired target.
This should be relatively easy to set up, and a "voting process" for the current target would also be possible, if even necessary (every swarm member checks the current target from the target provider, and the target determined by most of the swarm members becomes the vote outcome).
But now we face the problem already mentioned in the pros of the managed system: I currently can't think of a way the failure of a swarm member can be recognized and handled by the swarm consistently.
How should a failure be detected without constant exchange between swarm members, which I think should be avoided (see the sketch after this list) because:
swarms should operate entirely target-driven and interact with each other as little as possible
Kubernetes itself isn't really designed for easy intra-service communication
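One idea that would avoid both a central manager and constant pod-to-pod chatter is to lease every task (e.g. every shard connection) through the shared cache: the current owner keeps refreshing a key with a TTL, and any other pod may claim the task once the key has expired. A rough sketch with Jedis; the key names, timings and pool address are made up for illustration:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

// Sketch: each pod tries to lease tasks (e.g. shard IDs) via Redis keys with a TTL.
// The current owner refreshes its lease; if the owner dies, the lease expires and
// another pod picks the task up.
public class RedisTaskLease {

    private static final int LEASE_SECONDS = 30;

    private final JedisPool pool;
    private final String podId;

    public RedisTaskLease(JedisPool pool, String podId) {
        this.pool = pool;
        this.podId = podId;
    }

    /** Try to claim a task; returns true if this pod now owns it. */
    public boolean tryClaim(String taskId) {
        try (Jedis jedis = pool.getResource()) {
            String key = "task-owner:" + taskId;
            // SETNX + EXPIRE is shown for simplicity; it is not atomic.
            // In production prefer a single SET key value NX EX <seconds>.
            if (jedis.setnx(key, podId) == 1L) {
                jedis.expire(key, LEASE_SECONDS);
                return true;
            }
            return podId.equals(jedis.get(key));
        }
    }

    /** Called periodically (well inside the TTL) while the task is healthy. */
    public void refresh(String taskId) {
        try (Jedis jedis = pool.getResource()) {
            String key = "task-owner:" + taskId;
            if (podId.equals(jedis.get(key))) {
                jedis.expire(key, LEASE_SECONDS);
            }
        }
    }

    public static void main(String[] args) {
        RedisTaskLease lease = new RedisTaskLease(new JedisPool("localhost", 6379), "pod-a");
        // Each pod would loop over the desired shard list stored in Redis and
        // claim whatever is currently unowned.
        System.out.println("claimed shard-0: " + lease.tryClaim("shard-0"));
    }
}
```

The lease TTL then effectively becomes the failure-detection timeout, and the same store can hold the (voted) target, so swarm members never need to talk to each other directly.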
Every contribution, idea or further question here helps.
My tech stack would be (but isn't limited to):
Java with Micronaut for the application
gRPC as the only exchange protocol
Kubernetes as the orchestrator
Since you're on the JVM, you could use Akka Cluster to take care of failure detection between the pods (even in Kubernetes, though some care is needed with service meshes to exempt the pod-to-pod communication from being routed through the mesh) and use, as one of many possibilities, Distributed Data's implementations of CRDTs to distribute state (in this case the target) among the pods.
This wouldn't require you to use Akka HTTP or Akka's gRPC implementations, so you could still use Micronaut for external interactions. It would effectively create a stateful self-organizing service which presents to Kubernetes as a regular stateless service.
If for some reason Akka isn't appealing, looking through the code and docs for its failure detection (phi-accrual) might provide some ideas for implementing a failure detector using (e.g.) periodic updates to a DB.
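If you do go the DB route, the detector can be as simple as each pod upserting a heartbeat row and every member treating rows older than some threshold as dead; that's a fixed timeout rather than phi-accrual, but the plumbing is the same. A sketch with plain JDBC, assuming a PostgreSQL-style upsert and a hypothetical pod_heartbeats table:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Sketch: heartbeat table pod_heartbeats(pod_id TEXT PRIMARY KEY, last_seen TIMESTAMP).
// Each pod calls beat() every few seconds; any pod can call findDeadPods() and take
// over the tasks of members it considers dead.
public class DbFailureDetector {

    private static final Duration DEAD_AFTER = Duration.ofSeconds(30);

    private final String jdbcUrl;
    private final String podId;

    public DbFailureDetector(String jdbcUrl, String podId) {
        this.jdbcUrl = jdbcUrl;
        this.podId = podId;
    }

    public void beat() throws Exception {
        try (Connection c = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = c.prepareStatement(
                 "INSERT INTO pod_heartbeats (pod_id, last_seen) VALUES (?, ?) " +
                 "ON CONFLICT (pod_id) DO UPDATE SET last_seen = EXCLUDED.last_seen")) {
            ps.setString(1, podId);
            ps.setTimestamp(2, java.sql.Timestamp.from(Instant.now()));
            ps.executeUpdate();
        }
    }

    public List<String> findDeadPods() throws Exception {
        List<String> dead = new ArrayList<>();
        try (Connection c = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = c.prepareStatement(
                 "SELECT pod_id FROM pod_heartbeats WHERE last_seen < ?")) {
            ps.setTimestamp(1, java.sql.Timestamp.from(Instant.now().minus(DEAD_AFTER)));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    dead.add(rs.getString("pod_id"));
                }
            }
        }
        return dead;
    }
}
```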
Disclaimer: I am employed by Lightbend, which provides commercial support for Akka and employs or has employed at some point most of the contributors to and maintainers of Akka.

Is it possible to deploy a cluster with one service running locally?

Say you have 3 or more services that communicate with each other constantly. If they are deployed remotely to the same cluster, all is good because they can see each other.
However, I was wondering how I could deploy one of them locally, using minikube for instance, in a way that they are still able to talk to each other.
I am aware that I can port-forward the other two so that the one I have deployed locally can send calls to them, but I am not sure how I could make it work so that the other two are also able to send calls to the local one.
TL;DR: Yes, it is possible, but it is not recommended; it is difficult and comes with a security risk.
Charlie wrote very well in the comment and is absolutely right:
Your local service will not be discoverable by a remote service unless you have a direct IP. One other way is to establish RTC or Web socket connection between your local and remote services using an external server.
As you can see, it is possible, but also not recommended. Generally, both containerization and the use of Kubernetes tend to isolate environments. If you want your services to communicate with each other anyway, while being in completely different clusters on different machines, you need to configure the appropriate network connections over the public internet. This also comes with a security risk.
If you want to set up the environment locally, it will be a much better idea to run these 3 services as an independent whole. Also take into account that Minikube is mainly designed for learning and testing certain solutions and is not really suitable for production.

Service Fabric - Reliable services pub/sub or broadcast events

I could not find any broadcast or pub/sub pattern between Reliable Services in the documentation. Did I miss anything?
My use case is: we need to notify a custom event to all the SF stateful service replicas in the cluster if there is any state change in any primary replica.
I am aware of the Reliable State Manager events, which trigger on any change in the Reliable Collections.
Is there any other broadcast or pub/sub mechanism to communicate between service replicas in the cluster?
Thanks,
Ashish
Did you see this oss project and package? It allows pub/sub messaging between services.
Why reinvent the wheel?
Service Fabric does not contain a brokered messaging engine because:
There are lots of options already available on the market for this.
It would make your system tightly coupled to the Service Fabric runtime.
Why not just use Service Bus pub/sub topics?
If the concern is latency, why not run RabbitMQ, ActiveMQ or any other messaging system as a guest executable service, or maybe inside a container (see the sketch at the end of this answer)?
If you had this feature in SF, you would have to write your services to depend on it. Once you start adding external dependencies, you are going to face an integration challenge forwarding these events to systems outside your cluster, having to create a service that listens to these events just to forward them to another queue/topic.
It would just add extra work, complexity and maintenance to your solution.
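To illustrate the guest-executable broker route mentioned above, a fanout exchange in RabbitMQ gives you broadcast to every subscribed replica with very little code; the exchange name and broker host below are placeholders:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import java.nio.charset.StandardCharsets;

// Sketch: broadcast "state changed" events to every service replica via a fanout exchange.
// Each replica binds its own auto-delete queue, so every replica receives every event.
public class StateChangeBroadcast {

    private static final String EXCHANGE = "state-changes"; // placeholder name

    public static void publish(String event) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq"); // the broker running as a guest executable / container
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            channel.exchangeDeclare(EXCHANGE, "fanout");
            channel.basicPublish(EXCHANGE, "", null, event.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void subscribe() throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq");
        Connection conn = factory.newConnection();   // kept open for the replica's lifetime
        Channel channel = conn.createChannel();
        channel.exchangeDeclare(EXCHANGE, "fanout");
        String queue = channel.queueDeclare().getQueue(); // exclusive, auto-delete queue
        channel.queueBind(queue, EXCHANGE, "");
        DeliverCallback onDeliver = (tag, delivery) ->
            System.out.println("event: " + new String(delivery.getBody(), StandardCharsets.UTF_8));
        channel.basicConsume(queue, true, onDeliver, tag -> { });
    }
}
```

Each replica gets its own queue bound to the exchange, so every replica sees every event, which matches the "notify all replicas" use case.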

Is it possible to isolate applications from one another in Service Fabric?

When running a Service Fabric cluster, it would make sense to have multiple applications running in it, but those applications might not be dependent on each other in any way. For example, I can have a CustomerApp in there, and a WikiApp.
Now from a security standpoint, it would be great if the WikiApp could be isolated from the CustomerApp, as a wiki clearly should not be able to connect to services from an app that is holding customer data. I could put authentication into the services of the CustomerApp itself to allow only calls from authenticated services, but in addition, it would be even better if the WikiApp were not even able to connect to or see the other app, nor able to resolve an endpoint address from the naming service.
So is there a way to really isolate applications from each other in Service Fabric with a platform feature? I could not find anything about it in the documentation, and I also doubt it's possible the way Service Fabric works, but it would be very useful.
And to be clear, I'm really talking about isolating applications (ApplicationTypes) from each other, not services within a single application.
There are some levels of isolation built in:
Application instances have process-level isolation, in that each application instance runs in its own process.
Node isolation is possible, using placement constraints, to "isolate" services from each other by constraining them to run on different nodes.
Container support will be available in the future, where applications and services can run inside containers for further environment and resource isolation.
Services can run under unique user accounts, which you can use to perform authentication yourself at the application level.
But unfortunately there is no fine-grained role-based access mechanism built into the platform today. So, for example, system-wide operations like running queries to get a list of applications or services, or resolving endpoints using the naming service, don't have any role-based access control built in.

MSMQ redundancy

I'm looking into WCF/MSMQ.
Does anyone know how one handles redundancy with MSMQ? It is my understanding that the queue sits on the server, but what if the server goes down and is not recoverable? How does one prevent the messages from being lost?
Any good articles on this topic?
There is a good article on using MSMQ in the enterprise here.
Tip 8 is the one you should read.
"Using Microsoft's Windows Clustering tool, queues will failover from one machine to another if one of the queue server machines stops functioning normally. The failover process moves the queue and its contents from the failed machine to the backup machine. Microsoft's clustering works, but in my experience, it is difficult to configure correctly and malfunctions often. In addition, to run Microsoft's Cluster Server you must also run Windows Server Enterprise Edition—a costly operating system to license. Together, these problems warrant searching for a replacement.
One alternative to using Microsoft's Cluster Server is to use a third-party IP load-balancing solution, of which several are commercially available. These devices attach to your network like a standard network switch, and once configured, load balance IP sessions among the configured devices. To load-balance MSMQ, you simply need to setup a virtual IP address on the load-balancing device and configure it to load balance port 1801. To connect to an MSMQ queue, sending applications specify the virtual IP address hosted by the load-balancing device, which then distributes the load efficiently across the configured machines hosting the receiving applications. Not only does this increase the capacity of the messages you can process (by letting you just add more machines to the server farm) but it also protects you from downtime events caused by failed servers.
To use a hardware load balancer, you need to create identical queues on each of the servers configured to be used in load balancing, letting the load balancer connect the sending application to any one of the machines in the group. To add an additional layer of robustness, you can also configure all of the receiving applications to monitor the queues of all the other machines in the group, which helps prevent problems when one or more machines is unavailable. The cost for such queue-monitoring on remote machines is high (it's almost always more efficient to read messages from a local queue) but the additional level of availability may be worth the cost."
Not to be snide, but you kind of answered your own question. If the server is unrecoverable, then you can't recover the messages.
That being said, you might want to back up the message folder regularly. This TechNet article will tell you how to do it:
http://technet.microsoft.com/en-us/library/cc773213.aspx
Also, it will not back up express messages, so that is something you have to be aware of.
If you prefer, you might want to store the actual messages for processing in a database upon receipt, and have the service be the consumer in a producer/consumer pattern.
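A rough sketch of that last suggestion, written here with JDBC and PostgreSQL-flavored SQL purely for illustration (with MSMQ/WCF you would do the same thing with System.Messaging and ADO.NET); the table schema and polling approach are assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch of the producer/consumer pattern: persist each incoming message first,
// then process it from the database, so messages survive a crashed consumer.
// Assumed schema: messages(id SERIAL PRIMARY KEY, body TEXT, processed BOOLEAN DEFAULT FALSE).
public class DurableMessageStore {

    private final String jdbcUrl;

    public DurableMessageStore(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    // Producer side: called by the service as soon as a message is received.
    public void store(String body) throws Exception {
        try (Connection c = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = c.prepareStatement(
                 "INSERT INTO messages (body) VALUES (?)")) {
            ps.setString(1, body);
            ps.executeUpdate();
        }
    }

    // Consumer side: process one unprocessed message and mark it done.
    public boolean processNext() throws Exception {
        try (Connection c = DriverManager.getConnection(jdbcUrl)) {
            c.setAutoCommit(false);
            try (PreparedStatement select = c.prepareStatement(
                     "SELECT id, body FROM messages WHERE processed = FALSE " +
                     "ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED");
                 ResultSet rs = select.executeQuery()) {
                if (!rs.next()) {
                    c.rollback();
                    return false; // nothing to do
                }
                long id = rs.getLong("id");
                System.out.println("processing: " + rs.getString("body"));
                try (PreparedStatement update = c.prepareStatement(
                         "UPDATE messages SET processed = TRUE WHERE id = ?")) {
                    update.setLong(1, id);
                    update.executeUpdate();
                }
                c.commit();
                return true;
            }
        }
    }
}
```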