Maintaining state between two machines - session-state

We have two industrial controllers that are used to control critical systems. The idea is that on failure of one controller, the other controller will automatically take over. To ensure the swap over is seamless, each the standby controller must mirror the state of the online controller at all time.
We have a solution, which is poorly coded and documented. The question is, is there a common design pattern that implements such a system or open source software that achieves a similar thing thaty could be used to create a generic solution that could be used for controllers or PC's and can be extended to allow any number of controllers to act as standby routines.

On approach is "cache coherence". Commercial products -- Tangosol, for example -- do this.
Another approach is a light-weight version of an Enterprise Service Bus (ESB) or Service Oriented Architecture (SOA). Almost all the SOA vendors have products for this. I'd start with Tibco, which has a lightweight component set that you can use for this.
Since SOA isn't that hard, you can roll your own using the HTTP protocol so one controller can POST status to it's shadow controllers.

There is a difference between failover and transparent failover. Do you really have requirements for transparent failover? If so, you're going to end up paying for it (in both cost and complexity).
That being said, take a look at this post on Buddy Replication for an elegant solution to the problem.

There is the standard Master-Slave pattern used my almost all DBMS' that support clustering, distributed architectures and replication (http://en.wikipedia.org/wiki/Database_replication).
So, very basically in your situation you could have the Master machine maintaining state, and the slave sitting there doing nothing except updating its own state from that of the master. If the master goes down, the slave sees the master is no longer there, and can take over the control of state, with the master only being used again once it has updated its own state from that of the slave (which has maintained state while the master has not been active).

The traditional approach taken in controlling realtime critical systems is to run the two units in lockstep. Tandem have been building some very impressive fault-tolerant machines using this technique for years.
However, lockstep is very much a hardware-level solution; i don't think you could implement classic lockstep purely at the software level. Or at least, not straightforwardly. Maybe using state machines synchronised by exchange of vector clocks or something equally propeller-headed?

There is an analogous situation with the space shuttle computers. In that situation, they used 5 computers and if one machine was late or different from the others, it was (in essence) voted off of the island.
In your situation, how do you determine which controller has gone bad? Is the determining machine also considered for single-point failure?
What level of communications are available between the two controllers? Shared memory, Ethernet, or something even slower?
How fast does state information change between the two?
Is it possible to feed identical information to both controllers and would both controllers calculate the same state transitions?

Maybe a shared SQLite database or something similar?

Related

Microservices communication model

Consider microservices architecture, where you need to expose functionality to manage simple configuration shared with different microservices. Configuration is not changing often, but still, I would like to see changes whenever I ask for any value.
Using REST microservice seems easy, but it is adding latency.
Alternative could be RPC over messaging (i.e. RabbitMQ), but interface becomes more complicated.
What communication are you using for internal, simple services and what are pros and cons?
Any examples?
I tried with REST API, but it means a lot of "slow" requests, which add a latency to overall requests.
I've found that using RESTful APIs with some judicious implementation of cache-control headers actually works fairly well for this use case. The biggest challenge is ensuring that the HTTP client underneath your REST client actually respects the things.
It's fairly easy to implement, fits nicely into HTTP, and generally scales really well. It gives control to the client to decide if they want to respect the caching suggestions, allows server to optimize if it "knows" the configs haven't change (304 Not modified) to optimize if the client wants to ask for new versions.
You don't have to get into anything too complicated from a cache-invalidation, and you can leverage things like edge caching to further accelerate things in interesting ways.
The question to ask is ultimately the extent to which it is a requirement that a change to the configuration immediately affects everything.
If that's actually a requirement, then we're talking about strong consistency which implies some combination of:
all other processing must be effectively executed one-at-a-time against the (there can only ultimately be one: if there's multiple, then they will be affected at different times) component against which the change is made
all other processing must stop for the duration of time that it takes to propagate the change to all components
(these can be combined: you can have multiple instances depend on the configuration and stop for as long as it takes to update those and then you can execute things in parallel... an example of this is making it static configuration in the dependent services and taking them all down to update the configuration: if these updates are sufficiently rare, you can fit them into your error/downtime budget)
Needless to say, there's a (likely surprisingly small) consistency budget you're dealing with.
If you don't actually need strong absolute consistency like I've described (and the set of problems which actually need it is perhaps surprisingly small: anything to do with money for instance doesn't actually need strong consistency because it's only money), then it's a question of how much inconsistency is acceptable (typically you'll quantify this with some sort of bounded staleness and a liveness guarantee that you don't go back in time (unless there's a really good reason to go back in time...)). At this point, we've established that you want eventual consistency, we're just haggling over "how eventual?".
For this, propagating the configuration changes via durable publish-subscribe log (Kafka being the exemplar of this approach) is probably the place to start. Components subscribe to this log and update local state as it changes (and probably store the log position and the last value in some local store to prevent inadvertently going backward in time when they initially read the log). Then you can distribute the configuration so that it's in local memory of the subscribers, though during an update, there will be a window where different subscribers will have different views of that configuration.
A lot of solutions exist to externalize microservice configuration to a central location depending on what frameworks/programming languages you used to build your services. If it happened you would be using Spring, take a look at Spring Cloud Config. Off course Eureka is not the only solution tailored for this purpose.

Akka -- Deploy two ActorSystems on the same host

I'm writing this as a follow up to PlayFramework -- Look up actors in another local ActorSystem, but this time targetting the question specifically to the Akka crowd.
The question is simple: Does it make sense to deploy two ActorSystems on the same host (not just on the same host but even on the same JVM), given that there appears to be no way to simply lookup the other system through system.actorSelection unless you remote to localhost?
In other words, since system1.actorSelection("akka://system2/user/my-actor") does not work, but system1.actorSelection("akka.tcp://system2#127.0.0.1:2552/user/my-actor") does, why even consider deploying two systems?
I suspect you're going to ask about a use case, so here's one for you. Assume I have a complex real-time system using Akka and that this system is deployed as autonomous agents on any number of machines. Ideally, I'd like to have fine-grained control of the resources I allocate to this system and I'd like it to be somewhat isolated. Furthermore, assume that I want to write a small control interface (e.g., a REST API) with the specific purpose to provide input and monitor the real-time system. Naturally, I would make that control system another ActorSystem which interacts with the first system. It makes sense, right? I don't want to have actors running in the same ActorSystem as the real-time processing (for isolation, practicality, separate logging, non pollution of resource monitoring, supervision -- that would add one more branch to the hierarchy --, etc.). That control ActorSystem would never be deployed on a separate machine since it goes hand in hand with the real-time system. Yet, the only way for these two systems to communicate is through loopback tcp.
Is what I'm suggesting not the proper/intended way to do things? Am I missing something? Is there a way to do this that I haven't considered? Does my use case even call for using Akka?
Thanks in advance for your input!
Instead of having two separate actor systems, you could have a top level actor for each of the branches and run each branch on a dedicated dispatcher. Each top level actor will have its own error kernel as well. Having 2 actor systems mostly makes sense, when they are not related, but as yours communicate, I would not separate them.

Why can't we build virtualization that runs on several machines? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
You're probably familiar with virtualization which takes a single host and is able to "emulate" many instances by sharing the resources among them all. You probably heard about XEN.
Is it completely insane to imagine the "opposite" of XEN : a layer that would abstract several hosts in a single running instance? I believe this would allow building apps which wouldn't need to really care much about a "clustering" layer themselves.
I wonder what are the technical limits to this, because I'm pretty sure some people are already working on it somewhere :)
The goal is NOT to achieve any kind of failure recovery. I believe this can (and should?) be handled at a higher level. For example, if someone is able to run a MySQL server on a gigantic instance (made of say 50 hosts), then one can easily use MySQL's replication features to replicate the database over a similar virtual instance.
Good question. Microsoft Azure is attempting to address this by allowing you to put applications "in the cloud" and not have to be as concerned with scalability up/down, redundancy, data storage, etc. But this is not accomplished at the hypervisor level.
http://www.microsoft.com/windowsazure/
Hardware-wise, there are some downsides to having everything be one big VM rather than many smaller ones. For one thing, software doesn't always understand how to handle all the resources. For example, some applications still can't handle multiple processor cores. I've seen informal benchmarks showing that IIS performs better spreading the same resources over multiple instances rather than one giant instance.
From a management perspective, it is probably better to have multiple VMs in certain cases. Imagine that a bad deployment corrupts a node. If that were your one and only (albeit giant) node, now your whole application is down.
You're probably talking about the concept Single System Image.
There used to be a Linux implementation, openMosix that since closed down. I don't know of any replacements. openMosix made it very easy to create and use SSI on a standard Linux kernel; too bad it got overtaken by events.
I do not know enough about Xen to know if it is possible but with VMware you can create pools of resources which come from many physical hosts. Then you can assign the resources to your VMs. That could be many VMs or just one VM.
Aggregation: Transform Isolated Resources into Shared Pools
Simulating a single core over multiple physical cores is very inefficient. You can do it, but it'll be slower than a cluster. Two physical cores can talk to each other in near-real-time, if they're on separate machines then you're doing something like say clocking down your motherboard speed by factors of 10 or more if these two physical cores (and RAM) are communicating even over a fibre optic network.
Dual cores can communicate faster than two distinct CPUs on the same motherboard, if they are on separate machines, thats slower again, if there are multiple machines, slower even again.
Basically you can, but there is net performance loss compared to the net performance gain you would be hoping to achieve.
Real life example, I had a bunch of VMs on a dual quad core server (~2.5Ghz/core) performing way, way below what they should have been. On closer inspection, it turned out that the hypervisor was emulating a single 3.5-4Ghz core when the load on an individual VM was more than 2.5Ghz -- after limiting each VM to 2.5Ghz performance went back to what was expected.
I agree with saidimu, you are talking about the Single System Image concept. In addition to the OpenMosix project, there have been several commercial implementations of the same idea (one contemporary example is ScaleMP). It's not a new idea.
I just wanted to elaborate on some of the technical points of SSI.
Basically, the reason it's not done is because the performance is generally absolutely unpredictable or terrible. There is a concept in computer systems known as [NUMA][3], which basically means that the cost of accessing different pieces of memory is not uniform. This can apply to huge systems where CPUs may have some memory accesses routed around to different chips, or in cases where memory is accessed remotely over a network (such as in SSI). Typically, the operating system will attempt to compensate for this by laying out programs and data in memory in such a way that a program can run as quickly as possible. I.e., the code and data will all be placed in the same NUMA "region", and be scheduled on the closest possible CPU.
However, in cases where you are running big applications (attempting to use all the memory in your SSI), there is little the operating system can do to reduce the impact of remote memory fetches. MySQL is not aware that accessing page 0x1f3c will cost 8 nanoseconds, while accessing page 0x7f46 will stall it for hundreds of microseconds, possibly milliseconds while the memory is fetched over the network. This means that non-NUMA aware applications will run like crap (seriously, very bad) in this kind of environment. As far as I know, most comtemporary SSI products rely on the fastest possible interconnects (such as Infiniband) between machines to achieve even a passable performance.
This is also why frameworks that expose the true cost of accessing data to the programmer (such as MPI: message passing interface) have achieved more traction than SSI or DSM (distributed shared memory) approaches. In fact, there is basically no way for a programmer to optimize an application to run in an SSI environment, which just sucks.

Scala + Akka: How to develop a Multi-Machine Highly Available Cluster

We're developing a server system in Scala + Akka for a game that will serve clients in Android, iPhone, and Second Life. There are parts of this server that need to be highly available, running on multiple machines. If one of those servers dies (of, say, hardware failure), the system needs to keep running. I think I want the clients to have a list of machines they will try to connect with, similar to how Cassandra works.
The multi-node examples I've seen so far with Akka seem to me to be centered around the idea of scalability, rather than high availability (at least with regard to hardware). The multi-node examples seem to always have a single point of failure. For example there are load balancers, but if I need to reboot one of the machines that have load balancers, my system will suffer some downtime.
Are there any examples that show this type of hardware fault tolerance for Akka? Or, do you have any thoughts on good ways to make this happen?
So far, the best answer I've been able to come up with is to study the Erlang OTP docs, meditate on them, and try to figure out how to put my system together using the building blocks available in Akka.
But if there are resources, examples, or ideas on how to share state between multiple machines in a way that if one of them goes down things keep running, I'd sure appreciate them, because I'm concerned I might be re-inventing the wheel here. Maybe there is a multi-node STM container that automatically keeps the shared state in sync across multiple nodes? Or maybe this is so easy to make that the documentation doesn't bother showing examples of how to do it, or perhaps I haven't been thorough enough in my research and experimentation yet. Any thoughts or ideas will be appreciated.
HA and load management is a very important aspect of scalability and is available as a part of the AkkaSource commercial offering.
If you're listing multiple potential hosts in your clients already, then those can effectively become load balancers.
You could offer a host suggestion service and recommends to the client which machine they should connect to (based on current load, or whatever), then the client can pin to that until the connection fails.
If the host suggestion service is not there, then the client can simply pick a random host from it internal list, trying them until it connects.
Ideally on first time start up, the client will connect to the host suggestion service and not only get directed to an appropriate host, but a list of other potential hosts as well. This list can routinely be updated every time the client connects.
If the host suggestion service is down on the clients first attempt (unlikely, but...) then you can pre-deploy a list of hosts in the client install so it can start immediately randomly selecting hosts from the very beginning if it has too.
Make sure that your list of hosts is actual host names, and not IPs, that give you more flexibility long term (i.e. you'll "always have" host1.example.com, host2.example.com... etc. even if you move infrastructure and change IPs).
You could take a look how RedDwarf and it's fork DimDwarf are built. They are both horizontally scalable crash-only game app servers and DimDwarf is partly written in Scala (new messaging functionality). Their approach and architecture should match your needs quite well :)
2 cents..
"how to share state between multiple machines in a way that if one of them goes down things keep running"
Don't share state between machines, instead partition state across machines. I don't know your domain so I don't know if this will work. But essentially if you assign certain aggregates ( in DDD terms ) to certain nodes, you can keep those aggregates in memory ( actor, agent, etc ) when they are being used. In order to do this you will need to use something like zookeeper to coordinate which nodes handle which aggregates. In the event of failure you can bring the aggregate up on a different node.
Further more, if you use an event sourcing model to build your aggregates, it becomes almost trivial to have real-time copies ( slaves ) of your aggregate on other nodes by those nodes listening for events and maintaining their own copies.
By using Akka, we get remoting between nodes almost for free. This means that which ever node handles a request that might need to interact with an Aggregate/Entity on another nodes can do so with RemoteActors.
What I have outlined here is very general but gives an approach to distributed fault-tolerance with Akka and ZooKeeper. It may or may not help. I hope it does.
All the best,
Andy

Erlang: How do you reload an application env configuration?

How do you reload an application's configuration? Or, what are good strategies for managing dynamic application configuration?
For example, let's say I had log levels and I wanted to change them at runtime. Also, let's assume this is one of many such options. Does it make sense to have a "configuration server" that holds configuration state for other parts of the application to query? Do people do that or did I just make it up?
I believe it's reasonable to keep all your configuration data in a repository (subversion, mercurial etc.) and have applications download it every time they start or attempt to reload some their configuration options. This is centralized approach — however you could have many configuration servers to avoid SPOF — and it:
allows you to keep track of changes so that you
know who put these and when (s)he did
that (none wants to be in charge of
unproper configuration);
enables you to use the same configuration for
all applications throughout you
network;
easiness of changes: you can just modify
configuration and notify concerned applications
using gen_server:abcast call or other means.
proplists(3) are useful when reading configuration.
If my understanding is correct, the problem is the following:
You want to create a distributed, scalable system and of course Erlang is the first choice that comes into mind, since it was designed for such purposes.
You will have several nodes that will be running local applications and also distributed applications as well.
Here the simplest hierarchy is to have a hot-standby backup for every major functionality.
This can be achieved by implementing a distributed application controller.
Simplest example is to have a server start on a node, while a slave server is started simultaneously on a mate node.
Distributed Application controllers have many advantages.
Easy example is to handle node_up messages differently by introducing new messages that indicate that a node is not only erlang VM ready, but all vital applications are running. This way the mate node can be sure that the stand-by node is ready and can start sync-ing.
Please elaborate or comment if I misunderstood something.
Good luck!