Anyone have any insight into how GitHub deals with the potential failure or temporary unavailability of a Redis server when using Resque?
Some seem to have put together fairly complicated solutions as a stopgap until redis-cluster arrives, using ZooKeeper (see https://github.com/ryanlecompte/redis_failover and Solutions for resque failover redis). Others have a 'poor man's failover' that promotes the slave to master at the first sign of connectivity issues, without any coordination between Redis clients (which seems problematic in the temporary-unavailability scenario).
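For concreteness, here is a minimal sketch of what such a "poor man's failover" looks like on the client side. Hostnames are placeholders, and this is the pattern being questioned, not a recommendation:

    # Hedged sketch of the "poor man's failover" pattern: each client promotes
    # the slave on its own when it loses the master, with no coordination
    # between clients. Hostnames are placeholders.
    import redis

    MASTER = ("redis-master.example.com", 6379)
    SLAVE = ("redis-slave.example.com", 6379)

    class PoorMansFailover:
        def __init__(self):
            self.conn = redis.Redis(*MASTER)

        def execute(self, *command):
            try:
                return self.conn.execute_command(*command)
            except redis.ConnectionError:
                # Promote the slave and switch this client over to it. If the
                # master was only briefly unreachable, other clients may still
                # be writing to it -- exactly the split the question raises.
                promoted = redis.Redis(*SLAVE)
                promoted.slaveof()          # no arguments = promote to master
                self.conn = promoted
                return self.conn.execute_command(*command)

    # usage: PoorMansFailover().execute("LPUSH", "resque:queue:default", "job")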
The question: Has Defunkt ever talked about how GitHub handles Redis failure? Is there a best practice for failover that doesn't involve zookeeper?
The original post on Resque states that part of the rationale for selecting Redis was its master-slave capability, but the post doesn't describe how GitHub leverages this, since all workers need both read and write access to Redis (see https://github.com/blog/542-introducing-resque).
The base Resque library does not handle failures. If a box dies immediately after popping off a message, the message is gone forever. You'll have to write your own code to handle failures, which is quite tricky.
https://github.com/resque/resque/issues/93
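The usual workaround for the lost-on-pop problem is the Redis "reliable queue" pattern built on RPOPLPUSH. A minimal sketch with illustrative key names follows; note this is not what stock Resque does:

    # Hedged sketch of the RPOPLPUSH reliable-queue pattern. Key names are
    # illustrative, not Resque's own.
    import redis

    r = redis.Redis()

    QUEUE = "resque:queue:default"        # source queue (name assumed)
    PROCESSING = "processing:worker-1"    # per-worker backup list (assumed)

    def pop_reliably(handle):
        # Atomically move the job into a per-worker "processing" list instead
        # of popping it outright, so a crash mid-job leaves it recoverable.
        job = r.rpoplpush(QUEUE, PROCESSING)
        if job is None:
            return None
        handle(job)
        r.lrem(PROCESSING, 1, job)        # acknowledge only after success
        return job

    def requeue_orphans():
        # Run on worker restart (or from a monitor) to push unfinished jobs back.
        while r.rpoplpush(PROCESSING, QUEUE) is not None:
            pass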
Related
I've configured a small cluster with just one primary/backup group.
So far, failover and failback work as expected.
But since this configuration is susceptible to network-isolation problems, and I don't think the pinger approach handles this sufficiently, I would prefer the backup to just be there and receive updates, but not automatically fail over when the primary is unreachable.
Instead I want an intelligent human with better situational awareness to make the failover decision.
The decreased availability introduced by such a procedure is acceptable for us.
I've tried to get the backup to act this way (arbitrarily delaying failover) by using the following ha-policy > replication > slave parameters:
quorum-size
quorum-vote-wait
vote-retries
vote-retry-wait
but had no success so far.
Is it possible to somehow delay the automatic failover arbitrarily, and trigger the actual failover by changing the broker.xml?
ActiveMQ Artemis doesn't implement the functionality you're looking for - at least not in any automated way. I expect you could arbitrarily delay failover by setting quorum-size to something larger than the actual size of the cluster. However, there is no management operation to tell the backup broker to activate and become live. The only way you could do that would be to stop the broker, change the ha-policy to be a master and then restart the broker.
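A hedged sketch of that manual promotion, assuming a typical broker instance layout: stop the backup, flip the slave element to master inside ha-policy > replication in broker.xml, then start the broker again. The file path is an assumption, and this is hand-rolled tooling, not something Artemis provides:

    # Hedged sketch: manually promote a replicating backup by rewriting its
    # broker.xml (stop the broker first, start it again afterwards).
    import xml.etree.ElementTree as ET

    BROKER_XML = "/var/lib/artemis-backup/etc/broker.xml"  # assumed instance path
    NS = "{urn:activemq:core}"                              # Artemis core namespace

    def promote_backup_to_master(path: str = BROKER_XML) -> None:
        tree = ET.parse(path)
        replication = tree.getroot().find(f".//{NS}ha-policy/{NS}replication")
        slave = replication.find(f"{NS}slave")
        if slave is not None:
            # Replace <slave> with a plain <master/> so the broker starts as
            # live on the next boot, loading the journal it replicated.
            replication.remove(slave)
            ET.SubElement(replication, f"{NS}master")
        # ElementTree may rewrite namespace prefixes; that is cosmetic only.
        tree.write(path, xml_declaration=True, encoding="utf-8")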
ZooKeeper plays several roles in the open-source workflow framework DolphinScheduler, such as heartbeat detection among masters and workers, task queueing, event listening, and distributed locking.
(DolphinScheduler framework architecture diagram)
Is it possible to replace it with a database (MySQL)? The main reason is to simplify the project structure.
zookeeper in DS is mainly used as:
Task queue, for master sending tasks to worker
Lock, for the communication between host(masters and workers)
Event watcher. Master listens the event that worker added or removed
It would be costly to replace ZooKeeper with MySQL.
ZooKeeper mainly acts as the registry and monitors application status; it is very mature in this area and a recognized industry solution. Implementing the same thing with MySQL would carry a larger technical cost and may not achieve the desired effect.
BTW, the team is currently working on an SPI for the registry, so in later versions you may be able to use other components, such as etcd, to achieve similar functionality.
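To make the trade-off concrete, here is a minimal sketch of what a MySQL-based lock could look like, using MySQL's advisory GET_LOCK()/RELEASE_LOCK(). Connection details and lock names are illustrative and this is not DolphinScheduler code; it covers the lock role only, not watches or the registry:

    # Hedged sketch: an advisory lock via MySQL, as one piece of what ZooKeeper
    # currently provides. Connection parameters are placeholders.
    import pymysql

    conn = pymysql.connect(host="localhost", user="ds", password="ds",
                           database="dolphinscheduler")

    def acquire_lock(name: str, timeout_s: int = 10) -> bool:
        with conn.cursor() as cur:
            cur.execute("SELECT GET_LOCK(%s, %s)", (name, timeout_s))
            (acquired,) = cur.fetchone()
            return acquired == 1

    def release_lock(name: str) -> None:
        with conn.cursor() as cur:
            cur.execute("SELECT RELEASE_LOCK(%s)", (name,))

    # GET_LOCK is released automatically when the session dies, which roughly
    # mimics ZooKeeper's ephemeral nodes, but MySQL offers no watches: masters
    # would have to poll for worker membership changes instead of being notified.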
For now, the MasterServer and WorkerServer nodes in the system both use ZooKeeper for cluster management and fault tolerance. In addition, the system performs event monitoring and distributed locking based on ZooKeeper. We also implemented queues based on Redis, but we want DolphinScheduler to rely on as few components as possible, so we ultimately removed the Redis implementation.
So for now DolphinScheduler cannot work without ZooKeeper; maybe that will change in the future.
DolphinScheduler system architecture: see the architecture diagram and further details in the Official Document.
I want to run a Flink job on Kubernetes. Using a (persistent) state backend, it seems crashing TaskManagers are no issue, as they can ask the JobManager which checkpoint they need to recover from, if I understand correctly.
A crashing JobManager seems to be a bit more difficult. On the FLIP-6 page I read that ZooKeeper is needed in order to know which checkpoint the JobManager should use to recover, and for leader election.
Seeing as Kubernetes will restart the JobManager whenever it crashes, is there a way for the new JobManager to resume the job without having to set up a ZooKeeper cluster?
The current solution we are looking at is to create a savepoint when Kubernetes wants to kill the JobManager (because it wants to move it to another VM, for example), but this would only work for graceful shutdowns.
Edit:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-HA-with-Kubernetes-without-Zookeeper-td15033.html seems to be interesting but has no follow-up
Out of the box, Flink requires a ZooKeeper cluster to recover from JobManager crashes. However, I think you can have a lightweight implementation of the HighAvailabilityServices, CompletedCheckpointStore, CheckpointIDCounter and SubmittedJobGraphStore which can bring you quite far.
Given that you have only one JobManager running at all times (not entirely sure whether K8s can guarantee this) and that you have a persistent storage location, you could implement a CompletedCheckpointStore which retrieves the completed checkpoints from the persistent storage system (e.g. reading all stored checkpoint files). Additionally, you would have a file which contains the current checkpoint id counter for CheckpointIDCounter and all the submitted job graphs for the SubmittedJobGraphStore. So the basic idea is to store everything on a persistent volume which is accessible by the single JobManager.
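As a language-agnostic illustration of the "checkpoint id counter in a file" idea above (the real implementation would be a Java CheckpointIDCounter against Flink's interfaces; the file path here is an assumption):

    # Hedged sketch of a file-backed checkpoint ID counter kept on a persistent
    # volume that only the single JobManager mounts.
    import os

    COUNTER_FILE = "/flink-ha/checkpoint-id-counter"  # path on the PV (assumed)

    def get_and_increment_checkpoint_id() -> int:
        current = 0
        if os.path.exists(COUNTER_FILE):
            with open(COUNTER_FILE) as f:
                current = int(f.read().strip() or 0)
        # Write the incremented value via an atomic rename so a crash mid-write
        # never leaves a truncated counter behind.
        tmp = COUNTER_FILE + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(current + 1))
        os.replace(tmp, COUNTER_FILE)
        return current

The submitted job graphs and completed-checkpoint pointers could be stored next to this file following the same write-then-rename scheme.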
I implemented a light version of file-based HA, based on Till's answer and Xeli's partial implementation.
You can find the code in this GitHub repo - it runs well in production.
I also wrote a blog series explaining how to run a job cluster on k8s in general, and this file-based HA implementation specifically.
For everyone interested in this, I am currently evaluating and implementing a similar solution using Kubernetes ConfigMaps and a blob store (e.g. S3) to persist job metadata across JobManager restarts. No local storage is needed, as the solution relies on state persisted to the blob store.
Github thmshmm/flink-k8s-ha
Still some work to do (persisting checkpoint state), but the basic implementation works quite nicely.
If someone would like to use multiple JobManagers, Kubernetes provides an interface for leader election which could be leveraged for this.
I recently ran across this Netflix Blog article http://techblog.netflix.com/2013/08/deploying-netflix-api.html
They are talking about red/black deployment where they run the old and new code side by side and direct the production traffic to both of them. If something goes wrong they do a rollback.
How does the directing of the traffic work? And is it possible to adapt this strategy with, e.g., two Docker containers?
One way of directing traffic is using Weighted Routing, as you can do in AWS Route 53.
Initially you have 100% of traffic going to the server(s) with the old code. Then you gradually shift some of that traffic to the server(s) with the new code.
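For example, with boto3 you can upsert two weighted record sets for the same name and adjust the weights over time. The zone ID, record name and IPs below are placeholders:

    # Hedged sketch of weighted routing via the Route 53 API: one weighted
    # record set per environment, weights shifted gradually.
    import boto3

    route53 = boto3.client("route53")

    def set_weights(old_weight: int, new_weight: int) -> None:
        changes = []
        for set_id, weight, ip in [("old", old_weight, "10.0.0.10"),
                                   ("new", new_weight, "10.0.0.20")]:
            changes.append({
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "api.example.com",
                    "Type": "A",
                    "SetIdentifier": set_id,
                    "Weight": weight,
                    "TTL": 60,
                    "ResourceRecords": [{"Value": ip}],
                },
            })
        route53.change_resource_record_sets(
            HostedZoneId="Z123EXAMPLE",          # placeholder zone ID
            ChangeBatch={"Changes": changes},
        )

    # Start with set_weights(100, 0), move through 90/10, 50/50, ... and roll
    # back by restoring full weight to the "old" record set.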
Also, as you can read in this blog, you can use Docker to achieve it:
Even with the best testing, things can go wrong after deployment and a rollback may be required. Containers make this easy and we've brought similar tools to the operating system with Project Atomic. Red/Black deployments can be done throughout the entire stack with Atomic and Docker.
I think they use Spinnaker to implement a red/black strategy. https://spinnaker.io/docs/concepts/
I have created a cluster consists of three RabbitMQ nodes using join_cluster command.
i.e.
rabbitmqctl -n rabbit2@MYPC1 join_cluster rabbit1@MYPC1
(currently the cluster runs on a single computer)
Questions:
In the documentation it says there is one implementation for active/passive and one for active/active.
What did I configure?
How do I know?
How can it be changed?
Is there a big performance trade off between Active Active & Active Passive?
What is the best practice to interact with active/active?
e.g. install a load balancer? Apache doing round robin?
What is the best practice to interact with active/passive?
if I interact with only the active node - this is a single point of failure
Thanks.
I have been doing some research into availability options with RabbitMQ and while I am still fairly new, I'll attempt to answer your questions with the knowledge I do have. Please understand that these answers are not intended to be comprehensive.
Before getting to the questions and answers, I think it's worth pointing out that I think using the terms Active/Active and Active/Passive in the context of a cluster running on a single computer does not really apply. Active/Active and Active/Passive are typically terms used to describe highly available clusters where you have a system of more than one logical server (in your case, multiple RabbitMQ clusters), shared/redundant storage, network capabilities, power, etc.
What did I configure?
Without any load balancing across the nodes in your cluster or any queue mirroring, you have neither, meaning you do not have a highly available cluster.
How do I know?
RabbitMQ does not provide any connection management so traffic with a failed node will not automatically be passed on to a different node, which is required for an active/active cluster. Without queue mirroring you do not have fully redundant nodes in your cluster, which is required for active/passive.
How can it be changed?
Even if you implement load balancing and/or queue mirroring you are missing a number of requirements to offer a highly-available RabbitMQ cluster. Primarily, with a RabbitMQ cluster you only have a single logical broker (at least two are required for an HA cluster).
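As a starting point for the queue-mirroring part, a policy can be declared through the management HTTP API (the same thing can be done with rabbitmqctl set_policy or in the management UI). A minimal sketch with placeholder host and credentials:

    # Hedged sketch: declare an "ha-all" classic mirrored-queue policy via the
    # RabbitMQ management plugin's HTTP API. Host, credentials and policy name
    # are placeholders.
    import requests

    MGMT = "http://localhost:15672"      # management plugin endpoint (assumed)
    AUTH = ("guest", "guest")            # placeholder credentials
    VHOST = "%2F"                        # the default vhost "/", URL-encoded

    policy = {
        "pattern": "^",                  # mirror every queue in the vhost
        "definition": {"ha-mode": "all"},
        "apply-to": "queues",
    }

    resp = requests.put(f"{MGMT}/api/policies/{VHOST}/ha-all",
                        json=policy, auth=AUTH)
    resp.raise_for_status()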
Is there a big performance trade off between Active Active & Active Passive?
I think you will start seeing performance penalties as you start introducing data replication and/or redundancy, which would affect both Active/Active and Active/Passive. If you are using synchronous data replication then you will see a bigger performance hit than if you replicate data asynchronously. There's a lot more to it, but to me this feels like there may be a bigger performance hit by using Active/Active but this depends heavily on how fast all of the pieces are working together. In Active/Passive where you may be using asynchronous replication across servers your performance may appear better but in a failover situation you would need to wait for that replication to complete before you can switch to your secondary server.
What is the best practice to interact with active/active? i.e. install a load balancer? apache that will round robin
RabbitMQ recommends using a load balancer so that you do not have to leak details about the nodes in your cluster to the clients.
What is the best practice to interact with active/passive? if I interact with only the active - this is a single point of failure
It is a point of failure but with Active/Passive you can implement a failure strategy to retry the next available server or all remaining servers. With these strategies in place you can establish a scenario where the capabilities of your cluster are merely degraded while a failover is happening instead of totally unavailable. Also, you can interact with the passive side but the types of interactions may be very different (i.e. read-only access) since there may be fewer resources available on the passive side and there may be delays in data replication.
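A minimal sketch of that retry-the-next-server idea using the pika client, with placeholder hostnames: pika accepts a sequence of connection parameters and attempts each in turn, which gives a simple client-side failover.

    # Hedged sketch: list every node and let the connection attempt fall
    # through to the next one when the current node is unreachable.
    import pika

    NODES = ["rabbit1.example.com", "rabbit2.example.com", "rabbit3.example.com"]

    def connect():
        params = [pika.ConnectionParameters(host=h) for h in NODES]
        return pika.BlockingConnection(params)

    connection = connect()
    channel = connection.channel()
    channel.queue_declare(queue="tasks", durable=True)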
Here are some references used to gather this information:
High-Availability Cluster on Wikipedia
Clustering with RabbitMQ
Highly Available Queues in a RabbitMQ Cluster
High Availability in RabbitMQ