Is there a sane way to stagger a cron job across 4 hosts with Ansible? - deployment

I've been experimenting with writing playbooks for a few days and I'm writing a playbook to deploy an application right now. It's possible that I may be discovering it's not the right tool for the job.
The application is deployed HA across 4 systems on 2 sites and has a worst-case SLA of 1 hour. That's being accomplished with a staggered cron that runs every 15 minutes, i.e. s1 runs at minute 0, s2 at 30, s3 at 15, ...
I've looked through all kinds of looping and cron and other modules that Ansible supports and can't really find a way that it supports incrementing an integer by 15 as it moves across a list of hosts, and maybe that's a silly way of doing things.
The only communication that these 4 servers have with each other is a directory on a non-HA NFS share. So the reason I'm doing it as a 15 minute staggered cron is to survive network partitions and the death of the NFS connection.
My other thoughts: I could just bite the bullet, make it a */15, and have an architecture that relies on praying that NFS never dies, which would make writing the Ansible playbook trivial. I'm also considering deploying this with Fabric or a Bash script; it's just that our process for getting implementation plans approved, and for making changes by following them, is very heavy, and I want to simplify the steps someone has to take late at night.

Solution 1
You could use host_vars or group_vars, either in separate files, or directly in the inventory.
I will try to produce a simple example that fits your description, using only the inventory file (and the playbook that applies the cron):
[site1]
host1 cron_restart_minute=0
host2 cron_restart_minute=30
host3 cron_restart_minute=15
host4 cron_restart_minute=45
[site2]
host5 cron_restart_minute=0
host6 cron_restart_minute=30
host7 cron_restart_minute=15
host8 cron_restart_minute=45
This uses host variables; you could also create other groups and use group variables if the repetition became a problem.
In a playbook or role, you can simply refer to the variable.
On the same host:
- name: Configure the cron job
  cron:
    # your other options
    minute: "{{ cron_restart_minute }}"
On another host, you can access other hosts' variables like so:
hostvars['host2'].cron_restart_minute
Solution 2
If you want a more dynamic solution, for example because you keep adding and removing hosts, you could set a variable in a task using register or set_fact, and calculate the minute from the number of hosts in the (only) group that the current host is in.
Example:
- name: Set fact for cron_restart_minute
  set_fact:
    cron_restart_minute: "{{ (60 / (groups[group_names[0]] | length) * groups[group_names[0]].index(inventory_hostname)) | int }}"
I have not tested this exact expression, but the approach is plain Python / Jinja2. group_names is a 1-element array given the above inventory, since no host is in two groups at the same time. groups contains all hosts in a group, so we take its length and the index of the current host by its inventory_hostname (0, 1, 2, 3), which yields minutes 0, 15, 30 and 45 for four hosts.
Links to relevant docs:
Inventory
Variables

Related

Ansible release with serial: 50% for two different backends in haproxy

I have the following configuration in haproxy.
backend 1
machine-1 machine-1.com:8080
machine-2 machine-2.com:8080
machine-3 machine-3.com:8080
machine-4 machine-4.com:8080
machine-5 machine-5.com:8080
machine-6 machine-6.com:8080
machine-7 machine-7.com:8080
machine-8 machine-8.com:8080
machine-9 machine-9.com:8080
machine-10 machine-10.com:8080
backend 2
machine-11 machine-11.com:8080
machine-12 machine-12.com:8080
Serial is set to 50% in the Ansible rolling deployment. We also change the state of the machines to maintenance in this window, so Ansible puts machines 1-6 into maintenance mode in the first pass and machines 7-12 in the second.
Because machines 7-12 all go into maintenance in the second pass, the backend 2 cluster has no nodes online to take the traffic. This causes a huge number of issues on the application side.
How should I remediate this? I am using Ansible 2.0.0.
EDIT 1
Two solutions that I can think of:
Make two releases, one for each backend
Replace one machine from 1-6 with one machine in backend 2, say machine 11
I'm looking for solutions other than these, more along the lines of using Ansible to solve it.
Creating a host group for each backend and running the update for each backend group in a separate run would IMHO be the best solution. If there is no way to do that, it has been possible to define batch sizes as a list since Ansible 2.2.
So this should work:
- name: test play
  hosts: backend_servers
  serial:
    - 5
    - 1

How to handle file paths in a distributed environment

I'm working on setting up a distributed celery environment to do OCR on PDF files. I have about 3M PDFs and OCR is CPU-bound so the idea is to create a cluster of servers to process the OCR.
As I'm writing my task, I've got something like this:
@app.task
def do_ocr(pk, file_path):
    ocr_content = run_tesseract_command(file_path)
    item = Document.objects.get(pk=pk)
    item.content = ocr_content
    item.save()
The question I have is what the best way is to make file_path work in a distributed environment. How do people usually handle this? Right now all my files simply live in a single directory on one of our servers.
If you are in a Linux environment, the easiest way is to mount a remote filesystem, using sshfs, under /mnt on each node in the cluster. Then you can pass the node name to the do_ocr function and work as if all data were local to the current node.
For example, say your cluster has N nodes named node1, ..., nodeN.
Let's configure node1; on each node, mount the other nodes' filesystems. Here's a sample /etc/fstab file for node1:
sshfs#user@node2:/var/your/app/pdfs /mnt/node2 fuse port=<port>,defaults,user,noauto,uid=1000,gid=1000 0 0
....
sshfs#user@nodeN:/var/your/app/pdfs /mnt/nodeN fuse port=<port>,defaults,user,noauto,uid=1000,gid=1000 0 0
On the current node (node1), create a symlink named after the current server, pointing to the PDFs' path:
ln -s /var/your/app/pdfs /mnt/node1
Your /mnt folder should then contain the remote filesystems and the symlink:
user#node1:/mnt$ ls -lsa
0 lrwxrwxrwx 1 user user 16 apr 12 2016 node1 -> /var/your/app/pdfs
0 lrwxrwxrwx 1 user user 16 apr 12 2016 node2
...
0 lrwxrwxrwx 1 user user 16 apr 12 2016 nodeN
Then your function should look like this:
import os

MOUNT_POINT = '/mnt'

@app.task
def do_ocr(pk, node_name, file_path):
    ocr_content = run_tesseract_command(os.path.join(MOUNT_POINT, node_name, file_path))
    item = Document.objects.get(pk=pk)
    item.content = ocr_content
    item.save()
It works as if all files were on the current machine, with the remoting logic working for you transparently.
Well, there are multiple ways to handle this, but let's stick to one of the simplest:
- since you'd like to process a big number of files using multiple servers, my first suggestion would be to use the same OS on each server, so you won't have to worry about cross-platform compatibility
- using the word 'cluster' suggests that all of those servers should know their mutual state, which adds complexity; try to switch to a farm of stateless workers (by 'stateless' I mean "not knowing about the others", as they should be aware of at least their own state, e.g.: IDLE, IN_PROGRESS, QUEUE_FULL or more if needed)
- for the file-list processing part you could use a pull or a push model (a push-model sketch follows this list):
  - the push model could be easily implemented by a simple app that crawls the files and dispatches them (e.g. over SCP, FTP, whatever) to a set of available servers; servers can monitor their local directories for changes and pick up new files to process; it's also very easy to scale - just spin up more servers and update the push client (even at runtime); the only limit is your push client's performance
  - the pull model is a little bit more tricky, because you have to handle more complexity; having a set of servers implies having a proper starting index and offset per node - it will make error handling more difficult, plus, it doesn't scale easily (imagine adding twice as many servers to speed up the processing and having to update indices and offsets properly on each node... it seems like an error-prone solution)
- I assume that network traffic isn't a big concern - having 3M files to process will generate it somewhere, one way or the other
- collecting/storing the results is a different ballpark, but here the list of possible solutions is limitless
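As a rough sketch of the push model described above (the host names, the inbox path, and the round-robin policy are all assumptions, not a definitive implementation), the dispatcher could be as simple as:
import itertools
import subprocess
from pathlib import Path

# Hypothetical worker hosts and remote inbox directory; adjust to your farm.
WORKERS = ["node1", "node2", "node3"]
REMOTE_INBOX = "/var/your/app/inbox"

def dispatch(pdf_dir):
    """Round-robin PDFs from pdf_dir to the workers over scp."""
    targets = itertools.cycle(WORKERS)
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        host = next(targets)
        # Each worker watches its inbox for new files and picks them up.
        subprocess.run(["scp", str(pdf), f"{host}:{REMOTE_INBOX}/"], check=True)

if __name__ == "__main__":
    dispatch("/var/your/app/pdfs")
Scaling then amounts to adding a host name to WORKERS; the workers themselves never need to know about each other.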
Since I'm missing a lot of your architecture details and your application specifics, take this as a guiding answer rather than a strict one.
You can take this approach, in the following order:
1- Deploy an internal file server that stores all the files in one place and serves them
Example:
http://internal-ip-address/storage/filenameA.pdf
http://internal-ip-address/storage/filenameB.pdf
http://internal-ip-address/storage/filenameC.pdf
and so on ...
2- Install/Deploy Redis
3- Create an upload client/service/process that takes the files you want to upload and passes them to the above storage location (/storage/), so your files are available as soon as they are uploaded; at the same time, push the full file-path URL to a predefined Redis list/queue (built on the linked-list data structure), like this: http://internal-ip-address/storage/filenameA.pdf (a minimal sketch follows the examples below)
You can get more details about LPUSH and RPOP under Redis Lists here: http://redis.io/topics/data-types-intro
Examples:
A file upload form that stores the files directly in the storage area
A file upload utility/command-line tool/background process - which you can create yourself or build from an existing tool - that takes files from a specific location, be it a web address or some other server that has your files, and uploads them to the storage location
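As a minimal sketch of this third step (the queue name pdf_queue, the storage directory, and the Redis host are hypothetical; it uses the redis-py client):
import shutil
from pathlib import Path

import redis

# Hypothetical locations and queue name; adjust to your environment.
STORAGE_DIR = "/var/www/storage"
BASE_URL = "http://internal-ip-address/storage"

r = redis.Redis(host="redis-host", port=6379)

def upload(src_path):
    """Copy a file into the storage area, then enqueue its public URL."""
    name = Path(src_path).name
    shutil.copy(src_path, f"{STORAGE_DIR}/{name}")
    # LPUSH adds on the left; workers RPOP/BRPOP from the right (FIFO).
    r.lpush("pdf_queue", f"{BASE_URL}/{name}")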
4- Now we come to your celery workers: each of your workers should pull (RPOP) one of the file URLs from the Redis queue, download the file from your internal file server (which we built in the first step), and do the required processing the way you want it done.
An important thing to note from the Redis documentation:
Lists have a special feature that make them suitable to implement
queues, and in general as a building block for inter process
communication systems: blocking operations.
However it is possible that sometimes the list is empty and there is
nothing to process, so RPOP just returns NULL. In this case a consumer
is forced to wait some time and retry again with RPOP. This is called
polling, and is not a good idea in this context because it has several
drawbacks
So Redis implements commands called BRPOP and BLPOP which are versions
of RPOP and LPOP able to block if the list is empty: they'll return to
the caller only when a new element is added to the list, or when a
user-specified timeout is reached.
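Along those lines, a minimal worker loop could use BRPOP (again a sketch: pdf_queue and process_pdf are hypothetical names, and the requests library handles the download):
import redis
import requests

r = redis.Redis(host="redis-host", port=6379)

def worker_loop():
    while True:
        # Block for up to 30 seconds; returns a (key, value) pair, or None on timeout.
        item = r.brpop("pdf_queue", timeout=30)
        if item is None:
            continue  # queue stayed empty; block again
        url = item[1].decode()
        pdf_bytes = requests.get(url).content
        process_pdf(pdf_bytes)  # hypothetical hook for your OCR task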
Let me know if that answers your question.
Things to keep in mind
You can add as many workers as you want, since this solution is very scalable; your only bottleneck is the Redis server, which you can cluster, and you can persist your queue in case of a power outage or server crash.
You can replace Redis with RabbitMQ, Beanstalk, Kafka, or any other queuing/messaging system, but Redis is nominated in this race due to its simplicity and the myriad of features it introduces out of the box.

Octopus - deploying multiple copies of the same service

I've got an Octopus deployment for an NServiceBus consumer. Until recently, there's only been one queue to consume. Now we're trying to get smart about putting different types of messages in different queues. Right now we've broken that up into 3 queues, but that number might increase in the future.
The plan now is to install the NSB consumer service 3 times, in 3 separate folders, under 3 different names. The only difference in the 3 deployments will be an app.config setting:
<add key="NsbConsumeQueue" value="RedQueue" />
So we'll have a Red service, a Green service and a Blue service, and each one will be configured to consume the appropriate queue.
What's the best way to deploy these 3 services in Octopus? My ideal would be to declare some kind of list of services somewhere e.g.
ServiceName QueueName
----------- ---------
RedService RedQueue
GreenService GreenQueue
BlueService BlueQueue
and loop through those services, deploying each one in its own folder, and substituting the value of NsbConsumeQueue in app.config to the appropriate value. I don't think this can be done using variables, which leaves PowerShell.
Any idea how to write a PS script that would do this?
At my previous employer, we used the following script to deploy from Octopus:
http://www.layerstack.net/blog/posts/deploying-nservicebus-with-octopus-deploy
Add the two PowerShell scripts to the project that contains the NServiceBus host. Be sure to override the host identifier, or ServicePulse will go mad, because with Octopus every deployment gets its own folder.
But as mentioned in the comments, be sure that you're splitting endpoints for the right reason. We also had/have at least 4 services, but that's because we have a logical separation: for example, a finance service where all finance messages go, and a sales service where all sales messages go. This follows the DDD bounded-context principle and exists for good reasons. I hope your services aren't actually called red, green and blue! :)
PowerShell should not be needed for this. Variables in Octopus can be scoped to a step in the deployment process. So you could have 3 steps, one for each service, and 3 variables for the queue names, each scoped to one of the steps.
You could also add variables for the service names, and use those variables in the process step settings. That would let you see both the service names and queue names from the variables page.

MongoDB cluster with AWS CloudFormation and auto scaling

I've been investigating creating my own MongoDB cluster in AWS. The AWS MongoDB template provides some good starting points. However, it doesn't cover auto scaling or what happens when a node goes down. For example, say I have 1 primary and 2 secondary nodes, the primary goes down, and auto scaling kicks in. How would I add the newly launched MongoDB instance to the replica set?
If you look at the template, it uses an init.sh script to check whether the node being launched is a primary node, waits for all the other nodes to exist, and creates a replica set with their IP addresses on the primary. When the replica set is configured initially, all the nodes already exist.
Not only that, but my Node app uses Mongoose. Part of the database connection allows you to specify multiple nodes. How would I keep track of what's currently up and running (I guess I could use DynamoDB, but I'm not sure)?
What's the usual flow if an instance goes down? Do people generally manually re-configure clusters if this happens?
Any thoughts? Thanks.
This is a very good question and I went through this very painful journey myself recently. I am writing a fairly extensive answer here in the hope that some of these thoughts on running a MongoDB cluster via CloudFormation are useful to others.
I'm assuming that you're creating a MongoDB production cluster as follows:
3 config servers (micro/small instances can work here)
At least 1 shard consisting of e.g. 2 (primary & secondary) shard instances (large at a minimum) with big disks configured for the data / log / journal
An arbiter machine for voting (a micro is probably OK)
i.e. https://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/
Like yourself, I initially tried the AWS MongoDB CloudFormation template that you posted in the link (https://s3.amazonaws.com/quickstart-reference/mongodb/latest/templates/MongoDB-VPC.template) but to be honest it was far, far too complex: it's 9,300 lines long and sets up multiple servers (replica shards, configs, arbiters, etc.). Running the CloudFormation template took ages and it kept failing (e.g. after 15 minutes), which meant the servers all terminated again and I had to retry, which was really frustrating and time consuming.
The solution I went for in the end (which I'm super happy with) was to create separate templates for each type of MongoDB server in the cluster, e.g.:
MongoDbConfigServer.template (template to create config servers - run this 3 times)
MongoDbShardedReplicaServer.template (template to create replica - run 2 times for each shard)
MongoDbArbiterServer.template (template to create arbiter - run once for each shard)
NOTE: templates available at https://github.com/adoreboard/aws-cloudformation-templates
The idea then is to bring up each server in the cluster individually, i.e. 3 config servers, 2 sharded replica servers (for 1 shard) and an arbiter. You can then add custom parameters to each of the templates, e.g. the parameters for the replica server could include:
InstanceType e.g. t2.micro
ReplicaSetName e.g. s1r (shard 1 replica)
ReplicaSetNumber e.g. 2 (used with ReplicaSetName to create name e.g. name becomes s1r2)
VpcId e.g. vpc-e4ad2b25 (not a real VPC obviously!)
SubnetId e.g. subnet-2d39a157 (not a real subnet obviously!)
GroupId (name of existing MongoDB group Id)
Route53 (boolean to add a record to an internal DNS - best practices)
Route53HostedZone (if boolean is true then ID of internal DNS using Route53)
The really cool thing about CloudFormation is that these custom parameters can have (a) a useful description for people running it, (b) special types (which render e.g. as a prefiltered combo box at run time, so mistakes are harder to make) and (c) default values. Here's an example:
"Route53HostedZone": {
"Description": "Route 53 hosted zone for updating internal DNS (Only applicable if the parameter [ UpdateRoute53 ] = \"true\"",
"Type": "AWS::Route53::HostedZone::Id",
"Default": "YA3VWJWIX3FDC"
},
This makes running the CloudFormation template an absolute breeze as a lot of the time we can rely on the default values and only tweak a couple of things depending on the server instance we're creating (or replacing).
As well as parameters, each of the 3 templates mentioned earlier has a "Resources" section which creates the instance. We can also do cool things via the "AWS::CloudFormation::Init" section, e.g.:
"Resources": {
"MongoDbConfigServer": {
"Type": "AWS::EC2::Instance",
"Metadata": {
"AWS::CloudFormation::Init": {
"configSets" : {
"Install" : [ "Metric-Uploading-Config", "Install-MongoDB", "Update-Route53" ]
},
The "configSets" in the previous example shows that creating a MongoDB server isn't simply a matter of creating an AWS instance and installing MongoDB on it but also we can (a) install CloudWatch disk / memory metrics (b) Update Route53 DNS etc. The idea is you want to automate things like DNS / Monitoring etc as much as possible.
IMO, creating a template (and therefore a stack) for each server has the very nice advantage of letting you replace a server extremely quickly via the CloudFormation web console. Also, because we have a server per template, it's easy to build the MongoDB cluster up bit by bit.
My final bit of advice on creating the templates would be to copy what works for you from other GitHub MongoDB CloudFormation templates, e.g. I used the following to make the replica servers use RAID10 (instead of the massively more expensive AWS provisioned-IOPS disks):
https://github.com/CaptainCodeman/mongo-aws-vpc/blob/master/src/templates/mongo-master.template
In your question you mentioned auto-scaling - my preference would be to add a shard / replace a broken instance manually (auto-scaling makes sense with web containers, e.g. Tomcat / Apache, but a MongoDB cluster should really grow slowly over time). However, monitoring is very important, especially the disk sizes on the shard servers, to alert you when disks are filling up (so you can either add a new shard or delete data). Monitoring can be achieved fairly easily using AWS CloudWatch metrics / alarms or the MongoDB MMS service.
If a node goes down, e.g. one of the replicas in a shard, then you can simply kill the server, recreate it using your CloudFormation template, and the disks will sync across automatically. This is my normal flow if an instance goes down, and generally no reconfiguration is necessary. I've wasted far too many hours in the past trying to fix servers - sometimes lucky, sometimes not. My backup strategy now is to run a mongodump of the important collections of the database once a day via a crontab, zip it up and upload it to AWS S3. This means that if the nuclear option happens (complete database corruption) we can recreate the entire database via mongorestore in an hour or 2.
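That daily backup could be a cron-driven script along these lines (a sketch only: the host, bucket, and paths are hypothetical, and the --archive/--gzip flags assume a reasonably recent mongodump plus a configured AWS CLI):
import datetime
import subprocess

# Hypothetical host, bucket, and paths; adjust to your cluster.
MONGO_HOST = "mongo-s1r1.internal.mycompany.com"
BUCKET = "s3://my-backup-bucket/mongo"

def backup():
    stamp = datetime.date.today().isoformat()
    archive = f"/tmp/backup-{stamp}.gz"
    # Dump the database to a gzipped archive ...
    subprocess.run(
        ["mongodump", "--host", MONGO_HOST, f"--archive={archive}", "--gzip"],
        check=True,
    )
    # ... then push it to S3.
    subprocess.run(["aws", "s3", "cp", archive, f"{BUCKET}/"], check=True)

if __name__ == "__main__":
    backup()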
However, if you create a new shard (because you're running out of space), configuration is necessary. For example, if you are adding a new shard 3, you would create 2 replica nodes (e.g. a primary named mongo-s3r1 and a secondary named mongo-s3r2) and 1 arbiter (e.g. named mongo-s3r-arb), then you'd connect via a MongoDB shell to a mongos (MongoDB router) and run this command:
sh.addShard("s3r/mongo-s3r1.internal.mycompany.com:27017,mongo-s3r2.internal.mycompany.com:27017")
NOTE: This command assumes you are using private DNS via Route53 (best practice). You can simply use the private IPs of the 2 replicas in the addShard command, but I have been very badly burned by this in the past (e.g. several months back all the AWS instances were restarted and new private IPs were generated for all of them; fixing the MongoDB cluster took me 2 days as I had to reconfigure everything manually, whereas changing the IPs in Route53 takes a few seconds ... ;-)
You could argue we should also add the addShard command to another CloudFormation template, but IMO this adds unnecessary complexity, because the template would have to know about a server running a MongoDB router (mongos) and connect to it to run the addShard command. Therefore I simply run this after the instances in a new MongoDB shard have been created.
Anyway, those are my rather rambling thoughts on the matter. The main thing is that once you have the templates in place your life becomes much easier, and it's definitely worth the effort! Best of luck! :-)

Run multiple instances of a process on a number of servers

I would like to run multiple instances of a randomized algorithm. For performance reasons, I'd like to distribute the tasks across several machines.
Typically, I run my program as follows:
./main < input.txt > output.txt
and it takes about 30 minutes to return a solution.
I would like to run as many instances of this as possible, and ideally not change the code of the program. My questions are:
1 - What online services offer computing resources that would suit my need?
2 - Practically, how should I launch all the processes remotely, get notified of their termination, and then aggregate the results (basically, pick the best solution)? Is there a simple framework I could use, or should I look into ssh-based scripting?
1 - What online services offer computing resources that would suit my need?
Amazon EC2.
2 - Practically, how should I launch all the processes remotely, get notified of their termination, and then aggregate the results (basically, pick the best solution)? Is there a simple framework I could use, or should I look into ssh-based scripting?
Amazon EC2 has an API for launching virtual machines. Once they're launched, you can indeed use ssh to control jobs, and I would recommend this solution. I would expect that other software for distributed job management exists, but it isn't likely to be any simpler to configure than ssh.
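A minimal sketch of that ssh-based flow (the host names are hypothetical; it assumes key-based ssh/scp access, that ./main and input.txt are already present on each host, and that score is whatever function ranks your solutions):
import subprocess

# Hypothetical EC2 host names; assumes key-based ssh access and that
# ./main and input.txt are already present on each host.
HOSTS = ["ec2-host-1", "ec2-host-2", "ec2-host-3"]

def run_all(score):
    """Launch one run per host, wait for all of them, and pick the best output."""
    procs = [
        subprocess.Popen(["ssh", host, "./main < input.txt > output.txt"])
        for host in HOSTS
    ]
    for p in procs:
        p.wait()  # blocks until that remote run finishes

    best = None
    for host in HOSTS:
        # Fetch each result and keep the best one according to score().
        subprocess.run(["scp", f"{host}:output.txt", f"output-{host}.txt"], check=True)
        with open(f"output-{host}.txt") as f:
            text = f.read()
        if best is None or score(text) > score(best):
            best = text
    return best
A real version would want error handling (and retries for dead instances), but the overall shape stays this simple.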