Ansible custom tool ,retry and redeploy - deployment

I am trying to use Ansible as a deployment tool for a set of hosts and I'm not able to find the right way of doing it.
I want to run a custom tool that installs rpm in a host.
Now I can do
ansible dev -i hosts.txt -m shell -a "rpmdeployer --install package_v2.rpm"
But this doesn't give a retry file(failed hosts)
I made a playbook to get the retry file
I tried a simple playbook
---
- hosts: dev
tasks:
- name: deployer
command: rpmdeployer --install package_v2.rpm
I know this not in the spirit of ansible to execute custom commands and scripts. Is there a better way of doing this? Also is there a way to keep trying till all hosts succeeds?

Is there a better way of doing this?
You can write a custom module. The custom module could even be the tool, so you get rid of installing that dependency. Modules can be written in any language but it is advisable to use Python because:
Python is a requirement of Ansible anyway
When using Python you can use the API provided by Ansible
If you'd have a custom module for your tool your task could look like this:
- name: deployer
deployer: package_v2.rpm
Also is there a way to keep trying till all hosts succeeds?
Ansible can automatically retry tasks.
- name: deployer
command: rpmdeployer --install package_v2.rpm
register: result
until: result | success
retries: 42
delay: 1
This works, given your tool returns correct exit codes (0 on success and >0 on failure). If not you can apply any custom condition, e.g. search the stdout for content etc.
I'm not aware of a tool to automatically retry when the playbook actually failed. But it shouldn't be too hard create a small wrapper script to check for the retry file and run the playbook with --limit #foo.retry until it is not re-created.
But I'm not sure that makes sense. If installing an rpm with your tool fails, I guess it is guaranteed it will also fail on any retries, unless there are unknown components in the play like downloading the rpm in the first place. So of course the download could fail then and a retry might succeed.

Related

How to wait for full cloud-initialization before VM is marked as running

I am currently configuring a virtual machine to work as an agent within Azure (with Ubuntu as image). In which the additional configuration is running through a cloud init file.
In which, among others, I have the below 'fix' within bootcmd and multiple steps within runcmd.
However the machine already gives the state running within the azure portal, while still running the cloud configuration phase (cloud_config_modules). This has as a result pipelines see the machine as ready for usage while not everything is installed/configured yet and breaks.
I tried a couple of things which did not result in the desired effect. After which I stumbled on the following article/bug;
The proposed solution worked, however I switched to a rhel image and it stopped working.
I noticed this image is not using walinuxagent as the solution states but waagent, so I tried to replacing that like the example below without any success.
bootcmd:
- mkdir -p /etc/systemd/system/waagent.service.d
- echo "[Unit]\nAfter=cloud-final.service" > /etc/systemd/system/waagent.service.d/override.conf
- sed "s/After=multi-user.target//g" /lib/systemd/system/cloud-final.service > /etc/systemd/system/cloud-final.service
- systemctl daemon-reload
After this, also tried to set the runcmd steps to the bootcmd steps. This resulted in a boot which took ages and eventually froze.
Since I am not that familiar with rhel and Linux overall, I wanted to ask help if anyone might have some suggestions which I can additionally try.
(Apply some other configuration to ensure await on the cloud-final.service within a waagent?)
However the machine already had the state running, while still running the cloud configuration phase (cloud_config_modules).
Could you please be more specific? Where did you read the machine state?
The reason I ask is that cloud-init status will report status: running until cloud-init is done running, at which point it will report status: done
I what is the purpose of waiting until cloud-init is done? I'm not sure exactly what you are expecting to happen, but here are a couple of things that might help.
If you want to execute a script "at the end" of cloud-init initialization, you could put the script directly in runcmd, and if you want to wait for cloud-init in an external script you could do cloud-init status --wait, which will print a visual indicator and eventually return once cloud-init is complete.
On not too old Azure Linux VM images, cloud-init rather than WALinuxAgent acts as the VM provisioner. The VM is marked provisioned by the Azure cloud-init datasource module very early during cloud-init processing (source), before any cloud-init modules configurable with user data. WALinuxAgent is only responsible for provisioning Azure VM extensions. It does not appear to be possible to delay sending the 'VM ready' signal to Azure without modifying the VM image and patching the source code of cloud-init Azure datasource.

Docker compose build order

I have a problem with docker compose and build order. Below is my dockerfile for starting my .net application
As you can see as part of my build process I run some tests using "RUN dotnet test backend_test/backend_test.csproj"
These tests require a mongodb database to be present.
I try to solve this dependency with docker-compose and its "depends_on" feature, see below.
However this doesn't seem to work as when I run "docker-compose up" I get the following:
The tests eventually timeout since there is no mongodb present.
Does depends_on actually affect build order at all or does it only affect start-order (i.e builds everything the proceeds to start in correct order) ?
Is there another way of doing this ? (I want tests to run as part of building my final app)
Ty in advance, let me know If you need extra information
As you guessed, depends_on is for runtime order only, not build time - it just affects docker-compose up and docker-compose stop.
I highly recommend you make all the builds independent of each other. Perhaps you need to consider separate builder and runtime images here, and / or use a Docker-based CI (Gitlab, Travis, Circle etc) to have these dependencies available for testing.
Note also, depends_on often disappoints people - as it just waits for Docker's startup to finish, not the application startup. So your DB / service / whatever may still be starting up when the container that depends on it start will start using it, causing timeouts etc. This is why HEALTH_CHECK now exists (with a similar healthcheck feature in Docker Compose)

Snakemake: sub-workflow error when executing in LSF cluster

I have two Snakemake workflows that are very similar. Both of them share a sub-workflow and a couple of includes. Both of them work when doing dry runs. Both of them use the same cluser config file, and I'm running them with the same launch command. One of them fails when submitting to the LSF cluster with this error:
Executing subworkflow wf_common.
WorkflowError:
Config file __default__ not found.
I'm wondering whether it's "legal" in Snakemake for two workflows to share a sub-workflow, like in this case, and if not, whether the fact that I ran the workflow that does work first could have this effect.
Can you try Snakemake 3.12.0? It fixed a bug with passing the cluster config to a subworkflow. I would think that this solves your problem.

How to use sidekiq on Bluemix

I'm trying to use sidekiq on Bluemix. I think that I'm on the right track, but it's not working completely.
I have an app with Sinatra that uses sidekiq jobs to make many actions. I set the following line in my manifest.yml file:
command: bundle exec rackup config.ru -p $PORT && bundle exec sidekiq -r ./server.rb -c 3
I thought that with this command sidekiq will run, However, when I call the endpoint that creates a job, it's still on the "Queue" section on the Sidekiq panel.
What actions do I need to take to get sidekiq to process the job?
PS: I'm beginner on Bluemix. I'm trying to migrate my app from Heroku to Bluemix.
Straightforward answer to this question "as asked":
Your start-up command does not evaluate a second part, the one after '&&'. If you try starting that in your local environment, the result'll be the same. The server will start up and the console will simply tail the server logs, not technically evaluating to true until you send it a kill signal (so the part after '&&' will never run at the same time).
Subbing that with just '&' sort-of-kinda fixes that, since both will run at the same time.
command: bundle exec rackup config.ru -p $PORT & bundle exec sidekiq
What is not ideal with that solution? Eh, probably a lot of stuff. The biggest offender though: having two processes active at the same time, only one of them expected and observed ( the second one ).
Sending '(bluemix) cf stop' to the application instance created by the manifest with this command stops only the observed one before decommissioning the instance - in that case we can not be sure that the first process freed up external resources by properly sending notifications or closing the connections, or whatever.
What you probably could consider instead:
1. Point one.
Bluemix is a CF implementation, and with a quick manifest.yml deploy, there is nothing preventing you from having the app server and sidekiq workers run on separate instances.
2. Better shell.
command: sh -c 'command1 & command2 & wait'
**3. TBD, probably a lot of options, but I am a beginner as well. **
Separate app instances on CloudFoundry for your rack-based application and your workers would be preferable because you can then:
Scale web / workers independently (more traffic? Just scale the web application)
Deploy each component independently, if needed
Make sure each process is health-checked
The downside of using & to join commands, as suggested in the other answer, is that the first process will launch in the background. This means you won't have reliable monitoring and automatic restarts if the first process crashes.
There's a slightly out of date example on the CloudFoundry website which demos using two application manifests (one for web, one for workers) to deploy each part independently.

How to take Jenkins master node offline using CLI?

I have been manually taking the master (and only) node offline whenever I sync our development database to avoid a bunch of false test failures. I have a script that does the full DB import and would like to automate the node maintenance as well.
How can I mark the master node temporarily offline using the command-line interface JAR?
I wrote a simple Bash script to execute Jenkins tasks.
I can authenticate using that script.
$ jenkins who-am-i
Authenticated as: david
Authorities:
david
authenticated
However, I cannot get offline-node or online-node to recognize the master node. The help states that I can omit the node name for "master", but that doesn't work.
$ jenkins offline-node
Argument "NAME" is required
java -jar jenkins-cli.jar online-node NAME
Stop using a node for performing builds temporarily, until the next "online-node" command.
NAME : Slave name, or empty string for master
It seems to be looking specifically for a slave, but I need to take the master's executor offline.
$ jenkins offline-node master
No such slave "master" exists. Did you mean "null"?
It's not exactly intuitive, but the Jenkins documentation is correct. If you want to specify the master node for offline-node or online-node, use the empty string:
java -jar jenkins-cli.jar offline-node ""
That said, you should probably use #gareth_bowles answer anyways in case you add slaves in the future.
If you only have one build executor on the master and no standalone build nodes, use this command instead:
java -jar jenkins-cli.jar quiet-down
This will stop any new builds from executing. You can use
java -jar jenkins-cli.jar cancel-quiet-down
to put Jenkins back on line; at this point it will run any builds that were queued up while it was offline.