Ansible: use the current host for a task as a variable - deployment

I have the following code that connects to the logger service haproxies and drains the first logger VM.
Then, in a separate play, it connects to the logger list of hosts, where the first host has been drained, and does a service reload.
- name: Haproxy Warmup
  hosts: role_max_logger_lb
  tasks:
    - name: analytics-backend 8300 range
      haproxy: 'state=disabled host=maxlog-rwva1-{{ env }}-1.example.com backend=analytics-backend socket=/var/run/admin.sock'
      become: true
      when: warmup is defined and buildnum is defined
    - name: logger-backend 8200
      haproxy: 'state=disabled host=maxlog-rwva1-prod-1.example.com:8200 backend=logger-backend socket=/var/run/admin.sock'
      become: true
      when: warmup is defined and buildnum is defined

- name: Warmup Deploy
  hosts: "role_max_logger"
  serial: 1
  tasks:
    - shell: pm2 gracefulReload max-logger
      when: warmup is defined and buildnum is defined
    - pause: prompt="First host has been deployed to. Please verify the logs before continuing. Ctrl-c to exit, Enter to continue deployment."
      when: warmup is defined and buildnum is defined
This code is pretty bad, and it doesn't work when I try to expand it into a rolling restart for several services behind several haproxies. I'd need to somehow drain 33% of all the app VMs from the haproxy backends, then connect to a different host list and do the reboot process on that same 33%, then go back and repeat with the 34-66% slice of the draining list and the same slice of the reboot list, and so on.
- name: 33% at a time drain
  hosts: "role_max_logger_lb"
  serial: "33%"
  tasks:
    - name: analytics-backend 8300 range
      haproxy: 'state=disabled host=maxlog-rwva1-prod-1.example.com backend=analytics-backend socket=/var/run/admin.sock'
      become: true
      when: warmup is defined and buildnum is defined
    - name: logger-backend 8200
      haproxy: 'state=disabled host=maxlog-rwva1-prod-1.example.com:8200 backend=logger-backend socket=/var/run/admin.sock'
      become: true
      when: buildnum is defined and service is defined

- name: 33% at a time deploy
  hosts: "role_max_logger"
  serial: "33%"
  tasks:
    - shell: pm2 gracefulReload {{ service }}
      when: buildnum is defined and service is defined
    - pause: prompt="One third of machines in the pool have been deployed to. Enter to continue"
I could do this much more easily in Chef: just query the Chef server for all nodes registered in a given role and do all my logic in real Ruby. If it matters, the host lists I'm calling here are actually ripped from my Chef server and fed in as JSON.
I don't know what the proper Ansible way of doing this is without being able to drop into arbitrary scripting to do all the dirty work.
I was thinking maybe I could do something super hacky like this inside of a shell command in Ansible under the deploy, which might work if there is a way of pulling the current host being processed out of the host list, like an Ansible equivalent of node['fqdn'] in Chef.
ssh maxlog-lb-rwva1-food-1.example.com 'echo "disable server logger-backend/maxlog-rwva1-food-1.example.com:8200" | socat stdio /run/admin.sock'
Or maybe there is a way I can wrap my entire thing in a serial: "33%" play and include sub-plays that do the work. Sort of like this, but again, I don't know how to properly pass a thirded list of my app servers around within the sub-plays:
- name: Deployer
  hosts: role_max_logger
  serial: "33%"
  - include: drain.yml
  - include: reboot.yml
Basically, I don't know what I'm doing. I can think of a bunch of ways of trying to do this, but they all seem terrible and overly obtuse. If I were to go down these hacky roads I would probably be better off just writing a big shell script or actual Ruby to do this.
The official Ansible documentation I've read on this has overly simplified examples that don't really map to my situation.
Particularly this example, where the load balancer is handled on the same host as the app server:
- hosts: webservers
  serial: 5
  tasks:
    - name: take out of load balancer pool
      command: /usr/bin/take_out_of_pool {{ inventory_hostname }}
      delegate_to: 127.0.0.1
http://docs.ansible.com/ansible/playbooks_delegation.html
I guess my questions are:
Is there an Ansible equivalent of Chef's node['fqdn'], to use the host currently being processed as a variable?
Am I just completely off the rails for how I'm trying to do this?

Is there an Ansible equivalent of Chef's node['fqdn'], to use the host currently being processed as a variable?
Use ansible_hostname or ansible_fqdn (both taken from the actual machine's gathered facts), or inventory_hostname (the name as defined in the inventory file), depending on which you want.
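As a quick sanity check, here is a minimal sketch (assuming the role_max_logger group from the question) that prints what each of these resolves to on every host; ansible_hostname and ansible_fqdn require fact gathering, which is on by default:
- hosts: role_max_logger
  tasks:
    - name: Show the per-host variables available for the current host
      debug:
        msg: "inventory_hostname={{ inventory_hostname }}, ansible_hostname={{ ansible_hostname }}, ansible_fqdn={{ ansible_fqdn }}"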

As you correctly noted, you need to use delegation for this task.
Here is some pseudocode for you to start with:
- name: 33% at a time deploy
  hosts: role_max_logger
  serial: "33%"
  tasks:
    - name: take out of lb
      shell: take_out_host.sh --name={{ inventory_hostname }}
      delegate_to: "{{ item }}"
      with_items: "{{ groups['role_max_logger_lb'] }}"

    - name: reload backend
      shell: reload_service.sh

    - name: add back to lb
      shell: add_host.sh --name={{ inventory_hostname }}
      delegate_to: "{{ item }}"
      with_items: "{{ groups['role_max_logger_lb'] }}"
I assume that group role_max_logger defines the servers with backend services to be reloaded, and group role_max_logger_lb defines the servers with load balancers.
This play takes all hosts from role_max_logger and splits them into 33% batches; then, for each host in the batch, it executes take_out_host.sh on each of the load balancers, passing the current backend hostname as a parameter; after all hosts from the current batch are disabled on the load balancers, the backend services are reloaded; after that, the hosts are added back to the LBs as in the first task. This operation is then repeated for every batch.
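Applied to the question's setup, the same pattern could use the haproxy module directly instead of the placeholder scripts. A sketch, assuming the logger-backend name, the pm2 reload command and the admin socket path from the question, and assuming the server names registered in haproxy match inventory_hostname:
- name: 33% at a time deploy
  hosts: role_max_logger
  serial: "33%"
  tasks:
    - name: drain current host on every load balancer
      haproxy:
        state: disabled
        host: "{{ inventory_hostname }}"   # assumes the haproxy server name matches the inventory name
        backend: logger-backend
        socket: /var/run/admin.sock
      become: true
      delegate_to: "{{ item }}"
      with_items: "{{ groups['role_max_logger_lb'] }}"

    - name: reload the service on the current host
      shell: pm2 gracefulReload max-logger

    - name: re-enable current host on every load balancer
      haproxy:
        state: enabled
        host: "{{ inventory_hostname }}"
        backend: logger-backend
        socket: /var/run/admin.sock
      become: true
      delegate_to: "{{ item }}"
      with_items: "{{ groups['role_max_logger_lb'] }}"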

Related

How to set up Kong services and routes from a YAML file in an Azure DevOps pipeline

So I have this YAML file with a Kong service, routes and plugins for a microservice:
_format_version: "1.1"
_info:
  defaults: {}
  select_tags:
    - ms-planning-and-finance
services:
  - connect_timeout: 60000
    enabled: true
    host: ms-planning-and-finance-svc.pdgr-business-services.svc.cluster.local
    name: planning-and-finance-api
    path: /api/planning-and-finance
    port: 4002
    protocol: http
    read_timeout: 60000
    retries: 5
    routes:
      - https_redirect_status_code: 426
        name: planning-and-finance
        path_handling: v0
        paths:
          - /api/planning-and-finance
        plugins:
          - config:
              bearer_only: "yes"
              client_id: ...
              client_secret: ...
              ...
    ...
and I have its CI/CD pipeline configured in Azure DevOps (a YAML pipeline), which has a Kong step that creates the service, routes and plugins using curl (HTTP PUT and POST requests).
Now I'm trying to update that step so it becomes simpler, in the sense that I would like to use the kong.yaml file above to create everything "at once". I'm still researching this but haven't found anything useful so far...
How can I "call" that kong.yaml file from my Azure YAML pipeline, in order to create those Kong resources?
After some more research, we configured deck on the agent we're using.
Now the pipelines call it to sync the changes in the YAML file with the configuration currently in the Kong gateway. More specifically, the pipeline uses deck to:
Ping the connection to Kong (tests whether it can reach the endpoint successfully);
Validate the state YAML file with the configuration to update/create in Kong;
Sync the changes in the state file with the current Kong configuration.
(deck CLI reference)
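As a rough illustration only (the step, the KONG_ADMIN_URL variable and the file location are assumptions, not taken from the actual pipeline, and exact subcommands/flags vary between deck versions), such a pipeline step could look roughly like this, using deck's ping/validate/sync subcommands against a kong.yaml checked out at the repo root:
- script: |
    deck ping --kong-addr "$(KONG_ADMIN_URL)"                # test connectivity to the Kong admin API
    deck validate -s kong.yaml                               # validate the state file
    deck sync -s kong.yaml --kong-addr "$(KONG_ADMIN_URL)"   # apply the state file to the gateway
  displayName: Sync Kong configuration with deck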

Traefik, Docker - different docker-compose.yml files

I need to use Traefik as a reverse proxy for Docker. My use case requires spinning up containers from different docker-compose.yml files. Ideally I want to use one docker-compose.yml file for Traefik itself and different docker-compose.yml files for my other websites. Our websites are interconnected but come from different development streams (and different repositories).
This is so a dev can pull the sites down to their local machine, spin up each one, develop code, and then push up to the relevant repository.
I am looking for examples on how to use labels correctly to do this (if this is the correct way).
Thanks A.
Using Traefik and its labels for dynamic deployments is probably the best choice you can make. It makes the routing very easy to work with. We use it inside Docker Swarm, but that's just Compose with a few extra steps, so you can reuse our configuration.
You must have one common network that Traefik and all containers share, so that Traefik can reach the containers it discovers from their labels.
For the labels on the services side I use:
labels:
  # Traefik
  - "traefik.enable=true"
  - "traefik.docker.network=traefik-proxy" # that common network I was talking about
  # Routers
  - "traefik.http.routers.service-name.rule=Host(`$SWARM_HOST`) && PathPrefix(`/service-path`)"
  - "traefik.http.routers.service-name.service=service-name"
  - "traefik.http.routers.service-name.entrypoints=http" # configured inside the traefik stack
  - "traefik.http.routers.service-name.middlewares=strip-path-prefix" # we use this to strip the /service-path/... part off the request so all requests hit / inside our containers (no need to worry about that on the API side)
  # Services
  - "traefik.http.services.service-name.loadbalancer.server.port=${LISTEN_PORT}"
For the actual Traefik service I will attach the whole compose configuration; you can cut out only the parts you need and skip the Swarm-specific stuff:
version: '3.9'

services:
  traefik:
    # Use a v2.x Traefik image
    image: traefik:v2.5.4
    healthcheck:
      test: ["CMD", "traefik", "healthcheck", "--ping"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 15s
    deploy:
      mode: global
      update_config:
        order: start-first
        failure_action: rollback
        parallelism: 1
        delay: 15s
        monitor: 30s
      restart_policy:
        condition: any
        delay: 10s
        max_attempts: 3
      labels:
        # Enable Traefik for this service, to make it available in the public network
        - "traefik.enable=true"
        # Use the traefik-proxy network (declared below)
        - "traefik.docker.network=traefik-proxy"
        # Dashboard router
        - "traefik.http.routers.dashboard.rule=Host(`swarm-traefik.company.org`)"
        - "traefik.http.routers.dashboard.entrypoints=http"
        # Use the special Traefik service api@internal with the web UI/Dashboard
        - "traefik.http.routers.dashboard.service=api@internal"
        # Enable HTTP Basic auth, using the admin-auth middleware defined below
        - "traefik.http.routers.dashboard.middlewares=admin-auth"
        # Define the port inside of the Docker service to use
        - "traefik.http.services.dashboard.loadbalancer.server.port=8080"
        # Middlewares
        - "traefik.http.middlewares.strip-path-prefix.replacepathregex.regex=^/[a-z0-9-]+/(.*)"
        - "traefik.http.middlewares.strip-path-prefix.replacepathregex.replacement=/$$1"
        # admin-auth middleware with HTTP Basic auth
        - "traefik.http.middlewares.admin-auth.basicauth.users=TODO_GENERATE_USER_BASIC_AUTH"
      placement:
        constraints:
          - "node.role==manager"
    volumes:
      # Mount the Docker socket, so that Traefik can read the labels of other services
      - /var/run/docker.sock:/var/run/docker.sock:ro
    command:
      # Enable the Docker provider, so that Traefik reads labels from Docker services
      - --providers.docker
      # Do not expose all Docker services, only the ones explicitly enabled
      - --providers.docker.exposedbydefault=false
      # Enable Docker Swarm mode
      - --providers.docker.swarmmode
      # Adds default network
      - --providers.docker.network=traefik-proxy
      # Create an entrypoint "http" listening on port 80
      - --entrypoints.http.address=:80
      # Enable the Traefik log, for configurations and errors
      - --log
      #- --log.level=INFO
      # Enable the Dashboard and API
      - --api
      # Enable the access log - in our case we don't need it because we have Nginx in front, which has top-level access logs
      # - --accesslog
      # Enable the /ping healthcheck route
      - --ping=true
      # Enable Zipkin tracing & configuration
      #- --tracing.zipkin=true
      #- --tracing.zipkin.httpEndpoint=https://misc-zipkin.company.org/api/v2/spans
    networks:
      # Use the public network created to be shared between Traefik and
      # any other service that needs to be publicly available
      - traefik-proxy

networks:
  traefik-proxy:
    external: true

Start a depends_on Docker service only if it is not already started

I have a service that runs in a Docker container. For reasons, I want to run suites of tests against it in parallel, for example integration tests and performance tests.
I have a docker-compose.yaml that looks like this:
# My service - the thing under test in this scenario
service:
  ports:
    - 4000:4000
  ...

# Integration tests
integration:
  depends_on:
    - service
  ...

# Performance tests
performance:
  depends_on:
    - service
  ...
I would like to continue to expose port 4000 so that components outside of the Docker world can interact with it. However, when I run these tests in parallel I get this error for one of the tests:
Cannot start service service ... 0.0.0.0:4000 failed: port is already in use.
This is because docker-compose is trying to start an instance of service for each of the tests. Is it possible to tell docker-compose to use the same instance of service? Is there a better way to achieve the same results?
I've solved this for myself and I'll document it here for anyone who faces a similar problem in the future.
Publishing ports from the service by default is the problem here. Depending on the context in which the service is started, the ports can be published or not. It's better to use the Docker network for communication between the containers.
The docker-compose.yaml would look more like this now:
service:
  # no ports declaration
  ...

integration:
  depends_on:
    - service
  environment:
    - SERVICE_URL=http://service:4000
  ...

performance:
  depends_on:
    - service
  environment:
    - SERVICE_URL=http://service:4000
  ...
Instead, ports are published when needed by whatever starts the service:
docker-compose run -p 4000:4000 service
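For example (the commands below are an illustration of this approach, not from the original post), the shared service can be started once and both suites run against that same instance in parallel:
# start the shared service once on the compose network (no host port needed)
docker-compose up -d service
# run both test suites in parallel against the same instance
docker-compose run --rm integration &
docker-compose run --rm performance &
wait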

Ansible - default to everything when no arguments are specified

I have a fairly large playbook that is capable of updating up to 10 services on a given host.
Let's say I have the services a, b, c and d, and I'd like to be able to selectively update services by passing command-line arguments, but default to updating everything when no arguments are passed. How could you do this in Ansible without being able to drop into arbitrary scripting?
Right now what I have is a when check on each service, and I define whether or not each service is true at playbook invocation. Given I may have as many as 10 services, I can't write boolean logic to accommodate every possibility.
I was hoping there is a builtin like $# in Bash that lists all arguments, so I could do a check along the lines of when: $#.length = 0
ansible-playbook deploy.yml -e "a=true b=true d=true"
when: a == "true"
when: b == "true"
when: c == "true"
when: d == "true"
I would suggest using tags. Let's say we have two services, for example nginx and fpm. Then tag the tasks for nginx with nginx and those for fpm with fpm. Below is an example of task-level tagging; let's say the playbook is named play.yml:
- name: tasks for nginx
  service: name=nginx state=reloaded
  tags:
    - nginx

- name: tasks for php-fpm
  service: name=php-fpm state=reloaded
  tags:
    - fpm
Executing ansible-playbook play.yml will by default run both tasks. But if I change the command to
ansible-playbook play.yml --tags "nginx"
then only the task with the nginx tag is executed. Tags can also be applied at the play level or role level.
Play-level tagging would look like this:
- hosts: all
  remote_user: user
  tasks:
    - include: play1.yml
      tags:
        - play1
    - include: play2.yml
      tags:
        - play2
In this case, all tasks inside play1.yml will inherit the tag play1, and the same goes for play2. When running ansible-playbook with the tag play1, only the tasks inside play1.yml are executed. If we don't specify any tag, all tasks from play1 and play2 are executed.
Note: a task is not limited to just one tag.
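For example, a single task could carry both a service-specific tag and a broader one (illustrative only; the webserver tag is an assumption, not from the original answer):
- name: tasks for nginx
  service: name=nginx state=reloaded
  tags:
    - nginx
    - webserver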
If you have a single play that you want to loop over the services, define that list in group_vars/all or somewhere else that makes sense:
services:
  - first
  - second
  - third
  - fourth
Then the tasks in your start_services.yml playbook can look like this:
- name: Ensure passed variables are in services list
  fail:
    msg: "{{ item }} not in services list"
  when: item not in services
  with_items: "{{ varlist | default(services) }}"

- name: Start services
  service:
    name: "{{ item }}"
    state: started
  with_items: "{{ varlist | default(services) }}"
Pass in varlist as a JSON array:
$ ansible-playbook start_services.yml --extra-vars='{"varlist":["first","third"]}'

How to make this Ansible chkconfig task idempotent?

I have an Ansible task like this in my playbook, to be run against a CentOS server:
- name: Enable services for automatic start
  action: command /sbin/chkconfig {{ item }} on
  with_items:
    - nginx
    - postgresql
This task reports "changed" every time I run it. How do I make this task pass the idempotency test?
The best option is to use enabled: yes with the service module:
- name: Enable services for automatic start
  service:
    name: "{{ item }}"
    enabled: yes
  with_items:
    - nginx
    - postgresql
Hope that helps you.