Is there any standard location for persistent files in AWX? - ansible-awx

In AWX, is there a standard place where I can save files like temp1.txt, which is generated by a task like the following?
- name: Create user temp1
  user:
    name: "temp1"
    groups: sudo
    shell: /bin/bash
    password: "{{ lookup('password', 'credentials/temp1.txt encrypt=sha512_crypt length=12') }}"
    createhome: yes

In the past I've done some troubleshooting of executions from Ansible Tower (AWX).
An Ansible Tower or AWX installation is usually done under the awx account and group, as is the execution of projects, playbooks, etc. To be able to save files and store information as requested
... is there a standard place where I can save files ...
you will need to have write permissions. So it is possible to place files under /var/lib/awx/projects or, after creating a folder with the correct permissions, under
ll /home/
drwx------. 2 awx awx 4096 Jan 21 12:34 awx
This means you will need to create the structure for storing your files persistently yourself.
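As a minimal sketch of that (assuming the job runs as the awx user and that /var/lib/awx/projects is writable; the custom_files sub-folder is purely illustrative), you could create the folder first and point the password lookup at an absolute path inside it. Since lookup plugins run on the node executing the play rather than on the managed host, the folder is created with delegate_to: localhost:

- name: Ensure a persistent folder for generated files exists on the execution node
  file:
    path: /var/lib/awx/projects/custom_files   # hypothetical location, adjust to your setup
    state: directory
    mode: '0750'
  delegate_to: localhost
  run_once: true

- name: Create user temp1 with the password file stored in the persistent folder
  user:
    name: "temp1"
    groups: sudo
    shell: /bin/bash
    password: "{{ lookup('password', '/var/lib/awx/projects/custom_files/temp1.txt encrypt=sha512_crypt length=12') }}"
    createhome: yes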
Please take note that one could have a clustered environment and therefore more than one Tower node, which is mainly the reason why there is no "standard place" for storage on the execution node. For more information see
Documentation
Instance Groups and Isolated Nodes
Execution Environments
Instance Groups

Related

How to run docker-compose across different lifecycle environments

How to run docker-compose across different lifecycle environments (say dev, qa, staging, production).
Sometimes a larger VM is being shared by multiple developers, so we would like to start the containers with appropriate developer-specific suffixes (say dev1, dev2, dev3, ...). Should port customization be handled manually via the environment file (i.e. the .env file)?
This is an unusual use case for docker-compose, but I'll leave some tips anyway! :)
There are two different ways to name stuff you start with docker-compose. One is to name the service that you specify under the main services: key of your docker-compose.yml file. By default, individual running containers will be assigned names indicating what project they are from (by default, the name of the directory your docker-compose file is in), what service they run (what's specified under your services: key), and which instance of that service they are (this number changes if, e.g., you're using replicas). For example, default container names for a service named myservice specified in a compose file ~/my_project/docker/docker-compose.yml will look like docker_myservice_1 (or _2, _3, etc. if more than one container is supposed to run).
You can use environment variables to specify a lot of key-value pairs in docker-compose files, but you can't conditionally specify the service name - service keys are only allowed to have alphanumeric characters in them and compose files can't look like eg:
version: "3"
services:
${ENVVAR}:
image: ubuntu:20.04
However, you can override the container naming scheme by using the container_name field in your docker-compose file (documentation for usage here). Maybe a solution you could use looks like this:
version: "3"
services:
myservice:
image: ubuntu:20.04
container_name: ${DEVELOPER_ENVVAR?err}
This will require a developer to specify DEVELOPER_ENVVAR at runtime, either by exporting it in their shell or by running docker-compose like DEVELOPER_ENVVAR=myservice_dev1 docker-compose up. Note that using container_name is incompatible with using replicas to run multiple containers for the same service - the names have to be unique for those running containers, so you'll either have to define separate services for each name, or give up on using container_name.
However, you're in a pickle if you expect multiple developers to be able to run containers with different names using the same compose file in the same directory. That's because when starting a service, docker-compose has a Recreating step: if there are already containers running for that service, it will take them over and recreate them. Ultimately, I think this is for the best - if multiple developers were trying to run the exact same compose project at once, should a developer have control over other developers' running containers? Probably not, right?
If you want multiple developers to be able to run services at once in the same VM, I think you probably want to do two things:
first, (and you may well have already done this! but it's still a good reminder) make sure that this is a good idea. Are there going to be resource contention issues (eg. for port-forwarding) that make different running instances of your project conflict? For many Docker services, there are going to be, but there probably won't be for eg. images that are meant to be run in a swarm.
second, have different compose files checked out in different directories, so that there are separate compose projects for each developer. To use .env files, one obvious option is to just maintain separate copies, one per developer directory. If maintaining one copy of .env per developer this way is unsatisfactory for your use case, you could use symlinks named .env (or whatever your env file is named) pointing to the same file somewhere else on the VM (see the sketch at the end of this answer).
After you've done this, you'll be able to tell from the container names who is running what.
If none of these are satisfactory, you might want to consider, e.g., using one VM per developer, or maybe even a different container management system than docker-compose.
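A rough sketch of the per-developer-directory approach (the DEV_SUFFIX, APP_PORT and project name values below are illustrative, not anything docker-compose mandates): each developer's checkout carries its own .env next to an otherwise identical compose file:

# .env in one developer's checkout (hypothetical values)
COMPOSE_PROJECT_NAME=myproject_dev1
DEV_SUFFIX=dev1
APP_PORT=8081

# docker-compose.yml (shared between checkouts)
version: "3"
services:
  myservice:
    image: ubuntu:20.04
    container_name: myservice_${DEV_SUFFIX?err}   # unique container name per developer
    ports:
      - "${APP_PORT?err}:8080"                    # avoid host port clashes

Setting COMPOSE_PROJECT_NAME per developer keeps the compose projects separate even if the checkout directories share a name, so one developer's docker-compose up won't try to recreate another's containers.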
I have done very similar automation, and I've used Ansible to create the docker-compose config on the fly.
Based on the input environment, the Ansible playbook creates the relevant docker-compose file. Basically, I keep a docker-compose template in my git repository with dynamic values, and the Ansible playbook populates them.
You can also use Ansible to trigger such creation or automation steps one after another.
A similar sample has been posted in the ansible_docker_splunk repository.
Basically, the whole project automates an end-to-end Docker cluster from a CSV file.
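As a minimal sketch of that idea (the inventory group, paths and the docker-compose.yml.j2 template name are hypothetical, not taken from the repository above), the playbook renders the compose file from a Jinja2 template and then brings the stack up:

- name: Render and start the docker-compose stack for one environment
  hosts: docker_hosts                           # hypothetical inventory group
  vars:
    target_env: "{{ env | default('dev') }}"    # e.g. passed in with -e env=qa
  tasks:
    - name: Ensure the target directory exists
      file:
        path: "/opt/app/{{ target_env }}"
        state: directory

    - name: Create the compose file from the template
      template:
        src: templates/docker-compose.yml.j2
        dest: "/opt/app/{{ target_env }}/docker-compose.yml"

    - name: Start the containers
      command: docker-compose up -d
      args:
        chdir: "/opt/app/{{ target_env }}"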

Is there a sane way to stagger a cron job across 4 hosts with Ansible?

I've been experimenting with writing playbooks for a few days and I'm writing a playbook to deploy an application right now. It's possible that I may be discovering it's not the right tool for the job.
The application is deployed HA across 4 systems on 2 sites and has a worst-case SLA of 1 hour. That's being accomplished with a staggered cron that runs every 15 minutes, i.e. s1 runs at 0, s2 runs at 30, s3 runs at 15, ...
I've looked through all kinds of looping and cron and other modules that Ansible supports and can't really find a way that it supports incrementing an integer by 15 as it moves across a list of hosts, and maybe that's a silly way of doing things.
The only communication that these 4 servers have with each other is a directory on a non-HA NFS share. So the reason I'm doing it as a 15 minute staggered cron is to survive network partitions and the death of the NFS connection.
My other thoughts are ... I can just bite the bullet, make it a */15, and have an architecture that relies on praying that NFS never dies, which would make writing the Ansible playbook trivial. I'm also considering deploying this with Fabric or a Bash script; it's just that the process for getting implementation plans approved, and for making changes by following them, is very heavy, and I just want to simplify the steps someone has to take late at night.
Solution 1
You could use host_vars or group_vars, either in separate files, or directly in the inventory.
I will try to produce a simple example, that fits your description, using only the inventory file (and the playbook that applies the cron):
[site1]
host1 cron_restart_minute=0
host2 cron_restart_minute=30
host3 cron_restart_minute=15
host4 cron_restart_minute=45
[site2]
host5 cron_restart_minute=0
host6 cron_restart_minute=30
host7 cron_restart_minute=15
host8 cron_restart_minute=45
This uses host variables; you could also create other groups and use group variables if the repetition became a problem (see the sketch below).
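As a hedged sketch of that group-variable variant (the slot group names below are made up for illustration), you would put the hosts that share a minute into one group and attach the variable to the group rather than to every host:

[slot_00]
host1
host5

[slot_15]
host3
host7

[slot_00:vars]
cron_restart_minute=0

[slot_15:vars]
cron_restart_minute=15

If you go this route, keep in mind that Solution 2 below assumes every host sits in exactly one group.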
In a playbook or role, you can simply refer to the variable.
On the same host:
- name: Configure the cron job
  cron:
    # your other options
    minute: "{{ cron_restart_minute }}"
On another host, you can access other hosts' variables like so:
hostvars['host2'].cron_restart_minute
Solution 2
If you want a more dynamic solution, for example because you keep adding and removing hosts, you could set a variable in a task using register or set_fact, and calculate it from, for example, the number of hosts in the only group that the current host is in.
Example:
- name: Set fact for cron_restart_minute
  set_fact:
    cron_restart_minute: "{{ (60 / (groups[group_names[0]] | length) * groups[group_names[0]].index(inventory_hostname)) | int }}"
I have not tested this expression, but I am confident that it works. It's Python / Jinja2. group_names is a one-element array, given the above inventory, since no host is in two groups at the same time. groups[group_names[0]] contains all hosts in that group; we take its length and the index of the current host, found via inventory_hostname (0, 1, 2, 3), which yields minutes 0, 15, 30 and 45 for four hosts.
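A quick way to check what the expression produces before wiring it into the cron module (a hedged sketch that only prints the computed minutes and assumes the site1 group from the inventory above):

- name: Show the computed cron minute per host
  hosts: site1
  gather_facts: false
  tasks:
    - name: Derive the minute from the host's position in its group
      set_fact:
        cron_restart_minute: "{{ (60 / (groups[group_names[0]] | length) * groups[group_names[0]].index(inventory_hostname)) | int }}"

    - name: Print the result
      debug:
        msg: "{{ inventory_hostname }} runs at minute {{ cron_restart_minute }}"

With four hosts in the group this prints 0, 15, 30 and 45, which can then be fed to the cron module's minute parameter as in Solution 1.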
Links to relevant docs:
Inventory
Variables, specifically this part.

Copying directories into minikube and persisting them

I am trying to copy some directories into the minikube VM to be used by some of the pods that are running. These include API credential files and template files used at run time by the application. I have found you can copy files using scp into the /home/docker/ directory; however, these files are not persisted across reboots of the VM. I have read that files/directories are persisted if stored in the /data/ directory on the VM (among others), however I get permission denied when trying to copy files to these directories.
Are there:
A: Any directories in minikube that will persist data that aren't protected in this way
B: Any other ways of doing the above without running into this issue (could well be going about this the wrong way)
To clarify, I have already been able to mount the files from /home/docker/ into the pods using volumes, so it's just the persisting data I'm unclear about.
Kubernetes has dedicated object types for these sorts of things. API credential files you might store in a Secret, and template files (if they aren't already built into your Docker image) could go into a ConfigMap. Both of them can either get translated to environment variables or mounted as artificial volumes in running containers.
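As a brief sketch (all names, keys, file contents and mount paths below are made up for illustration), the credentials could live in a Secret and the templates in a ConfigMap, both mounted into a pod like this:

apiVersion: v1
kind: Secret
metadata:
  name: api-credentials
type: Opaque
stringData:
  credentials.json: |
    {"token": "replace-me"}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-templates
data:
  report.tmpl: |
    Hello, $NAME
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: myapp:latest                 # hypothetical image
      volumeMounts:
        - name: creds
          mountPath: /etc/app/credentials
          readOnly: true
        - name: templates
          mountPath: /etc/app/templates
  volumes:
    - name: creds
      secret:
        secretName: api-credentials
    - name: templates
      configMap:
        name: app-templates

Because the data lives in Kubernetes API objects rather than on the minikube VM's filesystem, it should survive VM restarts, and kubectl apply can recreate it anywhere.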
In my experience, trying to store data directly on a node isn't a good practice. It's common enough to have multiple nodes, to not directly have login access to those nodes, and for them to be created and destroyed outside of your direct control (imagine an autoscaler running on a cloud provider that creates a new node when all of the existing nodes are 90% scheduled). There's a good chance your data won't (or can't) be on the host where you expect it.
This does lead to a proliferation of Kubernetes objects and associated resources, and you might find a Helm chart to be a good resource to tie them together. You can check the chart into source control along with your application, and deploy the whole thing in one shot. While it has a couple of useful features beyond just packaging resources together (a deploy-time configuration system, a templating language for the Kubernetes YAML itself) you can ignore these if you don't need them and just write a bunch of YAML files and a small control file.
For minikube, files kept in the $HOME/.minikube/files directory on the host are copied by minikube into the / directory of the VM (preserving the path relative to the files directory).

Opsworks Chef 12 recipes

Has anyone attempted to convert Opsworks Chef v11 recipes to Chef v12?
I'm running multiple stacks on Chef 11 and decided to start converting some of them to Chef 12. Since AWS dropped their OpsWorks app layers, such as the Rails layer recipes, we (OpsWorks users) are now responsible for creating the deploy user, checking out git repos into deploy_to, etc.
It's all good with the flexibility and no more namespace conflicts, but we are missing all that good stuff OpsWorks was giving us for free.
I wonder if someone has converted recipes for Chef 12 and open-sourced them? Otherwise, is the community interested in these recipes at all? I'm pretty sure I'm not alone here.
Thank you in advance!
The opsworks_ruby cookbook on the Supermarket is basically everything you need. It even puts the apps into the same directories (i.e. /srv/www/app_name/), sets up your database.yml, etc, etc.
The main difference between this recipe and other non-OpsWorks recipes is that this will pull everything out of the OpsWorks configuration for you. You don't have to customize the recipes, just make sure your app and layers are named correctly - it'll build everything from there - including your RDS configuration for the database.yml!
One difference is that the layers in OpsWorks won't be "Ruby aware", so you won't have fields for Rails-ish or Ruby-ish things and instead will need to manage those elsewhere. The way ENV vars are loaded is a little different too.
Also be sure to read up on AWS' implementation of Chef 12 for OpsWorks. They technically have two Chef cookbooks running, their internal one and yours. Theirs handles things like keeping the agent up to date, loading up users (for SSH) and wiring up monitoring; you'll have to manage the rest.
We've either replaced stuff from their huge cookbook with individual cookbooks from the Supermarket or just rewrote it. For instance, the old Chef 11 opsworks_initial_setup had a couple of things around tweaking network and Linux settings - we recreated that.
It also does use deploy users as applicable, for instance:
$ ps -eo user,command
USER COMMAND
// snip
root nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
aws opsworks-agent: master 10820
aws opsworks-agent: keep_alive of master 10820
aws opsworks-agent: statistics of master 10820
aws opsworks-agent: process_command of master 10820
deploy unicorn_rails master --env production --daemonize -c /srv/www/app/shared/config/unicorn.conf
deploy unicorn_rails worker[0] --env production --daemonize -c /srv/www/app/shared/config/unicorn.conf
deploy unicorn_rails worker[1] --env production --daemonize -c /srv/www/app/shared/config/unicorn.conf
deploy unicorn_rails worker[2] --env production --daemonize -c /srv/www/app/shared/config/unicorn.conf
deploy unicorn_rails worker[3] --env production --daemonize -c /srv/www/app/shared/config/unicorn.conf
nginx nginx: worker process
nginx nginx: worker process
Just a small example of the process output, but root boots things as needed and each process uses its own user to limit rights and access.
I think the most common way is to use the "application" cookbook from the supermarket: https://supermarket.chef.io/cookbooks/application/versions/4.1.6 (which is also based on Poise). Attention: use version ~4, they removed almost all of the good features in v5.
It will create the directory structure, supports different deploy-strategies and offers some events to hook.
Be aware: in my opinion, the OpsWorks documentation is only semi-good when it comes to the "deploy with OpsWorks and Chef 12" topic: the information from the GUI (like the repo URL etc.) is not on the node object but in a data bag for the application. For debugging it can be very helpful to have a look into the /var/chef/runs/<run-id>/ directory to see what is available from where.
Small snippet that shows the idea:
app = search("aws_opsworks_app").first

application "#{app['shortname']}" do
  owner 'root'
  group 'root'
  repository app['app_source']['url']
  revision 'master'
  path "/srv/#{app['shortname']}"
end
This will create the releases/current directory structure under /srv and check out the code. Note: you might think that the SSH key you specify in the GUI is somehow automatically put in the proper place. It's not; you'll have to take care of that on your own. Check the Chef 11 OpsWorks cookbook: https://github.com/aws/opsworks-cookbooks/blob/release-chef-11.10/scm_helper/libraries/git.rb
I don't know about the old OpsWorks cookbooks, but check out https://github.com/poise/application_examples/ for some examples of doing Rails (and more) deploys using plain Chef (this will work on OpsWorks too).

How to handle file paths in distributed environment

I'm working on setting up a distributed celery environment to do OCR on PDF files. I have about 3M PDFs and OCR is CPU-bound so the idea is to create a cluster of servers to process the OCR.
As I'm writing my task, I've got something like this:
@app.task
def do_ocr(pk, file_path):
    content = run_tesseract_command(file_path)
    item = Document.objects.get(pk=pk)
    item.content = content
    item.save()
The question I have is what the best way is to make the file_path work in a distributed environment. How do people usually handle this? Right now all my files simply live in a simple directory on one of our servers.
If you are in a Linux environment, the easiest way is to mount a remote filesystem, using sshfs, in the /mnt folder for each node in the cluster. Then you can pass the node name to the do_ocr function and work as if all data were local to the current node.
For example, say your cluster has N nodes named node1, ..., nodeN.
Let's configure node1: for each other node, mount its remote filesystem. Here's a sample of node1's /etc/fstab file:
sshfs#user@node2:/var/your/app/pdfs /mnt/node2 fuse port=<port>,defaults,user,noauto,uid=1000,gid=1000 0 0
....
sshfs#user@nodeN:/var/your/app/pdfs /mnt/nodeN fuse port=<port>,defaults,user,noauto,uid=1000,gid=1000 0 0
On the current node (node1), create a symlink named after the current server, pointing to the PDFs' path:
ln -s /var/your/app/pdfs node1
Your /mnt folder should then contain the remote filesystems and the symlink:
user@node1:/mnt$ ls -lsa
0 lrwxrwxrwx 1 user user 16 apr 12 2016 node1 -> /var/your/app/pdfs
0 lrwxrwxrwx 1 user user 16 apr 12 2016 node2
...
0 lrwxrwxrwx 1 user user 16 apr 12 2016 nodeN
Then your function should look like this:
import os

MOUNT_POINT = '/mnt'

@app.task
def do_ocr(pk, node_name, file_path):
    content = run_tesseract_command(os.path.join(MOUNT_POINT, node_name, file_path))
    item = Document.objects.get(pk=pk)
    item.content = content
    item.save()
It works as if all files were on the current machine, but the remote logic works for you transparently.
Well, there are multiple ways to handle it, but let's stick to one of the simplest ones:
since you'd like to process a large number of files using multiple servers, my first suggestion would be to use the same OS on each server, so you won't have to worry about cross-platform compatibility
using the word 'cluster' indicates that all of those servers should know their mutual state, which adds complexity; try to switch to a farm of stateless workers (by 'stateless' I mean "not knowing about the others", as they should be aware of at least their own state, e.g. IDLE, IN_PROGRESS, QUEUE_FULL, or more if needed)
for the file list processing part you could use a pull or a push model:
the push model could be easily implemented by a simple app that crawls the files and dispatches them (e.g. over SCP, FTP, whatever) to a set of available servers; servers can monitor their local directories for changes and pick up new files to process; it's also very easy to scale - just spin up more servers and update the push client (even at runtime); the only limit is your push client's performance
the pull model is a little bit trickier, because you have to handle more complexity; having a set of servers implies having a proper starting index and offset per node, which makes error handling more difficult, and it doesn't scale easily (imagine adding twice as many servers to speed up the processing and having to update the indices and offsets properly on each node... seems like an error-prone solution)
I assume that network traffic isn't a big concern - with 3M files to process it will be generated somewhere, one way or the other
collecting/storing the results is a different ballpark, but here the list of possible solutions is limitless
Since I'm missing a lot of your architecture details and your application specifics, you can take this as a guiding answer rather than a strict one.
You can take this approach, in the following order:
1- Deploy an internal file server that stores all the files in one place and serves them.
Example:
http://internal-ip-address/storage/filenameA.pdf
http://internal-ip-address/storage/filenameB.pdf
http://internal-ip-address/storage/filenameC.pdf
and so on ...
2- Install/Deploy Redis
3- Create an upload client/service/process that takes the files you want to upload and passes them to the above storage location (/storage/), so your files will be available once they are uploaded; at the same time, push the full file URL to a predefined Redis list/queue (built on the linked-list data structure), like this: http://internal-ip-address/storage/filenameA.pdf
You can get more details about LPUSH and RPOP under Redis Lists here: http://redis.io/topics/data-types-intro
Examples:
A file upload form that stores the files directly in the storage area
A file upload utility/command-line tool/background process that you create yourself, or an existing tool, that uploads files to the storage location, picking them up from a specific place, be it a web address or some other server that has your files
4- Now we come to your Celery workers: each worker should pull (RPOP) one of the file URLs from the Redis queue, download the file from your internal file server (built in the first step), and do the required processing the way you want.
An important thing to note from the Redis documentation:
Lists have a special feature that make them suitable to implement queues, and in general as a building block for inter process communication systems: blocking operations.
However it is possible that sometimes the list is empty and there is nothing to process, so RPOP just returns NULL. In this case a consumer is forced to wait some time and retry again with RPOP. This is called polling, and is not a good idea in this context because it has several drawbacks.
So Redis implements commands called BRPOP and BLPOP which are versions of RPOP and LPOP able to block if the list is empty: they'll return to the caller only when a new element is added to the list, or when a user-specified timeout is reached.
Let me know if that answers your question.
Things to keep in mind
You can add as many workers as you want since this solution is very scalable; your only bottleneck is the Redis server, which you can cluster, and whose queue you can persist in case of a power outage or server crash.
You can replace Redis with RabbitMQ, Beanstalk, Kafka, or any other queuing/messaging system, but Redis has been chosen in this race due to its simplicity and the myriad of features it introduces out of the box.