Downsides of using Makefile rules for deployment?

I have an open-source project where I'm using Make for things like tests and static checks. The deployment process is approximately this:
make html
rsync static_pages my_domain.com:static_pages
ssh my_domain.com "cp -r static_pages /var/www/webroot"
Is there a reason not to have this deployment process publicly in the Makefile? Does Make have gotchas that make it unsuitable for deployment? Is there a security issue in revealing server names or directory structure? Or something else that I haven't thought of?

As long as this does the job, it is OK. You worry about the privacy of some rules, but why would your make script be any more exposed to prying eyes than any other process on your build or deployment workstation? If the concern is that the Makefile is published as open source, an easy workaround could be to keep the sensitive names in environment variables, e.g.
deploy:
	rsync static_pages $(DOMAIN):static_pages
	ssh $(DOMAIN) "cp -r static_pages $(WEBROOT)"
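One way to supply those values without publishing them (a sketch; the config.mk filename and the -include trick are my assumptions, not part of the question) is to keep them in a small, git-ignored include that exists only on the machine that deploys:
# config.mk -- git-ignored; lives only on the deployment machine
DOMAIN  = my_domain.com
WEBROOT = /var/www/webroot
Then add -include config.mk near the top of the public Makefile (the leading "-" keeps make from complaining when the file is absent). Alternatively, pass the values per invocation:
DOMAIN=my_domain.com WEBROOT=/var/www/webroot make deploy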

Related

Docker and sensitive information used at run-time

We are dockerizing an application (written in Node.js) that will need to access some sensitive data at run-time (API tokens for different services) and I can't find any recommended approach to deal with that.
Some information:
The sensitive information is not in our codebase, but it's kept on another repository in encrypted format.
On our current deployment, without Docker, we update the codebase with git, and then we manually copy the sensitive information via SSH.
The docker images will be stored in a private, self-hosted registry
I can think of some different approaches, but all of them have some drawbacks:
1. Include the sensitive information in the Docker images at build time. This is certainly the easiest one; however, it makes them available to anyone with access to the image (I don't know if we should trust the registry that much).
2. Like 1, but having the credentials in a data-only image.
3. Create a volume in the image that links to a directory in the host system, and manually copy the credentials over SSH like we're doing right now. This is very convenient too, but then we can't spin up new servers easily (maybe we could use something like etcd to synchronize them?)
4. Pass the information as environment variables. However, we have 5 different pairs of API credentials right now, which makes this a bit inconvenient. Most importantly, however, we would need to keep another copy of the sensitive information in the configuration scripts (the commands that will be executed to run Docker images), and this can easily create problems (e.g. credentials accidentally included in git, etc).
PS: I've done some research but couldn't find anything similar to my problem. Other questions (like this one) were about sensitive information needed at build-time; in our case, we need the information at run-time
I've used your options 3 and 4 to solve this in the past. To rephrase/elaborate:
Create a volume in the image that links to a directory in the host system, and manually copy the credentials over SSH like we're doing right now.
I use config management (Chef or Ansible) to set up the credentials on the host. If the app takes a config file needing API tokens or database credentials, I use config management to create that file from a template. Chef can read the credentials from an encrypted data bag or from attributes, set up the files on the host, then start the container with a volume just like you describe.
Note that in the container you may need a wrapper to run the app. The wrapper copies the config file from wherever the volume is mounted to wherever the application expects it, then starts the app.
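A minimal sketch of such a wrapper, assuming the volume is mounted at /secrets and the app is the Node.js server from the question (all paths and the start command are placeholders):
#!/bin/sh
# entrypoint.sh -- hypothetical wrapper used as the container's ENTRYPOINT
set -e
# Copy the config from the mounted volume to where the application expects it
cp /secrets/config.json /app/config/config.json
# Hand control over to the real application process
exec node /app/server.js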
Pass the information as environment variables. However, we have 5 different pairs of API credentials right now, which makes this a bit inconvenient. Most importantly, however, we would need to keep another copy of the sensitive information in the configuration scripts (the commands that will be executed to run Docker images), and this can easily create problems (e.g. credentials accidentally included in git, etc).
Yes, it's cumbersome to pass a bunch of env variables using -e key=value syntax, but this is how I prefer to do it. Remember the variables are still exposed to anyone with access to the Docker daemon. If your docker run command is composed programmatically it's easier.
If not, use the --env-file flag, as discussed in the Docker docs. You create a file with key=value pairs, then run a container using that file.
$ cat >> myenv << END
FOO=BAR
BAR=BAZ
END
$ docker run --env-file myenv <image>
That myenv file can be created using chef/config management as described above.
If you're hosting on AWS you can leverage KMS here. Keep either the env file or the config file (that is passed to the container in a volume) encrypted via KMS. In the container, use a wrapper script to call out to KMS, decrypt the file, move it into place and start the app. This way the config data is never stored unencrypted on disk.
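A hedged sketch of that wrapper, assuming the KMS-encrypted file is mounted at /secrets/config.json.enc and the app reads /app/config.json (the paths and the start command are illustrative, not from the answer):
#!/bin/sh
set -e
# Decrypt the KMS-encrypted config mounted into the container; the AWS CLI returns
# the plaintext base64-encoded, so decode it before writing it out
aws kms decrypt \
    --ciphertext-blob fileb:///secrets/config.json.enc \
    --output text --query Plaintext | base64 -d > /app/config.json
# Start the application once the plaintext config is in place
exec node /app/server.js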

How to deploy changes to a Cassandra CQL schema

We have an application which is using Cassandra for its database. How should we deploy schema changes in a live production environment?
In development we are just blowing the database away and recreating it with a 'database.cql' script kept in version control. This clearly isn't a solution in production.
In the relational world I would either use a sequence of upgrade scripts and apply them in order, or use a tool to interactively compare the staging and production databases and make the appropriate schema changes.
How do I solve the same problem in Cassandra?
Here's one I've started and have been using for a while.
https://github.com/heartysoft/aedes
It supports multiple environments and versioning. Since we're Windows based, it's mainly PowerShell, but there's no reason a bash script couldn't be written to do the equivalent (a rough sketch of that idea follows after the file listing below). The PowerShell script itself is extremely simple. It requires PowerShell v3+. Usage is pretty easy:
aedes.ps1 192.168.40.4 [-u username -p password -env dev]
will look for schema files in the ..\schema folder. Schema files are expected to have an n_ prefix. Environment-specific files have a .env.cql postfix. So, if the files are:
1_people.dev.cql
1_people.prod.cql
2_people_some_indexes.cql
3_jobs.dev.cql
3_jobs.prod.cql
4_jobs_something_changed.cql
and you run it for prod, then the files with .prod.cql and the ones with no env postfix will be applied in order. You can also specify a $start version to control where to start applying from (e.g. if start is specified as 3, then anything with 1_ and 2_ will be skipped).
It's pretty basic but seems to work quite well. We just have Cassandra downloaded (not installed) on the "applier machine" (which could be your own machine, i.e. not part of a cluster) and have cqlsh on the PATH for easier application. We did (and do) have plans for more features, but it's working nicely as is for the time being.
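A rough bash sketch of the same idea, for the curious (this is not part of aedes; the host, env and start values are placeholders, and it assumes cqlsh is on the PATH):
#!/usr/bin/env bash
# apply_schema.sh <host> [env] [start] -- hypothetical bash counterpart of the PowerShell applier
set -e
HOST="$1"; ENV="${2:-dev}"; START="${3:-1}"
for name in $(ls ../schema | sort -t_ -k1 -n); do    # the n_ prefix drives the ordering
    ver="${name%%_*}"
    if [ "$ver" -lt "$START" ]; then continue; fi     # honour the $start version
    base="${name%.cql}"
    if [[ "$base" == *.* ]]; then                     # file has an .env postfix
        if [[ "$base" != *".$ENV" ]]; then continue; fi
    fi
    echo "Applying $name"
    cqlsh "$HOST" -f "../schema/$name"                # add -u/-p here if authentication is enabled
done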
Since there wasn't an existing tool, I ended up writing one.
It is called cql-migrate, and provides incremental updates to a deployed Cassandra schema.
[update] Since writing this, I have found a couple more options: one for Rails and another for Go.

Google Cloud Storage - GSUtil Update fails because file used by another process

We use an ETL process to pull data from Google Cloud Storage, but annoyingly it hangs every time Google releases updates to GSUtil, because it sits at a prompt asking if you want to update the library. That's fine if you are doing this manually, but not cool when it's being run in an automated SSIS package, as jobs don't finish for days and you keep wasting time on the same stupid cause.
I thought I was going to be clever and add "python gsutil update -n" to the top of the bash script I'm automating the building/execution of in my SSIS package, in the hope of curbing this problem, but when I run this command from the prompt in either Windows Server 2008r2 or Windows 7 I get the following:
C:\gsutil>python gsutil update -f -n
Copying gs://pub/gsutil.tar.gz...
OSError: The process cannot access the file because it is being used by another process.
Any help?
P.S. - Also, Google engineers... can you PLEASE remove these prompts for all of us using these tools in automated processes? I have other things to work on, instead of constantly going back to things like this every few days/weeks.
What version of gsutil are you running?
Also, to be clear: Are you talking about the fact that gsutil checks for available software updates periodically, and if it finds them it then prompts you whether you want to update? Or are you talking about the fact that the gsutil update command asks if you want to perform the update?
If the former, gsutil shouldn't be performing this check/prompting if you are running gsutil from a script not connected to a TTY. If that's not working correctly we'd like to know.
And also, if that's the problem you're having, you can completely disable automated software update checks by setting software_update_check_period=0 in the [GSUtil] section of your .boto config file.
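For reference, the relevant lines in the .boto file would look something like this (the file's location depends on how gsutil was installed):
[GSUtil]
software_update_check_period = 0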

what's the purpose of the '--delete-after' option of wget?

I came across the "--delete-after" option when I was reading the manpage of wget.
What's the purpose of providing such an option? Is it just for testing that the page downloads OK? Or are there other situations where this option is useful? I hope you can give me some hints.
With reference to your comments above, I'm providing some examples of how we use it. We have a few websites running on Rackspace Cloud Sites, which is a managed cloud hosting solution. We don't have access to regular cron.
We had an issue with runaway usage on a site using WordPress because WP kept calling wp-cron.php. To give you a sense of the runaway usage: it used up in one day the CPU cycles allotted for a month. What I did was disable the wp-cron.php call within WordPress and call it manually through wget instead. I'm not interested in the output from the process, so if I didn't use --delete-after with wget (wget ... > /dev/null 2>&1 works well too), the folder where wget runs would fill up with hundreds of useless copies of the output, one for each time the script was called.
We also have SugarCRM installed, and that system requires its cron script to be called to handle system maintenance. We use wget silently for that as well. Basically a lot of these kinds of web-based systems have cron scripts. If you can't call your scripts directly, say using PHP on the machine, then the other option is calling them silently with wget.
The command to call these cron scripts is quite basic:
wget --delete-after "http://example.com/cron.php?parameters=if+needed"
(the quotes keep the shell from interpreting the ? in the URL)
I'm using wget (with cron) to automate commands to a web application, so I have no interest in the contents of the pages. --delete-after is ideal for this.
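A typical crontab entry for this looks something like the following (the schedule is just an example):
# Hit the application's cron endpoint every 15 minutes, quietly, keeping nothing on disk
*/15 * * * * wget -q --delete-after "http://example.com/cron.php"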
You can use it for testing if a page is downloading ok, but usually it's used to force proxy servers to cache their contents.
If you're sitting on a connection where a network appliance caches content between the site and your endpoint, and you have a site that's popular among users on that network, then what you may want to do as a sysadmin is use a machine just downstream of the proxy to script a recursive (-r) or mirror (-m) wget operation.
The proxy appliance will see this and pre-cache the site and its assets, thus making site accesses for users behind said proxy a bit faster.
You'd then want to specify --delete-after to free up the disk space used, unless you want to keep a local copy of all the sites you force into the cache.
Sometimes you only need to visit a website to set an IP address, say if you are rolling your own dynamic DNS service.

pylons/paste config files in fastcgi (deployment)

I'm running a pylons app using fastcgi and apache2. There are two versions (different revisions from my svn repo), one for staging and one for production. I'd like them to use different paste config files.
Right now, my dispatch.fcgi inside htdocs in the pylons app just uses one config file (so both stage and live use the same configuration). I'd like to be able to have debugging enabled on the stage server but not on the live server, for example. Any suggestions?
One approach would be to have more than one dispatch.fcgi prepared (referencing different INI files), then run a script on deployment to copy the correct one into the active position.
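For instance, the deployment script could be as small as this (the file names and the DEPLOY_ENV variable are hypothetical):
#!/bin/sh
# Pick the dispatch file for this environment and drop it into the active position
cp "htdocs/dispatch.${DEPLOY_ENV}.fcgi" htdocs/dispatch.fcgi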
Another approach would be to have two .fcgi files, then use an IfDefine directive to select the proper rules in your main httpd.conf.
In other words, on the staging server you start httpd with httpd -D staging, then put the staging config inside <IfDefine staging></IfDefine> and the other config inside <IfDefine !staging></IfDefine>.
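A rough httpd.conf sketch of that idea (the paths and the ScriptAlias-based FastCGI setup are assumptions; substitute whatever rules actually differ between your two environments):
<IfDefine staging>
    # Rules used only when httpd is started with -D staging
    ScriptAlias / /srv/app-staging/htdocs/dispatch.fcgi/
</IfDefine>
<IfDefine !staging>
    # Default (live) rules
    ScriptAlias / /srv/app-live/htdocs/dispatch.fcgi/
</IfDefine>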
The limitation of this approach is that since IfDefine is binary, going past two options while still having a "default" option requires a bunch of extra lines. It's not the end of the world, and if you require a parameter to be given on all deployments it stays clean.
Still, I would use option #1.