Docker and sensitive information used at run-time - deployment

We are dockerizing an application (written in Node.js) that will need to access some sensitive data at run-time (API tokens for different services) and I can't find any recommended approach to deal with that.
Some information:
The sensitive information is not in our codebase, but it's kept on another repository in encrypted format.
On our current deployment, without Docker, we update the codebase with git, and then we manually copy the sensitive information via SSH.
The docker images will be stored in a private, self-hosted registry
I can think of some different approaches, but all of them have some drawbacks:
Include the sensitive information in the Docker images at build time. This is certainly the easiest one; however, it makes them available to anyone with access to the image (I don't know if we should trust the registry that much).
Like 1, but having the credentials in a data-only image.
Create a volume in the image that links to a directory in the host system, and manually copy the credentials over SSH like we're doing right now. This is very convenient too, but then we can't spin up new servers easily (maybe we could use something like etcd to synchronize them?)
Pass the information as environment variables. However, we have 5 different pairs of API credentials right now, which makes this a bit inconvenient. Most importantly, however, we would need to keep another copy of the sensitive information in the configuration scripts (the commands that will be executed to run Docker images), and this can easily create problems (e.g. credentials accidentally included in git, etc).
PS: I've done some research but couldn't find anything similar to my problem. Other questions (like this one) were about sensitive information needed at build-time; in our case, we need the information at run-time

I've used your options 3 and 4 to solve this in the past. To rephrase/elaborate:
Create a volume in the image that links to a directory in the host system, and manually copy the credentials over SSH like we're doing right now.
I use config management (Chef or Ansible) to set up the credentials on the host. If the app takes a config file needing API tokens or database credentials, I use config management to create that file from a template. Chef can read the credentials from encrypted data bag or attributes, set up the files on the host, then start the container with a volume just like you describe.
Note that in the container you may need a wrapper to run the app. The wrapper copies the config file from whatever the volume is mounted to wherever the application expects it, then starts the app.
Pass the information as environment variables. However, we have 5 different pairs of API credentials right now, which makes this a bit inconvenient. Most importantly, however, we would need to keep another copy of the sensitive information in the configuration scripts (the commands that will be executed to run Docker images), and this can easily create problems (e.g. credentials accidentally included in git, etc).
Yes, it's cumbersome to pass a bunch of env variables using -e key=value syntax, but this is how I prefer to do it. Remember the variables are still exposed to anyone with access to the Docker daemon. If your docker run command is composed programmatically it's easier.
If not, use the --env-file flag as discussed here in the Docker docs. You create a file with key=value pairs, then run a container using that file.
$ cat >> myenv << END
FOO=BAR
BAR=BAZ
END
$ docker run --env-file myenv
That myenv file can be created using chef/config management as described above.
If you're hosting on AWS you can leverage KMS here. Keep either the env file or the config file (that is passed to the container in a volume) encrypted via KMS. In the container, use a wrapper script to call out to KMS, decrypt the file, move it in to place and start the app. This way the config data is not exposed on disk.

Related

Is it possible to load a user command from an in-memory namespace?

From what I can tell, user commands can only be loaded from namespace scripts located in the directories specified by the SALT cmddir setting.
But I have an interest in loading a user command directly from an in-memory namespace, without ever having a namespace script reside on a locally accessible disk.
An example use case might be loading a namespace that defines one or more user commands from a remote repository via ]get, and then "installing" the user commands into the workspace directly from memory.
Is this possible?
Bad news: No, you cannot currently do that.
Good news: I'm working on a rewrite of the user command system which makes this trivial to do.
Source: I'm in charge of the user command system at Dyalog.

customize the location of save files (apkovl)

Trying to be concise, I would like alpine to save backups made with lbu ci within a subdir of the bootable disk while the behavior is to put the saves in its root.
Insight
I have searched the internet and tried various things but they all failed.
Here it talks about the boot parameter of syslinux.conf:
A relative path, interpreted relative to the root of the alpine_dev.
This is my append inside syslinux.conf
This boot parameter should be used to specify where the backups are at startup, while where they should be saved with lbu ci should be written in /etc;/lbu/lbu.conf.
however, I don't understand how to use these variables here either,
although it should be clear.

How to use different PostgreSQL instace for testing from production

I need my docker containers to connect to different PostgreSQL server, depending on the environment (test & production). What I desire is testing my application locally with local database instance, and push the fixes after. From what I read, PostgreSQL's default connection parameters can be determined by environment variables, so I think writing two different environment variables files for test/production and pass the desired one in with --env-file option of docker run command would do the trick.
Is this a suitable way to test & deploy an web application? If not, what would be a better solution?
Yes, in general this is the approach you should take when using Docker. Store your DB connection parameters (URL, Username, Password) in environment variables. There is no real need to use an environment file unless you have a ton of environment variables, you could also pass an arbitrary number of "-e" parameters to docker as well. This is closer to how services like amazon's ECS will expect you to pass parameters.
If you're going to write those to a file, make sure that the file is encrypted/encoded somehow - storing database passwords in a file in plaintext is not a great security practice.

Is there a way to upload data from FTP server to Amazon S3? [duplicate]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 3 years ago.
Improve this question
Is there a way to connect to an Amazon S3 bucket with FTP or SFTP rather than the built-in Amazon file transfer interface in the AWS console? Seems odd that this isn't a readily available option.
There are three options.
You can use a native Amazon Managed SFTP service (aka AWS Transfer for SFTP), which is easier to set up.
Or you can mount the bucket to a file system on a Linux server and access the files using the SFTP as any other files on the server (which gives you greater control).
Or you can just use a (GUI) client that natively supports S3 protocol (what is free).
Managed SFTP Service
In your Amazon AWS Console, go to AWS Transfer for SFTP and create a new server.
In SFTP server page, add a new SFTP user (or users).
Permissions of users are governed by an associated AWS role in IAM service (for a quick start, you can use AmazonS3FullAccess policy).
The role must have a trust relationship to transfer.amazonaws.com.
For details, see my guide Setting up an SFTP access to Amazon S3.
Mounting Bucket to Linux Server
Just mount the bucket using s3fs file system (or similar) to a Linux server (e.g. Amazon EC2) and use the server's built-in SFTP server to access the bucket.
Install the s3fs
Add your security credentials in a form access-key-id:secret-access-key to /etc/passwd-s3fs
Add a bucket mounting entry to fstab:
<bucket> /mnt/<bucket> fuse.s3fs rw,nosuid,nodev,allow_other 0 0
For details, see my guide Setting up an SFTP access to Amazon S3.
Use S3 Client
Or use any free "FTP/SFTP client", that's also an "S3 client", and you do not have setup anything on server-side. For example, my WinSCP or Cyberduck.
WinSCP has even scripting and .NET/PowerShell interface, if you need to automate the transfers.
Update
S3 now offers a fully-managed SFTP Gateway Service for S3 that integrates with IAM and can be administered using aws-cli.
There are theoretical and practical reasons why this isn't a perfect solution, but it does work...
You can install an FTP/SFTP service (such as proftpd) on a linux server, either in EC2 or in your own data center... then mount a bucket into the filesystem where the ftp server is configured to chroot, using s3fs.
I have a client that serves content out of S3, and the content is provided to them by a 3rd party who only supports ftp pushes... so, with some hesitation (due to the impedance mismatch between S3 and an actual filesystem) but lacking the time to write a proper FTP/S3 gateway server software package (which I still intend to do one of these days), I proposed and deployed this solution for them several months ago and they have not reported any problems with the system.
As a bonus, since proftpd can chroot each user into their own home directory and "pretend" (as far as the user can tell) that files owned by the proftpd user are actually owned by the logged in user, this segregates each ftp user into a "subdirectory" of the bucket, and makes the other users' files inaccessible.
There is a problem with the default configuration, however.
Once you start to get a few tens or hundreds of files, the problem will manifest itself when you pull a directory listing, because ProFTPd will attempt to read the .ftpaccess files over, and over, and over again, and for each file in the directory, .ftpaccess is checked to see if the user should be allowed to view it.
You can disable this behavior in ProFTPd, but I would suggest that the most correct configuration is to configure additional options -o enable_noobj_cache -o stat_cache_expire=30 in s3fs:
-o stat_cache_expire (default is no expire)
specify expire time(seconds) for entries in the stat cache
Without this option, you'll make fewer requests to S3, but you also will not always reliably discover changes made to objects if external processes or other instances of s3fs are also modifying the objects in the bucket. The value "30" in my system was selected somewhat arbitrarily.
-o enable_noobj_cache (default is disable)
enable cache entries for the object which does not exist. s3fs always has to check whether file(or sub directory) exists under object(path) when s3fs does some command, since s3fs has recognized a directory which does not exist and has files or subdirectories under itself. It increases ListBucket request and makes performance bad. You can specify this option for performance, s3fs memorizes in stat cache that the object (file or directory) does not exist.
This option allows s3fs to remember that .ftpaccess wasn't there.
Unrelated to the performance issues that can arise with ProFTPd, which are resolved by the above changes, you also need to enable -o enable_content_md5 in s3fs.
-o enable_content_md5 (default is disable)
verifying uploaded data without multipart by content-md5 header. Enable to send "Content-MD5" header when uploading a object without multipart posting. If this option is enabled, it has some influences on a performance of s3fs when uploading small object. Because s3fs always checks MD5 when uploading large object, this option does not affect on large object.
This is an option which never should have been an option -- it should always be enabled, because not doing this bypasses a critical integrity check for only a negligible performance benefit. When an object is uploaded to S3 with a Content-MD5: header, S3 will validate the checksum and reject the object if it's corrupted in transit. However unlikely that might be, it seems short-sighted to disable this safety check.
Quotes are from the man page of s3fs. Grammatical errors are in the original text.
Answer from 2014 for the people who are down-voting me:
Well, S3 isn't FTP. There are lots and lots of clients that support S3, however.
Pretty much every notable FTP client on OS X has support, including Transmit and Cyberduck.
If you're on Windows, take a look at Cyberduck or CloudBerry.
Updated answer for 2019:
AWS has recently released the AWS Transfer for SFTP service, which may do what you're looking for.
Or spin Linux instance for SFTP Gateway in your AWS infrastructure that saves uploaded files to your Amazon S3 bucket.
Supported by Thorntech
Amazon has released SFTP services for S3, but they only do SFTP (not FTP or FTPES) and they can be cost prohibitive depending on your circumstances.
I'm the Founder of DocEvent.io, and we provide FTP/S Gateways for your S3 bucket without having to spin up servers or worry about infrastructure.
There are also other companies that provide a standalone FTP server that you pay by the month that can connect to an S3 bucket through the software configuration, for example brickftp.com.
Lastly there are also some AWS Marketplace apps that can help, here is a search link. Many of these spin up instances in your own infrastructure - this means you'll have to manage and upgrade the instances yourself which can be difficult to maintain and configure over time.
WinSCp now supports S3 protocol
First, make sure your AWS user with S3 access permissions has an “Access key ID” created. You also have to know the “Secret access key”. Access keys are created and managed on Users page of IAM Management Console.
Make sure New site node is selected.
On the New site node, select Amazon S3 protocol.
Enter your AWS user Access key ID and Secret access key
Save your site settings using the Save button.
Login using the Login button.
Filezilla just released a Pro version of their FTP client. It connects to S3 buckets in a streamlined FTP like experience. I use it myself (no affiliation whatsoever) and it works great.
As other posters have pointed out, there are some limitations with the AWS Transfer for SFTP service. You need to closely align requirements. For example, there are no quotas, whitelists/blacklists, file type limits, and non key based access requires external services. There is also a certain overhead relating to user management and IAM, which can get to be a pain at scale.
We have been running an SFTP S3 Proxy Gateway for about 5 years now for our customers. The core solution is wrapped in a collection of Docker services and deployed in whatever context is needed, even on-premise or local development servers. The use case for us is a little different as our solution is focused data processing and pipelines vs a file share. In a Salesforce example, a customer will use SFTP as the transport method sending email, purchase...data to an SFTP/S3 enpoint. This is mapped an object key on S3. Upon arrival, the data is picked up, processed, routed and loaded to a warehouse. We also have fairly significant auditing requirements for each transfer, something the Cloudwatch logs for AWS do not directly provide.
As other have mentioned, rolling your own is an option too. Using AWS Lightsail you can setup a cluster, say 4, of $10 2GB instances using either Route 53 or an ELB.
In general, it is great to see AWS offer this service and I expect it to mature over time. However, depending on your use case, alternative solutions may be a better fit.

Sensible deployment using EC2

We're currently using RightScale, and every time we deploy, we execute a script on the server or server array that we want to update. It pulls the code from a GitHub repository, creates a new folder in /var/www/releases/TIMESTAMP, and symlinks the document root, /var/www/current, to that directory.
We're looking to get a better deployment strategy, such as something where we SSH into one of the servers on the private network, and run a command-line script to deploy what we want to deploy.
However, this means that this one server has to have its public key in the authorized_keys of all of the servers we want to deploy to. Is this safe? Wouldn't this be a single server that would allow all the other servers to be accessed?
What's the best way to approach this?
Thanks!
We use a similar strategy to deploy, though we're not with Rightscale anymore.
I think generally that approach is fine and I'd be interested to learn what you think is not serious about it.
If you want to do your ssh thing, then I'd go about it the following:
Lock down ssh using security groups, e.g. open ssh only up to specific IP or servers with a deploy security-group, or similar. The disadvantage here is that you might lock yourself out when the other servers are down, etc..
I'd put public keys on each instance to allow a password-less login. If you're security concious, you rotate those keys on a monthly basis or for example, when employees are leaving, etc..
Use fabric or capistrano to log into your servers (from the deploy master) using ssh and do your deployment.
Again, I think Rightscale's approach is not unique to them. A lot of services do it like that. The reason is that e.g. when you symlink and keep the previous version around, it's easier to rollback and so on.