Google Jib - Change owner of all files and folders

All the app files and extraDirectories are owned by root.
/app/libs/
/app/resources/
/app/classes/
/app/logs
I want to run the application as a non-root user, and I want these files/folders to be owned by that user only, not root.
Is there any way to do this? I found the Jib Maven extension mentioned below that can alter the owner, but its documentation recommends against doing so. Is there a better way?
https://github.com/GoogleContainerTools/jib-extensions/tree/master/first-party/jib-ownership-extension-maven

The reason you want to change the ownership of some part of the app directory is that your app wants to modify some files or create new ones inside it at runtime. Generally speaking, it is considered good practice to build images that are as immutable as possible.
Since you mentioned /app/logs, I suspect that your app generates log files while it is running. On some modern container orchestration platforms (such as Kubernetes), apps are usually designed to output logs to stdout and stderr.
The best practice is to write your application logs to the standard output (stdout) and standard error (stderr) streams.
Think about it: if your app generates log files at /app/logs inside a container (and there will be multiple containers running from the same image), how would you collect and monitor them in a unified way? What if different apps generate log files at different filesystem locations? More importantly, if your container crashes, you simply lose the log files. By writing logs to stdout and stderr, the platform (say, Kubernetes) takes care of all the complexities of managing and correlating logs from all pods.
If you cannot change how your app handles log files, you should at least mount a volume at /app/logs at runtime. For any container runtime (be it Kubernetes or Docker), this is easily configurable, and the mounted directory will usually be world-writable, so you won't need to change the ownership. But you'll still have to think about how to collect and manage the log files.
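For example, in Kubernetes a rough sketch of this could look like the following (the pod, container, and image names are hypothetical); an emptyDir volume mounted at /app/logs gives the app a writable directory without touching ownership in the image:
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-app                    # hypothetical name
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:latest   # hypothetical image built by Jib
          volumeMounts:
            - name: logs
              mountPath: /app/logs    # writable mount shadowing the image directory
      volumes:
        - name: logs
          emptyDir: {}                # ephemeral scratch space provided by the node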
Likewise, if it is not about log files but your app needs to create temporary files inside the app directory and you cannot change that location for some reason, at least try to mount an ephemeral volume before falling back to the last resort of the Jib Ownership Extension you mentioned.
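If the extension really is the only option left, the configuration has roughly the following shape, adapted from the README you linked (versions here are placeholders, and the exact element names should be double-checked against the current README):
    <plugin>
      <groupId>com.google.cloud.tools</groupId>
      <artifactId>jib-maven-plugin</artifactId>
      <version>3.4.0</version><!-- placeholder version -->
      <dependencies>
        <dependency>
          <groupId>com.google.cloud.tools</groupId>
          <artifactId>jib-ownership-extension-maven</artifactId>
          <version>0.1.0</version><!-- placeholder version -->
        </dependency>
      </dependencies>
      <configuration>
        <pluginExtensions>
          <pluginExtension>
            <implementation>com.google.cloud.tools.jib.maven.extension.ownership.JibOwnershipExtension</implementation>
            <configuration implementation="com.google.cloud.tools.jib.maven.extension.ownership.Configuration">
              <rules>
                <!-- change ownership only for the paths the app actually writes to -->
                <rule>
                  <glob>/app/logs</glob>
                  <ownership>1000:1000</ownership><!-- UID:GID of the non-root user -->
                </rule>
              </rules>
            </configuration>
          </pluginExtension>
        </pluginExtensions>
      </configuration>
    </plugin>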
In conclusion, carefully assess why you need to change the ownership in the first place. If the app wants to mutate itself at runtime, that is usually not a good fit for containerization, and there is likely a root cause that you should resolve in a proper way.

Related

Copying directories into minikube and persisting them

I am trying to copy some directories into the minikube VM to be used by some of the pods that are running. These include API credential files and template files used at run time by the application. I have found you can copy files into the /home/docker/ directory using scp; however, these files are not persisted across reboots of the VM. I have read that files/directories are persisted if stored in the /data/ directory on the VM (among others), but I get permission denied when trying to copy files to those directories.
Are there:
A: Any directories in minikube that will persist data that aren't protected in this way
B: Any other ways of doing the above without running into this issue (could well be going about this the wrong way)
To clarify, I have already been able to mount the files from /home/docker/ into the pods using volumes, so it's just the persistence of data I'm unclear about.
Kubernetes has dedicated object types for these sorts of things. API credential files you might store in a Secret, and template files (if they aren't already built into your Docker image) could go into a ConfigMap. Both of them can either get translated to environment variables or mounted as artificial volumes in running containers.
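As a rough sketch (all object and path names here are made up), the credentials could be a Secret surfaced as environment variables and the templates a ConfigMap mounted as files:
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:latest
          envFrom:
            - secretRef:
                name: api-credentials     # e.g. created with: kubectl create secret generic api-credentials --from-file=...
          volumeMounts:
            - name: templates
              mountPath: /etc/templates   # template files appear here as read-only files
      volumes:
        - name: templates
          configMap:
            name: app-templates           # e.g. created with: kubectl create configmap app-templates --from-file=...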
In my experience, trying to store data directly on a node isn't a good practice. It's common enough to have multiple nodes, to not directly have login access to those nodes, and for them to be created and destroyed outside of your direct control (imagine an autoscaler running on a cloud provider that creates a new node when all of the existing nodes are 90% scheduled). There's a good chance your data won't (or can't) be on the host where you expect it.
This does lead to a proliferation of Kubernetes objects and associated resources, and you might find a Helm chart to be a good resource to tie them together. You can check the chart into source control along with your application, and deploy the whole thing in one shot. While it has a couple of useful features beyond just packaging resources together (a deploy-time configuration system, a templating language for the Kubernetes YAML itself) you can ignore these if you don't need them and just write a bunch of YAML files and a small control file.
For minikube, files kept in the $HOME/.minikube/files directory are copied by minikube into the VM under / when it starts.

Can we consider using containers (and kubernetes) for monolith, stateful web applications?

I'm learning about containers and Kubernetes and was evaluating whether we can move our monolithic, stateful application to Kubernetes.
I was also looking at https://kubernetes.io/blog/2018/03/principles-of-container-app-design/ and "Self-Containment" looks close. We can consider using "storage".
Properties of my application:
1. Runs on a JVM
2. Does not have a database. Saves all its data/content to TAR files on the file-system
3. Should be able to back up and retain state if the container goes down.
In our current setup, we deploy the app to a VM, and our IT teams generally take snapshots of these VMs as backups and restore them if the app fails or they have to roll back to a point where the app was working well. I wanted to avoid this.
Please advise.
You call it a web application, but based on what it does, it is just a process that writes to the file system.
If you move to Kubernetes, write to NFS or other persistent storage from the pod. If you can only run one instance, then you can't use Kubernetes' horizontal scaling.
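A minimal sketch of that (all names and sizes are hypothetical): a PersistentVolumeClaim holds the TAR files so they survive pod restarts, and a single-replica Deployment mounts it:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: app-data
    spec:
      accessModes: ["ReadWriteOnce"]   # single-node access is enough for one replica
      resources:
        requests:
          storage: 10Gi
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: legacy-app
    spec:
      replicas: 1                      # the app cannot be scaled horizontally
      strategy:
        type: Recreate                 # avoid two pods contending for the same volume
      selector:
        matchLabels:
          app: legacy-app
      template:
        metadata:
          labels:
            app: legacy-app
        spec:
          containers:
            - name: legacy-app
              image: registry.example.com/legacy-app:latest
              volumeMounts:
                - name: data
                  mountPath: /var/app/data   # wherever the app writes its TAR files
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: app-data
Backups then become a matter of snapshotting the underlying storage (for example via the VolumeSnapshot API where the storage class supports it) rather than snapshotting whole VMs.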

How to implement the "One Binary" principle with Docker

The One Binary principle explained here:
http://programmer.97things.oreilly.com/wiki/index.php/One_Binary states that one should...
"Build a single binary that you can identify and promote through all the stages in the release pipeline. Hold environment-specific details in the environment. This could mean, for example, keeping them in the component container, in a known file, or in the path."
I see many DevOps engineers arguably violate this principle by creating one Docker image per environment (i.e. my-app-qa, my-app-prod and so on). I know that Docker favours immutable infrastructure, which implies not changing an image after deployment, and therefore not uploading or downloading configuration post-deployment. Is there a trade-off between immutable infrastructure and the one binary principle, or can they complement each other? When it comes to separating configuration from code, what is the best practice in a Docker world? Which one of the following approaches should one take:
1) Creating a base binary image and then having a configuration Dockerfile that augments this image by adding environment-specific configuration (i.e. my-app -> my-app-prod).
2) Deploying a binary-only docker image to the container and passing in the configuration through environment variables and so on at deploy time.
3) Uploading the configuration after deploying the Docker image to a container
4) Downloading configuration from a configuration management server from the running docker image inside the container.
5) Keeping the configuration in the host environment and making it available to the running Docker instance through a bind mount.
Is there another better approach not mentioned above?
How can one enforce the one binary principle using immutable infrastructure? Can it be done, or is there a trade-off? What is the best practice?
I've about 2 years of experience deploying Docker containers now, so I'm going to talk about what I've done and/or know to work.
So, let me first begin by saying that containers should definitely be immutable (I even mark mine as read-only).
Main approaches:
use configuration files by setting a static entrypoint and overriding the configuration file location via the container startup command - that's less flexible, since one would have to commit the change and redeploy in order to enable it; not fit for passwords, secure tokens, etc.
use configuration files by overriding their location with an environment variable - again, this depends on having the configuration files prepped in advance; not fit for passwords, secure tokens, etc.
use environment variables (see the sketch after this list) - that might need a change in the deployment code, but it shortens the time to get a config change live, since it doesn't need to go through the application build phase (in most cases), so deploying such a change can be pretty easy. For example, when deploying a containerised application to Marathon, changing an environment variable could potentially just start a new container from the last used container image (potentially even on the same host), which means this can be done in mere seconds; not fit for passwords, secure tokens, etc., especially so in Docker
store the configuration in a k/v store like Consul, make the application aware of it, and let it even be dynamically reconfigurable. A great approach for launching features simultaneously - possibly even across multiple services; if implemented with a solution such as HashiCorp Vault, which provides secure storage for sensitive information, you could even have ephemeral secrets (an example would be the PostgreSQL secret backend for Vault - https://www.vaultproject.io/docs/secrets/postgresql/index.html)
have an application or script create the configuration files before starting the main application - store the configuration in a k/v store like Consul, use something like consul-template in order to populate the app config; a bit more secure - since you're not carrying everything over through the whole pipeline as code
have an application or script populate the environment variables before starting the main application - an example for that would be envconsul; not fit for sensitive information - someone with access to the Docker API (either through the TCP or UNIX socket) would be able to read those
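As a small illustration of the environment-variable approach (option 2 in the question), the exact same image can be promoted through qa and prod while only the injected values change at deploy time; everything below (names, values) is hypothetical:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app-prod
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
            - name: my-app
              image: registry.example.com/my-app:1.4.2   # the one image, promoted unchanged from qa
              env:
                - name: DATABASE_URL                     # plain, non-sensitive config inline
                  value: jdbc:postgresql://prod-db:5432/app
              envFrom:
                - secretRef:
                    name: my-app-secrets                 # sensitive values kept out of the manifest
The manifest differs per environment, but the image does not, which is what the one binary principle asks for.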
I've even had a situation in which we were populating variables into AWS instances' user_data and injecting them into the container on startup (with a script that modifies the container's JSON config on startup).
The main things that I'd take into consideration:
what are the variables that I'm exposing, and when and where am I getting their values from (could be the CD software, or something else) - for example, you could publish the AWS RDS endpoint and credentials to the instance's user_data, potentially even to EC2 tags with some IAM instance profile magic
how many variables do we have to manage and how often do we change some of them - if we have a handful, we could probably just go with environment variables, or use environment variables for the most commonly changed ones and variables stored in a file for those that we change less often
and how fast do we want to see them changed - if it's a file, it typically takes more time to deploy it to production; if we're using environment variables, we can usually deploy those changes much faster
how do we protect some of them - where do we inject them and how - for example Ansible Vault, HashiCorp Vault, keeping them in a separate repo, etc
how do we deploy - that could be a JSON config file sent to a deployment framework endpoint, Ansible, etc.
what's the environment that we're having - is it realistic to have something like Consul as a config data store (Consul has 2 different kinds of agents - client and server)
I tend to prefer the most complex case of having them stored in a central place (k/v store, database) and have them changed dynamically, because I've encountered the following cases:
slow deployment pipelines - which makes it really slow to change a config file and have it deployed
having too many environment variables - this could really grow out of hand
having to turn on a feature flag across the whole fleet (consisting of tens of services) at once
an environment in which there is a real drive to increase security by better handling sensitive config data
I've probably missed something, but I guess that should be enough of a trigger to think about what would be best for your environment.
How I've done it in the past is to incorporate tokenization into the packaging process after a build is executed. These tokens can be managed in an orchestration layer that sits on top to manage your platform tools. For a given token, there is a matching regex or XPath expression, and that token is linked to one or many config files, depending on the relationship that is chosen. Then, when the build is deployed to a container, a platform service (i.e. config management) pokes these tokens with the correct value for its environment. These values would most likely be pulled from a vault.

"Injecting" configuration files at startup

I have a number of legacy services running which read their configuration files from disk and a separate daemon which updates these files as they change in zookeeper (somewhat similar to confd).
For most of these types of configuration we would love to move to a more environment-variable-like model, where the config is fixed for the lifetime of the pod. However, we need to keep the outside config files as the source of truth while services are transitioning from the legacy model to Kubernetes. I'm curious if there is a clean way to do this in Kubernetes.
A simplified version of the current model that we are pursuing is:
Create a Docker image which has a utility for fetching config files and writing them to disk; once done, it writes a /donepath/done file.
The main image waits until the done file exists, then allows the normal service startup to progress.
Use an emptyDir volume and volume mounts to get the config from the helper image into the main image (see the sketch below).
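Roughly, that model maps onto a single pod spec like the one below (images, commands, and paths are all hypothetical); using an init container instead of the done-file handshake lets Kubernetes itself hold the main container back until the fetch has finished:
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-service
    spec:
      initContainers:
        - name: fetch-config                   # the "helper image" from the model above
          image: registry.example.com/config-fetcher:latest
          command: ["/usr/local/bin/fetch-config", "--out", "/config"]   # hypothetical utility
          volumeMounts:
            - name: config
              mountPath: /config
      containers:
        - name: my-service
          image: registry.example.com/my-service:latest
          volumeMounts:
            - name: config
              mountPath: /etc/my-service       # the legacy service reads its files from here
      volumes:
        - name: config
          emptyDir: {}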
I keep seeing instances of this problem where I "just" need to get a couple of files into the Docker image at startup (to allow per-env/canary/etc. variance), and running all of this machinery each time seems like a burden to throw on devs. I'm curious if there is a simpler way to do this already in Kubernetes or on the horizon.
You can use the ADD command in your Dockerfile. It is used as ADD <file> /path/in/image. This allows you to quickly add a complete file to your image. You need to have the file you want to add in the same directory as the Dockerfile when you build the image. You can also add a tar file this way, which will be expanded during the build.
Another option is the ENV command in your Dockerfile. This adds the data as an environment variable.
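A hypothetical Dockerfile using both (the base image, file names, and paths are examples only):
    # Hypothetical base image and paths - adjust to your application.
    FROM eclipse-temurin:17-jre
    # ADD copies a single file into the image. If the source were a local tar
    # archive (gzip/bzip2/xz), ADD would also unpack it during the build.
    ADD app-config.yaml /etc/myapp/app-config.yaml
    # ENV bakes a default into the image; it can still be overridden at run
    # time, e.g. docker run -e APP_ENV=qa ...
    ENV APP_ENV=prod
    COPY myapp.jar /opt/myapp/myapp.jar
    ENTRYPOINT ["java", "-jar", "/opt/myapp/myapp.jar"]
Note that values baked in with ADD or ENV become part of the image, so anything that varies per environment is better passed in at deploy time as discussed above.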

Using Zookeeper or equivalent for .NET configuration management?

I have a proprietary CMS that keeps a lot (20k lines) of configuration files on disk. I have quite a few nodes, all with the same configurations except for one or two elements that designate the node name and the IP.
Since this is proprietary I do not have a lot of leverage for going in and completely overhauling the configuration loading to look at an endpoint, though I might be able to be creative.
My questions are simple, but I do not know a better place to answer them:
Is this a use case for distributed configuration management like Zookeeper? Ideally I'd like to spin up a box and have it look for a service endpoint to load config files rather than have the config files deployed through source. This way I can update the configuration in one place, and have it replicate to all nodes without doing a full deployment.
Can Zookeeper (or an equivalent) mimic a file system? Could I mount it like an NFS point and have it expose the configuration as if it were files on the filesystem, even if these are symbolic constructs? Does this make sense?
Your configuration use case seems more like a job for Chef, Puppet, or a similar system. They will allow you to update the configuration in one place, keep it version controlled, and distribute it properly to all target nodes.
Zookeeper makes sense when your application/service needs to dynamically fetch fresh configuration data during live operation, and when multiple nodes in your system need the same consistent view of that data. If you don't have these requirements, Zookeeper is probably too much overhead for just laying down mostly static config files on disk.
As for mimicking a filesystem, there is zkfuse, which you could use to mount it. But again, it doesn't look like this is what you want. Zookeeper should not be used as an actual file system replacement or file distribution system. It is best for storing small bits of metadata that need to be consistent across your distributed system.