Does Skaffold risk overloading a registry when used with a remote cluster? - kubernetes

While most layers of a given image would probably be reused during development and only pushed once, it seems that pushing new images/layers to a registry on each code change would fill up a registry rather quickly - especially with a team of developers.
Is this the case with Skaffold or does it have a way to manage that?

Related

Kubernetes "packaging" for an environment, to update and delete everything in one batch for branch-based environments?

On Kubernetes we use Helm and Kustomize to bundle our application. This helps with consistently updating something like a single application, but gets rather bloated for a whole “environment” or cluster.
ArgoCD seems like a good solution for updating a whole cluster, as you can “mirror” your git state to the cluster. This works even when dropping resources or updating an existing complex deployment.
Now I want to build branch-based ephemeral environments, and ArgoCD seems a bit heavyweight for this, because for every branch environment I would have to commit something to the git repository.
The idea is that every branch-based environment lives in its own namespace. I am looking for a tool that manages this namespace and can update or drop the whole thing.
What is a good solution for this problem?

How to secure the environment repo in a GitOps setup?

In a GitOps setting, there are usually two repositories - a code repo and an environment repo. My understanding is that there are some security benefits in separating the repos so developers only need to be given access to the code repo, and environment repo's write access can be limited to only the CI/CD tools. As the environment repo is the source-of-truth in GitOps, this is claimed to be more secure as it minimizes human involvement in the process.
My questions are:
If the assumption above is correct, what CI/CD tools should be given access to the environment repo? Is it just the pipeline tools such as Tekton (CI) and Flux (CD), or can other tools invoked by the pipelines be also included in this "trusted circle"? What are the best practices around securing the environment repo in GitOps?
What is the thought process around sync'ing intermediate / dynamic states of the cluster back to the environment repo, e.g., number of replicas in a deployment controlled by an HPA, network routing controlled by a service mesh provider (e.g., Istio), etc.? From what I have seen, most of the CD pipelines are only doing uni-directional sync from the environment repo to the cluster, and never the other way around. But there could be benefit in keeping some intermediate states, e.g., in case one needs to re-create other clusters from the environment repo.
there are usually two repositories - a code repo and an environment repo. My understanding is that there are some security benefits in separating the repos so developers only need to be given access to the code repo, and environment repo's write access can be limited to only the CI/CD tools.
It is a good practice to have a separate code repo and configuration repo when practicing any form of Continuous Delivery. This is described in the "classical" Continuous Delivery book. The reason is that the two repos change on different cycles: first the code is changed, and after a pipeline has verified the changes, an update to the config repo can be made, e.g. with the new image digest.
The developer team should have access to both repos. They need to be able to change the code, and they need to be able to change the app configuration for different environments. A build tool, e.g. one run from a Tekton pipeline, may only need write access to the config repo, but read access to both repos.
What is the thought process around sync'ing intermediate / dynamic states of the cluster back to the environment repo, e.g., number of replicas in a deployment controlled by an HPA, network routing controlled by a service mesh provider (e.g., Istio), etc.? From what I have seen, most of the CD pipelines are only doing uni-directional sync from the environment repo to the cluster, and never the other way around.
Try to avoid sync'ing "current state" back to a Git repo; it will only complicate things. For you, it is only valuable to keep the "desired state" in a repo. It is useful for seeing e.g. who changed what and when, but also for disaster recovery or for creating a new identical cluster.

How do I automate Kubernetes deployment YAML without relying on :latest?

I have a repository with a Kubernetes deployment YAML. A pipeline runs on each commit, building and pushing an image into our repository, versioned with the commit (e.g. my_project:bm4a83). I then update the deployment image with:
kubectl set image deployment/my_deployment my_deployment=my_project:bm4a83
This works, but I also want to keep the rest of the deployment YAML specification in version control.
I thought I could just keep it in the same repository, but that means changes that are purely infrastructure (e.g. changing replicas) trigger new builds without any code changes.
What felt like it made the most sense was keeping the deployment YAML in a totally separate repository. I figured I could manage all the values from there, independently of actual code changes. The only problem with that is that the image key would need to be kept up to date. The only way around that is working with some floating :latest-type version, but I don't really think that's ideal.
What's a sensible workflow for managing this? Am I missing something entirely?
What's a sensible workflow for managing this? Am I missing something entirely?
Some of the answer depends on the kind of risk you're trying to drive down with any process you have in place. If it's "the cluster was wiped out by a hurricane and I need to recover my descriptors," then Heptio Ark is a good solution for that. If the risks are more "human-centric," then IMHO you will have to walk a very careful line between locking down all the things and crippling the very agile, empowering, tools that kubernetes provides to a team. A concrete example of that model running up against your model is: what happens when a developer edits a Deployment but does not (remember|know) to update the descriptor in the repo? So do you revoke the edit rights? Use some diff-esque logic to detect a changed in-cluster config?
To speak to something you said specifically: it is a highly suboptimal idea to commit a descriptor change just to resize a (Deployment|ReplicationController|StatefulSet). Separately, a well-built CI pipeline would also understand if no buildable artifact changed and bail out (either early, or not even triggering a build, if the CI tool is that smart).
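The "bail out when nothing buildable changed" idea can be sketched as a small shell check. This is only an illustration: `needs_build` is a hypothetical helper, not part of any CI tool, and treating everything under `deploy/` and every `*.md` file as non-buildable is an assumption about the repository layout.

```shell
#!/bin/sh
# needs_build: reads changed file paths on stdin (one per line) and
# returns 0 if at least one of them should trigger an image build,
# 1 otherwise.
# Assumption: files under deploy/ and *.md files never affect the image.
needs_build() {
    while IFS= read -r path; do
        case "$path" in
            "")            ;;          # skip blank lines
            deploy/*|*.md) ;;          # descriptors or docs only: ignore
            *)             return 0 ;; # anything else is a buildable change
        esac
    done
    return 1
}
```

In a pipeline it could be fed from git, for example: git diff --name-only HEAD~1 HEAD | needs_build || echo "no buildable change, skipping build"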
Finally, if you do want to carry on with the current situation, then the best practice I can think of is textual replacement right before applying a descriptor:
$ grep "image: " the-deployment.yml
image: example.com/something:#CI_PIPELINE_IID#
$ sed -i'' -e "s/#CI_PIPELINE_IID#/${CI_PIPELINE_IID}/" the-deployment.yml
$ kubectl apply -f the-deployment.yml
so that the copy in the repo remains textually pristine, and also isn't inadvertently actually applied since it won't actually result in a runnable Deployment.
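A possible variation, sketched under assumptions: rendering the descriptor to standard output instead of editing it in place, so that even the checked-out file in the CI workspace is never modified. `render_descriptor` is a hypothetical helper; the placeholder is the one used in the snippet above.

```shell
#!/bin/sh
# render_descriptor FILE TAG
# Prints FILE with every #CI_PIPELINE_IID# placeholder replaced by TAG.
# The file on disk is not modified.
# Assumption: TAG contains no "/", "&" or "\" characters, which would be
# interpreted by sed in the replacement text.
render_descriptor() {
    sed -e "s/#CI_PIPELINE_IID#/$2/g" "$1"
}
```

The rendered text can then be piped straight into the cluster: render_descriptor the-deployment.yml "$CI_PIPELINE_IID" | kubectl apply -f -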
but I also want to keep the rest of the deployment YAML specification in version control.
Yes, you want to do that. Putting everything under version control is a good practice to achieve immutable infrastructure.
If you want the deployment to have a separate piece of metadata (for whatever auditing / change tracking reason), why can't you just leverage the Kubernetes metadata block?
metadata:
  name: my_deployment
  annotations:
    commit: bm4a83
Then you inject such information through Jinja, Ruby ERBs, Go Templates, etc.

Spring Cloud Configuration recommended architecture in data center

I have been playing with Spring Cloud Configuration. I like the simplicity of the solution and the fact that it uses git as its default configuration store.
There are two aspects I need to figure out before pushing it as a solution for centralized configuration management.
The aspects are:
High availability
How to gradually roll out configuration changes (to support canary releases)
If you have already implemented this in your data center, or are just playing with it, please share your ideas!
Also I would like to hear from the creators, how they see the recommended deployment in single/cross data-center environments.
The Config Server itself is stateless, so you can spin up as many of these as you need and find them via Eureka. Underneath the server itself, the git implementation you point to needs to be highly available as well. So if you point to GitHub (private or public), then git is as available as GitHub is. If the config server can't reach git, it will continue to serve what it has checked out, even if it is stale.
As far as gradual config changes go, you could use a different branch, configure the canary to use that branch via spring.cloud.config.label, and then merge the branch. You could also use profiles (e.g. application-<profilename>.properties) and configure the canary to use the specified profile.
I think the branch makes a little more sense, because you wouldn't have to reconfigure the non-canary nodes to use the new profile each time, just configure canary to use the branch.
Either way, the only time apps see config changes (when using the Spring Cloud Config client) is on startup or when you POST to /refresh on each node. You can also POST to /bus/refresh?destination=<servicename> if you use the Spring Cloud Bus to refresh all instances of a service at once.
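The per-node refresh could be scripted roughly as follows. This is a sketch, not a recommended tool: `refresh_nodes` is a hypothetical helper, the host names are placeholders, and the /refresh path is simply the one quoted in this answer (it may differ depending on the Spring Boot version and how the endpoints are exposed).

```shell
#!/bin/sh
# refresh_nodes HOST...
# Sends an empty POST to /refresh on each given host:port, stopping at
# the first failure.
# Assumption: plain HTTP and the /refresh path quoted in the answer.
# CURL can be overridden to substitute another command for curl,
# e.g. for a dry run.
refresh_nodes() {
    for host in "$@"; do
        ${CURL:-curl} -fsS -X POST "http://$host/refresh" || return 1
    done
}
```

For a canary rollout this would be called with only the canary instances first, e.g. refresh_nodes canary-1:8080, and with the remaining nodes after the branch has been merged.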

Simplest way to use mercurial to manage differences between web development and deployment?

I am using mercurial for website development. I "think" I'm using it correctly.
I develop on my development machine, commit fairly regularly. I will somewhat regularly push my commits to my hosted site-dev repository.
If things are set up how I want them for the live site, I push from my dev machine to the hosted site-live repository. Then I pull down from that repository onto the live server.
However, there are some changes that need to be made (changing directories from localhost to www.example.com, changing the DB connection stuff, etc.).
What I did was make these changes on my live machine, then push them back up to the site-live repository. I don't know why I did that, really, but at least there's a changeset sitting there with the necessary config changes.
What I don't know how to do is manage this process. I'm a little lost beyond committing, pushing and pulling with hg. I'm a single developer and haven't even done a merge yet.
Is there some way to keep that particular changeset identified, and just apply it, hopefully even BEFORE I pull from the repo down to the live server?
I think you can tell from my question that I'm in a little over my head with hg and workflow at the moment ;)
This is my understanding:
Essentially, what you are trying to do is have development, staging and deployment environments. You do your development in the 'development' repository, test it in a staging environment and then, once satisfied, pull those changes into the deployment repository.
And when you pull from staging to deployment, you need to change your environment / configuration data.
My take is you should not be changing the configuration at all.
You should have configuration files such that you have:
a basic configuration file: basic.conf
environment-specific overrides: basic.dev.conf, basic.staging.conf and basic.deployment.conf
Use an environment variable: the overrides to the basic configuration data should be selected via an environment-specific variable, e.g. APP_ENV set to dev, staging or deploy.
This way you should be able to override the configuration based on the environment without changing the configuration information.
It is not a good idea to rely on making changes to config files each time you pull your code from development to staging to deployment.
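The selection step could look roughly like this. It is a minimal sketch: `config_files` is a hypothetical helper, and because the answer names the variable value deploy but the file basic.deployment.conf, the mapping between the two is an assumption.

```shell
#!/bin/sh
# config_files ENV
# Prints the configuration files to load, in order: the basic file
# first, then the override for ENV (dev, staging or deploy).
# Assumption: the value "deploy" selects basic.deployment.conf, and
# unknown values are rejected rather than silently falling back.
config_files() {
    case "$1" in
        dev|staging) suffix=$1 ;;
        deploy)      suffix=deployment ;;
        *)           echo "unknown APP_ENV: '$1'" >&2
                     return 1 ;;
    esac
    printf '%s\n' "basic.conf" "basic.$suffix.conf"
}
```

An application start script would call config_files "$APP_ENV" and load the listed files in that order, so the same checkout runs unchanged on every machine.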
I would keep the live server outside version control, meaning that I would have a small "install" script that pulls updates from the repository, removes any unnecessary development files, and applies the correct configuration files. Both development and production configuration files should be in version control.