Management of Kubernetes secrets

We are starting with Kubernetes and wondering how other projects manage Kubernetes secrets:
Since Kubernetes Secret values are only base64 encoded (not encrypted), it's not recommended to commit the secrets into source control (a quick illustration follows below).
If they are not committed to source control, they should be kept in some central place somewhere else, otherwise there's no single source of truth. If they are stored somewhere else (e.g. HashiCorp Vault), how does the integration with CI work? Does the CI fetch values from Vault and create Secret resources on demand in Kubernetes?
Another approach is probably to have a dedicated team handle infrastructure, so that only that team knows and manages secrets. But this team can become a bottleneck if the number of projects is large.
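For context, a minimal sketch of why base64 is not encryption (the Secret name and value here are made up): anyone who can read the object can recover the value with a plain base64 decode.

# A plain Kubernetes Secret: the value is only base64 encoded, not
# encrypted -- `echo UzNjcjN0IQ== | base64 -d` recovers "S3cr3t!".
apiVersion: v1
kind: Secret
metadata:
  name: db-password        # hypothetical name
type: Opaque
data:
  password: UzNjcjN0IQ==   # base64 of "S3cr3t!"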

how other projects manage Kubernetes secrets
Since they are not (at least not yet) proper secrets (only base64 encoded), we keep them in a separate, restricted-access git repository.
Most of our projects have a code repository (with non-secret manifests such as Deployments and Services, managed as part of the CI/CD pipeline) and a separate manifest repository (holding namespaces, shared database inits, secrets, and more or less anything that is either a one-time init separate from CI/CD, requires additional permissions to apply, or should be restricted in any other way, such as secrets).
That being said, although a regular developer doesn't have access to the restricted repository, special care must be given to CI/CD pipelines: even if you secure the secrets, they are known (and can be displayed/misused) during the CI/CD stage, so that can be a weak security spot. We mitigate that by having one of our DevOps engineers supervise and approve (via protected branches) any change to the CI/CD pipeline, in much the same manner that a senior lead supervises code changes to be deployed to the production environment.
Note that this is highly dependent on project volume and staffing, as well as actual project needs in terms of security, development pressure and infrastructure integration.

I came across this project on GitHub called SealedSecrets: https://github.com/bitnami-labs/sealed-secrets. I haven't used it myself, though it seems to be a good alternative.
But take note of this GitHub issue (https://github.com/bitnami-labs/sealed-secrets/issues/92). It may cause you to lose labels and annotations.
In a nutshell, SealedSecrets provides a custom resource definition for encrypted secrets: you encrypt your secret into a SealedSecret resource, and when you deploy it, the in-cluster controller decrypts it and turns it into a regular Kubernetes Secret. This way you can commit your SealedSecret resource to your source code repo.
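For a rough idea of the shape of the resource, here is a minimal sketch (the name, namespace and ciphertext are placeholders; the real ciphertext is produced by the kubeseal CLI against your cluster's controller):

# Sketch of a SealedSecret resource; values are placeholders. kubeseal
# encrypts a regular Secret into this form, and the in-cluster
# controller decrypts it back into a Secret.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials            # hypothetical name
  namespace: my-app               # hypothetical namespace
spec:
  encryptedData:
    password: AgBx...             # ciphertext from kubeseal (truncated)
  template:
    metadata:
      labels:                     # labels/annotations for the generated Secret
        app: my-app               # (related to the issue linked above)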

I use k8s Secrets as the store where secrets are kept. That is, when I define a secret, I define it in k8s, not somewhere else that I then have to figure out how to inject into k8s. I have a handy client to create, look up and modify my secrets. I don't need to worry about my secrets leaving the firewall. They are easily injected into my services.
If you want an extra layer of protection, you can encrypt the secrets in k8s yourself with a KMS or something like that.
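As a hedged sketch of that extra layer, Kubernetes supports envelope encryption of Secrets at rest via an external KMS plugin; the plugin name and socket path below are placeholders, and the file is handed to the API server with --encryption-provider-config:

# Sketch of an EncryptionConfiguration that encrypts Secrets at rest
# with an external KMS plugin; assumes such a plugin is already running.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - kms:
          name: my-kms-plugin                       # hypothetical plugin name
          endpoint: unix:///var/run/kmsplugin.sock  # hypothetical socket
          cachesize: 1000
          timeout: 3s
      - identity: {}   # fallback so previously stored plaintext Secrets stay readable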

We recently released a project called Kamus. The idea is to allow developers to encrypt secrets for a specific application (identified with a Kubernetes service account), while only this application can decrypt it. Kamus was designed to support GitOps flow, as the encrypted secret can be committed to source control. Take a look at this blog post for more details.

Related

Vault direct integration or via a microservice

I am using HashiCorp Vault to store multiple secrets in the KV secrets engine, one of which is the database connection string: username, password, host IP and port. I have multiple microservices which need to use this DB secret to connect to the DB.
Please clarify which of these integration patterns is valid:
Direct integration with Vault: Each of the microservices has a direct connection to Vault to get the secrets needed for its operation. All the microservices have the Vault token configured (in K8s Secrets) for accessing Vault.
Retrieving secrets via another microservice: There is an abstraction layer, i.e. a separate microservice for Vault interaction, and all the other microservices call the APIs of this vault-microservice to get the secrets they want. The Vault token (in K8s Secrets) is accessed by only one microservice.
The other microservice is an abstraction layer. It is extra work that might allow you to change secrets provider in the future.
Unless you can justify writing and maintaining that abstraction layer (because you want to use Vault in some deployments and AWS Secrets Manager in others), then don't bother.
The other issue is that although Vault's KV store is quite common and there are several other implementations, what if you want to use Transit, PKI or SSH CA? These services exist elsewhere (in AWS, for example), but they don't have feature parity. You probably don't want to be on the hook to support those differences in your abstraction layer.
A low(er)-cost alternative that lets you decouple the implementation from your code would be to wrap the Vault API client in a simple KVSecrets class in your code, a software design pattern known as the facade. But remember that unless you test your class with two services, you can't guarantee it will be possible to migrate to another service one day.
So considering all this, just call the API directly or use the Vault library for your programming language.
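For illustration, a minimal sketch of the direct-integration option, where each microservice reads its Vault token from a Kubernetes Secret and then calls Vault itself (the service, Secret name and Vault address are placeholders; the Kubernetes auth method is a common alternative to a static token):

# Sketch of a container spec for the direct-integration option: the
# Vault token lives in a Kubernetes Secret and is exposed to the
# service as an environment variable.
containers:
  - name: orders-service                 # hypothetical service
    image: example/orders:1.0            # hypothetical image
    env:
      - name: VAULT_ADDR
        value: "https://vault.internal.example:8200"   # hypothetical Vault address
      - name: VAULT_TOKEN
        valueFrom:
          secretKeyRef:
            name: vault-token            # hypothetical Secret holding the token
            key: token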

Is it possible to prevent k8s secret of being empty (zero bytes)?

Is it possible to configure k8s in a way that empty secrets are not possible?
I had a problem in a service where the secret somehow got overwritten with an empty one (zero bytes) and thereby my service malfunctioned. I see no advantage in having an empty secret at any time and would like to prevent empty secrets altogether.
Thanks for your help!
While it's not simple to implement, as best I can tell what you are looking for is an admission controller, a very popular one being OPA Gatekeeper.
The theory is that Kubernetes, as a platform, does not understand your business requirement to keep mistakes from overwriting Secrets. But OPA, as a policy rules engine, allows you to specify those things without requiring upstream Kubernetes to adopt those policies for everyone.
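As a rough, untested illustration, a Gatekeeper ConstraintTemplate along these lines could reject Secrets that carry no data at all; the template and kind names are made up for this sketch, and a matching Constraint object (not shown) would bind it to Secret resources:

# Sketch of a Gatekeeper ConstraintTemplate that flags empty Secrets.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8semptysecrets                  # hypothetical name
spec:
  crd:
    spec:
      names:
        kind: K8sEmptySecrets            # hypothetical kind
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8semptysecrets

        violation[{"msg": msg}] {
          # Reject Secrets whose data map is missing or empty.
          count(object.get(input.review.object, "data", {})) == 0
          msg := "Secrets must not be empty"
        }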
An alternative is to turn on audit logging and track down the responsible party for re-education.
A further alternative is to correctly scope RBAC Roles so that write access to Secrets is granted only to credentials that are known to be trusted.

GKE with Hashicorp Vault - Possible to use Google Cloud Run?

I'm looking into deploying a cluster on Google Kubernetes Engine in the near future. I've also been looking into using Vault by Hashicorp in order to manage the secrets that my cluster has access to. Specifically, I'd like to make use of dynamic secrets for greater security.
However, all of the documentation and Youtube videos that cover this type of setup always mention that a set of nodes strictly dedicated to Vault should operate as their own separate cluster - thus requiring more VMs.
I am curious if a serverless approach is possible here. Namely, using Google Cloud Run to create Vault containers on the fly.
This video (it should start at the right time) mentions that Vault can be run as a Deployment, so I don't see there being an issue with state. And since Google mentions that each Cloud Run service gets its own stable HTTPS endpoint, I believe I can simply pass this endpoint to my configuration and all of the pods will be able to find the service, even if new instances are created. However, I'm new to using Kubernetes so I'm not sure if I'm entirely correct here.
Can anyone with more experience using Kubernetes and/or Vault point out any potential drawbacks with this approach? Thank you.
In beta for three weeks now, and not yet officially announced (it should be in a couple of days), you can have a look at Secret Manager. It's a serverless secret manager with, I think, all the basic requirements that you need.
The main reason it has not yet been announced is that the client libraries in several languages aren't released/finished yet.
The awesome guy on your video link, Seth Vargo, has been involved in this project.
He has also released Berglas. It's written in Go, uses KMS to encrypt the secrets, and Google Cloud Storage to store them. I also recommend it.
I built a Python library to make it easy to use Berglas secrets in Python.
Hope that this secret management tool will meet your expectations. In any case, it's serverless and quite cheap!

Kubernetes secrets and service accounts

I've been working with kubernetes for the past 6 months and we've deployed a few services.
We're just about to deploy another which stores encrypted data and puts the keys in KMS. This requires two service accounts, one for the data and one for the keys.
Data access to this must be audited. Since access to this data is very sensitive, we are reluctant to put both service accounts in the same namespace, since if it were compromised in any way, the attacker could gain access to both the data and the keys without it being audited.
For now we have one key in a secret and the other we're going to manually post to the single pod.
This is horrible as it requires that a single person be trusted with this key, and limits scalability. Luckily this service will be very low volume.
Has anyone else come up against the same problem?
How have you gotten around it?
cheers
Requirements
No single person ever has access to both keys (datastore and KMS)
Data access to this must be audited
If you enable audit logging, every API call done via this service account will be logged. This may not help you if your service isn't ever called via the API, but considering you have a service account being used, it sounds like it would be.
For now we have one key in a secret and the other we're going to manually post to the single pod.
You might consider using Vault for this. If you store the secret in Vault, you can use something like this to have the secret pushed down into the pod as an environment variable automatically. This is a little more involved than your process, but is considerably more secure.
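As one hedged example of this kind of integration, the Vault Agent injector can template a Vault secret into the pod via pod annotations; the role name and secret path below are placeholders, and the rendered file can then be read (or exported as an environment variable) by the application or its entrypoint:

# Sketch of Vault Agent injector annotations on a pod template. The
# injected sidecar renders the secret to /vault/secrets/db-creds.
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "my-app"                                         # hypothetical Vault role
    vault.hashicorp.com/agent-inject-secret-db-creds: "secret/data/my-app/db"  # hypothetical path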
You can also use Vault alongside Google Cloud KMS which is detailed in this article
What you're describing is pretty common - using a key/ service account/ identity in Kubernetes secrets to access an external secret store.
I'm a bit confused by the double key concept - what are you gaining by having a key in both Secrets and in the pod? If Secrets are compromised, then etcd is compromised and you have bigger problems. I would suggest you focus instead on locking down Secrets, using audit logs, and making the key easy to rotate in case of compromise.
A few items to consider:
If you're mostly using Kubernetes, consider storing (encrypted) secrets in Kubernetes secrets.
If you're storing secrets centrally outside of Kubernetes, like you're describing, consider just using a single Kubernetes Secret - you will get Kubernetes audit logs for access to the Secret (see the recommended audit-policy; a sketch of such a rule follows this list), and Cloud KMS audit logs for use of the key.
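A minimal sketch of an audit-policy rule that records access to Secrets at the Metadata level, so the request and response bodies (the secret values themselves) are never written to the audit log:

# Sketch of an audit Policy rule: log Secret access at Metadata level only.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    resources:
      - group: ""               # core API group
        resources: ["secrets"]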

Multiple environments (Staging, QA, production, etc) with Kubernetes

What is considered a good practice with K8S for managing multiple environments (QA, Staging, Production, Dev, etc)?
As an example, say that a team is working on a product which requires deploying a few APIs, along with a front-end application. Usually, this will require at least 2 environments:
Staging: For iterations/testing and validation before releasing to the client
Production: This is the environment the client has access to. It should contain stable and well-tested features.
So, assuming the team is using Kubernetes, what would be a good practice to host these environments? So far we've considered two options:
Use a K8s cluster for each environment
Use only one K8s cluster and keep them in different namespaces.
(1) seems the safest option since it minimizes the risk of potential human mistakes and machine failures that could put the production environment in danger. However, this comes at the cost of more master machines and also the cost of more infrastructure management.
(2) Looks like it simplifies infrastructure and deployment management because there is one single cluster but it raises a few questions like:
How does one make sure that a human mistake doesn't impact the production environment?
How does one make sure that a high load in the staging environment won't cause a loss of performance in the production environment?
There might be some other concerns, so I'm reaching out to the K8s community on StackOverflow to get a better understanding of how people are dealing with this sort of challenge.
Multiple Clusters Considerations
Take a look at this blog post from Vadim Eisenberg (IBM / Istio): Checklist: pros and cons of using multiple Kubernetes clusters, and how to distribute workloads between them.
I'd like to highlight some of the pros/cons:
Reasons to have multiple clusters
Separation of production/development/test: especially for testing a new version of Kubernetes, of a service mesh, of other cluster software
Compliance: according to some regulations some applications must run in separate clusters/separate VPNs
Better isolation for security
Cloud/on-prem: to split the load between on-premise and cloud services
Reasons to have a single cluster
Reduce setup, maintenance and administration overhead
Improve utilization
Cost reduction
Considering a not too expensive environment, with average maintenance, and yet still ensuring security isolation for production applications, I would recommend:
1 cluster for DEV and STAGING (separated by namespaces, maybe even isolated, using Network Policies, like in Calico)
1 cluster for PROD
Environment Parity
It's a good practice to keep development, staging, and production as similar as possible:
Differences between backing services mean that tiny incompatibilities crop up, causing code that worked and passed tests in development or staging to fail in production. These types of errors create friction that disincentivizes continuous deployment.
Combine a powerful CI/CD tool with helm. You can use the flexibility of helm values to set default configurations, just overriding the configs that differ from one environment to another.
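For illustration, a hedged sketch of per-environment overrides (the chart, keys and values are placeholders; the override file is passed with helm upgrade -f in addition to the defaults):

# values.yaml -- hypothetical defaults shared by all environments
replicaCount: 2
image:
  repository: example/my-app
  tag: "1.4.0"
resources:
  requests:
    cpu: 100m
    memory: 128Mi

# values-production.yaml -- only the settings that differ in production,
# applied with: helm upgrade my-app ./chart -f values.yaml -f values-production.yaml
replicaCount: 5
resources:
  requests:
    cpu: 500m
    memory: 512Mi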
GitLab CI/CD with AutoDevops has a powerful integration with Kubernetes, which allows you to manage multiple Kubernetes clusters already with helm support.
Managing multiple clusters (kubectl interactions)
When you are working with multiple Kubernetes clusters, it's easy to mess up contexts and run kubectl in the wrong cluster. Beyond that, Kubernetes has restrictions on version mismatch between the client (kubectl) and the server (kubernetes master), so running commands in the right context does not mean running the right client version.
To overcome this:
Use asdf to manage multiple kubectl versions
Set the KUBECONFIG env var to change between multiple kubeconfig files
Use kube-ps1 to keep track of your current context/namespace
Use kubectx and kubens to change fast between clusters/namespaces
Use aliases to combine them all together
I have an article that exemplifies how to accomplish this: Using different kubectl versions with multiple Kubernetes clusters
I also recommend the following reads:
Mastering the KUBECONFIG file by Ahmet Alp Balkan (Google Engineer)
How Zalando Manages 140+ Kubernetes Clusters by Henning Jacobs (Zalando Tech)
Definitely use a separate cluster for development and for creating docker images, so that your staging/production clusters can be locked down security-wise. Whether you use separate clusters for staging + production is up to you to decide based on risk/cost - certainly keeping them separate will help avoid staging affecting production.
I'd also highly recommend using GitOps to promote versions of your apps between your environments.
To minimise human error I also recommend you look into automating as much as you can for your CI/CD and promotion.
Here's a demo of how to automate CI/CD with multiple environments on Kubernetes, using GitOps for promotion between environments and Preview Environments on Pull Requests. It was done live on GKE, though Jenkins X supports most Kubernetes clusters.
It depends on what you want to test in each of the scenarios. In general I would try to avoid running test scenarios on the production cluster to avoid unnecessary side effects (performance impact, etc.).
If your intention is testing with a staging system that exactly mimics the production system, I would recommend firing up an exact replica of the complete cluster, shutting it down after you're done testing, and moving the deployments to production.
If your purpose is a staging system for testing application deployments, I would run a smaller staging cluster permanently and update the deployments (with scaled-down versions of the deployments) as required for continuous testing.
To control the different clusters, I prefer having a separate CI/CD machine that is not part of the cluster but is used for firing up and shutting down clusters, as well as performing deployment work, initiating tests, etc. This allows you to set up and shut down clusters as part of automated testing scenarios.
It's clear that by keeping the production cluster apart from the staging one, the risk of potential errors impacting the production services is reduced. However, this comes at the cost of more infrastructure/configuration management, since it requires at least:
3 masters for the production cluster and at least one master for the staging one
2 kubectl config files to be added to the CI/CD system
Let’s also not forget that there could be more than one environment. For example I've worked at companies where there are at least 3 environments:
QA: This is where we did daily deploys and where we did our internal QA before releasing to the client.
Client QA: This is where we deployed before deploying to production, so that the client could validate the environment before releasing to production.
Production: This is where production services are deployed.
I think ephemeral/on-demand clusters make sense, but only for certain use cases (load/performance testing or very "big" integration/end-to-end testing); for more persistent/sticky environments I see an overhead that might be reduced by running them within a single cluster.
I guess I wanted to reach out to the k8s community to see what patterns are used for such scenarios like the ones I've described.
Unless compliance or other requirements dictate otherwise, I favor a single cluster for all environments. With this approach, attention points are:
Make sure you also group nodes per environment using labels. You can then use the nodeSelector on resources to ensure that they are running on specific nodes. This will reduce the chances of (excess) resource consumption spilling over between environments.
Treat your namespaces as subnets and forbid all egress/ingress traffic by default (a default-deny sketch follows this list). See https://kubernetes.io/docs/concepts/services-networking/network-policies/.
Have a strategy for managing service accounts. ClusterRoleBindings imply something different if a cluster hosts more than one environment.
Scrutinize tools like Helm before using them. Some charts blatantly install service accounts with cluster-wide permissions, but permissions for service accounts should be limited to the environment they are in.
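A minimal sketch of such a default-deny policy for one environment's namespace (the namespace name is a placeholder; further policies would then allow only the traffic each environment actually needs):

# Sketch of a default-deny NetworkPolicy: it selects every pod in the
# namespace and, by declaring both policy types with no rules, blocks
# all ingress and egress until other policies allow specific traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: staging             # hypothetical environment namespace
spec:
  podSelector: {}                # empty selector = all pods in the namespace
  policyTypes:
    - Ingress
    - Egress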
I think there is a middle ground. I am working with EKS and node groups. The master is managed, scaled and maintained by AWS. You could then create three kinds of node groups (just an example):
1 - General Purpose -> labels: environment=general-purpose
2 - Staging -> labels: environment=staging (taints if necessary)
3 - Prod -> labels: environment=production (taints if necessary)
You can use tolerations and node selectors on the pods so they are placed where they are supposed to be.
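For example, a hedged sketch of a pod spec pinned to the production node group (the label and taint keys/values mirror the placeholder example above):

# Sketch of a pod spec scheduled onto the production node group: the
# nodeSelector targets the environment label, and the toleration lets
# the pod run on nodes tainted for production-only workloads.
spec:
  nodeSelector:
    environment: production
  tolerations:
    - key: "environment"
      operator: "Equal"
      value: "production"
      effect: "NoSchedule"
  containers:
    - name: api                    # hypothetical container
      image: example/api:1.0       # hypothetical image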
This allows you to use more robust or powerful nodes for production's node groups and, for example, SPOT instances for staging, UAT, QA, etc., and it has a couple of big upsides:
Environments are physically separated (and virtually too, in namespaces)
You can reduce costs by sharing not only the masters, but also some nodes with pods shared by the two environments and by using spot or cheaper instances in staging/uat/...
No cluster-management overheads
You have to pay attention to roles and policies to keep it secure. You can implement network policies using, for example, EKS + Calico.
Update:
I found a doc that may be useful when using EKS. It has some details on how to safely run a multi-tenant cluster, and some of these details may be useful for isolating production pods and namespaces from the ones in staging.
https://aws.github.io/aws-eks-best-practices/security/docs/multitenancy/
Using multiple clusters is the norm, at the very least to enforce a strong separation between production and "non-production".
In that spirit, do note that GitLab 13.2 (July 2020) now includes:
Multiple Kubernetes cluster deployment in Core
Using GitLab to deploy multiple Kubernetes clusters with GitLab previously required a Premium license.
Our community spoke, and we listened: deploying to multiple clusters is useful even for individual contributors.
Based on your feedback, starting in GitLab 13.2, you can deploy to multiple group and project clusters in Core.
See documentation and issue.
A few thoughts here:
Do not trust namespaces to protect the cluster from catastrophe. Having separate production and non-prod (dev,stage,test,etc) clusters is the minimum necessary requirement. Noisy neighbors have been known to bring down entire clusters.
Separate repositories for code and k8s deployments (Helm, Kustomize, etc.) will make best practices like trunk-based development and feature-flagging easier as the teams scale.
Using Environments as a Service (EaaS) will allow each PR to be tested in its own short-lived (ephemeral) environment. Each environment is a high-fidelity copy of production (including custom infrastructure like databases, buckets, DNS, etc.), so devs can remotely code against a trustworthy environment (NOT minikube). This can help reduce configuration drift, improve release cycles, and improve the overall dev experience. (Disclaimer: I work for an EaaS company.)
I think running a single cluster makes sense because it reduces overhead and monitoring effort. But you have to make sure to put network policies and access control in place.
Network policy - to prohibit dev/qa environment workloads from interacting with prod/staging stores.
Access control - who has access to which environment's resources, using ClusterRoles, Roles, etc. (a sketch follows).
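As a hedged sketch of environment-scoped access control, a namespaced Role granting read-only access to Secrets in the production namespace, bound to a specific group (all names are placeholders):

# Sketch of namespace-scoped RBAC: read-only access to Secrets in the
# production namespace for a hypothetical group of operators.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: secret-reader
  namespace: production            # hypothetical environment namespace
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: secret-reader-binding
  namespace: production
subjects:
  - kind: Group
    name: prod-operators           # hypothetical group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io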