What is the purpose of using a secret injector in k8s instead of coding into my software the logic to handle my secrets in a vault like Google SM? - kubernetes

Ok.. so, we have Google Secret Manager on GCP, AWS Secrets Manager on AWS, Key Vault in Azure... and so on.
Those services give you client libraries so you can code the way your software accesses the secrets stored there. They all look straightforward and fairly easy to implement. Right?
For instance, using Google SM you can do something like:
from google.cloud import secretmanager

# Read one secret version and decode its payload
client = secretmanager.SecretManagerServiceClient()
request = {"name": "projects/<proj-id>/secrets/mysecret/versions/1"}
response = client.access_secret_version(request=request)
payload = response.payload.data.decode("UTF-8")
and you're done.
I mean, if we're talking about K8s, you can improve the code above by reading the secret resource names from a ConfigMap that holds all of them, like:
apiVersion: v1
kind: ConfigMap
metadata:
  name: myms
  namespace: myns
data:
  DBPASS: projects/<proj-id>/secrets/mysecretdb/versions/1
  APIKEY: projects/<proj-id>/secrets/myapikey/versions/1
  DIRTYSECRET: projects/<proj-id>/secrets/mydirtysecret/versions/1
And then use part of the code above to load the vars and get the secrets from the SM.
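For example, a minimal sketch of that idea, assuming the ConfigMap above is exposed to the pod as environment variables (e.g. via envFrom), so each variable holds a Secret Manager resource name rather than the secret itself:
import os
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()

def resolve(env_var):
    # The env var holds the Secret Manager resource name, not the secret value
    name = os.environ[env_var]
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")

db_pass = resolve("DBPASS")
api_key = resolve("APIKEY")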
So, when I was searching the interwebs for best practices and examples, I found projects like the ones below:
https://github.com/GoogleCloudPlatform/secrets-store-csi-driver-provider-gcp
https://github.com/doitintl/secrets-init
https://github.com/doitintl/kube-secrets-init
https://github.com/aws-samples/aws-secret-sidecar-injector
https://github.com/aws/secrets-store-csi-driver-provider-aws
But those projects don't clearly explain what the point is of mounting my secrets as files or env vars.
I got really confused. Maybe I'm too much of a newbie in the K8s and cloud world... and that's why I'm here asking some really, really dumb questions. Sorry :/
My questions are:
Are the projects mentioned above recommended for old code that I do not want to touch? I mean, let's say my code already uses an env var called DBPASS=mypass and I would like to work around it so that the value of the DBPASS env var gets injected from a value in the SM.
Is implementing the code to talk to a secret manager so hard that using one of the solutions above is recommended?
What are the advantages of such injection approach?
Thx a lot!

There are many possible motivations why you may want to use an abstraction (such as the CSI driver or sidecar injector) over a native integration:
Portability - If you're multi-cloud or multi-target, you may have multiple secret management solutions. Or you might have a different secret manager target for local development versus production. Projecting secrets onto a virtual filesystem or into environment variables provides a "least common denominator" approach that decouples the application from its secrets management provider.
Local development - Similar to the previous point on portability, it's common to use fake or fake-ish data for local development. For local dev, secrets might all be dummy values that don't need a real secret manager. Moving to an abstraction avoids error-prone spaghetti code like:
let secret;
if (process.env.RAILS_ENV === 'production') {
  secret = secretmanager.access('...');
} else {
  secret = 'abcd1234';
}
De-risking - By avoiding a tight coupling, you can de-risk upstream API changes behind an abstraction layer. This is conceptually similar to the benefits of microservices. As a platform team, you make a guarantee to your developers that their secret will live at process.env.FOO, and it doesn't matter how it gets there, so long as you continue to fulfill that API contract.
Separation of duties - In some organizations, the platform team (k8s team) is separate from the security team, which is separate from the development teams. It might not be realistic for a developer to ever have direct access to a secret manager.
Preserving identities - Depending on the implementation, the actor that accesses the secret varies. Sometimes it's the k8s cluster, sometimes it's the individual pod. Both have trade-offs.
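To make that "API contract" point concrete, here is a minimal sketch of the application side under such an abstraction (the env var name and mount path are just examples): the app only reads an env var or a mounted file and never imports a secret manager SDK.
import os
from pathlib import Path

def get_db_pass():
    # The platform team guarantees the secret shows up here; the app doesn't care
    # whether a CSI driver, a sidecar injector, or a local dev script put it there.
    if "DBPASS" in os.environ:
        return os.environ["DBPASS"]
    # Example mount path for a CSI-projected secret volume
    return Path("/var/run/secrets/app/dbpass").read_text().strip()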
Why might you not want this abstraction? Well, it introduces additional security concerns. Exposing secrets via environment variables or via the filesystem makes you subject to a generic class of supply chain attacks. Using a secret manager client library or API directly doesn't entirely prevent this, but it forces a more targeted attack (e.g. a core dump) instead of a more generic path traversal or env-dump-to-pastebin attack.

Related

AWS Proton vs CloudFormation

Recently, I looked into the AWS Proton service and tried to do some hands-on work with it, but unfortunately I was not able to succeed.
What I am not able to understand is what advantage I get with Proton, because I can build the end-to-end pipeline using CodeCommit, CodeDeploy, CodePipeline, and CloudFormation.
It would be great if someone could jot down the use cases where Proton makes sense compared to the components I mentioned above.
From what I understand, AWS Proton is similar to AWS Service Catalog in that it allows administrators to prepare CloudFormation (CFN) templates which developers/users can provision when they need them. The difference is that AWS Service Catalog is geared towards general users, e.g. those who just want to start a pre-configured instance prepared by administrators, or provision entire infrastructures from a set of approved architectures (e.g. instance + RDS + Lambda functions). In contrast, AWS Proton is geared towards developers, so that they can provision by themselves entire architectures that they need for development, such as CI/CD pipelines.
In both cases, CFN is the primary way these architectures are defined and provisioned. You can think of AWS Service Catalog and AWS Proton as high-level services, and of CFN as a low-level service used as a building block for the other two.
because the end to end pipeline I can build using CodeCommit, CodeDeploy, CodePipeline, and CloudFormation
Yes, in both cases (AWS Service Catalog and AWS Proton) you can do all of that. But not everyone wants to. Many AWS users and developers do not have the time and/or interest to define all the solutions they need in CFN; it is time consuming and requires experience. Also, it's not a good security practice to allow everyone in your account to provision everything they need without any constraints.
AWS Service Catalog and AWS Proton solve these issues: you can pre-define a set of CFN templates and allow your users and developers to easily provision them. They also provide clear role separation in your account, so you have administrators who manage infrastructure on one side and users/developers on the other. This way both groups concentrate on what they know best - infrastructure as code and software development.

Secrets management across environments in HashiCorp Vault

Problem - there are Vault instances in dev/stage/production (each independent). If I need to create/modify a key/value (kv secrets engine), I can do so manually in each env. This leads to environments being out of sync, and as part of the deployment we need to remember to write values to Vault (I'm sure you can guess what happens).
What I would like to be able to do is externalize the paths/keys so they can be applied to each environment. This could potentially be a file in Git that we write to Vault via a script in each env, but then how do we get the values to use? Further, these values are different in each environment and they need a secure location to live.
I have tried to search for this, and while there is plenty about managing Vault itself, I have not found any solution for managing the data within Vault. Any guidance would be much appreciated.
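One rough sketch of the "paths in Git, values elsewhere" idea mentioned above, using the Python hvac client (the path file format, env var names, and value source are all assumptions, not part of the question):
import os
import hvac

# Connect to the Vault instance for the current environment
client = hvac.Client(url=os.environ["VAULT_ADDR"], token=os.environ["VAULT_TOKEN"])

# paths.txt lives in Git and lists only the kv paths, never the values
with open("paths.txt") as f:
    paths = [line.strip() for line in f if line.strip()]

for path in paths:
    # The per-environment values still need a secure source, e.g. a CI variable
    # store; here we just read them from the environment as a placeholder.
    value = os.environ.get(path.upper().replace("/", "_"), "CHANGEME")
    client.secrets.kv.v2.create_or_update_secret(path=path, secret={"value": value})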

Advantages of templates (i.e. infrastructure as code) over API calls

I am trying to set up a module to deploy resources in the cloud (it could be any cloud provider). I don't see the advantages of using templates (i.e. a deployment manager) over direct API calls:
Creation of a VM using a template:
# deployment.yaml
resources:
- type: compute.v1.instance
  name: quickstart-deployment-vm
  properties:
    zone: us-central1-f
    machineType: f1-micro
    ...
# bash command to deploy yaml file
gcloud deployment-manager deployments create vm-deploy --config deployment.yaml
Creation of a VM using an API call:
import json

def addInstance(http, listOfHeaders):
    # Call the Compute Engine API directly to insert an instance
    url = "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances"
    body = {
        "name": "quickstart-deployment-vm",
        "zone": "us-central1-f",
        "machineType": "f1-micro",
        # ...
    }
    http.request(uri=url, method="POST", body=json.dumps(body), headers=listOfHeaders)
Can someone explain to me what benefits I get using templates?
Readability / ease of use / authentication handled for you / no need to be a coder / etc. There can be many advantages; it really depends on how you look at it, your background, and the tools you use.
For you specifically, it might be more beneficial to use Python all the way.
It's easier to use templates, and you get a lot of built-in functionality, such as running validation on your template to scan for possible security vulnerabilities and similar. You can also easily delete your infra using the same template you created it with. FWIW, I've gone all the way with templates and do as much as I can with them, in smaller units. It makes it easy to move a part of the infra out or duplicate it to another project, using a pipeline in GitLab to deploy it, for example.
The reason to use templates over API calls is that templates suit use cases where a deterministic outcome is required.
Both templates and API calls have their own benefits; there is always a trade-off between the two options. If you want more flexibility in the deployment, then API calls suit you better. On the other hand, if security and a complete revision history are your priority, then templates should be your choice. Details can be found in the online documentation.
When using a template, orchestration of the deployment is handled by the platform. When using API calls (or other imperative approaches) you need to handle orchestration yourself.
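To illustrate that last point: with raw API calls you typically have to poll the returned long-running operation and sequence dependent resources yourself, which a deployment manager handles for you. A rough sketch, where compute_api and its methods are hypothetical stand-ins rather than a real client library:
import time

def wait_for_operation(compute_api, operation, timeout_s=300):
    # Poll the long-running operation until it is DONE or we give up
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        op = compute_api.get_operation(operation["name"])  # hypothetical call
        if op["status"] == "DONE":
            return op
        time.sleep(5)
    raise TimeoutError("operation did not finish in time")

def create_vm_then_disk(compute_api, instance_body, disk_body):
    # Imperative orchestration: create, wait, then create the dependent resource
    op = compute_api.insert_instance(instance_body)  # hypothetical call
    wait_for_operation(compute_api, op)
    compute_api.insert_disk(disk_body)               # hypothetical call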

How do micro services in Cloud Foundry communicate?

I'm a newbie in Cloud Foundry. Following the reference application provided by Predix (https://www.predix.io/resources/tutorials/tutorial-details.html?tutorial_id=1473&tag=1610&journey=Connect%20devices%20using%20the%20Reference%20App&resources=1592,1473,1600), the application consists of several modules and each module is implemented as a microservice.
My question is, how do these microservices talk to each other? I understand they must be using some sort of REST calls, but the problem is:
Service registry: Say I have services A, B, C. How do these components 'discover' the REST URLs of the other components, since a component's URL is only known after the service is pushed to Cloud Foundry?
How does Cloud Foundry control component dependencies during service startup and shutdown? Say A cannot start until B is started, and B needs to be shut down if A is shut down.
The ref-app 'application' consists of several 'apps' and Predix 'services'. An app is bound to a service via an entry in the manifest.yml, and thus gets the service endpoint and other important configuration information via this binding. When an app is bound to a service, the 'cf env' command returns the needed info.
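Concretely, inside the app the binding shows up as JSON in the VCAP_SERVICES environment variable; a minimal sketch of reading an endpoint from it (the service label and credential key are just examples, the actual names depend on the binding):
import json
import os

# Cloud Foundry injects bound service credentials as JSON in VCAP_SERVICES
vcap = json.loads(os.environ.get("VCAP_SERVICES", "{}"))

entries = vcap.get("my-service", [])  # example service label
endpoint = entries[0]["credentials"]["uri"] if entries else None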
There might still be some Service endpoint info in a property file, but that's something that will be refactored out over time.
The individual apps of the ref-app application are split into separate microservices, since they get used as components of other applications. Hence the microservices approach. If there were startup dependencies across apps, the CI/CD pipeline that pushes the apps to the cloud would need to manage those dependencies. The dependencies in ref-app are simply the obvious ones, read on.
While it's true that coupling of microservices is not in the design, there are some obvious reasons it might happen: language and function. If you have a "back-end" microservice written in Java used by a "front-end" UI microservice written in JavaScript on Node.js, then these are pushed as two separate apps. Theoretically the UI won't work too well without the back-end, but there is a plan to actually make that happen with some canned JSON. Still, there is some logical coupling there.
The nice things you get from microservices are that they might need to scale differently, and Cloud Foundry makes that quite easy with the 'cf scale' command. They might be used by multiple other microservices, hence creating new scale requirements. So, thinking about what needs to scale, and also about the release cycle of the functionality, helps in deciding what comprises a microservice.
As for ordering: for example, the Google Maps API might be required by your application, so it could be said that it should be launched first and your application second. But in reality, your application should take into account that the Maps API might be down. Your goal should be that your app behaves well when a dependent microservice is not available.
The 'apps' of the 'application' know about each other thanks to their names and the URLs that the cloud gives them. There are actually many copies of the reference app running in various clouds and spaces. They are prefixed with things like Dev or QA or Integration, etc. Could we get the Dev front end talking to the QA back-end microservice? Sure, it's just a URL.
In addition to the aforementioned etcd (which I haven't tried yet), you can also create a CUPS (user-provided) service 'definition'. This is also a set of key/value pairs, which you can tie to the space (dev/qa/stage/prod) and bind via the manifest. This way you get the props from the environment.
If microservices do need to talk to each other, it's generally via REST, as you have noticed. However, microservice purists may be against such dependencies. That apart, service discovery is enabled by publishing available endpoints to a service registry - etcd in the case of Cloud Foundry. Once an endpoint is registered, the various instances of a given service can register themselves to the registry using a POST operation. The client only needs to know about the published endpoint, not the individual service instances' endpoints. This is self-registration. The client will either communicate with a load balancer such as ELB, which looks up the service registry, or the client itself must be aware of the service registry.
For (2), there should not be such a hard dependency between microservices per the microservice definition; designing such a coupled set of services indicates imminent issues with orchestration and synchronization. If such dependencies do emerge, you will have to rely on service registries, health checks, and circuit breakers for fallback.
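As a rough sketch of the self-registration idea (the registry URL and payload shape are made up for illustration, not a specific etcd or Cloud Foundry API):
import json
import os
import urllib.request

# Hypothetical registry endpoint; in practice this might be etcd, Eureka, etc.
REGISTRY_URL = os.environ.get("REGISTRY_URL", "http://registry.local/services")

def register(service_name, service_url):
    # Each instance announces its own endpoint to the registry on startup
    payload = json.dumps({"name": service_name, "url": service_url}).encode()
    req = urllib.request.Request(
        REGISTRY_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

register("service-a", "https://service-a.example-cloud.domain")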

How to implement the "One Binary" principle with Docker

The One Binary principle explained here:
http://programmer.97things.oreilly.com/wiki/index.php/One_Binary states that one should...
"Build a single binary that you can identify and promote through all the stages in the release pipeline. Hold environment-specific details in the environment. This could mean, for example, keeping them in the component container, in a known file, or in the path."
I see many dev-ops engineers arguably violate this principle by creating one Docker image per environment (i.e. my-app-qa, my-app-prod and so on). I know that Docker favours immutable infrastructure, which implies not changing an image after deployment, and therefore not uploading or downloading configuration post deployment. Is there a trade-off between immutable infrastructure and the one binary principle, or can they complement each other? When it comes to separating configuration from code, what is the best practice in a Docker world? Which one of the following approaches should one take?
1) Creating a base binary image and then having a configuration Dockerfile that augments this image by adding environment-specific configuration (i.e. my-app -> my-app-prod)
2) Deploying a binary-only docker image to the container and passing in the configuration through environment variables and so on at deploy time.
3) Uploading the configuration after deploying the Docker file to a container
4) Downloading configuration from a configuration management server from the running docker image inside the container.
5) Keeping the configuration in the host environment and making it available to the running Docker instance through a bind mount.
Is there another better approach not mentioned above?
How can one enforce the one binary principle using immutable infrastructure? Can it be done, or is there a trade-off? What is the best practice?
I've got about 2 years of experience deploying Docker containers now, so I'm going to talk about what I've done and/or know to work.
So, let me first begin by saying that containers should definitely be immutable (I even mark mine as read-only).
Main approaches:
use configuration files by setting a static entrypoint and overriding the configuration file location via the container startup command - that's less flexible, since one would have to commit the change and redeploy in order to enable it; not fit for passwords, secure tokens, etc.
use configuration files by overriding their location with an environment variable - again, depends on having the configuration files prepped in advance; not fit for passwords, secure tokens, etc.
use environment variables - that might need a change in the deployment code only, thus lessening the time to get a config change live, since it doesn't need to go through the application build phase (in most cases); deploying such a change can be pretty easy. Here's an example - if deploying a containerised application to Marathon, changing an environment variable could potentially just start a new container from the last used container image (potentially even on the same host), which means this could be done in mere seconds; not fit for passwords, secure tokens, etc., and especially so in Docker
store the configuration in a k/v store like Consul, make the application aware of that, and let it be even dynamically reconfigurable. Great approach for launching features simultaneously - possibly even across multiple services; if implemented with a solution such as HashiCorp Vault, which provides secure storage for sensitive information, you could even have ephemeral secrets (an example would be the PostgreSQL secret backend for Vault - https://www.vaultproject.io/docs/secrets/postgresql/index.html)
have an application or script create the configuration files before starting the main application - store the configuration in a k/v store like Consul and use something like consul-template to populate the app config; a bit more secure, since you're not carrying everything over through the whole pipeline as code
have an application or script populate the environment variables before starting the main application - an example of that would be envconsul (a minimal sketch of this wrapper pattern follows below); not fit for sensitive information - someone with access to the Docker API (either through the TCP or UNIX socket) would be able to read those
I've even had a situation in which we were populating variables into AWS instance user_data and injecting them into the container on startup (with a script that modifies the container's JSON config on startup)
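Here is the promised sketch of the "populate env vars, then start the real process" wrapper pattern. This is not envconsul itself, just a toy illustration; the APP_CONFIG_JSON variable and its JSON shape are made up.
#!/usr/bin/env python3
import json
import os
import sys

# Pretend this JSON came from a k/v store such as Consul or Vault
config = json.loads(os.environ.get("APP_CONFIG_JSON", "{}"))

env = dict(os.environ)
for key, value in config.items():
    env[key.upper()] = str(value)

# Replace this process with the real application (passed as arguments),
# so signals and exit codes behave as if the app were started directly
os.execvpe(sys.argv[1], sys.argv[1:], env)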
The main things that I'd take into consideration:
what the variables are that I'm exposing, and when and where I'm getting their values from (could be the CD software, or something else) - for example, you could publish the AWS RDS endpoint and credentials to the instance's user_data, potentially even to EC2 tags with some IAM instance profile magic
how many variables we have to manage and how often we change some of them - if we have a handful, we could probably just go with environment variables, or use environment variables for the most commonly changed ones and variables stored in a file for those that we change less often
how fast we want to see them changed - if it's a file, it typically takes more time to deploy it to production; if we're using environment variables, we can usually deploy those changes much faster
how we protect some of them - where we inject them and how - for example Ansible Vault, HashiCorp Vault, keeping them in a separate repo, etc.
how we deploy - that could be a JSON config file sent to a deployment framework endpoint, Ansible, etc.
what environment we're running in - is it realistic to have something like Consul as a config data store (Consul has 2 different kinds of agents - client and server)
I tend to prefer the most complex case of having them stored in a central place (k/v store, database) and changed dynamically, because I've encountered the following cases:
slow deployment pipelines - which makes it really slow to change a config file and have it deployed
having too many environment variables - this could really grow out of hand
having to turn on a feature flag across the whole fleet (consisting of tens of services) at once
an environment in which there is a real drive to increase security by better handling sensitive config data
I've probably missed something, but I guess that should be enough of a trigger to think about what would be best for your environment.
How I've done it in the past is to incorporate tokenization into the packaging process after a build is executed. These tokens can be managed in an orchestration layer that sits on top to manage your platform tools. So for a given token, there is a matching regex or XPath expression. That token is linked to one or many config files, depending on the relationship that is chosen. Then, when this build is deployed to a container, a platform service (e.g. config mgmt) will poke these tokens with the correct value with respect to its environment. These poked values would most likely be pulled from a vault.
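As a rough illustration of that tokenization idea (the token format, file name, and value source are hypothetical, not the author's actual tooling):
import re
from pathlib import Path

# Values would normally come from a vault or config-management service
values = {"DB_PASSWORD": "s3cr3t", "API_ENDPOINT": "https://api.example.internal"}

TOKEN = re.compile(r"@@([A-Z_]+)@@")  # e.g. @@DB_PASSWORD@@ inside a config file

def render(path):
    # Replace every token in the config file with its environment-specific value
    text = Path(path).read_text()
    rendered = TOKEN.sub(lambda m: values[m.group(1)], text)
    Path(path).write_text(rendered)

render("app.conf")  # "poked" at deploy time, once per environment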