I am using HashiCorp Vault to store multiple secrets in the KV secrets engine, one of which is the database connection string: username, password, host IP and port. I have multiple microservices which need to use this DB secret to connect to the database.
Please clarify which of these integration patterns is valid:
Direct Integration with Vault: each microservice will have a direct connection to Vault to get the secrets needed for its operation. All the microservices will have the Vault token configured (in K8s Secrets) for accessing Vault.
Retrieving secrets via another microservice: should there be an abstraction layer, i.e. a separate microservice for Vault interaction, with all the other microservices calling the APIs of this vault-microservice to get the secrets they need? The Vault token (in K8s Secrets) would be accessed by only one microservice.
The other microservice is an abstraction layer. It is extra work that might allow you to switch secrets providers in the future.
Unless you can justify writing and maintaining that abstraction layer (because you want to use Vault in some deployments and AWS Secrets Manager in others), don't bother.
The other issue is that although Vault's KV store is quite common and there are several other implementations, what if you want to use Transit, PKI or SSH CA? These services exist elsewhere (in AWS, for example), but they don't have feature parity. You probably don't want to be on the hook to support those differences in your abstraction layer.
A low(er)-cost alternative that allows you to decouple the implementation from your code would be to wrap the Vault API class with a simple KVSecrets class in your code, a software design pattern known as the facade. But remember that unless you test your class against two services, you can't guarantee it will be possible to migrate to another service one day.
So considering all this, just call the API directly or use the Vault library for your programming language.
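If you do decide the thin facade is worth it, a minimal sketch in Python, assuming the hvac client library and the KV v2 engine (the mount point, path and field names below are placeholders, not anything your setup prescribes):

```python
# Minimal KVSecrets facade over the hvac client (KV v2 engine assumed).
# The mount point "secret" and path "db" are placeholders.
import hvac


class KVSecrets:
    """Thin wrapper so application code never talks to the Vault client directly."""

    def __init__(self, url: str, token: str, mount_point: str = "secret"):
        self._client = hvac.Client(url=url, token=token)
        self._mount_point = mount_point

    def get(self, path: str) -> dict:
        # For KV v2 the secret payload lives under response["data"]["data"].
        response = self._client.secrets.kv.v2.read_secret_version(
            path=path, mount_point=self._mount_point
        )
        return response["data"]["data"]


# Usage: each microservice builds its DB connection string from the returned mapping.
secrets = KVSecrets("https://vault.example.org:8200", token="<vault-token>")
db = secrets.get("db")  # e.g. {"username": ..., "password": ..., "host": ..., "port": ...}
```

Swapping the body of KVSecrets is then the only change needed if you ever move to another secrets store.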
I am fairly new to Kubernetes and learning Kubernetes deployments from scratch. For a microservice-based project that I am working on, each microservice has to authenticate with its own client-id and client-secret to the auth server before requesting any information (JWT). These ids and secrets are required by each service and need to be in its environment variables. Initially the auth service will generate those ids and secrets via database seeds. What is the best way in the world of Kubernetes to automatically set these values in the environment of a pod deployment before pod creation?
Depends on how automatic you want it to be. A simple approach would be an initContainer that provisions a new token and puts it in a file on a shared volume, and then an entrypoint script in the main container which reads the file and sets the env var.
The problem with that is that authenticating the initContainer is hard. The big-hammer solution would be to write a custom operator to manage this, but if you're new to Kubernetes that's going to be super hard and probably overkill anyway.
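To make the entrypoint half of that approach concrete, a rough Python sketch (the token path, env var name and exec handoff are assumptions for illustration, not anything Kubernetes mandates):

```python
# Hypothetical entrypoint.py for the main container: read the token the
# initContainer wrote to a shared volume, export it, then exec the service.
import os
import sys

TOKEN_FILE = "/shared/client-token"  # path on the shared emptyDir volume (assumed)

with open(TOKEN_FILE) as f:
    os.environ["CLIENT_SECRET"] = f.read().strip()

# Replace this process with the real service so it inherits the environment,
# e.g. invoked as: python entrypoint.py java -jar service.jar
os.execvp(sys.argv[1], sys.argv[1:])
```

The initContainer writes the freshly provisioned token to the shared volume; this script only reads it and hands off to the real service.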
I'm looking into deploying a cluster on Google Kubernetes Engine in the near future. I've also been looking into using Vault by Hashicorp in order to manage the secrets that my cluster has access to. Specifically, I'd like to make use of dynamic secrets for greater security.
However, all of the documentation and YouTube videos that cover this type of setup always mention that a set of nodes strictly dedicated to Vault should operate as their own separate cluster - thus requiring more VMs.
I am curious if a serverless approach is possible here. Namely, using Google Cloud Run to create Vault containers on the fly.
This video (it should start at the right time) mentions that Vault can be run as a Deployment, so I don't see there being an issue with state. And since Google mentions that each Cloud Run service gets its own stable HTTPS endpoint, I believe that I can simply pass this endpoint to my configuration and all of the pods will be able to find the service, even if new instances are created. However, I'm new to using Kubernetes, so I'm not sure if I'm entirely correct here.
Can anyone with more experience using Kubernetes and/or Vault point out any potential drawbacks with this approach? Thank you.
In beta for three weeks now, and not yet officially announced (it should be in a couple of days), you can have a look at Secret Manager. It's a serverless secret manager with, I think, all the basic features that you need.
The main reason it hasn't been announced yet is that the client libraries in several languages aren't released/finished yet.
The awesome guy on your video link, Seth Vargo, has been involved in this project.
He has also released Berglas. It's written in Go, uses KMS for encrypting the secrets and Google Cloud Storage for storing them. I also recommend it.
I built a Python library to make it easy to use Berglas secrets in Python.
Hope this secret management tool meets your expectations. In any case, it's serverless and quite cheap!
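Once the client libraries are out, reading a secret from Secret Manager should look roughly like this minimal Python sketch (assuming the google-cloud-secret-manager package; the project and secret IDs are placeholders):

```python
# Minimal Secret Manager read; "my-project" and "db-credentials" are placeholders.
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
name = "projects/my-project/secrets/db-credentials/versions/latest"
response = client.access_secret_version(request={"name": name})
print(response.payload.data.decode("UTF-8"))
```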
I am planning to use HashiCorp Vault for secrets management. Secrets/tokens stored in my Java application can be read by multiple services. Services may store the tokens in memory.
One service is designed to update the secrets in Vault.
Once a secret is updated in Vault, I want the Java application to get notified about the change.
Does Vault provide any built-in solution for this?
For example: Servlet Filters in Java. All requests can be intercepted using filters.
I've been working with kubernetes for the past 6 months and we've deployed a few services.
We're just about to deploy another which stores encrypted data and puts the keys in KMS. This requires two service accounts, one for the data and one for the keys.
Data access to this must be audited. Since access to this data is very sensitive, we are reluctant to put both service accounts in the same namespace, as, if compromised in any way, the attacker could gain access to the data and the keys without it being audited.
For now we have one key in a secret and the other we're going to manually post to the single pod.
This is horrible as it requires that a single person be trusted with this key, and limits scalability. Luckily this service will be very low volume.
Has anyone else come up against the same problem?
How have you gotten around it?
cheers
Requirements
No single person ever has access to both keys (datastore and KMS)
Data access to this must be audited
If you enable audit logging, every API call made via this service account will be logged. This may not help you if your service isn't ever called via the API, but considering you have a service account being used, it sounds like it would be.
For now we have one key in a secret and the other we're going to manually post to the single pod.
You might consider using Vault for this. If you store the secret in Vault, you can use something like this to have the secret pushed down into the pod as an environment variable automatically. This is a little more involved than your process, but is considerably more secure.
You can also use Vault alongside Google Cloud KMS, which is detailed in this article.
What you're describing is pretty common - using a key/ service account/ identity in Kubernetes secrets to access an external secret store.
I'm a bit confused by the double-key concept - what are you gaining by having a key in both secrets and in the pod? If secrets are compromised, then etcd is compromised and you have bigger problems. I would suggest you focus instead on locking down secrets, using audit logs, and making the key easy to rotate in case of compromise.
A few items to consider:
If you're mostly using Kubernetes, consider storing (encrypted) secrets in Kubernetes secrets.
If you're storing secrets centrally outside of Kubernetes, like you're describing, consider just using a single Kubernetes secret - you will get Kubernetes audit logs for access to the secret (see the recommended audit-policy), and Cloud KMS audit logs for use of the key.
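As a rough illustration of that second option, a Python sketch assuming the google-cloud-kms client library (project, key ring and key names are placeholders): the secret is encrypted with KMS before being stored in a Kubernetes secret, and decrypted by the pod at startup, so every use of the key shows up in the Cloud KMS audit log.

```python
# Sketch: encrypt a secret with Cloud KMS before storing the ciphertext in a
# Kubernetes secret, then decrypt it in the pod. Names below are placeholders.
from google.cloud import kms

client = kms.KeyManagementServiceClient()
key_name = client.crypto_key_path("my-project", "global", "my-key-ring", "my-key")

# At deploy time: encrypt the plaintext secret and put `ciphertext` in a K8s secret.
ciphertext = client.encrypt(
    request={"name": key_name, "plaintext": b"super-secret-value"}
).ciphertext

# At pod startup: decrypt the ciphertext; this call is what appears in the
# Cloud KMS audit logs, attributed to the pod's service account.
plaintext = client.decrypt(
    request={"name": key_name, "ciphertext": ciphertext}
).plaintext
```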
I'm looking at consensus-type tools like ZooKeeper, Consul and Eureka and they all seem to market the same set of solutions:
Service discovery
Dynamic, centralized configuration management
Synchronization primitives
Consensus algorithms
However the more I read about these things, the more I struggle to see how service discovery is really any different than a dynamic, centralized configuration management (KV pair) system.
My understanding (so far) of service discovery is that it allows nodes to dynamically search for, find and connect to remote services. So if an application uses an AuthService for authentication and authorization, it would use service discovery to find an AuthService node at, say, http://auth103.example.org:9103 and use it.
My understanding of dynamic config systems is that they provide a centralized infrastructure for nodes to dynamically receive updates from, as well as publish updates to, config servers. So if an app instance decides it needs to update a configuration for all its other instances, it would contact the config service and update, say, the numPurgerThreads config. The config service would then update all other app instances so that they updated their respective configs properly.
But aren't these exactly the same problem?
In both cases, you:
Connect to a lookup service of some sort
Query it for data; or
Publish data to it, which then ripples out to other nodes
Service discovery is dynamic configuration, right?!?!
What I'm really driving at is: Couldn't I just implement one config service with one of these tools, which coincidentally also solves service discovery? Or is there a reason why I would need to have, say, 1 Consul cluster for config/KV management, and another, say, Consul cluster for service discovery?
Well, if you look at it like that, these are all just flavors of databases, or data stores if you will, as you describe it:
Connect to a lookup service of some sort
Query it for data; or
Publish data to it, which then ripples out to other nodes
All of the use cases you mentioned require some sort of a data store that multiple clients connect to. What makes some of them better suited than others for different use cases is their interface, data model, consistency guarantees, etc.
Specifically, in the case of service discovery, a nice feature can be failure detection - e.g. a k/v store that allows removing service data when the service is no longer connected. This way, you can register a service with your service discovery tool and know that when the service goes down or loses connectivity, it will no longer be present in the stored data.
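To make that concrete, a hedged Python sketch against Consul's HTTP agent API (the service name, ID, address and port are made up): the service registers with a TTL check, keeps reporting in while healthy, and drops out of discovery results if it stops.

```python
# Sketch of service registration with failure detection via Consul's agent API.
import requests

AGENT = "http://127.0.0.1:8500"  # local Consul agent (assumed)

# Register the service with a TTL check: if the service stops reporting in,
# Consul marks it critical and eventually deregisters it.
requests.put(f"{AGENT}/v1/agent/service/register", json={
    "Name": "auth",
    "ID": "auth-103",
    "Address": "auth103.example.org",
    "Port": 9103,
    "Check": {"TTL": "15s", "DeregisterCriticalServiceAfter": "1m"},
})

# The service heartbeats its check while it is healthy...
requests.put(f"{AGENT}/v1/agent/check/pass/service:auth-103")

# ...and consumers only see instances whose checks are passing.
healthy = requests.get(
    f"{AGENT}/v1/health/service/auth", params={"passing": "true"}
).json()
```

A plain KV store gives you the "publish and query" part, but this liveness-driven cleanup is the piece that makes it service discovery rather than just configuration.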
Just to clarify a few terms: (imo) dynamic config systems can update their state at runtime, in contrast to static ones where everything is defined beforehand (i.e. config files). Centralized CM means there's a single place to store all config data. So centralized CM can be either static or dynamic, right?
IMO, service discovery has a component which CM doesn't and that's a protocol for automatic detection of services.
Guess you'd need a dynamic CM (KV store) to implement service discovery, but you couldn't implement a CM store just using a service discovery (protocol). Take DHCP for example (hope we agree it is a service discovery protocol?) - Dynamic Host Configuration Protocol - so it has a configuration aspect, but also a protocol, which is a bit more than a simple KV store. Btw, it's decentralized, just to illustrate the previous point that CM doesn't have to be centralized.
So service discovery has a configuration aspect, but CM doesn't (necessarily) have a service discovery one. Does this mean SD is a subset of CM? I'd say no.