Autounseal Vault with GCP KMS - kubernetes

I would like to use Vault's auto-unseal mechanism with GCP KMS.
I have been following this tutorial (section: 'Google KMS Auto Unseal') and applying the official HashiCorp Helm chart with the following values:
global:
  enabled: true

injector:
  logLevel: "debug"

server:
  logLevel: "debug"
  extraEnvironmentVars:
    GOOGLE_REGION: global
    GOOGLE_PROJECT: ESGI-projects
    GOOGLE_APPLICATION_CREDENTIALS: /vault/userconfig/kms-creds/credentials.json
  extraVolumes:
    - type: 'secret'
      name: 'kms-creds'
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      config: |
        ui = true

        listener "tcp" {
          tls_disable = 1
          address = "[::]:8200"
          cluster_address = "[::]:8201"
        }

        seal "gcpckms" {
          project    = "ESGI-projects"
          region     = "global"
          key_ring   = "gitter"
          crypto_key = "vault-helm-unseal-key"
        }

        storage "raft" {
          path = "/vault/data"
        }
I have created a kms-creds secret with the JSON credentials for a service account (I have tried with the Cloud KMS Service Agent and Owner roles, but neither works).
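For reference, a minimal sketch of how I create that secret (the key file name is illustrative):
kubectl create secret generic kms-creds --from-file=credentials.json=./sa-key.json
The chart mounts extraVolumes secrets under /vault/userconfig/<name>, which is where GOOGLE_APPLICATION_CREDENTIALS points.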
Here are the keys in my key ring:
My cluster is just a local cluster created with kind.
On the first replica of the Vault server everything seems OK (though the pod is not Running):
And on the other two I get the usual message saying that the Vault is sealed:
Any idea what could be wrong? Should I create one key for each replica?

OK, I have succeeded in setting up Vault with auto-unseal!
What I did:
- Changed the project (the project ID was required, not the name)
- Disabled Raft (raft.enabled: false)
- Moved the backend to Google Cloud Storage, adding this to the config:
storage "gcs" {
bucket = "gitter-secrets"
ha_enabled = "true"
}
ha_enabled = "true" was compulsory (I am using a regional bucket).
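For reference, a sketch of creating such a bucket (the location is illustrative):
gsutil mb -l europe-west1 gs://gitter-secrets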
My final Helm values are:
global:
  enabled: true

injector:
  logLevel: "debug"

server:
  logLevel: "debug"
  extraEnvironmentVars:
    GOOGLE_REGION: global
    GOOGLE_PROJECT: esgi-projects-354109
    GOOGLE_APPLICATION_CREDENTIALS: /vault/userconfig/kms-creds/credentials.json
  extraVolumes:
    - type: 'secret'
      name: 'kms-creds'
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: false
    config: |
      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }

      seal "gcpckms" {
        project    = "esgi-projects-354109"
        region     = "global"
        key_ring   = "gitter"
        crypto_key = "vault-helm-unseal-key"
      }

      storage "gcs" {
        bucket     = "gitter-secrets"
        ha_enabled = "true"
      }
Using a service account with these permissions (see the sketch of the grant below):
- Cloud KMS CryptoKey Encrypter/Decrypter
- Storage Object Admin (on gitter-secrets only)
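A sketch of granting the KMS role at the key level (project, key ring, and key names are from the post; the service account e-mail is hypothetical):
gcloud kms keys add-iam-policy-binding vault-helm-unseal-key \
    --project esgi-projects-354109 \
    --keyring gitter \
    --location global \
    --member serviceAccount:vault-unseal@esgi-projects-354109.iam.gserviceaccount.com \
    --role roles/cloudkms.cryptoKeyEncrypterDecrypter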
I had an issue at first: vault-0 needed a vault operator init to be run. After trying several things (post-install hooks, among others) and coming back to the initial state, the pods were unsealing normally without my running anything.
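For anyone hitting the same thing, the init step itself is a one-liner (namespace and pod name assumed from the chart defaults):
kubectl exec -n vault vault-0 -- vault operator init
With a KMS seal configured, init returns recovery keys instead of unseal keys, and the remaining replicas should come up unsealed on their own.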

Related

Gitlab: How to configure Backup when using object-store

We are running GitLab in our Kubernetes cluster, using the rook-ceph Rados-Gateway as the S3 storage backend. We want to use the backup-utility delivered in the tools container from GitLab.
As the backup target we configured an external MinIO instance.
When running the backup-utility, these error messages occur:
Bucket not found: gitlab-registry-bucket. Skipping backup of registry ...
Bucket not found: gitlab-uploads-bucket. Skipping backup of uploads ...
Bucket not found: gitlab-artifacts-bucket. Skipping backup of artifacts ...
Bucket not found: gitlab-lfs-bucket. Skipping backup of lfs ...
Bucket not found: gitlab-packages-bucket. Skipping backup of packages ...
Bucket not found: gitlab-mr-diffs. Skipping backup of external_diffs ...
Bucket not found: gitlab-terraform-state. Skipping backup of terraform_state ...
Bucket not found: gitlab-pages-bucket. Skipping backup of pages ...
When I run s3cmd ls, I only see the two backup buckets on our MinIO instance, not the "source" buckets.
Can someone tell me how to configure the backup-utility or s3cmd so it can access both the Rados-Gateway for the source buckets and MinIO as the backup target?
I have tried to add multiple connections to the .s3cfg file like this:
[target]
host_base = file01.xxx.xxx:80
host_bucket = file01.xxx.xxx:80
use_https = false
bucket_location = us-east-1
access_key = xxx
secret_key = xxx
[source]
host_base = s3.xxx.xxx:80
host_bucket = s3.xxx.xxx:80
use_https = false
bucket_location = us-east-1
access_key = xxx
secret_key = xxx
but that did not show any buckets from the target when running s3cmd ls.
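(Note: as far as I know, s3cmd only honours a single connection per configuration file, so named sections like the above are ignored. A workable sketch is one config file per endpoint, selected with -c; the file names are illustrative:
s3cmd -c ~/.s3cfg-source ls    # buckets on the Rados-Gateway
s3cmd -c ~/.s3cfg-target ls    # buckets on MinIO
)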
@Löppinator: Please check the GitLab documentation link here
for values.yaml; a sample configuration looks like below:
global:
  ...
  pages: # pages bucket to be added with connection
    enabled: true
    host: <hostname>
    artifactsServer: true
    objectStore:
      enabled: true
      bucket: <s3-bucket-name>
      # proxy_download: true
      connection:
        secret: <secret-for-s3-connection>
  ...
  appConfig:
    ...
    object_store:
      enabled: true
      proxy_download: true
      connection:
        secret: <secret-for-s3-connection>
    lfs:
      enabled: true
      proxy_download: false
      bucket: <s3-bucket-name>
      connection: {}
    artifacts:
      enabled: true
      proxy_download: true
      bucket: <s3-bucket-name>
      connection: {}
    uploads:
      enabled: true
      proxy_download: true
      bucket: <s3-bucket-name>
      connection: {}
    packages:
      enabled: true
      proxy_download: true
      bucket: <s3-bucket-name>
      connection: {}
    externalDiffs:
      enabled: true
      proxy_download: true
      bucket: <s3-bucket-name>
      connection: {}
    terraformState:
      enabled: true
      bucket: <s3-bucket-name>
      connection: {}
    ciSecureFiles:
      enabled: true
      bucket: <s3-bucket-name>
      connection: {}
    dependencyProxy:
      enabled: true
      proxy_download: true
      bucket: <s3-bucket-name>
      connection: {}
    backups:
      bucket: <s3-bucket-name>
      tmpBucket: <s3-bucket-name>
  registry: # registry bucket also should be added in S3; no connection is required here
    bucket: <s3-bucket-name>
You have to check the indentation: the pages and registry buckets sit under the global config, while the rest of the buckets sit under appConfig, as in my code above.
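For the <secret-for-s3-connection> placeholder, the chart expects a Kubernetes secret whose connection key contains the provider settings. A hedged sketch (secret name and credentials are hypothetical; the endpoint is from the question):
cat > connection.yaml <<EOF
provider: AWS
region: us-east-1
aws_access_key_id: <access-key>
aws_secret_access_key: <secret-key>
endpoint: "http://s3.xxx.xxx:80"
path_style: true
EOF
kubectl create secret generic object-store-connection --from-file=connection=connection.yaml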
I hope this helps!!!

ERROR controller.provisioning Could not schedule pod, incompatible with provisioner "default", incompatible requirements, key karpenter.sh/provisioner

I read through the Karpenter documentation at https://karpenter.sh/v0.16.1/getting-started/getting-started-with-terraform/#install-karpenter-helm-chart. I followed the instructions step by step and got errors at the end.
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
DEBUG controller.provisioning Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.topologySpreadConstraints = {"maxSkew":1,"topologyKey":"topology.kubernetes.io/zone","whenUnsatisfiable":"ScheduleAnyway","labelSelector":{"matchLabels":{"app.kubernetes.io/instance":"karpenter","app.kubernetes.io/name":"karpenter"}}} {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
2022-09-10T00:13:13.122Z ERROR controller.provisioning Could not schedule pod, incompatible with provisioner "default", incompatible requirements, key karpenter.sh/provisioner-name, karpenter.sh/provisioner-name DoesNotExist not in karpenter.sh/provisioner-name In [default] {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
Below is the source code:
cat main.tf
terraform {
  required_version = "~> 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.5"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = "~> 1.14"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

locals {
  cluster_name = "karpenter-demo"

  # Used to determine correct partition (i.e. - `aws`, `aws-gov`, `aws-cn`, etc.)
  partition = data.aws_partition.current.partition
}
data "aws_partition" "current" {}
module "vpc" {
# https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest
source = "terraform-aws-modules/vpc/aws"
version = "3.14.4"
name = local.cluster_name
cidr = "10.0.0.0/16"
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
one_nat_gateway_per_az = false
public_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/elb" = 1
}
private_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = 1
}
}
module "eks" {
# https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest
source = "terraform-aws-modules/eks/aws"
version = "18.29.0"
cluster_name = local.cluster_name
cluster_version = "1.22"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
# Required for Karpenter role below
enable_irsa = true
node_security_group_additional_rules = {
ingress_nodes_karpenter_port = {
description = "Cluster API to Node group for Karpenter webhook"
protocol = "tcp"
from_port = 8443
to_port = 8443
type = "ingress"
source_cluster_security_group = true
}
}
node_security_group_tags = {
# NOTE - if creating multiple security groups with this module, only tag the
# security group that Karpenter should utilize with the following tag
# (i.e. - at most, only one security group should have this tag in your account)
"karpenter.sh/discovery/${local.cluster_name}" = local.cluster_name
}
# Only need one node to get Karpenter up and running.
# This ensures core services such as VPC CNI, CoreDNS, etc. are up and running
# so that Karpenter can be deployed and start managing compute capacity as required
eks_managed_node_groups = {
initial = {
instance_types = ["m5.large"]
# Not required nor used - avoid tagging two security groups with same tag as well
create_security_group = false
min_size = 1
max_size = 1
desired_size = 1
iam_role_additional_policies = [
"arn:${local.partition}:iam::aws:policy/AmazonSSMManagedInstanceCore", # Required by Karpenter
"arn:${local.partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy",
"arn:${local.partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly", #for access to ECR images
"arn:${local.partition}:iam::aws:policy/CloudWatchAgentServerPolicy"
]
tags = {
# This will tag the launch template created for use by Karpenter
"karpenter.sh/discovery/${local.cluster_name}" = local.cluster_name
}
}
}
}
# The EKS module creates an IAM role for the EKS managed node group nodes. We'll use that for Karpenter.
# We need to create an instance profile we can reference.
# Karpenter can use this instance profile to launch new EC2 instances and those instances will be able to connect to your cluster.
resource "aws_iam_instance_profile" "karpenter" {
  name = "KarpenterNodeInstanceProfile-${local.cluster_name}"
  role = module.eks.eks_managed_node_groups["initial"].iam_role_name
}
# Create the KarpenterController IAM Role
# Karpenter requires permissions like launching instances, which means it needs an IAM role that grants it access. The config
# below will create an AWS IAM Role, attach a policy, and authorize the Service Account to assume the role using IRSA. We will
# create the ServiceAccount and connect it to this role during the Helm chart install.
module "karpenter_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "5.3.3"

  role_name                          = "karpenter-controller-${local.cluster_name}"
  attach_karpenter_controller_policy = true

  karpenter_tag_key               = "karpenter.sh/discovery/${local.cluster_name}"
  karpenter_controller_cluster_id = module.eks.cluster_id
  karpenter_controller_node_iam_role_arns = [
    module.eks.eks_managed_node_groups["initial"].iam_role_arn
  ]

  oidc_providers = {
    ex = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["karpenter:karpenter"]
    }
  }
}
# Install Karpenter Helm Chart
# Use helm to deploy Karpenter to the cluster. We are going to use the helm_release Terraform resource to do the deploy and pass in the
# cluster details and IAM role Karpenter needs to assume.
provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", local.cluster_name]
    }
  }
}
resource "helm_release" "karpenter" {
namespace = "karpenter"
create_namespace = true
name = "karpenter"
repository = "https://charts.karpenter.sh"
chart = "karpenter"
version = "v0.16.1"
set {
name = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
value = module.karpenter_irsa.iam_role_arn
}
set {
name = "clusterName"
value = module.eks.cluster_id
}
set {
name = "clusterEndpoint"
value = module.eks.cluster_endpoint
}
set {
name = "aws.defaultInstanceProfile"
value = aws_iam_instance_profile.karpenter.name
}
}
# Provisioner
# Create a default provisioner using the command below. This provisioner configures instances to connect to your cluster's endpoint and
# discovers resources like subnets and security groups using the cluster's name.
# This provisioner will create capacity as long as the sum of all created capacity is less than the specified limit.
provider "kubectl" {
  apply_retry_count      = 5
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  load_config_file       = false

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}
resource "kubectl_manifest" "karpenter_provisioner" {
yaml_body = <<-YAML
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
name: default
spec:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
limits:
resources:
cpu: 1000
provider:
subnetSelector:
Name: "*private*"
securityGroupSelector:
karpenter.sh/discovery/${module.eks.cluster_id}: ${module.eks.cluster_id}
tags:
karpenter.sh/discovery/${module.eks.cluster_id}: ${module.eks.cluster_id}
ttlSecondsAfterEmpty: 30
YAML
depends_on = [
helm_release.karpenter
]
}
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: 1
EOF
kubectl scale deployment inflate --replicas 5
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
DEBUG controller.provisioning Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.topologySpreadConstraints = {"maxSkew":1,"topologyKey":"topology.kubernetes.io/zone","whenUnsatisfiable":"ScheduleAnyway","labelSelector":{"matchLabels":{"app.kubernetes.io/instance":"karpenter","app.kubernetes.io/name":"karpenter"}}} {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
2022-09-10T00:13:13.122Z ERROR controller.provisioning Could not schedule pod, incompatible with provisioner "default", incompatible requirements, key karpenter.sh/provisioner-name, karpenter.sh/provisioner-name DoesNotExist not in karpenter.sh/provisioner-name In [default] {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
I believe this is due to the pod topology spread constraints defined in the Karpenter deployment here:
https://github.com/aws/karpenter/blob/main/charts/karpenter/values.yaml#L73-L77
You can read more about what pod topologySpreadConstraints does here:
https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/
If you increase the desired_size to 2, which matches the default deployment replica count above, that should resolve the error.
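A sketch of that change in the question's main.tf (only the sizing keys of the node group are shown; everything else stays the same):
eks_managed_node_groups = {
  initial = {
    instance_types = ["m5.large"]

    min_size     = 2
    max_size     = 2
    desired_size = 2 # matches the Karpenter chart's default replica count of 2
  }
}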

Vault transit engine auto unseal does not pass VAULT_TOKEN when starting up?

VAULT-1 Unseal provider:
cat /etc/vault.d/vault.json
"listener": [{
"tcp": {
"address": "0.0.0.0:8200",
"tls_disable" : 1
}
}],
"storage" :{
"file" : {
"path" : "/opt/vault/data"
}
},
"max_lease_ttl": "1h",
"default_lease_ttl": "1h"
}
VAULT-2 unseal client (this is the Vault attempting to auto-unseal itself):
cat /etc/vault.d/vault.hcl
disable_mlock = true
ui            = true

storage "file" {
  path = "/vault-2/data"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = "true"
}

seal "transit" {
  address         = "http://192.168.100.100:8200"
  disable_renewal = "false"
  key_name        = "autounseal"
  mount_path      = "transit/"
  tls_skip_verify = "true"
}
Token seems valid on VAULT-1:
vault token lookup s.XazV
Key Value
--- -----
accessor eCH1R3G
creation_time 1637091280
creation_ttl 10h
display_name token
entity_id n/a
expire_time 2021-11-17T00:34:40.837284665-05:00
explicit_max_ttl 0s
id s.XazV
issue_time 2021-11-16T14:34:40.837289691-05:00
meta <nil>
num_uses 0
On VAULT-2, I have an env var set:
export VAULT_TOKEN="s.XazV"
I have the policy enabled accordingly on VAULT-1. However, when starting the service on VAULT-2:
vault2 vault[758]: URL: PUT http://192.168.100.100:8200/v1/transit/encrypt/autounseal
vault2 vault[758]: Code: 400. Errors:
vault2 vault[758]: * missing client token
Thank you.
If you're starting up the Vault service with systemctl, you may need to configure the service file to include the token in an Environment configuration rather than with export.
https://askubuntu.com/questions/940679/pass-environment-variables-to-services-started-with-systemctl#940797
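For example, a sketch using a systemd override (the unit name vault.service and the token value from the question are assumptions):
sudo systemctl edit vault
# in the override that opens, add:
#   [Service]
#   Environment="VAULT_TOKEN=s.XazV"
sudo systemctl daemon-reload
sudo systemctl restart vault
An export only affects your interactive shell, so a systemd-started Vault never sees the variable. Alternatively, the transit seal stanza accepts a token parameter directly in the config file, which avoids the environment variable entirely.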

How to configure S3 as backend storage for hashicorp vault

I have a running EKS cluster, and I want to deploy Vault on that cluster using Terraform. My code works fine during deployment. This is my main.tf:
data "aws_eks_cluster" "default" {
name = var.eks_cluster_name
}
data "aws_eks_cluster_auth" "default" {
name = var.eks_cluster_name
}
resource "kubernetes_namespace" "vault" {
metadata {
name = "vault"
}
}
resource "helm_release" "vault" {
name = "vault"
repository = "https://helm.releases.hashicorp.com/"
chart = "vault"
namespace = kubernetes_namespace.vault.metadata.0.name
values = [
"${file("values.json")}"
]
}
provider "kubernetes" {
host = data.aws_eks_cluster.default.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.default.token
load_config_file = false
}
provider "helm" {
kubernetes {
host = data.aws_eks_cluster.default.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.default.token
load_config_file = false
}
}
And this is values.json
server:
  image:
    repository: vault
    tag: latest
  dataStorage:
    enabled: true
  auditStorage:
    enabled: true
  ha:
    enabled: true
    replicas: 1
    listener "tcp" {
      address = "[::]:8200"
      cluster_address = "[::]:8201"
    }
    storage "s3" {
      access_key = "xxxxxxxxx"
      secret_key = "xxxxxxxxxx"
      bucket = "xxxx-vault"
      region = "xxxx-xxxx-x"
    }
    service_registration "kubernetes" {}
  extraVolumes:
    - type: secret
      name: tls
  extraEnvironmentVars:
    VAULT_ADDR: https://127.0.0.1:8200
    VAULT_SKIP_VERIFY: true

ui:
  enabled: true
  serviceType: LoadBalancer
but it is not using my S3 bucket as storage; after every deploy it falls back to filesystem storage instead of the given S3 bucket. What's wrong here?
I think you missed a key in your values file: without config:, the HCL stanzas are ignored by the chart and Vault falls back to its default file storage:
ha:
  enabled: true
  replicas: 1
  config: |
    listener "tcp" {
      address = "[::]:8200"
      cluster_address = "[::]:8201"
    }
    storage "s3" {
      access_key = "xxxxxxxxx"
      secret_key = "xxxxxxxxxx"
      bucket = "xxxx-vault"
      region = "xxxx-xxxx-x"
    }
    service_registration "kubernetes" {}

VAULT_CLIENT_TOKEN keeps expiring every 24h

Environment:
Vault + Consul, all latest. Integrating Concourse (3.14.0) with Vault. All tokens and keys are throw-away. This is just a test cluster.
Problem:
No matter what I do, I get 768h as the token_duration value. Also, my AppRole token keeps expiring overnight no matter what I do. I have to regenerate the token, pass it to Concourse, and restart the service. I want this token not to expire.
[root@k1 etc]# vault write auth/approle/login role_id="34b73748-7e77-f6ec-c5fd-90c24a5a98f3" secret_id="80cc55f1-bb8b-e96c-78b0-fe61b243832d" duration=0
Key Value
--- -----
token 9a6900b7-062d-753f-131c-a2ac7eb040f1
token_accessor 171aeb1c-d2ce-0261-e20f-8ed6950d1d2a
token_duration 768h
token_renewable true
token_policies ["concourse" "default"]
identity_policies []
policies ["concourse" "default"]
token_meta_role_name concourse
[root@k1 etc]#
So I use token 9a6900b7-062d-753f-131c-a2ac7eb040f1 for Concourse to access secrets, and all is good until 24h later, when it expires.
I set duration to 0, but it didn't help:
$ vault write auth/approle/role/concourse secret_id_ttl=0 period=0 policies=concourse secret_id_num_uses=0 token_num_uses=0
My modified vaultconfig.hcl looks like this:
storage "consul" {
address = "127.0.0.1:8500"
path = "vault/"
token = "95FBC040-C484-4D16-B489-AA732DB6ADF1"
#token = "0b4bc7c7-7eb0-4060-4811-5f9a7185aa6f"
}
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8201"
tls_min_version = "tls10"
tls_disable = 1
}
cluster_addr = "http://192.168.163.132:8201"
api_addr = "http://192.168.163.132:8200"
disable_mlock = true
disable_cache = true
ui = true
default_lease_ttl = 0
cluster_name = "testcluster"
raw_storage_endpoint = true
My Concourse policy is vanilla:
[root@k1 etc]# vault policy read concourse
path "concourse/*" {
  policy       = "read"
  capabilities = ["read", "list"]
}
[root@k1 etc]#
Looking up token 9a6900b7-062d-753f-131c-a2ac7eb040f1:
[root@k1 etc]# vault token lookup 9a6900b7-062d-753f-131c-a2ac7eb040f1
Key Value
--- -----
accessor 171aeb1c-d2ce-0261-e20f-8ed6950d1d2a
creation_time 1532521379
creation_ttl 2764800
display_name approle
entity_id 11a0d4ac-10aa-0d62-2385-9e8071fc4185
expire_time 2018-08-26T07:22:59.764692652-05:00
explicit_max_ttl 0
id 9a6900b7-062d-753f-131c-a2ac7eb040f1
issue_time 2018-07-25T07:22:59.238050234-05:00
last_renewal 2018-07-25T07:24:44.764692842-05:00
last_renewal_time 1532521484
meta map[role_name:concourse]
num_uses 0
orphan true
path auth/approle/login
policies [concourse default]
renewable true
ttl 2763645
[root@k1 etc]#
Any pointers or feedback would be very much appreciated.
Try setting the token_ttl and token_max_ttl parameters instead of secret_id_ttl when creating the AppRole.
You should also check your Vault default_lease_ttl and max_lease_ttl; they might be set to 24h.
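A sketch of that suggestion (the role and policy names are from the question; the TTL values are illustrative):
vault write auth/approle/role/concourse \
    token_ttl=768h \
    token_max_ttl=768h \
    secret_id_num_uses=0 \
    token_num_uses=0 \
    policies=concourse
If the token should effectively never expire, setting period on the role instead makes the issued tokens periodic, so Concourse can keep renewing them indefinitely.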