HashiCorp Vault pods with Pending status - Kubernetes

I deployed HashiCorp Vault with 3 replicas. Pod vault-0 is Running, but the other two pods are stuck in Pending status.
(screenshot: vault-0 is Running, vault-1 and vault-2 are Pending)
This is my override YAML:
# Vault Helm Chart Value Overrides
global:
  enabled: true
  tlsDisable: true

injector:
  enabled: true
  # Use the Vault K8s Image https://github.com/hashicorp/vault-k8s/
  image:
    repository: "hashicorp/vault-k8s"
    tag: "0.9.0"
  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 256Mi
      cpu: 250m
  affinity: ""

server:
  auditStorage:
    enabled: true
  standalone:
    enabled: false
  image:
    repository: "hashicorp/vault"
    tag: "1.6.3"
  resources:
    requests:
      memory: 4Gi
      cpu: 1000m
    limits:
      memory: 8Gi
      cpu: 1000m
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true
      config: |
        ui = true
        listener "tcp" {
          tls_disable = true
          address = "[::]:8200"
          cluster_address = "[::]:8201"
        }
        storage "raft" {
          path = "/vault/data"
        }
        service_registration "kubernetes" {}
    config: |
      ui = true
      listener "tcp" {
        tls_disable = true
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      service_registration "kubernetes" {}

# Vault UI
ui:
  enabled: true
  serviceType: "ClusterIP"
  externalPort: 8200
I ran kubectl describe on the pending pods and can see the following status message. I am not sure whether I am adding the correct affinity settings in the override file, and I am not sure what I am doing wrong. I am using the Vault Helm chart to deploy to a Docker Desktop local cluster. I appreciate any help.
(screenshot: kubectl describe output for one of the Pending pods)
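For reference, this is roughly how the Pending pods were inspected (the default namespace is assumed):
kubectl get pods -l app.kubernetes.io/name=vault
kubectl describe pod vault-1 | grep -A 10 Events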

There are a few problems in your values.yaml file.
1. You set
server:
  auditStorage:
    enabled: true
but you didn't specify how the PVC should be created or what the StorageClass is. The chart expects you to do that if you enable audit storage. Look at: https://github.com/hashicorp/vault-helm/blob/v0.9.0/values.yaml#L443
Turn it to false if you are just testing on your local machine, or specify the storage config.
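If you do want to keep audit storage enabled, the block needs the storage details filled in, roughly like this (the size is arbitrary and the storageClass is an assumption; on Docker Desktop the default StorageClass is usually hostpath):
server:
  auditStorage:
    enabled: true
    size: 10Gi
    storageClass: hostpath   # assumed Docker Desktop default; adjust for your cluster
    accessMode: ReadWriteOnce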
2. You set an empty affinity variable for the injector but not for the server. Set
affinity: ""
for the server too. Look at: https://github.com/hashicorp/vault-helm/blob/v0.9.0/values.yaml#L338
3. An uninitialised and sealed Vault cluster is not really usable. You need to initialize and unseal Vault before it becomes ready, so the readinessProbe has to treat sealed and uninitialized nodes as healthy. Something like this:
server:
  readinessProbe:
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
4. Last one, but this is kinda optional. Those memory requests:
resources:
  requests:
    memory: 4Gi
    cpu: 1000m
  limits:
    memory: 8Gi
    cpu: 1000m
are a bit on the higher side. Setting up an HA cluster of 3 replicas, each requesting 4Gi of memory, might result in Insufficient memory errors - most likely to happen when deploying on a local cluster.
But then again, your local machine might have 32 gigs of memory - I wouldn't know ;) If it doesn't, trim those values down to fit on your machine.
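You can see how much memory the node actually has available for scheduling with something like this (the node name docker-desktop is an assumption based on a default Docker Desktop cluster):
kubectl describe node docker-desktop | grep -A 6 Allocatable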
So the following values work for me:
# Vault Helm Chart Value Overrides
global:
  enabled: true
  tlsDisable: true

injector:
  enabled: true
  # Use the Vault K8s Image https://github.com/hashicorp/vault-k8s/
  image:
    repository: "hashicorp/vault-k8s"
    tag: "0.9.0"
  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 256Mi
      cpu: 250m
  affinity: ""

server:
  auditStorage:
    enabled: false
  standalone:
    enabled: false
  image:
    repository: "hashicorp/vault"
    tag: "1.6.3"
  resources:
    requests:
      memory: 256Mi
      cpu: 200m
    limits:
      memory: 512Mi
      cpu: 400m
  affinity: ""
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true
      config: |
        ui = true
        listener "tcp" {
          tls_disable = true
          address = "[::]:8200"
          cluster_address = "[::]:8201"
        }
        storage "raft" {
          path = "/vault/data"
        }
        service_registration "kubernetes" {}
    config: |
      ui = true
      listener "tcp" {
        tls_disable = true
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      service_registration "kubernetes" {}

# Vault UI
ui:
  enabled: true
  serviceType: "ClusterIP"
  externalPort: 8200
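I applied them with something along these lines (the release name, chart version, and values file name are assumptions):
helm repo add hashicorp https://helm.releases.hashicorp.com
helm upgrade --install vault hashicorp/vault --version 0.9.0 -f override-values.yaml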

Related

How to connect OpenSearch Dashboards to SSO with Azure AD

I'm trying to set up SSO in opensearch-dashboards via OpenID to Azure AD.
Overall, there is no need for encrypted communication between OpenSearch nodes, and no need for encrypted communication between Dashboards and the master pod. All I need is working SSO to Azure AD so I can see the dashboards.
I got errors in dashboards pod like: "res":{"statusCode":302,"responseTime":746,"contentLength":9} and tags":["error","plugins","securityDashboards"],"pid":1,"message":"OpenId authentication failed: Error: [index_not_found_exception] no such index [_plugins], with { index=\"_plugins\" │ │ & resource.id=\"_plugins\" & resource.type=\"index_expression\" & index_uuid=\"_na_\" }"} and the browser tells me The page isn’t redirecting properly
On my last try I got the following error from the ingress pod: Service "default/opensearch-values-opensearch-dashboards" does not have any active Endpoint.
I really appreciate any advice on what I am missing...
I use the Helm installation of OpenSearch on AWS EKS (with an nginx-controller ingress to publish the address).
In AD I have an app registered with https://<some_address>/auth/openid/login
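For reference, the missing-endpoint error above can be inspected with something like this (the service name and namespace are taken from the error message):
kubectl get endpoints opensearch-values-opensearch-dashboards -n default
kubectl describe service opensearch-values-opensearch-dashboards -n default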
Here are my actual helm values:
opensearch.yaml
---
clusterName: "opensearch-cluster"
nodeGroup: "master"
masterService: "opensearch-cluster-master"
roles:
  - master
  - ingest
  - data
  - remote_cluster_client
replicas: 3
minimumMasterNodes: 1
majorVersion: ""
global:
  dockerRegistry: "<registry>"
opensearchHome: /usr/share/opensearch
config:
  log4j2.properties: |
    rootLogger.level = debug
  opensearch.yml: |
    cluster.name: opensearch-cluster
    network.host: 0.0.0.0
    plugins.security.disabled: true
    plugins:
      security:
        ssl:
          transport:
            pemcert_filepath: esnode.pem
            pemkey_filepath: esnode-key.pem
            pemtrustedcas_filepath: root-ca.pem
            enforce_hostname_verification: false
          http:
            enabled: false
            pemcert_filepath: esnode.pem
            pemkey_filepath: esnode-key.pem
            pemtrustedcas_filepath: root-ca.pem
        allow_unsafe_democertificates: true
        allow_default_init_securityindex: true
        authcz:
          admin_dn:
            - CN=kirk,OU=client,O=client,L=test,C=de
        audit.type: internal_opensearch
        enable_snapshot_restore_privilege: true
        check_snapshot_restore_write_privileges: true
        restapi:
          roles_enabled: ["all_access", "security_rest_api_access"]
        system_indices:
          enabled: true
          indices:
            [
              ".opendistro-alerting-config",
              ".opendistro-alerting-alert*",
              ".opendistro-anomaly-results*",
              ".opendistro-anomaly-detector*",
              ".opendistro-anomaly-checkpoints",
              ".opendistro-anomaly-detection-state",
              ".opendistro-reports-*",
              ".opendistro-notifications-*",
              ".opendistro-notebooks",
              ".opendistro-asynchronous-search-response*",
            ]
extraEnvs: []
envFrom: []
secretMounts: []
hostAliases: []
image:
  repository: "opensearchproject/opensearch"
  tag: ""
  pullPolicy: "IfNotPresent"
podAnnotations: {}
labels: {}
opensearchJavaOpts: "-Xmx512M -Xms512M"
resources:
  requests:
    cpu: "1000m"
    memory: "100Mi"
initResources: {}
sidecarResources: {}
networkHost: "0.0.0.0"
rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""
podSecurityPolicy:
  create: false
  name: ""
  spec:
    privileged: true
    fsGroup:
      rule: RunAsAny
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
      - secret
      - configMap
      - persistentVolumeClaim
      - emptyDir
persistence:
  enabled: true
  enableInitChown: true
  labels:
    enabled: false
  accessModes:
    - ReadWriteOnce
  size: 8Gi
  annotations: {}
extraVolumes: []
extraVolumeMounts: []
extraContainers: []
extraInitContainers:
  - name: sysctl
    image: docker.io/bitnami/bitnami-shell:10-debian-10-r199
    imagePullPolicy: "IfNotPresent"
    command:
      - /bin/bash
      - -ec
      - |
        CURRENT=`sysctl -n vm.max_map_count`;
        DESIRED="262144";
        if [ "$DESIRED" -gt "$CURRENT" ]; then
          sysctl -w vm.max_map_count=262144;
        fi;
        CURRENT=`sysctl -n fs.file-max`;
        DESIRED="65536";
        if [ "$DESIRED" -gt "$CURRENT" ]; then
          sysctl -w fs.file-max=65536;
        fi;
    securityContext:
      privileged: true
priorityClassName: ""
antiAffinityTopologyKey: "kubernetes.io/hostname"
antiAffinity: "soft"
nodeAffinity: {}
topologySpreadConstraints: []
podManagementPolicy: "Parallel"
enableServiceLinks: true
protocol: http
httpPort: 9200
transportPort: 9300
service:
  labels: {}
  labelsHeadless: {}
  headless:
    annotations: {}
  type: ClusterIP
  nodePort: ""
  annotations: {}
  httpPortName: http
  transportPortName: transport
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  externalTrafficPolicy: ""
updateStrategy: RollingUpdate
maxUnavailable: 1
podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000
securityContext:
  capabilities:
    drop:
      - ALL
  runAsNonRoot: true
  runAsUser: 1000
securityConfig:
  enabled: true
  path: "/usr/share/opensearch/plugins/opensearch-security/securityconfig"
  actionGroupsSecret:
  configSecret:
  internalUsersSecret:
  rolesSecret:
  rolesMappingSecret:
  tenantsSecret:
  config:
    securityConfigSecret: ""
    dataComplete: true
    data:
      config.yml: |-
        config:
          dynamic:
            authc:
              basic_internal_auth_domain:
                description: "Authenticate via HTTP Basic"
                http_enabled: true
                transport_enabled: true
                order: 1
                http_authenticator:
                  type: "basic"
                  challenge: false
                authentication_backend:
                  type: "internal"
              openid_auth_domain:
                order: 0
                http_enabled: true
                transport_enabled: true
                http_authenticator:
                  type: openid
                  challenge: false
                  config:
                    enable_ssl: true
                    verify_hostnames: false
                    subject_key: preferred_username
                    roles_key: roles
                    openid_connect_url: https://login.microsoftonline.com/<ms_id>/v2.0/.well-known/openid-configuration
                authentication_backend:
                  type: noop
      roles_mapping.yml: |-
        all_access:
          reserved: false
          backend_roles:
            - "admin"
          description: "Maps admin to all_access"
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 3
  timeoutSeconds: 2000
schedulerName: ""
imagePullSecrets:
  - name: regcred
nodeSelector: {}
tolerations: []
ingress:
  enabled: false
  annotations: {}
  path: /
  hosts:
    - chart-example.local
  tls: []
nameOverride: ""
fullnameOverride: ""
masterTerminationFix: false
lifecycle: {}
keystore: []
networkPolicy:
  create: false
  http:
    enabled: false
fsGroup: ""
sysctl:
  enabled: false
plugins:
  enabled: false
  installList: []
extraObjects: []
opensearch-dashboards.yaml
---
opensearchHosts: "http://opensearch-cluster-master:9200"
replicaCount: 1
image:
  repository: "<registry>"
  tag: "1.3.1"
  pullPolicy: "IfNotPresent"
imagePullSecrets:
  - name: regcred
nameOverride: ""
fullnameOverride: ""
serviceAccount:
  create: true
  annotations: {}
  name: ""
rbac:
  create: true
secretMounts: []
podAnnotations: {}
extraEnvs: []
envFrom: []
extraVolumes: []
extraVolumeMounts: []
extraInitContainers: ""
extraContainers: ""
podSecurityContext: {}
securityContext:
  capabilities:
    drop:
      - ALL
  runAsNonRoot: true
  runAsUser: 1000
config:
  opensearch_dashboards.yml: |
    opensearch_security.cookie.secure: false
    opensearch_security.auth.type: openid
    opensearch_security.openid.client_id: <client_id>
    opensearch_security.openid.client_secret: <client_secret>
    opensearch_security.openid.base_redirect_url: https://<some_aws_id>.elb.amazonaws.com
    opensearch_security.openid.connect_url: https://login.microsoftonline.com/<MS id>/v2.0/.well-known/openid-configuration
priorityClassName: ""
opensearchAccount:
  secret: ""
  keyPassphrase:
    enabled: false
labels: {}
hostAliases: []
serverHost: "0.0.0.0"
service:
  type: ClusterIP
  port: 5601
  loadBalancerIP: ""
  nodePort: ""
  labels: {}
  annotations: {}
  loadBalancerSourceRanges: []
  httpPortName: http
ingress:
  enabled: false
  annotations: {}
  hosts:
    - host: chart-example.local
      paths:
        - path: /
          backend:
            serviceName: chart-example.local
            servicePort: 80
  tls: []
resources:
  requests:
    cpu: "100m"
    memory: "512M"
  limits:
    cpu: "100m"
    memory: "512M"
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
updateStrategy:
  type: "Recreate"
nodeSelector: {}
tolerations: []
affinity: {}
extraObjects: []

K8ssandra deployment "Error connecting to Node (endPoint=/tmp/cassandra.sock)"

I'm trying to run K8ssandra but the Cassandra container keeps failing with the following message (Repeating over and over):
WARN [epollEventLoopGroup-374-2] 2021-12-30 23:54:23,711 AbstractBootstrap.java:452 - Unknown channel option 'TCP_NODELAY' for channel '[id: 0x7cf79bf5]'
WARN [epollEventLoopGroup-374-2] 2021-12-30 23:54:23,712 Loggers.java:39 - [s369] Error connecting to Node(endPoint=/tmp/cassandra.sock, hostId=null, hashCode=7ec5e39e), trying next node (FileNotFoundException: null)
INFO [nioEventLoopGroup-2-1] 2021-12-30 23:54:23,713 Cli.java:617 - address=/100.97.28.180:53816 url=/api/v0/metadata/endpoints status=500 Internal Server Error
and from the server-system-logger container:
tail: cannot open '/var/log/cassandra/system.log' for reading: No such file or directory
and finally, in the cass-operator pod:
2021-12-30T23:56:22.580Z INFO controllers.CassandraDatacenter incorrect status code when calling Node Management Endpoint {"cassandradatacenter": "default/dc1", "requestNamespace": "default", "requestName": "dc1", "loopID": "d1f81abc-6b68-4e63-9e95-1c2b5f6d4e9d", "namespace": "default", "datacenterName": "dc1", "clusterName": "mydomaincom", "statusCode": 500, "pod": "100.122.58.236"}
2021-12-30T23:56:22.580Z ERROR controllers.CassandraDatacenter Could not get endpoints data {"cassandradatacenter": "default/dc1", "requestNamespace": "default", "requestName": "dc1", "loopID": "d1f81abc-6b68-4e63-9e95-1c2b5f6d4e9d", "namespace": "default", "datacenterName": "dc1", "clusterName": "mydomaincom", "error": "incorrect status code of 500 when calling endpoint"}
Not really sure what's happening here. It works fine using the same config on a local minikube cluster, but I can't seem to get it to work on my AWS cluster (running kubernetes v1.20.10)
All other pods are running fine.
NAME READY STATUS RESTARTS AGE
mydomaincom-dc1-rac1-sts-0 2/3 Running 0 17m
k8ssandra-cass-operator-8675f58b89-qt2dx 1/1 Running 0 29m
k8ssandra-medusa-operator-589995d979-rnjhr 1/1 Running 0 29m
k8ssandra-reaper-operator-5d9d5d975d-c6nhv 1/1 Running 0 29m
the pod events show this:
Warning Unhealthy 109s (x88 over 16m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
My values.yaml (deployed with Helm3):
cassandra:
  enabled: true
  version: "4.0.1"
  versionImageMap:
    3.11.7: k8ssandra/cass-management-api:3.11.7-v0.1.33
    3.11.8: k8ssandra/cass-management-api:3.11.8-v0.1.33
    3.11.9: k8ssandra/cass-management-api:3.11.9-v0.1.27
    3.11.10: k8ssandra/cass-management-api:3.11.10-v0.1.27
    3.11.11: k8ssandra/cass-management-api:3.11.11-v0.1.33
    4.0.0: k8ssandra/cass-management-api:4.0.0-v0.1.33
    4.0.1: k8ssandra/cass-management-api:4.0.1-v0.1.33
  clusterName: "mydomain.com"
  auth:
    enabled: true
    superuser:
      secret: ""
      username: ""
  cassandraLibDirVolume:
    storageClass: default
    size: 100Gi
  encryption:
    keystoreSecret:
    keystoreMountPath:
    truststoreSecret:
    truststoreMountPath:
  additionalSeeds: []
  heap: {}
  resources:
    requests:
      memory: 4Gi
      cpu: 500m
    limits:
      memory: 4Gi
      cpu: 1000m
  datacenters:
    - name: dc1
      size: 1
      racks:
        - name: rac1
      heap: {}
  ingress:
    enabled: false
stargate:
  enabled: false
reaper:
  autoschedule: true
  enabled: true
  cassandraUser:
    secret: ""
    username: ""
  jmx:
    secret: ""
    username: ""
medusa:
  enabled: true
  image:
    registry: docker.io
    repository: k8ssandra/medusa
    tag: 0.11.3
  cassandraUser:
    secret: ""
    username: ""
  storage_properties:
    region: us-east-1
  bucketName: my-bucket-name
  storageSecret: medusa-bucket-key
reaper-operator:
  enabled: true
monitoring:
  grafana:
    provision_dashboards: false
  prometheus:
    provision_service_monitors: false
kube-prometheus-stack:
  enabled: false
  prometheusOperator:
    enabled: false
    serviceMonitor:
      selfMonitor: false
  prometheus:
    enabled: false
  grafana:
    enabled: false
I was able to fix this by increasing the memory to 12Gi.
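A sketch of that change in the same values.yaml (the exact split between requests and limits is an assumption; the cpu values are carried over from the question):
cassandra:
  resources:
    requests:
      memory: 12Gi
      cpu: 500m
    limits:
      memory: 12Gi
      cpu: 1000m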

Kubernetes Executor with Proxy Settings

I am using the Helm chart for GitLab Runner in a Kubernetes cluster and need to pass environment variables to my Kubernetes runner so that it can, for example, download content from the S3 cache. Unfortunately it does not work. Does anyone have any solutions for me?
my values.yaml:
gitlabUrl: https://example.com
image: default-docker/gitlab-runner:alpine-v14.0.1
runnerRegistrationToken: XXXXXXXXXXXXX
imagePullPolicy: IfNotPresent
imagePullSecrets:
  - name: "k8runner-secret"
rbac:
  create: true
runners:
  config: |
    [[runners]]
      environment = ["http_proxy: http://webproxy.comp.db.de:8080", "https_proxy: http://webproxy:comp:db:de:8080", "no_proxy: \"localhost\""]
      [runners.kubernetes]
        image = "default-docker/ubuntu:16.04"
        cpu_request = "500m"
        memory_request = "1Gi"
        namespace = "gitlab"
      [runners.cache]
        Type = "s3"
        Path = "cachepath"
        Shared = true
        [runners.cache.s3]
          ServerAddress = "s3.amazonaws.com"
          BucketName = "exampleBucket"
          BucketLocation = "eu-west-1"
          Insecure = false
  tags: "test"
  locked: true
  name: "k8s-runner"
resources:
  limits:
    memory: 1Gi
    cpu: 500m
  requests:
    memory: 250m
    cpu: 50m
ENVIRONMENT:
  http_proxy: http://webproxy.comp.db.de:8080
  https_proxy: http://webproxy:comp:db:de:8080
  no_proxy: "localhost"
config.template.toml located on the pod:
[[runners]]
[runners.kubernetes]
image = "default-docker/ubuntu:16.04"
cpu_request = "500m"
memory_request = "1Gi"
namespace = "gitlab"
[runners.cache]
Type = "s3"
Path = "cachepath"
Shared = true
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
BucketName = "exampleBucket"
BucketLocation = "eu-west-1"
Insecure = false
config.toml located on the pod:
concurrent = 10
check_interval = 30
log_level = "info"
listen_address = ':9252'
It looks to me like it is not adding the environment variables. If I run the env command inside the pod, I also can't find the environment variables.
I am thankful for every helping hand.
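For reference, this is roughly how the variables were checked inside the runner pod (the pod name is a placeholder):
kubectl exec -it <runner-pod-name> -- env | grep -i proxy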

vault agent injector bad cert

I have Vault deployed from the official Helm chart and it's running in HA mode, with auto-unseal, TLS enabled, Raft as the backend, and the cluster is Kubernetes 1.17 in EKS. I have all of the Raft followers joined to the vault-0 pod as the leader. I have followed this tutorial to a tee and I always end up with a TLS bad certificate error. http: TLS handshake error from 123.45.6.789:52936: remote error: tls: bad certificate is the exact error.
I did find an issue with following this tutorial exactly: the part where they pipe the Kubernetes CA to base64. For me that output was multi-line and failed to deploy, so I piped it through tr -d '\n'. But this is where I get this error. I've tried the part about launching a container and testing it with curl, and it fails; then, tailing the agent injector logs, I get that bad cert error.
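The command in question ended up looking roughly like this (the vault-injector.ca file name matches the caBundle comment in the values below):
export CA_BUNDLE=$(cat vault-injector.ca | base64 | tr -d '\n')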
Here is my values.yaml if it helps.
global:
  tlsDisable: false

injector:
  metrics:
    enabled: true
  certs:
    secretName: vault-tls
    caBundle: "(output of cat vault-injector.ca | base64 | tr -d '\n')"
    certName: vault.crt
    keyName: vault.key

server:
  extraEnvironmentVars:
    VAULT_CACERT: "/vault/userconfig/vault-tls/vault.ca"
  extraSecretEnvironmentVars:
    - envName: AWS_ACCESS_KEY_ID
      secretName: eks-creds
      secretKey: AWS_ACCESS_KEY_ID
    - envName: AWS_SECRET_ACCESS_KEY
      secretName: eks-creds
      secretKey: AWS_SECRET_ACCESS_KEY
    - envName: VAULT_UNSEAL_KMS_KEY_ID
      secretName: vault-kms-id
      secretKey: VAULT_UNSEAL_KMS_KEY_ID
  extraVolumes:
    - type: secret
      name: vault-tls
    - type: secret
      name: eks-creds
    - type: secret
      name: vault-kms-id
  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 512Mi
      cpu: 500m
  auditStorage:
    enabled: true
    storageClass: gp2
  standalone:
    enabled: false
  ha:
    enabled: true
    raft:
      enabled: true
      config: |
        ui = true
        api_addr = "[::]:8200"
        cluster_addr = "[::]:8201"
        listener "tcp" {
          tls_disable = 0
          tls_cert_file = "/vault/userconfig/vault-tls/vault.crt"
          tls_key_file = "/vault/userconfig/vault-tls/vault.key"
          tls_client_ca_file = "/vault/userconfig/vault-tls/vault.ca"
          tls_min_version = "tls12"
          address = "[::]:8200"
          cluster_address = "[::]:8201"
        }
        storage "raft" {
          path = "/vault/data"
        }
        disable_mlock = true
        service_registration "kubernetes" {}
        seal "awskms" {
          region = "us-east-1"
          kms_key_id = "VAULT_UNSEAL_KMS_KEY_ID"
        }

ui:
  enabled: true
I've exec'd into the agent-injector pod and poked around. I can see the certs are there under /etc/webhook/certs/ and they look correct.
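For reference, the checks were along these lines (the pod name is a placeholder; openssl runs locally against the output of cat):
kubectl exec -it <vault-agent-injector-pod> -- ls -l /etc/webhook/certs/
kubectl exec -it <vault-agent-injector-pod> -- cat /etc/webhook/certs/vault.crt | openssl x509 -noout -subject -dates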
Here is my vault-agent-injector pod
kubectl describe pod vault-agent-injector-6bbf84484c-q8flv
Name: vault-agent-injector-6bbf84484c-q8flv
Namespace: default
Priority: 0
Node: ip-172-16-3-151.ec2.internal/172.16.3.151
Start Time: Sat, 19 Dec 2020 16:27:14 -0800
Labels: app.kubernetes.io/instance=vault
app.kubernetes.io/name=vault-agent-injector
component=webhook
pod-template-hash=6bbf84484c
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 172.16.3.154
IPs:
IP: 172.16.3.154
Controlled By: ReplicaSet/vault-agent-injector-6bbf84484c
Containers:
sidecar-injector:
Container ID: docker://2201b12c9bd72b6b85d855de6917548c9410e2b982fb5651a0acd8472c3554fa
Image: hashicorp/vault-k8s:0.6.0
Image ID: docker-pullable://hashicorp/vault-k8s#sha256:5697b85bc69aa07b593fb2a8a0cd38daefb5c3e4a4b98c139acffc9cfe5041c7
Port: <none>
Host Port: <none>
Args:
agent-inject
2>&1
State: Running
Started: Sat, 19 Dec 2020 16:27:15 -0800
Ready: True
Restart Count: 0
Liveness: http-get https://:8080/health/ready delay=1s timeout=5s period=2s #success=1 #failure=2
Readiness: http-get https://:8080/health/ready delay=2s timeout=5s period=2s #success=1 #failure=2
Environment:
AGENT_INJECT_LISTEN: :8080
AGENT_INJECT_LOG_LEVEL: info
AGENT_INJECT_VAULT_ADDR: https://vault.default.svc:8200
AGENT_INJECT_VAULT_AUTH_PATH: auth/kubernetes
AGENT_INJECT_VAULT_IMAGE: vault:1.5.4
AGENT_INJECT_TLS_CERT_FILE: /etc/webhook/certs/vault.crt
AGENT_INJECT_TLS_KEY_FILE: /etc/webhook/certs/vault.key
AGENT_INJECT_LOG_FORMAT: standard
AGENT_INJECT_REVOKE_ON_SHUTDOWN: false
AGENT_INJECT_TELEMETRY_PATH: /metrics
Mounts:
/etc/webhook/certs from webhook-certs (ro)
/var/run/secrets/kubernetes.io/serviceaccount from vault-agent-injector-token-k8ltm (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
webhook-certs:
Type: Secret (a volume populated by a Secret)
SecretName: vault-tls
Optional: false
vault-agent-injector-token-k8ltm:
Type: Secret (a volume populated by a Secret)
SecretName: vault-agent-injector-token-k8ltm
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 40m default-scheduler Successfully assigned default/vault-agent-injector-6bbf84484c-q8flv to ip-172-16-3-151.ec2.internal
Normal Pulled 40m kubelet, ip-172-16-3-151.ec2.internal Container image "hashicorp/vault-k8s:0.6.0" already present on machine
Normal Created 40m kubelet, ip-172-16-3-151.ec2.internal Created container sidecar-injector
Normal Started 40m kubelet, ip-172-16-3-151.ec2.internal Started container sidecar-injector
My vault deployment
kubectl describe deployment vault
Name: vault-agent-injector
Namespace: default
CreationTimestamp: Sat, 19 Dec 2020 16:27:14 -0800
Labels: app.kubernetes.io/instance=vault
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=vault-agent-injector
component=webhook
Annotations: deployment.kubernetes.io/revision: 1
Selector: app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault-agent-injector,component=webhook
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app.kubernetes.io/instance=vault
app.kubernetes.io/name=vault-agent-injector
component=webhook
Service Account: vault-agent-injector
Containers:
sidecar-injector:
Image: hashicorp/vault-k8s:0.6.0
Port: <none>
Host Port: <none>
Args:
agent-inject
2>&1
Liveness: http-get https://:8080/health/ready delay=1s timeout=5s period=2s #success=1 #failure=2
Readiness: http-get https://:8080/health/ready delay=2s timeout=5s period=2s #success=1 #failure=2
Environment:
AGENT_INJECT_LISTEN: :8080
AGENT_INJECT_LOG_LEVEL: info
AGENT_INJECT_VAULT_ADDR: https://vault.default.svc:8200
AGENT_INJECT_VAULT_AUTH_PATH: auth/kubernetes
AGENT_INJECT_VAULT_IMAGE: vault:1.5.4
AGENT_INJECT_TLS_CERT_FILE: /etc/webhook/certs/vault.crt
AGENT_INJECT_TLS_KEY_FILE: /etc/webhook/certs/vault.key
AGENT_INJECT_LOG_FORMAT: standard
AGENT_INJECT_REVOKE_ON_SHUTDOWN: false
AGENT_INJECT_TELEMETRY_PATH: /metrics
Mounts:
/etc/webhook/certs from webhook-certs (ro)
Volumes:
webhook-certs:
Type: Secret (a volume populated by a Secret)
SecretName: vault-tls
Optional: false
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: vault-agent-injector-6bbf84484c (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 46m deployment-controller Scaled up replica set vault-agent-injector-6bbf84484c to 1
What else can I check and verify or troubleshoot in order to figure out why the agent injector is causing this error?

Container Optimized OS performance

After upgrading my cluster nodes' image from CONTAINER_VM to CONTAINER_OPTIMIZED_OS, I ran into a performance degradation of the PHP application of up to 10 times.
Did I miss something in my configuration, or is it a common issue?
I tried using machines with more CPU and memory, but it only affected the performance slightly.
Terraform configuration:
resource "google_compute_address" "dev-cluster-address" {
name = "dev-cluster-address"
region = "europe-west1"
}
resource "google_container_cluster" "dev-cluster" {
name = "dev-cluster"
zone = "europe-west1-d"
initial_node_count = 2
node_version = "1.7.5"
master_auth {
username = "*********-dev"
password = "*********"
}
node_config {
oauth_scopes = [
"https://www.googleapis.com/auth/monitoring",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/service.management.readonly",
"https://www.googleapis.com/auth/devstorage.full_control",
"https://www.googleapis.com/auth/sqlservice.admin"
]
machine_type = "n1-standard-1"
disk_size_gb = 20
image_type = "COS"
}
}
Kubernetes deployment for Symfony Application:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: deployment-dev
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: dev
    spec:
      containers:
        - name: nginx
          image: nginx:1.13.5-alpine
          volumeMounts:
            - name: application
              mountPath: /var/www/web
            - name: nginx-config
              mountPath: /etc/nginx/conf.d
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: "20m"
              memory: "64M"
            requests:
              cpu: "5m"
              memory: "16M"
        - name: php
          image: ********
          lifecycle:
            postStart:
              exec:
                command:
                  - "bash"
                  - "/var/www/provision/files/init_php.sh"
          envFrom:
            - configMapRef:
                name: symfony-config-dev
          volumeMounts:
            - name: application
              mountPath: /application
            - name: logs
              mountPath: /var/www/var/logs
            - name: lexik-jwt-keys
              mountPath: /var/www/var/jwt
          ports:
            - containerPort: 9000
          resources:
            limits:
              cpu: "400m"
              memory: "1536M"
            requests:
              cpu: "300m"
              memory: "1024M"
        - name: cloudsql-proxy-mysql
          image: gcr.io/cloudsql-docker/gce-proxy:1.09
          resources:
            limits:
              cpu: "10m"
              memory: "64M"
            requests:
              cpu: "5m"
              memory: "16M"
          command:
            - "/cloud_sql_proxy"
            - "-instances=***:europe-west1:dev1=tcp:0.0.0.0:3306"
        - name: cloudsql-proxy-analytics
          image: gcr.io/cloudsql-docker/gce-proxy:1.09
          resources:
            limits:
              cpu: "20m"
              memory: "64M"
            requests:
              cpu: "10m"
              memory: "16M"
          command:
            - "/cloud_sql_proxy"
            - "-instances=***:europe-west1:analytics-dev1=tcp:0.0.0.0:3307"
        - name: sidecar-logging
          image: alpine:3.6
          args: [/bin/sh, -c, 'tail -n+1 -f /var/www/var/logs/prod.log']
          volumeMounts:
            - name: logs
              mountPath: /var/www/var/logs
          resources:
            limits:
              cpu: "5m"
              memory: "20M"
            requests:
              cpu: "5m"
              memory: "20M"
      volumes:
        - name: application
          emptyDir: {}
        - name: logs
          emptyDir: {}
        - name: nginx-config
          configMap:
            name: config-dev
            items:
              - key: nginx
                path: default.conf
        - name: lexik-jwt-keys
          configMap:
            name: config-dev
            items:
              - key: lexik_jwt_private_key
                path: private.pem
              - key: lexik_jwt_public_key
                path: public.pem
One of the reasons could be the fact that Kubernetes actually started enforcing the CPU limits with Container-Optimized OS.
resources:
  limits:
    cpu: "20m"
These were not enforced on the older ContainerVM images.
Could you please try removing/relaxing cpu limits from your pod-spec and see if it helps?
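For example, for the nginx container in the spec above that could mean removing the CPU limit (or raising it), roughly like this (the commented-out value is just an illustration):
resources:
  requests:
    cpu: "5m"
    memory: "16M"
  limits:
    memory: "64M"
    # cpu: "100m"   # either raise the limit or leave it out entirely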