Odd Kubernetes behaviour in AWS EKS cluster

In an EKS cluster (v1.22.10-eks-84b4fe6) that I manage, I've spotted a behaviour that I had never seen before (or that I missed completely...). In a namespace running an application created by a public Helm chart, if I create a separate, new, unrelated pod (a simple empty busybox with a sleep command in it), it automatically gets some environment variables, always starting with the name of the namespace and referring to the available Services related to the Helm chart/deployment already in it. I'm not sure I understand this behaviour. I've tested this in several other namespaces with Helm charts deployed as well, and I get the same results (each time with different env vars, obviously).
An example in a namespace with this chart installed -> https://github.com/bitnami/charts/tree/master/bitnami/keycloak
testpod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod
  namespace: keycloak-18
spec:
  containers:
  - image: busybox
    name: testpod
    command: ["/bin/sh", "-c"]
    args: ["sleep 3600"]
When in the pod:
/ # env
KEYCLOAK_18_METRICS_PORT_8080_TCP_PROTO=tcp
KUBERNETES_PORT=tcp://10.100.0.1:443
KUBERNETES_SERVICE_PORT=443
KEYCLOAK_18_METRICS_SERVICE_PORT=8080
KEYCLOAK_18_METRICS_PORT=tcp://10.100.104.11:8080
KEYCLOAK_18_PORT_80_TCP_ADDR=10.100.71.5
HOSTNAME=testpod
SHLVL=2
KEYCLOAK_18_PORT_80_TCP_PORT=80
HOME=/root
KEYCLOAK_18_PORT_80_TCP_PROTO=tcp
KEYCLOAK_18_METRICS_PORT_8080_TCP=tcp://10.100.104.11:8080
KEYCLOAK_18_POSTGRESQL_PORT_5432_TCP_ADDR=10.100.155.185
KEYCLOAK_18_POSTGRESQL_SERVICE_HOST=10.100.155.185
KEYCLOAK_18_PORT_80_TCP=tcp://10.100.71.5:80
KEYCLOAK_18_POSTGRESQL_PORT_5432_TCP_PORT=5432
KEYCLOAK_18_POSTGRESQL_PORT_5432_TCP_PROTO=tcp
TERM=xterm
KUBERNETES_PORT_443_TCP_ADDR=10.100.0.1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
KUBERNETES_PORT_443_TCP_PORT=443
KEYCLOAK_18_POSTGRESQL_PORT=tcp://10.100.155.185:5432
KEYCLOAK_18_POSTGRESQL_SERVICE_PORT=5432
KEYCLOAK_18_SERVICE_PORT_HTTP=80
KEYCLOAK_18_POSTGRESQL_SERVICE_PORT_TCP_POSTGRESQL=5432
KUBERNETES_PORT_443_TCP_PROTO=tcp
KEYCLOAK_18_POSTGRESQL_PORT_5432_TCP=tcp://10.100.155.185:5432
KEYCLOAK_18_METRICS_SERVICE_PORT_HTTP=8080
KEYCLOAK_18_SERVICE_HOST=10.100.71.5
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP=tcp://10.100.0.1:443
KUBERNETES_SERVICE_HOST=10.100.0.1
PWD=/
KEYCLOAK_18_METRICS_PORT_8080_TCP_ADDR=10.100.104.11
KEYCLOAK_18_METRICS_SERVICE_HOST=10.100.104.11
KEYCLOAK_18_SERVICE_PORT=80
KEYCLOAK_18_PORT=tcp://10.100.71.5:80
KEYCLOAK_18_METRICS_PORT_8080_TCP_PORT=8080
I've looked into this a bit and found this doc https://kubernetes.io/docs/concepts/containers/container-environment/, but it lists fewer variables than I can see myself.
I may be behind on some Kubernetes features, does anyone have a clue?
Thanks!

What you are seeing is expected. Quoting the official documentation:
When a Pod is run on a Node, the kubelet adds a set of environment
variables for each active Service. It adds {SVCNAME}_SERVICE_HOST and
{SVCNAME}_SERVICE_PORT variables, where the Service name is
upper-cased and dashes are converted to underscores.
This behavior is not EKS specific.
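To make the naming rule concrete, here is a minimal Python sketch that rebuilds the variable names seen in the env output above from a Service name, its ClusterIP and its ports. It is an approximation written for illustration (service_env_vars is a hypothetical helper, not the kubelet's actual code); besides {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT, it also generates the Docker-links-style {SVCNAME}_PORT_* variables, which account for the "extra" variables in the question.

```python
def env_prefix(service_name: str) -> str:
    """Service name upper-cased, with dashes converted to underscores."""
    return service_name.upper().replace("-", "_")


def service_env_vars(service_name, cluster_ip, ports):
    """Approximate the variables injected for one Service.

    ports is a list of (port_name, port_number, protocol) tuples; the first
    entry is treated as the Service's primary port.
    """
    prefix = env_prefix(service_name)
    _, first_port, first_proto = ports[0]
    env = {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(first_port),
        # Docker-links-style variable for the primary port
        f"{prefix}_PORT": f"{first_proto.lower()}://{cluster_ip}:{first_port}",
    }
    for port_name, number, proto in ports:
        if port_name:  # named ports also get {SVCNAME}_SERVICE_PORT_{PORTNAME}
            env[f"{prefix}_SERVICE_PORT_{env_prefix(port_name)}"] = str(number)
        # Docker-links-style variables for every port
        base = f"{prefix}_PORT_{number}_{proto.upper()}"
        env[base] = f"{proto.lower()}://{cluster_ip}:{number}"
        env[f"{base}_PROTO"] = proto.lower()
        env[f"{base}_PORT"] = str(number)
        env[f"{base}_ADDR"] = cluster_ip
    return env


if __name__ == "__main__":
    # Values taken from the env output in the question
    pg = service_env_vars("keycloak-18-postgresql", "10.100.155.185",
                          [("tcp-postgresql", 5432, "TCP")])
    for name, value in sorted(pg.items()):
        print(f"{name}={value}")
```

Running it prints the same eight KEYCLOAK_18_POSTGRESQL_* variables that appear in the question. If a Pod does not need these variables, setting enableServiceLinks: false in its spec stops the kubelet from injecting the per-Service ones; the variables for the kubernetes API Service itself are still set.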

Related

Kubernetes: How to update a live busybox container's 'command'

I have the following manifest that created the running pod named 'test'
apiVersion: v1
kind: Pod
metadata:
  name: hello-world
  labels:
    app: blue
spec:
  containers:
  - name: funskies
    image: busybox
    command: ["/bin/sh", "-c", "echo 'Hello World'"]
I want to update the pod to include the additional command
apiVersion: v1
kind: Pod
metadata:
  name: hello-world
  labels:
    app: blue
spec:
  restartPolicy: Never
  containers:
  - name: funskies
    image: busybox
    command: ["/bin/sh", "-c", "echo 'Hello World' > /home/my_user/logging.txt"]
What I tried
kubectl edit pod test
What resulted
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
# pods "test" was not valid:
# * spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`...
Other things I tried:
Updated the manifest and then ran apply - same issue
kubectl apply -f test.yaml
Question: What is the proper way to update a running pod?
You can't modify most properties of a Pod. Typically you don't want to directly create Pods; use a higher-level controller like a Deployment.
The Kubernetes documentation for a PodSpec notes (emphasis mine):
containers: List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
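The following minimal Python sketch models why both kubectl edit and kubectl apply are rejected: once container images are ignored, the old and new Pod specs must be identical. It is a simplified, hypothetical model (pod_update_allowed is not a real API), not the API server's actual validation code; the real check also allows a few other fields, such as spec.activeDeadlineSeconds and additions to spec.tolerations.

```python
import copy


def _strip_images(spec: dict) -> dict:
    """Return a copy of a PodSpec with every container image removed."""
    spec = copy.deepcopy(spec)
    for key in ("containers", "initContainers"):
        for container in spec.get(key, []):
            container.pop("image", None)
    return spec


def pod_update_allowed(old_spec: dict, new_spec: dict) -> bool:
    """Simplified model: only image changes are treated as mutable."""
    return _strip_images(old_spec) == _strip_images(new_spec)


if __name__ == "__main__":
    old = {"containers": [{"name": "funskies", "image": "busybox",
                           "command": ["/bin/sh", "-c", "echo 'Hello World'"]}]}
    new_image = copy.deepcopy(old)
    new_image["containers"][0]["image"] = "busybox:1.36"
    new_command = copy.deepcopy(old)
    new_command["containers"][0]["command"][-1] += " > /home/my_user/logging.txt"
    print(pod_update_allowed(old, new_image))    # True
    print(pod_update_allowed(old, new_command))  # False
```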
In all cases, no matter what, a container runs a single command, and if you want to change what that command is, you need to delete and recreate the container. In Kubernetes this always means deleting and recreating the containing Pod. Usually you shouldn't use bare Pods, but if you do, you can create a new Pod with the new command and delete the old one. Deleting Pods is extremely routine and all kinds of ordinary things cause it to happen (updating Deployments, a HorizontalPodAutoscaler scaling down, ...).
If you have a Deployment instead of a bare Pod, you can freely change the template: it uses for the Pods it creates, including their command:. The Deployment will then create a new Pod with the new command and, once that Pod is running, delete the old one.
The sorts of very-short-lived single-command containers you show in the question aren't necessarily well-suited to running in Kubernetes. If the Pod isn't going to stay running and serve requests, a Job could be a better match; but a Job believes it will only be run once, and if you change the pod spec for a completed Job I don't think it will launch a new Pod. You'd need to create a new Job for this case.
I am not sure what the whole requirement is, but you can exec into the pod and update the details:
$ kubectl exec <pod-name> -it -n <namespace> -- <command to execute>
For example:
$ kubectl exec pod/hello-world-xxxx-xx -it -- /bin/bash
If the image doesn't include bash (busybox doesn't), use /bin/sh instead to update the content or run the command.
Changes made this way inside the running pod are not reflected in the manifest file, so if you want them to persist you have to run a new pod with the changes.

Kubectl get deployments, no resources

I've just started learning Kubernetes. In every tutorial, the writer generally uses "kubectl .... deployments" to control the newly created deployments. Now, with those commands (e.g. kubectl get deployments) I always get the response "No resources found in default namespace.", and I have to use "pods" instead of "deployments" to make things work (which works fine).
Now my question is: what is causing this, and what is the difference between using a Deployment and a Pod? I set the docker driver when I first started minikube; does that have something to do with this?
First, let's brush up on some terminology.
Pod - It's the basic building block for Kubernetes. It groups one or more containers (such as Docker containers), with shared storage/network, and a specification for how to run the containers.
Deployment - It is a controller which wraps Pods and manages their life cycle, that is, it drives the actual state toward the desired state. There is one more layer between Deployment and Pod, the ReplicaSet: A ReplicaSet’s purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
In your case, one of two things might have happened:
Either you created a Pod, not a Deployment, so when you run kubectl get deployment you don't see any resources. Note that when you create a Deployment, it in turn creates a ReplicaSet for you, which creates the defined Pods.
Or maybe you created your Deployment in a different namespace. If that's the case, run this command to find your Deployment in that namespace: kubectl get deploy NAME_OF_DEPLOYMENT -n NAME_OF_NAMESPACE
More information to clarify your concepts:
Below, the section inside spec.template is essentially the Pod manifest you would write if you created the Pod manually instead of taking the Deployment route. As I said earlier, in simple terms Deployments are a wrapper around your Pods, so anything outside the path spec.template is the configuration you need to define for how you want your Pods to be managed (scaling, affinity, etc.).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
A Deployment is a controller providing a higher-level abstraction on top of Pods and ReplicaSets. A Deployment provides declarative updates for Pods and ReplicaSets. Deployments internally create ReplicaSets, within which Pods are created.
Use cases for Deployments are documented in the Kubernetes Deployment documentation.
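The Deployment -> ReplicaSet -> Pod relationship is recorded in each object's metadata.ownerReferences. The minimal Python sketch below works on plain dicts shaped like kubectl get -o json output (an assumption; the helper and object names are hypothetical) and follows that chain. A bare Pod has no ReplicaSet owner, which is why kubectl get deployments shows nothing for it.

```python
from typing import Optional


def owner_name(obj: dict, kind: str) -> Optional[str]:
    """Name of the first owner of the given kind in metadata.ownerReferences."""
    for ref in obj.get("metadata", {}).get("ownerReferences", []):
        if ref.get("kind") == kind:
            return ref.get("name")
    return None


def owning_deployment(pod: dict, replicasets: dict) -> Optional[str]:
    """Follow Pod -> ReplicaSet -> Deployment; None for a bare Pod."""
    rs_name = owner_name(pod, "ReplicaSet")
    if rs_name is None:
        return None  # created directly from a Pod manifest, or by another controller
    rs = replicasets.get(rs_name)
    return owner_name(rs, "Deployment") if rs else None


if __name__ == "__main__":
    # Hypothetical objects; the hash-like name suffixes are made up
    rs = {"metadata": {"name": "nginx-deployment-66b6c48dd5",
                       "ownerReferences": [{"kind": "Deployment", "name": "nginx-deployment"}]}}
    managed = {"metadata": {"name": "nginx-deployment-66b6c48dd5-x7k2p",
                            "ownerReferences": [{"kind": "ReplicaSet",
                                                 "name": "nginx-deployment-66b6c48dd5"}]}}
    bare = {"metadata": {"name": "nginx"}}
    replicasets = {rs["metadata"]["name"]: rs}
    print(owning_deployment(managed, replicasets))  # nginx-deployment
    print(owning_deployment(bare, replicasets))     # None
```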
One reason for No resources found in default namespace could be that you created the deployment in a specific namespace and not in default namespace.
You can see deployments in a specific namespace or in all namespaces via
kubectl get deploy -n namespacename
kubectl get deploy -A

Kubernetes exposes more environment variables than expected

I've faced a strange behaviour with K8s pods running in an AWS EKS cluster (version 1.14). The services are deployed via Helm 3 charts. The problem is that the pod receives more environment variables than expected.
The pod specification says that variables should be populated from a config map.
apiVersion: v1
kind: Pod
metadata:
  name: apigw-api-gateway-59cf5bfdc9-s6hrh
  namespace: development
spec:
  containers:
  - env:
    - name: JAVA_OPTS
      value: -server -XX:MaxRAMPercentage=75.0 -XX:+UseContainerSupport -XX:+HeapDumpOnOutOfMemoryError
    - name: GATEWAY__REDIS__HOST
      value: apigw-redis-master.development.svc.cluster.local
    envFrom:
    - configMapRef:
        name: apigw-api-gateway-env # <-- this is the map
    # the rest of spec is hidden
The config map apigw-api-gateway-env has this specification:
apiVersion: v1
data:
  GATEWAY__APP__ADMIN_LOPUSH: ""
  GATEWAY__APP__CUSTOMER_LOPUSH: ""
  GATEWAY__APP__DISABLE_RATE_LIMITS: "true"
  # here are other 'GATEWAY__' envs
  JMX_AUTH: "false"
  JMX_ENABLED: "true"
  # here are other 'JMX_' envs
kind: ConfigMap
metadata:
  name: apigw-api-gateway-env
  namespace: development
If I request a list of environment variables, I can find values from a different service. These values are not specified in the config map of the 'apigw' application; they are stored in a map for a 'lopush' application. Here is a sample.
/ # env | grep -i lopush | sort | head -n 4
GATEWAY__APP__ADMIN_LOPUSH=<hidden>
GATEWAY__APP__CUSTOMER_LOPUSH=<hidden>
LOPUSH_GAME_ADMIN_MOBILE_PORT=tcp://172.20.248.152:5050
LOPUSH_GAME_ADMIN_MOBILE_PORT_5050_TCP=tcp://172.20.248.152:5050
I've also noticed that this behaviour is somehow related to the order in which the services were launched. That could be just because some config maps didn't exist at that moment. For now, it seems like the pod receives variables from all config maps in the current namespace.
Has anyone faced this issue before? Is it possible that there are other criteria which force K8s to populate the environment from other maps?
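If you mean the _PORT stuff, those variables don't come from ConfigMaps at all: they are generated from Services, for compatibility with the old Docker container links system. Every Service in the namespace (here, presumably one named lopush-game-admin-mobile) gets set up that way automatically, to make it easier to move things from older Docker-based systems. Only Services that already exist when a Pod is created are included, which explains why the result depends on the order in which things were launched.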
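A minimal Python sketch for checking this from inside the container: given the variable names printed by env and the Service names from kubectl get svc in the same namespace, it separates the Service-generated variables from the rest. The helper names are hypothetical, and the Service name lopush-game-admin-mobile is inferred from the variable prefix rather than taken from the question.

```python
import re


def link_var_pattern(service_name: str):
    """Regex for the variable names generated for one Service."""
    prefix = re.escape(service_name.upper().replace("-", "_"))
    return re.compile(
        rf"^{prefix}_(SERVICE_HOST|SERVICE_PORT(_[A-Z0-9_]+)?"
        rf"|PORT(_\d+_(TCP|UDP|SCTP)(_(PROTO|PORT|ADDR))?)?)$"
    )


def split_env(env_names, service_names):
    """Split env var names into Service link variables and everything else."""
    patterns = [link_var_pattern(s) for s in service_names]
    from_services = sorted(n for n in env_names if any(p.match(n) for p in patterns))
    other = sorted(set(env_names) - set(from_services))
    return from_services, other


if __name__ == "__main__":
    names = ["GATEWAY__APP__ADMIN_LOPUSH", "GATEWAY__APP__CUSTOMER_LOPUSH",
             "LOPUSH_GAME_ADMIN_MOBILE_PORT", "LOPUSH_GAME_ADMIN_MOBILE_PORT_5050_TCP"]
    services, rest = split_env(names, ["lopush-game-admin-mobile"])
    print("from Services:", services)
    print("other:", rest)
```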

Running kubectl proxy from same pod vs different pod on same node - what's the difference?

I'm experimenting with this, and I'm noticing a difference in behavior that I'm having trouble understanding, namely between running kubectl proxy from within a pod vs running it in a different pod.
The sample configuration runs kubectl proxy and the container that needs it* in the same pod of a DaemonSet, i.e.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  # ...
spec:
  template:
    metadata:
      # ...
    spec:
      containers:
      # this container needs kubectl proxy to be running:
      - name: l5d
        # ...
      # so, let's run it:
      - name: kube-proxy
        image: buoyantio/kubectl:v1.8.5
        args:
        - "proxy"
        - "-p"
        - "8001"
When doing this on my cluster, I get the expected behavior. However, I will run other services that also need kubectl proxy, so I figured I'd rationalize that into its own daemon set to ensure it's running on all nodes. I thus removed the kube-proxy container and deployed the following daemon set:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-proxy
  labels:
    app: kube-proxy
spec:
  template:
    metadata:
      labels:
        app: kube-proxy
    spec:
      containers:
      - name: kube-proxy
        image: buoyantio/kubectl:v1.8.5
        args:
        - "proxy"
        - "-p"
        - "8001"
In other words, the same container configuration as previously, but now running in independent pods on each node instead of within the same pod. With this configuration "stuff doesn't work anymore"**.
I realize the solution (at least for now) is to just run the kube-proxy container in any pod that needs it, but I'd like to know why I need to. Why isn't just running it in a daemonset enough?
I've tried to find more information about running kubectl proxy like this, but my search results drown in results about running it to access a remote cluster from a local environment, i.e. not at all what I'm after.
I include these details not because I think they're relevant, but because they might be even though I'm convinced they're not:
*) a Linkerd ingress controller, but I think that's irrelevant
**) in this case, the "working" state is that the ingress controller complains that the destination is unknown because there's no matching ingress rule, while the "not working" state is a network timeout.
namely between running kubectl proxy from within a pod vs running it in a different pod.
Assuming your cluster has a software-defined network, such as flannel or calico, a Pod has its own IP and all containers within a Pod share the same network namespace. Thus:
containers:
- name: c0
  command: ["curl", "127.0.0.1:8001"]
- name: c1
  command: ["kubectl", "proxy", "-p", "8001"]
will work, whereas with a DaemonSet the two containers are by definition not in the same Pod, so the hypothetical c0 above would need to use the DaemonSet Pod's IP to reach port 8001. That story is made more complicated by the fact that kubectl proxy by default only listens on 127.0.0.1, so you would need to start the DaemonSet Pod's kubectl proxy with --address='0.0.0.0' --accept-hosts='.*' to even permit such cross-Pod communication. Declaring the port in the container's ports: array is good practice, since you are now exposing it to the cluster, but the API reference describes that list as primarily informational: not listing a port there does not prevent it from being exposed.

Restart pods when configmap updates in Kubernetes?

How do I automatically restart Kubernetes pods and pods associated with deployments when their configmap is changed/updated?
I know there's been talk about the ability to automatically restart pods when a config map changes, but to my knowledge this is not yet available in Kubernetes 1.2.
So what (I think) I'd like to do is a "rolling restart" of the deployment resource associated with the pods consuming the config map. Is it possible, and if so how, to force a rolling restart of a deployment in Kubernetes without changing anything in the actual template? Is this currently the best way to do it or is there a better option?
The current best solution to this problem (referenced deep in https://github.com/kubernetes/kubernetes/issues/22368 linked in the sibling answer) is to use Deployments, and consider your ConfigMaps to be immutable.
When you want to change your config, create a new ConfigMap with the changes you want to make, and point your deployment at the new ConfigMap. If the new config is broken, the Deployment will refuse to scale down your working ReplicaSet. If the new config works, then your old ReplicaSet will be scaled to 0 replicas and deleted, and new pods will be started with the new config.
Not quite as quick as just editing the ConfigMap in place, but much safer.
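A minimal Python sketch of the immutable-ConfigMap approach: derive the ConfigMap name from a hash of its contents, mark it immutable, and point the Deployment's pod template at the new name, so that a config change becomes an ordinary rolling update. The "-<10 hex chars>" naming scheme and both helper names are assumptions for illustration; use_configmap also assumes the Deployment references its ConfigMap through envFrom.

```python
import hashlib
import json


def versioned_configmap(base_name: str, data: dict) -> dict:
    """Build an immutable ConfigMap whose name embeds a hash of its data."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:10]
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{base_name}-{digest}"},
        "immutable": True,  # the API server rejects later edits to data
        "data": data,
    }


def use_configmap(deployment: dict, configmap_name: str) -> None:
    """Point every envFrom configMapRef in the pod template at configmap_name.

    Changing the pod template is what makes the Deployment roll out new Pods.
    """
    for container in deployment["spec"]["template"]["spec"]["containers"]:
        for source in container.get("envFrom", []):
            if "configMapRef" in source:
                source["configMapRef"]["name"] = configmap_name


if __name__ == "__main__":
    cm = versioned_configmap("app-config", {"LOG_LEVEL": "debug"})
    print(json.dumps(cm, indent=2))
```

The old ConfigMap is left in place, so rolling back the Deployment still finds the config it was using; old versions can be garbage-collected separately.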
Signalling a pod on config map update is a feature in the works (https://github.com/kubernetes/kubernetes/issues/22368).
You can always write a custom pid1 that notices the ConfigMap has changed and restarts your app.
You can also, e.g., mount the same ConfigMap in 2 containers, expose an HTTP health check in the second container that fails if the hash of the ConfigMap contents changes, and shove that in as the liveness probe of the first container (this works because containers in a Pod share the same network namespace). The kubelet will restart your first container for you when the probe fails.
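A minimal Python sketch of the check logic for that second container, assuming the ConfigMap is mounted as a directory; serving it over HTTP is left out, and ConfigHealthCheck is a hypothetical name. One detail the idea needs: if the checker kept comparing against its startup snapshot, the restarted app container would fail the probe forever, so this sketch adopts the new config after reporting failure_threshold failures (an assumption: set it to the probe's failureThreshold).

```python
import hashlib
import tempfile
from pathlib import Path


def config_digest(config_dir: str) -> str:
    """SHA-256 over the visible files of a mounted ConfigMap directory."""
    h = hashlib.sha256()
    for path in sorted(Path(config_dir).iterdir()):
        if path.name.startswith(".."):  # skip the kubelet's internal ..data entries
            continue
        if path.is_file():  # follows the per-key symlinks
            content = path.read_bytes()
            h.update(path.name.encode("utf-8") + b"\0")
            h.update(len(content).to_bytes(8, "big"))
            h.update(content)
    return h.hexdigest()


class ConfigHealthCheck:
    """Reports 500 while the mounted config differs from the accepted snapshot."""

    def __init__(self, config_dir: str, failure_threshold: int = 3):
        self.config_dir = config_dir
        self.failure_threshold = failure_threshold
        self.baseline = config_digest(config_dir)
        self.failures = 0

    def status_code(self) -> int:
        current = config_digest(self.config_dir)
        if current == self.baseline:
            self.failures = 0
            return 200
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Enough failures for the kubelet to restart the app container;
            # accept the new config so the restarted container stays up.
            self.baseline = current
            self.failures = 0
        return 500


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "foo").write_text("bar")
        check = ConfigHealthCheck(d, failure_threshold=2)
        print(check.status_code())  # 200
        (Path(d) / "foo").write_text("baz")
        print(check.status_code(), check.status_code(), check.status_code())  # 500 500 200
```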
Of course if you don't care about which nodes the pods are on, you can simply delete them and the replication controller will "restart" them for you.
The best way I've found to do it is to run Reloader.
It lets you define ConfigMaps or Secrets to watch; when they are updated, a rolling update of your Deployment is performed. Here's an example:
You have a deployment foo and a ConfigMap called foo-configmap. You want to roll the pods of the deployment every time the configmap is changed. You need to run Reloader with:
kubectl apply -f https://raw.githubusercontent.com/stakater/Reloader/master/deployments/kubernetes/reloader.yaml
Then specify this annotation in your deployment:
kind: Deployment
metadata:
  annotations:
    configmap.reloader.stakater.com/reload: "foo-configmap"
  name: foo
...
Helm 3 doc page
Often times configmaps or secrets are injected as configuration files in containers. Depending on the application a restart may be required should those be updated with a subsequent helm upgrade, but if the deployment spec itself didn't change the application keeps running with the old configuration resulting in an inconsistent deployment.
The sha256sum function can be used together with the include function to ensure a deployments template section is updated if another spec changes:
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
[...]
In my case, for some reasons, $.Template.BasePath didn't work but $.Chart.Name does:
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: admin-app
      annotations:
        checksum/config: {{ include (print $.Chart.Name "/templates/" $.Chart.Name "-configmap.yaml") . | sha256sum }}
You can update a pod template annotation that is otherwise irrelevant to your Deployment; changing it will trigger a rolling update.
For example:
spec:
  template:
    metadata:
      annotations:
        configmap-version: "1"  # annotation values must be strings; bump this to roll the pods
If you are on Kubernetes 1.15 or later, a kubectl rollout restart worked best for me as part of CI/CD, with the app configuration path hooked up through a volume mount. A reloader plugin or setting restartPolicy: Always in the Deployment manifest YAML did not work for me. No application code changes were needed, and it worked for both static assets and microservices.
kubectl rollout restart deployment/<deploymentName> -n <namespace>
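As far as I know, kubectl rollout restart works by setting a kubectl.kubernetes.io/restartedAt annotation on the pod template, which is the same trick as bumping an annotation by hand. A minimal Python sketch that builds an equivalent patch body (rollout_restart_patch is a hypothetical helper):

```python
import json
from datetime import datetime, timezone
from typing import Optional


def rollout_restart_patch(now: Optional[datetime] = None) -> str:
    """Patch body that changes the pod template the way rollout restart does.

    A new timestamp annotation on spec.template makes the Deployment
    controller roll out new Pods even though nothing else changed.
    """
    now = now or datetime.now(timezone.utc)
    patch = {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        "kubectl.kubernetes.io/restartedAt": now.isoformat(timespec="seconds")
                    }
                }
            }
        }
    }
    return json.dumps(patch)


if __name__ == "__main__":
    print(rollout_restart_patch())
```

The printed JSON could be passed to kubectl patch deployment/<deploymentName> -n <namespace> -p '<json>', which should have the same effect as the rollout restart command above.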
We had this problem where the Deployment was in a sub-chart and the values controlling it were in the parent chart's values file. This is what we used to trigger a restart:
spec:
  template:
    metadata:
      annotations:
        checksum/config: {{ tpl (toYaml .Values) . | sha256sum }}
Obviously this will trigger a restart on any value change, but it works for our situation. What was originally in the child chart would only work if the config.yaml in the child chart itself changed:
checksum/config: {{ include (print $.Template.BasePath "/config.yaml") . | sha256sum }}
Consider using kustomize (or kubectl apply -k) and then leveraging its powerful configMapGenerator feature. For example, from https://kubectl.docs.kubernetes.io/references/kustomize/kustomization/configmapgenerator/
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
configMapGenerator:
# Just one example of many...
- name: my-app-config
  literals:
  - JAVA_HOME=/opt/java/jdk
  - JAVA_TOOL_OPTIONS=-agentlib:hprof
  # Explanation below...
  - SECRETS_VERSION=1
Then simply reference my-app-config in your deployments. When building with kustomize, it'll automatically find and update references to my-app-config with an updated suffix, e.g. my-app-config-f7mm6mhf59.
Bonus, updating secrets: I also use this technique for forcing a reload of secrets (since they're affected in the same way). While I personally manage my secrets completely separately (using Mozilla sops), you can bundle a config map alongside your secrets, so for example in your deployment:
# ...
spec:
  template:
    spec:
      containers:
      - name: my-app
        image: my-app:tag
        envFrom:
        # For any NON-secret environment variables. Name is automatically updated by Kustomize
        - configMapRef:
            name: my-app-config
        # Defined separately OUTSIDE of Kustomize. Just modify SECRETS_VERSION=[number] in the my-app-config ConfigMap
        # to trigger an update in both the config as well as the secrets (since the pod will get restarted).
        - secretRef:
            name: my-app-secrets
Then, just add a variable like SECRETS_VERSION to your ConfigMap, as I did above. Each time you change my-app-secrets, increment the value of SECRETS_VERSION. It serves no purpose except to trigger a change in the kustomize'd ConfigMap name, which should also result in a restart of your pod.
I also banged my head around this problem for some time and wished to solve this in an elegant but quick way.
Here are my 20 cents:
The answer using labels mentioned here won't work if you are updating labels, but it would work if you always add labels. More details here.
The answer mentioned here is, in my opinion, the most elegant way to do this quickly, but it had the problem of handling deletes. I am adding on to that answer:
Solution
I am doing this in one of my Kubernetes Operators, where only a single task is performed per reconciliation loop.
1. Compute the hash of the ConfigMap data. Say it comes out as v2.
2. Create ConfigMap cm-v2 with the labels version: v2 and product: prime if it does not exist, and RETURN. If it exists, GO BELOW.
3. Find all Deployments which have the label product: prime but not version: v2. If any are found, DELETE them and RETURN. ELSE GO BELOW.
4. Delete all ConfigMaps which have the label product: prime but not version: v2. ELSE GO BELOW.
5. Create Deployment deployment-v2 with the labels product: prime and version: v2, with ConfigMap cm-v2 attached, and RETURN. ELSE do nothing.
That's it! It looks long, but this could be the fastest implementation, and it follows the principle of treating infrastructure as cattle (immutability).
Also, the above solution works when your Kubernetes Deployment has the Recreate update strategy. The logic may require small tweaks for other scenarios.
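The steps above can be sketched in Python against a toy in-memory state (a dict of object name -> labels standing in for the cluster; all names are hypothetical, and a real operator would call the Kubernetes API instead). Each call performs at most one action. Where the original step 4 is ambiguous, this sketch assumes it also RETURNs after deleting ConfigMaps.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class State:
    """Toy stand-in for the cluster: object name -> labels."""
    configmaps: dict = field(default_factory=dict)
    deployments: dict = field(default_factory=dict)


def config_version(data: dict) -> str:
    """Step 1: hash the ConfigMap data into a short, label-safe version string."""
    digest = hashlib.sha256(json.dumps(data, sort_keys=True).encode("utf-8")).hexdigest()
    return "v" + digest[:8]


def stale(objects: dict, product: str, version: str) -> list:
    """Names labelled with this product but a different version."""
    return sorted(name for name, labels in objects.items()
                  if labels.get("product") == product and labels.get("version") != version)


def reconcile_once(state: State, data: dict, product: str = "prime") -> str:
    """Perform at most one action per call and report what was done."""
    version = config_version(data)
    labels = {"product": product, "version": version}
    cm_name, dep_name = f"cm-{version}", f"deployment-{version}"
    if cm_name not in state.configmaps:                    # step 2
        state.configmaps[cm_name] = dict(labels)
        return f"created {cm_name}"
    old_deps = stale(state.deployments, product, version)  # step 3
    if old_deps:
        for name in old_deps:
            del state.deployments[name]
        return "deleted " + ", ".join(old_deps)
    old_cms = stale(state.configmaps, product, version)    # step 4
    if old_cms:
        for name in old_cms:
            del state.configmaps[name]
        return "deleted " + ", ".join(old_cms)
    if dep_name not in state.deployments:                  # step 5
        state.deployments[dep_name] = dict(labels)
        return f"created {dep_name}"
    return "nothing to do"


if __name__ == "__main__":
    state = State()
    for data in ({"foo": "bar"}, {"foo": "baz"}):
        action = reconcile_once(state, data)
        while action != "nothing to do":
            print(action)
            action = reconcile_once(state, data)
    print(state)
```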
How do I automatically restart Kubernetes pods and pods associated
with deployments when their configmap is changed/updated?
If you consume the ConfigMap as environment variables, you have to use an external option, such as:
Reloader
Kube watcher
Configurator
Kubernetes automatically updates the ConfigMap if it's mounted as a volume (this does not work if the volume is mounted with subPath).
When a ConfigMap currently consumed in a volume is updated, projected
keys are eventually updated as well. The kubelet checks whether the
mounted ConfigMap is fresh on every periodic sync. However, the
kubelet uses its local cache for getting the current value of the
ConfigMap. The type of the cache is configurable using the
ConfigMapAndSecretChangeDetectionStrategy field in the
KubeletConfiguration struct. A ConfigMap can be either propagated by
watch (default), ttl-based, or by redirecting all requests directly to
the API server. As a result, the total delay from the moment when the
ConfigMap is updated to the moment when new keys are projected to the
Pod can be as long as the kubelet sync period + cache propagation
delay, where the cache propagation delay depends on the chosen cache
type (it equals to watch propagation delay, ttl of cache, or zero
correspondingly).
Official document : https://kubernetes.io/docs/concepts/configuration/configmap/#mounted-configmaps-are-updated-automatically
ConfigMaps consumed as environment variables are not updated automatically and require a pod restart.
Simple example Configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: config
  namespace: default
data:
  foo: bar
POD config
spec:
  containers:
  - name: configmaptestapp
    image: <Image>
    volumeMounts:
    - mountPath: /config
      name: configmap-data-volume
    ports:
    - containerPort: 8080
  volumes:
  - name: configmap-data-volume
    configMap:
      name: config
Example : https://medium.com/#harsh.manvar111/update-configmap-without-restarting-pod-56801dce3388
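The volume update only changes the files; the application still has to re-read them. A minimal Python sketch of one way to do that, assuming the ConfigMap above is mounted at /config so that key foo appears as /config/foo: poll the file's contents and call a reload callback when they change. ConfigFileWatcher is a hypothetical helper, and the polling interval is arbitrary; weigh it against the propagation delay described in the quote above.

```python
import hashlib
import tempfile
import time
from pathlib import Path
from typing import Callable


class ConfigFileWatcher:
    """Polls a file from a mounted ConfigMap and calls on_change with new contents.

    The path is re-opened on every poll, so it keeps working when the kubelet
    swaps the ..data symlink behind the per-key links during an update.
    """

    def __init__(self, path: str, on_change: Callable[[str], None]):
        self.path = Path(path)
        self.on_change = on_change
        self._digest = hashlib.sha256(self.path.read_bytes()).hexdigest()

    def poll(self) -> bool:
        """Check once; return True if the contents changed since the last check."""
        content = self.path.read_bytes()
        digest = hashlib.sha256(content).hexdigest()
        if digest == self._digest:
            return False
        self._digest = digest
        self.on_change(content.decode("utf-8"))
        return True

    def run_forever(self, interval_seconds: float = 10.0) -> None:
        while True:
            self.poll()
            time.sleep(interval_seconds)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        config = Path(d) / "foo"  # stands in for /config/foo from the example
        config.write_text("bar")
        watcher = ConfigFileWatcher(str(config), lambda text: print("reloaded:", text))
        print(watcher.poll())  # False
        config.write_text("baz")
        print(watcher.poll())  # prints "reloaded: baz", then True
```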
Adding the immutable property to the ConfigMap totally avoids the problem. Using config hashing helps with a seamless rolling update, but it does not help with a rollback. You can take a look at this open-source project, 'Configurator': https://github.com/gopaddle-io/configurator.git . 'Configurator' works as follows, using custom resources:
Configurator ties the deployment lifecycle with the configMap. When
the config map is updated, a new version is created for that
configMap. All the deployments that were attached to the configMap
get a rolling update with the latest configMap version tied to it.
When you roll back the deployment to an older version, it bounces to
configMap version it had before doing the rolling update.
This way you can maintain versions to the config map and facilitate rolling and rollback to your deployment along with the config map.
Another way is to stick it into the command section of the Deployment:
...
command: [ "echo", "
option = value\n
other_option = value\n
" ]
...
Alternatively, to make it more ConfigMap-like, use an additional Deployment that just hosts that config in its command section and runs kubectl create on it, adding a unique 'version' to its name (for example, a hash of the content) and modifying all the deployments that use that config:
...
command: [ "/usr/sbin/kubectl-apply-config.sh", "
option = value\n
other_option = value\n
" ]
...
I'll probably post kubectl-apply-config.sh if it ends up working.
(don't do that; it looks too bad)