Azure AKS Redis deployments - kubernetes

I want to create a Redis instance for each of my namespaces. I use Azure AKS. I have default, dev, qa and stg namespaces. I have already deployed Redis in the default namespace, but after that it is impossible to do the same for the others. Then I tried creating another namespace (redis), but the result was the same: the pod is still Pending.
PS D:\Code\Infrastructure> kubectl -n redis get pods
NAME READY STATUS RESTARTS AGE
redis-0 0/1 Pending 0 34s
sentinel-0 0/1 Pending 0 12s
Here is the link to the resources that I use: GITHUB

I deleted all existing PVCs, then created a new storage class named development and referenced it with storageClassName: development.
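A Pending pod is usually waiting on scheduling or on an unbound PVC, and the Events section of kubectl describe normally states the exact reason. As a first diagnostic step (a sketch; pod, namespace and storage class names taken from the question above):

```shell
# Inspect why the pod is stuck in Pending (look at the Events section)
kubectl -n redis describe pod redis-0

# Check whether the PVCs are Bound or still Pending
kubectl -n redis get pvc

# Verify the referenced storage class actually exists and uses a
# provisioner supported on AKS
kubectl get storageclass development -o yaml
```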

Failed pods of previous helm release are not removed automatically

I have an application Helm chart with two deployments:
app (2 pod replicas)
app-dep (1 pod replica)
app-dep has an init container that waits for the app pods (using its labels) to be ready:
initContainers:
  - name: wait-for-app-pods
    image: groundnuty/k8s-wait-for:v1.5.1
    imagePullPolicy: Always
    args:
      - "pod"
      - "-l app.kubernetes.io/component=app"
I am using helm to deploy an application:
helm upgrade --install --wait --create-namespace --timeout 10m0s app ./app
Revision 1 of the release app is deployed:
helm ls
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
app default 1 2023-02-03 01:10:18.796554241 +1100 AEDT deployed app-0.1.0 1.0.0
Everything appears to go fine.
After some time, one of the app pods is evicted due to the low Memory available.
These are some lines from the pod's description details:
Status: Failed
Reason: Evicted
Message: The node was low on resource: memory. Container app was using 2513780Ki, which exceeds its request of 0.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Evicted 12m kubelet The node was low on resource: memory. Container app was using 2513780Ki, which exceeds its request of 0.
Normal Killing 12m kubelet Stopping container app
Warning ExceededGracePeriod 12m kubelet Container runtime did not kill the pod within specified grace period.
Later, a new pod is created automatically to restore the deployment's replica count.
But the Failed pod still remains in the namespace.
Now comes the next helm upgrade. The pods of app for release revision 2 are ready.
But the init container of app-dep in the latest revision keeps waiting for all pods with the label app.kubernetes.io/component=app to become ready. After the 10-minute timeout, helm release revision 2 is declared failed.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
app-7595488c8f-4v42n 1/1 Running 0 7m37s
app-7595488c8f-xt4qt 1/1 Running 0 6m17s
app-86448b6cd-7fq2w 0/1 Error 0 36m
app-dep-546d897d6c-q9sw6 1/1 Running 0 38m
app-dep-cd9cfd975-w2fzn 0/1 Init:0/1 0 7m37s
ANALYSIS FOR SOLUTION:
In order to address this issue, we can try two approaches:
Approach 1:
Find and remove all the failed pods of the previous revision first, just before doing a helm upgrade.
kubectl get pods --field-selector status.phase=Failed -n default
You can do this as part of the CD pipeline, or add that task as a pre-install hook job in the Helm chart.
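Approach 1 can be scripted as a cleanup step immediately before the upgrade. A minimal sketch (namespace and release name assumed from the question; note that --field-selector works with kubectl delete as well as kubectl get):

```shell
# Delete any leftover Failed pods from earlier revisions, then upgrade
kubectl delete pods --field-selector status.phase=Failed -n default
helm upgrade --install --wait --timeout 10m0s app ./app
```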
Approach 2:
Add one more label to the pods that changes on every helm upgrade (something like helm/release-revision=2).
Add that label also in the init-container so that it waits for the pods that have both labels.
It will then ignore the Failed pods of the previous release that have a different label.
initContainers:
  - name: wait-for-app-pods
    image: groundnuty/k8s-wait-for:v1.5.1
    imagePullPolicy: Always
    args:
      - "pod"
      - "-l app.kubernetes.io/component=app,helm/release-revision=2"
This approach causes the pod labels to change on every helm upgrade, which recreates the pods each time. Also, it is better to update the labels only in the pod template of the Deployment, because as per the official Kubernetes documentation for the Deployment resource:
It is generally discouraged to make label selector updates
Also, there is no need to add the revision label to the selector field in the service manifest.
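For Approach 2, the revision label can be injected from Helm's built-in .Release.Revision object so it changes automatically on every upgrade. A hypothetical excerpt of the Deployment template (label names follow the example above; the label is added to the pod template only, not to the selector, per the caveat above):

```yaml
# app Deployment template excerpt: revision goes into pod labels only
spec:
  template:
    metadata:
      labels:
        app.kubernetes.io/component: app
        helm/release-revision: "{{ .Release.Revision }}"
```

The init container selector would then use the same templated value, e.g. `-l app.kubernetes.io/component=app,helm/release-revision={{ .Release.Revision }}`.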
QUESTION:
Which approach would be better practice?
What would be the caveats and benefits of each method?
Is there any other approach to fix this issue?

AWS Kubernetes Persistent Volumes EFS

I have deployed an EFS file system in my AWS EKS cluster; after the deployment, the storage pod is up and running.
kubectl get pod -n storage
NAME READY STATUS RESTARTS AGE
nfs-client-provisioner-968445d79-g8wjr 1/1 Running 0 136m
When I try to deploy the application, the pod does not come up and stays Pending (0/1); at the same time the PVC is not bound and is also Pending.
Here are the logs from the provisioner after the actual application deployment:
I0610 13:26:11.875109 1 controller.go:987] provision "default/logs" class "efs": started
E0610 13:26:11.894816 1 controller.go:1004] provision "default/logs" class "efs": unexpected error getting claim reference: selfLink was empty, can't make reference
I'm using Kubernetes version 1.20. Could someone please help me with this?
Kubernetes 1.20 stopped propagating selfLink.
There is a workaround available, but it does not always work.
After the lines
spec:
  containers:
  - command:
    - kube-apiserver
add
    - --feature-gates=RemoveSelfLink=false
then reapply the API server configuration:
kubectl apply -f /etc/kubernetes/manifests/kube-apiserver.yaml
This workaround will not work after version 1.20 (1.21 and up), as selfLink will be completely removed.
Another solution is to use a newer NFS provisioner image:
gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
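On 1.20+ clusters, rather than re-enabling selfLink, you could switch the existing provisioner to the newer image in place. A sketch (deployment name and namespace taken from the kubectl output above; the container name is an assumption, check it with kubectl get deployment -o yaml first):

```shell
kubectl -n storage set image deployment/nfs-client-provisioner \
  nfs-client-provisioner=gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
```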

CockroachDB distributed workload on all nodes

I've deployed a CockroachDB cluster on Kubernetes using this guide:
https://github.com/cockroachlabs-field/kubernetes-examples/blob/master/SECURE.md
I deployed it with
$ helm install k8crdb --set Secure.Enabled=true cockroachdb/cockroachdb --namespace=thesis-crdb
Here is how it looks when I list it with $ helm list --namespace=thesis-crdb
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
k8crdb thesis-crdb 1 2021-01-29 20:18:25.5710691 +0100 CET deployed cockroachdb-5.0.4 20.2.4
Here is how it looks when I list it with $ kubectl get all --namespace=thesis-crdb
NAME READY STATUS RESTARTS AGE
pod/k8crdb-cockroachdb-0 1/1 Running 0 3h1m
pod/k8crdb-cockroachdb-1 1/1 Running 0 3h1m
pod/k8crdb-cockroachdb-2 1/1 Running 0 3h1m
pod/k8crdb-cockroachdb-init-j2h7t 0/1 Completed 0 3h1m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/k8crdb-cockroachdb ClusterIP None <none> 26257/TCP,8080/TCP 3h1m
service/k8crdb-cockroachdb-public ClusterIP 10.99.163.201 <none> 26257/TCP,8080/TCP 3h1m
NAME READY AGE
statefulset.apps/k8crdb-cockroachdb 3/3 3h1m
NAME COMPLETIONS DURATION AGE
job.batch/k8crdb-cockroachdb-init 1/1 33s 3h1m
Now I wanna simulate traffic to this cluster. First I access the pod with: $ kubectl exec -i -t -n thesis-crdb k8crdb-cockroachdb-0 -c db "--" sh -c "clear; (bash || ash || sh)"
Which gets me inside the first pod/node.
From here I initiate the workload
[root@k8crdb-cockroachdb-0 cockroach]# cockroach workload init movr 'postgresql://root@localhost:26257?sslmode=disable'
And then I run the workload for 5 minutes
[root@k8crdb-cockroachdb-0 cockroach]# cockroach workload run movr --duration=5m 'postgresql://root@localhost:26257?sslmode=disable'
I am aware that I'm running the workload on one node, but I was under the impression that the workload would be distributed among all nodes. Yet when I monitor performance with the CockroachDB console, I see that only the first node is doing any work, and the other nodes are idle.
As you can see the second (and third node) haven't had any workload at all. Is this just a visual glitch in the console? Or how can I run the workload so it get distributed evenly among all nodes in the cluster?
-UPDATE-
Yes, glad you brought up the cockroachdb-client-secure pod, because that's where I no longer could follow the guide. I tried as they did in the guide by doing: $ curl https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/client-secure.yaml | sed -e 's/serviceAccountName\: cockroachdb/serviceAccountName\: k8crdb-cockroachdb/g' | kubectl create -f -
But it throws this error:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1638 100 1638 0 0 4136 0 --:--:-- --:--:-- --:--:-- 4146
Error from server (Forbidden): error when creating "STDIN": pods "cockroachdb-client-secure" is forbidden: error looking up service account default/k8crdb-cockroachdb: serviceaccount "k8crdb-cockroachdb" not found
I also don't know if my certificates have been approved, because when I try this:
$ kubectl get csr k8crdb-cockroachdb-0 --namespace=thesis-crdb
It throws this:
Error from server (NotFound): certificatesigningrequests.certificates.k8s.io "k8crdb-cockroachdb-0" not found
And when I try to approve certificate: $ kubectl certificate approve k8crdb-cockroachdb-0 --namespace=thesis-crdb
It throws:
Error from server (NotFound): certificatesigningrequests.certificates.k8s.io "k8crdb-cockroachdb-0" not found
Any idea how to proceed from here?
This is not a glitch. Nodes only receive SQL traffic if clients connect to them and issue SQL statements. It seems like you're running the workload by logging in to one of the cockroach pods and directing it to connect to that pod on its local port, so only that pod receives queries. The cockroach workload subcommand takes an arbitrary number of pgurl strings and will balance load over all of them. Note also that k8crdb-cockroachdb-public represents a load balancer over all of the nodes in the cluster.
If you look at the guide you posted, it continues by describing how to deploy the cockroachdb-client-secure pod. If you were to run the workload there, pointed at the load balancer, you would use something like:
'postgres://root@k8crdb-cockroachdb-public?sslcert=cockroach-certs%2Fclient.root.crt&sslkey=cockroach-certs%2Fclient.root.key&sslrootcert=cockroach-certs%2Fca.crt&sslmode=verify-full'
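Alternatively, since the workload command balances over every pgurl it is given, you could pass one URL per node, addressing each pod through the headless service's DNS names. A sketch (hostnames derived from the pod list above; these insecure URLs match the question's examples, and a secure cluster would need the certificate parameters shown above instead):

```shell
cockroach workload run movr --duration=5m \
  'postgresql://root@k8crdb-cockroachdb-0.k8crdb-cockroachdb:26257?sslmode=disable' \
  'postgresql://root@k8crdb-cockroachdb-1.k8crdb-cockroachdb:26257?sslmode=disable' \
  'postgresql://root@k8crdb-cockroachdb-2.k8crdb-cockroachdb:26257?sslmode=disable'
```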
UPDATE
I'm not a k8s expert, but I think your issue creating the client pod relates to the namespace. The manifest assumes everything is in the default namespace, but you're working in --namespace=thesis-crdb. Consider adding a namespace flag to the kubectl create -f - command, or set the namespace for the session:
kubectl config set-context --current --namespace=thesis-crdb
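With the namespace flag added, the client-pod creation from the guide would become (a sketch of the same pipeline from the question):

```shell
curl https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/client-secure.yaml \
  | sed -e 's/serviceAccountName\: cockroachdb/serviceAccountName\: k8crdb-cockroachdb/g' \
  | kubectl create --namespace=thesis-crdb -f -
```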

Kubernetes: Prometheus context deadline exceeded error

I have several Node.js microservices running in the dev namespace which expose metrics that I can access via http://localhost:9187/metrics.
But when I deploy the Prometheus server, which runs in the monitoring namespace, I get the error below on the Targets page:
Get http://1.../metrics: context deadline exceeded.
I assume none of the existing network policies allow access from the monitoring namespace, so I need to add one in the dev namespace to allow the Prometheus pod in monitoring to scrape the pods below. Or what else might be the reason for this error?
What is the best way to add a NetworkPolicy to my application to allow Prometheus from the monitoring namespace?
kubectl get netpol -n dev
NAME POD-SELECTOR AGE
myapp-api-dev app.kubernetes.io/instance=myapp-api-dev,app.kubernetes.io/name=oneapihub-api 5h33m
myapp-auth-dev app.kubernetes.io/instance=myapp-auth-dev,app.kubernetes.io/name=oneapihub-auth 56m
myapp-backend-dev app.kubernetes.io/instance=myapp-backend-dev,app.kubernetes.io/name=oneapihub-backend 5h42m
redis app=redis,release=redis 33d
kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
monitoring-prometheus-server-6cc796c4db-hp4rg 2/2 Running 0 2d4h
I guess you have kube-prometheus installed. In that case, you need to create additional Roles and RoleBindings to let Prometheus monitor other namespaces; see here.
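If the cluster enforces NetworkPolicies (as the existing policies in dev suggest), a policy along these lines in the dev namespace would admit scrapes from monitoring. This is a sketch: the kubernetes.io/metadata.name namespace label and the 9187 metrics port are assumptions based on the question, so check your namespace labels first.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: dev
spec:
  podSelector: {}            # all pods in dev; narrow with labels if preferred
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 9187
```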

Kubernetes monitoring service heapster keeps restarting

I am running a Kubernetes cluster using Azure's container engine. I have an issue with one of the Kubernetes services, the one that does resource monitoring: heapster. The pod is relaunched every minute or so. I have tried removing the heapster deployment, replica set and pods, and recreating the deployment, but it instantly goes back to the same behaviour.
When I look at the resources with the heapster label it looks a little bit weird:
$ kubectl get deploy,rs,po -l k8s-app=heapster --namespace=kube-system
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/heapster 1 1 1 1 17h
NAME DESIRED CURRENT READY AGE
rs/heapster-2708163903 1 1 1 17h
rs/heapster-867061013 0 0 0 17h
NAME READY STATUS RESTARTS AGE
po/heapster-2708163903-vvs1d 2/2 Running 0 0s
For some reason there are two replica sets. The one called rs/heapster-867061013 keeps reappearing even when I delete all of the resources and redeploy them. The output above also shows that the pod just started, and this is the issue: it keeps getting created, runs for a few seconds, and then a new one is created. I am new to running Kubernetes, so I am unsure which log files are relevant to this issue.
Logs from heapster container
heapster.go:72] /heapster source=kubernetes.summary_api:""
heapster.go:73] Heapster version v1.3.0
configs.go:61] Using Kubernetes client with master "https://10.0.0.1:443" and version v1
configs.go:62] Using kubelet port 10255
heapster.go:196] Starting with Metric Sink
heapster.go:106] Starting heapster on port 8082
Logs from heapster-nanny container
pod_nanny.go:56] Invoked by [/pod_nanny --cpu=80m --extra-cpu=0.5m --memory=140Mi --extra-memory=4Mi --threshold=5 --deployment=heapster --container=heapster --poll-period=300000 --estimator=exponential]
pod_nanny.go:68] Watching namespace: kube-system, pod: heapster-2708163903-mqlsq, container: heapster.
pod_nanny.go:69] cpu: 80m, extra_cpu: 0.5m, memory: 140Mi, extra_memory: 4Mi, storage: MISSING, extra_storage: 0Gi
pod_nanny.go:110] Resources: [{Base:{i:{value:80 scale:-3} d:{Dec:<nil>} s:80m Format:DecimalSI} ExtraPerNode:{i:{value:5 scale:-4} d:{Dec:<nil>} s: Format:DecimalSI} Name:cpu} {Base:{i:{value:146800640 scale:0} d:{Dec:<nil>} s:140Mi Format:BinarySI} ExtraPerNode:{i:{value:4194304 scale:0} d:{Dec:<nil>} s:4Mi Format:BinarySI} Name:memory}]
It is completely normal and important that the Deployment Controller keeps old ReplicaSet resources in order to do fast rollbacks.
A Deployment resource manages ReplicaSet resources. Your heapster Deployment is configured to run 1 pod - this means it will always try to create one ReplicaSet with 1 pod. In case you make an update to the Deployment (say, a new heapster version), the Deployment resource creates a new ReplicaSet which will schedule a pod with the new version. At the same time, the old ReplicaSet resource sets its desired pods to 0, but the resource itself is still kept for easy rollbacks. As you can see, the old ReplicaSet rs/heapster-867061013 has 0 pods running. In case you make a rollback, the Deployment deploy/heapster will increase the number of pods in rs/heapster-867061013 to 1 and decrease the number in rs/heapster-2708163903 back to 0. You should also check out the documentation about the Deployment Controller (in case you haven't done so yet).
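The rollback that the old ReplicaSet exists for can be triggered like this (a sketch using the names from the output above):

```shell
# List past revisions, then roll the Deployment back one revision; the
# Deployment controller scales the old ReplicaSet up and the current one down
kubectl -n kube-system rollout history deployment/heapster
kubectl -n kube-system rollout undo deployment/heapster
```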
Still, it seems odd to me why your newly created Deployment Controller would instantly create 2 ReplicaSets. Did you wait a few seconds (say, 20) after deleting the Deployment Controller and before creating a new one? For me it sometimes takes some time before deletions propagate throughout the whole cluster and if I recreate too quickly, then the same resource is reused.
Concerning the heapster pod recreation you mentioned: pods have a restartPolicy. If it is set to Never, an exited pod is not restarted in place; instead its ReplicaSet creates a replacement (a new pod resource is created and the old one is deleted). My guess is that your heapster pod has this Never policy set. It might exit due to some error and reach a Failed state (you need to check the logs for that). Then after a short while the ReplicaSet creates a new pod.
OK, so it turns out to be a problem in the Azure container service's default Kubernetes configuration. I got some help from an Azure support engineer.
The problem is fixed by adding the label addonmanager.kubernetes.io/mode: EnsureExists to the heapster deployment. Here is the pull request that the supporter referenced: https://github.com/Azure/acs-engine/pull/1133
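For reference, adding that label amounts to something like the following (a sketch; EnsureExists tells the addon manager to create the resource if missing but not to reconcile it back to a stock definition):

```shell
kubectl -n kube-system label deployment heapster \
  addonmanager.kubernetes.io/mode=EnsureExists
```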