Syncing Prometheus and Kubewatch with Kubernetes Cluster - kubernetes

I want to get all events that occurred in Kubernetes cluster in some python dictionary using maybe some API to extract data from the events that occurred in the past. I found on internet that it is possible by storing all data of Kube-watch on Prometheus and later accessing it. I am unable to figure out how to set it up and see all past pod events in python. Any alternative solutions to access past events are also appreciated. Thanks!

I'll describe a solution that is not complicated and I think meets all your requirements.
There are tools such as Eventrouter that take Kubernetes events and push them to a user specified sink. However, as you mentioned, you only need Pods events, so I suggest a slightly different approach.
In short, you can run the kubectl get events --watch command from within a Pod and collect the output from that command using a log aggregation system like Loki.
Below, I will provide a detailed step-by-step explanation.
1. Running kubectl command from within a Pod
To display only Pod events, you can use:
$ kubectl get events --watch --field-selector involvedObject.kind=Pod
We want to run this command from within a Pod. For security reasons, I've created a separate events-collector ServiceAccount with the view Role assigned and our Pod will run under this ServiceAccount.
NOTE: I've created a Deployment instead of a single Pod.
$ cat all-in-one.yml
apiVersion: v1
kind: ServiceAccount
metadata:
name: events-collector
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: events-collector-binding
subjects:
- kind: ServiceAccount
name: events-collector
namespace: default
roleRef:
kind: ClusterRole
name: view
apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: events-collector
name: events-collector
spec:
selector:
matchLabels:
app: events-collector
template:
metadata:
labels:
app: events-collector
spec:
serviceAccountName: events-collector
containers:
- image: bitnami/kubectl
name: test
command: ["kubectl"]
args: ["get","events", "--watch", "--field-selector", "involvedObject.kind=Pod"]
After applying the above manifest, the event-collector was created and collects Pod events as expected:
$ kubectl apply -f all-in-one.yml
serviceaccount/events-collector created
clusterrolebinding.rbac.authorization.k8s.io/events-collector-binding created
deployment.apps/events-collector created
$ kubectl get deploy,pod | grep events-collector
deployment.apps/events-collector 1/1 1 1 14s
pod/events-collector-d98d6c5c-xrltj 1/1 Running 0 14s
$ kubectl logs -f events-collector-d98d6c5c-xrltj
LAST SEEN TYPE REASON OBJECT MESSAGE
77s Normal Scheduled pod/app-1-5d9ccdb595-m9d5n Successfully assigned default/app-1-5d9ccdb595-m9d5n to gke-cluster-2-default-pool-8505743b-brmx
76s Normal Pulling pod/app-1-5d9ccdb595-m9d5n Pulling image "nginx"
71s Normal Pulled pod/app-1-5d9ccdb595-m9d5n Successfully pulled image "nginx" in 4.727842954s
70s Normal Created pod/app-1-5d9ccdb595-m9d5n Created container nginx
70s Normal Started pod/app-1-5d9ccdb595-m9d5n Started container nginx
73s Normal Scheduled pod/app-2-7747dcb588-h8j4q Successfully assigned default/app-2-7747dcb588-h8j4q to gke-cluster-2-default-pool-8505743b-p7qt
72s Normal Pulling pod/app-2-7747dcb588-h8j4q Pulling image "nginx"
67s Normal Pulled pod/app-2-7747dcb588-h8j4q Successfully pulled image "nginx" in 4.476795932s
66s Normal Created pod/app-2-7747dcb588-h8j4q Created container nginx
66s Normal Started pod/app-2-7747dcb588-h8j4q Started container nginx
2. Installing Loki
You can install Loki to store logs and process queries. Loki is like Prometheus, but for logs :). The easiest way to install Loki is to use the grafana/loki-stack Helm chart:
$ helm repo add grafana https://grafana.github.io/helm-charts
"grafana" has been added to your repositories
$ helm repo update
...
Update Complete. ⎈Happy Helming!⎈
$ helm upgrade --install loki grafana/loki-stack
$ kubectl get pods | grep loki
loki-0 1/1 Running 0 76s
loki-promtail-hm8kn 1/1 Running 0 76s
loki-promtail-nkv4p 1/1 Running 0 76s
loki-promtail-qfrcr 1/1 Running 0 76s
3. Querying Loki with LogCLI
You can use the LogCLI tool to run LogQL queries against a Loki server. Detailed information on installing and using this tool can be found in the LogCLI documentation. I'll demonstrate how to install it on Linux:
$ wget https://github.com/grafana/loki/releases/download/v2.2.1/logcli-linux-amd64.zip
$ unzip logcli-linux-amd64.zip
Archive: logcli-linux-amd64.zip
inflating: logcli-linux-amd64
$ mv logcli-linux-amd64 logcli
$ sudo cp logcli /bin/
$ whereis logcli
logcli: /bin/logcli
To query the Loki server from outside the Kubernetes cluster, you may need to expose it using the Ingress resource:
$ cat ingress.yml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/rewrite-target: /
name: loki-ingress
spec:
rules:
- http:
paths:
- backend:
serviceName: loki
servicePort: 3100
path: /
$ kubectl apply -f ingress.yml
ingress.networking.k8s.io/loki-ingress created
$ kubectl get ing
NAME CLASS HOSTS ADDRESS PORTS AGE
loki-ingress <none> * <PUBLIC_IP> 80 19s
Finally, I've created a simple python script that we can use to query the Loki server:
NOTE: We need to set the LOKI_ADDR environment variable as described in the documentation. You need to replace the <PUBLIC_IP> with your Ingress IP.
$ cat query_loki.py
#!/usr/bin/env python3
import os
os.environ['LOKI_ADDR'] = "http://<PUBLIC_IP>"
os.system("logcli query '{app=\"events-collector\"}'")
$ ./query_loki.py
...
2021-07-02T10:33:01Z {} 2021-07-02T10:33:01.626763464Z stdout F 0s Normal Pulling pod/backend-app-5d99cf4b-c9km4 Pulling image "nginx"
2021-07-02T10:33:00Z {} 2021-07-02T10:33:00.836755152Z stdout F 0s Normal Scheduled pod/backend-app-5d99cf4b-c9km4 Successfully assigned default/backend-app-5d99cf4b-c9km4 to gke-cluster-1-default-pool-328bd2b1-288w
2021-07-02T10:33:00Z {} 2021-07-02T10:33:00.649954267Z stdout F 0s Normal Started pod/web-app-6fcf9bb7b8-jbrr9 Started container nginx2021-07-02T10:33:00Z {} 2021-07-02T10:33:00.54819851Z stdout F 0s Normal Created pod/web-app-6fcf9bb7b8-jbrr9 Created container nginx
2021-07-02T10:32:59Z {} 2021-07-02T10:32:59.414571562Z stdout F 0s Normal Pulled pod/web-app-6fcf9bb7b8-jbrr9 Successfully pulled image "nginx" in 4.228468876s
...

Related

ingress-nginx wasn't installed properly?

Now I'm using WSL 2 and Docker Desktop on Windows 10.
I created an YAML script to create an ingress for my microservices like below.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-srv
annotations:
kubernetes.io/ingress.class: nginx
spec:
rules:
- host: posts.com
http:
paths:
- path: /posts
pathType: Prefix
backend:
service:
name: posts-clusterip-srv
port:
number: 4000
And I installed ingress-nginx by following this installation guide
I ran this command in the guide.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.4.0/deploy/static/provider/cloud/deploy.yaml
But when I ran kubectl get pods --namespace=ingress-nginx, ingress-nginx-controller shows ImageInspectError
And when I ran the command kubectl apply -f ingress-srv.yaml, it showed an error message.
Can anyone please let me know what the issue is?
I removed the namespace ingress-nginx using this command kubectl delete all --all -n ingress-nginx and ran the deploy script again.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.4.0/deploy/static/provider/cloud/deploy.yaml
But the issue still happened.
There is an issue deploying ingress-nginx controller. You need to first fix issues there before deploying the ingress. Because, only the nginx controller knows how to handle the ingress resources.
Since there is no much info about the controller deployment failure, you better add more details about the error. You can describe the controller pod and share its event and status to look into this further.
It was because of the corrupted filesystem.
When I ran the ingress-nginx deployment command, there was a docker-desktop crash because of the lack of drive storage size.
So I removed all corrupted, unused or dangling docker images.
docker system prune
Also I deleted ingress-nginx and reinstalled.
kubectl delete -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.4.0/deploy/static/provider/cloud/deploy.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.4.0/deploy/static/provider/cloud/deploy.yaml
After that, it worked well.
kubectl get pods --namespace=ingress-nginx
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-tgkfx 0/1 Completed 0 74m
ingress-nginx-admission-patch-28l7q 0/1 Completed 3 74m
ingress-nginx-controller-7844b9db77-4dfvb 1/1 Running 0 74m

How to achieve Automatic Rollback in Kubernetes?

Let's say I've a deployment. For some reason it's not responding after sometime. Is there any way to tell Kubernetes to rollback to previous version automatically on failure?
You mentioned that:
I've a deployment. For some reason it's not responding after sometime.
In this case, you can use liveness and readiness probes:
The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
The above probes may prevent you from deploying a corrupted version, however liveness and readiness probes aren't able to rollback your Deployment to the previous version. There was a similar issue on Github, but I am not sure there will be any progress on this matter in the near future.
If you really want to automate the rollback process, below I will describe a solution that you may find helpful.
This solution requires running kubectl commands from within the Pod.
In short, you can use a script to continuously monitor your Deployments, and when errors occur you can run kubectl rollout undo deployment DEPLOYMENT_NAME.
First, you need to decide how to find failed Deployments. As an example, I'll check Deployments that perform the update for more than 10s with the following command:
NOTE: You can use a different command depending on your need.
kubectl rollout status deployment ${deployment} --timeout=10s
To constantly monitor all Deployments in the default Namespace, we can create a Bash script:
#!/bin/bash
while true; do
sleep 60
deployments=$(kubectl get deployments --no-headers -o custom-columns=":metadata.name" | grep -v "deployment-checker")
echo "====== $(date) ======"
for deployment in ${deployments}; do
if ! kubectl rollout status deployment ${deployment} --timeout=10s 1>/dev/null 2>&1; then
echo "Error: ${deployment} - rolling back!"
kubectl rollout undo deployment ${deployment}
else
echo "Ok: ${deployment}"
fi
done
done
We want to run this script from inside the Pod, so I converted it to ConfigMap which will allow us to mount this script in a volume (see: Using ConfigMaps as files from a Pod):
$ cat check-script-configmap.yml
apiVersion: v1
kind: ConfigMap
metadata:
name: check-script
data:
checkScript.sh: |
#!/bin/bash
while true; do
sleep 60
deployments=$(kubectl get deployments --no-headers -o custom-columns=":metadata.name" | grep -v "deployment-checker")
echo "====== $(date) ======"
for deployment in ${deployments}; do
if ! kubectl rollout status deployment ${deployment} --timeout=10s 1>/dev/null 2>&1; then
echo "Error: ${deployment} - rolling back!"
kubectl rollout undo deployment ${deployment}
else
echo "Ok: ${deployment}"
fi
done
done
$ kubectl apply -f check-script-configmap.yml
configmap/check-script created
I've created a separate deployment-checker ServiceAccount with the edit Role assigned and our Pod will run under this ServiceAccount:
NOTE: I've created a Deployment instead of a single Pod.
$ cat all-in-one.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: deployment-checker
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: deployment-checker-binding
subjects:
- kind: ServiceAccount
name: deployment-checker
namespace: default
roleRef:
kind: ClusterRole
name: edit
apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: deployment-checker
name: deployment-checker
spec:
selector:
matchLabels:
app: deployment-checker
template:
metadata:
labels:
app: deployment-checker
spec:
serviceAccountName: deployment-checker
volumes:
- name: check-script
configMap:
name: check-script
containers:
- image: bitnami/kubectl
name: test
command: ["bash", "/mnt/checkScript.sh"]
volumeMounts:
- name: check-script
mountPath: /mnt
After applying the above manifest, the deployment-checker Deployment was created and started monitoring Deployment resources in the default Namespace:
$ kubectl apply -f all-in-one.yaml
serviceaccount/deployment-checker created
clusterrolebinding.rbac.authorization.k8s.io/deployment-checker-binding created
deployment.apps/deployment-checker created
$ kubectl get deploy,pod | grep "deployment-checker"
deployment.apps/deployment-checker 1/1 1
pod/deployment-checker-69c8896676-pqg9h 1/1 Running
Finally, we can check how it works. I've created three Deployments (app-1, app-2, app-3):
$ kubectl create deploy app-1 --image=nginx
deployment.apps/app-1 created
$ kubectl create deploy app-2 --image=nginx
deployment.apps/app-2 created
$ kubectl create deploy app-3 --image=nginx
deployment.apps/app-3 created
Then I changed the image for the app-1 to the incorrect one (nnnginx):
$ kubectl set image deployment/app-1 nginx=nnnginx
deployment.apps/app-1 image updated
In the deployment-checker logs we can see that the app-1 has been rolled back to the previous version:
$ kubectl logs -f deployment-checker-69c8896676-pqg9h
...
====== Thu Oct 7 09:20:15 UTC 2021 ======
Ok: app-1
Ok: app-2
Ok: app-3
====== Thu Oct 7 09:21:16 UTC 2021 ======
Error: app-1 - rolling back!
deployment.apps/app-1 rolled back
Ok: app-2
Ok: app-3
I stumbled upon Argo Rollout which addresses this non automatic rollback and many other deployment related things.
https://argoproj.github.io/argo-rollouts/

Replicaset doesnot update pods in when pod image is modified

I have created a replicaset with wrong container image with below configuration.
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
name: rs-d33393
namespace: default
spec:
replicas: 4
selector:
matchLabels:
name: busybox-pod
template:
metadata:
labels:
name: busybox-pod
spec:
containers:
- command:
- sh
- -c
- echo Hello Kubernetes! && sleep 3600
image: busyboxXXXXXXX
name: busybox-container
Pods Information:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
rs-d33393-5hnfx 0/1 InvalidImageName 0 11m
rs-d33393-5rt5m 0/1 InvalidImageName 0 11m
rs-d33393-ngw78 0/1 InvalidImageName 0 11m
rs-d33393-vnpdh 0/1 InvalidImageName 0 11m
After this, i try to edit the image inside replicaset using kubectl edit replicasets.extensions rs-d33393 and update image as busybox.
Now, i am expecting pods to be recreated with proper image as part of replicaset.
This has not been the exact result.
Can someone please explain, why it is so?
Thanks :)
With ReplicaSets directly you have to kill the old pod, so the new ones will be created with the right image.
If you would be using a Deployment, and you should, changing the image would force the pod to be re-created.
Replicaset does not support updates. As long as required number of pods exist matching the selector labels, replicaset's jobs is done. You should use Deployment instead.
https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/
From the docs:
To update Pods to a new spec in a controlled way, use a Deployment, as
ReplicaSets do not support a rolling update directly.
Deployment is a higher-level concept that manages ReplicaSets and provides declarative updates to Pods. Therefore, it is recommend to use Deployments instead of directly using ReplicaSets unless you don’t require updates at all. ( i.e. one may never need to manipulate ReplicaSet objects when using a Deployment)
Its easy to perform rolling updates and rollbacks when deployed using deployments.
$ kubectl create deployment busybox --image=busyboxxxxxxx --dry-run -o yaml > busybox.yaml
$ cat busybox.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
creationTimestamp: null
labels:
app: busybox
name: busybox
spec:
replicas: 1
selector:
matchLabels:
app: busybox
strategy: {}
template:
metadata:
creationTimestamp: null
labels:
app: busybox
spec:
containers:
- image: busyboxxxxxxx
name: busyboxxxxxxx
ubuntu#dlv-k8s-cluster-master:~$ kubectl create -f busybox.yaml --record=true
deployment.apps/busybox created
Check rollout history
ubuntu#dlv-k8s-cluster-master:~$ kubectl rollout history deployment busybox
deployment.apps/busybox
REVISION CHANGE-CAUSE
1 kubectl create --filename=busybox.yaml --record=true
Update image on deployment
ubuntu#dlv-k8s-cluster-master:~$ kubectl set image deployment.app/busybox *=busybox --record
deployment.apps/busybox image updated
ubuntu#dlv-k8s-cluster-master:~$ kubectl rollout history deployment busybox
deployment.apps/busybox
REVISION CHANGE-CAUSE
1 kubectl create --filename=busybox.yaml --record=true
2 kubectl set image deployment.app/busybox *=busybox --record=true
Rollback Deployment
ubuntu#dlv-k8s-cluster-master:~$ kubectl rollout undo deployment busybox
deployment.apps/busybox rolled back
ubuntu#dlv-k8s-cluster-master:~$ kubectl rollout history deployment busybox
deployment.apps/busybox
REVISION CHANGE-CAUSE
2 kubectl set image deployment.app/busybox *=busybox --record=true
3 kubectl create --filename=busybox.yaml --record=true
You could use
k scale rs new-replica-set --replicas=0
and then
k scale rs new-replica-set --replicas=<Your number of replicas>
Edit the replicaset(assuming its called replicaset.yaml) file with command:
kubectl edit rs replicaset
edit the image name in the editor
save the file
exit the editor
Now , you will need to either delete the replica sets or delete the existing pods:
kubectl delete rs new-replica-set
kubectl delete pod pod_1 pod_2 pod_2 pod_4
replicaset should spin up new pods with new image.

Snapshot of Hostpath volume in kubernetes example clarification

I have a K8s cluster inside Azure VMs, running Ubuntu 18.
Cluster was provisioned using conjure-up.
I am trying to test the kubernetes snapshot feature. Trying to follow the steps here:
https://github.com/kubernetes-incubator/external-storage/blob/master/snapshot/doc/examples/hostpath/README.md
While i can follow most instructions on the page, not sure of what this specific command does:
"_output/bin/snapshot-controller -kubeconfig=${HOME}/.kube/config"
directly executing this instruction doesnt work as such.
Can anyone explain what this does and how to run this part successfully?
Or better yet point to a complete walk-through if it exists.
Update
Tried out steps from
https://github.com/kubernetes-incubator/external-storage/tree/master/snapshot/deploy/kubernetes/hostpath
Commented out below line since not using RBAC
# serviceAccountName: snapshot-controller-runner
Then deployed using
kubectl create -f deployment.yaml
kubectl create -f pv.yaml
kubectl create -f pvc.yaml
kubectl create -f snapshot.yaml
These yaml are from examples 'as is':
github.com/kubernetes-incubator/external-storage/blob/master/snapshot/doc/examples/hostpath/
kubectl describe volumesnapshot snapshot-demo Name: snapshot-demo
Namespace: default
Labels: SnapshotMetadata-PVName=hostpath-pv
SnapshotMetadata-Timestamp=1555999582450832931
Annotations: <none>
API Version: volumesnapshot.external-storage.k8s.io/v1
Kind: VolumeSnapshot
Metadata:
Creation Timestamp: 2019-04-23T05:56:05Z
Generation: 2
Resource Version: 261433
Self Link: /apis/volumesnapshot.external-storage.k8s.io/v1/namespaces/default/volumesnapshots/snapshot-demo
UID: 7b89194a-658c-11e9-86b2-000d3a07ff79
Spec:
Persistent Volume Claim Name: hostpath-pvc
Snapshot Data Name:
Status:
Conditions: <nil>
Creation Timestamp: <nil>
Events: <none>
the snapshot resource is created however the volumesnapshotdata is NOT created.
kubectl get volumesnapshotdata
No resources found.
kubectl get crd
NAME CREATED AT
volumesnapshotdatas.volumesnapshot.external-storage.k8s.io 2019-04-21T04:18:54Z
volumesnapshots.volumesnapshot.external-storage.k8s.io 2019-04-21T04:18:54Z
kubectl get pod
NAME READY STATUS RESTARTS AGE
azure 1/1 Running 2 2d21h
azure-2 1/1 Running 2 2d20h
snapshot-controller-5d798696ff-qsh6m 2/2 Running 2 14h
ls /tmp/test/
data
Enabled featuregate for volume snapshot
cat /var/snap/kube-apiserver/924/args
--advertise-address="192.168.0.4"
--min-request-timeout="300"
--etcd-cafile="/root/cdk/etcd/client-ca.pem"
--etcd-certfile="/root/cdk/etcd/client-cert.pem"
--etcd-keyfile="/root/cdk/etcd/client-key.pem"
--etcd-servers="https://192.168.0.4:2379"
--storage-backend="etcd3"
--tls-cert-file="/root/cdk/server.crt"
--tls-private-key-file="/root/cdk/server.key"
--insecure-bind-address="127.0.0.1"
--insecure-port="8080"
--audit-log-maxbackup="9"
--audit-log-maxsize="100"
--audit-log-path="/root/cdk/audit/audit.log"
--audit-policy-file="/root/cdk/audit/audit-policy.yaml"
--basic-auth-file="/root/cdk/basic_auth.csv"
--client-ca-file="/root/cdk/ca.crt"
--requestheader-allowed-names="system:kube-apiserver"
--requestheader-client-ca-file="/root/cdk/ca.crt"
--requestheader-extra-headers-prefix="X-Remote-Extra-"
--requestheader-group-headers="X-Remote-Group"
--requestheader-username-headers="X-Remote-User"
--service-account-key-file="/root/cdk/serviceaccount.key"
--token-auth-file="/root/cdk/known_tokens.csv"
--authorization-mode="AlwaysAllow"
--admission-control="NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota"
--allow-privileged=true
--enable-aggregator-routing
--kubelet-certificate-authority="/root/cdk/ca.crt"
--kubelet-client-certificate="/root/cdk/client.crt"
--kubelet-client-key="/root/cdk/client.key"
--kubelet-preferred-address-types="[InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP]"
--proxy-client-cert-file="/root/cdk/client.crt"
--proxy-client-key-file="/root/cdk/client.key"
--service-cluster-ip-range="10.152.183.0/24"
--logtostderr
--v="4"
--feature-gates="VolumeSnapshotDataSource=true"
What am i missing here?
I think everything you need is already present here: https://github.com/kubernetes-incubator/external-storage/tree/master/snapshot/deploy/kubernetes/hostpath
There is one YAML for deployment of snapshot controller and one YAML for snapshotter RBAC rules.

helm init Error: error installing: deployments.extensions is forbidden when run inside gitlab runner

I have Gitlab (11.8.1) (self-hosted) connected to self-hosted K8s Cluster (1.13.4). There're 3 projects in gitlab name shipment, authentication_service and shipment_mobile_service.
All projects add the same K8s configuration exception project namespace.
The first project is successful when install Helm Tiller and Gitlab Runner in Gitlab UI.
The second and third projects only install Helm Tiller success, Gitlab Runner error with log in install runner pod:
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Error: cannot connect to Tiller
+ sleep 1s
+ echo 'Retrying (30)...'
+ helm repo add runner https://charts.gitlab.io
Retrying (30)...
"runner" has been added to your repositories
+ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "runner" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈
+ helm upgrade runner runner/gitlab-runner --install --reset-values --tls --tls-ca-cert /data/helm/runner/config/ca.pem --tls-cert /data/helm/runner/config/cert.pem --tls-key /data/helm/runner/config/key.pem --version 0.2.0 --set 'rbac.create=true,rbac.enabled=true' --namespace gitlab-managed-apps -f /data/helm/runner/config/values.yaml
Error: UPGRADE FAILED: remote error: tls: bad certificate
I don't config gitlab-ci with K8s cluster on first project, only setup for the second and third. The weird thing is with the same helm-data (only different by name), the second run success but the third is not.
And because there only one gitlab runner available (from the first project), I assign both 2nd and 3rd project to this runner.
I use this gitlab-ci.yml for both 2 projects with only different name in helm upgrade command.
stages:
- test
- build
- deploy
variables:
CONTAINER_IMAGE: dockerhub.linhnh.vn/${CI_PROJECT_PATH}:${CI_PIPELINE_ID}
CONTAINER_IMAGE_LATEST: dockerhub.linhnh.vn/${CI_PROJECT_PATH}:latest
CI_REGISTRY: dockerhub.linhnh.vn
DOCKER_DRIVER: overlay2
DOCKER_HOST: tcp://localhost:2375 # required when use dind
# test phase and build phase using docker:dind success
deploy_beta:
stage: deploy
image: alpine/helm
script:
- echo "Deploy test start ..."
- helm init --upgrade
- helm upgrade --install --force shipment-mobile-service --recreate-pods --set image.tag=${CI_PIPELINE_ID} ./helm-data
- echo "Deploy test completed!"
environment:
name: staging
tags: ["kubernetes_beta"]
only:
- master
The helm-data is very simple so I think don't really need to paste here.
Here is the log when second project deploy success:
Running with gitlab-runner 11.7.0 (8bb608ff)
on runner-gitlab-runner-6c8555c86b-gjt9f XrmajZY2
Using Kubernetes namespace: gitlab-managed-apps
Using Kubernetes executor with image linkyard/docker-helm ...
Waiting for pod gitlab-managed-apps/runner-xrmajzy2-project-15-concurrent-0x2bms to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-xrmajzy2-project-15-concurrent-0x2bms to be running, status is Pending
Running on runner-xrmajzy2-project-15-concurrent-0x2bms via runner-gitlab-runner-6c8555c86b-gjt9f...
Cloning into '/root/authentication_service'...
Cloning repository...
Checking out 5068bf1f as master...
Skipping Git submodules setup
$ echo "Deploy start ...."
Deploy start ....
$ helm init --upgrade --dry-run --debug
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
creationTimestamp: null
labels:
app: helm
name: tiller
name: tiller-deploy
namespace: kube-system
spec:
replicas: 1
strategy: {}
template:
metadata:
creationTimestamp: null
labels:
app: helm
name: tiller
spec:
automountServiceAccountToken: true
containers:
- env:
- name: TILLER_NAMESPACE
value: kube-system
- name: TILLER_HISTORY_MAX
value: "0"
image: gcr.io/kubernetes-helm/tiller:v2.13.0
imagePullPolicy: IfNotPresent
livenessProbe:
httpGet:
path: /liveness
port: 44135
initialDelaySeconds: 1
timeoutSeconds: 1
name: tiller
ports:
- containerPort: 44134
name: tiller
- containerPort: 44135
name: http
readinessProbe:
httpGet:
path: /readiness
port: 44135
initialDelaySeconds: 1
timeoutSeconds: 1
resources: {}
status: {}
---
apiVersion: v1
kind: Service
metadata:
creationTimestamp: null
labels:
app: helm
name: tiller
name: tiller-deploy
namespace: kube-system
spec:
ports:
- name: tiller
port: 44134
targetPort: tiller
selector:
app: helm
name: tiller
type: ClusterIP
status:
loadBalancer: {}
...
$ helm upgrade --install --force authentication-service --recreate-pods --set image.tag=${CI_PIPELINE_ID} ./helm-data
WARNING: Namespace "gitlab-managed-apps" doesn't match with previous. Release will be deployed to default
Release "authentication-service" has been upgraded. Happy Helming!
LAST DEPLOYED: Tue Mar 26 05:27:51 2019
NAMESPACE: default
STATUS: DEPLOYED
RESOURCES:
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
authentication-service 1/1 1 1 17d
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
authentication-service-966c997c4-mglrb 0/1 Pending 0 0s
authentication-service-966c997c4-wzrkj 1/1 Terminating 0 49m
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
authentication-service NodePort 10.108.64.133 <none> 80:31340/TCP 17d
NOTES:
1. Get the application URL by running these commands:
export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[0].nodePort}" services authentication-service)
echo http://$NODE_IP:$NODE_PORT
$ echo "Deploy completed"
Deploy completed
Job succeeded
And the third project fail:
Running with gitlab-runner 11.7.0 (8bb608ff)
on runner-gitlab-runner-6c8555c86b-gjt9f XrmajZY2
Using Kubernetes namespace: gitlab-managed-apps
Using Kubernetes executor with image alpine/helm ...
Waiting for pod gitlab-managed-apps/runner-xrmajzy2-project-18-concurrent-0bv4bx to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-xrmajzy2-project-18-concurrent-0bv4bx to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-xrmajzy2-project-18-concurrent-0bv4bx to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-xrmajzy2-project-18-concurrent-0bv4bx to be running, status is Pending
Running on runner-xrmajzy2-project-18-concurrent-0bv4bx via runner-gitlab-runner-6c8555c86b-gjt9f...
Cloning repository...
Cloning into '/canhnv5/shipmentmobile'...
Checking out 278cbd3d as master...
Skipping Git submodules setup
$ echo "Deploy test start ..."
Deploy test start ...
$ helm init --upgrade
Creating /root/.helm
Creating /root/.helm/repository
Creating /root/.helm/repository/cache
Creating /root/.helm/repository/local
Creating /root/.helm/plugins
Creating /root/.helm/starters
Creating /root/.helm/cache/archive
Creating /root/.helm/repository/repositories.yaml
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /root/.helm.
Error: error installing: deployments.extensions is forbidden: User "system:serviceaccount:shipment-mobile-service:shipment-mobile-service-service-account" cannot create resource "deployments" in API group "extensions" in the namespace "kube-system"
ERROR: Job failed: command terminated with exit code 1
I could see they use the same runner XrmajZY2 that I install in the first project, same k8s namespace gitlab-managed-apps.
I think they use privilege mode but don't know why the second can get the right permission, and the third can not? Should I create user system:serviceaccount:shipment-mobile-service:shipment-mobile-service-service-account and assign to cluster-admin?
Thanks to #cookiedough's instruction. I do these steps:
Fork the canhv5/shipment-mobile-service into my root account root/shipment-mobile-service.
Delete gitlab-managed-apps namespace without anything inside, run kubectl delete -f gitlab-admin-service-account.yaml.
Apply this file then get the token as #cookiedough guide.
Back to root/shipment-mobile-service in Gitlab, Remove previous Cluster. Add Cluster back with new token. Install Helm Tiller then Gitlab Runner in Gitlab UI.
Re run the job then the magic happens. But I still unclear why canhv5/shipment-mobile-service still get the same error.
Before you do the following, delete the gitlab-managed-apps namespace:
kubectl delete namespace gitlab-managed-apps
Reciting from the GitLab tutorial you will need to create a serviceaccount and clusterrolebinding got GitLab, and you will need the secret created as a result to connect your project to your cluster as a result.
Create a file called gitlab-admin-service-account.yaml with contents:
apiVersion: v1
kind: ServiceAccount
metadata:
name: gitlab-admin
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: gitlab-admin
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: gitlab-admin
namespace: kube-system
Apply the service account and cluster role binding to your cluster:
kubectl apply -f gitlab-admin-service-account.yaml
Output:
serviceaccount "gitlab-admin" created
clusterrolebinding "gitlab-admin" created
Retrieve the token for the gitlab-admin service account:
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep gitlab-admin | awk '{print $1}')
Copy the <authentication_token> value from the output:
Name: gitlab-admin-token-b5zv4
Namespace: kube-system
Labels: <none>
Annotations: kubernetes.io/service-account.name=gitlab-admin
kubernetes.io/service-account.uid=bcfe66ac-39be-11e8-97e8-026dce96b6e8
Type: kubernetes.io/service-account-token
Data
====
ca.crt: 1025 bytes
namespace: 11 bytes
token: <authentication_token>
Follow this tutorial to connect your cluster to the project, otherwise you will have to stitch up the same thing along the way with a lot more pain!