Velero - Restore Partially fails for volumes provisioned using CSI driver - kubernetes

As part of a POC, I am trying to back up and restore volumes provisioned by the GKE CSI driver in the same GKE cluster. However, the restore fails and there are no logs to debug.
Steps:
Create volume snapshot class: kubectl create -f vsc.yaml
# vsc.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-gce-vsc
  labels:
    "velero.io/csi-volumesnapshot-class": "true"
driver: pd.csi.storage.gke.io
deletionPolicy: Delete
Create storage class: kubectl create -f sc.yaml
# sc.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-example
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: pd-standard
Create namespace: kubectl create namespace csi-app
Create a persistent volume claim: kubectl create -f pvc.yaml
# pvc.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: podpvc
  namespace: csi-app
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: pd-example
  resources:
    requests:
      storage: 6Gi
Create a pod to consume the pvc: kubectl create -f pod.yaml
# pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: web-server
  namespace: csi-app
spec:
  containers:
  - name: web-server
    image: nginx
    volumeMounts:
    - mountPath: /var/lib/www/html
      name: mypvc
  volumes:
  - name: mypvc
    persistentVolumeClaim:
      claimName: podpvc
      readOnly: false
Once the PVC was bound, I created the Velero backup.
velero backup create test --include-resources=pvc,pv --include-namespaces=csi-app --wait
Output:
Backup request "test" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background.
...
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe test` and `velero backup logs test`.
velero backup describe test
Name:         test
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.21.5-gke.1302
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=21
Phase:  Completed
Errors:    0
Warnings:  1
Namespaces:
  Included:  csi-app
  Excluded:  <none>
Resources:
  Included:        pvc, pv
  Excluded:        <none>
  Cluster-scoped:  auto
Label selector:  <none>
Storage Location:  default
Velero-Native Snapshot PVs:  auto
TTL:  720h0m0s
Hooks:  <none>
Backup Format Version:  1.1.0
Started:    2021-12-22 15:40:08 +0300 +03
Completed:  2021-12-22 15:40:10 +0300 +03
Expiration:  2022-01-21 15:40:08 +0300 +03
Total items to be backed up:  2
Items backed up:              2
Velero-Native Snapshots: <none included>
After the backup completed, I verified that it was available in my GCS bucket.
Delete all the existing resources to test restore.
kubectl delete -f pod.yaml
kubectl delete -f pvc.yaml
kubectl delete -f sc.yaml
kubectl delete namespace csi-app
Run restore command:
velero restore create --from-backup test --wait
Output:
Restore request "test-20211222154302" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
.
Restore completed with status: PartiallyFailed. You may check for more information using the commands `velero restore describe test-20211222154302` and `velero restore logs test-20211222154302`.
Neither velero restore describe nor velero restore logs returns any description or logs.
What did you expect to happen:
I was expecting the PV, the PVC, and the namespace to be restored.
The following information will help us better understand what's going on:
The velero debug --backup test --restore test-20211222154302 command got stuck for more than 10 minutes and I couldn't generate the support bundle.
Output:
2021/12/22 15:45:16 Collecting velero resources in namespace: velero
2021/12/22 15:45:24 Collecting velero deployment logs in namespace: velero
2021/12/22 15:45:28 Collecting log and information for backup: test
Environment:
Velero version (use velero version):
Client:
Version: v1.7.1
Git commit: -
Server:
Version: v1.7.1
Velero features (use velero client config get features):
features:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:33:37Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5-gke.1302", GitCommit:"639f3a74abf258418493e9b75f2f98a08da29733", GitTreeState:"clean", BuildDate:"2021-10-21T21:35:48Z", GoVersion:"go1.16.7b7", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes installer & version:
GKE 1.21.5-gke.1302
Cloud provider or hardware configuration:
GCP
OS (e.g. from /etc/os-release):
GCP Container-Optimized OS (COS)

You should be able to check the logs as mentioned:
Restore completed with status: PartiallyFailed. You may check for more information using the commands `velero restore describe test-20211222154302` and `velero restore logs test-20211222154302`.
Neither velero restore describe nor velero restore logs returns any description or logs.
The latter is available after the restore completes; check it for errors and it should show what went wrong.
Since you were doing a PV/PVC backup with CSI, you should have Velero set up to support it:
https://kubernetes-csi.github.io/docs/snapshot-restore-feature.html
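For reference, a rough sketch of what enabling CSI support at install time looks like (the flags are from the velero install command; the plugin versions, bucket, and credentials file below are placeholders rather than your actual values):
velero install \
  --provider gcp \
  --plugins velero/velero-plugin-for-gcp:v1.3.0,velero/velero-plugin-for-csi:v0.2.0 \
  --features=EnableCSI \
  --bucket <your-bucket> \
  --secret-file <your-gcp-credentials-file>
The client side needs the feature flag as well:
velero client config set features=EnableCSI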
Depending on the plugin version you used, it might have been a bug, like:
https://github.com/vmware-tanzu/velero-plugin-for-csi/pull/122
This should be fixed in the latest release, 0.3.2, for example:
https://github.com/vmware-tanzu/velero-plugin-for-csi/releases/tag/v0.3.2
So start with:
velero restore logs test-20211222154302
and go from there. Update the question with your findings and, if you resolved it, with the resolution. Thank you.

Related

How to achieve Automatic Rollback in Kubernetes?

Let's say I have a Deployment. For some reason it stops responding after some time. Is there any way to tell Kubernetes to roll back to the previous version automatically on failure?
You mentioned that:
I have a Deployment. For some reason it stops responding after some time.
In this case, you can use liveness and readiness probes:
The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
The above probes may prevent you from deploying a corrupted version; however, liveness and readiness probes aren't able to roll back your Deployment to the previous version. There was a similar issue on GitHub, but I am not sure there will be any progress on this matter in the near future.
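For completeness, a minimal sketch of such probes on an nginx container (the path, port, and timings are illustrative only, not tuned values):
apiVersion: v1
kind: Pod
metadata:
  name: probes-example
spec:
  containers:
  - name: web
    image: nginx
    livenessProbe:          # the kubelet restarts the container when this check fails
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:         # the Pod is removed from Service endpoints while this check fails
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10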
If you really want to automate the rollback process, below I will describe a solution that you may find helpful.
This solution requires running kubectl commands from within the Pod.
In short, you can use a script to continuously monitor your Deployments, and when errors occur you can run kubectl rollout undo deployment DEPLOYMENT_NAME.
First, you need to decide how to find failed Deployments. As an example, I'll flag Deployments whose rollout has been in progress for more than 10 seconds, using the following command:
NOTE: You can use a different command depending on your need.
kubectl rollout status deployment ${deployment} --timeout=10s
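As a sketch of one possible alternative (the .status.unavailableReplicas field is part of the Deployment status and is omitted when all replicas are available):
# non-empty output means some replicas are currently unavailable
kubectl get deployment ${deployment} -o jsonpath='{.status.unavailableReplicas}'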
To constantly monitor all Deployments in the default Namespace, we can create a Bash script:
#!/bin/bash
while true; do
  sleep 60
  deployments=$(kubectl get deployments --no-headers -o custom-columns=":metadata.name" | grep -v "deployment-checker")
  echo "====== $(date) ======"
  for deployment in ${deployments}; do
    if ! kubectl rollout status deployment ${deployment} --timeout=10s 1>/dev/null 2>&1; then
      echo "Error: ${deployment} - rolling back!"
      kubectl rollout undo deployment ${deployment}
    else
      echo "Ok: ${deployment}"
    fi
  done
done
We want to run this script from inside the Pod, so I converted it to a ConfigMap, which allows us to mount the script in a volume (see: Using ConfigMaps as files from a Pod):
$ cat check-script-configmap.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: check-script
data:
  checkScript.sh: |
    #!/bin/bash
    while true; do
      sleep 60
      deployments=$(kubectl get deployments --no-headers -o custom-columns=":metadata.name" | grep -v "deployment-checker")
      echo "====== $(date) ======"
      for deployment in ${deployments}; do
        if ! kubectl rollout status deployment ${deployment} --timeout=10s 1>/dev/null 2>&1; then
          echo "Error: ${deployment} - rolling back!"
          kubectl rollout undo deployment ${deployment}
        else
          echo "Ok: ${deployment}"
        fi
      done
    done
$ kubectl apply -f check-script-configmap.yml
configmap/check-script created
I've created a separate deployment-checker ServiceAccount with the edit ClusterRole bound to it, and our Pod will run under this ServiceAccount:
NOTE: I've created a Deployment instead of a single Pod.
$ cat all-in-one.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-checker
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: deployment-checker-binding
subjects:
- kind: ServiceAccount
  name: deployment-checker
  namespace: default
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: deployment-checker
  name: deployment-checker
spec:
  selector:
    matchLabels:
      app: deployment-checker
  template:
    metadata:
      labels:
        app: deployment-checker
    spec:
      serviceAccountName: deployment-checker
      volumes:
      - name: check-script
        configMap:
          name: check-script
      containers:
      - image: bitnami/kubectl
        name: test
        command: ["bash", "/mnt/checkScript.sh"]
        volumeMounts:
        - name: check-script
          mountPath: /mnt
After applying the above manifest, the deployment-checker Deployment was created and started monitoring Deployment resources in the default Namespace:
$ kubectl apply -f all-in-one.yaml
serviceaccount/deployment-checker created
clusterrolebinding.rbac.authorization.k8s.io/deployment-checker-binding created
deployment.apps/deployment-checker created
$ kubectl get deploy,pod | grep "deployment-checker"
deployment.apps/deployment-checker 1/1 1
pod/deployment-checker-69c8896676-pqg9h 1/1 Running
Finally, we can check how it works. I've created three Deployments (app-1, app-2, app-3):
$ kubectl create deploy app-1 --image=nginx
deployment.apps/app-1 created
$ kubectl create deploy app-2 --image=nginx
deployment.apps/app-2 created
$ kubectl create deploy app-3 --image=nginx
deployment.apps/app-3 created
Then I changed the image for app-1 to a non-existent one (nnnginx):
$ kubectl set image deployment/app-1 nginx=nnnginx
deployment.apps/app-1 image updated
In the deployment-checker logs, we can see that app-1 has been rolled back to the previous version:
$ kubectl logs -f deployment-checker-69c8896676-pqg9h
...
====== Thu Oct 7 09:20:15 UTC 2021 ======
Ok: app-1
Ok: app-2
Ok: app-3
====== Thu Oct 7 09:21:16 UTC 2021 ======
Error: app-1 - rolling back!
deployment.apps/app-1 rolled back
Ok: app-2
Ok: app-3
I stumbled upon Argo Rollouts, which addresses this missing automatic rollback and many other deployment-related concerns.
https://argoproj.github.io/argo-rollouts/
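For orientation only, here is a minimal hedged sketch of a Rollout manifest with a canary strategy (field names follow the Argo Rollouts documentation; it is not part of the answer above):
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: app-1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-1
  template:
    metadata:
      labels:
        app: app-1
    spec:
      containers:
      - name: nginx
        image: nginx
  strategy:
    canary:
      steps:
      - setWeight: 20   # shift 20% of the replicas/traffic to the new version
      - pause: {}       # pause until promoted, e.g. with: kubectl argo rollouts promote app-1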

Kubernetes: Cannot deploy a simple "Couchbase" service

I am new to Kubernetes. I am trying to mimic a behavior a bit like what I do with docker-compose when I serve a Couchbase database in a Docker container.
couchbase:
  image: couchbase
  volumes:
    - ./couchbase:/opt/couchbase/var
  ports:
    - 8091-8096:8091-8096
    - 11210-11211:11210-11211
I managed to create a cluster on my localhost using a tool called "kind":
kind create cluster --name my-cluster
kubectl config use-context my-cluster
Then I tried to use that cluster to deploy a Couchbase service.
I created a file named couchbase.yaml with the following content (I am trying to mimic what I do with my docker-compose file).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: couchbase
  namespace: my-project
  labels:
    platform: couchbase
spec:
  replicas: 1
  selector:
    matchLabels:
      platform: couchbase
  template:
    metadata:
      labels:
        platform: couchbase
    spec:
      volumes:
      - name: couchbase-data
        hostPath:
          # directory location on host
          path: /home/me/my-project/couchbase
          # this field is optional
          type: Directory
      containers:
      - name: couchbase
        image: couchbase
        volumeMounts:
        - mountPath: /opt/couchbase/var
          name: couchbase-data
Then I started the deployment like this:
kubectl create namespace my-project
kubectl apply -f couchbase.yaml
kubectl expose deployment -n my-project couchbase --type=LoadBalancer --port=8091
However, my deployment never actually starts:
kubectl get deployments -n my-project couchbase
NAME READY UP-TO-DATE AVAILABLE AGE
couchbase 0/1 1 0 6m14s
And when I look at the logs I see this:
kubectl logs -n my-project -lplatform=couchbase --all-containers=true
Error from server (BadRequest): container "couchbase" in pod "couchbase-589f7fc4c7-th2r2" is waiting to start: ContainerCreating
As the OP mentioned in a comment, the issue was solved using an extra mount, as explained in the documentation: https://kind.sigs.k8s.io/docs/user/configuration/#extra-mounts
Here is the OP's comment, formatted so it's more readable:
the error shows up when I run this command:
kubectl describe pods -n my-project couchbase
I could fix it by creating a new kind cluster:
kind create cluster --config cluster.yaml
Passing this content in cluster.yaml:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: inf
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /home/me/my-project/couchbase
    containerPath: /couchbase
In couchbase.yaml the path becomes path: /couchbase of course.
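With that kind configuration, the hostPath volume in couchbase.yaml would then look roughly like this:
volumes:
  - name: couchbase-data
    hostPath:
      # the path as seen inside the kind node, mapped via extraMounts
      path: /couchbase
      type: Directory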

Longhorn backup/snapshot hooks

How do I configure Longhorn backups so that some bash scripts are executed in the pod before and after a snapshot/backup is taken?
Something similar to Velero's backup hooks.
annotations:
  backup.velero.io/backup-volumes: data
  pre.hook.backup.velero.io/command: "['/usr/bin/mysql', '-e', '\"flush tables with read lock;\"']"
  pre.hook.backup.velero.io/container: mysql
  post.hook.backup.velero.io/command: "['/usr/bin/mysql', '-e', '\"unlock tables;\"']"
  post.hook.backup.velero.io/container: mysql
Apparently this is not possible at the moment, according to the Longhorn GitHub issue.
You can orchestrate similar behaviour yourself by using a VolumeSnapshot:
kubectl exec mypod-id -- app_freeze
kubectl apply -f volumesnapshot.yaml
kubectl exec mypod-id -- app_thaw
Where volumesnapshot.yaml is:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-longhorn-snapshot
spec:
  volumeSnapshotClassName: longhorn
  source:
    persistentVolumeClaimName: my-longhorn-pvc
See example for IRIS database: https://community.intersystems.com/post/amazon-eks-and-iris-high-availability-and-backup
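As a purely illustrative stand-in for the app_freeze/app_thaw placeholders above, a filesystem-level freeze with fsfreeze could look like this (it assumes the volume is mounted at /data and that the container has the privileges to freeze it; ideally wait for the VolumeSnapshot to become ready before thawing):
# suspend writes to the mounted filesystem (requires a sufficiently privileged container)
kubectl exec mypod-id -- fsfreeze --freeze /data
kubectl apply -f volumesnapshot.yaml
# resume writes once the snapshot has been cut
kubectl exec mypod-id -- fsfreeze --unfreeze /data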

PersistentVolumeClaim not being restored using velero

I have the following values set in my Velero configuration, which was installed using Helm.
schedules:
  my-schedule:
    schedule: "5 * * * *"
    template:
      includeClusterResources: true
      includedNamespaces:
      - jenkins
      includedResources:
      - 'pvcs'
      storageLocation: backups
      snapshotVolumes: true
      ttl: 24h0m0s
I had a PVC (and an underlying PV that had been dynamically provisioned) which I manually deleted (along with the PV).
I then performed a Velero restore (pointing to a backup taken prior to the PV/PVC deletion, of course) as in:
velero restore create --from-backup velero-hourly-backup-20201119140005 --include-resources persistentvolumeclaims -n extra-services
extra-services is the namespace where Velero is deployed, by the way.
Although the logs indicate the restore was successful:
▶ velero restore logs velero-hourly-backup-20201119140005-20201119183805 -n extra-services
time="2020-11-19T16:38:06Z" level=info msg="starting restore" logSource="pkg/controller/restore_controller.go:467" restore=extra-services/velero-hourly-backup-20201119140005-20201119183805
time="2020-11-19T16:38:06Z" level=info msg="Starting restore of backup extra-services/velero-hourly-backup-20201119140005" logSource="pkg/restore/restore.go:363" restore=extra-services/velero-hourly-backup-20201119140005-20201119183805
time="2020-11-19T16:38:06Z" level=info msg="restore completed" logSource="pkg/controller/restore_controller.go:482" restore=extra-services/velero-hourly-backup-20201119140005-20201119183805
I see the following error in the restore description:
Name:         velero-hourly-backup-20201119140005-20201119183805
Namespace:    extra-services
Labels:       <none>
Annotations:  <none>
Phase:  PartiallyFailed (run 'velero restore logs velero-hourly-backup-20201119140005-20201119183805' for more information)
Started:    2020-11-19 18:38:05 +0200 EET
Completed:  2020-11-19 18:38:07 +0200 EET
Errors:
  Velero:      error parsing backup contents: directory "resources" does not exist
  Cluster:     <none>
  Namespaces:  <none>
Backup:  velero-hourly-backup-20201119140005
Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>
Resources:
  Included:        persistentvolumeclaims
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto
Namespace mappings:  <none>
Label selector:  <none>
Restore PVs:  auto
Any ideas?
Does this have to do with me deleting the PV/PVC? (After all, I was trying to simulate a disaster scenario.)
I have both backupsEnabled and snapshotsEnabled set to true.

helm init Error: error installing: deployments.extensions is forbidden when run inside gitlab runner

I have GitLab (11.8.1, self-hosted) connected to a self-hosted K8s cluster (1.13.4). There are 3 projects in GitLab, named shipment, authentication_service and shipment_mobile_service.
All projects use the same K8s configuration except for the project namespace.
The first project succeeds when installing Helm Tiller and GitLab Runner from the GitLab UI.
The second and third projects install Helm Tiller successfully, but GitLab Runner fails with the following log in the install-runner pod:
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Error: cannot connect to Tiller
+ sleep 1s
+ echo 'Retrying (30)...'
+ helm repo add runner https://charts.gitlab.io
Retrying (30)...
"runner" has been added to your repositories
+ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "runner" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈
+ helm upgrade runner runner/gitlab-runner --install --reset-values --tls --tls-ca-cert /data/helm/runner/config/ca.pem --tls-cert /data/helm/runner/config/cert.pem --tls-key /data/helm/runner/config/key.pem --version 0.2.0 --set 'rbac.create=true,rbac.enabled=true' --namespace gitlab-managed-apps -f /data/helm/runner/config/values.yaml
Error: UPGRADE FAILED: remote error: tls: bad certificate
I didn't configure gitlab-ci with the K8s cluster on the first project, only on the second and third. The weird thing is that with the same helm-data (differing only by name), the second project succeeds but the third does not.
And because there is only one GitLab runner available (from the first project), I assigned both the 2nd and 3rd projects to this runner.
I use this gitlab-ci.yml for both projects, with only the name in the helm upgrade command differing.
stages:
  - test
  - build
  - deploy

variables:
  CONTAINER_IMAGE: dockerhub.linhnh.vn/${CI_PROJECT_PATH}:${CI_PIPELINE_ID}
  CONTAINER_IMAGE_LATEST: dockerhub.linhnh.vn/${CI_PROJECT_PATH}:latest
  CI_REGISTRY: dockerhub.linhnh.vn
  DOCKER_DRIVER: overlay2
  DOCKER_HOST: tcp://localhost:2375 # required when use dind

# test phase and build phase using docker:dind success
deploy_beta:
  stage: deploy
  image: alpine/helm
  script:
    - echo "Deploy test start ..."
    - helm init --upgrade
    - helm upgrade --install --force shipment-mobile-service --recreate-pods --set image.tag=${CI_PIPELINE_ID} ./helm-data
    - echo "Deploy test completed!"
  environment:
    name: staging
  tags: ["kubernetes_beta"]
  only:
    - master
The helm-data is very simple, so I don't think I really need to paste it here.
Here is the log when the second project deploys successfully:
Running with gitlab-runner 11.7.0 (8bb608ff)
on runner-gitlab-runner-6c8555c86b-gjt9f XrmajZY2
Using Kubernetes namespace: gitlab-managed-apps
Using Kubernetes executor with image linkyard/docker-helm ...
Waiting for pod gitlab-managed-apps/runner-xrmajzy2-project-15-concurrent-0x2bms to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-xrmajzy2-project-15-concurrent-0x2bms to be running, status is Pending
Running on runner-xrmajzy2-project-15-concurrent-0x2bms via runner-gitlab-runner-6c8555c86b-gjt9f...
Cloning into '/root/authentication_service'...
Cloning repository...
Checking out 5068bf1f as master...
Skipping Git submodules setup
$ echo "Deploy start ...."
Deploy start ....
$ helm init --upgrade --dry-run --debug
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: helm
    name: tiller
  name: tiller-deploy
  namespace: kube-system
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: helm
        name: tiller
    spec:
      automountServiceAccountToken: true
      containers:
      - env:
        - name: TILLER_NAMESPACE
          value: kube-system
        - name: TILLER_HISTORY_MAX
          value: "0"
        image: gcr.io/kubernetes-helm/tiller:v2.13.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /liveness
            port: 44135
          initialDelaySeconds: 1
          timeoutSeconds: 1
        name: tiller
        ports:
        - containerPort: 44134
          name: tiller
        - containerPort: 44135
          name: http
        readinessProbe:
          httpGet:
            path: /readiness
            port: 44135
          initialDelaySeconds: 1
          timeoutSeconds: 1
        resources: {}
status: {}
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    app: helm
    name: tiller
  name: tiller-deploy
  namespace: kube-system
spec:
  ports:
  - name: tiller
    port: 44134
    targetPort: tiller
  selector:
    app: helm
    name: tiller
  type: ClusterIP
status:
  loadBalancer: {}
...
$ helm upgrade --install --force authentication-service --recreate-pods --set image.tag=${CI_PIPELINE_ID} ./helm-data
WARNING: Namespace "gitlab-managed-apps" doesn't match with previous. Release will be deployed to default
Release "authentication-service" has been upgraded. Happy Helming!
LAST DEPLOYED: Tue Mar 26 05:27:51 2019
NAMESPACE: default
STATUS: DEPLOYED
RESOURCES:
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
authentication-service 1/1 1 1 17d
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
authentication-service-966c997c4-mglrb 0/1 Pending 0 0s
authentication-service-966c997c4-wzrkj 1/1 Terminating 0 49m
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
authentication-service NodePort 10.108.64.133 <none> 80:31340/TCP 17d
NOTES:
1. Get the application URL by running these commands:
export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[0].nodePort}" services authentication-service)
echo http://$NODE_IP:$NODE_PORT
$ echo "Deploy completed"
Deploy completed
Job succeeded
And the third project fails:
Running with gitlab-runner 11.7.0 (8bb608ff)
on runner-gitlab-runner-6c8555c86b-gjt9f XrmajZY2
Using Kubernetes namespace: gitlab-managed-apps
Using Kubernetes executor with image alpine/helm ...
Waiting for pod gitlab-managed-apps/runner-xrmajzy2-project-18-concurrent-0bv4bx to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-xrmajzy2-project-18-concurrent-0bv4bx to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-xrmajzy2-project-18-concurrent-0bv4bx to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-xrmajzy2-project-18-concurrent-0bv4bx to be running, status is Pending
Running on runner-xrmajzy2-project-18-concurrent-0bv4bx via runner-gitlab-runner-6c8555c86b-gjt9f...
Cloning repository...
Cloning into '/canhnv5/shipmentmobile'...
Checking out 278cbd3d as master...
Skipping Git submodules setup
$ echo "Deploy test start ..."
Deploy test start ...
$ helm init --upgrade
Creating /root/.helm
Creating /root/.helm/repository
Creating /root/.helm/repository/cache
Creating /root/.helm/repository/local
Creating /root/.helm/plugins
Creating /root/.helm/starters
Creating /root/.helm/cache/archive
Creating /root/.helm/repository/repositories.yaml
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /root/.helm.
Error: error installing: deployments.extensions is forbidden: User "system:serviceaccount:shipment-mobile-service:shipment-mobile-service-service-account" cannot create resource "deployments" in API group "extensions" in the namespace "kube-system"
ERROR: Job failed: command terminated with exit code 1
I can see they use the same runner XrmajZY2 that I installed in the first project, and the same K8s namespace gitlab-managed-apps.
I think they use privileged mode, but I don't know why the second project gets the right permissions while the third does not. Should I create the user system:serviceaccount:shipment-mobile-service:shipment-mobile-service-service-account and bind it to cluster-admin?
Thanks to #cookiedough's instructions, I did these steps:
Forked canhv5/shipment-mobile-service into my root account as root/shipment-mobile-service.
Deleted the gitlab-managed-apps namespace (which had nothing inside) and ran kubectl delete -f gitlab-admin-service-account.yaml.
Applied that file again, then got the token as #cookiedough described.
Went back to root/shipment-mobile-service in GitLab, removed the previous cluster, added the cluster back with the new token, and installed Helm Tiller then GitLab Runner from the GitLab UI.
Re-ran the job and the magic happened. But I'm still unclear why canhv5/shipment-mobile-service still gets the same error.
Before you do the following, delete the gitlab-managed-apps namespace:
kubectl delete namespace gitlab-managed-apps
Citing from the GitLab tutorial: you will need to create a ServiceAccount and ClusterRoleBinding for GitLab, and you will need the secret created as a result to connect your project to your cluster.
Create a file called gitlab-admin-service-account.yaml with contents:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-admin
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: gitlab-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: gitlab-admin
  namespace: kube-system
Apply the service account and cluster role binding to your cluster:
kubectl apply -f gitlab-admin-service-account.yaml
Output:
serviceaccount "gitlab-admin" created
clusterrolebinding "gitlab-admin" created
Retrieve the token for the gitlab-admin service account:
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep gitlab-admin | awk '{print $1}')
Copy the <authentication_token> value from the output:
Name:         gitlab-admin-token-b5zv4
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name=gitlab-admin
              kubernetes.io/service-account.uid=bcfe66ac-39be-11e8-97e8-026dce96b6e8
Type:  kubernetes.io/service-account-token
Data
====
ca.crt:     1025 bytes
namespace:  11 bytes
token:      <authentication_token>
Follow this tutorial to connect your cluster to the project, otherwise you will have to stitch up the same thing along the way with a lot more pain!