velero with velero-plugin-for-aws backup jobs failed - kubernetes

I am running a k3s cluster.
I installed Velero twice with Helm:
helm install vmware-tanzu/velero --namespace velero-minio -f helm-custom-values-minio.yaml --generate-name --create-namespace
and
helm install vmware-tanzu/velero --namespace velero-aws -f helm-custom-values-aws.yaml --generate-name --create-namespace
Custom helm values:
helm-custom-values-minio.yaml
configuration:
  provider: aws
  backupStorageLocation:
    bucket: k3s-backup
    name: minio
    default: false
    config:
      region: minio
      s3ForcePathStyle: true
      s3Url: http://10.10.5.15:9009
  volumeSnapshotLocation:
    name: minio
    config:
      region: minio
credentials:
  secretContents:
    cloud: |
      [default]
      aws_access_key_id=minioadm
      aws_secret_access_key=<password>
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:latest
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
snapshotsEnabled: true
deployRestic: true
and helm-custom-values-aws.yaml
configuration:
  provider: aws
  backupStorageLocation:
    name: aws-s3
    bucket: k3s-backup-aws
    default: false
    provider: aws
    config:
      region: us-east-1
      s3ForcePathStyle: false
  volumeSnapshotLocation:
    name: aws-s3
    provider: aws
    config:
      region: us-east-1
credentials:
  secretContents:
    cloud: |
      [default]
      aws_access_key_id=A..............MJ
      aws_secret_access_key=qZ79rA/yVUq2c................xnIA
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:latest
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
snapshotsEnabled: true
deployRestic: true
velero backup jobs:
velero create backup k3s-mongodb-restic-minio --include-namespaces mongodb --default-volumes-to-restic=true --storage-location minio -n velero-minio
velero create backup k3s-mongodb-restic-aws --include-namespaces mongodb --default-volumes-to-restic=true --storage-location aws-s3 -n velero-aws
....
They all failed:
Restic Backups:
Failed:
mongodb/mongodb-cluster-0: agent-scripts, data-volume, healthstatus, hooks, logs-volume, mongodb-cluster-keyfile, tmp
mongodb/mongodb-cluster-1: agent-scripts, data-volume, healthstatus, hooks, logs-volume, mongodb-cluster-keyfile, tmp
time="2022-10-17T17:42:32Z" level=error msg="Error backing up item" backup=velero-minio/k3s-mongodb-restic-minio error="pod volume backup failed: running Restic backup, stderr=Fatal: unable to open config file: Stat: The Access Key Id you provided does not exist in our records.\nIs there a repository at the following location?\ns3:http://10.10.5.15:9009/k3s-backup/restic/mongodb\n: exit status 1" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:199" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:417" name=mongodb-cluster-0
...
velero get backup-locations -n velero-aws
NAME     PROVIDER   BUCKET/PREFIX    PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
aws-s3   aws        k3s-backup-aws   Available   2022-10-17 14:12:46 -0400 EDT   ReadWrite
...
velero get backup-locations -n velero-minio
NAME    PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
minio   aws        k3s-backup      Available   2022-10-17 14:16:25 -0400 EDT   ReadWrite
The Velero backup itself completes without errors, but the restic part fails for all my jobs (mongodb is just one example).
It looks like restic can't create snapshots for my NFS PVCs.
What am I doing wrong?

It looks like Velero doesn't work with multiple installations in one cluster; at least the restic part fails (in my case, two instances in the namespaces velero-aws and velero-minio).
So I installed only one instance of Velero, configured for MinIO.
I removed --default-volumes-to-restic=true from the backup command.
Instead, I used the opt-in approach to pod volume backup with the restic integration.
Each pod that has PVC volumes needs to be annotated, like this:
kubectl -n mongodb annotate pod/mongodb-cluster-0 backup.velero.io/backup-volumes=logs-volume,data-volume
I have not tried velero-pvc-watcher; it probably works well.
Now the backup completes with no errors.
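When several pods need the opt-in annotation, the kubectl commands can be generated instead of typed by hand. Here is a minimal Python sketch of that idea. The helper name annotate_command and the pod-to-volume mapping are hypothetical and only illustrate the annotation format used above; the script prints the commands rather than running them.

```python
import shlex

# Annotation key used by Velero's opt-in pod volume backup (taken from the command above).
ANNOTATION_KEY = "backup.velero.io/backup-volumes"


def annotate_command(namespace, pod, volumes):
    """Return the kubectl command that opts the listed pod volumes into restic backup."""
    if not volumes:
        raise ValueError("at least one volume name is required")
    annotation = f"{ANNOTATION_KEY}={','.join(volumes)}"
    args = ["kubectl", "-n", namespace, "annotate", f"pod/{pod}", annotation]
    # Quote each argument so the printed line is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical mapping of pods to the PVC-backed volumes worth backing up.
PODS = {
    "mongodb-cluster-0": ["logs-volume", "data-volume"],
    "mongodb-cluster-1": ["logs-volume", "data-volume"],
}

if __name__ == "__main__":
    for pod, volumes in PODS.items():
        print(annotate_command("mongodb", pod, volumes))
```

Listing only the PVC-backed volumes keeps helper volumes such as tmp or hooks out of the restic run.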

Skaffold deploys an extra pod when deploying with helm

I have a skaffold build that creates 2 docker images: myimg1 and myimg2.
When I try to deploy via helm, skaffold deploys them fine, then tries to deploy an additional pod for myimg1. I can see that in the minikube dashboard. At first I see myimg1:<tag>, but after a few seconds that changes to myimg1-0.
deploy:
  kubectl:
    manifests:
      - k8s/pgadmin.yaml
  kubeContext: minikube
  helm:
    releases:
      - name: myrelease
        chartPath: charts/mychart
        artifactOverrides:
          myimg1.container.image: myimg1
      - name: myrelease
        chartPath: charts/mychart
        artifactOverrides:
          myimg2.container.image: myimg2
values.yaml looks like this:
myimg1:
  name: myimg1
  container:
    image: jfrog.host.com/docker_repo/myimg1:latest
    pullPolicy: IfNotPresent
    port: 5432
  service:
    type: ClusterIP
    port: 5432
myimg2:
  name: myimag2
  container:
    image: jfrog.host.com/docker_repo/myimg1:latest
    pullPolicy: IfNotPresent
    port: 8080
  service:
    type: ClusterIP
    port: 8080
    portName: http
Now, if I run helm manually and override the image tags via command line arguments, it deploys fine:
helm install --atomic --debug myrelease ./charts/mychart/ -f ./charts/mychart/values.yaml --set-string myimg1.container.image=myimg1:latest --set-string myimg2.container.image=myimg2:latest
When I run skaffold dev, however, both containers are deployed, then the second image is deployed again and it tries to pull from the remote registry.
As far as I can tell, everything is set up correctly.
Is there any way to debug this?
I see the following log message:
time="2022-10-07T10:40:50-04:00" level=info msg="Streaming logs from pod: myimg1-0 container: myimg1" subtask=-1 task=DevLoop
time="2022-10-07T10:40:50-04:00" level=debug msg="Running command: [kubectl --context minikube logs --since=7s -f myimg1-0 -c myrelease --namespace default]" subtask=-1 task=DevLoop

Process json logs with Grafana/loki

I have set up Grafana, Prometheus and loki (2.6.1) as follows on my kubernetes (1.21) cluster:
helm upgrade --install promtail grafana/promtail -n monitoring -f monitoring/promtail.yaml
helm upgrade --install prom prometheus-community/kube-prometheus-stack -n monitoring --values monitoring/prom.yaml
helm upgrade --install loki grafana/loki -n monitoring --values monitoring/loki.yaml
with:
# monitoring/loki.yaml
loki:
  schemaConfig:
    configs:
      - from: 2020-09-07
        store: boltdb-shipper
        object_store: s3
        schema: v11
        index:
          prefix: loki_index_
          period: 24h
  storageConfig:
    aws:
      s3: s3://eu-west-3/cluster-loki-logs
    boltdb_shipper:
      shared_store: filesystem
      active_index_directory: /var/loki/index
      cache_location: /var/loki/cache
      cache_ttl: 168h
# monitoring/promtail.yaml
config:
  serverPort: 80
  clients:
    - url: http://loki:3100/loki/api/v1/push
# monitoring/prom.yaml
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelector: {}
    serviceMonitorNamespaceSelector:
      matchLabels:
        monitored: "true"
grafana:
  sidecar:
    datasources:
      defaultDatasourceEnabled: true
  additionalDataSources:
    - name: Loki
      type: loki
      url: http://loki.monitoring:3100
I get data from my containers, but, whenever I have a container logging in json format, I can't get access to the nested fields:
{app="product", namespace="api-dev"} | unpack | json
Yields:
My aim is, for example, to filter by log.severity
Actually, following this answer, it turned out to be a Promtail scraping issue.
The current Helm chart (promtail-6.3.1 / 2.6.1) defaults to cri as the pipeline stage, which expects logs in this format:
"2019-04-30T02:12:41.8443515Z stdout xx message"
I should have used docker, which expects JSON. Consequently, my promtail.yaml changed to:
config:
  serverPort: 80
  clients:
    - url: http://loki:3100/loki/api/v1/push
  snippets:
    pipelineStages:
      - docker: {}
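To see why the stage choice matters, here is a simplified Python sketch of the two line formats. It is not Promtail's implementation. The function names and the sample application log are hypothetical, and it assumes the Docker json-file layout with log, stream and time keys.

```python
import json


def parse_cri(line):
    """Split a CRI-style line: '<timestamp> <stream> <flags> <message>'."""
    timestamp, stream, flags, message = line.split(" ", 3)
    return {"time": timestamp, "stream": stream, "flags": flags, "log": message}


def parse_docker(line):
    """Decode a Docker json-file line: {"log": ..., "stream": ..., "time": ...}."""
    entry = json.loads(line)
    return {"time": entry["time"], "stream": entry["stream"], "log": entry["log"]}


def nested_field(log_text, field):
    """Read one field from an application log line that is itself JSON."""
    return json.loads(log_text)[field]


if __name__ == "__main__":
    # Hypothetical application log line with a severity field.
    app_log = json.dumps({"severity": "error", "msg": "boom"})
    docker_line = json.dumps(
        {"log": app_log + "\n", "stream": "stdout", "time": "2019-04-30T02:12:41.8443515Z"}
    )
    unwrapped = parse_docker(docker_line)
    print(nested_field(unwrapped["log"], "severity"))
    print(parse_cri("2019-04-30T02:12:41.8443515Z stdout F plain message")["log"])
```

Only once the outer wrapper has been removed by the matching stage does the application's own JSON become the log line, which is what a later json parser in the query needs in order to reach nested fields such as severity.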

Kubernetes Pod permission denied on local volume

I have created a pod on Kubernetes and mounted a local volume, but when I try to run ls on the mounted volume, I get a permission denied error. If I disable SELinux, everything works fine. I can't figure out how to make it work with SELinux enabled.
Following is the output of permission denied:
kubectl apply -f testpod.yaml
[root@olcne-operator-ol8 opc]# kubectl get all
NAME          READY   STATUS    RESTARTS   AGE
pod/testpod   1/1     Running   0          5s
# kubectl exec -i -t testpod /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[root@testpod /]# cd /u01
[root@testpod u01]# ls
ls: cannot open directory '.': Permission denied
[root@testpod u01]#
Following is the testpod.yaml
cat testpod.yaml
kind: Pod
apiVersion: v1
metadata:
  name: testpod
  labels:
    name: testpod
spec:
  hostname: testpod
  restartPolicy: Never
  volumes:
    - name: swvol
      hostPath:
        path: /u01
  containers:
    - name: testpod
      image: oraclelinux:8
      imagePullPolicy: Always
      securityContext:
        privileged: false
      command: [/usr/sbin/init]
      volumeMounts:
        - mountPath: "/u01"
          name: swvol
Selinux Configuration on worker node:
# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Memory protection checking: actual (secure)
Max kernel policy version: 31
---
# semanage fcontext -l | grep kub | grep container_file
/var/lib/kubelet/pods(/.*)? all files system_u:object_r:container_file_t:s0
/var/lib/kubernetes/pods(/.*)? all files system_u:object_r:container_file_t:s0
Machine OS Details
rpm -qa | grep kube
kubectl-1.20.6-2.el8.x86_64
kubernetes-cni-0.8.1-1.el8.x86_64
kubeadm-1.20.6-2.el8.x86_64
kubelet-1.20.6-2.el8.x86_64
kubernetes-cni-plugins-0.9.1-1.el8.x86_64
----
cat /etc/oracle-release
Oracle Linux Server release 8.4
---
uname -r
5.4.17-2102.203.6.el8uek.x86_64
This is a community wiki answer posted for better visibility. Feel free to expand it.
SELinux labels can be assigned with seLinuxOptions:
apiVersion: v1
kind: Pod
metadata:
  name: testpod
  labels:
    name: testpod
spec:
  hostname: testpod
  restartPolicy: Never
  volumes:
    - name: swvol
      hostPath:
        path: /u01
  containers:
    - name: testpod
      image: oraclelinux:8
      imagePullPolicy: Always
      command: [/usr/sbin/init]
      volumeMounts:
        - mountPath: "/u01"
          name: swvol
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"
From the official documentation:
seLinuxOptions: Volumes that support SELinux labeling are relabeled
to be accessible by the label specified under seLinuxOptions.
Usually you only need to set the level section. This sets the
Multi-Category Security (MCS) label given to all Containers in the Pod
as well as the Volumes.
Based on the information from the original post on stackoverflow:
You can only specify the level portion of an SELinux label when relabeling a path destination pointed to by a hostPath volume. This
is automatically done so by the seLinuxOptions.level attribute
specified in your securityContext.
However attributes such as seLinuxOptions.type currently have no
effect on volume relabeling. As of this writing, this is still an
open issue within
Kubernetes
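Since only the level portion takes effect here, it can help to sanity-check that string before putting it into a manifest. Here is a minimal Python sketch. The function parse_level is hypothetical, and it assumes the simple "sensitivity:category,category" form shown above; it does not handle sensitivity or category ranges.

```python
import re

# Matches e.g. "s0" or "s0:c123,c456" (simple form only, no ranges).
LEVEL_PATTERN = re.compile(r"^s(\d+)(?::(c\d+(?:,c\d+)*))?$")


def parse_level(level):
    """Split an MCS level string into (sensitivity, [categories])."""
    match = LEVEL_PATTERN.match(level)
    if match is None:
        raise ValueError(f"unsupported SELinux level: {level!r}")
    sensitivity = int(match.group(1))
    categories = []
    if match.group(2):
        # Drop the leading "c" from each category and keep the number.
        categories = [int(item[1:]) for item in match.group(2).split(",")]
    return sensitivity, categories


if __name__ == "__main__":
    print(parse_level("s0:c123,c456"))
```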

error: the server doesn't have resource type "svc"

Admins-MacBook-Pro:~ Harshin$ kubectl cluster-info
Kubernetes master is running at http://localhost:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
error: the server doesn't have a resource type "services"
I am following this document:
https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html?refid=gs_card
I get this error while testing my configuration in step 11 of "Configure kubectl for Amazon EKS", with the following kubeconfig:
apiVersion: v1
clusters:
- cluster:
    server: ...
    certificate-authority-data: ....
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: aws
  name: aws
current-context: aws
kind: Config
preferences: {}
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: heptio-authenticator-aws
      args:
        - "token"
        - "-i"
        - "kunjeti"
        # - "-r"
        # - "<role-arn>"
      # env:
        # - name: AWS_PROFILE
        #   value: "<aws-profile>"
Change "name: kubernetes" to actual name of your cluster.
Here is what I did to work through it:
1. Enabled verbose output to make sure the config files are read properly.
kubectl get svc --v=10
2. Modified the file as below:
apiVersion: v1
clusters:
- cluster:
    server: XXXXX
    certificate-authority-data: XXXXX
  name: abc-eks
contexts:
- context:
    cluster: abc-eks
    user: aws
  name: aws
current-context: aws
kind: Config
preferences: {}
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws-iam-authenticator
      args:
        - "token"
        - "-i"
        - "abc-eks"
        # - "-r"
        # - "<role-arn>"
      env:
        - name: AWS_PROFILE
          value: "aws"
I faced a similar issue. This is not a direct solution but a workaround: use AWS CLI commands to create the cluster rather than the console. As per the documentation, the user or role that creates the cluster will have master access.
aws eks create-cluster --name <cluster name> --role-arn <EKS Service Role> --resources-vpc-config subnetIds=<subnet ids>,securityGroupIds=<security group id>
Make sure that the EKS Service Role has IAM access (I gave it full access, though AssumeRole will probably do).
The EC2 machine role should have eks:CreateCluster and IAM access. Worked for me :)
I had this issue and found it was caused by the default key setting in ~/.aws/credentials.
We have a few AWS accounts for different customers plus a sandbox account for our own testing and research. So our credentials file looks something like this:
[default]
aws_access_key_id = abc
aws_secret_access_key = xyz
region=us-east-1
[cpproto]
aws_access_key_id = abc
aws_secret_access_key = xyz
region=us-east-1
[sandbox]
aws_access_key_id = abc
aws_secret_access_key = xyz
region=us-east-1
I was working in our sandbox account, but the [default] section was pointing to another account.
Once I put the keys for sandbox into the default section, the "kubectl get svc" command worked fine.
It seems we need a way to tell aws-iam-authenticator which keys to use, the same way --profile does in the AWS CLI.
I guess you should uncomment the "env" item and set it to the matching profile in ~/.aws/credentials,
because aws-iam-authenticator requires the exact AWS credentials.
Refer to this document: https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html
To have the AWS IAM Authenticator for Kubernetes always use a specific named AWS credential profile (instead of the default AWS credential provider chain), uncomment the env lines and substitute with the profile name to use.
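The effect of AWS_PROFILE can be illustrated with a small Python sketch. It is a simplified model and not the authenticator's code: it only looks at a credentials file and the AWS_PROFILE variable, ignoring the rest of the credential provider chain. The key values and the select_profile helper are hypothetical.

```python
import configparser

# Hypothetical credentials file contents, shaped like the example above.
CREDENTIALS = """\
[default]
aws_access_key_id = customer-key
aws_secret_access_key = customer-secret

[sandbox]
aws_access_key_id = sandbox-key
aws_secret_access_key = sandbox-secret
"""


def select_profile(credentials_text, env):
    """Return (profile name, access key id) chosen by AWS_PROFILE, else 'default'."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    profile = env.get("AWS_PROFILE", "default")
    if profile not in parser:
        raise KeyError(f"profile {profile!r} not found in credentials")
    return profile, parser[profile]["aws_access_key_id"]


if __name__ == "__main__":
    # Without AWS_PROFILE the [default] keys are used, which may belong to another account.
    print(select_profile(CREDENTIALS, {}))
    # With the env entry from the kubeconfig, the named profile is used instead.
    print(select_profile(CREDENTIALS, {"AWS_PROFILE": "sandbox"}))
```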

Deploying Node.js apps with Kubernetes

I was trying to deploy a very basic Express app, a small server listening on 8080, on an EC2 server (Ubuntu 16.04) following this tutorial. On that server, a Kubernetes cluster was created with kops 1.8.0.
After that, I created a Dockerfile like the following:
FROM node:carbon
ENV NPM_CONFIG_PREFIX=/home/node/.npm-global
ENV PATH=$PATH:/home/node/.npm-global/bin
# Create app directory
WORKDIR /usr/src/app
# A wildcard is used to ensure both package.json AND package-lock.json are copied
# where available (npm@5+)
COPY package*.json ./
RUN npm install
# Bundle app source
COPY . .
EXPOSE 8080
CMD [ "node", "server.js" ]
# At the end, set the user to use when running this image
USER node
After that, I built the image (docker build -t ccastelli/stupid_server:test1 .), logged in with docker login -u ccastelli, copied the image ID from docker images, tagged it with docker tag c549618dcd86 org/test:first_try, and pushed it with docker push org/test to a private repository on cloud.docker.com.
Then I created a cluster secret with kubectl create secret docker-registry ccastelli-regcred --docker-server=docker.com --docker-username=ccastelli --docker-password='pass' --docker-email=myemail@gmail.com
Then I created a deployment file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: stupid-server-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: stupid-server
    spec:
      containers:
        - name: stupid-server
          image: org/test:first_try
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
      imagePullSecrets:
        - name: ccastelli-regcred
I see from kubectl get pods that the pod went from ErrImagePull to ImagePullBackOff and never becomes ready. The Docker container works on the client instance but not in the cluster. At this point, I'm a bit lost. What am I doing wrong?
Thanks
Edit: error message:
Failed to pull image "org/test:first_try": rpc error: code =
Unknown desc = Error response from daemon: repository pycomio/test not
found: does not exist or no pull access
Your --docker-server should be index.docker.io:
DOCKER_REGISTRY_SERVER=https://index.docker.io/v1/
DOCKER_USER="<your Docker Hub username, same as for docker login>"
DOCKER_EMAIL="<your Docker Hub email, same as for docker login>"
DOCKER_PASSWORD="<your Docker Hub password, same as for docker login>"
kubectl create secret docker-registry myregistrykey \
  --docker-server="$DOCKER_REGISTRY_SERVER" \
  --docker-username="$DOCKER_USER" \
  --docker-password="$DOCKER_PASSWORD" \
  --docker-email="$DOCKER_EMAIL"
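The server value matters because it becomes the lookup key inside the secret. Here is a minimal Python sketch that builds a document in the documented .dockerconfigjson layout. The helper docker_config_json and the sample credentials are hypothetical, and the sketch only illustrates the structure; it does not reproduce kubectl's exact output.

```python
import base64
import json


def docker_config_json(server, username, password, email):
    """Build a .dockerconfigjson document keyed by the registry server."""
    # The auth field is base64("username:password").
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {
        "auths": {
            server: {
                "username": username,
                "password": password,
                "email": email,
                "auth": auth,
            }
        }
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Hypothetical credentials; the server string is the one suggested above.
    print(docker_config_json("https://index.docker.io/v1/", "user", "pass", "user@example.com"))
```

If the key is docker.com rather than the Docker Hub registry address, the credentials are stored under an entry that does not correspond to the registry the image is pulled from, which is consistent with the "no pull access" error above.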