How to install Ceph on a Kubernetes cluster

We want to use Ceph, but we also want to use Docker and Kubernetes so we can deploy new Ceph instances quickly.
I tried the default Ceph image from Docker Hub, ceph/daemon-base, but it didn't work.
I also tried ceph-container; it doesn't seem to work either.
This is my latest deployment file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ceph3-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ceph3
  template:
    metadata:
      labels:
        app: ceph3
    spec:
      containers:
      - name: ceph
        image: ceph/daemon-base:v3.0.5-stable-3.0-luminous-centos-7
        resources:
          limits:
            memory: 512Mi
            cpu: "500m"
          requests:
            memory: 256Mi
            cpu: "250m"
        volumeMounts:
        - mountPath: /etc/ceph
          name: etc-ceph
        - mountPath: /var/lib/ceph
          name: lib-ceph
      volumes:
      - name: etc-ceph
        hostPath:
          path: /etc/ceph
      - name: lib-ceph
        hostPath:
          path: /var/lib/ceph
Has anyone already installed a Ceph instance on Kubernetes?
I tried to follow the tutorial here, but the pods are not working:
pod/ceph-mds-7b49574f48-vhvtl 0/1 Pending 0 81s
pod/ceph-mon-75c49c4fd5-2cq2r 0/1 CrashLoopBackOff 3 81s
pod/ceph-mon-75c49c4fd5-6nprj 0/1 Pending 0 81s
pod/ceph-mon-75c49c4fd5-7vrp8 0/1 Pending 0 81s
pod/ceph-mon-check-5df985478b-d87rs 1/1 Running 0 81s

The common practice for deploying stateful systems on Kubernetes is to use an Operator to codify the application's lifecycle management. Rook is an operator that provides Ceph lifecycle management on Kubernetes clusters.
Documentation for using Rook to deploy Ceph clusters can be found at https://rook.io/docs/rook/v1.1/ceph-storage.html
For a basic introduction, you can use the Rook Storage Quickstart guide.
The core Ceph team is highly involved in working on Rook and with the Rook community, and Rook is widely deployed within the Kubernetes community for distributed storage applications. The Ceph Days event has now explicitly added Rook, becoming Ceph + Rook Days.
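For reference, the Rook v1.1 quickstart boils down to roughly the following (a sketch; check the linked documentation for the current manifests and paths):
git clone --single-branch --branch release-1.1 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml    # namespaces, RBAC, CRDs
kubectl create -f operator.yaml  # the Rook operator itself
kubectl create -f cluster.yaml   # a CephCluster custom resource
kubectl -n rook-ceph get pods    # watch the mons, mgr and OSDs come up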

Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:ceph:default" cannot list resource "pods" in API group "" in the namespace "ceph".
This error means your default service account doesn't have access to pod resources in the ceph namespace. Look into the documentation on RBAC to see how to grant users access. There is the same error on GitHub.
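For example, a Role and RoleBinding like the following sketch (adjust the verbs and resources to what the pods actually need) would grant the default service account in the ceph namespace the missing pod access:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: ceph
rules:
- apiGroups: [""]        # "" is the core API group, where pods live
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: ceph
subjects:
- kind: ServiceAccount
  name: default
  namespace: ceph
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io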
Install Ceph
To install Ceph, I would recommend using Helm.
Install Helm
There are a few ways to install Helm; you can find them there.
Personally, I use the Helm GitHub releases to download the latest version. When answering this question, the latest release was 2.15.0. Here's how you can install it.
1. Download Helm. (If you want to install the current version of Helm, check the Helm GitHub releases link above; there might be a newer version. All you then need to do is change v2.15.0 in the wget and tar commands to the new version, for example v2.16.0.)
wget https://get.helm.sh/helm-v2.15.0-linux-amd64.tar.gz
2. Unpack it:
tar -zxvf helm-v2.15.0-linux-amd64.tar.gz
3. Find the helm binary in the unpacked directory and move it to its desired destination:
mv linux-amd64/helm /usr/local/bin/helm
4. Install Tiller.
The easiest way to install Tiller into the cluster is simply to run helm init. This validates that Helm's local environment is set up correctly (and sets it up if necessary). Then it connects to whatever cluster kubectl connects to by default (kubectl config view). Once it connects, it installs Tiller into the kube-system namespace.
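A quick way to verify the installation (a sketch for Helm 2; tiller-deploy is the default name the installer uses):
helm init
kubectl -n kube-system get deployment tiller-deploy   # verify Tiller is running
helm version                                          # should report both a client and a server version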
And then follow the official documentation to install Ceph.

Related

How can I set a local volume using the MongoDB chart in k8s?

I want to deploy a MongoDB chart using Helm on my local dev environment.
I found all the possible values on Bitnami, but it's overwhelming!
How can I configure something like this:
template:
  metadata:
    labels:
      app: mongodb
  spec:
    containers:
    - name: mongodb
      image: mongo
      ports:
      - containerPort: 27017
      volumeMounts:
      - name: mongo-data
        mountPath: /data/db/
    volumes:
    - name: mongo-data
      hostPath:
        path: /app/db
using a values.yaml configuration file?
The best approach here is to deploy something like the Bitnami MongoDB chart that you reference in the question with its default options:
helm install mongodb bitnami/mongodb
The chart will create a PersistentVolumeClaim for you, and a standard piece of Kubernetes called the persistent volume provisioner will create the corresponding PersistentVolume. The actual storage will be "somewhere inside Kubernetes", but for database storage there's little you can do with the actual files directly, so this isn't usually a practical problem.
If you can't use this approach, then you need to manually create the storage and then tell the chart to use it. You need to create a PersistentVolumeClaim/PersistentVolume pair, for example as shown at the start of Kubernetes Persistent Volume and hostpath, and manually submit them using kubectl apply -f pv-pvc.yaml. You then need to tell the Bitnami chart about that PersistentVolume:
helm install mongodb bitnami/mongodb \
  --set persistence.existingClaim=your-pvc-name
I'd avoid this sequence in a non-development environment. The cluster should normally have a persistent volume provisioner set up and so you shouldn't need to manually create PersistentVolumes, and host-path volumes are unreliable in multi-node environments (they refer to a fixed path on whichever node the pod happens to be running on, so data can get misplaced if a pod is rescheduled on a different node).
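That said, if you do take the manual route on a single-node dev cluster, a minimal hostPath PV/PVC pair could look like the following sketch (the PV name, size, and path are placeholders; your-pvc-name matches the --set flag above):
# pv-pvc.yaml -- a sketch for a single-node dev cluster only
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-pv
spec:
  capacity:
    storage: 8Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /app/db
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: your-pvc-name
spec:
  storageClassName: ""       # keep the default provisioner from intercepting the claim
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  volumeName: mongo-pv       # bind explicitly to the PV above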
You first need to create a PersistentVolumeClaim, which will create a PersistentVolume only if needed by a specific deployment (here, your MongoDB Helm chart):
kubectl -n $NAMESPACE apply -f persistent-volume-claim.yaml
For example (or check https://kubernetes.io/docs/concepts/storage/persistent-volumes/):
# persistent-volume-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-data
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: default
  resources:
    requests:
      storage: 10Gi
Check that your volume was created:
kubectl get pv
(PersistentVolumes are cluster-scoped, so no namespace flag is needed.)
Now, even if you delete your MongoDB deployment, the volume will persist and can be accessed by any other deployment.

Installing RabbitMQ on Minikube

I'm currently building a backend that, among other things, involves sending RabbitMQ messages from localhost into a K8s cluster where containers can run and pick up specific messages.
So far I've been using Minikube to carry out all of my Docker and K8s development, but have run into a problem when trying to install RabbitMQ.
I've been following the RabbitMQ Cluster Operator official documentation (installing) (using). I got to the "Create a RabbitMQ Instance" section and ran into this error:
1 pod has unbound immediate persistentVolumeClaims
I fixed it by continuing with the tutorial and adding a PV and PVC to my RabbitMQCluster YAML file. I tried to apply it again and came across my next issue:
1 insufficient cpu
I've tried messing around with resource limits and requests in the YAML file, but no success yet. After Googling and doing some general research, I noticed that my specific problem and setup (Minikube and RabbitMQ) don't seem to be very common. My question is: have I exceeded the scope or use case of Minikube by trying to install external services like RabbitMQ? If so, what should my next step be?
If not, are there any useful tutorials out there for installing RabbitMQ in Minikube?
If it helps, here's my current YAML file for the RabbitMQCluster:
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq-cluster
spec:
  persistence:
    storageClassName: standard
    storage: 5Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rabbimq-pvc
spec:
  resources:
    requests:
      storage: 5Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: standard
  hostPath:
    path: /mnt/app/rabbitmq
    type: DirectoryOrCreate
Edit:
Command used to start Minikube:
minikube start
Output:
πŸ˜„ minikube v1.17.1 on Ubuntu 20.04
✨ Using the docker driver based on existing profile
πŸ‘ Starting control plane node minikube in cluster minikube
πŸ”„ Restarting existing docker container for "minikube" ...
πŸŽ‰ minikube 1.18.1 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.18.1
πŸ’‘ To disable this notice, run: 'minikube config set WantUpdateNotification false'
🐳 Preparing Kubernetes v1.20.2 on Docker 20.10.2 ...
πŸ”Ž Verifying Kubernetes components...
🌟 Enabled addons: storage-provisioner, default-storageclass, dashboard
πŸ„ Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
According to the command you used to start Minikube, the error occurs because you don't have enough resources assigned to your cluster.
According to the source code of the RabbitMQ cluster operator, it seems that it needs 2 CPUs.
You need to adjust the number of CPUs (and probably the memory too) when you initialize your cluster. Below is an example that starts a Kubernetes cluster with 4 CPUs and 8G of RAM:
minikube start --cpus=4 --memory 8192
If you want to check your currently allocated resources, you can run kubectl describe node.
Raising CPUs and memory definitely gets things unstuck, but if you are running Minikube on a dev machine, which is probably a laptop, this is a pretty big resource requirement. I get that Rabbit runs in some really big scenarios, but can't it be configured to work in small ones?
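It can, at least partially: the RabbitmqCluster spec exposes a resources block that overrides the operator's 2-CPU default. A dev-sized instance might look like the sketch below (the exact values are assumptions; whether RabbitMQ itself is happy with so little CPU is another question):
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq-cluster
spec:
  replicas: 1
  resources:
    requests:
      cpu: 500m       # down from the operator's default of 2 CPUs
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 1Gi
  persistence:
    storageClassName: standard
    storage: 5Gi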

Helm 3 Deployment Order of Kubernetes Service Catalog Resources

I am using Helm v3.3.0 with Kubernetes 1.16.
The cluster has the Kubernetes Service Catalog installed, so external services implementing the Open Service Broker API spec can be instantiated as K8S resources - as ServiceInstances and ServiceBindings.
ServiceBindings reflect as K8S Secrets and contain the binding information of the created external service. These secrets are usually mapped into the Docker containers as environment variables or volumes in a K8S Deployment.
Now I am using Helm to deploy my Kubernetes resources, and I read here that...
The [Helm] install order of Kubernetes types is given by the enumeration InstallOrder in kind_sorter.go
In that file, the order mentions neither ServiceInstance nor ServiceBinding as resources, which would mean that Helm installs these resource types after everything in its InstallOrder list, in particular after Deployments. That seems to match the output of helm install --dry-run --debug run on my chart, where the order indicates that the K8S Service Catalog resources are applied last.
Question: What I cannot understand is why my Deployment does not fail to install with Helm.
After all, my Deployment resource seems to be deployed before the ServiceBinding is, and it is the Secret generated from the ServiceBinding that my Deployment references. I would expect it to fail, since the Secret is not there yet when the Deployment is installed. But that is not the case.
Is that just a timing glitch / lucky coincidence, or is this something I can rely on, and why?
Thanks!
As said in the comment I posted:
In fact, your Deployment is failing at the start with Status: CreateContainerConfigError. Your Deployment is created before the Secret from the ServiceBinding. It only works because the Pod is recreated once the Secret from the ServiceBinding is available.
I wanted to give more insight, with an example, into why the Deployment didn't fail.
What is happening (simplified, in order):
Deployment -> created and spawns a Pod
Pod -> fails with status CreateContainerConfigError due to the missing Secret
ServiceBinding -> creates the Secret in the background
Pod -> gets the required Secret and starts
The previously mentioned InstallOrder leaves ServiceInstance and ServiceBinding until last, per the comment on line 147.
Example
Assuming that:
There is a working Kubernetes cluster
Helm3 installed and ready to use
Following guides:
Kubernetes.io: Install Service Catalog using Helm
Magalix.com: Blog: Kubernetes Service Catalog
There is a Helm chart with the following files in the templates/ directory:
ServiceInstance
ServiceBinding
Deployment
Files:
ServiceInstance.yaml:
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceInstance
metadata:
  name: example-instance
spec:
  clusterServiceClassExternalName: redis
  clusterServicePlanExternalName: 5-0-4
ServiceBinding.yaml:
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceBinding
metadata:
  name: example-binding
spec:
  instanceRef:
    name: example-instance
Deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ubuntu
spec:
  selector:
    matchLabels:
      app: ubuntu
  replicas: 1
  template:
    metadata:
      labels:
        app: ubuntu
    spec:
      containers:
      - name: ubuntu
        image: ubuntu
        command:
        - sleep
        - "infinity"
        # part below responsible for getting secret as env variable
        env:
        - name: DATA
          valueFrom:
            secretKeyRef:
              name: example-binding
              key: host
Applying the above resources to check what is happening can be done in 2 ways:
The first method is to use the timestamps from $ kubectl get RESOURCE -o yaml
The second method is to use $ kubectl get RESOURCE --watch-only=true
First method
As said previously, the Pod from the Deployment couldn't start because the Secret was not available when the Pod tried to spawn. After the Secret became available to use, the Pod started.
The statuses this Pod went through were the following:
Pending
ContainerCreating
CreateContainerConfigError
Running
This is a table with timestamps of Pod and Secret:
| Pod | Secret |
|-------------------------------------------|-------------------------------------------|
| creationTimestamp: "2020-08-23T19:54:47Z" | - |
| - | creationTimestamp: "2020-08-23T19:54:55Z" |
| startedAt: "2020-08-23T19:55:08Z" | - |
You can get these timestamps by invoking the commands below:
$ kubectl get pod pod_name -n namespace -o yaml
$ kubectl get secret secret_name -n namespace -o yaml
You can also get additional information with:
$ kubectl get event -n namespace
$ kubectl describe pod pod_name -n namespace
Second method
This method requires preparation before running the Helm chart. Open additional terminal windows (for this particular case, 2) and run:
$ kubectl get pod -n namespace --watch-only | while read line ; do echo -e "$(gdate +"%H:%M:%S:%N")\t $line" ; done
$ kubectl get secret -n namespace --watch-only | while read line ; do echo -e "$(gdate +"%H:%M:%S:%N")\t $line" ; done
After that apply your Helm chart.
Disclaimer!
The above commands will watch for changes in resources and display them with a timestamp from the OS. Please remember that these commands are for example purposes only.
The output for Pod:
21:54:47:534823000 NAME READY STATUS RESTARTS AGE
21:54:47:542107000 ubuntu-65976bb789-l48wz 0/1 Pending 0 0s
21:54:47:553799000 ubuntu-65976bb789-l48wz 0/1 Pending 0 0s
21:54:47:655593000 ubuntu-65976bb789-l48wz 0/1 ContainerCreating 0 0s
-> 21:54:52:001347000 ubuntu-65976bb789-l48wz 0/1 CreateContainerConfigError 0 4s
21:55:09:205265000 ubuntu-65976bb789-l48wz 1/1 Running 0 22s
The output for Secret:
21:54:47:385714000 NAME TYPE DATA AGE
21:54:47:393145000 sh.helm.release.v1.example.v1 helm.sh/release.v1 1 0s
21:54:47:719864000 sh.helm.release.v1.example.v1 helm.sh/release.v1 1 0s
21:54:51:182609000 understood-squid-redis Opaque 1 0s
21:54:52:001031000 understood-squid-redis Opaque 1 0s
-> 21:54:55:686461000 example-binding Opaque 6 0s
Additional resources:
Stackoverflow.com: Answer: Helm install in certain order
Alibabacloud.com: Helm charts and templates hooks and tests part 3
So, to answer my own question (and thanks to @dawid-kruk and the folks on the Service Catalog SIG on Slack):
In fact, the initial start of my Pods (the ones referencing the Secret created out of the ServiceBinding) fails! It fails because the Secret is not actually there at the moment K8S tries to start the Pods.
Kubernetes has a self-healing mechanism, in the sense that it tries (and retries) to reach the target state of the cluster as described by the various deployed resources.
Because Kubernetes keeps retrying to get the Pods running, eventually (when the Secret is finally there) all conditions are satisfied and the Pods start up nicely. So, eventually, everything runs as it should.
How could this be streamlined? One possibility would be for Helm to include the custom resources ServiceBinding and ServiceInstance in its ordered list of installable resources and install them early in the installation phase.
But even without that, Kubernetes actually deals with it just fine. The order of installation (in this case) really does not matter. And that is a good thing!
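If you would rather not rely on the retry loop at all, one common workaround is an init container that blocks Pod startup until the Secret exists. A sketch (the bitnami/kubectl image and the wait loop are assumptions, and the Pod's service account needs RBAC permission to get Secrets):
      initContainers:                 # added to the example Deployment's Pod spec
      - name: wait-for-binding
        image: bitnami/kubectl        # any image that ships kubectl would work
        command:
        - sh
        - -c
        - until kubectl get secret example-binding; do echo waiting; sleep 2; done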

Pod mounts wrong directory on Node when a flexvolume with cifs is configured

The following problem occurs on a Kubernetes cluster with 1 master and 3 nodes, and also on a single-machine Kubernetes installation.
I set up Kubernetes with flexvolume SMB support (https://github.com/Azure/kubernetes-volume-drivers/tree/master/flexvolume/smb). When I apply a new pod with a flexvolume, the node mounts the SMB share as expected, but inside the Pod the share points to some Docker directory on the node.
My installation:
latest CentOS 7
latest Kubernetes v1.14.0
(https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/)
disabled SELinux and disabled firewall
Docker 1.13.1
jq and cifs-utils
https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/flexvolume/smb/deployment/smb-flexvol-installer/smb installed to /usr/libexec/kubernetes/kubelet-plugins/volume/exec/microsoft.com~smb and made executable
Create the Pod with:
smb-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: smb-secret
type: microsoft.com/smb
data:
  username: YVVzZXI=
  password: YVBhc3N3b3Jk
nginx-flex-smb.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flex-smb
spec:
  containers:
  - name: nginx-flex-smb
    image: nginx
    volumeMounts:
    - name: test
      mountPath: /data
  volumes:
  - name: test
    flexVolume:
      driver: "microsoft.com/smb"
      secretRef:
        name: smb-secret
      options:
        source: "//<host.with.smb.share>/kubetest"
        mountoptions: "vers=3.0,dir_mode=0777,file_mode=0777"
What happens
The mount point on the node is created at /var/lib/kubelet/pods/bef26895-5ac7-11e9-a668-00155db9c92e/volumes/microsoft.com~smb.
mount returns //<host.with.smb.share>/kubetest on /var/lib/kubelet/pods/bef26895-5ac7-11e9-a668-00155db9c92e/volumes/microsoft.com~smb/test type cifs (rw,relatime,vers=3.0,cache=strict,username=aUser,domain=,uid=0,noforceuid,gid=0,noforcegid,addr=172.27.72.43,file_mode=0777,dir_mode=0777,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
Read and write work as expected on the host and on the node itself.
On the Pod:
mount for /data points to tmpfs on /data type tmpfs (rw,nosuid,nodev,seclabel,size=898680k,nr_inodes=224670,mode=755)
but the content of the directory /data comes from /run/docker/libcontainerd/8039742ae2a573292cd9f4ef7709bf7583efd0a262b9dc434deaf5e1e20b4002/ on the node.
I tried to install the Pod with a PersistentVolumeClaim and got the same problem. Searching for this problem turned up no solutions.
Our other pods use GlusterFS and Heketi, which work fine.
Is there maybe a configuration failure? Is something missing?
EDIT: Solution
I upgraded Docker to the latest validated version, 18.06, and everything works well now.
To install it follow the instructions on Get Docker CE for CentOS.

Kubernetes Ceph StorageClass with dynamic provisioning

I'm trying to set up my Kubernetes cluster with a Ceph cluster using a StorageClass, so that a new PV is created automatically inside the Ceph cluster for each PVC.
But it doesn't work. I've tried a lot, read a lot of documentation and tutorials, and can't figure out what went wrong.
I've created 2 secrets, for the Ceph admin user and another user, kube, which I created with the command below to grant access to a Ceph OSD pool.
Creating the pool:
sudo ceph osd pool create kube 128
Creating the user:
sudo ceph auth get-or-create client.kube mon 'allow r' \
osd 'allow class-read object_prefix rbd_children, allow rwx pool=kube' \
-o /etc/ceph/ceph.client.kube.keyring
After that I exported both keys and converted them to Base64 with:
sudo ceph auth get-key client.admin | base64
sudo ceph auth get-key client.kube | base64
I used those values inside my secret.yaml to create the Kubernetes secrets:
apiVersion: v1
kind: Secret
type: "kubernetes.io/rbd"
metadata:
  name: ceph-secret
data:
  key: QVFCb3NxMVpiMVBITkJBQU5ucEEwOEZvM1JlWHBCNytvRmxIZmc9PQo=
And another one named ceph-user-secret.
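As a side note, kubectl can do the base64 encoding for you, so the manual export-and-paste step can be skipped. A sketch (the secret names must match adminSecretName and userSecretName in the StorageClass below):
kubectl create secret generic ceph-secret --type=kubernetes.io/rbd \
  --from-literal=key="$(sudo ceph auth get-key client.admin)"
kubectl create secret generic ceph-kube-secret --type=kubernetes.io/rbd \
  --from-literal=key="$(sudo ceph auth get-key client.kube)"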
Then I created a StorageClass to use the Ceph cluster:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/rbd
parameters:
  monitors: publicIpofCephMon1:6789,publicIpofCephMon2:6789
  adminId: admin
  adminSecretName: ceph-secret
  pool: kube
  userId: kube
  userSecretName: ceph-kube-secret
  fsType: ext4
  imageFormat: "2"
  imageFeatures: "layering"
To test my setup, I created a PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-eng
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
But it gets stuck in the Pending state:
#kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
pvc-eng Pending standard 25m
Also, no images are created inside the Ceph kube pool.
Do you have any recommendations on how to debug this problem?
I tried to install the ceph-common Ubuntu package on all Kubernetes nodes. I also switched the kube-controller-manager Docker image to an image provided by AT&T which includes the ceph-common package:
https://github.com/att-comdev/dockerfiles/tree/master/kube-controller-manager
The network is fine: I can access my Ceph cluster from inside a pod and from every Kubernetes host.
I would be glad if anyone has any ideas!
You must use the access mode ReadWriteOnce.
As you can see at https://kubernetes.io/docs/concepts/storage/persistent-volumes/ (Persistent Volumes section), RBD devices do not support the ReadWriteMany mode. Choose a different volume plugin (CephFS, for example) if you need to read and write data on a PV from several pods.
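Concretely, the PVC from the question with the corrected access mode:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-eng
spec:
  accessModes:
  - ReadWriteOnce        # RBD supports ReadWriteOnce and ReadOnlyMany, not ReadWriteMany
  resources:
    requests:
      storage: 1Gi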
As an expansion on the accepted answer...
RBD is a remote block device, i.e. an external hard drive, like iSCSI. The filesystem is interpreted by the client container, so it can only be written by a single user or corruption will happen.
CephFS is a network-aware filesystem similar to NFS or SMB/CIFS. It allows multiple writes to different files. The filesystem is interpreted by the Ceph server, so it can accept writes from multiple clients.
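For illustration, a PV using the in-tree cephfs plugin of that Kubernetes generation, which does allow ReadWriteMany, might look like this sketch (the monitor addresses and secret name reuse the question's values):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cephfs-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteMany          # multiple pods may read and write concurrently
  cephfs:
    monitors:
    - publicIpofCephMon1:6789
    - publicIpofCephMon2:6789
    user: admin
    secretRef:
      name: ceph-secret
    readOnly: false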