We need to enable some sysctl parameters in Kubernetes. This should be achievable with the below annotation in the Deployment:
annotations:
  security.alpha.kubernetes.io/unsafe-sysctls: net.ipv4.ip_local_port_range="10240 65535"
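For context, the annotation goes on the Deployment's pod template metadata, roughly like this (name, labels, and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
      annotations:
        security.alpha.kubernetes.io/unsafe-sysctls: net.ipv4.ip_local_port_range="10240 65535"
    spec:
      containers:
      - name: example
        image: nginx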
When doing so, the container fails to start with the error:
Warning FailedCreatePodSandBox 8s (x12 over 19s) kubelet, <node> Failed create pod sandbox.
The solution looks to be to add this flag to the kubelet:
--experimental-allowed-unsafe-sysctls
Which, for other flags, can be done under kubelet in:
kops edit cluster
Does anyone know the correct way to do this? It refuses to pick up the setting when I enter the flag there.
Thanks,
Alex
A fix for this was merged back in May; you can see the PR here: https://github.com/kubernetes/kops/pull/5104/files
You'd enable it with:
spec:
  kubelet:
    ExperimentalAllowedUnsafeSysctls:
    - 'net.ipv4.ip_local_port_range="10240 65535"'
It seems the flag takes a stringSlice, so you'd need to pass an array.
If that doesn't work, ensure you're using the right version of kops.
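If the field is accepted, applying it would look something like this (assuming your cluster name is set via KOPS_CLUSTER_NAME; kubelet flag changes need a rolling update to reach the nodes):

kops update cluster --yes
kops rolling-update cluster --yes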
As of 2020-05-18, the proper config is, for example:
kubelet:
  allowedUnsafeSysctls:
  - net.ipv4.ip_local_port_range="10240 65535"
In general, all kops config must be camelCased.
From here, kops 1.16.2+.
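Once the nodes have been rolled, you can verify the range from inside a pod that requests the sysctl (pod name is a placeholder):

kubectl exec <pod-name> -- cat /proc/sys/net/ipv4/ip_local_port_range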
I'm running Kubeflow on a local machine that I deployed with Multipass using these steps, but when I tried running my pipeline it got stuck with the message ContainerCreating. When I ran kubectl describe pod train-pipeline-msmwc-1648946763 -n kubeflow I found this in the Events section of the output:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 7m12s (x51 over 120m) kubelet, kubeflow-vm Unable to mount volumes for pod "train-pipeline-msmwc-1648946763_kubeflow(45889c06-87cf-4467-8cfa-3673c7633518)": timeout expired waiting for volumes to attach or mount for pod "kubeflow"/"train-pipeline-msmwc-1648946763". list of unmounted volumes=[docker-sock]. list of unattached volumes=[podmetadata docker-sock mlpipeline-minio-artifact pipeline-runner-token-dkvps]
Warning FailedMount 2m22s (x67 over 122m) kubelet, kubeflow-vm MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file
Looks to me like there is a problem with my deployment, but I'm new to Kubernetes and can't figure out what I'm supposed to do right now. Any idea on how to solve this? I don't know if it helps, but I'm pulling the containers from a private Docker registry and I've set up the secret according to this.
You don't need to use Docker. The problem is actually with the workflow-controller-configmap in the kubeflow namespace. You can edit it with
kubectl edit configmap workflow-controller-configmap -n kubeflow
and change containerRuntimeExecutor: docker to containerRuntimeExecutor: pns. You can also change some of the steps and install Kubeflow 1.3 on Multipass 1.21 rather than 1.15. Do not use the kubeflow add-on (at least it didn't work for me). You need kustomize 3.2 to create the manifests, as mentioned in https://github.com/kubeflow/manifests#installation.
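For reference, after the edit the relevant part of the ConfigMap would look roughly like this (a sketch; the exact layout of the data section can differ between Kubeflow/Argo versions):

apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: kubeflow
data:
  containerRuntimeExecutor: pns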
One step was missing from the tutorial: I had to install Docker. After installing Docker and rebooting the machine, everything works fine.
I am trying to configure Ceph on a Kubernetes cluster using Rook. I have run the following commands:
kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster.yaml
I have three worker nodes with attached volumes and one master. All the created pods are running except the rook-ceph-crashcollector pods for the three nodes; when I describe these pods I get this message:
MountVolume.SetUp failed for volume "rook-ceph-crash-collector-keyring" : secret "rook-ceph-crash-collector-keyring" not found
However, all the nodes are up and working.
It is hard to tell exactly what the cause might be, but there are a few possibilities:
Cluster networking problem between nodes
Some possible leftover sockets in the /var/lib/kubelet directory related to rook ceph.
A bug when connecting to an external Ceph cluster.
In order to fix your issue you can:
Use Flannel and make sure it is using the right interface. Check the kube-flannel.yml file and see if it uses the --iface= option. Alternatively, try Calico.
Clear the /var/lib/rook/, /var/lib/kubelet/plugins/ and /var/lib/kubelet/plugins_registry/ directories and reinstall the Rook service.
Create the rook-ceph-crash-collector-keyring secret manually by executing: kubectl -n rook-ceph create secret generic rook-ceph-crash-collector-keyring.
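Before creating the secret by hand, it's worth confirming that it is really absent (the namespace and secret name are taken from the error message):

kubectl -n rook-ceph get secret rook-ceph-crash-collector-keyring
kubectl -n rook-ceph get pods | grep crashcollector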
I'm trying the debug CLI feature of the 1.18 release of Kubernetes, but I have an issue when I execute the debug command.
I have created a pod as shown below.
kubectl run ephemeral-demo --image=k8s.gcr.io/pause:3.1 --restart=Never
After that, when running this command: kubectl alpha debug -it ephemeral-demo --image=busybox --target=ephemeral-demo
kubectl hangs like this:
Defaulting debug container name to debugger-aaaa.
How can I resolve this issue?
It seems that you need to enable the feature gate on all control plane components as well as on the kubelets. If the feature is enabled partially (for instance, only kube-apiserver and kube-scheduler), the resources will be created in the cluster, but no containers will be created, thus there will be nothing to attach to.
In addition to the answer posted by Konstl
To enable the EphemeralContainers feature gate correctly, edit the following files on the master nodes:
/etc/kubernetes/manifests/kube-apiserver.yaml
/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/kubernetes/manifests/kube-scheduler.yaml
adding the following line to the container command:
spec:
  containers:
  - command:
    - kube-apiserver  # or kube-controller-manager/kube-scheduler
    - --feature-gates=EphemeralContainers=true  # <-- add this line
Pods will restart immediately.
To enable the feature gate for the kubelet as well, on all nodes add the following lines at the bottom of:
/var/lib/kubelet/config.yaml
featureGates:
  EphemeralContainers: true
Save the file and run the following command:
$ systemctl restart kubelet
This was enough in my case to be able to use kubectl alpha debug as explained in the documentation.
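If you want to confirm that the ephemeral container was accepted by the API server, you can inspect the pod spec (pod name taken from the question above):

kubectl get pod ephemeral-demo -o jsonpath='{.spec.ephemeralContainers[*].name}'

If this prints the debugger's name but kubectl still hangs, the kubelet-side feature gate is likely the missing piece.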
Additional useful pages:
Ephemeral Containers
Share Process Namespace between Containers in a Pod
I'm seeing similar behaviour, although I'm running k8s 1.17 with kubectl 1.18. I was under the impression that the feature was added to k8s in 1.16 and to kubectl in 1.18.
I was able to set up the Kubernetes cluster on CentOS 7 with one master and two worker nodes. However, when I try to deploy a pod with nginx, the pod stays in ContainerCreating forever and doesn't seem to get out of it.
For the pod network I am using Calico.
Can you please help me resolve this issue? I don't feel comfortable moving forward without resolving it. I have been checking forums for the last two days, and reaching out here is my last resort.
[root@kube-master ~]# kubectl get pods --all-namespaces
However, when I describe the pods, I see the below error for the nginx container under the Events section.
Warning FailedCreatePodSandBox 41s (x8 over 11m) kubelet,
kube-worker1 (combined from similar events): Failed to create pod
sandbox: rpc error: code = Unknown desc = failed to set up sandbox
container
"ac77a42270009cba0c508e2fd82a84d6caef287bdb117d288d5193960b52abcb"
network for pod "nginx-6db489d4b7-2r4d2": networkPlugin cni failed to
set up pod "nginx-6db489d4b7-2r4d2_default" network: unable to connect
to Cilium daemon: failed to create cilium agent client after 30.000000
seconds timeout: Get http:///var/run/cilium/cilium.sock/v1/config:
dial unix /var/run/cilium/cilium.sock: connect: no such file or
directory
Hope you can help here.
Edit 1:
The ip address of the master VM is 192.168.40.133
Used the below command to initialize the cluster with kubeadm:
kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address 192.168.40.133
Used the below command to install the pod network:
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
The kubeadm init above gave me the join command, which I used to join the workers to the cluster.
All the VMs use host and bridged network adapters.
Your pod subnet (specified by --pod-network-cidr) clashes with the network your VMs are located in: these two have to be distinct. Use something else for the pod subnet, for example 10.244.0.0/16, and then edit calico.yaml before applying it, as described in the official docs:
POD_CIDR="10.244.0.0/16"
kubeadm init --pod-network-cidr=${POD_CIDR} --apiserver-advertise-address 192.168.40.133
curl https://docs.projectcalico.org/manifests/calico.yaml -O
sed -i -e "s?192.168.0.0/16?${POD_CIDR}?g" calico.yaml
kubectl apply -f calico.yaml
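Once the join commands have run on the workers, a quick sanity check that the CNI came up (the k8s-app=calico-node label is what the Calico manifest uses):

kubectl -n kube-system get pods -l k8s-app=calico-node
kubectl get nodes -o wide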
Hope this helps :)
Note: you don't really need to specify the --apiserver-advertise-address flag: kubeadm will correctly detect the machine's main IP most of the time.
My company uses its own root CA, and when I'm trying to pull images, even from a private registry, I'm getting this error:
1h 3m 22 {kubelet minikube} Warning FailedSync Error syncing
pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull:
"image pull failed for gcr.io/google_containers/pause-amd64:3.0, this
may be because there are no credentials on this request.
details:
(Error response from daemon: Get https://gcr.io/v1/_ping: x509:
certificate signed by unknown authority)"
1h 10s 387 {kubelet minikube} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with
ImagePullBackOff: "Back-off pulling image
\"gcr.io/google_containers/pause-amd64:3.0\""
How can I install the root CA into minikube, or avoid this message, i.e. use only the private registry and not pull anything from gcr.io?
The only solution I've found so far is adding the --insecure-registry gcr.io option to minikube.
To address:
x509: certificate signed by unknown authority
Could you please try the following suggestion from the Minikube repo?
Copy the cert into the VM. The location should be:
/etc/docker/certs.d/
from here: https://docs.docker.com/engine/security/certificates/
The same Minikube thread also includes the following one-liner:
cat <certificatefile> \
| minikube ssh "sudo mkdir -p /etc/docker/certs.d/<domain> && sudo tee /etc/docker/certs.d/<domain>/ca.crt"
The issue here is that the CA trust chain of the Linux host needs to be updated. The easiest way is to reboot the Linux host after copying the certs into the VM; if rebooting is not an option, look for a way to run update-ca-certificates.
Just restarting the Docker daemon will most likely not solve this issue.
Note: allowing the Docker daemon to use insecure registries means certificates aren't verified; while this may help, it does not solve the question asked here.
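If you go the update-ca-certificates route, on a Debian/Ubuntu-style host the steps would look roughly like this (the certificate file name is a placeholder; update-ca-certificates expects a .crt extension):

sudo cp my-company-ca.pem /usr/local/share/ca-certificates/my-company-ca.crt
sudo update-ca-certificates
sudo systemctl restart docker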
A straightforward way to do this, independent of minikube, would be to use an imagePullSecrets configuration. As documented in the Pulling an Image from a Private Registry guide, you can create a Secret and use it along with your image like this:
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: private-container
    image: <your-private-image>
  imagePullSecrets:
  - name: the_secret
Where the_secret could be created with:
kubectl create secret docker-registry the_secret --docker-server=xxx --docker-username=xxx --docker-password=xxx --docker-email=xxx
Or with:
kubectl create secret generic the_secret --from-file=.docker/config.json
Or with anything else related to kubectl create secret; see the documentation for details.
Edit: Even in the official minikube documentation you'll find that they use Secrets along with the registry-creds addon.
You'll find the generic documentation here and the documentation for the addon here.
It boils down to:
minikube addons enable registry-creds
But technically it does the same as described above.
You can add your company's certificate into the minikube Kubernetes cluster by first copying the certificate to $HOME/.minikube/certs:
mkdir -p $HOME/.minikube/certs
cp my-company.pem $HOME/.minikube/certs/my-company.pem
And then starting minikube with the --embed-certs option:
minikube start --embed-certs
Source: Official documentation