I am a bit stuck with a problem.
I am struggling to set up a CNI plugin for Kubernetes. I tried installing several different CNI plugins, and by now I think many things are messed up.
Is there a way to cleanly delete everything connected with a CNI plugin so that I can have a clean starting point? The goal is to avoid having to format my whole machine.
From this Stack Overflow question:
Steps to remove old Calico configs from Kubernetes without kubeadm reset:
1. Clear the IP routes: ip route flush proto bird
2. Remove all Calico links on all nodes: ip link list | grep cali | awk '{print $2}' | cut -c 1-15 | xargs -I {} ip link delete {}
3. Remove the ipip module: modprobe -r ipip
4. Remove the Calico configs: rm /etc/cni/net.d/10-calico.conflist && rm /etc/cni/net.d/calico-kubeconfig
5. Restart kubelet: service kubelet restart
After those steps the running pods will lose their network, so you have to delete them; once they are recreated, everything works again. This has little impact if you are using a ReplicaSet.
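For convenience, here is the same sequence as a small script to run as root on each node; it is only a sketch of the steps above, assuming the standard Calico file names:
#!/bin/bash
# sketch: run as root on each node; assumes the standard Calico file names above
ip route flush proto bird                                                                   # drop Calico's BGP (bird) routes
ip link list | grep cali | awk '{print $2}' | cut -c 1-15 | xargs -I {} ip link delete {}   # delete cali* interfaces
modprobe -r ipip                                                                            # unload the ipip module
rm -f /etc/cni/net.d/10-calico.conflist /etc/cni/net.d/calico-kubeconfig                    # remove the CNI config
service kubelet restart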
Alternatively, you can use the kubeadm reset command, which will un-configure the whole Kubernetes cluster.
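If you go the reset route, a minimal re-bootstrap sketch could look like this (the pod CIDR is only an example; use whatever your chosen CNI plugin expects):
sudo kubeadm reset -f
sudo rm -rf /etc/cni/net.d /var/lib/cni               # clear leftover CNI config and state
sudo kubeadm init --pod-network-cidr=192.168.0.0/16   # example CIDR; use the one your CNI expects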
Related
How can I distribute the kube config file on worker nodes?
Only my master node has it (file under ~/.kube/config), and I'm not sure what's the proper way to programmatically copy the file over to the worker nodes, so that I can use kubectl on all nodes.
You can use the scp command to copy a file from one machine to another.
Run the following command from your master node for each worker node:
[user@k8s-master]$ scp ~/.kube/config username@k8s-worker1:~/.kube/config
It is not recommended that you have ~/.kube/config on the worker nodes. If a worker node is compromised due to a vulnerable pod, it could compromise the cluster using this config.
That's why it is recommended to use a bastion host and kube contexts instead.
However, for non-prod environments, you can do something like this:
kubectl get no --no-headers | egrep -v "master|controlplane" | awk '{print $1}' | while read line; do
  scp -pr ~/.kube/ ${line}:~/.kube;
done
scp -pr will create the .kube directory on the worker nodes if it doesn't exist there yet.
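A quick sanity check from the master afterwards (the worker hostname is an example):
ssh k8s-worker1 'kubectl get nodes'   # replace k8s-worker1 with your worker's hostname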
I uninstalled Calico using:
kubectl delete -f calico.yaml
and installed Weave using:
export kubever=$(kubectl version | base64 | tr -d '\n')
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"
When I deploy my pods they remain in "ContainerCreating" status.
When I check the pod logs I see the error below:
networkPlugin cni failed to set up pod "saccofrontend-d7444fd6d-998gf_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
I manually deleted this file.
A reinstall of docker and kubeadm did not help either; I'm still getting the same error.
Please advise what could be prompting kubelet to still use Calico as the CNI even though I uninstalled it.
Thank you for pointing me in the right direction. These commands solved the problem:
rm -rf /var/lib/cni
rm -rf /etc/cni/net.d
then re-installed kubeadm
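After that cleanup it is worth confirming that only the Weave config is left in the CNI directory and that the weave-net pods are healthy; the label below is the one used by the standard weave-net DaemonSet and may differ for other manifests:
ls /etc/cni/net.d/                                    # should now only contain the Weave conflist
kubectl -n kube-system get pods -l name=weave-net     # label used by the standard weave-net DaemonSet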
My issue was that I was using the command below to setup the Calico CNI:
kubectl apply -f https://docs.projectcalico.org/v3.9/manifests/calico.yaml
Using the link below instead worked; it's the same manifest, just without the pinned version.
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
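To confirm that Calico actually came up after applying the manifest (the label is the one used by the stock calico.yaml):
kubectl -n kube-system get pods -l k8s-app=calico-node   # label from the stock calico.yaml
kubectl get nodes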
Install Tigera Operator
kubectl create -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
Install Calico by creating the necessary custom resource
kubectl create -f https://projectcalico.docs.tigera.io/manifests/custom-resources.yaml
Now watch the nodes become Ready; the CoreDNS pods will come up too.
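A minimal way to do that watching, assuming the operator installs Calico into the calico-system namespace as in the current docs (CoreDNS pods keep the kube-dns label on kubeadm clusters):
watch kubectl get pods -n calico-system                 # namespace used by the Tigera operator install
kubectl get nodes                                       # nodes should go Ready once Calico is up
kubectl -n kube-system get pods -l k8s-app=kube-dns     # CoreDNS pods keep the kube-dns label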
Microk8s is installed on default port 16443. I want to change it to 6443. I am using Ubuntu 16.04. I have installed microk8s using snapd and conjure-up.
None of the following options I tried worked:
Tried to edit the port in /snap/microk8s/current/kubeproxy.config. As the volume is read-only, I could not edit it.
Edited the /home/user_name/.kube/config and restarted the cluster.
Tried using the command and restarted the cluster
sudo kubectl config set clusters.microk8s-cluster.server https://my_ip_address:6443
Tried to use kubectl proxy --port=6443 --address=0.0.0.0 --accept-hosts=my_ip_address &. It listens on 6443, but only HTTP, not HTTPS traffic.
That was initially resolved in microk8s issue 43, but detailed in microk8s issue 300:
This is the right one to use for the latest microk8s:
#!/bin/bash
# define our new port number
API_PORT=8888
# update the args of kube-apiserver and the other services with the new port
sudo find /var/snap/microk8s/current/args -type f -exec sed -i "s/8080/$API_PORT/g" {} ';'
# create new, updated copies of our kubeconfig for kubelet and kubectl to use
mkdir -p ~/.kube && microk8s.config -l | sed "s/:8080/:$API_PORT/" | sudo tee /var/snap/microk8s/current/kubelet.config > ~/.kube/microk8s.config
# tell kubelet about the new kubeconfig
sudo sed -i 's#${SNAP}/configs/kubelet.config#${SNAP_DATA}/kubelet.config#' /var/snap/microk8s/current/args/kubelet
# disable and enable the microk8s snap to restart all services
sudo snap disable microk8s && sudo snap enable microk8s
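Once the snap is back up, a quick check that the new port took effect (same paths and port value as in the script above):
grep -R 8888 /var/snap/microk8s/current/args/    # 8888 = the API_PORT value chosen above
microk8s.kubectl get nodes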
I have tried two different applications, both consisting of a web application frontend that needs to connect to a relational database.
In both cases the frontend application is unable to connect to the database. In both instances the database is also running as a container (pod) in OpenShift. And the web application uses the service name as the url. Both applications have worked in other OpenShift environments.
Version
OpenShift Master: v1.5.1+7b451fc
Kubernetes Master: v1.5.2+43a9be4
Installed using the OpenShift Ansible installer
Single node, with master on this node
Host OS: CentOS 7 Minimal
I am not sure where to look in OpenShift to debug this issue. The only way I was able to reach the db pod from the web pod was using the cluster ip address.
For internal DNS resolution to work, you need to ensure that dnsmasq.service is running and that /etc/resolv.conf contains the IP address of the OCP node itself rather than the upstream DNS servers (those should be listed in /etc/dnsmasq.d/origin-upstream-dns.conf).
Example:
# ip a s eth0
...
inet 10.0.0.1/24
# cat /etc/resolv.conf
...
nameserver 10.0.0.1
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
^^ note the dispatcher script in the /etc/resolv.conf
# systemctl status dnsmasq.service
● dnsmasq.service - DNS caching server.
Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; enabled; vendor preset: disabled)
Active: active (running)
# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
server=/cluster.local/172.18.0.1
^^ this IP should be kubernetes service IP (oc get svc -n default)
# cat /etc/dnsmasq.d/origin-upstream-dns.conf
server=<dns ip 1>
server=<dns ip 2>
If OpenShift is running on some kind of OpenStack instance, AWS, or similar, it can happen that cloud-init does not trigger the NetworkManager dispatcher script, so resolv.conf never gets modified to point to dnsmasq. Try restarting the whole network service, e.g.:
# systemctl restart network.service
I hope this helps.
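Once resolv.conf points at the node and dnsmasq is running, a quick resolution test from the node itself; the second name is a placeholder for your own database service and project:
dig +short kubernetes.default.svc.cluster.local
dig +short mydb.myproject.svc.cluster.local      # mydb/myproject are placeholders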
I have been facing issues connecting to databases (e.g. phpMyAdmin) via SkyDNS as well. As a workaround I entered the ClusterIP instead of the SkyDNS name, and it worked. Have you tried using the service's ClusterIP instead?
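If you want to try that workaround, the ClusterIP can be read straight from the service (service and project names here are examples):
oc get svc mydb -n myproject -o jsonpath='{.spec.clusterIP}'   # mydb/myproject are placeholders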
We ended up upgrading openshift from 4.5.3 to 4.5.7 for now and observing the status.
It looks like a SkyDNS issue, and I wonder whether it will be resolved in 4.5.7 onwards.
The commands below will tell you whether DNS requests fail or get resolved. Try running them on the bastion node.
Sticky (local DNS query)
DST_HOST=kubernetes.default.svc.cluster.local; while read wide; do pod=$(echo ${wide} | awk '{print $1}'); node=$(echo ${wide} | awk '{print $7}'); while read wide2; do ip=$(echo ${wide2} | awk '{print $6}'); node2=$(echo ${wide2} | awk '{print $7}'); echo -ne "`date +"%Y-%m-%d %T"` : ${pod}(${node}) querying ${DST_HOST} via ${ip}(${node2}): "; oc exec -n openshift-dns ${pod} -- dig ${DST_HOST} +short &>/dev/null; test "$?" -eq "0" && echo ok || echo failed; done < <(oc get pods -n openshift-dns -o wide --no-headers); done < <(oc get pods -n openshift-dns -o wide --no-headers)
Random (sprayed DNS query; check whether it gives the same results as above)
DST_HOST=kubernetes.default.svc.cluster.local; while read wide; do pod=$(echo ${wide} | awk '{print $1}'); node=$(echo ${wide} | awk '{print $7}'); while read wide2; do ip=$(echo ${wide2} | awk '{print $6}'); node2=$(echo ${wide2} | awk '{print $7}'); echo -ne "`date +"%Y-%m-%d %T"` : ${pod}(${node}) querying ${DST_HOST} via ${ip}(${node2}): "; oc exec -n openshift-dns ${pod} -- dig @${ip} ${DST_HOST} -p 5353 +short &>/dev/null; test "$?" -eq "0" && echo ok || echo failed; done < <(oc get pods -n openshift-dns -o wide --no-headers); done < <(oc get pods -n openshift-dns -o wide --no-headers)
In OpenShift, SkyDNS is part of the master, so you can restart the master to restart the internal DNS, but I suggest you try this first:
1. Check whether DNS can resolve your service name using dig (see the example after this list).
2. If the name does not resolve, it's a DNS problem; if it resolves but the service is still unreachable, it's likely an iptables problem, and you can try restarting kube-proxy (part of the node service) to re-sync the proxy rules.
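For step 1, a check from inside the frontend pod might look like this; the pod, service, and project names are examples, and it assumes nslookup (or dig) is available in the image:
oc exec frontend-1-abcde -- nslookup mydb.myproject.svc.cluster.local   # pod/service/project names are examples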
So I have a Kubernetes cluster, and I am using Flannel for the overlay network. It has been working fine (for almost a year, actually); then I modified a service to have 2 ports, and all of a sudden I started getting this about a completely different service, one that was working previously and that I did not edit:
<Timestamp> <host> flanneld[873]: I0407 18:36:51.705743 00873 vxlan.go:345] L3 miss: <Service's IP>
<Timestamp> <host> flanneld[873]: I0407 18:36:51.705865 00873 vxlan.go:349] Route for <Service's IP> not found
Is there a common cause for this? I am using Kubernetes 1.0.X and Flannel 0.5.5, and I should mention that only one node is having this issue; the rest of the nodes are fine. The bad node's kube-proxy also says it can't find the service's endpoint.
Sometimes flannel will change its subnet configuration... you can tell this has happened if the IP and MTU from cat /run/flannel/subnet.env don't match what docker was started with (ps aux | grep docker, or cat /etc/default/docker)... in which case you will need to reconfigure docker to use the new flannel config.
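A quick way to compare the two (same files as mentioned above); if the values differ, follow the steps below:
cat /run/flannel/subnet.env              # the FLANNEL_SUBNET / FLANNEL_MTU flannel currently uses
grep -E 'bip|mtu' /etc/default/docker    # what docker was started with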
First you have to delete the docker network interface
sudo ip link set dev docker0 down
sudo brctl delbr docker0
Next you have to reconfigure docker to use the new flannel config.
Note: sometimes this step has to be done manually (i.e. read the contents of /run/flannel/subnet.env and then alter /etc/default/docker accordingly); the commands below write to /etc/default/docker, so run them as root.
source /run/flannel/subnet.env
echo DOCKER_OPTS=\"-H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}\" > /etc/default/docker
Finally, restart docker
sudo service docker restart
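Once docker is back up, you can verify that docker0 now sits inside the flannel subnet:
cat /run/flannel/subnet.env
ip addr show docker0                     # docker0 should now be inside FLANNEL_SUBNET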