How to debug failed `kops update cluster` - kubernetes

I had a working kops cluster. I deleted some unneeded igs and updated the cluster. Now kubectl won't connect to the cluster. I get the following error: Unable to connect to the server: dial tcp {ip} i/o timeout.
How do I go about debugging the issue?

As a first step I would try running it again with a higher log level; the logs are really good.
Note you probably want to redirect the output to a file:
kops <whatever> -v 10 &> log.txt
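If the verbose log doesn't make the cause obvious, a couple of quick checks can narrow it down. This is a rough sketch assuming your state store and cluster name are configured (my.cluster.example.com is a hypothetical name):
kops validate cluster --name my.cluster.example.com &> validate.txt
# an i/o timeout often means the API server's DNS record or load balancer is gone,
# so check that the api record still resolves (hypothetical domain)
dig +short api.my.cluster.example.com
If validation complains about a missing instance group, compare the output of kops get ig against the igs you deleted.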

Related

Concourse 5.0 Installation in AWS

We have been trying to set up Concourse 5.0.0 (we already set up 4.2.2) in our AWS account. We created two instances, one for the web node and another for the worker. We can see the site up and running, but we are not able to run our pipeline. We checked the logs and noticed the worker throwing the error below.
worker.beacon.forward-conn.failed-to-dial","data":{"addr":"127.0.0.1:7777","error":"dial tcp 127.0.0.1:7777: connect: connection refused","network":"tcp","session":"9.1.4"}}
We assume the worker is struggling to connect to the web instance and wonder if this could be due to a missing gdn configuration. The Concourse 5.0.0 release includes both the concourse and gdn binaries. We want to try the --garden-config file to see if that fixes the problem.
Can somebody suggest how to write a garden config file?
I had this same problem and solved it using @umamaheswararao-meka's answer. (Using Ubuntu 18.04 on EC2)
Also had a problem with containers not being able to resolve domain names (https://github.com/docker/libnetwork/issues/2187). Here is the error message:
resource script '/opt/resource/check []' failed: exit status 1
stderr:
failed to ping registry: 2 error(s) occurred:
* ping https: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
* ping http: Get http://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
What I did:
sudo apt-get install resolvconf -y
# These are Cloudflare's DNS servers; note that sudo does not apply to a
# shell redirect, so use tee -a to append as root
echo "nameserver 1.1.1.1" | sudo tee -a /etc/resolvconf/resolv.conf.d/tail
echo "nameserver 1.0.0.1" | sudo tee -a /etc/resolvconf/resolv.conf.d/tail
sudo resolvconf -u
cat /etc/resolv.conf # just to make sure the changes are in place
# then restart the concourse service
Containers make use of resolv.conf and as the file is generated dynamically on ubuntu 18.04 this was the easiest way of making containers inherit this configuration.
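To confirm the fix from inside a container, a quick hedged check (assuming Docker and the busybox image are available on the host):
docker run --rm busybox nslookup registry-1.docker.io
If that resolves, containers started after the restart should inherit the same configuration.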
Also relevant snippets from man resolvconf
-u Just run the update scripts (if updating is enabled).
/etc/resolvconf/resolv.conf.d/tail
File to be appended to the dynamically generated resolver configuration file. To append
nothing, make this an empty file. This file is a good place to put a resolver options
line if one is needed, e.g.,
It was an issue with gdn (the garden binary), which was not configured. We had to include CONCOURSE_BIND_IP=xx.xx.x.x (the IP where your gdn is located) and CONCOURSE_BIND_PORT=7777 (gdn's port) in the worker.env file, which solved the problem for us.
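For reference, a minimal sketch of such a worker.env; the variable names come from the answer above, while the 127.0.0.1 value is an assumption for a gdn running on the same host:
# worker.env
CONCOURSE_BIND_IP=127.0.0.1   # assumed: gdn listens on the same host
CONCOURSE_BIND_PORT=7777      # gdn's port, matching the error above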

Unable to run kubectl command on MacOS

I have just installed Kubernetes on macOS using Homebrew.
Now, in the terminal, I ran the kubectl version command and the error message reads Unable to connect to the server: dial tcp 35.225.115.157:443: i/o timeout
How do I solve this issue?
kubectl version prints both the client and the server (Kubernetes master) versions, so it needs to reach the server.
Run kubectl cluster-info and check if the server is up.
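It is also worth checking which cluster your kubeconfig points at; a fresh Homebrew install only gives you the kubectl client, so the config may reference a cluster that no longer exists (the 35.225.115.157 address looks like a leftover remote cluster). A hedged sketch of the checks:
kubectl config current-context
kubectl config get-contexts
# show the server address kubectl is dialing; it should match the IP in the error
kubectl config view --minify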

minikube start failed with ssh error

I got this error when starting minikube. Can anybody help me? Thanks in advance.
I1105 12:57:36.987582 15567 cluster.go:77] Machine state: Running
Waiting for SSH to be available...
Getting to WaitForSSH function...
Using SSH client type: native
&{{{ 0 [] [] []} docker [0x83b300] 0x83b2b0 [] 0s} 127.0.0.1 22 }
About to run SSH command:
exit 0
Error dialing TCP: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
Error dialing TCP: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
minikube is the utility for running a local, cluster-like Kubernetes installation.
This tool can do all the magic of the installation process automatically.
In short, deploying consists of downloading runtime images, spinning up containers, and configuring the necessary elements of Kubernetes without any user interaction required. It works like a charm.
You may consider just dropping the current installation and starting over. It looks like
minikube delete && minikube start
may help.
Removing the ~/.minikube directory and starting minikube again helps (note && rather than a single &, which would background the rm):
rm -rf ~/.minikube && minikube start

Bootkube API server unable to start

I am following the CoreOS tutorial for self-hosted Kubernetes and I am having some issues with the Bootkube API server. Using the Bootkube example from the recommended repository I have only changed the ssh_authorized_keys metadata field in nodes 1, 2 and 3. All other settings are the same as in the repository. However, after running bootkube-start via systemctl on node1 I check the logs using ssh core@node1.example.com 'journalctl -f -u bootkube' and I am getting Unable to determine api-server readiness: Get https://node1.example.com:443/version: dial tcp 172.17.0.21:443: getsockopt: connection refused. Does anyone know of the best ways to debug such an issue?
It looks like the api-server is having issues while starting, or that you have some networking/firewall/dns problem.
You should be able to ssh core@node1.example.com and then get the full bootkube logs there via sudo journalctl -u bootkube.
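A hedged sketch of the first things to look at on the node itself (assuming the CoreOS defaults of Docker and systemd; connection refused on 443 usually means nothing is listening yet):
ssh core@node1.example.com
# is anything listening on 443 yet?
sudo ss -tlnp | grep ':443'
# did the bootstrap control-plane containers start, and why did they exit?
sudo docker ps -a | grep -i apiserver
sudo journalctl -u bootkube --no-pager | tail -n 50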

Minikube on Windows with VirtualBox: Connection attempt fail

I got Kubernetes Minikube on my laptop (4 cores, 8 GB RAM). I just performed the basic installation steps (got minikube and kubectl, enabled BIOS virtualization) and I am able to start the cluster:
C:\Users\me>minikube start
Starting local Kubernetes cluster...
Starting VM...
SSH-ing files into VM...
Setting up certs...
Starting cluster components...
Connecting to cluster...
Setting up kubeconfig...
Kubectl is now configured to use the cluster.
However, when I try to interact with the cluster, I always get the same error, sample:
C:\Users\me>kubectl get pods --context=minikube
Unable to connect to the server: dial tcp 192.168.99.100:8443: connectex: A connection attempt failed because the connected party
did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
I execute minikube ip, ping the resulting IP, and get a response. I also tried giving the VM more memory (3 GB vs the standard 2 GB) and nothing changed.
Am I doing something wrong here?
Thanks!
I had the same issue as above. I found out that kubectl couldn't connect to the cluster and would throw this error whenever I was on a VPN connection. When I turned off my VPN client, it started working fine.
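A hedged way to confirm the VPN is the culprit (assuming minikube's default VirtualBox host-only network of 192.168.99.0/24) is to check whether the route to the VM is being captured by the VPN adapter:
C:\Users\me>route print 192.168.99.*
Then, from PowerShell, check whether the API port itself answers:
PS C:\Users\me> Test-NetConnection 192.168.99.100 -Port 8443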
I think it could be some problem with the cluster; when I run minikube status I get mixed results of cluster running and cluster stopped:
First run:
c:\> minikube status
minikube: Running
cluster: Stopped
kubectl: Correctly Configured: pointing to minikube-vm at 192.168.99.100
Second run:
minikube: Running
cluster: Running
kubectl: Correctly Configured: pointing to minikube-vm at 192.168.99.100
Third run:
minikube: Running
cluster: Stopped
kubectl: Correctly Configured: pointing to minikube-vm at 192.168.99.100
The service is flapping.
UPDATED:
Connecting to the minikube VM using minikube ssh, I realized the kubeconfig file has the wrong path separators for the certificates generated by minikube's automatic configuration. The paths in the kubeconfig file read \var\lib\localkube\certs\ca.cert but have to be /var/lib/localkube/certs/ca.cert, and so on...
To update the file I had to copy the content of the original file to my desktop, fix the directory separators, save the corrected file to /var/lib/localkube/kubeconfig, and restart the service using:
sudo systemctl restart localkube
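Alternatively, a hedged sketch of making the same fix in place (assuming sed is available inside the minikube VM):
minikube ssh
# replace every backslash in the kubeconfig with a forward slash
sudo sed -i 's|\\|/|g' /var/lib/localkube/kubeconfig
sudo systemctl restart localkube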
I hope everyone can use minikube with this tip.
If you keep hitting the 8443 connection issue after changing work environments and there is no other clue, it may be simpler to turn off TLS verification for the minikube local cluster:
https://github.com/robertluwang/docker-hands-on-guide/blob/master/minikube-no-tls-verify.md
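As a rough sketch of what the linked guide boils down to (assumption: you accept the security tradeoff on a throwaway local cluster):
kubectl config set-cluster minikube --insecure-skip-tls-verify=true
# a cluster entry cannot carry both a CA and the skip flag, so clear the CA:
kubectl config unset clusters.minikube.certificate-authority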
Hope it is helpful for you.
BR/
Robert
From the documentation, for troubleshooting:
Run minikube start --alsologtostderr -v=7 to debug crashes
I had the same problem:
Check whether a VPN service is running in the task manager; in my case I had my VPN's service running, so kill the task and try the command shown above again.
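For example, on Windows something like the following; the "vpn" filter string is a placeholder, as your client's process name will differ:
tasklist | findstr /i vpn
taskkill /PID <pid> /F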