My k8s 1.12.8 cluster (created via kops) has been running fine for 6+ months. Recently, something caused both kube-scheduler and kube-controller-manager on the master node to die and restart:
SyncLoop (PLEG): "kube-controller-manager-ip-x-x-x-x.z.compute.internal_kube-system(abc123)", event: &pleg.PodLifecycleEvent{ID:"abc123", Type:"ContainerDied", Data:"def456"}
hostname for pod:"kube-controller-manager-ip-x-x-x-x.z.compute.internal" was longer than 63. Truncated hostname to :"kube-controller-manager-ip-x-x-x-x.z.compute.inter"
SyncLoop (PLEG): "kube-scheduler-ip-x-x-x-x.z.compute.internal_kube-system(hij678)", event: &pleg.PodLifecycleEvent{ID:"hij678", Type:"ContainerDied", Data:"890klm"}
SyncLoop (PLEG): "kube-controller-manager-ip-x-x-x-x.eu-west-2.compute.internal_kube-system(abc123)", event: &pleg.PodLifecycleEvent{ID:"abc123", Type:"ContainerStarted", Data:"def345"}
SyncLoop (container unhealthy): "kube-scheduler-ip-x-x-x-x.z.compute.internal_kube-system(hjk678)"
SyncLoop (PLEG): "kube-scheduler-ip-x-x-x-x.z.compute.internal_kube-system(ghj567)", event: &pleg.PodLifecycleEvent{ID:"ghj567", Type:"ContainerStarted", Data:"hjk768"}
Ever since kube-scheduler and kube-controller-manager restarted, kubelet is completely unable to get or update any node status:
Error updating node status, will retry: failed to patch status "{"status":{"$setElementOrder/conditions":[{"type":"NetworkUnavailable"},{"type":"OutOfDisk"},{"type":"MemoryPressure"},{"type":"DiskPressure"},{"type":"PIDPressure"},{"type":"Ready"}],"conditions":[{"lastHeartbeatTime":"2020-08-12T09:22:08Z","type":"OutOfDisk"},{"lastHeartbeatTime":"2020-08-12T09:22:08Z","type":"MemoryPressure"},{"lastHeartbeatTime":"2020-08-12T09:22:08Z","type":"DiskPressure"},{"lastHeartbeatTime":"2020-08-12T09:22:08Z","type":"PIDPressure"},{"lastHeartbeatTime":"2020-08-12T09:22:08Z","type":"Ready"}]}}" for node "ip-172-20-60-88.eu-west-2.compute.internal": Patch net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Error updating node status, will retry: error getting node "ip-x-x-x-x.z.compute.internal": Get net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Error updating node status, will retry: error getting node "ip-x-x-x-x.z.compute.internal": Get net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Error updating node status, will retry: error getting node "ip-x-x-x-x.z.compute.internal": Get context deadline exceeded
Error updating node status, will retry: error getting node "ip-x-x-x-x.z.compute.internal": Get context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Unable to update node status: update node status exceeds retry count
The cluster is completely unable to perform any updates in this state.
What can cause the master node to lose connectivity to nodes like this?
Is the 2nd line in the first log output ('Truncated hostname..') a potential source of the issue?
How can I further diagnose what is actually causing the get/update node actions to fail?

As far as I remember, Kubernetes limits hostnames to fewer than 64 characters. Is it possible that the hostname was changed recently?
If so it would be good to reconstruct the kubelet configuration using this documentation
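
To narrow down where these timeouts happen, it can help to sort the error strings by the phase they mention. Below is a minimal sketch, assuming the wording of Go's net/http errors as they appear in the logs above; the function name and the category labels are my own, not part of any Kubernetes tooling. As far as I understand, "while waiting for connection" means no connection was established at all, while a plain "request canceled" means the request was already in flight and no response headers arrived in time, which would point more toward a slow or overloaded API server than a blocked port.

```python
def classify_timeout(message: str) -> str:
    """Roughly classify a Go net/http timeout message by the phase it mentions.

    The labels are illustrative only (an assumption of this sketch).
    """
    if "while waiting for connection" in message:
        # The client never got a usable connection (firewall, routing, DNS, ...).
        return "connect"
    if "TLS handshake timeout" in message:
        # TCP connected, but the TLS handshake did not finish in time.
        return "tls_handshake"
    if "net/http: request canceled" in message:
        # The request was sent, but no response headers arrived before the timeout.
        return "awaiting_response"
    if "context deadline exceeded" in message:
        # A deadline fired; the message alone does not say in which phase.
        return "deadline"
    return "other"
```

Feeding it the kubelet lines from the question gives "awaiting_response" and "deadline", never "connect", so checking the load on the API server and etcd may be a more promising next step than checking security groups.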


How do I solve a timeout when trying to add a node to my Kubernetes cluster?

I am trying to add a node to my (currently running) Kubernetes cluster.
When I run the kubeadm join command, I get the following error:
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker
cgroup driver. The recommended driver is "systemd".
Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: couldn't validate the identity of the API Server:
Get "":
net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
To see the stack trace of this error execute with --v=5 or higher
Here is a snippet from the stack trace:
I0917 16:06:58.162180 2714 token.go:215] [discovery] Failed to request cluster-info, will try again: Get "https://*redacted*:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
What does this mean and how do I solve it?
I forgot that I had a firewall installed on my server.
I opened port 6443, per the instructions found here (Kubeadm join failed : Failed to request cluster-info), and all is well!
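
For anyone who wants to check this before re-running kubeadm join: a quick way to see whether the API server port is reachable from the new node is a plain TCP connect. This is only a sketch; the function name is made up for illustration, and the host you pass in is whatever your control-plane address is (6443 is the port from the error above).

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run from the joining node, port_reachable("<control-plane-address>", 6443) returning False would be consistent with a firewall dropping or rejecting the traffic, although it cannot tell you which firewall is responsible.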

Readiness probe failed: Get http://*.*.*.*:8080/***/healthCheck: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

The GCP Error Reporting dashboard is showing the following error for one of our production clusters. Could you please help me understand whether this is an actual issue, or just information that the pod was not ready to take traffic at that moment?
Readiness probe failed: Get http://...:8080/***healthCheck: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
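
One way to tell the two cases apart is to measure how long the health endpoint actually takes to answer and compare that with the probe timeout. An HTTP probe counts as successful for status codes from 200 to 399, and timeoutSeconds defaults to 1 second if the pod spec does not set it. The sketch below imitates a single probe attempt under those rules; the function name is hypothetical and the URL is whatever your healthCheck endpoint is.

```python
import time
import urllib.error
import urllib.request


def probe_once(url: str, timeout_seconds: float = 1.0):
    """Perform one HTTP GET and return (ok, elapsed_seconds).

    ok is True for a 2xx/3xx answer that arrives without a socket timeout.
    Note: urllib applies the timeout per socket operation, so this only
    approximates the kubelet's overall probe timeout.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            ok = 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # HTTP errors (4xx/5xx), refused connections, and timeouts all land here.
        ok = False
    return ok, time.monotonic() - start
```

If the elapsed time is regularly close to the probe timeout, occasional failures under load are to be expected, and raising timeoutSeconds or making the health check cheaper may be worth considering.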

After starting Ditto services, pods toggle from "OK" to "Liveness probe failed" or "Readiness probe failed"

I managed to get Ditto up and running on minikube, following the instructions provided in the README.txt file. I had to do some minor adjustments to the .yaml files (see Deployment of Ditto and MongoDB using kubectl fails because of unsupported version "extensions/v1beta1").
Now that the Ditto services have been started, the pods toggle from status "OK" to the following errors:
pod connectivity: Liveness probe failed: Get "": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
pod gateway: Readiness probe failed: Get "": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
pod things: Readiness probe failed: Get "": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Back-off restarting failed container
pod things-search: Readiness probe failed: Get "": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Back-off restarting failed container
pod policies: Readiness probe failed: Get "": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Back-off restarting failed container
pod concierge: Readiness probe failed: Get "": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Even when all pods have the status "OK", I can't send POST requests without getting Error 502 (Bad Gateway).
Any help for solving this problem is highly appreciated.
Thank you in advance.
Maybe this is caused by a resource shortage in your Minikube VM.
How many CPUs and how much memory does the VM have?
Maybe you can scale up the resources and try again?
I had several problems with Ditto running in Docker until I changed the CPU setting in Docker from 4 to 8.
Docker Settings
Since I am using a 4-core/8-thread CPU, I wonder whether a setting of 4 means that only 2 physical cores are used (on an old Mac), which seems to be too few for Ditto.

Grafana clock panel manual installation

I have installed Grafana on Windows Server 2016 and it's running as a service.
Because my server is behind a proxy server and I couldn't find where to configure a proxy in custom.ini, I resorted to a manual installation of the Grafana clock panel. The server itself has internet access, but Grafana does not.
From the command line I tried:
grafana-cli --pluginsDir C:\grafana-6.1.6\data\plugins\grafana-clock-panel plugins install grafana-clock-panel-6fdc3d5
but I get this error:
Failed to send request: Get https://grafana.com/api/plugins/repo/grafana-clock-panel-6fdc3d5: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Error: ✗ Failed to send request. error: Get https://grafana.com/api/plugins/repo/grafana-clock-panel-6fdc3d5: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
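
If grafana-cli cannot get through the proxy, the plugin can also be installed by hand: download the plugin zip on a machine (or in a browser) that can reach grafana.com, unpack it into Grafana's plugins directory, and restart the Grafana service. A minimal sketch of the unpacking step follows; the function name is made up, and the paths in the usage note are assumptions based on the command above, so adjust them to your installation.

```python
import zipfile
from pathlib import Path


def install_plugin_zip(zip_path, plugins_dir):
    """Extract a manually downloaded plugin archive into the plugins directory.

    Returns the sorted top-level entries of the archive so you can see
    which folder was created.
    """
    plugins_dir = Path(plugins_dir)
    plugins_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(plugins_dir)
        # Zip entry names always use forward slashes, regardless of platform.
        top_level = sorted({name.split("/")[0] for name in archive.namelist() if name})
    return top_level
```

For example, install_plugin_zip(r"C:\temp\grafana-clock-panel.zip", r"C:\grafana-6.1.6\data\plugins") would unpack a zip saved under that (hypothetical) download location. Grafana needs a restart afterwards to pick up the new plugin.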

Minikube is slow and unresponsive

Today, seemingly at random, minikube started taking a very long time to respond to commands sent via kubectl.
And occasionally it even fails:
kubectl get pods
Unable to connect to the server: net/http: TLS handshake timeout
How can I diagnose this?
Some output from minikube logs:
==> kube-scheduler <==
I0527 14:16:55.809859 1 serving.go:319] Generated self-signed cert in-memory
W0527 14:16:56.256478 1 authentication.go:387] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0527 14:16:56.256856 1 authentication.go:249] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W0527 14:16:56.257077 1 authentication.go:252] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W0527 14:16:56.257189 1 authorization.go:177] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0527 14:16:56.257307 1 authorization.go:146] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I0527 14:16:56.264875 1 server.go:142] Version: v1.14.1
I0527 14:16:56.265228 1 defaults.go:87] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
W0527 14:16:56.286959 1 authorization.go:47] Authorization is disabled
W0527 14:16:56.286982 1 authentication.go:55] Authentication is disabled
I0527 14:16:56.286995 1 deprecated_insecure_serving.go:49] Serving healthz insecurely on [::]:10251
I0527 14:16:56.287397 1 secure_serving.go:116] Serving securely on
I0527 14:16:57.417028 1 controller_utils.go:1027] Waiting for caches to sync for scheduler controller
I0527 14:16:57.524378 1 controller_utils.go:1034] Caches are synced for scheduler controller
I0527 14:16:57.827438 1 leaderelection.go:217] attempting to acquire leader lease kube-system/kube-scheduler...
E0527 14:17:10.865448 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-scheduler: Get https://localhost:8443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0527 14:17:43.418910 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-scheduler: Get https://localhost:8443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I0527 14:18:01.447065 1 leaderelection.go:227] successfully acquired lease kube-system/kube-scheduler
I0527 14:18:29.044544 1 leaderelection.go:263] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
E0527 14:18:38.999295 1 server.go:252] lost master
E0527 14:18:39.204637 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-scheduler: Get https://localhost:8443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
lost lease
To work around this issue I just ran minikube delete and minikube start, and the performance issue was resolved.
As a solution has been found, I am posting this as Community Wiki for future users.
1) Debug issues with minikube by adding the -v flag and setting the debug level (0, 1, 2, 3, 7).
For example: minikube start --v=1 to set the output to INFO level.
More detailed information here
2) Use the minikube logs command.
3) Because Minikube runs in a virtual machine, it is sometimes better to delete minikube and start it again (this helped in this case).
minikube delete
minikube start
4) It might get slow due to lack of resources.
By default, Minikube uses 2048MB of memory and 2 CPUs. More details about this can be found here
In addition, you can make Minikube allocate more resources using the command
minikube start --cpus 4 --memory 8192
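
To go with step 2: the output of minikube logs can be long, so it may help to pull out only the lines that mention timeouts. Here is a rough sketch, assuming the log line layout shown in the question (severity letter plus date, time, process id, source file and line, then the message). The regular expression and the marker strings are my own choice, not an official format definition.

```python
import re

# Matches lines such as:
# E0527 14:17:10.865448 1 leaderelection.go:306] error retrieving resource lock ...
LOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+) (?P<source>[^\s\]]+)\] (?P<message>.*)$"
)

# Substrings that indicate a timeout in the logs above.
TIMEOUT_MARKERS = (
    "Client.Timeout exceeded",
    "context deadline exceeded",
    "TLS handshake timeout",
)


def timeout_lines(lines):
    """Return (time, source, severity) for every log line that mentions a timeout."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and any(marker in match["message"] for marker in TIMEOUT_MARKERS):
            hits.append((match["time"], match["source"], match["severity"]))
    return hits
```

Many hits clustered within a short time span, as with the leaderelection.go entries in the question, suggest that the API server inside the VM is struggling to respond, which fits the lack-of-resources explanation in step 4.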