Kubernetes: unable to create impersonator account (i/o timeout)

Hosting: Azure CentOS VMs running RKE1
Rancher Version: v2.6.2
Kubernetes Version: 1.18.6
Looking for help diagnosing this issue. I get two error messages:
From Rancher:
Cluster health check failed: Failed to communicate with API server during namespace check: Get "https://<NODE_IP>:6443/api/v1/namespace/kube-system?timeout=45s": write tcp 172.16.0.2:443 -> <NODE_IP>:7832:i/o timeout
From kubectl:
unable to create impersonator account: error setting up impersonation for user user-sd7q9: Put "https://<NODE_IP>:6443/apis/rbac.authorization.k8s.io/v1/clusterroles/cattle-impersonation-user-sd7q9": write tcp 172.17.0.2:443-> <NODE_IP>:7832: i/o timeout
Nothing appears to be broken and my applications are still available via ingress.
https://github.com/rancher/rancher/issues/34671
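As a first diagnostic step, a minimal Python sketch like the following can check from the Rancher server host whether <NODE_IP>:6443 is reachable over plain TCP at all, and whether an attempt times out (as in both errors) or is actively refused. It assumes Python 3 on that host; the function name probe is illustrative. Both errors fail on a write over an already-established connection, so a successful connect here does not by itself rule the problem out.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Open a plain TCP connection and classify the outcome.

    "ok"       -> the TCP handshake completed
    "timeout"  -> nothing answered within `timeout` seconds (the i/o timeout symptom)
    "refused"  -> the host actively rejected the port
    "error: …" -> any other socket error (DNS failure, no route, ...)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # Checked first: timeouts are also OSError subclasses.
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
```

Call it as probe("<NODE_IP>", 6443) with the real node IP. A "timeout" points at filtering or routing between the hosts (for example an Azure network security group), while "refused" means the node answered but nothing is listening on that port.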

Related

Jenkins Kubernetes slaves are offline

I'm currently trying to run a Jenkins build on top of a Kubernetes minikube 2-node cluster. This is the code that I am using: https://github.com/rsingla2012/docker-development-youtube-series-youtube-series/tree/main/jenkins. Every time I run the build, I get an error that the slave is offline. This is the output of "kubectl get all -o wide -n jenkinsonkubernetes2" after I apply the files:
[Image: command-line output]
The Jenkins logs below show that Jenkins is able to spin up and provision a slave pod, but as soon as the container runs (in this case I'm using the inbound-agent image, although the container is named jnlp), the pod is terminated and deleted, and another one is created.
[Image: Jenkins logs, https://i.stack.imgur.com/mudPi.png]
I also added a Jenkins logger for org.csanchez.jenkins.plugins.kubernetes at log level ALL; its output is shown below.
[Image: Kubernetes plugin logs]
This led me to believe it might be a network issue or a firewall blocking the port, so I checked with netstat: Jenkins was listening on 0.0.0.0:8080, but nothing was listening on port 50000. I opened port 50000 with an inbound rule in Windows 10, but after running the build it is still not listening. For reference, I also created a NodePort for the service and port-forwarded the master pod to port 32767, so the Jenkins UI is accessible at 127.0.0.1:32767.
I expected opening the port to fix the issue, but when I double-checked with Microsoft Telnet using the command "open 127.0.0.1 50000", I received the error "Connecting To 127.0.0.1...Could not open connection to the host, on port 50000: Connect failed".
I also suspected that the missing server certificate for accessing the Kubernetes API from Jenkins was causing the problem, so I added the Kubernetes server certificate key to the Kubernetes cloud configuration, but I still receive the same error. My Kubernetes URL is set to https://kubernetes.default:443, the Jenkins URL is http://jenkins, and I'm using the Jenkins tunnel jenkins:50000 with no concurrency limit.
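With the tunnel set to jenkins:50000, the inbound agent inside the pod dials host jenkins (presumably the Service of that name, resolved by cluster DNS) on port 50000; it does not connect to 127.0.0.1:50000 on the Windows host, so the telnet test and the Windows firewall rule do not exercise the agent's path. The sketch below, with a hypothetical helper agent_endpoint, makes that resolution explicit. It assumes either side of the HOST:PORT value may be left empty, falling back to the Jenkins URL host and to port 50000.

```python
from typing import Tuple
from urllib.parse import urlparse

# Assumption: the fixed inbound-agent port configured in this setup.
DEFAULT_AGENT_PORT = 50000


def agent_endpoint(jenkins_url: str, tunnel: str = "") -> Tuple[str, int]:
    """Return the (host, port) an inbound agent would dial.

    `tunnel` is a HOST:PORT value such as "jenkins:50000"; either side may be
    empty, in which case the Jenkins URL's host or DEFAULT_AGENT_PORT is used.
    """
    url_host = urlparse(jenkins_url).hostname or ""
    if ":" in tunnel:
        host, port = tunnel.rsplit(":", 1)
    else:
        host, port = tunnel, ""
    return (host or url_host, int(port) if port else DEFAULT_AGENT_PORT)
```

For this configuration, agent_endpoint("http://jenkins", "jenkins:50000") yields ("jenkins", 50000), so the Service named jenkins has to expose port 50000 in addition to 8080; a ClusterIP Service only forwards the ports listed in its spec.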

Terraform dial tcp 192.xx.xx.xx:443: i/o timeout error

I am trying to implement CI/CD using GitLab + Terraform against a K8s cluster; the K8s control plane (master node) was set up on CentOS.
However, the pipeline job fails with the following error:
Error: Failed to get existing workspaces: Get "https://192.xx.xx.xx/api/v1/namespaces/default/secrets?labelSelector=tfstate%3Dtrue": dial tcp 192.xx.xx.xx:443: i/o timeout
From the error above (default/secrets?labelSelector=tfstate%3Dtrue), I assume the problem is a missing 'terraform secret' in the default namespace.
Example (Terraform secret as seen from my Windows machine):
PS C:\> kubectl get secret
NAME                    TYPE                                  DATA   AGE
default-token-7mzv6     kubernetes.io/service-account-token   3      27d
tfstate-default-state   Opaque                                1      15h
However, I am not sure which process creates the 'tfsecret', or whether we should create it manually.
Please let me know if my understanding is wrong or if I have missed anything else.
EDIT
The issue above occurred because the existing GitLab Runner was on a different subnet (e.g. 172.xx.xx.xx instead of 192.xx.xx.xx).
I was asked to use a different GitLab Runner on the same subnet, and now it throws the following error:
Error: Failed to get existing workspaces: Get "https://192.xx.xx.xx:6443/api/v1/namespaces/default/secrets?labelSelector=tfstate%3Dtrue": x509: certificate signed by unknown authority
Now I am a bit confused about whether the certificate issue is between the GitLab Runner and the GitLab server, between the GitLab server and the K8s cluster, or something else.
You have configured Kubernetes as the remote state backend for your Terraform configuration. The error occurs because the backend queries existing secrets to determine which workspaces are configured. The "x509: certificate signed by unknown authority" error indicates that the KUBECONFIG used by the remote state backend does not match the CA of the API server you're connecting to.
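Both halves of that request can be checked with a short sketch. Per the Terraform kubernetes backend documentation, the backend itself creates the state secret, naming it tfstate-{workspace}-{secret_suffix}, and it discovers workspaces by listing secrets that match the label selector tfstate=true (visible, URL-encoded, in the error). So the secret normally does not need to be created by hand. The helper names below are illustrative; the URL and secret name come from the question.

```python
from urllib.parse import parse_qs, urlparse


def label_selector(url: str) -> str:
    """Decode the labelSelector query parameter of a Kubernetes API request URL."""
    return parse_qs(urlparse(url).query)["labelSelector"][0]


def state_secret_name(workspace: str, secret_suffix: str) -> str:
    """Secret name format documented for the Terraform kubernetes backend."""
    return f"tfstate-{workspace}-{secret_suffix}"


failing_url = ("https://192.xx.xx.xx/api/v1/namespaces/default/secrets"
               "?labelSelector=tfstate%3Dtrue")
print(label_selector(failing_url))             # tfstate=true
print(state_secret_name("default", "state"))   # tfstate-default-state
```

The second line reproduces tfstate-default-state from the kubectl output in the question, which corresponds to the default workspace with secret_suffix = "state".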
If the runners are K8s pods themselves, make sure you provide a KUBECONFIG that matches your target cluster, and that the remote state backend does not configure itself as in-cluster by reading the service account token every K8s pod has; in most cases that token only works for the cluster the pod is running on.
You haven't provided enough information for me to be more specific, but in the big picture you have to configure both the state backend and any provider that connects to K8s. The state backend secrets and the K8s resources do not have to live on the same cluster, so you may need separate configuration for the state backend and for the K8s providers.
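To tell which hop the x509 error comes from, one option is to test the API server's certificate directly against the CA in the kubeconfig the runner uses (its certificate-authority-data, base64-decoded into a PEM file). The following is a minimal sketch, assuming Python 3 on the runner; check_api_server_ca and the PEM path are hypothetical. When cafile is given, ssl.create_default_context trusts only that CA, so "untrusted" corresponds to the "certificate signed by unknown authority" case (it also covers a certificate that does not list the IP you connect to).

```python
import socket
import ssl
from typing import Optional


def check_api_server_ca(host: str, port: int = 6443,
                        cafile: Optional[str] = None,
                        timeout: float = 5.0) -> str:
    """TLS-handshake with the API server, trusting only `cafile` if given.

    "ok"          -> the server certificate chains to the CA and matches `host`
    "untrusted"   -> certificate verification failed (unknown CA or name mismatch)
    "tls-error"   -> some other TLS failure
    "unreachable" -> the TCP connection failed or timed out
    """
    ctx = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return "ok"
    # Order matters: SSLCertVerificationError < SSLError < OSError.
    except ssl.SSLCertVerificationError:
        return "untrusted"
    except ssl.SSLError:
        return "tls-error"
    except OSError:
        return "unreachable"
```

Running check_api_server_ca("192.xx.xx.xx", 6443, cafile="/tmp/cluster-ca.pem") from the runner (with the real IP and the CA extracted from its kubeconfig) isolates the runner-to-cluster hop: "untrusted" means that kubeconfig's CA does not match the cluster.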

Airflow UI log error after deploying on Kubernetes with Celery workers

After deploying the Bitnami Helm chart for Airflow on a Kubernetes cluster, everything works except logging: the task logs are unreachable.
It turns out that the Helm chart used for the deployment relies on a headless service for communication with the Celery workers, and the UI is not able to show me the logs.
I have set the hostname_callable setting correctly, yet the logs always pick up the name of the headless service as their hostname rather than the DNS name.
*** Log file does not exist: /opt/bitnami/airflow/logs/secondone/s3files/2020-06-19T10:35:00+00:00/1.log
*** Fetching from: http://mypr-afw-worker-1.mypr-afw-headless.mynamespace.svc.cluster.local:8793/log/secondone/s3files/2020-06-19T10:35:00+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='mypr-afw-worker-1.mypr-afw-headless.mynamespace.svc.cluster.local', port=8793): Max retries exceeded with url: /log/secondone/s3files/2020-06-19T10:35:00+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f12917f5630>: Failed to establish a new connection: [Errno 111] Connection refused',))
Any help would be appreciated. Thanks!
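The "*** Fetching from:" line shows that the webserver builds the log URL from the hostname the worker recorded, plus port 8793. The sketch below rebuilds that URL (the helper names are illustrative); calling hostname_candidates() inside the worker pod shows the two names a getfqdn- or gethostname-style hostname_callable would typically return. Since the error is "Connection refused" rather than a name-resolution failure, the name apparently did resolve, so it may also be worth checking that the worker's log server is listening on 8793.

```python
import socket


def worker_log_url(hostname: str, relative_path: str, port: int = 8793) -> str:
    """Rebuild the URL the webserver fetches a task log from, mirroring the
    "*** Fetching from:" line: http://<worker hostname>:<port>/log/<path>."""
    return f"http://{hostname}:{port}/log/{relative_path}"


def hostname_candidates() -> dict:
    """Names this machine could report, depending on the hostname callable.

    Note: getfqdn() may do a reverse DNS lookup and can be slow.
    """
    return {"gethostname": socket.gethostname(), "getfqdn": socket.getfqdn()}
```

With the values from the log, worker_log_url("mypr-afw-worker-1.mypr-afw-headless.mynamespace.svc.cluster.local", "secondone/s3files/2020-06-19T10:35:00+00:00/1.log") reproduces the exact URL the webserver tried.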
How are you setting the hostname? It seems you need to pass the hosts as an array:
## The list of hostnames to be covered with this ingress record.
## Most likely this will be just one host, but in the event more hosts are needed, this is an array
##
hosts:
  - name: airflow.local
    path: /
or pass --set ingress.hosts[0].name=airflow.local --set ingress.hosts[0].path=/ to the helm install command.

Error creating: Internal error occurred: failed calling webhook "validator.trow.io" installing Ceph with Helm on Kubernetes

I'm trying to install Ceph using Helm on Kubernetes, following this tutorial: "install ceph".
The problem is probably that I installed the Trow registry beforehand, because as soon as I run the helm step
helm install --name=ceph local/ceph --namespace=ceph -f ~/ceph-overrides.yaml
I get this error in the ceph namespace:
Error creating: Internal error occurred: failed calling webhook "validator.trow.io": Post https://trow.kube-public.svc:443/validate-image?timeout=30s: dial tcp 10.102.137.73:443: connect: connection refused
How can I solve this?
Apparently your presumption is right. I have a few thoughts on this issue.
The Trow registry manager controls which images run in the cluster by implementing admission webhooks that validate every request before an image is pulled, and as far as I can see, Docker Hub images are not accepted by default:
The default policy will allow all images local to the Trow registry to be used, plus Kubernetes system images and the Trow images themselves.
All other images are denied by default, including Docker Hub images.
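The quoted default policy can be illustrated with a small sketch. This is not Trow's implementation: the registry host trow.kube-public:31000 and the allowed prefixes in the example are hypothetical placeholders standing in for "Kubernetes system images and the Trow images". It uses Docker's rule for deciding whether an image reference names a registry, which is why an unqualified image such as a Ceph image pulled from Docker Hub would presumably fall into the denied group.

```python
from typing import Iterable


def registry_of(image: str) -> str:
    """Registry host of an image reference, using Docker's rule: the first
    path component is a registry only if it contains '.' or ':' or is
    'localhost'; otherwise the image comes from Docker Hub."""
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def allowed_by_default(image: str, trow_registry: str,
                       extra_prefixes: Iterable[str] = ()) -> bool:
    """Illustrative version of the quoted policy (not Trow's code): allow
    images from the Trow registry itself plus whitelisted prefixes, and
    deny everything else, including Docker Hub images."""
    if registry_of(image) == trow_registry:
        return True
    return any(image.startswith(p) for p in extra_prefixes)
```

For example, allowed_by_default("ceph/daemon:latest", "trow.kube-public:31000") is False because the image resolves to docker.io.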
Because the Trow installation procedure may require you to distribute and approve a certificate in order to establish a secure HTTPS connection from the target node to the Trow server, I would suggest checking that the certificate is present on the node where you run the ceph-helm chart, as described in the Trow documentation.
The other option is to run the Trow registry manager with TLS disabled (plain HTTP), as described in the installation instructions.
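Deleting the stale ValidatingWebhookConfiguration should clean it up (kubectl get validatingwebhookconfigurations lists them, so you can find the one that registers validator.trow.io). For example: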
kubectl delete ValidatingWebhookConfiguration -n rook-ceph rook-ceph-webhook

Rancher v1.3.1 Kubernetes Dashboard not working

I installed Rancher v1.3.1 and enabled the Kubernetes environment. The install seems OK, but when I navigate to the Dashboard the result is a blank page. I checked, and two deployments, kubernetes-dashboard and tiller-deploy, restart every time with this log:
Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.43.0.1:443/version: dial tcp 10.43.0.1:443: i/o timeout
I don't know why. Please help me.
I also don't understand why the kubernetes service that exposes 10.43.0.1:443 belongs to a different namespace (default) than the others (kube-system).
Please try switching from https://10.43.0.1:443 to http://10.43.0.1 by editing the following arguments in the deployment:
args:
- --auto-generate-certificates
- --namespace=kubernetes-dashboard
# Uncomment the following line to manually specify Kubernetes API server Host
# If not specified, Dashboard will attempt to auto discover the API server and connect
# to it. Uncomment only if the default does not work.
- --apiserver-host=http://10.43.0.1
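The dashboard's startup check is a GET on /version (visible in the error), so it can be reproduced from inside a pod to compare https://10.43.0.1:443 with http://10.43.0.1 before and after the change. A minimal sketch, assuming Python 3 is available where you run it; api_version is a hypothetical helper. Over https the default context verifies the certificate, so a cluster-internal CA will produce a certificate error rather than a timeout. Any answer other than a timeout, including a 401/403 HTTPError for an unauthenticated request, shows that the network path itself works.

```python
import json
import urllib.request


def api_version(base_url: str, timeout: float = 5.0) -> str:
    """GET <base_url>/version and return its gitVersion field.

    Raises urllib.error.URLError (or a timeout) if the server cannot be reached,
    and urllib.error.HTTPError for non-2xx answers such as 401/403.
    """
    # Bypass any HTTP(S)_PROXY settings: service IPs like 10.43.0.1 are
    # normally only reachable directly from inside the cluster network.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(base_url.rstrip("/") + "/version", timeout=timeout) as resp:
        return json.load(resp)["gitVersion"]
```

For example, api_version("http://10.43.0.1") exercises the same address the --apiserver-host argument points at.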