gitlab: unable to access git repository: Operation timed out - kubernetes

Our registered GitLab Runner (on Kubernetes) was working fine, but after upgrading GitLab it can no longer clone projects. Does anyone have any idea what causes this?
Here is the log of the issue:
Running with gitlab-runner 14.9.0 (d1f69508)
on gitlab-runner-dev K5KVWdx-
Preparing the "kubernetes" executor
30:00
Using Kubernetes namespace: cicd
Using Kubernetes executor with image <docker-registry>:kuber_development ...
Using attach strategy to execute scripts...
Preparing environment
30:07
Waiting for pod cicd/runner-k5kvwdx--project-1227-concurrent-02kqgq to be running, status is Pending
Waiting for pod cicd/runner-k5kvwdx--project-1227-concurrent-02kqgq to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-k5kvwdx--project-1227-concurrent-02kqgq via gitlab-runner-85776bd9c6-rkdvl...
Getting source from Git repository
32:13
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/bigdata/search/query-processing-module/.git/
Created fresh repository.
fatal: unable to access '<git-repository>': Failed to connect to <gitlab-url> port 443 after 130010 ms: Operation timed out
Cleaning up project directory and file based variables
30:01
ERROR: Job failed: command terminated with exit code 1

Here is how I would debug this issue:
Make sure there are no NetworkPolicies present that restrict the egress of the pod.
If you are on a recent Kubernetes version, you can run an ephemeral debug container inside the pod to examine the networking situation (see the Kubernetes docs on ephemeral containers). Replace ephemeral-demo with the name of your pod and container:
kubectl debug -it ephemeral-demo --image=busybox:1.28 --target=ephemeral-demo
If not, you can try to get a shell inside your container and examine the situation from there, or you can start a pod on the same node and try to connect from there.
As soon as you have a shell inside a container where the connection fails, try to answer the following questions:
Can you connect to some other server?
Can you resolve the hostname?
Is the IP a private one that overlaps with some internal Kubernetes IP range?
Can you ping the IP?
If yes: can you curl the IP? If not, open another port on the target machine and check whether you can connect to that port. If you can, it is probably a firewall problem somewhere.
If no (you can't ping): it can be either firewall related or IP routing related.
I cannot say for sure what is wrong, but try the steps above and hopefully you will get some insight into where the problem is.
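If the container image has Python but no curl or ping, the questions above can be roughly approximated with the standard library. This is only a sketch under assumptions: the function names are made up for illustration, and the CIDR ranges you pass in must be your own cluster's service and pod ranges.

```python
import ipaddress
import socket


def resolve(host):
    """Return the sorted IP addresses the hostname resolves to (empty list on failure)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def overlaps_cluster_range(ip, cluster_cidrs):
    """True if the IP falls inside any of the given cluster-internal CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cluster_cidrs)


def can_connect(host, port, timeout=5.0):
    """Try a plain TCP connect; True on success, False on timeout or refusal."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, resolve("<gitlab-url>") shows whether DNS works, overlaps_cluster_range(ip, ["10.96.0.0/12"]) flags a clash with an assumed service range, and can_connect("<gitlab-url>", 443) reproduces the failing connection from the job log with a shorter timeout.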

Related

Jenkins Kubernetes slaves are offline

I'm currently trying to run a Jenkins build on top of a Kubernetes minikube 2-node cluster. This is the code that I am using: https://github.com/rsingla2012/docker-development-youtube-series-youtube-series/tree/main/jenkins. Every time I run the build, I get an error that the slave is offline. This is the output of "kubectl get all -o wide -n jenkinsonkubernetes2" after I apply the files:
[image: cmd line logs]
Looking at the Jenkins logs below, Jenkins is able to spin up and provision a slave pod, but as soon as the container is run (in this case I'm using the inbound-agent image, although it's named jnlp), the pod is terminated and deleted and another is created.
[image: Jenkins logs, https://i.stack.imgur.com/mudPi.png]
I also added a new Jenkins logger for org.csanchez.jenkins.plugins.kubernetes at all levels, the log of which is shown below.
[image: kubernetes logs]
This led me to believe that it might be a network issue or a firewall blocking the port, so I checked with netstat: although Jenkins was listening at 0.0.0.0:8080, nothing was listening on port 50000. So I opened port 50000 with an inbound rule on Windows 10, but after running the build, it's still not listening. For reference, I also created a node port for the service and port-forwarded the master pod to port 32767, so that the Jenkins UI is accessible at 127.0.0.1:32767. I believed opening the port would fix the issue, but when I used Microsoft Telnet to double check, I received the error "Connecting To 127.0.0.1...Could not open connection to the host, on port 50000: Connect failed" with the command "open 127.0.0.1 50000". One thing I thought might be causing the problem was the lack of a server certificate when accessing the Kubernetes API from Jenkins, so I added the Kubernetes server certificate key to the Kubernetes cloud configuration, but I'm still receiving the same error. My Kubernetes URL is set to https://kubernetes.default:443, the Jenkins URL is http://jenkins, and I'm using the Jenkins tunnel jenkins:50000 with no concurrency limit.

Kubectl connection refused existing cluster

Hope someone can help me.
To describe the situation in short, I have a self managed k8s cluster, running on 3 machines (1 master, 2 worker nodes). In order to make it HA, I attempted to add a second master to the cluster.
After some failed attempts, I found out that I needed to add controlPlaneEndpoint configuration to kubeadm-config config map. So I did, with masternodeHostname:6443.
I generated the certificate and join command for the second master, and after running it on the second master machine, it failed with
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
Checking the first master now, I get connection refused for the IP on port 6443. So I cannot run any kubectl commands.
Tried recreating the .kube folder, with all the config copied there, no luck.
Restarted kubelet, docker.
The containers running on the cluster seem ok, but I am locked out of any cluster configuration (dashboard is down, kubectl commands not working).
Is there any way I can make it work again without losing any of the configuration or the deployments already present?
Thanks! Sorry if it’s a noob question.
Cluster information:
Kubernetes version: 1.15.3
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: RHEL 7
CNI and version: weave 0.3.0
CRI and version: containerd 1.2.6
This is an old, known problem with Kubernetes 1.15 [1,2].
It is caused by a short etcd timeout period. As far as I'm aware, it is a hard-coded value in the source and cannot be changed (a feature request to make it configurable is open for version 1.22).
Your best bet would be to upgrade to a newer version and recreate your cluster.

Unable to connect to the server: net/http: TLS handshake timeout

On minikube for Windows, I created a deployment on the Kubernetes cluster and then tried to scale it by changing replicas from 1 to 2. After that, kubectl hangs and my disk usage is at 100%.
I only have one container in my deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: first-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      run: app
  template:
    metadata:
      labels:
        run: app
    spec:
      containers:
      - name: demo
        image: ner_app
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5000
All I did was run this after the pods were successfully deployed and running:
kubectl scale --replicas=2 deployment first-deployment
In another terminal I was watching the pods using
kubectl get pods --watch
But everything is unresponsive and I'm not sure how to recover from this.
When I run kubectl get pods again it gives the following message
PS D:\docker\ner> kubectl get pods
Unable to connect to the server: net/http: TLS handshake timeout
Is there a way to recover, or cancel whatever process is running?
Also, my VMs are on Hyper-V for Windows 10 Pro (minikube and Docker Desktop), and both have the default RAM allocated: 2048 MB.
The container in my pod is a machine learning process and the model it loads could be large, in the order of 200MB to 300MB
You may have some proxy problems. Try the following commands:
$ unset http_proxy
$ unset https_proxy
and repeat your kubectl call.
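To see which proxy settings are active before unsetting them, the same idea can be sketched in Python. This is an illustrative sketch only: the helper names and the list of variable names are assumptions, not part of kubectl or minikube.

```python
import os

# Proxy-related variable names to look for (assumed list; matched case-insensitively).
PROXY_VARS = {"http_proxy", "https_proxy", "all_proxy"}


def without_proxy_vars(env):
    """Return a copy of the environment mapping with proxy variables removed."""
    return {k: v for k, v in env.items() if k.lower() not in PROXY_VARS}


def active_proxy_vars(env):
    """Return the sorted names of proxy variables that are currently set."""
    return sorted(k for k in env if k.lower() in PROXY_VARS)


print("proxy variables set:", active_proxy_vars(os.environ))
```

Passing env=without_proxy_vars(os.environ) to subprocess.run(["kubectl", "get", "pods"], ...) would run a single call without the proxy while leaving the shell's own settings untouched.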
For me, the problem was that Docker ran out of memory. (EDIT: Possibly, anyway; I wrote this post a while ago and am now not so sure that is the root cause, but I did not write down my rationale, so idk.)
Anyway, to fix:
Fully close your k8s emulator. (docker desktop, minikube, etc.)
Shutdown WSL2. (wsl --shutdown) [EDIT: This step is apparently not necessary -- at least not always, since this time I skipped it, and the problem still resolved.]
Restart your k8s emulator.
Rerun the commands you wanted.
Sometimes it also works to simply:
Right click the Docker Desktop tray-icon, press "Restart Docker", and wait a few minutes for things to restart. (sometimes this fails, with Docker Desktop saying "Docker failed to start", so I'd generally recommend the more thorough process above)
Just happened to me on a new Windows 10 install with Ubuntu distro in WSL2. I solved the problem by running:
$ sudo ifconfig eth0 mtu 1350
(BTW, I was on a VPN connection when trying the 'kubectl get pods' command)
You can set up resource limits on deployments so that pods will not use the entire available resource in the node.
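As a rough illustration of that suggestion, the sketch below adds memory requests and limits to the Deployment from the question and prints it as JSON, which kubectl apply -f also accepts. The helper name and the 512Mi/1Gi values are assumptions chosen for the example, not recommendations for this workload.

```python
import copy
import json


def with_memory_limits(deployment, request, limit):
    """Return a copy of the Deployment with memory requests/limits on every container."""
    result = copy.deepcopy(deployment)
    for container in result["spec"]["template"]["spec"]["containers"]:
        resources = container.setdefault("resources", {})
        resources.setdefault("requests", {})["memory"] = request
        resources.setdefault("limits", {})["memory"] = limit
    return result


# The Deployment from the question, expressed as a Python dict.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "first-deployment"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"run": "app"}},
        "template": {
            "metadata": {"labels": {"run": "app"}},
            "spec": {
                "containers": [
                    {
                        "name": "demo",
                        "image": "ner_app",
                        "imagePullPolicy": "IfNotPresent",
                        "ports": [{"containerPort": 5000}],
                    }
                ]
            },
        },
    },
}

# Assumed values: adjust to what the model actually needs and the VM can spare.
print(json.dumps(with_memory_limits(deployment, "512Mi", "1Gi"), indent=2))
```

With a limit in place, a container that exceeds it is killed and restarted rather than exhausting the memory of the whole 2048 MB VM.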
In my case I have a private EKS cluster, and port 443 (HTTPS) was not enabled in the security groups.
My issue was solved after enabling port 443 (HTTPS) in the security groups.
Please refer to the AWS documentation for more details: "You must ensure that your Amazon EKS control plane security group contains rules to allow ingress traffic on port 443 from your connected network"
I solved this problem by executing the following command:
minikube delete
and then starting it again:
minikube start --vm-driver="virtualbox"
Note that if you do it this way, your pods will be deleted, and when you run kubectl get pods you will see this result:
No resources found in default namespace.
You could try $ unset all_proxy to reset the socket proxy.
Also, if you're connected to a VPN, try disconnecting - it seems that can interfere with connecting to a cluster.
I think the other answers don't really mention or refer to the vpn and proxy documentation for minikube: https://minikube.sigs.k8s.io/docs/handbook/vpn_and_proxy/
The NO_PROXY variable here is important: without setting it, minikube may not be able to access resources within the VM. minikube uses the following IP ranges, which should not go through the proxy:
192.168.99.0/24: Used by the minikube VM. Configurable for some hypervisors via --host-only-cidr
192.168.39.0/24: Used by the minikube kvm2 driver.
192.168.49.0/24: Used by the minikube docker driver's first cluster.
10.96.0.0/12: Used by service cluster IPs. Configurable via --service-cluster-ip-range
So adding those IP ranges to your NO_PROXY environment variable should fix the issue.
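A minimal sketch of building that value, assuming the default ranges listed above are the ones your driver uses. The helper name merge_no_proxy is made up for illustration, and whether CIDR notation is honoured depends on the client reading NO_PROXY.

```python
import os

# Default minikube ranges as listed above; adjust if you changed the CIDRs.
MINIKUBE_RANGES = [
    "192.168.99.0/24",
    "192.168.39.0/24",
    "192.168.49.0/24",
    "10.96.0.0/12",
]


def merge_no_proxy(existing, extra):
    """Append entries to a comma-separated NO_PROXY value, skipping blanks and duplicates."""
    entries = [e.strip() for e in existing.split(",") if e.strip()]
    for item in extra:
        if item not in entries:
            entries.append(item)
    return ",".join(entries)


value = merge_no_proxy(os.environ.get("NO_PROXY", ""), MINIKUBE_RANGES)
print(f"export NO_PROXY={value}")
```

The printed line can be pasted into the shell before running minikube start. Entries already present in NO_PROXY are kept in their original order.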
Simply closing cmd, opening again, then
minikube start
And then executing the commands again solved this issue for me.
P.S: minikube start took less than a minute
Adding the IP address to the no_proxy list worked for me.
Obtain the IP address from ip addr output.
export no_proxy=localhost,127.0.0.1,<IP_ADDRESS>
After that, restarting minikube should work.
But if you don't want to delete it, you can just switch to another cluster and then switch back.
I just click on another Kubernetes cluster (e.g. docker-desktop)
and then click back to the cluster I want to run (e.g. minikube).
If you're on Linux or Mac, open VirtualBox and choose 'Global Tools' on the toolbar. If you see two machines using the same IP address, remove one of them. [image: VirtualBox GUI]
Since this question comes up first when searching for the net/http TLS handshake timeout error:
For those having this issue with AWS EKS (and likely any K8s), adding the related IP/host to the NO_PROXY environment variable solves the problem, as suggested in the comments on the first answer.
For AWS EKS (when seeing this intermittently after a vpc-cni addon upgrade), replace the value with your specific region or a single URL for your use case. Note that NO_PROXY entries are separated by commas:
NO_PROXY=$NO_PROXY,eks.amazonaws.com
At least for Windows 10 and 11
$PS C:\oc rollback dc/my-app
Unable to connect to the server: net/http: TLS handshake timeout
For OpenShift 4.x, the problem is that for some reason you have been logged out:
$PS C:\oc status
error: You must be logged in to the server (Unauthorized)
Logging in, e.g. with
$oc login -u developer
resolves the problem.
Open PowerShell as an administrator and run the command "wsl --shutdown". You will see the same notification in your open Ubuntu terminal.
Open Docker Desktop.
Open a new terminal.
Run the command "minikube status" in the Ubuntu terminal.
Run the Minikube container. You can do this in Docker Desktop.
Run the command "minikube start".
That's it! You don't need to close your computer after this, and Minikube should work fine.

k8s, RabbitMQ, and Peer Discovery

We are trying to run an instance of the RabbitMQ chart with Helm from the helm/charts/stable/rabbit project. I had it running perfectly, but then I had to restart k8s for some maintenance. Now we are completely unable to launch the RabbitMQ chart in any way, shape, or form. I am not even trying to run the chart with any custom variables, i.e. just the default values.
Here is all I am doing:
helm install stable/rabbitmq
I have confirmed I can simply run the default right on my local k8s which I'm running with Docker for Desktop. When we run the rabbit chart on our shared k8s the exact same way as on desktop and what we did before the restart, the following error is thrown:
Failed to get nodes from k8s - 503
I have also posted an issue on the Helm charts repo on GitHub.
We suspect DNS but are unable to confirm anything yet. What is very frustrating is that after the restart every other chart we installed restarted perfectly, except Rabbit, which now will not start at all.
Does anyone know what I could do to get Rabbit's peer discovery to work? Has anyone seen an issue like this after restarting k8s?
So I actually got Rabbit to run. It turns out my issue was that the k8s peer discovery could not connect over the default port 443, and I had to use the external port 6443, because kubernetes.default.svc.cluster.local resolved to the public port and could not find the internal one. So yes, our config is messed up too.
It took me a while to realize the variable below was not overriding when I overrode it with helm install . -f server-values.yaml.
rabbitmq:
  configuration: |-
    ## Clustering
    cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    cluster_formation.k8s.port = 6443
    cluster_formation.node_cleanup.interval = 10
    cluster_formation.node_cleanup.only_log_warning = true
    cluster_partition_handling = autoheal
    # queue master locator
    queue_master_locator=min-masters
    # enable guest user
    loopback_users.guest = false
I had to add cluster_formation.k8s.port = 6443 to the main values.yaml file instead of my own. Once the port was changed specifically in the values.yaml, rabbit started right up.
I'm wondering what the reason is for using the rabbit_peer_discovery_k8s plugin if values.yaml defaults to 1 replica (your manifest file does not override this setting)?
I tried to reproduce your issue with the override values you provided (dev-server.yaml), as per the details in your GitHub issue #10811, but I did not quite succeed. Here are my observations:
If I install the RabbitMQ chart with your custom values, my rabbitmq-dev-default-0 pod gets stuck in the CrashLoopBackOff state.
It's quite hard for me to troubleshoot it further, as Bitnami's rabbitmq image containers, used by this rabbitmq Helm chart, are shipped with a non-root account.
On the other hand, if the rabbitmq chart is installed on my Kubernetes cluster (v1.13.2) in its simplest form:
helm install stable/rabbitmq
then I observe a similar issue. I mean the rabbitmq server survives a simulated VM restart of all cluster nodes (including master), but I cannot connect to it from outside.
After the VM restart, I'm getting the following error from my Python mqclient:
socket.gaierror: [Errno -2] Name or service not known
A few remarks here:
Yes, I did forward the port(s) as per the instructions from the "helm status" command:
The readiness probe works fine:
curl -sS -f --user user:<my_pwd> 127.0.0.1:15672/api/healthchecks/node
{"status":"ok"}
rabbitmqctl to rabbitmq-server connectivity from inside the container works fine too:
kubectl exec rabbitmq-dev-default-0 -- rabbitmqctl list_queues
warning: the VM is running with native name encoding of latin1 which may cause Elixir to malfunction as it expects utf8. Please ensure your locale is set to UTF-8 (which can be verified by running "locale" in your shell)
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
name messages
hello 11
Once I used kubectl port-forward to the pod instead of the service, connectivity to the rabbitmq server was restored:
kubectl port-forward --namespace default pod/rabbitmq-dev-default-0 5672:5672
$ python send.py
[x] Sent 'Hello World!'

Rancher v1.3.1 Kubernetes Dashboard not working

I tried to install Rancher v1.3.1 and enable the Kubernetes environment. The install seems OK, but when I navigate to the Dashboard the result is a blank page. I checked, and 2 deployments, kubernetes-dashboard and tiller-deploy, restart every time with this log:
Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.43.0.1:443/version: dial tcp 10.43.0.1:443: i/o timeout
I don't know why; please help me.
I also don't know why the kubernetes service that exposes 10.43.0.1:443 belongs to a different namespace (default) than the others (kube-system).
Please try switching from using https://10.43.0.1:443 to http://10.43.0.1 by editing the following in the deployment.
args:
  - --auto-generate-certificates
  - --namespace=kubernetes-dashboard
  # Uncomment the following line to manually specify Kubernetes API server Host
  # If not specified, Dashboard will attempt to auto discover the API server and connect
  # to it. Uncomment only if the default does not work.
  - --apiserver-host=http://10.43.0.1