Jenkins Kubernetes slaves are offline - kubernetes

I'm currently trying to run a Jenkins build on top of a Kubernetes minikube 2-node cluster. This is the code that I am using: https://github.com/rsingla2012/docker-development-youtube-series-youtube-series/tree/main/jenkins. Every time I run the build, I get an error that the slave is offline. This is the output of "kubectl get all -o wide -n jenkinsonkubernetes2" after I apply the files:
(Screenshot: command-line output)
Looking at the Jenkins logs below, Jenkins is able to spin up and provision a slave pod, but as soon as the container runs (in this case I'm using the inbound-agent image, although it's named jnlp), the pod is terminated and deleted and another one is created.
(Screenshot: Jenkins logs, https://i.stack.imgur.com/mudPi.png)
I also added a new Jenkins logger for org.csanchez.jenkins.plugins.kubernetes at all levels, the log of which is shown below.
(Screenshot: Kubernetes plugin logs)
This led me to believe that it might be a network issue or a firewall blocking the port, so I checked with netstat: Jenkins was listening on 0.0.0.0:8080, but nothing was listening on port 50000. I opened port 50000 with an inbound rule in the Windows 10 firewall, but after running the build it is still not listening. For reference, I also created a NodePort for the service and port-forwarded the master pod to port 32767, so that the Jenkins UI is accessible at 127.0.0.1:32767. I believed opening the port would fix the issue, but when I double-checked with Microsoft Telnet using the command "open 127.0.0.1 50000", I received the error "Connecting To 127.0.0.1...Could not open connection to the host, on port 50000: Connect failed". I also thought the problem might be the lack of a server certificate when accessing the Kubernetes API from Jenkins, so I added the Kubernetes server certificate key to the Kubernetes cloud configuration, but I am still receiving the same error. My Kubernetes URL is set to https://kubernetes.default:443, the Jenkins URL is http://jenkins, and I'm using the Jenkins tunnel jenkins:50000 with no concurrency limit.

Related

gitlab: unable to access git repository: Operation timed out

Our registered GitLab runner (on Kubernetes) was working fine, but after upgrading GitLab it can't clone the projects anymore. Does anyone have any idea about this issue?
Here is the log of the issue:
Running with gitlab-runner 14.9.0 (d1f69508)
on gitlab-runner-dev K5KVWdx-
Preparing the "kubernetes" executor
30:00
Using Kubernetes namespace: cicd
Using Kubernetes executor with image <docker-registry>:kuber_development ...
Using attach strategy to execute scripts...
Preparing environment
30:07
Waiting for pod cicd/runner-k5kvwdx--project-1227-concurrent-02kqgq to be running, status is Pending
Waiting for pod cicd/runner-k5kvwdx--project-1227-concurrent-02kqgq to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-k5kvwdx--project-1227-concurrent-02kqgq via gitlab-runner-85776bd9c6-rkdvl...
Getting source from Git repository
32:13
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/bigdata/search/query-processing-module/.git/
Created fresh repository.
fatal: unable to access '<git-repository>': Failed to connect to <gitlab-url> port 443 after 130010 ms: Operation timed out
Cleaning up project directory and file based variables
30:01
ERROR: Job failed: command terminated with exit code 1
Here is how I would debug this issue:
Make sure there are no NetworkPolicies present that restrict the egress of the pod.
If you have a recent Kubernetes version that supports ephemeral containers, you can run an ephemeral debug container inside the pod to examine the networking situation (see the Kubernetes docs on ephemeral debug containers):
kubectl debug -it ephemeral-demo --image=busybox:1.28 --target=ephemeral-demo
If not, you can try to get a shell inside your container and examine the situation from there, or you can start a pod on the same node and try to connect from there.
As soon as you have a shell inside a container that cannot connect, try to answer the following questions:
Can you connect to some other server?
Can you resolve the hostname?
Is the IP a private one that overlaps with some internal Kubernetes IPs?
Can you ping the IP?
If yes (ping works): can you curl the IP? If not, open another port on the target machine and check whether you can connect to that port; if you can, the cause is probably a firewall somewhere.
If no (ping fails): the cause can be either firewall related or IP routing related.
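The resolve/connect part of that checklist can be sketched as a small script. This is only an illustration, assuming Python is available in the container you are debugging from (the busybox image above does not ship it); the function name `probe` and the result labels are made up for this example.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt to host:port.

    Returns one of: "dns_failure", "refused", "timeout", "unreachable", "ok".
    """
    try:
        # Step 1: can the hostname be resolved at all?
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    try:
        # Step 2: can a TCP connection be opened within the timeout?
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: typical for a firewall dropping packets or a routing problem.
        return "timeout"
    except OSError:
        # Anything else, such as "network is unreachable" or "no route to host".
        return "unreachable"
```

For the log above you would call something like `probe("<gitlab-url>", 443)`: a "timeout" result matches the "Operation timed out" message and points towards firewall or routing, while "dns_failure" points towards name resolution.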
I cannot say for sure what is wrong, but try the steps above and hopefully you will get some insight into where the problem is.

Unable to connect to the server: net/http: TLS handshake timeout

On minikube for windows I created a deployment on the kubernetes cluster, then I tried to scale it by changing replicas from 1 to 2, and after that kubectl hangs and my disk usage is 100%.
I only have one container in my deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: first-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      run: app
  template:
    metadata:
      labels:
        run: app
    spec:
      containers:
      - name: demo
        image: ner_app
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5000
All I did was run this after the pods were successfully deployed and running:
kubectl scale --replicas=2 deployment first-deployment
In another terminal I was watching the pods using
kubectl get pods --watch
But everything is unresponsive and I'm not sure how to recover from this.
When I run kubectl get pods again it gives the following message
PS D:\docker\ner> kubectl get pods
Unable to connect to the server: net/http: TLS handshake timeout
Is there a way to recover, or cancel whatever process is running?
Also, my VMs are on Hyper-V for Windows 10 Pro (minikube and Docker Desktop); both have the default RAM allocated (2048 MB).
The container in my pod is a machine learning process, and the model it loads could be large, on the order of 200 MB to 300 MB.
You may have some proxy problems. Try the following commands:
$ unset http_proxy
$ unset https_proxy
and repeat your kubectl call.
For me, the problem was that Docker ran out of memory. (EDIT: Possibly, anyway; I wrote this post a while ago and am now not so sure that is the root cause, but I did not write down my rationale, so I don't know.)
Anyway, to fix:
Fully close your k8s emulator. (docker desktop, minikube, etc.)
Shutdown WSL2. (wsl --shutdown) [EDIT: This step is apparently not necessary -- at least not always, since this time I skipped it, and the problem still resolved.]
Restart your k8s emulator.
Rerun the commands you wanted.
Sometimes it also works to simply:
Right click the Docker Desktop tray-icon, press "Restart Docker", and wait a few minutes for things to restart. (sometimes this fails, with Docker Desktop saying "Docker failed to start", so I'd generally recommend the more thorough process above)
Just happened to me on a new Windows 10 install with Ubuntu distro in WSL2. I solved the problem by running:
$ sudo ifconfig eth0 mtu 1350
(BTW, I was on a VPN connection when trying the 'kubectl get pods' command)
You can set up resource limits on deployments so that pods will not use the entire available resource in the node.
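As a rough sketch of what that could look like for the Deployment in the question: the script below builds a strategic-merge patch that adds memory requests and limits to one container. The container name "demo" comes from the question; the 256Mi/512Mi values and the helper name `limits_patch` are assumptions for illustration only, and should be sized to the actual model.

```python
import json


def limits_patch(container: str, memory_request: str, memory_limit: str) -> str:
    """Build a strategic-merge patch that sets memory requests/limits for one container."""
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Strategic merge matches list entries by container name.
                            "name": container,
                            "resources": {
                                "requests": {"memory": memory_request},
                                "limits": {"memory": memory_limit},
                            },
                        }
                    ]
                }
            }
        }
    }
    return json.dumps(patch, indent=2)


if __name__ == "__main__":
    # "demo" is the container name from the question's Deployment;
    # the 256Mi/512Mi figures are placeholder assumptions.
    print(limits_patch("demo", "256Mi", "512Mi"))
```

Saving the output to a file (say limits.json) and running kubectl patch deployment first-deployment --patch-file limits.json should apply it. With a 2048 MB VM, limits like these at least keep two replicas from consuming all of the node's memory.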
In my case, I have a private EKS cluster, and port 443 (HTTPS) was not enabled in the security groups.
My issue was solved after enabling port 443 (HTTPS) in the security groups.
Refer to the AWS documentation for more details: "You must ensure that your Amazon EKS control plane security group contains rules to allow ingress traffic on port 443 from your connected network"
I solved this problem by running the following command:
minikube delete
and then starting minikube again:
minikube start --vm-driver="virtualbox"
Note that if you do it this way, your pods will be deleted, so when you run kubectl get pods you will see this result:
No resources found in default namespace.
You could try $ unset all_proxy to reset the socket proxy.
Also, if you're connected to a VPN, try disconnecting - it seems that can interfere with connecting to a cluster.
I think the other answers don't really mention or refer to the vpn and proxy documentation for minikube: https://minikube.sigs.k8s.io/docs/handbook/vpn_and_proxy/
The NO_PROXY variable here is important: without setting it, minikube may not be able to access resources within the VM. minikube uses the following IP ranges, which should not go through the proxy:
192.168.99.0/24: Used by the minikube VM. Configurable for some hypervisors via --host-only-cidr
192.168.39.0/24: Used by the minikube kvm2 driver.
192.168.49.0/24: Used by the minikube docker driver's first cluster.
10.96.0.0/12: Used by service cluster IPs. Configurable via --service-cluster-ip-range
So adding those IP ranges to your NO_PROXY environment variable should fix the issue.
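A minimal sketch of assembling that value, using only the ranges quoted above. The helper names `build_no_proxy` and `is_covered` are invented for this example, and note that support for CIDR notation in NO_PROXY varies between tools, so treat this as a starting point rather than a guarantee.

```python
import ipaddress

# Default ranges quoted from the minikube VPN/proxy handbook above.
MINIKUBE_RANGES = [
    "192.168.99.0/24",
    "192.168.39.0/24",
    "192.168.49.0/24",
    "10.96.0.0/12",
]


def build_no_proxy(existing: str = "") -> str:
    """Append the minikube ranges to a comma-separated NO_PROXY value, skipping duplicates."""
    entries = [e.strip() for e in existing.split(",") if e.strip()]
    for cidr in MINIKUBE_RANGES:
        if cidr not in entries:
            entries.append(cidr)
    return ",".join(entries)


def is_covered(ip: str) -> bool:
    """Return True if ip falls inside one of the minikube ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in MINIKUBE_RANGES)
```

`build_no_proxy("localhost,127.0.0.1")` gives a string you can export as NO_PROXY, and `is_covered` lets you check whether the address reported by minikube ip is already inside one of the default ranges or needs to be added separately.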
Simply closing cmd, opening again, then
minikube start
And then executing the commands again solved this issue for me.
P.S: minikube start took less than a minute
Adding the IP address to the no_proxy list worked for me.
Obtain the IP address from ip addr output.
export no_proxy=localhost,127.0.0.1,<IP_ADDRESS>
Then restart minikube and it should work.
If you don't want to delete the cluster, you can instead switch to another cluster and then switch back.
I just clicked another Kubernetes cluster (e.g. docker-desktop)
and then clicked back to the cluster I wanted to use (e.g. minikube).
If you're on Linux or Mac, open VirtualBox and choose 'Global Tools' on the toolbar. If you see two machines using the same IP address, remove one of them. (Screenshot: VirtualBox GUI)
Adding this because this question comes up first when searching for the net/http TLS handshake timeout error.
For those having this issue with AWS EKS (and likely any Kubernetes cluster), adding the relevant IP or host to the NO_PROXY environment variable solves the problem, as suggested in the comments on the first answer.
For AWS EKS (when seeing this intermittently after a vpc-cni add-on upgrade), replace the value below with your specific region or a single URL for your use case. Entries in NO_PROXY are separated by commas:
NO_PROXY=$NO_PROXY,eks.amazonaws.com
At least on Windows 10 and 11:
PS C:\> oc rollback dc/my-app
Unable to connect to the server: net/http: TLS handshake timeout
For OpenShift 4.x, the problem is that for some reason you have been logged out:
PS C:\> oc status
error: You must be logged in to the server (Unauthorized)
Logging in again, e.g. with
oc login -u developer
resolves the problem.
Open PowerShell as an administrator and run the command "wsl --shutdown". You will see the same notification in your open Ubuntu terminal.
Open Docker Desktop.
Open a new terminal.
Run the command "minikube status" in the Ubuntu terminal.
Run the Minikube container. You can do this in Docker Desktop.
Run the command "minikube start".
That's it! You don't need to close your computer after this, and Minikube should work fine.

Azure Service Fabric Cluster returns nothing for code-versions and config-versions

In short: both the "sfctl cluster code-versions" and "sfctl cluster config-versions" return empty arrays. Is this a symptom of a problem with the cluster?
Background: I am attempting to follow the Create a Linux container app tutorial, for learning about Service Fabric; but I have run into a problem when the application upload fails with a timeout.
On investigating this, I found that the other sfctl cluster commands (e.g. sfctl cluster health) all worked and returned useful data - except code-versions and config-versions, which both return an empty array:
$ sfctl cluster code-versions
[]
$ sfctl cluster config-versions
[]
I'm not sure if that's unhealthy, or what kind of data they might be returning.
Other notes:
The cluster is secured with a self-signed certificate; this is installed locally and works correctly, but both the above commands also log a warning:
~/.local/lib/python3.5/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning)
However, the same warning is logged for the other commands (e.g. sfctl cluster health) and doesn't stop them from working.
The cluster is at version 6.4.634.1, on Linux
Service Fabric Explorer shows everything as Healthy: Cluster Health State, System Application Health State, and the 3 nodes.
The Azure portal shows the cluster status as "Baseline upgrade"
Explorer shows the cluster as having Code Version "0.0.0.0"

Unable to connect to the server: dial tcp [::1]:8080: connectex: No connection could be made because the target machine actively refused it

I am working with Azure Kubernetes Service, where we can store Docker images in Azure. When I try to check my kubectl version, I get:
Unable to connect to the server: dial tcp [::1]:8080: connectex: No connection could be made because the target machine actively refused it.
For this I followed MSDN: Building Microservices with AKS and VSTS – Part 2 and MS Docs: Kubernetes on Windows.
Can you please suggest how to resolve this issue?
I am on Windows 10, and in my case the cause was that I had not enabled Kubernetes in Docker Desktop.
Without it, no contexts are available. (screenshot omitted)
So go to the Docker Desktop settings and enable Kubernetes. (screenshot omitted)
Now run the following command:
kubectl config get-contexts
Ensure that a context is listed. (screenshot omitted)
You can also try listing the nodes:
kubectl get nodes
I think you may have missed configuring access to the cluster. For that, run the command below in your command prompt:
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
The above CLI command writes the cluster's connection details and credentials into the kubeconfig file (.kube/config in your user profile) on your local machine.
After that, run kubectl get nodes in your command prompt, and you should get the list of nodes in the cluster.
For reference, follow Deploy an Azure Kubernetes Service (AKS) cluster.
If you can see that your config file is correctly configured by going to $HOME/.kube/config - Linux or %UserProfile%/.kube/config - Windows but you are still receiving the error message - try running command line as an administrator.
More information on the config file can be found here: https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/
In my case, I was switching between an az aks Kubernetes cluster and the local docker-desktop cluster.
Every time I change the cluster context I need to restart Docker, otherwise I get the same kind of error:
Unable to connect to the server: dial tcp 127.0.0.1:6443: connectex: No connection could be made because the target machine actively refused it.
PS: make sure your cluster is started (in the original screenshot, Docker Desktop offers "Stop local cluster").
For me it appeared to be due to Windows not having a HOME environment variable set. According to the docs, kubectl will use the config file $(HOME)/.kube/config. But since this variable isn't set on Windows, it can't locate the file.
I created a HOME variable with the same value as USERPROFILE and it started working.
I'm using Hyper-V on Local Windows and I met this error because I didn't configure minikube.
(I know the question is about Azure, not minikube. But this article is on the top for the error message. So, I've put the solution here.)
1. enable Hyper-V.
Type in systeminfo on your Terminal. If you can find the line below,
Hyper-V Requirements: A hypervisor has been detected. Features required for Hyper-V will not be displayed.
Hyper-V works correctly.
If you can't, enable it from settings.
2. Create Hyper-V Network Switch
Open Hyper-V manager. (Searching it is the fastest way.)
Next, click your PC name on the left.
Then, you can find Virtual Switch Manager menu on the right.
Click it and choose External Virtual Switch with name: "Minikube Switch"
Click apply to create it.
3. start minikube
Go back to terminal and type in:
minikube start --vm-driver hyperv --hyperv-virtual-switch "Minikube Switch"
For more information, check the steps in this article.
Check that Docker is running and that you have started minikube or whichever cloud Kubernetes cluster you are using.
My issue was resolved after running "minikube start --driver=docker".
Essentially, this problem occurs if your minikube or kind cluster isn't configured. Try restarting minikube or kind. If that doesn't solve your problem, try restarting the hypervisor that minikube uses.
minikube start
This command solved my issue.
I was facing the same error when running the command "kubectl get pods".
The issue was resolved with the following steps:
a) First, find out the current context:
kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
b) If no context is marked as current, select one manually (use-context is the subcommand that selects a context; set-context only creates or modifies a context entry):
kubectl config use-context <Your context>
Hope this helps.
If you're facing this error on Windows, it's possible that your Docker instance is not running.
These are the steps I followed to reproduce the error:
I stopped Docker and then tried to start up an nginx deployment. This caused the error above.
How did I solve it?
Check whether minikube is running; in my case it was not.
Start minikube.
Retry applying your configuration. (screenshot omitted)
When you see that your deployment has been created, all should be fine.
I had exactly the same problem even with a correct config (created by running an Azure CLI command).
It seems that kubectl expects the HOME environment variable to be set, but it did not exist for me. There is, however, a solution:
If you add a KUBECONFIG environment variable that points to the config file, it will start working.
Example:
setx KUBECONFIG %UserProfile%\.kube\config
When the variable is present, kubectl has no trouble reading the file.
P.S. This is an alternative to setting a HOME variable as suggested in another answer.
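To check which file would be picked up on your machine, here is a simplified sketch of the lookup order these answers describe: KUBECONFIG first, then .kube/config under HOME, then under USERPROFILE. This approximates the behaviour reported above and is not kubectl's actual implementation; the function name `kubeconfig_candidates` is made up.

```python
import os
from typing import List, Mapping


def kubeconfig_candidates(env: Mapping[str, str]) -> List[str]:
    """Return the kubeconfig paths to try, in the order the answers above describe.

    Simplified sketch, not kubectl's actual implementation.
    """
    explicit = env.get("KUBECONFIG", "")
    if explicit:
        # KUBECONFIG may list several files separated by the OS path separator.
        return [p for p in explicit.split(os.pathsep) if p]
    # Fall back to <home>/.kube/config, trying HOME first and then USERPROFILE.
    for var in ("HOME", "USERPROFILE"):
        home = env.get(var, "")
        if home:
            return [os.path.join(home, ".kube", "config")]
    # Nothing usable: no config file can be located.
    return []


if __name__ == "__main__":
    for path in kubeconfig_candidates(os.environ):
        print(path, "(exists)" if os.path.isfile(path) else "(missing)")
```

If the script prints nothing, or only "(missing)" entries, that is consistent with kubectl not finding any cluster configuration.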
The Azure self-hosted agent doesn't have permission to access the Kubernetes cluster:
Remove the Azure self-hosted agent: .\config.cmd Remove
Configure it again (.\config.cmd) with a user that has permission to access the Kubernetes cluster.
I encountered similar problem:
> kubectl cluster-info
"To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Unable to connect to the server: dial tcp xxx.x.x.x:8080: connectex: No connection could be made because the target machine actively refused it."
> kubectl cluster-info dump
Unable to connect to the server: dial tcp xxx.0.0.x:8080: connectex: No connection could be made because the target machine actively refused it.
This setup was working fine until Docker Desktop brought its own copy of kubectl. There are 2 ways to overcome this situation:
1 - Quit / stop Docker Desktop while using the cluster
2 - Set the KUBECONFIG file path
I tried both options and they worked.
I found a good example of a .kube/config file and am posting it here for quick reference:
apiVersion: v1
clusters:
- cluster:
    certificate-authority: fake-ca-file
    server: https://1.2.3.4
  name: development
- cluster:
    insecure-skip-tls-verify: true
    server: https://5.6.7.8
  name: scratch
contexts:
- context:
    cluster: development
    namespace: frontend
    user: developer
  name: dev-frontend
- context:
    cluster: development
    namespace: storage
    user: developer
  name: dev-storage
- context:
    cluster: scratch
    namespace: default
    user: experimenter
  name: exp-scratch
current-context: ""
kind: Config
preferences: {}
users:
- name: developer
  user:
    client-certificate: fake-cert-file
    client-key: fake-key-file
- name: experimenter
  user:
    password: some-password
    username: exp
Reference: https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/
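Note that the example has current-context set to an empty string. The sketch below follows the chain current-context -> context -> cluster -> server on a trimmed copy of that config (written as a Python dict, as it would look after YAML parsing) to show why that matters. The function name is invented for illustration; as far as I understand, having no usable context is the situation in which kubectl falls back to the localhost:8080 address seen in the error message.

```python
from typing import Optional

# Trimmed copy of the example config above, as it would look after YAML parsing.
EXAMPLE = {
    "clusters": [
        {"name": "development", "cluster": {"server": "https://1.2.3.4"}},
        {"name": "scratch", "cluster": {"server": "https://5.6.7.8"}},
    ],
    "contexts": [
        {"name": "dev-frontend", "context": {"cluster": "development", "user": "developer"}},
        {"name": "exp-scratch", "context": {"cluster": "scratch", "user": "experimenter"}},
    ],
    "current-context": "",
}


def server_for_current_context(config: dict) -> Optional[str]:
    """Follow current-context -> context -> cluster and return the API server URL, or None."""
    current = config.get("current-context") or ""
    contexts = {c["name"]: c["context"] for c in config.get("contexts", [])}
    clusters = {c["name"]: c["cluster"] for c in config.get("clusters", [])}
    context = contexts.get(current)
    if context is None:
        # No context selected (or it names a missing entry): no server can be derived.
        return None
    cluster = clusters.get(context.get("cluster", ""))
    return cluster.get("server") if cluster else None
```

With the example as given, the function returns None; after selecting a context (the equivalent of kubectl config use-context dev-frontend) it returns https://1.2.3.4.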
Following ilya-chernomordik's answer, I added my config path to the system variables with:
setx KUBECONFIG "D:\Minikube\Minikube.minikube\config"
I had changed the default location from the C: drive to the D: drive because I have little space on C:.
Now the problem is fixed.
Edit: after 5 minutes, the API server stopped again. I have been trying to solve this issue for more than 5-6 hours. I'm not sure why this problem is happening, even after adding the correct path.
On Rancher Desktop, make sure the correct context is chosen.
In my situation, I'm on Windows with Docker Desktop, in a simple setup just for study purposes:
Docker Desktop with Docker version 20.10 or above comes with Kubernetes included, so it is not necessary to install a separate cluster tool such as minikube. You only need to enable Kubernetes in the Docker Desktop configuration:
Go to Docker Desktop: Settings > Kubernetes > check the box in the Enable Kubernetes section, then click Restart Kubernetes Cluster.
After that, Docker provides everything Kubernetes needs to work properly.
Referenced by: Blog

How does Kubectl connect to the master

I've installed Kubernetes via Vagrant on OS X and everything seems to be working fine, but I'm unsure how kubectl is able to communicate with the master node despite being local to the workstation filesystem.
How is this implemented?
kubectl has a configuration file that specifies the location of the Kubernetes apiserver and the client credentials to authenticate to the master. All of the commands issued by kubectl are over the HTTPS connection to the apiserver.
When you run the scripts to bring up a cluster, they typically generate this local configuration file with the parameters necessary to access the cluster you just created. By default, the file is located at ~/.kube/config.
In addition to what Robert said: the connection between your local CLI and the cluster is controlled through kubectl config set, see the docs.
The Getting started with Vagrant section of the docs should contain everything you need.