GitLab CI cannot resolve service hostname on Kubernetes runner

I have a GitLab CI pipeline configured to run on a Kubernetes runner. Everything worked great until I tried to add services (https://docs.gitlab.com/ee/ci/services/mysql.html) for the test job. The service hostname (e.g. mysql) cannot be resolved on Kubernetes, resulting in the following error: dial tcp: lookup mysql on 10.96.0.10:53: no such host. It works on the Docker runner, but that's just not what I want. Is there any way to make the service hostname resolvable on the Kubernetes runner?
The job definition from .gitlab-ci.yml:
test:
  stage: test
  variables:
    MYSQL_ROOT_PASSWORD: --top-secret--
    MYSQL_DATABASE: --top-secret--
    MYSQL_USER: --top-secret--
    MYSQL_PASSWORD: --top-secret--
  services:
    - mysql:latest
    - nats:latest
  script:
    - ping -c 2 mysql
    - go test -cover -coverprofile=coverage.prof.tmp ./...
Edit:
Logs from the runner-jd6sxcl7-project-430-concurrent-0g5bm8 pod show that the services started. There are 4 containers total inside the pod: build, helper, svc-0 (mysql), svc-1 (nats).
svc-0 logs show the mysql service started successfully:
2019-12-09 21:52:07+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.18-1debian9 started.
2019-12-09 21:52:07+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2019-12-09 21:52:08+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.18-1debian9 started.
2019-12-09 21:52:08+00:00 [Note] [Entrypoint]: Initializing database files
2019-12-09T21:52:08.226747Z 0 [Warning] [MY-011070] [Server] 'Disabling symbolic links using --skip-symbolic-links (or equivalent) is the default. Consider not using this option as it' is deprecated and will be removed in a future release.
2019-12-09T21:52:08.233097Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.18) initializing of server in progress as process 46
svc-1 logs show the nats service started successfully as well:
[1] 2019/12/09 21:52:12.876121 [INF] Starting nats-server version 2.1.2
[1] 2019/12/09 21:52:12.876193 [INF] Git commit [679beda]
[1] 2019/12/09 21:52:12.876393 [INF] Starting http monitor on 0.0.0.0:8222
[1] 2019/12/09 21:52:12.876522 [INF] Listening for client connections on 0.0.0.0:4222
[1] 2019/12/09 21:52:12.876548 [INF] Server id is NCPAQNFKKWPI67DZHSWN5EWOCQSRACFG2FXNGTLMW2NNRBAMLSDY4IYQ
[1] 2019/12/09 21:52:12.876552 [INF] Server is ready
[1] 2019/12/09 21:52:12.876881 [INF] Listening for route connections on 0.0.0.0:6222

This was a known issue with older versions of GitLab Runner (< 12.8), or when running the Kubernetes executor on older versions of Kubernetes (< 1.7).
From the Kubernetes executor documentation of GitLab Runner:
Since GitLab Runner 12.8 and Kubernetes 1.7, the services are accessible via their DNS names. If you are using an older version you will have to use localhost.
(emphasis mine)
It's important to keep in mind the other restrictions and implications of the Kubernetes executor. From the same document:
Note that when services and containers are running in the same Kubernetes pod, they are all sharing the same localhost address.
So, even if you're able to use the service specific hostname to talk to your service, it's all really localhost (127.0.0.1) underneath.
As such, keep in mind the other important restriction from the same document:
You cannot use several services using the same port
(Thanks to #user3154003 for the link to the GitLab Runner issue in a now-deleted answer that pointed me in the right direction for this answer.)
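If you're stuck on an older Runner/Kubernetes combination, a minimal sketch of the localhost workaround (MYSQL_HOST is a hypothetical variable that your own test code would need to read, not something GitLab sets):

test:
  stage: test
  services:
    - mysql:latest
  variables:
    MYSQL_HOST: "127.0.0.1"   # on older versions, services share the build pod's localhost
  script:
    - go test ./...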

From the documentation link you provided, I see that the mysql hostname should be accessible.
How services are linked to the job
To better understand how the container linking works, read Linking containers together.
To summarize, if you add mysql as service to your application, the image will then be used to create a container that is linked to the job container.
The service container for MySQL will be accessible under the hostname mysql. So, in order to access your database service you have to connect to the host named mysql instead of a socket or localhost. Read more in accessing the services.
Also from the doc:
If you don't specify a service alias, when the job is run, service will be started and you will have access to it from your build container
So can you check if there were any errors when the mysql service started?
I provided this as an answer since I could not fit it in a comment.
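As an aside, a service alias can also be set explicitly in .gitlab-ci.yml; a minimal sketch (the alias name db is just an example):

services:
  - name: mysql:latest
    alias: db   # the job can then reach MySQL at the hostname "db"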

Related

How to fix error with GitLab runner inside Kubernetes cluster - try setting KUBERNETES_MASTER environment variable

I have setup two VMs that I am using throughout my journey of educating myself in CI/CD, GitLab, Kubernetes, Cloud Computing in general and so on. Both VMs have Ubuntu 22.04 Server as a host.
VM1 - MicroK8s Kubernetes cluster
Most of the setup is "default". Since I'm not really that knowledgeable, I have only configured two pods and their respective services - one with PostGIS and the other with GeoServer. My intent is to add a third pod, which is the deployment of an app that I have in VM2 and that will communicate with the GeoServer in order to provide a simple map web service (Leaflet + Django). All pods are exposed both within the cluster via internal IPs and externally (externalIPs).
I have also installed two GitLab-related components here:
GitLab Runner with Kubernetes as executor
GitLab Kubernetes Agent
In VM2 both are visible as connected.
VM2 - GitLab
Here is where GitLab (default installation, latest version) runs. In the configuration (/etc/gitlab/gitlab.rb) I have enabled the agent server.
Initially I had the runner in VM1 configured with Docker as the executor. I had no issues with that. However, then I thought it would be nice to try running the runner inside the cluster so that everything is encapsulated (using the internal cluster IPs without further configuration and without exposing the VM's operating system).
Both the runner and agent are showing as connected but running a pseudo-CI/CD pipeline (the one provided by GitLab, where you have build, test and deploy stages with each consisting of a simple echo and waiting for a few seconds) returns the following error:
Running with gitlab-runner 15.8.2 (4d1ca121)
on testcluster-k8s-runner Hko2pDKZ, system ID: s_072d6d140cfe
Preparing the "kubernetes" executor
Using Kubernetes namespace: gitlab-runner
ERROR: Preparation failed: getting Kubernetes config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Will be retried in 3s ...
Using Kubernetes namespace: gitlab-runner
ERROR: Preparation failed: getting Kubernetes config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Will be retried in 3s ...
Using Kubernetes namespace: gitlab-runner
ERROR: Preparation failed: getting Kubernetes config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Will be retried in 3s ...
ERROR: Job failed (system failure): getting Kubernetes config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
I am unable to find any information regarding KUBERNETES_MASTER except in issue tickets (GitLab) and questions (SO and other Q&A platforms). I have no idea what it is or where to set it. My guess would be that it belongs in the runner's configuration on VM1, or at least in the environment of the gitlab-runner user (the user that owns the runner's /home/gitlab-runner home directory).
The only possible solution I have found so far is to copy the .kube directory from the user that uses kubectl (in my case microk8s kubectl, since I use MicroK8s) into the home directory of the GitLab runner. I didn't see anything special in that directory (no hidden files) except for a cache subdirectory, hence my decision to simply create it at /home/gitlab-runner/.kube, which didn't change a thing.
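In case it matters, something along these lines is what I would expect to need (untested; assuming the runner process runs on VM1 as the gitlab-runner user, and that the kubeconfig MicroK8s exports is what the Kubernetes client is looking for):

sudo mkdir -p /home/gitlab-runner/.kube
# write the MicroK8s kubeconfig where the runner's user would look for it
sudo microk8s config | sudo tee /home/gitlab-runner/.kube/config > /dev/null
sudo chown -R gitlab-runner:gitlab-runner /home/gitlab-runner/.kube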

gitlab: unable to access git repository: Operation timed out

Our registered GitLab Runner (on Kubernetes) was working fine; after upgrading GitLab, it can't clone projects anymore! Does anyone have any idea about this issue?
Here is the log of the issue:
Running with gitlab-runner 14.9.0 (d1f69508)
on gitlab-runner-dev K5KVWdx-
Preparing the "kubernetes" executor
30:00
Using Kubernetes namespace: cicd
Using Kubernetes executor with image <docker-registry>:kuber_development ...
Using attach strategy to execute scripts...
Preparing environment
30:07
Waiting for pod cicd/runner-k5kvwdx--project-1227-concurrent-02kqgq to be running, status is Pending
Waiting for pod cicd/runner-k5kvwdx--project-1227-concurrent-02kqgq to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-k5kvwdx--project-1227-concurrent-02kqgq via gitlab-runner-85776bd9c6-rkdvl...
Getting source from Git repository
32:13
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/bigdata/search/query-processing-module/.git/
Created fresh repository.
fatal: unable to access '<git-repository>': Failed to connect to <gitlab-url> port 443 after 130010 ms: Operation timed out
Cleaning up project directory and file based variables
30:01
ERROR: Job failed: command terminated with exit code 1
Here is how I would debug this issue:
Make sure there are no NetworkPolicies present that restrict the pod's egress.
If you are on a recent Kubernetes version, you can run an ephemeral debug container inside the pod to examine the networking situation (see the Kubernetes docs):
kubectl debug -it ephemeral-demo --image=busybox:1.28 --target=ephemeral-demo
If not, you can try to get a shell inside your container and examine the situation from there, or you can start a pod on the same node and try to connect from there.
As soon as you have a shell inside a container where the connection doesn't work, try to answer the following questions:
Can you connect to some other Server?
Can you resolve the hostname?
Is the IP a private one and overlapping with some internal Kubernetes IPs?
Can you ping the IP? If yes:
Can you curl the IP? If no:
If you open another port on the target machine, can you connect to that port? => if yes, it's probably a firewall problem somewhere
If you can't even ping the IP => it can be either firewall related or IP routing related.
I cannot say for sure what is wrong, but try the steps above and hopefully you get some insight into where the problem is.
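For example, from a busybox shell (assuming the usual busybox applets are available, and with <gitlab-host> as a placeholder for your GitLab hostname), the checks above might look like:

nslookup <gitlab-host>       # can the hostname be resolved?
ping -c 2 <gitlab-host>      # is the host reachable at all?
telnet <gitlab-host> 443     # does a TCP connection to the HTTPS port succeed?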

Kubernetes unable to create impersonator account timeout

Hosting: Azure CentOS VMs running RKE1
Rancher Version: v2.6.2
Kubernetes Version: 1.18.6
Looking for help diagnosing this issue; I get two error messages:
From Rancher:
Cluster health check failed: Failed to communicate with API server during namespace check: Get "https://<NODE_IP>:6443/api/v1/namespace/kube-system?timeout=45s": write tcp 172.16.0.2:443 -> <NODE_IP>:7832:i/o timeout
From Kubectl:
unable to create impersonator account: error setting up impersonation for user user-sd7q9: Put "https://<NODE_IP>:6443/apis/rbac.authorization.k8s.io/v1/clusterroles/cattle-impersonation-user-sd7q9": write tcp 172.17.0.2:443-> <NODE_IP>:7832: i/o timeout
Nothing appears to be broken and my applications are still available via ingress.
https://github.com/rancher/rancher/issues/34671

Fabric v2.0 in kubernetes (minikube) - problem running docker inside peer for running chaincode

I am trying to run the Fabric 2.0 test-network in Kubernetes (locally, in minikube) and am facing an issue with installing or running the chaincode on the peers (in a Docker container, it seems).
I created Kubernetes files based on docker-compose-test-net.yaml and successfully deployed the network, generated the crypto material, created and joined the channel, installed the chaincode on the peers, and committed its definition. But when I try to invoke it, I get the following error:
Error: endorsement failure during invoke. response: status:500 message:"error in simulation:
failed to execute transaction 68e996b0d17c210af9837a78c0480bc7ba0c7c0f84eec7da359a47cd1f5c704a:
could not launch chaincode fabcar_01:bb76beb676a23a9be9eb377a452baa4b756cb1dc3a27acf02ecb265e1a7fd3df:
chaincode registration failed: container exited with 0"
I included the logs of the peer in this pastebin; we can see in there that it starts the container, but then I don't understand what happens to it: https://pastebin.com/yrMwG8Nd
I then tried what is explained here: https://github.com/IBM/blockchain-network-on-kubernetes/issues/27, where they say that
IKS v1.11 and onwards now use containerd as its container runtime instead of the docker engine, therefore using docker.sock is no longer possible.
They propose to deploy a Docker-in-Docker (dind) pod using the files they link there and to change the occurrences of unix:///host/var/run/docker.sock to tcp://docker:2375.
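In Kubernetes terms that change would presumably be an env entry on the peer container along these lines (a sketch, assuming a dind Deployment/Service reachable as docker; CORE_VM_ENDPOINT is Fabric's env override for the peer's vm.endpoint setting):

- name: CORE_VM_ENDPOINT
  value: tcp://docker:2375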
But then I have the following error when I try to install the chaincode:
Error: chaincode install failed with status:
500 - failed to invoke backing implementation of 'InstallChaincode':
could not build chaincode:
docker build failed:
docker image inspection failed:
cannot connect to Docker endpoint
So it seems it cannot connect to the Docker endpoint. But I cannot find how to fix this.
If you have an idea, it would help a lot!
I found my issue:
For the peers, I was setting:
- name: CORE_PEER_CHAINCODEADDRESS
  value: peer0-org1-example-com:7052
- name: CORE_PEER_CHAINCODELISTENADDRESS
  value: 0.0.0.0:7052
like they do for the test-network with docker-compose.
Removing those made it work. I guess they were important for the docker-compose setup, but not adequate for Kubernetes.

Gcloud Kubernetes and Redis memory store, intermittent issues, host not found

From time to time, once a week or so, we get into a weird state where our Kubernetes cluster is not able to connect to the Memorystore Redis service.
K8s master version: 1.10.7
gcloud beta redis instances list --region europe-west1
INSTANCE_NAME REGION TIER SIZE_GB HOST PORT NETWORK RESERVED_IP STATUS CREATE_TIME
chefclub-redis europe-west1 STANDARD_HA 1 10.0.10.4 6379 default 10.0.10.0/29 READY 2018-05-29T14:12:46
Getting a No route to host.
kubectl run -i --tty busybox --image=busybox -- sh
If you don't see a command prompt, try pressing enter.
/ # telnet 10.0.10.4 6379
telnet: can't connect to remote host (10.0.10.4): No route to host
It happened a few times in the past. Now I just did an upgrade of my node to 1.10.7 and everything went back in place; I could connect again.
I wonder what other steps I could take next time it happens?
Make sure you have followed the instructions on how to connect to a Redis instance from a cluster, as well as the troubleshooting doc. Note that if your cluster configuration has IP aliases enabled, the steps to connect to the Redis server may vary.
You can search through Stackdriver Logging for your Kubernetes pods and check the complete error message during the affected timeframe. This will help you look for known issues on GitHub or in other Stack Overflow threads. An advanced Stackdriver Logging filter to view pod logs:
resource.type="container"
resource.labels.cluster_name="cluster_name"
resource.labels.namespace_id="k8s_namespace"
labels."container.googleapis.com/k8s_pod_name"="k8s_pod_name"
If you did not find any known issues and suspect that the issue could be on Google's end, you can create an issue using the Public Issue Tracker.