Kubernetes pods crashes when running GitLab Pipeline - kubernetes

In my Kubernetes cluster I am running GitLab-ee 15.8.0 with a GitLab Runner. This runner is configured for a kubernetes executor and I have mounted the /var/run/docker.sock to this runner in the configmap. When running a pipeline which brings up a docker-compose-test.yml, I can see that all pods that exist in kubernetes are starting to crash and are getting restarted. After that I can see that the pipeline is still in the Running state, but nor runner is working on it. The last command the runner executed in the pipeline was: docker-compose -f docker-compose-test.yml up -d.
I expected the pipeline to just bring up the docker containers and run the Laravel tests using the database container and the application container, but instead it messes up the Nginx-Ingress resource.
I am running GitLab-ee:15.8.0 with the gitlab-runner version 15.8.2
Here is the gitlab-ci.yml:
image: docker:20.10.16
services:
- docker:20.10.16-dind
variables:
DOCKER_COMPOSE_CMD: "docker-compose -f docker-compose-test.yml"
stages:
- test
- build
test:
stage: test
script:
- docker-compose --version
- $DOCKER_COMPOSE_CMD down --volumes --remove-orphans
- $DOCKER_COMPOSE_CMD up -d
- $DOCKER_COMPOSE_CMD exec -T -e APP_ENV=testing laravel-api-test ./scripts/wait-for.sh database-test:54321 -t 60 -- echo "Database connection established"
- $DOCKER_COMPOSE_CMD exec -T -e APP_ENV=testing laravel-api-test php artisan passport:keys
- $DOCKER_COMPOSE_CMD exec -T -e APP_ENV=testing laravel-api-test php artisan migrate
- $DOCKER_COMPOSE_CMD exec -T -e APP_ENV=testing laravel-api-test sh -c "vendor/bin/phpunit ./tests $PARAMETERS --coverage-text --colors=never --stderr"
- $DOCKER_COMPOSE_CMD down --volumes --remove-orphans
# only:
# - tags
build:
stage: build
script:
- export IMAGE_TAG=$(echo "$CI_COMMIT_TAG" | awk -F '/' '{print $NF}')
- docker build -t laravel-api:"$IMAGE_TAG" .
- docker login -u "$CONTAINER_REGISTRY_USERNAME" -p "$CONTAINER_REGISTRY_PASSWORD" "$CONTAINER_REGISTRY_URL"
- docker push laravel-api:"$IMAGE_TAG"
only:
- tags
And this is the docker-compose-test.yml that seems to mess things up:
version: "3.7"
services:
laravel-api-test:
build:
args:
user: laravel
uid: 1000
context: .
dockerfile: docker/development/Dockerfile
working_dir: /var/www/
volumes:
- ./:/var/www
ports:
- ${APP_PORT}:9000
networks:
- application
database-test:
image: postgres:15.1-alpine
ports:
- 54321:5432
environment:
POSTGRES_PASSWORD: ${DB_PASSWORD}
POSTGRES_USER: ${DB_USERNAME}
networks:
- application
networks:
application:
driver: bridge
The last thing that is probably relevant is the gitlab-runner config:
apiVersion: v1
kind: ConfigMap
metadata:
name: gitlab-runner-config
namespace: gitlab-runner
data:
config.toml: |-
concurrent = 4
[[runners]]
name = "Runner_1"
url = "https://gitlab.project.com/ci"
token = "my-token"
executor = "kubernetes"
[runners.kubernetes]
namespace = "gitlab-runner"
privileged = true
poll_timeout = 600
cpu_request = "1"
service_cpu_request = "200m"
[[runners.kubernetes.volumes.host_path]]
name = "docker"
mount_path = "/var/run/docker.sock"
host_path = "/var/run/docker.sock"
Finally this the output from the pipeline after it crashed:
Running with gitlab-runner 15.8.2 (4d1ca121)
on Runner_1 eNNz4y9k, system ID: r_y3jEhmF8fN58
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: gitlab-runner
Using Kubernetes executor with image docker:20.10.16 ...
Using attach strategy to execute scripts...
Preparing environment
00:04
Waiting for pod gitlab-runner/runner-ennz4y9k-project-117-concurrent-0f24cx to be running, status is Pending
Running on runner-ennz4y9k-project-117-concurrent-0f24cx via gitlab-runner-56cd6f4bb5-zrbd9...
Getting source from Git repository
00:01
Fetching changes with git depth set to 20...
Initialized empty Git repository in /builds/Clients/opus-volvere/laravel-api/.git/
Created fresh repository.
Checking out 3890412c as main...
Skipping Git submodules setup
Executing "step_script" stage of the job script
$ docker-compose --version
Docker Compose version v2.6.0
$ $DOCKER_COMPOSE_CMD down --volumes --remove-orphans
Container laravel-api-database-test-1 Stopping
Container laravel-api-laravel-api-test-1 Stopping
Container laravel-api-database-test-1 Stopping
Container laravel-api-laravel-api-test-1 Stopping
Container laravel-api-database-test-1 Stopped
Container laravel-api-database-test-1 Removing
Container laravel-api-laravel-api-test-1 Stopped
Container laravel-api-laravel-api-test-1 Removing
Container laravel-api-laravel-api-test-1 Removed
Container laravel-api-database-test-1 Removed
Network laravel-api_application Removing
Network laravel-api_application Removed
$ $DOCKER_COMPOSE_CMD up -d
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 827B done
#1 DONE 0.1s
#2 [internal] load .dockerignore
#2 transferring context: 88B done
#2 DONE 0.1s
I am not really sure where to look, with log files or anything, so some help regarding the debugging of this issue is really appreciated...
As far as I can see, the only starts when I try to launch the docker compose. I already built the image in the pipeline and that worked like it should, but it start to go wrong when I actually try to run the containers. Maybe that helps? This is just a really annoying problem that isn't my real expertise or anything so I am reading, learning and trying a lot :(
I followed this tutorial on how to add a gitlab runner to kubernetes. Maybe is has something to do with the fact that it tries to create a new pod for the pipeline, because the tuotial I sent says:
The second is a ServiceAccount, Role, and RoleBinding to give the
Runner the privileges to add new Pods to the Namespace.
Again, I am not familiar with all this stuff, so for me its also a shot in the dark, but I really want this fixed so I can continue working on this project
What could cause this GitLab pipeline to crash my entire kubernetes?

Never expose the host container runtime to workload inside the cluster.
This can lead to the situation that the GitLab runner "cleans up" and removes the containers that operate your cluster components.
In addition to that you get tied to a specific container runtime which should be an implementation detail of your cluster.
As an alternative you can use docker-in-docker for the GitLab runner for example.

Related

volumes_from tried to find a service despite container: is specified

I am running Win10, WSL and Docker Desktop. I have the following test YML which errors out:
version: '2.3'
services:
cli:
image: smartsheet-www
volumes_from:
- container:amazeeio-ssh-agent
➜ ~ ✗ docker-compose -f test.yml up
no such service: amazeeio-ssh-agent
Why does it try to find a service when I specified container: ?
The container exists, runs and has a volume.
docker inspect -f "{{ .Config.Volumes }}" amazeeio-ssh-agent
map[/tmp/amazeeio_ssh-agent:{}]
docker exec -it amazeeio-ssh-agent /bin/sh -c 'ls -l /tmp/amazeeio_ssh-agent/'
total 0
srw------- 1 drupal drupal 0 Apr 1 03:54 socket
Removing the volumes_from and the following line starts the cli service just fine.
After a bit of searching, I finally found https://github.com/docker/compose/issues/8874 and https://github.com/pygmystack/pygmy-legacy/issues/60#issue-1037009622 this fixes it.
uncheck this

Hyperledger fabric explorer - Docker Compose to Kubernetes

I have a working docker based setup - peer(s), orderers and explorer (db & app) which I am aiming to deployed on GCP - Kubernetes.
For the peer(s) and orderer I have used the docker images and created kubernetes yaml file with (StatefulSet, Service, NodePort and Ingress) to deploy on Kubernetes.
For Explorer I have the below docker-compose file which depends on my local connection-profile and crypto files.
I am struggling to deploy explorer on kubernetes and looking for advice on the approach
I have tried to convert docker-compose using Kompose - but face issues while translating network and health-check tags.
I have tried to create a single docker-image (Dockerfile - multiple FROM tags) from hyperledger/explorer-db:latest and hyperledger/explorer:latest but again specifying network becomes an issue.
Any suggestions or examples on how Explorer can be deployed in the cluster ??
Thanks
Explorer Docker Compose
version: '2.1'
volumes:
pgdata:
walletstore:
networks:
mynetwork.com:
external:
name: my-netywork
services:
explorerdb.mynetwork.com:
image: hyperledger/explorer-db:latest
container_name: explorerdb.mynetwork.com
hostname: explorerdb.mynetwork.com
environment:
- DATABASE_DATABASE=fabricexplorer
- DATABASE_USERNAME=hppoc
- DATABASE_PASSWORD=password
healthcheck:
test: "pg_isready -h localhost -p 5432 -q -U postgres"
interval: 30s
timeout: 10s
retries: 5
volumes:
- pgdata:/var/lib/postgresql/data
networks:
- mynetwork.com
explorer.mynetwork.com:
image: hyperledger/explorer:latest
container_name: explorer.mynetwork.com
hostname: explorer.mynetwork.com
environment:
- DATABASE_HOST=explorerdb.mynetwork.com
- DATABASE_DATABASE=fabricexplorer
- DATABASE_USERNAME=hppoc
- DATABASE_PASSWD=password
- LOG_LEVEL_APP=info
- LOG_LEVEL_DB=info
- LOG_LEVEL_CONSOLE=debug
- LOG_CONSOLE_STDOUT=true
- DISCOVERY_AS_LOCALHOST=false
volumes:
- ./config.json:/opt/explorer/app/platform/fabric/config.json
- ./connection-profile:/opt/explorer/app/platform/fabric/connection-profile
- ../config/crypto-config:/tmp/crypto
- walletstore:/opt/explorer/wallet
ports:
- 8080:8080
depends_on:
explorerdb.mynetwork.com:
condition: service_healthy
networks:
- mynetwork.com
Explorer Dockerfile - multiple froms
# Updated to Fabric 2.x
#1. Docker file for setting up the Orderer
# FROM hyperledger/fabric-orderer:1.4.2
FROM hyperledger/explorer-db:latest
ENV DATABASE_DATABASE=fabricexplorer
ENV DATABASE_USERNAME=hppoc
ENV DATABASE_PASSWORD=password
FROM hyperledger/explorer:latest
COPY ./config/explorer/. /opt/explorer/
COPY ./config/crypto-config/. /tmp/crypto
ENV DATABASE_HOST=explorerdb.xxx.com
ENV DATABASE_DATABASE=fabricexplorer
ENV DATABASE_USERNAME=hppoc
ENV DATABASE_PASSWD=password
ENV LOG_LEVEL_APP=info
ENV LOG_LEVEL_DB=info
ENV LOG_LEVEL_CONSOLE=debug
ENV LOG_CONSOLE_STDOUT=true
ENV DISCOVERY_AS_LOCALHOST=false
ENV DISCOVERY_AS_LOCALHOST=false
# ENV EXPLORER_APP_ROOT=${EXPLORER_APP_ROOT:-dist}
# ENV ${EXPLORER_APP_ROOT}/main.js name - hyperledger-explorer
ENTRYPOINT ["tail", "-f", "/dev/null"]
There are 2 groups of required steps for this setup. One I tested is:
1.Create a K8s cluster
2.Connect your cluster with the cloud shell
3.Clone this repository
git clone https://github.com/acloudfan/HLF-K8s-Cloud.git
4.Setup the storage class
cd HLF-K8s-Cloud/gcp kubectl apply -f . This will setup the storage class
5.Launch the Acme Orderer
cd .. kubectl apply -f ./k8s-acme-orderer.yaml Check the logs for 'acme-orderer-0' to ensure there is no error
6.Launch the Acme Peer
kubectl apply -f ./k8s-acme-peer.yaml Check the logs for 'acme-peer-0' to ensure there is no error
7.Setup the Channel & Join acme peer to it.
kubectl exec -it acme-peer-0 /bin/bash ./submit-channel-create.sh
./join-channel.sh
Ensure that peer has joined the channel
peer channel list
exit
8.Launch the budget Peer and join it to the channel
kubectl apply -f ./k8s-budget-peer.yaml Wait for the container to launch & check the logs for errors
kubectl exec -it budget-peer-0 /bin/bash ./fetch-channel-block.sh ./join-channel.sh
Ensure that peer has joined the channel
peer channel list
exit ** At this point your K8s Fabric Network is up **
Validate the network
1.Install & Instantiate the test chaincode
kubectl exec -it acme-peer-0 /bin/bash
./cc-test.sh install ./cc-test.sh instantiate
2.Invoke | Query the chaincode to see the changes in values of a/b
./cc-test.sh query ./cc-test.sh invoke
3.Check the values inside the Budget peer
kubectl exec -it acme-peer-0 /bin/bash
./cc-test.sh install
./cc-test.sh query The query should return the same values as you see in acme-peer Execute invoke/query in both peers to validate
Plus, you can visit the following threads to see option 2 and more references on the proper steps to set up your environment Production Network with GKE, HLF-K8s-Cloud, Hyperledger Fabric blockchain deployment on Google Kubernetes Engine and hyperledger/fabric-peer.

Locust worker not connecting to master

I have locust setup in docker. I have a master and worker nodes setup. I was using a pre-1.0 version and have now upgraded to 1.0.2 and now my workers can't connect to the master. I read through the release notes and changed the environmental variables. Here's what they look like on my workers.
LOCUST_MASTER_NODE_HOST 10.200.202.13
LOCUST_MODE worker
and on the master
LOCUST_MODE master
Edit: Something interesting. When i change LOCUST_MODE:worker to LOCUST_MODE_WORKER=true. In the logs it says "starting in standalone mode" but it still connects to the master.
How should I be setting this up?
This guide demonstrates how to connect workers to master nodes: https://docs.locust.io/en/stable/running-locust-distributed.html
To start locust in master mode:
locust -f my_locustfile.py --master
And then on each worker
(replace 192.168.0.14 with IP of the master machine, or leave out the
parameter altogether if your workers are on the same machine as the
master):
locust -f my_locustfile.py --worker --master-host=192.168.0.14
In my case the port 5557 must be mapped on docker and open on server's firewall.
Here is my master/worker docker-compose configurations on locust.
master node on server 1
version: '3'
services:
master:
image: locustio/locust
ports:
- "8089:8089"
- "5557:5557"
volumes:
- ./:/mnt/locust
command: -f /mnt/locust/locustfile.py --master --expect-workers 2 -H http://xxx.xxx.xxx.xxx:8089 -u 100 -r 200 -t 100s --headless --print-stats --host=https://my-server-2-load-test.com -L DEBUG --html /mnt/locust/app.html
worker nodes on server 2 and 3
version: '3'
services:
worker:
image: locustio/locust
volumes:
- ./:/mnt/locust
command: -f /mnt/locust/locustfile.py --worker --master-host xxx.xxx.xxx.xxx
remember to save report you should touch app.html file and chmod 666 app.html along docker-compose.yml.

Cloud Build - "rollout restart" not recognized (unknown command)

I have a small cloudbuild.yaml file where I build a Docker image, push it to Google container registry (GCR) and then apply the changes to my Kubernetes cluster. It looks like this:
steps:
- name: 'gcr.io/cloud-builders/docker'
entrypoint: 'bash'
args: [
'-c',
'docker pull gcr.io/$PROJECT_ID/frontend:latest || exit 0'
]
- name: "gcr.io/cloud-builders/docker"
args:
[
"build",
"-f",
"./services/frontend/prod.Dockerfile",
"-t",
"gcr.io/$PROJECT_ID/frontend:$REVISION_ID",
"-t",
"gcr.io/$PROJECT_ID/frontend:latest",
".",
]
- name: "gcr.io/cloud-builders/docker"
args: ["push", "gcr.io/$PROJECT_ID/frontend"]
- name: "gcr.io/cloud-builders/kubectl"
args: ["apply", "-f", "kubernetes/gcp/frontend.yaml"]
env:
- "CLOUDSDK_COMPUTE_ZONE=europe-west3-a"
- "CLOUDSDK_CONTAINER_CLUSTER=cents-ideas"
- name: "gcr.io/cloud-builders/kubectl"
args: ["rollout", "restart", "deployment/frontend-deployment"]
env:
- "CLOUDSDK_COMPUTE_ZONE=europe-west3-a"
- "CLOUDSDK_CONTAINER_CLUSTER=cents-ideas"
The build runs smoothly, until the last step. args: ["rollout", "restart", "deployment/frontend-deployment"]. It has the following log output:
Already have image (with digest): gcr.io/cloud-builders/kubectl
Running: gcloud container clusters get-credentials --project="cents-ideas" --zone="europe-west3-a" "cents-ideas"
Fetching cluster endpoint and auth data.
kubeconfig entry generated for cents-ideas.
Running: kubectl rollout restart deployment/frontend-deployment
error: unknown command "restart deployment/frontend-deployment"
See 'kubectl rollout -h' for help and examples.
Allegedly, restart is an unknown command. But it works when I run kubectl rollout restart deployment/frontend-deployment manually.
How can I fix this problem?
Looking at the Kubernetes release notes, the kubectl rollout restart commmand was introduced in the v1.15 version. In your case, it seems Cloud Build is using an older version where this command wasn't implemented yet.
After doing some test, it appears Cloud Build uses a kubectl client version depending on the cluster's server version. For example, when running the following build:
steps:
- name: "gcr.io/cloud-builders/kubectl"
args: ["version"]
env:
- "CLOUDSDK_COMPUTE_ZONE=<cluster_zone>"
- "CLOUDSDK_CONTAINER_CLUSTER=<cluster_name>"
if the cluster's master version is v1.14, Cloud Build uses a v1.14 kubectl client and returns the same unknown command "restart" error message. When master's version is v1.15, Cloud Build uses a v1.15 kubectl client and the command runs successfully.
So about your case, I suspect your cluster "cents-ideas" master version is <1.15 which would explain the error you're getting. As per why it works when you run the command manually (I understand locally), I suspect your kubectl may be authenticated to another cluster with master version >=1.15.

docker-compose exit depends_on service after tests

How do I get a service container to exit once the dependent container has finished?
I have test suite running in the app_unittestbot container that depends_on a postgresql db server (postgres:9.5-alpine) running in separate container. Once the test suite exits, I want to check the return code of the test suite and halt the database container. With the docker-compose.yml below, the db service container never halts.
docker-compose.yml
version: '2.1'
services:
app_postgresql95:
build: ./postgresql95/
ports:
- 54321:5432
app_unittestbot:
command: /root/wait-for-it.sh app_postgresql95:5432 --timeout=60 -- nose2 tests
build: ./unittestbot/
links:
- app_postgresql95
volumes:
- /app/src:/src
depends_on:
- 'app_postgresql95'
You can run docker-compose up --abort-on-container-exit to have compose stop all the containers if any one of them exits. That will likely solve your use case.
For something a little more resilient, I'd probably split this into two compose files so that an abort on postgresql doesn't get accidentally registered as a successful test. Then you'd just run those files in the order you need:
docker-compose -f docker-compose.yml up -d
docker-compose -f docker-compose.test.yml up
docker-compose -f docker-compose.yml down