How to get pod labels overwritten in GitLab Kubernetes executor/runner? - kubernetes

I have a GitLab runner with this configuration:
runners:
  privileged: false
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "managed-ng-1"
        pod_labels_overwrite_allowed = ".*"
        [runners.kubernetes.pod_labels]
          "kubernetes.io/arch" = "amd64"
          "job_id" = "${CI_JOB_ID}"
          "job_name" = "${CI_JOB_NAME}"
          "pipeline_id" = "${CI_PIPELINE_ID}"
          "project" = "${CI_PROJECT_PATH}"
I have a .gitlab-ci.yml file with this variables section:
variables:
  KUBERNETES_POD_LABELS_1: "karpenter.k8s.aws/instance-local-nvme=256G"
  KUBERNETES_POD_LABELS_2: "kubernetes.io/arch=arm64"
When the job runs, the logs show this:
Preparing the "kubernetes" executor
"PodLabels" "karpenter.k8s.aws/instance-local-nvme" overwritten with "256G"
"PodLabels" "kubernetes.io/arch" overwritten with "arm64"
However, if I run kubectl describe pod against the pod, these labels are not there:
Labels: job_id=297
job_name=job1
kubernetes.io/arch=amd64
pipeline_id=116
pod=runner-awq3dkxf-project-5-concurrent-0
project=root_simple-cicd-test
I explicitly added a default value for "kubernetes.io/arch" in case the label overwriting mechanism only worked if there was already a label with a value.
I don't know why this isn't working. Are there any other logs I should be looking at that might explain what is going on?
Thanks.

It turns out there is a bug in the Kubernetes executor that prevents the overwritten pod labels from actually being applied to the job pod: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/29168
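Until that issue is fixed in your runner version, a quick way to check which labels actually landed on a job pod is to filter on one of the labels the runner does apply, for example (assuming the namespace and job_id shown above):
kubectl get pods -n managed-ng-1 -l job_id=297 --show-labels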

Related

How to deploy on K8s via Pulumi using the ArgoCD Helm Chart?
Pulumi up diagnostics:
  kubernetes:helm.sh/v3:Release (argocd):
    error: failed to create chart from template: chart requires kubeVersion: >=1.22.0-0 which is incompatible with Kubernetes v1.20.0
The cluster version is v1.23.0, verified on AWS, and NOT 1.20.0.
ArgoCD install yaml used with CRD2Pulumi: https://raw.githubusercontent.com/argoproj/argo-cd/master/manifests/core-install.yaml
Source:
...
cluster = eks.Cluster("argo-example")  # version="1.23"

# Cluster provider
provider = k8s.Provider(
    "eks",
    kubeconfig=cluster.kubeconfig.apply(lambda k: json.dumps(k))
    # kubeconfig=cluster.kubeconfig
)

ns = k8s.core.v1.Namespace(
    'argocd',
    metadata={
        "name": "argocd",
    },
    opts=pulumi.ResourceOptions(
        provider=provider
    )
)

argo = k8s.helm.v3.Release(
    "argocd",
    args=k8s.helm.v3.ReleaseArgs(
        chart="argo-cd",
        namespace=ns.metadata.name,
        repository_opts=k8s.helm.v3.RepositoryOptsArgs(
            repo="https://argoproj.github.io/argo-helm"
        ),
        values={
            "server": {
                "service": {
                    "type": "LoadBalancer",
                }
            }
        },
    ),
    opts=pulumi.ResourceOptions(provider=provider, parent=ns),
)
Any ideas on fixing this mismatch between the version in the error and the actual cluster version?
I've tried:
Deleting everything and starting over.
Updating to the latest ArgoCD install yaml.
I could reproduce your issue, though I am not quite sure what causes the mismatch between the versions. It would be best to open an issue in Pulumi's kubernetes repository.
Looking at the history of https://github.com/argoproj/argo-helm/blame/main/charts/argo-cd/Chart.yaml, you can see that the kubeVersion requirement was added after 5.9.1, so pinning to that version deploys the Helm chart successfully. E.g.
import * as k8s from "@pulumi/kubernetes";

const namespaceName = "argo";

const namespace = new k8s.core.v1.Namespace("namespace", {
    metadata: {
        name: namespaceName,
    }
});

const argo = new k8s.helm.v3.Release("argo", {
    repositoryOpts: {
        repo: "https://argoproj.github.io/argo-helm"
    },
    chart: "argo-cd",
    version: "5.9.1",
    namespace: namespace.metadata.name,
});
(Not Recommended) Alternatively, you could also clone the source code of the chart, comment out the kubeVersion requirement in Chart.yaml and install the chart from your local path.
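If you do go that route, the local clone can be installed with the Chart resource's path option; a sketch, assuming the chart source was cloned to ./argo-cd and the kubeVersion line removed from its Chart.yaml:
import * as k8s from "@pulumi/kubernetes";

// Install the chart from a local directory instead of the Helm repository.
const argo = new k8s.helm.v3.Chart("argocd", {
    path: "./argo-cd",
    namespace: "argocd",
    values: {
        server: { service: { type: "LoadBalancer" } },
    },
});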
Upgrade Helm. I had a similar issue where my k8s was 1.25 but Helm complained it was 1.20. I tried everything else; upgrading Helm worked.

Pulumi - How do we patch a deployment created with a Helm chart, when values do not contain the property to be updated

I have code to deploy a Helm chart using Pulumi's Kubernetes provider.
I would like to patch the StatefulSet (change serviceAccountName) after deploying the chart. The chart doesn't come with an option to specify a service account for the StatefulSet.
Here's my code:
// install psmdb database chart
const psmdbChart = new k8s.helm.v3.Chart(psmdbChartName, {
    namespace: namespace.metadata.name,
    path: './percona-helm-charts/charts/psmdb-db',
    // chart: 'psmdb-db',
    // version: '1.7.0',
    // fetchOpts: {
    //     repo: 'https://percona.github.io/percona-helm-charts/'
    // },
    values: psmdbChartValues
}, {
    dependsOn: psmdbOperator
})

const set = psmdbChart.getResource('apps/v1/StatefulSet', `${psmdbChartName}-${psmdbChartValues.replsets[0].name}`);
I'm using the Percona Server for MongoDB Operator Helm charts. It uses an Operator to manage the StatefulSet and also defines CRDs.
I've tried Pulumi transformations. In my case the chart doesn't contain a StatefulSet resource, but rather a custom resource (CRD).
If it's not possible to update serviceAccountName on the StatefulSet using transformations, is there any other way I can override it?
Any help is appreciated.
Thanks,
Pulumi has a powerful feature called Transformations, which is exactly what you need here (see the Example in the docs). A transformation is a callback that gets invoked by the Pulumi runtime and can be used to modify resource input properties before the resource is created.
I've not tested the code, but you should get the idea:
import * as k8s from "@pulumi/kubernetes";

// install psmdb database chart
const psmdbChart = new k8s.helm.v3.Chart(psmdbChartName, {
    namespace: namespace.metadata.name,
    path: './percona-helm-charts/charts/psmdb-db',
    // chart: 'psmdb-db',
    // version: '1.7.0',
    // fetchOpts: {
    //     repo: 'https://percona.github.io/percona-helm-charts/'
    // },
    values: psmdbChartValues,
    transformations: [
        // Set the serviceAccountName of the StatefulSet
        (obj: any, opts: pulumi.CustomResourceOptions) => {
            if (obj.kind === "StatefulSet" && obj.metadata.name === `${psmdbChartName}-${psmdbChartValues.replsets[0].name}`) {
                obj.spec.template.spec.serviceAccountName = "customServiceAccount"
            }
        },
    ],
}, {
    dependsOn: psmdbOperator
})
It seems Pulumi doesn't have a straightforward way to patch an existing Kubernetes resource, though it is still possible in multiple steps.
From a GitHub comment:
Import the existing resource
Run pulumi up to import it
Make the desired changes to the imported resource
Run pulumi up to apply the changes
It seems they plan to support functionality similar to kubectl apply -f for patching resources.
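For completeness, a sketch of what the import route could look like in TypeScript; the names, labels and image below are hypothetical placeholders, and Pulumi will refuse the import if the declared inputs don't match the live object, so in practice you copy the live spec first:
import * as k8s from "@pulumi/kubernetes";

// Hypothetical example: adopt the StatefulSet the chart created so Pulumi can
// manage it and change fields such as serviceAccountName on a later update.
const adopted = new k8s.apps.v1.StatefulSet("psmdb-rs0", {
    metadata: { name: "my-release-rs0", namespace: "psmdb" },
    spec: {
        serviceName: "my-release-rs0",
        selector: { matchLabels: { app: "psmdb" } },
        template: {
            metadata: { labels: { app: "psmdb" } },
            spec: {
                serviceAccountName: "customServiceAccount",
                containers: [{ name: "mongod", image: "percona/percona-server-mongodb" }],
            },
        },
    },
}, { import: "psmdb/my-release-rs0" });  // adopt the existing object, then drop this option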

Nextflow doesn't use the right service account to deploy workflows to kubernetes

We're trying to use Nextflow in a k8s namespace other than our default; the namespace we're using is nextflownamespace. We've created our PVC and ensured the default service account has an admin rolebinding. We're getting an error that Nextflow can't access the PVC:
"message": "persistentvolumeclaims \"my-nextflow-pvc\" is forbidden:
User \"system:serviceaccount:mynamespace:default\" cannot get resource
\"persistentvolumeclaims\" in API group \"\" in the namespace \"nextflownamespace\"",
In that error we see that system:serviceaccount:mynamespace:default is incorrectly pointing to our default namespace, mynamespace, and not to nextflownamespace, which we created for Nextflow use.
We tried adding debug.yaml = true to our nextflow.config, but couldn't find the YAML it submits to k8s to validate the error. Our config file looks like this:
profiles {
    standard {
        k8s {
            executor = "k8s"
            namespace = "nextflownamespace"
            cpus = 1
            memory = 1.GB
            debug.yaml = true
        }
        aws {
            endpoint = "https://s3.nautilus.optiputer.net"
        }
    }
}
We did verify that when we change the namespace to another arbitrary value, the error message uses the new arbitrary namespace, but the service account name continues to point to the user's default namespace erroneously.
We've tried every variant of profiles.standard.k8s.serviceAccount = "system:serviceaccount:nextflownamespace:default" that we could think of, but didn't get any change with those attempts.
I think it's best to avoid using nested config profiles with Nextflow. I would either remove the 'standard' layer from your profile or just make 'standard' a separate profile:
profiles {
    standard {
        process.executor = 'local'
    }
    k8s {
        executor = "k8s"
        namespace = "nextflownamespace"
        cpus = 1
        memory = 1.GB
        debug.yaml = true
    }
    aws {
        endpoint = "https://s3.nautilus.optiputer.net"
    }
}
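Also note that Nextflow's k8s.serviceAccount setting expects just the service account name, not the full system:serviceaccount:... identifier. A minimal sketch, assuming the jobs should run as the default account in nextflownamespace and mount the PVC from the error message:
k8s {
    namespace        = 'nextflownamespace'
    serviceAccount   = 'default'            // just the name, not "system:serviceaccount:..."
    storageClaimName = 'my-nextflow-pvc'
}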

Create GKE cluster and namespace with Terraform

I need to create a GKE cluster and then create a namespace and install a db through Helm into that namespace. Now I have gke-cluster.tf, which creates the cluster with a node pool, and helm.tf, which has the kubernetes provider and a helm_release resource. It first creates the cluster, but then tries to install the db while the namespace doesn't exist yet, so I have to run terraform apply again and then it works. I want to avoid a scenario with multiple folders and run terraform apply only once. What's the good practice for a situation like this? Thanks for the answers.
The create_namespace argument of helm_release resource can help you.
create_namespace - (Optional) Create the namespace if it does not yet exist. Defaults to false.
https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release#create_namespace
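For example, a minimal sketch (the release name, repository and chart are placeholders for your db chart):
resource "helm_release" "db" {
  name             = "mydb"
  repository       = "https://charts.bitnami.com/bitnami"
  chart            = "mariadb"
  namespace        = "db"
  create_namespace = true  # Helm creates the namespace before installing the release
}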
Alternatively, you can define a dependency between the namespace resource and helm_release like below:
resource "kubernetes_namespace" "prod" {
metadata {
annotations = {
name = "prod-namespace"
}
labels = {
namespace = "prod"
}
name = "prod"
}
}
resource "helm_release" "arango-crd" {
name = "arango-crd"
chart = "./kube-arangodb-crd"
namespace = "prod"
depends_on = [ kubernetes_namespace.prod ]
}
The solution posted by user adp is correct, but I wanted to give more insight on using Terraform for this particular example with regard to running a single command:
$ terraform apply --auto-approve
Based on the following comments:
Can you tell how are you creating your namespace? Is it with kubernetes provider? - Dawid Kruk
resource "kubernetes_namespace" - Jozef Vrana
This setup needs a specific order of execution: first the cluster, then the resources. By default Terraform will try to create all of the resources at the same time, so it is crucial to use the depends_on = [VALUE] parameter.
The next issue is that the kubernetes provider will try to fetch the credentials at the start of the process from ~/.kube/config. It will not wait for the cluster provisioning to get the actual credentials. It could:
fail when there is no .kube/config
fetch credentials for the wrong cluster
There is an ongoing feature request to resolve this kind of use case (there are also some workarounds):
Github.com: Hashicorp: Terraform: Issue: depends_on for providers
As an example:
# Create cluster
resource "google_container_cluster" "gke-terraform" {
  project            = "PROJECT_ID"
  name               = "gke-terraform"
  location           = var.zone
  initial_node_count = 1
}

# Get the credentials
resource "null_resource" "get-credentials" {
  depends_on = [google_container_cluster.gke-terraform]

  provisioner "local-exec" {
    command = "gcloud container clusters get-credentials ${google_container_cluster.gke-terraform.name} --zone=europe-west3-c"
  }
}

# Create a namespace
resource "kubernetes_namespace" "awesome-namespace" {
  depends_on = [null_resource.get-credentials]

  metadata {
    name = "awesome-namespace"
  }
}
Assuming that you had earlier configured a cluster to work on and didn't delete it:
Credentials for the Kubernetes cluster are fetched (from the existing .kube/config).
Terraform will create a cluster named gke-terraform.
Terraform will run a local command to get the credentials for the gke-terraform cluster.
Terraform will create a namespace (using the old information):
if you had another cluster configured in .kube/config, it will create the namespace in that (previous) cluster
if you deleted your previous cluster, it will try to create the namespace in that (previous) cluster and fail
if you had no .kube/config, it will fail at the start
Important!
Using the "helm_release" resource seems to fetch the credentials when provisioning the resources, not at the start!
As said, you can use the helm provider to provision the resources on your cluster and avoid the issues described above.
An example of running a single command to create a cluster and provision resources on it:
variable "zone" {
  type    = string
  default = "europe-west3-c"
}

resource "google_container_cluster" "gke-terraform" {
  project            = "PROJECT_ID"
  name               = "gke-terraform"
  location           = var.zone
  initial_node_count = 1
}

data "google_container_cluster" "gke-terraform" {
  project  = "PROJECT_ID"
  name     = "gke-terraform"
  location = var.zone
}

resource "null_resource" "get-credentials" {
  # do not start before resource gke-terraform is provisioned
  depends_on = [google_container_cluster.gke-terraform]

  provisioner "local-exec" {
    command = "gcloud container clusters get-credentials ${google_container_cluster.gke-terraform.name} --zone=${var.zone}"
  }
}

resource "helm_release" "mydatabase" {
  name  = "mydatabase"
  chart = "stable/mariadb"

  # do not start before the get-credentials resource is run
  depends_on = [null_resource.get-credentials]

  set {
    name  = "mariadbUser"
    value = "foo"
  }

  set {
    name  = "mariadbPassword"
    value = "qux"
  }
}
Using above configuration will yield:
data.google_container_cluster.gke-terraform: Refreshing state...
google_container_cluster.gke-terraform: Creating...
google_container_cluster.gke-terraform: Still creating... [10s elapsed]
<--OMITTED-->
google_container_cluster.gke-terraform: Still creating... [2m30s elapsed]
google_container_cluster.gke-terraform: Creation complete after 2m38s [id=projects/PROJECT_ID/locations/europe-west3-c/clusters/gke-terraform]
null_resource.get-credentials: Creating...
null_resource.get-credentials: Provisioning with 'local-exec'...
null_resource.get-credentials (local-exec): Executing: ["/bin/sh" "-c" "gcloud container clusters get-credentials gke-terraform --zone=europe-west3-c"]
null_resource.get-credentials (local-exec): Fetching cluster endpoint and auth data.
null_resource.get-credentials (local-exec): kubeconfig entry generated for gke-terraform.
null_resource.get-credentials: Creation complete after 1s [id=4191245626158601026]
helm_release.mydatabase: Creating...
helm_release.mydatabase: Still creating... [10s elapsed]
<--OMITTED-->
helm_release.mydatabase: Still creating... [1m40s elapsed]
helm_release.mydatabase: Creation complete after 1m44s [id=mydatabase]
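Regarding the workarounds mentioned above: a common pattern is to configure the kubernetes and helm providers from the cluster resource's outputs instead of ~/.kube/config, so the credentials always refer to the cluster created in the same run. A sketch reusing the gke-terraform names from the example above (the exact provider block syntax may differ between provider versions):
data "google_client_config" "default" {}

provider "kubernetes" {
  host                   = "https://${google_container_cluster.gke-terraform.endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(google_container_cluster.gke-terraform.master_auth[0].cluster_ca_certificate)
}

provider "helm" {
  kubernetes {
    host                   = "https://${google_container_cluster.gke-terraform.endpoint}"
    token                  = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(google_container_cluster.gke-terraform.master_auth[0].cluster_ca_certificate)
  }
}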

Airflow- dag_id could not be found issue when using kubernetes executor

I am using the Airflow stable Helm chart with the Kubernetes Executor. A new pod is scheduled for the DAG, but it fails with a "dag_id could not be found" issue. I am using git-sync to get the DAGs. Below are the error and the Kubernetes config values. Can someone please help me resolve this issue?
Error:
[2020-07-01 23:18:36,939] {__init__.py:51} INFO - Using executor LocalExecutor
[2020-07-01 23:18:36,940] {dagbag.py:396} INFO - Filling up the DagBag from /opt/airflow/dags/dags/etl/sampledag_dag.py
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 37, in <module>
    args.func(args)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/cli.py", line 75, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/bin/cli.py", line 523, in run
    dag = get_dag(args)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/bin/cli.py", line 149, in get_dag
    'parse.'.format(args.dag_id))
airflow.exceptions.AirflowException: dag_id could not be found: sampledag . Either the dag did not exist or it failed to parse.
Config:
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: false
AIRFLOW__KUBERNETES__GIT_REPO: git#git.com/dags.git
AIRFLOW__KUBERNETES__GIT_BRANCH: master
AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT: /dags
AIRFLOW__KUBERNETES__GIT_SSH_KEY_SECRET_NAME: git-secret
AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-repo
AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: tag
AIRFLOW__KUBERNETES__RUN_AS_USER: "50000"
sampledag
import logging
import datetime
from airflow import models
from airflow.contrib.operators import kubernetes_pod_operator
import os

args = {
    'owner': 'airflow'
}

YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)

try:
    print("Entered try block")
    with models.DAG(
            dag_id='sampledag',
            schedule_interval=datetime.timedelta(days=1),
            start_date=YESTERDAY) as dag:
        print("Initialized dag")
        kubernetes_min_pod = kubernetes_pod_operator.KubernetesPodOperator(
            # The ID specified for the task.
            task_id='trigger-task',
            # Name of the task you want to run, used to generate the Pod ID.
            name='trigger-name',
            namespace='scheduler',
            in_cluster=True,
            cmds=["./docker-run.sh"],
            is_delete_operator_pod=False,
            image='imagerepo:latest',
            image_pull_policy='Always',
            dag=dag)
        print("done")
except Exception as e:
    print(str(e))
    logging.error("Error at {}, error={}".format(__file__, str(e)))
    raise
I had the same issue. I solved it by adding the following to my config:
AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH: repo/
What was happening is that the init container downloads your DAGs into [AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT]/[AIRFLOW__KUBERNETES__GIT_SYNC_DEST], and AIRFLOW__KUBERNETES__GIT_SYNC_DEST defaults to repo (https://airflow.apache.org/docs/stable/configurations-ref.html#git-sync-dest).
I am guessing the problem comes from the mismatch in your setup between the path being parsed (/opt/airflow/dags/dags/etl/sampledag_dag.py) and AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT: /dags.
I'd double-check that these are what you want and what you expect.
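Putting the relevant settings together: git-sync clones into [AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT]/[AIRFLOW__KUBERNETES__GIT_SYNC_DEST], so the workers need a matching subpath. A sketch using the defaults described above:
AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT: /dags
AIRFLOW__KUBERNETES__GIT_SYNC_DEST: repo           # default value
AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH: repo/    # workers then read DAGs from /dags/repo/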
I was facing the same issue while trying to use the Kubernetes Executor with the stable Airflow Helm chart. In my case, I was able to resolve it by changing
AIRFLOW__KUBERNETES__RUN_AS_USER: "50000" to AIRFLOW__KUBERNETES__GIT_SYNC_RUN_AS_USER: "65533" in the env section of the Helm chart.
The same value is mentioned in this link.
I came to this conclusion because the init container (git-sync), which runs before the temporary worker pod comes up, was not able to clone/sync the git DAGs to the worker pods. In my case there was a permissions error (even though the kube secret for the SSH clone was correctly passed).
Getting the same issue, I solved it with the suggestion from @gtrip to set the UID of the git-sync run user to 65533.
I would add the following debug hints:
the git-sync init container returns no error even if it fails to fetch the DAGs
Kubernetes debugging information for init containers
kubectl get pods -n [NAMESPACE]
kubectl logs -n [NAMESPACE] [POD_ID] -c git-sync