How to pull image from private Docker registry in KubernetesPodOperator of Google Cloud Composer?

I'm trying to run a task in an environment built from an image in a private Google Container Registry through the KubernetesPodOperator of the Google Cloud Composer.
The Container Registry and Cloud Composer instances are under the same project.
My code is below.
import datetime

import airflow
from airflow.contrib.operators import kubernetes_pod_operator

YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)

# Create Airflow DAG for the pipeline
with airflow.DAG(
        'my_dag',
        schedule_interval=datetime.timedelta(days=1),
        start_date=YESTERDAY) as dag:
    my_task = kubernetes_pod_operator.KubernetesPodOperator(
        task_id='my_task',
        name='my_task',
        cmds=['echo 0'],
        namespace='default',
        image=f'gcr.io/<my_private_repository>/<my_image>:latest')
The task fails and I get the following error message in the logs in the Airflow UI and in the logs folder in the storage bucket.
[2020-09-21 08:39:12,675] {taskinstance.py:1147} ERROR - Pod Launching failed: Pod returned a failure: failed
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/contrib/operators/kubernetes_pod_operator.py", line 260, in execute
    'Pod returned a failure: {state}'.format(state=final_state))
airflow.exceptions.AirflowException: Pod returned a failure: failed
This is not very informative...
Any idea what I could be doing wrong?
Or is there anywhere I can find more informative log messages?
Thank you very much!

In general, the recommended way to start troubleshooting GCP Composer after a DAG run fails is explained in the dedicated chapter of the GCP documentation.
Moving on to issues specific to KubernetesPodOperator, the investigation typically consists of:
Verifying the status of the particular task for the corresponding DAG file;
Inspecting the task's logs and events (the logs can also be found in the Composer environment's storage bucket);
For any Kubernetes resource/object errors, checking the log/event journals of the Composer environment's GKE cluster (see the sketch below).
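As a minimal sketch of that last step (the cluster name, zone and project placeholders below are hypothetical; copy the real values from the Composer environment's details page), you can fetch the cluster credentials and dump recent events:
# Hypothetical placeholders; take the values from the Composer environment's details page.
gcloud container clusters get-credentials <composer-gke-cluster> --zone <zone> --project <project-id>
# List recent events across namespaces, oldest first, to spot image pull or scheduling failures.
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp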
Analyzing the error context and the kubernetes_pod_operator.py source code further, I assume this issue occurs due to a Pod launching problem on the Airflow worker's GKE node, which ends with the 'Pod returned a failure: {state}'.format(state=final_state) message whenever the Pod execution is not successful.
Personally, I prefer to verify that the image runs correctly before executing the Airflow task in a Kubernetes Pod. With that in mind, and based on the task command provided, you can verify the Pod launching process by connecting to the GKE cluster and translating the kubernetes_pod_operator.KubernetesPodOperator definition into an equivalent kubectl command:
kubectl run test-app --image=eu.gcr.io/<Project_ID>/image --command -- "/bin/sh" "-c" "echo 0"
This simplifies image validation, and it also lets you take a closer look at the Pod's logs and event records:
kubectl describe po test-app
Or
kubectl logs test-app

If you want KubernetesPodOperator to pull (or push) an image from a private registry, you should create a Secret in Kubernetes that contains a service account (SA) key. The SA should have permission to pull, or possibly push, images (read-only or read-write access).
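For gcr.io specifically, one common way to build such a Secret is from the JSON key of a service account that can read the registry (e.g. the Storage Object Viewer role). A minimal sketch, assuming a key file named key.json and a secret name of your choosing (both hypothetical):
# _json_key is the conventional username for authenticating to gcr.io with a service account key.
kubectl create secret docker-registry gcr-pull-secret \
  --docker-server=gcr.io \
  --docker-username=_json_key \
  --docker-password="$(cat key.json)" \
  --docker-email=<any-valid-email> \
  --namespace=default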
Then reference that Secret in KubernetesPodOperator via the image_pull_secrets argument:
my_task = kubernetes_pod_operator.KubernetesPodOperator(
    task_id='my_task',
    name='my_task',
    cmds=['echo 0'],
    namespace='default',
    image=f'gcr.io/<my_private_repository>/<my_image>:latest',
    image_pull_secrets='your_secret_name')
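Note that the exact type of image_pull_secrets depends on the operator version: the old airflow.contrib operator shown above accepts a comma-separated string of secret names, while newer apache-airflow-providers-cncf-kubernetes releases expect a list of Kubernetes object references. A hedged sketch for the newer API (the secret name is hypothetical, and the import path differs between provider versions):
# Import path for recent provider versions; older releases used
# airflow.providers.cncf.kubernetes.operators.kubernetes_pod instead.
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

my_task = KubernetesPodOperator(
    task_id='my_task',
    name='my_task',
    # cmds becomes the container's entrypoint array, so each word is its own element.
    cmds=['echo', '0'],
    namespace='default',
    image='gcr.io/<my_private_repository>/<my_image>:latest',
    # Newer provider versions expect a list of V1LocalObjectReference instead of a string.
    image_pull_secrets=[k8s.V1LocalObjectReference('your_secret_name')],
)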

Related

Issues running kubectl from Jenkins

I have deployed Jenkins on the Kubernetes cluster using the Helm chart by following this:
https://octopus.com/blog/jenkins-helm-install-guide
I have the pods and services running in the cluster. I was trying to create a pipeline to run some kubectl commands, but it is failing with the error below:
java.io.IOException: error=2, No such file or directory
Caused: java.io.IOException: Cannot run program "kubectl": error=2, No such file or directory
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
I thought it had something to do with the Kubernetes CLI plugin for Jenkins and raised an issue here:
https://github.com/jenkinsci/kubernetes-cli-plugin/issues/108
I have been advised to install kubectl inside the Jenkins pod.
I already have the Jenkins pod running (deployed using the Helm chart). I have seen options for including the kubectl binary as part of the Dockerfile, but since I used the Helm chart I'm not sure I have the luxury of editing and redeploying the pod to add kubectl.
Can you please help with your inputs to resolve this? Are there any steps/documentation that explain how to install kubectl in a running pod? I really appreciate your inputs, as this issue has stopped one of my critical projects. Thanks in advance.
I tried setting the rolebinding for the jenkins service account as mentioned here:
Kubernetes commands are not running inside the Jenkins container
I haven't installed kubectl inside the pod yet. Please help.
Jenkins pipeline:
kubeconfig(credentialsId: 'kube-config', serverUrl: '')
sh 'kubectl get all --all-namespaces'
(attached the pod/service details for Jenkins)
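Since the controller was installed from the Helm chart, one option that avoids rebuilding its image is to run the kubectl step in an agent pod that already ships the binary. A minimal sketch using the Kubernetes plugin's pod template (the image, service account name and stage name below are illustrative assumptions, not taken from the question):
pipeline {
  agent {
    kubernetes {
      // An extra container that already ships kubectl; bitnami/kubectl is one commonly used image.
      yaml '''
apiVersion: v1
kind: Pod
spec:
  serviceAccountName: jenkins   # hypothetical; use the account you bound the cluster role to
  containers:
  - name: kubectl
    image: bitnami/kubectl:latest
    command: ["sleep"]
    args: ["infinity"]
'''
    }
  }
  stages {
    stage('kubectl') {
      steps {
        container('kubectl') {
          // Relies on the pod's service account for cluster access instead of a kubeconfig file.
          sh 'kubectl get all --all-namespaces'
        }
      }
    }
  }
}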

How to fix error with GitLab runner inside Kubernetes cluster - try setting KUBERNETES_MASTER environment variable

I have set up two VMs that I am using throughout my journey of educating myself in CI/CD, GitLab, Kubernetes, cloud computing in general, and so on. Both VMs have Ubuntu 22.04 Server as the host.
VM1 - MicroK8s Kubernetes cluster
Most of the setup is "default". Since I'm not really that knowledgeable, I have only configured two pods and their respective services - one with PostGIS and the other one with GeoServer. My intent is to add a third pod, which is the deployment of an app that I have in VM2 and that will communicate with the GeoServer in order to provide a simple map web service (Leaflet + Django). All pods are exposed both within the cluster via internal IPs as well as externally (externalIp).
I have also installed two GitLab-related components here:
GitLab Runner with Kubernetes as executor
GitLab Kubernetes Agent
In VM2 both are visible as connected.
VM2 - GitLab
Here is where GitLab (default installation, latest version) runs. In the configuration (/etc/gitlab/gitlab.rb) I have enabled the agent server.
Initially I had the runner in VM1 configured with Docker as the executor. I had no issues with that. However, I then thought it would be nice to try running the runner inside the cluster so that everything is encapsulated (using the internal cluster IPs without further configuration and without exposing the VM's operating system).
Both the runner and agent are showing as connected but running a pseudo-CI/CD pipeline (the one provided by GitLab, where you have build, test and deploy stages with each consisting of a simple echo and waiting for a few seconds) returns the following error:
Running with gitlab-runner 15.8.2 (4d1ca121)
on testcluster-k8s-runner Hko2pDKZ, system ID: s_072d6d140cfe
Preparing the "kubernetes" executor
Using Kubernetes namespace: gitlab-runner
ERROR: Preparation failed: getting Kubernetes config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Will be retried in 3s ...
Using Kubernetes namespace: gitlab-runner
ERROR: Preparation failed: getting Kubernetes config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Will be retried in 3s ...
Using Kubernetes namespace: gitlab-runner
ERROR: Preparation failed: getting Kubernetes config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Will be retried in 3s ...
ERROR: Job failed (system failure): getting Kubernetes config: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
I am unable to find any information regarding KUBERNETES_MASTER except in issue tickets (GitLab) and questions (SO and other Q&A platforms). I have no idea what it is or where to set it. My guess would be that it belongs in the runner's configuration on VM1, or at least in the environment of the gitlab-runner user (the user that owns the runner's userspace, with its respective /home/gitlab-runner directory).
The only possible solution I have found so far is to copy the .kube directory from the user that uses kubectl (in my case microk8s kubectl, since I use MicroK8s) to the home directory of the GitLab runner. I didn't see anything special in that directory (no hidden files) except for a cache subdirectory, hence my decision to simply recreate it at /home/gitlab-runner/.kube, which didn't change a thing.
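If the failing runner is the one registered on the VM itself (kubernetes executor talking to MicroK8s from outside the cluster), the client library needs an actual kubeconfig file rather than an empty .kube directory. A minimal sketch, assuming the runner service runs as the gitlab-runner user (adjust the user and paths if your setup differs):
# Export the MicroK8s kubeconfig into the runner user's home so the Kubernetes client can find it.
sudo mkdir -p /home/gitlab-runner/.kube
sudo microk8s config | sudo tee /home/gitlab-runner/.kube/config > /dev/null
sudo chown -R gitlab-runner:gitlab-runner /home/gitlab-runner/.kube
# Restart the runner so it picks up the new configuration.
sudo gitlab-runner restart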

kubernetes Python client utils.create_from_yaml raises Create Error when deploying Argo workflow

I need to deploy the Argo workflow using the Kubernetes Python client SDK.
My code is like this, and 'quick-start-postgres.yaml' is the official deployment YAML file.
argo_yaml = 'quick-start-postgres.yaml'
res = utils.create_from_yaml(kube.api_client, argo_yaml, verbose=True, namespace="argo")
I tried to create the argo-server pod, the postgres pod, etc. I finally created the services and pods successfully, except for the argo-server,
and there is also an error, shown below:
(screenshot of the Create Error raised by utils.create_from_yaml)
I am not clear about what happened, so can anybody give me some help? Thanks!
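To get more detail than the bare Create Error, you can catch the exception that create_from_yaml raises and print the underlying API responses. A minimal sketch against the kubernetes Python client (the file name and namespace are taken from the question; the client setup is an illustrative assumption):
from kubernetes import client, config, utils

config.load_kube_config()  # or config.load_incluster_config() when running inside a pod
api_client = client.ApiClient()

try:
    utils.create_from_yaml(api_client, 'quick-start-postgres.yaml',
                           verbose=True, namespace='argo')
except utils.FailedToCreateError as exc:
    # create_from_yaml bundles every failed object into one exception;
    # each entry is an ApiException carrying the server's status, reason and body.
    for api_exc in exc.api_exceptions:
        print(api_exc.status, api_exc.reason)
        print(api_exc.body)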

Prefect task image can not be pulled from private registry but flow image can

I am trying to run a Prefect flow with a single task. My flow has an image for the flow and an image for the task. Both images are in an Azure private registry. I am running Prefect with Kubernetes in a private non-Azure cluster, and the whole Prefect stack is deployed via Helm while the Prefect agent is deployed with a YAML file. My Prefect version is 1.4.0.
So far:
I have created a docker-registry kubernetes secret that holds the credentials for pulling from the registry as described here
I am passing the IMAGE_PULL_SECRETS environment variable in the prefect agent with its value being the secret name.
I have tried passing the IMAGE_PULL_SECRETS to the task via the agent's PREFECT__CLOUD__AGENT__ENV_VARS variable with:
- name: PREFECT__CLOUD__AGENT__ENV_VARS
  value: '{"IMAGE_PULL_SECRETS: "container-registry-creds"}'
which leads to an error even before creating the job pod
What I observe is that when running a flow, the flow image is pulled from the registry but when it is time to run the task, the task image can not be pulled and I get the following error from the pod description:
Failed to pull image "<registry>/<image>": rpc error: code = Unknown desc = Error response from daemon: Head https://<registry-name>.azurecr.io/v2/<image>/manifests/<image-tag>: unauthorized: authentication required, visit https://aka.ms/acr/authorization for more information.
I think this issue is not an Azure issue, because I successfully pulled an image from the private registry for another pod using the same docker-registry secret.
What's your prefect version?
It looks like you are running this with Prefect 1 on Azure AKS, correct?
Can you explain what is meant by task image vs job image?
Generally, you could attach secrets directly to the agent start command in v1:
prefect agent kubernetes start --image-pull-secrets secret-1,secret-2
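As a side note, the PREFECT__CLOUD__AGENT__ENV_VARS value shown in the question does not look like valid JSON (the key is missing its closing quote), which would explain a failure before the job pod is even created. The corrected form would presumably be:
- name: PREFECT__CLOUD__AGENT__ENV_VARS
  value: '{"IMAGE_PULL_SECRETS": "container-registry-creds"}'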

Broken parameters persisting in Deis deployments

An invalid command parameter got into the deployment for a worker process in a Deis app. Now whenever I run a deis pull for a new image this broken parameter gets passed to the deployment so the worker doesn't start up successfully.
If I go into kubectl I can see the following parameter being set in the deployment for the worker (path /spec/template/spec/containers/0)
"command": [
"/bin/bash",
"-c"
],
Which results in the pod not starting up properly:
Error: failed to start container "worker": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory"
Error syncing pod
Back-off restarting failed container
This means that for every release/pull I've been going in and manually removing that parameter from the worker deployment setup. I've run kubectl delete deployment and recreated it with valid json (kubectl create -f deployment.json). This fixes things until I run deis pull again, at which point the broken parameter is back.
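As a stop-gap that is less drastic than deleting and recreating the deployment, a JSON patch can drop just the offending field (the deployment name below is hypothetical; the path matches the one quoted above):
# Remove only the broken command array from the first container of the worker deployment.
kubectl patch deployment <app>-worker --type=json \
  -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/command"}]'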
My thinking is that the broken command parameter is persisted somewhere in the Deis database or the like, and that it gets re-applied when I run deis pull.
I've tried the troubleshooting guide and dug around in the deis-database but I can't find where the deployment for the worker process is being created or where the deployment parameters that get passed to kubernetes when you run a deis pull come from.
Running deis v2.10.0 on Google Cloud