Airflow KubernetesPodOperator: pull image from a private repository (Kubernetes)

How can Apache Airflow's KubernetesPodOperator pull docker images from a private repository?
The KubernetesPodOperator has an image_pull_secrets parameter to which you can pass a Secrets object to authenticate with the private repository. But the Secrets object can only represent an environment variable or a volume, neither of which fits my understanding of how Kubernetes uses secrets to authenticate with private repos.
Using kubectl, you can create the required secret with something like:
$ kubectl create secret docker-registry $SECRET_NAME \
--docker-server=https://${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com \
--docker-username=AWS \
--docker-password="${TOKEN}" \
--docker-email="${EMAIL}"
But how can you create the authentication secret in Airflow?

According to the Kubernetes documentation, there is a Secret object of the docker-registry type that can be used to authenticate to a private repository.
As you mentioned in your question, you can use kubectl to create a secret of the docker-registry type and then try to pass it with image_pull_secrets.
However, depending on the platform you are using, this might be of limited or no use at all, according to the Kubernetes documentation:
Configuring Nodes to Authenticate to a Private Registry
Note: If you are running on Google Kubernetes Engine, there will already be a .dockercfg on each node with credentials for Google Container Registry. You cannot use this approach.
Note: If you are running on AWS EC2 and are using the EC2 Container Registry (ECR), the kubelet on each node will manage and update the ECR login credentials. You cannot use this approach.
Note: This approach is suitable if you can control node configuration. It will not work reliably on GCE, and any other cloud provider that does automatic node replacement.
Note: Kubernetes as of now only supports the auths and HttpHeaders section of docker config. This means credential helpers (credHelpers or credsStore) are not supported.
Making this work on the mentioned platforms is possible, but it would require automated scripts and third-party tools.
For example, with Amazon ECR, the Amazon ECR Docker Credential Helper would be needed to periodically pull AWS credentials into the Docker registry configuration, and then another script would be needed to update the Kubernetes docker-registry secrets.
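A minimal sketch of that "another script" step, assuming you already have an ECR authorization token (for example from the ECR GetAuthorizationToken API, which returns base64("AWS:<password>")). It only builds the Secret manifest; fetching the token and applying the manifest are left out, and the function names, secret name and registry URL are hypothetical placeholders.

```python
import base64
import json


def decode_ecr_token(authorization_token: str) -> tuple[str, str]:
    """Split an ECR authorization token into (username, password).

    The token is base64("AWS:<password>"), so the username is normally "AWS".
    """
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, password = decoded.split(":", 1)
    return username, password


def ecr_pull_secret(name: str, registry: str, authorization_token: str) -> dict:
    """Build a kubernetes.io/dockerconfigjson Secret manifest for an ECR registry."""
    username, password = decode_ecr_token(authorization_token)
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    docker_config = {
        "auths": {
            registry: {"username": username, "password": password, "auth": auth}
        }
    }
    payload = json.dumps(docker_config).encode("utf-8")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        # Secret "data" values must themselves be base64-encoded.
        "data": {".dockerconfigjson": base64.b64encode(payload).decode("ascii")},
    }


if __name__ == "__main__":
    # Fake token for illustration only; a real one comes from ECR.
    fake_token = base64.b64encode(b"AWS:example-password").decode("ascii")
    manifest = ecr_pull_secret(
        "ecr-pull-secret",  # hypothetical secret name
        "https://123456789012.dkr.ecr.us-east-1.amazonaws.com",  # placeholder registry
        fake_token,
    )
    # Print JSON that could be piped to `kubectl apply -f -` on a schedule.
    print(json.dumps(manifest, indent=2))
```

ECR authorization tokens are valid for 12 hours, so something (a cron job or a Kubernetes CronJob, for instance) has to rerun this well within that window. That is the extra moving part described above. The secret is then referenced by name, for example from image_pull_secrets.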
As for Airflow itself, I don't think it has functionality to create its own docker-registry secrets.
You can request functionality like that in the Apache Airflow JIRA.
P.S.
If you still have issues with your K8s cluster, you might want to create a new question on Stack Overflow addressing them.

Related

K8s to pull private image from Github container registry (ghcr) using GITHUB_TOKEN

Is it possible to pull private images in Github Container Registry using GITHUB_TOKEN?
If so, what do I need to configure in k8s?
Thanks!
Yes, you can. You will have to create a secret object in your cluster.
kubectl create secret docker-registry ghcr-login-secret --docker-server=https://ghcr.io --docker-username=$YOUR_GITHUB_USERNAME --docker-password=$YOUR_GITHUB_TOKEN --docker-email=$YOUR_EMAIL
Note: Your credentials will become part of shell history, so be careful and remove the shell history afterwards.
This will internally create a .dockerconfigjson with the values you provided and generate a secret that will be used to authenticate with your registry.
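To make that internal step concrete, here is a minimal sketch that builds a comparable Secret manifest: type kubernetes.io/dockerconfigjson, with a base64-encoded JSON document under the .dockerconfigjson key containing one auths entry. It approximates what kubectl generates rather than reproducing it byte for byte; the function name and the GHCR_USER/GHCR_TOKEN environment variables are hypothetical.

```python
import base64
import json
import os


def docker_registry_secret(name: str, server: str, username: str,
                           password: str, email: str = "") -> dict:
    """Return a Secret manifest similar to `kubectl create secret docker-registry`."""
    entry = {
        "username": username,
        "password": password,
        # Registries expect "auth" to be base64("username:password").
        "auth": base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii"),
    }
    if email:  # the email field is optional, so only add it when given
        entry["email"] = email
    docker_config_json = json.dumps({"auths": {server: entry}})
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {
            # Secret data values are base64-encoded strings.
            ".dockerconfigjson": base64.b64encode(docker_config_json.encode("utf-8")).decode("ascii"),
        },
    }


if __name__ == "__main__":
    secret = docker_registry_secret(
        "ghcr-login-secret",
        "https://ghcr.io",
        os.environ.get("GHCR_USER", "example-user"),    # hypothetical variable
        os.environ.get("GHCR_TOKEN", "example-token"),  # hypothetical variable
    )
    print(json.dumps(secret, indent=2))  # e.g. pipe into `kubectl apply -f -`
```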
You can then proceed to specify in your Pod specification that you are using a private registry and pass this secret as:
...
imagePullSecrets:
- name: ghcr-login-secret
...
You can read more about external registry interfacing with Kubernetes here.
OK, I now have a better understanding of GITHUB_TOKEN.
GITHUB_TOKEN is for GitHub-internal usage, e.g. for Actions workflows that build a Docker image and push it to GitHub Container Registry.
In order for k8s to pull the image from GitHub, we have to generate a PAT (personal access token) and then add it to a k8s secret.

DO Kubernetes Cluster + GCP Container Registry

I have a Kubernetes cluster in DigitalOcean, and I want to pull images from a private repository in GCP.
I tried to create a secret that allows me to pull the images, following this article https://blog.container-solutions.com/using-google-container-registry-with-kubernetes
Basically, these are the steps:
In the GCP account, create a service account key with a JSON credential.
Execute:
kubectl create secret docker-registry gcr-json-key \
--docker-server=gcr.io \
--docker-username=_json_key \
--docker-password="$(cat ~/json-key-file.json)" \
--docker-email=any@valid.email
In the deployment YAML, reference the secret:
imagePullSecrets:
- name: gcr-json-key
I don't understand why I am getting a 403. Are there restrictions on using the registry from outside Google Cloud, or did I miss some configuration?
Failed to pull image "gcr.io/myapp/backendnodeapi:latest": rpc error: code = Unknown desc = failed to pull and unpack image "gcr.io/myapp/backendnodeapi:latest": failed to resolve reference "gcr.io/myapp/backendnodeapi:latest": unexpected status code [manifests latest]: 403 Forbidden
Verify that you have enabled the Container Registry API, installed the Cloud SDK, and that the service account you are using for authentication has permissions to access Container Registry.
Docker requires privileged access to interact with registries. On Linux or Windows, add the user that you use to run Docker commands to the Docker security group.
This documentation has details on prerequisites for container registry.
Note: Ensure that you are using the latest version of kubectl.
I tried replicating this by following the document you provided, and it worked on my end, so ensure that all the prerequisites are met.
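When a pull still returns 403, it can help to confirm what the secret actually contains. A minimal sketch, assuming you fetched the base64 value with something like kubectl get secret gcr-json-key -o jsonpath='{.data.\.dockerconfigjson}': it decodes the payload and checks that the gcr.io entry uses the _json_key username and that its password parses as a service-account key. The function name and script usage are hypothetical, and a passing check says nothing about IAM permissions, which still have to be granted as described above.

```python
import base64
import json
import sys


def inspect_gcr_secret(dockerconfigjson_b64: str, server: str = "gcr.io") -> dict:
    """Decode a .dockerconfigjson value and check the `_json_key` entry for `server`.

    Returns the key's client_email and project_id, or raises ValueError.
    """
    config = json.loads(base64.b64decode(dockerconfigjson_b64))
    auths = config.get("auths", {})
    if server not in auths:
        raise ValueError(f"no auths entry for {server!r}; found {sorted(auths)}")
    entry = auths[server]
    username, password = entry.get("username"), entry.get("password")
    if (not username or not password) and "auth" in entry:
        # Fall back to the combined base64("username:password") field.
        username, password = base64.b64decode(entry["auth"]).decode("utf-8").split(":", 1)
    if username != "_json_key":
        raise ValueError(f"expected username '_json_key', got {username!r}")
    try:
        key = json.loads(password or "")  # must be the whole JSON key file
    except json.JSONDecodeError as exc:
        raise ValueError("password is not valid JSON") from exc
    if not isinstance(key, dict) or key.get("type") != "service_account":
        raise ValueError("password does not look like a service-account key")
    return {"client_email": key.get("client_email"), "project_id": key.get("project_id")}


if __name__ == "__main__" and len(sys.argv) > 1:
    # python inspect_secret.py "$(kubectl get secret gcr-json-key -o jsonpath='{.data.\.dockerconfigjson}')"
    print(inspect_gcr_secret(sys.argv[1]))
```

If the returned project_id or client_email is not the one you granted registry access to, that is a likely place to look first.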
That JSON string is not a password.
The documentation suggests either activating the service account:
gcloud auth activate-service-account [USERNAME]@[PROJECT-ID].iam.gserviceaccount.com --key-file=~/service-account.json
or adding the configuration to $HOME/.docker/config.json
and then running docker-credential-gcr configure-docker.
Kubernetes seems to demand a service-account token secret,
and this requires the annotation kubernetes.io/service-account.name.
Also see Configure Service Accounts for Pods.

How to pull from private project's image registry using GitLab managed Kubernetes clusters

GitLab offers to manage a Kubernetes cluster, which includes (e.g.) creating the namespace, adding some tokens, etc. In GitLab CI jobs, one can directly use the $KUBECONFIG variable for contacting the cluster and e.g. creating deployments using helm. This works like a charm, as long as the GitLab project is public and therefore Docker images hosted by the GitLab project's image registry are publicly accessible.
However, when working with private projects, Kubernetes of course needs an ImagePullSecret to authenticate to GitLab's image registry and retrieve the image. As far as I can see, GitLab does not automatically provide an ImagePullSecret for repository access.
Therefore, my question is: What is the best way to access the image repository of private GitLab repositories in a Kubernetes deployment in a GitLab managed deployment environment?
In my opinion, these are the possibilities and why they are not eligible/optimal:
Permanent ImagePullSecret provided by GitLab: When doing a deployment on a GitLab managed Kubernetes cluster, GitLab provides a list of variables to the deployment script (e.g. Helm Chart or kubectl apply -f manifest.yml). As far as I can (not) see, there is a lot of stuff like ServiceAccounts and tokens etc., but no ImagePullSecret - and also no configuration option for enabling ImagePullSecret creation.
Using $CI_JOB_TOKEN: When working with GitLab CI/CD, GitLab provides a variable named $CI_JOB_TOKEN which can be used for uploading Docker images to the registry during job execution. This token expires after the job is done. It could be combined with helm install --wait, but when a pod is rescheduled to a new node that does not have the image yet, the token has expired and the node is no longer able to download the image. Therefore, this only works at the moment the app is deployed.
Creating an ImagePullSecret manually and adding it to the Deployment or the default ServiceAccount: This is a manual step, has to be repeated for each individual project and just sucks; we're trying to automate things, and GitLab managed Kubernetes clusters are designed to avoid any manual steps.
Something else but I don't know about it.
So, am I wrong about any of these points? Am I missing an eligible option in this list?
Again: It's all about a seamless integration with the "Managed Cluster" features of GitLab. I know how to add tokens from GitLab as ImagePullSecrets in Kubernetes, but I want to know how to automate this with the Managed Cluster feature.
There is another way. You can bake the ImagePullSecret into your container runtime configuration: Docker, containerd or CRI-O (whatever you are using).
Docker
As root, run docker login <your-private-registry-url>. A file /root/.docker/config.json should then be created or updated. Copy that file to all of your Kubernetes nodes and make sure your kubelet runs as root (which it typically does). Some background info.
The content of the file should look something like this:
{
  "auths": {
    "my-private-registry": {
      "auth": "xxxxxx"
    }
  },
  "HttpHeaders": {
    "User-Agent": "Docker-Client/18.09.2 (Linux)"
  }
}
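If you provision nodes with a script rather than running docker login by hand on each one, a minimal sketch like the following can write that file. It merges one registry into an existing config.json while keeping other entries such as HttpHeaders; the function name, registry name and credentials are hypothetical, and the "auth" value is base64("username:password"), the same encoding as the auth field shown above.

```python
import base64
import json
import os
import tempfile


def add_registry_auth(config_path: str, registry: str, username: str, password: str) -> dict:
    """Add or replace one registry under "auths" in a Docker config.json.

    Other registries and top-level keys (e.g. "HttpHeaders") are preserved.
    """
    config: dict = {}
    if os.path.exists(config_path):
        with open(config_path, encoding="utf-8") as fh:
            config = json.load(fh)
    # "auth" is base64("username:password").
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    config.setdefault("auths", {})[registry] = {"auth": auth}
    directory = os.path.dirname(os.path.abspath(config_path))
    os.makedirs(directory, exist_ok=True)
    # Write to a private temp file in the same directory, then rename it into
    # place, so readers never see a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    os.replace(tmp_path, config_path)
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # On a real node this would be /root/.docker/config.json (run as root).
        path = os.path.join(tmp, "config.json")
        add_registry_auth(path, "my-private-registry", "user", "example-password")
        with open(path, encoding="utf-8") as fh:
            print(fh.read())
```

The credentials in this file are static, so anything that expires (such as ECR or short-lived GCP tokens) would still need a separate refresh mechanism.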
Containerd
Configure your containerd configuration file (typically /etc/containerd/config.toml) with something like this:
[plugins.cri.registry.auths]
[plugins.cri.registry.auths."https://gcr.io"]
username = ""
password = ""
auth = ""
identitytoken = ""
CRI-O
Specify the global_auth_file option in your crio.conf file.
✌️
Configure a registry secret for your account.
For example, for Kubernetes to pull images from gitlab.com, use the address registry.gitlab.com as the registry server:
kubectl create secret docker-registry regcred --docker-server=<your-registry-server> --docker-username=<your-name> --docker-password=<your-pword> --docker-email=<your-email>

Unable to pull the image to run a pod in kubernetes

[Screenshot: Pod details]
Failed to pull image
The image has been pushed to the Azure registry using Docker for Windows.
Image name provided: the same as the one used when tagging it with Docker.
You have currently provided very little detail.
Is your Kubernetes cluster correctly configured to pull images from the Azure registry? As far as I can see, it isn't. Is this a managed AKS cluster? If not, by default it won't be able to access your private Azure registry and will need to be configured with the credentials required to access it.
http://docs.heptio.com/content/private-registries/pr-docker-hub.html
Another possibility is that you're pushing a Windows-based container onto Linux-based worker nodes, which can only run Linux-based containers.
I only have experience with GKE, but if you want to pull Docker images from a repository that is not in the same project as the GKE cluster, you have to provide credentials for the image to be pulled.
I do this with a secret in Kubernetes that contains a .dockerconfigjson:
apiVersion: v1
kind: Secret
metadata: {name: <SECRET_NAME>}
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <REDACTED>
To create a secret of this type, I've used a template like this:
kubectl create secret docker-registry <SECRET_NAME> \
--docker-server=https://gcr.io \
--docker-username=_json_key \
--docker-email=<SVC_ACCOUNT_EMAIL> \
--docker-password=<CONTENTS_OF_SVC_ACCOUNT_CREDS_FILE>
Once that's created, you will need to reference the secret in the relevant pods/deployment. In the pod spec you will need:
imagePullSecrets:
- name: <SECRET_NAME>
(it's a list because you can reference several secrets to pull from other places)
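Because imagePullSecrets is a list, a pod can reference one secret per registry. A minimal sketch, using plain dictionaries in place of a real manifest, that adds secret names to a pod spec without creating duplicates; the function name, image and secret names are hypothetical.

```python
import copy


def with_pull_secrets(pod_spec: dict, *secret_names: str) -> dict:
    """Return a copy of `pod_spec` whose imagePullSecrets include `secret_names`.

    Entries already present are kept in order and not duplicated.
    """
    spec = copy.deepcopy(pod_spec)
    pull_secrets = spec.setdefault("imagePullSecrets", [])
    existing = {entry["name"] for entry in pull_secrets}
    for name in secret_names:
        if name not in existing:
            pull_secrets.append({"name": name})
            existing.add(name)
    return spec


if __name__ == "__main__":
    spec = {"containers": [{"name": "app", "image": "gcr.io/<PROJECT>/app:latest"}]}
    # One secret for GCR and one for another registry, e.g. GHCR.
    print(with_pull_secrets(spec, "gcr-json-key", "ghcr-login-secret"))
```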
I imagine that Azure has a similar setup, whereby any images in the same project as the cluster can be pulled, but any images hosted in another Azure project or an external image repo will need credentials.
I also use the setup described here to pull Google Container Registry images into a local minikube.
So I think we need to work out where your Docker image is hosted and whether credentials are needed to pull that image.

"permanent" GKE kubectl service account authentication

I deploy apps to Kubernetes running on Google Cloud from CI. CI uses a kubectl config that contains auth information (either stored directly in version control or templated from env vars during the build).
CI has a separate Google Cloud service account, and I generate the kubectl config via
gcloud auth activate-service-account --key-file=key-file.json
and
gcloud container clusters get-credentials <cluster-name>
This sets the kubectl config, but the token expires in a few hours.
What are my options for having a 'permanent' kubectl config, other than providing CI with the key file during the build and running gcloud container clusters get-credentials?
You should look into RBAC (role-based access control), which will authenticate the role and avoid expiration, in contrast to the certificates, which currently expire as mentioned.
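One common way to act on this suggestion (an assumption on my part; the answer does not spell it out) is to give CI a Kubernetes ServiceAccount bound to a suitably scoped Role and write a kubeconfig that authenticates with that account's bearer token instead of the short-lived gcloud token. A minimal sketch of that kubeconfig structure; the server URL, CA data, token and all names are placeholders, and how long the token lives depends on how you obtain it. The output is JSON, which kubectl also accepts because JSON is valid YAML.

```python
import json


def token_kubeconfig(server: str, ca_data_b64: str, token: str,
                     name: str = "ci", namespace: str = "default") -> dict:
    """Build a kubeconfig that authenticates with a static bearer token."""
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": name,
            # certificate-authority-data is the cluster CA certificate, base64-encoded.
            "cluster": {"server": server, "certificate-authority-data": ca_data_b64},
        }],
        "users": [{"name": name, "user": {"token": token}}],
        "contexts": [{
            "name": name,
            "context": {"cluster": name, "user": name, "namespace": namespace},
        }],
        "current-context": name,
    }


if __name__ == "__main__":
    cfg = token_kubeconfig("https://<CLUSTER_ENDPOINT>", "<BASE64_CA>", "<SA_TOKEN>")
    print(json.dumps(cfg, indent=2))  # save to a file and point KUBECONFIG at it
```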
For those asking the same question and upvoting.
This is my current solution:
For some time I treated key-file.json as an identity token, put it in the CI config and used it within a container with the gcloud CLI installed. I used the key file/token to log in to GCP and let gcloud generate the kubectl config, the same approach used for GCP container registry login.
This works fine, but using kubectl in CI is kind of an antipattern. I switched to deploying based on container registry push events. This is relatively easy to do in k8s with Keel, Flux, etc. So CI only has to push the Docker image to the repo, and its job ends there. The rest is taken care of within k8s itself, so there is no need for kubectl and its config in the CI jobs.