Google Cloud Composer pulls stale image from Google Container Registry - kubernetes

I'm trying to run an Airflow task through Google Cloud Composer with the KubernetesPodOperator, in an environment built from an image in a private Google Container Registry. The Container Registry and Cloud Composer instances are under the same project, and everything worked fine until I updated the image the DAG refers to.
When I update the image in the Container Registry, Cloud Composer keeps using a stale image.
Concretely, in the code below,
import datetime

import airflow
from airflow.contrib.operators import kubernetes_pod_operator

YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)

# Create Airflow DAG for the pipeline
with airflow.DAG(
        'my_dag',
        schedule_interval=datetime.timedelta(days=1),
        start_date=YESTERDAY) as dag:
    my_task = kubernetes_pod_operator.KubernetesPodOperator(
        task_id='my_task',
        name='my_task',
        cmds=['echo 0'],
        namespace='default',
        image=f'gcr.io/<my_private_repository>/<my_image>:latest')
if I update the image gcr.io/<my_private_repository>/<my_image>:latest in the Container Registry, Cloud Composer keeps using the stale image, which is no longer present in the Container Registry, and throws an error.
Is this a bug?
Thanks a lot!

As mentioned in the documentation for KubernetesPodOperator, the default value for image_pull_policy is 'IfNotPresent'. You need to configure your Pod spec to always pull the image.
The simplest way to do this is to set image_pull_policy to 'Always'.
A few more approaches are mentioned in the Kubernetes Container Images documentation.
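For example, a minimal sketch of the operator from the question with the pull policy set explicitly (the registry path and image name are the placeholders from the question):
from airflow.contrib.operators import kubernetes_pod_operator

my_task = kubernetes_pod_operator.KubernetesPodOperator(
    task_id='my_task',
    name='my_task',
    cmds=['echo 0'],
    namespace='default',
    image='gcr.io/<my_private_repository>/<my_image>:latest',
    # Force Kubernetes to pull the image from the registry on every run
    # instead of reusing a copy cached on the node.
    image_pull_policy='Always')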

Related

Spark job fails on image pull in Iguazio

I am using code examples in the MLRun documentation for running a Spark job on the Iguazio platform. The docs say I can use a default Spark docker image provided by the platform, but when I try to run the job the pod hangs with an ImagePullBackOff error. Here is the function spec item I am using:
my_func.spec.use_default_image = True
How do I configure Iguazio to use the default spark image that is supposed to be included in the platform?
You need to deploy the default image to the cluster docker registry. There is one image for remote Spark and one image for the Spark Operator. These images contain all the necessary dependencies for remote Spark and the Spark Operator.
See the code below.
# This section has to be invoked once per MLRun/Iguazio upgrade
from mlrun.runtimes import RemoteSparkRuntime
RemoteSparkRuntime.deploy_default_image()
from mlrun.runtimes import Spark3Runtime
Spark3Runtime.deploy_default_image()
Once these images are deployed (to the cluster docker registry), your function with the function spec use_default_image = True will be able to pull the image and deploy.
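As a rough sketch of the subsequent usage, assuming the documented remote-spark runtime kind and placeholder names for the script, handler, and Spark service, the function could then be defined and run like this:
import mlrun

# 'my_job.py', 'my_handler' and 'my-spark-service' are placeholders for
# your own script, handler, and the Spark service configured in Iguazio.
fn = mlrun.new_function(kind='remote-spark', name='my_spark_job', command='my_job.py')
fn.spec.use_default_image = True  # use the default image deployed above
fn.with_spark_service(spark_service='my-spark-service')
fn.run(handler='my_handler')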

Make Airflow KubernetesPodOperator clear image cache without setting image_pull_policy on DAG?

I'm running Airflow on Google Composer. My tasks are KubernetesPodOperators, and by default for Composer, they run with the Celery Executor.
I have just updated the Docker image that one of the KubernetesPodOperator uses, but my changes aren't being reflected. I think the cause is that it is using a cached version of the image.
How do I clear the cache for the KubernetesPodOperator? I know that I can set image_pull_policy=Always in the DAG, but I want it to use cached images in the future, I just need it to refresh the cache now.
Here is how my KubernetesPodOperator is set up (except for the commented line):
processor = kubernetes_pod_operator.KubernetesPodOperator(
    task_id='processor',
    name='processor',
    arguments=[filename],
    namespace='default',
    pool='default_pool',
    image='gcr.io/proj/processor',
    # image_pull_policy='Always'  # I want to avoid this so I don't have to update the DAG
)
Update - March 3, 2021
I still do not know how to make the worker nodes in Google Composer reload their images once while using the :latest tag on images (or using no tag, as the original question states).
I do believe that #rsantiago's comment would work, i.e. doing a rolling restart. A downside of this approach is that, by default, Composer worker nodes run in the same node pool as the Airflow infrastructure itself. This means that a rolling restart could also affect the Airflow scheduler, web interface, etc., although I haven't tried it, so I'm not sure.
The solution that my team has implemented is adding version numbers to each image release, instead of using no tag, or the :latest tag. This ensures that you know exactly which image should be running.
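For example, a sketch of the operator from the question pinned to an explicit version tag (the 1.2.3 tag is just a placeholder for your release number):
processor = kubernetes_pod_operator.KubernetesPodOperator(
    task_id='processor',
    name='processor',
    arguments=[filename],
    namespace='default',
    pool='default_pool',
    # A new tag per release is not yet cached on any node, so it gets pulled
    # fresh even with the default image_pull_policy of 'IfNotPresent'.
    image='gcr.io/proj/processor:1.2.3',
)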
Another thing that has helped is adding core.logging_level=DEBUG to the "Airflow Configuration Overrides" section of Composer. This will output the command that launched the docker image. If you're using version tags as suggested, this will display that tag.
I would like to note that setting up local debugging has helped tremendously. I am using PyCharm with the docker image as a "Remote Interpreter" which allows me to do step-by-step debugging inside the image to be confident before I push a new version.

How to write to dags directory from Cloud Composer webserver?

I'm writing a Cloud Composer plugin and I need to create a DAG at runtime. How can I create a DAG file from the webserver, or how can I access the bucket ID from plugin code (so I can use the GCS client and just upload the DAG)? I tried the code below and it doesn't work; I don't get any exceptions, but I also don't see any results:
dag_path = os.path.join(settings.DAGS_FOLDER, dag_id + '.py')
with open(dag_path, 'w') as dag:
    dag.write(result)
A possible solution is to read the bucket ID from a Cloud Composer environment variable.
You may either use the environment variables, or you may make use of the APIs provided by the gcloud SDK.
gcloud composer environments describe --format=json --project=<project-name> --location=<region> <cluster-name>
This returns the details of the Cloud Composer environment.
The DAG location is listed under the key dagGcsPrefix.
The format of dagGcsPrefix would be gs://<GCSBucket>/dags
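As a hedged sketch, assuming the standard google-cloud-storage client and that the bucket name comes either from the dagGcsPrefix value above or from the GCS_BUCKET environment variable that Composer sets on its Airflow processes, uploading the generated DAG could look like this:
import os
from google.cloud import storage

# Bucket name from the Composer environment (assumption: GCS_BUCKET is set),
# falling back to the <GCSBucket> part of dagGcsPrefix.
bucket_name = os.environ.get('GCS_BUCKET', '<GCSBucket>')

client = storage.Client()
bucket = client.bucket(bucket_name)

# Write the generated DAG source (dag_id and result are from the snippet above)
# into the dags/ folder of the environment bucket.
bucket.blob('dags/' + dag_id + '.py').upload_from_string(result)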

Can I stop Azure Container Service for Linux from issuing Docker Pull commands?

I am using an Azure App Service (Linux containers) to host a container application. Unfortunately for me, the App Service periodically issues a new Docker Pull command like this:
2018-11-08 18:39:32.512 INFO - Issuing docker pull: imagename =library/ghost:2.2.4-alpine
I don't know why it is issuing this command, and I can't find out how to stop it doing so.
I want to stop it because although the volume on which my container stores data can survive restarts of the container, it doesn't seem to survive rebuilding the container. I suspect that this might be because I'm using the Docker Compose (preview), and the docker compose configuration sets a volume name and associates it with the container.
I currently have 'continuous deployment' toggled 'OFF' in the Azure console, and I can't find any setting which seems to control whether or not the underlying App Service issues the docker pull command.
Unfortunately I can't use the docker single container as the pre-built ghost images don't appear to be set up to store data in a volume outside the container.
I have had no luck in searching the App Service FAQs for information about this behaviour. I'm hoping that I've made a foolish mistake which is easy to fix, and that someone here will have seen this and fixed it themselves.
To understand this behaviour, it helps to know how Azure Web App for Containers works.
Each time the Web App starts, whether you restart it yourself or it restarts on its own after a timeout, it checks whether the image should be updated. When you use a public Docker Hub image, that update depends on Docker Hub, not on anything you control.
So the best approach is to store the image in a private container registry, such as your own registry or Azure Container Registry, and give the image a specific tag. That way, as long as you do not update the image, the Web App will still run its check at startup but will keep using the same tagged image.

Publish a Service Fabric Container project

I can't manage to publish a container. In my case I want to put an MVC4 web role into the container, but actually, what's inside the container does not matter.
Their primary tutorial for using a container to lift-and-shift old apps uses Continuous Delivery. The average user does not always need this.
Instead of Continuous Delivery, one may use Visual Studio's support for Docker Compose:
Connect-ServiceFabricCluster <mycluster> and then New-ServiceFabricComposeApplication -ApplicationName <mytestapp> -Compose docker-compose.yml
But following their tutorial exactly still leads to errors. The application appears in the cluster but immediately outputs an error event:
"SourceId='System.Hosting', Property='Download:1.0:1.0'. Error during
download. Failed to download container image fabrikamfiber.web"
Am I missing a whole step that they expect to be obvious? Even placing the image in my Docker Hub registry myself did not help. Or does it need to be Azure Container Registry?
Docker Hub should work fine, ACR is not required.
These blog posts may help:
about running containers
about docker compose on Service Fabric