List only private Dataproc custom images - google-cloud-dataproc

I have created a custom dataproc image using this command ...
$ python generate_custom_image.py --image-name my-ubuntu18-custom --dataproc-version 1.5-ubuntu18 --customization-script my-customization-script.sh --zone us-central-1 --gcs-bucket gs://dataproc-xxxxxx-imgs
After creation, I tried to list all the custom Dataproc images created by me and was surprised to see 83 images: mine was showing up alongside 82 others. I expected to see only mine. How can I ensure mine is not in the public list of Dataproc images?

By default, gcloud lists the private images in your current project alongside a standard set of "public image projects", as documented in the gcloud compute images list help page. To list only your project-level private images and no public ones, pass the --no-standard-images flag:
gcloud compute images list --no-standard-images
Another way to see the difference: if you have two GCP projects, run gcloud config set project my-other-project and then try a default gcloud compute images list again; you should not see the custom image you created.
Finally, you can also use:
gcloud compute images describe my-ubuntu18-custom
to see the full resource name of the image along with other metadata, confirming that it is nested in your project. You can also run:
gcloud compute images get-iam-policy my-ubuntu18-custom
to assure yourself that the custom image's permissions are not public.
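If you want to check this programmatically, you can feed the JSON output of get-iam-policy into a short script. A minimal sketch, assuming the standard IAM policy structure (the sample policy below is illustrative, not real output):

```python
import json

# Principals that would make an image publicly accessible.
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}

def is_public(policy: dict) -> bool:
    """Return True if any IAM binding grants a role to a public principal."""
    for binding in policy.get("bindings", []):
        if PUBLIC_MEMBERS & set(binding.get("members", [])):
            return True
    return False

# Illustrative policy, shaped like the output of:
#   gcloud compute images get-iam-policy my-ubuntu18-custom --format=json
sample = json.loads("""
{
  "bindings": [
    {"role": "roles/compute.imageUser",
     "members": ["user:me@example.com"]}
  ],
  "etag": "BwWqIFsnFZI="
}
""")

print(is_public(sample))  # a policy granting roles only to specific users is not public
```

A policy whose bindings include allUsers or allAuthenticatedUsers is what would make the image publicly visible; anything else stays private to the principals listed.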

Related

Kubernetes deploying a template with multiple pods with images from DockerHub with blocked DockerHub access

I'm deploying a system which uses many images from Docker Hub and other public registries (e.g. registry.redhat.io). Our k8s cluster doesn't have access to Docker Hub or the others. We use AKS & ACR.
The template has multiple references like this
- name: some-api
  image: some-vendor/some-api:2.3.2
If I import the image some-vendor/some-api:2.3.2 from Docker Hub into ACR and change the template to use the full image name, like my-acr-registry.azurecr.io/some-vendor/some-api:2.3.2, then it pulls the image fine, but I'd like to avoid touching the template. I'd rather make a few changes in a namespace or in the cluster so that it will look for the images in ACR.
Is there a way to make AKS search for that image in our ACR without specifying the full image name?
Another example: the template also uses images from other registries, e.g. registry.redhat.io/ubi8/nodejs-14. Is it possible to make it find that in our ACR if I push it under the name my-acr-registry.azurecr.io/registry.redhat.io/ubi8/nodejs-14, without updating the template?
Ideally I'd like to add some sort of image resolver, which attempts to pull it as concat("my-acr-registry.azurecr.io/docker.io/", imageName), then if not able to find try to pull it as concat("my-acr-registry.azurecr.io/", imageName) to cater for both cases. Can I do that?
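There is no built-in fallback resolver in AKS, but the rewriting you describe is straightforward to express; in practice it would live in a mutating admission webhook that rewrites pod image names. A sketch of just the name-mapping logic, assuming your mirror keeps the original registry host as a path prefix (the registry name my-acr-registry.azurecr.io follows the example above):

```python
ACR = "my-acr-registry.azurecr.io"

def resolve(image: str) -> str:
    """Rewrite a public image reference to its mirrored location in ACR.

    Images with no registry host (Docker Hub short names) are assumed to be
    mirrored under docker.io/; images from other registries keep their
    original host as a path prefix.
    """
    first = image.split("/", 1)[0]
    # A registry host contains a dot or a port; bare first segments are
    # Docker Hub namespaces.
    if "." in first or ":" in first:
        return f"{ACR}/{image}"
    return f"{ACR}/docker.io/{image}"

print(resolve("some-vendor/some-api:2.3.2"))
# my-acr-registry.azurecr.io/docker.io/some-vendor/some-api:2.3.2
print(resolve("registry.redhat.io/ubi8/nodejs-14"))
# my-acr-registry.azurecr.io/registry.redhat.io/ubi8/nodejs-14
```

This is only the pure mapping; the "try ACR first, fall back on miss" behaviour you want would still need a webhook or a registry-level pull-through cache in front of the cluster.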

Why are Dataproc cluster properties not in the info panel?

In the Google Cloud Console I have opened the "info" panel for a cluster but I only see labels and permissions. What I really want to see is the cluster properties, such as the Spark and YARN properties.
How can I see cluster properties?
The info panel is consistent across the Cloud Console, inside and outside of Dataproc: it shows labels and IAM permissions.
To see cluster properties:
1. Open a cluster's detail page in the Cloud Console (click the cluster)
2. Click 'Configuration'
3. Expand 'Properties'
You can also use the gcloud command to list properties:
gcloud dataproc clusters describe <cluster-name>
In addition to what James has posted above, you can also click the REST equivalent link at the bottom left to see all the information in one go.
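The properties shown in the console live under the cluster's softwareConfig. A sketch of pulling them out of the describe command's JSON output (the sample payload below is illustrative, not real output):

```python
import json

def cluster_properties(cluster: dict) -> dict:
    """Extract the Spark/YARN/etc. properties from a Dataproc cluster resource."""
    return cluster.get("config", {}).get("softwareConfig", {}).get("properties", {})

# Illustrative payload, shaped like the output of:
#   gcloud dataproc clusters describe my-cluster --format=json
sample = json.loads("""
{
  "clusterName": "my-cluster",
  "config": {
    "softwareConfig": {
      "imageVersion": "1.5-ubuntu18",
      "properties": {
        "spark:spark.executor.memory": "4g",
        "yarn:yarn.nodemanager.resource.memory-mb": "12288"
      }
    }
  }
}
""")

for key, value in sorted(cluster_properties(sample).items()):
    print(f"{key}={value}")
```

Note that Dataproc prefixes each property with the file it belongs to (spark:, yarn:, core:, and so on), which is how the console groups them.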

How do I update a service in the cluster to use a new docker image

I have created a new docker image that I want to use to replace the current docker image. The application is on the kubernetes engine on google cloud platform.
I believe I am supposed to use the gcloud container clusters update command, but I struggle to see how it works and how I'm supposed to replace the old Docker image with the new one.
You may want to use kubectl to interact with your GKE cluster. The method of updating the image depends on how the Pod/container was created.
For some example commands, see https://kubernetes.io/docs/reference/kubectl/cheatsheet/#updating-resources
For example, kubectl set image deployment/frontend www=image:v2 will perform a rolling update of the "www" containers in the "frontend" Deployment, replacing their image.
Getting up and running on GKE: https://cloud.google.com/kubernetes-engine/docs/quickstart
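Under the hood, kubectl set image just sends a patch to the Deployment. A sketch of the strategic-merge patch body it amounts to (the names "frontend" and "www" follow the example above), which you could submit yourself with kubectl patch or any Kubernetes client:

```python
import json

def image_patch(container: str, image: str) -> dict:
    """Build a strategic-merge patch that swaps one container's image.

    Roughly equivalent to: kubectl set image deployment/frontend www=image:v2
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        # Strategic merge matches containers by name, so only
                        # this container's image field is changed.
                        {"name": container, "image": image}
                    ]
                }
            }
        }
    }

patch = image_patch("www", "image:v2")
print(json.dumps(patch))
# Apply with: kubectl patch deployment frontend --patch-file patch.json
```

Changing the pod template spec is what triggers the Deployment's rolling update, exactly as kubectl set image does.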
You can use Container Registry[1] as a single place to manage Docker images.
Google Container Registry provides secure, private Docker repository storage on Google Cloud Platform. You can use gcloud to push[2] images to your registry, then you can pull images using an HTTP endpoint from any machine.
You can also use Docker Hub repositories[3], which allow you to share container images with your team, customers, or the Docker community at large.
[1]https://cloud.google.com/container-registry/
[2]https://cloud.google.com/container-registry/docs/pushing-and-pulling
[3]https://docs.docker.com/docker-hub/repos/

Unable to create Dataproc cluster using custom image

I am able to create a google dataproc cluster from the command line using a custom image:
gcloud beta dataproc clusters create cluster-name --image=custom-image-name
as specified in https://cloud.google.com/dataproc/docs/guides/dataproc-images, but I am unable to find information about how to do the same using the v1beta2 REST api in order to create a cluster from within airflow. Any help would be greatly appreciated.
Since custom images can theoretically reside in a different project (if you grant read/use access on that custom image to whatever service account you use for the Dataproc cluster), images currently always need a full URI, not just a short name.
When you use gcloud, there's syntactic sugar where gcloud will resolve the full URI automatically; you can see this in action if you use --log-http with your gcloud command:
gcloud beta dataproc clusters create foo --image=custom-image-name --log-http
If you created a cluster with gcloud, you can also run gcloud dataproc clusters describe on your cluster to see the fully-resolved custom image URI.
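For the REST API you then pass that fully-resolved URI in the cluster config. A sketch of building it from its parts, assuming the image lives in your own project (the format below matches what gcloud resolves a short image name to):

```python
def custom_image_uri(project: str, image: str) -> str:
    """Build the full Compute Engine image URI that the Dataproc API expects."""
    return (
        "https://www.googleapis.com/compute/v1/"
        f"projects/{project}/global/images/{image}"
    )

# In the v1beta2 clusters.create request body this URI goes into the
# instance group configs, e.g. config.masterConfig.imageUri and
# config.workerConfig.imageUri.
print(custom_image_uri("my-project", "custom-image-name"))
```

The same string works from Airflow, since the Dataproc operators ultimately build this REST request body.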

GCR using CLI to retrieve tagless images

I know you can do the following to retrieve a list of the images in a specific repository:
gcloud container images list --repository=gcr.io/myproject
But I was wondering whether I can also use the gcloud CLI to retrieve the images without a tag.
The tagless images are shown in the Google Cloud Console web interface.
Solution
gcloud container images list-tags gcr.io/myproject/repo --filter='-tags:*'
list-tags would be better for your needs. Specifically, if you want to see information on all images (including untagged ones):
gcloud container images list-tags gcr.io/project-id/repository --format=json
And if you want to print the digests of images which are untagged:
gcloud container images list-tags gcr.io/project-id/repository --filter='-tags:*' --format='get(digest)' --limit=$BIG_NUMBER
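If you prefer to post-process the JSON yourself instead of using --filter, the untagged entries are simply those whose tags list is empty. A sketch (the sample data is illustrative of list-tags --format=json output, with shortened digests):

```python
import json

def untagged_digests(entries: list) -> list:
    """Return the digests of images that carry no tags."""
    return [e["digest"] for e in entries if not e.get("tags")]

# Illustrative data, shaped like the output of:
#   gcloud container images list-tags gcr.io/project-id/repository --format=json
sample = json.loads("""
[
  {"digest": "sha256:aaa111", "tags": ["latest", "v1"]},
  {"digest": "sha256:bbb222", "tags": []},
  {"digest": "sha256:ccc333", "tags": ["v2"]}
]
""")

print(untagged_digests(sample))  # ['sha256:bbb222']
```

These digests are the same values the --filter='-tags:*' invocation above prints, so either approach works for feeding a cleanup script.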
I think what you are looking for is the list-tags sub-command; note that it takes the image path as a positional argument rather than a --repository flag:
gcloud container images list-tags gcr.io/myproject/myrepo