Custom Container Image for Google Dataproc PySpark Batch Job

I am exploring the newly introduced Google Dataproc Serverless. When submitting a job, I want to use a custom image (via the --container-image argument) so that all my Python libraries and related files are already present on the server and the job can execute faster.
I have googled and found only this Dataproc custom images page, which talks about custom Dataproc VM images. I did not see anything else.
Can you please confirm whether the custom image link above is the right one, or is there another base image we need to use to build the container Docker image?

No, the above link is for custom VM images for Dataproc on GCE clusters.
To create a custom container image for Dataproc Serverless for Spark, please follow the custom containers guide.
As a side note, all Dataproc Serverless-related documentation is on the https://cloud.google.com/dataproc-serverless website.
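Once the custom image is built and pushed to a registry, submitting the batch can be sketched like this. The project, region, bucket, and image names below are placeholders, not values from the question:

```shell
# Sketch: submit a PySpark batch that runs in a custom container image.
# All names (image, region, bucket) are placeholders.
gcloud dataproc batches submit pyspark main.py \
    --region=us-central1 \
    --container-image="gcr.io/my-project/my-spark-image:1.0.0" \
    --deps-bucket=gs://my-staging-bucket
```

With the libraries baked into the image, each batch skips per-job dependency installation, which is the speedup the question is after.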

Related

How to run docker image in Kubernetes (v1.21) pod for CI/CD scenario?

We're investigating using an AKS cluster to run our Azure Pipeline Agents.
I've looked into tools like Kaniko for building Docker images, which should work fine. However, some of our pipelines run docker commands (e.g. we run checkov using its Docker image), and I've struggled to find any solution that works, given the deprecation of the Docker shim in Kubernetes.
The obvious solution would be to add the tools we currently run from Docker into our agent image, but this isn't great: any time a developer wants to run a tool like that, we would need to modify the agent image, which is less than ideal.
Any suggestions?
Thanks
You could just use nerdctl or crictl for your commands, and even create an alias for that (especially with nerdctl): alias docker="nerdctl".
Since Docker images are, more precisely, container images in the OCI image format, you will have no issues running them with containerd or CRI-O.
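For example, assuming nerdctl is installed on the agent image, an existing checkov step can keep its docker-style invocation unchanged (the mount path is just illustrative):

```shell
# Sketch: make docker-style commands work on a containerd node.
alias docker="nerdctl"

# Existing pipeline steps that shell out to "docker" keep working,
# e.g. running checkov from its container image against the checkout:
docker pull bridgecrew/checkov:latest
docker run --rm -v "$(pwd):/src" bridgecrew/checkov -d /src
```

Note that nerdctl still needs access to the containerd socket, so on a hardened cluster this may require a privileged pod or a dedicated build node.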

Where do I view my CronJob in OpenShift Container Platform?

This is a really basic question that I just can't seem to find an answer to ANYWHERE.
I need to create a CronJob on OpenShift Container Platform.
I wasn't able to find a page on the Container Platform on how to directly create a CronJob.
But I did manage to find instructions on creating one by pasting the Job YAML file via the Add to Application button:
https://docs.openshift.com/container-platform/4.1/nodes/jobs/nodes-nodes-jobs.html
Now, having (I think) created a CronJob, how do I even find/modify/delete it on the Container Platform?
You can find cron jobs in the Cluster Console/Workloads/Cron Jobs.
In OpenShift version 3.x, you need to be under the "Administrator" profile, then click Workloads/Cronjobs.
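If you prefer the CLI, the same objects can be listed and managed with oc, which mirrors kubectl. The name my-cronjob and the namespace my-project below are placeholders:

```shell
# Sketch: find, inspect, modify, and delete a CronJob via the oc CLI.
# "my-cronjob" and "my-project" are placeholder names.
oc get cronjobs -n my-project              # find it
oc describe cronjob my-cronjob -n my-project
oc edit cronjob my-cronjob -n my-project   # modify it
oc delete cronjob my-cronjob -n my-project # delete it
```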

Does Kubernetes support downloading of resources like Mesosphere?

We know that Mesosphere provides the Mesosphere fetcher in DC/OS to download resources into the sandbox directory. Does Kubernetes provide anything similar?
While Kubernetes does not have a feature like Mesosphere Fetcher, it is still possible to copy / download resources into a Docker container using the following ways:
Docker's COPY and ADD copy resources from the host into the container.
Docker's ADD also supports remote URLs (stored as-is) and automatic extraction of local tar archives.
Download / extract resources inside the container using commands like:
wget
curl
lynx
tar
gunzip
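The build-time approach above can be sketched in a Dockerfile; the base image, paths, and URL are placeholders:

```dockerfile
# Sketch: baking resources into the image at build time.
FROM ubuntu:20.04

# COPY moves files from the build context into the image.
COPY ./config/app.conf /etc/app/app.conf

# ADD can fetch a remote URL (the file is stored as-is, not extracted)...
ADD https://example.com/data/dataset.tar.gz /opt/data/dataset.tar.gz

# ...and auto-extracts a local tar archive from the build context.
ADD ./bundle.tar.gz /opt/bundle/
```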
No. Kubernetes does not have any built-in feature to download and inject files into a container the way Mesos does.
The Mesos fetcher feature existed before Docker image support, in fact. Prior to images, the fetcher was the primary way to download the executable and any supporting files. Kubernetes never needed that feature because it requires a container image. That said, having both be optional can be useful.
The Mesos fetcher is supported by Mesos, Marathon, and Mesosphere DC/OS.
Kubernetes could hypothetically add support for arbitrary file fetching in the future, but there hasn’t been a lot of demand, and it would probably require either container dependencies within a pod (to use a controller-injected sidecar), a kubelet plugin (to download before container start), or a native fetcher-like feature.
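The usual way to approximate a fetcher in Kubernetes today is an init container that downloads into a shared emptyDir volume before the main container starts. A minimal sketch, where the image, URL, and paths are placeholders:

```yaml
# Sketch: fetcher-like pattern with an init container and a shared volume.
apiVersion: v1
kind: Pod
metadata:
  name: fetcher-demo
spec:
  volumes:
    - name: assets
      emptyDir: {}          # scratch volume shared by both containers
  initContainers:
    - name: fetch
      image: busybox:1.36
      # Download and unpack into the shared volume before the app starts.
      command:
        - sh
        - -c
        - >
          wget -O /assets/data.tar.gz https://example.com/data.tar.gz &&
          tar -xzf /assets/data.tar.gz -C /assets
      volumeMounts:
        - name: assets
          mountPath: /assets
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "ls /assets && sleep 3600"]
      volumeMounts:
        - name: assets
          mountPath: /assets
```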

PostgreSQL installation dockerfile

I was trying to create a Docker image of a PostgreSQL installation from source; however, I got some errors that I do not know how to fix. I was also wondering whether there is an alternative way to build this image without using the standard postgres Docker image?
I would be grateful for any resources and help!
Regarding the errors that you are facing, I have shared the answer here.
As you have asked about Docker images: open source Docker images are stored on https://hub.docker.com, which is a central repository for public images. You can search for any Docker image you are looking for, and even look at the Dockerfiles (available for most images) to get an idea of how the application is built.
You can try the official postgres Docker image, which is available here: https://hub.docker.com/_/postgres/
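As a sketch of using the official image instead of building from source (the container name, password, and tag below are just examples):

```shell
# Sketch: run the official postgres image instead of compiling from source.
# Container name, password, and tag are examples.
docker run -d --name my-postgres \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -p 5432:5432 \
    postgres:14

# Connect with psql inside the running container:
docker exec -it my-postgres psql -U postgres
```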

Creating custom boilerplates in Bluemix

I created an application on Bluemix, and I want to create a private boilerplate from it in order to deploy it automatically, when required, through a web interface. Is there any way to create that boilerplate?
Boilerplates are not publicly documented, so it is not possible to create your own in the catalog.
But you can check "Deploy to Bluemix Button" which perhaps covers your requirement of being able to deploy an app and its runtime and required services.
I think that IBM Containers can be used to achieve that goal. A container is basically an application with all of its dependencies, packaged in a portable, platform-independent module. The structure of a container is described by an image, and from a single image you can instantiate as many containers as you want.
So, if you create an image composed of your application and its dependencies and push it to Bluemix, you can automatically instantiate and deploy new containers (with your application up and running inside) when required, through a web interface, as you requested.
IBM Containers are based on Docker containers, which wrap up a piece of software in a complete filesystem that contains everything it needs to run. This guarantees that it will always run the same, regardless of the environment it is running in.
Please refer to IBM Containers Docs to understand how to use Docker on Bluemix and Docker training to learn the basics.
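As an illustrative sketch of that workflow using the cf CLI with the IBM Containers plugin: the namespace, image name, and API endpoint below are placeholders, and the exact commands may differ between plugin versions:

```shell
# Sketch: build and run a container image on IBM Containers (Bluemix).
# Namespace, image name, and endpoint are placeholders.
cf login -a https://api.ng.bluemix.net
cf ic init                               # set up the IBM Containers plugin
cf ic build -t registry.ng.bluemix.net/my_namespace/myapp:latest .
cf ic run --name myapp registry.ng.bluemix.net/my_namespace/myapp:latest
```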