Unable to create Dataproc cluster using custom image - gcloud

I am able to create a google dataproc cluster from the command line using a custom image:
gcloud beta dataproc clusters create cluster-name --image=custom-image-name
as specified in https://cloud.google.com/dataproc/docs/guides/dataproc-images, but I am unable to find information about how to do the same using the v1beta2 REST api in order to create a cluster from within airflow. Any help would be greatly appreciated.

Since custom images can theoretically reside in a different project if you grant read/use access of that custom image to whatever project service account you use for the Dataproc cluster, images currently always need a full URI, not just a short name.
When you use gcloud, there's syntactic sugar where gcloud will resolve the full URI automatically; you can see this in action if you use --log-http with your gcloud command:
gcloud beta dataproc clusters create foo --image=custom-image-name --log-http
If you created one with gcloud you can also gcloud dataproc clusters describe your cluster to see the fully-resolved custom image URI.

Related

How to easily switch gcloud / kubectl credentials

At work we use Kubernetes hosted in GCP. I also have a side project hosted in my personal GCP account using Google App Engine (deploy using gcloud app deploy).
Often when I try to run a command such as kubectl logs -f service-name, I get an error like "Error from server (Forbidden): pods is forbidden: User "my_personal_email#gmail.com" cannot list resource "pods" in API group "" in the namespace "WORK_NAMESPACE": Required "container.pods.list" permission." and then I have to fight with kubectl for hours trying to get it to work.
Can somebody please break it down for a slow person like me, how gcloud and kubectl work together, and how I can easily switch accounts so I can use gcloud commands for my personal projects and kubectl commands for my work projects? I'm happy to nuke my whole config and start from scratch if that's what it takes. I've found various kubectl and gcloud documentation but it doesn't make much sense or talks in circles.
Edit: this is on Linux.
Had the same problem and doing all of the:
gcloud auth login
gcloud auth list
gcloud config set account
gcloud projects list
didn't help. I knew gcloud switched fine as I was able to list other resources with it directly.
But it seems kubectl can't pick those changes up automatically, as kubectl/gcloud integration relies on the pre-generated key, which has a 1h expiration(not sure if it's a default but it's what it is on my machine right now).
So, on top of setting right user/project/account with gcloud, you should re-generate the creds:
gcloud container clusters get-credentials <my-cluster> --zone <clusters-zone>
I'm in the same boat as you - apps deployed in GKE for work and personal projects deployed in my personal GCP account.
gcloud stores a list of logged in accounts that you can switch between to communicate with associated projects. Take a look at these commands:
gcloud auth login
gcloud auth list
gcloud config set account
gcloud projects list
To work with a specific project under one of your accounts you need to set that configuration via gcloud config set project PROJECT_ID
kubectl has a list of "contexts" on your local machine in ~/.kube/config. Your current context is the cluster you want to run commands against - similar to the active account/project in gcloud.
Unlike gcloud these are cluster specific and store info on cluster endpoint, default namespaces, the current context, etc. You can have contexts from GCP, AWS, on-prem...anywhere you have a cluster. We have different clusters for dev, qa, and prod (thus different contexts) and switch between them a ton. Take a look at the [kubectx project][1] https://github.com/ahmetb/kubectx for an easier way to switch between contexts and namespaces.
kubectl will use the keys from whatever GCP account you are logged in with against the cluster that is set as your current context. i.e., from your error above, if your active account for gcloud is your personal but try to list pods from a cluster at work you will get an error. You either need to set the active account/project for gcloud to your work email or change the kubectl context to a cluster that is hosted in your personal GCP account/project.
For me updating the ~/.kube/config and setting the expiry to a date in past fixes it
TL;DR
Use gcloud config configurations to manage your separate profiles with Google Cloud Platform.
Add an explicit configuration argument to the cmd-args of your kubeconfig's user to prevent gcloud from producing an access token for an unrelated profile.
users:
- user:
auth-provider:
config:
cmd-args: config --configuration=work config-helper --format=json
Can somebody please break it down for a slow person like me, how gcloud and kubectl work together, and how I can easily switch accounts so I can use gcloud commands for my personal projects and kubectl commands for my work projects?
Sure! By following Google's suggested instructions that lead to running gcloud container clusters get-credentials ... when configuring a kubernetes cluster, you will end up with a section of your kubeconfig that contains information on what kubectl should do to acquire an access token when communicating with a cluster that is configured with a given user. That will look something like this:
users:
- name: gke_project-name_cluster-zone_cluster-name
user:
auth-provider:
config:
access-token: &Redacted
cmd-args: config config-helper --format=json
cmd-path: /path/to/google-cloud-sdk/bin/gcloud
expiry: "2022-12-25T01:02:03Z"
expiry-key: '{.credential.token_expiry}'
token-key: '{.credential.access_token}'
name: gcp
Basically, this tells kubectl to run gcloud config config-helper --format=json when it needs a new token, and to parse the access_token using the json-path .credential.access_token in the response from that command. This is the crux in understanding how kubectl communicates with gcloud.
Like you, I use google cloud both personally and at work. The issue is that this user configuration block does not take into account the fact that it shouldn't use the currently active gcloud account when generating a credential. Even if you don't use kubernetes in either one of your two projects, extensions in vscode for example might try to run a kubectl command when you're working on something in a different project. If this were to happen after your current token is expired, gcloud config config-helper might get invoked to generate a token using a personal account.
To prevent this from happening, I suggest using gcloud config configuations. Configurations are global configuration profiles that you can quickly switch between. For example, I have two configurations that look like:
> gcloud config configurations list
NAME IS_ACTIVE ACCOUNT PROJECT COMPUTE_DEFAULT_ZONE COMPUTE_DEFAULT_REGION
work False zev#work.email work-project us-west1-a us-west1
personal True zev#personal.email personal-project northamerica-northeast1-a northamerica-northeast1
With configurations you can alter your kubeconfig to specify which configuration to always use when creating an access token for a given kubernetes user by altering the kubeconfig user's auth-provider.config.cmd-args to include one of your gcloud configurations. With a value like config --configuration=work config-helper --format=json, whenever kubectl needs a new access token, it will use the account from your work configuration regardless of which account is currently active with the gcloud tool.

Automatically get cluster credentials when activating gcloud configuration

Working in GCP with several kubernetes clusters, I would like to automatically get cluster credentials when switching gcloud configurations.
I have created several configurations for gcloud with gcloud config configurations create [config-name] and I have set what I need, specifically gcloud config set container/cluster [cluster-name].
When I switch configurations with gcloud config configurations activate [config-name], everything goes ok, except I do not get the credentials for the cluster I have configured for that configuration. Instead I need to run gcloud container clusters get-credentials [cluster-name].
Is there any way to automatically get credentials for a cluster when activating a gcloud configuration?
I think not.
gcloud and kubectl are distinct tools and each maintains its own configuration.
gcloud container custers get-credentials is a bridging helper that configures kubectl configuration (conventionally located in ~/.kube/config file) with a gcloud auth helper to facilitate accessing Kubernetes Engine clusters. But, otherwise the 2 tools are unrelated.
Have a look at this post I wrote that covers using different configurations with kubectl. It's not exactly what you want but I hope it will be useful:
https://medium.com/google-cloud/context-light-gcloud-and-kubectl-89185d38ce82

"permanent" GKE kubectl service account authentication

I deploy apps to Kubernetes running on Google Cloud from CI. CI makes use of kubectl config which contains auth information (either in directly CVS or templated from the env vars during build)
CI has seperate Google Cloud service account and I generate kubectl config via
gcloud auth activate-service-account --key-file=key-file.json
and
gcloud container clusters get-credentials <cluster-name>
This sets the kubectl config but the token expires in few hours.
What are my options of having 'permanent' kubectl config other than providing CI with key file during the build and running gcloud container clusters get-credentials ?
You should look into RBAC (role based access control) which will authenticate the role avoiding expiration in contrast to certificates which currently expires as mentioned.
For those asking the same question and upvoting.
This is my current sollution:
For some time I treated key-file.json as an identity token, put it to the CI config and used it within container with gcloud CLI installed. I used the key file/token to log in to GCP and let gcloud generate kubectl config - the same approach used for GCP container registry login.
This works fine but using kubectl in CI is kind of antipattern. I switched to deploying based on container registry push events. This is relatively easy to do in k8s with keel flux, etc. So CI has only to push Docker image to the repo and its job ends there. The rest is taken care of within k8s itself so there is no need for kubectl and it's config in the CI jobs.

How to acces Google kubernetes cluster without googlecloud SDK?

I'm having trouble figuring out how I can set my kubectl context to connect to a googlecloud cluster without using the gcloud sdk. (to run in a controlled CI environment)
I created a service account in googlecloud
Generated a secret from that (json format)
From there, how do I configure kubectl context to be able to interact with the cluster ?
Right in the Cloud Console you can find the connect link
gcloud container clusters get-credentials "your-cluster-name" --zone "desired-zone" --project "your_project"
But before this you should configure gcloud tool.

Access Kubernetes GKE cluster outside of GKE cluster with client-go?

I have multiple kubernetes clusters running on GKE (let's say clusterA and clusterB)
I want to access both of those clusters from client-go in an app that is running in one of those clusters (e.g. access clusterB from an app that is running on clusterA)
I general for authenticating with kubernetes clusters from client-go I see that I have two options:
InCluster config
or from kube config file
So it is easy to access clusterA from clusterA but not clusterB from clusterA.
What are my options here? It seems that I just cannot pass GOOGLE_APPLICATION_CREDENTIALS and hope that client-go will take care of itself.
So my thinking:
create a dedicated IAM service account
create kube config with tokens for both clusters by doing gcloud container clusters get-credentials clusterA and gcloud container clusters get-credentials clusterB
use that kube config file in client-go via BuildConfigFromFlags on clusterA
Is this the correct approach, or is there a simpler way? I see that tokens have an expiration date?
Update:
It seems I can also use CLOUDSDK_CONTAINER_USE_CLIENT_CERTIFICATE=True gcloud beta container clusters get-credentials clusterB --zone. Which would add certificates to kube conf which I could use. But AFAIK those certificates cannot be revoked
client-go needs to know about:
cluster master’s IP address
cluster’s CA certificate
(If you're using GKE, you can see these info in $HOME/.kube/config, populated by gcloud container clusters get-credentials command).
I recommend you to either:
Have a kubeconfig file that contains these info for clusters A & B
Use GKE API to retrieve these info for clusters A & B (example here) (You'll need a service account to do this, explained below.)
Once you can create a *rest.Config object in client-go, client-go will use the auth plugin that's specified in the kubeconfig file (or its in-memory equivalent you constructed). In gcp auth plugin, it knows how to retrieve a token.
Then, Create a Cloud IAM Service Account and give it "Container Developer" role. Download its key.
Now, you have two options:
Option 1: your program uses gcloud
gcloud auth activate-service-account --key-file=key.json
KUBECONFIG=a.yaml gcloud container clusters get-credentials clusterA
KUBECONFIG=b.yaml gcloud container clusters get-credentials clusterB
Then create 2 different *rest.Client objects, one created from a.yaml, another from b.yaml in your program.
Now your program will rely on gcloud binary to retrieve token every time your token expires (every 1 hour).
Option 2: use GOOGLE_APPLICATION_CREDENTIALS
Don't install gcloud to your program’s environment.
Set your key.json to GOOGLE_APPLICATION_CREDENTIALS environment
variable for your program.
Figure out a way to get cluster IP/CA (explained above) so you can
construct two different *rest.Config objects for cluster A & B.
Now your program will use the specified key file to get an access_token
to Google API every time it expires (every 1h).
Hope this helps.
P.S. do not forget to import _ "k8s.io/client-go/plugin/pkg/client/auth/gcp" in your Go program. This loads the gcp auth plugin!