Failed to stop job or delete job in Dataproc on Google Cloud Platform

When I try to delete a Dataproc cluster in Google Cloud Platform, I get the error below:
Failed to stop job b021d29d-acc9-409d-8fca-52363076a63c Cluster not found
Could anyone help?

I'm guessing you are trying to delete the cluster via the Dataproc Clusters UI. In that case the problem could be a bug in the UI itself, which always sets the cluster region argument to 'global'. If your cluster's region is not 'global', you'll get the 'Cluster not found' error.
The solution is to use the gcloud CLI and pass the cluster's region explicitly:
gcloud dataproc clusters delete NAME [--async] [--region=REGION] [GCLOUD_WIDE_FLAG …]
ref: https://cloud.google.com/sdk/gcloud/reference/dataproc/clusters/delete
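As a rough sketch of that workaround, the helper below wraps the delete call so the region is always passed explicitly. The function name delete_dataproc_cluster and the example cluster and region values are made up for illustration; --quiet is the general gcloud flag that skips the confirmation prompt.

```shell
# Hypothetical helper: delete a Dataproc cluster in an explicit region.
# Usage: delete_dataproc_cluster CLUSTER_NAME REGION
delete_dataproc_cluster() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: delete_dataproc_cluster CLUSTER_NAME REGION" >&2
    return 2
  fi
  # Passing --region avoids relying on a default that may not match the cluster.
  gcloud dataproc clusters delete "$1" --region="$2" --quiet
}

# Example (assumed names):
#   gcloud dataproc clusters list --region=us-central1   # confirm where the cluster lives
#   delete_dataproc_cluster my-cluster us-central1
```

Listing the clusters in the region first is a cheap way to confirm that the name and region actually match before deleting anything.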

GKE-kubevirt virtualmachineinstances-mutator.kubevirt.io issue

I'm trying to use KubeVirt in a Google Cloud Platform GKE cluster.
I learned that GKE doesn't support nested virtualisation, so I tried the steps below (from "Is there a way to enable nested virtualization in GKE cluster node?") to install KubeVirt on GKE:
1. Start a GKE cluster with ubuntu/containerd, n1-standard nodes and a minimum CPU platform of Haswell.
2. Find the template used for your new cluster, then determine the proper source image:
gcloud compute instance-templates describe --format=json | jq ".properties.disks[0].initializeParams.sourceImage"
3. Create a copy of the source disk with nested virtualization enabled:
gcloud compute images --project $PROJECT create $NEW_IMAGE_NAME --source-image $SOURCE_IMAGE --source-image-project=$SOURCE_PROJECT --licenses "https://www.googleapis.com/compute/v1/projects/vm-options/global/licenses/enable-vmx"
4. Use "Create Similar" on the template for your GKE cluster. Change the boot disk to $NEW_IMAGE_NAME. You will also need to drill down to networking/alias and change the default subnet to your pod network.
5. Trigger a rolling update on the group for your GKE nodes to move them to the new template.
I'm able to install KubeVirt and virtctl, but when I try to launch a basic VM using
kubectl apply -f https://kubevirt.io/labs/manifests/vm.yaml
I get the error below:
Error from server (InternalError): error when creating "https://kubevirt.io/labs/manifests/vm.yaml": Internal error occurred: failed calling webhook "virtualmachines-mutator.kubevirt.io": failed to call webhook: Post "https://virt-api.kubevirt.svc:443/virtualmachines-mutate?timeout=10s": context deadline exceeded
Is there any way to debug this error?
How can I make KubeVirt work on GKE?
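A few standard kubectl checks can help narrow down why the API server cannot reach the webhook. This is only a sketch: the kubevirt namespace and the virt-api service name are taken from the URL in the error message, the deployment name virt-api is an assumption about a default KubeVirt install, and the function name is made up for illustration.

```shell
# Hypothetical helper: collect basic facts about the virt-api webhook backend.
# Usage: debug_virt_api_webhook [NAMESPACE]   (namespace defaults to kubevirt)
debug_virt_api_webhook() {
  ns="${1:-kubevirt}"
  # Are the KubeVirt pods (virt-api in particular) Running and Ready?
  kubectl get pods -n "$ns" -o wide
  # Does the virt-api service have ready endpoints behind it?
  kubectl get endpoints virt-api -n "$ns"
  # Recent virt-api logs (assumes the deployment is named virt-api).
  kubectl logs deployment/virt-api -n "$ns" --tail=50
}
```

If the pods are healthy and the service has endpoints, the timeout may point at network rules between the GKE control plane and the nodes, but that is something to verify rather than a confirmed cause.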

How to check if AWS EKS cluster is active to be able to start a nodegroup?

I started an AWS EKS cluster using AWS CloudFormation. I then waited for the stack creation to complete through the following command:
aws cloudformation wait stack-create-complete --stack-name "myclusterstack"
I also waited for the EKS cluster to be active through the following command:
aws eks wait cluster-active --name "mycluster"
After these commands complete, I started up an AWS EKS Nodegroup using AWS CloudFormation. However, I get the following error in the CloudFormation stack events:
Resource handler returned message: "Cluster 'mycluster' is not in ACTIVE status (Service: Eks, Status Code: 400, Request ID: a9aa2e4e-1b17-4fd4-bd89-851037867b25)" (RequestToken: 410bd19e-7710-b637-a30c-889f5b5a8893, HandlerErrorCode: InvalidRequest)
I noticed, however, that if I wait for around 5 minutes after creation of the cluster, I don't get this error when I create the nodegroup.
It seems like a few more minutes are needed after aws eks wait cluster-active returns before a nodegroup can be launched successfully.
I also tried checking if the cluster endpoint is available as a means to check if the cluster is already active:
aws eks describe-cluster --name "mycluster" --query 'cluster.endpoint'
However, creation of the nodegroup still fails even when the cluster endpoint is already available.
Is sleeping for a few minutes the only way to be able to spin up a nodegroup after cluster creation? What should be the proper way to wait for a cluster to be active to be able to launch a nodegroup?
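As a sketch of an alternative to a fixed sleep (not a confirmed fix), the loop below polls cluster.status with describe-cluster and only returns once ACTIVE has been seen on several consecutive polls. The function name, the counts, and the interval are illustrative assumptions.

```shell
# Hypothetical helper: succeed only after the cluster reports ACTIVE on
# several consecutive polls.
# Usage: wait_cluster_stable CLUSTER_NAME REQUIRED_CONSECUTIVE MAX_POLLS INTERVAL_SECONDS
wait_cluster_stable() {
  seen=0
  polls=0
  while [ "$polls" -lt "$3" ]; do
    # Treat a failed describe call as a non-ACTIVE poll.
    status=$(aws eks describe-cluster --name "$1" --query 'cluster.status' --output text 2>/dev/null) || status="UNKNOWN"
    if [ "$status" = "ACTIVE" ]; then
      seen=$((seen + 1))
      if [ "$seen" -ge "$2" ]; then
        return 0
      fi
    else
      seen=0   # any other status resets the streak
    fi
    polls=$((polls + 1))
    sleep "$4"
  done
  return 1
}

# Example (assumed values): require 3 ACTIVE polls in a row, 30 seconds apart.
#   wait_cluster_stable mycluster 3 40 30 && echo "cluster looks stable"
```

Whether a stable ACTIVE status is enough to avoid the nodegroup error is untested here. The loop only makes the wait conditional instead of a hard-coded delay.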

Unable to create Dataproc cluster using custom image

I am able to create a Google Dataproc cluster from the command line using a custom image:
gcloud beta dataproc clusters create cluster-name --image=custom-image-name
as specified in https://cloud.google.com/dataproc/docs/guides/dataproc-images, but I am unable to find information about how to do the same using the v1beta2 REST API in order to create a cluster from within Airflow. Any help would be greatly appreciated.
Since custom images can theoretically reside in a different project if you grant read/use access of that custom image to whatever project service account you use for the Dataproc cluster, images currently always need a full URI, not just a short name.
When you use gcloud, there's syntactic sugar where gcloud will resolve the full URI automatically; you can see this in action if you use --log-http with your gcloud command:
gcloud beta dataproc clusters create foo --image=custom-image-name --log-http
If you created a cluster with gcloud, you can also run gcloud dataproc clusters describe on it to see the fully resolved custom image URI.
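For the REST call, the same idea can be sketched as follows: build the full image URI yourself and put it in the imageUri field of the master and worker instance group configs. The helper names, project, cluster, and image names are placeholders, and the field layout is my reading of the Dataproc cluster resource, so confirm it against the API version you call.

```shell
# Hypothetical helper: expand a short image name into a full Compute Engine URI.
# Usage: custom_image_uri IMAGE_PROJECT IMAGE_NAME
custom_image_uri() {
  printf 'https://www.googleapis.com/compute/v1/projects/%s/global/images/%s\n' "$1" "$2"
}

# Hypothetical helper: print a minimal cluster-create request body that uses
# the custom image for both master and worker nodes.
# Usage: cluster_request_body PROJECT_ID CLUSTER_NAME IMAGE_PROJECT IMAGE_NAME
cluster_request_body() {
  uri=$(custom_image_uri "$3" "$4")
  printf '{"projectId": "%s", "clusterName": "%s", "config": {"masterConfig": {"imageUri": "%s"}, "workerConfig": {"imageUri": "%s"}}}\n' \
    "$1" "$2" "$uri" "$uri"
}

# Example (assumed names):
#   cluster_request_body my-project foo my-project custom-image-name
# To compare with what gcloud resolved for an existing cluster:
#   gcloud dataproc clusters describe foo --format='value(config.masterConfig.imageUri)'
```

Comparing the generated URI with the one shown by clusters describe is a quick sanity check before wiring the body into Airflow.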

TLS handshake timeout with kubernetes in GKE

I've created a cluster on Google Kubernetes Engine (previously Google Container Engine) and installed the Google Cloud SDK and the Kubernetes tools with it on my Windows machine.
It worked well for some time, and, out of nowhere, it stopped working. Every command I'm issuing with kubectl provokes the following:
Unable to connect to the server: net/http: TLS handshake timeout
I've searched Google, the Kubernetes Github Issues, Stack Overflow, Server Fault ... without success.
I've tried the following:
Restart my computer
Change wifi connection
Check that I'm not somehow using a proxy
Delete and re-create my cluster
Uninstall the Google Cloud SDK (and kubectl) from my machine and re-install them
Delete my .kube folder (config and cache)
Check my .kube/config
Change my cluster's version (tried 1.8.3-gke.0 and 1.7.8-gke.0)
Retry several hours later
Try both PowerShell and cmd.exe
Note that the cluster seems to work perfectly, since I have my application running on it and can interact with it normally through the Google Cloud Shell.
Running:
gcloud container clusters get-credentials cluster-2 --zone europe-west1-b --project ___
kubectl get pods
works on Google Cloud Shell and provokes the TLS handshake timeout on my machine.
For others seeing this issue, there is another cause to consider.
After doing:
gcloud config set project $PROJECT_NAME
gcloud config set container/cluster $CLUSTER_NAME
gcloud config set compute/region europe-west2
gcloud beta container clusters get-credentials $CLUSTER_NAME --region europe-west2 --project $PROJECT_NAME
I was then seeing:
kubectl cluster-info
Unable to connect to the server: net/http: TLS handshake timeout
I tried everything suggested here and elsewhere. When the same commands worked without issue from my home desktop, I discovered that the shared workspace Wi-Fi was disrupting TLS/VPN traffic in order to control internet access!
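To tell a local network problem apart from a cluster problem, one option is to attempt a plain HTTPS request to the cluster endpoint with a short timeout and run it from two different networks. This is a sketch under assumptions: the function name and the default timeout are made up, and the endpoint is whatever gcloud container clusters describe reports for your cluster.

```shell
# Hypothetical helper: report whether a TLS connection to the API server completes.
# Usage: check_api_tls ENDPOINT_IP [TIMEOUT_SECONDS]
check_api_tls() {
  endpoint="$1"
  timeout="${2:-10}"
  # -k skips certificate verification; only the handshake matters here.
  # Without -f, curl exits 0 on any HTTP response (even 401/403), which
  # still proves that the handshake itself went through.
  if curl -k -s -o /dev/null --max-time "$timeout" "https://$endpoint/version"; then
    echo "handshake ok"
  else
    echo "handshake failed"
    return 1
  fi
}

# Example (cluster name and zone taken from the question above):
#   endpoint=$(gcloud container clusters describe cluster-2 --zone europe-west1-b --format='value(endpoint)')
#   check_api_tls "$endpoint"
```

If the check passes on one network and fails on another, the network in between is the likely culprit rather than the cluster or the local kubectl setup.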
This is what I did to solve the above problem.
I simply ran the following commands:
gcloud container clusters get-credentials {cluster_name} --zone {zone_name} --project {project_name}
gcloud auth application-default login
Replace the placeholders appropriately.
This MAY NOT work for you on GKE, but Azure AKS (managed Kubernetes) has a similar problem with the same error message, so it might be helpful to someone.
The solution for me was to scale the nodes in my cluster from the Azure Kubernetes Service blade in the web console.
Workaround / Solution
1. Log in to the Azure (or GKE) console and open the Kubernetes Service UI.
2. Scale your cluster up by 1 node.
3. Wait for the scaling to complete, then attempt to connect (you should be able to).
4. Scale your cluster back down to the normal size to avoid cost increases.
Total time: about 2 minutes.
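The same scale-up and scale-down can be sketched from the command line instead of the console. The function name and the resource group, cluster, and node count values are placeholders, and this assumes a cluster with a single node pool (otherwise az aks scale also needs --nodepool-name).

```shell
# Hypothetical helper: scale an AKS cluster up by one node, check connectivity,
# then scale back to the normal size.
# Usage: bounce_aks_nodes RESOURCE_GROUP CLUSTER_NAME NORMAL_NODE_COUNT
bounce_aks_nodes() {
  az aks scale --resource-group "$1" --name "$2" --node-count $(( $3 + 1 )) || return 1
  # Confirm kubectl can reach the API server before scaling back down.
  kubectl get nodes || return 1
  az aks scale --resource-group "$1" --name "$2" --node-count "$3"
}

# Example (assumed names):
#   bounce_aks_nodes my-resource-group my-aks-cluster 3
# Rough GKE equivalent of a single resize step (untested for this issue):
#   gcloud container clusters resize my-cluster --num-nodes 4 --zone europe-west1-b
```

The helper stops before scaling back down if kubectl still cannot connect, so the extra node stays in place while you investigate.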
More Background Info on the Issue
I added this to the full write-up that I posted here (have a read if you want more info):
'Unable to connect Net/http: TLS handshake timeout' — Why can't Kubectl connect to Azure AKS server?

Kubernetes unable to pull images from gcr.io

I am trying to setup Kubernetes for the first time. I am following the Fedora Manual installation guide: http://kubernetes.io/v1.0/docs/getting-started-guides/fedora/fedora_manual_config.html
I am trying to get the Kubernetes addons running, specifically kube-ui. I created the service and replication controller like so:
kubectl create -f cluster/addons/kube-ui/kube-ui-rc.yaml --namespace=kube-system
kubectl create -f cluster/addons/kube-ui/kube-ui-svc.yaml --namespace=kube-system
When I run
kubectl get events --namespace=kube-system
I see errors such as this:
Failed to pull image "gcr.io/google_containers/pause:0.8.0": image pull failed for gcr.io/google_containers/pause:0.8.0, this may be because there are no credentials on this request. details: (Authentication is required.)
How am I supposed to tell Kubernetes to authenticate? This isn't covered in the documentation. So how do I fix this?
This happened due to a recent outage in GCE storage, as a result of which all of us hit this error while pulling images from GCR (which uses GCE storage on the backend).
Are you still seeing this error?
As the message says, you need credentials. Are you using Google Container Engine? Then you need to run
gcloud config set project <your-project>
gcloud config set compute/zone <your-zone, like us-central1-f>
gcloud beta container clusters get-credentials --cluster <your-cluster-name>
Then your GCE cluster will have the credentials.