Google Dataproc Agent reports failure when using initialization script - google-cloud-dataproc

I am trying to set up a cluster with an initialization script, but I get the following error:
[BAD JSON: JSON Parse error: Unexpected identifier "Google"]
In the log folder the init script output log is absent.
This seems rather strange, as it worked last week, and the error message does not seem related to the init script but rather to the input arguments for cluster creation. I used the following command:
gcloud beta dataproc clusters create <clustername> --bucket <bucket> --zone <zone> --master-machine-type n1-standard-1 --master-boot-disk-size 10 --num-workers 2 --worker-machine-type n1-standard-1 --worker-boot-disk-size 10 --project <projectname> --initialization-actions <gcs-uri of script>

Apparently changing
#!/bin/sh
to
#!/bin/bash
and removing all "sudo" occurrences did the trick.
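For context on why the shebang change helps: on Debian-based Dataproc images, /bin/sh is dash, which rejects bashisms such as arrays and the [[ ]] test. A minimal illustration (assuming a system where sh is dash rather than bash):

```shell
# Bashisms like arrays and [[ ]] run fine under bash:
bash -c 'langs=(sh bash); [[ ${langs[1]} == bash ]] && echo "bash ok"'

# Under dash (/bin/sh on Debian-based images) the same line is a syntax
# error, so an init script using such constructs needs #!/bin/bash.
```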

This particular error occurs most often when the initialization script is in a Cloud Storage (GCS) bucket to which the project running the cluster does not have access.
I would recommend double-checking that the project being used for the cluster has read access to the bucket.

Related

Error executing access token command "/google/google-cloud-sdk/bin/gcloud config-helper --format=json

I'm trying to follow this guide step by step to deploy Airflow on Kubernetes (https://github.com/EamonKeane/airflow-GKE-k8sExecutor-helm), but I run into problems at this part of the execution:
My research on the topic has not turned up anything that solves the problem so far; does anyone have any suggestions of what to do?
SQL_ALCHEMY_CONN=postgresql+psycopg2://$AIRFLOW_DB_USER:$AIRFLOW_DB_USER_PASSWORD@$KUBERNETES_POSTGRES_CLOUDSQLPROXY_SERVICE:$KUBERNETES_POSTGRES_CLOUDSQLPROXY_PORT/$AIRFLOW_DB_NAME
echo $SQL_ALCHEMY_CONN > /secrets/airflow/sql_alchemy_conn
# Create the fernet key which is needed to decrypt the database
FERNET_KEY=$(dd if=/dev/urandom bs=32 count=1 2>/dev/null | openssl base64)
echo $FERNET_KEY > /secrets/airflow/fernet-key
kubectl create secret generic airflow \
--from-file=fernet-key=/secrets/airflow/fernet-key \
--from-file=sql_alchemy_conn=/secrets/airflow/sql_alchemy_conn
Unable to connect to the server: error executing access token command
"/google/google-cloud-sdk/bin/gcloud config config-helper
--format=json": err=exit status 1 output= stderr=ERROR: gcloud crashed (BadStatusLine): '' If you would like to report this issue, please run
the following command: gcloud feedback To check gcloud for common
problems, please run the following command: gcloud info
--run-diagnostics
I solved this by creating a new cloud shell tab to connect to the cluster:
gcloud container clusters get-credentials testcluster1 --zone=your_zone
Example:
get the name and location of your cluster
gcloud container clusters list
then
gcloud container clusters get-credentials demo --zone=us-west1-a

How to create an SSH tunnel in gcloud, but keep getting API error

I am trying to set up datalab from my chrome book using the following tutorial https://cloud.google.com/dataproc/docs/tutorials/dataproc-datalab. However when trying to set up an SSH tunnel using the following guidelines https://cloud.google.com/dataproc/docs/concepts/accessing/cluster-web-interfaces#create_an_ssh_tunnel I keep on receiving the following error.
ERROR: (gcloud.compute.ssh) Could not fetch resource:
- Project 57800607318 is not found and cannot be used for API calls. If it is recently created, enable Compute Engine API by visiting https://console.developers.google
.com/apis/api/compute.googleapis.com/overview?project=57800607318 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our sy
stems and retry.
The error message would lead me to believe my "Compute Engine API" is not enabled. However, I have double checked and "Compute Engine API" is enabled.
Here is what I am entering into the cloud shell
gcloud compute ssh ${test-cluster-m} \
--project=${datalab-test-229519} --zone=${us-west1-b} -- \
-4 -N -L ${8080}:${test-cluster-m}:${8080}
The ${} syntax accesses local environment variables. You set them in the previous step with:
export PROJECT=project;export HOSTNAME=hostname;export ZONE=zone;PORT=number
In this case it would be:
export PROJECT=datalab-test-229519;export HOSTNAME=test-cluster-m;export ZONE=us-west1-b;PORT=8080
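One subtlety worth noting (my reading of why the original command failed): hyphens are not valid in shell variable names, so bash parses ${test-cluster-m} as the ${var-default} fallback form rather than as a variable named test-cluster-m:

```shell
unset test
echo "${test-cluster-m}"    # bash expands $test, falling back to the literal
                            # "cluster-m" when test is unset: prints cluster-m

HOSTNAME=test-cluster-m     # a valid variable name holding the hostname
echo "${HOSTNAME}"          # prints test-cluster-m, the intended value
```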
Either try this:
gcloud compute ssh test-cluster-m \
--project datalab-test-229519 --zone us-west1-b -- \
-D 8080 -N
Or access the environment variables with:
gcloud compute ssh ${HOSTNAME} \
--project=${PROJECT} --zone=${ZONE} -- \
-D ${PORT} -N
Also check that the VM you are trying to access is running.

How to know when dataproc initialization actions are done

I need to run a Dataproc cluster with both BigQuery and Cloud Storage connectors installed.
I use a variant of this script (because I have no access to the bucket used in the general one). Everything works fine, but when I run a job once the cluster is up and running, it always results in a "Task was not acquired" error.
I can fix this by simply restarting the Dataproc agent on every node, but I really need this to work properly so that I can run a job right after my cluster is created. It seems that this part of the script is not working properly:
# Restarts Dataproc Agent after successful initialization
# WARNING: this function relies on undocumented and not officially supported Dataproc Agent
# "sentinel" files to determine successful Agent initialization and is not guaranteed
# to work in the future. Use at your own risk!
restart_dataproc_agent() {
  # Because Dataproc Agent should be restarted after initialization, we need to wait until
  # it creates a sentinel file that signals initialization completion (success or failure)
  while [[ ! -f /var/lib/google/dataproc/has_run_before ]]; do
    sleep 1
  done
  # If Dataproc Agent didn't create a sentinel file that signals initialization
  # failure, then initialization succeeded and the Agent should be restarted
  if [[ ! -f /var/lib/google/dataproc/has_failed_before ]]; then
    service google-dataproc-agent restart
  fi
}
export -f restart_dataproc_agent
# Schedule asynchronous Dataproc Agent restart so it will use updated connectors.
# It cannot be restarted synchronously, because Dataproc Agent should be restarted
# only after its initialization, including init actions execution, has completed.
bash -c restart_dataproc_agent & disown
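For what it's worth, the wait loop above is a plain sentinel-file poll. The same pattern in isolation, with a temporary file standing in for /var/lib/google/dataproc/has_run_before:

```shell
SENTINEL=$(mktemp -u)               # a path only; the file does not exist yet
( sleep 0.2; touch "$SENTINEL" ) &  # some other process creates it later

while [[ ! -f "$SENTINEL" ]]; do    # poll until the sentinel appears
  sleep 0.1
done
echo "sentinel seen"
rm -f "$SENTINEL"
```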
My questions here are:
How to know that the initialization actions are done?
Do I have to restart the Dataproc agent, and if so, how do I properly restart it on my newly created cluster's nodes?
EDIT:
Here is the command I use to create a cluster (using the 1.3 image version):
gcloud dataproc --region europe-west1 \
clusters create my-cluster \
--bucket my-bucket \
--subnet default \
--zone europe-west1-b \
--master-machine-type n1-standard-1 \
--master-boot-disk-size 50 \
--num-workers 2 \
--worker-machine-type n1-standard-2 \
--worker-boot-disk-size 100 \
--image-version 1.3 \
--scopes 'https://www.googleapis.com/auth/cloud-platform' \
--project my-project \
--initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh \
--metadata 'gcs-connector-version=1.9.6' \
--metadata 'bigquery-connector-version=0.13.6'
Also, please note that the connectors initialization script has since been fixed and now works fine, so I am using it, but I still have to manually restart the Dataproc agent to be able to run a job.
The Dataproc agent logs a "Custom initialization actions finished." message in the /var/log/google-dataproc-agent.0.log file after initialization actions succeed.
No, you don't need to restart the Dataproc agent manually.
This issue is caused by the Dataproc agent service restart in the connectors initialization action and should be resolved by this PR.
As for knowing when the initialization actions are finished, you can check the cluster's status.state: if it is CREATING, the initialization actions are still being executed; once it is RUNNING, they are done.
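A script can wait on that state before submitting jobs. A rough sketch of the polling loop, where poll_state is a hypothetical stub standing in for `gcloud dataproc clusters describe my-cluster --region europe-west1 --format='value(status.state)'`:

```shell
state="CREATING"
attempts=0
poll_state() {                  # stub: flips to RUNNING on the third check,
  attempts=$((attempts + 1))    # the way a real describe call eventually would
  if [ "$attempts" -ge 3 ]; then
    state="RUNNING"
  fi
}

until [ "$state" = "RUNNING" ]; do
  poll_state
  sleep 0.1                     # with real gcloud calls, poll less aggressively
done
echo "cluster ready"
```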

How to run cluster initialization script on GCP after creation of cluster

I have created a Google Dataproc cluster, but now have a requirement to install Presto. Presto is provided as an initialization action on Dataproc here; how can I run this initialization action after creation of the cluster?
Most init actions would probably run even after the cluster is created (though I haven't tried the Presto init action).
I like to run clusters describe to get the instance names, then run something like gcloud compute ssh <NODE> -- -T sudo bash -s < presto.sh for each node. Reference: How to use SSH to run a shell script on a remote machine?.
Notes:
Everything after the -- is passed as arguments to the normal ssh command.
The -T means don't try to create an interactive session (otherwise you'll get a warning like "Pseudo-terminal will not be allocated because stdin is not a terminal.").
I use "sudo bash" because init action scripts assume they are being run as root.
presto.sh must be a copy of the script on your local machine. You could alternatively ssh in and run gsutil cp gs://dataproc-initialization-actions/presto/presto.sh . && sudo bash presto.sh.
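The `sudo bash -s < presto.sh` part works because `bash -s` executes whatever arrives on stdin, and ssh forwards the local file to the remote process's stdin. The same mechanism can be seen locally with a throwaway script (the file name is just for illustration):

```shell
# A local stand-in for presto.sh:
printf 'echo "init action running as $(whoami)"\n' > /tmp/demo-init.sh

# bash -s reads the script from stdin, exactly as it would over ssh:
bash -s < /tmp/demo-init.sh
rm -f /tmp/demo-init.sh
```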
But @Kanji Hara is correct in general. Spinning up a new cluster is pretty fast/painless, so we advocate using initialization actions when creating a cluster.
You could use initialization-actions parameter
Ex:
gcloud dataproc clusters create $CLUSTERNAME \
--project $PROJECT \
--num-workers $WORKERS \
--bucket $BUCKET \
--master-machine-type $VMMASTER \
--worker-machine-type $VMWORKER \
--initialization-actions \
gs://dataproc-initialization-actions/presto/presto.sh \
--scopes cloud-platform
Maybe this script can help you: https://github.com/kanjih-ciandt/script-dataproc-datalab

GCE Windows Instance not running startup scripts

I have been trying to apply my startup scripts to new Windows instances on Google Compute Engine as described here, however when I check the instances there is no trace of them ever being executed. Here is the gcloud command I am running:
gcloud compute instances create "my-instance"
--project "my-project"
--zone "us-central1-a"
--machine-type "g1-small"
--network "default"
--metadata "gce-initial-windows-user=my-user" "gce-initial-windows-password=my-pass"
--maintenance-policy "MIGRATE"
--scopes "storage-ro"
--tags "http-server" "https-server"
--image "https://www.googleapis.com/compute/v1/projects/windows-cloud/global/images/windows-server-2008-r2-dc-v20150110"
--boot-disk-type "pd-standard"
--boot-disk-device-name "my-instance"
--metadata-from-file sysprep-oobe-script-ps1=D:\Path\To\startup.ps1
I tried using all 3 startup types (sysprep-specialize-script-ps1, sysprep-oobe-script-ps1, windows-startup-script-ps1) but none worked. Can't see any indication in the Task Scheduler or Event Viewer either. The file on my system exists and does work when I run it manually. How can I get this working?
A good way to debug Powershell scripts is to have them write to the serial console (COM1). You'll be able to see the output of the script from GCE's serial port output.
gcloud compute instances get-serial-port-output my-instance --zone us-central1-a
If there's no script you'll see something like:
Calling oobe-script from metadata.
attributes/sysprep-oobe-script-bat value is not set or metadata server is not reachable.
attributes/sysprep-oobe-script-cmd value is not set or metadata server is not reachable.
attributes/sysprep-oobe-script-ps1 value is not set or metadata server is not reachable.
Running schtasks with arguments /run /tn GCEStartup
--> SUCCESS: Attempted to run the scheduled task "GCEStartup".
-------------------------------------------------------------
Instance setup finished. windows is ready to use.
-------------------------------------------------------------
Booting on date 01/25/2015 06:26:26
attributes/windows-startup-script-bat value is not set or metadata server is not reachable.
attributes/windows-startup-script-cmd value is not set or metadata server is not reachable.
attributes/windows-startup-script-ps1 value is not set or metadata server is not reachable.
Make sure that the contents of the ps1 file are actually attached to the instance:
gcloud compute instances describe my-instance --zone us-central1-a --format json
The JSON dump should contain the PowerShell script within it.
Lastly, a great way to debug Powershell startup scripts is to write the output to the serial console.
You can print log messages and see them in the Google Developer Console > Compute > Compute Engine > VM Instances > (Instance Name). Then scroll to the bottom and click the expand option for "Serial console".
Function Write-SerialPort ([string] $message) {
    $port = New-Object System.IO.Ports.SerialPort COM1,9600,None,8,One
    $port.Open()
    $port.WriteLine($message)
    $port.Close()
}
Write-SerialPort ("Testing GCE Startup Script")
This command worked for me; I had to make sure that the script was saved as ASCII. PowerShell ISE saves files with a different encoding (UTF-16 with a byte-order mark) that breaks gcloud compute.
gcloud compute instances create testwin2 --zone us-central1-a \
  --metadata-from-file sysprep-oobe-script-ps1=testconsole.ps1 --image windows-2008-r2
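One way to catch the encoding problem before uploading (assuming iconv is available, and using a throwaway file for illustration) is to check that the script converts cleanly from ASCII; a file saved by the ISE as UTF-16 would fail this check:

```shell
# A known-good ASCII stand-in for testconsole.ps1:
printf 'Write-SerialPort ("Testing GCE Startup Script")\n' > /tmp/testconsole.ps1

# iconv succeeds only if every byte is valid ASCII (no BOM, no UTF-16):
if iconv -f ASCII -t ASCII /tmp/testconsole.ps1 > /dev/null 2>&1; then
  echo "ascii ok"
else
  echo "re-save the script as ASCII (or UTF-8 without BOM)"
fi
rm -f /tmp/testconsole.ps1
```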