Let's consider the following command:
gcloud dataflow jobs drain JOB_ID --region=europe-west1 --project=project_id
This is just an example command; the question is general:
Why is it necessary to give a region? Why can't gcloud determine it on its own? After all, I provide a project ID and a job ID. To my eye, gcloud should be able to determine the region from the project ID and the job ID.
Regarding your general question: as mentioned at the beginning of the Changing the Default Region or Zone documentation, requests that involve regional resources are expected to include a region name.
For the example you provided, the Dataflow API most likely looks for the JOB_ID in the specified region. This becomes clearer once you look at the message returned when the command is executed without the --region flag:
Could not cancel workflow; user does not have sufficient permissions on project: [PROJECT_ID], or the job does not exist in the project. Please ensure you have permission to access the job and the `--region` flag, us-central1, matches the job's region.
The command uses us-central1 as the default region for the request, and the API tries to find the provided JOB_ID “inside” this region.
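If you don't know which region a job was launched in, one way to find it is to list the project's jobs without restricting the region and check the region column (a sketch; the --filter expression on the job's id field and the REGION output column are assumptions about the jobs list output):
gcloud dataflow jobs list --project=project_id --filter="id=JOB_ID"
The region reported for the job is the value to pass to --region when draining or cancelling it.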
kubectl version does not show when the cluster was upgraded to the current version. Is there a way to get the cluster upgrade timestamp or, better, the timestamps of all previous upgrades?
We can see a list of all running and completed operations in the cluster using the following command:
gcloud beta container operations list
Each operation is assigned an operation ID, operation type, start and end times, target cluster, and status.
To get more information about a specific operation, specify the operation ID in the following command:
gcloud beta container operations describe OPERATION_ID --zone=ZONE
For a regional cluster, pass --region=REGION instead of --zone.
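As a sketch, to narrow the list down to control-plane upgrades and see their timestamps (assuming the UPGRADE_MASTER operation type and the field names exposed by the GKE Operations API):
gcloud beta container operations list \
  --filter="operationType=UPGRADE_MASTER" \
  --format="table(name, operationType, startTime, endTime, targetLink)"
The startTime and endTime columns then give the timestamps of previous upgrades; UPGRADE_NODES can be used the same way for node pool upgrades.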
I need to get the oldest instance from an instance group. I am using the following command:
gcloud compute instance-groups managed list-instances "instance-group-name" --region "us-central1" --format="value(NAME,ZONE,CREATION_TIMESTAMP)" --sort-by='~CREATION_TIMESTAMP'
But it seems --sort-by is not working, or I am using it incorrectly.
Could you please suggest the right way?
It's probably creationTimestamp, not CREATION_TIMESTAMP.
See: instances.list and the response body for the underlying field names.
It's slightly confusing, but gcloud requires you to use the field/property names of the underlying request/response types, not the output column names.
Another way to more readily determine this is to add --format=yaml or --format=json to gcloud compute instances list (or any gcloud command) to get an idea of what's being returned so that you can begin filtering and formatting it.
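For example, to see the actual field names returned for a managed instance group (a sketch; the group name and region are placeholders):
gcloud compute instance-groups managed list-instances instance-group-name \
  --region us-central1 --format=json --limit 1
Whichever field names appear in that JSON output (e.g. a lower-camel-case creationTimestamp rather than CREATION_TIMESTAMP, if present) are the ones to use in --sort-by and --format.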
I want to pass the run ID of a Data Fusion pipeline to some function upon pipeline completion, but I am not able to find any runtime variable which holds this value. Please help!
As an update to the previous answer, the first thing to do is to obtain the list of pipelines deployed in a given namespace. For this, query the endpoint '/v3/namespaces/${NAMESPACE}/apps', where ${NAMESPACE} is the namespace where the pipeline is deployed.
This endpoint returns a list of the pipelines deployed in the namespace ${NAMESPACE} (not the pipeline JSON, just a high-level description list). Once the pipeline list is obtained, the run details of a given pipeline can be retrieved by calling '/v3/namespaces/${NAMESPACE}/apps/${PIPELINE}/workflows/DataPipelineWorkflow/runs', where ${PIPELINE} is the name of the pipeline. This endpoint returns the details of all the runs of that pipeline, and this is where the run ID can be obtained; the field containing it is actually called runid in this list.
With the run_id, you can then obtain, for example, all the run logs by querying the endpoint '{CDAP_ENDPOINT}/v3/namespaces/{NAMESPACE}/apps/{PIPELINE}/workflows/DataPipelineWorkflow/runs/{run["runid"]}/logs?start={run["start"]}&stop={run["end"]}'. The previous snippet is Python, where run is a dictionary containing the details of a particular run.
As explained in the CDAP microservice guide, to call these endpoints, the CDAP endpoint must be obtained by running the command: gcloud beta data-fusion instances describe --project=${PROJECT} --location=${REGION} --format="value(apiEndpoint)" ${INSTANCE_ID}. The authentication token will also be needed and this can be found through running: gcloud auth print-access-token.
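Putting this together in shell (a sketch; ${NAMESPACE}, ${PIPELINE}, ${PROJECT}, ${REGION} and ${INSTANCE_ID} are placeholders, and the runid field name is taken from the description above):
CDAP_ENDPOINT=$(gcloud beta data-fusion instances describe ${INSTANCE_ID} --project=${PROJECT} --location=${REGION} --format="value(apiEndpoint)")
AUTH_TOKEN=$(gcloud auth print-access-token)
# List all runs of the pipeline; the runid field of each entry is the run ID.
curl -s -H "Authorization: Bearer ${AUTH_TOKEN}" \
  "${CDAP_ENDPOINT}/v3/namespaces/${NAMESPACE}/apps/${PIPELINE}/workflows/DataPipelineWorkflow/runs"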
The correct answer has been provided by #Edwin Elia in the comment section:
Retrieving the run ID of a Data Fusion pipeline from within its own run, or from the predecessor pipeline's run, is not currently possible. Here is an enhancement request that you can track that would make it possible.
As for retrieving the run_id value after pipeline completion, you should be able to use the REST API from the CDAP documentation to get information on the run, including the run ID.
I am creating a Redshift cluster using CloudFormation, and then I need to output the cluster status (basically, whether it is available or not). There are ways to output the endpoint and port, but I could not find any possible way of outputting the status.
How can I get that, or is it not possible?
You are correct. According to AWS::Redshift::Cluster - AWS CloudFormation, the only available outputs are Endpoint.Address and Endpoint.Port.
Status is not something that you'd normally want to output from CloudFormation because the value changes.
If you really want to wait until the cluster is available, you could create a WaitCondition and then have something monitor the status and signal the WaitCondition so the stack can continue. This would probably need to be an Amazon EC2 instance with some User Data. Linux instances are charged per second, so this would be quite feasible.
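For example, the instance's User Data could poll the cluster status with the AWS CLI and then signal a WaitConditionHandle (a sketch; the cluster identifier and handle URL are placeholders substituted in from the template, and cfn-signal comes from the aws-cfn-bootstrap package):
#!/bin/bash
CLUSTER_ID="my-redshift-cluster"                              # placeholder
WAIT_HANDLE_URL="<WaitConditionHandle URL from the template>" # placeholder
# Poll until the cluster reports the "available" status.
until [ "$(aws redshift describe-clusters --cluster-identifier "$CLUSTER_ID" \
    --query 'Clusters[0].ClusterStatus' --output text)" = "available" ]; do
  sleep 30
done
# Signal the WaitCondition so the stack can continue.
cfn-signal --success true --reason "Redshift cluster available" "$WAIT_HANDLE_URL"
The instance profile would also need permission to call redshift:DescribeClusters.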
The GceClusterConfig object has the property internalIpOnly, but there is no clear documentation on how to specify that flag through the gcloud command. Is there a way to pass that property?
That feature was first released in gcloud beta dataproc clusters create, where you can use the --no-address flag to turn it on. The feature recently became Generally Available and should be making it into the main gcloud dataproc clusters create any moment now (it's possible that if you run gcloud components update you'll get the flag in the non-beta track, even though the public documentation hasn't been updated to reflect it yet).
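For example (a sketch; the cluster name and subnet are placeholders, and an internal-IP-only cluster requires Private Google Access to be enabled on the subnetwork):
gcloud beta dataproc clusters create my-cluster \
  --region=us-central1 \
  --subnet=my-subnet \
  --no-address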