Provide custom UUID to spark job through airflow DataprocSubmitJobOperator - google-cloud-dataproc

I want to give a custom job ID to the Spark jobs submitted through the Airflow DataprocSubmitJobOperator on Google Cloud.
Via the API we can do that using the --id param; any idea how we can do the same through this operator?

You can specify the job ID in the "reference" field of the job:
https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs#jobreference
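For instance, here is a minimal, untested sketch of an Airflow task that sets a custom ID through reference.job_id (the project, cluster, bucket, and main class are placeholders):

from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

SPARK_JOB = {
    "reference": {
        "project_id": "my-project",          # placeholder GCP project
        "job_id": "my-custom-spark-job-id",  # the custom ID, like --id in gcloud
    },
    "placement": {"cluster_name": "my-cluster"},
    "spark_job": {
        "main_class": "org.example.SparkApp",  # hypothetical main class
        "jar_file_uris": ["gs://my-bucket/spark-app.jar"],
    },
}

submit_job = DataprocSubmitJobOperator(
    task_id="submit_spark_job",
    job=SPARK_JOB,
    region="us-central1",
    project_id="my-project",
)

Note that Dataproc job IDs must be unique within a project, so a fixed job_id will fail on re-runs unless you template it per run.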

I think you should be able to give a custom job ID by specifying the task_id in the configuration of DataprocSubmitJobOperator. You can find more about it in the documentation.

Related

How to add tags to all ECS task definitions in one go

I would like to tag all 9,000+ task definitions in one go; please help me with the best way to do this.
I tried the command aws ecs tag-resource --resource-arn, but it only accepts one ARN at a time.
You can only do this programmatically. Use the AWS SDK: list all the ARNs, loop through them, and call the tag-resource API on each one. As an illustration, this is how you tag a bunch of DynamoDB tables in Python:
import boto3

# profile_name refers to a named profile in your AWS credentials file
session = boto3.Session(profile_name="my-profile")
client = session.client('dynamodb')

# Get all DynamoDB tables (list_tables returns at most 100 names per call)
tables = client.list_tables()

# Loop through the tables and tag each one
for table in tables['TableNames']:
    print(f'Tagging table: {table}')
    client.tag_resource(
        ResourceArn=f'arn:aws:dynamodb:us-east-1:xxx:table/{table}',
        Tags=[{'Key': 'my_tag', 'Value': 'my_tag_value'}],
    )
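Adapting the same pattern to ECS task definitions might look like the following untested sketch; the tag key and value are placeholders, and the paginator handles the fact that list_task_definitions returns results in pages:

import boto3

client = boto3.Session().client('ecs')

# list_task_definitions is paginated, so walk every page
paginator = client.get_paginator('list_task_definitions')
for page in paginator.paginate():
    for arn in page['taskDefinitionArns']:
        print(f'Tagging task definition: {arn}')
        # note: the ECS API uses lowercase 'key'/'value' in its tag objects
        client.tag_resource(
            resourceArn=arn,
            tags=[{'key': 'my_tag', 'value': 'my_tag_value'}],
        )

Tagging 9,000+ definitions one call at a time will take a while; there is no batch variant of tag-resource.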

For how long does Dataflow remember attribute id in PubsubIO

PubsubIO allows deduplicating messages based on an ID attribute:
PubsubIO.readStrings().fromSubscription(pubSubSubscription).withIdAttribute("message_id")
For how long does Dataflow remember this id? Is it documented anywhere?
It is documented; however, the section has not yet been migrated to the V2+ version of the docs. The information can still be found in the V1 docs:
https://cloud.google.com/dataflow/model/pubsub-io#using-record-ids
"If you've set a record ID label when using PubsubIO.Read, when Dataflow receives multiple messages with the same ID (which will be read from the attribute with the name of the string you passed to idLabel), Dataflow will discard all but one of the messages. However, Dataflow does not perform this de-duplication for messages with the same record ID value that are published to Cloud Pub/Sub more than 10 minutes apart."

How to get job id of existing job in XTRF smart project

I would like to update the status of existing jobs in XTRF smart projects using the XTRF Home Portal API. The API call requires a job ID, but I don't know where to find this ID.
End point:
.../v2/jobs/{jobId}/status
Following the solution from a similar post, I have defined a view with a list of jobs that require updating. However, there seems to be no column that holds the {jobId} required by the API. There is a column called "Internal ID" that contains a four-digit number, but when I use that number in the API call, I get an error:
"Invalid Job ID of a Smart Job. Use new form of Job ID for Smart Jobs (e.g. 2QROVSCO3ZG3NM6KAZZBXH5HMI)."
So apparently there is a new form of the job ID. Is there a specific column in the view that I should use, or is there another way to retrieve this job ID?
The Job ID can be found in the URL (after clicking on a job):
https://[your xtrf url]/xtrf/faces/projectAssistant/projects/project.seam?assistedProjectId=5GB3QLPO2QROVSCOR55O3WJVU2Y#/project?jobs=DZAGF2QROVSCOVBJPG2UVBCJZ4II
The Job ID is DZAGF2QROVSCOVBJPG2UVBCJZ4II
Another way is to retrieve the jobs through the API itself; this can be done for a quote, but also for a project:
Endpoint: /v2/quotes/{quoteId}/jobs
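A minimal sketch in Python of that flow (untested; the base URL, the X-AUTH-ACCESS-TOKEN header, and the status payload are assumptions to adapt to your instance and token setup):

import requests

BASE = 'https://my-xtrf.example.com/home-api'   # placeholder Home Portal API root
HEADERS = {'X-AUTH-ACCESS-TOKEN': 'my-token'}   # assumed token-based auth header

# Fetch the jobs of a quote; each entry should carry the new-style job ID
quote_id = '12345'  # hypothetical
jobs = requests.get(f'{BASE}/v2/quotes/{quote_id}/jobs', headers=HEADERS).json()

for job in jobs:
    job_id = job['id']  # e.g. 2QROVSCO3ZG3NM6KAZZBXH5HMI
    # assumption: the new status is sent as a simple JSON body
    resp = requests.put(
        f'{BASE}/v2/jobs/{job_id}/status',
        headers=HEADERS,
        json={'status': 'STARTED'},
    )
    resp.raise_for_status()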

Add metric name in OTSDB via API

I am adding data into OTSDB from different sources, and I give a metric name to each data point using an XML file. I don't have any access to OTSDB to create metric names via the terminal.
I have referred to the links below:
API PUT
GitHub Issue
From the GitHub issue, I couldn't understand how to use --auto-metric.
I know how to create a metric from the terminal; here I am creating the abxcs metric:
./tsdb mkmetric abxcs
But how do I create a metric using the API?
FYI: please suggest a solution using Java.
Thanks in advance for your help.
In order to have metric names auto-created on the fly, you'll need to set
tsd.core.auto_create_metrics = true
in the OpenTSDB configuration file. Ref: http://opentsdb.net/docs/build/html/user_guide/configuration.html
From the docs: "Whether or not a data point with a new metric will assign a UID to the metric. When false, a data point with a metric that is not in the database will be rejected and an exception will be thrown."
The CLI equivalent is to pass the --auto-metric switch when starting the tsd process.
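Once auto-creation is enabled, simply writing a data point through the HTTP /api/put endpoint creates the metric as a side effect. A sketch using Python's requests (host, metric, and tags are placeholders; the same POST can be issued from Java with any HTTP client):

import time
import requests

# Writing a data point to /api/put assigns a UID to a brand-new metric
# when tsd.core.auto_create_metrics is true.
datapoint = {
    'metric': 'abxcs',                # metric created on the fly
    'timestamp': int(time.time()),
    'value': 42,
    'tags': {'host': 'my-host'},      # placeholder tag
}
resp = requests.post('http://my-otsdb:4242/api/put', json=[datapoint])
resp.raise_for_status()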

How to fetch a task id in Alfresco Activiti

I need to know how to fetch the task id from within a BPMN process.
I tried the following without luck:
<serviceTask id="assignApplicationId" name="Assign Application Id"
    activiti:expression="${sequenceUtil.getOutboundId(task.id)}"
    activiti:resultVariable="OutboundWF_ApplicationNumber"/>
and
<serviceTask id="assignApplicationId" name="Assign Application Id"
    activiti:expression="${sequenceUtil.getOutboundId(bpm_taskid)}"
    activiti:resultVariable="OutboundWF_ApplicationNumber"/>
According to the manual, access to the DelegateTask only works for expressions evaluated in task listeners, so it does not seem to be possible to fetch the task ID from other kinds of expressions. A possible workaround is sketched below.
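As a workaround, a task listener on the user task itself can stash the task ID in a process variable for later steps to use. A rough, untested sketch (the task name, variable name, and listener event are assumptions):

<userTask id="reviewApplication" name="Review Application">
  <extensionElements>
    <!-- inside a task listener, "task" resolves to the DelegateTask -->
    <activiti:taskListener event="create"
        expression="${task.setVariable('currentTaskId', task.id)}"/>
  </extensionElements>
</userTask>

A later service task could then pass the stored variable, e.g. ${sequenceUtil.getOutboundId(currentTaskId)}.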