Azure Data Factory - Custom Activity never completes

I'm new to Azure and I am working on Data Factory and custom activities. I am creating a pipeline with only one custom activity (the activity actually does nothing and returns immediately).
However, it seems that the custom activity is sent to the Batch account. I can see the job and task created, but the task remains "Active" and never completes.
Is there anything I missed?
Job: created and belongs to the desired pool (screenshot: Job).
Task: not sure why, but the pool shows as n/a and the task never completes (screenshots: Job -> Task Status, Task pool n/a).
Code of the dummy activity: I'm using ADF v2, so it is just a simple console program (screenshot: Dummy activity).

I figured it out.
The problem was with the Batch account. The node in the pool failed its start task, which blocked the node from taking any tasks. I changed the pool's start task to not wait for success, so that even if the start task fails the node can still pick up work.
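For context, here is a minimal sketch (not from the original post) of that fix, assuming a recent azure-batch Python SDK; the account name, key, URL, pool ID, and start task command line are placeholders:

from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels

credentials = SharedKeyCredentials("mybatchaccount", "<account-key>")
client = BatchServiceClient(credentials, batch_url="https://mybatchaccount.<region>.batch.azure.com")

# Replace the pool's start task with one that does not block the node on failure:
# with wait_for_success=False the node can start taking tasks even if the start task fails.
client.pool.patch(
    "adfv2-custom-activity-pool",
    batchmodels.PoolPatchParameter(
        start_task=batchmodels.StartTask(
            command_line="cmd /c setup.cmd",
            wait_for_success=False,
        )
    ),
)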

Related

Fargate: how to stop task after job done?

I need to run a computation as a task on my Fargate cluster, and after the computation finishes the task should be stopped or terminated to reduce costs.
The sequence of my actions:
One task is always running on the EC2 cluster and checks the DB for new data.
If new data appears, Boto3 runs the Fargate container.
After the job is done, the task should be stopped.
Also, if a second row of data appears in the DB while the first task is still being processed, Fargate should create a second task for the second job, and then stop the tasks...
So, I have:
Task written on Python and deployed on ECR
Fargate cluster
Task definition (describe Memory, CPU's and container)
How does the task know that it should be closed? Any idea or solution to stop a task after the job is done?
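This is not part of the original question, but as a rough sketch of the "Boto3 runs the Fargate container" step described above (the cluster name, task definition, container name, and network settings are all placeholders):

import boto3

ecs = boto3.client("ecs")

# Launch one Fargate task for the newly detected row of data.
response = ecs.run_task(
    cluster="my-fargate-cluster",
    launchType="FARGATE",
    taskDefinition="my-calculation-task:1",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "assignPublicIp": "ENABLED",
        }
    },
    # Container overrides could pass the row id of the new data into the task.
    overrides={
        "containerOverrides": [
            {"name": "calc-container",
             "environment": [{"name": "ROW_ID", "value": "42"}]}
        ]
    },
)
task_arn = response["tasks"][0]["taskArn"]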

Is there a way to configure retries for Azure DevOps pipeline tasks or jobs?

Currently I have a OneBranch DevOps pipeline that fails every now and then while restoring packages. Usually it fails because of some transient error like a socket exception or timeout. Re-trying the job usually fixes the issue.
Is there a way to configure a job or task to retry?
Azure DevOps now supports the retryCountOnTaskFailure setting on a task to do just this.
See this page for further information:
https://learn.microsoft.com/en-us/azure/devops/release-notes/2021/pipelines/sprint-195-update
Update:
Automatic retries for a task have been added and, by the time you read this, should be available for use.
It can be used as follows:
- task: <name of task>
  retryCountOnTaskFailure: <max number of retries>
  ...
Here are a few things to note when using retries:
The failing task is retried immediately.
There is no assumption about the idempotency of the task. If the task has side-effects (for instance, if it created an external resource partially), then it may fail the second time it is run.
There is no information about the retry count made available to the task.
A warning is added to the task logs indicating that it has failed before it is retried.
All of the attempts to retry a task are shown in the UI as part of the same task node.
Original answer:
There is no way of doing that with native tasks. However, if you can script the step, then you can put such retry logic inside it.
You could, for instance, do it like this:
n=0
until [ "$n" -ge 5 ]   # try at most 5 times
do
  command && break     # substitute your command here; break out of the loop on success
  n=$((n+1))
  sleep 15             # wait 15 seconds before retrying
done
However, there is no native way of doing this for regular tasks.
Automatically retrying a task is on the roadmap, so this could change in the near future.

How to give a job manager task permissions to resize the pool?

I'm running embarrassingly parallel workloads, but the number of parallel tasks is not known beforehand. Instead, my job manager task performs a simple computation to determine the number of parallel tasks and then adds the tasks to the job.
Now, as soon as I know the number of parallel tasks, I would like to immediately resize the pool I'm running in accordingly (I am running the job in an auto-pool). Here is how I try to do this.
When I create the JobManagerTask I supply
...
authentication_token_settings=AuthenticationTokenSettings(
    access=[AccessScope.job]),
...
At run time the task receives AZ_BATCH_AUTHENTICATION_TOKEN in its environment, uses it to create a BatchServiceClient, uses the client to add worker tasks to the job, and ultimately calls client.pool.resize() to increase target_dedicated_nodes. At this point the task gets an error from the service:
.../site-packages/azure/batch/operations/_pool_operations.py", line 1310, in resize
raise models.BatchErrorException(self._deserialize, response)
azure.batch.models._models_py3.BatchErrorException: Request encountered an exception.
Code: PermissionDenied
Message: {'additional_properties': {}, 'lang': 'en-US', 'value': 'Server failed to authorize the request.\nRequestId:4b34d8e5-7c28-4af2-9e1f-9cf88a486511\nTime:2020-11-26T17:32:55.7673310Z'}
AuthenticationErrorDetail: The supplied authentication token does not have permission to call the requested Url.
How can I give the task permission to resize the pool?
Currently the AZ_BATCH_AUTHENTICATION_TOKEN is limited to permissions on the job itself. The pool ends up being a separate resource, even in the auto-pool configuration, so it is not modifiable with that token.
There are two main approaches you can take. You can either add a certificate to your account and to your pool, allowing you to authenticate with a service principal that has permissions on your account, or you can set your pool to autoscale based on the number of pending tasks, which doesn't give you immediate resizes but instead evaluates the scaling at set intervals as needed.
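As a hedged sketch of the first approach (authenticating with a service principal instead of the job-scoped AZ_BATCH_AUTHENTICATION_TOKEN), assuming azure-batch together with azure-common's ServicePrincipalCredentials; the tenant, client ID, secret, account URL, pool ID, and node count are placeholders:

from azure.batch import BatchServiceClient
from azure.common.credentials import ServicePrincipalCredentials
import azure.batch.models as batchmodels

# An AAD service principal with rights on the Batch account can do what the
# job-scoped token cannot, including pool operations.
credentials = ServicePrincipalCredentials(
    client_id="<app-client-id>",
    secret="<app-secret>",
    tenant="<tenant-id>",
    resource="https://batch.core.windows.net/",
)
client = BatchServiceClient(credentials, batch_url="https://<account>.<region>.batch.azure.com")

# Resize the (auto-)pool to the number of parallel tasks computed by the job manager task.
client.pool.resize(
    "<pool-id>",
    batchmodels.PoolResizeParameter(target_dedicated_nodes=10),
)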

Composed task arguments are not passed after job restart

I'm running a composed task with three child tasks.
Composed task definition:
composed-task-runner --graph='task1 && task2 && task3'
Launch command
task launch my-composed-task --properties "app.composed-task-runner.composed-task-arguments=arg1=a.txt arg2=test"
Scenario 1:
when the composed task runs without any error, the arguments are passed to all child tasks.
Scenario 2:
when the second child task fails and the job is restarted, the composed task arguments are passed to the second child task but not to the third child task
Scenario 3 :
when the first and second tasks succeed, the third child task fails, and the job is restarted, the composed task arguments are now passed to the third child task
Observation:
After a task failure and restart, the composed-task-arguments are passed only to the failed task and not to the tasks after it.
How are the arguments retrieved in the composed task after a job restart? What could be the reason for this behavior?
Versions used:
Spring Cloud Local Server 1.7.3, Spring Boot 2.0.4, Spring Cloud Starter Task 2.0.0
The issue that you are experiencing is that SCDF is not storing the properties specified at launch time.
This issue is being tracked here: https://github.com/spring-cloud/spring-cloud-dataflow/issues/2807 and is scheduled to be fixed in SCDF 2.0.0
[Detail]
So when the job is restarted, these properties are not submitted (since they are not currently stored) to the new CTR launch.
Thus subsequent tasks (after the failed task succeeds) will not have the properties set for them.
The reason the failed task still has these values is that the arguments are stored in the batch step execution context for that step.
[Work Around until Issue is resolved]
Instead of restarting the job, launch the CTR task definition using the properties (so long as they are the same).
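If you script that relaunch, here is a rough sketch using the SCDF REST API; it simply mirrors the task launch shell command shown earlier, and the local server URL is an assumption:

import requests

# Relaunch the composed-task-runner definition with the same properties,
# instead of restarting the failed batch job.
resp = requests.post(
    "http://localhost:9393/tasks/executions",  # default local SCDF server address (assumption)
    data={
        "name": "my-composed-task",
        "properties": "app.composed-task-runner.composed-task-arguments=arg1=a.txt arg2=test",
    },
)
resp.raise_for_status()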

A Workload Scheduler step doesn't proceed and remains in Queued status

I created a simple process with the Application Lab interface in Bluemix Workload Scheduler. I ran my process, but the step didn't proceed and remained in Queued status.
How can I get the step to proceed?
I executed the process with "Run now". The process doesn't have triggers.
The step remains in "Queued" status.
Process information:
There is only one step. The step is "ping www.ibm.com".
The process doesn't have a trigger; it is an on-demand process.
There might be a problem with the agent, as I can successfully run a simple workload process without any issues. If you are using the Workload Automation Agent that is created for you, then you will need to open a support ticket to have the Workload team look at that agent.
Reviewing your question, I think that a process submitted to the Workload Scheduler service should be a process that will complete: a ping command like the one you are trying to submit will never complete unless it is 'killed' using CTRL+C (or invoked with the [-c count] option).