Azure DevOps: when one job in a stage succeeds, how can the stage be marked successful or partially successful? - azure-devops

When one of the jobs in a stage succeeds and the other fails, how can I ensure that the stage is marked successful (or partially successful) and the pipeline proceeds to the next stage?
To be more descriptive:
Pipeline: Stage A --> Stage B
Stage A:
  - Job1 (failed)
  - Job2 (succeeded; set to run only when a previous job has failed)
Stage B:
  - Job1
  - ...
When Job 1 in stage A fails, Job 2 runs automatically. But this way stage A ends up 'Failed', so stage B never runs because the previous stage failed.
The "Trigger even when the selected stages partially succeed" option does not help for stage B, because the previous stage is marked as "Failed".
Additionally, I tried:
If I mark all tasks inside "A -> Job1" as "Continue on error" and check the "Trigger even when the selected stages partially succeed" box on stage B, this time Job 2 does not run, because the "Run only when a previous job has failed" option is set on Job 2, and there is no option like "run only when the previous job partially succeeded".
In addition, a custom condition does not work on Job 2: the Agent.JobStatus variable shows null. If the "SucceededWithIssues" and "Failed" statuses were available in Agent.JobStatus it could work, but I could not get anywhere here because of the null value of JobStatus.
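For comparison, this is straightforward to express in a YAML multi-stage pipeline (a different mechanism from the classic release options described above) using explicit job and stage conditions. A minimal sketch, with illustrative stage, job, and script names:
stages:
  - stage: A
    jobs:
      - job: Job1
        continueOnError: true        # a failing step makes the job 'SucceededWithIssues' instead of 'Failed'
        steps:
          - script: exit 1           # placeholder for the real work
      - job: Job2
        dependsOn: Job1
        # run the fallback job whenever Job1 did not fully succeed
        condition: in(dependencies.Job1.result, 'Failed', 'SucceededWithIssues')
        steps:
          - script: echo "fallback work"
  - stage: B
    dependsOn: A
    # run stage B even if stage A failed or only partially succeeded
    condition: succeededOrFailed()
    jobs:
      - job: Job1
        steps:
          - script: echo "stage B"
With continueOnError on A's Job1, stage A finishes as partially succeeded rather than failed; and even without it, succeededOrFailed() on stage B lets B run after a failed stage A.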

Related

AWS CloudFormation Template for Orchestration of multiple AWS Glue Jobs (combination of sequential and parallel execution)

I'm looking for help with a CloudFormation template for Glue job orchestration for the scenario below:
Suppose I have 6 AWS Glue jobs. 3 jobs (Job1, Job2, Job3) should be executed in parallel, and the remaining jobs should be executed sequentially (Job3 before Job4, Job4 before Job5, Job5 before Job6). If any job fails, send a workflow "Failure" notification along with the name of the failed Glue job.
Job1
Job2
job3 --->job4--->job5-->job6
You can define a Step Functions state machine using the AWS::StepFunctions::StateMachine resource and additionally define an AWS::Events::Rule for the following events:
EventPattern:
  source:
    - "aws.states"
  detail-type:
    - "Step Functions Execution Status Change"
  detail:
    status:
      - "FAILED"
      - "TIMED_OUT"
Here is a sample single-job execution state in Step Functions:
Run Job 3:
  Type: "Task"
  Resource: "arn:aws:states:::glue:startJobRun.sync"
  Parameters:
    JobName: "GlueJob3Name"
  Next: "Run Job 4"
You will have to enclose the three parallel jobs within a Parallel state type.
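A rough sketch of that layout, assuming Job1 and Job2 run as their own branches and the Job3 -> Job6 chain forms a third branch (state and Glue job names are illustrative):
Run Parallel Jobs:
  Type: "Parallel"
  Branches:
    - StartAt: "Run Job 1"
      States:
        Run Job 1:
          Type: "Task"
          Resource: "arn:aws:states:::glue:startJobRun.sync"
          Parameters:
            JobName: "GlueJob1Name"
          End: true
    - StartAt: "Run Job 2"
      States:
        Run Job 2:
          Type: "Task"
          Resource: "arn:aws:states:::glue:startJobRun.sync"
          Parameters:
            JobName: "GlueJob2Name"
          End: true
    - StartAt: "Run Job 3"
      States:
        Run Job 3:
          Type: "Task"
          Resource: "arn:aws:states:::glue:startJobRun.sync"
          Parameters:
            JobName: "GlueJob3Name"
          Next: "Run Job 4"
        # Run Job 4, Run Job 5 and Run Job 6 follow the same Task shape,
        # chained with Next; Run Job 6 sets End: true to close the branch.
  End: true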

Azure DevOps - Release pipeline shows failure status when re-running failed tests, even if the re-run succeeded

I use SpecFlow with SpecFlow+ Runner and the Default.srprofile to re-run failed tests 3 times. In Visual Studio it shows 2 passed and 1 failed, but the status of the test run is a failure; the same goes for Azure DevOps: if a re-run test passes, the outcome of the run is still a failure. The failures are sometimes caused by locator timeouts or server timeouts, not often, but we saw it happen a few times, which is why we decided to implement a re-run.
Could anyone help on this?
2022-02-09T12:40:13.8607507Z Test Run Failed.
2022-02-09T12:40:13.8608607Z Total tests: 37
2022-02-09T12:40:13.8609271Z Passed: 36
2022-02-09T12:40:13.8609858Z Failed: 1
2022-02-09T12:40:13.8617476Z Total time: 7.4559 Minutes
2022-02-09T12:40:13.9226929Z ##[warning]Vstest failed with error. Check logs for failures. There might be failed tests.
2022-02-09T12:40:14.0075402Z ##[error]Error: The process 'D:\Microsoft_Visual_Studio\2019\Common7\IDE\Extensions\TestPlatform\vstest.console.exe' failed with exit code 1
2022-02-09T12:40:14.8164576Z ##[error]VsTest task failed.
But then the report states that it was retried 3 times, of which 2 retries were successful, yet the Azure DevOps run still shows a failure status.
The behavior of the report is correct and sadly cannot be configured to be changed.
What you can do is adjust how the results are reported back to Azure DevOps.
You can configure this via the VSTest element in the .srprofile file.
This example means that at least one retry has to pass:
<VSTest testRetryResults="Unified" passRateAbsolute="1"/>
Docs: https://docs.specflow.org/projects/specflow-runner/en/latest/Profile/VSTest.html
Be aware that we have stopped the development of the SpecFlow+ Runner. More details here: https://specflow.org/using-specflow/the-retirement-of-specflow-runner/

Azure DevOps build/release test execution getting aborted

Whenever I try to run the Coded UI tests on an on-premises agent machine with an Azure build/release pipeline, my tests get aborted; this has been happening since yesterday. I have changed nothing in the release pipeline or build definition.
Below is the error message I am getting in the release pipeline test run console:
2019-04-24T06:48:27.4475294Z test settings id : 1026822
2019-04-24T06:48:27.4475397Z Build location: C:\agent\_work\r1\a
2019-04-24T06:48:27.4475515Z Build Id: 5133
2019-04-24T06:48:28.0903047Z Test run with Id 1038940 associated
2019-04-24T06:48:37.6929302Z Received the command : Start
2019-04-24T06:48:37.6944608Z TestExecutionHost.ProcessCommand. Start Command handled
2019-04-24T06:48:58.8010958Z Received the command : Stop
2019-04-24T06:48:58.8011508Z TestExecutionHost.ProcessCommand. Stop Command handled
2019-04-24T06:48:58.8011845Z SliceFetch Aborted. Moving to the TestHostEnd phase
2019-04-24T06:48:58.9585180Z Please use this link to analyze the test run : 'test run URL'
2019-04-24T06:48:58.9585816Z Test run '1038940' is in 'Aborted' state with 'Total Tests' : 3 and 'Passed Tests' : 0.
2019-04-24T06:48:58.9604537Z ##[error]Test run is aborted. Logging details of the run logs.
2019-04-24T06:48:58.9606187Z ##[error]System.Exception: The test run was aborted, failing the task.
2019-04-24T06:48:59.0826921Z ##########################################################################
2019-04-24T06:48:59.1608855Z ##[section]Finishing: Test run for Test plans
I have been running those tests for the past 6 months and have not changed anything in the build or release pipeline, yet suddenly I am getting the above error in the Azure release pipeline test run console when triggering the run.
Note: this has been happening since yesterday (April 24, 2019). Until Monday (April 22, 2019) everything was working fine.
I believe there might be some changes on Microsoft's end, but I am not sure about that.
I am running the tests in a Windows 10 environment.

How to determine if a job is failed

How can I programmatically determine whether a job has failed for good and will not retry any more? I've seen the following on failed jobs:
status:
  conditions:
  - lastProbeTime: 2018-04-25T22:38:34Z
    lastTransitionTime: 2018-04-25T22:38:34Z
    message: Job has reach the specified backoff limit
    reason: BackoffLimitExceeded
    status: "True"
    type: Failed
However, the documentation doesn't explain why conditions is a list. Can there be multiple conditions? If so, which one do I rely on? Is it guaranteed that there will only be one with status: "True"?
JobConditions are similar to PodConditions. You can read about PodConditions in the official docs.
Anyway, to determine whether a Job has succeeded, I follow another way. Let's look at it.
There are two relevant fields in the Job spec.
One is spec.completions (default value 1), which is documented as:
Specifies the desired number of successfully finished pods the
job should be run with.
The other is spec.backoffLimit (default value 6), which is documented as:
Specifies the number of retries before marking this job failed.
Now, in JobStatus:
There are two fields in JobStatus too: succeeded and failed. succeeded is the number of pods which completed successfully, and failed is the number of pods which reached phase Failed.
Once succeeded is equal to or greater than spec.completions, the job is complete.
Once failed is equal to or greater than spec.backoffLimit, the job is failed.
So the logic looks like this:
// job is a *batchv1.Job; Completions and BackoffLimit default to 1 and 6.
if job.Status.Succeeded >= *job.Spec.Completions {
    return "completed"
} else if job.Status.Failed >= *job.Spec.BackoffLimit {
    return "failed"
}
If so, which one do I rely on?
You might not have to choose, considering commit dd84bba64:
When a job is complete, the controller will indefinitely update its conditions with a Complete condition.
This change makes the controller exit the reconciliation as soon as the job is already found to be marked as complete.
As https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.26/#jobstatus-v1-batch says:
The latest available observations of an object's current state. When a
Job fails, one of the conditions will have type "Failed" and status
true. When a Job is suspended, one of the conditions will have type
"Suspended" and status true; when the Job is resumed, the status of
this condition will become false. When a Job is completed, one of the
conditions will have type "Complete" and status true. More info:
https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
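Putting the quoted guarantee into code, a minimal Go sketch (the function name is mine; it assumes the standard k8s.io/api types) that treats a Job as permanently failed only when a condition of type Failed has status True:
import (
    batchv1 "k8s.io/api/batch/v1"
    corev1 "k8s.io/api/core/v1"
)

// jobHasFailed reports whether the Job controller has marked the Job as
// permanently failed (for example with reason BackoffLimitExceeded or
// DeadlineExceeded), i.e. it will not be retried any more.
func jobHasFailed(job *batchv1.Job) bool {
    for _, c := range job.Status.Conditions {
        if c.Type == batchv1.JobFailed && c.Status == corev1.ConditionTrue {
            return true
        }
    }
    return false
}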

How to stop and resume a Spring Batch job

Goal: I am using Spring Batch for data processing and I want to have an option to stop/resume (where it left off).
Issue: I am able to send a stop signal to a running job and it gets stopped successfully. But when I try to send a start signal to the same job, it creates a new instance of the job and starts as a fresh job.
My question is: how can we achieve resume functionality for a stopped job in Spring Batch?
You just have to run it with the same parameters. Just make sure you haven't marked the job as non-restartable and that you're not using a RunIdIncrementer or similar to automatically generate unique job parameters.
See, for instance, this example. After the first run, we have:
INFO: Job: [SimpleJob: [name=myJob]] completed with the following parameters: [{}] and the following status: [STOPPED]
Status is: STOPPED, job execution id 0
#1 step1 COMPLETED
#2 step2 STOPPED
And after the second:
INFO: Job: [SimpleJob: [name=myJob]] completed with the following parameters: [{}] and the following status: [COMPLETED]
Status is: COMPLETED, job execution id 1
#3 step2 COMPLETED
#4 step3 COMPLETED
Note that stopped steps will be re-executed. If you're using chunk-oriented steps, make sure that at least the ItemReader implements ItemStream (and does so with the correct semantics).
Steps marked with allowStartIfComplete will always be re-run.
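For illustration, a minimal sketch of "run it with the same parameters" using a plain JobLauncher (the parameter key and value are made up for this example; jobLauncher and job are assumed to come from the Spring context):
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class RestartExample {
    // Launch (or re-launch) the job with identical identifying parameters.
    // Because the parameters are the same and no incrementer is used, a
    // previously STOPPED execution belongs to the same JobInstance and is
    // restarted (completed steps are skipped) instead of a fresh instance starting.
    static JobExecution launch(JobLauncher jobLauncher, Job job) throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addString("input.file", "data.csv")   // hypothetical identifying parameter
                .toJobParameters();
        return jobLauncher.run(job, params);
    }
}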