In our dev environment we have lots of repos, lots of builds and lots of build servers, and most of the time things work just as they should; however, we are seeing an increase in builds that fail because of timeouts.
These timeouts are not happening because we are getting close to the limit, but because something "gets stuck/blocked" in the pipeline and the build stays on that step until the timeout kills it.
To debug why this happens, we need to be able to query which builds fail because of this timeout, so that we can, for instance, see whether a particular build server or agent has the problem.
We cannot find anything in the API that gives us the timeout error, but we can see that the UI is able to deduce it somehow.
So far we have narrowed it down to querying all builds with completed status (through this API), but we get no completion reason, and build times are never exactly the same as the timeout in the build definition, so "guessing" it from the execution plan would also be a bit shaky.
How can we filter our builds down to only the builds that have timed out?
We can use the API below to get the timeline details of a build.
Note: do not add a timelineId, since we want to list all the information.
GET https://dev.azure.com/{organization}/{project}/_apis/build/builds/{buildId}/timeline?api-version=6.1-preview.2
If the build is canceled because of the timeout setting, we can get the message: The job running on agent Hosted Agent ran longer than the maximum time of xxx minutes. For more information, see https://go.microsoft.com/fwlink/?linkid=2077134
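A minimal sketch of scanning a timeline response for that message, assuming the JSON has already been parsed into a dict. It also assumes each entry in `records` carries an `issues` list with a `message` field plus a `workerName` (the agent that ran the job); that matches my understanding of the timeline schema, but check it against the REST reference. Matching on message text is fragile if Microsoft rewords it, and if the same message is attached to several records of one build you may want to deduplicate per build.

```python
import re
from collections import Counter
from typing import Any, Iterable

# Assumption: match only the stable core of the timeout message quoted above,
# since the agent name and the minute count vary from build to build.
TIMEOUT_PATTERN = re.compile(r"ran longer than the maximum time of (\d+) minutes")


def find_timeouts(timeline: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one entry per timeline record whose issues mention a job timeout."""
    hits = []
    for record in timeline.get("records") or []:
        for issue in record.get("issues") or []:
            match = TIMEOUT_PATTERN.search(issue.get("message") or "")
            if match:
                hits.append({
                    "name": record.get("name"),
                    "workerName": record.get("workerName"),  # agent that ran the job
                    "minutes": int(match.group(1)),          # configured timeout
                })
    return hits


def timeouts_per_agent(timelines: Iterable[dict[str, Any]]) -> Counter:
    """Count timed-out jobs per agent across many build timelines."""
    return Counter(hit["workerName"] for t in timelines for hit in find_timeouts(t))
```

Grouping by agent is what answers the original question of whether one build server or agent is responsible.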
By the way, we can use the Builds - List API to filter for failed builds: if a build is canceled because of the timeout setting, its result is failed rather than canceled.
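A sketch of the two calls using only the Python standard library. The `statusFilter`/`resultFilter` query parameters of Builds - List, the default `api-version` values, and basic authentication with a personal access token (PAT) are my assumptions about the current REST API; the helper names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Any


def _builds_base(org: str, project: str) -> str:
    """Root of the Build REST API for one project (names are URL-quoted)."""
    return ("https://dev.azure.com/"
            f"{urllib.parse.quote(org, safe='')}/{urllib.parse.quote(project, safe='')}"
            "/_apis/build/builds")


def failed_builds_url(org: str, project: str, api_version: str = "6.0") -> str:
    """Builds - List, restricted to completed builds whose result is 'failed'."""
    query = urllib.parse.urlencode({
        "statusFilter": "completed",
        "resultFilter": "failed",  # timed-out builds are reported as failed
        "api-version": api_version,
    })
    return f"{_builds_base(org, project)}?{query}"


def timeline_url(org: str, project: str, build_id: int,
                 api_version: str = "6.1-preview.2") -> str:
    """Timeline for one build; no timelineId, so every record is returned."""
    return f"{_builds_base(org, project)}/{build_id}/timeline?api-version={api_version}"


def get_json(url: str, pat: str) -> Any:
    """GET a REST resource, authenticating with a personal access token (PAT)."""
    token = base64.b64encode(f":{pat}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, assuming the list response wraps the builds in a `value` array as Azure DevOps list endpoints usually do, loop over `get_json(failed_builds_url(org, project), pat)["value"]`, fetch `timeline_url(org, project, build["id"])` for each build, and look for the timeout message quoted above in its records.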
Our Azure Data Factory pipeline has been working fine for 2 years with a self-hosted integration runtime (Azure VM, 16 GB).
In the last few weeks the pipeline has become very unreliable and data is no longer processed correctly.
Data Factory activities using the "MyAzureIntegrationRuntime" instance are failing with a "timeout" error.
Some activities complete successfully, but most of them fail.
ADF Monitor reports that the node is "Unavailable". A typical ADF activity is an Azure SQL lookup.
There are no errors in the virtual machine's event log, and there seems to be enough CPU/RAM to execute IR activities.
Rebooting the VM once helped to recover connectivity and the pipelines.
However, the latest VM reboot restored the status from "Unavailable" to "Running", but many pipeline activities still fail.
The integration runtime is currently not in a high-availability cluster.
A single VM serves the Sandbox, Dev, Test and Prod ADF instances. It worked fine for years until the last 2 weeks.
How can I find out what the problem is and fix it?
For failed activities that run on a self-hosted IR or a shared IR, the service supports viewing and uploading error logs. To get the error report ID, follow the steps below, and then enter the report ID to search for related known issues.
On the Monitor page for the service UI, select Pipeline runs.
Under Activity runs, in the Error column, select the highlighted button to display the activity logs.
The activity logs are displayed for the failed activity run.
For further assistance, select Send logs.
The "Share the self-hosted integration runtime (IR) logs with Microsoft" window opens.
Select which logs you want to send.
For a self-hosted IR, you can upload logs that are related to the failed activity or all logs on the self-hosted IR node.
For a shared IR, you can upload only logs that are related to the failed activity.
When the logs are uploaded, keep a record of the Report ID for later use if you need further assistance to solve the issue.
We are using the Publish Test Results task (PublishTestResults@2) to publish JUnit results in a pipeline in Azure DevOps. It previously worked fine, but now it hangs for 10 minutes (I think this is the default job timeout) and then fails, even though the results have been published. If I try to cancel the job when it starts hanging, the cancel request is ignored and the job continues to hang. Has anyone else experienced something similar?
This is the log output while the task is hanging:
Starting: PublishTestResults
==============================================================================
Task : Publish Test Results
Description : Publish test results to Azure Pipelines
Version : 2.160.0
Author : Microsoft Corporation
Help : https://learn.microsoft.com/azure/devops/pipelines/tasks/test/publish-test-results
==============================================================================
##[warning]An error occurred while sending the request.
Publishing test results to test run '1033544'.
TestResults To Publish 11, Test run id:1033544
Test results publishing 11, remaining: 0. Test run id: 1033544
Async Command Start: Publish test results
We eventually narrowed this down to agents running as a service rather than interactively. Our agents are behind a proxy. We provided the proxy settings as per the instructions, but the Publish Test Results task apparently doesn't use those settings, so we also had to set them in the environment that runs the service by editing runsvc.sh:
export HTTP_PROXY=http://ourproxy:8080/
export NO_PROXY=localhost,127.0.0.1,localaddress
export HTTPS_PROXY=http://ourproxy:8080/
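To check whether the running service actually picked up these variables, you can inspect its environment. A minimal sketch for Linux agents, assuming you know the PID of the agent service process (for example from `ps`); it reads `/proc/<pid>/environ`, which is NUL-separated and usually readable only by the same user or root. Checking both upper- and lower-case names is a precaution, since tools differ in which spelling they honour.

```python
from collections.abc import Mapping

PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def read_process_env(pid: int) -> dict[str, str]:
    """Linux only: the environment a running process was started with."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read()
    env = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skips the empty trailing entry after the last NUL
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def missing_proxy_vars(env: Mapping[str, str]) -> list[str]:
    """Proxy variables that are unset or empty in env, in either case spelling."""
    return [name for name in PROXY_VARS
            if not env.get(name) and not env.get(name.lower())]
```

Usage, with 1234 standing in for the (hypothetical) agent PID: `print(missing_proxy_vars(read_process_env(1234)))`. An empty list means all three variables reached the service.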
According to the warning in the log:
##[warning]An error occurred while sending the request.
the task hit an error while sending the request. To get more detail, you can enable debug logging by setting the variable system.debug to true.
If it worked fine previously and you haven't changed your build definition, the problem is most likely caused by your network or the Azure DevOps agent server. You could use a private agent to check whether it is related to the hosted agent.
As for the cancel request being ignored while the job keeps hanging, you can set "Build job cancel timeout in minutes" in the build options, so the job won't hang indefinitely.
Hope this helps.
We had the same issue, and it really looks like a proxy problem. Since we don't need the result files uploaded, setting the following task input helped:
publishRunAttachments: false
The procedure of my mainframe job has a step that performs an EXCHANGE between a clone table and its base table. This step fails every time the job runs, with a resource unavailable error. The resource is a package for another program that reads the base table used in my job.
Since the job fails with a timeout error, I usually restart it. But to fix this permanently, is it possible to increase the timeout limit for this EXCHANGE? In the IBM manual I found "SET CURRENT LOCK TIMEOUT 30" for this, but is that valid here? My EXCHANGE statement between the clone and base table is coded in a control card. Is there any way I can increase the timeout so that the job does not fail?
If any further details are required, please let me know.
Any help on this is appreciated.
I have a build for an Ionic project with E2E testing on SauceLabs. The build times out after 49 min 17 sec (50 min). All of my jobs run well and log output frequently, at least every 1-2 min. The timeout happens consistently at 50 min.
My build meets all the requirements mentioned here for avoiding a timeout. Also, according to the docs there is no timeout for the build as a whole, so the build shouldn't time out the way it does. Is there any resolution for this issue?
Here are some of the logs:
https://travis-ci.org/magician03/moodlemobile2/builds/241500777
https://travis-ci.org/magician03/moodlemobile2/builds/241414546
https://travis-ci.org/magician03/moodlemobile2/builds/241401570
Your build ends with this message:
The job exceeded the maximum time limit for jobs, and has been terminated.
This is expected behaviour. There is a limit of 50 minutes, as explained in the Travis CI documentation:
Build Timeouts
It is very common for test suites or build scripts to hang. Travis CI has specific time limits for each job, and will stop the build and add an error message to the build log in the following situations:
A job produces no log output for 10 minutes
A job on travis-ci.org takes longer than 50 minutes
A job running on OS X infrastructure takes longer than 50 minutes - (applies to travis-ci.org or travis-ci.com)
A job on Linux infrastructure on travis-ci.com takes longer than 120 minutes
Some common reasons why builds might hang:
Waiting for keyboard input or another kind of human interaction
Concurrency issues (deadlocks, livelocks and so on)
Installation of native extensions that take very long time to compile
There is no timeout for a build; a build will run as long as all the jobs do as long as each job does not timeout.
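The quoted rules can be written as a small check, which makes it clear that frequent log output only avoids the 10-minute silence rule and does nothing about the 50-minute cap on travis-ci.org. A sketch using exactly the limits quoted above (they date from this answer and may no longer be current); the function name and parameters are hypothetical.

```python
from typing import Optional


def travis_limit_hit(site: str, os_name: str, minutes: float,
                     max_silence_minutes: float) -> Optional[str]:
    """Return which quoted Travis CI job limit is exceeded, or None.

    site: "org" (travis-ci.org) or "com" (travis-ci.com)
    os_name: "linux" or "osx"
    minutes: total job duration; max_silence_minutes: longest gap without log output
    """
    if max_silence_minutes >= 10:
        return "no log output for 10 minutes"
    # travis-ci.org jobs and all OS X jobs share the 50-minute limit;
    # only Linux jobs on travis-ci.com get 120 minutes.
    limit = 50 if site == "org" or os_name == "osx" else 120
    if minutes > limit:
        return f"job ran longer than {limit} minutes"
    return None
```

For the job in the question (travis-ci.org, output every 1-2 minutes), only the 50-minute rule can apply, so the fix has to come from making the job itself faster or splitting it.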
Your build doesn't complete within that limit because of a specific issue in your build.
I would ask another question focused on your code and on Node.js rather than on this limit.
I develop native apps, so I can't help much on this topic, but I found this ticket:
It seems that they updated Node.js to 6.x and tested it on Travis CI, the builds failed, and they currently don't use Travis CI, so I would ask MoodleHQ directly in their forums.
jleyva Juan Leyva added a comment - 03/Nov/16 6:05 PM Dani, can you enable in your Travis account your moodlemobile2 repository so we can see if Travis is working with the new dependencies? I already changed the tracker fields so Travis is aware of the branch (but it requires first you to enable you forked moodlemobile2 repo)
jleyva Juan Leyva added a comment - 03/Nov/16 7:31 PM Builds are failing: https://travis-ci.org/dpalou/moodlemobile2/builds/172896611
Protractor or Jasmine or whatever is not working with this dependency set
You can also check related issues and compare; this configuration works using:
node_modules/.bin/protractor e2e-tests/protractor.conf.js --directConnect
In protractor.conf.js, change chromeOnly to directConnect.
After starting my app for the first time, the first request always times out. If I tail the logs when this request is invoked, Play appears to be doing some kind of required post-compilation work: resolving the same list of dependencies that were resolved on startup and initiating the database connection. Is there any way to force this extra work to happen on startup?
When you run in prod mode, this will not happen.
Even if you're not building for production yet, you can run a test instance in prod mode. You will need to make sure to set an application secret.