Build-Deploy-Test Workflow does not reach Test Agent - deployment

First of all, I'm new to automated testing with TFS, so I'm currently trying a lot of things by trial and error. But I hope you can give me some ideas about what I might be doing wrong.
I have set up an environment in Microsoft Test Manager as described in several MSDN blogs, for example http://msdn.microsoft.com/en-us/library/hh873102.aspx#Prerequisites.
My infrastructure is as follows: the build server and the test controller are on the same machine (Win2012), but my test agent (Win2008R2) is on another machine. Everything is TFS 2013 / VS 2013 Update 3.
I have a build definition which builds the tests on the build server into a specified drop folder; this is working fine. And a second build definition which uses the LabDefaultTemplate.11 for working with the environment. In it I specified the environment, the other build definition, a deployment script and also my test plan with the associated automated tests.
But when I try to start my Build-Deploy-Test workflow, I get one of two outcomes.
First, without a deployment script specified, the workflow reaches the test run and then never finishes (it ran for 16 hours before I aborted it).
Second, with the deployment script specified, I seem to get the same behaviour, but at the deployment step. I let it run for almost 20 minutes and watched my agent on the VM console.
All I saw was that it sometimes disconnected for a short moment and then was online again.
So my question is: why does this build workflow never finish?
The tests are CodedUI tests.
Here is the log of the last run I stopped:
20:37 Overall Build Process
20:37 Application Deployment Workflow
00:00 Update Build Number
00:00 Get Build Details
01:00 If Build is needed
01:00 Do Build
00:00 Start Build Workflow
01:00 Wait For Build To Complete
00:00 Set Build Location
00:00 Get Build Location And Build Number
00:00 Compute build location needed
00:00 Compute build path
00:00 If user selected stored environment
00:00 Get Lab Environment Uri
00:00 Get Lab Environment
00:00 If Restore Snapshot
00:00 No Clean Snapshot
00:00 If Virtual Environment
00:00 If deployment or test needed
00:00 Wait For Environment To Be Ready
19:36 If deployment needed
19:36 Do deployment
00:00 Reserve Environment For Deployment
19:36 Deploy Build on Environment
19:36 Deploying Build
19:36 Run Deployment scripts
19:36 Run Deployment Task Deployment Task Logs for Machine: Win2008R2
00:00 Release Environment From Deployment
If you need more specific information, please leave an answer with what's needed and I will take care of it; I just don't know what you might need and don't want to overload this question.
Edit, Dec 16th 2014:
Here are the requested details:
Deploying Build 00:30:00
Run Deployment scripts 00:30:00
Inputs
Values: Win2008R2TA02 | $(BuildLocation)\Uitest\deploymentScript.cmd $(BuildLocation)
Run Deployment Task00:30:00
Inputs
UseRoleForDeployment: False
MaxWaitTime: 00:30:00
ThrowOnError: True
BuildLocation: \\BuildServer\TFS Build\UiTest.Dev.Build\UiTest.Dev.Build_20141216.1
LabEnvironmentUri: vstfs:///LabManagement/LabEnvironment/11
DeploymentScriptDetails: Win2008R2TA02 | $(BuildLocation)\Uitest\deploymentScript.cmd $(BuildLocation)
I found the deployment script on the internet, but I thought it should work:
REM set build path (%~1 strips surrounding quotes from the first argument)
set "buildlocation=%~1"
REM set deployment path
set "targetdir=C:\deploy"
REM create deployment directory
if not exist "%targetdir%" mkdir "%targetdir%"
REM copy build to the deployment directory (paths quoted because the drop path contains spaces)
xcopy /c /y /e "%buildlocation%\*.*" "%targetdir%"
REM if you are using a deployment package you can run it here, after copying it to the deployment directory
And I call it as follows:
$(BuildLocation)\Uitest\deploymentScript.cmd $(BuildLocation)
In the event log of the test controller, some errors keep repeating while the build is running:
Service Control Manager:
The Visual Studio Test Controller service terminated unexpectedly. It
has done this 1 time(s). The following corrective action will be
taken in 0 milliseconds: Restart the service.
.Net Runtime:
Application: QTController.exe Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.NullReferenceException
Stack:
at Microsoft.TeamFoundation.TestManagement.Controller.BuildDropDownloadManager.GetLocalSharePath(System.String)
at Microsoft.VisualStudio.QualityTools.Controller.DeploymentTaskMonitor.IsUsingServerDrop(Microsoft.VisualStudio.TestTools.Execution.DeploymentTask)
at Microsoft.VisualStudio.QualityTools.Controller.DeploymentTaskMonitor.ProvisionBuildSharePermision(Microsoft.VisualStudio.TestTools.Execution.DeploymentTask)
at Microsoft.VisualStudio.QualityTools.Controller.DeploymentTaskMonitor.ProcessNewDeploymentTasks()
at Microsoft.VisualStudio.QualityTools.Controller.DeploymentTaskMonitor.Poll(System.Object)
at System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at System.Threading.ThreadHelper.ThreadStart(System.Object)
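Since the crash happens in GetLocalSharePath / ProvisionBuildSharePermision, a quick sanity check is whether the account running QTController.exe can reach the drop share at all, for example with a minimal PowerShell one-liner run under that account (the path is the BuildLocation from the details above):

# Run under the test controller's service account; the path is the BuildLocation from the log above.
Test-Path '\\BuildServer\TFS Build\UiTest.Dev.Build\UiTest.Dev.Build_20141216.1'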
I tried this solution, but it does not help, and regarding the link: I have no "Web Access" folder whose cache I could clear.
https://social.msdn.microsoft.com/Forums/vstudio/en-US/c1ba09a8-5b8a-4c6d-8635-83085593647b/tfs-2013-deploy-to-testlab-is-failing-labdefaulttemplate11?forum=tfsbuild
I'm really stuck on this problem. Any help would be very much appreciated.

OK, I found the issue.
The problem was a corrupt Visual Studio and MTM installation on the build server, where the test controller is also installed.
If you stumble over this problem, try the following things:
- Are all components on the same version (in my case 2013 Update 3)? Check the agent and the controller as well (their configuration dialogs have an About button). A quick way to compare the file versions from PowerShell is sketched below.
- Open MTM on the server and check whether it can switch between the different test plans of the project (in my case that brought up an error, and that's why the controller could not deliver the test cases to the agent).
If it is like in my case, try uninstalling and reinstalling Visual Studio and MTM.
My order was: uninstall MTM first, then VS; after that, install VS, then MTM, then the current update.
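For the version check, something like this works (a rough sketch; the paths are the default VS 2013 install locations and may differ on your machines):

# Default install paths for the 2013 test controller/agent binaries (adjust if installed elsewhere).
$files = 'C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE\QTController.exe',
         'C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE\QTAgentService.exe'
Get-Item $files -ErrorAction SilentlyContinue |
    Select-Object Name, @{ Name = 'FileVersion'; Expression = { $_.VersionInfo.FileVersion } }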

The build report log in Visual Studio is not that great. Have you checked the log on the TFS website? (Failed build report -> Diagnostics -> View Log, or on the website under Builds -> definition -> Diagnostics.)
Have you set the timeout for the deployment script? 16 hours is a long time.
Also, can you link the deployment code to the script? Do you have any other tests that do run on the agent? Have you checked the login/password and deactivated the login screen on the agent? TFS has a problem accessing it otherwise.

Related

How to restart ubuntu agent on Azure Build?

Long story short: after trying out several solutions from here to kill the VBCSCompiler before the MSBuild task, none of which worked out, I am going to try one more option before calling it a day and sticking with the windows2019 agent, even though the build time will be tripled.
So, after the NuGet restore task I need to reboot the Ubuntu agent (hosted in the Azure Pipelines agent pool). I added a command-line task, but I am not sure what to write for the script.
I tried the script command sudo reboot, but it didn't work (it kept running for a while, so I just cancelled the build).
I've also tried this command instead:
init 6
but I got an error:
Failed to set wall message, ignoring: Interactive authentication required.
Failed to reboot system via logind: Interactive authentication required.
Failed to open initctl fifo: Permission denied
Failed to talk to init daemon.
That's not possible: when you restart a hosted agent, your build will fail. That is the reason it is not allowed.

VSTS Agent service can't get code coverage data when running as Local System

Short version: Two builds, A and B, for the same commit, both running on our build server using the VSTS agent service
Build A:
Agent running as Network Service
Saves a .coverage file of 267kb, showing non-zero % code coverage
Runs successfully, no errors, same test logs as build B
Build B:
Agent running as Local System
Saves a .coverage file of 1kb, showing 0% code coverage
Runs successfully, no errors (except that a quality gate fails due to the 0% code coverage, but that's intentional), same test logs as build A
Extra info:
The VSTS Agent service normally ran on our build server as "Network Service", and all was well. Until we had to modify the agent service to run as "Local System" so it could access a cert in the "LocalMachine" store which we need for Azure AD service auth. After that, it still claimed to do everything successfully except that the code coverage file is tiny and claims 0% code coverage, which is weird because the unit tests are certainly being run. The logs from the two test tasks are exactly identical (except for things like timestamps and the build numbers), no helpful warnings or errors in there.
I'm sure it's probably not ideal to run the agent as Local System, but that account has more permissions than network service does, so I don't know how it could be a permission issue. I've probably just made a mistake in setting up something, but it seems like the only way out of this is to either
- give Network Service extra permissions (bad)
- regenerate / move the Azure AD service principal cert into the "CurrentUser" cert store for Network Service (feels bad but I'm not sure why; a sketch of this is below)
- set up a new service account and resign ourselves to having permissions issues forevermore (ugh)
Can we somehow diagnose what exactly is going on with this test task without resorting to procmon? Or is there a better way to manage this stuff?
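For reference, option two would look roughly like this (a minimal sketch; the PFX path and password are placeholders, and it has to run under the agent's service account, e.g. from a pipeline PowerShell step, because Cert:\CurrentUser refers to that account):

# Import the Azure AD service principal certificate into the agent account's CurrentUser\My store.
# Placeholder path and password; must run as the account the VSTS agent service uses.
$pfxPassword = ConvertTo-SecureString 'placeholder-password' -AsPlainText -Force
Import-PfxCertificate -FilePath 'C:\secure\aad-sp-cert.pfx' -CertStoreLocation Cert:\CurrentUser\My -Password $pfxPassword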
Well this is rather annoying: I fixed it, but I don't know how. While demonstrating it to a colleague, all I did was repeat my previous steps of rebooting the server and switching the agent service back and forth between the two accounts a couple of times, at which point the problem stopped being reproducible. It seems this is one of those mysteriously vanishing problems that hides whenever you try too hard to investigate it. Hopefully it doesn't come back...

Azure Service Fabric publish upgrade from Visual Studio - PowerShell Script Error

I am trying to publish an upgrade of a Service Fabric application from Visual Studio 2017 to our Azure Service Fabric Cluster. In mid-September, I successfully published an upgrade of this same app with same PowerShell script to SFC with no issues. I am now trying to upgrade it at the next version number and suddenly getting this error.
I get the following error during Publish, related to Powershell.
2>Started executing script 'Deploy-FabricApplication.ps1'.
2>powershell -NonInteractive -NoProfile -WindowStyle Hidden -ExecutionPolicy Bypass -Command ". 'C:\Users\pj\Source\Workspaces\VDevelopment\trunk\Services\Sources\src\For.Application.ServiceFabric.Sources\Scripts\Deploy-FabricApplication.ps1' -ApplicationPackagePath 'C:\Users\pj\Source\Workspaces\VDevelopment\trunk\Services\Sources\src\For.Application.ServiceFabric.Sources\pkg\Debug' -PublishProfileFile 'C:\Users\pj\Source\Workspaces\VDevelopment\trunk\Services\Sources\src\For.Application.ServiceFabric.Sources\PublishProfiles\Cloud.xml' -DeployOnly:$false -ApplicationParameter:#{} -UnregisterUnusedApplicationVersionsAfterUpgrade $false -OverrideUpgradeBehavior 'None' -OverwriteBehavior 'SameAppTypeAndVersion' -SkipPackageValidation:$false -ErrorAction Stop"
2>Copying application package to image store...
2>Upload to Image Store succeeded
2>Registering application type...
2>Register application type started. Use Get-ServiceFabricApplicationType to query for status.
2>Running Image Builder process ...
2>Application package is registered.
2>Start upgrading application...
2>aka.ms/upgrade-defaultservices
2>Start-ServiceFabricApplicationUpgrade : aka.ms/upgrade-defaultservices
2>At C:\Program Files\Microsoft SDKs\Service
2>Fabric\Tools\PSModule\ServiceFabricSDK\Publish-UpgradedServiceFabricApplication.ps1:317 char:13
2>+ Start-ServiceFabricApplicationUpgrade @UpgradeParameters
2>+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2> + CategoryInfo : InvalidOperation: (Microsoft.Servi...usterConnection:ClusterConnection) [Start-ServiceFa
2> bricApplicationUpgrade], FabricException
2> + FullyQualifiedErrorId : UpgradeApplicationErrorId,Microsoft.ServiceFabric.Powershell.StartApplicationUpgrade
2>
2>Finished executing script 'Deploy-FabricApplication.ps1'.
2>Time elapsed: 00:07:39.0407526
2>The PowerShell script failed to execute.
========== Build: 1 succeeded, 0 failed, 10 up-to-date, 0 skipped ==========
========== Publish: 0 succeeded, 1 failed, 0 skipped ==========
Any idea what's going on here? Again, when I last published this in September, with the same script, no issues at all, and I haven't made any changes to the solution other than upgrading the Manifest versions to push it out as a new upgraded version.
I noted this S/O thread: Getting error as part of trying to upgrade Service Fabric Application using Start-ServiceFabricApplicationUpgrade and saw that the user's error was similar, but the answer does not apply to my issue, because all three steps in that answer are definitely included in my PowerShell deploy script.
I can add the deployment script if helpful, but will wait until that is requested as it's long, and I only want to post it here if someone feels it's needed to diagnose.
You are getting this error because you are changing some parameters in a DefaultService that are not allowed by default.
The link aka.ms/upgrade-defaultservices shown in the error logs explains this:
Some default service parameters defined in the application manifest can also be upgraded as part of an application upgrade. Only the service parameters that support being changed through Update-ServiceFabricService can be changed as part of an upgrade. The behavior of changing default services during application upgrade is as follows:
1) Default services in the new application manifest that do not already exist in the cluster are created.
2) Default services that exist in both the previous and new application manifests are updated. The parameters of the default service in the new application manifest overwrite the parameters of the existing service. The application upgrade will roll back automatically if updating a default service fails.
3) Default services that do not exist in the new application manifest are deleted if they exist in the cluster. Note that deleting a default service will result in deleting all that service's state and cannot be undone.
Also, there is this other SO question about the same thing: Default service descriptions can not be modified as part of upgrade set EnableDefaultServicesUpgrade to true
Item 1 above is the common approach, where new services are added to the solution and later created during the upgrade without errors; items 2 and 3 are the restricted cases that require EnableDefaultServicesUpgrade.
Item 2 is what is described in the answer you've added: you changed MinReplicaSize and TargetReplicaSize to 1 during a manual update. When SF validated the state of your service for the upgrade, it identified the difference and prevented the upgrade from continuing; if you had set the cluster setting EnableDefaultServicesUpgrade to true, it would have continued and overridden the default values.
Item 3 would have hit you when you removed the service and added it again: if you had changed or misspelled the name, SF's default settings would prevent the deletion of the old service.
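If you do want to allow cases 2 and 3, EnableDefaultServicesUpgrade is a cluster setting (in the ClusterManager section of fabricSettings). For an Azure cluster, a minimal sketch using the AzureRM.ServiceFabric module (resource group and cluster name are placeholders; for a standalone cluster you would edit the cluster configuration JSON instead):

# Allow default services to be updated/removed as part of an application upgrade.
# Placeholder resource group / cluster name; requires the AzureRM.ServiceFabric module.
Set-AzureRmServiceFabricSetting -ResourceGroupName 'my-rg' -Name 'my-sf-cluster' `
    -Section 'ClusterManager' -Parameter 'EnableDefaultServicesUpgrade' -Value 'true'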
Regarding the solution you've found (delete and recreate): it is not ideal.
In scenarios where you have stateful services running in production it would be risky to apply, because you would have to back up the state, re-deploy the services, and restore the backup; in some cases, depending on what these changes are, you wouldn't be able to restore the backup, because it has to match the original service definitions (partition type, number, and so on). You would also lose the benefits of rolling upgrades, and your service might go down for a while if these backups are big.
The issue had to do with us trying to push out the application with mismatched node instances. We have a stateful service running under this application that is supposed to have MinReplicaSize and TargetReplicaSize set to 3. Yesterday, due to an issue, we deleted and re-created this service inside the SF Explorer. Upon doing so, it reset the replica size parameters back to 1. So we used a Powershell script to change them back to 3, but that script did not include all the necessary commands to get the service back to the exact state it was in before we deleted it. So today when we went to upgrade the app, the app in SFC wouldn't accept an upgrade from VS deployment, because of mismatches between what was in the parameters of the solution vs. what was in our SFC. To resolve, we re-deleted those services first, then deployed from VS, and no more error.
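For reference, putting the replica set sizes back with Update-ServiceFabricService looks roughly like this (a sketch; the service URI is a placeholder), although as described above that alone did not restore every property of the deleted service:

# Connect-ServiceFabricCluster must have been called first; the service URI is a placeholder.
Update-ServiceFabricService -Stateful -ServiceName 'fabric:/MyApp/MyStatefulService' `
    -MinReplicaSetSize 3 -TargetReplicaSetSize 3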

Travis Build fails after 49 min even when logging output for all jobs every 1-2 min

I have a build for an Ionic project and its E2E testing with SauceLabs. The build is timing out after 49 min 17 sec (~50 min). All of my jobs are running well and logging output frequently, at least every 1-2 minutes. The timeout happens consistently at 50 minutes.
My build meets all the requirements mentioned here for not suffering a timeout. Also, there is supposedly no timeout for the build as a whole, as mentioned in the docs, so the build shouldn't time out the way it does. Any ideas how to resolve this issue?
Here are some of the logs:
https://travis-ci.org/magician03/moodlemobile2/builds/241500777
https://travis-ci.org/magician03/moodlemobile2/builds/241414546
https://travis-ci.org/magician03/moodlemobile2/builds/241401570
Your build ends with this message:
The job exceeded the maximum time limit for jobs, and has been
terminated.
This is the expected behaviour. There is a limit of 50 minutes, as explained here and here:
Build Timeouts
It is very common for test suites or build scripts to hang. Travis CI
has specific time limits for each job, and will stop the build and add
an error message to the build log in the following situations:
A job produces no log output for 10 minutes
A job on travis-ci.org takes longer than 50 minutes
A job running on OS X infrastructure takes longer than 50 minutes - (applies to travis-ci.org or travis-ci.com)
A job on Linux infrastructure on travis-ci.com takes longer than 120 minutes
Some common reasons why builds might hang:
Waiting for keyboard input or another kind of human interaction
Concurrency issues (deadlocks, livelocks and so on)
Installation of native extensions that take a very long time to compile
There is no timeout for a build; a build will run as long as all the jobs do, as long as each job does not time out.
Your build doesn't complete in time because of a specific issue in your build. I would ask another question focused on your code and language (node_js), not on this limit.
I develop native apps, so I cannot help much on this topic, but I found this ticket: it seems that they updated Node.js to 6.x, tested it using Travis CI, it failed, and currently they don't use Travis CI, so I would ask MoodleHQ directly in their forums.
jleyva Juan Leyva added a comment - 03/Nov/16 6:05 PM Dani, can you
enable in your Travis account your moodlemobile2 repository so we can
see if Travis is working with the new dependencies? I already changed
the tracker fields so Travis is aware of the branch (but it requires
first you to enable you forked moodlemobile2 repo)
jleyva Juan Leyva added a comment - 03/Nov/16 7:31 PM Builds are
failing: https://travis-ci.org/dpalou/moodlemobile2/builds/172896611
Protractor or Jasmine or whatever is not working with this dependency
set
You can also check related issues and compare; this configuration works using:
node_modules/.bin/protractor e2e-tests/protractor.conf.js --directConnect
(in protractor.conf.js, change chromeOnly to directConnect)

How to trigger a build within a build chain after x days?

I am currently using TeamCity to deploy a web application to Azure Cloud Services. We typically deploy using PowerShell scripts to the staging slot and thereafter do a manual swap (staging to production) in the Azure Portal.
After the swap, we typically leave the staging slot active with the old production deployment for a few days (in case we need to revert/back out of the deployment) and thereafter delete it - this is a manual process.
I am looking to automate this process using TeamCity. My intended solution is to have a TeamCity build kick off x days after the deployment build has succeeded (the details of the build steps are irrelevant, since I'd probably use PowerShell again to delete the staging slot).
This plan has pointed me to look into TeamCity build chains, snapshot dependencies, etc.
What I have done so far is:
- correctly created the build chain by creating a snapshot dependency on the deployment build configuration, and
- created a Finish Build Trigger.
At the moment, this approach kicks off the dependent build 'Delete Azure Staging Web' (B) immediately after the deployment build has succeeded. However, I would like this to be a delayed build, running x days later.
Looking at the above build chain, I would like the build B to run on 13-Aug-2016 at 7.31am (if x=3)
I have looked into the Schedule Trigger option as well, but I am slightly lost as to how I could use it to achieve this. As far as I understand, using a cron expression will result in the build running continuously, which is not what I want - I would like build B to execute only once.
Yes, this can be done by making use of the REST API.
I've made a small sample which should convey the fundamental steps. It is a PowerShell script that clears the triggers on another build configuration (determined by a parameter value in the script) and adds a scheduled trigger with a start time X days on from the current time (also determined by a parameter value in the script).
1) Add a PowerShell step at the end of the main build and run add-scheduled-trigger as source code (a sketch of what such a script could look like is shown below).
2) Update the parameter values in the script:
$BuildTypeId - the id of the configuration you want to add the trigger to
$NumberOfDays - the number of days ahead that you want to schedule the trigger for
There is admin / admin embedded in the script as the username / password for REST API authentication.
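A minimal sketch of what such an add-scheduled-trigger script could look like (server URL, build configuration id and credentials are placeholders; the trigger type and property names are what a scheduled trigger typically looks like in the REST API, so GET an existing trigger from your own server first to confirm them):

# --- adjust these ---
$TeamCityUrl  = 'http://teamcity.example.com'   # placeholder
$BuildTypeId  = 'DeleteAzureStagingWeb'         # id of the configuration to add the trigger to (placeholder)
$NumberOfDays = 3                               # days ahead to schedule the trigger for
$securePwd    = ConvertTo-SecureString 'admin' -AsPlainText -Force
$credential   = New-Object System.Management.Automation.PSCredential -ArgumentList 'admin', $securePwd

$triggersUrl = "$TeamCityUrl/httpAuth/app/rest/buildTypes/id:$BuildTypeId/triggers"

# Remove any existing triggers on the target configuration.
$existing = Invoke-RestMethod -Uri $triggersUrl -Credential $credential -Headers @{ Accept = 'application/xml' }
foreach ($t in $existing.triggers.trigger) {
    Invoke-RestMethod -Uri "$triggersUrl/$($t.id)" -Method Delete -Credential $credential
}

# Add a scheduled (cron) trigger for the same time of day, X days from now.
$when = (Get-Date).AddDays($NumberOfDays)
$body = @"
<trigger type="schedulingTrigger">
  <properties>
    <property name="schedulingPolicy" value="cron"/>
    <property name="cronExpression_sec" value="0"/>
    <property name="cronExpression_min" value="$($when.Minute)"/>
    <property name="cronExpression_hour" value="$($when.Hour)"/>
    <property name="cronExpression_dm" value="$($when.Day)"/>
    <property name="cronExpression_month" value="$($when.Month)"/>
    <property name="cronExpression_dw" value="?"/>
    <property name="triggerBuildWithPendingChangesOnly" value="false"/>
  </properties>
</trigger>
"@
Invoke-RestMethod -Uri $triggersUrl -Method Post -Body $body -ContentType 'application/xml' -Credential $credential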
Once this is done you should see a scheduled trigger created / updated each time you build the first configuration.
Hope this helps