VSTS Build jobs freeze sporadically - azure-devops

I'm using Visual Studio Team Services (online) with an in-house build agent. While running a job, the build agent will randomly freeze: the job is still active, but there are no updates to the console and no errors in the event logs. If I open the agent's _diag folder, the log just repeats the lines below until the agent decides to continue work.
17:02:19.850546 LogFileTimer_Callback - enter (20)
17:02:19.850546 LogFileTimer_Callback - processing job 7b9229d0-524e-4138-b6b3-33f630d109c6
17:02:19.850546 LogFileTimer_Callback - found 0 records for job 7b9229d0-524e-4138-b6b3-33f630d109c6
17:02:19.850546 LogFileTimer_Callback - leave
17:02:20.100159 StatusTimer_Callback - enter (27)
17:02:20.100159 StatusTimer_Callback - processing job 7b9229d0-524e-4138-b6b3-33f630d109c6
17:02:20.100159 StatusTimer_Callback - leave
17:02:20.240566 ConsoleTimer_Callback - enter (17)
17:02:20.240566 ConsoleTimer_Callback - Inside Lock
17:02:20.240566 ConsoleTimer_Callback - processing job 7b9229d0-524e-4138-b6b3-33f630d109c6
17:02:20.240566 ConsoleTimer_Callback - leave
17:02:20.755392 ConsoleTimer_Callback - enter (22)
17:02:20.755392 ConsoleTimer_Callback - Inside Lock
17:02:20.755392 ConsoleTimer_Callback - processing job 7b9229d0-524e-4138-b6b3-33f630d109c6
17:02:20.755392 ConsoleTimer_Callback - leave
17:02:20.864598 StatusTimer_Callback - enter (18)
17:02:20.864598 StatusTimer_Callback - processing job 7b9229d0-524e-4138-b6b3-33f630d109c6
17:02:20.864598 StatusTimer_Callback - leave
We have tried deleting the work folder and uninstalling and reinstalling the agent, but it still freezes on random jobs. Any idea what else I could look into to find out why this is happening?

I just checked one of my own logs and found the same entries scattered throughout it, for example while restoring packages, uploading logs, or retrieving files.
These entries don't indicate an error. You could try creating a new agent on another machine to see whether the freeze still occurs there.
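To see how long the agent really sat idle, one option is to scan the _diag log for stretches in which nothing but the three timer callbacks was logged. The following is only a sketch: it assumes every line has the "HH:MM:SS.ffffff Source - message" shape shown in the question, and the function name and the 60-second threshold are illustrative choices, not part of the agent.

```python
import re
from datetime import datetime

# Matches lines such as "17:02:19.850546 LogFileTimer_Callback - enter (20)".
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{1,6})\s+(\S+)\s+-\s+(.*)$")

# The three callbacks that appear in the log excerpt from the question.
TIMER_SOURCES = {
    "LogFileTimer_Callback",
    "StatusTimer_Callback",
    "ConsoleTimer_Callback",
}


def idle_stretches(lines, min_seconds=60.0):
    """Return (start, end) times of runs in which only timer callbacks were logged."""
    stretches = []
    start = last = None

    def close_run():
        # Record the current run if it lasted long enough.
        if start is not None and (last - start).total_seconds() >= min_seconds:
            stretches.append((start.time(), last.time()))

    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore lines that do not follow the assumed format
        stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        if match.group(2) in TIMER_SOURCES:
            if start is None:
                start = stamp
            last = stamp
        else:
            close_run()
            start = last = None
    close_run()
    return stretches
```

Feeding it the lines of one _diag file (for example via `open(path).readlines()`) gives a list of time ranges that can be compared with what the build was supposed to be doing at that moment. It does not handle logs that cross midnight.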

Related

How to collect more than 22 event ids with winlogbeat?

I have a task to collect over 500 event IDs from a DC with Winlogbeat, but Windows limits a query to 22 event IDs. I'm using version 6.1.2. I've tried using processors like this:
winlogbeat.event_logs:
  - name: Security
    processors:
      - drop_event.when.not.or:
        - equals.event_id: 4618
        ...
but with these settings the client doesn't work and nothing appears in the logs. If I run it from the exe file, it just starts and stops with no error.
If I configure it the way the official manual describes:
winlogbeat.event_logs:
  - name: Security
    event_id: ...
    processors:
      - drop_event.when.not.or:
        - equals.event_id: 4618
        ...
the client just crashes with "invalid event log key processors found". I've also tried creating a new custom view and taking events from there, but apparently that has the same query limit of 22 event IDs.
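Winlogbeat's event_id setting is documented as accepting ranges (for example 4700-4800) as well as single IDs, so one thing that might be worth trying is collapsing consecutive IDs into ranges before writing the list. The helper below is a hypothetical sketch of that idea (the function name is made up). Whether a range counts as fewer conditions against the Windows query limit is an assumption that would need to be verified on the DC.

```python
def collapse_event_ids(ids):
    """Build a comma-separated list in which consecutive IDs are merged into ranges."""
    ordered = sorted(set(ids))
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the run while the next ID is exactly one higher.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        if i == j:
            parts.append(str(ordered[i]))
        else:
            parts.append(f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ", ".join(parts)


print(collapse_event_ids([4618, 4624, 4625, 4626, 4700]))
```

The resulting string could be pasted as the value of event_id. If the 500 IDs are mostly scattered rather than consecutive, this will not shrink the list much and a processor-based filter is still needed.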

Turning on email notifications breaks MS Release Management

I am running TFS 2013 Update 4, Release Management Client Update 4, Release Management Server Update 4, and Update 4 Deployment Agents. I am using ReleaseTfvcTemplate.12.xml.
When a developer checks in code, TFS Build compiles the code, and if it completes then it is released to the DEV stage. This works fine.
However, turning on emails creates a problem.
Let's say I need to notify 10 people of a deployment and then send those same 10 people "approval" emails after the deployment is accepted, which it automatically is. That's 20 emails.
I turned on verbose logging on the RM server and I see that each email takes 30 seconds to send. They send one at a time, one after the other. So it takes ten minutes to send twenty emails.
The emails start sending as soon as the deployment starts. The actual deployment usually takes around 1 minute. Release Management marks the build as deployed and keeps sending the "deploying" and "approval" emails. Meanwhile the TFS Build Configuration log is stuck waiting at:
Process each ConfigurationsToRelease
Release the build
Run the Release Management build process for the current configruation
If a deployment finishes sending its emails, either because they are turned off or because there are only 3-4 to send, then the TFS Build Configuration log completes the release and the build is marked successful. However, TFS Build will only wait 5 minutes at the "Release the build" step of the ReleaseTfvcTemplate workflow. If it takes longer than 5 minutes to send 20 emails, which it does, the build fails. How do I increase this timeout? I have raised the timeout on every component and tool I could find in Release Management. I even changed some web.config timeout settings.
The end result is I end up with deployed code, Release Management thinks everything went fine, and TFS Build thinks the build failed.
Edit:
Here are some lines I pulled from the verbose RM server logs. Notice the timestamps. (I cut some lines out)
7/28/2015 3:49:48 PM - Verbose - (13008, 12024) - A workflow execution is completed.
7/28/2015 3:49:48 PM - Information - (13008, 12024) - DeploymentControllerServiceProcessor.OnActivityComplete: Workflow completed successfully, accept the deployment step. LocalReleaseId: 596, LocalReleaseStepId: 2158
7/28/2015 3:54:47 PM - Information - (13008, 6952) - DeploymentControllerServiceProcessor.PrepareNotificationForDeployerImplementation: NextActivityReadyForDeployment:
7/28/2015 3:54:47 PM - Information - (13008, 6952) - DeploymentControllerServiceProcessor.GetNextComponentReadyForDeployment: DeploymentEvent:
7/28/2015 3:54:49 PM - Information - (13008, 12024) - Exception in DeploymentControllerServiceProcessor.OnActivityComplete, app.Completed
7/28/2015 3:54:49 PM - Verbose - (13008, 12024) - The request was aborted: The request was canceled.: \r\n\r\n
at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
at Microsoft.TeamFoundation.Release.Data.WebRequest.PlatformHttpClient.EndGetResponse(IAsyncResult asyncResult)
at Microsoft.TeamFoundation.Release.Data.WebRequest.RestClientResponseRetriever.EndGetAsyncMemoryStreamFromResponse(IAsyncResult asyncResult, IPlatformHttpClient platformHttpClient)
at Microsoft.TeamFoundation.Release.Data.WebRequest.RestClientResponseRetriever.EndDownloadString(IAsyncResult asyncResult, IPlatformHttpClient platformHttpClient)
at Microsoft.TeamFoundation.Release.Data.WebRequest.RestClient.EndPost(IAsyncResult asyncResult)
at Microsoft.TeamFoundation.Release.Data.Proxy.RestProxy.HttpRequestor.<>c__DisplayClass1.b__0(String url, String body)
at Microsoft.TeamFoundation.Release.Data.Proxy.RestProxy.BaseNotificationServiceProxy.SendNotification(Int32 releaseId, String releaseName, String applicationVersionName, String stageTypeName, String environmentName, Int32 releaseStepId, Int32 releaseStepTypeId, Boolean releaseStepIsAutomated)
at Microsoft.TeamFoundation.Release.Workflow.Services.ReleaseWorkflowService.CreateNextReleaseStep(Release release, Stage stage, StageStep stageStep, Int32 releaseStageRank, Int32 trialNumber)
at Microsoft.TeamFoundation.Release.Workflow.Services.ReleaseWorkflowService.MoveToNextReleaseStep(Release release, Stage currentStage, ReleaseStep currentReleaseStep)
at Microsoft.TeamFoundation.Release.Workflow.Services.ReleaseWorkflowService.MoveWorkflowForward(Release release, ReleasePath releasePath, Stage currentStage, ReleaseStep currentReleaseStep, Int32 lastStepRankOfCurrentStage)
at Microsoft.TeamFoundation.Release.Workflow.Services.ReleaseWorkflowService.AcceptStep(Release release, Int32 releaseStepId, Int32 actualApproverId, String approverComment, Nullable1 deferredDateTime)
at Microsoft.TeamFoundation.Release.Workflow.Services.ReleaseWorkflowService.CreateNextReleaseStep(Release release, Stage stage, StageStep stageStep, Int32 releaseStageRank, Int32 trialNumber)
at Microsoft.TeamFoundation.Release.Workflow.Services.ReleaseWorkflowService.MoveToNextReleaseStep(Release release, Stage currentStage, ReleaseStep currentReleaseStep)
at Microsoft.TeamFoundation.Release.Workflow.Services.ReleaseWorkflowService.MoveWorkflowForward(Release release, ReleasePath releasePath, Stage currentStage, ReleaseStep currentReleaseStep, Int32 lastStepRankOfCurrentStage)
at Microsoft.TeamFoundation.Release.Workflow.Services.ReleaseWorkflowService.AcceptStep(Release release, Int32 releaseStepId, Int32 actualApproverId, String approverComment, Nullable1 deferredDateTime)
at Microsoft.TeamFoundation.Release.ServiceProcessor.Processor.DeploymentControllerServiceProcessor.OnActivityComplete(String workflow, WorkflowApplicationCompletedEventArgs e)
There is a setting on the "Administration" tab under "Settings" for "TFS Trigger Deployment Timeout". If you increase that, the build won't fail after 5 minutes.
I'd invest some time looking at why it takes 30 seconds to send each email, though. I've never seen that particular problem... it could be a network issue, or an issue with your mail server.
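The numbers in the question line up with that timeout. As a rough sketch (the function names are illustrative, and the only inputs are the figures and timestamps quoted in the question), the gap between the "accept the deployment step" entry and the next logged entry can be compared with the estimated time needed to send all the emails:

```python
from datetime import datetime

# Timestamp layout used in the Release Management server log excerpt.
LOG_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def seconds_between(start, end):
    """Seconds elapsed between two log timestamps."""
    begin = datetime.strptime(start, LOG_TIME_FORMAT)
    finish = datetime.strptime(end, LOG_TIME_FORMAT)
    return (finish - begin).total_seconds()


def notification_seconds(recipients, mails_per_recipient, seconds_per_mail):
    """Total send time when mails go out strictly one after another."""
    return recipients * mails_per_recipient * seconds_per_mail


# Workflow completed at 3:49:48 PM; the next entries appear at 3:54:47 PM.
gap = seconds_between("7/28/2015 3:49:48 PM", "7/28/2015 3:54:47 PM")

# 10 people, a "deploying" and an "approval" mail each, 30 seconds per mail.
needed = notification_seconds(10, 2, 30)

print(f"gap in log: {gap:.0f} s, time needed for mails: {needed} s")
```

The gap comes out just under five minutes while the mails need about ten, which is consistent with the request being canceled at a 5-minute limit rather than with the deployment itself failing.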

Selenium looping through jenkins and permission denied in cli

After struggling to get proper test suites, I'm now pretty disappointed that, while following this tutorial as closely as possible (pretty straightforward, right?), "Setting up Selenium server on a headless Jenkins CI build machine", Jenkins keeps looping on the current build, outputting:
So I decided to run a Selenium build by hand on the CI machine, and got this:
user@machine:/var/log$ export DISPLAY=":99" && java -jar /var/lib/selenium/selenium- server.jar -browserSessionReuse -htmlSuite *firefox http://staging.site.com /var/lib/jenkins/jobs/project/workspace/tests/selenium/testsuite.html /var/lib/jenkins/jobs/project/workspace/logs/selenium.html
24 janv. 2012 19:27:56 org.openqa.grid.selenium.GridLauncher main
INFO: Launching a standalone server
19:27:59.927 INFO - Java: Sun Microsystems Inc. 20.0-b11
19:27:59.929 INFO - OS: Linux 3.0.0-14-generic amd64
19:27:59.951 INFO - v2.17.0, with Core v2.17.0. Built from revision 15540
19:27:59.958 INFO - Will recycle browser sessions when possible.
19:28:00.143 INFO - RemoteWebDriver instances should connect to: http://127.0.0.1:4444/wd/hub
19:28:00.144 INFO - Version Jetty/5.1.x
19:28:00.145 INFO - Started HttpContext[/selenium-server/driver,/selenium-server/driver]
19:28:00.147 INFO - Started HttpContext[/selenium-server,/selenium-server]
19:28:00.147 INFO - Started HttpContext[/,/]
19:28:00.183 INFO - Started org.openqa.jetty.jetty.servlet.ServletHandler@16ba8602
19:28:00.184 INFO - Started HttpContext[/wd,/wd]
19:28:00.199 INFO - Started SocketListener on 0.0.0.0:4444
19:28:00.199 INFO - Started org.openqa.jetty.jetty.Server@6f7a29a1
HTML suite exception seen:
java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:900)
at org.openqa.selenium.server.SeleniumServer.runHtmlSuite(SeleniumServer.java:603)
at org.openqa.selenium.server.SeleniumServer.boot(SeleniumServer.java:287)
at org.openqa.selenium.server.SeleniumServer.main(SeleniumServer.java:245)
at org.openqa.grid.selenium.GridLauncher.main(GridLauncher.java:54)
19:28:00.218 INFO - Shutting down...
19:28:00.220 INFO - Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=4444]
While understanding the output isn't that hard, finding out how to fix the issue is.
Has anyone run into this kind of problem before? Thanks.
I only just got past these problems myself, but I was able to run your command when I pointed it at my own .jar, test suite and report file. I suspect the location of your files under
/var/lib/selenium
could be part of the problem. Try putting them somewhere your user has write permission, perhaps under
/home/USERNAME/selenium
Other than that, the only thing I can suggest is to make sure your .jar, test suite and report file are valid.
Also, this part of your command is incorrect (I assume it is a copy-and-paste error from posting to Stack Overflow):
/var/lib/selenium/selenium- server.jar
You are not getting the error I would expect from an incorrect jar location, so I assume something was lost when you pasted it to Stack Overflow.
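The stack trace fails inside File.createNewFile, called from runHtmlSuite, which suggests the server could not create a file, plausibly the results file given as the last argument. A quick way to test that theory is to check whether the current user may create a file in the target directory. This is only a sketch: the function name is made up, and os.access reports what the permission bits allow, which may differ from the real outcome on some filesystems.

```python
import os


def can_create_file(path):
    """Return True if the current user can likely create a file at `path`."""
    parent = os.path.dirname(os.path.abspath(path))
    # Creating a file needs write and search (execute) permission on its directory.
    return os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)
```

Run as the same user that starts the Selenium server, `can_create_file("/var/lib/jenkins/jobs/project/workspace/logs/selenium.html")` returning False would point at the logs directory (missing, or not writable by that user) rather than at the jar or the test suite.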

Inspect and retry resque jobs via redis-cli

I can't run resque-web on my server because of some issues I still have to work on, but I still need to check and retry failed jobs in my Resque queues.
Does anyone have experience peeking at the failed jobs queue to see what the error was, and then retrying a job using the redis-cli command line?
thanks,
Found a solution on the following link:
http://ariejan.net/2010/08/23/resque-how-to-requeue-failed-jobs
In the rails console we can use these commands to check and retry failed jobs:
1 - Get the number of failed jobs:
Resque::Failure.count
2 - Check each error's exception class and backtrace:
Resque::Failure.all(0, 20).each { |job|
  puts "#{job["exception"]} #{job["backtrace"]}"
}
The job object is a hash with information about the failed job; you can inspect it for more details. Note that this only lists the first 20 failed jobs. I'm not sure how to list them all at once, so you will have to vary the values (0, 20) to page through the whole list.
3 - Retry all failed jobs:
(Resque::Failure.count-1).downto(0).each { |i| Resque::Failure.requeue(i) }
4 - Reset the failed jobs count:
Resque::Failure.clear
Retrying the jobs does not remove them from the failed list, so the count stays the same. We must clear the list so that the count goes back to zero.
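For the redis-cli half of the question, a possible approach is to dump the failed list and summarize it outside of Rails. The sketch below rests on assumptions that should be checked against your installation: that Resque uses its default "resque" namespace, that failed jobs are stored as JSON strings in the Redis list resque:failed (readable with `redis-cli LRANGE resque:failed 0 -1`), and that each entry carries "exception", "error" and "payload" fields. The function name is hypothetical, and retrying is not covered here.

```python
import json


def summarize_failures(raw_entries):
    """Turn raw JSON strings from the failed list into (index, class, exception, error)."""
    summary = []
    for index, raw in enumerate(raw_entries):
        job = json.loads(raw)
        payload = job.get("payload") or {}
        summary.append(
            (index, payload.get("class"), job.get("exception"), job.get("error"))
        )
    return summary
```

The index in each tuple is the entry's position in the list, which is the same kind of index that Resque::Failure.requeue(i) takes in the console commands above.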

ClickOnce: DeploymentDownloadException: The operation has timed out

Symptom: ClickOnce installation starts and stops after around 600 kB (out of 2 MB).
Progress bar always stops at the same value (tried ten times).
The error log says "The operation has timed out" (in the inner exception), and the install fails with "DeploymentDownloadException (Unknown subtype)".
Error log details (irrelevant information trimmed):
ERROR DETAILS
Following errors were detected during this operation.
System.Deployment.Application.DeploymentDownloadException (Unknown subtype)
- Downloading http://fullpath/name.dll.deploy did not succeed.
- Source: System.Deployment
- Stack trace:
at System.Deployment.Application.SystemNetDownloader.DownloadSingleFile(DownloadQueueItem next)
at System.Deployment.Application.SystemNetDownloader.DownloadAllFiles()
at System.Deployment.Application.FileDownloader.Download(SubscriptionState subState)
--- Inner Exception ---
System.Net.WebException
- The operation has timed out.
- Source: System
- Stack trace:
at System.Net.ConnectStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at System.Deployment.Application.SystemNetDownloader.DownloadSingleFile(DownloadQueueItem next)
This only happens for two customers. The install works fine for thousands of others. I have found numerous posts via Google, but they either have no answer or offer a generic "the firewall is the issue" or "the customer was using dial-up".
Has anyone solved this? Is this a ClickOnce bug?
Disabling firewall software on the machine did not help because a hardware firewall installed on the network was the cause (FortiGate 30B).
I doubt that it's a bug. However, it seems like it gets stuck at one file in the deployment path. Maybe it is a type of file that is blocked by a firewall.
I would just remove all files but one from the build and see if that gets downloaded ok, and then add the rest of the files one by one (or maybe type by type) and see at what file ClickOnce gets stuck downloading.
If that doesn't seem to do anything, I'd build a dummy app and deploy it with ClickOnce and see if it installs at all on the customer's box.
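Instead of rebuilding the deployment file by file, the same narrowing-down can be approximated by downloading each .deploy URL separately from the affected customer's network and noting which ones fail. This is a minimal sketch with hypothetical function names. It assumes the file URLs are known (for example from the application manifest) and that a plain HTTP download hits the same firewall behavior as ClickOnce, which is not guaranteed.

```python
import urllib.request


def fetch_size(url, timeout=30.0):
    """Download `url` completely and return the number of bytes read."""
    total = 0
    with urllib.request.urlopen(url, timeout=timeout) as response:
        while True:
            chunk = response.read(64 * 1024)
            if not chunk:
                return total
            total += len(chunk)


def find_failing_files(urls, fetch=fetch_size):
    """Try each URL in turn and return (url, error message) for the ones that fail."""
    failures = []
    for url in urls:
        try:
            fetch(url)
        except OSError as exc:
            # urllib.error.URLError and socket timeouts both derive from OSError.
            failures.append((url, str(exc)))
    return failures
```

If only one file type shows up in the result, that supports the idea that the firewall is blocking or stalling that kind of file. The `fetch` parameter exists so the loop can be exercised without a network.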