When I run Parallel >> Manage Configurations..., MATLAB fails the Distributed Job, Parallel Job and Matlabpool tests. My system is dual-core: Intel Core i5 CPU M520 @ 2.40GHz 2.40GHz, 2GB RAM, Win7 64-bit, MATLAB R2011b. After the failed validation, I get the following output:
Validation Details
Configuration: "local" Type: local
-------------------------------------- Stage: Find Resource
Status: Passed Description: Validation passed
Command Line Output: (none)
-------------------------------------- Stage: Distributed Job
Status: Failed Description: The given stage reached the default or
user-specified timeout.
Command Line Output: (none)
Error Report: (none)
Debug Log: LOG FILE OUTPUT:
-------------------------------------- Stage: Parallel Job
Status: Failed Description: The given stage reached the default or
user-specified timeout.
Command Line Output: (none)
Error Report: (none)
Debug Log: LOG FILE OUTPUT:
-------------------------------------- Stage: Matlabpool
Status: Failed Description: A MATLAB pool is already open and might
interfere with further testing. To avoid this, before the next test
run try executing "matlabpool close".
Command Line Output: (none)
Error Report: (none)
Debug Log: (none)
This is pretty much what I get if I've called matlabpool prior to running the validation checks. You did follow the advice in the status report from the Matlabpool stage about closing an open matlabpool, didn't you?
We are using Azure DevOps to build our .NET 4.7.2 application. As part of that, we run the unit tests, which use the NUnit framework and test runner.
It has been running fine for about 18 months, but has just stopped working in the last day :(
It's using the standard template for running the tests and looks like:
- task: VSTest@2
  displayName: "Running tests"
  inputs:
    testSelector: 'testAssemblies'
    testAssemblyVer2: |
      **\*test*.dll
      !**\*TestAdapter.dll
      !**\obj\**
    searchFolder: '$(System.DefaultWorkingDirectory)'
However, now it is failing the step with the following logs:
NUnit Adapter 4.2.0.0: Test execution started
Running all tests in D:\a\1\s\Configuration.Tests\bin\Release\Microsoft.VisualStudio.QualityTools.UnitTestFramework.dll
NUnit3TestExecutor discovered 0 of 0 NUnit test cases using Current Discovery mode, Explicit run
Running all tests in D:\a\1\s\Configuration.Tests\bin\Release\testcentric.engine.metadata.dll
NUnit3TestExecutor discovered 0 of 0 NUnit test cases using Current Discovery mode, Explicit run
Running all tests in D:\a\1\s\Api.Tests\bin\Release\testcentric.engine.metadata.dll
NUnit3TestExecutor discovered 0 of 0 NUnit test cases using Current Discovery mode, Explicit run
Running all tests in D:\a\1\s\CommunicationTests\bin\Release\testcentric.engine.metadata.dll
NUnit3TestExecutor discovered 0 of 0 NUnit test cases using Current Discovery mode, Explicit run
Running all tests in D:\a\1\s\Domain.Tests\bin\Release\testcentric.engine.metadata.dll
NUnit3TestExecutor discovered 0 of 0 NUnit test cases using Current Discovery mode, Explicit run
Running all tests in D:\a\1\s\packages\NUnit3TestAdapter.4.2.1\build\net35\testcentric.engine.metadata.dll
NUnit3TestExecutor discovered 0 of 0 NUnit test cases using Current Discovery mode, Explicit run
NUnit Adapter 4.2.0.0: Test execution complete
No test is available in D:\a\1\s\Configuration.Tests\bin\Release\Microsoft.VisualStudio.QualityTools.UnitTestFramework.dll D:\a\1\s\Configuration.Tests\bin\Release\testcentric.engine.metadata.dll D:\a\1\s\Api.Tests\bin\Release\testcentric.engine.metadata.dll D:\a\1\s\CommunicationTests\bin\Release\testcentric.engine.metadata.dll D:\a\1\s\Domain.Tests\bin\Release\testcentric.engine.metadata.dll D:\a\1\s\packages\NUnit3TestAdapter.4.2.1\build\net35\testcentric.engine.metadata.dll. Make sure that test discoverer & executors are registered and platform & framework version settings are appropriate and try again.
##[error]Could not find testhost
Results File: D:\a_temp\TestResults\VssAdministrator_WIN-FVJ4KUK6IFI_2022-08-18_12_38_44.trx
##[error]Test Run Aborted.
Total tests: Unknown
Passed: 110
Total time: 16.7203 Seconds
Vstest.console.exe exited with code 1.
**************** Completed test execution *********************
Test results files: D:\a_temp\TestResults\VssAdministrator_WIN-FVJ4KUK6IFI_2022-08-18_12_38_44.trx
Created test run: 1080
Publishing test results: 112
Publishing test results to test run '1080'.
TestResults To Publish 112, Test run id:1080
Test results publishing 112, remaining: 0. Test run id: 1080
Published test results: 112
Publishing Attachments: 1
Execution Result Code 1 is non zero, checking for failed results
Completed TestExecution Model...
##[warning]Vstest failed with error. Check logs for failures. There might be failed tests.
##[error]Error: The process 'D:\a_tasks\VSTest_ef087383-ee5e-42c7-9a53-ab56c98420f9\2.205.0\Modules\DTAExecutionHost.exe' failed with exit code 1
##[error]Vstest failed with error. Check logs for failures. There might be failed tests.
Finishing: Running tests
Looking through this log, it seems that the NUnit tests ran successfully, but it might also be trying to run MSTest tests? It is frustrating when DevOps gets an update and it breaks working pipelines.
We have a similar situation.
Our unit tests run with xUnit.
/TestAdapterPath:"D:\a\1\s" Starting test execution, please wait... A
total of 36 test files matched the specified pattern.
2.4828
##[error]Could not find testhost
Data collector 'Code Coverage' message: No code coverage data
available. Profiler was not initialized..
2.0273
##[error]Could not find testhost
Data collector 'Code Coverage' message: No code coverage data
available. Profiler was not initialized..
2.3746
##[error]Could not find testhost
Data collector 'Code Coverage' message: No code coverage data
available. Profiler was not initialized..
1.992
##[error]Could not find testhost
Data collector 'Code Coverage' message: No code coverage data
available. Profiler was not initialized..
4.8409
##[error]Could not find testhost
Data collector 'Code Coverage' message: No code coverage data
available. Profiler was not initialized..
2.1874
##[error]Could not find testhost
I compared the output of the successful run with that of the failed run and found that they used different versions of the test platform. If you don't specify a version, the default is the latest one, which may be a preview release. So I added a step to the YAML to pin a version that works.
- task: VisualStudioTestPlatformInstaller@1
  inputs:
    versionSelector: 'SpecificVersion'
    testPlatformVersion: '17.2.0'
- task: VSTest@2
  inputs:
    platform: '$(buildPlatform)'
    configuration: '$(buildConfiguration)'
    codeCoverageEnabled: True
    vsTestVersion: 'toolsInstaller'
https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/test/vstest?view=azure-devops
https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/tool/vstest-platform-tool-installer?view=azure-devops
Is the on_fail directive of a step run when a previous step has failed?
I'm using these steps:
- name: fail intentionally
  service: busybox
  command: false
- name: check if onfail is called
  service: busybox
  command: true
  on_fail:
    - command: echo reporting failure
Calling jet steps produces the following output:
(step: fail intentionally)
(image: busybox) (service: busybox) Image exists, using cached image
(step: fail intentionally) error ✗
(step: fail intentionally) container exited with a 1 code
My on_fail is not run.
Is that an issue with the jet utility, or would things behave the same in Codeship?
You have defined the on_fail contingency on the second step, which does not fail. Had on_fail been set on the first step (which fails and stops the build), you would have seen the echoed statement.
This behavior would be consistent with a build running in CodeShip Pro.
I tried starting a parallel pool in MATLAB R2015b with the following command:
parpool('local',3);
This command should allocate 3 workers. Instead, I received an error saying that the pool failed to start. The error message is as follows:
Error using parpool (line 94)
Failed to start a parallel pool. (For information in addition to
the causing error, validate the profile 'local' in the Cluster Profile
Manager.)
A similar query was posted at https://nl.mathworks.com/matlabcentral/answers/196549-failed-to-start-a-parallel-pool-in-matlab2015a. Following the suggestions there, I tried to validate the local profile.
Setting distcomp.feature( 'LocalUseMpiexec', false); or distcomp.feature( 'LocalUseMpiexec', true) in startup.m made no difference. Validating the local profile still gives the following errors:
VALIDATION DETAILS
Profile: local
Scheduler Type: Local
Stage: Cluster connection test (parcluster)
Status: Passed
Description:Validation Passed
Command Line Output:(none)
Error Report:(none)
Debug Log:(none)
Stage: Job test (createJob)
Status: Failed
Description:The job errored or did not reach state finished.
Command Line Output:
Failed to determine if job 24 belongs to this cluster because: Unable to
read file 'C:\Users\varad001\AppData\Roaming\MathWorks\MATLAB
\local_cluster_jobs\R2015b\Job24.in.mat'. No such file or directory..
Error Report:(none)
Debug Log:(none)
Stage: SPMD job test (createCommunicatingJob)
Status: Failed
Description:The job errored or did not reach state finished.
Command Line Output:
Failed to determine if job 25 belongs to this cluster because: Unable to
read file 'C:\Users\varad001\AppData\Roaming\MathWorks\MATLAB
\local_cluster_jobs\R2015b\Job25.in.mat'. No such file or directory..
Error Report:(none)
Debug Log:(none)
Stage: Pool job test (createCommunicatingJob)
Status: Skipped
Description:Validation skipped due to previous failure.
Command Line Output:(none)
Error Report:(none)
Debug Log:(none)
Stage: Parallel pool test (parpool)
Status: Skipped
Description:Validation skipped due to previous failure.
Command Line Output:(none)
Error Report:(none)
Debug Log:(none)
I am receiving these errors only on my cluster machine; launching parpool on my standalone PC works perfectly. Is there a way to fix this issue?
When I run Chef-Client in PowerShell and allow the process to output to the screen using the following command:
& Chef-Client -z -r "chef-cookbook"
I get this output:
[2014-11-10T07:20:40-08:00] WARN: No config file found or specified on command line, using command line options.
Starting Chef Client, version 11.16.4
resolving cookbooks for run list: ["chef-cookbook"]
Synchronizing Cookbooks:
- chef-cookbook
- powershell-automation
Compiling Cookbooks...
Converging 2 resources
Recipe: powershell-automation::Port_Configuration
* powershell_script[Port_Configuration] action run (skipped due to not_if)
Recipe: powershell-automation::IIS_InstallAutomation
* powershell_script[IIS_InstallAutomation] action run (skipped due to not_if)
Running handlers:
Running handlers complete
Chef Client finished, 0/0 resources updated in 43.69728 seconds
When I run the same command, but capture it to a variable, using the following command:
$chefOutput = & Chef-Client -z -r "chef-cookbook"
The $chefOutput variable contains:
[2014-11-10T07:23:01-08:00] WARN: No config file found or specified on command line, using command line options.
[2014-11-10T07:23:01-08:00] INFO: Auto-discovered chef repository at C:/Temp
[2014-11-10T07:23:01-08:00] INFO: Starting chef-zero on host localhost, port 8889 with repository at repository at C:/Temp
One version per cookbook
[2014-11-10T07:23:06-08:00] INFO: *** Chef 11.16.4 ***
[2014-11-10T07:23:06-08:00] INFO: Chef-client pid: 3364
[2014-11-10T07:23:37-08:00] INFO: Setting the run_list to [recipe[chef-cookbook]] from CLI options
[2014-11-10T07:23:37-08:00] INFO: Run List is [recipe[chef-cookbook]]
[2014-11-10T07:23:37-08:00] INFO: Run List expands to [chef-cookbook]
[2014-11-10T07:23:37-08:00] INFO: Starting Chef Run for XXXXX.XXX.XXX.XXX.com
[2014-11-10T07:23:37-08:00] INFO: Running start handlers
[2014-11-10T07:23:37-08:00] INFO: Start handlers complete.
[2014-11-10T07:23:37-08:00] INFO: HTTP Request Returned 404 Not Found : Object not found: /reports/nodes/XXXXX.XX.XX.XX.com/runs
[2014-11-10T07:23:37-08:00] INFO: Loading cookbooks [chef-cookbook@2015.1.0, powershell-automation@2015.1.0]
[2014-11-10T07:23:37-08:00] INFO: Processing powershell_script[Port_Configuration] action run (powershell-automation::Port_Configuration line 22)
[2014-11-10T07:23:37-08:00] INFO: Processing bash[Guard resource] action run (dynamically defined)
[2014-11-10T07:23:38-08:00] INFO: bash[Guard resource] ran successfully
[2014-11-10T07:23:38-08:00] INFO: Processing powershell_script[IIS_InstallAutomation] action run (powershell-automation::IIS_InstallAutomation line 16)
[2014-11-10T07:23:43-08:00] INFO: Chef Run complete in 6.346486 seconds
[2014-11-10T07:23:43-08:00] INFO: Running report handlers
[2014-11-10T07:23:43-08:00] INFO: Report handlers complete
Why does this discrepancy between outputs happen?
Note: The output in the variable also contains the timestamps and INFO tags for each line. Based on this, I believe this has to do with how Chef writes its output rather than with PowerShell.
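Chef checks whether stdout is a TTY. When the output is captured into a variable, stdout is no longer a terminal, so Chef switches from the formatted console output to log-style lines with timestamps and INFO tags.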
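A minimal Ruby sketch of that kind of check may make the behaviour easier to picture. This is an illustration only, not Chef's actual source: output_style and report are hypothetical names, and the log line format is merely modelled on the output shown above.

```ruby
require "stringio"

# Hypothetical helper (not Chef's code): choose an output style
# depending on whether the stream is attached to a terminal.
def output_style(io)
  io.tty? ? :formatted : :log
end

# Hypothetical helper: write a message in the chosen style.
def report(io, message)
  if output_style(io) == :formatted
    # Interactive terminal: plain, human-friendly text.
    io.puts message
  else
    # Redirected or captured stream: timestamped log line.
    stamp = Time.now.strftime("%Y-%m-%dT%H:%M:%S%:z")
    io.puts "[#{stamp}] INFO: #{message}"
  end
end

# A StringIO stands in for a captured (non-terminal) stream.
captured = StringIO.new
report(captured, "Starting Chef Run")
puts captured.string
```

Capturing output with $chefOutput = & ... plays the role of the StringIO here: the stream is not a terminal, so the log-style branch is taken.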
I am unable to run resque-web on my server because of some issues I still have to work on, but I still need to check and retry failed jobs in my Resque queues.
Does anyone have experience peeking at the failed jobs queue to see what the error was, and then retrying the job from the redis-cli command line?
thanks,
Found a solution on the following link:
http://ariejan.net/2010/08/23/resque-how-to-requeue-failed-jobs
In the rails console we can use these commands to check and retry failed jobs:
1 - Get the number of failed jobs:
Resque::Failure.count
2 - Check each error's exception class and backtrace:
Resque::Failure.all(0,20).each { |job|
  puts "#{job["exception"]} #{job["backtrace"]}"
}
The job object is a hash with information about the failed job; you can inspect it for more details. Also note that this only lists the first 20 failed jobs. The two arguments are an offset and a limit, so to see the rest you have to vary the values (0, 20) until you have covered the whole list.
3 - Retry all failed jobs:
(Resque::Failure.count-1).downto(0).each { |i| Resque::Failure.requeue(i) }
4 - Reset the failed jobs count:
Resque::Failure.clear
Retrying all the jobs does not reset the counter. We must clear it so it goes back to zero. Note that clear removes the failed job records themselves, so run it only after requeueing.
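To list every failed job rather than only the first 20, the offset can be advanced one page at a time. The sketch below is one way to do that, under a few assumptions: all_failed_jobs is a hypothetical helper, and it expects an object that responds to count and all(offset, limit) the way Resque::Failure does in the calls above. FakeFailures is an in-memory stand-in so the sketch runs without Redis; in a Rails console you would pass Resque::Failure instead.

```ruby
# Hypothetical helper: collect all failed jobs by paging through the list.
# `backend` must respond to `count` and `all(offset, limit)`.
def all_failed_jobs(backend, page_size = 20)
  jobs = []
  offset = 0
  total = backend.count
  while offset < total
    page = backend.all(offset, page_size)
    # Guard against a backend handing back a single Hash instead of an Array.
    page = [page] unless page.is_a?(Array)
    jobs.concat(page.compact)
    offset += page_size
  end
  jobs
end

# In-memory stand-in for the failure backend (an assumption for this demo).
class FakeFailures
  def initialize(jobs)
    @jobs = jobs
  end

  def count
    @jobs.size
  end

  def all(offset, limit)
    @jobs[offset, limit] || []
  end
end

fake = FakeFailures.new(
  (1..45).map { |i| { "exception" => "Error#{i}", "backtrace" => [] } }
)
failed = all_failed_jobs(fake)
failed.each { |job| puts "#{job["exception"]} #{job["backtrace"]}" }
```

In the console, the equivalent call would be all_failed_jobs(Resque::Failure), assuming Resque is already loaded.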