How to fix conflicting Simulink simulation accelerated artifacts issue when running tests in parallel mode? - matlab

My goal is to optimize the time it takes to run a set of simulation test cases. I'm having issues running tests with parallel processing and accelerated simulation features. https://www.mathworks.com/help/simulink/ug/how-the-acceleration-modes-work.html
Context:
I have 29 Simulink files that are called inside a parameterized MATLAB unit test. The Simulink models have a lot of referenced models. Before running a 20-second simulation, each model has to load all its referenced models and create a lot of simulation artifacts in a work folder. Many referenced models are shared between the Simulink projects. Across the projects, 64 of the 187 referenced models run in accelerator mode. Normal mode generates .mexw64 files, and accelerator mode generates .slxc and .mexw64 files in the work folder.
Action:
I run 1 test once, all in Normal mode. My tests succeed.
I run the 29 tests sequentially, all in Normal mode. My tests succeed.
I run 1 test once, then run the 29 tests in parallel clusters, all in Normal mode. My tests succeed. (**See Ref #1)
I run 1 test once, all in Accelerator mode. My tests succeed.
I run the 29 tests sequentially, all in Accelerator mode. My tests succeed.
I run 1 test once, then run the 29 tests in parallel clusters, all in Accelerator mode. My tests fail.
Expected & Results:
I expected my simulations running in accelerator/parallel mode to have the same positive results as in normal/parallel mode. But:
I'm having read/write/build issues with shared resources on parallel threads when I run 2 tests in parallel.
My parallel threads fail when I try to run all 29 at the same time.
Any idea how to fix this?
I tried different build configurations in the simulation and build settings, tried reducing the number of accelerated targets, and read online resources.
Error:
#1
Building with 'Microsoft Visual C++ 2015 (C)'.
LINK : fatal error LNK1104: cannot open file 'D:\GIT\***\***\work\sim_artifacts\***_src_msf.mexw64'
NMAKE : fatal error U1077: 'C:\PROGRA~1\MATLAB\R2017b\bin\win64\mex.EXE' : return code '0xffffffff'
Stop.
The make command returned an error of 2
'An_error_occurred_during_the_call_to_make' is not recognized as an internal or external command,
operable program or batch file.
### Build procedure for model: '***_src' aborted due to an error.
#2
The client lost connection to worker 4. This might be due to network problems,
or the interactive communicating job might have errored.
Ref:
How to fix missing simulink simulation artifacts issue when running test in parallel mode?
https://www.mathworks.com/help/simulink/gui/rebuild.html
https://www.mathworks.com/help/simulink/ug/model-callbacks.html#
https://www.mathworks.com/help/simulink/ug/reuse-simulation-builds-for-faster-simulations.html
https://www.mathworks.com/help/matlab/ref/matlab.unittest.testrunner.runinparallel.html

My problem was a data concurrency issue, and I found a solution.
Solution:
Now I run one test once in Accelerator mode to generate the cache folder (simulation targets and accelerated artifacts). Then I copy the cache folder 29 times and assign each copy to one parallel test run as input. Hence, all parallel tests read and write in different folders and no longer conflict with each other.
Ref:
https://www.youtube.com/watch?v=cKK3qpBjixA
https://www.mathworks.com/help/simulink/ug/model-reference-simulation-targets-1.html
https://www.mathworks.com/help/simulink/ug/not-recommended-using-sim-function-within-parfor.html
You need to address any workspace access issues and data concurrency issues to produce useful results. Specifically, the simulations need to create separately named output files and workspace variables. Otherwise, each simulation overwrites the same workspace variables and files, or can have collisions trying to write variables and files simultaneously.
Code:
function setParCacheFolder()
%% get Simulink file names
originalPath = path;
addpath(genpath('***\***\***\***'));
file = getFileNames('FOOTEST_*.slx', '***\***\***\***');
for i = 1:length(file)
    %% open/load Simulink system
    [~, name, ~] = fileparts(file{i});
    sys{i} = name; %#ok<AGROW>
    open_system(sys{i});
    %% copy cache folder (one private copy per model)
    copyfile(fullfile(pwd, '***', 'work', 'sim_artifacts'), ...
        fullfile(pwd, '***', 'work', ['sim_artifacts' sys{i}]));
    %% display current cache folder (sanity check)
    get_param(0, 'CacheFolder')
    %% set model-specific cache folder
    set_param(0, 'CacheFolder', fullfile(pwd, '***', 'work', ['sim_artifacts' sys{i}]));
    %% save Simulink model
    save_system(sys{i}, [], 'SaveDirtyReferencedModels', 'on', 'OverwriteIfChangedOnDisk', true);
    bdclose('all');
end
path(originalPath);
end

Related

Get test execution logs during test run by Nunit Test Engine

We are using NUnit Test Engine to run tests programmatically.
It looks like after we add FrameworkPackageSettings.NumberOfTestWorkers to the runner code, the test run for our UI tests hangs during execution. I'm not able to see at what time or event the execution hangs, because the test runner returns test result logs (in XML) only when the entire execution ends.
Is there a way to get test execution logs for each test?
I've added InternalTraceLevel and InternalTraceWriter, but these logs are something different (BTW, looks like ParallelWorker#9 hangs even when writing to console :) )
_package.AddSetting(FrameworkPackageSettings.InternalTraceLevel, "Debug");
var nunitInternalLogsPath = Path.GetDirectoryName(Uri.UnescapeDataString(new Uri(Assembly.GetExecutingAssembly().CodeBase).AbsolutePath)) + "\\NunitInternalLogs.txt";
Console.WriteLine("nunitInternalLogsPath: "+nunitInternalLogsPath);
StreamWriter writer = File.CreateText(nunitInternalLogsPath);
_package.AddSetting(FrameworkPackageSettings.InternalTraceWriter, writer);
The result file, with default name TestResult.xml is not a log. That is, it is not a file produced, line by line, as execution proceeds. Rather, it is a picture of the result of your entire run and therefore is only created at the end of the run.
InternalTrace logs are actual logs in that sense. They were created to allow us to debug the internal workings of NUnit. We often ask users to create them when an NUnit bug is being tracked. Up to four of them may be produced when running a test of a single assembly under nunit3-console...
A log of the console runner itself
A log of the engine.
A log of the agent used to run tests (if an agent is used)
A log received from the test framework running the tests
In your case, #1 is not produced, of course. Based on the content of the trace log, we are seeing #4, triggered by the package setting passed to the framework. I have seen the situation where the log is incomplete in the past but not recently. The logs normally use auto-flush to ensure that all output is actually written.
If you want to see a complete log #2, then set the WorkDirectory and InternalTrace properties of the engine when you create it.
However, as stated, these logs are all intended for debugging NUnit, not for debugging your tests. The console runner produces another "log" even though it isn't given that name. It's the output written to the console as the tests run, especially that produced when using the --labels option.
If you want some similar information from your own runner, I suggest producing it yourself. Create either console output or a log file of some kind, by processing the various events received from the tests as they execute. To get an idea of how to do this, I suggest examining the code of the NUnit3 console runner. In particular, take a look at the TestEventHandler class, found at https://github.com/nunit/nunit-console/blob/version3/src/NUnitConsole/nunit3-console/TestEventHandler.cs

Running scalapbc command from a thread pool

I am trying to run the scalapbc command from a thread pool, with each thread running one scalapbc invocation.
When I do that, I get an error of the form:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007f32dd71fd70, pid=8346, tid=0x00007f32e0018700
As per my Google search, this issue occurs when the /tmp folder is full or is being accessed by multiple processes simultaneously.
My question is: is there a way to issue scalapbc commands using threading without getting the above error? How can I make sure that the temp folders used by the individual threads don't interfere with each other?
This issue occurs most of the time when I run the code, but sometimes the build passes as well.

failed using cuda-gdb to launch program with CUPTI calls

I'm having this weird issue: I have a program that uses CUPTI callbackAPI to monitor the kernels in the program. It runs well when it's directly launched; but when I put it under cuda-gdb and run, it failed with the following error:
error: function cuptiSubscribe(&subscriber, (CUpti_CallbackFunc)my_callback, NULL) failed with error CUPTI_ERROR_NOT_INITIALIZED
I've tried all examples in CUPTI/samples and concluded that programs that use callbackAPI and activityAPI will fail under cuda-gdb. (They are all well-behaved without cuda-gdb) But the fail reason differs:
If I have calls from activityAPI, then once run it under cuda-gdb, it'll hang for a minute then exit with error:
The CUDA driver has hit an internal error. Error code: 0x100ff00000001c Further execution or debugging is unreliable. Please ensure that your temporary directory is mounted with write and exec permissions.
If I have calls from callbackAPI like my own program, then it'll fail out much sooner with the same error:
CUPTI_ERROR_NOT_INITIALIZED
Any experience on this kinda issue? I really appreciate that!
According to NVIDIA forum posting here and also referred to here, the CUDA "tools" must be used uniquely. These tools include:
CUPTI
any profiler
cuda-memcheck
a debugger
Only one of these can be "in use" on a code at a time. It should be fairly easy for developers to use a profiler, or cuda-memcheck, or a debugger independently. But a possible takeaway for those using CUPTI, who also wish to use another CUDA "tool" on the same code, is to provide a coding method to disable CUPTI use in their application when they wish to use another tool.
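A minimal sketch of such a disable switch (Python used purely to illustrate the pattern; the `MYAPP_DISABLE_CUPTI` variable and both profiler classes are made up for this example, and a real CUDA application would apply the same gate in its own language around `cuptiSubscribe`):

```python
import os

class CuptiProfiler:
    """Stand-in for the real CUPTI-backed profiler (hypothetical)."""
    def start(self):
        pass  # would call cuptiSubscribe(...) here

class NullProfiler:
    """No-op profiler used when another CUDA tool owns the process."""
    def start(self):
        pass  # deliberately does nothing

def make_profiler():
    """Pick the profiler at startup. When the app runs under cuda-gdb,
    launch it with MYAPP_DISABLE_CUPTI=1 so CUPTI is never initialized."""
    if os.environ.get("MYAPP_DISABLE_CUPTI", "0") == "1":
        return NullProfiler()
    return CuptiProfiler()
```

The point is simply that the choice happens before any CUPTI call, so the debugger and CUPTI are never active in the same process.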

How can I confirm how many tests are running in parallel?

I'm running my unit tests on a 16-core machine. I can't see any difference in elapsed time when using no parallelization parameter, --workers=1, or --workers=32. I'd like to confirm that NUnit really is running the expected number of simultaneous tests, so that I don't spend time hunting down a non-existent problem in my test code.
I have [Parallelizable] (default scope, ParallelScope.Self) on the common base class; it is not defined on any other class or method. I'm using nunit3-console, both via Jenkins and on my local command line.
Is there a way to tell that tests are running in parallel? NUnit is reporting the correct number of worker threads, but there's no report saying (for example) how many tests were run in each thread.
Run Settings
ProcessModel: Multiple
RuntimeFramework: net-4.5
WorkDirectory: C:\Jenkins\workspace\myproject
NumberOfTestWorkers: 16
I can log the start and finish times of each test, then manually check that there's a reasonable number of overlaps; is there any simpler and more repeatable way of getting what I want?
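For reference, the overlap check I have in mind would look something like this (a Python sketch; the timestamps would come from whatever start/finish logging I add to the tests):

```python
def max_concurrency(intervals):
    """Given (start, end) pairs for each test, return the largest number
    of tests that were ever running at the same time."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a test starts
        events.append((end, -1))    # a test finishes
    # Sort finishes before starts at equal timestamps, so back-to-back
    # tests do not count as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

If the suite really runs with 16 workers, the peak should be well above 1; a peak of exactly 1 would confirm serial execution.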
I turned on --trace=Verbose which produces a few files. InternalTrace.<pid1>.log and InternalTrace.<pid2>.<dll_name>.log together contained a thorough description of what was happening during the tests. The per-agent log (the one with the DLL name in the log file name) was pretty clear about the state of parallelization.
16:13:40.701 Debug [10] WorkItemDispatcher: Directly executing test1
16:13:47.506 Debug [10] WorkItemDispatcher: Directly executing test2
16:13:52.847 Debug [10] WorkItemDispatcher: Directly executing test3
16:13:58.922 Debug [10] WorkItemDispatcher: Directly executing test4
16:14:04.492 Debug [10] WorkItemDispatcher: Directly executing test5("param1")
16:14:09.720 Debug [10] WorkItemDispatcher: Directly executing test5("param2")
16:14:14.618 Debug [10] WorkItemDispatcher: Directly executing test5("param3")
That third field looks like a thread ID to me. So I believe that the agent is running only one test at a time, even though (I think..) I've made all the tests parallelizable and they're all in different test fixtures.
Now I've just got to figure out what I've done wrong and why they're not running in parallel...
I could be mistaken, but it seems to me that setting the [Parallelizable] attribute is not enough to make tests run in parallel. You also need to create nodes that will run the tests. So if you use Jenkins, you have to create Jenkins slaves, and only then will your tests run in parallel.
So what I want to say is that, as I understand it, there is no possibility to run tests in parallel on one PC.
If there is such a possibility (to run tests in parallel on the same machine), it would be really great, and I would be glad to hear about it!

Launching matlab remotely on windows via ssh? Impossible?

Howdy, I am trying to run Matlab remotely on Windows via OpenSSH installed with Cygwin, but launching Matlab on Windows without the GUI seems to be impossible.
If I am logged in locally, I can launch matlab -nodesktop -nodisplay -r script, and Matlab will launch a stripped-down GUI and run the command.
However, this is impossible to do remotely via ssh, as Matlab needs to display the GUI.
Does anyone have any suggestions or work arounds?
Thanks,
Bob
Short story: is your script calling exit()? Are you using "-wait"?
Long story: I think you're fundamentally out of luck if you want to interact with it, but this should work if you just want to batch jobs up. Matlab on Windows is a GUI application, not a console application, and won't interact with character-only remote connectivity. But you can still launch the process. Matlab will actually display the GUI - it will just be in a desktop session on the remote computer that you have no access to. But if you can get it to do your job without further input, this can be made to work, for some value of "work".
Your "-r script" switch is the right direction. But realize that on Windows, Matlab's "-r" behavior is to finish the script and then go back to the GUI, waiting for further input. You need to explicitly include an "exit()" call to get your job to finish, and add try/catches to make sure that exit() gets reached. Also, you should use a "-logfile" switch to capture a copy of all the command window output to a log file so you can see what it's doing (since you can't see the GUI) and have a record of prior runs.
Also, matlab.exe is asynchronous by default. Your ssh call will launch Matlab and return right away unless you add the "-wait" switch. Check the processes on the machine you're sshing to; Matlab may actually be running. Add -wait if you want it to block until finished.
One way to do this is to use -r to call a standard job wrapper script that initializes your libraries and paths, runs the job, and does cleanup and exit. You'll also want to make a .bat wrapper that sets up the -logfile switch to point to a file with the job name, timestamp, and other info in it. Something like this at the M-code level:
function run_batch_job(jobname)
try
    init_my_matlab_library(); % by calling classpath(), javaclasspath(), etc.
    feval(jobname); % assumes jobname is an M-file on the path
catch err
    warning('Error occurred while running job %s: %s', jobname, err.message);
end
try
    exit();
catch err
    % Yes, exit() can throw errors
    java.lang.System.exit(1); % scuttle the process hard to make sure the job finishes
end
% If your code makes it to here, your job will hang
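The launcher side of this could be sketched as follows (Python standing in for the .bat wrapper; the log directory and file-name layout are assumptions, though -wait, -logfile, and -r are the switches discussed above):

```python
import datetime

def build_matlab_batch_cmd(jobname, log_dir=r"C:\batch_logs"):
    """Build the matlab.exe argv for a blocking, logged batch run.
    run_batch_job is the M-code wrapper above; log_dir is an assumed
    location you would create ahead of time."""
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    logfile = "%s\\%s_%s.log" % (log_dir, jobname, stamp)
    return ["matlab",
            "-wait",                              # block until Matlab exits
            "-logfile", logfile,                  # capture command-window output
            "-r", "run_batch_job('%s')" % jobname]

# The resulting list would be passed to subprocess.run(...) or joined
# into the command line that ssh executes on the Windows host.
```

Time-stamping the log file name keeps a record of prior runs, as suggested above, and -wait makes the ssh call block until the job finishes.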
I've set up batch job systems using this style in Windows Scheduler, Tidal, and TWS before. I think it should work the same way under ssh or other remote access.
A Matlab batch system on Windows like this is brittle and hard to manage. Matlab on Windows is fundamentally not built to be a headless batch execution system; assumptions about an interactive GUI are pervasive in it and hard to work around. Low-level errors or license errors will pop up modal dialog boxes and hang your job. The Matlab startup sequence seems to have race conditions. You can't set the exit status of MATLAB.exe. There's no way of getting at the Matlab GUI to debug errors the job throws. The log file may be buffered and you lose output near hangs and crashes. And so on.
Seriously consider porting to Linux. Matlab is much more suitable as a batch system there.
If you have the money or spare licenses, you could also use the Matlab Distributed Computing toolbox and server to run code on remote worker nodes. This can work for parallelization or for remote batch jobs.
There are two undocumented hacks that reportedly fix a similar problem. They are not guaranteed to solve your particular problem, but they are worth a try. Both of them depend on modifying the java.opts file:
-Dsun.java2d.pmoffscreen=false
Setting this option fixes a problem of extreme GUI slowness when launching Matlab on a remote Linux/Solaris computer.
-Djava.compiler=NONE
This option disables the Java just-in-time compiler (JITC). Note that it has no effect on the Matlab interpreter JITC. It has a similar effect to running Matlab with the '-nojvm' command-line option. Note that this prevents many of Matlab's GUI capabilities. Unfortunately, in some cases there is no alternative: for example, when running on a remote console, or when running pre-2007 Matlab releases on Intel-based Macs. In such cases, using the undocumented '-noawt' command-line option, which enables the JVM yet prevents Java GUI creation, is a suggested compromise.
Using PuTTY, use ssh -X remote "matlab"; it should work.