Calculation on GPU leads to driver error "stopped responding" in MATLAB

I have this little nonsense script here which I am executing in MATLAB R2013b:
clear all;
n = 2000;
times = 50;
i = 0;

tCPU = tic;
disp 'CPU::'
A = rand(n, n);
B = rand(n, n);
disp '::Go'
for i = 0:times
    CPU = A * B;
end
tCPU = toc(tCPU);

tGPU = tic;
disp 'GPU::'
A = gpuArray(A);
B = gpuArray(B);
disp '::Go'
for i = 0:times
    GPU = A * B;
end
tGPU = toc(tGPU);

fprintf('On CPU: %.2f sec\nOn GPU: %.2f sec\n', tCPU, tGPU);
Unfortunately, after execution I receive a message from Windows saying: "Display driver stopped working and has recovered."
I assume this means that Windows did not get a response from my graphics card's driver or something. The script itself returned without errors:
>> test
CPU::
::Go
GPU::
::Go
On CPU: 11.01 sec
On GPU: 2.97 sec
But no matter whether the GPU runs out of memory or not, MATLAB is not able to use the GPU device until I restart it. If I don't restart MATLAB, I just receive these warnings from CUDA:
>> test
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
CPU::
::Go
GPU::
Error using gpuArray
An unexpected error occurred during CUDA execution.
The CUDA error was:
the launch timed out and was terminated
Error in test (line 21)
A = gpuArray(A);
Does anybody know how to avoid this issue or what I am doing wrong here?
If needed, my GPU Device:
>> gpuDevice
ans =
CUDADevice with properties:
Name: 'GeForce GTX 660M'
Index: 1
ComputeCapability: '3.0'
SupportsDouble: 1
DriverVersion: 6
ToolkitVersion: 5
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
FreeMemory: 1.9037e+09
MultiprocessorCount: 2
ClockRateKHz: 950000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1

The key piece of information is this part of the gpuDevice output:
KernelExecutionTimeout: 1
This means that the host display driver is active on the GPU you are running the compute jobs on. The NVIDIA display driver contains a watchdog timer which kills any task that takes more than a predefined amount of time without yielding control back to the driver for a screen refresh. This is intended to prevent a long-running or stuck compute job from rendering the machine unresponsive by freezing the display. The runtime of your MATLAB script is clearly exceeding the display driver's watchdog limit. Once that happens, the compute context held on the device is destroyed and MATLAB can no longer operate with the device. You might be able to reinitialise the context by calling reset, which I guess runs cudaDeviceReset() under the covers.
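A minimal sketch of that recovery step (reset and gpuDevice are the standard Parallel Computing Toolbox calls; the variable name is just illustrative):
d = gpuDevice;   % handle to the currently selected GPU
reset(d);        % clears all gpuArray data on the device and reinitialises the context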
There is a lot of information about this watchdog timer on the web, for example in this Stack Overflow question. How to modify the timeout depends on your OS and hardware. The simplest ways to avoid it are to not run CUDA code on a display GPU, or to increase the granularity of your compute jobs so that no single operation has a runtime which exceeds the timeout limit. Or just write faster code...
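For the "increase the granularity" route, a hedged sketch of what that could look like for the script above (wait is the standard Parallel Computing Toolbox synchronisation call; the placement is just an illustration) is to synchronise after each multiplication so the GPU never accumulates a long queue of back-to-back kernels:
A = gpuArray(rand(n, n));
B = gpuArray(rand(n, n));
for i = 0:times
    GPU = A * B;
    wait(gpuDevice);   % block MATLAB until the queued GPU work has finished
end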

Related

How to fix error [USF-XSim-62] when simulating a project with Xilinx Vivado (also using DPI-C)?

I have a Verilog project that makes use of a testbench written in SystemVerilog and a few imports/exports of functions through the DPI-C interface.
When attempting to simulate, I get an xsim error (as far as I can tell) and the simulation stops. I have struggled with this issue for a while, and there is no specific info given in the Vivado terminal together with the error.
The exact error received is:
ERROR: [USF-XSim-62] 'elaborate' step failed with error(s) while executing 'path/to/proj/<proj_name>/<proj_name>.sim/sim_1/behav/xsim/elaborate.bat' script. Please check that the file has the correct 'read/write/execute' permissions and the Tcl console output for any other possible errors or warnings.
ERROR: [Vivado 12-4473] Detected error while running simulation. Please correct the issue and retry this operation.
launch_simulation: Time (s): cpu = 00:00:01 ; elapsed = 00:00:06 . Memory (MB): peak = 1412.562 ; gain = 0.000
ERROR: [Common 17-39] 'launch_simulation' failed due to earlier errors.
The file is created by Vivado itself (so it should have the permissions to access it), and this error appears and disappears at random. For example, modifying the testbench file and then reverting the modification (to force recompilation) sometimes allows the simulation to run, seemingly at random.
What is even more confusing is that there are some "safe states" of the testbench code in which the simulation always runs. My initial hunch was that it was related to the DPI-C functions, but I tried altering the files in many ways and didn't find any obvious correlation.
Notes
I am using Vivado 2021.2
I have implemented an automated TCL script to compile the C files using xsc and insert them into the project directory
I am using xelab command line arguments to link the C code compiled with xsc: -sv_root path/to/xsc -sv_lib dpi.
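For reference, the flow described in the last two points boils down to roughly the following, assuming the HDL sources have already been compiled with xvlog (the C file name and snapshot name are placeholders; the -sv_root/-sv_lib values are the ones quoted above, and xil_defaultlib.tb is the testbench unit from the log below):
xsc dpi_functions.c
xelab xil_defaultlib.tb -sv_root path/to/xsc -sv_lib dpi -s tb_snapshot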
Update
When I run xelab by itself (with -v), it gives the following output (after static elaboration and simulation data flow analysis passed):
SDG Object Count: 1284, SDG Object Memory Usage: 133 KB.
Time Resolution for simulation is 1ps
Compiling package std.std
ICR Memory Usage: 491KB, 8192KB
Compiling package xil_defaultlib.$unit_tb_sv
ICR Memory Usage: 491KB, 8192KB
Compiling module xil_defaultlib.decoder_interface
ICR Memory Usage: 500KB, 8192KB
Compiling module xil_defaultlib.lut_biases
ICR Memory Usage: 584KB, 8192KB
Compiling module xil_defaultlib.saturate_default
ICR Memory Usage: 758KB, 8192KB
Compiling module xil_defaultlib.variable_nodes_default
ICR Memory Usage: 1832KB, 8192KB
Compiling module xil_defaultlib.check_nodes_default
INFO: [XSIM 43-4009] "abs_prev_proc_elem", written at line 26 in file "D:/Projects/Matlab/NN_BP_BCH/nn-min-sum-decoding/hardware/check_nodes.v", has also been read in this always_comb/always_latch block and is not added to the sensitivity list.
INFO: [XSIM 43-4009] "reg_min", written at line 30 in file "D:/Projects/Matlab/NN_BP_BCH/nn-min-sum-decoding/hardware/check_nodes.v", has also been read in this always_comb/always_latch block and is not added to the sensitivity list.
INFO: [XSIM 43-4009] "reg_sign", written at line 31 in file "D:/Projects/Matlab/NN_BP_BCH/nn-min-sum-decoding/hardware/check_nodes.v", has also been read in this always_comb/always_latch block and is not added to the sensitivity list.
INFO: [XSIM 43-4009] "temp_reg", written at line 52 in file "D:/Projects/Matlab/NN_BP_BCH/nn-min-sum-decoding/hardware/check_nodes.v", has also been read in this always_comb/always_latch block and is not added to the sensitivity list.
ICR Memory Usage: 6896KB, 8192KB
Compiling module xil_defaultlib.interm_layer
ICR Memory Usage: 6902KB, 8192KB
Compiling module xil_defaultlib.llr_to_out_default
ICR Memory Usage: 6927KB, 8192KB
Compiling module xil_defaultlib.out_layer_default
ICR Memory Usage: 7674KB, 8192KB
Compiling module xil_defaultlib.decoder_top_default
ICR Memory Usage: 8281KB, 16384KB
Compiling module xil_defaultlib.tb
child killed: unknown signal

Gem5 in full system mode running SPEC2006 runs out of memory

What I am trying to do is run a SPEC2006 benchmark (namely 410.bwaves) in full system mode.
I have made a .rcS script to pass to the fs.py script, and the command I type to start the simulation is as follows:
build/X86/gem5.opt configs/example/fs.py --script="../run_bwaves.rcS" --disk-image=ubuntu-14.04.img --kernel=x86_64-vmlinux-2.6.22.9
The result after some time is:
Free swap: 0kB
131072 pages of RAM
3650 reserved pages
18 pages shared
0 pages swap cached
Out of memory: kill process 807 (bwaves) score 13154 or a child
Killed process 807 (bwaves)
/tmp/script: line 39: 807 Killed ./bwaves
Full linux output here
Gem5 output here
I am guessing it has something to do with this line: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
but I am not sure.
I have tried adding a --mem-size=... flag, but it breaks the simulation with a Memory size not divisible by page size error.
If anyone could help me I would be glad.
Edit: As suggested in a comment, I used a large enough --mem-size value divisible by the page size. The error has now turned into:
bwaves[807]: segfault at 00007ffee647624c rip 0000000000410eb5 rsp 00007fff664761c0 error 4
/tmp/script: line 39: 807 Segmentation fault ./bwaves
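For reference, the invocation after the edit would look roughly like this; the 8192MB value is an assumption based on the DRAM capacity warning quoted earlier, and any size gem5 accepts that is a multiple of the page size should behave the same way:
build/X86/gem5.opt configs/example/fs.py --script="../run_bwaves.rcS" --disk-image=ubuntu-14.04.img --kernel=x86_64-vmlinux-2.6.22.9 --mem-size=8192MB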

ddrescue read non tried blocks

I'm trying to rescue a 1TB disk which has read errors. Because I didn't have a free 1TB drive, I created a RAID 0 of two 500GB drives.
I used the command line from Wikipedia for the first run:
sudo ddrescue -f -n /dev/sdk /dev/md/md_test /home/user/rescue.map
ddrescue already completed this run after approximately 20 hours and more than 7000 read errors.
Now I'm trying to do a second run
sudo ddrescue -d -f -v -r3 /dev/sdk /dev/md/md_test /home/user/rescue.map
to read the non-tried blocks, but ddrescue gives me this:
GNU ddrescue 1.23
About to copy 1000 GBytes from '/dev/sdk' to '/dev/md/md_test'
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 128 sectors Initial skip size: 19584 sectors
Sector size: 512 Bytes
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 635060 MB, tried: 0 B, bad-sector: 0 B, bad areas: 0
Current status
ipos: 1000 GB, non-trimmed: 0 B, current rate: 0 B/s
opos: 1000 GB, non-scraped: 0 B, average rate: 0 B/s
non-tried: 365109 MB, bad-sector: 0 B, error rate: 0 B/s
rescued: 635060 MB, bad areas: 0, run time: 0s
pct rescued: 63.49%, read errors: 0, remaining time: n/a
time since last successful read: n/a
Copying non-tried blocks... Pass 1 (forwards)
ddrescue: Write error: Invalid argument
I can't figure out what this write error means; I have already searched the manual for answers.
Any help is appreciated! Thanks!
After a while I found the cause of the write error: the capacity of the corrupt drive is 931.5G, but the total capacity of the RAID 0 was just 931.3G.
I realized it while taking a closer look at the output of the lsblk command.
So I rebuilt the RAID 0 array with three 500G drives, and ddrescue now works as expected.
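For anyone hitting the same problem, a quick way to compare the exact sizes up front (device names as in the question; -b prints sizes in bytes so rounding doesn't hide a small mismatch):
lsblk -b -o NAME,SIZE /dev/sdk /dev/md/md_test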

functional_coverage not showing proper result

I have developed a simple uvm testbench to verify a simple adder. I have used functional coverage to monitor the coverage as well. The adder is 8 bit with inputs a and b and the output is c, which is 9 bits.
I have developed the transaction with 8-bit rand logic fields for a and b.
In the sequence, I run it with repeat(100), which randomizes the transaction and drives a and b to the DUT. The best-case functional coverage for this scenario is (100/256)*100%, i.e. around 40%, assuming that no value is repeated. I sample the coverage in my scoreboard and read the coverage result in the env.
Here are my code snippets:
// monitor class
covergroup cg;
    a : coverpoint sb_item.a;
    b : coverpoint sb_item.b;
endgroup
...
function void write(input input_seq_item i);
    sb_item = i;
    if (sb_item.c == sb_item.a + sb_item.b)
    begin
        `uvm_info("SB", "OK!", UVM_LOW)
        cg.sample();
    end
    else
        `uvm_error("SB", $sformatf("ERROR! %b + %b = %b", sb_item.a, sb_item.b, sb_item.c))
endfunction
// env class
...
task run_phase(uvm_phase phase);
    sb.cg.stop();
    phase.raise_objection(this);
    sb.cg.start();
    seq.start(sqr);
    phase.drop_objection(this);
    sb.cg.stop();
    `uvm_info("env", $sformatf("The coverage collected is %f", sb.cg.a.get_coverage()), UVM_LOW)
endtask
...
When I run the code, I get coverage of around 81%. The results are shown below:
# KERNEL: UVM_INFO /home/runner/monitor.sv(56) # 996: uvm_test_top.env.sb [SB] OK!
# KERNEL: UVM_INFO /home/runner/env.sv(34) # 996: uvm_test_top.env [env] The coverage collected is 85.937500
# KERNEL: UVM_INFO /home/build/vlib1/vlib/uvm-1.2/src/base/uvm_objection.svh(1271) # 996: reporter [TEST_DONE] 'run' phase is ready to proceed to the 'extract' phase
# KERNEL: UVM_INFO /home/build/vlib1/vlib/uvm-1.2/src/base/uvm_report_server.svh(855) # 996: reporter [UVM/REPORT/SERVER]
# KERNEL: --- UVM Report Summary ---
# KERNEL:
# KERNEL: ** Report counts by severity
# KERNEL: UVM_INFO : 204
# KERNEL: UVM_WARNING : 0
# KERNEL: UVM_ERROR : 0
# KERNEL: UVM_FATAL : 0
# KERNEL: ** Report counts by id
# KERNEL: [Driver] 100
# KERNEL: [RNTST] 1
# KERNEL: [SB] 100
# KERNEL: [TEST_DONE] 1
# KERNEL: [UVM/RELNOTES] 1
# KERNEL: [env] 1
# KERNEL:
# RUNTIME: Info: RUNTIME_0068 uvm_root.svh (521): $finish called.
# KERNEL: Time: 996 ns, Iteration: 61, Instance: /top, Process: #INITIAL#14_0#.
# KERNEL: stopped at time: 996 ns
# VSIM: Simulation has finished. There are no more test vectors to simulate.
exit
# FCOVER: Covergroup Coverage data has been saved to "fcover.acdb" database.
# VSIM: Simulation has finished.
Can anyone explain what mistake I am doing here? Is the coverage cumulative over all runs?
Whether the coverage is cumulative over all runs depends on what you're analyzing; I'm guessing you're analyzing only one simulation, though. Your calculation is correct: the maximum coverage you could get per test is about 40% (roughly 40% for each coverpoint, averaged together), but that is highly unlikely to be reached.
What you also need to look at (aside from the percentage) is which bins are actually getting created. I don't think you're getting a bin for each value of a or b; some of them are probably clumped together (e.g. a in [0..3] would be one bin and so on, leaving you with 256/4 = 64 bins instead of 256). Each coverpoint has an option called auto_bin_max, whose default value is 64. If you set this to 256, or explicitly declare a (range) bin for each value that a or b could take, you'll get the coverage percentage you'd expect.
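A sketch of that first suggestion, reusing the covergroup from the question (256 is simply the number of values an 8-bit field can take):
covergroup cg;
    a : coverpoint sb_item.a { option.auto_bin_max = 256; }
    b : coverpoint sb_item.b { option.auto_bin_max = 256; }
endgroup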
As a side note, you typically don't create a coverage bin for every value of a data item, since this doesn't really make sense. In a typical device there are so many values the data items could take that you can't verify them all. What you would do, however, is declare bins for the more "interesting" situations. In your case, the interesting values are 0, 8'hff and anything in between. It is also particularly interesting to cross a and b and check the combinations, especially the case where a and b are both 8'hff, as that is where your result would overflow 8 bits and produce a carry.
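A hedged sketch of what such "interesting value" bins plus a cross could look like (the bin names and ranges are illustrative, not taken from the original testbench):
covergroup cg;
    a : coverpoint sb_item.a { bins zero = {0}; bins max = {8'hff}; bins mid = {[1:8'hfe]}; }
    b : coverpoint sb_item.b { bins zero = {0}; bins max = {8'hff}; bins mid = {[1:8'hfe]}; }
    a_x_b : cross a, b;   // combinations of a and b, including both at 8'hff (carry-out case)
endgroup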

Matlab: NNTRAINTOOL requires Java which is not available

I am using a Windows 7 system, and I am trying to use PuTTY to connect to a Linux server and run the MATLAB neural network training function on it.
Before asking this question I looked into some similar questions here, but none of them solved my problem.
The command I use to start MATLAB is:
matlab -nodisplay -nodesktop
And in my code I also set:
net.trainParam.showWindow = false;
But I still got the error:
??? Error using ==> nntraintool at 28
NNTRAINTOOL requires Java which is not available
Error in ==> trainlm>train_network at 228
[userStop,userCancel] = nntraintool('check');
Error in ==> trainlm at 113
[net,tr] = train_network(net,tr,data,fcns,param);
Error in ==> network.train at 107
[net,tr] = feval(net.trainFcn,net,X,T,Xi,Ai,EW,net.trainParam);
Error in ==> generateNN at 49
[net tr] = train(net, features, targets);
Error in ==> sixOutputNN at 30
[ net tr ] = generateNN(features, targets, HIDDEN_LAYER, ...
Error in ==> findBestSixOutputNN at 10
[~, tr] = sixOutputNN(features, targets, configs(i).hidden_layers, ...
Could anyone help me with this? Thank you very much.
Sounds like you need to install a JVM on the host computer. See this web site for some help: http://www.mathworks.com/help/techdoc/matlab_external/f98533.html#f122001
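As a quick sanity check (generic MATLAB, not something from the linked page), you can ask MATLAB itself whether a JVM was loaded:
usejava('jvm')   % returns 1 (true) if a Java Virtual Machine is available, 0 otherwise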
Solution here:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/301204
I just changed line 33 of nntraintool.m from:
error(message('nnet:Java:NotAvailable'));
to
warning(message('nnet:Java:NotAvailable'));
So that I still remember that there is something fishy going on there!
It works like a charm!
This problem was present even in Matlab 2012a...