Training using MXNet in the IPython console of Spyder not working

I'm running the MNIST example from MXNet in the Spyder IDE. The training does not progress, just as if the learning rate were 0 (see output below).
If I run the same file from a terminal using either python or ipython, it works as expected (accuracy improves by the second epoch).
If I run the file using Spyder's normal Python console, it also works.
But if I run the file using Spyder's IPython console, I get the output shown below.
I'm new to Python. Any ideas?
Source code
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
#%%
import mxnet as mx
#%% load mnist
mnist = mx.test_utils.get_mnist()
#%% define data iterators
batch_size = 100
train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)
#%% create input variable
data = mx.sym.var('data')
# Flatten the data from 4-D shape into 2-D (batch_size, num_channel*width*height)
data = mx.sym.flatten(data=data)
#%% Multilayer Perceptron with softmax
# The first fully-connected layer and the corresponding activation function
fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
act1 = mx.sym.Activation(data=fc1, act_type="relu")
# The second fully-connected layer and the corresponding activation function
fc2 = mx.sym.FullyConnected(data=act1, num_hidden = 64)
act2 = mx.sym.Activation(data=fc2, act_type="relu")
# MNIST has 10 classes
fc3 = mx.sym.FullyConnected(data=act2, num_hidden=10)
# Softmax with cross entropy loss
mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax')
#%% Training
import logging
logging.getLogger().setLevel(logging.DEBUG) # logging to stdout
# create a trainable module on CPU
mlp_model = mx.mod.Module(symbol=mlp, context=mx.cpu())
mlp_model.fit(train_iter,  # train data
              eval_data=val_iter,  # validation data
              optimizer='sgd',  # use SGD to train
              optimizer_params={'learning_rate': 0.1},  # use fixed learning rate
              eval_metric='acc',  # report accuracy during training
              batch_end_callback=mx.callback.Speedometer(batch_size, 100),  # output progress for each 100 data batches
              num_epoch=10)  # train for at most 10 dataset passes
Output of ipython console in Spyder
runfile('/home/thomas/workspace/clover-python/mnist.py', wdir='/home/thomas/workspace/clover-python')
INFO:root:train-labels-idx1-ubyte.gz exists, skipping download
INFO:root:train-images-idx3-ubyte.gz exists, skipping download
INFO:root:t10k-labels-idx1-ubyte.gz exists, skipping download
INFO:root:t10k-images-idx3-ubyte.gz exists, skipping download
INFO:root:Epoch[0] Batch [100] Speed: 1304.05 samples/sec accuracy=0.097921
INFO:root:Epoch[0] Batch [200] Speed: 1059.19 samples/sec accuracy=0.098200
INFO:root:Epoch[0] Batch [300] Speed: 1178.64 samples/sec accuracy=0.099600
INFO:root:Epoch[0] Batch [400] Speed: 1292.71 samples/sec accuracy=0.098900
INFO:root:Epoch[0] Batch [500] Speed: 1394.21 samples/sec accuracy=0.096500
INFO:root:Epoch[0] Train-accuracy=0.101212
INFO:root:Epoch[0] Time cost=47.798
INFO:root:Epoch[0] Validation-accuracy=0.098000
INFO:root:Epoch[1] Batch [100] Speed: 1247.47 samples/sec accuracy=0.097921
INFO:root:Epoch[1] Batch [200] Speed: 1673.79 samples/sec accuracy=0.098200
INFO:root:Epoch[1] Batch [300] Speed: 1283.91 samples/sec accuracy=0.099600
INFO:root:Epoch[1] Batch [400] Speed: 1247.79 samples/sec accuracy=0.098900
INFO:root:Epoch[1] Batch [500] Speed: 1371.93 samples/sec accuracy=0.096500
INFO:root:Epoch[1] Train-accuracy=0.101212
INFO:root:Epoch[1] Time cost=44.201
INFO:root:Epoch[1] Validation-accuracy=0.098000
INFO:root:Epoch[2] Batch [100] Speed: 1387.72 samples/sec accuracy=0.097921
INFO:root:Epoch[2] Batch [200] Speed: 1196.37 samples/sec accuracy=0.098200
INFO:root:Epoch[2] Batch [300] Speed: 1220.44 samples/sec accuracy=0.099600
INFO:root:Epoch[2] Batch [400] Speed: 1387.75 samples/sec accuracy=0.098900
INFO:root:Epoch[2] Batch [500] Speed: 1279.58 samples/sec accuracy=0.096500
INFO:root:Epoch[2] Train-accuracy=0.101212
INFO:root:Epoch[2] Time cost=46.929
INFO:root:Epoch[2] Validation-accuracy=0.098000
INFO:root:Epoch[3] Batch [100] Speed: 1266.24 samples/sec accuracy=0.097921
and so on…

Do you have multiple versions of MXNet installed? One thing I can think of is to verify that the ipython console and the python console are using the same version of Python and MXNet. In ipython, you can type import mxnet; mxnet? to see which version of MXNet you're using.

Per Eric's comment, let's verify that you're running the same version of mxnet in each environment. You can type:
import mxnet
print(mxnet.__version__)
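If the versions match, it may also help to confirm that both consoles resolve the same interpreter and the same installed package; a minimal sketch using only standard attributes:
import sys
import mxnet
print(sys.executable)      # the Python interpreter this console is running
print(mxnet.__version__)   # the MXNet version it imports
print(mxnet.__file__)      # where that mxnet package is installed
If these differ between Spyder's IPython console and the plain console, the two environments are not using the same MXNet installation.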

Related

Getting an error when baking a Yocto project recipe

I'm trying to build a core-image-minimal image for a Raspberry Pi 3 board on Ubuntu 20.04 in VirtualBox (~2 GB RAM, 100 GB storage allotted), but I'm facing a new issue while baking the image. When I removed "tar.xz ext3" from IMAGE_FSTYPES = "tar.xz ext3 rpi-sdimg" in local.conf, the image baked successfully without any error, but it did not support I2C & UART. When I reverted that change, the build started throwing the errors below.
ERROR:
core-image-minimal-1.0-r0 do_image_tar: Execution of '/home/pranav/poky/build/tmp/work/raspberrypi3-poky-linux-gnueabi/core-image-minimal/1.0-r0/temp/run.do_image_tar.2549' failed with exit code 1
Logfile of failure stored in: /home/pranav/poky/build/tmp/work/raspberrypi3-poky-linux-gnueabi/core-image-minimal/1.0-r0/temp/log.do_image_tar.2549
Please let me know what the solution is.
Complete error:
pranav@Pranav:~$ cd poky
pranav@Pranav:~/poky$ source oe-init-build-env
### Shell environment set up for builds. ###
You can now run 'bitbake <target>'
Common targets are:
core-image-minimal
core-image-sato
meta-toolchain
meta-ide-support
You can also run generated qemu images with a command like 'runqemu qemux86'
Other commonly useful commands are:
- 'devtool' and 'recipetool' handle common recipe tasks
- 'bitbake-layers' handles common layer tasks
- 'oe-pkgdata-util' handles common target package tasks
pranav@Pranav:~/poky/build$ bitbake core-image-minimal
Loading cache: 100% |###################################################################################################| Time: 0:00:02
Loaded 3295 entries from dependency cache.
NOTE: Resolving any missing task queue dependencies
Build Configuration:
BB_VERSION = "1.46.0"
BUILD_SYS = "x86_64-linux"
NATIVELSBSTRING = "universal"
TARGET_SYS = "arm-poky-linux-gnueabi"
MACHINE = "raspberrypi3"
DISTRO = "poky"
DISTRO_VERSION = "3.1.14"
TUNE_FEATURES = "arm vfp cortexa7 neon vfpv4 thumb callconvention-hard"
TARGET_FPU = "hard"
meta
meta-poky
meta-yocto-bsp = "dunfell:3d5dd4dd8d66650615a01cd210ff101daa60c0df"
meta-raspberrypi = "dunfell:934064a01903b2ba9a82be93b3f0efdb4543a0e8"
meta-oe
meta-multimedia
meta-networking
meta-python = "dunfell:ec978232732edbdd875ac367b5a9c04b881f2e19"
Initialising tasks: 100% |##############################################################################################| Time: 0:00:10
Sstate summary: Wanted 2 Found 0 Missed 2 Current 1135 (0% match, 99% complete)
NOTE: Executing Tasks
ERROR: core-image-minimal-1.0-r0 do_image_tar: Execution of '/home/pranav/poky/build/tmp/work/raspberrypi3-poky-linux-gnueabi/core-image-minimal/1.0-r0/temp/run.do_image_tar.2549' failed with exit code 1
ERROR: Logfile of failure stored in: /home/pranav/poky/build/tmp/work/raspberrypi3-poky-linux-gnueabi/core-image-minimal/1.0-r0/temp/log.do_image_tar.2549
Log data follows:
| DEBUG: Executing python function set_image_size
| DEBUG: 8013.200000 = 6164 * 1.300000
| DEBUG: 8192.000000 = max(8013.200000, 8192)[8192.000000] + 0
| DEBUG: 8192.000000 = int(8192.000000)
| DEBUG: 8192 = aligned(8192)
| DEBUG: returning 8192
| DEBUG: Python function set_image_size finished
| DEBUG: Executing shell function do_image_tar
| xz: Memory usage limit is too low for the given filter setup.
| xz: 1,250 MiB of memory is required. The limit is 954 MiB.
| WARNING: exit code 1 from a shell command.
| ERROR: Execution of '/home/pranav/poky/build/tmp/work/raspberrypi3-poky-linux-gnueabi/core-image-minimal/1.0-r0/temp/run.do_image_tar.2549' failed with exit code 1
ERROR: Task (/home/pranav/poky/meta/recipes-core/images/core-image-minimal.bb:do_image_tar) failed with exit code '1'
NOTE: Tasks Summary: Attempted 3053 tasks of which 3051 didn't need to be rerun and 1 failed.
Summary: 1 task failed:
/home/pranav/poky/meta/recipes-core/images/core-image-minimal.bb:do_image_tar
Summary: There was 1 ERROR message shown, returning a non-zero exit code.
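For what it's worth, the log points at xz itself: do_image_tar needs about 1,250 MiB for the chosen compression settings but is capped at 954 MiB, so either the VM needs more RAM or xz needs to be told to use less. A hedged local.conf sketch, assuming your Poky release defines the XZ_MEMLIMIT/XZ_THREADS variables (check meta/conf/bitbake.conf before relying on them):
# hypothetical local.conf additions; drop them if your release lacks these variables
XZ_MEMLIMIT = "75%"
XZ_THREADS = "2"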

"image is too large" keeps on happening to openbmc image for Raspberrypi platform

Could someone please give me advice on making an OpenBMC image for the Raspberry Pi platform?
Before I tried, I looked through related documents and believed an OpenBMC image could work on a Raspberry Pi,
like OpenBMC with Raspberry Pi (2 or 3) and build bmcweb?
and https://kevinleeblog.github.io/project1/2019/11/25/openbmc-for-raspberry-pi-zero/.
So I followed these instructions and tried the following steps.
#1: Git clone openbmc.git to my local PC.
tm@tm-VB1:~/Rpi4-64$ git clone https://github.com/openbmc/openbmc.git
I snipped the logs, but there was no problem.
Receiving objects: 100% (182121/182121), 84.10 MiB | 5.55 MiB/s, done.
Resolving deltas: 100% (96860/96860), done.
#2: set TEMPLATECONF for raspberrypi
tm@tm-VB1:~/Rpi4-64$ export TEMPLATECONF=meta-evb/meta-evb-raspberrypi/conf
tm@tm-VB1:~/Rpi4-64$ echo $TEMPLATECONF
meta-evb/meta-evb-raspberrypi/conf
#3: set up the environment by "openbmc-env"
tm@tm-VB1:~/Rpi4-64/openbmc$ . openbmc-env
### Initializing OE build env ###
I snipped the logs, but there was no problem. As you know, the script automatically creates a subdirectory, build, under openbmc.
Common targets are:
obmc-phosphor-image
tm@tm-VB1:~/Rpi4-64/openbmc/build$
#4: Change the directory and edit local.conf for my Raspberrypi platform.
tm@tm-VB1:~/Rpi4-64/openbmc/build$ cat ./conf/local.conf
Log snipped for the unchanged parts.
MACHINE ??= "raspberrypi4-64" <<< Change here for my platform.
DL_DIR ?= "/home/tm/Yocto/downloads" <<< Add here for build-time reduction at retry.
SSTATE_DIR ?= "/home/tm/Yocto/sstate-cache" <<< Add here for build-time reduction at retry.
#5: Change the FLASH_SIZE variable based on the following suggestion: https://github.com/openbmc/openbmc/issues/3590
tm@tm-VB1:~/Rpi4-64/openbmc/meta-phosphor/classes$ cat image_types_phosphor.bbclass
Log snipped.
# Flash characteristics in KB unless otherwise noted
FLASH_SIZE ?= "131072" <<< I changed only this variable from 32768 to 131072.
#6: Start bitbake.
tm@tm-VB1:~/Rpi4-64/openbmc$ bitbake obmc-phosphor-image
Then an ERROR happened.
ERROR: Logfile of failure stored in: /home/tm/Rpi/openbmc/build/tmp/work/raspberrypi-openbmc-linux-gnueabi/obmc-phosphor-image/1.0-r0/temp/log.do_generate_static.2055074
DEBUG: Executing python function do_generate_static
DEBUG: Executing shell function do_mk_static_nor_image
32768+0 records in
32768+0 records out
33554432 bytes (34 MB, 32 MiB) copied, 0.09147 s, 367 MB/s
DEBUG: Shell function do_mk_static_nor_image finished
DEBUG: Considering file size=495980 name=/home/tm/Rpi/openbmc/build/tmp/deploy/images/raspberrypi/u-boot.bin
DEBUG: Spanning start=0K end=512K
DEBUG: Compare needed=495980 available=524288 margin=28308
484+1 records in
484+1 records out
495980 bytes (496 kB, 484 KiB) copied, 0.00120141 s, 413 MB/s
DEBUG: Considering file size=8266960 name=/home/tm/Rpi/openbmc/build/tmp/deploy/images/raspberrypi/fitImage-obmc-phosphor-initramfs-raspberrypi-raspberrypi
DEBUG: Spanning start=512K end=4864K
>>>DEBUG: Compare needed=8266960 available=4456448 margin=-3810512
ERROR: Image '/home/tm/Rpi/openbmc/build/tmp/deploy/images/raspberrypi/fitImage-obmc-phosphor-initramfs-raspberrypi-raspberrypi' is too large!
DEBUG: Python function do_generate_static finished
It said margin=-3810512.
Now, my 2nd try.
I removed the whole openbmc directory and did the same steps above.
But this time, I changed FLASH_SIZE from 32768 to 262144.
The result was the same, as shown below.
ERROR: obmc-phosphor-image-1.0-r0 do_generate_static: Image '/home/tm/Rpi4/openbmc/build/tmp/deploy/images/raspberrypi4/u-boot.bin' is too large!
ERROR: Logfile of failure stored in: /home/tm/Rpi4/openbmc/build/tmp/work/raspberrypi4-openbmc-linux-gnueabi/obmc-phosphor-image/1.0-r0/temp/log.do_generate_static.2061792
ERROR: Task (/openbmc/meta-phosphor/recipes-phosphor/images/obmc-phosphor-image.bb:do_generate_static) failed with exit code '1'
NOTE: Tasks Summary: Attempted 3915 tasks of which 2633 didn't need to be rerun and 1 failed.
Summary: 1 task failed:
/openbmc/meta-phosphor/recipes-phosphor/images/obmc-phosphor-image.bb:do_generate_static
Summary: There were 2 WARNING messages shown.
Summary: There was 1 ERROR message shown, returning a non-zero exit code.
tm@tm-VB1:~/Rpi4/openbmc/build$ cat /home/tm/Rpi4/openbmc/build/tmp/work/raspberrypi4-openbmc-linux-gnueabi/obmc-phosphor-image/1.0-r0/temp/log.do_generate_static.2061792
DEBUG: Executing python function do_generate_static
DEBUG: Executing shell function do_mk_static_nor_image
32768+0 records in
32768+0 records out
33554432 bytes (34 MB, 32 MiB) copied, 0.177223 s, 189 MB/s
DEBUG: Shell function do_mk_static_nor_image finished
DEBUG: Considering file size=548224 name=/home/tm/Rpi4/openbmc/build/tmp/deploy/images/raspberrypi4/u-boot.bin
DEBUG: Spanning start=0K end=512K
>>>DEBUG: Compare needed=548224 available=524288 margin=-23936
ERROR: Image '/home/tm/Rpi4/openbmc/build/tmp/deploy/images/raspberrypi4/u-boot.bin' is too large!
DEBUG: Python function do_generate_static finished
tm@tm-VB1:~/Rpi4/openbmc/build$
It said margin=-23936.
OK, the image is too large. So, my 3rd try.
I removed the whole openbmc directory and did the same steps above.
But this time, I changed FLASH_SIZE from 32768 to 9437184.
The result was the same, as shown below.
ERROR: obmc-phosphor-image-1.0-r0 do_generate_static: Image '/home/tm/Rpi4/openbmc/build/tmp/deploy/images/raspberrypi4/u-boot.bin' is too large!
ERROR: Logfile of failure stored in: /home/tm/Rpi4/openbmc/build/tmp/work/raspberrypi4-openbmc-linux-gnueabi/obmc-phosphor-image/1.0-r0/temp/log.do_generate_static.2058361
ERROR: Task (/openbmc/meta-phosphor/recipes-phosphor/images/obmc-phosphor-image.bb:do_generate_static) failed with exit code '1'
NOTE: Tasks Summary: Attempted 3935 tasks of which 0 didn't need to be rerun and 1 failed.
Summary: 1 task failed:
/openbmc/meta-phosphor/recipes-phosphor/images/obmc-phosphor-image.bb:do_generate_static
Summary: There were 4 WARNING messages shown.
Summary: There was 1 ERROR message shown, returning a non-zero exit code.
tm@tm-VB1:~/Rpi4/openbmc$
tm@tm-VB1:~/Rpi4/openbmc$ cat /home/tm/Rpi4/openbmc/build/tmp/work/raspberrypi4-openbmc-linux-gnueabi/obmc-phosphor-image/1.0-r0/temp/log.do_generate_static.2058361
DEBUG: Executing python function do_generate_static
DEBUG: Executing shell function do_mk_static_nor_image
32768+0 records in
32768+0 records out
33554432 bytes (34 MB, 32 MiB) copied, 0.173685 s, 193 MB/s
DEBUG: Shell function do_mk_static_nor_image finished
DEBUG: Considering file size=548224 name=/home/tm/Rpi4/openbmc/build/tmp/deploy/images/raspberrypi4/u-boot.bin
DEBUG: Spanning start=0K end=512K
>>>DEBUG: Compare needed=548224 available=524288 margin=-23936
ERROR: Image '/home/tm/Rpi4/openbmc/build/tmp/deploy/images/raspberrypi4/u-boot.bin' is too large!
DEBUG: Python function do_generate_static finished
tm@tm-VB1:~/Rpi4/openbmc$
It said the same margin as the 256 MB case.
My 4th try.
I removed the whole openbmc directory and did the same steps above.
I changed MACHINE ??= "raspberrypi4-64" to "raspberrypi2".
But this time, I changed FLASH_SIZE from 32768 to 33554432.
The result was the same as before.
My 5th try.
I removed the whole openbmc directory and did the same steps above.
I used MACHINE ??= "raspberrypi2".
But this time, I changed FLASH_SIZE from 32768 to 67108864.
The result was the same as before.
After trying several variations, it always said "image is too large", even though I changed FLASH_SIZE to a much, much larger value.
So I am wondering whether I have missed some important configuration, or whether some parameter other than FLASH_SIZE is needed to fix this.
By the way, I tried romulus and it built successfully.
My environment is ubuntu-20.04.2.0-desktop-amd64.
I would really appreciate it if someone could kindly give me advice on making this work.
Interesting. I don't have a quick fix for you, but I did notice that the partition that is oversized is the u-boot partition. u-boot is a smaller, separate binary installed on the machine. It looks as if your u-boot build is over 512K while the partition is set to 512K. Your flash size is massive:
FLASH_SIZE = "9437184" is far more than a gigabyte (9437184 KB is 9 GiB, because FLASH_SIZE is in KB).
If I were you, I would first try to build an older version of openbmc for the Raspberry Pi. (It used to work, so you just need to find the commit before u-boot grew too big.) Use git to move back a month at a time until you find one that works.
If that does not work, I would try to modify the partition table.
Here is where you're failing.
This looks fine, and building the u-boot image itself looks fine.
Increasing the kernel offset makes it build, but the other targets in openbmc will not be happy with this solution, so maybe meta-raspberrypi will have to override the partition table (if u-boot cannot be shrunk).
Whatever you do, open an issue on GitHub and share your changes. Also use the Discord and Gerrit.
I just replicated this issue. We should fix it.
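A rough sketch of that "move back a month" approach (the branch name master and the dates are assumptions; adjust to your checkout):
# step the tree back to roughly one month ago and rebuild
cd ~/Rpi4-64/openbmc
git checkout $(git rev-list -n 1 --before="1 month ago" master)
# re-run ". openbmc-env" and "bitbake obmc-phosphor-image"; if u-boot still
# overflows its 512K slot, repeat with "2 months ago", "3 months ago", and so on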

ddrescue read non-tried blocks

I'm trying to rescue a 1 TB disk which has read errors. Because I didn't have a free 1 TB drive, I created a RAID 0 of two 500 GB drives.
I used the command line from Wikipedia for the first run:
sudo ddrescue -f -n /dev/sdk /dev/md/md_test /home/user/rescue.map
ddrescue already completed this run after approximately 20 hours and more than 7000 read errors.
Now I'm trying to do a second run,
sudo ddrescue -d -f -v -r3 /dev/sdk /dev/md/md_test /home/user/rescue.map
to read the non-tried blocks, but ddrescue gives me this:
GNU ddrescue 1.23
About to copy 1000 GBytes from '/dev/sdk' to '/dev/md/md_test'
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 128 sectors Initial skip size: 19584 sectors
Sector size: 512 Bytes
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 635060 MB, tried: 0 B, bad-sector: 0 B, bad areas: 0
Current status
ipos: 1000 GB, non-trimmed: 0 B, current rate: 0 B/s
opos: 1000 GB, non-scraped: 0 B, average rate: 0 B/s
non-tried: 365109 MB, bad-sector: 0 B, error rate: 0 B/s
rescued: 635060 MB, bad areas: 0, run time: 0s
pct rescued: 63.49%, read errors: 0, remaining time: n/a
time since last successful read: n/a
Copying non-tried blocks... Pass 1 (forwards)
ddrescue: Write error: Invalid argument
I can't figure out what this write error means; I've already searched the manual for answers.
Any help is appreciated! Thanks!
After a while I found the cause of the write error: the capacity of the corrupt drive is 931.5G, but the total capacity of the RAID 0 was just 931.3G.
I realized it while taking a closer look at the output of the lsblk command.
So I rebuilt the RAID 0 array with three 500G drives, and ddrescue now works as expected.
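For anyone hitting the same "Write error: Invalid argument", a quick check of the exact sizes in bytes before the next pass can confirm whether the destination is large enough (a small sketch; the device names simply reuse the ones from the question):
sudo blockdev --getsize64 /dev/sdk          # failing source disk
sudo blockdev --getsize64 /dev/md/md_test   # RAID 0 destination
The destination value must be at least as large as the source, otherwise ddrescue cannot write the final sectors.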

functional_coverage not showing proper result

I have developed a simple UVM testbench to verify a simple adder. I have also used functional coverage to monitor the coverage. The adder is 8-bit with inputs a and b, and the output is c, which is 9 bits.
I have developed the transaction with 8-bit rand logic for a and b.
In the sequence, I run it with repeat(100), which randomizes a and b and drives them to the DUT. The best-case functional coverage for this scenario is (100/256)*100%, i.e. around 40%, assuming no value is repeated. I sample the coverage in my scoreboard and get the coverage result in the env.
Here are my code snippets
// monitor class
covergroup cg;
  a : coverpoint sb_item.a;
  b : coverpoint sb_item.b;
endgroup
...
function void write(input input_seq_item i);
  sb_item = i;
  if (sb_item.c == sb_item.a + sb_item.b)
  begin
    `uvm_info("SB", "OK!", UVM_LOW)
    cg.sample();
  end
  else
    `uvm_error("SB", $sformatf("ERROR! %b + %b = %b", sb_item.a, sb_item.b, sb_item.c))
endfunction
// env class
...
task run_phase(uvm_phase phase);
  sb.cg.stop();
  phase.raise_objection(this);
  sb.cg.start();
  seq.start(sqr);
  phase.drop_objection(this);
  sb.cg.stop();
  `uvm_info("env", $sformatf("The coverage collected is %f", sb.cg.a.get_coverage()), UVM_LOW);
endtask
...
When I run the code, I get a coverage of around 86%, as shown in the results below.
# KERNEL: UVM_INFO /home/runner/monitor.sv(56) # 996: uvm_test_top.env.sb [SB] OK!
# KERNEL: UVM_INFO /home/runner/env.sv(34) # 996: uvm_test_top.env [env] The coverage collected is 85.937500
# KERNEL: UVM_INFO /home/build/vlib1/vlib/uvm-1.2/src/base/uvm_objection.svh(1271) # 996: reporter [TEST_DONE] 'run' phase is ready to proceed to the 'extract' phase
# KERNEL: UVM_INFO /home/build/vlib1/vlib/uvm-1.2/src/base/uvm_report_server.svh(855) # 996: reporter [UVM/REPORT/SERVER]
# KERNEL: --- UVM Report Summary ---
# KERNEL:
# KERNEL: ** Report counts by severity
# KERNEL: UVM_INFO : 204
# KERNEL: UVM_WARNING : 0
# KERNEL: UVM_ERROR : 0
# KERNEL: UVM_FATAL : 0
# KERNEL: ** Report counts by id
# KERNEL: [Driver] 100
# KERNEL: [RNTST] 1
# KERNEL: [SB] 100
# KERNEL: [TEST_DONE] 1
# KERNEL: [UVM/RELNOTES] 1
# KERNEL: [env] 1
# KERNEL:
# RUNTIME: Info: RUNTIME_0068 uvm_root.svh (521): $finish called.
# KERNEL: Time: 996 ns, Iteration: 61, Instance: /top, Process: #INITIAL#14_0#.
# KERNEL: stopped at time: 996 ns
# VSIM: Simulation has finished. There are no more test vectors to simulate.
exit
# FCOVER: Covergroup Coverage data has been saved to "fcover.acdb" database.
# VSIM: Simulation has finished.
Can anyone explain what mistake I am making here? Is the coverage cumulative over all runs?
Whether the coverage is cumulative over all runs depends on what you're analyzing. I'm guessing you're analyzing only one simulation, though. Your calculation is correct: the maximum coverage you could get per test is about 40% (basically 40% for each coverpoint, averaged together), but that is highly unlikely to be reached.
What you also need to look at (aside from the percentage) is which bins are actually getting created. I don't think you're getting a bin for each value of a or b; rather, some of them are probably being clumped together (i.e. a in [0..3] would be one bin and so on, leaving you with 256/4 bins instead of 256). Each coverpoint has an option called auto_bin_max, whose default value is 64. If you set this to 256, or explicitly declare a (range) bin for each value that a or b could take, you'll get the coverage percentage you'd expect.
As a side note, you typically don't create a coverage bin for every value of a data item, since this doesn't really make sense. In a typical device there are so many values the data items could take that you can't verify them all. What you would do, however, is declare bins for the more "interesting" situations. In your case, interesting values are 0, 8'hff and anything in between. What's also particularly interesting is crossing a and b and checking the combinations, especially the case where a and b are both 8'hff (as that's where your result would overflow 8 bits and output a carry).
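A hedged sketch of what the covergroup could look like with the auto_bin_max change and a cross (it reuses the names from the question and is meant as an illustration, not a drop-in fix):
covergroup cg;
  // one bin per 8-bit value instead of the default 64 auto bins
  a : coverpoint sb_item.a { option.auto_bin_max = 256; }
  b : coverpoint sb_item.b { option.auto_bin_max = 256; }
  // combinations of a and b, e.g. the overflow corner where both are 8'hff
  axb : cross a, b;
endgroup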

Calculation on GPU leads to driver error "stopped responding"

I have this little nonsense script here which I am executing in MATLAB R2013b:
clear all;
n = 2000;
times = 50;
i = 0;
tCPU = tic;
disp 'CPU::'
A = rand(n, n);
B = rand(n, n);
disp '::Go'
for i = 0:times
    CPU = A * B;
end
tCPU = toc(tCPU);
tGPU = tic;
disp 'GPU::'
A = gpuArray(A);
B = gpuArray(B);
disp '::Go'
for i = 0:times
    GPU = A * B;
end
tGPU = toc(tGPU);
fprintf('On CPU: %.2f sec\nOn GPU: %.2f sec\n', tCPU, tGPU);
Unfortunately, after execution I receive a message from Windows saying: "Display driver stopped working and has recovered."
I assume this means that Windows did not get a response from my graphics card's driver or something. The script returned without errors:
>> test
CPU::
::Go
GPU::
::Go
On CPU: 11.01 sec
On GPU: 2.97 sec
But regardless of whether the GPU runs out of memory or not, MATLAB is not able to use the GPU device until I restart it. If I don't restart MATLAB, I just receive this message from CUDA:
>> test
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT
> In test at 1
CPU::
::Go
GPU::
Error using gpuArray
An unexpected error occurred during CUDA execution.
The CUDA error was:
the launch timed out and was terminated
Error in test (line 21)
A = gpuArray(A);
Does anybody know how to avoid this issue or what I am doing wrong here?
If needed, my GPU Device:
>> gpuDevice
ans =
CUDADevice with properties:
Name: 'GeForce GTX 660M'
Index: 1
ComputeCapability: '3.0'
SupportsDouble: 1
DriverVersion: 6
ToolkitVersion: 5
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
FreeMemory: 1.9037e+09
MultiprocessorCount: 2
ClockRateKHz: 950000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
The key piece of information is this part of the gpuDevice output:
KernelExecutionTimeout: 1
This means that the host display driver is active on the GPU you are running the compute jobs on. The NVIDIA display driver contains a watchdog timer which kills any task that takes more than a predefined amount of time without yielding control back to the driver for screen refresh. This is intended to prevent the situation where a long-running or stuck compute job renders the machine unresponsive by freezing the display. The runtime of your MATLAB script is clearly exceeding the display driver watchdog timer limit. Once that happens, the compute context held on the device is destroyed and MATLAB can no longer operate with the device. You might be able to reinitialise the context by calling reset, which I guess runs cudaDeviceReset() under the hood.
There is a lot of information about this watchdog timer on the interweb, for example this Stack Overflow question. How to modify the timeout depends on your OS and hardware. The simplest way to avoid the problem is to not run CUDA code on a display GPU, or to increase the granularity of your compute jobs so that no single operation has a runtime that exceeds the timeout limit. Or just write faster code...
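As a small illustration of the reset suggestion (Parallel Computing Toolbox), something along these lines should recover the device without restarting MATLAB:
g = gpuDevice;   % handle to the currently selected GPU
reset(g);        % tears down and re-creates the CUDA context after a timeout
Note that any gpuArray data previously on the device is lost after the reset.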