different thread blocks definition - matlab

I'm using PTX from MATLAB to call CUDA kernels. When testing the code in VS 2010, I launch it like this:
int TPB = 256;
int BPG = (Nx + TPB -1 ) / TPB;
dim3 blk(TPB,TPB,1);
dim3 grid(BPG ,BPG,1);
grad<<< grid,blk>>>(dev_y,dev_x,dev_b,dev_t,Nx,Ny);
I'm trying to use the same configuration in MATLAB:
TPB = 16;
BPG = floor((Nx + TPB -1 ) / TPB);
grad = parallel.gpu.CUDAKernel('reg.ptx','reg.cu','grad');
grad.ThreadBlockSize=[TPB TPB 1];
grad.GridSize = [BPG BPG];
knowing it's more than 512 threads per block, which is the maximum allowed for my Tesla C1060, and I was right:
Invalid size input to kernel ThreadBlockSize. You must provide a vector of up to 3 positive integers whose product is <= 512. The maximum value in each dimension is: [512,512,64].
Any explanation why it runs in VS 2010 without an error like the one MATLAB gives?

The C++ code segment is not checking for errors after the grad<<< >>> launch, while the MATLAB wrapper performs additional error checking. The launch configuration is out of bounds in both cases; calling cudaGetLastError after the <<< >>> launch will report the launch-configuration error.
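On the MATLAB side you can also query the limit straight from the kernel object before setting the launch configuration. A minimal sketch, reusing the names from the question (Nx and Ny are assumed to be defined already; MaxThreadsPerBlock is a read-only property of parallel.gpu.CUDAKernel objects):
grad = parallel.gpu.CUDAKernel('reg.ptx','reg.cu','grad');
maxThreads = grad.MaxThreadsPerBlock;            % 512 on a Tesla C1060
TPB = floor(sqrt(maxThreads));                   % largest square 2-D block within the limit
grad.ThreadBlockSize = [TPB TPB 1];
grad.GridSize = [ceil(Nx/TPB) ceil(Ny/TPB)];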

Related

Matlab - Chien Search implementation initialisation

I'm trying to implement a DVB-S2 (48408, 48600) BCH decoder and I'm having trouble finding the roots of the locator polynomial. For the Chien search, the author here initialises the registers taking into account the fact that the code is shortened, subtracting 48600 from (2^16 - 1). Why so?
This is the code I have so far:
function [error_locations, errors] = compute_chien_search(n, locator_polynomial, field, alpha_powers, n_max)
    t = length(locator_polynomial);
    error_locations = zeros(t, 1);
    errors = 0;
    % Init the registers with the locator polynomial coefficients.
    coefficient_buffer = locator_polynomial;
    alpha_degrees = uint32(1:t)';
    alpha_polynoms = field(alpha_degrees);
    alpha_polynoms = [1; alpha_polynoms];
    for i = 1:n
        for j = 2:t
            coefficient_buffer(j) = gf_mul_elements(coefficient_buffer(j), ...
                                                    alpha_polynoms(j), ...
                                                    field, alpha_powers, n_max);
        end
        % Compute locator polynomial at position i
        tmp = 0;
        for j = 2:t
            tmp = bitxor(tmp, coefficient_buffer(j));
        end
        % Signal the error
        if tmp == 1
            errors = errors + 1;
            error_locations(errors) = n_max - i + 1;
        end
    end
end
It almost gives me the correct result, except for some error locations. For example, for errors made in positions
418 14150 24575 25775 37403
The code above gives me
48183 47718 34451 24026 22826
which after subtracting from 48600 gives:
417 882 14149 24574 25774
which is the position minus 1, except for the 37403, which it did not find.
What am I missing?
Edit:
The code in question is the DVB-S2 12-error-correcting (48408, 48600) BCH code. The generator polynomial has degree 192 and is obtained by multiplying the 12 minimal polynomials given in the standard's documentation.
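As an aside, the degree-192 generator polynomial can be built in MATLAB by multiplying the minimal polynomials over GF(2). A minimal sketch, assuming min_polys is a cell array holding the 12 binary coefficient vectors (highest degree first) copied from the standard's documentation:
g = 1;
for k = 1:12
    g = mod(conv(g, min_polys{k}), 2);   % polynomial product, coefficients reduced mod 2
end
% length(g) - 1 should come out as 192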
Update - I created an example C program using Windows / Visual Studio for BCH(48600, 48408). On my desktop (Intel 3770K, 3.5 GHz, Win 7, VS 2015), encode takes about 30 us and a 12-bit error correction takes about 4.5 ms. On my laptop (Intel Core i7-10510U, up to 4.9 GHz, Win 10, VS 2019), a 12-bit error correction takes about 3.0 ms. I used a carryless multiply intrinsic to simplify generating the 192-bit polynomial, but this is a one-time generated constant. Encode uses a [256][3] table of 64-bit unsigned integers (192-bit polynomials) and decode uses a [256][12] table of 16-bit unsigned integer syndromes, so both process a byte at a time.
The code includes both Berlekamp Massey and Sugiyama extended Euclid decoders that I copied from existing RS code I have. For a BCH (not RS) code, the Berlekamp Massey discrepancy will be zero on odd steps, so for odd steps the discrepancy is not calculated (the iteration count since the last update is incremented, the same as when a calculated discrepancy is zero). I didn't see a significant change in running time, but I left the check in there.
The run times are about the same for BM or Euclid.
https://github.com/jeffareid/misc/blob/master/bch48600.c
I suspect an overflow problem in the case of a failure at bit error index 37403, since it is the only bit index > 2^15-1 (32767). There is this comment on that web site:
This code is great. However, it does not work for the large block sizes in the DVB-S2
specification. For example, it doesn't work with:
n = 16200;
n_max = 65535;
k_max = 65343;
t = 12;
prim_poly = 65581;
The good news is that the problem is easy to fix. Replace all the uint16() functions
in the code with uint32(). You will also have to run the following Matlab function
once. It took several hours for gftable() to complete on my computer.
gftable(16, 65581); (hex 1002D => x^16 + x^5 + x^3 + x^2 + 1)
The Chien search should be looking for the values 1/(2^0) to 1/(2^48599); then zero minus the log of those values gives offsets relative to the right-most bit of the message, and 48599 - offset gives indexes relative to the left-most bit of the message.
If the coefficients of the error locator polynomial are reversed, then the Chien search would be looking for values 2^(0) to 2^(48599).
https://en.wikipedia.org/wiki/Reciprocal_polynomial
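To make the mapping concrete, here is a minimal (and slow, O(n*t)) sketch of the search described above, not the asker's register-based version. It assumes lambda is the error locator polynomial as a gf row vector over GF(2^16) with the highest-degree coefficient first, n = 48600, and that gftable(16, 65581) has already been run as mentioned in the quoted comment:
m = 16; prim_poly = 65581;              % x^16 + x^5 + x^3 + x^2 + 1
alpha = gf(2, m, prim_poly);
inv_alpha = alpha^(2^m - 2);            % alpha^(-1), since alpha^(2^m - 1) = 1
x = gf(1, m, prim_poly);
for offset = 0:n-1                      % test 1/(2^0), 1/(2^1), ..., 1/(2^(n-1))
    if polyval(lambda, x) == 0          % root found at alpha^(-offset)
        position = (n - 1) - offset;    % index relative to the left-most bit
        fprintf('error at bit index %d\n', position);
    end
    x = x * inv_alpha;
end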

Error with matrix indices in heat flow simulation

When I run this code, which I've written to simulate a heat flow model in MATLAB, I get an error that says 'Subscript indices must either be real positive integers or logicals.' I think this is probably something to do with my linspace command generating a different type of variable, not integers, so it's not working properly, but I'm not sure how to amend my script to correct for this.
Cp = 400;
p = 8960;
k = 400;
a = k/(p*Cp);
dt = 0.01;
dx = sqrt(5*a*dt); %% 5 as 1/5 is smaller than 1/4 for stability
T = zeros(20000,10000);
for x = linspace(1,10000,10000)
    T(x,:) = 1000;
end
for x = linspace(10001,20000,10000)
    T(x,:) = 25;
end
for t = linspace(1,10000,10000)
    for x = linspace(1,20000,20000)
        T(x,t+1) = T(x,t)+a*dt*((T(x-1,t)-2*T(x,t)+ T(x+1,t))/(dx*dx));
    end
end
The line that blows up is:
T(x,t+1) = T(x,t)+a*dt*((T(x-1,t)-2*T(x,t)+ T(x+1,t))/(dx*dx));
Specifically T(x-1,t) triggers the error because x starts as 1, hence x - 1 = 0 and 0 is not a valid index.
On a more general Matlab coding note, I would write x = 1:10000 instead of x = linspace(1,10000,10000), but this is not causing the error. Note that I'm only addressing the Matlab error message. I have no idea whether your overall code works.
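For completeness, a minimal sketch of one way to avoid the x - 1 = 0 index, assuming you want the two ends of the bar held at fixed temperatures (only interior points are updated, and the time loop stays inside the columns you preallocated for T):
for t = 1:size(T,2)-1
    for x = 2:size(T,1)-1                % skip x = 1 and x = 20000
        T(x,t+1) = T(x,t) + a*dt*(T(x-1,t) - 2*T(x,t) + T(x+1,t))/(dx*dx);
    end
    T(1,t+1)   = T(1,t);                 % carry the boundary rows forward unchanged
    T(end,t+1) = T(end,t);
end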

Matlab: mix-up of built-in and external functions

MATLAB (2015a) is behaving weirdly: a number of built-in functions are not responding as expected. For instance, typing
ttest([1 2], [1 2])
results in
Error using size
Dimension argument must be a positive integer scalar within indexing range.
Error in nanstd (line 59)
tile(dim) = size(x,dim);
Error in ttest (line 132)
sdpop = nanstd(x,[],dim);
If I do a which for each of these functions:
which size
which nanstd
which ttest
I get, respectively:
built-in (C:\Program Files\MATLAB\R2015a\toolbox\matlab\elmat\size)
C:\Program Files\MATLAB\R2015a\toolbox\stats\eml\nanstd.m
C:\Program Files\MATLAB\R2015a\toolbox\stats\stats\ttest.m
Each of these files looks fine, except that size.m has each one of its rows commented out.
What could be the problem here?
Perhaps related to your problem:
ttest for R2013a makes the following call:
sdpop = nanstd(x,[],dim);
The helpfile for R2013a version of nanstd states:
Y = nanstd(X,FLAG,DIM) takes the standard deviation along dimension DIM of X.
On the other hand, nanstd in the 2005 nansuite package downloaded off Mathworks file exchange states:
FORMAT: Y = nanstd(X,DIM,FLAG)
Notice how DIM and FLAG are reversed!
If I call R2013a's ttest such that it makes a call to the old, 2005 nansuite function nanstd, Matlab generates an error similar to yours:
Error using size
Dimension argument must be a positive integer scalar within indexing range.
Error in nanmean (line 46)
count = size(x,dim) - sum(nans,dim);
Error in nanstd (line 54)
avg = nanmean(x,dim);
Error in ttest (line 132)
sdpop = nanstd(x,[],dim);
If [] is passed as DIM instead of FLAG, then nanstd's call to size(x, DIM) triggers an error, because [] is not a positive integer scalar. If something like this is the cause, the broader question is: what's going on with your MATLAB install, setup, or downloads such that you're calling archaic code? And why is that archaic code even around? I don't know at what point in MATLAB's release history nanstd(x, FLAG, DIM) became supported (instead of simply nanstd(x, DIM)).
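A quick way to see whether an old File Exchange nanstd is shadowing the Statistics Toolbox one, and to take it off the path (the folder name below is only a placeholder for wherever the 2005 nansuite files live):
which -all nanstd            % lists every nanstd on the path, in precedence order
rmpath('C:\old\nansuite')    % hypothetical folder holding the old nansuite version
rehash toolboxcache          % refresh MATLAB's cached toolbox function list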
Archive: below is my old answer which misdiagnosed your problem
Both of your sample vectors x and y are the same (i.e. [1,2]). The estimated variance of the difference is 0, and all your stats are going to blow up with NaN.
Do the stats step by step, and it will be clear what's going on.
x = [1; 2]; % Data you used in the example.
y = [1; 2]; % Data you used in the example.
z = x - y; % Your call to ttest tests whether this vector is different from zero at a statistically significant level.
Now we do all the stats on z
r.n = length(z);
r.mu = mean(z);
r.standard_error = sqrt(var(z,1) / (r.n-1)); % For your data, this will be zero since z is constant!
r.t = r.mu ./ r.standard_error; % For your data, this will be NaN (0/0), since both the mean and the standard error are zero!
r.df = r.n - 1;
r.pvals(r.t >= 0) = 2 * (1 - tcdf(r.t(r.t>=0), r.df)); % For your data, tcdf returns NaN and this all fails...
r.pvals(r.t < 0) = 2 * tcdf(r.t(r.t<0), r.df);
etc...
This should match a call to
[h, p, ci, stats] = ttest(x-y);

Error: "Maximum variable size allowed by the program is exceeded" while using sub2ind

Please suggest how to sort out this issue:
nNodes = 50400;
adj = sparse(nNodes,nNodes);
adj(sub2ind([nNodes nNodes], ind, ind + 1)) = 1; %ind is a vector of indices
??? Maximum variable size allowed by the program is exceeded.
I think the problem is 32/64-bit related. If you have a 32 bit processor, you can address at most
2^32 = 4.294967296e+09
elements. If you have a 64-bit processor, this number increases to
2^64 = 9.223372036854776e+18
Unfortunately, for reasons that are at best vague to me, Matlab does not use this full range. To find out the actual range used by Matlab, issue the following command:
[~,maxSize] = computer
On a 32-bit system, this gives
>> [~,maxSize] = computer
maxSize =
2.147483647000000e+09
>> log2(maxSize)
ans =
3.099999999932819e+01
and on a 64-bit system, it gives
>> [~,maxSize] = computer
maxSize =
2.814749767106550e+14
>> log2(maxSize)
ans =
47.999999999999993
So apparently, on a 32-bit system, Matlab only uses 31 bits to address elements, which gives you the upper limit.
If anyone can clarify why Matlab only uses 31 bits on a 32-bit system, and only 48 bits on a 64-bit system, that'd be awesome :)
Internally, Matlab always uses linear indices to access elements in an array (it probably just uses a C-style array or so), which implies for your adj matrix that its final element is
finEl = nNodes*nNodes = 2.54016e+09
This, unfortunately, is larger than the maximum addressable with 31 bits. Therefore, on the 32-bit system,
>> adj(end) = 1;
??? Maximum variable size allowed by the program is exceeded.
while this command poses no problem at all on the 64-bit system.
You'll have to use a workaround on a 32-bit system:
nNodes = 50400;
% split sparse array up into 4 pieces
adj{1,1} = sparse(nNodes/2,nNodes/2); adj{1,2} = sparse(nNodes/2,nNodes/2);
adj{2,1} = sparse(nNodes/2,nNodes/2); adj{2,2} = sparse(nNodes/2,nNodes/2);
% assign or index values to HUGE sparse arrays
function ret = indHuge(mat, inds, vals)
    % get size of cell
    sz = size(mat);
    % return current values when not given new values
    if nargin < 3
        % I have to leave this up to you...
    % otherwise, assign new values
    else
        % I have to leave this up to you...
    end
end
% now initialize desired elements to 1
adj = indHuge(adj, sub2ind([nNodes nNodes], ind, ind + 1), 1);
I just had the idea to cast all this into a proper class, so that you can use much more intuitive syntax...but that's a whole lot more than I have time for now :)
adj = sparse(ind, ind + 1, ones(size(ind)), nNodes, nNodes, length(ind));
This worked fine...
And, if we have to access the last element of the sparse matrix, we can access it by adj(nNodes, nNodes), but adj(nNodes * nNodes) throws an error.
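To illustrate that last point, a small sketch of the behaviour on a 32-bit MATLAB, reusing nNodes = 50400 from above:
adj(nNodes, nNodes) = 1;     % fine: each subscript is well below the limit
adj(nNodes * nNodes) = 1;    % fails: the linear index 2.54016e+09 exceeds maxSize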

OpenCV 2.3 camera calibration

I'm trying to use OpenCV 2.3 python bindings to calibrate a camera. I've used the data below in matlab and the calibration worked, but I can't seem to get it to work in OpenCV. The camera matrix I setup as an initial guess is very close to the answer calculated from the matlab toolbox.
import cv2
import numpy as np
obj_points = [[-9.7,3.0,4.5],[-11.1,0.5,3.1],[-8.5,0.9,2.4],[-5.8,4.4,2.7],[-4.8,1.5,0.2],[-6.7,-1.6,-0.4],[-8.7,-3.3,-0.6],[-4.3,-1.2,-2.4],[-12.4,-2.3,0.9], [-14.1,-3.8,-0.6],[-18.9,2.9,2.9],[-14.6,2.3,4.6],[-16.0,0.8,3.0],[-18.9,-0.1,0.3], [-16.3,-1.7,0.5],[-18.6,-2.7,-2.2]]
img_points = [[993.0,623.0],[942.0,705.0],[1023.0,720.0],[1116.0,645.0],[1136.0,764.0],[1071.0,847.0],[1003.0,885.0],[1142.0,887.0],[886.0,816.0],[827.0,883.0],[710.0,636.0],[837.0,621.0],[789.0,688.0],[699.0,759.0],[768.0,800.0],[697.0,873.0]]
obj_points = np.array(obj_points)
img_points = np.array(img_points)
w = 1680
h = 1050
size = (w,h)
camera_matrix = np.zeros((3, 3))
camera_matrix[0,0]= 2200.0
camera_matrix[1,1]= 2200.0
camera_matrix[2,2]=1.0
camera_matrix[2,0]=750.0
camera_matrix[2,1]=750.0
dist_coefs = np.zeros(4)
results = cv2.calibrateCamera(obj_points, img_points,size,
camera_matrix, dist_coefs)
First off, your camera matrix is wrong. If you read the documentation, it should look like:
fx 0 cx
0 fy cy
0 0 1
If you look at yours, you've got it the wrong way round:
fx 0 0
0 fy 0
cx cy 1
So first, set camera_matrix to camera_matrix.T (or change how you construct camera_matrix. Remember that camera_matrix[i,j] is row i, column j).
camera_matrix = camera_matrix.T
Next, I ran your code and I see that "can't seem to get it to work" means the following error (by the way - always say what you mean by "can't seem to get it to work" in your questions - if it's an error, post the error. If it runs but gives you weirdo numbers, say so):
OpenCV Error: Assertion failed (ni >= 0) in collectCalibrationData, file /home/cha66i/Downloads/OpenCV-2.3.1/modules/calib3d/src/calibration.cpp, line 3161
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
cv2.error: /home/cha66i/Downloads/OpenCV-2.3.1/modules/calib3d/src/calibration.cpp:3161: error: (-215) ni >= 0 in function collectCalibrationData
I then read the documentation (very useful by the way) and noticed that obj_points and img_points have to be vectors of vectors, because it is possible to feed in sets of object/image points for multiple images of the same chessboard(/calibration points).
Hence:
cv2.calibrateCamera([obj_points], [img_points],size, camera_matrix, dist_coefs)
What? I still get the same error?!
Then, I had a look at the OpenCV python2 samples (in the folder OpenCV-2.x.x/samples/python2), and noticed a calibration.py showing me how to use the calibration functions (never underestimate the samples, they're often better than the documentation!).
I tried to run calibration.py but it doesn't run because it doesn't supply the camera_matrix and distCoeffs arguments, which are necessary. So I modified it to feed in a dummy camera_matrix and distCoeffs, and hey, it works!
The only difference I can see between my obj_points/img_points and theirs, is that theirs has dtype=float32, while mine doesn't.
So, I change my obj_points and img_points to also have dtype float32 (the python2 interface to OpenCV is funny like that; often functions don't work when matrices don't have a dtype):
obj_points = obj_points.astype('float32')
img_points = img_points.astype('float32')
Then I try again:
>>> cv2.calibrateCamera([obj_points], [img_points],size, camera_matrix, dist_coefs)
OpenCV Error: Bad argument
(For non-planar calibration rigs the initial intrinsic matrix must be specified)
in cvCalibrateCamera2, file ....
What?! A different error at least. But I did supply an initial intrinsic matrix!
So I go back to the documentation, and notice the flags parameter:
flags – Different flags that may be zero or a combination of the
following values:
CV_CALIB_USE_INTRINSIC_GUESS cameraMatrix contains valid initial
values of fx, fy, cx, cy that are optimized further
...
Aha, so I have to tell the function explicitly to use the initial guesses I provided:
cv2.calibrateCamera([obj_points], [img_points],size, camera_matrix.T, dist_coefs,
flags=cv2.CALIB_USE_INTRINSIC_GUESS)
Hurrah! It works!
(Moral of the story - read the OpenCV documentation carefully, and use the newest version (i.e. on opencv.itseez.com) if you're using the Python cv2 interface. Also, consult the examples in the samples/python2 directory to supplement the documentation. With these two things you should be able to work out most problems.)
After the help from mathematical.coffee, I have got this 3D calibration to run.
import cv2
from cv2 import cv
import numpy as np
obj_points = [[-9.7,3.0,4.5],[-11.1,0.5,3.1],[-8.5,0.9,2.4],[-5.8,4.4,2.7],[-4.8,1.5,0.2],[-6.7,-1.6,-0.4],[-8.7,-3.3,-0.6],[-4.3,-1.2,-2.4],[-12.4,-2.3,0.9],[-14.1,-3.8,-0.6],[-18.9,2.9,2.9],[-14.6,2.3,4.6],[-16.0,0.8,3.0],[-18.9,-0.1,0.3],[-16.3,-1.7,0.5],[-18.6,-2.7,-2.2]]
img_points = [[993.0,623.0],[942.0,705.0],[1023.0,720.0],[1116.0,645.0],[1136.0,764.0],[1071.0,847.0],[1003.0,885.0],[1142.0,887.0],[886.0,816.0],[827.0,883.0],[710.0,636.0],[837.0,621.0],[789.0,688.0],[699.0,759.0],[768.0,800.0],[697.0,873.0]]
obj_points = np.array(obj_points,'float32')
img_points = np.array(img_points,'float32')
w = 1680
h = 1050
size = (w,h)
camera_matrix = np.zeros((3, 3),'float32')
camera_matrix[0,0]= 2200.0
camera_matrix[1,1]= 2200.0
camera_matrix[2,2]=1.0
camera_matrix[0,2]=750.0
camera_matrix[1,2]=750.0
dist_coefs = np.zeros(4,'float32')
retval,camera_matrix,dist_coefs,rvecs,tvecs = cv2.calibrateCamera([obj_points],[img_points],size,camera_matrix,dist_coefs,flags=cv.CV_CALIB_USE_INTRINSIC_GUESS)
The only problem I have now is why the dist_coefs vector is 5 elements long when returned from the calibration function. The documentation says "if the vector contains four elements, it means that K3=0", but in fact K3 is used no matter the length of dist_coefs (4 or 5). Furthermore, I can't seem to get the CV_CALIB_FIX_K3 flag to work; I tried to use that flag to force K3 to be zero, but it crashes saying an integer is required. This could be because I don't know how to pass multiple flags at once; I'm just doing flags = (cv.CV..., cv.CV...).
Just to compare, from the matlab camera cal routine the results are...
Focal length: 2210. 2207.
principal point: 781. 738.
Distortions: 4.65e-2 -9.74e+0 3.9e-3 6.74e-3 0.0e+0
Rotation vector: 2.36 0.178 -0.131
Translation vector: 16.016 2.527 69.549
From this code,
Focal length: 1647. 1629.
principal point: 761. 711.
Distortions: -2.3e-1 2.0e+1 1.4e-2 -9.5e-2 -172e+2
Rotation vector: 2.357 0.199 -0.193
Translation vector: 16.511 3.307 48.946
I think if I could figure out how to force k3=0, the rest of the values would align right up.
For what it is worth, the following code snippet currently works under 2.4.6.1:
pattern_size = (16, 12)
pattern_points = np.zeros( (np.prod(pattern_size), 3), np.float32)
pattern_points[:, :2] = np.indices(pattern_size).T.reshape(-1, 2).astype(np.float32)
img_points = pattern_points[:, :2] * 2 + np.array([40, 30], np.float32)
print(cv2.calibrateCamera([pattern_points], [img_points], (400, 400), flags=cv2.CALIB_USE_INTRINSIC_GUESS))
Note that camera_matrix and dist_coefs are not needed.
Make the dist_coeffs vector a 5-element zero vector and then use the CV_CALIB_FIX_K3 flag. You will see that the last element in the vector (K3) comes back as zero.
When it comes to using multiple flags, you can OR them together.
Example: cv.CV_CALIB_USE_INTRINSIC_GUESS | cv.CV_CALIB_FIX_K3
Use Point3f and Point2f instead of Point3d and Point2d to define object_points and image_points and it will work.