I'm trying to implement a DVB-S2 BCH(48600, 48408) decoder and I'm having trouble finding the roots of the locator polynomial. For the Chien search here, the author initialises the registers to account for the code being shortened, subtracting 48600 from (2^16 - 1). Why?
This is the code I have so far:
function [error_locations, errors] = compute_chien_search(n, locator_polynomial, field, alpha_powers, n_max)
    t = length(locator_polynomial);
    error_locations = zeros(t, 1);
    errors = 0;
    % Init the registers with the locator polynomial coefficients.
    coefficient_buffer = locator_polynomial;
    alpha_degrees = uint32(1:t)';
    alpha_polynoms = field(alpha_degrees);
    alpha_polynoms = [1; alpha_polynoms];
    for i = 1:n
        for j = 2:t
            coefficient_buffer(j) = gf_mul_elements(coefficient_buffer(j), ...
                                                    alpha_polynoms(j), ...
                                                    field, alpha_powers, n_max);
        end
        % Compute locator polynomial at position i
        tmp = 0;
        for j = 2:t
            tmp = bitxor(tmp, coefficient_buffer(j));
        end
        % Signal the error
        if tmp == 1
            errors = errors + 1;
            error_locations(errors) = n_max - i + 1;
        end
    end
end
It almost gives me the correct result except for some error locations. For example: for errors made in positions
418 14150 24575 25775 37403
The code above gives me
48183 47718 34451 24026 22826
which after subtracting from 48600 gives:
417 882 14149 24574 25774
which is the position minus 1, except for the 37403, which it did not find.
What am I missing?
Edit:
The code in question is the DVB-S2 12-error-correcting BCH(48600, 48408) code. The generator polynomial has degree 192 and is obtained by multiplying the 12 minimal polynomials given in the standard's documentation.
Update - I created an example C program using Windows | Visual Studio for BCH(48600,48408). On my desktop (Intel 3770K 3.5 GHz, Win 7, VS 2015), encode takes about 30 us and a 12 bit error correction takes about 4.5 ms. On my laptop (Intel Core i7-10510U up to 4.9 GHz, Win 10, VS 2019), a 12 bit error correction takes about 3.0 ms. I used a carryless multiply intrinsic to simplify generating the 192 bit polynomial, but this is a constant that is generated only once. Encode uses a [256][3] table of 64 bit unsigned integers (192 bit polynomials) and decode uses a [256][12] table of 16 bit unsigned integer syndromes, so that both process a byte at a time.
The code includes both Berlekamp Massey and Sugiyama extended Euclid decoders that I copied from existing RS code I have. For BCH (not RS) code, the Berlekamp Massey discrepancy will be zero on odd steps, so for odd steps, the discrepancy is not calculated (the iteration count since last update is incremented, the same as when a calculated discrepancy is zero). I didn't see a significant change in running time, but I left the check in there.
The run times are about the same for BM or Euclid.
https://github.com/jeffareid/misc/blob/master/bch48600.c
I suspect an overflow problem in the case of a failure at bit error index 37403, since it is the only bit index > 2^15-1 (32767). There is this comment on that web site:
This code is great. However, it does not work for the large block sizes in the DVB-S2
specification. For example, it doesn't work with:
n = 16200;
n_max = 65535;
k_max = 65343;
t = 12;
prim_poly = 65581;
The good news is that the problem is easy to fix. Replace all the uint16() functions
in the code with uint32(). You will also have to run the following Matlab function
once. It took several hours for gftable() to complete on my computer.
gftable(16, 65581); (hex 1002D => x^16 + x^5 + x^3 + x^2 + 1)
The Chien search should be looking for roots among the values 1/(2^0) to 1/(2^48599), where 2 denotes the field element alpha. Negating the log of each root gives the offset relative to the rightmost bit of the message, and 48599 - offset gives the index relative to the leftmost bit of the message.
If the coefficients of the error locator polynomial are reversed, then the Chien search would be looking for values 2^(0) to 2^(48599).
https://en.wikipedia.org/wiki/Reciprocal_polynomial
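The mapping from roots to bit indexes described above can be sketched on a toy field. This is a minimal Python illustration under stated assumptions, not the DVB-S2 decoder: it uses GF(2^4) with the primitive polynomial x^4 + x + 1 instead of GF(2^16), builds the locator directly from known error offsets instead of running Berlekamp-Massey, and all function names are made up for the example.

```python
def build_tables(m=4, prim_poly=0b10011):
    """Return (exp, log, order) for GF(2^m) with alpha = 2 (toy field, assumption)."""
    order = (1 << m) - 1
    exp = [0] * (2 * order)
    log = [0] * (order + 1)
    x = 1
    for i in range(order):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & (1 << m):
            x ^= prim_poly
    for i in range(order, 2 * order):
        exp[i] = exp[i - order]
    return exp, log, order


def gf_mul(a, b, exp, log):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return exp[log[a] + log[b]]


def locator_from_offsets(offsets, exp, log):
    """Lambda(x) = product of (1 + alpha^e * x), coefficients lowest degree first."""
    poly = [1]
    for e in offsets:
        new = poly + [0]
        for d in range(len(poly)):
            new[d + 1] ^= gf_mul(poly[d], exp[e], exp, log)
        poly = new
    return poly


def chien_search(locator, n, exp, log, order):
    """Test alpha^(-offset) for offset = 0..n-1 only (shortened length n).

    Returns indexes counted from the leftmost bit, i.e. n - 1 - offset.
    """
    positions = []
    for offset in range(n):
        inv_power = (order - offset) % order  # exponent of alpha^(-offset)
        acc = 0
        for degree, coeff in enumerate(locator):
            if coeff:
                acc ^= exp[(log[coeff] + degree * inv_power) % order]
        if acc == 0:
            positions.append(n - 1 - offset)
    return positions


exp, log, order = build_tables()
locator = locator_from_offsets([1, 7], exp, log)  # offsets from the rightmost bit
print(sorted(chien_search(locator, 10, exp, log, order)))  # [2, 8]
```

With a shortened length of 10 in a field whose full length is 15, only the 10 candidate roots that correspond to real codeword bits are tested, and the offsets 1 and 7 from the right come back as indexes 8 and 2 from the left.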
I am trying to reproduce the results from an article. But so far I am not being successful. Here is the code I wrote
EDIT: Based on the initial comments of Zizy Archer, the code has been revised.
clear;
Nmax = 30; % number of rounds
M = 10000; % number of simulations
beta0 = 5*10^-6; % relative clock offset in micro seconds
alpha0 = 1.01; % relative clock skew
for simN = 1:M
    for N = 1:Nmax
        mean_dly = randi([20 50],N,1).*10^-6; % micro seconds
        stdd_dly = randi([1 5],N,1).*10^-6; % micro seconds
        XpropDly = normrnd(mean_dly,stdd_dly,N,1); % micro seconds
        YpropDly = normrnd(mean_dly,stdd_dly,N,1); % micro seconds
        prcssTme = randi([100 500],N,1).*10^-6; % micro seconds
        T_1 = (1:N)'*10^-3; % milli seconds
        T_2 = T_1 + XpropDly; % milli seconds
        T_3 = T_2 + prcssTme; % milli seconds
        T_4 = T_3 + YpropDly; % milli seconds
        % actual time
        T_2act = (T_1 + XpropDly).*alpha0 + beta0;
        T_3act = (T_4 - YpropDly).*alpha0 + beta0;
        % equation 13
        A = sum(T_2act(1:N) + T_3act(1:N));
        B = sum(T_1(1:N) + T_4(1:N));
        C = sum((T_2act(1:N) + T_3act(1:N)).^2);
        D = sum((T_2act(1:N) + T_3act(1:N)).*(T_1(1:N) + T_4(1:N)));
        % equation 16
        alpha0est(simN,N) = (A.^2-C.*Nmax)./(A.*B-D.*Nmax);
        beta0est(simN,N) = (B.*C-A.*D)./(2.*(A.*B-D.*Nmax));
    end
    timestamps = [T_1 T_2 T_3 T_4];
    clear T_*;
end
% equation 29 and 30
MSE_alpha = sum((alpha0est - alpha0).^2)/M;
MSE_beta = sum((beta0est - beta0).^2)/M;
figure %2(a)
semilogy(1:Nmax,MSE_beta.*10^12)
xlabel('N');ylabel('MSE of the estimated offset \beta_{0}')
figure %2(b)
semilogy(1:Nmax,MSE_alpha)
xlabel('N');ylabel('MSE of the estimated skew \alpha_{0}')
But this is what I get:
EDIT2: Snippets were removed.
Can anyone tell me what is wrong with my code?
Thanking you all in advance.
Next time at least try some rudimentary debugging yourself to figure out what could be wrong.
To debug, perhaps print out some variables or plot stuff. Put some conditionals to check if the values are somewhat expected or make no sense. It isn't that hard if you know something is wrong in such a small piece of code (however, if there was no paper you were trying to hit, this bug might have been lurking for a while).
Well, to the step-by-step solution in this case:
What you immediately notice is that if you plot alpha0est or beta0est, your estimate for alpha is systematically too high, at 1.015 instead of 1.01 for the single round case, similar for beta.
Now, what could it be? It obviously isn't the noise from processing time or delays: that noise shows up as all the hairy stuff around the mean, and you can set all delays to 0 to verify this. So, it has to be something else.
Looking further, you can notice that this systematic error is decreasing when you increase number of rounds performed, and is gone for full 30 rounds.
So, it has to be something with the number of rounds you are doing. Now try setting N = 10 instead of 30, whoa now 10 round case is fine. And there you have your bug. Equation 13 from the paper - there you have N elements summed. Equation 16 similarly multiplies with N. This N obviously has to be the same number. But as it turns out, in your code it isn't. Equation 13 in your code sums ROUNDS cases. Could be 1, could be 30. Equation 16 multiplies with N (=30, always).
All this could be easily avoided if you used saner variable names (all-caps, really?). If you used N for number of rounds performed, and maxN as the limit how many rounds you can try doing at maximum, you would easily get it right.
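The effect of the mismatched count can be reproduced without any noise. Below is a minimal Python sketch, assuming the estimator has the form used in the question's code (sums A, B, C, D, then the two ratios) and that the propagation delay is the same in both directions; the function names and the fixed delay values are made up for the illustration. In the revised code in the question, the sums run over N elements while the two ratios multiply by Nmax.

```python
def estimate_skew_offset(t1, t2, t3, t4, count):
    """Estimator in the form used in the question; `count` is the multiplier
    that must equal the number of summed rounds."""
    s = [a + b for a, b in zip(t2, t3)]           # T_2act + T_3act
    u = [a + b for a, b in zip(t1, t4)]           # T_1 + T_4
    sum_s = sum(s)                                # A
    sum_u = sum(u)                                # B
    sum_ss = sum(v * v for v in s)                # C
    sum_su = sum(v * w for v, w in zip(s, u))     # D
    denom = sum_s * sum_u - sum_su * count
    alpha = (sum_s ** 2 - sum_ss * count) / denom
    beta = (sum_u * sum_ss - sum_s * sum_su) / (2 * denom)
    return alpha, beta


def noise_free_timestamps(rounds, alpha=1.01, beta=5e-6,
                          delay=30e-6, processing=300e-6):
    """Timestamps with identical, constant delays (assumption for the demo)."""
    t1 = [(k + 1) * 1e-3 for k in range(rounds)]
    t4 = [t + 2 * delay + processing for t in t1]
    t2 = [(t + delay) * alpha + beta for t in t1]
    t3 = [(t - delay) * alpha + beta for t in t4]
    return t1, t2, t3, t4


rounds, max_rounds = 5, 30
stamps = noise_free_timestamps(rounds)
print(estimate_skew_offset(*stamps, count=rounds))      # matching count
print(estimate_skew_offset(*stamps, count=max_rounds))  # mismatched count
```

With the matching count the noise-free estimate recovers the skew and offset up to rounding, while the mismatched count shifts the skew estimate even though there is no noise at all. Note that a single round with a matching count gives 0/0, so the sketch uses at least two rounds.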
I noticed that for example the log(x) function returns slightly different values when called with vectors of different sizes in MATLAB.
Here is a minimal working example:
x1 = 0.1:0.1:1;
x2 = 0.1:0.1:1.1;
y1 = log(x1);
y2 = log(x2);
d = y1 - y2(1:length(x1));
d(7)
Executing returns:
>> ans =
-1.6653e-16
The behaviour seems to start when the vector becomes greater than 10 entries.
Although the difference is very small, when being scaled by a lot of operations using large vectors, the errors became big enough to notice.
Does anyone have an idea what is happening here?
The differences exist in x1 and x2 and those errors are propagated and potentially accentuated by log.
max(abs(x1 - x2(1:numel(x1))))
% 1.1102e-16
This is due to the inability of floating point number to represent your data exactly. See here for more information.
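To see how the same nominal value can end up as two different doubles, it helps to compute it along two arithmetic paths. The following Python sketch (standard library only) is an illustration under an assumption: it guesses that an element near the end of the range may be reached by counting down from the endpoint, which is not something stated in the posts here. It also shows a construction that counts up from the start only, for comparison.

```python
step = 0.1

# Nominal value 0.7 reached from two different endpoints (assumed paths).
from_1_0 = 1.0 - 3 * step
from_1_1 = 1.1 - 4 * step
print(from_1_0 == from_1_1)        # False: the two doubles differ
print(abs(from_1_0 - from_1_1))    # on the order of 1e-16


def count_up(start, step, n):
    """Build n values as start + k*step, so element k never depends on n."""
    return [start + k * step for k in range(n)]


short = count_up(0.1, 0.1, 10)
longer = count_up(0.1, 0.1, 11)
print(short == longer[:10])        # True: shared prefix is bit-identical
```

When every element depends only on the start, the step, and its own index, vectors of different lengths agree on their shared prefix, so anything computed from them (such as log) agrees as well.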
Per Suever’s answer, this is because for unfathomable reasons, Matlab’s colon operator [start : step : stop] with floating-point step produces non-bit-exact results even when start and step are the same, and only stop is different.
This is wrong, although it's not unknown: in a blog post from 2006 (search for "Typical MATLAB Pitfall"), Loren notes that the colon operator (:) can suffer from floating-point accuracy issues.
Numpy/Python does it right:
import numpy as np
np.all(np.arange(0.1,1.0+1e-4, 0.1) == np.arange(0.1, 1.1+1e-4, 0.1)[:-1]) # => True
(np.arange(start, stop, step) doesn’t include stop so I use stop+1e-4 above.)
Julia does it right too:
all(collect(0.1 : 0.1 : 1) .== collect(0.1 : 0.1 : 1.1)[1:10]) # => true
Alternative. Here’s a straightforward guess as to what Numpy’s arange is doing, in Matlab:
function y = arange(start, stop, step)
%ARANGE An alternative to Matlab's colon operator
%
% The API for this function follows Numpy's arange [1].
%
% ARANGE(START, STOP, STEP) produces evenly-spaced values within the half-open
% interval [START, STOP). The resulting vector has CEIL((STOP - START) / STEP)
% elements and is roughly equivalent to (START : STEP : STOP - STEP / 2), but
% may differ from this COLON-based version due to numerical differences.
%
% ARANGE(START, STOP) assumes STEP of 1.0.
%
% [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html
if nargin < 3 || isempty(step)
    step = 1.0;
end
len = ceil((stop - start) / step);
y = start + (0 : len - 1) * step;
This function tries to keep things exact until the last possible moment, when it applies the scaling by step and shifting by start. With this, your original two vectors are bit-exact over their shared interval:
y1 = arange(0.1, 1.0 + 1e-4, 0.1);
y2 = arange(0.1, 1.1 + 1e-4, 0.1);
all(y2(1:numel(y1)) == y1) % => 1
And therefore all downstream operations like log are also bit-exact.
I will investigate whether this bug in Matlab is causing any problems in our internal code and check if we should enforce using linspace (which I believe, but have not checked, does not suffer as much from accuracy issues) or something like arange above instead of : for floating-point steps. (arange also can be tricky because, as the docs note, depending on (stop-start)/step, you may get a vector whose last element is greater than stop sometimes—those same docs also recommend using linspace with non-unit steps.)
I'm working on some Matlab code to perform something called the Index Calculus attack on a given cryptosystem (this involves calculating discrete log values), and I've gotten it all done except for one small thing. I can't figure out (in Matlab) how to solve a linear system of congruences mod p, where p is not prime. Also, this system has more than one variable, so, unless I'm missing something, the Chinese remainder theorem won't work.
I asked a question on the mathematics stackexchange with more detail/formatted mathjax here. I solved the issue in my question at that link, and now I'm attempting to find a utility that will allow me to solve the system of congruences modulo a non-prime. I did find a suite that includes a solver supporting modular arithmetic, but the modulus must be prime (here). I also tried stepping through to modify it to work with non-primes, but whatever method is used doesn't work, because it requires all elements of the system have inverses modulo p.
I've looked into using the ability in Matlab to call MuPAD functions, but from my testing, the MuPAD function linsolve (which seemed to be the best candidate) doesn't support non-prime modulus values either. Additionally, I've verified with Maple that this system is solvable modulo my integer of interest (8), so it can be done.
To be more specific, this is the exact command I'm trying to run in MuPAD:
linsolve([0*x + 5*y + 4*z + q = 2946321, x + 7*y + 2*q = 5851213, 8*x + y + 2*q = 2563617, 10*x + 5*y + z = 10670279],[x,y,z,q], Domain = Dom::IntegerMod(8))
Error: expecting 'Domain=R', where R is a domain of category 'Cat::Field' [linsolve]
The same command returns correct values if I change the domain to IntegerMod(23) and IntegerMod(59407), so I believe 8 is unsuitable because it's not prime. Here is the output when I try the above command with each 23 and 59407 as my domain:
[x = 1 mod 23, y = 1 mod 23, z = 12 mod 23, q = 14 mod 23]
[x = 14087 mod 59407, y = 1 mod 59407, z = 14365 mod 59407, q = 37320 mod 59407]
These answers are correct- x, y, z, and q correspond to L1, L2, L3, and L4 in the system of congruences located at my Math.StackExchange link above.
I'm wondering if you tried to use sym/linsolve and sym/solve previously, but may have passed in numeric rather than symbolic values. For example, this returns nonsense in terms of what you're looking for:
A = [0 5 4 1;1 7 0 2;8 1 0 2;10 5 1 0];
b = [2946321;5851213;2563617;10670279];
s = mod(linsolve(A,b),8)
But if you convert the numeric values to symbolic integers, sym/linsolve will keep everything in terms of rational fractions. Then
s = mod(linsolve(sym(A),sym(b)),8)
returns the expected answer
s =
6
1
6
4
This just solves the linear system using symbolic math as if it were a normal matrix. For large systems this can be expensive, but I'd imagine no more than using MuPAD's numeric::linsolve or linalg::matlinsolve. sym/mod should return the modulus of the numerator of each solution component. I believe that you will get an error if the modulus and the denominator are not at least coprime.
sym/solve can also be used to solve this in a similar manner:
L = sym('L',[4,1]);
[L1,L2,L3,L4] = solve(A*L==b);
s = mod([L1;L2;L3;L4],8)
A possible issue with using either sym/solve or sym/linsolve is that if there are multiple solutions to the linear congruence problem (as opposed to the linear system), this approach may not return all of them.
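The same idea (solve exactly over the rationals, then reduce modulo m) can be sketched outside the Symbolic Toolbox. This is a minimal Python illustration, assuming the matrix is nonsingular over the rationals and that every denominator in the exact solution is coprime to the modulus; the function names are made up, and pow(x, -1, m) needs Python 3.8 or later.

```python
from fractions import Fraction


def solve_exact(A, b):
    """Gauss-Jordan elimination over the rationals (assumes A is nonsingular)."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)]
           for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick any row at or below the diagonal with a nonzero entry.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def reduce_mod(x, m):
    """Map the fraction p/q to p * q^(-1) mod m; raises ValueError if q has
    no inverse modulo m."""
    return (x.numerator * pow(x.denominator, -1, m)) % m


A = [[0, 5, 4, 1], [1, 7, 0, 2], [8, 1, 0, 2], [10, 5, 1, 0]]
b = [2946321, 5851213, 2563617, 10670279]
exact = solve_exact(A, b)
print([reduce_mod(x, 8) for x in exact])    # [6, 1, 6, 4]
print([reduce_mod(x, 23) for x in exact])   # [1, 1, 12, 14]
```

Like the symbolic approach, this returns a single solution and fails when a denominator shares a factor with the modulus, so it does not enumerate multiple solutions of the congruence system.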
Finally, using the MuPAD function numlib::ichrem (Chinese remainder theorem for integers), here's some code that attempts to obtain the complete solution:
A = [0 5 4 1;1 7 0 2;8 1 0 2;10 5 1 0];
b = [2946321;5851213;2563617;10670279];
m = 10930888;
mf = str2num(strrep(char(factor(sym(m))),'*',' '));
A = sym(A);
b = sym(b);
s = sym(zeros(length(b),length(mf)));
for i = 1:length(mf)
    s(:,i) = mod(linsolve(A,b),mf(i));
end
mstr = ['[' sprintf('%d,',mf)];
mstr(end) = ']';
r = sym(zeros(length(b),1));
for i = 1:length(b)
    sstr = char(s(i,:));
    r(i) = feval(symengine,'numlib::ichrem',sstr(9:end-2),mstr);
end
check = isequal(mod(A*r,m),b)
I'm not sure if any of this is what you're looking for, but hopefully it might be helpful. I think that it might be a good idea to put in an enhancement/service request with the MathWorks so that MuPAD and the other solvers can handle such systems better in the future.
I'm using PTX from MATLAB to call CUDA kernels. When testing the code in VS 2010, I launch the kernel like this:
int TPB = 256;
int BPG = (Nx + TPB -1 ) / TPB;
dim3 blk(TPB,TPB,1);
dim3 grid(BPG ,BPG,1);
grad<<< grid,blk>>>(dev_y,dev_x,dev_b,dev_t,Nx,Ny);
Then I try to use the same configuration in MATLAB:
TPB = 16;
BPG = floor((Nx + TPB -1 ) / TPB);
grad = parallel.gpu.CUDAKernel('reg.ptx','reg.cu','grad');
grad.ThreadBlockSize=[TPB TPB 1];
grad.GridSize = [BPG BPG];
I knew this was more than the 512 threads per block allowed on my Tesla C1060, and I was right:
Invalid size input to kernel ThreadBlockSize. You must provide a vector of up to 3 positive integers whose product is <= 512. The maximum value in each dimension is: [512,512,64].
Is there any explanation for why it runs in VS 2010 without an error, unlike what happened in MATLAB?
The C++ code segment is not checking for errors after grad<<<>>>. The MATLAB wrapper has additional error checking. The launch configuration is out of bounds. Calling cudaGetLastError after the <<<>>> will report the launch configuration error.
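The kind of check the MATLAB wrapper performs can be sketched in a few lines. This Python illustration only mimics the validation; the limits are taken from the error message quoted in the question, and the function names are made up for the example (they are not part of CUDA or MATLAB).

```python
def check_block_size(block_dim, max_threads=512, max_dim=(512, 512, 64)):
    """Return (ok, total_threads) for a 3-element thread block size.

    Default limits are the ones quoted in the MATLAB error message above.
    """
    total = 1
    for size in block_dim:
        total *= size
    within_dims = all(size <= limit for size, limit in zip(block_dim, max_dim))
    return (within_dims and total <= max_threads), total


def blocks_needed(n, threads):
    """Ceiling division, as in (Nx + TPB - 1) / TPB with integer arithmetic."""
    return (n + threads - 1) // threads


print(check_block_size((256, 256, 1)))   # (False, 65536)
print(check_block_size((16, 16, 1)))     # (True, 256)
print(blocks_needed(1000, 16))           # 63
```

A 256-by-256 block asks for 65536 threads, far above the limit, so a launch with that configuration cannot succeed; without an explicit error check after the launch the C++ program simply does not report it.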
I am currently using the Toolbox Graph on the Matlab File Exchange to calculate curvature on 3D surfaces and find it very helpful (http://www.mathworks.com/matlabcentral/fileexchange/5355). However, the following error message is issued in "compute_curvature" for certain surface descriptions and the code fails to run completely:
> Error in ==> compute_curvature_mod at 75
> dp = sum( normal(:,E(:,1)) .* normal(:,E(:,2)), 1 );
> ??? Index exceeds matrix dimensions.
This happens only sporadically, but there is no obvious reason why the toolbox works perfectly fine for some surfaces and not for others (of a similar topology). I also noticed that someone had asked about this bug back in November 2009 on File Exchange, but that the question had gone unanswered. The post states
"compute_curvature will generate an error on line 75 ("dp = sum(
normal(:,E(:,1)) .* normal(:,E(:,2)), 1 );") for SOME surfaces. The
error stems from E containing indices that are out of range which is
caused by line 48 ("A = sparse(double(i),double(j),s,n,n);") where A's
values eventually entirely make up the E matrix. The problem occurs
when the i and j vectors create the same ordered pair twice in which
case the sparse function adds the two s vector elements together for
that matrix location resulting in a value that is too large to be used
as an index on line 75. For example, if i = [1 1] and j = [2 2] and s
= [3 4] then A(1,2) will equal 3 + 4 = 7.
The i and j vectors are created here:
i = [face(1,:) face(2,:) face(3,:)];
j = [face(2,:) face(3,:) face(1,:)];
Just wanted to add that the error I mentioned is caused by the
flipping of the sign of the surface normal of just one face by
rearranging the order of the vertices in the face matrix"
I have tried debugging the code myself but have not had any luck. I am wondering if anyone here has solved the problem or could give me insight – I need the code to be sufficiently general-purpose in order to calculate curvature for a variety of surfaces, not just for a select few.
The November 2009 bug report on File Exchange traces the problem back to the behavior of sparse:
S = SPARSE(i,j,s,m,n,nzmax) uses the rows of [i,j,s] to generate an
m-by-n sparse matrix with space allocated for nzmax nonzeros. The
two integer index vectors, i and j, and the real or complex entries
vector, s, all have the same length, nnz, which is the number of
nonzeros in the resulting sparse matrix S . Any elements of s
which have duplicate values of i and j are added together.
The lines of code where the problem originates are here:
i = [face(1,:) face(2,:) face(3,:)];
j = [face(2,:) face(3,:) face(1,:)];
s = [1:m 1:m 1:m];
A = sparse(i,j,s,n,n);
Based on this information, removing the repeated (i, j) index pairs, presumably using unique or similar, might solve the problem:
[B,I,J] = unique([i.' j.'],'rows');
i = B(:,1).';
j = B(:,2).';
s = s(I);
The full solution may look something like this:
i = [face(1,:) face(2,:) face(3,:)];
j = [face(2,:) face(3,:) face(1,:)];
s = [1:m 1:m 1:m];
[B,I,J] = unique([i.' j.'],'rows');
i = B(:,1).';
j = B(:,2).';
s = s(I);
A = sparse(i,j,s,n,n);
Since I do not have a detailed understanding of the algorithm it is hard to tell whether the removal of entries will have a negative effect.
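The difference between accumulating duplicates and dropping them can be illustrated on the example from the bug report. This is a small Python sketch that mimics the two behaviours with dictionaries; it is not the toolbox code, the function names are made up, and keeping the first occurrence is a choice made for the illustration (which occurrence MATLAB's unique keeps may depend on the release and flags, so check its documentation).

```python
def accumulate_duplicates(i, j, s):
    """Mimic sparse(i, j, s): values at a repeated (i, j) pair are summed."""
    entries = {}
    for row, col, value in zip(i, j, s):
        entries[(row, col)] = entries.get((row, col), 0) + value
    return entries


def keep_first_occurrence(i, j, s):
    """Drop repeated (i, j) pairs, keeping the value seen first."""
    entries = {}
    for row, col, value in zip(i, j, s):
        entries.setdefault((row, col), value)
    return entries


i = [1, 1]
j = [2, 2]
s = [3, 4]
print(accumulate_duplicates(i, j, s))   # {(1, 2): 7}
print(keep_first_occurrence(i, j, s))   # {(1, 2): 3}
```

After deduplication every stored value is one of the original face indices, so it stays within range when used as an index, but one of the two faces sharing that directed edge is no longer recorded.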