where is cula "culaSgesv" answer for X? - lapack

I just downloaded CULA and I want to use its functions for solving systems of linear equations. I looked in the examples directory and found the code below, but it is very confusing: to obtain the solution X of A*X=B, they just copy B into X, and since A is the identity matrix, the answer is simply B. So in this line of code nothing seems to happen:
status = culaSgesv(N, NRHS, A, N, IPIV, X, N);
(changing X to B didn't help!)
Would you please tell me what's going on? How can I get the answer X from this?
If anyone needs any further information, please just tell me.
#ifdef CULA_PREMIUM
void culaDoubleExample()
{
#ifdef NDEBUG
int N = 4096;
#else
int N = 780;
#endif
int NRHS = 1;
int i;
culaStatus status;
culaDouble* A = NULL;
culaDouble* B = NULL;
culaDouble* X = NULL;
culaInt* IPIV = NULL;
culaDouble one = 1.0;
culaDouble thresh = 1e-6;
culaDouble diff;
printf("-------------------\n");
printf(" DGESV\n");
printf("-------------------\n");
printf("Allocating Matrices\n");
A = (culaDouble*)malloc(N*N*sizeof(culaDouble));
B = (culaDouble*)malloc(N*sizeof(culaDouble));
X = (culaDouble*)malloc(N*sizeof(culaDouble));
IPIV = (culaInt*)malloc(N*sizeof(culaInt));
if(!A || !B || !X || !IPIV)
exit(EXIT_FAILURE);
printf("Initializing CULA\n");
status = culaInitialize();
checkStatus(status);
// Set A to the identity matrix
memset(A, 0, N*N*sizeof(culaDouble));
for(i = 0; i < N; ++i)
A[i*N+i] = one;
// Set B to a random matrix (see note at top)
for(i = 0; i < N; ++i)
B[i] = (culaDouble)rand();
memcpy(X, B, N*sizeof(culaDouble));
memset(IPIV, 0, N*sizeof(culaInt));
printf("Calling culaDgesv\n");
DWORD dw1 = GetTickCount();
status = culaDgesv(N, NRHS, A, N, IPIV, X, N);
DWORD dw2 = GetTickCount();
cout<<"Time difference is "<<(dw2-dw1)<<" milliSeconds"<<endl;
if(status == culaInsufficientComputeCapability)
{
printf("No Double precision support available, skipping example\n");
free(A);
free(B);
free(X);
free(IPIV);
culaShutdown();
return;
}
checkStatus(status);
printf("Verifying Result\n");
for(i = 0; i < N; ++i)
{
diff = X[i] - B[i];
if(diff < 0.0)
diff = -diff;
if(diff > thresh)
printf("Result check failed: i=%d X[i]=%f B[i]=%f", i, X[i], B[i]);
}
printf("Shutting down CULA\n\n");
culaShutdown();
free(A);
free(B);
free(X);
free(IPIV);
}
#endif /* CULA_PREMIUM */

You mention Sgesv but the sample code you have shown is for Dgesv. Nevertheless, the answer is the same.
According to the Netlib LAPACK reference, the B matrix of RHS vectors is passed to the function as the 6th parameter:
[in,out] B
B is DOUBLE PRECISION array, dimension (LDB,NRHS)
On entry, the N-by-NRHS matrix of right hand side matrix B.
On exit, if INFO = 0, the N-by-NRHS solution matrix X.
And the X matrix is returned in the same parameter location. So B when passed to the function contains the NxNRHS right-hand-side vectors, and the same parameter returns the X result.
In the code you have shown, they are actually passing a variable called X and after the result is returned (in the same variable X) they are comparing it against a variable named B which is perhaps confusing, but the concept is the same.
Since the A matrix in the example is the identity matrix, the correct solution of Ax = b is x = b, so the returned X is expected to equal the original B.
For the general case, you should pass your B matrix of RHS vectors in the 6th parameter location. Upon completion of the function, the result (X) will be returned in the same parameter.
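To make the in/out convention concrete, here is a minimal sketch in C++ of a routine that follows the same calling pattern. It is not CULA or LAPACK code: solve_in_place is a hypothetical stand-in (plain Gaussian elimination with partial pivoting, one right-hand side, column-major storage) whose only purpose is to show that the array holding B on entry holds X on exit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a gesv-style routine (NOT the CULA implementation).
// a: n-by-n matrix in column-major order, overwritten during elimination.
// b: right-hand side B on entry, solution X on exit.
// Returns 0 on success, or k+1 if the k-th pivot is exactly zero.
int solve_in_place(int n, std::vector<double>& a, std::vector<double>& b) {
    assert(n >= 0);
    assert(a.size() == static_cast<std::size_t>(n) * static_cast<std::size_t>(n));
    assert(b.size() == static_cast<std::size_t>(n));
    for (int k = 0; k < n; ++k) {
        // Partial pivoting: pick the largest entry in column k at or below the diagonal.
        int p = k;
        for (int i = k + 1; i < n; ++i) {
            if (std::fabs(a[i + k * n]) > std::fabs(a[p + k * n])) p = i;
        }
        if (a[p + k * n] == 0.0) return k + 1;
        if (p != k) {
            for (int j = 0; j < n; ++j) std::swap(a[k + j * n], a[p + j * n]);
            std::swap(b[k], b[p]);
        }
        // Eliminate column k from the rows below, applying the same steps to b.
        for (int i = k + 1; i < n; ++i) {
            const double f = a[i + k * n] / a[k + k * n];
            for (int j = k; j < n; ++j) a[i + j * n] -= f * a[k + j * n];
            b[i] -= f * b[k];
        }
    }
    // Back substitution: b is overwritten with the solution.
    for (int i = n - 1; i >= 0; --i) {
        for (int j = i + 1; j < n; ++j) b[i] -= a[i + j * n] * b[j];
        b[i] /= a[i + i * n];
    }
    return 0;
}
```

With A = [2 1; 1 3] and b = (3, 5), b holds approximately (0.8, 1.4) after the call. With the identity matrix, b comes back unchanged, which is the situation the CULA example verifies.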

Related

Finding smallest value for parameterised answer that satisfies condition

I want to find the smallest integer l that satisfies l^2 >= x, and mod(l,2)=0.
In the following example x=75, and hence l=10, since the previous even number doesn't satisfy the inequality: 8^2 <= 75 <= 10^2
I have tried this (ignoring the even-number requirement, which I can't get to work):
syms l integer
eqn1 = l^2 >= 75;
% eqn2 = mod(l,2) == 0;
[sol_l, param, cond] = solve(eqn1, l, 'ReturnConditions', true);
But this does not give me anything helpful directly:
sol_l =
k
param =
k
cond =
(75^(1/2) <= k | k <= -75^(1/2)) & in(k, 'integer')
I would like to evaluate the conditions on the parameter and find the smallest value that satisfies the conditions.
Also, I would like to enforce the mod(l,2)=0 condition somehow, but I don't seem to get that to work.
Using the solve for this task is like using a cannon to kill a mosquito. Actually, the answer of Lidia Parrilla is good and fast, although it can be simplified as follows:
l = ceil(sqrt(x));
if (mod(l,2) ~= 0) % test l, not x: it is l that must be even
l = l + 1;
end
% if x = 75, then l = 10
But I would like to point out something that no one else noticed. The condition provided by the solve function for l^2 >= 75 is:
75^(1/2) <= k | k <= -75^(1/2)
and it's absolutely correct. Since l is squared, and the square of a negative number is positive, the inequality is satisfied on two ranges: a negative one and a positive one.
For x = 75, the boundary solutions are l = 10 and l = -10. Note that on the negative side there is no smallest solution in the strict sense, since every even integer below -10 also satisfies the inequality. If you want the negative boundary value instead of the positive one, use:
l = ceil(sqrt(x));
if (mod(l,2) ~= 0) % test l, not x: it is l that must be even
l = l + 1;
end
l = l * -1;
If you want to return both solutions, the result will be:
l_pos = ceil(sqrt(x));
if (mod(l_pos,2) ~= 0) % test l_pos, not x
l_pos = l_pos + 1;
end
l_neg = l_pos * -1;
l = [l_neg l_pos];
I guess the easiest solution, without employing the inequality and the solve function, would be to take the square root of x, round it up to the next integer, and then move to the next even integer if necessary. The code would look like this:
x = 75;
y = ceil(sqrt(x)); %Ceil finds the next bigger integer
if(~mod(y,2)) %If it's even, we got the solution
sol = y;
else %If not, get the next integer
sol = y+1;
end
The previous code gives the correct solution to the provided example (x = 75; sol = 10)
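The same two-step rule (round the square root up, then step to the next even integer if needed) can be sketched in C++ for readers who want it outside MATLAB. The function name smallest_even_l is made up for this illustration, and the sketch assumes x is non-negative and moderate in size, so that the rounded square root is reliable.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: smallest non-negative even integer l with l*l >= x.
// Assumes x >= 0 and x small enough that sqrt/ceil give the right integer.
long long smallest_even_l(double x) {
    assert(x >= 0.0);
    long long l = static_cast<long long>(std::ceil(std::sqrt(x)));
    if (l % 2 != 0) {
        ++l;  // ceil gave an odd number, so step up to the next even one
    }
    return l;
}
```

For x = 75 this returns 10. For x = 80 it also returns 10, which is the case where testing the parity of x instead of l would give the wrong answer.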

Determine if matrix A is subset of matrix B

For a matrix such as
A = [...
12 34 67;
90 78 15;
10 71 24];
how could we determine efficiently if it is subset of a larger matrix?
B = [...
12 34 67; % found
89 67 45;
90 78 15; % found
10 71 24; % found, so A is subset of B.
54 34 11];
Here are conditions:
all numbers are integers
matrices are so large, i.e., row# > 100000, column# may vary from 1 to 10 (same for A and B).
Edit:
It seems that ismember, for the case in this question where it is called only a few times, works just fine. My initial impression was due to previous experiences where ismember was invoked many times inside a nested loop, resulting in very poor performance.
clear all; clc
n = 200000;
k = 10;
B = randi(n,n,k);
f = randperm(n);
A = B(f(1:1000),:);
tic
assert(sum(ismember(A,B,'rows')) == size(A,1));
toc
tic
assert(all(any(all(bsxfun(@eq,B,permute(A,[3,2,1])),2),1))); %user2999345
toc
which results in:
Elapsed time is 1.088552 seconds.
Elapsed time is 12.154969 seconds.
Here are more benchmarks:
clear all; clc
n = 20000;
f = randperm(n);
k = 10;
t1 = 0;
t2 = 0;
t3 = 0;
for i=1:7
B = randi(n,n,k);
A = B(f(1:n/10),:);
%A(100,2) = 0; % to make A not submat of B
tic
b = sum(ismember(A,B,'rows')) == size(A,1);
t1 = t1+toc;
assert(b);
tic
b = ismember_mex(A,sortrows(B));
t2 = t2+toc;
assert(b);
tic
b = issubmat(A,B);
t3 = t3+toc;
assert(b);
end
Elapsed times in seconds:
                              George's       skm's
                 ismember  |  ismember_mex | issubmat
n=20000,k=10       0.6326      0.1064        11.6899
n=1000,k=100       0.2652      0.0155         0.0577
n=1000,k=1000      1.1705      0.1582         0.2202
n=1000,k=10000    13.2470      2.0033         2.6367
*issubmat eats RAM when n or k is over 10000!
*issubmat(A,B), A is being checked as submat of B.
It seems that ismember is hard to beat, at least using MATLAB code. I created a C implementation which can be compiled with the MEX compiler.
#include "mex.h"
#if MX_API_VER < 0x07030000
typedef int mwIndex;
typedef int mwSize;
#endif /* MX_API_VER */
#include <math.h>
#include <stdlib.h>
#include <string.h>
int ismember(const double *y, const double *x, int yrow, int xrow, int ncol);
void mexFunction(int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[])
{
mwSize xcol, ycol, xrow, yrow;
/* output data */
int* result;
/* arguments */
const mxArray* y;
const mxArray* x;
if (nrhs != 2)
{
mexErrMsgTxt("2 input required.");
}
y = prhs[0];
x = prhs[1];
ycol = mxGetN(y);
yrow = mxGetM(y);
xcol = mxGetN(x);
xrow = mxGetM(x);
/* Both inputs must be of type double. */
if (!mxIsDouble(y) || !mxIsDouble(x))
{
mexErrMsgTxt("Input must be of type 'double'.");
}
if (xcol != ycol)
{
mexErrMsgTxt("Inputs must have the same number of columns");
}
plhs[0] = mxCreateLogicalMatrix(1, 1);
result = mxGetPr(plhs[0]);
*result = ismember(mxGetPr(y), mxGetPr(x), yrow, xrow, ycol);
}
int ismemberinner(const double *y, int idx, const double *x, int yrow, int xrow, int ncol) {
int from, to, i;
from = 0;
to = xrow-1;
for(i = 0; i < ncol; ++i) {
// Perform binary search
double yi = *(y + i * yrow + idx);
double *curx = x + i * xrow;
int l = from;
int u = to;
while(l <= u) {
int mididx = l + (u-l)/2;
if(yi < curx[mididx]) {
u = mididx-1;
}
else if(yi > curx[mididx]) {
l = mididx+1;
}
else {
// This can be further optimized by performing additional binary searches
for(from = mididx; from > l && curx[from-1] == yi; --from);
for(to = mididx; to < u && curx[to+1] == yi; ++to);
break;
}
}
if(l > u) {
return 0;
}
}
return 1;
}
int ismember(const double *y, const double *x, int yrow, int xrow, int ncol) {
int i;
for(i = 0; i < yrow; ++i) {
if(!ismemberinner(y, i, x, yrow, xrow, ncol)) {
return 0;
}
}
return 1;
}
Compile it using:
mex -O ismember_mex.c
It can be called as follows:
ismember_mex(y, sortrows(x))
First of all, it assumes that both matrices have the same number of columns. It works by first sorting the rows of the larger matrix (x in this case, the second argument to the function). Then, a type of binary search is employed to identify whether the rows of the smaller matrix (y hereafter) are contained in x. This is done for each row of y separately (see the ismember C function).
For a given row of y, it starts from the first entry and finds the range of indices (using the from and to variables) that match with the first column of x using binary search. This is repeated for the remaining entries, unless some value is not found, in which case it terminates and returns 0.
I tried implementing this idea in MATLAB, but it didn't work that well. Regarding performance, I found that: (a) in case there are mismatches, it is usually much faster than ismember, (b) in case the range of values in x and y is large, it is again faster than ismember, and (c) in case everything matches and the number of possible values in x and y is small (e.g. less than 1000), then ismember may be faster in some situations.
Finally, I want to point out that some parts of the C implementation may be further optimized.
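The sort-then-binary-search idea described above can also be sketched with the C++ standard library. This is a simplified stand-in, not the MEX code: rows are stored as std::vector<int> (the question says all numbers are integers), rows_subset is a hypothetical name, and each row of the smaller matrix is looked up as a whole with std::binary_search instead of narrowing an index range column by column.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Row = std::vector<int>;

// Hypothetical helper: true if every row of a also occurs as a row of b.
// b is taken by value so that it can be sorted without touching the caller's copy.
bool rows_subset(const std::vector<Row>& a, std::vector<Row> b) {
    // Lexicographic sort of the rows, playing the role of sortrows.
    std::sort(b.begin(), b.end());
    for (const Row& r : a) {
        // Mirrors the "same number of columns" check in the MEX function.
        assert(b.empty() || r.size() == b.front().size());
        if (!std::binary_search(b.begin(), b.end(), r)) {
            return false;  // stop at the first missing row
        }
    }
    return true;
}
```

Like the MEX version, it returns as soon as one row is not found, which is why mismatching inputs tend to finish quickly.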
EDIT 1
I fixed the warnings and further improved the function.
#include "mex.h"
#include <math.h>
#include <stdlib.h>
#include <string.h>
int ismember(const double *y, const double *x, unsigned int nrowy, unsigned int nrowx, unsigned int ncol);
void mexFunction(int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[])
{
unsigned int xcol, ycol, nrowx, nrowy;
/* arguments */
const mxArray* y;
const mxArray* x;
if (nrhs != 2)
{
mexErrMsgTxt("2 inputs required.");
}
y = prhs[0];
x = prhs[1];
ycol = (unsigned int) mxGetN(y);
nrowy = (unsigned int) mxGetM(y);
xcol = (unsigned int) mxGetN(x);
nrowx = (unsigned int) mxGetM(x);
/* Both inputs must be of type double. */
if (!mxIsDouble(y) || !mxIsDouble(x))
{
mexErrMsgTxt("Input must be of type 'double'.");
}
if (xcol != ycol)
{
mexErrMsgTxt("Inputs must have the same number of columns");
}
plhs[0] = mxCreateLogicalScalar(ismember(mxGetPr(y), mxGetPr(x), nrowy, nrowx, ycol));
}
int ismemberinner(const double *y, const double *x, unsigned int nrowy, unsigned int nrowx, unsigned int ncol) {
unsigned int from = 0, to = nrowx-1, i;
for(i = 0; i < ncol; ++i) {
// Perform binary search
const double yi = *(y + i * nrowy);
const double *curx = x + i * nrowx;
unsigned int l = from;
unsigned int u = to;
while(l <= u) {
const unsigned int mididx = l + (u-l)/2;
const double midx = curx[mididx];
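if(yi < midx) {
if(mididx == l) {
// yi is below every remaining candidate; returning here also avoids
// the unsigned wrap-around of mididx-1 when mididx is 0
return 0;
}
u = mididx-1;
}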
else if(yi > midx) {
l = mididx+1;
}
else {
{
// Binary search to identify smallest index of x that equals yi
// Equivalent to for(from = mididx; from > l && curx[from-1] == yi; --from)
unsigned int limit = mididx;
while(curx[from] != yi) {
const unsigned int mididx = from + (limit-from)/2;
if(curx[mididx] < yi) {
from = mididx+1;
}
else {
limit = mididx-1;
}
}
}
{
// Binary search to identify largest index of x that equals yi
// Equivalent to for(to = mididx; to < u && curx[to+1] == yi; ++to);
unsigned int limit = mididx;
while(curx[to] != yi) {
const unsigned int mididx = limit + (to-limit)/2;
if(curx[mididx] > yi) {
to = mididx-1;
}
else {
limit = mididx+1;
}
}
}
break;
}
}
if(l > u) {
return 0;
}
}
return 1;
}
int ismember(const double *y, const double *x, unsigned int nrowy, unsigned int nrowx, unsigned int ncol) {
unsigned int i;
for(i = 0; i < nrowy; ++i) {
if(!ismemberinner(y + i, x, nrowy, nrowx, ncol)) {
return 0;
}
}
return 1;
}
Using this version I wasn't able to identify any case where ismember is faster. Also, I noticed that one reason ismember is hard to beat is that it uses all cores of the machine! Of course, the function I provided can be optimized to do this too, but this requires much more effort.
Finally, before using my implementation I would advise you to do extensive testing. I did some testing and it seems to work, but I suggest you also do some additional testing.
For small matrices ismember should be enough, probably.
Usage: ismember(B,A,'rows')
ans =
1
0
1
1
0
I put this answer here to emphasize the need for solutions with higher performance. I will accept this answer only if there is no better solution.
Using ismember, if a row of A appears twice in B while another one is missing, the result might wrongly indicate that A is a member of B. The following solution is suitable if the rows of A and B don't need to be in the same order. However, I haven't tested its performance for large matrices.
A = [...
34 12 67;
90 78 15;
10 71 24];
B = [...
34 12 67; % found
89 67 45;
90 78 15; % found
10 71 24; % found, so A is subset of B.
54 34 11];
A = permute(A,[3 2 1]);
rowIdx = all(bsxfun(@eq,B,A),2);
colIdx = any(rowIdx,1);
isAMemberB = all(colIdx);
You have said the number of columns is <= 10. In addition, if the matrix elements are all integers representable as bytes, you could pack each row into two 64-bit integers. That would reduce the number of comparisons per row from up to 10 to 2, since each 64-bit word holds eight byte-sized elements.
For the general case, the following may not be all that much better for thin matrices, but scales very well as the matrices get fat due to the level 3 multiplication:
function yes = is_submat(A,B)
ma = size(A, 1);
mb = size(B, 1);
n = size(B, 2);
yes = false;
if ma >= mb
a = A(:,1);
b = B(:,1);
D = (0 == bsxfun(@minus, a, b'));
q = any(D, 2);
yes = all(any(D,1));
if yes && (n > 1)
A = A(q, :);
C = B*A';
za = sum(A.*A, 2);
zb = sum(B.*B, 2);
Z = sqrt(zb)*sqrt(za');
[~, ix] = max(C./Z, [], 2);
A = A(ix,:);
yes = all(A(:) == B(:));
end
end
end
In the above, I use the fact that the dot product is maximized when two unit vectors are equal.
For fat matrices (say 5000+ columns) with large numbers of unique elements the performance beats ismember quite handily, but otherwise, it is slower than ismember. For thin matrices ismember is faster by an order of magnitude.
Best case test for this function:
A = randi(50000, [10000, 10000]);
B = A(2:3:end, :);
B = B(randperm(size(B,1)),:);
fprintf('%s: %u\n', 'Number of columns', size(A,2));
fprintf('%s: %u\n', 'Element spread', 50000);
tic; is_submat(A,B); toc;
tic; all(ismember(B,A,'rows')); toc;
fprintf('________\n\n');
is_submat_test;
Number of columns: 10000
Element spread: 50000
Elapsed time is 10.713310 seconds (is_submat).
Elapsed time is 17.446682 seconds (ismember).
So I have to admit, all round ismember seems to be much better.
Edits: Edited to correct bug when there is only one column - fixing this also results in more efficient code. Also previous version did not distinguish between positive and negative numbers. Added timing tests.

Best Way to Add 3 Numbers (or 4, or N) in Java - Kahan Sums?

I found a completely different answer to this question, so the original question no longer makes much sense. However, the answer may be useful, so I have modified the question a bit...
I want to sum up three double numbers, say a, b, and c, in the most numerically stable way possible.
I think using a Kahan Sum would be the way to go.
However, a strange thought occurred to me: Would it make sense to:
First sum up a, b, and c and remember the (absolute value of the) compensation.
Then sum up a, c, b
If the (absolute value of the) compensation of the second sum is smaller, use this sum instead.
Proceed similar with b, a, c and other permutations of the numbers.
Return the sum with the smallest associated absolute compensation.
Would I get a more "stable" addition of three numbers this way? Or does the order of the numbers in the sum have no usable impact on the compensation left at the end of the summation? By "usable" I mean: is the compensation value itself stable enough to contain information that I can use?
(I am using the Java programming language, although I think this does not matter here.)
Many thanks,
Thomas.
I think I found a much more reliable way to solve the "Add 3" (or "Add 4" or "Add N") numbers problem.
First of all, I implemented my idea from the original post. It resulted in quite a lot of code which seemed, initially, to work. However, it failed in the following case: add Double.MAX_VALUE, 1, and -Double.MAX_VALUE. The result was 0.
@njuffa's comments inspired me to dig somewhat deeper, and at http://code.activestate.com/recipes/393090-binary-floating-point-summation-accurate-to-full-p/, I found that in Python, this problem has been solved quite nicely. To see the full code, I downloaded the Python source (Python 3.5.1rc1 - 2015-11-23) from https://www.python.org/getit/source/, where we can find the following method (under PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2):
static PyObject*
math_fsum(PyObject *self, PyObject *seq)
{
PyObject *item, *iter, *sum = NULL;
Py_ssize_t i, j, n = 0, m = NUM_PARTIALS;
double x, y, t, ps[NUM_PARTIALS], *p = ps;
double xsave, special_sum = 0.0, inf_sum = 0.0;
volatile double hi, yr, lo;
iter = PyObject_GetIter(seq);
if (iter == NULL)
return NULL;
PyFPE_START_PROTECT("fsum", Py_DECREF(iter); return NULL)
for(;;) { /* for x in iterable */
assert(0 <= n && n <= m);
assert((m == NUM_PARTIALS && p == ps) ||
(m > NUM_PARTIALS && p != NULL));
item = PyIter_Next(iter);
if (item == NULL) {
if (PyErr_Occurred())
goto _fsum_error;
break;
}
x = PyFloat_AsDouble(item);
Py_DECREF(item);
if (PyErr_Occurred())
goto _fsum_error;
xsave = x;
for (i = j = 0; j < n; j++) { /* for y in partials */
y = p[j];
if (fabs(x) < fabs(y)) {
t = x; x = y; y = t;
}
hi = x + y;
yr = hi - x;
lo = y - yr;
if (lo != 0.0)
p[i++] = lo;
x = hi;
}
n = i; /* ps[i:] = [x] */
if (x != 0.0) {
if (! Py_IS_FINITE(x)) {
/* a nonfinite x could arise either as
a result of intermediate overflow, or
as a result of a nan or inf in the
summands */
if (Py_IS_FINITE(xsave)) {
PyErr_SetString(PyExc_OverflowError,
"intermediate overflow in fsum");
goto _fsum_error;
}
if (Py_IS_INFINITY(xsave))
inf_sum += xsave;
special_sum += xsave;
/* reset partials */
n = 0;
}
else if (n >= m && _fsum_realloc(&p, n, ps, &m))
goto _fsum_error;
else
p[n++] = x;
}
}
if (special_sum != 0.0) {
if (Py_IS_NAN(inf_sum))
PyErr_SetString(PyExc_ValueError,
"-inf + inf in fsum");
else
sum = PyFloat_FromDouble(special_sum);
goto _fsum_error;
}
hi = 0.0;
if (n > 0) {
hi = p[--n];
/* sum_exact(ps, hi) from the top, stop when the sum becomes
inexact. */
while (n > 0) {
x = hi;
y = p[--n];
assert(fabs(y) < fabs(x));
hi = x + y;
yr = hi - x;
lo = y - yr;
if (lo != 0.0)
break;
}
/* Make half-even rounding work across multiple partials.
Needed so that sum([1e-16, 1, 1e16]) will round-up the last
digit to two instead of down to zero (the 1e-16 makes the 1
slightly closer to two). With a potential 1 ULP rounding
error fixed-up, math.fsum() can guarantee commutativity. */
if (n > 0 && ((lo < 0.0 && p[n-1] < 0.0) ||
(lo > 0.0 && p[n-1] > 0.0))) {
y = lo * 2.0;
x = hi + y;
yr = x - hi;
if (y == yr)
hi = x;
}
}
sum = PyFloat_FromDouble(hi);
_fsum_error:
PyFPE_END_PROTECT(hi)
Py_DECREF(iter);
if (p != ps)
PyMem_Free(p);
return sum;
}
This summation method is different from Kahan's method: it uses a variable number of compensation variables. When adding the ith number, at most i additional compensation variables (stored in the array p) get used. This means that if I want to add 3 numbers, I may need 3 additional variables. For 4 numbers, I may need 4 additional variables. Since the number of used variables may increase from n to n+1 only after the nth summand is loaded, I can translate the above code to Java as follows:
/**
* Compute the exact sum of the values in the given array
* {@code summands} while destroying the contents of said array.
*
* @param summands
* the summand array – will be summed up and destroyed
* @return the accurate sum of the elements of {@code summands}
*/
private static final double __destructiveSum(final double[] summands) {
int i, j, n;
double x, y, t, xsave, hi, yr, lo;
boolean ninf, pinf;
n = 0;
lo = 0d;
ninf = pinf = false;
for (double summand : summands) {
xsave = summand;
for (i = j = 0; j < n; j++) {
y = summands[j];
if (Math.abs(summand) < Math.abs(y)) {
t = summand;
summand = y;
y = t;
}
hi = summand + y;
yr = hi - summand;
lo = y - yr;
if (lo != 0.0) {
summands[i++] = lo;
}
summand = hi;
}
n = i; /* ps[i:] = [summand] */
if (summand != 0d) {
if ((summand > Double.NEGATIVE_INFINITY)
&& (summand < Double.POSITIVE_INFINITY)) {
summands[n++] = summand;// all finite, good, continue
} else {
if (xsave <= Double.NEGATIVE_INFINITY) {
if (pinf) {
return Double.NaN;
}
ninf = true;
} else {
if (xsave >= Double.POSITIVE_INFINITY) {
if (ninf) {
return Double.NaN;
}
pinf = true;
} else {
return Double.NaN;
}
}
n = 0;
}
}
}
if (pinf) {
return Double.POSITIVE_INFINITY;
}
if (ninf) {
return Double.NEGATIVE_INFINITY;
}
hi = 0d;
if (n > 0) {
hi = summands[--n];
/*
* sum_exact(ps, hi) from the top, stop when the sum becomes inexact.
*/
while (n > 0) {
x = hi;
y = summands[--n];
hi = x + y;
yr = hi - x;
lo = y - yr;
if (lo != 0d) {
break;
}
}
/*
* Make half-even rounding work across multiple partials. Needed so
* that sum([1e-16, 1, 1e16]) will round-up the last digit to two
* instead of down to zero (the 1e-16 makes the 1 slightly closer to
* two). With a potential 1 ULP rounding error fixed-up, math.fsum()
* can guarantee commutativity.
*/
if ((n > 0) && (((lo < 0d) && (summands[n - 1] < 0d)) || //
((lo > 0d) && (summands[n - 1] > 0d)))) {
y = lo * 2d;
x = hi + y;
yr = x - hi;
if (y == yr) {
hi = x;
}
}
}
return hi;
}
This function will take the array summands and add up the elements while simultaneously using it to store the compensation variables. Since we load the summand at index i before the array element at said index may become used for compensation, this will work.
Since the array will be small if the number of variables to add is small and won't escape the scope of our method, I think there is a decent chance that it will be allocated directly on the stack by the JIT, which may make the code quite fast.
I admit that I did not fully understand why the authors of the original code handled infinities, overflows, and NaNs the way they did. Here my code deviates from the original. (I hope I did not mess it up.)
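To see the core of the partials technique in isolation, here is a stripped-down sketch in C++ (exact_sum is a name invented for this illustration). It assumes all inputs are finite and that no intermediate sum overflows, so the infinity and NaN bookkeeping is left out. It also finishes by simply adding the partials from largest to smallest instead of applying the half-even correction, so the last bit may differ from math.fsum in rare cases.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of summation with a growing list of partials (finite inputs assumed).
// Each partial stores a rounding error that a plain running sum would lose.
double exact_sum(const std::vector<double>& values) {
    std::vector<double> partials;
    for (double x : values) {
        assert(std::isfinite(x));
        std::size_t kept = 0;
        for (std::size_t j = 0; j < partials.size(); ++j) {
            double y = partials[j];
            if (std::fabs(x) < std::fabs(y)) std::swap(x, y);
            const double hi = x + y;         // rounded sum
            const double lo = y - (hi - x);  // rounding error of that sum
            if (lo != 0.0) partials[kept++] = lo;
            x = hi;
        }
        partials.resize(kept);               // drop partials that were absorbed
        if (x != 0.0) partials.push_back(x);
    }
    double total = 0.0;
    for (std::size_t k = partials.size(); k-- > 0;) {
        total += partials[k];                // largest magnitude first
    }
    return total;
}
```

Called with the three values Double.MAX_VALUE, 1 and -Double.MAX_VALUE from the failing case above, this sketch returns 1, because the 1 survives as its own partial instead of being rounded away.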
Either way, I can now sum up 3, 4, or n double numbers by doing:
public static final double add3(final double x0, final double x1,
final double x2) {
return __destructiveSum(new double[] { x0, x1, x2 });
}
public static final double add4(final double x0, final double x1,
final double x2, final double x3) {
return __destructiveSum(new double[] { x0, x1, x2, x3 });
}
If I want to sum up 3 or 4 long numbers and obtain the precise result as a double, I have to deal with the fact that doubles can represent every long exactly only in the range -9007199254740992..9007199254740992. But this can easily be handled by splitting each long into two parts:
public static final double add3(final long x0, final long x1,
final long x2) {
double lx;
// each long is split into its nearest double and the remainder lost by that conversion
return __destructiveSum(new double[] { //
lx = x0, //
(x0 - ((long) lx)), //
lx = x1, //
(x1 - ((long) lx)), //
lx = x2, //
(x2 - ((long) lx)), //
});
}
public static final double add4(final long x0, final long x1,
final long x2, final long x3) {
double lx;
// each long is split into its nearest double and the remainder lost by that conversion
return __destructiveSum(new double[] { //
lx = x0, //
(x0 - ((long) lx)), //
lx = x1, //
(x1 - ((long) lx)), //
lx = x2, //
(x2 - ((long) lx)), //
lx = x3, //
(x3 - ((long) lx)), //
});
}
I think this should be about right. At least I can now add Double.MAX_VALUE, 1, and -Double.MAX_VALUE and get 1 as result.

Octave - how to operate with big numbers

I am working on the RSA algorithm in Octave, but it isn't working properly. The problem appears when I try to use the "^" operator. See my example below:
>> mod((80^65), 133)
The terminal gives me:
ans = 0
I cannot fix this. It's funny because even my system calculator returns the correct number (54).
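To calculate this correctly, you can use the fast power-modulo (modular exponentiation) algorithm. The direct approach fails because 80^65 is far larger than 2^53, so it cannot be represented exactly as a double, and mod then operates on a rounded value.
In C++, the function below computes a^b mod m: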
int power_modulo_fast(int a, int b, int m)
{
int i;
int result = 1;
int x = a % m;
for (i=1; i<=b; i<<=1)
{
x %= m;
if ((b&i) != 0)
{
result *= x;
result %= m;
}
x *= x;
}
return result;
}
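Since RSA moduli are normally much larger than 133, the int products in the function above will overflow once m exceeds roughly 46000. The variant below is a sketch of the same square-and-multiply idea with 64-bit intermediates. The name mod_pow is an assumption made for this illustration, and it is only safe for 0 < m < 2^32, so that the product of two reduced values still fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: a^b mod m by repeated squaring, assuming 0 < m < 2^32
// so that (m-1)*(m-1) fits in an unsigned 64-bit integer.
std::uint64_t mod_pow(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    assert(m > 0 && m < (1ULL << 32));
    if (m == 1) return 0;            // everything is congruent to 0 mod 1
    std::uint64_t result = 1;
    a %= m;
    while (b > 0) {
        if (b & 1) {
            result = (result * a) % m;  // this bit of b is set: multiply it in
        }
        a = (a * a) % m;                // square for the next bit
        b >>= 1;
    }
    return result;
}
```

mod_pow(80, 65, 133) gives 54, matching the value from the question. Real RSA key sizes need arbitrary-precision integers, which this sketch does not provide.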

Most Efficient Way of Using mexCallMATLAB in Converting Double* to mxArray*

I am writing a MEX code in which I need to use pinv function. I am trying to find a way to pass the array of type double to pinv using mexCallMATLAB in the most efficient way. Let's for the sake of example say the array is named G and its size is 100.
double *G = (double*) mxMalloc( 100 * sizeof(double) );
where
G[0] = G11; G[1] = G12;
G[2] = G21; G[3] = G22;
This means that every four consecutive elements of G form a 2×2 matrix. G stores 25 different values of this 2×2 matrix.
I should note that these 2×2 matrices are not well-conditioned, and all of their elements may be zero. How can I use the pinv function to calculate the pseudoinverse of the matrices stored in G? For example, how can I pass the array to mexCallMATLAB in order to calculate the pseudoinverse of the first 2×2 matrix in G?
I thought of the following approach:
mxArray *G_PINV_input = mxCreateDoubleMatrix(2, 2, mxREAL);
mxArray *G_PINV_output = mxCreateDoubleMatrix(2, 2, mxREAL);
double *G_PINV_input_ptr = mxGetPr(G_PINV_input);
memcpy( G_PINV_input_ptr, &G[0], 4 * sizeof(double));
mexCallMATLAB(1, G_PINV_output, 1, G_PINV_input, "pinv");
I am not sure how good this approach is. Copying the values is not economical at all because the total number of elements in G in my actual application is large. Is there any way to skip this copying?
Here is my implementation of the MEX-function:
my_pinv.cpp
#include "mex.h"
void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
// validate arguments
if (nrhs!=1 || nlhs>1)
mexErrMsgIdAndTxt("mex:error", "Wrong number of arguments");
if (!mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]) || mxIsSparse(prhs[0]))
mexErrMsgIdAndTxt("mex:error", "Input isnt real dense double array");
if (mxGetNumberOfElements(prhs[0]) != 100)
mexErrMsgIdAndTxt("mex:error", "numel() != 100");
// create necessary arrays
mxArray *rhs[1], *lhs[1];
plhs[0] = mxCreateDoubleMatrix(100, 1, mxREAL);
rhs[0] = mxCreateDoubleMatrix(2, 2, mxREAL);
double *in = mxGetPr(prhs[0]);
double *out = mxGetPr(plhs[0]);
double *x = mxGetPr(rhs[0]), *y;
// for each 2x2 matrix
for (mwIndex i=0; i<100; i+=4) {
// copy 2x2 matrix into rhs
x[0] = in[i+0];
x[2] = in[i+1];
x[1] = in[i+2];
x[3] = in[i+3];
// lhs = pinv(rhs)
mexCallMATLAB(1, lhs, 1, rhs, "pinv");
// copy 2x2 matrix from lhs
y = mxGetPr(lhs[0]);
out[i+0] = y[0];
out[i+1] = y[1];
out[i+2] = y[2];
out[i+3] = y[3];
// free array
mxDestroyArray(lhs[0]);
}
// cleanup
mxDestroyArray(rhs[0]);
}
Here is a baseline implementation in MATLAB so that we can verify the results are correct:
my_pinv0.m
function y = my_pinv0(x)
y = zeros(size(x));
for i=1:4:numel(x)
y(i:i+3) = pinv(x([0 1; 2 3]+i));
end
end
Now we test the MEX-function:
% some vector
x = randn(100,1);
% MEX vs. MATLAB function
y = my_pinv0(x);
yy = my_pinv(x);
% compare
assert(isequal(y,yy))
EDIT:
Here is another implementation:
my_pinv2.cpp
#include "mex.h"
inline void call_pinv(const double &a, const double &b, const double &c,
const double &d, double *out)
{
mxArray *rhs[1], *lhs[1];
// create input matrix [a b; c d]
rhs[0] = mxCreateDoubleMatrix(2, 2, mxREAL);
double *x = mxGetPr(rhs[0]);
x[0] = a;
x[1] = c;
x[2] = b;
x[3] = d;
// lhs = pinv(rhs)
mexCallMATLAB(1, lhs, 1, rhs, "pinv");
// get values from output matrix
const double *y = mxGetPr(lhs[0]);
out[0] = y[0];
out[1] = y[1];
out[2] = y[2];
out[3] = y[3];
// cleanup
mxDestroyArray(lhs[0]);
mxDestroyArray(rhs[0]);
}
void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
// validate arguments
if (nrhs!=1 || nlhs>1)
mexErrMsgIdAndTxt("mex:error", "Wrong number of arguments");
if (!mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]) || mxIsSparse(prhs[0]))
mexErrMsgIdAndTxt("mex:error", "Input isnt real dense double array");
if (mxGetNumberOfElements(prhs[0]) != 100)
mexErrMsgIdAndTxt("mex:error", "numel() != 100");
// allocate output
plhs[0] = mxCreateDoubleMatrix(100, 1, mxREAL);
double *out = mxGetPr(plhs[0]);
const double *in = mxGetPr(prhs[0]);
// for each 2x2 matrix
for (mwIndex i=0; i<100; i+=4) {
// 2x2 input matrix [a b; c d], and its determinant
const double a = in[i+0];
const double b = in[i+1];
const double c = in[i+2];
const double d = in[i+3];
const double det = (a*d - b*c);
if (det != 0) {
// inverse of 2x2 matrix [d -b; -c a]/det
out[i+0] = d/det;
out[i+1] = -c/det;
out[i+2] = -b/det;
out[i+3] = a/det;
}
else {
// singular matrix, fallback to pseudo-inverse
call_pinv(a, b, c, d, &out[i]);
}
}
}
This time we compute the determinant of the 2x2 matrix; if it is non-zero, we calculate the inverse ourselves according to:
inv([a b; c d]) = [d -b; -c a] / (a*d - b*c)
Otherwise we fall back to invoking PINV from MATLAB for the pseudo-inverse.
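As a possible extension that is not part of the original answer, the fallback can also be written in closed form for the 2x2 case: an exactly singular 2x2 matrix has rank 0 or 1, and the pseudoinverse of a rank-1 matrix A is A' / ||A||_F^2. The sketch below is plain C++ with a hypothetical helper name (pinv2x2) and row-major input and output order. It tests the determinant for exact zero, so results for nearly singular matrices will differ from MATLAB's pinv, which applies a tolerance.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of the MEX files above): pseudoinverse of the
// 2x2 matrix [a b; c d], returned in row-major order {p11, p12, p21, p22}.
std::array<double, 4> pinv2x2(double a, double b, double c, double d) {
    assert(std::isfinite(a) && std::isfinite(b) && std::isfinite(c) && std::isfinite(d));
    const double det = a * d - b * c;
    if (det != 0.0) {
        // Regular case: ordinary inverse [d -b; -c a] / det.
        return std::array<double, 4>{{d / det, -b / det, -c / det, a / det}};
    }
    const double fro2 = a * a + b * b + c * c + d * d;  // squared Frobenius norm
    if (fro2 == 0.0) {
        // The pseudoinverse of the zero matrix is the zero matrix.
        return std::array<double, 4>{{0.0, 0.0, 0.0, 0.0}};
    }
    // Rank-1 case: pinv(A) = A' / ||A||_F^2 (note the transpose).
    return std::array<double, 4>{{a / fro2, c / fro2, b / fro2, d / fro2}};
}
```

For example, the all-ones matrix [1 1; 1 1] maps to a matrix with every entry 0.25, without any call back into MATLAB.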
Here is quick benchmark:
% 100x1 vector
x = randn(100,1); % average case, with normal 2x2 matrices
% running time
funcs = {@my_pinv0, @my_pinv1, @my_pinv2};
t = cellfun(@(f) timeit(@() f(x)), funcs, 'Uniform',true);
% compare results
y = cellfun(@(f) f(x), funcs, 'Uniform',false);
assert(isequal(y{1},y{2}))
I get the following timings:
>> fprintf('%.6f\n', t);
0.002111 % MATLAB function
0.001498 % first MEX-file with mexCallMATLAB
0.000010 % second MEX-file with "unrolled" matrix inverse (+ PINV as fallback)
The error is acceptable and within machine precision:
>> norm(y{1}-y{3})
ans =
2.1198e-14
You could also test the worst case, when many of the 2x2 matrices are singular:
x = randi([0 1], [100 1]);
You don't need to allocate the output. Just make the pointer and let pinv create the mxArray automatically.
mxArray *lhs;
Then just use & like,
mexCallMATLAB(1, &lhs, 1, &rhs, "pinv");