CULA: use of culaSgels - wrong argument?

I am trying to use the culaSgels function in order to solve Ax=B.
I modified the systemSolve example of the cula package.
void culaFloatExample()
{
int N=2;
int NRHS = 2;
int i,j;
double cula_time,start_time,end_time;
culaStatus status;
culaFloat* A = NULL;
culaFloat* B = NULL;
culaFloat* X = NULL;
culaFloat one = 1.0f;
culaFloat thresh = 1e-6f;
culaFloat diff;
printf("Allocating Matrices\n");
A = (culaFloat*)malloc(N*N*sizeof(culaFloat));
B = (culaFloat*)malloc(N*N*sizeof(culaFloat));
X = (culaFloat*)malloc(N*N*sizeof(culaFloat));
if(!A || !B )
exit(EXIT_FAILURE);
printf("Initializing CULA\n");
status = culaInitialize();
checkStatus(status);
// Set A
A[0]=1;
A[1]=2;
A[2]=3;
A[3]=4;
// Set B
B[0]=5;
B[1]=6;
B[2]=2;
B[3]=3;
printf("Calling culaSgels\n");
// Run CULA's version
start_time = getHighResolutionTime();
status = culaSgels('N',N,N, NRHS, A, N, A, N);
end_time = getHighResolutionTime();
cula_time = end_time - start_time;
checkStatus(status);
printf("Verifying Result\n");
for(i = 0; i < N; ++i){
for (j=0;j<N;j++)
{
diff = X[i+j*N] - B[i+j*N];
if(diff < 0.0f)
diff = -diff;
if(diff > thresh)
printf("\nResult check failed: X[%d]=%f B[%d]=%f\n", i, X[i+j*N],i, B[i+j*N]);
printf("\nResults:X= %f \t B= %f:\n",X[i+j*N],B[i+j*N]);
}
}
printRuntime(cula_time);
printf("Shutting down CULA\n\n");
culaShutdown();
free(A);
free(B);
}
I am calling culaSgels('N',N,N, NRHS, A, N, A, N); to solve the system, but:
1) The results show every element of X as 0, although B is right. The "Result check failed" message is also printed.
2) According to the reference manual, the second-to-last argument (where I pass A) should be the matrix B stored column-wise, but if I pass B there instead of A, I no longer get the correct B matrix.

OK, the code needs three changes to work.
1) Pass B instead of the second A, so the call becomes culaSgels('N',N,N, NRHS, A, N, B, N);
(I had not realized that on exit B contains the solution.)
2) Because CULA uses column-major storage, fill the A and B matrices accordingly.
3) Change the allocations to:
B = (culaFloat*)malloc(N*NRHS*sizeof(culaFloat));
X = (culaFloat*)malloc(N*NRHS*sizeof(culaFloat));
(use NRHS rather than N; the two happen to be equal in this example)
Thanks!
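The column-major point can be sketched in plain C without CULA. The helper below, to_column_major, is a hypothetical name and not part of the CULA API; it only illustrates how a matrix written row by row has to be reordered before it is handed to a LAPACK-style routine.

```c
#include <stddef.h>

/* Copy a row-major rows x cols matrix into column-major order,
   the layout LAPACK-style routines expect.
   to_column_major is a hypothetical helper, not part of CULA. */
void to_column_major(const float *row_major, float *col_major,
                     size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            col_major[r + c * rows] = row_major[r * cols + c];
}
```

For the matrix [1 2; 3 4] written row by row as {1, 2, 3, 4}, the column-major buffer is {1, 3, 2, 4}, and the leading dimension passed to the solver is the number of rows.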

Related

How to emulate *really simple* variable bit shifts with SSE?

I have two variable bit-shifting code fragments that I want to SSE-vectorize by some means:
1) a = 1 << b (where b = 0..7 exactly), i.e. 0/1/2/3/4/5/6/7 -> 1/2/4/8/16/32/64/128
2) a = 1 << (8 * b) (where b = 0..7 exactly), i.e. 0/1/2/3/4/5/6/7 -> 1/0x100/0x10000/etc
OK, I know that AMD's XOP VPSHLQ would do this, as would AVX2's VPSLLVQ. But my challenge here is whether this can be achieved on 'normal' (i.e. up to SSE4.2) SSE.
So, is there some funky SSE-family opcode sequence that will achieve the effect of either of these code fragments? They need only yield the listed output values for the specific input values (0-7).
Update: here's my attempt at 1), based on Peter Cordes' suggestion of using the floating point exponent to do simple variable bitshifting:
#include <stdint.h>
typedef union
{
int32_t i;
float f;
} uSpec;
void do_pow2(uint64_t *in_array, uint64_t *out_array, int num_loops)
{
uSpec u;
for (int i=0; i<num_loops; i++)
{
int32_t x = *(int32_t *)&in_array[i];
u.i = (127 + x) << 23;
int32_t r = (int32_t) u.f;
out_array[i] = r;
}
}
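A scalar sketch of the same exponent trick for fragment 1, written with memcpy rather than a pointer cast so that it stays within C's aliasing rules. The name pow2_via_float is hypothetical, and the code assumes that float is an IEEE-754 binary32 value.

```c
#include <stdint.h>
#include <string.h>

/* Compute 1 << b for b in 0..7 by building the float 2^b from its
   exponent bits and converting it back to an integer.
   Assumes float is IEEE-754 binary32 (bias 127, 23 mantissa bits). */
uint32_t pow2_via_float(uint32_t b)
{
    uint32_t bits = (127u + b) << 23;  /* sign 0, exponent 127+b, mantissa 0 */
    float f;
    memcpy(&f, &bits, sizeof f);       /* reinterpret the bits as a float */
    return (uint32_t)f;                /* 2^b converts exactly to an integer */
}
```

A vectorized version would presumably apply the same three steps per 32-bit lane with SSE2 intrinsics such as _mm_add_epi32, _mm_slli_epi32 and _mm_cvttps_epi32 (after _mm_castsi128_ps); that variant is not shown or tested here.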

Extracting data from a matlab struct in mex

I'm following this example but I'm not sure what I missed. Specifically, I have this struct in MATLAB:
a = struct; a.one = 1.0; a.two = 2.0; a.three = 3.0; a.four = 4.0;
And this is my test code in MEX ---
First, I wanted to make sure that I'm passing in the right thing, so I did this check:
int nfields = mxGetNumberOfFields(prhs[0]);
mexPrintf("nfields =%i \n\n", nfields);
And it does yield 4, since I have four fields.
However, when I tried to extract the value in field three:
tmp = mxGetField(prhs[0], 0, "three");
mexPrintf("data =%f \n\n", (double *)mxGetData(tmp) );
It returns data =1.000000. I'm not sure what I did wrong. My logic is that I want to get the first element (hence index is 0) of the field three, so I expected data =3.00000.
Can I get a pointer or a hint?
EDITED
Ok, since you didn't provide your full code but you are working on a test, let's try to make a new one from scratch.
On Matlab side, use the following code:
a.one = 1;
a.two = 2;
a.three = 3;
a.four = 4;
read_struct(a);
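Now, create and compile the MEX file read_struct as follows (the entry point of a MEX file must be named mexFunction; the MATLAB function name comes from the file name):
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])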
{
if (nrhs != 1)
mexErrMsgTxt("One input argument required.");
/* Let's check if the input is a struct... */
if (!mxIsStruct(prhs[0]))
mexErrMsgTxt("The input must be a structure.");
int ne = mxGetNumberOfElements(prhs[0]);
int nf = mxGetNumberOfFields(prhs[0]);
mexPrintf("The structure contains %i elements and %i fields.\n", ne, nf);
mwIndex i;
mwIndex j;
mxArray *mxValue;
double *value;
for (i = 0; i < nf; ++i)
{
for (j = 0; j < ne; ++j)
{
mxValue = mxGetFieldByNumber(prhs[0], j, i);
value = mxGetPr(mxValue);
mexPrintf("Field %s(%d) = %.1f\n", mxGetFieldNameByNumber(prhs[0],i), (int)j, value[0]);
}
}
return;
}
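Does this print your structure correctly? Note that value[0] dereferences the pointer returned by mxGetPr; the code in the question passed the pointer itself to %f, which is why it did not print 3.0.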

Eigen: how can I substitute matrix positive values with 1 and 0 otherwise?

I want to write the following matlab code in Eigen (where K is pxp and W is pxb):
H = (K*W)>0;
However, the only thing that I have come up with so far is:
H = ((K*W.array() > 0).select(1,0));
This code doesn't work, as explained here, but replacing 0 with VectorXd::Constant(p,0) (as suggested in the linked question) generates a runtime error:
Eigen::internal::variable_if_dynamic<T, Value>::variable_if_dynamic(T) [with T = long int; int Value = 1]: Assertion `v == T(Value)' failed.
How can I solve this?
You don't need .select(). You just need to cast an array of bool to an array of H's component type.
H = ((K * W).array() > 0.0).cast<double>();
Your original attempt failed because the size of your constant 1/0 array does not match the size of H. Using VectorXd::Constant is not a good choice when H is a MatrixXd. You also have a problem with parentheses: I think you want * rather than .* in MATLAB notation.
#include <iostream>
#include <Eigen/Eigen>
using namespace Eigen;
int main() {
const int p = 5;
const int b = 10;
MatrixXd H(p, b), K(p, p), W(p, b);
K.setRandom();
W.setRandom();
H = ((K * W).array() > 0.0).cast<double>();
std::cout << H << std::endl << std::endl;
H = ((K * W).array() > 0).select(MatrixXd::Constant(p, b, 1),
MatrixXd::Constant(p, b, 0));
std::cout << H << std::endl;
return 0;
}
When calling a template member function in a template, you need to use the template keyword.
#include <iostream>
#include <Eigen/Eigen>
using namespace Eigen;
template<typename Mat, typename Vec>
void createHashTable(const Mat &K, Eigen::MatrixXi &H, Mat &W, int b) {
Mat CK = K;
H = ((CK * W).array() > 0.0).template cast<int>();
}
int main() {
const int p = 5;
const int b = 10;
Eigen::MatrixXi H(p, b);
Eigen::MatrixXf W(p, b), K(p, p);
K.setRandom();
W.setRandom();
createHashTable<Eigen::MatrixXf, Eigen::VectorXf>(K, H, W, b);
std::cout << H << std::endl;
return 0;
}
See the question "Issue casting C++ Eigen::Matrix types via templates" for some explanation.

FFTW with MEX and MATLAB argument issues

I wrote the following C/MEX code using the FFTW library to control the number of threads used for an FFT computation from MATLAB. The code works well (complex forward and backward FFT) with the FFTW_ESTIMATE argument in the planner, although it is slower than MATLAB. But when I switch to the FFTW_MEASURE argument to tune the FFTW planner, applying one forward FFT and then one backward FFT does not return the initial image. Instead, the image is scaled by a factor. Using FFTW_PATIENT gives an even worse result, with null matrices.
My code is as follows:
Matlab functions:
FFT forward:
function Y = fftNmx(X,NumCPU)
if nargin < 2
NumCPU = maxNumCompThreads;
disp('Warning: Use the max maxNumCompThreads');
end
Y = FFTN_mx(X,NumCPU)./numel(X);
FFT backward:
function Y = ifftNmx(X,NumCPU)
if nargin < 2
NumCPU = maxNumCompThreads;
disp('Warning: Use the max maxNumCompThreads');
end
Y = iFFTN_mx(X,NumCPU);
Mex functions:
FFT forward:
# include <string.h>
# include <stdlib.h>
# include <stdio.h>
# include <mex.h>
# include <matrix.h>
# include <math.h>
# include </home/nicolas/Code/C/lib/include/fftw3.h>
char *Wisfile = NULL;
char *Wistemplate = "%s/.fftwis";
#define WISLEN 8
void set_wisfile(void)
{
char *home;
if (Wisfile) return;
home = getenv("HOME");
Wisfile = (char *)malloc(strlen(home) + WISLEN + 1);
sprintf(Wisfile, Wistemplate, home);
}
fftw_plan CreatePlan(int NumDims, int N[], double *XReal, double *XImag, double *YReal, double *YImag)
{
fftw_plan Plan;
fftw_iodim Dim[NumDims];
int k, NumEl;
FILE *wisdom;
for(k = 0, NumEl = 1; k < NumDims; k++)
{
Dim[NumDims - k - 1].n = N[k];
Dim[NumDims - k - 1].is = Dim[NumDims - k - 1].os = (k == 0) ? 1 : (N[k-1] * Dim[NumDims-k].is);
NumEl *= N[k];
}
/* Import the wisdom. */
set_wisfile();
wisdom = fopen(Wisfile, "r");
if (wisdom) {
fftw_import_wisdom_from_file(wisdom);
fclose(wisdom);
}
if(!(Plan = fftw_plan_guru_split_dft(NumDims, Dim, 0, NULL, XReal, XImag, YReal, YImag, FFTW_MEASURE /* or FFTW_ESTIMATE, respectively */ )))
mexErrMsgTxt("FFTW3 failed to create plan.");
/* Save the wisdom. */
wisdom = fopen(Wisfile, "w");
if (wisdom) {
fftw_export_wisdom_to_file(wisdom);
fclose(wisdom);
}
return Plan;
}
void mexFunction( int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[] )
{
#define B_OUT plhs[0]
int k, numCPU, NumDims;
const mwSize *N;
double *pr, *pi, *pr2, *pi2;
static long MatLeng = 0;
fftw_iodim Dim[NumDims];
fftw_plan PlanForward;
int NumEl = 1;
int *N2;
if (nrhs != 2) {
mexErrMsgIdAndTxt( "MATLAB:FFT2mx:invalidNumInputs",
"Two input argument required.");
}
if (!mxIsDouble(prhs[0])) {
mexErrMsgIdAndTxt( "MATLAB:FFT2mx:invalidNumInputs",
"Array must be double");
}
numCPU = (int) mxGetScalar(prhs[1]);
if (numCPU > 8) {
mexErrMsgIdAndTxt( "MATLAB:FFT2mx:invalidNumInputs",
"NumOfThreads < 8 requested");
}
if (!mxIsComplex(prhs[0])) {
mexErrMsgIdAndTxt( "MATLAB:FFT2mx:invalidNumInputs",
"Array must be complex");
}
NumDims = mxGetNumberOfDimensions(prhs[0]);
N = mxGetDimensions(prhs[0]);
N2 = (int*) mxMalloc( sizeof(int) * NumDims);
for(k=0;k<NumDims;k++) {
NumEl *= NumEl * N[k];
N2[k] = N[k];
}
pr = (double *) mxGetPr(prhs[0]);
pi = (double *) mxGetPi(prhs[0]);
//B_OUT = mxCreateNumericArray(NumDims, N, mxDOUBLE_CLASS, mxCOMPLEX);
B_OUT = mxCreateNumericMatrix(0, 0, mxDOUBLE_CLASS, mxCOMPLEX);
mxSetDimensions(B_OUT , N, NumDims);
mxSetData(B_OUT , (double* ) mxMalloc( sizeof(double) * mxGetNumberOfElements(prhs[0]) ));
mxSetImagData(B_OUT , (double* ) mxMalloc( sizeof(double) * mxGetNumberOfElements(prhs[0]) ));
pr2 = (double* ) mxGetPr(B_OUT);
pi2 = (double* ) mxGetPi(B_OUT);
fftw_init_threads();
fftw_plan_with_nthreads(numCPU);
PlanForward = CreatePlan(NumDims, N2, pr, pi, pr2, pi2);
fftw_execute_split_dft(PlanForward, pr, pi, pr2, pi2);
fftw_destroy_plan(PlanForward);
fftw_cleanup_threads();
}
FFT backward:
This MEX function differs from the above only in switching pointers pr <-> pi, and pr2 <-> pi2 in the CreatePlan function and in the execution of the plan, as suggested in the FFTW documentation.
If I run
>> A = imread('cameraman.tif');
>> A = double(A) + i*double(A);
>> B = fftNmx(A,8);
>> C = ifftNmx(B,8);
>> figure,imagesc(real(C))
with the FFTW_MEASURE and FFTW_ESTIMATE arguments respectively I get this result.
I wonder whether this is due to an error in my code or in the library. I tried different things with the wisdom: saving it, not saving it, and using wisdom produced by the FFTW standalone tool. I haven't seen any improvement. Can anyone suggest why this is happening?
Additional information:
I compile the MEX code using static libraries:
mex FFTN_Meas_mx.cpp /home/nicolas/Code/C/lib/lib/libfftw3.a /home/nicolas/Code/C/lib/lib/libfftw3_threads.a -lm
The FFTW library has been compiled with:
./configure CFLAGS="-fPIC" --prefix=/home/nicolas/Code/C/lib --enable-sse2 --enable-threads && make && make install
I tried different flags without success. I am using MATLAB 2011b on a 64-bit Linux workstation (quad-core AMD Opteron).
FFTW computes an unnormalized transform; see:
http://www.fftw.org/doc/What-FFTW-Really-Computes.html
Roughly speaking, when you perform a forward transform followed by a backward one, you get
back the input (plus round-off errors) multiplied by the length of your data.
Also, when you create a plan using flags other than FFTW_ESTIMATE, the planner overwrites the input and output arrays, so the plan must be created before the input data is filled in:
http://www.fftw.org/doc/Planner-Flags.html
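The scaling of an unnormalized forward/backward round trip can be illustrated without FFTW. The sketch below uses a hand-written 4-point DFT on split real/imaginary arrays; dft4 is a hypothetical stand-in for an FFTW plan, chosen only because its twiddle factors are exactly 1, -1, i and -i.

```c
/* Unnormalized 4-point DFT on split real/imaginary arrays.
   sign = -1 gives the forward transform, sign = +1 the backward one.
   dft4 is a hypothetical stand-in for an FFTW plan, not FFTW itself. */
void dft4(int sign, const double *in_re, const double *in_im,
          double *out_re, double *out_im)
{
    /* exp(sign * 2*pi*i * m / 4) for m = 0..3: 1, sign*i, -1, -sign*i */
    const double w_re[4] = {1.0, 0.0, -1.0, 0.0};
    const double w_im[4] = {0.0, (double)sign, 0.0, (double)-sign};

    for (int k = 0; k < 4; ++k) {
        double sum_re = 0.0, sum_im = 0.0;
        for (int n = 0; n < 4; ++n) {
            int m = (k * n) % 4;  /* twiddle index */
            sum_re += in_re[n] * w_re[m] - in_im[n] * w_im[m];
            sum_im += in_re[n] * w_im[m] + in_im[n] * w_re[m];
        }
        out_re[k] = sum_re;
        out_im[k] = sum_im;
    }
}
```

Calling dft4 with sign -1 and then with sign +1 on the result returns the original data multiplied by 4, the transform length, so one of the two directions has to divide by the number of elements to recover the input.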

Bottom-up mergesort problems!

I am having problems with bottom-up mergesort: the sorting/merging does not work. My current code is:
public void mergeSort(long[] a, int len) {
long[] temp = new long[a.length];
int length = 1;
while (length < len) {
mergepass(a, temp, length, len);
length *= 2;
}
}
public void mergepass(long[] a, long[] temp, int blocksize, int len) {
int k = 0;
int i = 1;
while(i <= (len/blocksize)){
if(blocksize == 1){break;}
int min = a.length;
for(int j = 0; j < blocksize; j++){
if(a[i*j] < min){
temp[k++] = a[i*j];
count++;
}
else{
temp[k++] = a[(i*j)+1];
count++;
}
}
for(int n = 0; n < this.a.length; n++){
a[n] = temp[n];
}
}
}
Obvious problems:
i is never incremented.
At no point do you compare two elements in the array. (Is that what if(a[i*j] < min) is supposed to be doing? I can't tell.)
Why are you multiplying i and j?
What's this.a.length?
Style problems:
mergeSort() takes len as an argument, even though arrays have an implicit length. To make matters worse, the function also uses a.length and length.
Generally poor variable names.
Nitpicks:
If you're going to make a second array of the same size, it is common to make one the "source" and the other the "destination" and swap them between passes, instead of sorting into a temporary array and copying them back again.
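The points above can be put together in a short sketch, written here in C rather than Java. It is one possible arrangement, not the asker's intended design: the names merge_runs and merge_sort_bottom_up are hypothetical, and the source and destination buffers swap roles after each pass as suggested in the last nitpick.

```c
#include <stdlib.h>
#include <string.h>

/* Merge sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void merge_runs(const long *src, long *dst,
                       size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Bottom-up merge sort: merge runs of width 1, 2, 4, ... and swap
   source and destination after every pass instead of copying back.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int merge_sort_bottom_up(long *a, size_t len)
{
    if (len < 2) return 0;
    long *scratch = malloc(len * sizeof *scratch);
    if (!scratch) return -1;

    long *src = a, *dst = scratch;
    for (size_t width = 1; width < len; width *= 2) {
        for (size_t lo = 0; lo < len; lo += 2 * width) {
            size_t mid = lo + width < len ? lo + width : len;
            size_t hi  = lo + 2 * width < len ? lo + 2 * width : len;
            merge_runs(src, dst, lo, mid, hi);
        }
        long *tmp = src; src = dst; dst = tmp;  /* swap roles for next pass */
    }
    if (src != a)                               /* result ended up in scratch */
        memcpy(a, src, len * sizeof *a);
    free(scratch);
    return 0;
}
```

Each pass compares the heads of two adjacent runs and advances the run index explicitly, which are the two things missing from the posted mergepass.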