I wrote the following C/MEX code using the FFTW library to control the number of threads used for a FFT computation from MATLAB. The code works great (complex FFT forward and backward) with the FFTW_ESTIMATE argument in the planner although it is slower than MATLAB. But, when I switch to the FFTW_MEASURE argument to tune up the FFTW planner, it turns out that applying one FFT forward and then one FFT backward does not return the initial image. Instead, the image is scaled by a factor. Using FFTW_PATIENT gives me an even worse result with null matrices.
My code is as follows:
Matlab functions:
FFT forward:
function Y = fftNmx(X,NumCPU)
if nargin < 2
NumCPU = maxNumCompThreads;
disp('Warning: Use the max maxNumCompThreads');
end
Y = FFTN_mx(X,NumCPU)./numel(X);
FFT backward:
function Y = ifftNmx(X,NumCPU)
if nargin < 2
NumCPU = maxNumCompThreads;
disp('Warning: Use the max maxNumCompThreads');
end
Y = iFFTN_mx(X,NumCPU);
Mex functions:
FFT forward:
# include <string.h>
# include <stdlib.h>
# include <stdio.h>
# include <mex.h>
# include <matrix.h>
# include <math.h>
# include </home/nicolas/Code/C/lib/include/fftw3.h>
char *Wisfile = NULL;
char *Wistemplate = "%s/.fftwis";
#define WISLEN 8
void set_wisfile(void)
{
char *home;
if (Wisfile) return;
home = getenv("HOME");
Wisfile = (char *)malloc(strlen(home) + WISLEN + 1);
sprintf(Wisfile, Wistemplate, home);
}
fftw_plan CreatePlan(int NumDims, int N[], double *XReal, double *XImag, double *YReal, double *YImag)
{
fftw_plan Plan;
fftw_iodim Dim[NumDims];
int k, NumEl;
FILE *wisdom;
for(k = 0, NumEl = 1; k < NumDims; k++)
{
Dim[NumDims - k - 1].n = N[k];
Dim[NumDims - k - 1].is = Dim[NumDims - k - 1].os = (k == 0) ? 1 : (N[k-1] * Dim[NumDims-k].is);
NumEl *= N[k];
}
/* Import the wisdom. */
set_wisfile();
wisdom = fopen(Wisfile, "r");
if (wisdom) {
fftw_import_wisdom_from_file(wisdom);
fclose(wisdom);
}
if(!(Plan = fftw_plan_guru_split_dft(NumDims, Dim, 0, NULL, XReal, XImag, YReal, YImag, FFTW_MEASURE *(or FFTW_ESTIMATE respectively)* )))
mexErrMsgTxt("FFTW3 failed to create plan.");
/* Save the wisdom. */
wisdom = fopen(Wisfile, "w");
if (wisdom) {
fftw_export_wisdom_to_file(wisdom);
fclose(wisdom);
}
return Plan;
}
void mexFunction( int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[] )
{
#define B_OUT plhs[0]
int k, numCPU, NumDims;
const mwSize *N;
double *pr, *pi, *pr2, *pi2;
static long MatLeng = 0;
fftw_iodim Dim[NumDims];
fftw_plan PlanForward;
int NumEl = 1;
int *N2;
if (nrhs != 2) {
mexErrMsgIdAndTxt( "MATLAB:FFT2mx:invalidNumInputs",
"Two input argument required.");
}
if (!mxIsDouble(prhs[0])) {
mexErrMsgIdAndTxt( "MATLAB:FFT2mx:invalidNumInputs",
"Array must be double");
}
numCPU = (int) mxGetScalar(prhs[1]);
if (numCPU > 8) {
mexErrMsgIdAndTxt( "MATLAB:FFT2mx:invalidNumInputs",
"NumOfThreads < 8 requested");
}
if (!mxIsComplex(prhs[0])) {
mexErrMsgIdAndTxt( "MATLAB:FFT2mx:invalidNumInputs",
"Array must be complex");
}
NumDims = mxGetNumberOfDimensions(prhs[0]);
N = mxGetDimensions(prhs[0]);
N2 = (int*) mxMalloc( sizeof(int) * NumDims);
for(k=0;k<NumDims;k++) {
NumEl *= NumEl * N[k];
N2[k] = N[k];
}
pr = (double *) mxGetPr(prhs[0]);
pi = (double *) mxGetPi(prhs[0]);
//B_OUT = mxCreateNumericArray(NumDims, N, mxDOUBLE_CLASS, mxCOMPLEX);
B_OUT = mxCreateNumericMatrix(0, 0, mxDOUBLE_CLASS, mxCOMPLEX);
mxSetDimensions(B_OUT , N, NumDims);
mxSetData(B_OUT , (double* ) mxMalloc( sizeof(double) * mxGetNumberOfElements(prhs[0]) ));
mxSetImagData(B_OUT , (double* ) mxMalloc( sizeof(double) * mxGetNumberOfElements(prhs[0]) ));
pr2 = (double* ) mxGetPr(B_OUT);
pi2 = (double* ) mxGetPi(B_OUT);
fftw_init_threads();
fftw_plan_with_nthreads(numCPU);
PlanForward = CreatePlan(NumDims, N2, pr, pi, pr2, pi2);
fftw_execute_split_dft(PlanForward, pr, pi, pr2, pi2);
fftw_destroy_plan(PlanForward);
fftw_cleanup_threads();
}
FFT backward:
This MEX function differs from the above only in switching pointers pr <-> pi, and pr2 <-> pi2 in the CreatePlan function and in the execution of the plan, as suggested in the FFTW documentation.
If I run
A = imread('cameraman.tif');
>> A = double(A) + i*double(A);
>> B = fftNmx(A,8);
>> C = ifftNmx(B,8);
>> figure,imagesc(real(C))
with the FFTW_MEASURE and FFTW_ESTIMATE arguments respectively I get this result.
I wonder if this is due to an error in my code or in the library. I tried different thing around the wisdom, saving not saving. Using the wisdom produce by the FFTW standalone tool to produce wisdom. I haven't seen any improvement. Can anyone suggest why this is happening?
Additional information:
I compile the MEX code using static libraries:
mex FFTN_Meas_mx.cpp /home/nicolas/Code/C/lib/lib/libfftw3.a /home/nicolas/Code/C/lib/lib/libfftw3_threads.a -lm
The FFTW library hasn't been compiled with:
./configure CFLAGS="-fPIC" --prefix=/home/nicolas/Code/C/lib --enable-sse2 --enable-threads --&& make && make install
I tried different flags without success. I am using MATLAB 2011b on a Linux 64-bit station (AMD opteron quad core).
FFTW computes not normalized transform, see here:
http://www.fftw.org/doc/What-FFTW-Really-Computes.html
Roughly speaking, when you perform direct transform followed by inverse one, you get
back the input (plus round-off errors) multiplied by the length of your data.
When you create a plan using flags other than FFTW_ESTIMATE, your input is overwritten:
http://www.fftw.org/doc/Planner-Flags.html
Related
I have two variable bit-shifting code fragments that I want to SSE-vectorize by some means:
1) a = 1 << b (where b = 0..7 exactly), i.e. 0/1/2/3/4/5/6/7 -> 1/2/4/8/16/32/64/128/256
2) a = 1 << (8 * b) (where b = 0..7 exactly), i.e. 0/1/2/3/4/5/6/7 -> 1/0x100/0x10000/etc
OK, I know that AMD's XOP VPSHLQ would do this, as would AVX2's VPSHLQ. But my challenge here is whether this can be achieved on 'normal' (i.e. up to SSE4.2) SSE.
So, is there some funky SSE-family opcode sequence that will achieve the effect of either of these code fragments? These only need yield the listed output values for the specific input values (0-7).
Update: here's my attempt at 1), based on Peter Cordes' suggestion of using the floating point exponent to do simple variable bitshifting:
#include <stdint.h>
typedef union
{
int32_t i;
float f;
} uSpec;
void do_pow2(uint64_t *in_array, uint64_t *out_array, int num_loops)
{
uSpec u;
for (int i=0; i<num_loops; i++)
{
int32_t x = *(int32_t *)&in_array[i];
u.i = (127 + x) << 23;
int32_t r = (int32_t) u.f;
out_array[i] = r;
}
}
I'm following this example but I'm not sure what I missed. Specifically, I have this struct in MATLAB:
a = struct; a.one = 1.0; a.two = 2.0; a.three = 3.0; a.four = 4.0;
And this is my test code in MEX ---
First, I wanted to make sure that I'm passing in the right thing, so I did this check:
int nfields = mxGetNumberOfFields(prhs[0]);
mexPrintf("nfields =%i \n\n", nfields);
And it does yield 4, since I have four fields.
However, when I tried to extract the value in field three:
tmp = mxGetField(prhs[0], 0, "three");
mexPrintf("data =%f \n\n", (double *)mxGetData(tmp) );
It returns data =1.000000. I'm not sure what I did wrong. My logic is that I want to get the first element (hence index is 0) of the field three, so I expected data =3.00000.
Can I get a pointer or a hint?
EDITED
Ok, since you didn't provide your full code but you are working on a test, let's try to make a new one from scratch.
On Matlab side, use the following code:
a.one = 1;
a.two = 2;
a.three = 3;
a.four = 4;
read_struct(a);
Now, create and compile the MEX read_struct function as follows:
#include "mex.h"
void read_struct(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
if (nrhs != 1)
mexErrMsgTxt("One input argument required.");
/* Let's check if the input is a struct... */
if (!mxIsStruct(prhs[0]))
mexErrMsgTxt("The input must be a structure.");
int ne = mxGetNumberOfElements(prhs[0]);
int nf = mxGetNumberOfFields(prhs[0]);
mexPrintf("The structure contains %i elements and %i fields.\n", ne, nf);
mwIndex i;
mwIndex j;
mxArray *mxValue;
double *value;
for (i = 0; i < nf; ++i)
{
for (j = 0; j < ne; ++j)
{
mxValue = mxGetFieldByNumber(prhs[0], j, i);
value = mxGetPr(mxValue);
mexPrintf("Field %s(%d) = %.1f\n", mxGetFieldNameByNumber(prhs[0],i), j, value[0]);
}
}
return;
}
Does this correctly prints your structure?
In order to create a MEX function and use it in my MATLAB code, like this:
[pow,index] = mx_minimum_power(A11,A12,A13,A22,A23,A33);
I've created the file mx_minimum_power.cpp and written the following code in it:
#include <math.h>
#include <complex>
#include "mex.h"
#include "matrix.h"
#include "cvm.h"
#include "blas.h"
#include "cfun.h"
using std::complex;
using namespace cvm;
/* The gateway function */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
const int arraysize = 62172;
const int matrixDimention = 3;
float *inMatrixA11 = (float *)mxGetPr(prhs[0]);
complex<float> *inMatrixA12 = (complex<float> *)mxGetPr(prhs[1]);
complex<float> *inMatrixA13 = (complex<float> *)mxGetPr(prhs[2]);
float *inMatrixA22 = (float *)mxGetPr(prhs[3]);
complex<float> *inMatrixA23 = (complex<float> *)mxGetPr(prhs[4]);
float *inMatrixA33 = (float *)mxGetPr(prhs[5]);
basic_schmatrix< float, complex<float> > A(matrixDimention);
int i = 0;
for (i = 0; i < arraysize; i++)
{
A.set(1, 1, inMatrixA11[i]);
A.set(1, 2, inMatrixA12[i]);
A.set(1, 3, inMatrixA13[i]);
A.set(2, 2, inMatrixA22[i]);
A.set(2, 3, inMatrixA23[i]);
A.set(3, 3, inMatrixA33[i]);
}
}
And then in order to be able to debug the code, I've created the mx_minimum_power.pdb and mx_minimum_power.mexw64 files, using the following code in the Matlab Command Window:
mex -g mx_minimum_power.cpp cvm_em64t_debug.lib
The files blas.h, cfun.h, cvm.h and cvm_em64t_debug.lib are in the same directory as mx_minimum_power.cpp.
They are the headers and library files of the CVM Class Library.
Then I've attached MATLAB.exe to Visual Studio 2013, using the way explained here.
and have set a breakpoint at line40:
When I run my MATLAB code, there's no error until the specified line.
But if I click on the Step Over button, I'll encounter the following message:
With the following information added to the Output:
First-chance exception at 0x000007FEFCAE9E5D in MATLAB.exe: Microsoft C++ exception: cvm::cvmexception at memory location 0x0000000004022570.
> throw_segv_longjmp_seh_filter()
throw_segv_longjmp_seh_filter(): C++ exception
< throw_segv_longjmp_seh_filter() = EXCEPTION_CONTINUE_SEARCH
Can you suggest me why libmex.pdb is needed at that line and how should I solve the issue?
If I stop debugging, I'll get the following information in MATLAB Command Window:
Unexpected Standard exception from MEX file.
What() is:First index value 0 is out of [1,4) range
Right before pressing the step over button, we have the following values for A11[0],A12[0],A13[0],A22[0],A23[0],A33[0]:
that are just right as my expectations, according to what MATLAB passes to the MEX function:
Maybe the problem is because of wrong allocation for matrix A, it is as follows just before pressing the step over button.
The problem occurs because we have the following code in lines 48 to 53 of the cvm.h file:
// 5.7 0-based indexing
#if defined (CVM_ZERO_BASED)
# define CVM0 TINT_ZERO //!< Index base, 1 by default or 0 when \c CVM_ZERO_BASED is defined
#else
# define CVM0 TINT_ONE //!< Index base, 1 by default or 0 when \c CVM_ZERO_BASED is defined
#endif
That makes CVM0 = 1 by default. So if we press the Step Into(F11) button at the specified line two times, we will get into line 34883 of the file cvm.h:
basic_schmatrix& set(tint nRow, tint nCol, TC c) throw(cvmexception)
{
this->_set_at(nRow - CVM0, nCol - CVM0, c);
return *this;
}
Press Step Into(F11) at line this->_set_at(nRow - CVM0, nCol - CVM0, c); and you'll go to the definition of the function _set_at:
// sets both elements to keep matrix hermitian, checks ranges
// zero based
void _set_at(tint nRow, tint nCol, TC val) throw(cvmexception)
{
_check_lt_ge(CVM_OUTOFRANGE_LTGE1, nRow, CVM0, this->msize() + CVM0);
_check_lt_ge(CVM_OUTOFRANGE_LTGE2, nCol, CVM0, this->nsize() + CVM0);
if (nRow == nCol && _abs(val.imag()) > basic_cvmMachMin<TR>()) { // only reals on main diagonal
throw cvmexception(CVM_BREAKS_HERMITIANITY, "real number");
}
this->get()[this->ld() * nCol + nRow] = val;
if (nRow != nCol) {
this->get()[this->ld() * nRow + nCol] = _conjugate(val);
}
}
pressing Step Over(F10) button,you'll get the result:
so in order to get nRow=1 and nCol=1 and not nRow=0 and nCol=0, which is out of the range [1,4), you should write that line of code as:
A.set(2, 2, inMatrixA11[i]);
First of all, I never tried to call C code in a Matlab program - so it might just be a stupid mistake which I just can't figure out.
The C function is the following one and can be found on here, it is called durlevML.c and is part of the ARFIMA(p,d,q) estimator suite:
#include "mex.h"
#include "matrix.h"
#define square(p) ((p)*(p))
#define inv(q) (1/(q))
/* Durbin-Levinson algorithm for linear stationary AR(FI)MA(p,d,q) processes
Slightly altered for the maximum likelihood estimation
(C) György Inzelt 2011 */
void levinson_recur1(double* v,double* L, int N, double* gammas,int step)
{
int i,k;
if(step==0)
{
*(v + step) = *(gammas + step);
*(L + step) = 1;
for(k = step+1;k < N;++k)
{
*(L + k) = 0;
}
}
else if(step > 0 && step < N)
{
//phi_tt
*(L + step*N ) = (-1.00)* *(gammas + step);
if(step > 1)
{
for(i = 1;i < step ;++i)
{
*(L + step*N) -= *(L + (step-1)*N + (step -1) - i ) * *(gammas + step - i) ;
}
}
*(L +step*N) *= inv( *(v + step-1) );
//v_t
*(v + step) = *(v + step-1)*(1- square( *(L + step*N) ));
//phi_tj
for(i =1; i < step; ++i)
{
*(L + step*N + step - i) = *(L + (step-1)*N + (step -1) - i) + *(L + step*N ) * *(L + (step-1)*N + i -1 ) ;
}
//filling L with zeros and ones
*(L + step*N + step ) = 1;
if(step != N-1)
{
for(k = step*N +step+1 ;k < step*N + N ;++k)
{
*(L + k) =0;
}
}
}
if(step < N-1)
levinson_recur1(v,L,N,gammas,++step);
}
/* The gateway function */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int step=0;
int N;
double *gammas,*v,*L;
// getting the autocovariances
gammas = mxGetPr(prhs[0]);
N = mxGetM(prhs[0]);
// v
plhs[0] = mxCreateDoubleMatrix(0,0,mxREAL);
mxSetM(plhs[0],N);
mxSetN(plhs[0],1);
mxSetData(plhs[0], mxMalloc(sizeof(double)*N*1));
// L
plhs[1] = mxCreateDoubleMatrix(0,0,mxREAL);
mxSetM(plhs[1],square(N));
mxSetN(plhs[1],1);
mxSetData(plhs[1], mxMalloc(sizeof(double)*square(N)*1));
//
v = mxGetPr(plhs[0]);
L = mxGetPr(plhs[1]);
//
levinson_recur1(v, L, N,gammas,step);
//
return;
}
I have two problems now, the first one I already solved:
I got the following warning
Warning: You are using gcc version "4.6.3-1ubuntu5)". The version
currently supported with MEX is "4.4.6".
For a list of currently supported compilers see:
http://www.mathworks.com/support/compilers/current_release/
mex: durlevML.c not a normal file or does not exist.
but following the solution by changing the corresponding entries and installing the gcc-4.4 the warning disappeared, but the second actual problem still persists, although I don't get any warning any longer I still get the error
mex: durlevML.c not a normal file or does not exist.
What can be the possible reason for this?
I am using
ubuntu 12.04
Matlab 2012b
gcc-4.6 but was able to bring Matlab to use gcc-4.4
The problem occurred while running the function arfima_test.m which calls the durlevML.c function, the relevant (I guess) part of this is
% compiling the C/MEX file
c = input('Would you like to compile the MEX source? (1/0)');
switch c
case(1)
mex durlevML.c
case(0)
end
My thanks go to Ander Biguri due to his the comment!
The solution was indeed very easy, the problem was that the C file durlevML.c was not properly linked, so the compiler couldn't find anything. So I had to add some information in arfima_test.m in the following way
mex durlevML.c <---------------- needs to be linked /home/....
The C code itself was alright with one smaller issue. My compiler - I used gcc-4.4 due to Matlab incompatibilities with newer ones - didn't recognize the comments // and always produced errors, so I had to change them to the long comment format /**/ which finally worked.
Besides this everything worked, as far as I can tell!
I have the following kernel
__global__ void func( float * arr, int N ) {
int rtid = blockDim.x * blockIdx.x + threadIdx.x;
if( rtid < N )
{
float* row = (float*)((char*)arr + rtid*N*sizeof(float) );
for (int c = 1; c < N; ++c)
{
//Manipulation
}
}
}
When I call the kernel from MATLAB using
gtm= parallel.gpu.GPUArray(ones(a,b,'double'));
OR gtm= parallel.gpu.GPUArray(ones(1,b,'double'));
gtm=k.feval(gtm,b);
it is giving the following error:
Error using ==> feval
parallel.gpu.GPUArray must match the exact input type as specified on the kernel
prototype.
Error in ==> sameInit at 65 gtm=k.feval(gtm,b);
Can someone please tell me where am I going wrong.
Thanking You,
Viharri P L V.
The kernel object "k" that is being created in MATLAB have the following structure:
MaxNumLHSArguments: 1
NumRHSArguments: 2
ArgumentTypes: {'inout single' 'in int32 scalar'}
with the above mentioned CUDA kernel prototype i.e.,
__global__ void func( float * arr, int N )
So, there was an mismatch error. We need to either change the prototype of the CUDA kernel to
__global__ void func( double * arr, int N )
or create the MATLAB array with 'single' type.
gtm= parallel.gpu.GPUArray(ones(a,b,'single'));