I have images with different rotational orientations. I want to find correct rotation angle using cross-correlation maximization. Since my image set is big, I wanted to speed up normxcorr2 function using the mex file here.
I used the following code to calculate matched_angle:
function [matched_angle, max_corr_vecq, matched_angle_mex, max_corr_vecq_mex] = get_correct_rotation(moving, fixed)
for theta = 360:-10:10
rotated = imrotate(moving, theta,'bicubic','crop');
corr2d_map = normxcorr2(double(rotated), double(fixed));
corr2d_map_mex = normxcorr2_mex(double(rotated), double(fixed),'full');
[max_corr_vec(theta/10), ~] = max(corr2d_map(:));
[max_corr_vec_mex(theta/10), ~] = max(corr2d_map_mex(:));
end
% Interpolate correlation max vector for half degree resolution
max_corr_vecq = interp1(10:10:360, max_corr_vec, 0.5:0.5:360, 'spline');
[~, matched_angle] = max(max_corr_vecq);
matched_angle = 0.5 * matched_angle;
% Interpolate correlation max vector for half degree resolution
max_corr_vecq_mex = interp1(10:10:360, max_corr_vec_mex, 0.5:0.5:360, 'spline');
[~, matched_angle_mex] = max(max_corr_vecq_mex);
matched_angle_mex = 0.5 * matched_angle_mex;
end
However using those two same images (Moving Template Image & Fixed Reference Image) for two different normxcorr2 & normxcorr2_mex gives totally different results.
plot(0.5:0.5:360, max_corr_vecq, 'linewidth',2); hold on;
plot(0.5:0.5:360, max_corr_vecq_mex, 'linewidth',2);
legend({'MATLAB Built-in', 'MEX'});
set(gca, 'FontSize', 14, 'FontWeight', 'bold');
See Result Plot.
Does anyone has an idea what is going on? I could not found any entry regarding the accuracy of that mex file. And according to the author:
the following are equivalent:
result = normxcorr2_mex(template, image, 'full');
AND
result = normxcorr2(template, image);
except that normxcorr2_mex has 0's in the 'invalid' area along the boundary
which should not be problem in my case. Since I am only checking the max correlation value.
Since my previous answer, I have found the normcorr2_mex library to be consistently slower (than MATLAB) and incorrect in all of my use cases.
As I really needed a C++ implementation (that I could verify with MATLAB), I created my own. The code is listed here:
/* normxcorr2_mex.cpp
*
* A MATLAB-mex wrapper around a C/C++ implementation of the Normalised Cross Correlation algorithm described
* by #dafnahaktana in https://stackoverflow.com/questions/44591037/speed-up-calculation-of-maximum-of-normxcorr2.
*
* This module uses the 'integral image' data structure described in the posted MATLAB/Octave code (based upon the
* original Industrial Light & Magic paper at http://scribblethink.org/Work/nvisionInterface/nip.pdf), but replaces
* the "naive" correlation step with a Fourier transform implementation for larger template sizes.
*
* Daniel Eaton released a MATLAB-mex library (http://www.cs.ubc.ca/research/deaton/remarks_ncc.html) with the
* same function name as this one in 2013. Indeed, I acknowledge [and flatteringly plagiarise] his interface and
* naming convention. Unfortunaly, I was unable to duplicate the speed (wrt MATLABs normxcorr2) improvements he
* claimed with the image sizes I required. Curiously, I also observed different results using his library compared
* with MATLABs built-in function (despite being claimed to be identical). This was also noted by others here:
* https://stackoverflow.com/questions/48641648/different-results-of-normxcorr2-and-normxcorr2-mex. This module
* does match normxcorr2 on both the MATLAB R2016b and R2017a/b versions tested, using the (accompanying) test script.
* Like Daniel's module, however, this function returns only the 'valid' region of correlation values, i.e. it
* doesn't pad the output array to match the input image size.
*
* This function is called via:
* NCC = normxcorr2_mex (TEMPLATE, A);
* Where:
* TEMPLATE - The (double precision) matrix to correlate with A.
* A - (Double precision) input matrix for correlation with the TEMPLATE. Note size(A) > size(TEMPLATE).
* NCC - is the computed normalised cross correlation coefficients of the matrices TEMPLATE and A.
* The size of the correlation coefficient matrix is given as:
*
* size(NCC) = [(Ar - TEMPLATEr + 1), (Ac - TEMPLATEc + 1)] ; where:
*
* Ar, Ac and TEMPLATEr, TEMPLATEc are the number of (rows, cols) of A and TEMPLATE respectively.
*
* This module requires the Eigen C++ library (http://eigen.tuxfamily.org/index.php?title=Main_Page) for compilation
* and may be compiled within MATLAB via:
*
* mex -I'[Path to]\eigen-3.3.5' normxcorr2_mex.cpp
*
* Since NCC is such a computationally intensive task, this module may be linked against the openMP library to exploit a
* pool of worker threads and distribute some of the embarrassingly parellel operations within across a number of CPU cores.
* Only rudimentary use is made of the library, but the following compilation option provides speedups generally
* exceeding 50%:
*
* mex -I'[Path to]\eigen-3.3.5' CXXFLAGS="$CXXFLAGS -fopenmp" LDFLAGS="$LDFLAGS -fopenmp" normxcorr2_mex.cpp
*
*
* You are free to do with this code as you wish. For this reason, it is released under the UNLICENSE model:
*
* This is free and unencumbered software released into the public domain.
*
* Anyone is free to copy, modify, publish, use, compile, sell, or
* distribute this software, either in source code form or as a compiled
* binary, for any purpose, commercial or non-commercial, and by any
* means.
*
* In jurisdictions that recognize copyright laws, the author or authors
* of this software dedicate any and all copyright interest in the
* software to the public domain. We make this dedication for the benefit
* of the public at large and to the detriment of our heirs and
* successors. We intend this dedication to be an overt act of
* relinquishment in perpetuity of all present and future rights to this
* software under copyright law.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
* OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*
* For more information, please refer to <http://unlicense.org/>
*/
#include "mex.h"
#include <cstring>
#include <algorithm>
#include <limits>
#include <vector>
#include <cmath>
#include <complex>
#include <iostream>
#include <Eigen/Core>
#include <unsupported/Eigen/FFT>
using namespace Eigen;
// If we're compiled/linked with openMP, turn off Eigen's parallelisation
#ifdef _OPENMP
#define EIGEN_DONT_PARALLELIZE
#define EIGEN_NO_DEBUG
#endif
// For very small input templates, performing the raw 2D correlation in the spatial domain may be faster than
// the transform domain (due to the overhead that the latter involves). The decision which approach to use is
// made at runtime by comparing the size (=rows*cols) of the input TEMPLATE matrix with the following constant.
// Feel free to experiment with this value in your own application!
#define TEMPLATE_SIZE_THRESHOLD 401
// 2D Cross-correlation performed via the "naive approach" (laborious spatial domain convolution).
ArrayXXd spatialXcorr (const Ref<const ArrayXXd>& img, const Ref<const ArrayXXd>& templ)
{
int32_t r, c;
ArrayXXd xcorr2(img.rows()-templ.rows()+1, img.cols()-templ.cols()+1);
for (r=0; r<(img.rows()-templ.rows()+1); r++)
for (c=0; c<(img.cols()-templ.cols()+1); c++)
xcorr2(r,c) = (templ*img.block(r,c,templ.rows(),templ.cols())).sum();
return(xcorr2);
}
// 2D Cross-correlation performed via Fourier transform
ArrayXXd transformXcorr (const Ref<const ArrayXXd>& img, const Ref<const ArrayXXd>& templ)
{
ArrayXXd xcorr2(img.rows()-templ.rows()+1, img.cols()-templ.cols()+1);
// Copy the input arrays into a matrix the next power-of-2 up in size
int32_t nextPow2r = (int32_t)(pow(2.0, round(0.5+log((double)(img.rows()))/log(2.0))));
int32_t nextPow2c = (int32_t)(pow(2.0, round(0.5+log((double)(img.cols()))/log(2.0))));
MatrixXd imgPwr2 = MatrixXd::Zero(nextPow2r, nextPow2c);
MatrixXd templPwr2 = MatrixXd::Zero(nextPow2r, nextPow2c);
// A -> copied to top-left corner.
// TEMPLATE is rotated 180 degrees to account for rotation/flip performed during convolution.
imgPwr2.block(0, 0, img.rows(), img.cols()) = img.matrix();
templPwr2.block(0, 0, templ.rows(), templ.cols()) = (templ.matrix().colwise().reverse()).rowwise().reverse();
// Perform 2D FFTs via sequential 1D transforms (Rows first, then columns)
MatrixXcd imgFT(nextPow2r, nextPow2c), templFT(nextPow2r, nextPow2c), prodFT(nextPow2r, nextPow2c);
// Rows first...
#ifdef _OPENMP // If using parallel threads, then each thread
// must have it's own copy of the eigenFFT plan.
#pragma omp parallel for schedule(dynamic)
for (int32_t r=0; r<nextPow2r; r++) { // This is unnecesary for single-threaded execution as
// each evaluation of the FFT is identical in length
VectorXcd rowVec(nextPow2c); // and data type.
FFT<double> eigenFFT;
// The creation of the plan is computationally expensive
#else // and so we do it once, outside of the loop in the single
// threaded case (to reduce the run time by a factor > 2).
VectorXcd rowVec(nextPow2c);
FFT<double> eigenFFT;
for (int32_t r=0; r<nextPow2r; r++) {
#endif
eigenFFT.fwd(rowVec, imgPwr2.row(r));
imgFT.row(r) = rowVec;
eigenFFT.fwd(rowVec, templPwr2.row(r));
templFT.row(r) = rowVec;
}
// ...then columns.
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic)
for (int32_t c=0; c<nextPow2c; c++) {
VectorXcd colVec(nextPow2r);
FFT<double> eigenFFT;
#else
VectorXcd colVec(nextPow2r);
for (int32_t c=0; c<nextPow2c; c++) {
#endif
eigenFFT.fwd(colVec, imgFT.col(c));
imgFT.col(c) = colVec;
eigenFFT.fwd(colVec, templFT.col(c));
templFT.col(c) = colVec;
}
// Mutliply complex Fourier domain matricies
prodFT = imgFT.cwiseProduct(templFT);
// Transform (complex) Fourier product back -> (real) spatial domain (2D IFFT).
// Reuse templPwr2 as the output variable for efficiency.
// Rows first (again)...
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic)
for (int32_t r=0; r<nextPow2r; r++) {
FFT<double> eigenFFT;
VectorXcd rowVec(nextPow2c);
#else
for (int32_t r=0; r<nextPow2r; r++) {
#endif
eigenFFT.inv(rowVec, prodFT.row(r));
prodFT.row(r) = rowVec;
}
// ...and lastly, columns.
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic)
for (int32_t c=0; c<nextPow2c; c++) {
FFT<double> eigenFFT;
VectorXcd colVec(nextPow2r);
#else
for (int32_t c=0; c<nextPow2c; c++) {
#endif
eigenFFT.inv(colVec, prodFT.col(c));
templPwr2.col(c) = colVec.real();
}
// Extract the valid region of correlation coefficients
xcorr2 = templPwr2.array().block(templ.rows()-1, templ.cols()-1, img.rows()-templ.rows()+1, img.cols()-templ.cols()+1);
return(xcorr2);
}
// Normalised cross-correlation top-level function
ArrayXXd normxcorr2 (const Ref<const ArrayXXd>& templ, const Ref<const ArrayXXd>& img)
{
ArrayXXd templZMean(templ.rows(), templ.cols());
ArrayXXd scalingCoeffs(img.rows() - templ.rows() +1, img.cols() - templ.cols() +1);
ArrayXXd normxcorr(img.rows()-templ.rows()+1, img.cols()-templ.cols()+1);
ArrayXXd integralImg(img.rows()+2, img.cols()+2), integralImgSq(img.rows()+2, img.cols()+2);
ArrayXXd windowMeanA = ArrayXXd::Zero(img.rows() - templ.rows() +1, img.cols() - templ.cols() +1);
ArrayXXd windowMeanASq = ArrayXXd::Zero(img.rows() - templ.rows() +1, img.cols() - templ.cols() +1);
// Calculate the standard deviation of the TEMPLATE
double templSizeRcp = 1.0/(double)(templ.rows()*templ.cols());
templZMean = templ-templ.mean();
double templateStd = sqrt((templZMean.pow(2)).sum()*templSizeRcp);
// Compute mean and standard deviation of input matrix A over the template window size. Firsly...
// Construct array for computing the integral image(s) + zero pad the edges to avoid boundary issues
integralImg.block(0, 0, 1, integralImg.cols()) = ArrayXXd::Zero(1, integralImg.cols());
integralImg.block(0, 0, integralImg.rows(), 1) = ArrayXXd::Zero(integralImg.rows(), 1);
integralImg.block(0, integralImg.cols()-1, integralImg.rows(), 1) = ArrayXXd::Zero(integralImg.rows(), 1);
integralImg.block(integralImg.rows()-1, 0, 1, integralImg.cols()) = ArrayXXd::Zero(1, integralImg.cols());
integralImgSq.block(0, 0, 1, integralImgSq.cols()) = ArrayXXd::Zero(1, integralImgSq.cols());
integralImgSq.block(0, 0, integralImgSq.rows(), 1) = ArrayXXd::Zero(integralImgSq.rows(), 1);
integralImgSq.block(0, integralImgSq.cols()-1, integralImgSq.rows(), 1) = ArrayXXd::Zero(integralImgSq.rows(), 1);
integralImgSq.block(integralImgSq.rows()-1, 0, 1, integralImgSq.cols()) = ArrayXXd::Zero(1, integralImgSq.cols());
// Calculate cumulative sum. Along the length of each row first...
for (int32_t r=0; r<img.rows(); r++) {
double sum = 0.0;
double sumSq = 0.0;
for (int32_t c=0; c<img.cols(); c++) {
sum += img(r,c);
sumSq += (img(r,c)*img(r,c));
integralImg(r+1, c+1) = sum;
integralImgSq(r+1, c+1) = sumSq;
}
}
// ...and then down each column.
for (int32_t c=1; c<=img.cols(); c++) {
double sum = 0.0;
double sumSq = 0.0;
for (int32_t r=1; r<=img.rows(); r++) {
sum += integralImg(r,c);
sumSq += integralImgSq(r,c);
integralImg(r,c) = sum;
integralImgSq(r,c) = sumSq;
}
}
// Determine start/finish indexes for the boundaries of the summed area
int32_t rStart = (int32_t)(0.5 + templ.rows()/2.0);
int32_t rEnd = img.rows() - rStart + (templ.rows() % 2);
int32_t cStart = (int32_t)(0.5 + templ.cols()/2.0);
int32_t cEnd = img.cols() - cStart + (templ.cols() % 2);
// Evaluate the sum of intensities
windowMeanA += ( integralImg.block(templ.rows(), templ.cols(), rEnd-rStart+1, cEnd-cStart+1) \
- integralImg.block(templ.rows(), 0, rEnd-rStart+1, cEnd-cStart+1) \
- integralImg.block(0, templ.cols(), rEnd-rStart+1, cEnd-cStart+1) \
+ integralImg.block(0, 0, rEnd-rStart+1, cEnd-cStart+1) )*templSizeRcp;
// Evaluate the sum of intensities (squared)
windowMeanASq += ( integralImgSq.block(templ.rows(), templ.cols(), rEnd-rStart+1, cEnd-cStart+1) \
- integralImgSq.block(templ.rows(), 0, rEnd-rStart+1, cEnd-cStart+1) \
- integralImgSq.block(0, templ.cols(), rEnd-rStart+1, cEnd-cStart+1) \
+ integralImgSq.block(0, 0, rEnd-rStart+1, cEnd-cStart+1) )*templSizeRcp;
// Calculate the standard deviation (squared) of A over the template size window
// Standard deviation = sqrt(windowMeanASq - windowMeanA.square());
scalingCoeffs = (windowMeanASq - windowMeanA.square());
// Amalgamate the element-by-element test/square root with other coefficients scaling for efficiency
for (int32_t r=0; r<scalingCoeffs.rows(); r++)
for (int32_t c=0; c<scalingCoeffs.cols(); c++)
if (scalingCoeffs(r,c) > 0)
scalingCoeffs(r,c) = templSizeRcp/(templateStd*sqrt(scalingCoeffs(r,c)));
else
scalingCoeffs(r,c) = std::numeric_limits<double>::quiet_NaN();
// Decide which 2D correlation approach to use (transform or spatial domain)
if ((templ.rows()*templ.cols()) > TEMPLATE_SIZE_THRESHOLD)
normxcorr = scalingCoeffs*transformXcorr(img, templZMean);
else
normxcorr = scalingCoeffs*spatialXcorr(img, templZMean);
return(normxcorr);
}
// ******************** Minimal MEX wrapper ********************
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
// Check the number of arguments
if (nrhs != 2)
mexErrMsgIdAndTxt("MATLAB:normxcorr2_mex", "Usage: NCC = normxcorr2_mex (TEMPLATE, A);");
// Verify input array sizes
size_t rowsTempl = mxGetM(prhs[0]);
size_t colsTempl = mxGetN(prhs[0]);
size_t rowsA = mxGetM(prhs[1]);
size_t colsA = mxGetN(prhs[1]);
if ((rowsA <= rowsTempl) || (colsA <= colsTempl))
mexErrMsgIdAndTxt("MATLAB:normxcorr2_mex", "Size of TEMPLATE must be less than input matrix A.");
#ifdef _OPENMP
// Required for Eigen versions < 3.3 and for *some* non-compliant C++11 compilers.
// (Warn Eigen our application might be calling it from multiple threads).
initParallel();
#endif
// Perform correlation
ArrayXXd xcorr(rowsA-rowsTempl+1, colsA-colsTempl+1);
xcorr = normxcorr2 (Map<ArrayXXd>(mxGetPr(prhs[0]), rowsTempl, colsTempl), Map<ArrayXXd>(mxGetPr(prhs[1]), rowsA, colsA));
// Return data to MATLAB
plhs[0] = mxCreateDoubleMatrix(rowsA-rowsTempl+1, colsA-colsTempl+1, mxREAL);
Map<ArrayXXd> (mxGetPr(plhs[0]), xcorr.rows(), xcorr.cols()) = xcorr;
return;
}
As per the comments in the header, save the file to normxcorr2_mex.cpp and compile with:
mex -I'[Path to]\eigen-3.3.5' normxcorr2_mex.cpp for
single-threaded operation, or with
mex -I'[Path to]\eigen-3.3.5' CXXFLAGS="$CXXFLAGS -fopenmp" LDFLAGS="$LDFLAGS -fopenmp"
normxcorr2_mex.cpp for multi-threaded openMP support.
The timing and correct operation of the code can be verified with the following MATLAB script:
% testHarness.m
%
% Verify the results of the compiled normxcorr2_mex() function against
% MATLABs inbuilt normxcorr2() function. This takes aaaaages to run!
%% Simulation/comparison parameters
nRunsA = 50; % Number of trials for accuracy comparison
nRunsT = 30; % Number of repetitions for execution time detemination
nStepsT = 50; % Number of input matrix size steps to take in execution time measurement
maxImSize = [1343 1745]; % (Deliberately non-round-number) maximum image size for tests
maxTemplSize = [248 379]; % Maximum image template size
%% Accuracy comparison
sumSqErr = zeros(1, nRunsA);
fprintf(2, 'Accuracy comparison\n');
for nRun = 1:nRunsA
fprintf('Run %d (of %d)\n', nRun, nRunsA);
% Create input images/templates of random content and size
randSizeScale = 0.02 + 0.98*rand(1, 2);
img = rand(round(maxImSize.*randSizeScale));
templ = rand(round(maxTemplSize.*randSizeScale));
% MATLABs inbuilt function
resultMatPadded = normxcorr2(templ, img);
% Remove unwanted padding
[rTempl, cTempl] = size(templ);
[rImg, cImg] = size(img);
resultMat = resultMatPadded(rTempl:rImg, cTempl:cImg);
% MEX function
resultMex = normxcorr2_mex(templ, img);
% Compare results
sumSqErr(nRun) = sum(sum( (resultMat-resultMex).^2 ));
end
figure;
plot(sumSqErr);
title('Accuracy comparison between MATLAB and MEX normxcorr2');
xlabel('Run #');
ylabel('\Sigma |MATLAB-MEX|^2');
grid on;
%% Timing comparison
avMatT = zeros(1, nStepsT);
avMexT = zeros(1, nStepsT);
fprintf(2, 'Timing comparison\n');
for stp = 1:nStepsT
fprintf('Run %d (of %d)\n', stp, nStepsT);
% Create input images/templates of random content and progressively larger size
img = rand(round(maxImSize*stp/nStepsT));
templ = rand(round(maxTemplSize.*stp/nStepsT));
% MATLABs function
tStart = tic;
for exec = 1:nRunsT
dummy = normxcorr2(templ, img);
end
avMatT(stp) = toc(tStart)/nRunsT;
% MEX function
tStart = tic;
for exec = 1:nRunsT
dummy = normxcorr2_mex(templ, img);
end
avMexT(stp) = toc(tStart)/nRunsT;
end
figure;
plot((1:nStepsT)/(0.01*nStepsT), avMatT, 'rx-', (1:nStepsT)/(0.01*nStepsT), avMexT, 'bo-');
title('Execution time comparison between MATLAB and MEX normxcorr2');
xlabel('Input array size [% of maximum]');
ylabel('Evaluation time [s]');
legend('MATLAB', 'MEX');
grid on;
The above C++/mex implementation and MATLAB's inbuilt normxcorr2 function agree to a level approaching the limits of the underlying double-precision data type. It turns out that the recent MATLAB normxcorr2 is hard to beat in speed though - even when using openMP - as this comparative timing plot shows when run on my elderly i7-980 CPU.
Unfortunately I don't have an explanation, but can confirm the issue appears to be with the library and not your implementation. I had issues building the normxcorr2_mex library with the MinGW64 compiler under windows which made me wary of possible variations between builds. Builds under both Debian Linux and Windows exhibit the same (incorrect) behaviour compared to MATLAB's built-in normxcorr2 function, as shown in the plot included here.
To assist anyone else building the library under Windows, I had to coerce the C++ compiler with the following command line:
mex -O CXXFLAGS="$CXXFLAGS -std=c++03 -fpermissive" normxcorr2_mex.cpp cv_src/*.cpp
Incidentally, I also found the mex implementation to be an order of magnitude slower than MATLABs!
I have a large matrix of random values (e.g. 200,000 x 6,000) between 0-1 named 'allGSR.'
I used the following code to create a logical array (?) where 1 represents numbers less than .05
sig = (allGSR < .05);
What I'd like to do is to return an array of size 1 x 200,000 called maxSIG where each row represents the MAXIMUM number of sequential ones. So for example, if in row 1, columns 3-6 are ones, that is 4 ones in a row and if columns 100-109 are ones that is 10 ones in a row and if that is the maximum number of ones in a row I would like the first column of maxSIG to be the value '10.'
I have been doing this with for loops, if statements, and counters; this is ugly and tedious and was wondering if there is an easier or more efficient way.
Thank you for any insight.
EDIT: Whoops, should probably share the loop.
EDIT 2: So I just wrote out what my basic code is with a smaller (100 x 6,000) matrix. This code should run. Sorry for the inconvenience.
GSR = 6000;
samples = 100;
allGSR = zeros(samples, GSR);
for x = 1:samples
y = rand(GSR, 1)'; %Transpose so it's 1x6000 and not 6000x1
allGSR(x,:) = y;
end
countSIG = zeros(samples,1);
abovethreshold = (allGSR < .05); %.05 can be replaced by whatever
for z = 1:samples
count = 0;
holdArray = zeros(1,GSR);
for a = 1:GSR
if abovethreshold(z,a) == true
count = count + 1;
else
count = 0;
end
holdArray(1,a) = count;
end
maxrun = max(holdArray);
countSIG(z,1) = maxrun;
end
Here's one approach using diff, find & accumarray -
append_col = zeros(size(abovethreshold,1),1);
df = diff([append_col abovethreshold append_col],[],2).'; %//'
[R1,C1] = find(df==1);
[R2,C2] = find(df==-1);
out = zeros(samples,1);
out(1:max(C1)) = accumarray(C1,R2 - R1,[],#max);
In the code posted above, we are creating a fat array with abovethreshold and then transposing it. From performance point of view, the transpose operation might not be the best thing to do. So, rather we can move things around it rather than itself, like so -
append_col = zeros(size(abovethreshold,1),1);
df = diff([append_col abovethreshold append_col],[],2); %//'
[R1,C1] = find(df==1);
[R2,C2] = find(df==-1);
[~,idx1] = sort(R1);
[~,idx2] = sort(R2);
out = zeros(samples,1);
out(1:max(R1)) = accumarray(R1(idx1),C2(idx2) - C1(idx1),[],#max);
If you're worried about memory allocation, speed, etc... on huge arrays, I'd just do your same basic algorithm in c++. Throw this in something like myfunction.cpp file and compile with mex -largeArrayDims myfunction.cpp.
You can then call from matlab with counts = myfunction(allGSR, .05);
I haven't tested this beyond that it compiles.
#include "mex.h"
#include "matrix.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
if(nrhs != 2)
mexErrMsgTxt("Invalid number of inputs. Shoudl be 2 input argument.");
if(nlhs != 1)
mexErrMsgTxt("Invalid number of outputs. Should be 1 output arguments.");
if(!mxIsDouble(prhs[0]) || !mxIsDouble(prhs[1]))
mexErrMsgTxt("First two arguments are not doubles");
const mxArray *input_array = prhs[0];
const mxArray *threshold_array = prhs[1];
size_t input_rows = mxGetM(input_array);
size_t input_cols = mxGetN(input_array);
size_t threshold_rows = mxGetM(threshold_array);
size_t threshold_cols = mxGetN(threshold_array);
if(threshold_rows != 1 || threshold_cols != 1)
mexErrMsgTxt("threshold array should be a scalar");
mxArray *output_array = mxCreateDoubleMatrix(1, input_rows, mxREAL);
double *output_data = mxGetPr(output_array);
double *input_data = mxGetPr(input_array);
double threshold = *mxGetPr(threshold_array);
for(int z = 0; z < input_rows; z++) {
int count = 0;
int max_count = 0;
for(int a = 0; a < input_cols; a++) {
if(input_data[z + a * input_rows] < threshold) {
count++;
} else {
if(count > max_count)
max_count = count;
count = 0;
}
}
if(count > max_count)
max_count = count;
output_data[z] = max_count;
}
plhs[0] = output_array;
}
I'm not sure if you want to check for above or below threshold? Whatever you do, you'd change the input_data[z + a * input_rows] < threshold) to whatever comparison operator you want.
Here's a one-liner, albeit slow since cellfun is a loop:
maxSIG=cellfun(#(x) max(getfield(regionprops(x),'Area')),mat2cell(allGSR,ones(6000,1),100));
The Image Processing Toolbox function regionprops identifies connected groups of 1's in a logical matrix. By operating on each row of your matrix, and returning specifically the Area property, we get the length of each connected segment of 1's in each row. The max function picks out the length in each row you're looking for.
Note the mat2cell call is necessary to split allGSR into a cell matrix of rows, so that cellfun can be called.
I have struggling with an implementation of MEX file, I get run time error!
I would like to given two inputs : inMatToInv, inMatToMul I would like to compute : inv(inMatToInv)*inMatToMul, I assume that the user enters only 3x3 matrices.
I also assume that the inMatToInv in invertible.
/*==========================================================
* inv_and_mul_3by3.c
* inverse 3x3 matrix and multiply it by another 3x3 matrix
* The calling syntax is: outMatrix = (mat_to_inv,mat_to_mul)
*========================================================*/
#include "mex.h"
/* The computational routine */
void inv_and_mul_3by3(double *mat_to_inv, double *mat_to_mul, double *out)
{
// Description : out = inv(mat_to_inv)*mat_to_mul
double *inversed;
double det;
/* Calculate the matrix deteminant */
det = mat_to_inv[0]*(mat_to_inv[4]*mat_to_inv[8]-mat_to_inv[7]*mat_to_inv[5]) - mat_to_inv[3]*(mat_to_inv[1]*mat_to_inv[8]-mat_to_inv[7]*mat_to_inv[2])+mat_to_inv[6]*(mat_to_inv[1]*mat_to_inv[5]-mat_to_inv[4]*mat_to_inv[2]);
// Calcualte the inversed matrix
inversed[0] = (mat_to_inv[4]*mat_to_inv[8] - mat_to_inv[7]*mat_to_inv[5])/det;
inversed[3] = (mat_to_inv[6]*mat_to_inv[5] - mat_to_inv[3]*mat_to_inv[8])/det;
inversed[6] = (mat_to_inv[3]*mat_to_inv[7] - mat_to_inv[6]*mat_to_inv[4])/det;
inversed[1] = (mat_to_inv[7]*mat_to_inv[2] - mat_to_inv[1]*mat_to_inv[8])/det;
inversed[4] = (mat_to_inv[0]*mat_to_inv[8] - mat_to_inv[6]*mat_to_inv[2])/det;
inversed[7] = (mat_to_inv[6]*mat_to_inv[1] - mat_to_inv[0]*mat_to_inv[7])/det;
inversed[2] = (mat_to_inv[1]*mat_to_inv[5] - mat_to_inv[4]*mat_to_inv[2])/det;
inversed[5] = (mat_to_inv[3]*mat_to_inv[2] - mat_to_inv[0]*mat_to_inv[5])/det;
inversed[8] = (mat_to_inv[0]*mat_to_inv[4] - mat_to_inv[3]*mat_to_inv[1])/det;
out[0] = inversed[0]*mat_to_mul[0] + inversed[3]*mat_to_mul[1] + inversed[6]*mat_to_mul[2];
out[1] = inversed[1]*mat_to_mul[0] + inversed[4]*mat_to_mul[1] + inversed[7]*mat_to_mul[2];
out[2] = inversed[2]*mat_to_mul[0] + inversed[5]*mat_to_mul[1] + inversed[8]*mat_to_mul[2];
out[3] = inversed[0]*mat_to_mul[3] + inversed[3]*mat_to_mul[4] + inversed[6]*mat_to_mul[5];
out[4] = inversed[1]*mat_to_mul[3] + inversed[4]*mat_to_mul[4] + inversed[7]*mat_to_mul[5];
out[5] = inversed[2]*mat_to_mul[3] + inversed[5]*mat_to_mul[4] + inversed[8]*mat_to_mul[5];
out[6] = inversed[0]*mat_to_mul[6] + inversed[3]*mat_to_mul[7] + inversed[6]*mat_to_mul[8];
out[7] = inversed[1]*mat_to_mul[6] + inversed[4]*mat_to_mul[7] + inversed[7]*mat_to_mul[8];
out[8] = inversed[2]*mat_to_mul[6] + inversed[5]*mat_to_mul[7] + inversed[8]*mat_to_mul[8];
}
/* The gateway function */
void mexFunction( int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[])
{
double *inMatToInv; /* 3x3 input matrix that is inversed */
double *inMatToMul; /* 3x3 input matrix that multiply the inversed matrix*/
double *outMatrix; /* 3x3 output matrix */
/* create pointers to the real data in the input matrix */
inMatToInv = mxGetPr(prhs[0]);
inMatToMul = mxGetPr(prhs[1]);
/* create the output matrix */
plhs[0] = mxCreateDoubleMatrix(3,3,mxREAL);
/* get a pointer to the real data in the output matrix */
outMatrix = mxGetPr(plhs[0]);
/* call the computational routine */
inv_and_mul_3by3(inMatToInv,inMatToMul,outMatrix);
}
any help will be much appreciated.
Thanks,
Tamir
You declare a pointer double *inversed, but you do not allocate any memory for it. So when you access inversed[0] you get an error. Try:
double inversed[9];
I'd recommend using the blas and lapack interface for matrix operations in a C mex file. Have a look at Mathworks documentation on doing this here.
You would be interested in the lapack routines dgetrf and dgetri for the matrix inversion and dgemm for matrix multiplication. The lapack interface can be really messy and difficult to grasp but there are lots of references online about them. For example, this stack overflow Q/A.
I am trying to parallelize a section of my Matlab code using OpenMP in a mex file. The section in theMatlab code that I want to parallelize is:
for i = 1 : n
D(:, i) = CALC(A, B(:,i), C(i));
end
I have written this in order to parallelize it:
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
size_t r,n,i,G;
double *A, *B, *C, *D;
int nthreads;
nthreads = 4;
A = mxGetPr(prhs[0]); /* first input matrix */
B = mxGetPr(prhs[1]); /* second input matrix */
C = mxGetPr(prhs[2]);/* third input matrix */
/* dimensions of input matrices */
r = mxGetN(prhs[0]);
n = mxGetN(prhs[1]);
plhs[0] = mxCreateDoubleMatrix(r,n, mxREAL);
D = mxGetPr(plhs[0]);
G=n/nthreads;
omp_set_num_threads(nthreads);
#pragma omp parallel for schedule (dynamic, G)
{
for i = 1 : n
D(:, i) = CALC(A, B(:,i), C(i));
}
}
CALC is a Matlab function I have written. My challenge is how to use Mexcallmatlab to call in the CALC function to the mex file so that it can execute it in parallel inside my mex file, and return the elements of each column of D (i.e. D(:, i) back to my Matlab code.
Sorry for the lenghty question. Any help I can get on this will be highly appreciated.
You need to use multiple MATLAB processes to be able to run multiple calls in parallel. The easiest way would be to use parallel computing toolbox and use parfor instead of for loop.
I'm trying to have the same FFT with FFTW and Matlab. I use MEX files to check if FFTW is good. I think I have everything correct but :
I get absurd values from FFTW,
I do not get the same results when running the FFTW code several times on the same input signal.
Can someone help me get FFTW right?
--
EDIT 1 : I finally figured out what was wrong, BUT...
FFTW is very unstable : I get the right spectrum 1 time out of 5 !
How come? Plus, when I get it right, it doesn't have symmetry (which is not a very serious problem but that's too bad).
--
Here is the Matlab code to compare both :
fs = 2000; % sampling rate
T = 1/fs; % sampling period
t = (0:T:0.1); % time vector
f1 = 50; % frequency in Hertz
omega1 = 2*pi*f1; % angular frequency in radians
phi = 2*pi*0.25; % arbitrary phase offset = 3/4 cycle
x1 = cos(omega1*t + phi); % sinusoidal signal, amplitude = 1
%%
mex -I/usr/local/include -L/usr/local/lib/ -lfftw3 mexfftw.cpp
N=256;
S1=mexfftw(x1,N);
S2=fft(x1,N);
plot(abs(S1)),hold,plot(abs(S2),'r'), legend('FFTW','Matlab')
Here is the MEX file :
/*********************************************************************
* mex -I/usr/local/include -L/usr/local/lib/ -lfftw3 mexfftw.cpp
* Use above to compile !
*
********************************************************************/
#include <matrix.h>
#include <mex.h>
#include "fftw3.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
//declare variables
mxArray *sig_v, *fft_v;
int nfft;
const mwSize *dims;
double *s, *fr, *fi;
int dimx, dimy, numdims;
//associate inputs
sig_v = mxDuplicateArray(prhs[0]);
nfft = static_cast<int>(mxGetScalar(prhs[1]));
//figure out dimensions
dims = mxGetDimensions(prhs[0]);
numdims = mxGetNumberOfDimensions(prhs[0]);
dimy = (int)dims[0]; dimx = (int)dims[1];
//associate outputs
fft_v = plhs[0] = mxCreateDoubleMatrix(nfft, 1, mxCOMPLEX);
//associate pointers
s = mxGetPr(sig_v);
fr = mxGetPr(fft_v);
fi = mxGetPi(fft_v);
//do something
double *in;
fftw_complex *out;
fftw_plan p;
in = (double*) fftw_malloc(sizeof(double) * dimy);
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * nfft);
p = fftw_plan_dft_r2c_1d(nfft, s, out, FFTW_ESTIMATE);
fftw_execute(p); /* repeat as needed */
for (int i=0; i<nfft; i++) {
fr[i] = out[i][0];
fi[i] = out[i][1];
}
fftw_destroy_plan(p);
fftw_free(in);
fftw_free(out);
return;
}
Matlab uses the fftw library to perform its ffts, on my platform (Mac OS) this leads to issues with the linker as mex replaces the desired library with Matlab's version of fftw. To avoid this static link to the library using mex "-I/usr/local/include /usr/local/lib/libfftw3.a mexfftw.cpp".
The input of fftw_plan_dft_r2c_1d is not destroyed so you don't need to duplicate the input (note: not true for fftw_plan_dft_c2r_1d). The output has size nfft/2+1 as the output of a real fft is Hermitian. So to get the full output use:
for (i=0; i<nfft/2+1; i++) {
fr[i] = out[i][0];
fi[i] = out[i][1];
}
for (i=1; i<nfft/2+1; i++) {
fr[nfft-i] = out[i][0];
fi[nfft-i] = out[i][1];
}
Should "p = fftw_plan_dft_r2c_1d(nfft, s, out, FFTW_ESTIMATE);"
be
"p = fftw_plan_dft_r2c_1d(nfft, in, out, FFTW_ESTIMATE);".
'in' is 16-byte aligned, but 's' may not.
I am not sure whether it will cause a problem. I have a similar code about FFTW,
it sometimes gives me correct result and other times NaN. Also, I tried to test
my code with python ctypes, and actually it has the same weired behavior.
Finally, I found this post Checking fftw3 with valgrind, and it helps me.
For me, the issue is the reservation of heap storage in FFTW which are not freed
even after the program terminates.
fftw_cleanup()
resolves my problem. Maybe it can help you too.