Mex files and memory management - matlab

I have a mex code where the output variable has the same name as the input variable, but it changes size as a result of the operations of the mex code. For instance, I have something like:
A=Function(A) where A in the input is a 100 X 1 vector (much much larger in my simulation) and the output A is a 50 X 1 vector. I want to understand how memory is managed in this situation. After the operation is finished, does A now occupy 50 X 1 worth of space and the rest is free to allocate to other variables?
Thanks!
Siddharth

That is correct: MATLAB destroys the data buffer of the original A and creates a new one. The address of the mxArray structure is reused, presumably by copying the new structure onto the original after deallocating the original array's data buffer. This assumes you are not writing to prhs[i] in your MEX file!
You can see this with format debug. You will observe that the output mxArray has the same address but its data buffer has a different address, so the output array has clearly been reallocated. This suggests that the original buffer is deallocated or queued to be deallocated.
Here is the output first, for a MEX file testMEX.mexw64 that copies the first half of the input array's first row into a new array:
>> format debug
>> A = rand(1,8)
A =
Structure address = efbb890
m = 1
n = 8
pr = 77bb6c40
pi = 0
0.2581 0.4087 0.5949 0.2622 0.6028 0.7112 0.2217 0.1174
>> A = testMEX(A)
A =
Structure address = efbb890
m = 1
n = 4
pr = 77c80380
pi = 0
0.2581 0.4087 0.5949 0.2622
Note that pr is different, meaning that MATLAB has created a new data buffer. However, the mxArray "Structure address" is the same. So, at the minimum, the old data buffer will be deallocated. Whether or not the original mxArray structure is simply mutated or a new mxArray is created is another question (see below).
Edit: The following is some evidence to suggest that an entirely new mxArray is created and it is copied onto the old mxArray
Add the following two lines to the MEX function:
mexPrintf("prhs[0] = %X, mxGetPr = %X, value = %lf\n",
prhs[0], mxGetPr(prhs[0]), *mxGetPr(prhs[0]));
mexPrintf("plhs[0] = %X, mxGetPr = %X, value = %lf\n",
plhs[0], mxGetPr(plhs[0]), *mxGetPr(plhs[0]));
The result is:
prhs[0] = EFBB890, mxGetPr = 6546D840, value = 0.258065
plhs[0] = EFA2DA0, mxGetPr = 77B65660, value = 0.258065
Clearly there is a temporary mxArray at EFA2DA0 containing the output (plhs[0]), and this mxArray header/structure is entirely copied onto the old mxArray structure (the one bound to A in the base MATLAB workspace). Before this copy happens, MATLAB presumably deallocates the data buffer at 6546D840.
testMEX.cpp
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
mxAssert(nrhs == 1 && mxGetM(prhs[0]) == 1, "Input must be a row vector.");
double *A = mxGetPr(prhs[0]);
size_t cols = mxGetN(prhs[0]);
size_t newCols = cols / 2;
plhs[0] = mxCreateDoubleMatrix(1, newCols, mxREAL);
for (size_t i = 0; i < newCols; ++i) // size_t matches newCols, avoiding a signed/unsigned comparison
mxGetPr(plhs[0])[i] = A[i];
mexPrintf("prhs[0] = %X, mxGetPr = %X, value = %lf\n",
prhs[0], mxGetPr(prhs[0]), *mxGetPr(prhs[0]));
mexPrintf("plhs[0] = %X, mxGetPr = %X, value = %lf\n",
plhs[0], mxGetPr(plhs[0]), *mxGetPr(plhs[0]));
}
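The behaviour inferred above (a temporary output array is built, the old data buffer is freed, and the temporary header is copied over the variable's existing header) can be pictured with a toy model in plain C. This is only a sketch of the inference, not MATLAB's actual implementation: ToyArray, toy_create, toy_first_half and toy_assign are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an array header; NOT MATLAB's real mxArray. */
typedef struct {
    size_t  m, n;   /* dimensions */
    double *pr;     /* data buffer */
} ToyArray;

/* Create a header (by value) with a zeroed m-by-n data buffer. */
ToyArray toy_create(size_t m, size_t n)
{
    size_t count = m * n;
    ToyArray a;
    a.m = m;
    a.n = n;
    a.pr = calloc(count ? count : 1, sizeof(double));
    assert(a.pr != NULL);
    return a;
}

/* Counterpart of testMEX: a NEW array holding the first half of a row vector. */
ToyArray toy_first_half(const ToyArray *in)
{
    ToyArray out = toy_create(1, in->n / 2);
    memcpy(out.pr, in->pr, out.n * sizeof(double));
    return out;
}

/* Model of "A = f(A)": free the old data buffer, then copy the temporary
   header over the variable's header. The header address (dst) stays the
   same; its contents, including pr, change. */
void toy_assign(ToyArray *dst, ToyArray tmp)
{
    free(dst->pr);
    *dst = tmp;
}
```

In this model, calling toy_assign(&A, toy_first_half(&A)) leaves &A unchanged while A.pr points at a different, smaller buffer, which mirrors what format debug shows for "Structure address" and pr.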

Related

Matlab: fastest method of reading parts/sequences of a large binary file

I want to read parts from a large (ca. 11 GB) binary file. The currently working solution is to load the entire file ( raw_data ) with fread(), then crop out pieces of interest ( data ).
Question: Is there a faster method of reading small (1-2% of total file, partially sequential reads) parts of a file, given something like a binary mask (i.e. a logical index of specific bytes of interest) in Matlab? Specifics below.
Notes for my specific case:
data of interest (2.6e+7 bytes, or ca. 24 MB) is roughly 2% of raw_data (1.2e+10 bytes or ca. 11 GB)
each 600.000 bytes contain ca 6.500 byte reads, which can be broken down to roughly 1.200 read-skip cycles (such as 'read 10 bytes, skip 5000 bytes').
the read instructions of the total file can be broken down in ca 20.000 similar but (not exactly identical) read-skip cycles (i.e. ca. 20.000x1.200 read-skip cycles)
The file is read from a GPFS (parallel file system)
Excessive RAM, newest Matlab ver and all toolboxes are available for the task
My initial idea of an fread-fseek cycle (see pseudocode below) proved to be far slower than reading the whole file. Profiling revealed that fread() is the slowest part (it is called over a million times, which is probably obvious to the experts here).
Alternatives I considered: memmapfile() has no feasible way to read multiple small parts, as far as I could find. The MappedTensor library might be the next thing I'd look into. Related articles I found (1, 2) didn't help.
%open file
fi=fopen('data.bin');
%example read-skip data
f_reads = [20 10 6 20 40]; %read this number of bytes
f_skips = [900 6000 40 300 600]; %skip these bytes after each read instruction
data = []; %save the result here
fseek(fi,90000,'bof'); %skip initial bytes until first read
%read the file
nbr_read_skip_cycles = numel(f_reads); %number of read-skip cycles in this example
for ind=1:nbr_read_skip_cycles
tmp_data = fread(fi,f_reads(ind));
data = [data; tmp_data]; %add newly read bytes to data variable
fseek(fi,f_skips(ind),'cof'); %skip to next read position
end
fclose(fi);
FYI: To get an overview and for transparency, I've compiled some plots (below) of the first ca 6.500 read locations (of my actual data) that, after collapsing into fread-fseek pairs, can be summarized in 1.200 fread-fseek pairs.
I would do two things to speed up your code:
preallocate the data array.
write a C MEX-file to call fread and fseek.
This is a quick test I did to compare using fread and fseek from MATLAB or C:
%% Create large binary file
data = 1:10000000; % 80 MB
fi = fopen('data.bin', 'wb');
fwrite(fi, data, 'double');
fclose(fi);
n_read = 1;
n_skip = 99;
%% Read using MATLAB
tic
fi = fopen('data.bin', 'rb');
fseek(fi, 0, 'eof');
sz = ftell(fi);
sz = floor(sz / (n_read + n_skip));
data = zeros(1, sz);
fseek(fi, 0, 'bof');
for ind = 1:sz
data(ind) = fread(fi, n_read, 'int8');
fseek(fi, n_skip, 'cof');
end
toc
%% Read using C MEX-file
mex fread_test_mex.c
tic
data = fread_test_mex('data.bin', n_read, n_skip);
toc
And this is fread_test_mex.c:
#include <stdio.h>
#include <mex.h>
void mexFunction(int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[])
{
// No testing of inputs...
// inputs = 'data.bin', 1, 99
char* fname = mxArrayToString(prhs[0]);
int n_read = mxGetScalar(prhs[1]);
int n_skip = mxGetScalar(prhs[2]);
FILE* fi = fopen(fname, "rb");
fseek(fi, 0L, SEEK_END);
int sz = ftell(fi);
sz /= n_read + n_skip;
plhs[0] = mxCreateNumericMatrix(1, sz, mxDOUBLE_CLASS, mxREAL);
double* data = mxGetPr(plhs[0]);
fseek(fi, 0L, SEEK_SET);
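char buffer[1]; // holds a single byte, so this loop assumes n_read == 1
for(int ind = 0; ind < sz; ++ind) { // start at 0 so that data[0] is filled too
fread(buffer, 1, n_read, fi);
data[ind] = buffer[0];
fseek(fi, n_skip, SEEK_CUR);
}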
fclose(fi);
}
I see this:
Elapsed time is 6.785304 seconds.
Building with 'Xcode with Clang'.
MEX completed successfully.
Elapsed time is 1.376540 seconds.
That is, reading the data is 5x as fast with a C MEX-file. And that time includes loading the MEX-file into memory. A second run is a bit faster (1.14 s) because the MEX-file is already loaded.
In the MATLAB code, if I initialize data = []; and then extend the matrix every time I read like OP does:
tmp = fread(fi, n_read, 'int8');
data = [data, tmp];
then the execution time for that loop was 159 s, with 92.0% of the time spent in the data = [data, tmp] line. Preallocating really is important!
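Both suggestions (preallocate, and do the read-skip loop in C) can be combined for an irregular pattern like the one in the question. The following is a minimal sketch in plain C stdio rather than a full MEX-file; read_skip and its parameters are hypothetical names, and the caller is assumed to have allocated out with room for the sum of all read lengths before the loop starts.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Read reads[k] bytes, then skip skips[k] bytes, for k = 0..n-1,
   starting at byte offset `start`. `out` must be preallocated by the
   caller with room for the sum of all reads[k]. Returns the number of
   bytes actually stored (fewer than requested if the file ends early). */
size_t read_skip(FILE *fp, long start,
                 const size_t *reads, const long *skips, size_t n,
                 unsigned char *out)
{
    size_t total = 0;
    assert(fp != NULL && out != NULL);
    if (fseek(fp, start, SEEK_SET) != 0)
        return 0;
    for (size_t k = 0; k < n; ++k) {
        size_t got = fread(out + total, 1, reads[k], fp);
        total += got;
        if (got < reads[k])                     /* EOF or read error */
            break;
        if (fseek(fp, skips[k], SEEK_CUR) != 0) /* seek failed */
            break;
    }
    return total;
}
```

Note that fseek takes a long offset, which is only 32 bits on some platforms; for an 11 GB file a 64-bit variant such as fseeko (POSIX) or _fseeki64 (Microsoft C runtime) would be needed instead.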

Efficiently appending data to a variable using Matlab's C API

I have a C program which repeatedly executes an algorithm and outputs its intermediate result after each iteration. The data will be processed using Matlab R2019a.
I'm using the C Matrix API to create .MAT files, and I can write a matrix to the .MAT file:
MATFile *m = matOpen("matlab.MAT", "w");
mxArray *a = mxCreateDoubleMatrix(1, 1, mxREAL);
*mxGetPr(a) = nxm;
matPutVariable(m, "nxm", a);
mxDestroyArray(a);
matClose(m);
However, the documentation for [matPutVariable](https://www.mathworks.com/help/matlab/apiref/matputvariable.html) states that if I use the same variable name twice, the second will overwrite the first.
I do not want to store in memory all of my intermediate values. Perhaps I could read in the matrix, extend it to include a new value, and write it out again.
Is there a decent way to do this using the C API, or should I just write Matlab code to parse a different output format?
You can save multiple named variables in a single MAT-file. And when you add a new variable to a MAT-file, it's appended to the end of the file, so that's not bad on I/O even if your existing file is relatively large. So what I'd do is have each loop iteration of your C program store its output into a new named nxm<NNN> variable in the MAT-file, and then have your Matlab program read the file and concatenate them all together.
In your C code:
MATFile *m = matOpen("matlab.mat", "w");
[...]
char var_name[1024];
for (i = 0; i < n_iterations; i++) {
[... do work to produce nxm ...]
mxArray *a = mxCreateDoubleMatrix(1, 1, mxREAL);
*mxGetPr(a) = nxm;
sprintf(var_name, "nxm%d", i);
matPutVariable(m, var_name, a);
mxDestroyArray(a);
}
matClose(m);
Then on the Matlab side after that's all done:
s = load('matlab.mat');
c = struct2cell(s);
all_data = cat(1, c{:});
You could also write your code so that each pass reads in the previous pass's output, extends the array, and writes it back to the same named variable. But that would be a lot more coding in C, and (I think) would perform less well if your intermediate results were large, instead of these single numbers.
For that matter, since it looks like you're working with a small data set - nxm is a scalar double - you could just write your nxm values to a text file with fprintf(fp, "%g\n", nxm) (where fp is an open FILE*), one number per line, and then read the file in to Matlab using fscanf or str2double.
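The text-file route could look roughly like this on the C side. It is a sketch under assumptions: write_value is a hypothetical helper, and the "%.17g" format is my choice (17 significant digits are enough for a double to survive the round trip through text), not something the question specifies.

```c
#include <assert.h>
#include <stdio.h>

/* Write one value per line to an already-open text file.
   Returns 0 on success, -1 if fprintf reports an error. */
int write_value(FILE *fp, double value)
{
    assert(fp != NULL);
    return fprintf(fp, "%.17g\n", value) > 0 ? 0 : -1;
}
```

Each iteration would call write_value once, so nothing has to be kept in memory between iterations; on the Matlab side the whole column can then be read with a single fscanf(fid, '%f') call.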

Efficient import of semi structured text

I have multiple text files saved from a Tekscan pressure mapping system. I am trying to find the most efficient method for importing the multiple comma-delimited matrices into one 3-d matrix of type uint8. I have developed a solution which makes repeated calls to the MATLAB function dlmread. Unfortunately, it takes roughly 1.5 min to import the data. I have included the code below.
This code makes calls to two other functions I wrote, metaextract and framecount which I have not included as they aren't truly relevant to answering the question at hand.
Here are two links to samples of the files I am using.
The first is a shorter file with 90 samples
The second is a longer file with 3458 samples
Any help would be appreciated
function pressureData = tekscanimport
% Import TekScan data from .asf file to 3d matrix of type uint8.
[id,path] = uigetfile('*.asf'); %User input for .asf file
if path == 0 %uigetfile returns zero on cancel
error('You must select a file to continue')
end
path = strcat(path,id); %Concatenate path and id to full path
% function calls
pressureData.metaData = metaextract(path);
nLines = linecount(path); %Find number of lines in file
nFrames = framecount(path,nLines);%Find number of frames
rowStart = 25; %Default starting row to read from tekscan .asf file
rowEnd = rowStart + 41; %Frames are 42 rows long
colStart = 0;%Default starting col to read from tekscan .asf file
colEnd = 47;%Frames are 48 columns wide
pressureData.frames = zeros([42,48,nFrames],'uint8');%Preallocate for speed
f = waitbar(0,'1','Name','loading Data...',...
'CreateCancelBtn','setappdata(gcbf,''canceling'',1)');
setappdata(f,'canceling',0);
for i = 1:nFrames %Loop through file skipping frame metadata
if getappdata(f,'canceling')
break
end
waitbar(i/nFrames,f,sprintf('Loaded %.2f%%', i/nFrames*100));
%Make repeated calls to dlmread
pressureData.frames(:,:,i) = dlmread(path,',',[rowStart,colStart,rowEnd,colEnd]);
rowStart = rowStart + 44;
rowEnd = rowStart + 41;
end
delete(f)
end
I gave it a try. This code opens your big file in 3.6 seconds on my PC. The trick is to use sscanf instead of the str2double and str2num functions.
clear all;tic
fid = fopen('tekscanlarge.txt','rt');
%read the header, stop at frame
header='';
l = fgetl(fid);
while length(l)>5&&~strcmp(l(1:5),'Frame')
header=[header,l,sprintf('\n')];
l = fgetl(fid);
if length(l)<5,l(end+1:5)=' ';end
end
%all data at once
dat = fread(fid,inf,'*char');
fclose(fid);
%allocate space
res = zeros([48,42,3458],'uint8');
%get all line endings
LE = [0,regexp(dat','\n')];
i=1;
for ct = 2:length(LE)-1 %go line by line
L = dat(LE(ct-1)+1:LE(ct)-1);
if isempty(L),continue;end
if all(L(1:5)==['Frame']')
fr = sscanf(L(7:end),'%u');
i=1;
continue;
end
% sscanf can only handle a char row vector with space separation.
res(:,i,fr) = uint8(sscanf(strrep(L',',',' '),'%u'));
i=i+1;
end
toc
Does anyone know of a faster way to convert than sscanf? The script spends the majority of its time in this function (2.17 seconds). For a 13.1 MB dataset, I find that very slow compared to the speed of the memory.
I found a way to do it in 0.2 seconds that might be useful for others as well.
This mex-file scans through a list of char values for numbers and reports them back. Save it as mexscan.c and run mex mexscan.c.
#include "mex.h"
/* The computational routine */
void calc(unsigned char *in, unsigned char *out, long Sout, long Sin)
{
long ct = 0;
int newnumber=0;
for (int i=0;i<Sin;i+=2){
if (in[i]>=48 && in[i]<=57) { //it is a number
out[ct]=out[ct]*10+in[i]-48;
newnumber=1;
} else { //it is not a number
if (newnumber==1){
ct++;
if (ct>=Sout){return;} //out has Sout elements, so index Sout would be out of bounds
}
newnumber=0;
}
}
}
/* The gateway function */
void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
unsigned char *in; /* input vector */
long Sout; /* input size of output vector */
long Sin; /* size of input vector */
unsigned char *out; /* output vector*/
/* check for proper number of arguments */
if(nrhs!=2) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:nrhs","Two inputs required.");
}
if(nlhs!=1) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:nlhs","One output required.");
}
/* make sure the first input argument is type char */
if(!mxIsClass(prhs[0], "char")) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:notDouble","Input matrix must be type char.");
}
/* make sure the second input argument is type uint32 */
if(!mxIsClass(prhs[1], "uint32")) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:notUint32","Second input must be type uint32.");
}
/* input length in bytes: MATLAB char elements (mxChar) are 2 bytes each */
Sin = mxGetM(prhs[0])*2;
/* create a byte pointer to the input data (mxGetData works for any class) */
in = (unsigned char *) mxGetData(prhs[0]);
Sout = mxGetScalar(prhs[1]);
/* create the output matrix */
plhs[0] = mxCreateNumericMatrix(1,Sout,mxUINT8_CLASS,0);
/* get a pointer to the real data in the output matrix */
out = (unsigned char *) mxGetData(plhs[0]);
/* call the computational routine */
calc(in,out,Sout,Sin);
}
Now this script runs in 0.2 seconds and returns the same result as the previous script.
clear all;tic
fid = fopen('tekscanlarge.txt','rt');
%read the header, stop at frame
header='';
l = fgetl(fid);
while length(l)>5&&~strcmp(l(1:5),'Frame')
header=[header,l,sprintf('\n')];
l = fgetl(fid);
if length(l)<5,l(end+1:5)=' ';end
end
%all data at once
dat = fread(fid,inf,'*char');
fclose(fid);
S=[48,42,3458];
d = mexscan(dat,uint32(prod(S)+3458));
d(1:prod(S(1:2))+1:end)=[];%remove frame numbers
d = reshape(d,S);
toc
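For readers who want to experiment with the digit-scanning idea outside MATLAB, here is a plain C sketch of the same technique. It differs from mexscan.c in a few labeled ways: it reads ordinary one-byte chars instead of stepping over 2-byte MATLAB chars, it returns the number of values found, and it also stores a number that runs up to the end of the input. scan_uint8 is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Parse runs of ASCII digits in `in` (length n_in) into `out`
   (capacity n_out). Any non-digit byte ends the current number.
   Returns how many numbers were stored. Values above 255 wrap,
   because the output element type is unsigned char. */
size_t scan_uint8(const char *in, size_t n_in,
                  unsigned char *out, size_t n_out)
{
    size_t count = 0;
    int in_number = 0;
    unsigned char value = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n_in && count < n_out; ++i) {
        if (in[i] >= '0' && in[i] <= '9') {      /* digit: extend number */
            value = (unsigned char)(value * 10 + (in[i] - '0'));
            in_number = 1;
        } else if (in_number) {                  /* separator: store it */
            out[count++] = value;
            value = 0;
            in_number = 0;
        }
    }
    if (in_number && count < n_out)              /* number at end of input */
        out[count++] = value;
    return count;
}
```

The capacity check sits in the loop condition, so the function can never write past out[n_out - 1] regardless of how many numbers the text contains.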

How to get sizes of a matrix inside a struct in mex

I have a program in mex that takes several inputs and checks if they are proper size or not.
One of the inputs is structure with several scalars and matrices inside. My problem is that, one of the fields can be either a 3x1 or 3xN matrix. Whenever it is a 3xN, I get a weird/wrong result.
So lets see 3 examples:
3rd input is a matrix. If I do:
mrows = mxGetM(prhs[2]);
ncols = mxGetN(prhs[2]);
mexPrintf("%d x %d \n", (int)mrows,(int)ncols);
Prints:
>> 1 x 360
Nice.
Then , the structure.
for(int ifield=0; ifield<nfields; ifield++) {
tmp=mxGetField(prhs[1],0,fieldnames[ifield]); //fieldnames has the names they should have
// check if that fieldname exists in the struct
if(tmp==NULL){
mexPrintf("%s number: %d %s \n", "FIELD",ifield+1, fieldnames[ifield]);
mexErrMsgIdAndTxt( "CBCT:MEX:Atb:InvalidInput",
"Above field is missing. Check spelling. ");
}
switch(ifield){ //for each field check if it is how it should be
// some other things that work
case 8:
mrows = mxGetM(tmp);
ncols = mxGetN(tmp);
mexPrintf("%d x %d \n", (int)mrows,(int)ncols);
//check if they are what they should be
break;
case 9:
mrows = mxGetM(tmp);
ncols = mxGetN(tmp);
mexPrintf("%d x %d \n", (int)mrows,(int)ncols);
//check if they are what they should be
break;
}
}
So the 8th field in the structure is str.field8=[0;0;0]; and the 9th field is str.field9=[zeros(1,360);zeros(1,360)]. But this code prints:
>> 3 x 1
>> -96713592 x 2
What is happening here? Should I use other functions to get the size of a matrix inside a struct? am I getting the data wrongly in my tmp variable?
I am confused because if prhs[2] is a matrix, it prints the right size, so mxGetM() and mxGetN() seem to do what I want.
The problem has nothing to do with MEX or with how matrices are obtained from the struct; it comes from casting the mrows and ncols variables to int.
If they are cast to long int instead of int, the right values are shown.
My machine runs 64-bit Windows 7.
A 16-bit int only seems to be an issue on systems with the "LP32" data model, which is used, for example, by the Win16 API. Refs one and two.
That said, 16 bits is all the C++ standard guarantees for int, whereas long int is guaranteed to be at least 32 bits.
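One way to sidestep the question of which integer type to cast to is to print the size_t values directly with the C99 "%zu" conversion. The sketch below uses snprintf so it can be checked outside MATLAB; format_dims is a hypothetical helper, and whether a given compiler's runtime supports "%zu" is an assumption to verify (older Microsoft C runtimes did not).

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Format "rows x cols" without narrowing the values: %zu is the C99
   conversion specifier for size_t. Returns what snprintf returns,
   i.e. the length the full string needs (excluding the terminator). */
int format_dims(char *buf, size_t buf_len, size_t rows, size_t cols)
{
    assert(buf != NULL && buf_len > 0);
    return snprintf(buf, buf_len, "%zu x %zu", rows, cols);
}
```

Where "%zu" is not available, casting to unsigned long long and printing with "%llu" is a common fallback.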

Finding the row indices of a logical vector without using find()

My program handles huge amount of data and the function find is the one to blame for taking so much time to execute. At some point I get a logical vector and I want to extract row indices of the 1 elements in the vector. How can I do that without using the find function?
Here's a demo:
temp = rand(10000000, 1);
temp1 = temp > 0.5;
temp2 = find(temp1);
But it is too slow in case of having much more data. Any suggestion?
Thank you
Find seems to be a very optimized function. What I did was to create a mex version very restricted to this particular problem. Running time was cut by half. :)
Here is the code:
#include <math.h>
#include <matrix.h>
#include <mex.h>
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
mxLogical *in;
double *out;
int i, nInput, nTrues;
// Get the number of elements of the input.
nInput = mxGetNumberOfElements(prhs[0]);
// Get a pointer to the logical input array.
in = mxGetLogicals(prhs[0]);
// Allocate memory for the output. As we don't know the number of
// matches, we allocate an array the same size of the input. We will
// probably reallocate it later.
out = mxMalloc(sizeof(double) * nInput);
// Count the number of 'trues' and store its positions.
for (nTrues = 0, i = 0; i < nInput; )
if (in[i++])
out[nTrues++] = i;
// Reallocate the array, if necessary.
if (nTrues < nInput)
out = mxRealloc(out, sizeof(double) * nTrues);
// Assign the indexes to the output array.
plhs[0] = mxCreateDoubleMatrix(0, 0, mxREAL);
mxSetPr(plhs[0], out);
mxSetM(plhs[0], nTrues);
mxSetN(plhs[0], 1);
}
Just save it to a file called, for example, find2.c and compile with mex find2.c.
Assuming:
temp = rand(10000000, 1);
temp1 = temp > 0.5;
Running times:
tic
temp2 = find(temp1);
toc
Elapsed time is 0.082875 seconds.
tic
temp2 = find2(temp1);
toc
Elapsed time is 0.044330 seconds.
IMPORTANT NOTE: this function has no error handling. It's assumed the input is always a logical array and the output is a double array. Caution is required.
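A variation on the same idea, sketched in plain C so it can be tried without the MEX API: count the true entries first, then allocate the result at exactly the right size, which avoids the oversized allocation and the later realloc. This is not the code timed above, and I have not measured whether it is faster; find_true is a hypothetical name and the mask is assumed to be stored one byte per element.

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

/* Return a malloc'd array with the 1-based positions of the nonzero
   entries of `mask`, and store their count in *n_found. The caller
   frees the result. Returns NULL (with *n_found == 0) when there are
   no nonzero entries or when allocation fails. */
size_t *find_true(const unsigned char *mask, size_t n, size_t *n_found)
{
    size_t count = 0, k = 0;
    size_t *idx;
    assert(mask != NULL && n_found != NULL);
    for (size_t i = 0; i < n; ++i)      /* pass 1: count matches */
        count += (mask[i] != 0);
    *n_found = 0;
    if (count == 0)
        return NULL;
    idx = malloc(count * sizeof *idx);
    if (idx == NULL)
        return NULL;
    for (size_t i = 0; i < n; ++i)      /* pass 2: record positions */
        if (mask[i])
            idx[k++] = i + 1;           /* 1-based, like MATLAB's find */
    *n_found = count;
    return idx;
}
```

Using size_t for the counters also avoids the overflow that int would hit on inputs with more than about 2 billion elements.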
You could try to split your calculations in small pieces. This will not reduce the amount of calculations you have to do, but it might still be faster since the data fits into fast cache memory, instead of in the slow main memory (or in the worst case you might even be swapping to disk). Something like this:
temp = rand(10000000, 1);
n = 100000; % chunk size
for i = 1:floor(length(temp) / n)
chunk = temp(((i-1) * n + 1):(i*n)); % semicolon suppresses printing the whole chunk
temp1 = chunk > 0.5;
temp2 = find(temp1);
do_stuff(temp2)
end
You can create an array of regular indices and then apply logical indexing. I didn't check whether it is faster than find, though.
Example:
Index=1:numel(temp);
Found = Index(temp1);