Is it possible to create a sub-array without using additional memory on Matlab? - matlab

Well, I am trying to implement an algorithm on Matlab. It requires the usage of a slice of an high dimensional array inside a for loop. When I try to use the logical indexing, Matlab creates an additional copy of that slice and since my array is huge, it takes a lot of time.
slice = x(startInd:endInd);
What I am trying to do is to use that slice without copying it. I just need the slice data to input a linear operator. I won't update that part during the iterations.
To do so, I tried to write a Mex file whose output is a double
type array and whose size is equal to the intended slice data size.
plhs[0] = mxCreateUninitNumericMatrix(0, 0, mxDOUBLE_CLASS,mxREAL); % initialize but do not allocate any additional memory
ptr1 = mxGetPr(prhs[0]); % get the pointer of the input data
Then set the pointer of the output to the starting index of the input data.
mxSetPr(plhs[0], ptr1+startInd);
mxSetM(plhs[0], 1);
mxSetN(plhs[0], (endInd-startInd)); % Update the dimensions as intended
When I set the starting index to be zero, it just works fine. When I try to give
other values than 0, Mex file compiles with no error but Matlab crashes when the Mex function is called.
slice = mex_slicer(x, startInd, endInd);
What might be the problem here?

The way you assign the data pointer to the array, it means that MATLAB will attempt to free that memory when the array is deleted or has something else assigned to it. Attempting to call free using a pointer that was not obtained by malloc will cause a crash.
Unfortunately, MATLAB does not support "views", arrays that point at parts of a different array. So there is no way to do what you want to do.
An alternative solution would be to:
store your data differently, so that it doesn't take as much time to index (e.g. in smaller arrays)?
perform all your computations in C or C++ inside a MEX-file, where you can very simply point at sub-ranges of a larger data block.

See this FEX submission on creating MATLAB variables that "point" to the interior data of an existing variable. You can either do it as a shared data copy which is designed to be safe (but incurs some additional overhead), or as an unprotected direct reference (faster but risks crashing MATLAB if you don't clear it properly).
https://www.mathworks.com/matlabcentral/fileexchange/65842-sharedchild-creates-a-shared-data-copy-of-a-contiguous-subsection-of-an-existing-variable

Related

Simulink: Use Enumeration As Index

I feel like this is something that'd be absurdly easy in C# but is impossible in Simulink. I am trying to use an enumerated value as an array index. The trick is: I have an array that is sized for the number of elements in the enumeration, but their values are non-contiguous. So I want the defined enumeration and Simulink code to read the value at A(4). Obviously, it will instead read A(999). Any way to get the behavior I'm looking for?
classdef Example < Simulink.IntEnumType
enumeration
value1 (1)
value2 (2)
value13 (13)
value999 (999)
end
end
// Below in Simulink; reputation is not good enough to post images.
A = Data Store Memory
A.InitialValue = uint16(zeros(1, length(enumeration('Example'))))
// Do a Data Store Read with Indexing enabled; Index Option = Index vector (dialog)
A(Example.value999)
After a weekend of experimentation, I came up with a working solution: using a Simulink Function to call a MATLAB function that searches for the correct index using the "find" command. In my particular instance, I was assigning the data to Data Store Memory, so I was able to just pass the enumeration index and a new value to these blocks, but you could just as easily have a single input block that spits out the requested index. (My reputation is still too low to post pictures, so hopefully my textual descriptions will suffice.)
Data Store Memory 'A': Data type = uint16, Dimensions = length(enumeration('RegisterList'))
Simulink Function: SetValueA(ExampleEnum, NewValue)
--> MATLAB Function: SetA_Val(ExampleEnum, NewValue)
--> function SetModbusRegister(RegisterListEnum, NewValue)
global A;
if(isa(ExampleEnum, 'Example'))
A(find(enumeration('Example') == ExampleEnum, 1)) = NewValue;
end
From there, you use the Function Caller blocks in Simulink with the "Function prototype" filled in with "SetValueA(ExampleEnum,NewValue)" anywhere you wish to set this data. The logic would get more complicated if you wished to use vectors and write multiple values at once, but this is at least a starting point. It should just be a matter of modifying the Simulink and MATLAB functions to allow vector inputs and looping through those inputs in the MATLAB function.
EDIT 1
Slight update: If your MATLAB function is set up such that you cannot use variable-length vectors in it, just replace the "find" function with the "ismember" function. Using a scalar in ismember always returns a scalar, and the MATLAB compiler won't complain about it.

Manipulating large sets of data in Matlab, asking for advice on a few things, cells and numeric array operations, with performance in mind

This is a cross-post from here:
Link to post in the Mathworks community
Currently I'm working with large data sets, I've saved those data set as matlab files with the two biggest files being 9.5GB and 5.9GB.
These files contain a cell array each of 1x8 (this is done for addressibility and to prevent mixing up data from each of the 8 cells and I specifically wanted to avoid eval).
Each cell then contains a 3D double matrix, for one it's 1001x2002x201 and the other it is 2003x1001x201 (when process it I chop of 1 row at the end to get it to 2002).
Now I'm already running my script and processing it on a server (64 cores and plenty of RAM, matlab crashed on my laptop, as I need more than 12GB ram on windows). Nonetheless it still takes several hours to finish running my script and I still need to do some extra operations on the data which is why I'm asking advice.
For some of the large cell arrays, I need to find the maximum value of the entire set of all 8 cells, normally I would run a for loop to get the maximum of each cel and store each value in a temporay numeric array and then use the function max again. This will work for sure I'm just wondering if there's a better more efficient way.
After I find the maximum I need to do a manipulation over all this data as well, normally I would do something like this for an array:
B=A./maxvaluefound;
A(B > a) = A(B > a)*constant;
Now I could put this in a for loop, adress each cell and run this, however I'm not sure how efficient that would be though. Do you think there's a better way then a for loop that's not extremely complicated/difficult to implement?
There's one more thing I need to do which is really important, each cell as I said before is a slice (consider it time), while inside each slide is the value for a 3D matrix/plot. Now I need to integrate the data so that I get more slices. The reason I need to do this that I need to create slices/frames/plots to create a movie/gif. I'm planning on plotting the 3d data using scatter3 where this data is represented by color. I plan on using alpha values to make it see through so that one can actually see the intensity in this 3d plot. However I understand how to use griddata but apparently it's quite slow. Some of the other methods where hard to understand. Thus what would be the best way to interpolate these (time) slices in an efficient way over the different cells in the cell array? Please explain it if you can, preferably with an example.
I've added a pic for the Linux server info I'm running it on below, note I can not update the matlab version unfortunately, it's R2016a:
I've also attached part of my code to give a better idea of what I'm doing:
if (or(L03==2,L04==2)) % check if this section needs to be executed based on parameters set at top of file
load('../loadfilewithpathnameonmypc.mat')
E_field_650nm_intAll=cell(1,8); %create empty cell array
parfor ee=1:8 %run for loop for cell array, changed this to a parfor to increase speed by approximately 8x
E_field_650nm_intAll{ee}=nan(szxit(1),szxit(2),xres); %create nan-filled matrix in cell 1-8
for qq=1:2:xres
tt=(qq+1)/2; %consecutive number instead of spacing 2
T1=griddata(Xsall{ee},Ysall{ee},EfieldsAll{ee}(:,:,qq)',XIT,ZIT,'natural'); %change data on non-uniform grid to uniform gridded data
E_field_650nm_intAll{ee}(:,:,tt)=T1; %fill up each cell with uniform data
end
end
clear T1
clear qq tt
clear ee
save('../savelargefile.mat', 'E_field_650nm_intAll', '-v7.3')
end
if (L05==2) % check if this section needs to be executed based on parameters set at top of file
if ~exist('E_field_650nm_intAll','var') % if variable not in workspace load it
load('../loadanotherfilewithpathnameonmypc.mat');
end
parfor tt=1:8 %run for loop for cell array, changed this to a parfor to increase speed by approximately 8x
CFxLight{tt}=nan(szxit(1),szxit(2),xres); %create nan-filled matrix in cells 1 to 8
for qq=1:xres
CFs=Cafluo3D{tt}(1:lxq2,:,qq)'; %get matrix slice and tranpose matrix for point-wise multiplication
CFxLight{tt}(:,:,qq)=CFs.*E_field_650nm_intAll{tt}(:,:,qq); %point-wise multiple the two large matrices for each cell and put in new cell array
end
end
clear CFs
clear qq tt
save('../saveanotherlargefile.mat', 'CFxLight', '-v7.3')
end

Matlab Horzcat - Out of memory

Any trick to avoid an out of memory error in matlab?
I am assuming that the reason it shows up is because matlab is very inefficient in using horzcat and actually needs to temporarily duplicate matrices.
I have a matrix A with size 108977555 x 25. I want to merge this with three vectors d, m and y with size 108977555 x 1 each.
My machine has 32GB ram, and the above matrice + vectors occupy 18GB.
Now I want to run the following command:
A = [A(:,1:3), d, m, y, A(:,5:end)];
But that yields the error:
Error using horzcat
Out of memory. Type HELP MEMORY for your options.
Any trick to do this merge?
Working with Large Data Sets. If you are working with large data sets, you need to be careful when increasing the size of an array to avoid getting errors caused by insufficient memory. If you expand the array beyond the available contiguous memory of its original location, MATLAB must make a copy of the array and set this copy to the new value. During this operation, there are two copies of the original array in memory.
Restart matlab, I often find it doesn't fully clean up its memory or it get's fragmented, leading to lower maximal array sizes.
Change your datatype (if you can). E.g. if you're only dealing with numbers 0 - 255, use uint8, the memory size will reduce by a factor 8 compared to an array of doubles
Start of with A already large enough (i.e. 108977555x27 instead of 108977555x25 and insert in place:
A(:, 4) = d;
clear d
A(:, 5) = m;
clear m
A(:, 6) = y;
Merge the data in one datatype to reduce total memory requirement, eg a date easily fits into one uint32.
Leave the data separated, think about why you want the data in one matrix in the first place and if that is really necessary.
Use C-code to do the data allocation yourself (only if you're really desperate)
Further reading: https://nl.mathworks.com/help/matlab/matlab_prog/memory-allocation.html
Even if you could make it using Gunther's suggestions, it will just occupy memory. Right now it takes more than half of available memory. So, what are you planning to do then? Even simple B = A+1 doesn't fit. The only thing you can do is stuff like sum, or operations on part of array.
So, you should consider going to tall arrays and other related big data concepts, which are exactly meant to work with such large datasets.
https://www.mathworks.com/help/matlab/tall-arrays.html
You can first try the efficient memory management strategies as mentioned on the official mathworks site : https://in.mathworks.com/help/matlab/matlab_prog/strategies-for-efficient-use-of-memory.html
Use Single (4 bytes) or some other smaller data type instead of Double (8 bytes) if your code can work with that.
If possible use block processing (like rows or columns) i.e. store blocks as separate mat files and load and access only those parts of the matrix which are required.
Use matfile command for loading large variables in parts. Perhaps something like this :
save('A.mat','A','-v7.3')
oldMat = matfile('A.mat');
clear A
newMat = matfile('Anew.mat','Writeable',true) %Empty matfile
for i=1:27
if (i<4), newMat.A(:,i) = oldMat.A(:,i); end
if (i==4), newMat.A(:,i) = d; end
if (i==5), newMat.A(:,i) = m; end
if (i==6), newMat.A(:,i) = y; end
if (i>6), newMat.A(:,i) = oldMat.A(:,i-2); end
end

error in array of struct matlab

I want to train data on various labels using svm and want svm model as array of struct. I am doing like this but getting the error:
Subscripted assignment between dissimilar structures.
Please help me out
model = repmat(struct(),size);
for i=1:size
model(i) = svmtrain(train_data,labels(:,i),'Options', options);
end
A structure array in MATLAB can only contain structures with identical fields. By first creating an array of empty structures1 and then trying to fill it with SVMStruct structures, you try to create a mixed array with some empty structures, and some SVMStruct structures. This is not possible, thus MATLAB complains about "dissimilar structures" (not-equal structures: empty vs. SVMStruct).
To allocate an array of structs, you will have to specify all fields during initialization and fill them all with initial values - which is rather inconvenient in this case. A very simple alternative is to drop this initialization, and run your loop the other way around2,3:
for ii=sizeOfLabels:-1:1
model(ii) = svmtrain(train_data,labels(:,ii),'Options', options);
end
That way, for ii=sizeOfLabels, e.g. ii=100, MATLAB will call model(100)=..., while model doesn't exist yet. It will then allocate all space needed for 100 SVMStructs and fill the first 99 instances with empty values. That way you pre-allocate the memory, without having to worry about initializing the values.
1Note: if e.g. size=5, calling repmat(struct(),size) will create a 5-by-5 matrix of empty structs. To create a 1-by-5 array of structs, call repmat(struct(),1,size).
2Don't use size as a variable name, as this is a function. If you do that, you can't use the size function anymore.
3i and j denote the imaginary unit in MATLAB. Using them as a variable slows the code down and is error-prone. Use e.g. k or ii for loops instead.

Matlab preallocating arrays when final array-size is unknown [duplicate]

I am trying to speed up a script that I have written in Matlab that dynamically allocates memory to a matrix (basicallly reads a line of data from a file and writes it into a matrix, then reads another line and allocates more memory for a larger matrix to store the next line). The reason I did this instead of preallocating memory using zeroes() or something was that I don't know the exact size the matrix needs to be to hold all of the data. I also don't know the maximum size of the matrix, so I can't just preallocate a max size and then get rid of memory that I didn't use. This was fine for small amounts of data, but now I need to scale my script up to read many millions of data points and this implementation of dynamic allocation is just much too slow.
So here is my attempt to speed up the script: I tried to allocate memory in large blocks using the zeroes function, then once the block is filled I allocate another large block. Here is some sample code:
data = [];
count = 0;
for ii = 1:num_filelines
if mod(count, 1000) == 0
data = [data; zeroes(1000)]; %after 1000 lines are read, allocate another 1000 line
end
data(ii, :) = line_read(file); %line_read reads a line of data from 'file'
end
Unfortunately this doesn't work, when I run it I get an error saying "Error using vertcat
Dimensions of matrices being concatenated are not consistent."
So here is my question: Is this method of allocating memory in large blocks actually any faster than incremental dynamic allocation, and also why does the above code not run? Thanks for the help.
What I recommend doing, if you know the number of lines and can just guess a large enough number of acceptable columns, use a sparse matrix.
% create a sparse matrix
mat = sparse(numRows,numCols)
A sparse matrix will not store all of the zero elements, it only stores pointers to indices that are non-zero. This can help save a lot of space. They are used and accessed the same as any other matrix. That is only if you really need it in a matrix format from the beginning.
If not, you can just do everything as a cell. Preallocate a cell array with as many elements as lines in your file.
data = cell(1,numLines);
% get matrix from line
for i = 1:numLines
% get matrix from line
data{i} = lineData;
end
data = cell2mat(data);
This method will put everything into a cell array, which can store "dynamically" and then be converted to a regular matrix.
Addition
If you are doing the sparse matrix method, to trim up your matrix once you are done, because your matrix will likely be larger than necessary, you can trim this down easily, and then cast it to a regular matrix.
[val,~] = max(sum(mat ~= 0,2));
mat(:,val:size(mat,2)) = [];
mat = full(mat); % use this only if you really need the full matrix
This will remove any unnecessary columns and then cast it to a full matrix that includes the 0 elements. I would not recommend casting it to a full matrix, as this requires a ton more space, but if you truly need it, use it.
UPDATE
To get the number of lines in a file easily, use MATLAB's perl interpretter
create a file called countlines.pl and paste in the two lines below
while (<>) {};
print $.,"\n";
Then you can run this script on your file as follows
numLines = str2double(perl('countlines.pl','data.csv'));
Problem solved.
From MATLAB forum thread here
remember it is always best to preallocate everything before hand, because technically when doing shai's method you are reallocating large amounts a lot, especially if it is a large file.
To solve your error, simply use this syntax when allocating
data = [data; zeroes(1000, size(data,2))];
You might want to read the first line outside the loop so you'll know the number of columns and make the first allocation for data.
If you want to stick to your code as written I would substitute your initialization of data, data = [] to
data = zeros(1,1000);
Keep in mind though the warning from #MZimmerman6: zeros(1000) generates a 1000 x 1000 array. You may want to change all of your zeros statements to zeros( ... ,Nc), where Nc = length of line in characters.