Adding large data sets within a for loop in MATLAB

I'm attempting to make a very basic gating system, but the smallest unit of data I have is a single cycle, which is an 18*100 array. I've tried plotting it with hold on/off and collecting the data afterwards with h=findobj(gca,'Type','line');. However, this takes forever and requires a lot of reshaping. Is there a simpler way to either store the data or add the complete arrays together in the for loop (not sum, as that works line by line, which is a no-no)?
h=findobj(gca,'Type','line'); %data retrieved from original figure
x=get(h,'Xdata');
y=get(h,'Ydata');
X=reshape(x,(18),[]);
Y=reshape(y,(18),[]);
hold on
for i=1:4;
xx=X(:,i);
yy=Y(:,i);
gx=cell2mat(xx);
gy=cell2mat(yy);
plot(gx) % manipulated data from original figure,
plot(gy) % plot required to extract all the for loop data
end
hold off
Basically I just want to add the four gx together and divide them; however, they have to be added in bulk, not line by line, as one cycle of the loop equals one cycle of the system. (Also, the 4 is just a placeholder; the real number is more like 60+, which is why I can't do it manually.)
Many thanks!
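One way to do the "bulk" addition described above is to keep a running 18-by-100 total and add each cycle's whole matrix to it, dividing by the number of cycles at the end. The following is only a sketch under assumptions: the cell array X is filled with random stand-in data so the snippet runs on its own, and the names nCycles, gxSum and gxMean are illustrative, not from the question.

```matlab
% Stand-in for the reshaped figure data: an 18-by-nCycles cell array whose
% entries are 1-by-100 row vectors (an assumption based on the question).
nCycles = 4;
X = cell(18, nCycles);
for k = 1:numel(X)
    X{k} = rand(1, 100);
end

gxSum = zeros(18, 100);          % running element-wise total
for c = 1:nCycles
    gx = cell2mat(X(:, c));      % 18-by-100 matrix for this cycle
    gxSum = gxSum + gx;          % whole-matrix addition, no plotting needed
end
gxMean = gxSum / nCycles;        % element-wise average over all cycles
```

The same pattern works unchanged for 60+ cycles. If memory allows, stacking the cycles along a third dimension and calling mean(stack, 3) should give the same result.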

Related

Manipulating large sets of data in Matlab, asking for advice on a few things, cells and numeric array operations, with performance in mind

This is a cross-post from here:
Link to post in the Mathworks community
Currently I'm working with large data sets. I've saved those data sets as MATLAB files, with the two biggest files being 9.5GB and 5.9GB.
Each of these files contains a 1x8 cell array (this is done for addressability and to prevent mixing up data from each of the 8 cells, and I specifically wanted to avoid eval).
Each cell then contains a 3D double matrix; for one it's 1001x2002x201 and for the other it's 2003x1001x201 (when processing it I chop off 1 row at the end to get it to 2002).
I'm already running my script on a server (64 cores and plenty of RAM; MATLAB crashed on my laptop, as I need more than 12GB of RAM on Windows). Nonetheless, it still takes several hours to finish running my script, and I still need to do some extra operations on the data, which is why I'm asking for advice.
For some of the large cell arrays, I need to find the maximum value of the entire set of all 8 cells. Normally I would run a for loop to get the maximum of each cell, store each value in a temporary numeric array, and then use the function max again. This will work for sure; I'm just wondering if there's a better, more efficient way.
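The two-stage maximum described above can be written compactly with cellfun. This is a sketch only: the cell array A and its small sizes are made up so the snippet runs standalone, and cellMax is an illustrative name. cellfun is essentially a loop in disguise, so for eight cells it is unlikely to be faster than the plain for loop; it is mainly shorter.

```matlab
% Small stand-in for the 1x8 cell array of 3-D matrices (sizes are made up).
A = cell(1, 8);
for k = 1:8
    A{k} = rand(4, 5, 3) * k;
end

% Maximum of each cell first, then the maximum of those eight values.
cellMax = cellfun(@(c) max(c(:)), A);   % 1-by-8 numeric array
maxvaluefound = max(cellMax);           % global maximum over all cells
```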
After I find the maximum I need to do a manipulation over all this data as well, normally I would do something like this for an array:
B=A./maxvaluefound;
A(B > a) = A(B > a)*constant;
Now I could put this in a for loop, address each cell and run this; however, I'm not sure how efficient that would be. Do you think there's a better way than a for loop that's not extremely complicated/difficult to implement?
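For reference, the per-cell loop mentioned above could look like the sketch below. The data, the threshold a and the factor constant are hypothetical values chosen so the snippet runs on its own. It compares against a*maxvaluefound instead of building the temporary matrix B, which should select the same elements as long as maxvaluefound is positive (up to floating-point rounding exactly at the threshold).

```matlab
rng(0);                                  % reproducible stand-in data
A = cell(1, 8);
for k = 1:8
    A{k} = rand(4, 5, 3);                % made-up sizes
end
maxvaluefound = max(cellfun(@(c) max(c(:)), A));

a = 0.5;                                 % hypothetical threshold
constant = 2;                            % hypothetical scaling factor
for k = 1:numel(A)
    mask = A{k} > a * maxvaluefound;     % avoids storing B = A./maxvaluefound
    A{k}(mask) = A{k}(mask) * constant;  % scale only the selected elements
end
```

With only eight iterations the loop overhead itself is negligible; the time goes into the element-wise work on each large matrix, which is already vectorized inside the loop body.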
There's one more thing I need to do which is really important. Each cell, as I said before, is a slice (consider it time), while inside each slice is the value for a 3D matrix/plot. Now I need to interpolate the data so that I get more slices. The reason is that I need to create slices/frames/plots to make a movie/gif. I'm planning on plotting the 3D data using scatter3, where the data is represented by color, and on using alpha values to make it see-through so that one can actually see the intensity in this 3D plot. I understand how to use griddata, but apparently it's quite slow, and some of the other methods were hard to understand. So what would be the best way to interpolate these (time) slices efficiently over the different cells in the cell array? Please explain it if you can, preferably with an example.
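If the eight matrices all live on the same spatial grid and are equally spaced in time (both are assumptions, not stated in the post), extra time slices can be produced by a plain linear blend of neighbouring cells, with no call to griddata. The sketch below uses tiny made-up data; nSub and frames are hypothetical names.

```matlab
% Stand-in data: 8 time slices, each a small 3-D matrix (sizes are made up).
nSlices = 8;
A = cell(1, nSlices);
for k = 1:nSlices
    A{k} = k * ones(3, 4, 2);
end

nSub = 4;                                      % hypothetical frames per original time step
frames = cell(1, (nSlices - 1) * nSub + 1);
idx = 1;
for k = 1:nSlices - 1
    for s = 0:nSub - 1
        w = s / nSub;                                  % blend weight in [0, 1)
        frames{idx} = (1 - w) * A{k} + w * A{k + 1};   % linear interpolation in time
        idx = idx + 1;
    end
end
frames{idx} = A{nSlices};                      % keep the last original slice
```

For matrices of the size described in the post, storing every frame would multiply the memory footprint, so it may be preferable to compute each blended frame inside the plotting loop and discard it after the frame has been written to the movie.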
I've added a picture of the Linux server info I'm running it on below. Note that I unfortunately cannot update the MATLAB version; it's R2016a:
I've also attached part of my code to give a better idea of what I'm doing:
if (or(L03==2,L04==2)) % check if this section needs to be executed based on parameters set at top of file
load('../loadfilewithpathnameonmypc.mat')
E_field_650nm_intAll=cell(1,8); %create empty cell array
parfor ee=1:8 %run for loop for cell array, changed this to a parfor to increase speed by approximately 8x
E_field_650nm_intAll{ee}=nan(szxit(1),szxit(2),xres); %create nan-filled matrix in cell 1-8
for qq=1:2:xres
tt=(qq+1)/2; %consecutive number instead of spacing 2
T1=griddata(Xsall{ee},Ysall{ee},EfieldsAll{ee}(:,:,qq)',XIT,ZIT,'natural'); %change data on non-uniform grid to uniform gridded data
E_field_650nm_intAll{ee}(:,:,tt)=T1; %fill up each cell with uniform data
end
end
clear T1
clear qq tt
clear ee
save('../savelargefile.mat', 'E_field_650nm_intAll', '-v7.3')
end
if (L05==2) % check if this section needs to be executed based on parameters set at top of file
if ~exist('E_field_650nm_intAll','var') % if variable not in workspace load it
load('../loadanotherfilewithpathnameonmypc.mat');
end
parfor tt=1:8 %run for loop for cell array, changed this to a parfor to increase speed by approximately 8x
CFxLight{tt}=nan(szxit(1),szxit(2),xres); %create nan-filled matrix in cells 1 to 8
for qq=1:xres
CFs=Cafluo3D{tt}(1:lxq2,:,qq)'; %get matrix slice and transpose it for point-wise multiplication
CFxLight{tt}(:,:,qq)=CFs.*E_field_650nm_intAll{tt}(:,:,qq); %point-wise multiply the two large matrices for each cell and put the result in a new cell array
end
end
clear CFs
clear qq tt
save('../saveanotherlargefile.mat', 'CFxLight', '-v7.3')
end

Retrieve Gradient of Reference Line Generated by probplot

I am generating probability plots for a number of data sets in MATLAB.
I am plotting them using probplot with a Weibull distribution reference line:
data = [1,1,1,1,2,2,2,3,4,5,3,3,2,2,1,3,5,7,2,4,2] ;
h = probplot('weibull',data) ;
According to the MATLAB documentation, this function returns a graphics array object. This appears to contain only the original data and not the reference line.
Is there any way of retrieving information about this reference line without plotting it and individually extracting it using the figure tools? (That is very much not an option I'd like to pursue, as there are potentially hundreds of plots to go through.)
I can see there is wblplot, which returns a line array of 3 lines, one of which is the original data and one of the others is likely the reference line. However, I will have to try fitting different distributions further down the road and would prefer to keep a generic approach.
The returned array does contain the reference line:
data = [1,1,1,1,2,2,2,3,4,5,3,3,2,2,1,3,5,7,2,4,2] ;
h = probplot('weibull',data) ;
b=h(2);
figure
plot(b.XData,b.YData)
h is an array of graphics objects. The first element contains the original data, and the second, h(2), contains the reference line.

Splitting non-continuous sized matrix in vectors

I'm writing a piece of software in MATLAB. Here, the user can define a dimension, say 3.
This dimension is subsequently the number of iterations of a for loop. Within this loop, I construct a matrix to store the results which are generated during every iteration. So, the data of every iteration is stored in a row of a matrix.
Therefore, the size of the matrix depends on the size of the loop and thus the user input.
Now, I want to separate each row of this matrix (cl_matrix) and create separate vectors for every row automatically. How would one go about this? I am stuck here...
So far I have:
Angle = [1 7 15];
for i = 1:length(Angle)
%% do some calculations here %%
cl_matrix(i,:) = A.data(:,7);
end
I want to automate this based on the length of Angle:
length(Angle)
cl_1 = cl_matrix(1,:);
cl_7 = cl_matrix(2,:);
cl_15= cl_matrix(3,:);
Thanks!
The usual way to dynamically generate workspace variables whose names are built by concatenating strings and numeric values (as in your question) is to use the eval function.
Nevertheless, eval is only one character away from "evil": it is as dangerous as it is seductive.
A possible compromise between working directly with cl_matrix and generating the set of arrays cl_1, cl_7 and cl_15 is to create a structure whose fields are dynamically generated.
You can generate a struct whose fields are cl_1, cl_7 and cl_15 this way:
cl_struct.(['cl_' num2str(Angle(i))])=cl_matrix(i,:)
(you might notice that the field name, e.g. cl_1, is built in the same way you would build it for eval).
This approach offers a notable advantage over generating the arrays with eval: you can access the fields of the struct (that is, their content) even without knowing their names.
Below you can find a modified version of your script in which this approach has been implemented.
The script generates two structs:
the first one, cl_struct_same_length, is used to store the rows of cl_matrix
the second one, cl_struct_different_length, is used to store arrays of different lengths
The script contains examples of how to access the fields (that is, the arrays) to perform some calculations (in the example, to evaluate the mean of each of them).
You can access the struct fields using the functions:
getfield to get the values stored in a field
fieldnames to get the (dynamically generated) names of the fields
Updated script
Angle = [1 7 15];
for i = 1:length(Angle)
% do some calculations here
% % % cl_matrix(i,:) = A.data(:,7);
% Populate cl_matrix (random stand-in for the real data)
cl_matrix(i,:) = randi(10,1,10)*Angle(i);
% Create structs with dynamic field names
cl_struct_same_length.(['cl_' num2str(Angle(i))])=cl_matrix(i,:)
cl_struct_different_length.(['cl_' num2str(Angle(i))])=randi(10,1,Angle(i))
end
% Use "fieldnames" to get the names of the dynamically generated struct fields
cl_fields=fieldnames(cl_struct_same_length)
% Loop through the struct's fields to perform some calculation on the
% stored values
for i=1:length(cl_fields)
cl_means(i)=mean(cl_struct_same_length.(cl_fields{i}))
end
% Assign the value stored in a struct's field to a variable
row_2_of_cl_matrix=getfield(cl_struct_same_length,(['cl_' num2str(Angle(2))]))
Hope this helps.

Average of values from multiple matrices in Matlab

I have 50 matrices contained in one folder, all of dimension 181 x 360. How do I cycle through that folder and take an average of each corresponding data points across all 50 matrices?
If the matrices are contained within Matlab variables stored using save('filename','VariableName') then they can be opened using load('filename.mat').
As such, you can use the result of filesInDirectory = dir; to get a list of all your files, using a search pattern if appropriate, like files = dir('*.mat');
Next you can use your load command, and then whos to see which variables were loaded. You should consider storing these names so that the variables are easy to clear after each iteration of your loop.
Once you have your matrix loaded (one at a time), you can take averages as you need, probably summing a value across multiple loop iterations, then dividing by a total counter you've been measuring (using perhaps count = count + size(MatrixVar, dimension);).
If you need all of the matrices loaded at once, then you can modify the above idea, to load using a loop, then average outside of the loop. In this case, you may need to take care - but 50*181*360 isn't too bad I suspect.
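The load-based steps above can be sketched as follows. To keep the snippet self-contained it first writes three small example files to a temporary folder; the folder, the file names and the variable name M are all made up, and the sketch assumes each file holds exactly one 181-by-360 matrix. Loading into a struct (S = load(...)) avoids having to clear workspace variables between iterations.

```matlab
% Create three small example files so the sketch runs on its own
% (folder, file names and the variable name M are made up).
folder = tempname;
mkdir(folder);
for k = 1:3
    M = k * ones(181, 360);
    save(fullfile(folder, sprintf('matrix_%02d.mat', k)), 'M');
end

files = dir(fullfile(folder, '*.mat'));
total = zeros(181, 360);                         % running element-wise sum
for f = 1:numel(files)
    S = load(fullfile(folder, files(f).name));   % loads into a struct, not the workspace
    names = fieldnames(S);
    total = total + S.(names{1});                % assumes one matrix per file
end
avgMatrix = total / numel(files);                % 181-by-360 element-wise average
rmdir(folder, 's');                              % remove the temporary example folder
```

For the real data, the first block would be dropped and folder pointed at the directory that holds the 50 files.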
A brief introduction to the load command can be found at this link. It talks mainly about opening one matrix, then plotting the values, but there are some comments about dealing with headers, if needed, and different ways in which you can open data, if load is insufficient. It doesn't talk about binary files, though.
Note on binary files, based on comment to OP's question:
If the file can be opened using
FID = fopen('filename.dat');
fread(FID, 'float');
then you can replace the steps referring to load above, and instead use a loop to find filenames using dir, open the matrices using fopen and fread, then average as needed, finally closing the files and clearing the matrices.
In this case, probably your file identifier is the only part you're likely to need to change during the loop (although your total will increase, if that's how you want to average your data)
Reshaping the matrix, or inverting it, might make the code clearer (which is good!), but might not be necessary depending on what you're trying to average - it may be that selecting only a subsection of the matrix is sufficient.
Possible example code?
Assuming that all of the files in the current directory are to be opened, and that no files are elsewhere, you could try something like:
listOfFiles = dir('*.dat');
Total = zeros(181, 360); % running element-wise sum
Counter = 0; % number of files read so far
for f = 1:numel(listOfFiles)
FID = fopen(listOfFiles(f).name);
Data = fread(FID, [181 360], 'float'); % assumes column-major storage; reshape or transpose if needed
fclose(FID);
Total = Total + Data; % This might vary, depending on what you want to average etc.
Counter = Counter + 1;
end
Av = Total/Counter; % 181 x 360 matrix of averages across all files

Extract parts of a big matrix and allocate them in new variables with loop function

I am a total beginner in MATLAB and I hope to find some help here. I have model prediction results for 80 individuals all together in one large matrix. I need to extract the data for each individual from the big matrix, assign them to a new variable/matrix, do some extra calculations and then plot certain information as needed.
To do so, I am trying to write a script with a loop function but in a complicated, or maybe more accurately: in a primitive way!
Simplified Example:
My matrix is called: All_Indi_Data .... its dimension is: 600 rows x 21 columns
%Column 1: grouping variable (e.g., code or ID with values 1,2,3,4,5, etc.);
%Column 2: independent var.;
%Column 3: t;
%Column 4: OBS;
%Column 5: PRED;
i= length (All_Indi_Data);
%% First Indi.
q=1; % indicating the ID of the indi for which I want to extract the data
j=1; % variable added to insure writing start from the first row
for r=1:i
if All_Indi_Data (r,1)==q
Indi_1 (j,1:21) = All_Indi_Data (r,1:21)
j=j+1
end
end
%% Second Indi.
q=q+1
j=1
for r=1:i
if All_Indi_Data (r,1)==q
Indi_2 (j,1:21) = All_Indi_Data (r,1:21)
j=j+1
end
end
.
.
.
1) My first question is: can I allocate these data to new variables (Indi_1, Indi_2, etc.) in a simpler way, with or without a loop? I would appreciate your help a lot.
2) Is there any way to plot these selected parts of the big matrix (according to the grouping variable, e.g. the data for Indi_1) without wasting a lot of time and space by copying the core part of the code again and again? In other words, I would like to use a loop and the grouping variable to detect which values are of interest and then plot them (e.g. the data in column 3 against the data in column 4 for each individual, from the first to the last).
I hope that I described my problem clearly and hope to hear something from the expert guys :) ...
Thanks a lot in advance ..
Try the following code:
for idx=1:80
pos=find(All_Indi_Data(:,1)==idx);
eval(['Indi_' num2str(idx) '=All_Indi_Data(pos,:);']);
end
What I do is this: in each iteration, I search for one value of the ID, given by the variable idx. Note that I do not use 'i' as a variable name, because MATLAB uses 'i' and 'j' as the imaginary unit for complex numbers, and that could cause problems.
Then, using find, I look up the row position (or positions) in All_Indi_Data that hold the information for that individual. The variable 'pos' now contains the indices of the rows with information for the individual of interest.
Finally, using eval, I extract the data for each individual into its own variable. Note that eval combined with a loop makes it easy to create lots of variables. I indicate the rows I want to extract with 'pos' and, as I want all the columns, I use just ':' (you could use '1:21' too).
With another similar loop you can plot the information you want. For example:
for idx=1:80
eval(['x=Indi_' num2str(idx) ';']);
% Now x holds the information for this individual
% Plot the columns of x I want
plot(x(:, 3), x(:,4));
pause; % wait here until a key is pressed
end
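As an alternative to the eval-based answer above (this is a different technique, not the answerer's method), the per-individual blocks can be kept in a cell array and selected with logical indexing. The sketch uses a small random stand-in for All_Indi_Data with three IDs; the names ids, Indi and rows are illustrative.

```matlab
% Stand-in for All_Indi_Data: column 1 is the ID (1, 2, 3 repeated),
% the remaining 20 columns are random values (made-up data).
All_Indi_Data = [repmat((1:3)', 4, 1), rand(12, 20)];

ids = unique(All_Indi_Data(:, 1));          % IDs actually present in the data
Indi = cell(numel(ids), 1);
for k = 1:numel(ids)
    rows = All_Indi_Data(:, 1) == ids(k);   % logical mask for this individual
    Indi{k} = All_Indi_Data(rows, :);       % all columns for those rows
end
```

Indi{k} then plays the role of Indi_k, so plotting column 3 against column 4 for every individual is a single loop over k calling plot(Indi{k}(:,3), Indi{k}(:,4)), with no variable names to build.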