Average across many files in MATLAB

I was wondering if I can get help with this. I have many .mat files with an array in each and I want to average each cell individually (average all (1,1)s, all (1,2)s, ... (2,1)s etc.) and store them.
Thanks for the help!

I'm not quite sure how your data is organised, but you can do something like this:
% Assume you know the size of the arrays and that the variables r and c
% hold the numbers of rows and columns respectively.
xTotals = zeros(r, c);
xCount = 0;
% for each file: assume the data is loaded into a variable called x, which is
% r rows by c columns
for ...
    xTotals = xTotals + x;
    xCount = xCount + 1;
end
xAvg = xTotals / xCount;
And xAvg will contain the average for each array cell. Note that you probably know xCount without having to count each time you go round the loop, but it depends on where you are getting your data. Hopefully you get the idea!
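As a minimal sketch of how the loop might look when the arrays come from .mat files: the folder, the file names and the variable name x inside each file are assumptions made for illustration, not details from the question. The first few lines only create sample files so the example runs on its own.

```matlab
% Create three small sample files so the sketch is self-contained.
% (Hypothetical data: each file stores a 2-by-3 array in a variable named x.)
dataDir = tempname;
mkdir(dataDir);
for k = 1:3
    x = k * ones(2, 3);
    save(fullfile(dataDir, sprintf('sample%d.mat', k)), 'x');
end

% Accumulate the element-wise sum over every .mat file in the folder.
files = dir(fullfile(dataDir, '*.mat'));
xTotals = 0;                          % scalar 0 expands to the array size on the first add
for k = 1:numel(files)
    s = load(fullfile(dataDir, files(k).name), 'x');  % load only the variable x
    xTotals = xTotals + s.x;
end
xAvg = xTotals / numel(files);        % element-wise average, same size as each x
```

Here the file count doubles as xCount, so no separate counter is needed.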

Related

Matlab interp1 gives last row as NaN

I have a problem similar to here. However, it doesn't seem that there is a resolution.
My problem is as follows: I need to import several files, for example 5. Each file has 20 columns, but the number of lines varies. Column 1 is time in crank-angle degrees (CAD), and the rest are data.
So my code first imports all of the files, finds the file with the most rows, then creates a multidimensional array with that many rows. The timing is in engine cycles, so I then remove lines from the imported file that go beyond a whole engine cycle. This way, I always have data covering X whole engine cycles. Then I interpolate the data onto the pre-allocated array to get one large multidimensional array for the 5 data files.
However, this always seems to leave the last row of every column of every page filled with NaNs. Please have a look at the code below. I can't see what I'm doing wrong. Oh, and by the way, as I have been screwed over before, this is NOT homework.
maxlines = 0;
maxcycle = 999;
for i = 1:1
    filename = sprintf('C:\\Directory\\%s\\file.out',folder{i});
    file = filelines(filename); % Import file clean
    lines = size(file,1); % Find number of lines of file
    if lines > maxlines
        maxlines = lines; % If number of lines in this file is the most, save it
    end
    lastCAD = file(end,1); % Add simstart to shift the start of the cycle to 0 CAD
    lastcycle = fix((lastCAD-simstart)./cycle); % Find number of whole engine cycles
    if lastcycle < maxcycle
        maxcycle = lastcycle; % Find lowest number of whole engine cycles amongst all designs
    end
    cols = size(file,2); % Find number of columns in files
end
lastcycleCAD = maxcycle.*cycle+simstart; % Define last CAD of whole cycle that can be used for analysis
% Import files
thermo = zeros(maxlines,cols,designs); % Initialize array to proper size
xq = linspace(simstart,lastcycleCAD,maxlines); % Define the CAD degrees
for i = 1:designs
    filename = sprintf('C:\\Directory\\%s\\file.out',folder{i});
    file = importthermo(filename, 6, inf); % Import the file clean
    [~,lastcycleindex] = min(abs(file(:,1)-lastcycleCAD)); % Find index of end of last whole cycle
    file = file(1:lastcycleindex,:); % Remove all CAD after that
    thermo(:,1,i) = xq;
    for j = 2:17
        thermo(:,j,i) = interp1(file(:,1),file(:,j),xq);
    end
    sprintf('file from folder %s imported OK',folder{i})
end
thermo(end,:,:) = []; % Remove NaN row
Thank you very much for your help!
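Are you sampling outside the range of the data? By default, interp1 returns NaN for query points that lie outside the range of the sample points. If so, you need to tell interp1 that you want extrapolation: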
interp1(file(:,1),file(:,j),xq,'linear','extrap');
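A small illustration of that behaviour, using made-up sample points rather than the asker's files: the last query point lies just beyond the data, which is the situation that produces a NaN in the final row.

```matlab
x  = [0 1 2 3];                  % sample points (hypothetical data)
y  = [0 10 20 30];               % sample values
xq = [2.5 3 3.5];                % last query point lies beyond max(x)

yDefault = interp1(x, y, xq);                      % out-of-range query gives NaN
yExtrap  = interp1(x, y, xq, 'linear', 'extrap');  % out-of-range query is extrapolated

% Flag the query points that fall outside the sampled range.
outOfRange = xq < min(x) | xq > max(x);
```

Checking outOfRange against the real xq shows whether the last query point overshoots the final value in file(:,1), which can happen when the row nearest lastcycleCAD lies slightly below it.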

Saving values of variable in MATLAB

Hi, for my code I would like to know how best to save my variable column, which is 733x1. Ideally I would like to have
column1(y)=column, but I obtain the error:
Conversion to cell from logical is not possible.
in the inner loop. I also find it difficult to access the values stored in overlap.
for i = 1:7
    for y = 1:ydim % ydim = 436
        %execute code %code produces different 'column' on each iteration
        column1{y} = column; %'column' size 733x1 %altogether 436 sets of 'column'
    end
    overlap{i} = column1; %iterates 7 times.
end
Ideally I want overlap to store 7 matrices, each 733x436.
Thanks.
I'm assuming column is calculated using a procedure where each column depends on the previous one. If not, then there are very likely improvements that can be made to this:
xdim = 733; % Length of each column, taken from the question
column = zeros(xdim, 1); % Might not need this. Depends on your code.
all_columns = zeros(xdim, ydim); % Pre-allocate memory (always do this)
% Note that the first dimension is usually called x,
% and the second called y in MATLAB
overlap = cell(7, 1);
overlap(:) = {zeros(xdim, ydim)}; % Pre-allocate memory
for ii = 1:numel(overlap) % numel is better than length
    for jj = 1:ydim % ii and jj are better than i and j
        % several_lines_of_code_to_calculate_column
        column = something;
        all_columns(:, jj) = column;
    end
    overlap{ii} = all_columns;
end
You can access the variables in overlap like this: overlap{1}(1,1);. This will get the first element in the first cell. overlap{2} will get the entire matrix in the second cell.
You specified that you wanted 7 variables. Your code implies that you know that cells are better than assigning it to different variables (var1, var2 ...). Good! The solution with different variables is bad bad bad.
Instead of a cell array, you could use a 3D array. This might make later processing faster, for instance if you can vectorize it.
This will be:
column = zeros(733, 1); % Might not need this. Depends on your code.
overlap = zeros(xdim, ydim, 7); % Pre-allocate memory for 3D-matrix
for ii = 1:7
    for jj = 1:ydim
        % several_lines_of_code_to_calculate_column
        column = something;
        overlap(:, jj, ii) = column; % Store the column directly in the 3D array
    end
end
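To show how the 3D version can be read back, here is a runnable sketch with small stand-in sizes (4, 3 and 2 instead of 733, 436 and 7). The expression that fills column is only a placeholder for the real calculation, and the names nSets, firstSet, oneColumn and meanOfSets are made up for the example.

```matlab
xdim = 4; ydim = 3; nSets = 2;          % small stand-ins for 733, 436 and 7
overlap = zeros(xdim, ydim, nSets);     % pre-allocate the 3D array

for ii = 1:nSets
    for jj = 1:ydim
        column = (1:xdim)' * jj * ii;   % placeholder for the real calculation
        overlap(:, jj, ii) = column;
    end
end

firstSet   = overlap(:, :, 1);          % xdim-by-ydim matrix for ii = 1
oneColumn  = overlap(:, 2, 2);          % column jj = 2 of set ii = 2
meanOfSets = mean(overlap, 3);          % element-wise mean across the sets
```

The last line is the kind of vectorized operation that is awkward with a cell array but a single call on a 3D array.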

MATLAB data file that exceeds memory

I have a matrix of 500000000 x 5.
A sample of the data looks like this:
1 01 06:0 48407
1 01 06:1 48407
.
.
.
865850 31 23:5 1586884493
Each row holds [area_number date hour:minute amount_of_data].
I want to load the data in full and then build another 865850 x 4464 matrix from the 5th-column values. In this new matrix, each row corresponds to an area_number, and each column holds the amount_of_data for one time slot, in chronological order.
This is what I wrote.
clear all; close all;
fileID=fopen('data2.txt','r');
Data=fscanf(fileID, '%d %d %d:%d %d',[5 100000]);
Data = Data';
Zeros = zeros(4000, 4464);
DataA = Data(:,1); % indexs
DataB = Data(:,2); % dates
DataC = Data(:,3); % hours
DataD = Data(:,4); % minutes
DataE = Data(:,5); % data
for m=1:40000
    r = DataA(m);
    c = (DataB(m)-1)*24*6 + DataC(m)*6 + DataD(m);
    Zeros(r,c) = DataE(m);
end
I can't finish it because the matrix is too big to load at once.
It exceeds MATLAB's memory limit.
Please help me...
Thank you~!
To solve your problem, using the matfile command is probably the best choice. It lets you write data directly to a MAT-file on disk while accessing it like a variable.
If I understood your data correctly, all lines with the same index are next to each other, and all data with the same index is small enough to fit in memory. In that case:
1. Read all data with index 1.
2. Process it as you did above, creating one row of your intended matrix.
3. Write this row to your matfile.
4. Proceed with the next index until you reach the end.
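The steps above can be sketched roughly as follows. This is an illustration under assumptions, not a tested solution for the full data set: the sizes are small stand-ins for 865850 and 4464, the variable name result and the output file are made up, and the line that fills row is a placeholder for reading and processing one area's lines from the text file.

```matlab
nAreas = 5; nSlots = 8;                  % small stand-ins for 865850 and 4464
outFile = [tempname '.mat'];             % hypothetical output file

m = matfile(outFile, 'Writable', true);  % new files are created as Version 7.3
m.result(nAreas, nSlots) = 0;            % allocate the full matrix on disk

for r = 1:nAreas
    row = zeros(1, nSlots);
    row(1:3) = r;                        % placeholder for the values parsed for area r
    m.result(r, :) = row;                % write one row without loading the rest
end

thirdRow = m.result(3, :);               % read a single row back from disk
```

Only the indexed part is transferred on each access. Referring to m.result without indices would load the whole variable into memory, which is what this approach is meant to avoid.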

Plotting multiple lines within a FOR loop in MATLAB

Okay so this sounds easy but no matter how many times I have tried I still cannot get it to plot correctly. I need only 3 lines on the same graph however still have an issue with it.
iO = 2.0e-6;
k = 1.38e-23;
q = 1.602e-19;
for temp_f = [75 100 125]
    T = ((5/9)*temp_f-32)+273.15;
    vd = -1.0:0.01:0.6;
    Id = iO*(exp((q*vd)/(k*T))-1);
    plot(vd,Id,'r',vd,Id,'y',vd,Id,'g');
    legend('amps at 75 F', 'amps at 100 F','amps at 125 F');
end;
ylabel('Amps');
xlabel('Volts');
title('Current through diode');
Now I know the plot call that is currently in there isn't working and that some kind of variable needs to be set up, like (vd,Id1,'r',vd,Id2,'y',vd,Id3,'g'); however, I really can't grasp how to change it and am seeking help.
You can use the "hold on" command so that each plot command draws into the same axes as the last.
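A minimal sketch of the loop with hold on, reusing the constants from the question. Two things here are my own choices rather than the asker's: the colours are picked from a cell array named colors, and the Fahrenheit to kelvin conversion is written as (5/9)*(F-32)+273.15.

```matlab
iO = 2.0e-6;
k = 1.38e-23;
q = 1.602e-19;
vd = -1.0:0.01:0.6;
temps_f = [75 100 125];
colors = {'r', 'y', 'g'};                  % one colour per temperature

figure;
hold on;                                   % keep earlier lines when plotting again
for n = 1:numel(temps_f)
    T = (5/9)*(temps_f(n) - 32) + 273.15;  % Fahrenheit to kelvin
    Id = iO*(exp((q*vd)/(k*T)) - 1);
    plot(vd, Id, colors{n});               % one line per pass through the loop
end
hold off;

legend('amps at 75 F', 'amps at 100 F', 'amps at 125 F');
ylabel('Amps');
xlabel('Volts');
title('Current through diode');
```

The legend is created once after the loop, so its entries match the three lines in the order they were plotted.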
It would be better to skip the for loop and just do this all in one step though.
iO = 2.0e-6;
k = 1.38e-23;
q = 1.602e-19;
temp_f = [75 100 125];
% Fahrenheit to kelvin: subtract 32 before scaling by 5/9.
T = (5/9)*(temp_f-32)+273.15;
vd = -1.0:0.01:0.6;
% Convert this 1xlength(vd) vector to a 3xlength(vd) matrix by copying it down two rows.
vd = repmat(vd,3,1);
% Convert this 1x3 array to a 3x1 array.
T = T';
% and then copy it across all columns of vd so each row holds one value from the original T
T = repmat(T,1,size(vd,2));
% Now we can calculate Id all at once.
Id = iO*(exp((q*vd)./(k*T))-1);
% Then plot each row of the Id matrix as a separate line. Id(1,:) means 1st row, all columns.
plot(vd(1,:),Id(1,:),'r',vd(2,:),Id(2,:),'y',vd(3,:),Id(3,:),'g');
ylabel('Amps');
xlabel('Volts');
title('Current through diode');
And that should get what you want.

How to efficiently find correlation and discard points outside 3-sigma range in MATLAB?

I have a data file m.txt that looks something like this (with a lot more points):
286.842995
3.444398
3.707202
338.227797
3.597597
283.740414
3.514729
3.512116
3.744235
3.365461
3.384880
Some of the values (like 338.227797) are very different from the values I generally expect (smaller numbers).
So, I am thinking that
I will remove all the points that lie outside the 3-sigma range. How can I do that in MATLAB?
Also, the bigger problem is that this file has a separate file t.txt associated with it which stores the corresponding time values for these numbers. So, I'll have to remove the corresponding time values from the t.txt file also.
I am still learning MATLAB, and I know there would be some good way of doing this (better than storing indices of the elements that were removed from m.txt and then removing those elements from the t.txt file)
@Amro is close, but the FIND is unnecessary (look up logical subscripting) and you need to include the mean for a true +/-3 sigma range. I would go with the following:
%# load files
m = load('m.txt');
t = load('t.txt');
%# find values within range
z = 3;
meanM = mean(m);
sigmaM = std(m);
I = abs(m - meanM) <= z * sigmaM;
%# keep values within range
m = m(I);
t = t(I);
An alternative that collects the outlier indices with find and deletes them from both vectors:
%# load files
m = load('m.txt');
t = load('t.txt');
%# find outliers indices
z = 3;
idx = find( abs(m-mean(m)) > z*std(m) );
%# remove them from both data and time values
m(idx) = [];
t(idx) = [];
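As a quick check that the logical mask and the find-based deletion agree, here is a sketch on synthetic data. The values are made up for illustration: twenty readings near 3.5 plus one large value, with a simple counter standing in for the time stamps in t.txt.

```matlab
m = [repmat([3.4; 3.6], 10, 1); 300];   % 20 ordinary readings and one outlier
t = (1:numel(m))';                      % stand-in time stamps

z = 3;
keep = abs(m - mean(m)) <= z * std(m);  % logical mask of in-range points

% Logical indexing: keep the in-range points in both vectors.
mKept = m(keep);
tKept = t(keep);

% find-based deletion: remove the out-of-range points from copies.
m2 = m;
t2 = t;
idx = find(~keep);
m2(idx) = [];
t2(idx) = [];

sameResult = isequal(mKept, m2) && isequal(tKept, t2);
```

Because the same mask or index list is applied to both vectors, the measurements and their time values stay aligned after filtering.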