How to store .csv data and calculate average value in MATLAB - matlab

Can someone help me to understand how I can save in matlab a group of .csv files, select only the columns in which I am interested and get as output a final file in which I have the average value of the y columns and standard deviation of y axes? I am not so good in matlab and so I kindly ask if someone to help me to solve this question.
Here what I tried to do till now:
clear all;
clc;
which_column = 5;
dirstats = dir('*.csv');
col3Complete=0;
col4Complete=0;
for K = 1:length(dirstats)
[num,txt,raw] = xlsread(dirstats(K).name);
col3=num(:,3);
col4=num(:,4);
col3Complete=[col3Complete;col3];
col4Complete=[col4Complete;col4];
avgVal(K)=mean(col4(:));
end
col3Complete(1)=[];
col4Complete(1)=[];
%columnavg = mean(col4Complete);
%columnstd = std(col4Complete);
% xvals = 1 : size(columnavg,1);
% plot(xvals, columnavg, 'b-', xvals, columnavg-columnstd, 'r--', xvals, columnavg+columstd, 'r--');
B = reshape(col4Complete,[5000,K]);
m=mean(B,2);
C = reshape (col4Complete,[5000,K]);
S=std(C,0,2);
Now I know that I should compute mean and stdeviation inside for loop, using mean()function, but I am not sure how I can use it.

which_column = 5;
dirstats = dir('*.csv');
col3Complete=[]; % Initialise as empty matrix
col4Complete=[];
avgVal = zeros(length(dirstats),2); % initialise as columnvector
for K = 1:length(dirstats)
[num,txt,raw] = xlsread(dirstats(K).name);
col3=num(:,3);
col4=num(:,4);
col3Complete=[col3Complete;col3];
col4Complete=[col4Complete;col4];
avgVal(K,1)=mean(col4(:)); % 1st column contains mean
avgVal(K,2)=std(col4(:)); % 2nd column contains standard deviation
end
%columnavg = mean(col4Complete);
%columnstd = std(col4Complete);
% xvals = 1 : size(columnavg,1);
% plot(xvals, columnavg, 'b-', xvals, columnavg-columnstd, 'r--', xvals, columnavg+columstd, 'r--');
B = reshape(col4Complete,[5000,K]);
meanVals=mean(B,2);
I didn't change much, just initialised your arrays as empty arrays so you do not have to delete the first entry later on and made avgVal a column vector with the mean in column 1 and the standard deviation in column 1. You can of course add two columns if you want to collect those statistics for your 3rd column in the csv as well.
As a side note: xlsread is rather heavy for reading files, since Excel is horribly inefficient. If you want to read a structured file such as a csv, it's faster to use importdata.
Create some random matrix to store in a file with header:
A = rand(1e3,5);
out = fopen('output.csv','w');
fprintf(out,['ColumnA', '\t', 'ColumnB', '\t', 'ColumnC', '\t', 'ColumnD', '\t', 'ColumnE','\n']);
fclose(out);
dlmwrite('output.csv', A, 'delimiter','\t','-append');
Load it using csvread:
data = csvread('output.csv',1);
data now contains your five columns, without any headers.

Related

Read multiple files with for loop

My code is below. In the code, I am evaluating only the data in the 'fb2010' file. I want to add other files" 'fb2020', 'fb2030', and 'fb2040' and evaluate their data by the same code. My question is how to apply a for loop and include the other data files. I tried, but I got confused by the for loop.
load('fb2010'); % loading the data
x = fb2010(3:1:1502,:);
% y_filt = filter(b,a,x); % filtering the received signal
y_filt= filter(b,a,x,[],2);
%%%%%%% fourier transform
nfft = length(y_filt);
res = fft(y_filt,nfft,2)/nfft;
res2 = res(:,1:nfft/2+1); %%%% taking single sided spectrum
res3 = fft(res2,[],2);
for i = 3:1:1500 %%%% dividing each row by first row.
resd(i,:) = res3(i,:)./res3(1,:);
end
I'm assuming that your files are MAT-files, not ASCII. You can do this by having load return a struct and using dynamic field referencing:
n = 4;
for i = 1:n
vname = ['fb20' int2str(i) '0']; % Create file/variable name based on index
s = load(vname); % Load data as struct (overwriting previous s)
x = s.(vname)(3:1:1502,:); % Access struct with dynamic field reference
% Rest of your code
...
end
If you're using a plain ASCII file, load won't produce a struct. However, such files are much simpler (see documentation for load/save). The following code would probably work:
n = 4;
for i = 1:n
vname = ['fb20' int2str(i) '0']; % Create file/variable name based on index
s = load(vname); % Load data as matrix (overwriting previous s)
x = s(3:1:1502,:); % Directly index matrix
% Rest of your code
...
end
It would be a good idea to add the file extension to your load command to make your code more readable.

Reading a Complex CSV format Into Matlab

4,7,33 308:0.364759856031284 1156:0.273818346738286 1523:0.17279792082766 9306:0.243665855423149
7,4,33 1156:0.185729429759684 1681:0.104443202690279 5351:0.365670526234034 6212:0.0964006003127458
I have a textfile in the above format. The first 3 columns are labels and need to be extracted into another file keeping the order intact.
As one proceeds through the file, each row has varying number of labels. I have been reading the csvread function in Matlab but that is generic and would not work in the above case.
Next, I need to extract
308:0.364759856031284 1156:0.273818346738286 1523:0.17279792082766 9306:0.243665855423149
such that at column 308 in the Matrix for Row 1, I put in the value 0.364759856031284.
Since your row format changes you need to read the file line by line
fid = fopen(filename, 'rt'); % open file for text reading
resultMatrix = []; % start with empty matrix
while 1
line = fgetl(fid);
if ~ischar(line), break, end
% get the value separators, :
seps = strfind(line, ':');
% get labels
whitespace = iswhite(line(1:seps(1)-1)); % get white space
lastWhite = find(whitespace, 'last'); % get last white space pos
labelString = deblank(line(1:lastWhite)); % string with all labels separated by ,
labelString(end+1) = ','; % add final , to match way of extraction
ix = strfind(labelString, ','); % get pos of ,
labels = zeros(numel(ix),1); % create the labels array
start = 1;
for n1 = 1:numel(labels)
labels(n1,1) = str2num(labelString(start:ix(n1)-1));
end
% continue and do a similar construction for the COL:VALUE pairs
% If you do not know how many rows and columns your final matrix will have
% you need to continuously update the size 1 row for each loop and
% then add new columns if any of the current rows COLs are outside current matrix
cols = zeros(numel(seps,1);
values = cols;
for n1 = 1:numel(seps)
% find out current columns start pos and value end pos using knowledge of
% the separator, :, position and that whitespace separates COL:VALUE pairs
cols(n1,1) = str2num(line(colStart:seps(n1)-1));
values(n1,1) = str2num(line(seps(n1)+1:valueEnd));
end
if isempty(resultMatrix)
resultMatrix = zeros(1,max(cols)); % make the matrix large enough to hold the last col
else
[nR,nC] = size(resultMatrix); % get current size
if max(cols) > nC
% add columns to resultMatrix so we still can fit column data
resultMatrix = [resultMatrix, zeros(nR, max(cols)-nC)];
end
% add row
resultMatrix = [resultMatrix;zeros(1,max(nC,max(cols)))];
end
% loop through cols and values and populate the resultMatrix with wanted data
end
fclose(fid);
Note, the above code is untested so it will contain errors but it should be easy to fix.
I purposely left out minor parts and it should not be too difficult filling that out.

Save a sparse array in csv

I have a huge sparse matrix a and I want to save it in a .csv. I can not call full(a) because I do not have enough ram memory. So, calling dlmwrite with full(a) argument is not possible. We must note that dlmwrite is not working with sparse formatted matrices.
The .csv format is depicted below. Note that the first row and column with the characters should be included in the .csv file. The semicolon in the (0,0) position of the .csv file is necessary too.
;A;B;C;D;E
A;0;1.5;0;1;0
B;2;0;0;0;0
C;0;0;1;0;0
D;0;2.1;0;1;0
E;0;0;0;0;0
Could you please help me to tackle this problem and finally save the sparse matrix in the desired form?
You can use csvwrite function:
csvwrite('matrix.csv',a)
You could do this iteratively, as follows:
A = sprand(20,30000,.1);
delimiter = ';';
filename = 'filecontaininghugematrix.csv';
dims = size(A);
N = max(dims);
% create names first
idx = 1:26;
alphabet = dec2base(9+idx,36);
n = ceil(log(N)/log(26));
q = 26.^(1:n);
names = cell(sum(q),1);
p = 0;
for ii = 1:n
temp = repmat({idx},ii,1);
names(p+(1:q(ii))) = num2cell(alphabet(fliplr(combvec(temp{:})')),2);
p = p + q(ii);
end
names(N+1:end) = [];
% formats for writing
headStr = repmat(['%s' delimiter],1,dims(2));
headStr = [delimiter headStr(1:end-1) '\n'];
lineStr = repmat(['%f' delimiter],1,dims(2));
lineStr = ['%s' delimiter lineStr(1:end-1) '\n'];
fid = fopen(filename,'w');
% write header
header = names(1:dims(2));
fprintf(fid,headStr,header{:});
% write matrix rows
for ii = 1:dims(1)
row = full(A(ii,:));
fprintf(fid, lineStr, names{ii}, row);
end
fclose(fid);
The names cell array is quite memory demanding for this example. I have no time to fix that now, so think about this part yourself if it is really a problem ;) Hint: just write the header element wise, first A;, then B; and so on. For the rows, you can create a function that maps the index ii to the desired character, in which case the complete first part is not necessary.

MATLAB convert from struct to table and ouput as csv

As part of an image processing pipeline using 'regionprops' in Matlab I generate the struct:
vWFfeatures =
1631x1 struct array with fields:
Area
Centroid
MajorAxisLength
MinorAxisLength
Eccentricity
EquivDiameter
Where 'Centroid' is a Vector containing [x, y] for example [12.4, 26.2]. I would like to convert this struct to a table and save as a CSV file. The objective is to separate the 'Centroid' vector into two columns in the table labelled Centroid_X and Centroid_Y for example. I am not sure how to achieve this.
So far I have investigated using the 'struct2table' function. This ouputs the 'Centroid' as one column. In addition when I try to assign the output to a variable I get an error:
table = struct2table(vWFfeatures)
Error using struct2table
Too many output arguments.
I cannot understand this, any help please?
Since the original struct2table isn't available to you, you might want to implement specifically the behavior you're trying to achieve yourself.
In this case, this means extracting the values you want to save, (split the array,) then save the data:
data_Centroid = vertcat(vWFfeatures.Centroid); %// contains the centroid data
Centroid_X = data_Centroid(:,1); %// The first column is X
Centroid_Y = data_Centroid(:,2); %// the second column is Y
csvwrite('centroid.csv',data_Centroid); %// writes values into csv
If you want column headers in your csv, it gets complicated because csvwrite can only handle numeric arrays:
celldata = num2cell(num2str(data_Centroid)); %// create cell array
celldata(:,3) = celldata(:,4); %// copy col 4 (y data) into col 3 (spaces)
for i=1:length(celldata)
celldata{i,2} = ','; %// col 2 has commas
celldata{i,4} = '\n'; %// col 4 has newlines
end
celldata = celldata'; %'// transpose to make the entries come columnwise
strdata = ['Centroid_X,Centroid_Y\n',celldata{:}]; %// contains all as string
fid = fopen('centroid.csv','w'); % writing the string into the csv
fprintf(fid,strdata);
fclose(fid);
This is how I solved it in the end: extracted each field from struct used horzcat to join into a new array then defined headers and used csvwrite_with_headers, to ouput to csv.
wpbFeatures = regionprops(vWFlabelled, 'Area','Centroid', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity', 'EquivDiameter');
wpbArea = vertcat(wpbFeatures.Area);
wpbCentroid = vertcat(wpbFeatures.Centroid);
wpbCentroidX = wpbCentroid(:,1);
wpbCentroidY = wpbCentroid(:,2);
wpbFeret = max(imFeretDiameter(vWFlabelled, linspace(0,180,201)), [], 2);
wpbMajorAxisLength = vertcat(wpbFeatures.MajorAxisLength);
wpbMinorAxisLength = vertcat(wpbFeatures.MinorAxisLength);
wpbEccentricity = vertcat(wpbFeatures.Eccentricity);
wpbEquivDiameter = vertcat(wpbFeatures.EquivDiameter);
wpbFeatures = horzcat(wpbArea, wpbCentroidX, wpbCentroidY, wpbFeret, wpbMajorAxisLength, wpbMinorAxisLength, wpbEccentricity, wpbEquivDiameter);
headers = {'Area','CentroidX','CentroidY', 'Feret', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity', 'EquivDiameter'};
csvwrite_with_headers(strcat(PlateName, '_ResultsFeatures.csv'),wpbFeatures,headers);

How to import a sequence of Excel Files in matlab as a column vectors or as a cell array?

I want to import a sequence of excel files with a large amount of data in them. The problem that I have is I want to process the data in each file at a time and store the output from this into a variable, but each time I try to process a different file the variable gets overwritten in the variable workspace. Is there anyway I could store these files and process each file at a time?
numFiles = 1;
range = 'A2:Q21';
sheet = 1;
myData = cell(1,numFiles); % Importing data from Excel
for fileNum = 1:numFiles
fileName = sprintf('myfile%02d.xlsx',fileNum);
myData{fileNum} = importfile3(fileName,sheet,range);
end
data = cell2mat(myData);
The actual data import is performed by importfile3 which is, for the most part, a wrapper for the xlsread function that returns a matrix corresponding to the specified range of excel data.
function data = importfile3(workbookFile, sheetName, range)
% If no sheet is specified, read first sheet
if nargin == 1 || isempty(sheetName)
sheetName = 1;
end
% If no range is specified, read all data
if nargin <= 2 || isempty(range)
range = '';
end
%% Import the data
[~, ~, raw] = xlsread(workbookFile, sheetName, range);
%% Replace non-numeric cells with 0.0
R = cellfun(#(x) ~isnumeric(x) || isnan(x),raw); % Find non-numeric cells
raw(R) = {0.0}; % Replace non-numeric cells
%% Create output variable
data = cell2mat(raw);
The issue that you are running in to is a result of cell2mat concatenating all of the data in your cells in to one large 2-dimensional matrix. If you were to import two excel files with 20 rows and 17 columns, each, this would result in a 2-dimensional matrix of size [20 x 34]. The doc for cell2mat has a nice visual describing this.
I see that your importfile3 function returns a matrix, and based on your use of cell2mat in your final line of code, it looks like you would like to have your final result be in the form of a matrix. So I think the easiest way to go about this is to just bypass the intermediate myData cell array.
In the example code below, the resulting data is a 3-dimensional matrix. The 1st dimension indicates row number, 2nd dimension is column number, and 3rd dimension is file number. Cell arrays are very useful for "jagged" data, but based on the code you provided, each excel data set that you import will have the same number of rows and columns.
numFiles = 2;
range = 'A2:Q21';
sheet = 1;
% Number of rows and cols known before data import
numRows = 20;
numCols = 17;
data = zeros(numRows,numCols,numFiles);
for fileNum = 1:numFiles
fileName = sprintf('myfile%02d.xlsx',fileNum);
data(:,:,fileNum) = importfile3(fileName,sheet,range);
end
Accessing this data is now very straight-forward.
data(:,:,1) returns the data imported from your first excel file.
data(:,:,2) returns the data imported from your second excel file.
etc.