I am trying to read this data set into a cell array, but I have two problems:
1) the delimiter is a run of spaces whose width varies from column to column;
2) 6 of the entries in the 4th column have question marks instead of numbers.
What is a good way to read this data from the file into a cell array?
Try the following:
x = importdata('auto-mpg.data'); %// read lines
y = cell(numel(x),9); %// preallocate with 9 cols (according to your file)
for n = 1:numel(x)
y(n,:) = regexp(x{n}, '(\s\s+)|\t', 'split'); %// split each line into
%// columns using as separator either more than one space or a tab
%//(according to your file)
end
The result is in the 398x9 cell array of strings y.
Here is code based on the MATLAB Import Tool:
% Initialize variables.
filename = '/home/gknor/Pulpit/auto-mpg.data';
delimiter = {'\t',' '};
% Read columns of data as strings:
formatSpec = '%s%s%s%s%s%s%s%s%[^\n\r]';
% Open the text file.
fileID = fopen(filename,'r');
% Read columns of data according to format string.
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'ReturnOnError', false);
% Close the text file.
fclose(fileID);
% Convert the contents of columns containing numeric strings to numbers.
% Replace non-numeric strings with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
raw(1:length(dataArray{col}),col) = dataArray{col};
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2,3,4,5,6,7,8]
% Converts strings in the input cell array to numbers. Replaces non-numeric
% strings with NaN.
rawData = dataArray{col};
for row=1:size(rawData, 1);
% Create a regular expression to detect and remove non-numeric prefixes and
% suffixes.
regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
try
result = regexp(rawData{row}, regexstr, 'names');
numbers = result.numbers;
% Detected commas in non-thousand locations.
invalidThousandsSeparator = false;
if any(numbers==',');
thousandsRegExp = '^\d+?(\,\d{3})*\.{0,1}\d*$';
if isempty(regexp(numbers, thousandsRegExp, 'once'));
numbers = NaN;
invalidThousandsSeparator = true;
end
end
% Convert numeric strings to numbers.
if ~invalidThousandsSeparator;
numbers = textscan(strrep(numbers, ',', ''), '%f');
numericData(row, col) = numbers{1};
raw{row, col} = numbers{1};
end
catch me
end
end
end
% Replace non-numeric cells with NaN
R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),raw); % Find non-numeric cells
raw(R) = {NaN}; % Replace non-numeric cells
data = cat(2,raw,dataArray{9});
% Clear temporary variables
clearvars -except data
You can find more about the Import Tool in the MATLAB documentation.
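If the goal is to get the eight numeric columns as numbers rather than strings, textscan can map the '?' placeholders to NaN through its 'TreatAsEmpty' option. The following is a minimal sketch, assuming each line holds eight whitespace-separated numbers followed by the car name in double quotes (the layout of the UCI auto-mpg.data file); the two sample lines written to a temporary file are illustrative stand-ins, not the real data.

```matlab
% Sketch: read auto-mpg style lines, turning '?' into NaN.
% The two lines written here are illustrative stand-ins for auto-mpg.data.
fname = [tempname '.data'];
fid = fopen(fname, 'w');
fprintf(fid, '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n');
fprintf(fid, '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n');
fclose(fid);

fid = fopen(fname, 'r');
% Eight numeric fields, then a double-quoted string (%q drops the quotes).
% With no 'Delimiter' given, any run of spaces or tabs separates fields,
% so the varying column widths are not a problem.
C = textscan(fid, '%f%f%f%f%f%f%f%f%q', 'TreatAsEmpty', '?');
fclose(fid);
delete(fname);

numericCols = [C{1:8}];   % N-by-8 double matrix; '?' entries become NaN
carNames = C{9};          % N-by-1 cell array of car names
```

Keeping the numbers in a double matrix and the names in a separate cell array makes later arithmetic easier than a cell array of strings; isnan(numericCols(:,4)) then finds the rows with missing values.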
Related
MATLAB help: I am trying to export all 50 points to a CSV file. How can I append to the CSV file after each iteration?
%import csv file
filename = 'Q:\Electroporation\raw_works.csv';
delimiter = ',';
startRow = 2;
%% Format for each line of text:
formatSpec = '%s%f%f%f%f%f%f%f%f%f%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to the format.
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'TextType', 'string', 'HeaderLines' ,startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
%% Close the text file.
fclose(fileID);
%% Create output variable
rawworks = table(dataArray{1:end-1}, 'VariableNames', {'Name','One','Two','Three','Four','Five','Six','Seven','Eight','Nine'});
%% Clear temporary variables
clearvars filename delimiter startRow formatSpec fileID dataArray ans;
So far, this gets the data into MATLAB.
% store data into a variable
table= rawworks;
array=str2double(table2array(table)); % convert to array
ss= size(array)
N= ss(1)
ones=ones(50,14);
xls=zeros(50,14);
Now I do the math:
for i= 1:N
A=[array(i,2) array(i,3) array(i,4);array(i,5) array(i,6) array(i,7);array(i,8) array(i,9) array(i,10)];
%diagonalize
[U,S,V]=svd(A);
P1=S(1,1);
P2=S(2,2);
P3=S(3,3);
%output data
data_for_excel_file=[A(1,1) A(1,2) A(1,3) A(2,1) A(2,2) A(2,3) A(3,1) A(3,2) A(3,3) P1 P2 P3 P1/P2 P1/P3 ]
% Here is where I'm having problems. How can I make csvwrite append to the end of the
% file? Currently, it only writes out the last result instead of all 50.
csvwrite('Diognalized_output.csv',data_for_excel_file,1) %HELP
end
If you are okay with using the more general function dlmwrite instead, you can use its '-append' option to add your output to the end of the file on each iteration.
Change your last line from
csvwrite('Diognalized_output.csv',data_for_excel_file,1)
to
dlmwrite('Diognalized_output.csv',data_for_excel_file,'-append')
The default delimiter for dlmwrite is the comma (,), so you get the same output format here.
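Appending inside the loop works, but '-append' also keeps adding rows if the script is run a second time, so you may want to delete the old file first. An alternative is to collect all rows in a preallocated matrix and write the file once after the loop. This is a minimal sketch of that approach; array is random stand-in data with the same 10-column layout as in the question, and writematrix assumes R2019a or later (csvwrite(filename, results) does the same on older releases).

```matlab
% Sketch: build all 50 rows first, then write the CSV in one call.
% 'array' is random stand-in data laid out like the question's array:
% column 1 = name (unused), columns 2-10 = the 3x3 matrix entries, row-wise.
array = rand(50, 10);
N = size(array, 1);
results = zeros(N, 14);                 % one output row per input row

for k = 1:N
    % Columns 2..10 hold A row by row; reshape fills column-wise, hence the transpose.
    A = reshape(array(k, 2:10), 3, 3).';
    s = svd(A);                         % singular values, largest first (= diag(S))
    results(k, :) = [reshape(A.', 1, []), s(1), s(2), s(3), s(1)/s(2), s(1)/s(3)];
end

writematrix(results, 'Diognalized_output.csv');   % R2019a+; overwrites any old file
```

Writing once also avoids reopening the file 50 times, and rerunning the script always produces exactly N rows.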
I would like to do some calculations on my Raman spectra, but I have a problem reading my input files. Each .txt file contains 2 columns: X = wavelength (cm-1) and Y = Raman intensity. The name of each file contains the coordinates of the position where the Raman spectrum was collected, for example (0.00,-05.00) or (-2.00,-0.50).
function Read_Raman_Files
% Reads Raman spectra from txt files.
% Each file contains the data for a single Raman spectrum:
% X = Wavelength (cm-1)
% Y = Raman intensity
% The name of the input file contains the coordinates at which the spectrum is taken.
% Results are stored in 'data.mat'.
files = dir('-5.0,0.00.txt');
Ncurves = length(files);
if Ncurves==0, display('No txt files found!'); return; end
for i = 1:Ncurves,
i
fname = files(i).name;
data = importdata(fname);
if i==1, X = data(:,i); end
Y(:,i) = data(:,2);
dash = strfind(fname,'__');
Xpos(i) = str2num(fname(strfind(fname,'Xµm_')+4:dash(2)-1));
Ypos(i) = str2num(fname(strfind(fname,'Yµm_')+4:dash(3)-1));
end;
save('data.mat', 'Ncurves', 'X', 'Y', 'Xpos', 'Ypos');
return
Here is an example of how to read a file that has 2 columns of integers separated by commas:
formatSpec = '%d%d';
[x, y] = textread('yourFile.txt', formatSpec, 'delimiter',',');
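For the Raman files specifically, one option is to loop over all .txt files, load each one with load -ascii (which reads a plain rectangular table of numbers), and pull the two coordinates out of the file name with regexp. Note that textread is marked as not recommended in current MATLAB releases; textscan is its replacement. This is a minimal sketch assuming file names such as '-5.0,0.00.txt' (X first, then Y) and files with two numeric columns and no header line; the temporary folder and the two tiny spectra are hypothetical, written only so the sketch runs.

```matlab
% Sketch: load every spectrum in a folder and take the map coordinates from
% the file name. Assumes names like '-5.0,0.00.txt' (X first, then Y) and
% files holding two numeric columns with no header line.
folder = tempname;                     % hypothetical folder for the demo
mkdir(folder);
% Two made-up 3-point spectra, written only so the sketch runs on its own.
dlmwrite(fullfile(folder, '-5.0,0.00.txt'), [100 10; 200 20; 300 30], '\t');
dlmwrite(fullfile(folder, '0.00,-05.00.txt'), [100 11; 200 21; 300 31], '\t');

files = dir(fullfile(folder, '*.txt'));
Ncurves = numel(files);
Xpos = zeros(1, Ncurves);
Ypos = zeros(1, Ncurves);
for k = 1:Ncurves
    data = load(fullfile(folder, files(k).name), '-ascii');
    if k == 1
        X = data(:, 1);                    % wavenumber axis from the first file
        Y = zeros(numel(X), Ncurves);      % one intensity column per spectrum
    end
    Y(:, k) = data(:, 2);                  % assumes every file shares the same X axis
    % Drop the '.txt' extension, then extract the two signed numbers.
    [~, base] = fileparts(files(k).name);
    coords = str2double(regexp(base, '[-+]?\d+(\.\d+)?', 'match'));
    Xpos(k) = coords(1);
    Ypos(k) = coords(2);
end
% save('data.mat', 'Ncurves', 'X', 'Y', 'Xpos', 'Ypos');  % as in the question
```

The regular expression also tolerates parentheses around the coordinates, since it only matches the signed numbers themselves.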
I have a text file, say myfile.txt, that stores floating-point coordinates in the following manner:
53
-464.000000 -20.000000 0.009000
-464.000000 -17.000000 0.042000
-464.000000 -13.000000 0.074000
-464.000000 -11.000000 0.096000
-464.000000 -8.000000 0.114000
...
...
...
42
380.000000 193.000000 7.076000
381.000000 190.000000 7.109000
383.000000 186.000000 7.141000
384.000000 184.000000 7.163000
384.000000 183.000000 7.186000
386.000000 179.000000 7.219000
...
...
...
The first line specifies the number of lines in the first set of coordinates, followed by that many lines; then there is another integer specifying the number of lines in the next set of coordinates.
For example, the 1st line has 53, so the next 53 lines are the 1st set of coordinates (ending at line 54). Then line 55 has the value 42, so the next 42 lines are the 2nd set of coordinates.
How can I read the text file so that I read the 1st line, then read the next 53 lines and store them in a matrix, then read 42 and store the next 42 lines, and so on? The file continues like this until EOF.
You could do it with Low-Level File I/O, like this:
fid = fopen('myfile.txt', 'r');
matrix = {};
while ~feof(fid)
N = fscanf(fid, '%d\n', 1);
matrix{end + 1} = fscanf(fid, '%f\n', [3 N])';
end
fclose(fid);
fid = fopen('myfile.txt', 'r');
ind = 1;
mycell = {};
while ~feof(fid)
N = fscanf(fid, '%d\n', 1);
matrix = fscanf(fid, '%f\n', [3 N]);
matrix = transpose(matrix);
mycell{ind} = matrix;
ind = ind + 1;
end
fclose(fid);
My two cents:
function matrix_cells = import_coordinates(filename, startRow, endRow)
%IMPORT_COORDINATES Import numeric data from a text file as a matrix.
% MATRIX_CELLS = IMPORT_COORDINATES(FILENAME) Reads data from text file FILENAME
% for the default selection and stores it in the cell array MATRIX_CELLS.
%
% MATRIX_CELLS = IMPORT_COORDINATES(FILENAME, STARTROW, ENDROW) Reads data from
% rows STARTROW through ENDROW of text file FILENAME.
%
% Example:
% matrix_cells = import_coordinates('coordinates.txt', 1, 15);
%
% See also TEXTSCAN.
%% Initialize variables.
delimiter = ' ';
if nargin<=2
startRow = 1;
endRow = inf;
end
%% Format string for each line of text:
% column1: double (%f)
% column2: double (%f)
% column3: double (%f)
% For more information, see the TEXTSCAN documentation.
formatSpec = '%f%f%f%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to format string.
% This call is based on the structure of the file used to generate this
% code. If an error occurs for a different file, try regenerating the code
% from the Import Tool.
dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'EmptyValue' ,NaN,'HeaderLines', startRow(1)-1, 'ReturnOnError', false);
for block=2:length(startRow)
frewind(fileID);
dataArrayBlock = textscan(fileID, formatSpec, endRow(block)-startRow(block)+1, 'Delimiter', delimiter, 'MultipleDelimsAsOne', true, 'EmptyValue' ,NaN,'HeaderLines', startRow(block)-1, 'ReturnOnError', false);
for col=1:length(dataArray)
dataArray{col} = [dataArray{col};dataArrayBlock{col}];
end
end
%% Close the text file.
fclose(fileID);
%% Post processing for unimportable data.
% No unimportable data rules were applied during the import, so no post
% processing code is included. To generate code which works for
% unimportable data, select unimportable cells in a file and regenerate the
% script.
%% Create output variable
matrix_temp = [dataArray{1:end-1}];
%% MY CONTRIBUTION ;)
% find rows with nan values
[i,~]=find(isnan(matrix_temp));
indexes=unique(i);
%% Create output cell array
matrix_cells=cell(numel(indexes),1);
%% Fill cell array, filtering out unneeded rows and splitting the original matrix
for i=1:1:numel(indexes)-1
matrix_cells{i}=matrix_temp(indexes(i)+1:indexes(i+1)-1,:);
end
% Last matrix
matrix_cells{end}=matrix_temp(indexes(end)+1:size(matrix_temp,1),:);
The first part of this function (reading and importing the text file) was autogenerated by MATLAB; I only added the code that splits the matrix and stores the pieces in a cell array.
I assume that the coordinates do not contain NaN values; moreover, I do not check for consistency between the declared number of rows and the actual number in each block.
Here is an example, using the following file coordinates.txt:
5
-464.000000 -20.000000 0.009000
-464.000000 -17.000000 0.042000
-464.000000 -13.000000 0.074000
-464.000000 -11.000000 0.096000
-464.000000 -8.000000 0.114000
3
380.000000 193.000000 7.076000
381.000000 190.000000 7.109000
383.000000 186.000000 7.141000
2
384.000000 184.000000 7.163000
384.000000 183.000000 7.186000
1
386.000000 179.000000 7.219000
Execute the function:
coordinate_matrices=import_coordinates('coordinates.txt')
coordinate_matrices =
[5x3 double]
[3x3 double]
[2x3 double]
[1x3 double]
These are the contents of each cell:
>> coordinate_matrices{1}
ans =
-464.0000 -20.0000 0.0090
-464.0000 -17.0000 0.0420
-464.0000 -13.0000 0.0740
-464.0000 -11.0000 0.0960
-464.0000 -8.0000 0.1140
>> coordinate_matrices{2}
ans =
380.0000 193.0000 7.0760
381.0000 190.0000 7.1090
383.0000 186.0000 7.1410
>> coordinate_matrices{3}
ans =
384.0000 184.0000 7.1630
384.0000 183.0000 7.1860
>> coordinate_matrices{4}
ans =
386.0000 179.0000 7.2190
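The answers above assume the file is well formed; as the last one says, none of them checks that each block really contains the declared number of rows. If you want that check, one option is to read the file line by line with fgetl and validate every line. This is a minimal sketch; the variable name blocks and the error messages are illustrative choices, and the input is a shortened version of the coordinates.txt example written to a temporary file.

```matlab
% Sketch: parse "count line + count rows of 3 numbers" blocks, with checks.
% A shortened version of the coordinates.txt example, for illustration.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '2', '-464.000000 -20.000000 0.009000', ...
    '-464.000000 -17.000000 0.042000', '1', '380.000000 193.000000 7.076000');
fclose(fid);

fid = fopen(fname, 'r');
blocks = {};
while true
    tline = fgetl(fid);
    if ~ischar(tline), break; end                % fgetl returns -1 at end of file
    if isempty(strtrim(tline)), continue; end    % tolerate blank lines between blocks
    N = sscanf(tline, '%f');
    if ~isscalar(N) || N < 0 || N ~= fix(N)      % a coordinate row where a count should be
        fclose(fid);
        error('Expected a row count but found: %s', tline);
    end
    M = zeros(N, 3);
    for r = 1:N
        tline = fgetl(fid);
        row = [];
        if ischar(tline), row = sscanf(tline, '%f').'; end
        if numel(row) ~= 3                       % premature EOF, or a count line here
            fclose(fid);
            error('Block %d, row %d: expected 3 numbers.', numel(blocks) + 1, r);
        end
        M(r, :) = row;
    end
    blocks{end + 1} = M;                         %#ok<AGROW>
end
fclose(fid);
delete(fname);
```

A block with too few rows makes the next count line fail the 3-number test, and a block with too many rows makes a coordinate line fail the count test, so both kinds of mismatch raise an error instead of silently shifting the data.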
I'm using MATLAB's import data code generator to pass data to a series of commands. This works fine when I run the script and reference a single file, but if I loop through several files, my variables aren't updated as I expect. I believe I have traced the problem to 'fileID' not updating after the first iteration of the loop.
In the code below, I can confirm that 'filename' is updated with each iteration of the loop, while 'fileID' is not. Consequently, the same vector is assigned to the variable 'y' in each iteration.
Can anyone suggest where I am going wrong?
FileList = dir('*.csv');
N = size(FileList,1);
for k = 1:N
% get the file name:
filename = FileList(k).name;
delimiter = ',';
startRow = 2;
%% Format string for each line of text:
% column2: double (%f)
% column3: double (%f)
% column4: double (%f)
% column5: double (%f)
% For more information, see the TEXTSCAN documentation.
formatSpec = '%*s%f%f%f%f%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to format string.
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'HeaderLines' ,startRow-1, 'ReturnOnError', false);
%% Close the text file.
fclose(fileID);
%% Allocate imported array to column variable names
O1 = dataArray{:, 1};
H1 = dataArray{:, 2};
L1 = dataArray{:, 3};
C1 = dataArray{:, 4};
%% Test filename and fileID
filename
fileID
%% Clear temporary variables
clearvars filename delimiter startRow formatSpec fileID dataArray ans;
y=C1;
figure
plot(y);
end
fileID is not supposed to change the way you expect. fileID is only a file identifier; the extracted data is in dataArray, which textscan fills by reading the file through fileID.
So fileID will usually be equal to 3 if you open a file and close it before opening the next one (0, 1 and 2 are reserved for standard input, output and error). If you don't close it, fileID will have a different number for each file.
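A minimal sketch illustrating this: the value stored in fileID can repeat from file to file while the data returned by textscan differs. The temporary folder, the two CSV files and their two-column layout are hypothetical stand-ins for the *.csv files in the question; the data from each file is kept in a cell array so it is not overwritten on the next iteration.

```matlab
% Sketch: fileID is just a handle; the data is in what textscan returns.
% Two small hypothetical CSV files stand in for the question's *.csv files.
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder, 'a.csv'), 'w'); fprintf(fid, 'h1,h2\n1,2\n3,4\n'); fclose(fid);
fid = fopen(fullfile(folder, 'b.csv'), 'w'); fprintf(fid, 'h1,h2\n5,6\n7,8\n'); fclose(fid);

FileList = dir(fullfile(folder, '*.csv'));
ids = zeros(1, numel(FileList));
cols = cell(1, numel(FileList));               % one entry per file
for k = 1:numel(FileList)
    % dir returns bare names, so prepend the folder before opening
    fileID = fopen(fullfile(folder, FileList(k).name), 'r');
    ids(k) = fileID;                           % often the same number after fclose
    dataArray = textscan(fileID, '%f%f', 'Delimiter', ',', 'HeaderLines', 1);
    fclose(fileID);
    cols{k} = dataArray{2};                    % this file's second column
end
disp(ids)
```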
I have 10 binary files, each storing a list of numbers. I want to load each file in turn, and then append a cell array y with the numbers in that file. So if each file contains 20 numbers, I want my final cell to be 10x20. How do I do this? The following code does not work:
for i=1:10
% Load an array into variable 'x'
y = {y x}
end
You only need a minor modification to your code:
y = cell(1,10); %// initialize if possible (not strictly necessary)
for ii = 1:10 %// better not to use i as a variable (it would shadow the imaginary unit)
%// Load an array into variable 'x'
y{ii} = x; %// fill ii-th cell with x.
%// Or use y{end+1} = x if you haven't initialized y
end
If you are reading strictly numbers and want an array (rather than cells), this could work:
% read CSV numbers from file into array
temp = {};
out = [];
for i=1:10
% my example files were called input1.txt, input2.txt, etc
filename = strcat('input', num2str(i), '.txt');
fid = fopen(filename, 'r');
temp = textscan(fid,'%d','delimiter',',');
out(i,:) = cell2mat(temp);
fclose(fid);
end
'out' is a 10x20 array
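Since the question mentions binary files, here is a sketch that reads raw binary data with fread instead of textscan. It assumes each file holds exactly 20 values stored as 'double'; the file names file1.bin ... file10.bin and the precision are hypothetical and must match how the files were actually written. The sample files are generated only so the sketch runs.

```matlab
% Sketch: read 10 binary files of 20 doubles each into a 10x20 matrix,
% and optionally a 10x20 cell array. Names and precision are assumptions.
folder = tempname;
mkdir(folder);
for ii = 1:10                                  % create sample files for the demo
    fid = fopen(fullfile(folder, sprintf('file%d.bin', ii)), 'w');
    fwrite(fid, (1:20) + 100*ii, 'double');
    fclose(fid);
end

out = zeros(10, 20);                           % preallocate one row per file
for ii = 1:10
    fid = fopen(fullfile(folder, sprintf('file%d.bin', ii)), 'r');
    x = fread(fid, Inf, 'double');             % fread returns a column vector
    fclose(fid);
    out(ii, :) = x.';                          % store as the ii-th row
end
y = num2cell(out);                             % 10x20 cell, one number per cell
```

If the files were written with another type (for example 'int32' or 'single'), change the precision argument of fread accordingly; reading with the wrong precision silently produces the wrong numbers and count.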