I'm trying to load data from a text file. The first two rows are headers; after the headers, the first two columns are date and time, and the rest of the columns are floats.
The data should have 11 columns, but whos reports that its size is only 1x3.
Data txt file:
fid = fopen('allunderway.txt', 'rt');
data = textscan(fid, '%{M/dd/yyyy}D %{HH:mm:ss}D %4.2f %2.4f %2.5f %2.4f %2.4f %2.2f %4.2f %3.1f %1.4f', 'HeaderLines', 2, 'CollectOutput', true);
fclose(fid);
whos data
date = data{1};
time = data{2};
wnd_td = data{10};
wnd_ts = data{11};
You could try specifying a delimiter instead; this looks like a tab-separated file.
You might have to try both 'rt' and 'r' in the fopen command.
As for the textscan part, try adding this:
'Delimiter','\t','EmptyValue',NaN
It sets the tab as the delimiter and replaces empty values with NaN.
Or use spaces as delimiters and treat a run of consecutive spaces as a single delimiter:
'Delimiter',' ','MultipleDelimsAsOne',1
Or leave 'Delimiter' unset and rely on the 'Whitespace' option, whose default treats both spaces and tabs as separators.
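To make the tab-delimiter suggestion concrete, here is a minimal sketch. The demo file, its contents, and the four-column layout are made up for illustration (the real file has 11 columns). It also leaves out 'CollectOutput', since that option merges consecutive columns of the same class into one cell, which is one likely reason for getting fewer cells than columns.

```matlab
% Write a small tab-separated demo file (hypothetical data, two header rows).
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'header line 1\nheader line 2\n');
fprintf(fid, '1/02/2015\t12:00:00\t1.50\t2.25\n');
fprintf(fid, '1/03/2015\t12:00:10\t\t2.50\n');   % third field left empty
fclose(fid);

% Read it back: one output cell per column because CollectOutput is not set.
fid = fopen(fname, 'rt');
data = textscan(fid, '%{M/dd/yyyy}D %{HH:mm:ss}D %f %f', ...
    'HeaderLines', 2, 'Delimiter', '\t', 'EmptyValue', NaN);
fclose(fid);
delete(fname);

numCols = numel(data);   % number of columns that were read
firstFloat = data{3};    % the empty field shows up as NaN
```

With one cell per column, indexing such as data{10} and data{11} in the original script would refer to actual columns rather than running past the end of a collapsed cell array.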
I am trying to make a script in Matlab that pulls data from a file and generates an array of data. Since the data is a string I've tried to split it into columns, take the transpose, and split it into columns again to populate an array.
When I run the script I don't get any errors, but I also don't get any useful data. I tell it to display the final vector (Full_Array) and I get {1×4 cell} 8 times. When I try to use strsplit I get the error:
'Error using strsplit (line 80) First input must be either a character vector or a string scalar.'
I'm pretty new to Matlab and I honestly have no clue how to fix it; after reading through similar threads and the documentation, I'm out of ideas. I've attached the code and the data to read in below. Thank you.
clear
File_Name = uigetfile; %Brings up windows file browser to locate .xyz file
Open_File = fopen(File_Name); %Opens the file given by File_Name
File2Vector = fscanf(Open_File,'%s'); %Reads the contents of the file into a 1xN character vector
Vector2ColumnArray = strsplit(File2Vector,';'); %Splits the string vector from
%File2Vector into columns, forming an array
Transpose = transpose(Vector2ColumnArray); %Takes the transpose of Vector2ColumnArray
%making a column array into a row array
FullArray = regexp(Transpose, ',', 'split');
The data I am trying to read in comes from a .xyz file that I have titled methylformate.xyz, here is the data:
O2,-0.23799,0.65588,-0.69492;
O1,0.50665,0.83915,1.47685;
C2,-0.32101,2.08033,-0.75096;
C1,0.19676,0.17984,0.49796;
H4,0.66596,2.52843,-0.59862;
H3,-0.67826,2.36025,-1.74587;
H2,-1.03479,2.45249,-0.00927;
H1,0.23043,-0.91981,0.45346;
When I started using Matlab I also had problems with the data structure. The last line
FullArray = regexp(Transpose, ',', 'split');
splits each line and stores it in a cell array. In order to access the individual strings you have to index with curly brackets into FullArray:
FullArray{1}{1} % -> 'O2'
FullArray{1}{2} % -> '-0.23799'
FullArray{2}{1} % -> 'O1'
FullArray{2}{2} % -> '0.50665'
Here the first index selects the row and the second selects the particular element within that row.
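As a rough sketch of how the nested cell array can be turned into usable data: strsplit fails on Transpose because it accepts only a single character vector, not a cell array, so the sketch splits once with strsplit, then uses regexp (which accepts cell arrays) and str2double. The variable names labels and coords, and the shortened two-atom input string, are chosen here for illustration only.

```matlab
% Two atoms from the question, as fscanf(...,'%s') would return them.
raw = 'O2,-0.23799,0.65588,-0.69492;O1,0.50665,0.83915,1.47685;';

rows = strsplit(raw, ';');                  % split into one piece per atom
rows = rows(~cellfun(@isempty, rows));      % drop the empty piece after the last ';'
parts = regexp(rows, ',', 'split');         % each element is a 1x4 cell of strings

% Atom labels as a cell array of strings.
labels = cellfun(@(p) p{1}, parts, 'UniformOutput', false);

% Coordinates as an N-by-3 double matrix.
coords = cell2mat(cellfun(@(p) str2double(p(2:4)), parts(:), ...
    'UniformOutput', false));
```

After this, labels{2} is 'O1' and coords(2,:) holds its x, y and z values as numbers.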
However, Matlab has simpler functions that load delimited text files directly.
Usually, the easiest function for reading mixed data is readtable.
data = readtable('methylformate.txt');
However, your case is more complex because:
readtable doesn't recognize the .xyz extension, so you'd have to copy the file to .txt (or pass 'FileType','text')
the semicolons confuse the read and turn the last column into characters
You can loop through each row and use textscan like so:
fid = fopen('methylformate.xyz');
tline = fgetl(fid);
myoutput = cell(0,4);
while ischar(tline)
    % Parse one line: label, three floats, then skip whatever is left (the ';')
    myoutput(end+1,:) = textscan(tline, '%s %f %f %f %*[^\n]', 'Delimiter', ',');
    tline = fgetl(fid);
end
fclose(fid);
Output is a cell array of strings or doubles (as appropriate).
I am using the fgetl command to read a .csv file but instead of returning the results I wanted as:
"HIST",1,1,27,PWH,"1"
it returned with additional space between each character:
" H I S T " , 1 , 1 , 2 7 , P W H , " 1 "
I know that I can replace the space with regexprep, but my file contains billions of lines so the added expression might consume considerably more time. I had a feeling that this is a unicode issue and someone pointed out the same issue when he used Java and it was related to unicode. I wonder if anyone knows a better way to deal with the problem in MATLAB?
Update:
It does seem to be a Unicode issue: the .csv file is the output of another program, and when I read it using fgetl the spaces are added. However, if I save the .csv file again using Excel and read it with fgetl again, it returns the results I want.
I am not able to provide an example because the .csv file is very large and I cannot make a small sample because when I open and save it from Excel, this problem is gone.
For the purpose of demonstration, let's consider a demo file - demo.csv:
"GIST",1,6,17,PWH,"1"
"FIST",0,4,72,WPH,"2"
"MIST",3,2,27,WHP,"3"
You have some options:
textscan (for any text file with a known structure):
fID = fopen('demo.csv');
C = textscan(fID,'%s%d%d%d%s%s','Delimiter',{',','"'},'MultipleDelimsAsOne',1);
fclose(fID);
Which results in:
C =
{3x1 cell} [3x1 int32] [3x1 int32] [3x1 int32] {3x1 cell} {3x1 cell}
Import helper + generate script (AKA overkill is an understatement):
Which results in:
%% Import data from text file.
% Script for importing data from the following text file:
%
% F:\demo.csv
%
% To extend the code to different selected data or a different text file, generate a
% function instead of a script.
% Auto-generated by MATLAB on 2016/04/20 19:51:32
%% Initialize variables.
filename = 'F:\demo.csv';
delimiter = ',';
%% Read columns of data as strings:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%q%q%q%q%q%q%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to format string.
% This call is based on the structure of the file used to generate this code. If an error
% occurs for a different file, try regenerating the code from the Import Tool.
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'ReturnOnError', false);
%% Close the text file.
fclose(fileID);
%% Convert the contents of columns containing numeric strings to numbers.
% Replace non-numeric strings with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
    raw(1:length(dataArray{col}),col) = dataArray{col};
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[2,3,4,6]
    % Converts strings in the input cell array to numbers. Replaced non-numeric strings with
    % NaN.
    rawData = dataArray{col};
    for row=1:size(rawData, 1);
        % Create a regular expression to detect and remove non-numeric prefixes and suffixes.
        regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
        try
            result = regexp(rawData{row}, regexstr, 'names');
            numbers = result.numbers;
            % Detected commas in non-thousand locations.
            invalidThousandsSeparator = false;
            if any(numbers==',');
                thousandsRegExp = '^\d+?(\,\d{3})*\.{0,1}\d*$';
                if isempty(regexp(numbers, thousandsRegExp, 'once'));
                    numbers = NaN;
                    invalidThousandsSeparator = true;
                end
            end
            % Convert numeric strings to numbers.
            if ~invalidThousandsSeparator;
                numbers = textscan(strrep(numbers, ',', ''), '%f');
                numericData(row, col) = numbers{1};
                raw{row, col} = numbers{1};
            end
        catch me
        end
    end
end
%% Split data into numeric and cell columns.
rawNumericColumns = raw(:, [2,3,4,6]);
rawCellColumns = raw(:, [1,5]);
%% Allocate imported array to column variable names
GIST = rawCellColumns(:, 1);
VarName2 = cell2mat(rawNumericColumns(:, 1));
VarName3 = cell2mat(rawNumericColumns(:, 2));
VarName4 = cell2mat(rawNumericColumns(:, 3));
PWH = rawCellColumns(:, 2);
VarName6 = cell2mat(rawNumericColumns(:, 4));
%% Clear temporary variables
clearvars filename delimiter formatSpec fileID dataArray ans raw col numericData rawData row regexstr result numbers invalidThousandsSeparator thousandsRegExp me rawNumericColumns rawCellColumns;
csvread (for numeric values only; which means it is not applicable here).
I happened to have the same issue. I opened a .csv file using textscan and it added one whitespace character on both sides of every character. I also noticed that when I opened the variable storing the read data, the font was different from the usual one in Matlab.
We managed to solve this issue by opening the .csv file in Notepad++ and changing the encoding to UTF-8. It solved the problem.
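If re-saving the file is not practical, the same idea can be sketched inside MATLAB. This assumes the file is UTF-16 little-endian, a common cause of apparent "spaces" between characters. Nothing in the thread confirms the actual encoding, so treat that as an assumption. The demo file below is created only so the example is self-contained.

```matlab
% Create a small UTF-16LE demo file with a byte-order mark (illustrative data).
fname = [tempname '.csv'];
fid = fopen(fname, 'w', 'l');                 % 'l' = little-endian byte order
fwrite(fid, [65279, double(sprintf('"HIST",1,1,27,PWH,"1"\r\n'))], 'uint16');
fclose(fid);

% Read the file as 16-bit code units instead of single bytes.
fid = fopen(fname, 'r', 'l');
txt = fread(fid, Inf, 'uint16=>char')';       % row vector of characters
fclose(fid);
delete(fname);

% Drop the byte-order mark if present (assumed UTF-16LE, see above).
if ~isempty(txt) && double(txt(1)) == 65279
    txt(1) = [];
end

lines = regexp(txt, '\r?\n', 'split');        % one cell per line of text
firstLine = lines{1};
```

Reading 16-bit units this way avoids a per-line regexprep, but it loads the whole file at once, so a very large file would need to be read in chunks (for example by passing a finite count to fread).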
Hope it helps!
Seeking help from skillful Matlab users!
I'm kind of new to Matlab and hope somebody has the time to help me. I need to import some .txt files from a directory. I have found a way to do this through the import tool. Some of the data use commas instead of dots as the decimal separator, so importdata will not work, but the 'import data' tool does.
So I'm wondering (and hoping) whether it is possible to edit the generated function so that it imports all the files in the directory the same way the single file is imported. I want each file to be imported as a matrix variable (double), and I want to import all the files in one process (a loop). There are many files, and each has some 100 000 lines or so.
If someone sees an easy way to do this, I would appreciate the help. Please keep the explanation at a low level, as I'm quite a novice. I get the following function using the 'import data' tool:
function Streaming0x00x00158D00000E04621709201405 = importfile1(filename, startRow, endRow)
%IMPORTFILE1 Import numeric data from a text file as a matrix.
% STREAMING0X00X00158D00000E04621709201405 = IMPORTFILE1(FILENAME) Reads
% data from text file FILENAME for the default selection.
%
% STREAMING0X00X00158D00000E04621709201405 = IMPORTFILE1(FILENAME,
% STARTROW, ENDROW) Reads data from rows STARTROW through ENDROW of text
% file FILENAME.
%
% Example:
% Streaming0x00x00158D00000E04621709201405 =
% importfile1('Streaming_0_x_0_0_x_00158D00000E0462_17-09-2014_05.32.24_part000.txt',
% 17, 137834);
%
% See also TEXTSCAN.
% Auto-generated by MATLAB on 2015/02/04 09:28:07
%% Initialize variables.
delimiter = ';';
if nargin<=2
    startRow = 17;
    endRow = inf;
end
%% Read columns of data as strings:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%s%s%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to format string.
% This call is based on the structure of the file used to generate this
% code. If an error occurs for a different file, try regenerating the code
% from the Import Tool.
textscan(fileID, '%[^\n\r]', startRow(1)-1, 'ReturnOnError', false);
dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1, 'Delimiter', delimiter, 'ReturnOnError', false);
for block=2:length(startRow)
    frewind(fileID);
    textscan(fileID, '%[^\n\r]', startRow(block)-1, 'ReturnOnError', false);
    dataArrayBlock = textscan(fileID, formatSpec, endRow(block)-startRow(block)+1, 'Delimiter', delimiter, 'ReturnOnError', false);
    for col=1:length(dataArray)
        dataArray{col} = [dataArray{col};dataArrayBlock{col}];
    end
end
%% Close the text file.
fclose(fileID);
%% Convert the contents of columns containing numeric strings to numbers.
% Replace non-numeric strings with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
    raw(1:length(dataArray{col}),col) = dataArray{col};
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2]
    % Converts strings in the input cell array to numbers. Replaced non-numeric
    % strings with NaN.
    rawData = dataArray{col};
    for row=1:size(rawData, 1);
        % Create a regular expression to detect and remove non-numeric prefixes and
        % suffixes.
        regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\.]*)+[\,]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\.]*)*[\,]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
        try
            result = regexp(rawData{row}, regexstr, 'names');
            numbers = result.numbers;
            % Detected dots in non-thousand locations.
            invalidThousandsSeparator = false;
            if any(numbers=='.');
                thousandsRegExp = '^\d+?(\.\d{3})*\,{0,1}\d*$';
                % Match the number against the pattern (not the pattern against '.').
                if isempty(regexp(numbers, thousandsRegExp, 'once'));
                    numbers = NaN;
                    invalidThousandsSeparator = true;
                end
            end
            % Convert numeric strings to numbers.
            if ~invalidThousandsSeparator;
                numbers = strrep(numbers, '.', '');
                numbers = strrep(numbers, ',', '.');
                numbers = textscan(numbers, '%f');
                numericData(row, col) = numbers{1};
                raw{row, col} = numbers{1};
            end
        catch me
        end
    end
end
%% Replace non-numeric cells with NaN
R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),raw); % Find non-numeric cells
raw(R) = {NaN}; % Replace non-numeric cells
%% Create output variable
Streaming0x00x00158D00000E04621709201405 = cell2mat(raw);
If something is unclear, please comment.
All help is useful, thanks :)
If all files have the same layout, you can put the filenames in a cell array (NOT a plain char array, whose rows would all have to be the same length) and loop over it. For instance:
fname_arr = {'file1.txt','file2.txt'}; % your filenames go here
allDataArray = cell(size(fname_arr)); % one slot per file
for k = 1:numel(fname_arr)
    filename = fname_arr{k};
    %% Open the text file.
    fileID = fopen(filename,'r'); % start of the relevant part of your codeblock
    % <...> omitting the stuff in the middle of the code
    fclose(fileID); % end of the relevant part of your codeblock
    allDataArray{k} = dataArray;
end
Then allDataArray is a cell array whose kth element contains the dataArray obtained from file fname_arr{k}.
Ok, tried something here. Implemented this in my function:
output=(dir_output);
for k=1:length(output);
filename = output{k}.name;
%% Open the text file.
fileID = fopen(filename,'r');
where 'dir_output' is a struct, containing all the file names in the Directory. Also put in:
%% Close the text file.
fclose(fileID);
allDataArray{k} = DataArray;
end
Get this as error:
>> function1
Undefined function or variable 'dir_output'.
Error in function1 (line 30)
output=(dir_output);
Why???
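One likely explanation for the error above is that a function has its own workspace, so a variable such as dir_output that exists only at the command line is not visible inside function1 unless it is passed in or created there. A minimal sketch of building the file list inside the loop code itself follows. The folder, the demo files and the two-column parsing are stand-ins invented for the example, and in practice the body of the loop would call the generated importfile1 instead. Note that dir returns a struct array, which is indexed with parentheses, not curly braces.

```matlab
% Create a temporary folder with two small demo files (hypothetical data,
% semicolon-delimited with decimal commas, as described in the question).
folder = tempname;
mkdir(folder);
for k = 1:2
    fid = fopen(fullfile(folder, sprintf('file%d.txt', k)), 'w');
    fprintf(fid, '1,5;2,5\n3,5;4,5\n');
    fclose(fid);
end

% List the files and import them one by one.
files = dir(fullfile(folder, '*.txt'));       % struct array, one element per file
allData = cell(numel(files), 1);
for k = 1:numel(files)
    filename = fullfile(folder, files(k).name);   % parentheses, not files{k}
    txt = fileread(filename);
    txt = strrep(txt, ',', '.');                  % decimal comma -> decimal point
    vals = textscan(txt, '%f %f', 'Delimiter', ';');
    allData{k} = [vals{1}, vals{2}];              % stand-in for importfile1(filename)
end
rmdir(folder, 's');                               % remove the demo folder
```

Each element of allData then holds the numeric matrix for one file, in the order returned by dir.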
I have a question regarding the importing of .txt files. The file is in the format below, the problem is that matlab does not seem to recognize the "new line" character indicators following every "$", so matlab just sees the 5th line as a continuous stream of data
Data Matlab sees:
01-24-2013 [6:01:53]
Kp (0070.0000)
Ki (0200.0000)
Kd (0009.0000)
$,0045,0044,0000.05,0011.53,0005.64,$,0045,0048,0000.04,0011.55,0005.66,$....etc
01-24-2013 [7:01:48]
Data Wordpad sees:
01-24-2013 [6:01:53]
Kp (0070.0000)
Ki (0200.0000)
Kd (0009.0000)
$,0045,0044,0000.05,0011.53,0005.64,
$,0045,0048,0000.04,0011.55,0005.66,
$, ....
I have no problem importing the format seen by WordPad (the re-saved file) using csvread and skipping column 1, but for the raw .txt file ("Data Matlab sees") I can't find a way to tell Matlab how to read it. Ideally, I would like to tell Matlab to skip to row 5, then start reading data and create a new row in the [nx5] matrix every time it encounters a "$". Is there a way to detect the "$" and reformat the data into a usable matrix form?
Thanks!
I don't know how you managed to read this data as one line, but suppose you did and you want to split it. You can use the almighty regexp for that:
C = regexp(str, '\$,', 'split');
Then turn the strings into numbers and convert everything into a matrix:
C = cellfun(@str2num, C, 'Uniform', false);
A = vertcat(C{:});
Regarding the second part of the question:
Ideally, I would like to tell Matlab to skip to Row-5, then start reading data...
You can make textread do that by using the 'headerlines' option:
C = textread('file.txt', '%s', 1, 'headerlines', 4, 'delimiter', '\n')
str = C{1};
and then use the code that employs regexp to split the string str.
Note that this will only work if MATLAB indeed "sees" the 5th line like you described. If not, you'll simply get only the first row in your matrix.
Example
str = '$,0045,0044,0000.05,0011.53,0005.64,$,0045,0048,0000.04,0011.55,0005.66';
C = cellfun(@str2num, regexp(str, '\$,', 'split'), 'Uniform', false);
A = vertcat(C{:})
This results in:
A =
45.0000 44.0000 0.0500 11.5300 5.6400
45.0000 48.0000 0.0400 11.5500 5.6600
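As a possible alternative to the str2num approach (which evaluates the text as MATLAB code), the same matrix can be sketched with sscanf. This assumes every record has exactly five values, as in the sample. The reshape would fail or misalign rows if that did not hold.

```matlab
str = '$,0045,0044,0000.05,0011.53,0005.64,$,0045,0048,0000.04,0011.55,0005.66';

flat = strrep(str, '$,', '');     % remove the record markers, keep the numbers
nums = sscanf(flat, '%f,');       % column vector of all values in file order
A = reshape(nums, 5, []).';       % five values per record, one record per row
```

Here A has one row per "$" record, matching the result shown above.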