Reading Text file comma seperated on Matlab - matlab

I'm trying to read a comma separated text file that looks like the following:
2017-10-24,01:17:38,2017-10-24,02:17:38,+1.76,L,Meters
2017-10-24,02:57:31,2017-10-24,03:57:31,+1.92,H,Meters
2017-10-24,05:53:35,2017-10-24,06:53:35,+1.00,L,Meters
2017-10-24,10:45:01,2017-10-24,11:45:01,+2.06,H,Meters
2017-10-24,13:27:16,2017-10-24,14:27:16,+1.78,L,Meters
2017-10-24,15:07:16,2017-10-24,16:07:16,+1.92,H,Meters
2017-10-24,18:12:08,2017-10-24,19:12:08,+0.98,L,Meters
My code so far is:
LT_data = fopen('D:\Beach Erosion and Recovery\Bournemouth\Bournemouth Tidal Data\tidal_data_jtide.txt');% text file containing predicted low tide times
LT_celldata = textscan(LT_data,'%D %D %D %D %d ','delimiter',',')'

For mixed data types, I'd recommend readtable. This will read your data straight into a table object without having to specify a format spec or use fopen,
t = readtable( 'myFile.txt', 'ReadVariableNames', false, 'Delimiter', ',' );
Then you can easily manipulate the data
% Variable names in the table
t.Properties.VariableNames = {'Date1', 'Time1', 'Date2', 'Time2', 'Value', 'Dim', 'Units'};
% Create full datetime object columns from the date and time columns
t.DateTime1 = datetime( strcat(t.Date1,t.Time1), 'InputFormat', 'yyyy-MM-ddHH:mm:ss' );
If you do know the formats, you can specify the 'format' property within readtable and it will convert the data when reading.

This is working perfectly. The formatspec need to be edited.
file = 'D:\Beach Erosion and Recovery\Bournemouth\Bournemouth Tidal Data\tidal_data_jtide.txt'
fileID = fopen(file);
LT_celldata = textscan(fileID,'%D%D%D%D%d%[^\n\r]','delimiter',',')

Related

read complicated format .txt file into Matlab

I have a txt file that I want to read into Matlab. Data format is like below:
term2 2015-07-31-15_58_25_612 [0.9934343, 0.3423043, 0.2343433, 0.2342323]
term0 2015-07-31-15_58_25_620 [12]
term3 2015-07-31-15_58_25_625 [2.3333, 3.4444, 4.5555]
...
How can I read these data in the following way?
name = [term2 term0 term3] or namenum = [2 0 3]
time = [2015-07-31-15_58_25_612 2015-07-31-15_58_25_620 2015-07-31-15_58_25_625]
data = {[0.9934343, 0.3423043, 0.2343433, 0.2342323], [12], [2.3333, 3.4444, 4.5555]}
I tried to use textscan in this way 'term%d %s [%f, %f...]', but for the last data part I cannot specify the length because they are different. Then how can I read it? My Matlab version is R2012b.
Thanks a lot in advance if anyone could help!
There may be a way to do that in one single pass, but for me these kind of problems are easier to sort with a 2 pass approach.
Pass 1: Read all the columns with a constant format according to their type (string, integer, etc ...) and read the non constant part in a separate column which will be processed in second pass.
Pass 2: Process your irregular column according to its specificities.
In a case with your sample data, it looks like this:
%% // read file
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %s %*c %[^]] %*[^\n]' ) ;
fclose(fid) ;
%% // dispatch data into variables
name = M{1,1} ;
time = M{1,2} ;
data = cellfun( #(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,3} ) ;
What happened:
The first textscan instruction reads the full file. In the format specifier:
term%d read the integer after the literal expression 'term'.
%s read a string representing the date.
%*c ignore one character (to ignore the character '[').
%[^]] read everything (as a string) until it finds the character ']'.
%*[^\n] ignore everything until the next newline ('\n') character. (to not capture the last ']'.
After that, the first 2 columns are easily dispatched into their own variable. The 3rd column of the result cell array M contains strings of different lengths containing different number of floating point number. We use cellfun in combination with another textscan to read the numbers in each cell and return a cell array containing double:
Bonus:
If you want your time to be a numeric value as well (instead of a string), use the following extension of the code:
%% // read file
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %f-%f-%f-%f_%f_%f_%f %*c %[^]] %*[^\n]' ) ;
fclose(fid) ;
%% // dispatch data
name = M{1,1} ;
time_vec = cell2mat( M(1,2:7) ) ;
time_ms = M{1,8} ./ (24*3600*1000) ; %// take care of the millisecond separatly as they are not handled by "datenum"
time = datenum( time_vec ) + time_ms ;
data = cellfun( #(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,end} ) ;
This will give you an array time with a Matlab time serial number (often easier to use than strings). To show you the serial number still represent the right time:
>> datestr(time,'yyyy-mm-dd HH:MM:SS.FFF')
ans =
2015-07-31 15:58:25.612
2015-07-31 15:58:25.620
2015-07-31 15:58:25.625
For comlicated string parsing situations like such it is best to use regexp. In this case assuming you have the data in file data.txt the following code should do what you are looking for:
txt = fileread('data.txt')
tokens = regexp(txt,'term(\d+)\s(\S*)\s\[(.*)\]','tokens','dotexceptnewline')
% Convert namenum to numeric type
namenum = cellfun(#(x)str2double(x{1}),tokens)
% Get time stamps from the second row of all the tokens
time = cellfun(#(x)x{2},tokens,'UniformOutput',false);
% Split the numbers in the third column
data = cellfun(#(x)str2double(strsplit(x{3},',')),tokens,'UniformOutput',false)

Textscan Matlab ; Doesn't read the format

I have a file in the following format:
**400**,**100**::400,descendsFrom,**76**::0
**400**,**119**::400,descendsFrom,**35**::0
**400**,**4**::400,descendsFrom,**45**::0
...
...
Now I need to read, the part only in the bold. I've written the following formatspec:
formatspec = '%d,%d::%*d,%*s,%d::%*d\n';
data = textscan(fileID, formatspec);
It doesn't seem to work. Can someone tell me what's wrong?
I also need to know how to 'not use' delimiter, and how to proceed if I want to express the exact way my file is written in, for example in the case above.
EDITED
A possible problem is with the %s part of the formatspec variable. Because %s is an arbitrary string therefore the descendsFrom,76::0 part of the line is ordered to this string. So with the formatspec '%d,%d::%d,%s,%d::%d\n' you will get the following cells form the first line:
400 100 400 'descendsFrom,76::0'
To solve this problem you have two possibilities:
formatspec = %d,%d::%d,descendsFrom,%d::%d\n
OR
formatspec = %d,%d::%d,%12s,%d::%d\n
In the first case the 'descendForm' string has to be contained by each row (as in your example). In the second case the string can be changed but its length must be 12.
Your Delimiter is "," you should first delimit it then maybe run a regex. Here is how I would go about it:
fileID = fopen('file.csv');
D = textscan(fileID,'%s %s %s %s ','Delimiter',','); %read everything as strings
column1 = regexprep(D{1},'*','')
column2 = regexprep(D{2},{'*',':'},{'',''})
column3 = D{3}
column4 = regexprep(D{4},{'*',':'},{'',''})
This should generate your 4 columns which you can then combine
I believe the Delimiter can only be one symbol. The more efficient way is to directly do regexprep on your entire line, which would generate:
test = '**400**,**4**::400,descendsFrom,**45**::0'
test = regexprep(test,{'*',':'},{'',''})
>> test = 400,4400,descendsFrom,450
You can do multiple delimiters in textscan, they need to be supplied as a cell array of strings. You don't need the end of line character in the format, and you need to set 'MultipleDelimsAsOne'. Don't have MATLAB to hand but something along these lines should work:
formatspec = '%d %d %*d %*s %d %*d';
data = textscan(fileID, formatspec,'Delimiter',{',',':'},'MultipleDelimsAsOne',1);
If you want to return it as a matrix of numbers not a cell array, try adding also the option 'CollectOutput',1

Import Mixed CSV that has quotes around text

I am importing a CSV file that is comma delimited into MATLAB. Each column has quotes around anything I want to consider as text and then a comma.
I am using read_mixed_csv function from the answer to this question to read in the data as a cell: Import CSV file with mixed data types
thisdata = read_mixed_csv(fname, ','); % Reads in the CSV file
thisdata = regexprep(thisdata, '^"|"$','');
However, since a few of my columns look like this:
"FAIRHOPE, Alabama"
"FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA"
"Daphne-Fairhope-Foley, AL"
MATLAB places everything after a comma into a new column. So
"Daphne-Fairhope-Foley, AL"
Becomes two columns
"Daphne-Fairhope-Foley
AL"
How can I get MATLAB to read in a mixed csv file and not only consider a comma as a delimiter, but also consider the quotation marks? Is there a more automated way of doing it than textscan? If textscan is an option, what would that look like?
Here is a sample of the data I'm trying to read in with the header included:
"State Code","County Code","Site Num","Parameter Code","POC","Latitude","Longitude","Datum","Parameter Name","Sample Duration","Pollutant Standard","Date Local","Units of Measure","Event Type","Observation Count","Observation Percent","Arithmetic Mean","1st Max Value","1st Max Hour","AQI","Method Name","Local Site Name","Address","State Name","County Name","City Name","CBSA Name","Date of Last Change"
"01","003","0010","88101",1,30.498001,-87.881412,"NAD83","PM2.5 - Local Conditions","24 HOUR","PM25 24-hour 2006","2013-01-01","Micrograms/cubic meter (LC)","None",1,100.0,7.3,7.3,0,30,"R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC","FAIRHOPE, Alabama","FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA","Alabama","Baldwin","Fairhope","Daphne-Fairhope-Foley, AL","2014-02-11"
"01","003","0010","88101",1,30.498001,-87.881412,"NAD83","PM2.5 - Local Conditions","24 HOUR","PM25 24-hour 2006","2013-01-04","Micrograms/cubic meter (LC)","None",1,100.0,7.6,7.6,0,32,"R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC","FAIRHOPE, Alabama","FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA","Alabama","Baldwin","Fairhope","Daphne-Fairhope-Foley, AL","2014-02-11"
"01","003","0010","88101",1,30.498001,-87.881412,"NAD83","PM2.5 - Local Conditions","24 HOUR","PM25 24-hour 2006","2013-01-07","Micrograms/cubic meter (LC)","None",1,100.0,8.6,8.6,0,36,"R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC","FAIRHOPE, Alabama","FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA","Alabama","Baldwin","Fairhope","Daphne-Fairhope-Foley, AL","2014-02-11"
"01","003","0010","88101",1,30.498001,-87.881412,"NAD83","PM2.5 - Local Conditions","24 HOUR","PM25 24-hour 2006","2013-01-10","Micrograms/cubic meter (LC)","None",1,100.0,7,7,0,29,"R & P Model 2025 PM2.5 Sequential w/WINS - GRAVIMETRIC","FAIRHOPE, Alabama","FAIRHOPE HIGH SCHOOL, FAIRHOPE, ALABAMA","Alabama","Baldwin","Fairhope","Daphne-Fairhope-Foley, AL","2014-02-11"
*Note: Converting the CSV file to a tab delimited file makes it easier for MATLAB to deal with and circumvents this problem.
Having a text qualifier (like ") is a little tricky, but the following might work if you ensure that each row of your table will have the same number of columns (and probably no empty ones).
Anything not within the text qualifier must be convertible to a number.
function C = csvmixed(eachLine,delim,textQualifier)
% Outputs cell containing mixed string and numeric data given a delimiter (',')
% and a text qualifier ('"'). Each line of the delimited file must be loaded into
% the cell array eachLine, and each line must have the same number of columns.
%
% Example:
% fid = fopen('testcsv.txt','r');
% eachLine = textscan(fid,'%s','Delimiter','\n'); fclose(fid);
% C = csvmixed(eachLine{1},',','"')
assert(ischar(delim) && numel(delim)==1);
assert(ischar(textQualifier) && numel(textQualifier)==1);
% find strings, as specified by the input qualifier
patternStr = sprintf('"([^"]*)"%c?',delim);
patternStr = strrep(patternStr,'"',textQualifier);
Cstr = regexp(eachLine,patternStr,'tokens');
% find numeric data
patternNum = sprintf('(?<=(,|^))[^%c,a-zA-Z]*(?=(,|$))',textQualifier);
patternNum = strrep(patternNum,',',delim);
Cnum = regexp(eachLine,patternNum,'match','emptymatch');
numCols = cellfun(#numel,Cstr) + cellfun(#numel,Cnum);
assert(nnz(diff(numCols))==0,'Number of columns not consistent.')
% get string extents (begin, start) indexes for each string
strExtents = regexp(eachLine,patternStr,'tokenExtents');
% deal out parsed data for each line
C = cell(numel(eachLine),numCols(1));
for ii = 1:numel(eachLine),
strBounds = vertcat(strExtents{ii}{:});
delimLocs = getDelimLocs(eachLine{ii},strBounds,delim);
strCellMap = getCellMap(strBounds,delimLocs);
C(ii,strCellMap) = [Cstr{ii}{:}]; % TODO: preallocate
C(ii,~strCellMap) = num2cell(str2double(Cnum{ii})); % all else must be numeric
end
end
function delimLocs = getDelimLocs(lineText,solidBounds,delim)
delimCharLocs = strfind(lineText,delim);
delimLocs = delimCharLocs(~any(bsxfun(#ge,delimCharLocs,solidBounds(:,1)) & ...
bsxfun(#le,delimCharLocs,solidBounds(:,2)),1));
end
function cellMap = getCellMap(typeBounds,delimLocs)
cellMap = any(bsxfun(#gt,typeBounds(:,1),[0 delimLocs]) & ...
bsxfun(#lt,typeBounds(:,1),[delimLocs Inf]), 1);
end
UPDATE: Fix small typos in getDelimLocs. Add preallocation of cell array.
Use the file exchange code replaceinfile to replace the strings that have commas in them with a period instead.
Use read_mixed_csv from Import CSV file with mixed data types to read in the file.
Remove the extra quotes from the strings that are still left.
replaceinfile(', ', '. ', fname); % Replace commas that was inside quotes and not meant to be separated as periods so they don't show up as a new column
thisdata = read_mixed_csv(fname, ','); % Reads in the CSV file (\t for tab)
thisdata = regexprep(thisdata, '^"|"$',''); % Remove quotes from file and only keep the first 28 columns (last two columns are empty)
For replaceinfile.m function:
For running the code on Linux, change the first line of the section on Perl to
perlCmd = sprintf('"%s"', '/usr/bin/perl');

tab delimited text file from matlab

The following code generates a similar dataset to what I am currently working with:
clear all
a = rand(131400,12);
DateTime=datestr(datenum('2011-01-01 00:01','yyyy-mm-dd HH:MM'):4/(60*24):...
datenum('2011-12-31 23:57','yyyy-mm-dd HH:MM'),...
'yyyy-mm-dd HH:MM');
DateTime=cellstr(DateTime);
header={'DateTime','temp1','temp2','temp4','temp7','temp10',...
'temp13','temp16','temp19','temp22','temp25','temp30','temp35'};
I'm trying to convert the outputs into one variable (called 'Data'), i.e. have header as the first row (1,:), 'DateTime' starting from row 2 (2:end,1) and running through each row, and finally having 'a' as the data (2:end,2:end) if that makes sense. So, 'DateTime' and 'header' are used as the heading for the rows and column respectively. Following this I need to save this into a tab delimited text file.
I hope I've been clear in expressing what I'm attempting.
An easy way, but might be not the fastest:
Data = [header; DateTime, num2cell(a)];
filename = 'test.txt';
dlmwrite(filename,1); %# no create text file, not Excel
xlswrite(filename,Data);
UPDATE:
It appears that xlswrite actually changes the format of DateTime values even if it writes to a text file. If the format is important here is the better and actually faster way:
filename = 'test.txt';
out = [DateTime, num2cell(a)];
out = out'; %# our cell array will be printed by columns, so we have to transpose
fid = fopen(filename,'wt');
%# printing header
fprintf(fid,'%s\t',header{1:end-1});
fprintf(fid,'%s\n',header{end});
%# printing the data
fprintf(fid,['%s\t', repmat('%f\t',1,size(a,2)-1) '%f\n'], out{:});
fclose(fid);

skip reading headers in MATLAB

I had a similar question. but what i am trying now is to read files in .txt format into MATLAB. My problem is with the headers. Many times due to errors the system rewrites the headers in the middle of file and then MATLAB cannot read the file. IS there a way to skip it? I know i can skip reading some characters if i know what the character is.
here is the code i am using.
[c,pathc]=uigetfile({'*.txt'},'Select the data','V:\data');
file=[pathc c];
data= dlmread(file, ',', 1,4);
this way i let the user pick the file. My files are huge typically [ 86400 125 ]
so naturally it has 125 header fields or more depends on files.
Thanks
Because the files are so big i cannot copy , but its in format like
day time col1 col2 col3 col4 ...............................
2/3/2010 0:10 3.4 4.5 5.6 4.4 ...............................
..................................................................
..................................................................
and so on
With DLMREAD you can read only numeric data. It will not read date and time, as your first two columns contain. If other data are all numeric you can tell DLMREAD to skip first row and 2 columns on the right:
data = dlmread(file, ' ', 1,2);
To import also day and time you can use IMPORTDATA instead of DLMREAD:
A = importdata(file, ' ', 1);
dt = datenum(A.textdata(2:end,1),'mm/dd/yyyy');
tm = datenum(A.textdata(2:end,2),'HH:MM');
data = A.data;
The date and time will be converted to serial numbers. You can convert them back with DATESTR function.
It turns out that you can still use textscan. Except that you read everything as string. Then, you attempt to convert to double. 'str2double' returns NaN for strings, and since headers are all strings, you can identify header rows as rows with all NaNs.
For example:
%# find and open file
[c,pathc]=uigetfile({'*.txt'},'Select the data','V:\data');
file=[pathc c];
fid = fopen(file);
%# read all text
strData = textscan(fid,'%s%s%s%s%s%s','Delimiter',',');
%# close the file again
fclose(fid);
%# catenate, b/c textscan returns a column of cells for each column in the data
strData = cat(2,strData{:});
%# convert cols 3:6 to double
doubleData = str2double(strData(:,3:end));
%# find header rows. headerRows is a logical array
headerRowsL = all(isnan(doubleData),2);
%# since I guess you know what the headers are, you can just remove the header rows
dateAndTimeCell = strData(~headerRowsL,1:2);
dataArray = doubleData(~headerRowsL,:);
%# and you're ready to start working with your data