Skip reading headers in MATLAB

I asked a similar question before, but now I am trying to read .txt files into MATLAB, and my problem is with the headers. Because of errors, the system often rewrites the header line in the middle of the file, and then MATLAB cannot read the file. Is there a way to skip these lines? I know I can skip characters if I know what they are.
Here is the code I am using:
[c,pathc]=uigetfile({'*.txt'},'Select the data','V:\data');
file=[pathc c];
data= dlmread(file, ',', 1,4);
This way I let the user pick the file. My files are huge, typically [86400 125],
so each has 125 header fields or more, depending on the file.
Thanks
Because the files are so big I cannot paste one here, but the format looks like this:
day time col1 col2 col3 col4 ...............................
2/3/2010 0:10 3.4 4.5 5.6 4.4 ...............................
..................................................................
..................................................................
and so on

DLMREAD can read only numeric data. It will not read the date and time in your first two columns. If all the other data are numeric, you can tell DLMREAD to skip the first row and the first two columns on the left (the row and column offsets are zero-based):
data = dlmread(file, ' ', 1,2);
To import also day and time you can use IMPORTDATA instead of DLMREAD:
A = importdata(file, ' ', 1);
dt = datenum(A.textdata(2:end,1),'mm/dd/yyyy');
tm = datenum(A.textdata(2:end,2),'HH:MM');
data = A.data;
The date and time will be converted to serial date numbers. You can convert them back to text with the DATESTR function.
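As a small sketch of that round trip, assuming the sample values from the question's layout: when datenum is given a time-only string it fills in a default date (1 January of the current year in the releases I am aware of), so the example keeps only the fractional part of tm before adding it to the date. The variable names dayStr, timeStr and stamp are illustrative, not part of the answer above.

```matlab
% Sketch: combine a date string and a time string into one serial date number.
dayStr  = '2/3/2010';   % sample values in the question's layout
timeStr = '0:10';

dt = datenum(dayStr, 'mm/dd/yyyy');   % whole days
tm = datenum(timeStr, 'HH:MM');       % default date plus the time of day

% Keep only the fractional (time-of-day) part before adding it to the date.
stamp = dt + mod(tm, 1);

% Convert back to text for display.
disp(datestr(stamp, 'mm/dd/yyyy HH:MM'));
```

The same mod(tm, 1) step applies element-wise if dt and tm are the column vectors produced from A.textdata.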

It turns out that you can still use textscan, except that you read everything as strings and then attempt to convert to double. str2double returns NaN for any string that does not represent a number, and since the header fields are all text, you can identify header rows as rows where every converted value is NaN.
For example:
%# find and open file
[c,pathc]=uigetfile({'*.txt'},'Select the data','V:\data');
file=[pathc c];
fid = fopen(file);
%# read all text
strData = textscan(fid,'%s%s%s%s%s%s','Delimiter',',');
%# close the file again
fclose(fid);
%# catenate, b/c textscan returns a column of cells for each column in the data
strData = cat(2,strData{:});
%# convert cols 3:6 to double
doubleData = str2double(strData(:,3:end));
%# find header rows. headerRowsL is a logical array
headerRowsL = all(isnan(doubleData),2);
%# since I guess you know what the headers are, you can just remove the header rows
dateAndTimeCell = strData(~headerRowsL,1:2);
dataArray = doubleData(~headerRowsL,:);
%# and you're ready to start working with your data
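To see the header test in isolation, here is a minimal sketch on a made-up 3-by-4 cell array that stands in for the output of textscan and cat. The cross-check with strcmp assumes the first header field is literally 'day', as in the question's sample. Note that a data row whose values are all missing or unparsable would be flagged by the NaN test as well.

```matlab
% Toy data standing in for the textscan result; row 2 is a repeated header.
strData = {'2/3/2010', '0:10', '3.4',  '4.5';
           'day',      'time', 'col1', 'col2';
           '2/3/2010', '0:20', '5.6',  '4.4'};

doubleData  = str2double(strData(:, 3:end));   % header text becomes NaN
headerRowsL = all(isnan(doubleData), 2);       % true where every value is NaN

% Cross-check against the known header text (assumption: first field is 'day').
sameAsText = strcmp(strData(:, 1), 'day');

% Keep only the real data rows.
dataArray = doubleData(~headerRowsL, :);
```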

Related

Reading Text file comma separated on Matlab

I'm trying to read a comma separated text file that looks like the following:
2017-10-24,01:17:38,2017-10-24,02:17:38,+1.76,L,Meters
2017-10-24,02:57:31,2017-10-24,03:57:31,+1.92,H,Meters
2017-10-24,05:53:35,2017-10-24,06:53:35,+1.00,L,Meters
2017-10-24,10:45:01,2017-10-24,11:45:01,+2.06,H,Meters
2017-10-24,13:27:16,2017-10-24,14:27:16,+1.78,L,Meters
2017-10-24,15:07:16,2017-10-24,16:07:16,+1.92,H,Meters
2017-10-24,18:12:08,2017-10-24,19:12:08,+0.98,L,Meters
My code so far is:
LT_data = fopen('D:\Beach Erosion and Recovery\Bournemouth\Bournemouth Tidal Data\tidal_data_jtide.txt');% text file containing predicted low tide times
LT_celldata = textscan(LT_data,'%D %D %D %D %d ','delimiter',',')'
For mixed data types, I'd recommend readtable. This will read your data straight into a table object without having to specify a format spec or use fopen:
t = readtable( 'myFile.txt', 'ReadVariableNames', false, 'Delimiter', ',' );
Then you can easily manipulate the data
% Variable names in the table
t.Properties.VariableNames = {'Date1', 'Time1', 'Date2', 'Time2', 'Value', 'Dim', 'Units'};
% Create full datetime object columns from the date and time columns
t.DateTime1 = datetime( strcat(t.Date1,t.Time1), 'InputFormat', 'yyyy-MM-ddHH:mm:ss' );
If you do know the formats, you can specify the 'format' property within readtable and it will convert the data when reading.
This is working perfectly. The format spec needed to be edited:
file = 'D:\Beach Erosion and Recovery\Bournemouth\Bournemouth Tidal Data\tidal_data_jtide.txt'
fileID = fopen(file);
LT_celldata = textscan(fileID,'%D%D%D%D%d%[^\n\r]','delimiter',',')
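A more conservative variant, sketched here under the assumption that reading the dates and times as text is acceptable: the tide height is read with %f rather than %d so the fractional part is kept, and the text columns are converted with datenum afterwards. The two sample lines are held in a string instead of a file, and the variable names are illustrative.

```matlab
% Two sample lines from the question, held in a string instead of a file.
txt = sprintf('%s\n', ...
    '2017-10-24,01:17:38,2017-10-24,02:17:38,+1.76,L,Meters', ...
    '2017-10-24,02:57:31,2017-10-24,03:57:31,+1.92,H,Meters');

% Dates and times as text, the tide height as a double, the flags as text.
C = textscan(txt, '%s %s %s %s %f %s %s', 'Delimiter', ',');

height = C{5};

% Join the first date and time columns, then convert to serial date numbers.
t1 = datenum(strcat(C{1}, {' '}, C{2}), 'yyyy-mm-dd HH:MM:SS');
```

For a real file, the first argument to textscan would be the identifier returned by fopen, as in the code above.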

joining arrays in Matlab and writing to file using dlmwrite( ) adds extra space

I am generating 2500 values in Matlab in format (time,heart_rate, resp_rate) by using below code
numberOfSeconds = 2500;
time = 1:numberOfSeconds;
newTime = transpose(time);
number0 = size(newTime, 1)
% generating heart rates
heart_rate = 50 +(70-50) * rand (numberOfSeconds,1);
intHeartRate = int64(heart_rate);
number1 = size(intHeartRate, 1)
% hist(heart_rate)
% generating resp rates
resp_rate = 50 +(70-50) * rand (numberOfSeconds,1);
intRespRate = int64(resp_rate);
number2 = size(intRespRate, 1)
% hist(heart_rate)
% joining time and sensor data
joinedStream = strcat(num2str(newTime),{','},num2str(intHeartRate),{','},num2str(intRespRate))
dlmwrite('/Users/amar/Desktop/geenrated/rate.txt', joinedStream,'delimiter','');
The data shown in the console is alright, but when I save this data to a .txt file, it contains extra spaces in beginning. Hence I am not able to parse the .txt file to generate input stream. Please help
Replace the last two lines of your code with the following. No need to use strcat if you want a CSV output file.
dlmwrite('/Users/amar/Desktop/geenrated/rate.txt', [newTime intHeartRate intRespRate]);
π‘‡β„Žπ‘’ π‘ π‘œπ‘™π‘’π‘‘π‘–π‘œπ‘› 𝑠𝑒𝑔𝑔𝑒𝑠𝑑𝑒𝑑 𝑏𝑦 π‘ƒπΎπ‘œ 𝑖𝑠 π‘‘β„Žπ‘’ π‘ π‘–π‘šπ‘π‘™π‘’π‘ π‘‘ π‘“π‘œπ‘Ÿ π‘¦π‘œπ‘’π‘Ÿ π‘π‘Žπ‘ π‘’. π‘‡β„Žπ‘–π‘  π‘Žπ‘›π‘ π‘€π‘’π‘Ÿ 𝑒π‘₯π‘π‘™π‘Žπ‘–π‘›π‘  π‘€β„Žπ‘¦ π‘¦π‘œπ‘’ 𝑔𝑒𝑑 π‘‘β„Žπ‘’ 𝑒𝑛𝑒π‘₯𝑝𝑒𝑐𝑑𝑒𝑑 π‘œπ‘’π‘‘π‘π‘’π‘‘.
The data written in the file is exactly what is shown in the console.
>> joinedStream(1) %The exact output will differ since 'rand' is used
ans =
cell
'   1,60,63'
num2str converts a matrix into a character array, so every row must have the same number of characters. For each column of the original matrix, the entry with the most characters sets the width, and shorter entries are padded with leading spaces. Columns are separated by 2 spaces. Take a look at the following smaller example to understand:
>> num2str([44, 42314; 4, 1212421])
ans =
2×11 char array
'44    42314'
' 4  1212421'
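If the comma-joined strings are still wanted, one way to avoid the padding is to format each value with sprintf instead of num2str. This is a minimal sketch with made-up rows (time, heart rate, respiration rate); the matrix M is illustrative.

```matlab
% num2str pads every row of a column vector to a common width.
padded = num2str([1; 25; 2500]);    % 3-by-4 char array; row 1 is '   1'

% sprintf formats each value on its own, so nothing is padded.
M = [1 60 63; 2 55 70];             % made-up rows: time, heart rate, resp rate
txt = sprintf('%d,%d,%d\n', M.');   % transpose so values are consumed row by row
```

The same format string can be passed to fprintf with a file identifier from fopen to write the lines straight to a file.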

MATLAB reading CSV file with timestamp and values

I have the following sample from a CSV file. Structure is:
Date ,Time(Hr:Min:S:mS), Value
2015:08:20,08:20:19:123 , 0.05234
2015:08:20,08:20:19:456 , 0.06234
I then would like to read this into a matrix in MATLAB.
Attempt :
Matrix = csvread('file_name.csv');
Also tried an attempt formatting the string.
fmt = %u:%u:%u %u:%u:%u:%u %f
Matrix = csvread('file_name.csv',fmt);
The problem is when the file is read the format is wrong and displays it differently.
Any help or advice given would be greatly appreciated!
EDIT
When using @Adriaan's answer the result is
2015 -11 -9
8 -17 -1
So it seems that MATLAB thinks the '-' is the delimiter(separator)
Matrix = csvread('file_name.csv',1,0);
csvread does not support a format specifier. Just enter the number of header rows (I took it to be one, as per the example) and the number of header columns, 0.
Your file, however, contains non-numeric data, so import it with importdata:
data = importdata('file_name.csv')
This gets you a structure, data, with two fields: data.data contains the numeric data, i.e. a vector containing your values. data.textdata is a cell array containing the rest of the data; you need its first two columns and have to extract the numbers from them, i.e.
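for ii = 2:size(data.textdata,1)   % row 1 of textdata is the header
    tmp1 = data.textdata{ii,1};
    % store in row ii-1 so that Date lines up with data.data
    Date(ii-1,1) = datenum(tmp1,'yyyy:mm:dd');   % lowercase mm is month, uppercase MM is minute
    tmp2 = data.textdata{ii,2};
    Date(ii-1,2) = datenum(tmp2,'HH:MM:SS:FFF');
end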
Thanks to @Excaza it turns out milliseconds are supported.
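An alternative that does not rely on a datenum format string is to split the fields with sscanf and pass the numbers to datenum directly. This is a sketch on the question's first sample row; the variable names are illustrative, and a field read from the file may need strtrim first because the sample has a space before the comma.

```matlab
% Sample fields as they appear in the question's CSV.
dateStr = '2015:08:20';
timeStr = '08:20:19:123';

d = sscanf(dateStr, '%d:%d:%d');      % [year; month; day]
t = sscanf(timeStr, '%d:%d:%d:%d');   % [hour; minute; second; millisecond]

% datenum accepts fractional seconds, so fold the milliseconds in.
stamp = datenum(d(1), d(2), d(3), t(1), t(2), t(3) + t(4)/1000);
```

This yields one serial date number per row instead of separate date and time columns.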

read complicated format .txt file into Matlab

I have a txt file that I want to read into Matlab. Data format is like below:
term2 2015-07-31-15_58_25_612 [0.9934343, 0.3423043, 0.2343433, 0.2342323]
term0 2015-07-31-15_58_25_620 [12]
term3 2015-07-31-15_58_25_625 [2.3333, 3.4444, 4.5555]
...
How can I read these data in the following way?
name = [term2 term0 term3] or namenum = [2 0 3]
time = [2015-07-31-15_58_25_612 2015-07-31-15_58_25_620 2015-07-31-15_58_25_625]
data = {[0.9934343, 0.3423043, 0.2343433, 0.2342323], [12], [2.3333, 3.4444, 4.5555]}
I tried to use textscan in this way 'term%d %s [%f, %f...]', but for the last data part I cannot specify the length because they are different. Then how can I read it? My Matlab version is R2012b.
Thanks a lot in advance if anyone could help!
There may be a way to do that in one single pass, but for me these kind of problems are easier to sort with a 2 pass approach.
Pass 1: Read all the columns with a constant format according to their type (string, integer, etc ...) and read the non constant part in a separate column which will be processed in second pass.
Pass 2: Process your irregular column according to its specificities.
In a case with your sample data, it looks like this:
%% // read file
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %s %*c %[^]] %*[^\n]' ) ;
fclose(fid) ;
%% // dispatch data into variables
name = M{1,1} ;
time = M{1,2} ;
data = cellfun( @(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,3} ) ;
What happened:
The first textscan instruction reads the full file. In the format specifier:
term%d read the integer after the literal expression 'term'.
%s read a string representing the date.
%*c ignore one character (to ignore the character '[').
%[^]] read everything (as a string) until it finds the character ']'.
%*[^\n] ignore everything until the next newline ('\n') character (to not capture the last ']').
After that, the first 2 columns are easily dispatched into their own variables. The 3rd column of the result cell array M contains strings of different lengths, each holding a different number of floating point numbers. We use cellfun in combination with another textscan to read the numbers in each cell and return a cell array containing doubles.
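The second pass can be tried on its own. The sketch below assumes the bracket contents have already been extracted into a cell array of strings (here typed in by hand from the sample data) and uses sscanf in place of the inner textscan, which is a swapped-in technique rather than the code above.

```matlab
% Irregular column as it would come out of the first pass (brackets removed).
raw = {'0.9934343, 0.3423043, 0.2343433, 0.2342323';
       '12';
       '2.3333, 3.4444, 4.5555'};

% '%f,' reads a number and then an optional comma, repeating until the
% string is exhausted; the result is transposed into a row vector.
data = cellfun(@(s) sscanf(s, '%f,').', raw, 'UniformOutput', false);
```

With 'UniformOutput' set to false, each cell of data holds a row vector whose length matches the number of values on that line.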
Bonus:
If you want your time to be a numeric value as well (instead of a string), use the following extension of the code:
%% // read file
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %f-%f-%f-%f_%f_%f_%f %*c %[^]] %*[^\n]' ) ;
fclose(fid) ;
%% // dispatch data
name = M{1,1} ;
time_vec = cell2mat( M(1,2:7) ) ;
time_ms = M{1,8} ./ (24*3600*1000) ; %// take care of the milliseconds separately as they are not handled by "datenum"
time = datenum( time_vec ) + time_ms ;
data = cellfun( @(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,end} ) ;
This will give you an array time with a Matlab time serial number (often easier to use than strings). To show you the serial number still represent the right time:
>> datestr(time,'yyyy-mm-dd HH:MM:SS.FFF')
ans =
2015-07-31 15:58:25.612
2015-07-31 15:58:25.620
2015-07-31 15:58:25.625
For complicated string parsing situations like this it is best to use regexp. In this case, assuming you have the data in the file data.txt, the following code should do what you are looking for:
txt = fileread('data.txt')
tokens = regexp(txt,'term(\d+)\s(\S*)\s\[(.*)\]','tokens','dotexceptnewline')
% Convert namenum to numeric type
namenum = cellfun(@(x)str2double(x{1}),tokens)
% Get time stamps from the second row of all the tokens
time = cellfun(@(x)x{2},tokens,'UniformOutput',false);
% Split the numbers in the third column
data = cellfun(@(x)str2double(strsplit(x{3},',')),tokens,'UniformOutput',false)
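The time tokens stay as text in the code above. If a numeric time is wanted, one possible follow-up, sketched here on a single sample line with illustrative variable names, is to split the token with sscanf and hand the parts to datenum.

```matlab
% One sample line, parsed with the same pattern as above.
txt    = 'term2 2015-07-31-15_58_25_612 [0.9934343, 0.3423043]';
tokens = regexp(txt, 'term(\d+)\s(\S*)\s\[(.*)\]', 'tokens', 'dotexceptnewline');
ts     = tokens{1}{2};    % '2015-07-31-15_58_25_612'

% Split into [year; month; day; hour; minute; second; millisecond].
p = sscanf(ts, '%d-%d-%d-%d_%d_%d_%d');

% Fold the milliseconds into the seconds argument of datenum.
t = datenum(p(1), p(2), p(3), p(4), p(5), p(6) + p(7)/1000);
```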

tab delimited text file from matlab

The following code generates a similar dataset to what I am currently working with:
clear all
a = rand(131400,12);
DateTime=datestr(datenum('2011-01-01 00:01','yyyy-mm-dd HH:MM'):4/(60*24):...
datenum('2011-12-31 23:57','yyyy-mm-dd HH:MM'),...
'yyyy-mm-dd HH:MM');
DateTime=cellstr(DateTime);
header={'DateTime','temp1','temp2','temp4','temp7','temp10',...
'temp13','temp16','temp19','temp22','temp25','temp30','temp35'};
I'm trying to convert the outputs into one variable (called 'Data'), i.e. have header as the first row (1,:), 'DateTime' starting from row 2 (2:end,1) and running through each row, and finally having 'a' as the data (2:end,2:end) if that makes sense. So, 'DateTime' and 'header' are used as the heading for the rows and column respectively. Following this I need to save this into a tab delimited text file.
I hope I've been clear in expressing what I'm attempting.
An easy way, though it might not be the fastest:
Data = [header; DateTime, num2cell(a)];
filename = 'test.txt';
dlmwrite(filename,1); %# now create a text file, not Excel
xlswrite(filename,Data);
UPDATE:
It appears that xlswrite actually changes the format of DateTime values even if it writes to a text file. If the format is important here is the better and actually faster way:
filename = 'test.txt';
out = [DateTime, num2cell(a)];
out = out'; %# our cell array will be printed by columns, so we have to transpose
fid = fopen(filename,'wt');
%# printing header
fprintf(fid,'%s\t',header{1:end-1});
fprintf(fid,'%s\n',header{end});
%# printing the data
fprintf(fid,['%s\t', repmat('%f\t',1,size(a,2)-1) '%f\n'], out{:});
fclose(fid);
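To check the layout before writing 131400 rows, the same format strings can be applied with sprintf to a tiny made-up dataset and inspected in memory. This is a sketch with two rows and two data columns; the values are illustrative.

```matlab
% Tiny stand-in for the real data.
header   = {'DateTime', 'temp1', 'temp2'};
DateTime = {'2011-01-01 00:01'; '2011-01-01 00:05'};
a        = [0.5 1.5; 2.5 3.5];

out = [DateTime, num2cell(a)].';    % transpose: out{:} then walks row by row

% Header line: tab after every name except the last, which gets a newline.
headerLine = [sprintf('%s\t', header{1:end-1}), sprintf('%s\n', header{end})];

% Row format: one string, then one %f per data column, tab separated.
rowFormat = ['%s\t', repmat('%f\t', 1, size(a, 2) - 1), '%f\n'];
body      = sprintf(rowFormat, out{:});

txt = [headerLine, body];
```

Once txt looks right, the identical format strings can be used with fprintf and a file identifier, as in the code above.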