Extracting the hour from an imported CSV file in MATLAB - matlab

Probably a silly question, but I have imported CSV data into MATLAB. One of the columns in my workspace is a timestamp, stored as a 4568x1 cell array of strings, in the format
timestamp(1) = 'Reading time'
timestamp(2) = '2014-12-19 00:00:43 UTC'
timestamp(3) = '2014-12-19 00:01:43 UTC'
etc. How do I extract just the hour, so that I get 00, 01, etc., and store it in a new array?

I'd just use datevec here, after some trimming:
timestamp(1) = []; % remove the first line
timestamp = cell2mat(timestamp); % now all are same length can do this
timestamp = timestamp(:,1:end-4); % remove UTC part
time_date = datevec(timestamp); % probably don't need format specifier
time_date should now be a 4567 x 6 matrix (since we removed one row), where the columns are year, month, day, hour, minute, and second. Therefore for the hours we just need this:
hours = time_date(:,4);
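A minimal self-contained sketch of the same idea, assuming the column is still a cell array and passing the format explicitly. The second timestamp is made up so that the hour is visibly non-zero; it is not from the original data.

```matlab
% Sample cell array shaped like the question's data (second value is made up)
timestamp = {'Reading time'; ...
             '2014-12-19 00:00:43 UTC'; ...
             '2014-12-19 13:01:43 UTC'};

stamps = timestamp(2:end);                                          % drop the header row
stamps = cellfun(@(s) s(1:end-4), stamps, 'UniformOutput', false);  % drop the trailing ' UTC'

% datevec accepts a cell array of strings; columns are Y, M, D, H, MN, S
time_date = datevec(stamps, 'yyyy-mm-dd HH:MM:SS');
hours = time_date(:,4);
```

Working on the cell array directly avoids cell2mat, which only succeeds when every string has the same length.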

You need to use datenum to convert the date string to a numerical format, e.g.
timestamp{2} = '2014-12-19 00:00:43 UTC'; % use curly braces as cell array
timestamp{3} = '2014-12-19 00:01:43 UTC';
date_num(1) = datenum(timestamp{2}(1:end-4),'yyyy-mm-dd HH:MM:SS'); % remove the ' UTC' characters at the end of the string
date_num(2) = datenum(timestamp{3}(1:end-4),'yyyy-mm-dd HH:MM:SS'); % remove the ' UTC' characters at the end of the string
% etc... (use a for loop to go through all elements of the cell array)
hh = (date_num - date_num(1))*24; % elapsed time in hours, taking the first time stamp as reference

Since the hour consists of digits preceded by a space and followed by a : symbol, you could easily apply a regular expression with lookaround, and then convert the detected strings into numbers. (The digits surrounded by : symbols on both sides are the minutes.)
%// Data
timestamp{1} = 'Reading time';
timestamp{2} = '2014-12-19 00:00:43 UTC';
timestamp{3} = '2014-12-19 00:01:43 UTC';
%// Get hours as a cell array of cells containing strings:
hours_cells = regexp(timestamp(2:end), '(?<=\s)\d+(?=:)', 'match'); %// no title
%// Convert to numbers:
hours = cellfun(@(x) str2double(x{1}), hours_cells);
If the character structure is fixed (each number includes left zero padding to equalize width), you can identify the positions of the hour (characters 12 and 13 in this case; the minutes are characters 15 and 16) and do it more simply:
hours = cellfun(@(x) str2double(x([12 13])), timestamp(2:end));
or, possibly faster,
hours = cellfun(@(x) (x([12 13])-48)*[10;1], timestamp(2:end));
%// 48 is ASCII for '0'

Related

MATLAB: loading data with multiple types in single column

I have some data from a tide gauge, currently as a .csv file, which I would like to load into MATLAB because I need to edit it. There are two columns I am interested in: the first is a date and time column in the format [dd/mm/yyyy hh:mm] and the second is a column of tidal elevation. The tidal elevation data are primarily numbers to 3 d.p., but some values carry letters that are used as flags. I can't use csvread because of the date and time format, so I changed it to a number in Excel (I would prefer to keep it in date-time format), but then I still couldn't use csvread because it didn't like the letter flags. I tried readtable, which worked (for dates as numbers); however, my tidal elevation data is stuck in a cell, and cell2mat doesn't work because I read the elevation data in as strings because of the letters.
Basically, I would like to know whether there is an easier way to load the data into MATLAB, as what I am doing at the moment is a real mess.
Sample Data:
28/01/1994 22:15 3.312
28/01/1994 22:30 3.057
28/01/1994 22:45 2.793
28/01/1994 23:00 2.541T
28/01/1994 23:15 2.303T
28/01/1994 23:30 2.083
28/01/1994 23:45 1.882
What I've tried:
filename = 'C:\User\Documents\Tide_Data\Fish_all.csv';
fileID = fopen(filename);
data = textread(filename,'%{dd/MM/yyyy HH:mm}D %s');
Badly formed format string, so I changed the date to a number in excel.
data = csvread(filename);
Can't read the letter T so outputs an error.
I had more code which got further before I reached a dead end but I can't reproduce it
I would suggest reading the file with textscan, then converting the date and time strings with datenum and converting the last column to double after removing the flag letter, if there is one.
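C = textscan(fileID,'%s %s %s');
fclose(fileID);
% allocate one row per record: [serial date number, elevation]
n = numel(C{1});
result = zeros(n,2);
for ii = 1:n
    % current date string (date and time concatenated without a space)
    dateX = [C{1,1}{ii,1} C{1,2}{ii,1}];
    % current number, possibly followed by a flag letter
    numStr = C{1,3}{ii,1};
    if any(numStr == 'T')
        % remove the flag character
        numStr = regexprep(numStr,'T','');
    end
    % collect
    result(ii,:) = [datenum(dateX, 'dd/mm/yyyyHH:MM'), str2double(numStr)];
end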
You can then convert date numbers to any string format using datestr
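A loop-free sketch of the same approach, in case it helps. It parses two records copied from the sample data out of a string rather than a file, and it assumes that flags are one or more letters at the end of the elevation field. The variable names flags, elev and t are only illustrative.

```matlab
% Two records from the sample data, held in memory instead of read from a file
txt = sprintf('28/01/1994 22:15 3.312\n28/01/1994 23:00 2.541T\n');
C = textscan(txt, '%s %s %s');                 % date, time, elevation, all as text

% Keep any trailing letters as a flag ('' where there is none)
flags = regexp(C{3}, '[A-Za-z]+$', 'match', 'once');

% Strip the trailing letters and convert what is left to double
elev = str2double(regexprep(C{3}, '[A-Za-z]+$', ''));

% Join date and time with a space and convert to serial date numbers
dateStr = cellfun(@(d, h) [d ' ' h], C{1}, C{2}, 'UniformOutput', false);
t = datenum(dateStr, 'dd/mm/yyyy HH:MM');
```

Keeping the flags in their own cell array means the quality information is not lost when the elevations become numeric.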

Conversion between time string and second value

I am playing around with converting between a time string and its value in seconds in MATLAB. However, I noticed this inconsistency.
startTime = '00:19:00';
N = 15; % minutes
% convert it to the value in sec
startSec = datenum(startTime, 'hh:mm:ss');
% N minutes passed
endSec = startSec+60*N;
% convert it back to the string format
endTime = datestr(endSec, 'hh:mm:ss');
I am expecting my endTime to be '00:34:00', but it turns out to be '00:12:00'.
Why?
I'm surprised your code works at all, because the format strings you passed to datenum are invalid; they need to be uppercase.
The second problem is your assumption that datenum converts the first argument into seconds and returns that value. From the documentation linked above:
DateNumber = datenum(DateString) converts date strings to serial date numbers. ...
A serial date number represents the whole and fractional number of
days from a fixed, preset date (January 0, 0000).
So you need to convert your time offset value also to a DateNumber before adding it to the first result. Here's a fixed version of your code
startTime = '00:19:00';
N = 15; % minutes
% convert it to a serial date number (in days, not seconds)
startSec = datenum(startTime, 'HH:MM:SS');
% N minutes passed, expressed as a fraction of a day
endSec = startSec + datenum(0, 0, 0, 0, N, 0);
% convert it back to the string format
endTime = datestr(endSec, 'HH:MM:SS');
datenum does not return seconds. Instead, it returns:
the whole and fractional number of days from a fixed, preset date
(January 0, 0000).
startTime = '00:19:00';
N = 15; % minutes
% convert it to the value in sec
startSec = datenum(startTime, 'HH:MM:SS');
startSec = startSec * 24*60*60; % get seconds
% N minutes passed
endSec = startSec+60*N;
% convert it back to the string format
endTime = datestr(endSec / (24*60*60), 'HH:MM:SS');
% will result in
%endTime = 00:34:00
Firstly,
'HH:MM:SS'
is the format string you want (lowercase mm means months). Secondly, datenum doesn't return seconds; it returns the number of days elapsed since year zero.
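Putting the points above together, a minimal sketch that keeps everything in days and converts the offset by plain arithmetic. The names startDays and endDays are just illustrative.

```matlab
startTime = '00:19:00';
N = 15;                                        % minutes to add

% Serial date number: whole and fractional days
startDays = datenum(startTime, 'HH:MM:SS');

% One day has 24*60 minutes, so N minutes is N/(24*60) days
endDays = startDays + N/(24*60);

% Format only the time-of-day part back into a string
endTime = datestr(endDays, 'HH:MM:SS');
```

Because the offset is arithmetic rather than a formatted string, it also works when N is 60 or more.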

extend matrix of time elements in matlab

I've got a 1xn matrix called time that I imported from a CSV file. Is there any way to extend this matrix by following the time pattern (so that the days per month work out)? For example, if I start with:
time =
'"2013-05-01"'
'"2013-05-02"'
'"2013-05-03"'
'"2013-05-04"'
'"2013-05-05"'
And somehow add 5 observations, my matrix becomes:
time =
'"2013-05-01"'
'"2013-05-02"'
'"2013-05-03"'
'"2013-05-04"'
'"2013-05-05"'
'"2013-05-06"'
'"2013-05-07"'
'"2013-05-08"'
'"2013-05-09"'
'"2013-05-10"'
If time is a char matrix:
N = 5; %// how many dates to add
lastdate = datenum(strrep(time(end,:),'"',''),29); %// last available date
time = [time; [repmat('"',N,1) datestr(lastdate+(1:N),29) repmat('"',N,1)] ];
If time is a cell array, just replace last line by
time = [time; mat2cell([repmat('"',N,1) datestr(lastdate+(1:N),29) repmat('"',N,1)],ones(1,N)) ];
This works by reading the last string date, converting to numerical date with datenum, generating N new consecutive dates, and then converting back to string with datestr. The double quotes are dealt with separately.
Example:
>>time = ['"2013-05-04"'; '"2013-05-05"']
time =
"2013-05-04"
"2013-05-05"
gives
>> N = 5; %// how many dates to add
lastdate = datenum(strrep(time(end,:),'"',''),29); %// last available date
time = [time; [repmat('"',N,1) datestr(lastdate+(1:N),29) repmat('"',N,1)] ]
time =
"2013-05-04"
"2013-05-05"
"2013-05-06"
"2013-05-07"
"2013-05-08"
"2013-05-09"
"2013-05-10"
I'm assuming time is a cell array here:
t = cell2mat(time);
n = 5;
t = datenum(t,'"yyyy-mm-dd"'); % using custom format
tdiff = t(end)-t(end-1); % assuming a constant step between consecutive dates
l = length(t);
newtime = zeros(l+n,1);
newtime(1:l)=t;
newtime(l+1:end) = (t(end)+tdiff):tdiff:(t(end)+tdiff*n);
You can use datestr to convert back to the date format of your choice.
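For the cell-array case, a short sketch that ends on the last day of May so the month rollover is visible. The starting dates here are made up for illustration, and a one-day step is assumed.

```matlab
% Quoted date strings as they might come out of the CSV import (made-up values)
time = {'"2013-05-30"'; '"2013-05-31"'};
N = 3;                                             % how many dates to add

% Strip the quotes from the last entry and convert it to a serial date number
lastDate = datenum(strrep(time{end}, '"', ''), 'yyyy-mm-dd');

% Generate the next N days; datenum/datestr handle the month lengths
newDates = cellstr(datestr(lastDate + (1:N)', 'yyyy-mm-dd'));

% Put the surrounding quotes back and append
newDates = cellfun(@(d) ['"' d '"'], newDates, 'UniformOutput', false);
time = [time; newDates];
```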

For command + interpolation: need some tips

I have a matrix A with three columns: daily dates, prices, and hours - all same size vector - there are multiple prices associated to hours in a day.
sample data below:
A_dates     A_hours   A_prices
20080902     9.698    24.09
20080902     9.891    24.59
20080902    10.251    24.60
20080903     9.584    25.63
20080903    10.45     24.96
20080903    12.12     24.78
20080904    12.95     26.98
20080904    13.569    26.78
20080904    14.589    25.41
Keep in mind that I have about two years of daily data, with about 10,000 prices per day covering almost every minute of the day from 9:30 to 16:00. My initial dataset's time was in milliseconds, which I then converted to hours. Some hours, like 14.589, are repeated three times with three different prices. Hence I did the following:
time=[A_dates,A_hours,A_prices];
[timeinhr,price]=consolidator(time,A_prices,'mean'); % timeinhr holds both A_dates and A_hours
to take an average price at each hour value, say 14.589 hours.
Then, for any missing hours at .25, .50, .75 and integer values, I wish to interpolate.
For each date, hours repeat, and I need to linearly interpolate the prices that I don't have for some "wanted" hours. But of course I can't use interp1 if the hours repeat in my column because I have multiple days. So say:
%# here I want hours in 0.25unit increments (like 9.5hrs)
new_timeinhr = 0:0.25:max(A_hours);
day_hour = rem(new_timeinhr, 24);
%# Here I want only prices between 9.5hours and 16hours
new_timeinhr( day_hour <= 9.2 | day_hour >= 16.1 ) = [];
I then create a vector of unique days and want to use a for loop with an if statement to interpolate day by day, and then stack my new prices in a vector one after the other:
days = unique(A_dates);
for j = 1:length(days);
if A_dates == days(j)
int_prices(j) = interp1(A_hours, A_prices, new_timeinhr);
end;
end;
My error is:
In an assignment A(I) = B, the number of elements in B and I must be the same.
How can I write the int_prices(j) to the stack?
I recommend converting your input to a single monotonic time value. Use the MATLAB datenum format, which represents one day as 1. There are plenty of advantages to this: You get the builtin MATLAB time/date functions, you get plot labels formatted nicely as date/time via datetick, and interpolation just works. Without test data, I can't test this code, but here's the general idea.
Based on your new information that dates are stored as 20080902 (I assume yyyymmdd), I've updated the initial conversion code. Also, since the layout of A is causing confusion, I'm going to refer to the columns of A as the vectors A_prices, A_hours, and A_dates.
% This datenum vector matches A. I'm assuming they're already sorted by date and time
At = datenum(num2str(A_dates), 'yyyymmdd') + datenum(0, 0, 0, A_hours, 0, 0);
incr = datenum(0, 0, 0, 0.25, 0, 0); % 0.25 hour
t = (At(1):incr:At(end)).'; % Full timespan of dataset, in 0.25 hour increments
frac_hours = 24*(t - floor(t)); % Fractional hours into the day
t_business_day = t((frac_hours > 9.4) & (frac_hours < 16.1)); % Time vector only where you want it
P = interp1(At, A_prices, t_business_day);
I repeat, since there's no test data, I can't test the code. I highly recommend testing the date conversion code by using datestr to convert back from the datenum to readable dates.
Converting days/hours to serial date numbers, as suggested by @Peter, is definitely the way to go. Based on his code (which I already upvoted), I present below a simple example.
First I start by creating some fake data resembling what you described (with some missing parts as well):
%# three days in increments of 1 hour
dt = datenum(num2str((0:23)','2012-06-01 %02d:00'), 'yyyy-mm-dd HH:MM'); %#'
dt = [dt; dt+1; dt+2];
%# price data corresponding to each hour
p = cumsum(rand(size(dt))-0.5);
%# show plot
plot(dt, p, '.-'), datetick('x')
grid on, xlabel('Date/Time'), ylabel('Prices')
%# lets remove some rows as missing
idx = ( rand(size(dt)) < 0.1 );
hold on, plot(dt(idx), p(idx), 'ro'), hold off
legend({'prices','missing'})
dt(idx) = [];
p(idx) = [];
%# matrix same as yours: days,prices,hours
ymd = str2double( cellstr(datestr(dt,'yyyymmdd')) );
hr = str2double( cellstr(datestr(dt,'HH')) );
A = [ymd p hr];
%# let clear all variables except the data matrix A
clearvars -except A
Next we interpolate the price data across the entire range in 15 minutes increments:
%# convert days/hours to serial date number
dt = datenum(num2str(A(:,[1 3]),'%d %d'), 'yyyymmdd HH');
%# create a vector of 15 min increments
t_15min = (0:0.25:(24-0.25))'; %#'
tt = datenum(0,0,0, t_15min,0,0);
%# offset serial date across all days
ymd = datenum(num2str(unique(A(:,1))), 'yyyymmdd');
tt = bsxfun(@plus, ymd', tt); %#'
tt = tt(:);
%# interpolate data at new datetimes
pp = interp1(dt, A(:,2), tt);
%# extract desired period of time from each day
idx = (9.5 <= t_15min & t_15min <= 16);
idx2 = bsxfun(@plus, find(idx), (0:numel(ymd)-1)*numel(t_15min));
P = pp(idx2(:));
%# plot interpolated data, and show extracted periods
figure, plot(tt, pp, '.-'), datetick('x'), hold on
plot([tt(idx2);nan(1,numel(ymd))], [pp(idx2);nan(1,numel(ymd))], 'r.-')
hold off, grid on, xlabel('Date/Time'), ylabel('Prices')
legend({'interpolated prices','period of 9:30 - 16:00'})
The two resulting plots show the original and the interpolated data.
I think I might have solved it this way:
new_timeinhr = 0:0.25:max(A(:,2));
day_hour = rem(new_timeinhr, 24);
new_timeinhr( day_hour <= 9.4 | day_hour >= 16.1 ) = [];
days=unique(A(:,1));
P=[];
for j=1:length(days);
condition=A(:,1)==days(j);
intprices = interp1(A(condition,2), A(condition,3), new_timeinhr);
P=vertcat(P,intprices');
end;
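A small runnable sketch of the per-day loop, using a made-up matrix A with columns date, hour, price (the values are not from the question). It collects each day's result in a cell array and stacks them at the end, which avoids growing P inside the loop; the logical mask replaces the if on a whole vector, which was the cause of the original error.

```matlab
% Made-up data: [date, hour, price], two days with two quotes each
A = [20080902  9.50 24.0;
     20080902 10.00 25.0;
     20080903  9.50 26.0;
     20080903 10.50 28.0];

wanted = (9.5:0.25:10)';              % hours at which prices are wanted
days = unique(A(:,1));
P = cell(numel(days), 1);             % one block of interpolated prices per day

for j = 1:numel(days)
    rows = A(:,1) == days(j);         % logical mask selecting this day's rows
    P{j} = interp1(A(rows,2), A(rows,3), wanted);
end

P = vertcat(P{:});                    % stack the daily blocks into one column
```

Hours outside a day's observed range come back as NaN from interp1 unless an extrapolation option is given.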

From UTC to date vector

I have a set of dates expressed in the form 2010-07-31T23:01:57Z. I need to transform them into date vectors. Normally datevec(mydate) converts a date string into a vector automatically, but the function doesn't accept a UTC date string. So I crudely resolved it this way:
a = '2010-07-31T23:01:57Z'; %UTC date format
a(20) = ''; %remove Z
a(11) = ' '; %replace T with a space
a = datevec(a); % [2010, 7, 31, 23, 1, 57]
This way a is a date vector and I can use etime(T1, T0) to compute the time difference between T1 and T0. Is this the only way, or is there something more elegant?
As Marc suggests, all you have to do is 'help' MATLAB a bit by specifying the data format.
datevec('2010-07-31T23:01:57Z','yyyy-mm-ddTHH:MM:SS')
should do the trick.
Use the fieldSpecIn part of the datevec() call. Read all about it here:
http://www.mathworks.com/help/techdoc/matlab_prog/bspgcx2-1.html#bspgc4m-1
Except for the TZ spec at the end of your string and the -'s and :'s, yours is identical to the standard ISO 8601 format. You can either modify yours, or create your own spec string.
Please note that Matlab time has no concept of time zone. You must keep track of that yourself.
As stated by others, you could specify the format of the expected date strings to functions like DATEVEC and DATENUM. Example:
%# set of dates and an initial date
T0 = '2010-01-01T00:00:00Z';
T = repmat({'2010-07-31T23:01:57Z'}, [10 1]); %# cell array
%# format of datetime
frmt = 'yyyy-mm-ddTHH:MM:SS';
%# convert to serial date numbers, and compute difference between all T and T0
n = datenum(T,frmt);
n0 = datenum(T0,frmt);
tdiff = bsxfun(@minus, n, n0);
Alternatively, if you have an unusual date format, you can always split and parse it yourself with functions like TEXTSCAN:
vec = zeros(numel(T),6);
for i=1:numel(T)
C = textscan(T{i}, '%f', 'Delimiter',['-' ':' 'T' 'Z']);
vec(i,:) = C{1};
end
n = datenum(vec);
Note that in both cases, you can convert back serial times to date strings with the DATESTR function.
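To tie this back to the etime use case in the question, a minimal sketch with two timestamps in the same format. T0 here is a made-up reference time, and the trailing Z is stripped explicitly before parsing, which is a cautious assumption rather than a requirement stated above.

```matlab
T0 = '2010-07-31T23:00:00Z';          % made-up reference time
T1 = '2010-07-31T23:01:57Z';          % timestamp from the question
frmt = 'yyyy-mm-ddTHH:MM:SS';         % format used in the answers above

% Drop the trailing 'Z' and parse into [Y M D H MN S] date vectors
v0 = datevec(strrep(T0, 'Z', ''), frmt);
v1 = datevec(strrep(T1, 'Z', ''), frmt);

% Elapsed time in seconds between the two date vectors
elapsed = etime(v1, v0);
```

Both strings are treated as being in the same time zone, since these functions carry no time zone information.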