I have a MATLAB table with strings and I would like to convert it to the datenum format. I tried:
datenum(Tbl)
but I received the error:
Error using datenum (line 181)
DATENUM failed.
Caused by:
Error using datevec (line 103)
The input to DATEVEC was not an array of character vectors.
Here is a sample of my Tbl:
Tbl = table;
Tbl.('A') = {'29/07/2017'; 0};
Tbl.('B') = {'29/07/2017'; '31/07/2017'};
Try varfun:
varfun(@datenum,Tbl)
to produce
datenum_A datenum_B
_________ _________
12791 12791
0 13521
Option 2
Alternatively, you can do them one column at a time like this:
Tbl.('A') = cellfun(@datenum,Tbl.('A'))
to produce
Tbl =
A B
_____ ____________
12791 '29/07/2017'
0 '31/07/2017'
Then you can do it for 'B', etc.
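To cover every column in one pass, a minimal sketch could loop over the variable names. It assumes the dates are written as dd/mm/yyyy and that numeric entries such as 0 should be passed through unchanged; both are assumptions, not something the question states.

```matlab
% Sample table from the question
Tbl = table;
Tbl.A = {'29/07/2017'; 0};
Tbl.B = {'29/07/2017'; '31/07/2017'};

names = Tbl.Properties.VariableNames;
for k = 1:numel(names)
    col = Tbl.(names{k});              % cell column, possibly mixed char/numeric
    out = zeros(size(col));
    for r = 1:numel(col)
        if ischar(col{r})
            % Assumed format: day/month/year
            out(r) = datenum(col{r}, 'dd/mm/yyyy');
        else
            out(r) = col{r};           % leave numeric entries as they are
        end
    end
    Tbl.(names{k}) = out;              % replace the cell column with doubles
end
```

Passing the format explicitly avoids relying on datenum guessing whether the first field is the day or the month.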
Convert the table to an array first and then apply datenum along with the date format. Including numbers in your data is odd but, anyway, here is a solution:
numdate= table2array(Tbl); %Converting table to array
ind = cellfun(@ischar,numdate); %Finding logical indices of dates stored as char array
%Finding serial date number of dates; not doing anything on numeric values
numdate(ind)=cellfun(@(x) datenum(x, 'dd/mm/yyyy'), numdate(ind),'un',0); %Serial Datenums
%Converting back the date number serials into dates
dateback=numdate; dateback(ind)=cellfun(@datestr,numdate(ind),'un',0);
Output:
>> numdate
numdate =
[736905] [736905]
[ 0] [736907]
>> dateback
dateback =
'29-Jul-2017' '29-Jul-2017'
[ 0] '31-Jul-2017'
Related
I have a dataframe with a column of type cell.
In each cell is a date, written as 'Mmm-yyyy'. For example, 'Apr-1997' and 'Oct-2002'.
How can I turn the cell array into datetime format, and then sort the dataframe on this date column.
Assuming that by "dataframe" you mean a table:
t = table; % example table
t.datestring = {'Apr-1997'; 'Oct-2002'; 'Jan-2000'};
t.other = [10; 20; 30];
This creates an example table:
t =
3×2 table
datestring other
__________ _____
'Apr-1997' 10
'Oct-2002' 20
'Jan-2000' 30
To create a new column with content of type datetime and sort rows based on that:
t.date = datetime(t.datestring); % create new column
[~, ind] = sort(t.date); % get index based on sorting that column
t_new = t(ind, :); % apply that index to the rows
This gives
t_new =
3×3 table
datestring other date
__________ _____ ___________
'Apr-1997' 10 01-Apr-1997
'Jan-2000' 30 01-Jan-2000
'Oct-2002' 20 01-Oct-2002
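As a variation, sortrows can sort the table directly on the new column. The sketch below assumes the strings use English month abbreviations in the form 'MMM-yyyy'; the 'InputFormat' and 'Locale' arguments are additions made for this sketch and are not part of the answer above.

```matlab
t = table;                                    % example table
t.datestring = {'Apr-1997'; 'Oct-2002'; 'Jan-2000'};
t.other = [10; 20; 30];

% Parse with an explicit format so nothing is left to auto-detection
t.date = datetime(t.datestring, 'InputFormat', 'MMM-yyyy', 'Locale', 'en_US');

% Sort all rows by the datetime column
t_new = sortrows(t, 'date');
```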
MATLAB R2015b
I have a table containing a date string and a time string in various formats in two columns for each row:
11.01.2016 | 00:00:00 | data
10/19/16 | 05:29:00 | data
12.02.16 | 06:40 | data
I want to convert these two columns to one column with a common format:
31.12.2017 14:00:00
My current solution uses a loop over each row and combines the columns as strings, checks for the various formats to use datetime with an appropriate format string and then uses datestr with the desired format string. Datetime was not able to automatically determine the format of the input string.
As you can imagine, this is horribly slow for large tables (approx. 50000 rows).
Is there any faster solution?
Thanks in advance.
I gave a try to vectorize the code. The trick is to
convert tables > cell > char-array, then
manipulate char strings, then
convert back from char-array > cell > table
Also, there is an important step: all cells with shorter lengths must be padded with the 'null' character in a vectorized way. Without this, it is not possible to convert from cell > char-array. Here is the code.
clc
clear all
%% create Table T
d={'11.01.2016';
'10/19/16';
'12.02.16'};
t={'00:00:00';
'05:29:00';
'06:40'};
dat=[123;
456;
789];
T = table(d,t,dat);
%% deal with dates in Table T
% separate date column and convert to cell
dd = table2cell(T(:,1));
% equalize the lengths of all elements of cell
% by padding 'null' in end of shorter dates
nmax=max(cellfun(@numel,dd));
func = @(x) [x,zeros(1,nmax-numel(x))];
temp1 = cellfun(func,dd,'UniformOutput',false);
% convert to array for vectorized manipulation of char strings
ddd=cell2mat(temp1);
% replace the separators in 3rd and 6th location with '.' (period)
ddd(:,[3 6]) = repmat(['.' '.'], length(dd),1);
% find indexes of shorter dates
short_year_idx = find(uint16(ddd(:,nmax)) == 0);
% find the year value for those short_year cases
yy = ddd(short_year_idx,[7 8]);
% replace null chars with '20XX' string in desired place
ddd(short_year_idx,7:nmax) = ...
[repmat('20',size(short_year_idx,1),1) yy];
% convert char array back to cell and replace in table
dddd = mat2cell(ddd,ones(1,size(d,1)),nmax);
T(:,1) = table(dddd);
%% deal with times in Table T
% separate time column and convert to cell
tt = table2cell(T(:,2));
% equalize the lengths of all elements of cell
% by padding 'null' in end of shorter times
nmax=max(cellfun(@numel,tt));
func = @(x) [x,zeros(1,nmax-numel(x))];
temp1 = cellfun(func,tt,'UniformOutput',false);
% convert to array for vectorized manipulation of char strings
ttt=cell2mat(temp1);
% find indexes of shorter times (assuming only ':00' at the end is missing)
short_time_idx = find(uint16(ttt(:,nmax)) == 0);% dirty hack, as null=0 in ascii
% replace null chars with ':00' string
ttt(short_time_idx,[6 7 8]) = repmat(':00',size(short_time_idx,1),1);
% convert char array back to cell and replace in table
tttt = mat2cell(ttt,ones(1,size(t,1)),nmax);
T(:,2) = table(tttt);
If you call the two columns cell arrays c1 and c2, then something like this should work:
c = datestr(datenum(strcat(c1,{' '},c2)), 'dd.mm.yyyy HH:MM:SS')
Then you would need to drop the old columns and put this one c in their place. On the inside, datenum must be doing something similar to what you're doing, however, so I'm not sure if this will be faster. I suspect that it is because (we can hope) the standard functions are optimized.
If your table isn't representing those as cell arrays, then you may need to do a pre-processing step to form the cell arrays for strcat.
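Because the sample rows mix several date layouts, one way to stay vectorized is to group rows by layout and parse each group with its own format. The sketch below is only an illustration under stated assumptions: dotted dates are read as day.month.year, slashed dates as month/day/year, two-digit years fall in the current century, and a 5-character time is only missing its seconds. The mask names are hypothetical.

```matlab
c1 = {'11.01.2016'; '10/19/16'; '12.02.16'};
c2 = {'00:00:00'; '05:29:00'; '06:40'};

% Append ':00' to times that lack seconds (assumed to be exactly 'HH:MM')
short = cellfun(@numel, c2) == 5;
c2(short) = strcat(c2(short), ':00');
s = strcat(c1, {' '}, c2);

% One logical mask per assumed layout
isDot4  = ~cellfun(@isempty, regexp(s, '^\d{2}\.\d{2}\.\d{4} ', 'once'));
isDot2  = ~cellfun(@isempty, regexp(s, '^\d{2}\.\d{2}\.\d{2} ', 'once'));
isSlash = ~cellfun(@isempty, regexp(s, '^\d{2}/\d{2}/\d{2} ', 'once'));

% Parse each group in a single vectorized call
dt = NaT(size(s));
dt(isDot4)  = datetime(s(isDot4),  'InputFormat', 'dd.MM.yyyy HH:mm:ss');
dt(isDot2)  = datetime(s(isDot2),  'InputFormat', 'dd.MM.yy HH:mm:ss');
dt(isSlash) = datetime(s(isSlash), 'InputFormat', 'MM/dd/yy HH:mm:ss');

% Common output format
dt.Format = 'dd.MM.yyyy HH:mm:ss';
c = cellstr(dt);
```

Rows that match none of the masks stay NaT, which makes unexpected layouts easy to spot with isnat(dt).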
I'm trying to convert a character vector (200,000 rows) into Matlab serial numbers.
The format is '01/07/2015 00:00:59'.
This takes an incredibly long time, and online I can only find tips for solving this in Matlab. Any ideas how I can improve this?
You can use the datenum(datevector) type of input for datenum.
It is much faster than the string parsing. I frequently use this trick whenever I have to import long date/time data (which is nearly every day).
It consists in sending a mx6 (or mx3) matrix, containing values representing [yy mm dd HH MM SS]. The matrix should be of type double.
It means instead of letting Matlab/Octave do the parsing, you read all the numbers in the string with your favourite way (textscan, fscanf, sscanf, ...), then you send numbers to datenum instead of string.
In the example below I generated a long array (86401x19) of date string as sample data:
>> strDate(1:5,:)
ans =
31/07/2015 15:10:13
31/07/2015 15:10:14
31/07/2015 15:10:15
31/07/2015 15:10:16
31/07/2015 15:10:17
To convert that to datenum faster than by the conventional way, I use:
strDate = [strDate repmat(' ',size(strDate,1),1)] ; %// add a whitespace at the end of each line
M = textscan( strDate.' , '%f/%f/%f %f:%f:%f' ) ; %'// read each value independently
M = cell2mat(M) ; %// convert to matrix
M = M(:,[3 2 1 4 5 6]) ; %// reorder columns
dt = datenum(M ) ; %// convert to serial date
This should speed things up in Matlab, and I am fairly sure it should improve things in Octave too. To quantify that, at least in Matlab, here's a quick benchmark:
function test_datenum
d0 = now ;
d = (d0:1/3600/24:d0+1).' ; %// 1 day worth of date (one per second)
strDate = datestr(d,'dd/mm/yyyy HH:MM:SS') ; %'// generate the string array
fprintf('Time with automatic date parsing: %f\n' , timeit(@() datenum_auto(strDate)) )
fprintf('Time with customized date parsing: %f\n', timeit(@() datenum_preparsed(strDate)) )
function dt = datenum_auto(strDate)
dt = datenum(strDate,'dd/mm/yyyy HH:MM:SS') ; %// let Matlab/Octave do the parsing
function dt = datenum_preparsed(strDate)
strDate = [strDate repmat(' ',size(strDate,1),1)] ; %// add a whitespace at the end of each line
M = textscan( strDate.' , '%f/%f/%f %f:%f:%f' ) ; %'// read each value independently
M = cell2mat(M) ; %// convert to matrix
M = M(:,[3 2 1 4 5 6]) ; %// reorder columns
dt = datenum(M ) ; %// convert to serial date
On my machine, it yields:
>> test_datenum
Time with automatic date parsing: 0.614698
Time with customized date parsing: 0.073633
Of course you could also compact the code in a couple of lines:
M = cell2mat(textscan([strDate repmat(' ',size(strDate,1),1)].','%f/%f/%f %f:%f:%f')) ;
dt = datenum( M(:,[3 2 1 4 5 6]) ) ;
But I tested it and the improvement is so marginal that it is not really worth the loss of readability.
I have a text file which contains binary data in the following manner:
00000000000000000000000000000000001011111111111111111111111111111111111111111111111111111111110000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111000111100000000000000000000000000000000
00000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111000111110000000000000000000000000000000
00000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111110000000000000000000000000000000
00000000000000000000000000000000000000000000111111111111111111111111111111111111110000000011100000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111100111110000000000000000000000000000000
00000000000000000000000000000000000111111111111111111111111111111111111111111111111111110111110000000000000000000000000000000
00000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000000001111111111111111111111111111111111111111111111000011100000000000000000000000000000000
00000000000000000000000000000000000000001111111111111111111111111111111111111111111111000011100000000000000000000000000000000
00000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111000000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111110000011100000000000000000000000000000000
00000000000000000000000000000000000000000000011111111111111111111111111111111111100000000011100000000000000000000000000000000
00000000000000000000000000000000000000111111111111111111111111111111111111111111111111110111100000000000000000000000000000000
Please note that each 1 or 0 is independent, i.e. the values are not decimal numbers. I need to find the column-wise sum of the file. There are 125 columns in all and there are 840946 rows.
I have tried textread, fscanf and a few other matlab commands, but the result is that they all read each row in decimal format and create a 840946x1 array. I want to create a 840946x125 matrix to compute a column wise sum.
You can use textread to do it. Just read strings and later process them with sscanf, one digit at a time
A = textread('data.txt', '%s');
ncols = size(A, 1);
nrows = size(A{1}, 2);
A = reshape(sscanf([A{:}], '%1d'), nrows, ncols);
Note that A is now transposed, i.e. it has 125 rows, one for each column of the file.
The column-wise sum of the original data is therefore the sum along the second dimension:
colsum = sum(A, 2);
Here's a slightly hack-ish approach:
A = textread('data.txt', '%s');
colsum = sum(cat(1,A{:})-'0')
Breakdown:
textread will read each line of 0's and 1's as a single string. A will therefore be a cell-string, with each element equal to a string of length 125.
cat(1,A{:}) will concatenate the cell string into a "normal" Matlab character array of size 840946-by-125.
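Subtracting the character '0' (ASCII value 48) from any character array consisting of 0's and 1's will return their numeric representation, e.g. '1'-'0' = 1. This works because arithmetic on characters uses their ASCII values; for example, 'a'-0 = 97, the ASCII value for lower-case 'a'.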
sum will finally sum over the columns of this array.
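The breakdown above can be checked on a tiny in-memory example. The three-line cell array below is made-up sample data standing in for what textread would return:

```matlab
% Made-up stand-in for the cell-string returned by textread
A = {'0101'; '0011'; '1111'};

C = cat(1, A{:});          % 3-by-4 char array
N = C - '0';               % 3-by-4 double array of 0s and 1s
colsum = sum(N, 1);        % sum down each column -> [1 2 2 3]
```

Passing the dimension to sum explicitly keeps the result column-wise even if the file happens to contain a single line.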
I want to do something like
scatter(timesRefined, upProb)
where timesRefined is a cell array in which each entry is a string corresponding to a time moment, such as 8:32:21.122 and upProb is simply a vector of numbers with same length as cell array. What is the most convenient way to do this?
You can convert your timesRefined cell to a numeric representation of date with datenum
>> timesRefined = {'8:32:21.122','9:30:54.123'};
>> datenum(timesRefined)
ans =
734869.355800023
734869.396459757
The resulting number expresses a date as days from the epoch. Since you are not concerned with days, just time, and provided your observations are contained within one day, you can simply take the fractional part of the datenum output:
>> datestr(mod(datenum(timesRefined),1))
ans =
8:32 AM
9:30 AM
and do scatter(mod(datenum(timesRefined),1),upProb)
EDIT:
As pointed out by Pursuit, you can use the result of datenum directly as your x values and use datetick('x','HH:MM:SS.FFF')
strsplit from the Matlab file exchange should help. If all values are numeric, you'll get a matrix back.
timestr = '8:32:21.122';
timenum = strsplit(timestr,':');
convmat = [60*60; 60; 1];
time_in_seconds = sum(timenum .* convmat);
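If only the built-in strsplit is available, it returns a cell array of character vectors rather than a matrix, so the pieces need an explicit numeric conversion. A minimal sketch, assuming the time is always in H:MM:SS.FFF form:

```matlab
timestr = '8:32:21.122';

% Built-in strsplit gives {'8','32','21.122'}; str2double turns it into [8 32 21.122]
parts = str2double(strsplit(timestr, ':'));

% Row vector times column vector gives the total number of seconds
convmat = [60*60; 60; 1];
time_in_seconds = parts * convmat;
```

The resulting values can be used directly as the x data for scatter.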