How can I convert a cell array to a datetime in MATLAB?

I have a dataframe with a column of type cell.
In each cell is a date, written as 'Mmm-yyyy'. For example, 'Apr-1997' and 'Oct-2002'.
How can I turn the cell array into datetime format and then sort the dataframe on this date column?

Assuming that by "dataframe" you mean a table:
t = table; % example table
t.datestring = {'Apr-1997'; 'Oct-2002'; 'Jan-2000'};
t.other = [10; 20; 30];
This creates an example table:
t =
3×2 table
datestring other
__________ _____
'Apr-1997' 10
'Oct-2002' 20
'Jan-2000' 30
To create a new column with content of type datetime and sort rows based on that:
t.date = datetime(t.datestring); % create new column
[~, ind] = sort(t.date); % get index based on sorting that column (already datetime)
t_new = t(ind, :); % apply that index to the rows
This gives
t_new =
3×3 table
datestring other date
__________ _____ ___________
'Apr-1997' 10 01-Apr-1997
'Jan-2000' 30 01-Jan-2000
'Oct-2002' 20 01-Oct-2002
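As a minimal sketch of the same idea, the format can be stated explicitly and the sort done in one call with sortrows. The 'InputFormat' and 'Locale' values here are assumptions about the data (English month abbreviations), not something the question confirms:

```matlab
% Rebuild the example table from above
t = table;
t.datestring = {'Apr-1997'; 'Oct-2002'; 'Jan-2000'};
t.other = [10; 20; 30];

% Parse with an explicit format; 'en_US' is assumed because the
% month abbreviations in the question are English
t.date = datetime(t.datestring, 'InputFormat', 'MMM-yyyy', 'Locale', 'en_US');

% Sort the whole table by the new datetime column
t_sorted = sortrows(t, 'date');
```

Stating the format avoids relying on automatic format detection, which can depend on the system locale.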

Related

Converting table strings to datenum

I have a MATLAB table with strings and I would like to convert it to the datenum format. I tried:
datenum(Tbl)
but I received the error:
Error using datenum (line 181)
DATENUM failed.
Caused by:
Error using datevec (line 103)
The input to DATEVEC was not an array of character vectors.
Here is a sample of my Tbl:
Tbl = table;
Tbl.('A') = {'29/07/2017'; 0};
Tbl.('B') = {'29/07/2017'; '31/07/2017'};
Try varfun:
varfun(@datenum,Tbl)
to produce
datenum_A datenum_B
_________ _________
12791 12791
0 13521
Option 2
Alternatively, you can do them one column at a time like this:
Tbl.('A') = cellfun(@datenum,Tbl.('A'))
to produce
Tbl =
A B
_____ ____________
12791 '29/07/2017'
0 '31/07/2017'
Then you can do it for 'B', etc.
Convert the table to an array first and then apply datenum along with the date format. Including numbers in your data is unusual but, anyway, here is a solution:
numdate= table2array(Tbl); %Converting table to array
ind = cellfun(@ischar,numdate); %Finding logical indices of dates stored as char array
%Finding serial date number of dates; not doing anything on numeric values
numdate(ind)=cellfun(@(x) datenum(x, 'dd/mm/yyyy'), numdate(ind),'un',0); %Serial Datenums
%Converting back the date number serials into dates
dateback=numdate; dateback(ind)=cellfun(@datestr,numdate(ind),'un',0);
Output:
>> numdate
numdate =
[736905] [736905]
[ 0] [736907]
>> dateback
dateback =
'29-Jul-2017' '29-Jul-2017'
[ 0] '31-Jul-2017'
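If datetime values are acceptable instead of serial date numbers, the same masking idea can be sketched with datetime and NaT. This is only a sketch under that assumption; the names A_dt, B_dt and isText are made up for illustration. Note that datetime format strings use 'MM' for the month, unlike the 'mm' used by datenum:

```matlab
% Sample table from the question
Tbl = table;
Tbl.A = {'29/07/2017'; 0};
Tbl.B = {'29/07/2017'; '31/07/2017'};

% Column B holds only text, so it converts in one call
B_dt = datetime(Tbl.B, 'InputFormat', 'dd/MM/yyyy');

% Column A mixes text and numbers: convert the text entries and
% leave the rest as NaT (not-a-time)
isText = cellfun(@ischar, Tbl.A);
A_dt = NaT(size(Tbl.A));
A_dt(isText) = datetime(Tbl.A(isText), 'InputFormat', 'dd/MM/yyyy');
```

Serial date numbers are still available afterwards via datenum(B_dt) if they are really needed.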

Matlab, converting datetime array in matrix

I have a datetime array that highlights the peaks of a function "datepeak", for every day in one year. I obtained it using a datetime array "date" and the array with the position of the peaks "position".
t1 = datetime(year,1,1,0,0,0);
t2 = datetime(year,12,31,23,59,0);
date = t1:minutes(1):t2;
datepeak=date(position);
I need to take the n peaks for day 1 and place them in the first row of a matrix, and so on for every day.
Since the number of peaks is not constant (min 3, max 4), I tried to initialize the matrix like this:
matrix=NaN(365,4)
Then I overwrite the NaNs in every row with this double for loop:
for i=1:365
v=datepeak(day(datepeak,'dayofyear')==i);
for c=1:length(v)
matrix(i,c)=(v(c));
end
end
This loop works (I tried it with the peaks), but with datetime I get an error.
Here's an example to paste:
year=2016;
position=[128 458 950];
t1 = datetime(year,1,1,0,0,0);
t2 = datetime(year,12,31,23,59,0);
date = t1:minutes(1):t2;
datepeak=date(position);
matrix=NaN(365,4);
for i=1:365
v=datepeak(day(datepeak,'dayofyear')==i);
for c=1:length(v)
matrix(i,c)=(v(c));
end
end
The NaN array is of class double whereas datepeak is of class datetime, so you can't store them in the same array. The way you represent your data should be driven by what you want to do with it later (and what is feasible). In your case, I'll assume that a list of 365 elements, each containing the peak times of that day (however many there are), is OK.
year=2016;
position=[128 458 950];
t1 = datetime(year,1,1,0,0,0);
t2 = datetime(year,12,31,23,59,0);
date = t1:minutes(1):t2;
datepeak=date(position);
peaktimes_list = cell(365,1);
for i=1:365
peaktimes_list{i} = datepeak(day(datepeak,'dayofyear')==i);
end
EDIT: For a 365x4 cell array, replace the last part with:
peaktimes = cell(365,4);
for i=1:365
v = datepeak(day(datepeak,'dayofyear')==i);
nv = numel(v);
peaktimes(i,1:nv) = num2cell(v);
end
When there are fewer than 4 values, the remaining columns will be empty.
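If a real matrix is preferred over a cell array, one possible sketch is to preallocate with NaT (the datetime counterpart of NaN) instead of NaN, assuming a release that provides the NaT function. The names yr, dates, nDays and peakmatrix are chosen here for illustration (yr and dates avoid shadowing the built-in year and date functions):

```matlab
yr = 2016;
position = [128 458 950];
t1 = datetime(yr,1,1,0,0,0);
t2 = datetime(yr,12,31,23,59,0);
dates = t1:minutes(1):t2;
datepeak = dates(position);

% Number of days in the year (366 for 2016, which is a leap year)
nDays = day(t2, 'dayofyear');

% Preallocate a datetime matrix filled with NaT
peakmatrix = NaT(nDays, 4);
for i = 1:nDays
    v = datepeak(day(datepeak, 'dayofyear') == i);
    peakmatrix(i, 1:numel(v)) = v; % unused columns stay NaT
end
```

Unfilled entries can then be detected with isnat, much as isnan would be used on a numeric matrix.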

Unify timestamps as date strings

MATLAB R2015b
I have a table containing a date string and a time string in various formats in two columns for each row:
11.01.2016 | 00:00:00 | data
10/19/16 | 05:29:00 | data
12.02.16 | 06:40 | data
I want to convert these two columns to one column with a common format:
31.12.2017 14:00:00
My current solution uses a loop over each row and combines the columns as strings, checks for the various formats to use datetime with an appropriate format string and then uses datestr with the desired format string. Datetime was not able to automatically determine the format of the input string.
As you can imagine, this is horribly slow for large tables (approx. 50000 rows).
Is there any faster solution?
Thanks in advance.
I tried to vectorize the code. The trick is to
convert tables > cell > char-array, then
manipulate char strings, then
convert back from char-array > cell > table
Also, an important step is to pad all cells that have shorter lengths with the 'null' character in a vectorized way. Without this, it is not possible to convert from cell > char-array. Here is the code.
clc
clear all
%% create Table T
d={'11.01.2016';
'10/19/16';
'12.02.16'};
t={'00:00:00';
'05:29:00';
'06:40'};
dat=[123;
456;
789];
T = table(d,t,dat);
%% deal with dates in Table T
% separate date column and convert to cell
dd = table2cell(T(:,1));
% equalize the lengths of all elements of cell
% by padding 'null' in end of shorter dates
nmax=max(cellfun(@numel,dd));
func = @(x) [x,zeros(1,nmax-numel(x))];
temp1 = cellfun(func,dd,'UniformOutput',false);
% convert to array for vectorized manipulation of char strings
ddd=cell2mat(temp1);
% replace the separators in 3rd and 6th location with '.' (period)
ddd(:,[3 6]) = repmat(['.' '.'], length(dd),1);
% find indexes of shorter dates
short_year_idx = find(uint16(ddd(:,nmax)) == 0);
% find the year value for those short_year cases
yy = ddd(short_year_idx,[7 8]);
% replace null chars with '20XX' string in desired place
ddd(short_year_idx,7:nmax) = ...
[repmat('20',size(short_year_idx,1),1) yy];
% convert char array back to cell and replace in table
dddd = mat2cell(ddd,ones(1,size(d,1)),nmax);
T(:,1) = table(dddd);
%% deal with times in Table T
% separate time column and convert to cell
tt = table2cell(T(:,2));
% equalize the lengths of all elements of cell
% by padding 'null' in end of shorter times
nmax=max(cellfun(@numel,tt));
func = @(x) [x,zeros(1,nmax-numel(x))];
temp1 = cellfun(func,tt,'UniformOutput',false);
% convert to array for vectorized manipulation of char strings
ttt=cell2mat(temp1);
% find indexes of shorter times (assuming only ':00' in end is missing)
short_time_idx = find(uint16(ttt(:,nmax)) == 0);% dirty hack, as null=0 in ascii
% replace null chars with ':00' string
ttt(short_time_idx,[6 7 8]) = repmat(':00',size(short_time_idx,1),1);
% convert char array back to cell and replace in table
tttt = mat2cell(ttt,ones(1,size(t,1)),nmax);
T(:,2) = table(tttt);
If you call the two columns cell arrays c1 and c2, then something like this should work:
c = datestr(datenum(strcat(c1,{' '},c2)), 'dd.mm.yyyy HH:MM:SS')
Then you would need to drop the old columns and put this one c in their place. On the inside, datenum must be doing something similar to what you're doing, however, so I'm not sure if this will be faster. I suspect that it is because (we can hope) the standard functions are optimized.
If your table isn't representing those as cell arrays, then you may need to do a pre-processing step to form the cell arrays for strcat.
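Another way to avoid the per-row loop, sketched here under assumptions, is to classify the date strings by pattern and parse each group with one vectorized datetime call. The three patterns and formats below are guesses based only on the three sample rows (in particular, that dotted dates are day.month.year and slashed dates are month/day/year), and the variable names are made up for illustration:

```matlab
% Sample columns from the question
d = {'11.01.2016'; '10/19/16'; '12.02.16'};
t = {'00:00:00'; '05:29:00'; '06:40'};

% Append missing seconds so every time reads HH:mm:ss
short = cellfun(@numel, t) == 5;
t(short) = strcat(t(short), ':00');
s = strcat(d, {' '}, t);

% One regular expression and one datetime format per date layout (assumed)
patterns = {'^\d{2}\.\d{2}\.\d{4}$', '^\d{2}/\d{2}/\d{2}$', '^\d{2}\.\d{2}\.\d{2}$'};
formats  = {'dd.MM.yyyy HH:mm:ss',   'MM/dd/yy HH:mm:ss',   'dd.MM.yy HH:mm:ss'};

dt = NaT(size(d)); % rows matching no pattern stay NaT
for k = 1:numel(patterns)
    mask = ~cellfun(@isempty, regexp(d, patterns{k}, 'once'));
    if any(mask)
        dt(mask) = datetime(s(mask), 'InputFormat', formats{k});
    end
end

% Render everything in the common target format
dt.Format = 'dd.MM.yyyy HH:mm:ss';
out = cellstr(dt);
```

The loop runs once per format rather than once per row, so the cost should grow with the number of distinct layouts, not with the 50000 rows.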

How to use the Percentile Function for Ranking purpose? (Matlab)

I have the following 606 x 274 table:
see here
Goal:
For every date calculate lower and upper 20% percentiles and, based on the outcome, create 2 new variables, e.g. 'L' for "lower" and 'U' for "upper", which contain the ticker names as seen in the header of the table.
Step by step:
% Replace NaNs with 'empty' for the percentile calculation (error: input to be cell array)
T(cellfun(@isnan,T)) = {[]}
% Change date format
T.Date=[datetime(T.Date, 'InputFormat', 'eee dd-MMM-yyyy')];
% Take row by row
for row=1:606
% If Value is in upper 20% percentile create new variable 'U' that contains the according ticker names.
% If Value is in lower 20% percentile create new variable 'L' that contains the according ticker names.
end;
So far, experimenting with 'prctile' only yielded a numeric outcome, for a single column. Example:
Y = prctile(T.A2AIM,20,2);
Thanks for your help and ideas!
Generally speaking, if you have an array of numbers:
a = [4 2 1 8 -2];
percentiles can be computed by first sorting the array and then attempting to access the index supplied in the percentile. So prctile(a,20)'s functionality could in principle be replaced by
b = sort(a);
ind = round(length(b)*20/100);
if ind==0
ind = 1;
end
b = b(ind);
% b = -2
However, prctile does a bit more fancy magic, interpolating the input vector to get a value that is less affected by array size. Still, you can use the idea above to find the percentile-splitting columns. If you choose to do it as described above, then to get the headers that correspond to the 20% and 80% percentiles you loop through the rows, remove the NaNs, get the indices of the sort on the remaining values and pick out the indices at or beyond the 20% or 80% position. Regrettably, I have an old version of Matlab that does not support tables so I couldn't verify if the header names are returned correctly, but the idea should be clear.
L = cell(size(T,1),1);
U = cell(size(T,1),1);
for row=1:size(T,1)
row_values = T{row,2:end}; % Exclude the date column (dates and numbers cannot be concatenated)
non_nan_indeces = find(~isnan(row_values));
if not(isempty(non_nan_indeces))
[row_values,sorted_indeces] = sort(row_values(non_nan_indeces));
% The +1 is because we removed the date column
L_ind = non_nan_indeces(sorted_indeces(1:round(0.2*length(row_values))))+1;
U_ind = non_nan_indeces(sorted_indeces(round(0.8*length(row_values)):end))+1;
% I am unsure about this part
L{row} = T.Properties.VariableNames(L_ind);
U{row} = T.Properties.VariableNames(U_ind);
else
L{row} = nan;
U{row} = nan;
end
end;
If you want to use matlab's prctile, you would have to find the returned value's index doing something like this:
L = cell(size(T,1),1);
U = cell(size(T,1),1);
for row=1:size(T,1)
row_values = T{row,2:end}; % Exclude the date column (dates and numbers cannot be concatenated)
non_nan_indeces = find(~isnan(row_values));
if not(isempty(non_nan_indeces))
[row_values,sorted_indeces] = sort(row_values(non_nan_indeces));
% row_values now holds only the sorted non-NaN values, so use it directly
L_val = prctile(row_values,20);
U_val = prctile(row_values,80);
% The +1 is because we removed the date column
L_ind = non_nan_indeces(sorted_indeces(find(row_values<=L_val)))+1;
U_ind = non_nan_indeces(sorted_indeces(find(row_values>=U_val)))+1;
% I am unsure about this part
L{row} = T.Properties.VariableNames(L_ind);
U{row} = T.Properties.VariableNames(U_ind);
else
L{row} = nan;
U{row} = nan;
end
end;
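Since the header lookup above could not be verified by the answerer, here is a small self-contained sketch of the sort-based variant on a made-up 2-row table. The table contents and ticker names A to E are invented for illustration, and the Date column is a plain cell array of strings to keep the example short:

```matlab
% Toy table: one date column followed by five "ticker" columns
T = table({'d1'; 'd2'}, [4; NaN], [2; 5], [1; 3], [8; NaN], [-2; 1], ...
    'VariableNames', {'Date', 'A', 'B', 'C', 'D', 'E'});

L = cell(height(T), 1);
U = cell(height(T), 1);
for row = 1:height(T)
    vals = T{row, 2:end};            % numeric columns only
    keep = find(~isnan(vals));       % positions of non-NaN values
    [~, order] = sort(vals(keep));   % ascending order of the kept values
    n = numel(keep);
    nLow  = round(0.2 * n);          % how many names fall in the lower 20%
    iHigh = max(round(0.8 * n), 1);  % first sorted position of the upper group
    % +1 shifts past the Date column when indexing the variable names
    L{row} = T.Properties.VariableNames(keep(order(1:nLow)) + 1);
    U{row} = T.Properties.VariableNames(keep(order(iHigh:end)) + 1);
end
```

For the first row this puts E in the lower group and A and D in the upper group, which suggests that indexing T.Properties.VariableNames this way does return the matching headers.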

Add string headers to double columns Matlab

I have a double matrix A of size 10x10 and a string array of size 1x10. I want to replace the first row of A with this array of strings so that they become the headers of the columns, and do the same with the first column. Could anyone please advise how this can be done in Matlab?
For MATLAB R2013b or higher
If you have MATLAB R2013b or higher, you can use the array2table function to present the values in your desired format. Let's assume that your matrix is stored in A, your row headers are in a cell array called row, and your column headers are in a cell array called col. Then try this:
T = array2table(A, 'RowNames', row, 'VariableNames', col);
Here's an example:
>> A = [1 12 30.48; 2 24 60.96; 3 36 91.44]
>> col = {'Feet', 'Inches', 'Centimeters'};
>> row = {'Number 1', 'Number 2', 'Number 3'};
>> T = array2table(A, 'RowNames', row, 'VariableNames', col)
T =
Feet Inches Centimeters
____ ______ ___________
Number 1 1 12 30.48
Number 2 2 24 60.96
Number 3 3 36 91.44
For MATLAB R2013a or lower
If you have R2013a or lower, you have no choice but to use a cell array for this. You can only achieve mixed data types in a matrix with a cell array. What you'll need to do is convert each number into an individual cell in a cell array. I'm going to introduce you to an undocumented function: sprintfc. You are able to print matrices directly to cell arrays.
Therefore, try doing this, assuming that row contains your strings in a cell array for the row header of size 1 x N and col contains your strings in a cell array for the column header of size 1 x N. With your matrix A:
Acell = sprintfc('%f', A); %// Convert matrix to cells
out = [' ', row; col.', Acell]; %// Generate final matrix
out contains your desired matrix. Here's an example:
>> A = [1 12 30.48; 2 24 60.96; 3 36 91.44];
>> Acell = sprintfc('%f', A);
>> row = {'Feet', 'Inches', 'Centimeters'};
>> col = {'Number 1', 'Number 2', 'Number 3'};
>> out = [' ', row; col.', Acell]
out =
' ' 'Feet' 'Inches' 'Centimeters'
'Number 1' '1.000000' '12.000000' '30.480000'
'Number 2' '2.000000' '24.000000' '60.960000'
'Number 3' '3.000000' '36.000000' '91.440000'
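If relying on an undocumented function is a concern, a similar layout can be sketched with the documented num2cell function. The difference, under this approach, is that the numbers stay numeric inside their cells instead of being turned into strings:

```matlab
A = [1 12 30.48; 2 24 60.96; 3 36 91.44];
row = {'Feet', 'Inches', 'Centimeters'};       % headers across the top
col = {'Number 1', 'Number 2', 'Number 3'};    % headers down the side

% num2cell wraps each element of A in its own cell
out = [{' '}, row; col.', num2cell(A)];
```

Keeping the values numeric means out{2,4} can still be used in arithmetic, at the cost of less control over how the numbers are displayed.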