I am using MATLAB 2021b and I have the following data:
ID = {'a','a','a','a','b','b','b','b'}';
DATE = [2010,2010,2011,2011,2011,2011,2012,2012]';
FIELD_ID = {'f1','f2','f1','f2','f1','f2','f1','f2'}';
VALUE = [1,2,5,6,1,1,7,8]';
T_before = table(ID,DATE,FIELD_ID,VALUE);
T_before =
8×4 table
ID DATE FIELD_ID VALUE
_____ ____ ________ _____
{'a'} 2010 {'f1'} 1
{'a'} 2010 {'f2'} 2
{'a'} 2011 {'f1'} 5
{'a'} 2011 {'f2'} 6
{'b'} 2011 {'f1'} 1
{'b'} 2011 {'f2'} 1
{'b'} 2012 {'f1'} 7
{'b'} 2012 {'f2'} 8
In reality the table is a lot longer and contains more fields. The latest DATE for a given ID can be different. The column FIELD_ID contains fields and their respective value is in the column VALUE.
What I would like to do is unstack this table in long format to have one row per ID with the fields as colums. I have one condition, I want only to unstack the rows containing the latest value in the field DATE. It should look like the following:
T_after =
2×4 table
ID DATE f1 f2
_____ ____ __ __
{'a'} 2011 5 6
{'b'} 2012 7 8
One for loop can do this quickly.
ID = {'a','a','a','a','b','b','b','b'}';
DATE = [2010,2010,2011,2011,2011,2011,2012,2012]';
FIELD_ID = {'f1','f2','f1','f2','f1','f2','f1','f2'}';
VALUE = [1,2,5,6,1,1,7,8]';
T_before = table(ID,DATE,FIELD_ID,VALUE);
% get unique IDs
id_unique = unique(ID);
n = numel(id_unique);
f1 = cell(1,n);
f2 = f1;
dates = f1;
for i = 1:n
%filter DATE by unique ID, then get the latest date
filter_id = ismember(ID, id_unique{i});
date_filtered = DATE(filter_id);
dates{i} = max(date_filtered);
%filter FIELD_ID and VALUE by unique ID and dates
filter_date = DATE == dates{i};
filter = filter_id & filter_date;
field_id_filtered = FIELD_ID(filter);
value_filtered = VALUE(filter);
%find F1 and F2
if strcmp(field_id_filtered{1}, 'f1')
f1{i} = value_filtered(1);
f2{i} = value_filtered(2);
else
f1{i} = value_filtered(2);
f2{i} = value_filtered(1);
end
end
%make the final table
T = table(id_unique(:),dates(:),f1(:),f2(:), 'VariableNames',{'ID','DATE','f1','f2'});
Related
I have a table in MATLAB with attributes in the first three columns and data from the fourth column onwards. I was trying to sort the entire table based on the first three columns. However, one of the columns (Column C) contains months ('January', 'February' ...etc). The sortrows function would only let me choose 'ascend' or 'descend' but not a custom option to sort by month. Any help would be greatly appreciated. Below is the code I used.
sortrows(Table, {'Column A','Column B','Column C'} , {'ascend' , 'ascend' , '???' } )
As #AnonSubmitter85 suggested, the best thing you can do is to convert your month names to numeric values from 1 (January) to 12 (December) as follows:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t.ColumnC = month(datenum(t.ColumnC,'mmmm'));
This will facilitate the access to a standard sorting criterion for your ColumnC too (in this example, ascending):
t = sortrows(t,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
If, for any reason that is unknown to us, you are forced to keep your months as literals, you can use a workaround that consists in sorting a clone of the table using the approach described above, and then applying to it the resulting indices:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t_original = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t_clone = t_original;
t_clone.ColumnC = month(datenum(t_clone.ColumnC,'mmmm'));
[~,idx] = sortrows(t_clone,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
t_original = t_original(idx,:);
In Matlab, I want to create a plot of the hourly prices "DataSeriesEl" (size is 744 x 1). They should go from January 1, 2008, 00:00:00 to January 31, 2008, 23:00:00. However, my code switches at January 7 to 05:59:59 - see below. Do you know what the problem is?
StartYearData = 2008;
StartMonthData = 1;
StartDayData = 1;
date(1) = datenum(StartYearData,StartMonthData,StartDayData,0,0,0);
for m = 2:length(DataSeriesEl)
date(m) = addtodate(date(m-1), 1, 'hour');
end
str = datestr(date)
I think this is a much simpler solution if you use this method , for this method you dont have to use a for loop
StartYearData = 2008;
StartMonthData = 1;
StartDayData = 1;
EndDayData = 31;
dates(1) = datenum(StartYearData,StartMonthData,StartDayData,0,0,0);
dates(2) = datenum(StartYearData,StartMonthData,EndDayData ,23,0,0);
myDateTime = datetime(dates, 'ConvertFrom', 'datenum')
hours = (myDateTime (2) - myDateTime (1) )/duration(1,0,0);
date = linspace(myDateTime(1),myDateTime(2),hours +1 )
i have problems regarding passing value with formatting issues.
The description i have put in the codes as i am having problem passing my PreEditedCheque in my code for the if ValidateMMonCheque <> MM part. The output for if length(RawChequenumber) = 15 will be in 1 digit instead of 00001 ( example)
MM = HostGetFLD('','MM')
YY = HostGetFLD('','YY')
PreEditedCheque = substr(RawChequenumber,11,5)
ValidateMMonCheque = substr(RawChequenumber,7,2)
if ValidateMMonCheque <> MM Then *From this statement*
Do
PreEditedCheque = substr('00000',1,5) *This part where those 0 can't be properly shown if pass to the next statement*
EditedCheque = '00'||'2'||'0'||YY||MM||'00'||PreEditedCheque
rc = message(2,2,EditedCheque)
End
if length(RawChequenumber) = 15 Then
EditedCheque = '00'||'2'||'0'||YY||MM||'00'||PreEditedCheque + 1 *Second statement if <>MM ran, this part, the PreEditedCheque will be not in 00001, it will be 1.
rc = PanSetCtlData('PREVIEW',EditedCheque)
What you're asking for is to have the cheque number padded to the left with zeroes in a 5-character field. The Right() function is your friend:
Right(PreEditedCheque, 5, '0') /* "1" -> "00001" */
I have a large table whose entries are
KEY_A,KEY_B,VAL
where KEY_A and KEY_B are finite sets of keys. For arguments sake, we'll have 4 different KEY_B values and 4 different KEY_A values. And example table:
KEY_A KEY_B KEY_C
_____ _____ _________
1 1 0.45054
1 2 0.083821
1 3 0.22898
1 4 0.91334
2 1 0.15238
2 2 0.82582
2 3 0.53834
2 4 0.99613
3 1 0.078176
3 2 0.44268
3 3 0.10665
3 4 0.9619
4 1 0.0046342
4 2 0.77491
4 3 0.8173
4 4 0.86869
4 5 1
I want to elegantly flatten the table into
KEY_A KEY_B_1 KEY_B_2 KEY_B_3 KEY_B_4 KEY_B_5
_____ _________ ________ _______ _______ _______
1 0.45054 0.083821 0.22898 0.91334 -1
2 0.15238 0.82582 0.53834 0.99613 -1
3 0.078176 0.44268 0.10665 0.9619 -1
4 0.0046342 0.77491 0.8173 0.86869 1
I'd like to be able to handle missing B values (set them to a default like -1), but I think if I get an elegant way to do this to start then such things will fall into place.
The actual table has millions of records, so I do want to use a vectorized call.
The line I've got (which doesn't handle int invalid 5) is:
cell2mat(arrayfun(#(x)[x,testtable{testtable.KEY_A==x,3}'],unique(testtable{:,1}),'UniformOutput',false))
But
it doesn't output a different table
If there are missing keys in the table, it doesn't handle that
I would think that this isn't that uncommon of an activity...has anyone done something like this before?
If the input table is T, then you could try this for the given case -
KEY_B_ =-1.*ones(max(T.KEY_A),max(T.KEY_B))
KEY_B_(sub2ind(size(KEY_B_),T.KEY_A,T.KEY_B)) = T.KEY_C
T1 = array2table(KEY_B_)
Output for the edited input -
T1 =
KEY_B_1 KEY_B_2 KEY_B_3 KEY_B_4 KEY_B_5
_________ ________ _______ _______ _______
0.45054 0.083821 0.22898 0.91334 -1
0.15238 0.82582 0.53834 0.99613 -1
0.078176 0.44268 0.10665 0.9619 -1
0.0046342 0.77491 0.8173 0.86869 1
Edit by MadScienceDreams: This answer lead me to write the following function, which will smash together pretty much any table based on the input keys. Enjoy!
function [ OT ] = flatten_table( T,primary_keys,secondary_keys,value_key,default_value )
%UNTITLED Summary of this function goes here
% Detailed explanation goes here
if nargin < 5
default_value = {NaN};
end
if ~iscell(default_value)
default_value={default_value};
end
if ~iscell(primary_keys)
primary_keys={primary_keys};
end
if ~iscell(secondary_keys)
secondary_keys={secondary_keys};
end
if ~iscell(value_key)
value_key={value_key};
end
primary_key_values = unique(T(:,primary_keys));
num_primary = size(primary_key_values,1);
[primary_key_map,primary_key_map] = ismember(T(:,primary_keys),primary_key_values);
secondary_key_values = unique(T(:,secondary_keys));
num_secondary = size(secondary_key_values,1);
[secondary_key_map,secondary_key_map] = ismember(T(:,secondary_keys),secondary_key_values);
%out =-1.*ones(max(T.KEY_A),max(T.KEY_B))
try
values = num2cell(T{:,value_key},2);
catch
values = num2cell(table2cell(T(:,value_key)),2);
end
if (~iscell(values))
values=num2cell(values);
end
OT=repmat(default_value,num_primary,num_secondary);
OT(sub2ind(size(OT),primary_key_map,secondary_key_map)) = values;
label_array = num2cell(cellfun(#(x,y)[x '_' mat2str(y)],...
repmat (secondary_keys,size(secondary_key_values,1),1),...
table2cell(secondary_key_values),'UniformOutput',false),1);
label_array = strcat(label_array{:});
OT = [primary_key_values,cell2table(OT,'VariableNames',label_array)];
end
I have a text file that contains timestamps out of a camera that captures 50 frames per second .. The data are as follows:
1 20931160389
2 20931180407
3 20931200603
4 20931220273
5 20931240360
.
.
50 20932139319
... and so on.
It gives also the starting time of capturing like
Date: **02.03.2012 17:57:01**
The timestamps are in microseconds not in milliseconds, and MATLAB can support only till milliseconds but its OK for me.
Now I need to know the human format of these timestamps for each row..like
1 20931160389 02.03.2012 17:57:01.045 % just an example
2 20931180407 02.03.2012 17:57:01.066
3 20931200603 02.03.2012 17:57:01.083
4 20931220273 02.03.2012 17:57:01.105
5 20931240360 02.03.2012 17:57:01.124
and so on
I tried this:
%Refernce Data
clc; format longg
refTime = [2012,03,02,17,57,01];
refNum = datenum(refTime);
refStr = datestr(refNum,'yyyy-mm-dd HH:MM:SS.FFF');
% Processing data
dn = 24*60*60*1000*1000; % Microseconds! I have changed this equation to many options but nothing was helpful
for i = 1 : size(Data,1)
gzTm = double(Data{i,2}); %timestamps are uint64
gzTm2 = gzTm / dn;
gzTm2 = refNum + gzTm2;
gzNum = datenum(gzTm2);
gzStr = datestr(gzNum,'yyyy-mm-dd HH:MM:SS.FFF'); % I can't use 'SS.FFFFFF'
fprintf('i = %d\t Timestamp = %f\t TimeStr = %s\n', i, gzTm, gzStr);
end;
But I got always strange outputs like
i = 1 Timestamp = 20931160389.000000 TimeStr = **2012-03-08 13:29:28.849**
i = 2 Timestamp = 20931180407.000000 TimeStr = **2012-03-08 13:29:29.330**
i = 3 Timestamp = 20931200603.000000 TimeStr = **2012-03-08 13:29:29.815**
The output time is about some hours late/earlier than the Referenced Time. The day is different.
The time gap between each entry in the array should be nearly 20 seconds..since I have 50 frames per second(1000 millisecond / 50 = 20) ..and the year,month, day,hour,minute and seconds should also indicate the initial time given as reference time because it is about some seconds earlier.
I expect something like:
% just an example
1 20931160389 02.03.2012 **17:57:01.045**
2 20931180407 02.03.2012 **17:57:01.066**
Could one help me please..! Where is my mistake?
It looks like you can work out the number of microseconds between a record and the first record:
usecs = double(Data{i,2}) - double(Data{1,2});
convert that into seconds:
secsDiff = usecs / 1e6;
then add that to the initial datetime you'd calculated:
matDateTime = refNum + secsDiff / (24*60*60);