I have two tables. The first table has columns id, start_date, and end_date; the second table has columns id, timestamp, and value. Is there a way to compute a sum over table 2 based on the conditions (matching id and a timestamp within the date range) in table 1?
Table 1:

id  start_date           end_date
--  -------------------  -------------------
5   2000-01-01 01:00:00  2000-01-05 02:45:00
5   2000-01-10 01:00:00  2000-01-15 02:45:00
6   2000-01-01 01:00:00  2000-01-05 02:45:00
6   2000-01-11 01:00:00  2000-01-12 02:45:00
6   2000-01-15 01:00:00  2000-01-20 02:45:00

Table 2:

id  timestamp            value
--  -------------------  -----
5   2000-01-01 05:00:00  1
5   2000-01-01 06:00:00  2
6   2000-01-01 05:00:00  1
6   2000-01-11 05:00:00  2
6   2000-01-15 05:00:00  2
6   2000-01-15 05:30:00  2

Desired result:

id  start_date           end_date             Sum
--  -------------------  -------------------  ----
5   2000-01-01 01:00:00  2000-01-05 02:45:00  3
5   2000-01-10 01:00:00  2000-01-15 02:45:00  null
6   2000-01-01 01:00:00  2000-01-05 02:45:00  1
6   2000-01-11 01:00:00  2000-01-12 02:45:00  2
6   2000-01-15 01:00:00  2000-01-20 02:45:00  4
Try this:
SELECT a.id, a.start_date, a.end_date, sum(b.value) AS sum
FROM table1 AS a
LEFT JOIN table2 AS b
ON b.id = a.id
AND b.timestamp >= a.start_date
AND b.timestamp < a.end_date
GROUP BY a.id, a.start_date, a.end_date
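Because it is a LEFT JOIN, intervals with no matching rows in table2 (like id 5's second window) come back with a NULL sum, matching the desired result. If you prefer, the same result can be written with a correlated subquery; this is just a minimal sketch using the table and column names above:
SELECT a.id, a.start_date, a.end_date,
       (SELECT sum(b.value)
          FROM table2 AS b
         WHERE b.id = a.id
           AND b.timestamp >= a.start_date
           AND b.timestamp < a.end_date) AS sum
FROM table1 AS a;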
I have the following two tables:
t1:([]sym:`AAPL`GOOG; histo_dates1:(2000.01.01+til 10;2000.01.01+til 10);histo_values1:(til 10;5+til 10));
t2:([]sym:`AAPL`GOOG; histo_dates2:(2000.01.05+til 5;2000.01.06+til 4);histo_values2:(til 5; 2+til 4));
What I want is to sum the histo_values of each symbol across the histo_dates, such that the resulting table would look like this:
t:([]sym:`AAPL`GOOG; histo_dates:(2000.01.01+til 10;2000.01.01+til 10);histo_values:(0 1 2 3 4 6 8 10 12 9;5 6 7 8 9 12 14 16 18 14))
So the resulting dates histo_dates should be the union of histo_dates1 and histo_dates2, and histo_values should be the sum of histo_values1 and histo_values2 across dates.
EDIT:
To be clear, the resulting histo_dates must be the union of histo_dates1 and histo_dates2, not just the dates common to both tables.
There are a few ways. One would be to ungroup to remove nesting, join the tables, aggregate on sym/date and then regroup on sym:
q)0!select histo_dates:histo_dates1, histo_values:histo_values1 by sym from select sum histo_values1 by sym, histo_dates1 from ungroup[t1],cols[t1]xcol ungroup[t2]
sym histo_dates histo_values
-------------------------------------------------------------------------------------------------------------------------------------------
AAPL 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 0 1 2 3 4 6 8 10 12 9
GOOG 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 5 6 7 8 9 12 14 16 18 14
A possibly faster way would be to make each row a dictionary and then key the tables on sym and add them:
q)select sym:s, histo_dates:key each v, histo_values:value each v from (1!select s, d!'v from `s`d`v xcol t1)+(1!select s, d!'v from `s`d`v xcol t2)
sym histo_dates histo_values
-------------------------------------------------------------------------------------------------------------------------------------------
AAPL 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 0 1 2 3 4 6 8 10 12 9
GOOG 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 5 6 7 8 9 12 14 16 18 14
Another option would be to use a plus join pj:
q)0!`sym xgroup 0!pj[ungroup `sym`histo_dates`histo_values xcol t1;2!ungroup `sym`histo_dates`histo_values xcol t2]
sym histo_dates histo_values
-------------------------------------------------------------------------------------------------------------------------------------------
AAPL 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 0 1 2 3 4 6 8 10 12 9
GOOG 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 5 6 7 8 9 12 14 16 18 14
See here for more on plus joins: https://code.kx.com/v2/ref/pj/
EDIT:
To explicitly make sure the result has the union of the dates, you could use a union join:
q)0!`sym xgroup select sym,histo_dates,histo_values:hv1+hv2 from 0^uj[2!ungroup `sym`histo_dates`hv1 xcol t1;2!ungroup `sym`histo_dates`hv2 xcol t2]
sym histo_dates histo_values
-------------------------------------------------------------------------------------------------------------------------------------------
AAPL 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 0 1 2 3 4 6 8 10 12 9
GOOG 2000.01.01 2000.01.02 2000.01.03 2000.01.04 2000.01.05 2000.01.06 2000.01.07 2000.01.08 2000.01.09 2000.01.10 5 6 7 8 9 12 14 16 18 14
Another way:
// rename the columns to be common names, ungroup the tables, and place the key on `sym and `histo_dates
q){2!ungroup `sym`histo_dates`histo_values xcol x} each (t1;t2)
// add them together (or use pj in place of +), group on `sym
`sym xgroup (+) . {2!ungroup `sym`histo_dates`histo_values xcol x} each (t1;t2)
// and to test this matches t, remove the key from the resulting table
q)t~0!`sym xgroup (+) . {2!ungroup `sym`histo_dates`histo_values xcol x} each (t1;t2)
1b
Another possible way, using functional amend:
//Column join the histo_dates* columns and get the distinct dates - drop idx
//Using a functional apply use the idx to determine which values to plus
//Join the two tables using sym as the key - Find the idx of common dates
(enlist `idx) _select sym,histo_dates:distinct each (histo_dates1,'histo_dates2),
histo_values:{@[x;z;+;y]}'[histo_values1;histo_values2;idx],idx from
update idx:(where each histo_dates1 in' histo_dates2) from ((1!t1) uj 1!t2)
One possible caveat: deriving idx this way relies on the date columns being sorted within each sym, which is usually the case.
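If the nested date columns might not already be sorted, one way to sort them while keeping dates and values aligned is to ungroup, sort and regroup before applying the above; a minimal sketch for t1 (and similarly for t2):
// ungroup, sort by sym and date, then regroup on sym
q)`sym xgroup `sym`histo_dates1 xasc ungroup t1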
I'm using PostgreSQL to store historical data coming from an RTLS platform.
Position data is not collected continuously.
The historical_movements data is kept in a single table as follows (simplified, but enough to present the use case):
User Area EnterTime ExitTime
John room1 2018-01-01 10:00:00 2018-01-01 10:00:05
Doe room1 2018-01-01 10:00:00 2018-01-01 10:10:00
John room1 2018-01-01 10:05:00 2018-01-01 10:10:00
Doe room1 2018-01-01 10:20:00 2018-01-01 10:30:00
John room2 2018-01-01 11:00:00 2018-01-01 11:05:00
John room2 2018-01-01 11:08:00 2018-01-01 11:15:00
John room1 2018-01-01 12:00:00 2018-01-01 12:08:00
John room1 2018-01-01 12:10:00 2018-01-01 12:20:00
John room1 2018-01-01 12:25:00 2018-01-01 12:25:00
John room3 2018-01-01 12:30:00 2018-01-01 12:35:00
John room3 2018-01-01 12:40:00 2018-01-01 12:50:00
I'm looking for a query that shows each user's stays in the various rooms, aggregating contiguous records for the same room and computing the overall staying time, as follows:
User Area EnterTime ExitTime AggregateTime
John room1 2018-01-01 10:00:00 2018-01-01 10:10:00 00:10:00
John room2 2018-01-01 11:00:00 2018-01-01 11:05:00 00:15:00
John room1 2018-01-01 12:00:00 2018-01-01 12:25:00 00:25:00
John room3 2018-01-01 12:30:00 2018-01-01 12:50:00 00:20:00
Doe room1 2018-01-01 10:00:00 2018-01-01 10:30:00 00:30:00
Looking at various threads, I'm quite sure I'd have to use lag and partition by (window functions), but it's not clear to me how.
Any hints?
Best regards.
AggregateTime isn't really an aggregate in your expected result. It is the difference between the maximum exit time and the minimum enter time of each block, where a block is a set of contiguous rows with the same (users, area).
with block as(
    -- the difference of the two row_number sequences is constant within each
    -- run of contiguous rows sharing the same (users, area), so it labels the blocks
    select users, area, entertime, exittime,
           (row_number() over (order by users, entertime) -
            row_number() over (partition by users, area order by entertime)
           ) as grp
    from your_table
)
select users, area, entertime, exittime, (exittime - entertime) as duration
from (select users, area, grp,
             min(entertime) as entertime, max(exittime) as exittime
      from block
      group by users, area, grp
     ) t2
order by 5;
I made some changes to 'Resetting Row number according to record data change' to arrive at the solution.
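If you ever need the summed occupancy time (adding up the individual visits) instead of the span from first entry to last exit, the same blocks can be reused; a minimal sketch assuming the same your_table layout:
with block as(
    select users, area, entertime, exittime,
           (row_number() over (order by users, entertime) -
            row_number() over (partition by users, area order by entertime)
           ) as grp
    from your_table
)
select users, area,
       min(entertime) as entertime,
       max(exittime) as exittime,
       sum(exittime - entertime) as duration
from block
group by users, area, grp
order by users, min(entertime);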
I have two tables in MATLAB that I would like to merge, Returns and Yearly, according to the following SQL statement. How do I merge them in MATLAB? (I have to use MATLAB.)
select a.*, b.Equity, b.Date as Yearly_date from Returns a, Yearly b where a.Id = b.Id and a.Date >= b.Date group by a.Id, a.Date having max(b.Date) = b.Date
Here is some sample data:
Returns = table([repmat(1,5,1);repmat(2,6,1)],[(datetime(2013,10,31):calmonths(1):datetime(2014,2,28)).';(datetime(2013,10,31):calmonths(1):datetime(2014,3,31)).'],randn(11,1),'VariableNames',{'Id','Date','Return'})
Returns =
Id Date Return
__ ___________ ________
1 31-Oct-2013 -0.8095
1 30-Nov-2013 -2.9443
1 31-Dec-2013 1.4384
1 31-Jan-2014 0.32519
1 28-Feb-2014 -0.75493
2 31-Oct-2013 1.3703
2 30-Nov-2013 -1.7115
2 31-Dec-2013 -0.10224
2 31-Jan-2014 -0.24145
2 28-Feb-2014 0.31921
2 31-Mar-2014 0.31286
Yearly = table([repmat(1,3,1);repmat(2,2,1)],[(datetime(2011,12,31):calyears(1):datetime(2013,12,31)).';(datetime(2012,12,31):calyears(1):datetime(2013,12,31)).'],[8;10;11;30;28],'VariableNames',{'Id','Date','Equity'})
Yearly =
Id Date Equity
__ ___________ ______
1 31-Dec-2011 8
1 31-Dec-2012 10
1 31-Dec-2013 11
2 31-Dec-2012 30
2 31-Dec-2013 28
I would like the following output:
ans =
Id Date Return Equity Yearly_date
__ ___________ __________ ______ ___________
1 31-Oct-2013 -0.86488 10 31-Dec-2012
1 30-Nov-2013 -0.030051 10 31-Dec-2012
1 31-Dec-2013 -0.16488 11 31-Dec-2013
1 31-Jan-2014 0.62771 11 31-Dec-2013
1 28-Feb-2014 1.0933 11 31-Dec-2013
2 31-Oct-2013 1.1093 30 31-Dec-2012
2 30-Nov-2013 -0.86365 30 31-Dec-2012
2 31-Dec-2013 0.077359 28 31-Dec-2013
2 31-Jan-2014 -1.2141 28 31-Dec-2013
2 28-Feb-2014 -1.1135 28 31-Dec-2013
2 31-Mar-2014 -0.0068493 28 31-Dec-2013
Here goes another bsxfun-based solution, abusing its masking capability:
%// Inputs
Returns = table([repmat(1,5,1);repmat(2,6,1)],[(datetime(2013,10,31):...
calmonths(1):datetime(2014,2,28)).';(datetime(2013,10,31):calmonths(1):...
datetime(2014,3,31)).'],randn(11,1),'VariableNames',{'Id','Date','Return'})
Yearly = table([repmat(1,3,1);repmat(2,2,1)],[(datetime(2011,12,31):...
calyears(1):datetime(2013,12,31)).';(datetime(2012,12,31):calyears(1):...
datetime(2013,12,31)).'],[8;10;11;30;28],'VariableNames',{'Id','Date','Equity'})
%// Get mask of matches for each ID in Returns against each ID in Yearly
matches = bsxfun(@ge,datenum(Returns.Date),datenum(Yearly.Date)');
%// Keep the matches within the respective Ids only
matches(~bsxfun(@ge,Returns.Id,Yearly.Id')) = 0; %// or equivalently: matches(bsxfun(@lt,..)
%// Get the ID (column -ID) of the last match for each Id in Returns
[~,flipped_col_ID] = max(matches(:,end:-1:1),[],2);
col_ID = size(matches,2) - flipped_col_ID + 1;
%// Select the rows from Yearly based on col IDs and create the output table
out = [Returns table(Yearly.Equity(col_ID), Yearly.Date(col_ID))]
Code run -
Returns =
Id Date Return
__ ___________ ________
1 31-Oct-2013 0.045158
1 30-Nov-2013 0.071319
1 31-Dec-2013 0.52357
1 31-Jan-2014 -0.65424
1 28-Feb-2014 1.8452
2 31-Oct-2013 0.037262
2 30-Nov-2013 0.38369
2 31-Dec-2013 1.1972
2 31-Jan-2014 -0.54708
2 28-Feb-2014 -0.15706
2 31-Mar-2014 0.11882
Yearly =
Id Date Equity
__ ___________ ______
1 31-Dec-2011 8
1 31-Dec-2012 10
1 31-Dec-2013 11
2 31-Dec-2012 30
2 31-Dec-2013 28
out =
Id Date Return Var1 Var2
__ ___________ ________ ____ ___________
1 31-Oct-2013 0.045158 10 31-Dec-2012
1 30-Nov-2013 0.071319 10 31-Dec-2012
1 31-Dec-2013 0.52357 11 31-Dec-2013
1 31-Jan-2014 -0.65424 11 31-Dec-2013
1 28-Feb-2014 1.8452 11 31-Dec-2013
2 31-Oct-2013 0.037262 30 31-Dec-2012
2 30-Nov-2013 0.38369 30 31-Dec-2012
2 31-Dec-2013 1.1972 28 31-Dec-2013
2 31-Jan-2014 -0.54708 28 31-Dec-2013
2 28-Feb-2014 -0.15706 28 31-Dec-2013
2 31-Mar-2014 0.11882 28 31-Dec-2013
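If you want the appended columns to carry the names from the desired output instead of the default Var1/Var2, they can be renamed afterwards; a minimal sketch (target names assumed from the desired output):
%// rename the two appended variables
out.Properties.VariableNames(end-1:end) = {'Equity','Yearly_date'};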
Generic case solution
For cases where the Ids are non-numeric and the dates aren't already sorted, you may try the following code:
%// Inputs
Returns = table([repmat('Id1',5,1);repmat('Id2',6,1)],[(datetime(2013,10,31):...
calmonths(1):datetime(2014,2,28)).';(datetime(2013,10,31):calmonths(1):...
datetime(2014,3,31)).'],randn(11,1),'VariableNames',{'Id','Date','Return'})
Yearly = table([repmat('Id1',3,1);repmat('Id2',2,1)],[(datetime(2011,12,31):...
calyears(1):datetime(2013,12,31)).';(datetime(2012,12,31):calyears(1):...
datetime(2013,12,31)).'],[8;10;11;30;28],'VariableNames',{'Id','Date','Equity'})
%// Convert string-based Ids into numeric ones
alltypes = cellstr([Returns.Id ; Yearly.Id]);
[~,~,IDs] = unique(alltypes,'stable');
lbls_len = size(Returns.Id,1);
Returns_Id = IDs(1:lbls_len);
Yearly_Id = IDs(lbls_len+1:end);
%// Get Returns and Yearly Dates
Returns_Date = datenum(Returns.Date);
Yearly_Date = datenum(Yearly.Date);
%// Sort the dates if not already sorted
y1 = arrayfun(@(n) sort(Returns_Date(Returns_Id==n)),1:max(Returns_Id),'Uni',0);
Returns_Date = vertcat(y1{:});
y2 = arrayfun(@(n) sort(Yearly_Date(Yearly_Id==n)),1:max(Yearly_Id),'Uni',0);
Yearly_Date = vertcat(y2{:});
%// Counts of Ids to be used as boundaries when saving output at each
%// iteration corresponding to each ID
Yearly_Id_counts = [0 ; histc(Yearly_Id,1:max(Yearly_Id))];
Returns_Id_counts = histc(Returns_Id,1:max(Returns_Id));
%// Initializations
stop = 0;
col_ID = zeros(size(Returns_Date,1),1);
for iter = 1:max(Returns_Id)
%// Get mask of matches for each ID in Returns against each ID in Yearly
matches = bsxfun(@ge,Returns_Date(Returns_Id==iter),...
    Yearly_Date(Yearly_Id==iter)');
%// Get the ID (column -ID) of the last match for each Id in Returns
[~,flipped_col_ID] = max(matches(:,end:-1:1),[],2);
%// Get start and stop for indexing into output column IDs array
start = stop + 1;
stop = start + Returns_Id_counts(iter) - 1;
%// Get the columns IDs to be used for indexing into Yearly data for
%// getting the final output
col_ID(start:stop) = Yearly_Id_counts(iter) + ...
Yearly_Id_counts(iter + 1) - flipped_col_ID + 1;
end
%// Select the rows from Yearly based on col IDs and create the output table
out = [Returns table(Yearly.Equity(col_ID), Yearly.Date(col_ID))]
Code run -
Returns =
Id Date Return
___ ___________ ________
Id1 31-Oct-2013 0.53767
Id1 30-Nov-2013 1.8339
Id1 31-Dec-2013 -2.2588
Id1 31-Jan-2014 0.86217
Id1 28-Feb-2014 0.31877
Id2 31-Oct-2013 -1.3077
Id2 30-Nov-2013 -0.43359
Id2 31-Dec-2013 0.34262
Id2 31-Jan-2014 3.5784
Id2 28-Feb-2014 2.7694
Id2 31-Mar-2014 -1.3499
Yearly =
Id Date Equity
___ ___________ ______
Id1 31-Dec-2011 8
Id1 31-Dec-2012 10
Id1 31-Dec-2013 11
Id2 31-Dec-2012 30
Id2 31-Dec-2013 28
out =
Id Date Return Var1 Var2
___ ___________ ________ ____ ___________
Id1 31-Oct-2013 0.53767 10 31-Dec-2012
Id1 30-Nov-2013 1.8339 10 31-Dec-2012
Id1 31-Dec-2013 -2.2588 11 31-Dec-2013
Id1 31-Jan-2014 0.86217 11 31-Dec-2013
Id1 28-Feb-2014 0.31877 11 31-Dec-2013
Id2 31-Oct-2013 -1.3077 30 31-Dec-2012
Id2 30-Nov-2013 -0.43359 30 31-Dec-2012
Id2 31-Dec-2013 0.34262 28 31-Dec-2013
Id2 31-Jan-2014 3.5784 28 31-Dec-2013
Id2 28-Feb-2014 2.7694 28 31-Dec-2013
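For readability, the same "latest Yearly row with matching Id and Date not after the Return date" lookup can also be written as a plain loop; this is only a minimal sketch for the string-Id tables above (it assumes every Returns row has at least one matching Yearly row, as in the sample data), not an optimized solution:
%// Loop-based alternative, not optimized
idx = zeros(height(Returns),1);
for k = 1:height(Returns)
    %// last Yearly row with the same Id and a Date not after the Return date
    idx(k) = find(strcmp(cellstr(Yearly.Id), Returns.Id(k,:)) & ...
                  Yearly.Date <= Returns.Date(k), 1, 'last');
end
out2 = [Returns table(Yearly.Equity(idx), Yearly.Date(idx), ...
        'VariableNames', {'Equity','Yearly_date'})]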
I have used the script below to read in my data (sample below) and can compute the hourly and daily mean of heat flux (H) by accumarray-ing the date and time stamps. The difficulty is that I also want to use accumarray for 15-minute, 30-minute, etc. averages. How can one do this with the kind of data I have?
LASfile=fopen('../../data/heat.txt');
Data = textscan(LASfile, '%16c %*24c %s %s %f %f %f %d %d %d %d','headerlines',1);
fclose(LASfile);
H = Data{6};
%%
date_num = datestr(Data{1});
formatIn = 'dd-mmm-yyyy HH:MM:SS';
DateVector = datevec(date_num, formatIn);
%%
% Group by day and hour
[unDates, ~, subs] = unique(DateVector(:,1:4),'rows');
% Accumulate by hour
[unDates accumarray(subs, H, [], @mean)]; %computes hourly heat flux
#timeStamp date time Cn2 CT2 H
2012-02-07 11:56:00 2/7/2012 11:56:00 3.11E-13 3.64E-01 330.5
2012-02-07 11:57:00 2/7/2012 11:57:00 2.22E-13 2.60E-01 256.4
2012-02-07 11:58:00 2/7/2012 11:58:00 2.92E-13 3.42E-01 315.3
2012-02-07 11:59:00 2/7/2012 11:59:00 4.07E-13 4.77E-01 404.4
2012-02-07 12:00:00 2/7/2012 12:00:00 3.56E-13 4.17E-01 365.7
2012-02-07 12:01:00 2/7/2012 12:01:00 4.41E-13 5.17E-01 429.3
2012-02-07 12:02:00 2/7/2012 12:02:00 4.23E-13 4.96E-01 416.3
2012-02-07 12:03:00 2/7/2012 12:03:00 3.17E-13 3.72E-01 335.3
2012-02-07 12:04:00 2/7/2012 12:04:00 3.42E-13 4.00E-01 354.7
2012-02-07 12:05:00 2/7/2012 12:05:00 3.43E-13 4.02E-01 355.6
2012-02-07 12:07:00 2/7/2012 12:07:00 2.92E-13 3.42E-01 315.3
2012-02-07 12:08:00 2/7/2012 12:08:00 2.63E-13 3.09E-01 291.7
2012-02-07 12:09:00 2/7/2012 12:09:00 2.45E-13 2.87E-01 276.1
2012-02-07 12:10:00 2/7/2012 12:10:00 3.00E-13 3.52E-01 321.8
2012-02-07 12:11:00 2/7/2012 12:11:00 3.77E-13 4.42E-01 382
2012-02-07 12:12:00 2/7/2012 12:12:00 4.40E-13 5.16E-01 428.9
2012-02-07 12:13:00 2/7/2012 12:13:00 3.60E-13 4.22E-01 369.2
2012-02-07 12:14:00 2/7/2012 12:14:00 4.56E-13 5.35E-01 440.4
2012-02-07 12:15:00 2/7/2012 12:15:00 3.62E-13 4.24E-01 370.5
2012-02-07 12:16:00 2/7/2012 12:16:00 3.48E-13 4.07E-01 359.3
2012-02-07 12:17:00 2/7/2012 12:17:00 3.94E-13 4.62E-01 394.9
2012-02-07 12:18:00 2/7/2012 12:18:00 3.53E-13 4.14E-01 363.5
2012-02-07 12:19:00 2/7/2012 12:19:00 4.47E-13 5.24E-01 433.6
2012-02-07 12:20:00 2/7/2012 12:20:00 4.33E-13 5.07E-01 423.6
2012-02-07 12:21:00 2/7/2012 12:21:00 3.18E-13 3.73E-01 336
2012-02-07 12:22:00 2/7/2012 12:22:00 2.91E-13 3.41E-01 314.7
2012-02-07 12:23:00 2/7/2012 12:23:00 2.71E-13 3.17E-01 297.8
2012-02-07 12:24:00 2/7/2012 12:24:00 3.72E-13 4.36E-01 378.2
2012-02-07 12:25:00 2/7/2012 12:25:00 3.25E-13 3.81E-01 341.8
2012-02-07 12:26:00 2/7/2012 12:26:00 3.66E-13 4.29E-01 373.3
2012-02-07 12:27:00 2/7/2012 12:27:00 3.95E-13 4.63E-01 395.3
2012-02-07 12:28:00 2/7/2012 12:28:00 3.73E-13 4.37E-01 378.9
2012-02-07 12:29:00 2/7/2012 12:29:00 3.31E-13 3.89E-01 346.7
2012-02-07 12:30:00 2/7/2012 12:30:00 3.05E-13 3.57E-01 325.7
You should include the fifth column of DateVector (the minutes), rounded down to the bin you need. For example, to use 15-minute periods:
DateVector2 = DateVector(:,1:5);
DateVector2(:,5) = floor(DateVector(:,5)/15);
And then you accumarray based on this DateVector2:
[unDates, ~, subs] = unique(DateVector2,'rows');
% Accumulate per 15-minute interval
[unDates accumarray(subs, H, [], @mean)]; %computes average heat flux per interval
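The same idea generalizes to any bin width that divides evenly into 60; a minimal sketch (reusing DateVector and H from the script above), e.g. for 30-minute averages:
binMinutes = 30;                                      % bin width in minutes (e.g. 15 or 30)
DateVector2 = DateVector(:,1:5);
DateVector2(:,5) = floor(DateVector(:,5)/binMinutes); % replace minutes by bin index
[unDates, ~, subs] = unique(DateVector2,'rows');
meanH = [unDates accumarray(subs, H, [], @mean)];     % mean heat flux per bin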