How do I group by hour in KDB+ Q? - kdb

I have a table of the form:
t o h l c v
---------------------------------------------------------------
2016.01.04D09:00:00.000000000 105.45 105.45 103.6 103.6 17462
2016.01.04D09:03:00.000000000 103.7 103.99 103.7 103.99 893
2016.01.04D09:06:00.000000000 103.7 103.7 103.7 103.7 335
I want to select the max o grouped by hour.
select hi: max o by t.date, t.time.hour from z
The issue I'm having is that it doesn't seem like hour is a valid attribute of datetime.
What am I doing wrong?

For the hour, use the hh accessor on the timestamp column, here t.hh:
select hi: max o by t.date, t.hh from z
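For readers less familiar with q, the grouping that `by t.date, t.hh` performs can be sketched in Python. This only illustrates the logic and is not q code; the function name max_open_by_hour is made up for the example, and the rows are the three sample rows from the question.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows copied from the question: (t, o)
rows = [
    (datetime(2016, 1, 4, 9, 0), 105.45),
    (datetime(2016, 1, 4, 9, 3), 103.7),
    (datetime(2016, 1, 4, 9, 6), 103.7),
]

def max_open_by_hour(rows):
    """Return {(date, hour): max o}, the same grouping as `by t.date, t.hh`."""
    groups = defaultdict(list)
    for t, o in rows:
        # Key each row by its calendar date and hour of day
        groups[(t.date(), t.hour)].append(o)
    return {key: max(values) for key, values in groups.items()}

hi = max_open_by_hour(rows)
```

All three sample rows fall in the 09:00 hour of 2016.01.04, so they end up in a single group.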

Related

KDB query returns more 2 columns instead of 1 for max filter

I want to create a report that shows the max price for each symbol. I wrote the following query, which works fine on PROD but fails on UAT, so I would like to know whether it is appropriate:
select from (select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31) ) where size=(max;price) fby tier
The above query returns 2 rows for each symbol instead of 1. The following is the result of the inner query, i.e. select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31)
t:([]time:8#2019.03.11D09:00+"v"$0 4 8 10;sym:8#`GOOG`GOOG`MSFT`MSFT;src:8#`L`O`N`O;price:36.01 35.01 35.5 31.1 39.01 38.01 33.5 32.1;size:8#1427 708 7810 1100)
time sym src price
--------------------------------------------
2019.03.11D09:00:00.000000000 GOOG L 36.01
2019.03.11D09:00:04.000000000 GOOG O 35.01
2019.03.11D09:00:08.000000000 MSFT N 35.5
2019.03.11D09:00:10.000000000 MSFT O 31.1
2019.03.11D09:00:00.000000000 GOOG L 39.01
2019.03.11D09:00:04.000000000 GOOG O 38.01
2019.03.11D09:00:08.000000000 MSFT N 33.5
2019.03.11D09:00:10.000000000 MSFT O 32.1
The output of select from (select sum price by sym,time,src from Table where date within(2019.12.01;2019.12.31) ) where size=(max;price) fby tier is:
t[0,2,4,7]
time sym src price
---------------------------------------------
2019.03.11D09:00:00.000000000 GOOG L 36.01
2019.03.11D09:00:08.000000000 MSFT N 35.5
2019.03.11D09:00:00.000000000 GOOG L 39.01
2019.03.11D09:00:10.000000000 MSFT O 32.1
I suspect that something is missing from the dataset you have provided in the question. The prices returned by your inner query are all floats with fractional parts, whereas size is a long, so it doesn't make sense that size=(max;price) returns any results.
To answer your question in the most general sense, the rows with the max price by sym can be selected with:
select from t where price=(max;price) fby sym
Applying this to the inner result you have provided
q)select from t where price=(max;price) fby sym
time sym src price size
-------------------------------------------------
2019.03.11D09:00:08.000000000 MSFT N 35.5 7810
2019.03.11D09:00:00.000000000 GOOG L 39.01 1427
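What `price=(max;price) fby sym` does can be sketched in Python: compute the max price per sym, then keep the rows whose price equals the max of their own group. This is an illustration of the idea only, not how q implements fby; the helper name rows_at_group_max is hypothetical and the data is the sym, src and price columns of the table t above.

```python
# (sym, src, price) taken from the table t in the question
rows = [
    ("GOOG", "L", 36.01), ("GOOG", "O", 35.01),
    ("MSFT", "N", 35.5), ("MSFT", "O", 31.1),
    ("GOOG", "L", 39.01), ("GOOG", "O", 38.01),
    ("MSFT", "N", 33.5), ("MSFT", "O", 32.1),
]

def rows_at_group_max(rows):
    """Keep the rows whose price equals the max price of their sym."""
    best = {}
    for sym, _src, price in rows:
        # Track the highest price seen so far for each sym
        best[sym] = max(price, best.get(sym, price))
    # Filter in original row order, like a where clause
    return [r for r in rows if r[2] == best[r[0]]]

top = rows_at_group_max(rows)
```

As with the q query, a sym would get more than one row if several rows tie on the max price.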

PostgreSQL Data Selection

Is it possible to write PostgreSQL code that looks at the sample data and selects only the persons who have been active for the whole first quarter (01/01/2018 to 03/31/2018), as shown in the desired output? Note that person H should not be selected because they are missing January.
Sample Data
Person Start Date End Date
A 1/1/2018 1/31/2018
A 2/1/2018 2/28/2018
A 3/1/2018 3/31/2018
B 1/1/2018 2/28/2018
C 1/1/2018 2/28/2018
C 3/1/2018 3/31/2018
D 2/1/2018 3/31/2018
E 2/1/2018 2/28/2018
F 1/1/2018 3/31/2018
G 1/1/2018 4/30/2018
H 2/1/2018 4/30/2018
Desired Output
Person
A
C
F
G
Assuming your columns are proper DATE columns and there are no overlaps, you could do something like this:
select person
from the_table
group by person
having sum(end_date - start_date + 1) >= date '2018-03-31' - date '2018-01-01' + 1
order by person;
Subtracting one date from another yields the number of days between those two dates. The sum of these per-period day counts is then compared to the number of days in the quarter.
Online example: https://rextester.com/OIN10602
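The day-count arithmetic of the query above can be checked with a small Python sketch on the sample data. It makes the same assumptions as the SQL (proper dates, no overlapping periods); the function name active_whole_quarter is invented for the illustration.

```python
from datetime import date

# (person, start_date, end_date) from the sample data
periods = [
    ("A", date(2018, 1, 1), date(2018, 1, 31)),
    ("A", date(2018, 2, 1), date(2018, 2, 28)),
    ("A", date(2018, 3, 1), date(2018, 3, 31)),
    ("B", date(2018, 1, 1), date(2018, 2, 28)),
    ("C", date(2018, 1, 1), date(2018, 2, 28)),
    ("C", date(2018, 3, 1), date(2018, 3, 31)),
    ("D", date(2018, 2, 1), date(2018, 3, 31)),
    ("E", date(2018, 2, 1), date(2018, 2, 28)),
    ("F", date(2018, 1, 1), date(2018, 3, 31)),
    ("G", date(2018, 1, 1), date(2018, 4, 30)),
    ("H", date(2018, 2, 1), date(2018, 4, 30)),
]

def active_whole_quarter(periods, q_start, q_end):
    """Persons whose summed period lengths cover at least the quarter's days."""
    needed = (q_end - q_start).days + 1  # inclusive day count of the quarter
    totals = {}
    for person, start, end in periods:
        totals[person] = totals.get(person, 0) + (end - start).days + 1
    return sorted(p for p, days in totals.items() if days >= needed)

selected = active_whole_quarter(periods, date(2018, 1, 1), date(2018, 3, 31))
```

Like the SQL, the sketch also counts days that fall outside the quarter, so a single period such as 2/1/2018 to 5/31/2018 would pass the test. Clipping each period to the quarter before summing would avoid that.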

How to get those rows where all the values are same against unique id

I have below mentioned table:
ID State City Pincode Code Date
U-1 AAB CCV 141414 121 2018-04-04 18:08:17
U-1 AAB CCV 141414 121 2018-04-04 18:08:17
U-2 BTB ERV 150454 145 2018-05-05 19:11:25
U-2 BTB ERV 150454 145 2018-05-05 19:11:25
U-3 FFT ERT 160707 150 2018-05-22 21:37:45
U-4 FFT RTT 160707 150 2018-05-28 14:23:48
I want to fetch only those rows where all the values are same in the particular unique ID.
Output:
ID State City Pincode Code Date
U-1 AAB CCV 141414 121 2018-04-04 18:08:17
U-1 AAB CCV 141414 121 2018-04-04 18:08:17
U-2 BTB ERV 150454 145 2018-05-05 19:11:25
U-2 BTB ERV 150454 145 2018-05-05 19:11:25
Get the duplicate rows and join the result to the original table.
select * from table a
join ( select id,state,city,pincode,code,date
from table
group by id,state,city,pincode,code,date
having count(*) > 1 ) b
on a.id = b.id
and a.state = b.state
and a.city = b.city
and a.pincode = b.pincode
and a.code = b.code
and a.date=b.date
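The same idea (find the column combinations that occur more than once, then keep the original rows that match one of them) can be sketched in Python. This is just an illustration of the logic on the sample data; fully_duplicated is a made-up name.

```python
from collections import Counter

# (ID, State, City, Pincode, Code, Date) from the question
rows = [
    ("U-1", "AAB", "CCV", 141414, 121, "2018-04-04 18:08:17"),
    ("U-1", "AAB", "CCV", 141414, 121, "2018-04-04 18:08:17"),
    ("U-2", "BTB", "ERV", 150454, 145, "2018-05-05 19:11:25"),
    ("U-2", "BTB", "ERV", 150454, 145, "2018-05-05 19:11:25"),
    ("U-3", "FFT", "ERT", 160707, 150, "2018-05-22 21:37:45"),
    ("U-4", "FFT", "RTT", 160707, 150, "2018-05-28 14:23:48"),
]

def fully_duplicated(rows):
    """Keep the rows whose complete set of values occurs more than once."""
    counts = Counter(rows)  # plays the role of GROUP BY all columns
    return [r for r in rows if counts[r] > 1]  # HAVING count(*) > 1, joined back

dupes = fully_duplicated(rows)
```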
You can try this:
SELECT * FROM table WHERE ID IN (
SELECT ID FROM table
GROUP BY ID
HAVING count(*) > 1
)
This returns all rows whose ID occurs more than once (at least two rows with this ID). Note that it checks only the ID count, not whether the other columns match.

Creating Hourly average based on 2 minutes before and after of time instantaneous in PostgreSQL

I have a temporal database with a 2-minute sampling frequency and I want to extract instantaneous hourly values (00:00, 01:00, 02:00, ... 23:00) for each day.
So, I would like to get each hourly value as the average of the values at:
HH-1:58, HH:00 and HH:02 = Average of HH o'clock
OR
HH-1:59, HH:01 and HH:03 = Average of HH o'clock
Sample Data1:
9/28/2007 23:51 -1.68
9/28/2007 23:53 -1.76
9/28/2007 23:55 -1.96
9/28/2007 23:57 -2.02
9/28/2007 23:59 -1.92
9/29/2007 0:01 -1.64
9/29/2007 0:03 -1.76
9/29/2007 0:05 -1.83
9/29/2007 0:07 -1.86
9/29/2007 0:09 -1.94
Expected Result:
For 00 midnight:
(-1.92+-1.64+-1.76)/3
Sample Data2:
9/28/2007 23:54 -1.44
9/28/2007 23:56 -1.58
9/28/2007 23:58 -2.01
9/29/2007 0:00 -1.52
9/29/2007 0:02 -1.48
9/29/2007 0:04 -1.46
Expected Results:
(-2.01+-1.52+-1.48)/3
SELECT hr, ts, aval
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY hr ORDER BY ts) rn
FROM (
SELECT *,
DATE_TRUNC('hour', ts) AS hr,
AVG(value) OVER (ORDER BY ts ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS aval
FROM mytable
) q
) q
WHERE rn = 1
PostgreSQL's window functions make anything involving adjacent rows a lot simpler than it used to be. Untried but should be roughly right:
select
date_trunc('hour', newest_time) as average_time,
(oldest_temp + middle_temp + newest_temp) / 3 as average_temp
from (
select
date_trunc('hour', sample_time) as average_time,
lag(sample_time, 2) over w as oldest_time,
lag(sample_time, 1) over w as middle_time,
sample_time as newest_time,
lag(sample_temp, 2) over w as oldest_temp,
lag(sample_temp, 1) over w as middle_temp,
sample_temp as newest_temp
from
samples
window
w as (order by sample_time)
) as s
where
oldest_time = newest_time - '4 minutes'::interval and
middle_time = newest_time - '2 minutes'::interval and
extract(minute from newest_time) in (2, 3);
I've restricted this in the where clause to exactly the scenario you've described: latest value at :02 or :03, prior 2 values 2 and 4 minutes before. This guards against missing data, which would otherwise give odd results like averaging over a much longer interval.
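To check the windowing logic against the sample data, here is a rough Python sketch of the same rule: take three consecutive samples spaced exactly 2 minutes apart, with the newest at minute :02 or :03, and average them. It is not a translation of the SQL, only the same idea; hourly_averages is a hypothetical helper, and the data is Sample Data1 from the question.

```python
from datetime import datetime, timedelta

# Sample Data1 from the question: (timestamp, value)
samples = [
    (datetime(2007, 9, 28, 23, 51), -1.68),
    (datetime(2007, 9, 28, 23, 53), -1.76),
    (datetime(2007, 9, 28, 23, 55), -1.96),
    (datetime(2007, 9, 28, 23, 57), -2.02),
    (datetime(2007, 9, 28, 23, 59), -1.92),
    (datetime(2007, 9, 29, 0, 1), -1.64),
    (datetime(2007, 9, 29, 0, 3), -1.76),
    (datetime(2007, 9, 29, 0, 5), -1.83),
    (datetime(2007, 9, 29, 0, 7), -1.86),
    (datetime(2007, 9, 29, 0, 9), -1.94),
]

def hourly_averages(samples, step=timedelta(minutes=2)):
    """Map each hour to the mean of the three samples straddling it."""
    ordered = sorted(samples)
    result = {}
    for (t0, v0), (t1, v1), (t2, v2) in zip(ordered, ordered[1:], ordered[2:]):
        # Require evenly spaced neighbours and the newest sample at :02 or :03
        if t1 - t0 == step and t2 - t1 == step and t2.minute in (2, 3):
            hour = t2.replace(minute=0, second=0, microsecond=0)
            result[hour] = (v0 + v1 + v2) / 3
    return result

averages = hourly_averages(samples)
```

For Sample Data1 the only qualifying window is 23:59, 0:01, 0:03, which is the average the question expects for midnight.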

Copy shifts from leap year to non-leap year

I need to copy all the shifts from 2012 to 2013 using T-SQL 2008 R2. There are 3 shifts per day. Start date and shift date are always the same. The end date (for shift C) is the next day.
As you can see, if I just used dateadd(year, 1, Col), I get 2 sets of records for 2013-02-28. Rows 4, 6 and 8 shouldn't be there (and will cause PK violations). Row 8 is wrong, as the end time for shift C should be the previous calendar day.
I have 67,000-ish rows in total to copy.
The only thing I can think of off the top of my head is to insert into a temp table, somehow identify the dupes/incorrect records, delete them and then insert back into the shifts table. I'm sure there must be a better way.
Anyone got a cunning plan?
I'd like to create a general-purpose stored procedure that can copy leap year to non-leap year and vice versa.
Regards
Mark
Maybe try a DISTINCT list combined with a WHERE End > Start, as in this simplified example:
CREATE TABLE Shifts(ShiftCode CHAR, ShiftStart DATETIME, ShiftEnd DATETIME);
GO
INSERT Shifts
VALUES('A','2/26/2012 07:00:00','2/26/2012 15:00:00')
, ('B','2/26/2012 15:00:00','2/26/2012 23:00:00')
, ('C','2/26/2012 23:00:00','2/27/2012 07:00:00')
, ('A','2/27/2012 07:00:00','2/27/2012 15:00:00')
, ('B','2/27/2012 15:00:00','2/27/2012 23:00:00')
, ('C','2/27/2012 23:00:00','2/28/2012 07:00:00')
, ('A','2/28/2012 07:00:00','2/28/2012 15:00:00')
, ('B','2/28/2012 15:00:00','2/28/2012 23:00:00')
, ('C','2/28/2012 23:00:00','2/29/2012 07:00:00')
, ('A','2/29/2012 07:00:00','2/29/2012 15:00:00')
, ('B','2/29/2012 15:00:00','2/29/2012 23:00:00')
, ('C','2/29/2012 23:00:00','3/1/2012 07:00:00')
, ('A','3/1/2012 07:00:00','3/1/2012 15:00:00')
, ('B','3/1/2012 15:00:00','3/1/2012 23:00:00')
, ('C','3/1/2012 23:00:00','3/2/2012 07:00:00');
GO
SELECT DISTINCT ShiftCode
, ShiftStart = DATEADD(YYYY,1,ShiftStart)
, ShiftEnd = DATEADD(YYYY,1,ShiftEnd)
FROM Shifts
WHERE DATEADD(YYYY,1,ShiftEnd) > DATEADD(YYYY,1,ShiftStart)
ORDER BY DATEADD(YYYY,1,ShiftStart), ShiftCode
GO
Result:
A 2013-02-26 07:00:00.000 2013-02-26 15:00:00.000
B 2013-02-26 15:00:00.000 2013-02-26 23:00:00.000
C 2013-02-26 23:00:00.000 2013-02-27 07:00:00.000
A 2013-02-27 07:00:00.000 2013-02-27 15:00:00.000
B 2013-02-27 15:00:00.000 2013-02-27 23:00:00.000
C 2013-02-27 23:00:00.000 2013-02-28 07:00:00.000
A 2013-02-28 07:00:00.000 2013-02-28 15:00:00.000
B 2013-02-28 15:00:00.000 2013-02-28 23:00:00.000
C 2013-02-28 23:00:00.000 2013-03-01 07:00:00.000
A 2013-03-01 07:00:00.000 2013-03-01 15:00:00.000
B 2013-03-01 15:00:00.000 2013-03-01 23:00:00.000
C 2013-03-01 23:00:00.000 2013-03-02 07:00:00.000
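Why DISTINCT plus the End > Start filter works can be seen in a small Python sketch around the leap day. The helper add_year mimics how DATEADD(YYYY,1,...) maps Feb 29 onto Feb 28; both function names are invented for the illustration, and only the 2/28 and 2/29 rows of the example are used.

```python
from datetime import datetime

def add_year(dt):
    """Mimic DATEADD(YYYY, 1, dt): Feb 29 falls back to Feb 28."""
    try:
        return dt.replace(year=dt.year + 1)
    except ValueError:  # Feb 29 has no counterpart in a non-leap year
        return dt.replace(year=dt.year + 1, day=28)

def copy_shifts(shifts):
    """Shift every row by one year, drop duplicates and inverted rows."""
    # A set removes the exact duplicates, like SELECT DISTINCT
    shifted = {(code, add_year(s), add_year(e)) for code, s, e in shifts}
    # Rows whose end no longer lies after the start are dropped
    valid = [row for row in shifted if row[2] > row[1]]
    return sorted(valid, key=lambda row: (row[1], row[0]))

shifts = [
    ("A", datetime(2012, 2, 28, 7), datetime(2012, 2, 28, 15)),
    ("B", datetime(2012, 2, 28, 15), datetime(2012, 2, 28, 23)),
    ("C", datetime(2012, 2, 28, 23), datetime(2012, 2, 29, 7)),
    ("A", datetime(2012, 2, 29, 7), datetime(2012, 2, 29, 15)),
    ("B", datetime(2012, 2, 29, 15), datetime(2012, 2, 29, 23)),
    ("C", datetime(2012, 2, 29, 23), datetime(2012, 3, 1, 7)),
]
copied = copy_shifts(shifts)
```

The A and B shifts of 2/28 and 2/29 collapse into one row each. The C shift that started on 2/28 ends up with an end time before its start and is filtered out, which leaves the C shift running from 2013-02-28 23:00 to 2013-03-01 07:00.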
I figured it out, but then found that some resources were missing shifts for 2012.
I ended up creating a tally table and doing fresh inserts for every shift of the year:
SELECT
rh.PlanPressID
,DATEADD(hh,(24 / @NoOfShifts) * (t.N - 1),@StartDateTime) AS ShiftStart
,DATEADD(hh,(24 / @NoOfShifts) * (t.N),@StartDateTime) AS ShiftEnd
,CHAR((t.N - 1) % @NoOfShifts + 65) AS ShiftCode
,DATEADD(dd,0,DATEDIFF(dd,0,DATEADD(hh,(24 / @NoOfShifts) * (t.N - 1),@StartDateTime))) AS ShiftDate
,0 AS Personnel
FROM
dbo.Tally t
CROSS JOIN dbo.ResourceHeader AS rh
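The arithmetic behind the tally approach can be sketched in Python, with n standing in for Tally.N. The start time of 07:00 on 2013-01-01, the 365-day range and the name generate_shifts are assumptions made for the illustration; the query above does not show how N is limited, and the cross join to ResourceHeader is left out.

```python
from datetime import datetime, timedelta

def generate_shifts(start, days, shifts_per_day=3):
    """Build (code, shift_start, shift_end, shift_date) rows from scratch."""
    hours = 24 // shifts_per_day  # shift length in hours, integer division
    rows = []
    for n in range(1, days * shifts_per_day + 1):  # n plays the role of Tally.N
        shift_start = start + timedelta(hours=hours * (n - 1))
        shift_end = start + timedelta(hours=hours * n)
        code = chr((n - 1) % shifts_per_day + 65)  # 65 is 'A'
        # The shift date is the calendar date on which the shift starts
        rows.append((code, shift_start, shift_end, shift_start.date()))
    return rows

# Assumed parameters: first shift starts 2013-01-01 07:00, one non-leap year
year_2013 = generate_shifts(datetime(2013, 1, 1, 7), days=365)
```

Because every shift is computed from the start time and a counter, no row depends on the 2012 data, so leap days and missing 2012 shifts do not matter.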