time bucketing with cumsum condition - kdb

Hello Fellow Kdb Mortals :D
Stuck on a pretty weird problem here. I have a table like the one below (the time column is xbar-ed to 5 minutes):
time code name count
--------------------------------
00:00 SPY S&P.. 15
00:00 QQQ ... 88
00:00 IWM ... 100
00:00 XLE ... 80
00:05 QQQ ... 20
00:05 SPY ... 75
00:10 QQQ ... 22
00:10 XLE ... 10
00:15 SPY ... 23
.....
.....
23:40 XLE ... 11
23:50 SPY ... 16
23:55 IWM ... 100
23:55 QQQ ... 10
What I want to be returned is a table like (from asc time)
code name stime etime cumcount
------------------------------------------------
SPY S&P... 00:00 00:15 113 <-- 15+75+23
QQQ ... 00:00 00:05 108 <-- 88+20
IWM ... 00:00 00:00 100 <-- 100
XLE ... 00:00 23:40 101 <-- 80+10+11
Notice there is a condition on this time bucket, where the first cumulative sum by (code,name) is greater than or equal to 100.
I can also generate another table from the bottom up (desc time)
code name stime etime cumcount
------------------------------------------------
SPY ... 23:50 20:10 103
QQQ ... 23:55 21:45 118
IWM ... 23:55 23:55 100
XLE ... 23:40 00:00 101 <-- 11+10+80
I have been at this for a couple of hours, but can't get this working. Basic select and sums don't get me anywhere. I could use loops but thought I should check in here first before I go down that lane.
Any help is appreciated :D

Assuming you have a table sorted ascending on time i.e.:
`time xasc `t
Something like this could work
q)t1:update cumcount:sums cnt,stime:first time by code,name from t
q)select code,name,stime,etime:time, cumcount from t1 where cumcount>=100,i=(first;i) fby ([]code;name)
Notice that I have relabelled count as cnt to prevent a clash with the count function that already exists in the q language.
So first you calculate your cumulative count in the update statement.
Then you select from the resulting table: first keep only those records where the cumulative count is >= 100, then use fby to filter again and pull out the first such record for each distinct (code;name) pair.
In this example stime is the time of the first entry for each (code;name) pair, and etime is the time at which the cumulative count first reaches or exceeds 100.
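For anyone who finds it easier to follow outside q, here is a rough Python sketch of the same two steps (a running sum per code, then the first row where it reaches the threshold). The rows are the sample values from this thread with times written as minutes since midnight; the function name first_crossing and the tuple layout are my own choices for illustration, not part of the q solution.

```python
from collections import defaultdict

# Sample rows from the thread: (minutes since midnight, code, cnt).
rows = [
    (0, "SPY", 15), (0, "QQQ", 88), (0, "IWM", 100), (0, "XLE", 80),
    (5, "QQQ", 20), (5, "SPY", 75), (10, "QQQ", 22), (10, "XLE", 10),
    (15, "SPY", 23), (1420, "XLE", 11), (1430, "SPY", 16),
    (1435, "IWM", 100), (1435, "QQQ", 10),
]

def first_crossing(rows, threshold=100):
    """Return {code: (stime, etime, cumcount)} for the first row per code
    whose running total reaches the threshold (hypothetical helper)."""
    running = defaultdict(int)   # running sum of cnt per code
    start = {}                   # first time seen per code
    result = {}
    # sorted() is stable, so rows sharing a time keep their original order
    for time, code, cnt in sorted(rows, key=lambda r: r[0]):
        start.setdefault(code, time)
        running[code] += cnt
        if code not in result and running[code] >= threshold:
            result[code] = (start[code], time, running[code])
    return result

buckets = first_crossing(rows)
```

Codes whose total never reaches the threshold are simply absent from the result, which matches what the where clause does in the q version.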

I prefer Sean's solution, but for the sake of an alternative:
q)t:update name:string lower code from([]time:"u"$0 0 0 0 5 5 10 10 15 1420 1430 1435 1435;code:`SPY`QQQ`IWM`XLE 0 1 2 3 1 0 1 3 0 3 0 2 1;cnt:15 88 100 80 20 75 22 10 23 11 16 100 10);
q)exec{x x[`cumcnt]binr 100}[([]stime:first time;etime:time;cumcnt:sums cnt)]by code,name from t
code name | stime etime cumcnt
----------| ------------------
IWM "iwm"| 00:00 00:00 100
QQQ "qqq"| 00:00 00:05 108
SPY "spy"| 00:00 00:15 113
XLE "xle"| 00:00 23:40 101
Summing from the bottom would be:
q)exec{x x[`cumcnt]binr 100}[([]stime:last time;etime:reverse time;cumcnt:sums reverse cnt)]by code,name from t
code name | stime etime cumcnt
----------| ------------------
IWM "iwm"| 23:55 23:55 100
QQQ "qqq"| 23:55 00:00 140
SPY "spy"| 23:50 00:05 114
XLE "xle"| 23:40 00:00 101
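The binr lookup above is a binary search on the cumulative sums. A small Python sketch of that idea, using bisect_left as a stand-in for binr (my assumption of the closest stdlib equivalent), might look like the following. The helper name bucket and its from_end flag are hypothetical, and it relies on the counts being non-negative so that the cumulative sums are sorted.

```python
from bisect import bisect_left
from itertools import accumulate

def bucket(times, counts, threshold=100, from_end=False):
    """Return (stime, etime, cumcnt) for one (code, name) group,
    or None if the threshold is never reached (hypothetical helper)."""
    if from_end:
        # Summing from the bottom: walk the group in reverse time order
        times, counts = times[::-1], counts[::-1]
    cums = list(accumulate(counts))
    # First index whose cumulative sum is >= threshold
    k = bisect_left(cums, threshold)
    if k == len(cums):
        return None
    return times[0], times[k], cums[k]

# SPY rows from the sample table, times as minutes since midnight
spy_times = [0, 5, 15, 1430]
spy_counts = [15, 75, 23, 16]
top_down = bucket(spy_times, spy_counts)
bottom_up = bucket(spy_times, spy_counts, from_end=True)
```

Here top_down corresponds to the 00:00 / 00:15 / 113 row and bottom_up to the 23:50 / 00:05 / 114 row in the outputs above.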

Related

How to calculate the amount spent in SQL?

I have a table transaction_details:
|transaction_id|customer_id|item_id|item_number|transaction_dttm|
| ---- | - | -- | - | ---------- |
| 7765 | 1 | 23 | 1 | 2022-01-15 |
| 1254 | 2 | 12 | 4 | 2022-02-03 |
| 3332 | 3 | 56 | 2 | 2022-02-15 |
| 7658 | 1 | 43 | 1 | 2022-03-01 |
| 7231 | 4 | 56 | 1 | 2022-01-15 |
| 7231 | 2 | 23 | 2 | 2022-01-29 |
I need to calculate the amount spent by the client in the last month and find out the item (item_name) on which the client spent the most in the last month.
Example result:
|customer_id|amount_spent_lm|top_item_lm|
| - | ---------- | ----- |
| 1 | 700 | glasses |
| 2 | 20000 | notebook |
| 3 | 100 | cup |
When calculating, it is necessary to take into account the price that was current at the time of the transaction (dict_item_prices). Customers who have not made purchases in the last month are not included in the final table. The last month is defined as the last 30 days at the time of the report creation.
There is also a table dict_item_prices:
|item_id|item_name|item_price|valid_from_dt|valid_to_dt|
| -- | -------- | ---- | ---------- | ---------- |
| 23 | phone 1 | 1000 | 2022-01-01 | 2022-12-31 |
| 12 | notebook | 5000 | 2022-01-02 | 2022-12-31 |
| 56 | cup | 50 | 2022-01-02 | 2022-12-31 |
| 43 | glasses | 700 | 2022-01-01 | 2022-12-31 |
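To make the requirement concrete, here is a rough Python sketch of the calculation over the two sample tables: keep transactions from the last 30 days, price each one with the dict_item_prices row valid on the transaction date, then total per customer and pick the item with the largest spend. The report date of 2022-02-20, the inclusive date boundaries, and the function name last_month_report are all assumptions made for illustration; the question does not fix any of them.

```python
from collections import defaultdict
from datetime import date, timedelta

# (transaction_id, customer_id, item_id, item_number, transaction_dttm)
transactions = [
    (7765, 1, 23, 1, date(2022, 1, 15)),
    (1254, 2, 12, 4, date(2022, 2, 3)),
    (3332, 3, 56, 2, date(2022, 2, 15)),
    (7658, 1, 43, 1, date(2022, 3, 1)),
    (7231, 4, 56, 1, date(2022, 1, 15)),
    (7231, 2, 23, 2, date(2022, 1, 29)),
]
# (item_id, item_name, item_price, valid_from_dt, valid_to_dt)
prices = [
    (23, "phone 1", 1000, date(2022, 1, 1), date(2022, 12, 31)),
    (12, "notebook", 5000, date(2022, 1, 2), date(2022, 12, 31)),
    (56, "cup", 50, date(2022, 1, 2), date(2022, 12, 31)),
    (43, "glasses", 700, date(2022, 1, 1), date(2022, 12, 31)),
]

def last_month_report(report_date):
    """Return {customer_id: (amount_spent_lm, top_item_lm)} (hypothetical helper)."""
    since = report_date - timedelta(days=30)
    spent = defaultdict(lambda: defaultdict(int))  # customer -> item_name -> amount
    for _, customer, item, qty, dt in transactions:
        if not (since <= dt <= report_date):
            continue  # outside the 30-day window (boundaries assumed inclusive)
        for pid, name, price, valid_from, valid_to in prices:
            # Use the price row that was valid on the transaction date
            if pid == item and valid_from <= dt <= valid_to:
                spent[customer][name] += qty * price
    return {
        customer: (sum(items.values()), max(items, key=items.get))
        for customer, items in sorted(spent.items())
    }

report = last_month_report(date(2022, 2, 20))  # assumed report date
```

Customers with no transaction in the window never get an entry in spent, so they drop out of the result as the question requires.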

Address and smoothen noise in sensor data

I have sensor data as below, where the Data column contains 6 rows with the value 45 between preceding and following rows with the value 50. The requirement is to clean this data and impute 50 (the previous value) in the New_data column. Moreover, the noise records (shown as 45 in the table) may vary in how many rows they span or in their level.
Case 1 (sample data):
|Sl.no|Timestamp|Data|New_data|
| -- | ---------------- | -- | -- |
| 1 | 1/1/2021 0:00:00 | 50 | 50 |
| 2 | 1/1/2021 0:15:00 | 50 | 50 |
| 3 | 1/1/2021 0:30:00 | 50 | 50 |
| 4 | 1/1/2021 0:45:00 | 50 | 50 |
| 5 | 1/1/2021 1:00:00 | 50 | 50 |
| 6 | 1/1/2021 1:15:00 | 50 | 50 |
| 7 | 1/1/2021 1:30:00 | 50 | 50 |
| 8 | 1/1/2021 1:45:00 | 50 | 50 |
| 9 | 1/1/2021 2:00:00 | 50 | 50 |
| 10 | 1/1/2021 2:15:00 | 50 | 50 |
| 11 | 1/1/2021 2:30:00 | 45 | 50 |
| 12 | 1/1/2021 2:45:00 | 45 | 50 |
| 13 | 1/1/2021 3:00:00 | 45 | 50 |
| 14 | 1/1/2021 3:15:00 | 45 | 50 |
| 15 | 1/1/2021 3:30:00 | 45 | 50 |
| 16 | 1/1/2021 3:45:00 | 45 | 50 |
| 17 | 1/1/2021 4:00:00 | 50 | 50 |
| 18 | 1/1/2021 4:15:00 | 50 | 50 |
| 19 | 1/1/2021 4:30:00 | 50 | 50 |
| 20 | 1/1/2021 4:45:00 | 50 | 50 |
| 21 | 1/1/2021 5:00:00 | 50 | 50 |
| 22 | 1/1/2021 5:15:00 | 50 | 50 |
| 23 | 1/1/2021 5:30:00 | 50 | 50 |
I am thinking of grouping the data ordered by timestamp ascending (as below) and then applying a condition that checks group by group across a large sample: if group 1 is the same as group 3, replace group 2 with the group 1 values.
|Sl.no|Timestamp|Data|New_data|group|
| -- | ---------------- | -- | -- | - |
| 1 | 1/1/2021 0:00:00 | 50 | 50 | 1 |
| 2 | 1/1/2021 0:15:00 | 50 | 50 | 1 |
| 3 | 1/1/2021 0:30:00 | 50 | 50 | 1 |
| 4 | 1/1/2021 0:45:00 | 50 | 50 | 1 |
| 5 | 1/1/2021 1:00:00 | 50 | 50 | 1 |
| 6 | 1/1/2021 1:15:00 | 50 | 50 | 1 |
| 7 | 1/1/2021 1:30:00 | 50 | 50 | 1 |
| 8 | 1/1/2021 1:45:00 | 50 | 50 | 1 |
| 9 | 1/1/2021 2:00:00 | 50 | 50 | 1 |
| 10 | 1/1/2021 2:15:00 | 50 | 50 | 1 |
| 11 | 1/1/2021 2:30:00 | 45 | 50 | 2 |
| 12 | 1/1/2021 2:45:00 | 45 | 50 | 2 |
| 13 | 1/1/2021 3:00:00 | 45 | 50 | 2 |
| 14 | 1/1/2021 3:15:00 | 45 | 50 | 2 |
| 15 | 1/1/2021 3:30:00 | 45 | 50 | 2 |
| 16 | 1/1/2021 3:45:00 | 45 | 50 | 2 |
| 17 | 1/1/2021 4:00:00 | 50 | 50 | 3 |
| 18 | 1/1/2021 4:15:00 | 50 | 50 | 3 |
| 19 | 1/1/2021 4:30:00 | 50 | 50 | 3 |
| 20 | 1/1/2021 4:45:00 | 50 | 50 | 3 |
| 21 | 1/1/2021 5:00:00 | 50 | 50 | 3 |
| 22 | 1/1/2021 5:15:00 | 50 | 50 | 3 |
| 23 | 1/1/2021 5:30:00 | 50 | 50 | 3 |
Moreover, there is also a need for an exception: if the next group follows a similar pattern, the data should be retained as it is rather than changed.
Example: if group 1 and group 3 are the same, impute group 2 with the group 1 value.
But if group 2 and group 4 are the same, do not change group 3; retain the same data in New_data.
Case 2:
|Sl.no|Timestamp|Data|New_data|group|
| -- | ---------------- | -- | -- | - |
| 1 | 1/1/2021 0:00:00 | 50 | 50 | 1 |
| 2 | 1/1/2021 0:15:00 | 50 | 50 | 1 |
| 3 | 1/1/2021 0:30:00 | 50 | 50 | 1 |
| 4 | 1/1/2021 0:45:00 | 50 | 50 | 1 |
| 5 | 1/1/2021 1:00:00 | 50 | 50 | 1 |
| 6 | 1/1/2021 1:15:00 | 50 | 50 | 1 |
| 7 | 1/1/2021 1:30:00 | 50 | 50 | 1 |
| 8 | 1/1/2021 1:45:00 | 50 | 50 | 1 |
| 9 | 1/1/2021 2:00:00 | 50 | 50 | 1 |
| 10 | 1/1/2021 2:15:00 | 50 | 50 | 1 |
| 11 | 1/1/2021 2:30:00 | 45 | 50 | 2 |
| 12 | 1/1/2021 2:45:00 | 45 | 50 | 2 |
| 13 | 1/1/2021 3:00:00 | 45 | 50 | 2 |
| 14 | 1/1/2021 3:15:00 | 45 | 50 | 2 |
| 15 | 1/1/2021 3:30:00 | 45 | 50 | 2 |
| 16 | 1/1/2021 3:45:00 | 45 | 50 | 2 |
| 17 | 1/1/2021 4:00:00 | 50 | 50 | 3 |
| 18 | 1/1/2021 4:15:00 | 50 | 50 | 3 |
| 19 | 1/1/2021 4:30:00 | 50 | 50 | 3 |
| 20 | 1/1/2021 4:45:00 | 50 | 50 | 3 |
| 21 | 1/1/2021 5:00:00 | 50 | 50 | 3 |
| 22 | 1/1/2021 5:15:00 | 50 | 50 | 3 |
| 23 | 1/1/2021 5:30:00 | 50 | 50 | 3 |
| 24 | 1/1/2021 5:45:00 | 45 | 45 | 4 |
| 25 | 1/1/2021 6:00:00 | 45 | 45 | 4 |
| 26 | 1/1/2021 6:15:00 | 45 | 45 | 4 |
| 27 | 1/1/2021 6:30:00 | 45 | 45 | 4 |
| 28 | 1/1/2021 6:45:00 | 45 | 45 | 4 |
| 29 | 1/1/2021 7:00:00 | 45 | 45 | 4 |
| 30 | 1/1/2021 7:15:00 | 45 | 45 | 4 |
| 31 | 1/1/2021 7:30:00 | 45 | 45 | 4 |
Reaching out for help in coding in postgresql to address above scenario. Please feel free to suggest any alternative approaches to solve above problem.
The query below should answer the need.
The first query identifies the rows which correspond to a change of data.
The second query groups the rows between two successive changes of data and sets up the corresponding timestamp range.
The third query is a recursive query which calculates new_data iteratively, following the timestamp order.
The last query displays the expected result.
WITH RECURSIVE list AS
(
  SELECT no
       , timestamp
       , lag(data) OVER w AS previous
       , data
       , lead(data) OVER w AS next
       , data IS DISTINCT FROM lag(data) OVER w AS first
       , data IS DISTINCT FROM lead(data) OVER w AS last
    FROM sensors
  WINDOW w AS (ORDER BY timestamp ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
), range_list AS
(
  SELECT tsrange(timestamp, lead(timestamp) OVER w, '[]') AS range
       , previous
       , data
       , lead(next) OVER w AS next
       , first
    FROM list
   WHERE first OR last
  WINDOW w AS (ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
), rec_list (range, previous, data, next, new_data, arr) AS
(
  SELECT range
       , previous
       , data
       , next
       , data
       , array[range]
    FROM range_list
   WHERE previous IS NULL
  UNION ALL
  SELECT c.range
       , p.data
       , c.data
       , c.next
       , CASE
           WHEN p.new_data IS NOT DISTINCT FROM c.next
           THEN p.data
           ELSE c.data
         END
       , p.arr || c.range
    FROM rec_list AS p
   INNER JOIN range_list AS c
      ON lower(c.range) = upper(p.range) + interval '15 minutes'
   WHERE NOT array[c.range] <@ p.arr
     AND first
)
SELECT s.*, r.new_data
  FROM sensors AS s
 INNER JOIN rec_list AS r
    ON r.range @> s.timestamp
 ORDER BY timestamp
see the test result in dbfiddle
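As a cross-check of the rule outside SQL, here is a rough Python sketch of the same idea: split the series into runs of equal values, then walk the runs in order and replace a run with the previous run's new value when that value equals the value of the next run. The function name smooth is hypothetical, and this is one reading of the requirement in the question, not a translation of the recursive query.

```python
from itertools import groupby

def smooth(data):
    """Impute a run with the previous run's new value when that value
    matches the value of the following run (hypothetical helper)."""
    # Run-length encode the series: [(value, run_length), ...]
    runs = [(value, len(list(group))) for value, group in groupby(data)]
    new_data = []
    prev_new = None  # new value assigned to the previous run
    for idx, (value, length) in enumerate(runs):
        next_value = runs[idx + 1][0] if idx + 1 < len(runs) else None
        if prev_new is not None and prev_new == next_value:
            new_value = prev_new   # noise run between two matching runs
        else:
            new_value = value      # first run, last run, or no match: keep as is
        new_data.extend([new_value] * length)
        prev_new = new_value
    return new_data

case1 = [50] * 10 + [45] * 6 + [50] * 7
case2 = case1 + [45] * 8
new1 = smooth(case1)
new2 = smooth(case2)
```

In Case 2 the third run keeps its own value because the previous run's new value (50) differs from the fourth run (45), and the last run is retained because nothing follows it, which lines up with the New_data column in the question.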

Extracting all rows containing a specific datetime value (MATLAB)

I have a table which looks like this:
|Entry number|Timestamp|Value1|Value2|Value3|Value4|
| ---- | ---------------- | ---- | -- | ---- | ---- |
| 5758 | 28-06-2018 16:30 | 34 | 63 | 34.2 | 60.9 |
| 5759 | 28-06-2018 17:00 | 33.5 | 58 | 34.9 | 58.4 |
| 5758 | 28-06-2018 16:30 | 34 | 63 | 34.2 | 60.9 |
| 5759 | 28-06-2018 17:00 | 33.5 | 58 | 34.9 | 58.4 |
| 5760 | 28-06-2018 17:30 | 33 | 53 | 35.2 | 58.5 |
| 5761 | 28-06-2018 18:00 | 33 | 63 | 35 | 57.9 |
| 5762 | 28-06-2018 18:30 | 33 | 61 | 34.6 | 58.9 |
| 5763 | 28-06-2018 19:00 | 33 | 59 | 34.1 | 59.4 |
| 5764 | 28-06-2018 19:30 | 28 | 89 | 33.5 | 64.2 |
| 5765 | 28-06-2018 20:00 | 28 | 89 | 33 | 66.1 |
| 5766 | 28-06-2018 20:30 | 28 | 83 | 32.5 | 67 |
| 5767 | 28-06-2018 21:00 | 29 | 89 | 32.2 | 68.4 |
Here '28-06-2018 16:30' sits in a single column, so I have 6 columns:
Entry number, Timestamp, Value1, Value2, Value3, Value4
I want to extract all rows that belong to '28-06-2018', i.e. all data pertaining to that day. The table is too large to show in full here; the timestamps span a couple of months.
t=table([5758;5759],["28-06-2018 16:30";"29-06-2018 16:30"],[34;33.5],'VariableNames',{'Entry number','Timestamp','Value1'})
t =
  2×3 table
    Entry number        Timestamp         Value1
    ____________    __________________    ______
        5758        "28-06-2018 16:30"      34
        5759        "29-06-2018 16:30"     33.5
t(contains(t.('Timestamp'),"28-06"),:)
ans =
  1×3 table
    Entry number        Timestamp         Value1
    ____________    __________________    ______
        5758        "28-06-2018 16:30"      34
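One thing to watch with a substring match on "28-06" is that it would also pick up the 28th of June in any other year. A rough Python sketch of the alternative, parsing the timestamp and comparing the calendar date, is below. The third row (a 2019 entry) is made up purely to show the difference, and rows_on is a hypothetical helper name.

```python
from datetime import date, datetime

# (entry_number, timestamp, value1); the last row is invented for illustration
rows = [
    (5758, "28-06-2018 16:30", 34.0),
    (5759, "29-06-2018 16:30", 33.5),
    (9999, "28-06-2019 09:00", 30.0),
]

def rows_on(rows, day):
    """Keep rows whose parsed timestamp falls on the given calendar day."""
    return [
        row for row in rows
        if datetime.strptime(row[1], "%d-%m-%Y %H:%M").date() == day
    ]

by_date = rows_on(rows, date(2018, 6, 28))
by_substring = [row for row in rows if "28-06" in row[1]]
```

Here by_date holds only entry 5758, while by_substring also includes the invented 2019 row.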

Fetch records created within 24 hours in DB2

I need to fetch the records created within the last 24 hours. I wrote the query below, but it's not giving the desired result.
SELECT a, b, enddate, status
FROM data
WHERE a = '1013' AND c = '1250'
AND (TIMESTAMPDIFF(8, CHAR(TIMESTAMP(enddate) - TIMESTAMP(CURRENT_DATE)))) BETWEEN 0 AND 24
Below is the data present in the table
A B C Enddate
1013 Test1 1250 28-March-2020 11:00 AM
1013 Test2 1000 28-March-2020 15:00 PM
1013 Test3 1250 29-March-2020 05:00 AM
1013 Test4 1250 29-March-2020 13:00 PM
1013 Test5 2500 29-March-2020 17:00 PM
1013 Test6 1250 31-March-2020 19:00 PM
Assuming that CURRENT_DATE = 29-March-2020 19:00 PM the query should return 2 rows Test3 and Test4 . The above query does not return any row .
SELECT B, TS
FROM
(
VALUES
('Test1', TIMESTAMP('2020-03-28-11.00.00'))
, ('Test2', TIMESTAMP('2020-03-28-15.00.00'))
, ('Test3', TIMESTAMP('2020-03-29-05.00.00'))
, ('Test4', TIMESTAMP('2020-03-29-13.00.00'))
, ('Test5', TIMESTAMP('2020-03-29-17.00.00'))
, ('Test6', TIMESTAMP('2020-03-31-19.00.00'))
) T (B, TS)
WHERE TS BETWEEN TIMESTAMP('2020-03-29-19.00.00') - 24 HOURS AND TIMESTAMP('2020-03-29-19.00.00');
The result is:
|B |TS |
|-----|--------------------------|
|Test3|2020-03-29-05.00.00.000000|
|Test4|2020-03-29-13.00.00.000000|
|Test5|2020-03-29-17.00.00.000000|
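The window logic itself is easy to check outside DB2. Below is a rough Python sketch over the same six rows, with the current timestamp fixed at 2020-03-29 19:00 as in the question. It shows that the 24-hour window alone contains Test3, Test4 and Test5, and that adding the question's a = '1013' and c = '1250' conditions narrows this to Test3 and Test4. The variable names are my own, and the window boundaries are treated as inclusive to mirror BETWEEN.

```python
from datetime import datetime, timedelta

# (a, b, c, enddate) rows from the question
records = [
    ("1013", "Test1", "1250", datetime(2020, 3, 28, 11, 0)),
    ("1013", "Test2", "1000", datetime(2020, 3, 28, 15, 0)),
    ("1013", "Test3", "1250", datetime(2020, 3, 29, 5, 0)),
    ("1013", "Test4", "1250", datetime(2020, 3, 29, 13, 0)),
    ("1013", "Test5", "2500", datetime(2020, 3, 29, 17, 0)),
    ("1013", "Test6", "1250", datetime(2020, 3, 31, 19, 0)),
]

now = datetime(2020, 3, 29, 19, 0)   # assumed "current" timestamp
since = now - timedelta(hours=24)    # start of the 24-hour window

# Window only, like the VALUES example above
in_window = [b for a, b, c, end in records if since <= end <= now]

# Window plus the filters from the original query
filtered = [
    b for a, b, c, end in records
    if since <= end <= now and a == "1013" and c == "1250"
]
```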

kdb getting float from integer division

I have a table
id, turnover, qty
and I want to query
select sum turnover, sum qty, (sum turnover) div (sum qty) by id from Table
However, the resulting value from the division seems to be an int and shows 0 (as the unit price is a lot smaller than 1). I tried to cast the results into a float, but that doesn't help:
select sum turnover, sum qty, `float$(`float$(sum turnover) div `float$(sum qty)) by id from Table
How can I get a float in return?
Also, as a side question. How can I name the column (equivalently to sql select sum(x) as my_column_name ...)
That's the expected output from div, you should use % to divide numbers - which always returns a float.
q)200 div 8.5
23
q)200%8.5
23.52941
q)
Reference here;
Div: http://code.kx.com/q/ref/arith-integer/#div
%: http://code.kx.com/q/ref/arith-float/#divide
*edit
Apologies - forgot to reference the rest of your question. In your example, you are calculating the sum turnover and sum qty twice - you will want to avoid that, if you're dealing with a lot of records.
How is this;
q)show trade:([] id:(`$"A",'string[til 10]);turnover:10?til 10; qty:10?100+til 200)
id turnover qty
---------------
A0 4 152
A1 4 238
A2 2 298
A3 2 268
A4 7 246
A5 2 252
A6 0 279
A7 5 286
A8 7 245
A9 5 191
q)update toverq:sumT%sumQ from select sumT:sum turnover,sumQ:sum qty by id from trade
id| sumT sumQ toverq
--| ---------------------
A0| 4 152 0.02631579
A1| 4 238 0.01680672
A2| 2 298 0.006711409
A3| 2 268 0.007462687
A4| 7 246 0.02845528
A5| 2 252 0.007936508
A6| 0 279 0
A7| 5 286 0.01748252
A8| 7 245 0.02857143
A9| 5 191 0.02617801
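For comparison, the same distinction exists in Python, where // is floor division and / is true division. The sketch below is only an analogy to div versus % in q, not a statement about q internals. It also shows the per-id sums being computed once and then divided. The rows and the names totals and ratios are hypothetical.

```python
import math
from collections import defaultdict

# Floor division drops the fractional part; true division keeps it
floor_quotient = math.floor(200 / 8.5)   # whole-number result
true_quotient = 200 / 8.5                # float result
small_ratio_floor = 4 // 152             # a ratio below 1 collapses to 0
small_ratio_true = 4 / 152

# Hypothetical (id, turnover, qty) rows
rows = [("A0", 4, 152), ("A0", 2, 298), ("A1", 7, 246)]

# Sum turnover and qty once per id, then divide the two sums
totals = defaultdict(lambda: [0, 0])
for id_, turnover, qty in rows:
    totals[id_][0] += turnover
    totals[id_][1] += qty
ratios = {id_: t / q for id_, (t, q) in totals.items()}
```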