kdb+: function with two arguments from columns - kdb

I have a function that does something with a date and a function that takes two arguments to perform a calculation. For now let's assume that they look as follows:
d:{[x] :x.hh}
f:{[x;y] :x+y}
Now I want to use function f in a query as follows:
select f each (columnOne,d[columnTwo]) from myTable
Hence, I first want to convert one column to the corresponding numbers using function d. Then, using both columnOne and the output of d[columnTwo], I want to calculate the outcome of f.
Clearly, the approach above does not work, as it fails with a 'rank error.
I've also tried select f ./: (columnOne,'d[columnTwo]) from myTable, which also doesn't work.
How do I do this? Note that I need to input columnOne and columnTwo into f such that the corresponding rows still match. E.g. input row 1 of columnOne and row 1 of columnTwo simultaneously into f.

I've also tried select f ./: (columnOne,'d[columnTwo]) from myTable, which also doesn't work.
You're very close with that code. The issue is the d function, in particular the x.hh within function d - the .hh notation doesn't work in this context, and you will need to do `hh$x instead, so d becomes:
d:{[x] :`hh$x}
So making only this change to the above code, we get:
q)d:{[x] :`hh$x}
q)f:{[x;y] :x+y}
q)myTable:([] columnOne:10?5; columnTwo:10?.z.t);
q)update res:f ./: (columnOne,'d[columnTwo]) from myTable
columnOne columnTwo res
--------------------------
1 21:10:45.900 22
0 20:23:25.800 20
2 19:03:52.074 21
4 00:29:38.945 4
1 04:30:47.898 5
2 04:07:38.923 6
0 06:22:45.093 6
1 19:06:46.591 20
1 10:07:47.382 11
2 00:45:40.134 2
(I've changed select to update so you can see other columns in result table)
Other syntax to achieve the same:
q)update res:f'[columnOne;d columnTwo] from myTable
columnOne columnTwo res
--------------------------
1 21:10:45.900 22
0 20:23:25.800 20
2 19:03:52.074 21
4 00:29:38.945 4
1 04:30:47.898 5
2 04:07:38.923 6
0 06:22:45.093 6
1 19:06:46.591 20
1 10:07:47.382 11
2 00:45:40.134 2
Only other note worthy point - in the above example, function d is vectorised (works with vector arg), if this wasn't the case, you'd need to change d[columnTwo] to d each columnTwo (or d'[columnTwo])
This would then result in one of the following queries:
select res:f'[columnOne;d'[columnTwo]] from myTable
select res:f ./: (columnOne,'d each columnTwo) from myTable
select res:f ./: (columnOne,'d'[columnTwo]) from myTable

Related

select all columns with suffix _test in q kdb

I have a partitioned table, similar to below table:
q)t:([]date:3#2019.01.01; a:1 2 3; a_test:2 3 4; b_test:3 4 5; c: 6 7 8);
date a a_test b_test c
----------------------------
2019.01.01 1 2 3 6
2019.01.01 2 3 4 7
2019.01.01 3 4 5 8
Now, I want to fetch date column and all columns have names with suffix "_test" from table t.
Expected output:
date a_test b_test
------------------------
2019.01.01 2 3
2019.01.01 3 4
2019.01.01 4 5
In my original table, there are more than 100 columns with name having _test so below is not a practical solution in this case.
q)select date, a_test, b_test from t where date=2019.01.01
I tried various options like below, but of no use:
q)delete all except date, *_test from select from t where date=2019.01.01
If the columns you are selecting are variable then you should use a functional qSQL statement to perform the query. The following can be used in your case
q)query:{[tab;dt;c]?[tab;enlist (=;`date;dt);0b;(`date,c)!`date,c]}
q)query[t;2019.01.01;cols[t] where cols[t] like "*_*"]
date a_test b_test
------------------------
2019.01.01 2 3
2019.01.01 3 4
2019.01.01 4 5
In order to craft a particular functional statement, you can parse your query, putting dummy columns in place if you aren't sure what they should be
q)parse "select date,c1,c2 from tab where date=dt"
?
`tab
,,(=;`date;`dt)
0b
`date`c1`c2!`date`c1`c2
A functional select is probably the best way to go here if you require adding further filters.
?[`t;();0b;{x!x}`date,exec c from meta t where c like "*_test"]
The functional form of any select quesry can be obtained by using the -5! operator on any SQL style statement.
In the example below I have created a table with 20 fields, each one beginning with either a or b.
I then use the functional form to define which fields I want.
q)tab:{[x] enlist x!count[x]#0}`$"_" sv ' raze string `a`b,/:\:til 10
q){[t;s]?[t;();0b;{[x] x!x} cols[t] where cols[t] like s]}[tab;"b*"]
b_0 b_1 b_2 b_3 b_4 b_5 b_6 b_7 b_8 b_9
---------------------------------------
0 0 0 0 0 0 0 0 0 0
q){[t;s]?[t;();0b;{[x] x!x} cols[t] where cols[t] like s]}[tab;"a*"]
a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9
---------------------------------------
0 0 0 0 0 0 0 0 0 0
q)-5!" select a,b from c"
?
`c
()
0b
`a`b!`a`b
Alternatively, if I don't require any filtering I can use the # operator as in below:
{[x;s] (cols[x] where cols[x] like s)#x}[ tab;"a*"]

SQL Renumbering index after group by

I have the following input table:
Seq Group GroupSequence
1 0
2 4 A
3 4 B
4 4 C
5 0
6 6 A
7 6 B
8 0
Output table is:
Line NewSeq GroupSequence
1 1
2 2 A
3 2 B
4 2 C
5 3
6 4 A
7 4 B
8 5
The rules for the input table are:
Any positive integer in the Group column indicates that the rows are grouped together. The entire field may be NULL or blank. A null or 0 indicates that the row is processed on its own. In the above example there are two groups and three 'single' rows.
the GroupSequence column is a single character that sorts within the group. NULL, blank, 'A', 'B' 'C' 'D' are the only characters allowed.
if Group has a positive integer, there must be alphabetic character in GroupSequence.
I need a query that creates the output table with a new column that sequences as shown.
External apps needs to iterate through this table in either Line or NewSeq order(same order, different values)
I've tried variations on GROUP BY, PARTITION BY, OVER(), etc. WITH no success.
Any help much appreciated.
Perhaps this will help
The only trick here is Flg which will indicate a new Group Sequence (values will be 1 or 0). Then it is a small matter to sum(Flg) via a window function.
Edit - Updated Flg method
Example
Declare #YourTable Table ([Seq] int,[Group] int,[GroupSequence] varchar(50))
Insert Into #YourTable Values
(1,0,null)
,(2,4,'A')
,(3,4,'B')
,(4,4,'C')
,(5,0,null)
,(6,6,'A')
,(7,6,'B')
,(8,0,null)
Select Line = Row_Number() over (Order by Seq)
,NewSeq = Sum(Flg) over (Order By Seq)
,GroupSequence
From (
Select *
,Flg = case when [Group] = lag([Group],1) over (Order by Seq) then 0 else 1 end
From #YourTable
) A
Order By Line
Returns
Line NewSeq GroupSequence
1 1 NULL
2 2 A
3 2 B
4 2 C
5 3 NULL
6 4 A
7 4 B
8 5 NULL

kdb how to calculate rolling count

Assume I have a table of events, with Timestamp and Type.
t1, 'b'
t2, 'x'
t3, 's'
t4, 'b'
How can I get a rolling count such that it would give me a list of all timestamps and the cummulative number of events up to taht ts, sort of like a count version of sums
for example for 'b' I d like a table
't1', 1
't2', 1
't3', 1
't4', 2
Here is one way to do it, although there may be a more clever way this uses sums:
//table definition
tab:([]a:`t1`t2`t3`t4;b:"bxsb")
//rolling sum of 1 by column b
update sums count[i]#1 by b from tab
Results in:
a b x
------
t1 b 1
t2 x 1
t3 s 1
t4 b 2
If you wanted replace b you would simply put b: in front of the sums .
One way:
q)t:([]p:asc 4?.z.p+til 1000;t:`b`x`s`b)
q)asc `p xcols ungroup select p,til count i by t from t
p t x
---------------------------------
2017.05.16D09:42:48.259062090 b 0
2017.05.16D09:42:48.259062585 x 0
2017.05.16D09:42:48.259062683 s 0
2017.05.16D09:42:48.259062858 b 1
Ps: Note I have started the sequence at 0 as if to say "I've had 0 events prior to this row" instead of beginning at 1 as per your example. It goes with your req "number of events up to that ts". If you need 1, just add 1 '1+til count i'. Also ensure your time is sorted so as it makes sense when beginning the sequence.
With table t as below:
q)show t: ([]ts:.z.t - desc "u"$(til 4);symb:`b`x`z`b)
ts symb
-----------------
09:46:56.384 b
09:47:56.384 x
09:48:56.384 z
09:49:56.384 b
using a vector conditional:
q)select ts, cum_count:sums ?[symb=`b;1;0] from t
ts cum_count
----------------------
09:46:56.384 1
09:47:56.384 1
09:48:56.384 1
09:49:56.384 2
The same, but with a function taking symb as a parameter:
q){select ts, cum_count:sums ?[symb=x;1;0] from t}[`b]
ts cum_count
----------------------
09:46:56.384 1
09:47:56.384 1
09:48:56.384 1
09:49:56.384 2
In fact you don't need a vector conditional because you can just sum the booleans directly:
q){select ts, cum_count:sums symb=x from t}[`b]
ts cum_count
----------------------
09:46:56.384 1
09:47:56.384 1
09:48:56.384 1
09:49:56.384 2
This also works
update x:1+til count i by b from tab

combining results of CTEs

I have several CTEs. CTE1A counts number of type A shops in area 1. CTE1B counts number of type B shops in area 1 and so on up to CTE1D. Similarly, CTE2B counts number of type B shops in area 2 and so on. shop_types CTE selects all types of shops: A,B,C,D. How to display a table that shows for each area (column) how many shops of each type there is (rows).
For example:
1 2 3 4 5
A 0 7 4 0 0
B 2 3 8 2 9
C 8 5 8 1 6
D 7 1 5 4 3
Database has 2 tables:
Table regions: shop_id, region_id
Table shops: shop_id, shop_type
WITH
shop_types AS (SELECT DISTINCT shops.shop_type AS type FROM shops WHERE shops.shop_type!='-9999' AND shops.shop_type!='Other'),
cte1A AS (
SELECT regions.region_id, COUNT(regions.shop_id) AS shops_number, shops.shop_type
FROM regions
RIGHT JOIN shops
ON shops.shop_id=regions.shop_id
WHERE regions.region_id=1
AND shops.shop_type='A'
GROUP BY shops.shop_type,regions.region_id)
SELECT * FROM cte1A
I'm not entirely sure I understand why you are after, but it seems you are looking for something like this:
select sh.shop_type,
count(case when r.region_id = 1 then 1 end) as region_1_count,
count(case when r.region_id = 2 then 1 end) as region_2_count,
count(case when r.region_id = 3 then 1 end) as region_3_count
from shops sh
left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type;
You need to add one case statement for each region you want to have in the output.
If you are using Postgres 9.4 you can replace the case statements using a filter condition which kind of makes the intention a bit easier to understand (I think)
count(*) filter (where r.region_id = 1) as region_1_count,
count(*) filter (where r.region_id = 2) as region_2_count,
...
SQLFiddle: http://sqlfiddle.com/#!1/98391/1
And before you ask: no you can't make the number of columns "dynamic" based on a select statement. The column list for a query must be defined before the statement is actually executed.

Insert rownumber repeatedly in records in t-sql

I want to insert a row number in a records like counting rows in a specific number of range. example output:
RowNumber ID Name
1 20 a
2 21 b
3 22 c
1 23 d
2 24 e
3 25 f
1 26 g
2 27 h
3 28 i
1 29 j
2 30 k
I rather to try using the rownumber() over (partition by order by column name) but my real records are not containing columns that will count into 1-3 rownumber.
I already try to loop each of record to insert a row count 1-3 but this loop affects the performance of the query. The query will use for the RDL report, that is why as much as possible the performance of the query must be good.
any suggestions are welcome. Thanks
have you tried modulo-ing rownumber()?
SELECT
((row_number() over (order by ID)-1) % 3) +1 as RowNumber
FROM table