Functional update - multivariable function with dynamic columns - kdb

Any help with the following would be much appreciated!
I have two tables: table1 is a summary table whilst table2 is a list of all data points. I want to be able to summarise the information in table2 for each row in table1.
table1:flip `grp`constraint!(`a`b`c`d; 10 10 20 20);
table2:flip `grp`cat`constraint`val!(`a`a`a`a`a`b`b`b;`cl1`cl1`cl1`cl2`cl2`cl2`cl2`cl1; 10 10 10 10 10 10 20 10; 1 2 3 4 5 6 7 8);
function:{[grpL;constraintL;catL] first exec total: sum val from table2 where constraint=constraintL, grp=grpL,cat=catL};
update cl1:function'[grp;constraint;`cl1], cl2:function'[grp;constraint;`cl2] from table1;
The fourth line of this code achieves what I want for the two categories:cl1 and cl2
In table1 I want to name a new column with the name of the category (cl1, cl2, etc.) and I want the values in that column to be the output from running the function over that column.
However, I have hundreds of different categories, so don't want to have to list them out manually as in the fourth line. How would I pass in a list of categories, e.g. below?
`cl1`cl2`cl3

Sticking to your approach, you would just have to make your update statement functional and then iterate over the columns like so:
{![`table1;();0b;(1#x)!enlist ((';function);`grp;`constraint;1#x)]} each `cl1`cl2
Assuming you can amend table1 in place. If you must retain the original table1 then you can pass it by value though it will consume more memory
{![x;();0b;(1#y)!enlist ((';function);`grp;`constraint;1#y)]}/[table1;`cl1`cl2]
Another approach would be to aggregate, pivot and join though it's not necessarily a better solution as you get nulls rather than zeros
a:select sum val by cat,grp,constraint from table2
p:exec (exec distinct cat from a)#cat!val by grp,constraint from a
table1 lj p

There are several different methods you can look into.
The easiest method would be a functional update - http://code.kx.com/wiki/JB:QforMortals2/queries_q_sql#Functional_update
Below, though, should somewhat prove more useful, quicker and neater:
Your problem can be split into 2 parts. For the first part, you are looking to create a sum of each category by grp and constraint within table2. As for the second part, you are looking to join these results (the lookups) onto the corresponding records from table1.
You can create the necessary groups using by
q)exec val,cat by grp,constraint from table2
grp constraint| val cat
--------------| ------------------------------
a 10 | 1 2 3 4 5 `cl1`cl1`cl1`cl2`cl2
b 10 | 6 8 `cl2`cl1
b 20 | ,7 ,`cl2
Note though, this will only create nested lists of the columns in your select query
Next is to sum each of the cat groups
q)exec sum each val group cat by grp,constraint from table2
grp constraint|
--------------| ------------
a 10 | `cl1`cl2!6 9
b 10 | `cl2`cl1!6 8
b 20 | (,`cl2)!,7
Then, to create the cat's columns you can use a pivot like syntax - http://code.kx.com/wiki/Pivot
q)cats:asc exec distinct cat from table2
q)exec cats#sum each val group cat by grp,constraint from table2
grp constraint| cl1 cl2
--------------| -------
a 10 | 6 9
b 10 | 8 6
b 20 | 7
Now you can use this lookup table and index into each row from table1
q)(exec cats#sum each val group cat by grp,constraint from table2)[table1]
cl1 cl2
-------
6 9
8 6
To fill the nulls with zeros, use the carat symbol - http://code.kx.com/wiki/Reference/Caret
q)0^(exec cats#sum each val group cat by grp,constraint from table2)[table1]
cl1 cl2
-------
6 9
8 6
0 0
0 0
And now you can join on each row from table1 to your results using join-each
q)table1,'0^(exec cats#sum each val group cat by grp,constraint from table2)[table1]
grp constraint cl1 cl2
----------------------
a 10 6 9
b 10 8 6
c 20 0 0
d 20 0 0
HTH, Sean

This approach is the easiest way to pass in a list of categories
{table1^flip x!function'[table1`grp;table1`constraint;]each x}`cl1`cl2

Related

T-SQL vlookup with fake calendar table?

I am rather new in T-SQL and I have to create a view, where the output will be as shown below:
enter image description here
But my sales table doesn't have any data about sales in February and May for customer ABC and no data in January for customer XYZ, but I really want to have 0 for these months. How to do it in T-SQL?
This is great question about a very important topic that, even many experienced developers need to touch up on. Being "relatively new at SQL" I wont just offer a solution, I'll explain the key concepts involved.
The Auxiliary Table Numbers
First lets learn about what a tally table, aka numbers table is all about.
What does this do?
SELECT N = 1 ;
It returns the number 1.
N
-----
1
How about this?
SELECT N = 1 FROM (VALUES(0)) AS e(N);
Same thing:
N
-----
1
What does this return?
SELECT N = 1 FROM (VALUES(0),(0),(0),(0),(0),(0)) AS e(n);
Here I'm leveraging the VALUES table constructer which allows for a list of values to be treated like a view. This returns:
N
-------
1
1
1
1
1
We don't need the ones, we need the rows. This will make more sense in a moment. Now, what does this do?
WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1;
It returns the same thing, five 1's, but I've wrapped the code into a CTE named e. Think of CTEs as inline unnamed views that you can reference multiple times. Now lets CROSS JOIN e to itself. This returns for 25 dummy rows (5*5).
WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1, e e2;
Next we leverage ROW_NUMBER() over our set of dummy values.
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a;
Returns (truncated for brevity):
N
--------------------
1
2
3
...
24
25
Using as an auxiliary numbers table
#OneToTen is a table with random numbers 1 to 10. I need to count how many there are, returning 0 when there aren't any. NOTE MY COMMENTS:
;--== 2. Simple Use Case - Counting all numbers, including missing ones (missing = 0)
DECLARE #OneToTen TABLE (N INT);
INSERT #OneToTen VALUES(1),(2),(2),(2),(4),(8),(8),(10),(10),(10);
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT
N = i.N,
Wrong = COUNT(*), -- WRONG!!! Don't do THIS, this counts ALL rows returned
Correct = COUNT(t.N) -- Correct, this counts numbers from #OneToTen AKA "t.N"
FROM iTally AS i -- Aux Table of numbers
LEFT JOIN #OneToTen AS t -- Table to evaluate
ON i.N = t.N -- LEFT JOIN #OneToTen numbers to our Aux table of numbers
WHERE i.N <= 10 -- We only need the numbers 1 to 10
GROUP BY i.N; -- Group by with no Sort!!!
This returns:
N Wrong Correct
----- ----------- -----------
1 1 1
2 3 3
3 1 0
4 1 1
5 1 0
6 1 0
7 1 0
8 2 2
9 1 0
10 3 3
Note that I show you the wrong and right way to do this. Note how COUNT(*) is wrong for this, you need COUNT(whatever you are counting).
Auxiliary table of Dates (AKA calendar table)
My we use our numbers table to create a calendar table.
;--== 3. Auxilliary Month/Year Calendar Table
DECLARE #Start DATE = '20191001',
#End DATE = '20200301';
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT TOP(DATEDIFF(MONTH,#Start,#End)+1)
TheDate = f.Dt,
TheYear = YEAR(f.Dt),
TheMonth = MONTH(f.Dt),
TheWeekday = DATEPART(WEEKDAY,f.Dt),
DayOfTheYear = DATEPART(DAYOFYEAR,f.Dt),
LastDayOfMonth = EOMONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, #Start))) AS f(Dt)
This returns:
TheDate TheYear TheMonth TheWeekday DayOfTheYear LastDayOfMonth
---------- ----------- ----------- ----------- ------------ --------------
2019-10-01 2019 10 3 274 2019-10-31
2019-11-01 2019 11 6 305 2019-11-30
2019-12-01 2019 12 1 335 2019-12-31
2020-01-01 2020 1 4 1 2020-01-31
2020-02-01 2020 2 7 32 2020-02-29
2020-03-01 2020 3 1 61 2020-03-31
You will only need the YEAR and MONTH.
The Auxiliary Customer table
Because you are performing aggregations (SUM,COUNT,etc.) against multiple customers we will also need an Auxiliary table of customers, more commonly known as a lookup or dimension.
SAMPLE DATA:
;--== Sample Data
DECLARE #sale TABLE
(
Customer VARCHAR(10),
SaleYear INT,
SaleMonth TINYINT,
SaleAmt DECIMAL(19,2),
INDEX idx_cust(Customer)
);
INSERT #sale
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50), ('ABC',2020,3,250),
('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);
Distinct list of customers for an "Auxilliary Table of Customers"
SELECT DISTINCT s.Customer FROM #sale AS s;
For my sample data we get:
Customer
----------
ABC
CDF
FRED
Putting it all together
Here I'm going to:
Create a numbers table
Use my numbers table to create a calendar table
Create an auxiliary Customer table from #sale
CROSS JOIN (combine) both tables for a "junk dimension"
LEFT JOIN our sales data to our calendar/customer auxiliary tables/junk dimension
Group by the auxiliary table values
SOLUTION:
;--==== SAMPLE DATA
DECLARE #sale TABLE
(
Customer VARCHAR(10),
SaleYear INT,
SaleMonth TINYINT,
SaleAmt DECIMAL(19,2),
INDEX idx_cust(Customer)
);
INSERT #sale
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50), ('ABC',2020,3,250),
('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);
;--==== START/END DATEs
DECLARE #Start DATE = '20191001',
#End DATE = '20200301';
;--==== FINAL SOLUTION
WITH -- 6.1. Auxilliary Table of numbers:
E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a),
-- 6.2. Use numbers table to create an "Auxilliary Date Table" (Calendar Table):
MonthYear(SaleYear,SaleMonth) AS
(
SELECT TOP(DATEDIFF(MONTH,#Start,#End)+1) YEAR(f.Dt), MONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, #Start))) AS f(Dt)
)
SELECT
Customer = cust.Customer,
MonthYear = CONCAT(cal.SaleYear,'-',cal.SaleMonth),
Sales = ISNULL(SUM(s.SaleAmt),0)
-- Auxilliary Table of Customers
FROM (SELECT DISTINCT s.Customer FROM #sale AS s) AS cust -- 6.3. Aux Customer Table
CROSS JOIN MonthYear AS cal -- 6.4. Cross join to create Calendar/Customer Junk Dimension
LEFT JOIN #sale AS s -- 6.5. Join #sale to Junk Dimension on Year,Month and Customer
ON s.SaleYear = cal.SaleYear
AND s.SaleMonth = cal.SaleMonth
AND s.Customer = cust.Customer
GROUP BY cust.Customer, cal.SaleYear, cal.SaleMonth -- 6.6. Group by Junk Dim values
ORDER BY cust.Customer, cal.SaleYear, cal.SaleMonth; -- Order by not required
RESULTS:
Customer MonthYear Sales
---------- ------------ ------------
ABC 2019-10 0.00
ABC 2019-11 0.00
ABC 2019-12 410.00
ABC 2020-1 718.00
ABC 2020-2 0.00
ABC 2020-3 250.00
CDF 2019-10 200.00
CDF 2019-11 198.00
CDF 2019-12 0.00
CDF 2020-1 333.00
CDF 2020-2 5325.00
CDF 2020-3 1105.00
FRED 2019-10 0.00
FRED 2019-11 0.00
FRED 2019-12 0.00
FRED 2020-1 0.00
FRED 2020-2 0.00
FRED 2020-3 0.00

Storing output of a iterating function into unique tables KDB+/Q

I have a query function of the format:
function:{[x] select Y from table1 where date=x}
I want to iterate this function across all days stored in a separate table, like so
function each (select distinct date from table2)
and store every iteration into it's own, unique table. For example, if the first iteration was for 2021.11.10, all 'Y' values from table1 on that date are stored in a table named 'a', for the next iteration for date 2021.11.12, it goes to a table named 'b', or something like that. How can I do this?
Assume the following tables:
q)t1:([]date:.z.d + 0 0 0 1 1 1;a:1 2 3 4 5 6)
q)t2:([]date:.z.d+0 1)
then you could do something like:
q)function:{[x;y] x set select from t1 where date=y}
q)function'[`a`b;exec distinct date from t2]
`a`b
q)show a
date a
------------
2021.11.10 1
2021.11.10 2
2021.11.10 3
q)show b
date a
------------
2021.11.11 4
2021.11.11 5
2021.11.11 6
If you wanted to make it a bit more dynamic then you could also do something like:
q)function:{[x] (`$"tab",string[x] except ".") set select from t1 where date=x}
q)
q)
q)function each exec distinct date from t2
`tab20211110`tab20211111
q)show tab20211110
date a
------------
2021.11.10 1
2021.11.10 2
2021.11.10 3
q)show tab20211111
date a
------------
2021.11.11 4
2021.11.11 5
2021.11.11 6
The above creates new tables named for each date (we remove the "." from the resulting names as they could be confused for q's . operator)

PostgreSQL Query to get the output from same table with same table values

I have one table
id employee leave_days leave_type type
1 ABC 10 sick remove
2 ABC 20 sick add
3 ABC 15 Annual remove
4 ABC 50 Annual add
5 XYZ 10 sick remove
6 XYZ 20 sick add
7 XYZ 15 Annual remove
8 XYZ 50 Annual add
From the above table I will group by the column name called leave_type and then I will merge rows and the output should be as follows.
I have to group by column name leave_type and add new column called leave_allocated . In the leave_allocated column, the column type with value add only will come.
id employee leave_days leave_type leave_allocated
1 ABC 10 sick 20
2 ABC 15 Annual 50
3 XYZ 10 sick 20
4 XYZ 15 Annual 50
I tried with sub query I could not match the inner query with outer query .
This should help
SELECT id,
employee,
leave_dates,
leave type,
(SELECT leave_days
FROM TABLE t2
WHERE t2.id = t1.id
AND t2.type = 'add'
) leave_allocated
FROM TABLE t1
WHERE t1.type = 'remove'

combining results of CTEs

I have several CTEs. CTE1A counts number of type A shops in area 1. CTE1B counts number of type B shops in area 1 and so on up to CTE1D. Similarly, CTE2B counts number of type B shops in area 2 and so on. shop_types CTE selects all types of shops: A,B,C,D. How to display a table that shows for each area (column) how many shops of each type there is (rows).
For example:
1 2 3 4 5
A 0 7 4 0 0
B 2 3 8 2 9
C 8 5 8 1 6
D 7 1 5 4 3
Database has 2 tables:
Table regions: shop_id, region_id
Table shops: shop_id, shop_type
WITH
shop_types AS (SELECT DISTINCT shops.shop_type AS type FROM shops WHERE shops.shop_type!='-9999' AND shops.shop_type!='Other'),
cte1A AS (
SELECT regions.region_id, COUNT(regions.shop_id) AS shops_number, shops.shop_type
FROM regions
RIGHT JOIN shops
ON shops.shop_id=regions.shop_id
WHERE regions.region_id=1
AND shops.shop_type='A'
GROUP BY shops.shop_type,regions.region_id)
SELECT * FROM cte1A
I'm not entirely sure I understand why you are after, but it seems you are looking for something like this:
select sh.shop_type,
count(case when r.region_id = 1 then 1 end) as region_1_count,
count(case when r.region_id = 2 then 1 end) as region_2_count,
count(case when r.region_id = 3 then 1 end) as region_3_count
from shops sh
left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type;
You need to add one case statement for each region you want to have in the output.
If you are using Postgres 9.4 you can replace the case statements using a filter condition which kind of makes the intention a bit easier to understand (I think)
count(*) filter (where r.region_id = 1) as region_1_count,
count(*) filter (where r.region_id = 2) as region_2_count,
...
SQLFiddle: http://sqlfiddle.com/#!1/98391/1
And before you ask: no you can't make the number of columns "dynamic" based on a select statement. The column list for a query must be defined before the statement is actually executed.

T-SQL How to get all items from a tree in T-SQL?

I have a problem with a t-sql query.
Let's say I have a categories tree (categories ID)
cat_table
1
|
2-\
| 3-\
6 | 5
| 4 |
... ...
ads_table
ad_ID
category_ID
of course the category_ID column references to the ID column in the cat_table
the problem is, how to get (recursive ?) all advertisements from all categories which the top-most parent is the 1st category?
try using a recursive Common Table Expressions, aka "CTE" (available in SQL Server 2005 and up) like this:
--go through a nested table supervisor - user table and display the chain
DECLARE #Contacts table (id varchar(6), first_name varchar(10), reports_to_id varchar(6))
INSERT #Contacts VALUES ('1','Jerome', NULL ) -- tree is as follows:
INSERT #Contacts VALUES ('2','Joe' ,'1') -- 1-Jerome
INSERT #Contacts VALUES ('3','Paul' ,'2') -- / \
INSERT #Contacts VALUES ('4','Jack' ,'3') -- 2-Joe 9-Bill
INSERT #Contacts VALUES ('5','Daniel','3') -- / \ \
INSERT #Contacts VALUES ('6','David' ,'2') -- 3-Paul 6-David 10-Sam
INSERT #Contacts VALUES ('7','Ian' ,'6') -- / \ / \
INSERT #Contacts VALUES ('8','Helen' ,'6') -- 4-Jack 5-Daniel 7-Ian 8-Helen
INSERT #Contacts VALUES ('9','Bill ' ,'1') --
INSERT #Contacts VALUES ('10','Sam' ,'9') --
DECLARE #Root_id char(4)
--get 2 and below
SET #Root_id=2
PRINT '#Root_id='+COALESCE(''''+#Root_id+'''','null')
;WITH StaffTree AS
(
SELECT
c.id, c.first_name, c.reports_to_id, c.reports_to_id as Manager_id, cc.first_name AS Manager_first_name, 1 AS LevelOf
FROM #Contacts c
LEFT OUTER JOIN #Contacts cc ON c.reports_to_id=cc.id
WHERE c.id=#Root_id OR (#Root_id IS NULL AND c.reports_to_id IS NULL)
UNION ALL
SELECT
s.id, s.first_name, s.reports_to_id, t.id, t.first_name, t.LevelOf+1
FROM StaffTree t
INNER JOIN #Contacts s ON t.id=s.reports_to_id
WHERE s.reports_to_id=#Root_id OR #Root_id IS NULL OR t.LevelOf>1
)
SELECT * FROM StaffTree
output:
#Root_id='2 '
id first_name reports_to_id Manager_id Manager_first_name LevelOf
------ ---------- ------------- ---------- ------------------ -----------
2 Joe 1 1 Jerome 1
3 Paul 2 2 Joe 2
6 David 2 2 Joe 2
7 Ian 6 6 David 3
8 Helen 6 6 David 3
4 Jack 3 3 Paul 3
5 Daniel 3 3 Paul 3
(7 row(s) affected)
change #Root_id to get different output:
#Root_id=null
id first_name reports_to_id Manager_id Manager_first_name LevelOf
------ ---------- ------------- ---------- ------------------ -----------
1 Jerome NULL NULL NULL 1
2 Joe 1 1 Jerome 2
9 Bill 1 1 Jerome 2
10 Sam 9 9 Bill 3
3 Paul 2 2 Joe 3
6 David 2 2 Joe 3
7 Ian 6 6 David 4
8 Helen 6 6 David 4
4 Jack 3 3 Paul 4
5 Daniel 3 3 Paul 4
(10 row(s) affected)
There is an option to avoid recurrency in tree browsing queries. You can add 'Path' column to your categories tree. It should contain each element ancestors IDs delimited with some non-numeric character (like slash).
For example your "ID=4" category's path would look like that: "/1/2/3/"
Now when you join your ads table to categories you need to do following:
select * from ads_table
inner join cat_table on cat_table.ID = ads_table.category_ID
where cat_table.Path like '/1/%'
And that's your query.
You can read more on this topic on my blog post
Are you familiar with Common Table Expressions in SQL Server? One of the many uses a CTE has is to do recursive queries.
The following is one of the best articles I've found on the subject:
https://web.archive.org/web/20210927200924/http://www.4guysfromrolla.com/webtech/071906-1.shtml