Select distinct for all columns from keyed table - kdb

It seems we can not get distinct values from a keyed table in the same way as for unkeyed:
t:([a:1 2]b:3 4)
?[t;();0b;()] // keyed table
?[0!t;();1b;()] // unkeyed table
?[t;();1b;()] // err 'type
Why do we have this error here?

I suspect it's the same reason you can't run distinct on a dictionary - it's ambiguous. Do you intend to apply distinct to the keys or the values? I think kdb doesn't pick a side so it makes you do it yourself.
q)t:([]a:1 1 1 2 2;b:10 12 10 14 14)
q)select distinct from t
a b
----
1 10
1 12
2 14
q)select distinct from 1!t
'type
q)distinct `a`b`c!(1;"ab";enlist 1b)
'type

Related

KDB Select rows from a table based on one of its column while comparing it to another table

I have table1 as below.
num
value
1
10
2
15
3
20
table2
ver
value
1.0
5
2.0
15
3.0
18
Output should be as below. I need to select all rows from table1 such that table1.value <= table2.value.
num
value
1
10
2
15
I tried this, it's not working.
select from table1 where value <= (exec value from table2)
From a logical point of view what you're asking kdb to compare is:
10 15 20<=5 15 18
Because these are equal lengths, kdb assumes you mean pairwise comparison, aka
10<=5
15<=15
20<=18
to which it would return
q)10 15 20<=5 15 18
010b
What you actually seem to mean (based on your expected output) is 10 15 20<=max(5 15 18). So in that case you would want:
q)t1:([]num:1 2 3;val:10 15 20)
q)t2:([]ver:1 2 3.;val:5 15 18)
q)select from t1 where val<=exec max val from t2
num val
-------
1 10
2 15
As an aside, you can't/shouldn't have a column called value as it clashes with a keyword
value is a keyword so don't assign to it.
Assuming you want all values from table1 with value less than the max value in table2 you could do:
q)table1:([]num:til 3;val:10 15 20)
q)table2:([]ver:`float$til 3;val:5 15 18)
q)select from table1 where val<=max table2`val
num val
-------
0 10
1 15

Get distinct rows based on one column with T-SQL

I have a column in the following format:
Time Value
17:27 2
17:27 3
I want to get the distinct rows based on one column: Time. So my expected result would be one result. Either 17:27 3 or 17:27 3.
Distinct
T-SQL uses distinct on multiple columns instead of one. Distinct would return two rows since the combinations of Time and Value are unique (see below).
select distinct [Time], * from SAPQMDATA
would return
Time Value
17:27 2
17:27 3
instead of
Time Value
17:27 2
Group by
Also group by does not appear to work
select * from table group by [Time]
Will result in:
Column 'Value' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Questions
How can I select all unique 'Time' columns without taking into account other columns provided in a select query?
How can I remove duplicate entries?
This is where ROW_NUMBER will be your best friend. Using this as your sample data...
time value
-------------------- -----------
17:27 2
17:27 3
11:36 9
15:14 5
15:14 6
.. below are two solutions with that you can copy/paste/run.
DECLARE #youtable TABLE ([time] VARCHAR(20), [value] INT);
INSERT #youtable VALUES ('17:27',2),('17:27',3),('11:36',9),('15:14',5),('15:14',6);
-- The most elegant way solve this
SELECT TOP (1) WITH TIES t.[time], t.[value]
FROM #youtable AS t
ORDER BY ROW_NUMBER() OVER (PARTITION BY t.[time] ORDER BY (SELECT NULL));
-- A more efficient way solve this
SELECT t.[time], t.[value]
FROM
(
SELECT t.[time], t.[value], ROW_NUMBER() OVER (PARTITION BY t.[time] ORDER BY (SELECT NULL)) AS RN
FROM #youtable AS t
) AS t
WHERE t.RN = 1;
Each returns:
time value
-------------------- -----------
11:36 9
15:14 5
17:27 2

Replace content in 'order' column with sequential numbers

I have an 'order' column in a table in a postgres database that has a lot of missing numbers in the sequence. I am having a problem figuring out how to replace the numbers currently in the column, with new ones that are incremental (see examples).
What I have:
id order name
---------------
1 50 Anna
2 13 John
3 2 Bruce
4 5 David
What I want:
id order name
---------------
1 4 Anna
2 3 John
3 1 Bruce
4 2 David
The row containing the lowest order number in the old version of the column should get the new order number '1', the next after that should get '2' etc.
You can use the window function row_number() to calculate the new numbers. The result of that can be used in an update statement:
update the_table
set "order" = t.rn
from (
select id, row_number() over (order by "order") as rn
from the_table
) t
where t.id = the_table.id;
This assumes that id is the primary key of that table.

Functional update - multivariable function with dynamic columns

Any help with the following would be much appreciated!
I have two tables: table1 is a summary table whilst table2 is a list of all data points. I want to be able to summarise the information in table2 for each row in table1.
table1:flip `grp`constraint!(`a`b`c`d; 10 10 20 20);
table2:flip `grp`cat`constraint`val!(`a`a`a`a`a`b`b`b;`cl1`cl1`cl1`cl2`cl2`cl2`cl2`cl1; 10 10 10 10 10 10 20 10; 1 2 3 4 5 6 7 8);
function:{[grpL;constraintL;catL] first exec total: sum val from table2 where constraint=constraintL, grp=grpL,cat=catL};
update cl1:function'[grp;constraint;`cl1], cl2:function'[grp;constraint;`cl2] from table1;
The fourth line of this code achieves what I want for the two categories:cl1 and cl2
In table1 I want to name a new column with the name of the category (cl1, cl2, etc.) and I want the values in that column to be the output from running the function over that column.
However, I have hundreds of different categories, so don't want to have to list them out manually as in the fourth line. How would I pass in a list of categories, e.g. below?
`cl1`cl2`cl3
Sticking to your approach, you would just have to make your update statement functional and then iterate over the columns like so:
{![`table1;();0b;(1#x)!enlist ((';function);`grp;`constraint;1#x)]} each `cl1`cl2
Assuming you can amend table1 in place. If you must retain the original table1 then you can pass it by value though it will consume more memory
{![x;();0b;(1#y)!enlist ((';function);`grp;`constraint;1#y)]}/[table1;`cl1`cl2]
Another approach would be to aggregate, pivot and join though it's not necessarily a better solution as you get nulls rather than zeros
a:select sum val by cat,grp,constraint from table2
p:exec (exec distinct cat from a)#cat!val by grp,constraint from a
table1 lj p
There are several different methods you can look into.
The easiest method would be a functional update - http://code.kx.com/wiki/JB:QforMortals2/queries_q_sql#Functional_update
Below, though, should somewhat prove more useful, quicker and neater:
Your problem can be split into 2 parts. For the first part, you are looking to create a sum of each category by grp and constraint within table2. As for the second part, you are looking to join these results (the lookups) onto the corresponding records from table1.
You can create the necessary groups using by
q)exec val,cat by grp,constraint from table2
grp constraint| val cat
--------------| ------------------------------
a 10 | 1 2 3 4 5 `cl1`cl1`cl1`cl2`cl2
b 10 | 6 8 `cl2`cl1
b 20 | ,7 ,`cl2
Note though, this will only create nested lists of the columns in your select query
Next is to sum each of the cat groups
q)exec sum each val group cat by grp,constraint from table2
grp constraint|
--------------| ------------
a 10 | `cl1`cl2!6 9
b 10 | `cl2`cl1!6 8
b 20 | (,`cl2)!,7
Then, to create the cat's columns you can use a pivot like syntax - http://code.kx.com/wiki/Pivot
q)cats:asc exec distinct cat from table2
q)exec cats#sum each val group cat by grp,constraint from table2
grp constraint| cl1 cl2
--------------| -------
a 10 | 6 9
b 10 | 8 6
b 20 | 7
Now you can use this lookup table and index into each row from table1
q)(exec cats#sum each val group cat by grp,constraint from table2)[table1]
cl1 cl2
-------
6 9
8 6
To fill the nulls with zeros, use the carat symbol - http://code.kx.com/wiki/Reference/Caret
q)0^(exec cats#sum each val group cat by grp,constraint from table2)[table1]
cl1 cl2
-------
6 9
8 6
0 0
0 0
And now you can join on each row from table1 to your results using join-each
q)table1,'0^(exec cats#sum each val group cat by grp,constraint from table2)[table1]
grp constraint cl1 cl2
----------------------
a 10 6 9
b 10 8 6
c 20 0 0
d 20 0 0
HTH, Sean
This approach is the easiest way to pass in a list of categories
{table1^flip x!function'[table1`grp;table1`constraint;]each x}`cl1`cl2

T-SQL table variable data order

I have a UDF which returns table variable like
--
--
RETURNS #ElementTable TABLE
(
ElementID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
ElementValue VARCHAR(MAX)
)
AS
--
--
Is the order of data in this table variable guaranteed to be same as the order data is inserted into it. e.g. if I issue
INSERT INTO #ElementTable(ElementValue) VALUES ('1')
INSERT INTO #ElementTable(ElementValue) VALUES ('2')
INSERT INTO #ElementTable(ElementValue) VALUES ('3')
I expect data will always be returned in that order when I say
select ElementValue from #ElementTable --Here I don't use order by
EDIT:
If order by is not guaranteed then the following query
SELECT T1.ElementValue,T2.ElementValue FROM dbo.MyFunc() T1
Cross Apply dbo.MyFunc T2
order by t1.elementid
will not produce 9x9 matrix as
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
consistently.
Is there any possibility that it could be like
1 2
1 1
1 3
2 3
2 2
2 1
3 1
3 2
3 3
How to do it using my above function?
No, the order is not guaranteed to be the same.
Unless, of course you are using ORDER BY. Then it is guaranteed to be the same.
Given your update, you obtain it in the obvious way - you ask the system to give you the results in the order you want:
SELECT T1.ElementValue,T2.ElementValue FROM dbo.MyFunc() T1
Cross join dbo.MyFunc() T2
order by t1.elementid, t2.elementid
You are guaranteed that if you're using inefficient single row inserts within your UDF, that the IDENTITY values will match the order in which the individual INSERT statements were specified.
Order is not guaranteed.
But if all you want is just simply to get your records back in the same order you inserted them, then just order by your primary key. Since you already have that field setup as an auto-increment, it should suffice.
...or use a deterministic function
SELECT TOP 9
M1 = (ROW_NUMBER() OVER(ORDER BY id) + 2) / 3,
M2 = (ROW_NUMBER() OVER(ORDER BY id) + 2) % 3 + 1
FROM
sysobjects
M1 M2
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3