Need t-sql view with difficult sort order - tsql

I have a sorting issue in a SQL Server 2017 view. To simplify the question: I have a table with hierarchical data and two columns, key and txt. The key column defines the hierarchy and is one, two or three characters long. The txt column just holds random text values. I need to sort the data on a combination of the key and txt columns. To be more precise, I need to get from the left view (sorted on the key column) to the right view (the sort I need):
key  txt  |  key  txt
A    de   |  A    de
A1   al   |  A1   al
A2   nl   |  A3   gt
A3   gt   |  A31  oj
A31  oj   |  A2   nl
B    pf   |  B    pf
B1   zf   |  B4   ar
B2   br   |  B42  cd
B3   qa   |  B41  ik
B31  lb   |  B2   br
B32  bn   |  B3   qa
B33  kt   |  B32  bn
B4   ar   |  B33  kt
B41  ik   |  B31  lb
B42  cd   |  B1   zf
So the view should first show the top level (key is one character) and then below that row the txt values alphabetically (key is two characters). But if the key has three characters, the rows must be placed alphabetically under the matching key with two characters. In the example above, row with key A31 must be listed directly under the row with key A3, row with key B42 must be directly below B4 and B41 below B42, etc.
I have tried many things, but I cannot get the rows with the three character keys to appear directly under the proper two character key rows.
This is an example of what I tried:
SELECT *
FROM tbl
ORDER BY CASE LEN(key) WHEN 1 THEN key
WHEN 2 THEN LEFT(key, 1) + '10'
ELSE LEFT(key, 1) + '20'
END, txt
But this places the rows with three character keys at the bottom of the list...
Hope someone can put me in the right direction.

This is a really complicated process because your rules are more complicated than your schema. Here's my attempt, using window functions to group things together and determine which 2-character substring has the lowest txt value, then perform a series of ordering conditionals:
WITH cte AS
(
    SELECT [key],
           l  = LEN([key]),
           k1 = LEFT([key],1),
           k2 = LEFT([key],2),
           txt
    FROM dbo.YourTableName
),
cte2 AS
(
    SELECT *,
           LowestTxt = MIN(CASE WHEN l = 2 THEN txt END) OVER (PARTITION BY k2),
           Len2RN    = ROW_NUMBER() OVER (PARTITION BY k2
                           ORDER BY CASE WHEN l = 2 THEN txt ELSE 'zzzzz' END)
    FROM cte
)
SELECT [key], txt
FROM cte2
ORDER BY k1,
         CASE WHEN l > 1 THEN 1 END,
         LowestTxt,
         CASE WHEN l = 2 THEN 'aaa' ELSE txt END,
         Len2RN;
Example in this working fiddle.
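The ordering rule from the question can also be modelled outside SQL, which may help when checking a query's output. The following is a minimal Python sketch, not the answer's method: the names `rows`, `txt_by_key` and `sort_key` are made up for illustration, and it assumes every three-character key has a matching two-character parent row.

```python
# Rows from the question as (key, txt) pairs, deliberately shuffled.
rows = [
    ("B42", "cd"), ("A", "de"), ("B1", "zf"), ("A31", "oj"), ("B", "pf"),
    ("A2", "nl"), ("B3", "qa"), ("B41", "ik"), ("A1", "al"), ("B33", "kt"),
    ("B2", "br"), ("A3", "gt"), ("B32", "bn"), ("B4", "ar"), ("B31", "lb"),
]

# Lookup so a three-character key can find the txt of its two-character parent.
txt_by_key = dict(rows)


def sort_key(row):
    """Build a tuple that sorts a row into the hierarchy described in the question."""
    key, txt = row
    if len(key) == 1:
        # Top-level row: first within its letter.
        return (key, 0)
    parent = key[:2]
    parent_txt = txt_by_key[parent]  # assumption: the parent row exists
    if len(key) == 2:
        # Two-character row: ordered by its own txt, ahead of its children.
        return (key[0], 1, parent_txt, parent, 0)
    # Three-character row: grouped under its parent, then ordered by its own txt.
    return (key[0], 1, parent_txt, parent, 1, txt)


ordered = sorted(rows, key=sort_key)
for key, txt in ordered:
    print(key, txt)
```

The tuple plays the same role as the multi-part ORDER BY: each element only matters when all earlier elements tie.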

How to find duplicates in associated fields in PostgreSQL?

I have table in postgresql that has the following values:
KEY VALNO
1 a1
2 x1
3 x2
4 a3
5 a1
6 x2
7 a4
8 a5
9 x6
4 x7
7 a6
KEY is expected to be unique, but there are duplicates (4, 7). Each VALNO should be assigned to a single KEY, but some VALNO values appear under multiple KEYs (a1 is used by both 1 and 5, x2 by both 3 and 6).
I tried the following SQL to find the duplicates, but it did not work.
select KEY, VALNO from mbs m1
where (select count(*) from mbs m2
where m1.KEY = m2.KEY) > 1
order by KEY
Is there a better way to find which VALNO values are used with different KEYs, and which KEYs are used with different VALNO values?
For example:
Duplicate VALNO
KEY VALNO
1 a1
5 a1
3 x2
6 x2
Duplicate KEY
KEY VALNO
4 x7
7 a6
For VALNO duplicate records, we can use COUNT as an analytic function:
WITH cte AS (
    SELECT *, COUNT(*) OVER (PARTITION BY VALNO) AS cnt
    FROM mbs
)
-- key is a non-reserved word in PostgreSQL, so it needs no quoting;
-- write "KEY" only if the column was created as a quoted upper-case identifier
SELECT key, VALNO
FROM cte
WHERE cnt > 1;
The logic for the KEY duplicate query is almost identical, except that we use this for the count definition:
COUNT(*) OVER (PARTITION BY key) AS cnt
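To see what the two window counts select, here is a rough Python equivalent using the sample data from the question. It is only a sketch: `rows`, `dup_valno` and `dup_key` are illustrative names, and `Counter` stands in for `COUNT(*) OVER (PARTITION BY ...)`.

```python
from collections import Counter

# (KEY, VALNO) pairs from the question.
rows = [
    (1, "a1"), (2, "x1"), (3, "x2"), (4, "a3"), (5, "a1"), (6, "x2"),
    (7, "a4"), (8, "a5"), (9, "x6"), (4, "x7"), (7, "a6"),
]

# One count per partition, like COUNT(*) OVER (PARTITION BY ...).
key_counts = Counter(key for key, _ in rows)
valno_counts = Counter(valno for _, valno in rows)

# Keep every row whose partition holds more than one row.
dup_valno = sorted(
    (row for row in rows if valno_counts[row[1]] > 1),
    key=lambda row: (row[1], row[0]),
)
dup_key = sorted(row for row in rows if key_counts[row[0]] > 1)

print(dup_valno)
print(dup_key)
```

Note that this keeps every row of a duplicated group, so both rows for KEY 4 and both rows for KEY 7 are returned, whereas the sample output in the question lists only one row per duplicated KEY.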

PostgreSQL: generate all possible strings of arbitrary length from a set of characters

I'm trying to generate all possible strings of length x using a fixed set of characters in PostgreSQL. For the simple case of x = 2 I can use the query below, but I cannot figure out how to do it for an arbitrary length (I'm assuming this will involve recursion):
with characters (c) as (
select unnest(array['a', 'b', 'c', 'd'])
)
select concat(c1.c, c2.c)
from characters c1
cross join characters c2
This generates aa, ab, ac, ad, ba, bb, bc, bd, etc.
Using recursive CTE:
with recursive characters (c) as (
    select unnest(array['a', 'b', 'c', 'd'])
), param(val) AS (
    VALUES (4) -- target string length goes here (must be at least 2)
), cte AS (
    -- anchor: all strings of length 2
    select concat(c1.c, c2.c) AS c, 2 AS l
    from characters c1
    cross join characters c2
    UNION ALL
    -- recursive step: append one character until the target length is reached
    SELECT CONCAT(c1.c, c2.c), l + 1
    FROM cte c1
    CROSS JOIN characters c2
    WHERE l < (SELECT val FROM param)
)
SELECT c
FROM cte
WHERE l = (SELECT val FROM param)
ORDER BY c;
db<>fiddle demo
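For comparison, the same result set is a repeated Cartesian product of the character set with itself. A small Python sketch (the function name `all_strings` is made up for illustration) can be used to check the row count, which should be the number of characters raised to the power of the length:

```python
from itertools import product


def all_strings(chars, length):
    """Return every string of the given length over chars, in lexicographic order of chars."""
    return ["".join(combo) for combo in product(chars, repeat=length)]


result = all_strings("abcd", 3)
print(len(result))   # 4 ** 3 strings
print(result[:5])
```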

Need to split column into rows and columns

I have a table like this:
ID cst
1 string1;3;string2;string3;34;string4;-1;string5;string6;12;string7;5;string8,string9, 65
2 string10;-3;string11;string12;56;string13;6;string14;string15;9
etc.
Now I want to split the cst column into 5 columns and multiple rows.
So like this:
ID C1 C2 C3 C4 C5
1 string1 3 string2 string3 34
1 string4 -1 string5 string6 12
1 string7 5 string8 string9 65
2 string10 -3 string11 string12 56
2 string13 6 string14 string15 9
etc.
How to accomplish this? I am on SQL-server 2017, so I can use the string_split function. The problem with this function is that it produces only one output column...
Preferably I would like to create a UDF that outputs a table. The function would take these input parameters: the string, the separator character, and the number of columns, so it can be used dynamically with a varying number of columns.
ps. the strings can be of variable length of course.
Try it like this:
Hint: There are some "normal" commas in your sample data.
I suspected these were mistakes and used semicolons.
If this is wrong, you might use a general REPLACE() to turn "," into ";".
Create a temporary mockup table to simulate your issue
CREATE TABLE #tbl(ID INT, cst VARCHAR(1000));
INSERT INTO #tbl(ID,cst)
VALUES(1,'string1;3;string2;string3;34;string4;-1;string5;string6;12;string7;5;string8;string9; 65')
,(2,'string10;-3;string11;string12;56;string13;6;string14;string15;9');
--The query (for almost any version of SQL-Server, find v2017+ as UPDATE below)
WITH cte AS
(
SELECT t.ID
,B.Nr
,A.Casted.value('(/x[sql:column("B.Nr")]/text())[1]','varchar(max)') AS ValueAtPosition
,(B.Nr-1) % 5 AS Position
,(B.Nr-1)/5 AS GroupingKey
FROM #tbl t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.cst,';','</x><x>') + '</x>' AS XML)) A(Casted)
CROSS APPLY(SELECT TOP(A.Casted.value('count(x)','int')) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) FROM master..spt_values) B(Nr)
)
SELECT ID
,GroupingKey
,MAX(CASE WHEN Position=0 THEN ValueAtPosition END) AS C1
,MAX(CASE WHEN Position=1 THEN ValueAtPosition END) AS C2
,MAX(CASE WHEN Position=2 THEN ValueAtPosition END) AS C3
,MAX(CASE WHEN Position=3 THEN ValueAtPosition END) AS C4
,MAX(CASE WHEN Position=4 THEN ValueAtPosition END) AS C5
FROM cte
GROUP BY ID,GroupingKey
ORDER BY ID,GroupingKey;
The idea in short:
We use APPLY to add your string, cast to XML, to the result set. This helps to split the string ("a;b;c" => <x>a</x><x>b</x><x>c</x>).
We use another APPLY to create a tally on the fly with a computed TOP clause. It returns as many virtual rows as there are elements in the XML.
We use sql:column() to grab each element's value by its position, and some simple maths to create a grouping key (integer division by 5) and a running position from 0 to 4 (modulo 5).
We use GROUP BY together with MAX(CASE...) to place the values in the fitting column (old-fashioned pivot, or conditional aggregation).
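The position maths in the steps above can be sketched in a few lines of Python. This is an illustration of the idea only, not a translation of the query: `split_into_rows` is a made-up helper, and it strips surrounding whitespace from each element, which the SQL does not do.

```python
def split_into_rows(cst, sep=";", ncols=5):
    """Split a delimited string and lay the elements out in rows of ncols columns."""
    parts = [part.strip() for part in cst.split(sep)]  # assumption: trim stray spaces
    groups = {}
    for nr, value in enumerate(parts):
        # Same maths as the query: integer division gives the row,
        # the remainder gives the column within that row.
        grouping_key, position = divmod(nr, ncols)
        groups.setdefault(grouping_key, [None] * ncols)[position] = value
    return [groups[g] for g in sorted(groups)]


rows = split_into_rows(
    "string10;-3;string11;string12;56;string13;6;string14;string15;9"
)
for row in rows:
    print(row)
```

If the number of elements is not a multiple of `ncols`, the last row is padded with `None`, which mirrors the NULLs that MAX(CASE ...) produces for missing positions.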
Hint: If you want this fully generic, with a number of columns not known in advance, you cannot use any kind of function or ad-hoc query. You would instead need some kind of dynamic statement creation together with EXEC within a stored procedure.
To be honest, this might be a case of the XY problem. Such approaches are the wrong idea, at least in almost all situations I can think of.
UPDATE for SQL-Server 2017+
You are on v2017, this allows for JSON, which is a bit faster in position safe string splitting. Try this:
SELECT t.ID
,A.*
FROM #tbl t
CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.cst,';','","'),'"]')) A
The general idea is the same. We transform the string into a JSON array ("a;b;c" => ["a","b","c"]) and read it with APPLY OPENJSON().
You can perform the same maths on the "key" column and do the rest as above.
For completeness, this is the full query for v2017+:
WITH cte AS
(
SELECT t.ID
,A.[key]+1 AS Nr
,A.[value] AS ValueAtPosition
,A.[key] % 5 AS Position
,A.[key]/5 AS GroupingKey
FROM #tbl t
CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.cst,';','","'),'"]')) A
)
SELECT ID
,GroupingKey
,MAX(CASE WHEN Position=0 THEN ValueAtPosition END) AS C1
,MAX(CASE WHEN Position=1 THEN ValueAtPosition END) AS C2
,MAX(CASE WHEN Position=2 THEN ValueAtPosition END) AS C3
,MAX(CASE WHEN Position=3 THEN ValueAtPosition END) AS C4
,MAX(CASE WHEN Position=4 THEN ValueAtPosition END) AS C5
FROM cte
GROUP BY ID,GroupingKey
ORDER BY ID,GroupingKey;
The easiest option here honestly might be the following steps:
Write out the current table to a CSV flat file, using semicolon as the separator (which is also the separator within the current cst column).
Then load the CSV using SQL Server's bulk loading tool, again with semicolon as the column separator. This will yield a table with 16 columns, ID, and then C1 through and including C15.
Create a new table (ID, C1, C2, C3, C4, C5)
Then populate the above table using:
INSERT INTO newTable (ID, C1, C2, C3, C4, C5)
SELECT ID, C1, C2, C3, C4, C5 FROM loadedTable UNION ALL
SELECT ID, C6, C7, C8, C9, C10 FROM loadedTable UNION ALL
SELECT ID, C11, C12, C13, C14, C15 FROM loadedTable;
While the above suggestion might seem like a lot of work, SQL Server has poor support for regex and complex string splitting operations, especially on earlier versions. Working directly with your current table might be either not possible or more work than the above.

reshaping table based on column values

I was looking at a problem of reshaping a table, creating new columns based on values.
I'm using the same example as this problem discussed there: A complicated sum in R data.table that involves looking at other columns
so I have a table:
df:([]ID:1+til 5;
Group:1 1 2 2 2;
V1:10 + 2 * til 5;
Type_v1:`t1`t2`t1`t1`t2;
V2:3 0N 0N 7 8;
Type_v2:`t2```t3`t3);
ID Group V1 Type_v1 V2 Type_v2
------------------------------
1 1 10 t1 3 t2
2 1 12 t2
3 2 14 t1
4 2 16 t1 7 t3
5 2 18 t2 8 t3
The goal is to transform it to get the sum of values by group and type. Please note the new columns created: basically, all types in Type_v1 and Type_v2 are used to create columns for the resulting table.
# group v_1 type_1 v_2 type_2 v_3 type_3
#1: 1 10 t1 15 t2 NA <NA>
#2: 2 30 t1 18 t2 15 t3
I made a start, but I am unable to transform the table and create the new columns.
Of course, I'm also trying to create all the columns dynamically, as it would not be possible to input 20k columns manually.
df1:select Group, Value:V1, Type:Type_v1 from df;
df2:select Group, Value:V2, Type:Type_v2 from df;
tr:df1,df2;
tr:0!select sum Value by Group, Type from tr where Type <> ` ;
basically I'm missing the equivalent of:
dcast(tmp, group ~ rowid(group), value.var = c("v", "type"))
any help and explanations appreciated,
The last piece you're missing is a pivot: https://code.kx.com/q/kb/pivoting-tables/
q)P:exec distinct Type from tr
q)exec P#(Type!Value) by Group:Group from tr
Group| t1 t2 t3
-----| --------
1 | 10 15
2 | 30 18 15
It doesn't quite get you the exact output but pivot is the concept
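As a cross-check of the numbers, the sum-then-pivot step can be mimicked in plain Python. This is only a sketch of the concept, not q: `records`, `totals` and `pivot` are illustrative names, and rows with a null type are left out by hand, as the `where Type <> `` filter does in the question.

```python
from collections import defaultdict

# (Group, Value, Type) triples: first the V1/Type_v1 pairs, then the V2/Type_v2
# pairs, with the null-typed rows already dropped.
records = [
    (1, 10, "t1"), (1, 12, "t2"), (2, 14, "t1"), (2, 16, "t1"), (2, 18, "t2"),
    (1, 3, "t2"), (2, 7, "t3"), (2, 8, "t3"),
]

# Sum Value by (Group, Type).
totals = defaultdict(int)
for group, value, typ in records:
    totals[(group, typ)] += value

# Pivot: one row per group, one column per distinct type (None where missing).
types = sorted({typ for _, _, typ in records})
groups = sorted({group for group, _, _ in records})
pivot = {g: [totals.get((g, t)) for t in types] for g in groups}

print(types)
for g in groups:
    print(g, pivot[g])
```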
You could expand on Terry's pivot to dynamically do the select parts above using functional form. See more detail here:
https://code.kx.com/q/basics/funsql/
// Personally, I would try to stay clear of column names too similar to reserved keywords in kdb
df: `id`grpCol`v_1`typCol_1`v_2`typCol_2 xcol df;
{[df;n]
// dynamically create cols from 1 to n
cls:`$("v_";"typCol_"),\:/:string 1 + til n;
// functional form of select for each type/value col before joining together
df:(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
// sum, then pivot
df:0!select sum v by grpCol, typCol from df where typCol <> `;
P:exec distinct typCol from df;
df:exec P#(typCol!v) by grpCol:grpCol from df;
// Type cols seem unnecessary but
// Can be done with another functional select
?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
}[df;2]
grpCol t1 typCol_1 t2 typCol_2 t3 typCol_3
1 10 t1 15 t2 0N t3
2 30 t1 18 t2 15 t3
EDIT - More detailed breakdown below:
cls:`$("v_";"typCol_") ,\:/: string 1 + til n;
Dynamically create a symbol list for the columns as they are required for column names when using functional form. I start by creating a list of v_ and typCol_ up to number n.
,\:/: -> join with each left and each right iterators
https://code.kx.com/q/ref/maps/#each-left-and-each-right
This allows me to join every item on the left ("v_";"typCol_") with every item on the right.
The same could be achieved with cross but you would have to restructure the list with flip and cut
flip n cut `$("v_";"typCol_") cross string 1 + til n
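For readers less familiar with the each-left/each-right combination, the nested list it builds has the same shape as the following Python comprehension. This is a sketch for illustration only; `column_pairs` is a made-up name and it returns strings rather than q symbols.

```python
def column_pairs(n, prefixes=("v_", "typCol_")):
    """Return one [value_col, type_col] name pair for each index from 1 to n."""
    return [[prefix + str(i) for prefix in prefixes] for i in range(1, n + 1)]


print(column_pairs(2))
```

The outer loop corresponds to each-right (one entry per number), the inner loop to each-left (one name per prefix).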
(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
(,/) -> This is the over iterator used with join. It takes the 1st table, joins it to the 2nd, then takes that and joins on to the 3rd etc.
https://code.kx.com/q/ref/over/
{?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls
// functional select
?[table; where; by; columns]
?[x; (); 0b; `grpCol`v`typCol!`grpCol,y]
This creates a list of tables, 1 for each column pair in the cls variable. Notice how I don't explicitly state x or y in the function like this {[x;y]}. This is because x y and z can be used implicitly, so this function works with or without.
The important part here is the last param (columns). For a functional select it is a dictionary with column names as the key and what the columns are as the values
e.g. `grpCol`v`typCol!`grpCol`v_1`typCol_1 -> this is renaming each v and typCol so they are the same to then join them all together with (,/).
There is a useful keyword to help with figuring out functional form -> parse
parse"select Group, Value:V1, Type:Type_v1 from df"
0 ?
1 `df
2 ()
3 0b
4 (`Group`Value`Type)!`Group`V1`Type_v1
P:exec distinct typCol from df;
df:exec P#(typCol!v) by grpCol:grpCol from df;
pivoting is outlined here: https://code.kx.com/q/kb/pivoting-tables/
It effectively flips/rotates a section of the table. It takes the distinct types from typCol as the columns and uses the v column as the rows for each corresponding typCol
?[table; where; by; columns]
?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
Again look at the last param in the functional select i.e. columns. This is how it looks after being dynamically generated:
(`grpCol`t1`typCol_1`t2`typCol_2`t3`typCol_3)!(`grpCol;`t1;enlist `t1;`t2;enlist `t2;`t3;enlist `t3)
It is kind of a hacky way to get the type columns: I select each of t1, t2, t3 alongside a typCol_1, _2, _3:
`t1 = (column) `t1
`typCol_1 = enlist `t1 -> the enlist here tells kdb I want the value `t1 rather than the column

Common records for 2 fields in a table?

I have a Table which has 2 fields say A,B. Suppose A has values a1,a2.
Corresponding records for a1 in B are 1,2,3,x,y,z.
Corresponding records for a2 in B are 1,2,3,4,d,e,f
I need a query written for DB2 that fetches the records in B that are common to every value in A (a1 and a2).
So here the output would be :
A B
a1 1
a1 2
a1 3
a2 1
a2 2
a2 3
Can someone please help on this?
Try something like:
SELECT A, B
FROM Table t1
WHERE (SELECT COUNT(*) FROM Table t2 WHERE t2.B = t1.B)
= (SELECT COUNT(DISTINCT t3.A) FROM Table t3)
ORDER BY A, B
This might not be 100% accurate as I can't test it out in DB2 so you might have to tweak the query a little bit to make it work.
with t(num) as (select count(distinct A) from table)
select t1.A, t1.B
from table t1, table t2, t
where t1.B = t2.B
group by t1.A, t1.B, num
having count(*) = num
Basically, the idea is to self-join the table on column B and keep only the (A, B) pairs whose B matches exactly as many rows as there are distinct values in column A, which indicates that the B value is common to all the A values. This assumes each (A, B) pair appears only once in the table.
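The expected result can be verified with a set intersection. The Python below is a sketch using the sample values from the question, not DB2 code; `rows`, `b_by_a` and `common` are illustrative names.

```python
# (A, B) pairs from the question.
rows = (
    [("a1", b) for b in ["1", "2", "3", "x", "y", "z"]]
    + [("a2", b) for b in ["1", "2", "3", "4", "d", "e", "f"]]
)

# Collect the set of B values seen for each A value.
b_by_a = {}
for a, b in rows:
    b_by_a.setdefault(a, set()).add(b)

# A B value is "common" when it occurs for every A value.
common = set.intersection(*b_by_a.values())

result = sorted((a, b) for a, b in rows if b in common)
for a, b in result:
    print(a, b)
```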