DB2 to Snowflake Migration - SQL: Deriving a result table using 'values' clause

We are migrating from DB2 to Snowflake and have run into a scoping issue with the 'values' clause. In DB2, expressions inside the clause can reference any table in the select query; in Snowflake, they can only reference what is defined inside the clause itself:
Valid in DB2:
select * from
t_1
join
table (values
(t_1.c1, 'foo')
,(t_1.c1, 'bar')
) as t_2 (c1, c2)
on t_1.c5 = t_2.c2
Not valid in Snowflake (slightly different syntax - remove 'table'):
select * from
t_1
join
(values
(t_1.c1, 'foo')
,(t_1.c1, 'bar')
) as t_2 (c1, c2)
on t_1.c5 = t_2.c2
This throws an error saying "...t_1 is undefined".
Does anyone know of an equivalent capability in Snowflake?
For some context: This construct sits behind a well embedded process for our business, and to rework it would mean significant investment.
It forms the basis of a dynamic ETL query that is passed into an ADF pipeline. A frontend deploys components into this query, and the deployable piece needs to be kept as simple as possible. It is the equivalent of the "(t_1.c1, 'foo')" line, so ideally any solution would keep this piece the same.
EDIT:
For some additional context...
In the example above, we have a placeholder which identifies the insert point of newly deployed items from the front end, like this:
select * from
t_1
join
table (values
(t_1.c1, 'foo')
,(t_1.c1, 'bar')
/*client_X_placeholder*/
) as t_2 (c1, c2)
on t_1.c5 = t_2.c2
union
select * from
t_1
join
table (values
(t_1.c1, 'foo')
,(t_1.c1, 'bar')
,(t_1.c2, 'barfoo')
/*client_Y_placeholder*/
) as t_2 (c1, c2)
on t_1.c5 = t_2.c2
We are pulling a series of metrics associated with clients, where the label is t_2.c2 and the result is t_2.c1. This is still highly simplified relative to the real world. We need a simple way to update the placeholder with a new metric and its calculation: t_2.c1 may be wrapped in a pile of functions and/or reside inside case statements, etc.

Lennart deserves the credit for this one; his code is so close:
WITH t_1 (c1, c5) as (
SELECT 'a', 'bar'
)
select *
from t_1
join lateral (values
(t_1.c1, 'foo'),
(t_1.c1, 'bar')
) as t_2(c1,c2)
on t_1.c5 = t_2.c2;
but that causes an internal error:
000603 (XX000): SQL execution internal error:
Processing aborted due to error 300002:4143448929; incident 4919401.
But if you put a SELECT * FROM in there:
WITH t_1 (c1, c5) as (
SELECT 'a', 'bar'
)
select *
from t_1
join lateral (select * from values
(t_1.c1, 'foo'),
(t_1.c1, 'bar')
) as t_2(c1,c2)
on t_1.c5 = t_2.c2;
it works:
C1  C5   C1  C2
a   bar  a   bar
Lennart's suggestion to try more brackets also works:
WITH t_1 (c1, c5) as (
SELECT 'a', 'bar'
)
select *
from t_1
join lateral (
select * from (
values
(t_1.c1, 'foo'),
(t_1.c1, 'bar')
)
) as t_2(c1,c2)
on t_1.c5 = t_2.c2;
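Since the deployed "(t_1.c1, 'foo')" lines are meant to stay untouched, one option is to rewrite only the wrapper around them when the query text is generated. Below is a minimal Python sketch of that idea, not a tested migration tool: the function names and the template are hypothetical, the placeholder comment is modeled on the one in the question, and the target syntax is the `lateral (select * from values ...)` form reported to work above.

```python
import re

# Placeholder comment modeled on the one shown in the question (hypothetical name).
PLACEHOLDER = "/*client_X_placeholder*/"

def deploy_metric(template: str, expression: str, label: str) -> str:
    """Insert a new ,(expression, 'label') row just before the placeholder."""
    safe_label = label.replace("'", "''")  # escape single quotes in the label
    new_row = f",({expression}, '{safe_label}')\n{PLACEHOLDER}"
    return template.replace(PLACEHOLDER, new_row, 1)

def db2_to_snowflake(sql: str) -> str:
    """Swap DB2's 'table (values' wrapper for 'lateral (select * from values'."""
    return re.sub(
        r"\btable\s*\(\s*values\b",
        "lateral (select * from values",
        sql,
        flags=re.IGNORECASE,
    )

# Hypothetical template in the DB2 form from the question.
DB2_TEMPLATE = """select * from
t_1
join
table (values
(t_1.c1, 'foo')
,(t_1.c1, 'bar')
/*client_X_placeholder*/
) as t_2 (c1, c2)
on t_1.c5 = t_2.c2"""

snowflake_sql = db2_to_snowflake(deploy_metric(DB2_TEMPLATE, "t_1.c2", "barfoo"))
print(snowflake_sql)
```

The deployed rows pass through unchanged; only the text in front of `values` differs between the two dialects, so the closing bracket and the `as t_2 (c1, c2)` alias can stay where they are.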

I have no idea whether this works in Snowflake, but SQL requires you to declare t_2 as LATERAL if you want to reference t_1 inside of it. Db2 accepts TABLE as a synonym for LATERAL (LATERAL itself works there as well). So you could try:
select *
from t_1
join LATERAL (
values (t_1.c1, 'foo')
, (t_1.c1, 'bar')
) as t_2 (c1, c2)
on t_1.c5 = t_2.c2
Lateral was introduced in SQL99 and is supported by many DBMS including Db2.
In Db2, TABLE is typically used when selecting from a function returning a table:
SELECT * FROM TABLE ( myfun() )
which is somewhat similar so I guess that is why TABLE can act as a synonym for LATERAL (just guessing).
I did some googling and LATERAL is described in the docs for snowflake: join-lateral
EDIT:
Apparently, Snowflake has a limitation when a values clause is used directly as the lateral derived table. The answer by Simeon Pilgrim contains a fix for this.

Hmm, that is almost a correlated subquery/join with values.
The first thought I have is:
WITH t_1 (c1, c5) as (
SELECT 'a', 'bar'
)
select t_1.*, t_1.c1 as c1, t_2.c2
from t_1
join values ('foo'),
('bar') as t_2(c2)
on t_1.c5 = t_2.c2
;
but the part you want would look like this (if it worked):
WITH t_1 (c1, c5) as (
SELECT 'a', 'bar'
)
select t_1.*, t_1.c1 as c1, t_2.c2
from t_1
join values
(t_1.c1, 'foo'),
(t_1.c1, 'bar')
as t_2(c1,c2)
on t_1.c5 = t_2.c2
;
but it doesn't work.... more thinking required.
So IF that block comes in as
(t_1.c1, 'foo'),
(t_1.c1, 'bar')
then you can wrap it in a $$...$$ string block,
rip it apart, and jump through some hoops:
WITH t_1 (c1, c5) as (
SELECT 'a', 'bar'
), magic as (
SELECT
split(t.value, ',') as p
,trim(get(p,0),'"') as p0
,trim(get(p,1),' \'")') as p1
FROM table(split_to_table($$(t_1.c1, 'foo'),
(t_1.c1, 'bar')$$,'(')) as t
WHERE t.value <> ''
), hoop_jumping(c1, c2) as (
SELECT
case p0
WHEN 't_1.c1' then t_1.c1
WHEN 't_1.c5' then t_1.c5
end,
p1
FROM magic, t_1
)
SELECT t.*, h.*
FROM t_1 as t
JOIN hoop_jumping as h
ON t.c5 = h.c2
and it works, as long as you are willing to cover all the bases of what could be selected:
C1  C5   C1  C2
a   bar  a   bar
but I suspect it makes more sense to inject that block into a snowflake scripting block, and use it as injected code.

Related

Compare varchar string to produce missing items list

I have a table with a column. The column stores locations using varchar as the datatype. The locations use the format -2,7 -25,30 etc. I am trying to produce a list of missing locations i.e. where we don't have any customers.
The locations go from -30,-30 to 30,30. I can't find a way to setup a loop to run though all the options. Is there a way to do this?
Microsoft SQL Server 2017
;WITH cte as (
select -30 as n --anchor member
UNION ALL
select n + 1 --recursive member
from cte
where n < 30 --stop once 30 has been generated
)
select z.*
from (
select CONCAT(y.n,',',x.n) as locations
from cte as x CROSS JOIN cte y
) as z
LEFT OUTER JOIN dbo.Client as cli ON cli.client_location = z.locations
where cli.client_location IS NULL
order by z.locations asc
Generate all combinations.
Then match the generated against the existing combinations.
WITH DIGITS AS
(
SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS val(n)
),
NUMS AS
(
SELECT (tens.n * 10 + ones.n)-50 AS n
FROM DIGITS ones
CROSS JOIN DIGITS tens
),
LOCATIONS AS
(
SELECT CONCAT(n1.n,',',n2.n) AS location, n1.n as n1, n2.n as n2
FROM NUMS n1
JOIN NUMS n2 ON n2.n BETWEEN -30 AND 30
WHERE n1.n BETWEEN -30 AND 30
)
SELECT loc.location
FROM LOCATIONS loc
LEFT JOIN
(
SELECT Client_Location, COUNT(*) Cnt
FROM dbo.Client
GROUP BY Client_Location
) cl ON cl.Client_Location = loc.location
WHERE cl.Client_Location IS NULL
ORDER BY loc.n1, loc.n2
I would go with a recursive CTE. This is a slight variation of SNR's approach:
with cte as (
select -30 as n --anchor member
union all
select n + 1 --recursive member
from cte
where n < 30
)
select cte_x.n as x, cte_y.n as y,
concat(cte_x.n, ',', cte_y.n) as missing_location
from cte cte_x cross join
cte cte_y left join
dbo.client c
on c.client_location = concat(cte_x.n, ',', cte_y.n)
where c.client_location is null;
Or, to avoid calling concat() twice (with the same cte definition):
select cte_x.n as x, cte_y.n as y, v.location as missing_location
from cte cte_x cross join
cte cte_y cross apply
(values (concat(cte_x.n, ',', cte_y.n))
) v(location) left join
dbo.client c
on c.client_location = v.location
where c.client_location is null;
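For anyone who wants to try the generate-then-anti-join idea without a SQL Server instance, here is a rough Python sketch that uses an in-memory SQLite database as a stand-in. SQLite syntax differs slightly from T-SQL (`WITH RECURSIVE`, `||` for concatenation), the `client` table mirrors `dbo.Client` from the question, and the three sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (client_location TEXT)")
# Made-up sample data in the 'x,y' format described in the question.
conn.executemany(
    "INSERT INTO client VALUES (?)",
    [("-2,7",), ("-25,30",), ("0,0",)],
)

MISSING_SQL = """
WITH RECURSIVE nums(n) AS (
    SELECT -30                             -- anchor member
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 30    -- recursive member, stops at 30
)
SELECT x.n || ',' || y.n AS location
FROM nums AS x
CROSS JOIN nums AS y                       -- every x,y combination
LEFT JOIN client AS c
       ON c.client_location = x.n || ',' || y.n
WHERE c.client_location IS NULL            -- keep only unmatched locations
"""

missing = [row[0] for row in conn.execute(MISSING_SQL)]
print(len(missing), "locations have no client")
```

The grid has 61 values per axis, so the number of missing locations should equal 61 * 61 minus the number of distinct occupied locations.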

Avoiding Order By in T-SQL

The sample query below is part of my main query. I found that the SORT operator in it accounts for 30% of the cost.
Avoiding the SORT would require creating indexes. Is there any other way to optimize this code?
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA
WHERE ID = r.ID
AND Status = 3
AND TableA_ID >ISNULL((
SELECT TOP 1 TableA_ID
FROM TableA
WHERE ID = r.ID
AND Status <> 3
ORDER BY T_Date DESC
), 0)
ORDER BY T_Date ASC
It looks like you can use not exists rather than the sorts. I think you'll probably get a better performance boost by using a CTE or derived table instead of a scalar subquery.
select *
from r ... left outer join
(
select ID, min(t_date) as min_date from TableA t1
where status = 3 and not exists (
select 1 from TableA t2
where t2.ID = t1.ID
and t2.status <> 3 and t2.t_date > t1.t_date
)
group by ID
) as md on md.ID = r.ID ...
or
select *
from r ... left outer join
(
select t1.ID, min(t1.t_date) as min_date
from TableA t1 left outer join TableA t2
on t2.ID = t1.ID and t2.status <> 3 and t2.t_date > t1.t_date
where t1.status = 3 and t2.ID is null -- anti-join: keep rows with no later non-3 row
group by t1.ID
) as md on md.ID = r.ID ...
It also appears that you're relying on an identity column but it's not clear what those values mean. I'm basically ignoring it and using the date column instead.
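To check the logic of the not exists variant on toy data, here is a small Python sketch against an in-memory SQLite database. It is only an illustration of the query shape, not a performance test: the `table_a` rows are invented, dates are stored as ISO strings, and SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER, status INTEGER, t_date TEXT)")
# Invented sample rows: (id, status, t_date as ISO text so string order = date order).
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?)",
    [
        (1, 3, "2020-01-01"),
        (1, 1, "2020-02-01"),  # a non-3 row after the first status-3 row
        (1, 3, "2020-03-01"),
        (1, 3, "2020-04-01"),
        (2, 3, "2020-01-05"),  # id 2 has only status-3 rows
        (3, 3, "2020-01-01"),
        (3, 2, "2020-06-01"),  # id 3 ends on a non-3 row, so nothing qualifies
    ],
)

MIN_DATE_SQL = """
SELECT t1.id, MIN(t1.t_date) AS min_date
FROM table_a AS t1
WHERE t1.status = 3
  AND NOT EXISTS (
        SELECT 1
        FROM table_a AS t2
        WHERE t2.id = t1.id
          AND t2.status <> 3
          AND t2.t_date > t1.t_date   -- a later row with a different status
      )
GROUP BY t1.id
ORDER BY t1.id
"""

rows = conn.execute(MIN_DATE_SQL).fetchall()
print(rows)
```

MIN with GROUP BY replaces the TOP 1 ... ORDER BY pair, so the query itself no longer asks for a sort; whether the optimizer still chooses one depends on the available indexes.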
Try this:
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA a1
LEFT JOIN (
SELECT ID, MAX(TableA_ID) AS MaxAID
FROM TableA
WHERE Status <> 3
GROUP BY ID
) a2 ON a2.ID = a1.ID
WHERE a1.ID = r.ID AND a1.Status = 3
AND a1.TableA_ID > COALESCE(a2.MaxAID, 0) -- filter in WHERE; in the ON of a LEFT JOIN it would not remove rows
ORDER BY T_Date ASC
The use of TOP 1 in combination with the unexplained r alias concerns me. There's almost certainly a MUCH better way to get this data into your results that doesn't involve doing this in a subquery (unless this is for an APPLY operation).

SQL insert into using CTE

I am facing a performance issue with an "INSERT INTO" statement in SQL. I am using CTEs to select data from multiple tables and insert it into another table. It was working just fine until yesterday. The SELECT takes less than a minute to retrieve the data, whereas the INSERT INTO takes forever. Can someone please help me understand what I am doing wrong? Any help is highly appreciated. Thanks.
Here is my code:
I am using this query in an SP. I am trying to load 220K records to 1.5M records table.
;with CTE_A
AS
(
SELECT A1, A2,...
FROM dbo.A with (nolock)
WHERE A1 = <some condition>
GROUP BY a.A1,a.A2 , a.A3
), CTE_C as
(
SELECT C1, C2,....
FROM dbo.B with (nolock)
WHERE a.C1 = <some condition>
GROUP BY a.c1,a.C2 , a.C3
)
INSERT INTO [dbo].MainTable
SELECT
A1, A2, A3 , C1, C2, C3
FROM
CTE_A ta with (nolock)
LEFT OUTER JOIN
CTE_C tc with (nolock) ON ta.a1 = tc.a1 and ta.b1 = tc.b1 and ta.c1 = tc.c1
LEFT OUTER JOIN
othertable bs with (nolock) ON usd_bs.c = s.c
AND (A1 BETWEEN bs.a1 AND bs.a1)
AND bs.c1 = 1
Try this method (temp tables instead of CTEs); performance should be much better for your task:
IF OBJECT_ID('Tempdb..#CTE_A') IS NOT NULL
DROP TABLE #CTE_A
IF OBJECT_ID('Tempdb..#CTE_C') IS NOT NULL
DROP TABLE #CTE_C
-------------------------------------------------------------
SELECT A1 ,
A2 ,...
INTO #CTE_A --data set into temp table
FROM dbo.A WITH ( NOLOCK )
WHERE A1 = <some condition>
GROUP BY a.A1 ,
a.A2 ,
a.A3
-------------------------------------------------------------
SELECT C1 ,
C2 ,....
INTO #CTE_C --data set into temp table
FROM dbo.B WITH ( NOLOCK )
WHERE a.C1 = <some condition>
GROUP BY a.c1 ,
a.C2 ,
a.C3
INSERT INTO [dbo].MainTable
SELECT A1 ,
A2 ,
A3 ,
C1 ,
C2 ,
C3
FROM #CTE_A AS ta
LEFT JOIN #CTE_C AS tc ON ta.a1 = tc.a1
AND ta.b1 = tc.b1
AND ta.c1 = tc.c1
LEFT JOIN othertable AS bs ON usd_bs.c = s.c
AND ( A1 BETWEEN bs.a1 AND bs.a1 )
AND bs.c1 = 1

Get cartesian product of two columns

How can I get the cartesian product of two columns in one table?
I have table
A 1
A 2
B 3
B 4
and I want a new table
A 1
A 2
A 3
A 4
B 1
B 2
B 3
B 4
Try this, using a self-join:
select distinct b.let,a.id from [dbo].[cartesian] a join [dbo].[cartesian] b on a.id<>b.id
Create this table :
CREATE TABLE [dbo].[Table_1]
(
[A] [int] NOT NULL ,
[B] [nvarchar](50) NULL ,
CONSTRAINT [PK_Table_1] PRIMARY KEY CLUSTERED ( [A] ASC )
WITH ( PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON ) ON [PRIMARY]
)
ON [PRIMARY]
Fill table like this :
INSERT INTO [dbo].[Table_1]
VALUES ( 1, 'A' )
INSERT INTO [dbo].[Table_1]
VALUES ( 2, 'A' )
INSERT INTO [dbo].[Table_1]
VALUES ( 3, 'B' )
INSERT INTO [dbo].[Table_1]
VALUES ( 4, 'C' )
SELECT *
FROM [dbo].[Table_1]
Use this query
SELECT DISTINCT
T1.B ,
T2.A
FROM dbo.Table_1 AS T1 ,
dbo.Table_1 AS T2
ORDER BY T1.B
To clarify loup's answer (in more detail than is allowable in a comment), any join with no relevant criteria specified will naturally produce a Cartesian product (which is why a glib answer to your question might be "all too easily": mistakenly doing t1 INNER JOIN t2 ON t1.Key = t1.Key will produce the same result).
However, SQL Server does offer an explicit option to make your intentions known. The CROSS JOIN is essentially what you're looking for. But like INNER JOIN devolving to a Cartesian product without a useful join condition, CROSS JOIN devolves to a simple inner join if you go out of your way to add join criteria in the WHERE clause.
If this is a one-off operation, it probably doesn't matter which you use. But if you want to make it clear for posterity, consider CROSS JOIN instead.
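As a quick illustration of the explicit CROSS JOIN, here is a Python sketch that runs it against an in-memory SQLite database loaded with the four rows from the question. The table name `cartesian` and the column names `let` and `id` are borrowed from the first answer; SQLite stands in for SQL Server here, and the same statement should be valid T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cartesian (let TEXT, id INTEGER)")
# The four rows from the question.
conn.executemany(
    "INSERT INTO cartesian VALUES (?, ?)",
    [("A", 1), ("A", 2), ("B", 3), ("B", 4)],
)

CROSS_SQL = """
SELECT l.let, i.id
FROM (SELECT DISTINCT let FROM cartesian) AS l   -- each letter once
CROSS JOIN (SELECT DISTINCT id FROM cartesian) AS i  -- each id once
ORDER BY l.let, i.id
"""

pairs = conn.execute(CROSS_SQL).fetchall()
for let, id_ in pairs:
    print(let, id_)
```

De-duplicating each column before the join keeps the intermediate result at (distinct letters) x (distinct ids) rows, instead of joining all rows and applying DISTINCT afterwards.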

SQL Column Populating

I want to know if it is possible to add another column to a query result and populate it with data that is already in the table. The new column is Flag2. Here is the table:
What I want is for ITEM ID 30 to be displayed only once, with "QC Unsupportted" populated in Flag2. How do I do this?
I can only think of doing an inner join but this is not working.
This is what I have done in trying to do so:
SELECT
A.ITEMID, A.FLAG1, A.FLAG2
FROM
#FLAGS as A
INNER JOIN
#FLAGS as B ON A.ITEMID = B.ITEMID
GROUP BY
a.ITEMID, a.FLAG1, A.FLAG2
ORDER BY
ITEMID
Assuming I understand what you are after, if the current FLAG1 values are distinct for any ITEMID and you only have at most two instances of the same ID, I think this should do what you want:
SELECT
lft.ITEMID
, lft.FLAG1
, rght.FLAG1 FLAG2
FROM (
SELECT
t.ITEMID
, t.FLAG1
FROM (
SELECT
l.ITEMID
, l.FLAG1
, COUNT(l.ITEMID) i
FROM #FLAGS l
INNER JOIN #FLAGS r ON l.ITEMID = r.ITEMID
WHERE r.FLAG1 <= l.FLAG1
GROUP BY
l.ITEMID
, l.FLAG1) t
WHERE t.i=1) lft
LEFT OUTER JOIN (
SELECT
t.ITEMID
, t.FLAG1
FROM (
SELECT
l.ITEMID
, l.FLAG1
, COUNT(l.ITEMID) i
FROM #FLAGS l
INNER JOIN #FLAGS r ON l.ITEMID = r.ITEMID
WHERE r.FLAG1 <= l.FLAG1
GROUP BY
l.ITEMID
, l.FLAG1) t
WHERE t.i=2) rght ON lft.ITEMID = rght.ITEMID
-- Or better
SELECT
lft.ITEMID
, lft.FLAG1
, rght.FLAG1 FLAG2
FROM (
SELECT
t.ITEMID
, t.FLAG1
FROM (
SELECT
l.ITEMID
, l.FLAG1
, ROW_NUMBER() OVER(PARTITION BY ITEMID ORDER BY FLAG1) as i
FROM #FLAGS l) t
WHERE t.i=1) lft
LEFT OUTER JOIN (
SELECT
t.ITEMID
, t.FLAG1
FROM (
SELECT
l.ITEMID
, l.FLAG1
, ROW_NUMBER() OVER(PARTITION BY ITEMID ORDER BY FLAG1) as i
FROM #FLAGS l) t
WHERE t.i=2) rght ON lft.ITEMID = rght.ITEMID
If you have additional flag values for the same ID, a new outer join can be added to a new inline table (rght2, rght3, etc.) where i=3, 4, etc. and you are selecting rght2 AS FLAG3, rght3 AS FLAG4, etc.
Also note that the current values for FLAG1 will be distributed through FLAG1 and FLAG2 in alphabetical order. If you wanted to distribute them in reverse order you could replace <= with >=. If you had more than two flags that you wanted distributed in a specific order, you would have to create a separate table with a ranking value and join to that which would be doable but even uglier!
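The ROW_NUMBER() version can be written a little more compactly with a CTE, so the ranking is defined once and joined to itself. Below is a rough Python sketch that runs this shape against an in-memory SQLite database (window functions need SQLite 3.25 or later). The `flags` table stands in for #FLAGS, and the sample rows and flag values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flags (itemid INTEGER, flag1 TEXT)")
# Invented sample rows: item 30 appears twice with different flags.
conn.executemany(
    "INSERT INTO flags VALUES (?, ?)",
    [(10, "Active"), (30, "Expired"), (30, "QC Unsupported")],
)

PIVOT_SQL = """
WITH ranked AS (
    SELECT itemid,
           flag1,
           ROW_NUMBER() OVER (PARTITION BY itemid ORDER BY flag1) AS i
    FROM flags
)
SELECT lft.itemid, lft.flag1, rght.flag1 AS flag2
FROM ranked AS lft
LEFT JOIN ranked AS rght
       ON rght.itemid = lft.itemid
      AND rght.i = 2            -- second flag for the same item, if any
WHERE lft.i = 1                 -- one output row per item
ORDER BY lft.itemid
"""

result = conn.execute(PIVOT_SQL).fetchall()
print(result)
```

Items with a single flag come back with NULL (None in Python) in flag2, and the flags are still distributed in alphabetical order, as noted above.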