TSQL: How to concatenate string of GROUPED values

I have come across a lot of threads about this. The suggested solutions all tend to go the same way, which is very inconvenient in my case.
Most of the time, something like this is suggested:
DECLARE @Actors TABLE ( [Id] INT, [Name] VARCHAR(20), [MovieId] INT );
DECLARE @Movie TABLE ( [Id] INT, [Name] VARCHAR(20), [FranchiseId] INT );
INSERT INTO @Actors ( Id, Name, MovieId )
VALUES ( 1, 'Sean Connery', 1 ),
       ( 2, 'Gert Fröbe', 1 ),
       ( 3, 'Honor Blackman', 1 ),
       ( 4, 'Daniel Craig', 2 ),
       ( 5, 'Judi Dench', 2 ),
       ( 2, 'Harrison Ford', 3 )
INSERT INTO @Movie ( Id, Name, FranchiseId )
VALUES ( 1, 'Goldfinger', 1 ),
       ( 2, 'Skyfall', 1 ),
       ( 3, 'Return of the Jedi', 2 )
SELECT m.Name,
       STUFF(( SELECT ',' + a_c.Name
               FROM @Actors a_c
               WHERE a_c.MovieId = m.Id
               FOR XML PATH('')
             ), 1, 1, '')
FROM @Actors a
JOIN @Movie m ON a.MovieId = m.Id
GROUP BY m.Id,
         m.Name
The problem is that the subquery does not really access the grouped items the way COUNT(), MAX(), MIN(), and so on do. Instead, it rebuilds the join pattern of the outer query and uses its WHERE clause to force the corresponding values to match those in the outer query's GROUP BY clause.
To show what I mean, I extended the example above by one additional table. As you can see, I also have to extend the inner query:
DECLARE @Actors TABLE ( [Id] INT, [Name] VARCHAR(20), [MovieId] INT );
DECLARE @Movie TABLE ( [Id] INT, [Name] VARCHAR(20), [FranchiseId] INT );
DECLARE @Franchise TABLE ( [Id] INT, [Name] VARCHAR(20) );
INSERT INTO @Actors ( Id, Name, MovieId )
VALUES ( 1, 'Sean Connery', 1 ),
       ( 2, 'Gert Fröbe', 1 ),
       ( 3, 'Honor Blackman', 1 ),
       ( 4, 'Daniel Craig', 2 ),
       ( 5, 'Judi Dench', 2 ),
       ( 2, 'Harrison Ford', 3 )
INSERT INTO @Movie ( Id, Name, FranchiseId )
VALUES ( 1, 'Goldfinger', 1 ),
       ( 2, 'Skyfall', 1 ),
       ( 3, 'Return of the Jedi', 2 )
INSERT INTO @Franchise ( Id, Name )
VALUES ( 1, 'James Bond' ),
       ( 2, 'Star Wars' )
SELECT f.Name,
       STUFF(( SELECT ',' + a_c.Name
               FROM @Actors a_c
               JOIN @Movie m_c ON a_c.MovieId = m_c.Id
               WHERE m_c.FranchiseId = f.Id
               FOR XML PATH('')
             ), 1, 1, '')
FROM @Actors a
JOIN @Movie m ON a.MovieId = m.Id
JOIN @Franchise f ON m.FranchiseId = f.Id -- join on the franchise key, not m.Id
GROUP BY f.Id,
         f.Name
And now, going somewhat further, imagine a huge, very complicated query with several grouping values over many tables. Performance is an issue, and I don't want to rebuild the whole join pattern in the inner query.
So is there any other way? One that does not kill performance and does not require duplicating the join pattern?

Contrary to what I said in this comment, you need no GROUP BY clause at all!
You simply need the outer SELECT to "iterate" over all franchises (or whatever you want to group by). Then, in the inner SELECT, you need some JOINs to get to the franchise key column, and a WHERE clause to filter by the outer franchise's key:
SELECT f.Name AS FranchiseName,
       COALESCE(STUFF((SELECT DISTINCT ', ' + a.Name
                       FROM @Actors a
                       JOIN @Movie m ON a.MovieId = m.Id
                       WHERE m.FranchiseId = f.Id
                       ORDER BY ', ' + a.Name -- this is optional
                       FOR XML PATH('')), 1, 2, ''), '') AS ActorNames -- 1, 2 strips the two-character leading ', '
FROM @Franchise f
Source of information: "High Performance T-SQL Using Window Functions" by Itzik Ben-Gan. Because SQL Server unfortunately does not have an aggregate/window function for concatenating values, the book's author recommends something like the above as the next best solution.
P.S.: An earlier version of this answer also claimed that no WHERE clause was needed and used the outer franchise key directly in the INNER JOIN condition instead. I've removed that solution; I am now fairly certain that a WHERE clause is likely to perform better.
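As a side note to the FOR XML PATH approach: SQL Server 2017 and later do ship a concatenating aggregate, STRING_AGG, which reads the grouped rows directly, much like COUNT() or MAX(). What follows is only a minimal sketch, assuming that version is available and reusing the question's sample data in `@` table variables. STRING_AGG has no DISTINCT option, so an actor who appears in several movies of one franchise would be listed once per movie unless the rows are de-duplicated in a derived table first.

```sql
-- Assumption: SQL Server 2017 or later (STRING_AGG is not available before that).
DECLARE @Actors TABLE ( [Id] INT, [Name] VARCHAR(20), [MovieId] INT );
DECLARE @Movie TABLE ( [Id] INT, [Name] VARCHAR(20), [FranchiseId] INT );
DECLARE @Franchise TABLE ( [Id] INT, [Name] VARCHAR(20) );

INSERT INTO @Actors ( Id, Name, MovieId )
VALUES ( 1, 'Sean Connery', 1 ),
       ( 2, 'Gert Fröbe', 1 ),
       ( 3, 'Honor Blackman', 1 ),
       ( 4, 'Daniel Craig', 2 ),
       ( 5, 'Judi Dench', 2 ),
       ( 2, 'Harrison Ford', 3 );
INSERT INTO @Movie ( Id, Name, FranchiseId )
VALUES ( 1, 'Goldfinger', 1 ),
       ( 2, 'Skyfall', 1 ),
       ( 3, 'Return of the Jedi', 2 );
INSERT INTO @Franchise ( Id, Name )
VALUES ( 1, 'James Bond' ),
       ( 2, 'Star Wars' );

-- The join pattern is written once; the aggregate consumes the grouped rows.
SELECT f.Name AS FranchiseName,
       STRING_AGG(a.Name, ', ') WITHIN GROUP ( ORDER BY a.Name ) AS ActorNames
FROM @Franchise f
JOIN @Movie m ON m.FranchiseId = f.Id
JOIN @Actors a ON a.MovieId = m.Id
GROUP BY f.Id, f.Name
ORDER BY f.Name;
```

Because these are inner joins, a franchise without any movies or actors would not appear in the result; switching to LEFT JOINs would keep it with a NULL actor list.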

Related

WHERE in NOT EXISTS clause being ignored

I'm trying to fill a table with rows that should be there: if a city in the Maps table does not exist in the Results table, I insert it using NOT EXISTS. The issue is that the filter IsUsed = 1 is not only ignored, it seems to void the NOT EXISTS.
With IsUsed = 1, everything in the Maps table is inserted into the Results table, regardless of whether it already exists.
If I remove IsUsed = 1, both NY rows are inserted (correct behavior, but not what I'm looking for).
Here's the code:
declare @Maps table
(
    Name varchar(20),
    IsUsed bit,
    Code varchar(20)
)
insert into @Maps
select 'NY', 1, 'NY1'
union select 'NY', 0, 'NY2'
union select 'FL', 0, 'FL1'
union select 'TX', 0, 'TX1'
declare @Results table
(
    Name varchar(20),
    Value int,
    Code varchar(20)
)
insert into @Results
select 'FL', 12, 'FL1'
union
select 'TX', 54, 'TX1'
union
select 'CA', 54, 'CA1'
union
select 'NJ', 54, 'NJ1'
insert into @Results
select Name, 999, code from @Maps m
-- This adds everything even if it exists
where not exists (select name from @Results p where p.name = m.name and IsUsed = 1)
-- This adds both 'NY'. Partially correct but adds column IsUsed = 0
-- where not exists (select name from @Results p where p.name = m.name)
select * from @Results
How can I add the one row that's not included in the Results table and has IsUsed equal to 1? In this case it would be {'NY', 1, 'NY1'}.
I understand that there are many ways of accomplishing this, but I'm interested in knowing how the WHERE clause inside NOT EXISTS works.

You have to remove the IsUsed = 1 from the NOT EXISTS subquery and add it to the outer WHERE:
insert into @Results
select Name, 999, code
from @Maps m
where m.IsUsed = 1
  and not exists (select name from @Results p where p.name = m.name)

I think you are confused about how an INSERT ... SELECT works. The SELECT is run independently, and the inserted rows are not committed until the end of the statement. In the example below, cnt is 4 for every inserted row.
declare @Maps table (name varchar(10), isUsed bit, code varchar(10));
insert into @Maps values
  ('NY', 1, 'NY1')
, ('NY', 0, 'NY2')
, ('FL', 0, 'FL1')
, ('TX', 0, 'TX1')
declare @Results table (Name varchar(20), Value int, Code varchar(20), cnt int)
insert into @Results values
  ('FL', 12, 'FL1', null)
, ('TX', 54, 'TX1', null)
, ('CA', 54, 'CA1', null)
, ('NJ', 54, 'NJ1', null)
select * from @Results;
insert into @Results
select m.Name, 999, m.code
     , (select count(*) from @Results) as cnt
from @Maps m
where not exists (select name
                  from @Results p
                  where p.name = m.name
                    and m.IsUsed = 1)
select * from @Results;
For the first NY row, p.name = m.name matches nothing, so NOT EXISTS is true.
For the second NY row, p.name = m.name again matches nothing, so NOT EXISTS is true.
The first NY row has not been committed at that point.
For FL and TX, p.name = m.name is true but m.IsUsed = 1 is false, so NOT EXISTS is true.
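To see how the subquery's WHERE clause is evaluated row by row, the two filters can be computed side by side as flag columns. This is only an illustrative sketch using the question's sample data; the column aliases OriginalFilter and FixedFilter are made up for the demonstration. Because the Results table has no IsUsed column, the unqualified IsUsed in the question's subquery resolves to the outer row's m.IsUsed.

```sql
DECLARE @Maps TABLE ( Name VARCHAR(20), IsUsed BIT, Code VARCHAR(20) );
DECLARE @Results TABLE ( Name VARCHAR(20), Value INT, Code VARCHAR(20) );

INSERT INTO @Maps VALUES
  ( 'NY', 1, 'NY1' ), ( 'NY', 0, 'NY2' ), ( 'FL', 0, 'FL1' ), ( 'TX', 0, 'TX1' );
INSERT INTO @Results VALUES
  ( 'FL', 12, 'FL1' ), ( 'TX', 54, 'TX1' ), ( 'CA', 54, 'CA1' ), ( 'NJ', 54, 'NJ1' );

SELECT m.Name, m.IsUsed, m.Code,
       -- Question's filter: the IsUsed test sits inside the subquery, so any
       -- row with IsUsed = 0 empties the subquery and NOT EXISTS becomes true.
       CASE WHEN NOT EXISTS ( SELECT 1 FROM @Results p
                              WHERE p.Name = m.Name AND m.IsUsed = 1 )
            THEN 1 ELSE 0 END AS OriginalFilter,
       -- Fixed filter: the IsUsed test is applied to the outer row instead.
       CASE WHEN m.IsUsed = 1
             AND NOT EXISTS ( SELECT 1 FROM @Results p WHERE p.Name = m.Name )
            THEN 1 ELSE 0 END AS FixedFilter
FROM @Maps m;
-- With the sample data:
--   NY / NY1 (IsUsed = 1): OriginalFilter = 1, FixedFilter = 1
--   NY / NY2 (IsUsed = 0): OriginalFilter = 1, FixedFilter = 0
--   FL / FL1 (IsUsed = 0): OriginalFilter = 1, FixedFilter = 0
--   TX / TX1 (IsUsed = 0): OriginalFilter = 1, FixedFilter = 0
```

Qualifying every column with its table alias inside a correlated subquery makes this kind of accidental outer reference easier to spot.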

TSQL Case WHEN LIKE REPLACE

Newbie question: I'm looking for the fastest way to update a new column based on the existence of a value in another table, while replacing part of that value.
In the example below, the goal is to reduce a phrase such as 'Bought a car' to just 'car' and store it in another table. The problem is that the phrase 'Bought a car' lives in a different table.
I did a hack that reselects the value and does a replace, but with more rows the performance is horrible, taking 3 to 5 minutes to run.
Oh SQL gurus, what is the best way to do this?
Example:
DECLARE @Staging_Table TABLE
(
    ACCTID INT IDENTITY(1,1),
    NAME VARCHAR(50),
    PURCHASES VARCHAR(255)
)
INSERT INTO @Staging_Table (Name, Purchases)
VALUES ('John','Bought a table')
INSERT INTO @Staging_Table (Name, Purchases)
VALUES ('Jack','Sold a car')
INSERT INTO @Staging_Table (Name, Purchases)
VALUES ('Mary','Returned a chair')
DECLARE @HISTORY TABLE
(
    ACCTID INT IDENTITY(1,1),
    NAME VARCHAR(50),
    Item VARCHAR(255)
)
INSERT INTO @HISTORY (Name, Item)
VALUES ('John','')
INSERT INTO @HISTORY (Name, Item)
VALUES ('Jack','')
INSERT INTO @HISTORY (Name, Item)
VALUES ('Mary','')
UPDATE @HISTORY
SET ITEM = CASE WHEN EXISTS(
                    SELECT ts.Purchases AS Output FROM @Staging_Table ts
                    WHERE ts.NAME = Name AND ts.PURCHASES LIKE '%table%')
                THEN REPLACE((SELECT ts2.PURCHASES Output
                              FROM @Staging_Table ts2 WHERE ts2.NAME = Name AND ts2.PURCHASES LIKE '%table%'),'Bought a ','')
                WHEN EXISTS(
                    SELECT ts.Purchases AS Output FROM @Staging_Table ts
                    WHERE ts.NAME = Name AND ts.PURCHASES LIKE '%car%')
                THEN REPLACE((SELECT ts2.PURCHASES Output
                              FROM @Staging_Table ts2 WHERE ts2.NAME = Name AND ts2.PURCHASES LIKE '%car%'),'Bought a ','')
           END
SELECT * FROM @HISTORY

DECLARE @Staging_Table TABLE
(
    ACCTID INT IDENTITY(1, 1),
    NAME VARCHAR(50),
    PURCHASES VARCHAR(255)
)
INSERT INTO @Staging_Table ( Name, Purchases )
VALUES ( 'John', 'Bought a table' ),
       ( 'Jack', 'Sold a car' ),
       ( 'Mary', 'Returned a chair' )
DECLARE @HISTORY TABLE
(
    ACCTID INT IDENTITY(1, 1),
    NAME VARCHAR(50),
    Item VARCHAR(255)
)
INSERT INTO @HISTORY ( Name, Item )
VALUES ( 'John', '' ),
       ( 'Jack', '' ),
       ( 'Mary', '' )
-- Join the two tables once instead of running correlated subqueries per row
UPDATE L
SET L.ITEM = ( CASE WHEN R.PURCHASES LIKE '%table%'
                    THEN REPLACE(R.PURCHASES, 'Bought a ', '')
                    WHEN R.PURCHASES LIKE '%car%'
                    THEN REPLACE(R.PURCHASES, 'Sold a ', '')
               END )
FROM @HISTORY AS L
JOIN @Staging_Table AS R ON L.NAME = R.NAME
WHERE ( R.PURCHASES LIKE '%table%'
        OR R.PURCHASES LIKE '%car%'
      )
SELECT *
FROM @HISTORY

Select value from an enumerated list in PostgreSQL

I want to select from an enumeration that is not in the database.
E.g. SELECT id FROM my_table returns values like 1, 2, 3.
I want to display 1 -> 'chocolate', 2 -> 'coconut', 3 -> 'pizza', etc. SELECT CASE works, but it is too complicated and hard to keep track of for many values. I am thinking of something like
SELECT id, array['chocolate','coconut','pizza'][id] FROM my_table
but I couldn't get it to work with arrays. Is there an easy solution? It should be a simple query, not a PL/pgSQL script or something like that.

with food (fid, name) as (
values
(1, 'chocolate'),
(2, 'coconut'),
(3, 'pizza')
)
select t.id, f.name
from my_table t
join food f on f.fid = t.id;
or without a CTE (but using the same idea):
select t.id, f.name
from my_table t
join (
values
(1, 'chocolate'),
(2, 'coconut'),
(3, 'pizza')
) f (fid, name) on f.fid = t.id;

This is the correct syntax:
SELECT id, (array['chocolate','coconut','pizza'])[id] FROM my_table
But you should create a referenced table with those values.

What about creating another table that enumerates all the cases, and joining to it?
CREATE TABLE table_case
(
case_id bigserial NOT NULL,
case_name character varying,
CONSTRAINT table_case_pkey PRIMARY KEY (case_id)
)
WITH (
OIDS=FALSE
);
And when you select from your table:
SELECT id, case_name FROM my_table
inner join table_case on case_id = my_table.id;

PostgreSQL join to denormalize a table with generate_series

I have this table:
CREATE TABLE "mytable"
( name text, count integer );
INSERT INTO mytable VALUES ('john', 4),('mark',2),('albert',3);
and I would like to "denormalize" the rows in this way:
SELECT name FROM mytable JOIN generate_series(1,4) tmp(a) ON (a<=count)
so that the number of rows for each name equals its count column: 4 rows for john, 2 for mark, and 3 for albert.
But I can't use the generate_series() function this way if I don't know the highest count (in this case 4). Is there a way to do this without knowing MAX(count)?

select name,
generate_series(1,count)
from mytable;
Set returning functions can be used in the select list and will do a cross join with the row retrieved from the base table.
I think this is an undocumented behaviour that might go away in the future, but I'm not sure about that (I recall some discussion regarding this on the mailing list)
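An alternative that keeps the set-returning function in the FROM clause is a LATERAL join, available in PostgreSQL 9.3 and later. The following is a minimal sketch using the question's table, created here as a temporary table so the snippet is self-contained; the alias g(n) is just an illustrative name.

```sql
-- Assumption: PostgreSQL 9.3 or later (LATERAL support).
CREATE TEMP TABLE mytable ( name text, count integer );
INSERT INTO mytable VALUES ('john', 4), ('mark', 2), ('albert', 3);

-- generate_series is evaluated once per row of mytable, using that row's count
SELECT t.name
FROM mytable AS t
CROSS JOIN LATERAL generate_series(1, t.count) AS g(n)
ORDER BY t.name;
-- 9 rows in total: albert 3 times, john 4 times, mark 2 times
```

Selecting g.n as well gives each copy a running number from 1 up to the row's count.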

DROP TABLE IF EXISTS ztable;
CREATE TABLE ztable (zname varchar, zvalue INTEGER NOT NULL);
INSERT INTO ztable(zname, zvalue) VALUES( 'one', 1), ( 'two', 2 ), ( 'three', 3) , ( 'four', 4 );
WITH expand AS (
WITH RECURSIVE zzz AS (
SELECT 1::integer AS rnk , t0.zname
FROM ztable t0
UNION
SELECT 1+rr.rnk , t1.zname
FROM ztable t1
JOIN zzz rr ON rr.rnk < t1.zvalue
)
SELECT zzz.zname
FROM zzz
)
SELECT x.*
FROM expand x
;

Selecting min value of a group with id

Consider the following code:
DECLARE @table AS TABLE
(
    id INT IDENTITY,
    DATA VARCHAR(100),
    code CHAR(1)
)
INSERT INTO @table ( data, code )
VALUES ( 'xasdf', 'a' ),
       ( 'aasdf', 'a' ),
       ( 'basdf', 'a' ),
       ( 'casdf', 'b' ),
       ( 'Casdf', 'c' ),
       ( NULL, NULL )
I need to get a row with minimum data grouped by the code. Can I do this without nested queries?
Basically, what I want is something like this:
SELECT TOP ( 1 )
       id,
       MIN(data)
FROM @table
GROUP BY code

SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( PARTITION BY code ORDER BY data ) AS [Row], -- ascending, so row 1 holds the minimum data per code
              id, data, code
       FROM @table ) T
WHERE T.[Row] = 1

The short answer is NO. This is not possible.
