Learning Pivoting in TSQL

Learning Pivoting in TSQL - tsql

I feel that this should be simple, but all the pivots I find seem to be more complicated than what I am looking for, so any help or re-direction would be much appreciated.
I have ‘ID_code’ and ‘product_name’ and I am looking for mismatched product names and have them put next to each other in a row as opposed to in a column like this:
Select distinct ID_Code, product_name
From table
Where ID_Code in
(Select ID_Code from table
Group by ID_Code
Having count(distinct product_name) <> 1)
I would like a table set out as
ID_Code Product_name1 Product_name2 Product_name3
Thanks very much, and have a Happy New Year!

This should remove the duplicates but still returns one result if the product_name has a match.
;with testdata as(
SELECT '1' as ID_Code, 'bike' as product_name
UNION ALL SELECT '1', 'biker'
UNION ALL SELECT '1', 'bike'
UNION ALL SELECT '2', 'motorbike'
UNION ALL SELECT '2', 'motorbike'
UNION ALL SELECT '2', 'motorbike'
UNION ALL SELECT '2', 'motrbike'
UNION ALL SELECT '2', 'motorbiker'
)
--added this section to return distinct products
,cte as(
SELECT * FROM testdata d1
INTERSECT
SELECT * FROM testdata d2
)
SELECT --DISTINCT --Use DISTINCT here if need to return just one line per ID_Code
ID_Code
,product_name = STUFF((SELECT ', ' +
--Added this to track product_names for each ID_Code
t2.product_name + '_' + cast(ROW_NUMBER() OVER (PARTITION BY ID_Code ORDER BY product_name) as varchar(100))
FROM cte t2
WHERE t2.ID_Code = cte.ID_Code
FOR XML PATH('')), 1, 2, '')
FROM cte
Example here: db<>fiddle
More info about INTERSECT should this not be what works in this scenario.

Your expected output appears to be somewhat inflexible, because we may not know exactly how many columns/products would be needed. Instead, I recommend and rolling up the mismatched products into a CSV string for output.
SELECT
ID_Code,
STUFF((SELECT ',' + t2.product_name
FROM yourTable t2
WHERE t1.ID_Code = t2.ID_Code
FOR XML PATH('')), 1, 1, '') products
FROM your_table t1
GROUP BY
ID_Code
HAVING
MIN(product_name) <> MAX(product_name); -- index friendly
Demo

Related

T-SQL - Pivot/Crosstab - variable number of values

I have a simple data set that looks like this:
Name Code
A A-One
A A-Two
B B-One
C C-One
C C-Two
C C-Three
I want to output it so it looks like this:
Name Code1 Code2 Code3 Code4 Code...n ...
A A-One A-Two
B B-One
C C-One C-Two C-Three
For each of the 'Name' values, there can be an undetermined number of 'Code' values.
I have been looking at various examples of Pivot SQL [including simple Pivot sql and sql using the XML function?] but I have not been able to figure this out - or to understand if it is even possible.
I would appreciate any help or pointers.
Thanks!

Try it like this:
DECLARE #tbl TABLE([Name] VARCHAR(100),Code VARCHAR(100));
INSERT INTO #tbl VALUES
('A','A-One')
,('A','A-Two')
,('B','B-One')
,('C','C-One')
,('C','C-Two')
,('C','C-Three');
SELECT p.*
FROM
(
SELECT *
,CONCAT('Code',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
FROM #tbl
)t
PIVOT
(
MAX(Code) FOR ColumnName IN (Code1,Code2,Code3,Code4,Code5 /*add as many as you need*/)
)p;
This line
,CONCAT('Code',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
will use a partitioned ROW_NUMBER in order to create numbered column names per code. The rest is simple PIVOT...
UPDATE: A dynamic approach to reflect the max amount of codes per group
CREATE TABLE TblTest([Name] VARCHAR(100),Code VARCHAR(100));
INSERT INTO TblTest VALUES
('A','A-One')
,('A','A-Two')
,('B','B-One')
,('C','C-One')
,('C','C-Two')
,('C','C-Three');
DECLARE #cols VARCHAR(MAX);
WITH GetMaxCount(mc) AS(SELECT TOP 1 COUNT([Code]) FROM TblTest GROUP BY [Name] ORDER BY COUNT([Code]) DESC)
SELECT #cols=STUFF(
(
SELECT CONCAT(',Code',Nmbr)
FROM
(SELECT TOP((SELECT mc FROM GetMaxCount)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) t(Nmbr)
FOR XML PATH('')
),1,1,'');
DECLARE #sql VARCHAR(MAX)=
'SELECT p.*
FROM
(
SELECT *
,CONCAT(''Code'',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
FROM TblTest
)t
PIVOT
(
MAX(Code) FOR ColumnName IN (' + #cols + ')
)p;';
EXEC(#sql);
GO
DROP TABLE TblTest;
As you can see, the only part which will change in order to reflect the actual amount of columns is the list in PIVOTs IN() clause.
You can create a string, which looks like Code1,Code2,Code3,...CodeN and build the statement dynamically. This can be triggered with EXEC().
I'd prefer the first approach. Dynamically created SQL is very mighty, but can be a pain in the neck too...

Sort two csv fields by removing duplicates and without row-by-row processing

I am trying to combine two csv fields, eliminate duplicates, sort and store it in a new field.
I was able to achieve this. However, I encountered a scenario where the values are like abc and abc*. I need to keep the one with abc* and remove the other.
Could this be achieved without row by row processing?
Here is what I have.
CREATE TABLE csv_test
(
Col1 VARCHAR(100),
Col2 VARCHAR(100),
Col3 VARCHAR(500)
);
INSERT dbo.csv_test (Col1, Col2)
VALUES ('xyz,def,abc', 'abc*,tuv,def,xyz*,abc'), ('qwe,bca,a23', 'qwe,bca,a23*,abc')
--It is assumed that there are no spaces around commas
SELECT Col1, Col2, Col1 + ',' + Col2 AS Combined_NonUnique_Unsorted,
STUFF((
SELECT ',' + Item
FROM (SELECT DISTINCT Item FROM dbo.DelimitedSplit8K(Col1 + ',' + Col2,',')) t
ORDER BY Item
FOR XML PATH('')
),1,1,'') Combined_Unique_Sorted
, ExpectedResult = 'Keep the one with * and make it unique'
FROM dbo.csv_test;
--Expected Results; if there are values like abc and abc* ; I need to keep abc* and remove abc ;
--How can I achieve this without looping or using temp tables?
abc,abc*,def,tuv,xyz,xyz* -> abc*,def,tuv,xyz*
a23,a23*,abc,bca,qwe -> a23*,abc,bca,qwe

Well, since you agree that normalizing the database is the correct thing to do, I decided to try to come up with a solution for you.
I ended up with quite a cumbersome solution involving 4(!) common table expressions - cumbersome, but it works.
The first cte is to add a row identifier missing from your table - I've used ROW_NUMBER() OVER(ORDER BY Col1, Col2) for that.
The second cte is to get a unique set of values from combining both csv columns. Note that this does not handle the * part yet.
The third cte is handling the * issue.
And finally, the fourth cte is putting all the unique items back into a single csv. (I could do it in the third cte but I wanted to have each cte responsible of a single part of the solution - it's much more readable.)
Now all that's left is to update the first cte's Col3 with the fourth cte's Combined_Unique_Sorted:
;WITH cte1 as
(
SELECT Col1,
Col2,
Col3,
ROW_NUMBER() OVER(ORDER BY Col1, Col2) As rn
FROM dbo.csv_test
), cte2 as
(
SELECT rn, Item
FROM cte1
CROSS APPLY
(
SELECT DISTINCT Item
FROM dbo.DelimitedSplit8K(Col1 +','+ Col2, ',')
) x
), cte3 AS
(
SELECT rn, Item
FROM cte2 t0
WHERE NOT EXISTS
(
SELECT 1
FROM cte2 t1
WHERE t0.Item + '*' = t1.Item
AND t0.rn = t1.rn
)
), cte4 AS
(
SELECT rn,
STUFF
((
SELECT ',' + Item
FROM cte3 t1
WHERE t1.rn = t0.rn
ORDER BY Item
FOR XML PATH('')
), 1, 1, '') Combined_Unique_Sorted
FROM cte3 t0
)
UPDATE t0
SET Col3 = Combined_Unique_Sorted
FROM cte1 t0
INNER JOIN cte4 t1 ON t0.rn = t1.rn
To verify the results:
SELECT *
FROM csv_test
ORDER BY Col1, Col2
Results:
Col1 Col2 Col3
qwe,bca,a23 qwe,bca,a23*,abc a23*,abc,bca,qwe
xyz,def,abc abc*,tuv,def,xyz*,abc abc*,def,tuv,xyz*
You can see a live demo on rextester.

sql recursive within a range

I have a sql query to form a parent/child structure to a tree-like view, the outcome is like this:
lvl1a
lvl1a/lvl2a
lvl1a/lvl2b
lvl1b/lvl2a/lvl3a
lvl1c
lvl1d/lvl2a/lvl3a/lvl4a
...
the query itself doesn't have a limited range, for instance, if i only want to get this tree-like view for the first and second level
can someone modify the sql query to add such function? tks
;with cte as
(
select
labelID,
Title,
ParentLevel,
cast(Title as varchar(max)) as [treePath]
from TestTable
where ParentLevel = 0
union all
select
t.labelID,
t.Title,
t.ParentLevel,
[treePath] + '/' + cast(t.Title as varchar(255))
from
cte
join TestTablet on cte.labelID = t.ParentLevel
)
select
labelID,
Title,
ParentLevel,
[treePath]
from cte
order by treePath

All we did here was add lvl 0 for the first part of the union in the CTE
then increment it by 1 each time the recursion occurs (after the union all)
then add a where clause to the select to eliminate levels beyond 2.
Though I find it odd this works since t isn't aliased in your code...
.
;with cte as
(
select
labelID,
Title,
ParentLevel,
cast(Title as varchar(max)) as [treePath],
0 as lvl
from TestTable
where ParentLevel = 0
union all
select
t.labelID,
t.Title,
t.ParentLevel,
[treePath] + '/' + cast(t.Title as varchar(255)),
cte.lvl+1 as lvl
from
cte
join TestTablet t on cte.labelID = t.ParentLevel
)
select
labelID,
Title,
ParentLevel,
[treePath]
from cte
where lvl <=2
order by treePath

SQL Server SUM() for DISTINCT records

I have a field called "Users", and I want to run SUM() on that field that returns the sum of all DISTINCT records. I thought that this would work:
SELECT SUM(DISTINCT table_name.users)
FROM table_name
But it's not selecting DISTINCT records, it's just running as if I had run SUM(table_name.users).
What would I have to do to add only the distinct records from this field?

Use count()
SELECT count(DISTINCT table_name.users)
FROM table_name
SQLFiddle demo

This code seems to indicate sum(distinct ) and sum() return different values.
with t as (
select 1 as a
union all
select '1'
union all
select '2'
union all
select '4'
)
select sum(distinct a) as DistinctSum, sum(a) as allSum, count(distinct a) as distinctCount, count(a) as allCount from t
Do you actually have non-distinct values?
select count(1), users
from table_name
group by users
having count(1) > 1
If not, the sums will be identical.

You can see for yourself that distinct works with the following example. Here I create a subquery with duplicate values, then I do a sum distinct on those values.
select DistinctSum=sum(distinct x), RegularSum=Sum(x)
from
(
select x=1
union All
select 1
union All
select 2
union All
select 2
) x
You can see that the distinct sum column returns 3 and the regular sum returns 6 in this example.

You can use a sub-query:
select sum(users)
from (select distinct users from table_name);

SUM(DISTINCTROW table_name.something)
It worked for me (innodb).
Description - "DISTINCTROW omits data based on entire duplicate records, not just duplicate fields." http://office.microsoft.com/en-001/access-help/all-distinct-distinctrow-top-predicates-HA001231351.aspx

;WITH cte
as
(
SELECT table_name.users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM table_name
)
SELECT SUM(users)
FROM cte
WHERE rn = 1
SQL Fiddle
Try here yourself
TEST
DECLARE #table_name Table (Users INT );
INSERT INTO #table_name Values (1),(1),(1),(3),(3),(5),(5);
;WITH cte
as
(
SELECT users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM #table_name
)
SELECT SUM(users) DisSum
FROM cte
WHERE rn = 1
Result
DisSum
9

If circumstances make it difficult to weave a "distinct" into the sum clause, it will usually be possible to add an extra "where" clause to the entire query - something like:
select sum(t.ColToSum)
from SomeTable t
where (select count(*) from SomeTable t1 where t1.ColToSum = t.ColToSum and t1.ID < t.ID) = 0

May be a duplicate to
Trying to sum distinct values SQL
As per Declan_K's answer:
Get the distinct list first...
SELECT SUM(SQ.COST)
FROM
(SELECT DISTINCT [Tracking #] as TRACK,[Ship Cost] as COST FROM YourTable) SQ

how to convert single line SQL result into multiple rows?

I am developing a T-SQL query in SSMS 2008 R2 which returns one line only. But the problem is that in this one line there are four fields which I instead want to be unique rows. For example, my output line looks like:
Col. 1 Col. 2 Col. 3 Col. 4
xxxx yyyy zzzz aaaa
Instead, I want this to look like:
Question Answer
Col. 1 xxxx
Col. 2 yyyy
Col. 3 zzzz
Col. 4 aaaa
I have tried using the UNPIVOT operator for this, but it is not doing the above. How can I achieve this?

You should be able to use UNPIVOT for this:
Here is a static pivot where you hard code in the values of the columns:
create table t1
(
col1 varchar(5),
col2 varchar(5),
col3 varchar(5),
col4 varchar(5)
)
insert into t1 values ('xxxx', 'yyyy', 'zzzz', 'aaaa')
select question, answer
FROM t1
unpivot
(
answer
for question in (col1, col2, col3, col4)
) u
drop table t1
Here is a SQL Fiddle with a demo.
but you can also use a Dynamic Unpivot:
DECLARE #cols AS NVARCHAR(MAX),
#query AS NVARCHAR(MAX);
select #cols = stuff((select ','+quotename(C.name)
from sys.columns as C
where C.object_id = object_id('t1') and
C.name like 'Col%'
for xml path('')), 1, 1, '')
set #query = 'SELECT question, answer
from t1
unpivot
(
answer
for question in (' + #cols + ')
) p '
execute(#query)

This is from my data names but it is tested
select 'sID', sID as 'val'
from [CSdemo01].[dbo].[docSVsys]
where sID = 247
union
select 'sParID', sParID as 'val'
from [CSdemo01].[dbo].[docSVsys]
where sID = 247 ;
But UNPIVOT should work

UNION-ing together your four questions would look like:
SELECT 'column1' AS Question, MAX(column1) AS Answer UNION
SELECT 'column2' , MAX(column2) UNION
SELECT 'column3' , MAX(column3) UNION
SELECT 'column4' , MAX(column4)
(obv just using the MAX as an example)