T-SQL GROUP BY over a dynamic list

I have a table with an XML type column. This column contains a dynamic list of attributes that may be different between records.
I am trying to GROUP BY COUNT over these attributes without having to go through the table separately for each attribute.
For example, one record could have attributes A, B and C, and another could have B, C and D; when I do the GROUP BY COUNT I would get A = 1, B = 2, C = 2 and D = 1.
Is there any straightforward way to do this?
EDIT in reply to Andrew's answer:
Because my knowledge of this construct is superficial at best, I had to fiddle with it to get it to do what I want. In my actual code I needed to group by the TimeRange, as well as select only some attributes depending on their name. I am pasting the actual query below:
WITH attributes AS (
    SELECT
        Timestamp,
        N.a.value('@name[1]', 'nvarchar(max)') AS AttributeName,
        N.a.value('(.)[1]', 'nvarchar(max)') AS AttributeValue
    FROM MyTable
    CROSS APPLY AttributesXml.nodes('/Attributes/Attribute') AS N(a)
)
SELECT DATEPART(dy, Timestamp), AttributeValue, COUNT(AttributeValue)
FROM attributes
WHERE AttributeName IN ('AttributeA', 'AttributeB')
GROUP BY DATEPART(dy, Timestamp), AttributeValue
As a side-note: Is there any way to reduce this further?

WITH attributes AS (
    SELECT a.value('(.)[1]', 'nvarchar(max)') AS attribute
    FROM YourTable
    CROSS APPLY YourXMLColumn.nodes('//path/to/attributes') AS N(a)
)
SELECT attribute, COUNT(attribute)
FROM attributes
GROUP BY attribute
CROSS APPLY is like being able to JOIN the XML nodes as if they were a table. The WITH (common table expression) is needed because you can't use XML data type methods in a GROUP BY clause.
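If you'd rather avoid the CTE, a derived table works just as well, since it also keeps the XML method out of the GROUP BY. A minimal sketch, using the same placeholder names (YourTable, YourXMLColumn) as above:

SELECT attribute, COUNT(attribute)
FROM (
    SELECT a.value('(.)[1]', 'nvarchar(max)') AS attribute
    FROM YourTable
    CROSS APPLY YourXMLColumn.nodes('//path/to/attributes') AS N(a)
) AS attributes
GROUP BY attribute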

Here is a way to get the attribute data into a shape you can easily work with, while reducing the number of times you need to go through the main table.
--create test data
declare @tmp table (
    field1 varchar(20),
    field2 varchar(20),
    field3 varchar(20))

insert into @tmp (field1, field2, field3)
values ('A', 'B', 'C'),
       ('B', 'C', 'D')

--convert the individual fields from separate columns to one column
declare @table table(
    field varchar(20))

insert into @table (field)
select field1 from @tmp
union all
select field2 from @tmp
union all
select field3 from @tmp

--run the group by and get the count
select field, count(*)
from @table
group by field
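On SQL Server 2008 and later, a CROSS APPLY against a VALUES row constructor should let you unpivot the three columns in a single scan of the source table instead of the three UNION ALL passes. A minimal sketch, reusing the @tmp test data above:

--unpivot field1..field3 in one pass and count each distinct value
select v.field, count(*)
from @tmp t
cross apply (values (t.field1), (t.field2), (t.field3)) as v(field)
group by v.field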

Related

T-SQL - Pivot/Crosstab - variable number of values

I have a simple data set that looks like this:
Name Code
A A-One
A A-Two
B B-One
C C-One
C C-Two
C C-Three
I want to output it so it looks like this:
Name Code1 Code2 Code3 Code4 Code...n ...
A A-One A-Two
B B-One
C C-One C-Two C-Three
For each of the 'Name' values, there can be an undetermined number of 'Code' values.
I have been looking at various examples of pivot SQL (including simple pivot SQL and SQL using the XML function), but I have not been able to figure this out, or to understand if it is even possible.
I would appreciate any help or pointers.
Thanks!
Try it like this:
DECLARE @tbl TABLE([Name] VARCHAR(100),Code VARCHAR(100));
INSERT INTO @tbl VALUES
 ('A','A-One')
,('A','A-Two')
,('B','B-One')
,('C','C-One')
,('C','C-Two')
,('C','C-Three');

SELECT p.*
FROM
(
    SELECT *
          ,CONCAT('Code',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
    FROM @tbl
) t
PIVOT
(
    MAX(Code) FOR ColumnName IN (Code1,Code2,Code3,Code4,Code5 /*add as many as you need*/)
) p;
This line
,CONCAT('Code',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
will use a partitioned ROW_NUMBER in order to create numbered column names per code. The rest is simple PIVOT...
UPDATE: A dynamic approach to reflect the maximum number of codes per group
CREATE TABLE TblTest([Name] VARCHAR(100),Code VARCHAR(100));
INSERT INTO TblTest VALUES
('A','A-One')
,('A','A-Two')
,('B','B-One')
,('C','C-One')
,('C','C-Two')
,('C','C-Three');
DECLARE @cols VARCHAR(MAX);

WITH GetMaxCount(mc) AS
(
    SELECT TOP 1 COUNT([Code]) FROM TblTest GROUP BY [Name] ORDER BY COUNT([Code]) DESC
)
SELECT @cols=STUFF(
(
    SELECT CONCAT(',Code',Nmbr)
    FROM (SELECT TOP((SELECT mc FROM GetMaxCount)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) t(Nmbr)
    FOR XML PATH('')
),1,1,'');
DECLARE @sql VARCHAR(MAX)=
'SELECT p.*
FROM
(
    SELECT *
          ,CONCAT(''Code'',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
    FROM TblTest
) t
PIVOT
(
    MAX(Code) FOR ColumnName IN (' + @cols + ')
) p;';

EXEC(@sql);
GO
DROP TABLE TblTest;
As you can see, the only part which has to change to reflect the actual number of columns is the list in PIVOT's IN() clause.
You can create a string which looks like Code1,Code2,Code3,...,CodeN and build the statement dynamically; it can then be executed with EXEC().
I'd prefer the first approach. Dynamically created SQL is very powerful, but it can be a pain in the neck too...
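As a side note, on SQL Server 2017 and later, STRING_AGG could replace the FOR XML PATH trick for building that column list. A sketch under that version assumption:

-- assumes SQL Server 2017+; builds 'Code1,Code2,...,CodeN' from the per-group row numbers
DECLARE @cols VARCHAR(MAX);
SELECT @cols = STRING_AGG(CONCAT('Code', n), ',') WITHIN GROUP (ORDER BY n)
FROM (SELECT DISTINCT ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code) FROM TblTest) t(n);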

Copy rows into same table, but change value of one field

I have a list of values:
(56957,85697,56325,45698,21367,56397,14758,39656)
and a 'template' row in a table.
I want to do this:
for value in valuelist:
{
insert into table1 (field1, field2, field3, field4)
select value1, value2, value3, (value)
from table1
where ID = (ID of template row)
}
I know how I would do this in code, C# for instance, but I'm not sure how to 'loop' this while passing a new value into the insert statement each time. (I know that code makes no sense; I'm just trying to convey what I'm trying to accomplish.)
There is no need to loop here; SQL is a set-based language, and you apply your operations to entire sets of data at once, as opposed to looping through row by row.
INSERT statements can take their rows either from an explicit list of values or from the result of a regular SELECT statement, for example:
insert into table1(col1, col2)
select col3
,col4
from table2;
There is nothing stopping you selecting your data from the same place you are inserting to, which will duplicate all your data:
insert into table1(col1, col2)
select col1
,col2
from table1;
If you want to edit one of these column values, say by incrementing the value currently held, you simply apply this logic to your SELECT statement and make sure the resultant dataset matches your target table in number of columns and data types:
insert into table1(col1, col2)
select col1
,col2+1 as col2
from table1;
Optionally, if you only want to do this for a subset of those values, just add a standard where clause:
insert into table1(col1, col2)
select col1
,col2+1 as col2
from table1
where col1 = <your value>;
Now if this isn't enough for you to work it out yourself, you can join your dataset to your values list to get a version of the data to be inserted for each value in that list. Because you want each row to join to each value, you can use a CROSS JOIN:
declare @v table(value int);
insert into @v values(56957),(85697),(56325),(45698),(21367),(56397),(14758),(39656);

insert into table1(col1, col2, value)
select t.col1
      ,t.col2
      ,v.value
from table1 as t
cross join @v as v;
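Tying that back to the original question, the template-row copy would look something like this. A sketch only: @templateID and the field names are placeholders for your actual schema.

declare @templateID int = 1; -- hypothetical ID of the template row

insert into table1(field1, field2, field3, field4)
select t.field1
      ,t.field2
      ,t.field3
      ,v.value
from table1 as t
cross join @v as v
where t.ID = @templateID;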

Sum of a column within the subquery in Postgresql

I have a PostgreSQL table with 2 fields, i.e. ID and Name (column1 and column2 in the SQLFiddle). The default record_count I put for a particular ID is 1. I want to get the record_count for column1 and sum that record_count, grouped by column1.
I tried to use this query but somehow it's showing an error.
select sum(column_record) group by column_record ,
* from (select column1,1::int4 as column_record from test) a
SQL Fiddle for the same :
http://sqlfiddle.com/#!15/12fe9/1
If you want to use a window function (though you may want to use normal grouping instead, which is a lot faster and more performant), this is the way to do it:
-- create temp table test as (select * from (values ('a', 'b'), ('c', 'd')) a(column1, column2));
select sum(column_record) over (partition by column_record), *
from (select column1, 1::int4 as column_record from test) a;
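For comparison, the normal grouping version mentioned above would be a plain aggregate; a sketch against the same derived table:

select column1, sum(column_record) as record_count
from (select column1, 1::int4 as column_record from test) a
group by column1;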

PostgreSQL - How to get distinct on two columns separately?

I've a table like this:
Source table "tab"
column1 column2
x 1
x 2
y 1
y 2
y 3
z 3
How can I build a query to get a result with unique values in each of the two columns separately? For example, I'd like to get a result like one of these sets:
column1 column2
x 1
y 2
z 3
or
column1 column2
x 2
y 1
z 3
or ...
Thanks.
What you're asking for is difficult because it's weird: SQL treats a row as a set of related fields, but you're asking to build two separate lists (the distinct values of col1 and the distinct values of col2) and then display them in one output table without caring how the rows match up.
You can do this by writing the SQL along exactly those lines: write a separate SELECT DISTINCT for each column, then put the results together somehow. I'd put them together by giving each row in each result set a row number, then joining the two sets on that number.
It's not clear what you want a null in the output to mean. Does it mean there's a null in one of the columns, or that there isn't the same number of distinct values in each column? This is one problem that comes from asking for things that don't match up with typical relational logic.
Here's an example, removing the null value from the data since that confuses the issue, using different data values to avoid confusing row numbers with data, and with 3 distinct values in one column and 4 in the other. This works for SQL Server; presumably there's a variation for PostgreSQL.
if object_id('mytable') is not null drop table mytable;
create table mytable ( col1 nvarchar(10) null, col2 nvarchar(10) null)
insert into mytable
select 'x', 'a'
union all select 'x', 'b'
union all select 'y', 'c'
union all select 'y', 'b'
union all select 'y', 'd'
union all select 'z', 'a'
select c1.col1, c2.col2
from
-- derived table giving distinct values of col1 and a rownumber column
( select col1
, row_number() over (order by col1) as rowNumber
from ( select distinct col1 from mytable ) x ) as c1
full outer join
-- derived table giving distinct values of col2 and a rownumber column
( select col2
, row_number() over (order by col2) as rowNumber
from ( select distinct col2 from mytable ) x ) as c2
on c1.rowNumber = c2.rowNumber
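For PostgreSQL, the same approach appears to port almost verbatim; a sketch, assuming text columns and swapping the object_id check for DROP TABLE IF EXISTS:

drop table if exists mytable;
create table mytable (col1 text, col2 text);
insert into mytable values
  ('x','a'), ('x','b'), ('y','c'), ('y','b'), ('y','d'), ('z','a');

select c1.col1, c2.col2
from (select col1, row_number() over (order by col1) as rowNumber
      from (select distinct col1 from mytable) x) as c1
full outer join
     (select col2, row_number() over (order by col2) as rowNumber
      from (select distinct col2 from mytable) x) as c2
  on c1.rowNumber = c2.rowNumber;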

Converting Traditional IF EXISTS UPDATE ELSE INSERT into MERGE is not working?

I am going to use MERGE to insert into or update a table depending on whether the row exists or not. This is my query:
declare @t table
(
    id int,
    name varchar(10)
)

insert into @t values(1,'a')

MERGE INTO @t t1
USING (SELECT id FROM @t WHERE ID = 2) t2 ON (t1.id = t2.id)
WHEN MATCHED THEN
    UPDATE SET name = 'd', id = 3
WHEN NOT MATCHED THEN
    INSERT (id, name)
    VALUES (2, 'b');

select * from @t;
The result is:
id name
1 a
I think it should be:
id name
1 a
2 b
You have your USING part slightly messed up; that's where you put what you want to match against (although in this case you're only using id):
declare @t table
(
    id int,
    name varchar(10)
)

insert into @t values(1,'a')

MERGE INTO @t t1
USING (SELECT 2, 'b') AS t2 (id, name) ON (t1.id = t2.id)
WHEN MATCHED THEN
    UPDATE SET name = 'd', id = 3
WHEN NOT MATCHED THEN
    INSERT (id, name)
    VALUES (2, 'b');

select * from @t;
As Mikhail pointed out, the query in your USING clause doesn't return any rows.
If you want to do an upsert, put the new data into the USING clause:
MERGE INTO @t t1
USING (SELECT 2 as id, 'b' as name) t2 ON (t1.id = t2.id) --this no longer has an artificial dependency on @t
WHEN MATCHED THEN
    UPDATE SET name = t2.name
WHEN NOT MATCHED THEN
    INSERT (id, name)
    VALUES (t2.id, t2.name);
This query won't return anything:
SELECT id FROM @t WHERE ID = 2
because there are no rows in the table with ID = 2, so there is nothing to merge into the table.
Besides, in the MATCHED clause you are updating the ID field on which you are joining the tables; I think that's forbidden.
For each DML operation you have to COMMIT (which marks the end of a successful transaction); only then will you be able to see the latest data.
For example:
GO
BEGIN TRANSACTION;
GO
DELETE FROM HumanResources.JobCandidate
WHERE JobCandidateID = 13;
GO
COMMIT TRANSACTION;
GO