T-SQL find differences - tsql

I found Jeff Smith's solution, which displays the differences between two tables:
SELECT MIN(TableName) as TableName, ID, COL1, COL2, COL3 ...
FROM
(
SELECT 'Table A' as TableName, A.ID, A.COL1, A.COL2, A.COL3, ...
FROM A
UNION ALL
SELECT 'Table B' as TableName, B.ID, B.COL1, B.COL2, B.COL3, ...
FROM B
) tmp
GROUP BY ID, COL1, COL2, COL3 ...
HAVING COUNT(*) = 1
ORDER BY ID
In my project I need to compare only some of the columns, e.g. col1 and col2; the rest are used for other operations.
I tried to use
HAVING (COUNT(col1) = 1 and COUNT(col2) = 1)
but it had no effect.
Could you please provide a solution that does this?

Get the values of COL1 and COL2 in A that do not exist in B using EXCEPT:
SELECT COL1, COL2 FROM A
EXCEPT
SELECT COL1, COL2 FROM B
Use the results as a derived table to join them back to A and get all the columns:
SELECT 'A' AS SRC, A.COL1, A.COL2, A.COL3...
FROM (
SELECT COL1, COL2 FROM A
EXCEPT
SELECT COL1, COL2 FROM B
) AS diff
INNER JOIN A ON diff.COL1 = A.COL1 AND diff.COL2 = A.COL2
Similarly, use EXCEPT to get the values of COL1 and COL2 that exist only in B, and join the resulting set to B to obtain the complete rows.
Combine the two sets with UNION ALL:
SELECT 'A' AS SRC, A.COL1, A.COL2, A.COL3...
FROM (
SELECT COL1, COL2 FROM A
EXCEPT
SELECT COL1, COL2 FROM B
) AS diff
INNER JOIN A ON diff.COL1 = A.COL1 AND diff.COL2 = A.COL2
UNION ALL
SELECT 'B' AS SRC, B.COL1, B.COL2, B.COL3...
FROM (
SELECT COL1, COL2 FROM B
EXCEPT
SELECT COL1, COL2 FROM A
) AS diff
INNER JOIN B ON diff.COL1 = B.COL1 AND diff.COL2 = B.COL2
;
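To try the EXCEPT approach without a SQL Server instance, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite is only a stand-in here, and the table contents and the concrete column list (COL1 to COL3) are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, COL1 TEXT, COL2 TEXT, COL3 TEXT);
    CREATE TABLE B (ID INTEGER, COL1 TEXT, COL2 TEXT, COL3 TEXT);
    -- Illustrative rows: ('x', 'y') exists in both tables with a different COL3.
    INSERT INTO A VALUES (1, 'x', 'y', 'only-in-a-col3'),
                         (2, 'p', 'q', 'a-extra');
    INSERT INTO B VALUES (1, 'x', 'y', 'different-col3'),
                         (3, 'm', 'n', 'b-extra');
""")

# Same shape as the query in the answer: EXCEPT on (COL1, COL2) in each
# direction, joined back to the source table to recover the other columns.
DIFF_SQL = """
SELECT 'A' AS SRC, A.COL1, A.COL2, A.COL3
FROM (
    SELECT COL1, COL2 FROM A
    EXCEPT
    SELECT COL1, COL2 FROM B
) AS diff
INNER JOIN A ON diff.COL1 = A.COL1 AND diff.COL2 = A.COL2
UNION ALL
SELECT 'B' AS SRC, B.COL1, B.COL2, B.COL3
FROM (
    SELECT COL1, COL2 FROM B
    EXCEPT
    SELECT COL1, COL2 FROM A
) AS diff
INNER JOIN B ON diff.COL1 = B.COL1 AND diff.COL2 = B.COL2
"""

rows = sorted(conn.execute(DIFF_SQL).fetchall())
for row in rows:
    print(row)
```

The ('x', 'y') pair is not reported even though COL3 differs, because only COL1 and COL2 take part in the comparison. One caveat: EXCEPT treats NULLs as equal, but the join back with = does not match NULLs, so rows with a NULL in COL1 or COL2 would need a NULL-aware join condition.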

You are dropping the columns from the wrong place. Remove them from the column lists (in the SELECT and GROUP BY clauses) rather than changing COUNT(*):
SELECT MIN(TableName) as TableName, ID, COL1, COL2
FROM
(
SELECT 'Table A' as TableName, A.ID, A.COL1, A.COL2
FROM A
UNION ALL
SELECT 'Table B' as TableName, B.ID, B.COL1, B.COL2
FROM B
) tmp
GROUP BY ID, COL1, COL2
HAVING COUNT(*) = 1
ORDER BY ID
To keep the other columns in the result, wrap them in MIN (or another aggregate such as MAX):
SELECT MIN(TableName) as TableName, ID, COL1, COL2, MIN(COL3), MIN(COL4), ...
FROM
(
SELECT 'Table A' as TableName, A.ID, A.COL1, A.COL2, A.COL3, A.COL4, ...
FROM A
UNION ALL
SELECT 'Table B' as TableName, B.ID, B.COL1, B.COL2, B.COL3, B.COL4, ...
FROM B
) tmp
GROUP BY ID, COL1, COL2
HAVING COUNT(*) = 1
ORDER BY ID
Note that this doesn't work very well in certain situations. If two rows are identical in the two tables (including IDs), then it will report them as a difference even though they are not. Also, in this version, if multiple rows have the same COL1 and COL2 values, it doesn't work well either. I would join the two tables together for a more robust comparison.
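A join-based comparison could look roughly like the sketch below. It assumes that ID is unique in each table and is the key to match on, which the question does not state. SQLite (via Python's sqlite3) is used as a stand-in, so the full outer join is emulated with two LEFT JOINs, and SQLite's NULL-safe IS NOT operator replaces a plain <> comparison. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, COL1 TEXT, COL2 TEXT, COL3 TEXT);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, COL1 TEXT, COL2 TEXT, COL3 TEXT);
    -- Illustrative data: ID 1 matches on COL1/COL2, ID 2 differs in COL2,
    -- ID 4 exists only in A, ID 3 exists only in B.
    INSERT INTO A VALUES (1, 'x', 'y', 'c3a'), (2, 'p', 'q', 'c3'), (4, 's', 't', 'c3');
    INSERT INTO B VALUES (1, 'x', 'y', 'other'), (2, 'p', 'CHANGED', 'c3'), (3, 'm', 'n', 'c3');
""")

# Assumption: ID identifies a row in both tables.
# First branch: rows of A that are missing from B or differ in COL1/COL2.
# Second branch: rows of B that have no partner in A.
JOIN_DIFF_SQL = """
SELECT A.ID, A.COL1 AS A_COL1, A.COL2 AS A_COL2,
       B.COL1 AS B_COL1, B.COL2 AS B_COL2
FROM A
LEFT JOIN B ON B.ID = A.ID
WHERE B.ID IS NULL
   OR A.COL1 IS NOT B.COL1   -- NULL-safe "not equal" in SQLite
   OR A.COL2 IS NOT B.COL2
UNION ALL
SELECT B.ID, NULL, NULL, B.COL1, B.COL2
FROM B
LEFT JOIN A ON A.ID = B.ID
WHERE A.ID IS NULL
ORDER BY 1
"""

diffs = conn.execute(JOIN_DIFF_SQL).fetchall()
for row in diffs:
    print(row)
```

Because each row is matched by key rather than counted in a group, a row present in both tables is never reported unless COL1 or COL2 actually differs, and the output shows both versions side by side.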

Related

select distinct values in multiple column and save in common column with column tags

I have a table in postgres with two columns:
col1 col2
a a
b c
d e
f f
I would like to collect the distinct values from both columns into a single column and tag each value with the name of the column it came from. The desired output is:
col source
a col1, col2
b col1
c col2
d col1
e col2
f col1, col2
I am able to find the distinct values in each individual column, but I am not able to merge them into a single column and add the source label.
Below is the query I am using:
select distinct on (col1, col2) col1, col2 from table
Any suggestions would be really helpful.
You can un-pivot the columns and then aggregate them back:
select u.value, string_agg(distinct u.source, ',' order by u.source)
from data
cross join lateral (
values('col1', col1), ('col2', col2)
)as u(source,value)
group by u.value
order by u.value;
Alternatively, if you don't want to list each column, you can convert the row to a JSON value and then un-pivot that:
select x.value, string_agg(distinct x.source, ',' order by x.source)
from data d
cross join lateral jsonb_each_text(to_jsonb(d)) as x(source, value)
group by x.value
order by x.value;
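The un-pivot and aggregate idea can be tried locally with the sketch below. It uses Python's sqlite3 as a stand-in for PostgreSQL, so the lateral VALUES list is swapped for a UNION of one SELECT per column and string_agg for SQLite's group_concat. The table name data and the sample rows follow the question. Since group_concat does not guarantee an order, the sources are sorted on the Python side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (col1 TEXT, col2 TEXT);
    INSERT INTO data VALUES ('a', 'a'), ('b', 'c'), ('d', 'e'), ('f', 'f');
""")

# Un-pivot: one (source, value) row per column value.
# UNION (rather than UNION ALL) drops duplicate (source, value) pairs,
# playing the role of DISTINCT inside string_agg.
UNPIVOT_SQL = """
WITH unpivoted(source, value) AS (
    SELECT 'col1', col1 FROM data
    UNION
    SELECT 'col2', col2 FROM data
)
SELECT value, group_concat(source, ',') AS sources
FROM unpivoted
GROUP BY value
ORDER BY value
"""

# group_concat has no ORDER BY here, so normalise the order in Python.
tags = {
    value: ",".join(sorted(sources.split(",")))
    for value, sources in conn.execute(UNPIVOT_SQL)
}
for value, sources in tags.items():
    print(value, sources)
```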

Postgresql WITH-clause: sequentially or nested

I need to filter a CTE by the value of a window function. Which form is preferable in terms of CPU and memory: sequential CTEs?
WITH t1 AS(
SELECT col1, col2, row_number() OVER(PARTITION BY col1, col2 ORDER BY col3) rn
FROM tbl
),
t2 AS(
SELECT col1, col2
FROM t1
WHERE rn = 1
)
SELECT *
FROM t2;
or nested CTEs?
WITH t2 AS(
WITH t1 AS(
SELECT col1, col2, row_number() OVER(PARTITION BY col1, col2 ORDER BY col3) rn
FROM tbl
)
SELECT col1, col2
FROM t1
WHERE rn = 1
)
SELECT *
FROM t2;
The EXPLAIN ANALYZE output does not make the difference clear. WHERE rn = 1 is in a separate CTE because this data will be needed in several CTEs. The execution time is almost the same for both.

Sort two csv fields by removing duplicates and without row-by-row processing

I am trying to combine two csv fields, eliminate duplicates, sort and store it in a new field.
I was able to achieve this. However, I encountered a scenario where the values are like abc and abc*. I need to keep the one with abc* and remove the other.
Could this be achieved without row by row processing?
Here is what I have.
CREATE TABLE csv_test
(
Col1 VARCHAR(100),
Col2 VARCHAR(100),
Col3 VARCHAR(500)
);
INSERT dbo.csv_test (Col1, Col2)
VALUES ('xyz,def,abc', 'abc*,tuv,def,xyz*,abc'), ('qwe,bca,a23', 'qwe,bca,a23*,abc')
--It is assumed that there are no spaces around commas
SELECT Col1, Col2, Col1 + ',' + Col2 AS Combined_NonUnique_Unsorted,
STUFF((
SELECT ',' + Item
FROM (SELECT DISTINCT Item FROM dbo.DelimitedSplit8K(Col1 + ',' + Col2,',')) t
ORDER BY Item
FOR XML PATH('')
),1,1,'') Combined_Unique_Sorted
, ExpectedResult = 'Keep the one with * and make it unique'
FROM dbo.csv_test;
--Expected Results; if there are values like abc and abc* ; I need to keep abc* and remove abc ;
--How can I achieve this without looping or using temp tables?
abc,abc*,def,tuv,xyz,xyz* -> abc*,def,tuv,xyz*
a23,a23*,abc,bca,qwe -> a23*,abc,bca,qwe
Well, since you agree that normalizing the database is the correct thing to do, I decided to try to come up with a solution for you.
I ended up with quite a cumbersome solution involving 4(!) common table expressions - cumbersome, but it works.
The first cte is to add a row identifier missing from your table - I've used ROW_NUMBER() OVER(ORDER BY Col1, Col2) for that.
The second cte is to get a unique set of values from combining both csv columns. Note that this does not handle the * part yet.
The third cte is handling the * issue.
And finally, the fourth cte is putting all the unique items back into a single csv. (I could do it in the third cte but I wanted to have each cte responsible for a single part of the solution - it's much more readable.)
Now all that's left is to update the first cte's Col3 with the fourth cte's Combined_Unique_Sorted:
;WITH cte1 as
(
SELECT Col1,
Col2,
Col3,
ROW_NUMBER() OVER(ORDER BY Col1, Col2) As rn
FROM dbo.csv_test
), cte2 as
(
SELECT rn, Item
FROM cte1
CROSS APPLY
(
SELECT DISTINCT Item
FROM dbo.DelimitedSplit8K(Col1 +','+ Col2, ',')
) x
), cte3 AS
(
SELECT rn, Item
FROM cte2 t0
WHERE NOT EXISTS
(
SELECT 1
FROM cte2 t1
WHERE t0.Item + '*' = t1.Item
AND t0.rn = t1.rn
)
), cte4 AS
(
SELECT rn,
STUFF
((
SELECT ',' + Item
FROM cte3 t1
WHERE t1.rn = t0.rn
ORDER BY Item
FOR XML PATH('')
), 1, 1, '') Combined_Unique_Sorted
FROM cte3 t0
)
UPDATE t0
SET Col3 = Combined_Unique_Sorted
FROM cte1 t0
INNER JOIN cte4 t1 ON t0.rn = t1.rn
To verify the results:
SELECT *
FROM csv_test
ORDER BY Col1, Col2
Results:
Col1 Col2 Col3
qwe,bca,a23 qwe,bca,a23*,abc a23*,abc,bca,qwe
xyz,def,abc abc*,tuv,def,xyz*,abc abc*,def,tuv,xyz*
You can see a live demo on rextester.
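The rule that cte2 to cte4 implement can also be cross-checked outside the database. The following is only a sketch in Python of the same three steps (distinct items, drop an item when its starred twin exists, sort and rejoin); the function name combine_csv is made up, and Python's default string ordering may differ from your SQL Server collation.

```python
def combine_csv(col1: str, col2: str) -> str:
    """Merge two comma-separated lists, preferring 'abc*' over 'abc'."""
    # Step 1 (cte2): distinct items from both columns.
    items = set((col1 + "," + col2).split(","))
    # Step 2 (cte3): drop an item if the same item followed by '*' exists.
    kept = [item for item in items if item + "*" not in items]
    # Step 3 (cte4): sort and glue back into one csv string.
    return ",".join(sorted(kept))


# Sample rows taken from the question.
print(combine_csv("xyz,def,abc", "abc*,tuv,def,xyz*,abc"))
print(combine_csv("qwe,bca,a23", "qwe,bca,a23*,abc"))
```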

postgres output query within with clause

I'm trying to get the output of the queries within the WITH clause of my final query as CSV or some other kind of text file. I only have query access; I'm not allowed to create tables in this database. I have a set of queries that do some calculations on a data set, another set of queries that compute on the previous set, and yet another that calculates on the final set. I don't want to run all of it as three separate queries because the results from the first two are actually in the last one.
WITH
Q1 AS(
SELECT col1, col2, col3, col4, col5, col6, col7
FROM table1
),
Q2 AS(
SELECT AVG(col1) as col1Avg, MAX(col1) as col1Max, col2, col3,col4
FROM Q1
GROUP BY col2, col3, col4
)
SELECT
AVG(col1Avg), col3
FROM
Q2
GROUP BY col3
I would like the results from Q1, Q2 and the final select statement as preferably 3 csv files but I could live with all of it in one csv file. Is this possible?
Thanks!
Edit: Just to clarify, the columns from the queries are very different. I'm definitely pulling more columns from my first query than my second. I've edited the above code a bit to make this more clear.
To combine all the results together you'd use UNION ALL, but the number and data types of the columns must match.
select col1, col2, col2
from blah
union all
select col1, col2, col2
from blah2
union all
... etc
You can reference CTEs in there of course ...
with
cte_1 as (
select ... from ...),
cte_2 as (
select ... from ... cte_1),
cte_3 as (
select ... from ... cte_2)
select col1, col2, col2
from cte_1
union all
select col1, col2, col2
from cte_2
union all
select col1, col2, col2
from cte_3
If your final output is a csv then it looks like you have multiple row formats in there -- checksums? If so, in the queries that you union all together you might like to combine all the columns from each query into one string ...
with
cte_1 as (
select ... from ...),
cte_2 as (
select ... from ... cte_1),
cte_3 as (
select ... from ... cte_2)
select col1||','||col2||','||col2
from cte_1
union all
select col1||','||col2
from cte_2
union all
select col1
from cte_3
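If the client can post-process the result, one option is to add a tag column to each branch so the single result set can be split back into three files. The sketch below illustrates that idea with Python's sqlite3 as a stand-in for PostgreSQL. The tag column src, the CTE names, the simplified column list and the sample rows are all assumptions made for illustration.

```python
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 REAL, col2 TEXT, col3 TEXT);
    INSERT INTO table1 VALUES (1, 'a', 'x'), (3, 'a', 'x'),
                              (10, 'b', 'x'), (4, 'c', 'y');
""")

# Each branch returns the same two columns: a tag and one pre-joined csv line.
# NULL values would make the whole line NULL and would need COALESCE;
# values containing commas would need quoting (neither is handled here).
TAGGED_SQL = """
WITH q1 AS (
    SELECT col1, col2, col3 FROM table1
),
q2 AS (
    SELECT AVG(col1) AS col1_avg, MAX(col1) AS col1_max, col2, col3
    FROM q1
    GROUP BY col2, col3
),
q3 AS (
    SELECT AVG(col1_avg) AS avg_of_avg, col3
    FROM q2
    GROUP BY col3
)
SELECT 'Q1' AS src, col1 || ',' || col2 || ',' || col3 AS line FROM q1
UNION ALL
SELECT 'Q2', col1_avg || ',' || col1_max || ',' || col2 || ',' || col3 FROM q2
UNION ALL
SELECT 'Q3', avg_of_avg || ',' || col3 FROM q3
"""

# Split the combined result by tag; StringIO stands in for three csv files.
outputs = {}
for src, line in conn.execute(TAGGED_SQL):
    outputs.setdefault(src, io.StringIO()).write(line + "\n")

csv_text = {src: buf.getvalue() for src, buf in outputs.items()}
for src in sorted(csv_text):
    print(src)
    print(csv_text[src])
```

Replacing each StringIO with an open file would give one CSV file per stage while still running everything as a single query.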

How to filter sql duplicates?

My question: how can I get records without duplicates, both within a single table and across multiple tables? How can I do this in SQL?
Let me explain what I have tried:
Select distinct Col1, col2
from Table
where order id = 143
Output
VolumeAnswer1 AreaAnswer1 heightAnswer1
VolumeAnswer2 AreaAnswer1 heightAnswer2
VolumeAnswer3 AreaAnswer1 heightAnswer2
Expected Output
It shows the duplicate for the second table, but I need the output to be like:
VolumeAnswer1 AreaAnswer1 heightAnswer1
VolumeAnswer2 heightAnswer2
VolumeAnswer3
I need the same behaviour for multiple tables; I found the same duplicates when using joins. If it cannot be handled in SQL Server, how can we handle it in .NET? I used multiple SELECT statements, but I was asked to change them into a single SELECT. Each and every column should be bound to a drop-down list...
Something like this might be a good place to start:
;with cte1 as (
Select col1, cnt1
From (
Select
col1
,row_number() over(Partition by col1 Order by col1) as cnt1
From tbltest) as tbl_sub1
Where cnt1 = 1
), cte2 as (
Select col2, cnt2
From (
Select
col2
,row_number() over(Partition by col2 Order by col2) as cnt2
From tbltest) as tbl_sub2
Where cnt2 = 1
), cte3 as (
Select col3, cnt3
From (
Select
col3
,row_number() over(Partition by col3 Order by col3) as cnt3
From tbltest) as tbl_sub3
Where cnt3 = 1
)
Select
col1, col2, col3
From cte1
full join cte2 on col1 = col2
full join cte3 on col1 = col3
Sql Fiddle showing example: http://sqlfiddle.com/#!3/c9127/1