Report duplicate data - sql-server-2008-r2

create table dupt(cat varchar(10), num int)
insert dupt(cat,num) values ('A',1),('A',2),('A',3),
('B',1),('B',2),
('C',1),('C',2), ('C',3),
('D',1),('D',2), ('D',4),
('E',1),('E',2),
('F',1),('F',2)
I need to create a report that finds duplicate data. From the sample data above, the report needs to show that the data for cat A is duplicated by cat C (note the num values and the number of records) and that cat B is duplicated by cat E and F. What is the best way to show that?
Example output
-------------
|cat | dupby|
-------------
| A | C |
| B | E, F |
-------------

Updated: switched to traditional set matching using a common table expression, with the stuff() and select ... for xml path ('') method of string concatenation applied only to the final results:
;with cte as (
select *
, cnt = count(*) over (partition by cat)
from dupt
)
, duplicates as (
select
x.cat
, dup_cat = x2.cat
from cte as x
inner join cte as x2
on x.cat < x2.cat
and x.num = x2.num
and x.cnt = x2.cnt
group by x.cat, x2.cat, x.cnt
having count(*) = x.cnt
)
select
d.cat
, dupby = stuff((
select ', '+i.dup_cat
from duplicates i
where i.cat = d.cat
for xml path (''), type).value('.','varchar(8000)')
,1,2,'')
from duplicates d
where not exists (
select 1
from duplicates i
where d.cat = i.dup_cat
)
group by d.cat
rextester demo: http://rextester.com/KHAG98718
returns:
+-----+-------+
| cat | dupby |
+-----+-------+
| A | C |
| B | E, F |
+-----+-------+
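The same set-matching idea can be tried outside SQL Server. Below is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database. SQLite has no for xml path, so the final concatenation is done in Python, and the per-cat count comes from a joined subquery instead of count(*) over (...). Both substitutions are assumptions made for this sketch, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table dupt(cat varchar(10), num int);
insert into dupt(cat, num) values
 ('A',1),('A',2),('A',3),
 ('B',1),('B',2),
 ('C',1),('C',2),('C',3),
 ('D',1),('D',2),('D',4),
 ('E',1),('E',2),
 ('F',1),('F',2);
""")

# Two cats are duplicates when they have the same row count and
# every num of one matches a num of the other.
query = """
with cte as (
  select d.cat, d.num, c.cnt
  from dupt d
  join (select cat, count(*) as cnt from dupt group by cat) c
    on c.cat = d.cat
),
duplicates as (
  select x.cat as cat, x2.cat as dup_cat
  from cte x
  join cte x2
    on x.cat < x2.cat and x.num = x2.num and x.cnt = x2.cnt
  group by x.cat, x2.cat, x.cnt
  having count(*) = x.cnt
)
select d.cat, d.dup_cat
from duplicates d
where not exists (select 1 from duplicates i where i.dup_cat = d.cat)
order by d.cat, d.dup_cat
"""

# Collect the duplicate cats per original cat (stands in for stuff/for xml path).
report = {}
for cat, dup_cat in conn.execute(query):
    report.setdefault(cat, []).append(dup_cat)

for cat, dups in report.items():
    print(cat, "|", ", ".join(dups))
```

The not exists filter is what keeps E from appearing as its own row: E is already listed as a duplicate of B.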

Related

How to join two tables with nested field?

I have a table like this:
id | ciaps
1 | a|b|c
And I have a second table like:
cod | desc
a | item a
b | item b
c | item c
I need a query to join these tables like:
id | ciaps
1 | item a|item b|item c
Use array_agg to collect the values, then array_to_string to join them with '|' into the expected format.
-- PostgreSQL (v11)
SELECT t1.id, t2.descr ciaps
FROM test1 t1
INNER JOIN (SELECT array_to_string(array_agg(cod), '|') cod
, array_to_string(array_agg(descr), '|') descr
FROM test2) t2
ON t1.ciaps = t2.cod;
See the demo at https://dbfiddle.uk/?rdbms=postgres_11&fiddle=6fffc7f1da6a02a48018b3691c99ad17
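Note that the query above aggregates every row of test2 into a single string, so it appears to match only when ciaps lists all codes in the order the aggregate happens to produce. The per-row logic the question describes (split on '|', look up each code, join again) can be sketched in Python as follows. The names lookup, rows and expand are hypothetical, and leaving unknown codes unchanged is an assumption, not something the question specifies.

```python
# Second table: cod -> description
lookup = {"a": "item a", "b": "item b", "c": "item c"}

# First table: (id, ciaps)
rows = [(1, "a|b|c")]

def expand(ciaps, lookup, sep="|"):
    """Replace each code in a separator-delimited field by its description."""
    # Codes missing from the lookup are kept as-is (assumption).
    return sep.join(lookup.get(code, code) for code in ciaps.split(sep))

result = [(id_, expand(ciaps, lookup)) for id_, ciaps in rows]
print(result)
```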

DB2: How to transpose multidimensional table from row to column to find data changes across rows

I am trying the following with Db2:
Problem
So I've got a table with 80+ columns and two rows.
What I need to accomplish is checking which columns have changed value between the two rows, and returning a table of the column names that have changed, their initial value from row 1, and their new value from row 2.
Approach so far
My initial idea was to pivot the two rows into two columns (row 1 as column 1, row 2 as column 2), then join a column of column names (likely taken from syscat.columns) as column 3. I could then select where column1 != column2, returning the rows with all the data needed. But shortly after coming up with this, I discovered that DB2 doesn't support pivot / unpivot...
Question
So is there any idea for how to accomplish this in DB2, taking a table with 80+ columns and two rows like so:
| Col A | Col B | Col C | ... | Col Z|
| ----- | ----- | ----- | --- | ---- |
| Val A | Val B | 123 | ... | 01/01/2021 |
| Val C | Val B | 124 | ... | 02/01/2021 |
And returning a table with the columns changed, their initial value, and their new value:
| Initial | New | ColName|
| ----- | ----- | ----- |
| Val A | Val C | Col A |
| 123 | 124 | Col C |
| 01/01/2021 | 02/01/2021 | Col Z |
Note that the column data types vary, so the values will need to be converted to varchar.
DB2 version is 11.1
EDIT: Also for reference as per comment request, this is code I attempted to use to achieve this goal:
WITH
INIT AS (SELECT * FROM TABLE WHERE SOMEDATE=(SELECT MIN(SOMEDATE) FROM TABLE)),
LATE AS (SELECT * FROM TABLE WHERE SOMEDATE=(SELECT MAX(SOMEDATE) FROM TABLE)),
COLS AS (SELECT COLNAME FROM SYSCAT.COLUMNS WHERE TABNAME='TABLE' ORDER BY COLNO)
SELECT * FROM (
SELECT
COLNAME AS ATTRIBUTE,
(SELECT COLNAME AS INITIAL FROM INIT),
(SELECT COLNAME AS NEW FROM LATE)
FROM
COLS
WHERE
(INITIAL != NEW) OR (INITIAL IS NULL AND NEW IS NOT NULL) OR (INITIAL IS NOT NULL AND NEW IS NULL));
The only issue with this one is that I couldn't figure out how to use the values from the COLS table as the columns to be selected.
If you don't want to type the needed expressions manually, you can generate their text.
Consider the following example, which prints only the column values that differ between 2 rows of a fairly wide table, SYSCAT.TABLES. The following query generates the expressions.
SELECT
'DECODE(I.I, '
|| LISTAGG(COLNO || ', A.' || COLNAME || CASE WHEN TYPENAME NOT LIKE '%CHAR%' AND TYPENAME NOT LIKE '%GRAPHIC' THEN '::VARCHAR(128)' ELSE '' END, ', ')
|| ') AS INITIAL' AS EXPR_INITIAL
, 'DECODE(I.I, '
|| LISTAGG(COLNO || ', B.' || COLNAME || CASE WHEN TYPENAME NOT LIKE '%CHAR%' AND TYPENAME NOT LIKE '%GRAPHIC' THEN '::VARCHAR(128)' ELSE '' END, ', ')
|| ') AS NEW' AS EXPR_NEW
, 'DECODE(I.I, '
|| LISTAGG(COLNO || ', ''' || COLNAME || '''', ', ')
|| ') AS COLNAME' AS EXPR_COLNAME
FROM SYSCAT.COLUMNS C
WHERE TABSCHEMA = 'SYSCAT' AND TABNAME = 'TABLES'
AND TYPENAME NOT LIKE '%LOB';
It doesn't matter how many columns the table contains. We filter out the columns of *LOB types just as an example. If you want them as well, you should change the ::VARCHAR(128) casting to some ::CLOB(XXX).
We put these 3 generated expressions into the corresponding places in the query below:
WITH MYTAB AS
(
-- We enumerate the rows to reference them later
SELECT ROWNUMBER() OVER () RN_, T.*
FROM SYSCAT.TABLES T
WHERE TABSCHEMA = 'SYSCAT'
FETCH FIRST 2 ROWS ONLY
)
SELECT *
FROM
(
SELECT
-- Place here the result got in the EXPR_INITIAL column
-- , Place here the result got in the EXPR_NEW column
-- , Place here the result got in the EXPR_COLNAME column
FROM MYTAB A, MYTAB B
,
(
SELECT COLNO AS I
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = 'SYSCAT' AND TABNAME = 'TABLES'
AND TYPENAME NOT LIKE '%LOB'
) I
WHERE A.RN_ = 1 AND B.RN_ = 2
)
WHERE INITIAL IS DISTINCT FROM NEW;
The result I got in my database:
|INITIAL |NEW |COLNAME |
|--------------------------|--------------------------|---------------|
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|ALTER_TIME |
|26 |15 |COLCOUNT |
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|CREATE_TIME |
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|INVALIDATE_TIME|
|2019-06-04-22.44.14.493001|2019-06-04-22.44.14.502001|LAST_REGEN_TIME|
|ATTRIBUTES |AUDITPOLICIES |TABNAME |
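For a quick cross-check of the expected result, the same comparison can be done client-side. This is a minimal sketch, assuming Python's built-in sqlite3 module instead of Db2; the table name snapshot and its columns are hypothetical stand-ins for the question's sample. Column names come from the cursor metadata, which plays the role of SYSCAT.COLUMNS here, and values are converted to text as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table snapshot(col_a text, col_b text, col_c integer, col_z text);
insert into snapshot values ('Val A', 'Val B', 123, '01/01/2021');
insert into snapshot values ('Val C', 'Val B', 124, '02/01/2021');
""")

cur = conn.execute("select * from snapshot order by rowid limit 2")
names = [d[0] for d in cur.description]   # column names from metadata
initial, new = cur.fetchall()             # exactly two rows expected

def as_text(value):
    """Convert any value to text, keeping NULL (None) as None."""
    return None if value is None else str(value)

# Python's != treats None like IS DISTINCT FROM treats NULL:
# None != None is False, None != 'x' is True.
changes = [
    (as_text(a), as_text(b), name)
    for name, a, b in zip(names, initial, new)
    if a != b
]

for row in changes:
    print(row)
```

This moves the comparison out of the database, so it only suits cases where fetching the two rows is cheap.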

Postgres: how to find rows having duplicate values in fields

How can I find if any value exists more than once in one row? An example:
id | c1 | c2 | c3
----+----+----+----
1 | a | b | c
2 | a | a | b
3 | b | b | b
The query should return rows 2 and 3 since they have the same value more than once. The solution I'm looking for is not 'where c1 = c2 or c1 = c3 or c2 = c3' since there can be any number of columns in tables I need to test. All values are text but can be any length.
One way to do that is to convert the columns to rows:
select *
from the_table tt
where exists (select 1
from ( values (c1), (c2), (c3) ) as t(v)
group by v
having count(*) > 1)
If you want a dynamic solution where you don't have to list each column, you can do that by converting the row to a JSON value:
select *
from the_table tt
where exists (select 1
from jsonb_each_text(to_jsonb(tt)) as j(k,v)
group by v
having count(*) > 1)
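The dynamic idea (do not list the columns by hand) can also be sketched on the client side. This is an illustration only, assuming Python's built-in sqlite3 module in place of Postgres and a key column named id that is excluded from the comparison; the column names are read from the cursor metadata.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table the_table(id integer, c1 text, c2 text, c3 text);
insert into the_table values (1,'a','b','c'), (2,'a','a','b'), (3,'b','b','b');
""")

cur = conn.execute("select * from the_table order by id")
names = [d[0] for d in cur.description]

dup_ids = []
for row in cur:
    # Every column except the key takes part in the comparison.
    values = [v for name, v in zip(names, row) if name != "id"]
    # A row qualifies when some value occurs more than once.
    if any(count > 1 for count in Counter(values).values()):
        dup_ids.append(row[0])

print(dup_ids)
```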

Creating clusters of related columns

I have a table named Stores with columns:
StoreCode NVARCHAR(10),
OldStoreCode NVARCHAR(10)
Here is a sample of my data:
| StoreCode | OldStoreCode |
|-----------|--------------|
| A | B |
| B | A |
| D | E |
| E | F |
| M | K |
| J | K |
| K | L |
|-----------|--------------|
I want to create clusters of related Stores. Related store means there is a one way relation between StoreCodes and OldStoreCodes.
Expected result table:
| StoreCode | ClusterId |
|-----------|-----------|
| A | 1 |
| B | 1 |
| D | 2 |
| E | 2 |
| F | 2 |
| M | 3 |
| K | 3 |
| J | 3 |
| L | 3 |
|-----------|-----------|
There is no maximum number of hops. There may be a StoreCode A which has an OldStoreCode B, which has an OldStoreCode C, which has an OldStoreCode D, etc.
How can I cluster stores like this?
Try it like this:
EDIT: With changes by OP taken from comment
CREATE TABLE #tbl(ID INT IDENTITY, StoreCode VARCHAR(100),OldStoreCode VARCHAR(100));
INSERT INTO #tbl VALUES
('A','B'),('B','A'),('D','E'),('E','F'),('M','K'),('J','K'),('K','L');
WITH Related AS
(
SELECT DISTINCT t1.ID,Val
FROM #tbl AS t1
INNER JOIN #tbl AS t2 ON t1.StoreCode=t2.StoreCode
OR t1.OldStoreCode=t2.OldStoreCode
OR t1.OldStoreCode=t2.StoreCode
OR t1.StoreCode=t2.OldStoreCode
CROSS APPLY(SELECT DISTINCT Val
FROM
(VALUES(t1.StoreCode),(t2.StoreCode),(t1.OldStoreCode),(t2.OldStoreCode)) AS A(Val)
) AS valsInCols
)
,ClusterKeys AS
(
SELECT r1.ID
,(
SELECT r2.Val AS [*]
FROM Related AS r2
WHERE r2.ID=r1.ID
ORDER BY r2.Val
FOR XML PATH('')
) AS ClusterKey
FROM Related AS r1
GROUP BY r1.ID
)
,ClusterIds AS
(
SELECT ClusterKey
,MIN(ID) AS ID
FROM ClusterKeys
GROUP BY ClusterKey
)
SELECT r.ID
,r.Val
FROM ClusterIds c
INNER JOIN Related r ON c.ID = r.ID
The result
ID Val
1 A
1 B
3 D
3 E
3 F
5 J
5 K
5 L
5 M
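The query above groups rows that share a code directly, which covers the sample data. Because the question allows any number of hops, a connected-components pass is one way to handle longer chains. Below is a minimal sketch in Python using union-find; it is a swapped-in technique rather than a translation of the SQL, and the function name cluster_stores is hypothetical. Cluster ids are assigned in order of first appearance, which is an assumption.

```python
def cluster_stores(pairs):
    """Map each store code to a cluster id via union-find."""
    parent = {}

    def find(code):
        parent.setdefault(code, code)
        while parent[code] != code:
            parent[code] = parent[parent[code]]  # path halving
            code = parent[code]
        return code

    # Link both ends of every (StoreCode, OldStoreCode) pair.
    for store, old_store in pairs:
        root_a, root_b = find(store), find(old_store)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number the clusters in order of first appearance.
    cluster_of_root = {}
    result = {}
    for code in list(parent):
        root = find(code)
        cluster_of_root.setdefault(root, len(cluster_of_root) + 1)
        result[code] = cluster_of_root[root]
    return result


pairs = [("A", "B"), ("B", "A"), ("D", "E"), ("E", "F"),
         ("M", "K"), ("J", "K"), ("K", "L")]
clusters = cluster_stores(pairs)
for code, cluster_id in clusters.items():
    print(code, cluster_id)
```

A chain such as A-B, B-C, C-D, D-E ends up in a single cluster, however long it is.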
This should do it:
SAMPLE DATA:
IF OBJECT_ID('tempdb..#Temp1') IS NOT NULL
BEGIN
DROP TABLE #Temp1;
END;
CREATE TABLE #Temp1(StoreCode NVARCHAR(10)
, OldStoreCode NVARCHAR(10));
INSERT INTO #Temp1(StoreCode, OldStoreCode)
VALUES
('A', 'B'),
('B', 'A'),
('D', 'E'),
('E', 'F'),
('M', 'K'),
('J', 'K'),
('K', 'L');
QUERY:
;WITH A -- get all distinct new and old storecodes
AS (
SELECT StoreCode
FROM #Temp1
UNION
SELECT OldStoreCode
FROM #Temp1),
B -- give a unique number id to each store code
AS (SELECT rn = RANK() OVER(ORDER BY StoreCode)
, StoreCode
FROM A),
C -- combine the store codes and the unique number id's in one table
AS (SELECT b2.rn AS StoreCodeID
, t.StoreCode
, b1.rn AS OldStoreCodeId
, t.OldStoreCode
FROM #Temp1 AS t
LEFT OUTER JOIN B AS b1 ON t.OldStoreCode = b1.StoreCode
LEFT OUTER JOIN B AS b2 ON t.StoreCode = b2.StoreCode),
D -- assign a row number for each entry in the data set
AS (SELECT rn = RANK() OVER(ORDER BY StoreCode)
, *
FROM C),
E -- derive first and last store in the path
AS (SELECT FirstStore = d2.StoreCode
, LastStore = d1.OldStoreCode
, GroupID = d1.OldStoreCodeId
FROM D AS d1
RIGHT OUTER JOIN D AS d2 ON d1.StoreCodeID = d2.OldStoreCodeId
AND d1.rn - 1 = d2.rn
WHERE d1.OldStoreCode IS NOT NULL) ,
F -- get the stores which led to the last store with one hop
AS (SELECT C.StoreCode
, E.GroupID
FROM E
INNER JOIN C ON E.LastStore = C.OldStoreCode)
-- combine to get the full grouping
SELECT A.StoreCode, ClusterID = DENSE_RANK() OVER (ORDER BY A.GroupID) FROM (
SELECT C.StoreCode,F.GroupID FROM C INNER JOIN F ON C.OldStoreCode = F.StoreCode
UNION
SELECT * FROM F
UNION
SELECT E.LastStore,E.GroupID FROM E) AS A ORDER BY StoreCode, ClusterID

Comparing tables and getting non matching values

I'm pretty new to SQL and I can't get this to work. I've got the two tables below:
Table A Table B
_________________ _________________
| A | 2015-10-4 | B | 2015-11-6
| B | 2015-11-4 | C | 2015-05-4
| C | 2015-05-6 | D | 2015-05-8
| D | 2015-05-7 | C | 2015-05-5
I'm trying to write a stored procedure that will get all letters from table B that have a date less than the date in table A, and any letter that doesn't exist in table B.
This is what I have so far
SELECT *
FROM A q JOIN
B c ON q.Letter = c.Letter AND q.Date > c.Date OR c.Letter IS NULL
This returns C, but I can't get it to return A as well. I still find joining and comparing tables confusing.
I do not want duplicate rows, the results I would be expecting would return
| A | 2015-10-4
| C | 2015-05-6
EDIT
I'm running into an issue now where if I have a case like this
Table A Table B
_________________ _________________
| A | 2015-10-4 | B | 2015-11-6
| B | 2015-11-4 | C | 2015-05-4
| C | 2015-05-6 | D | 2015-05-8
| D | 2015-05-7 | C | 2015-05-5
| C | 2015-05-7
It will still return C for some reason. Using a.date > max(b.date) doesn't work because max can't be used that way. And I want to assume the max date can be anywhere in table B.
So now my new results would be
| A | 2015-10-4
But I am getting A and C still.
You should use a LEFT JOIN:
SELECT DISTINCT A.letter, A.[Date]
FROM dbo.TableA A
LEFT JOIN dbo.TableB B
ON A.letter = B.letter
WHERE B.[Date] < A.[Date] OR B.letter IS NULL;
UPDATE
You should have explained your requirements as: "get all letters from table B in which every date is less than...."
SELECT DISTINCT A.letter, A.[Date]
FROM dbo.TableA A
LEFT JOIN (SELECT letter, MAX([Date]) [Date]
FROM dbo.TableB
GROUP BY letter) B
ON A.letter = B.letter
WHERE B.[Date] < A.[Date] OR B.letter IS NULL;
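To see how the MAX-per-letter version behaves on both data sets from the question, here is a minimal sketch assuming Python's built-in sqlite3 module and ISO-formatted date strings, which compare correctly as text. The table names tab_a and tab_b and the helper report are hypothetical.

```python
import sqlite3

A_ROWS = [("A", "2015-10-04"), ("B", "2015-11-04"),
          ("C", "2015-05-06"), ("D", "2015-05-07")]
B_ROWS = [("B", "2015-11-06"), ("C", "2015-05-04"),
          ("D", "2015-05-08"), ("C", "2015-05-05")]

def report(b_rows):
    """Rows of tab_a whose date is later than every tab_b date for the
    same letter, plus rows whose letter is missing from tab_b."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tab_a (letter text, my_date text)")
    conn.execute("create table tab_b (letter text, my_date text)")
    conn.executemany("insert into tab_a values (?, ?)", A_ROWS)
    conn.executemany("insert into tab_b values (?, ?)", b_rows)
    return conn.execute("""
        select distinct a.letter, a.my_date
        from tab_a as a
        left join (select letter, max(my_date) as my_date
                   from tab_b
                   group by letter) as b
          on a.letter = b.letter
        where b.my_date < a.my_date or b.letter is null
        order by a.letter
    """).fetchall()

first = report(B_ROWS)                            # original data
second = report(B_ROWS + [("C", "2015-05-07")])   # data from the EDIT
print(first)
print(second)
```

With the extra C row from the EDIT, the latest date for C in the second table is no longer earlier than the one in the first table, so only A remains.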
I would go for a UNION / UNION ALL, so that you get the result subset for the first condition + the ones for the second one.
Something similar to this should do the job:
sqlite> create table A (letter, my_date);
sqlite> create table B (letter, my_date);
sqlite> insert into A values ('A', '2015-10-04');
sqlite> insert into A values ('B', '2015-11-04');
sqlite> insert into A values ('C', '2015-05-06');
sqlite> insert into A values ('D', '2015-05-07');
sqlite> insert into B values ('B', '2015-11-06');
sqlite> insert into B values ('C', '2015-05-04');
sqlite> insert into B values ('D', '2015-05-08');
sqlite> insert into B values ('C', '2015-05-05');
sqlite> select B.* from A, B where A.letter = B.letter and B.my_date < A.my_date UNION ALL select A.* from A where not exists (select 1 from B where B.letter=A.letter);
letter my_date
---------- ----------
C 2015-05-04
C 2015-05-05
A 2015-10-04