Combine similar rows using case statement


I have a query populating a report that contains a few rows of "duplicate" information. Similar IDs are being passed through which should be combined, but they are distinct enough that we do not want to concatenate/insert them in our model. For the report to be processed correctly, I need to sum their $ values. (The only information I actually need to preserve is the name, the final summed amount, and the ID.)
Is there a simple way to achieve this with a CASE statement that only sums the Amount field? I tried a SUM(CASE WHEN ...) expression, but I do not want a new column, since my report uses only that field to populate the dollar amounts. Here is a sample of my issue:
ID       Name         Amount  Person
-------  ---------  --------  --------
21011    Place A     -210.30  John Doe
210115   Place A-a   6500.70  John Doe
21060    Place B      255.00  Wayne C
2106015  Place Bb     212.30  Wayne C
2106015  Place Bb    1212.30  Wayne C
2106015  Place Bb     212.30  Wayne C
21080    Place J    57212.30  Billy J
My desired result for this would be:
ID       Name         Amount  Person
-------  ---------  --------  --------
21011    Place A     6290.40  John Doe
21060    Place B     1891.90  Wayne C
21080    Place J    57212.30  Billy J
Is there a simplified way to combine these rows in TSQL without modifying the db?

You can try this (provided your ID column is a number and not a character field):
;WITH cte_getsum AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY Person ORDER BY ID) AS RowNum,
           ID,
           Name,
           (SELECT SUM(Amount) FROM TableName WHERE TableName.Person = t1.Person) AS SumAmount,
           Person
    FROM TableName t1
)
SELECT *
FROM cte_getsum
WHERE RowNum = 1
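A minimal sketch of the answer's CTE, run against an in-memory SQLite database through Python's standard `sqlite3` module so the logic can be checked without SQL Server. SQLite is only a stand-in for T-SQL here: the leading `;` is dropped, `ROUND` is added because the sketch stores `Amount` as a floating-point `REAL`, and window functions need SQLite 3.25 or later. Note that the totals are computed per `Person`, so the approach assumes each person owns exactly one group of related IDs.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server; rows are the question's sample.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (ID INTEGER, Name TEXT, Amount REAL, Person TEXT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?, ?)",
    [
        (21011, "Place A", -210.30, "John Doe"),
        (210115, "Place A-a", 6500.70, "John Doe"),
        (21060, "Place B", 255.00, "Wayne C"),
        (2106015, "Place Bb", 212.30, "Wayne C"),
        (2106015, "Place Bb", 1212.30, "Wayne C"),
        (2106015, "Place Bb", 212.30, "Wayne C"),
        (21080, "Place J", 57212.30, "Billy J"),
    ],
)

# Number each person's rows by ID, attach that person's total through a
# correlated subquery, then keep only the first (lowest-ID) row per person.
query = """
WITH cte_getsum AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY Person ORDER BY ID) AS RowNum,
           ID,
           Name,
           (SELECT ROUND(SUM(Amount), 2) FROM TableName
             WHERE TableName.Person = t1.Person) AS SumAmount,
           Person
    FROM TableName t1
)
SELECT ID, Name, SumAmount, Person
FROM cte_getsum
WHERE RowNum = 1
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For the sample data this yields one row per person: 21011/Place A/6290.40, 21060/Place B/1891.90 and 21080/Place J/57212.30.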

You can try the script below. I created a temp table just for the sample data, but in your case you can refer directly to the table you have. It assumes the base ID is always the first five characters of the longer IDs.
SELECT * INTO #tmpInput
FROM (VALUES ('21011',   'Place A',    -210.30,  'John Doe'),
             ('210115',  'Place A-a',  6500.70,  'John Doe'),
             ('21060',   'Place B',    255.00,   'Wayne C'),
             ('2106015', 'Place Bb',   212.30,   'Wayne C'),
             ('2106015', 'Place Bb',   1212.30,  'Wayne C'),
             ('2106015', 'Place Bb',   212.30,   'Wayne C'),
             ('21080',   'Place J',    57212.30, 'Billy J')
     ) AS Input (ID, Name, Amount, Person);

-- LEFT(t1.ID, 5) is the five-character base ID of each row
SELECT LEFT(t1.ID, 5) AS ID,
       t2.Name,
       SUM(t1.Amount) AS Amount,
       t2.Person
FROM #tmpInput t1
INNER JOIN #tmpInput t2 ON t2.ID = LEFT(t1.ID, 5)
GROUP BY LEFT(t1.ID, 5), t2.Name, t2.Person;
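The same kind of check for the prefix-join answer, again a sketch on SQLite via Python's `sqlite3` rather than SQL Server. SQLite has no `LEFT()`, so `substr(ID, 1, 5)` takes the five-character base ID. The approach assumes every base ID is exactly five characters long and that each base ID row appears only once; a repeated base row would be joined more than once and double-count the amounts.

```python
import sqlite3

# Sample rows from the question; ID is text, as in the #tmpInput temp table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmpInput (ID TEXT, Name TEXT, Amount REAL, Person TEXT)")
conn.executemany(
    "INSERT INTO tmpInput VALUES (?, ?, ?, ?)",
    [
        ("21011", "Place A", -210.30, "John Doe"),
        ("210115", "Place A-a", 6500.70, "John Doe"),
        ("21060", "Place B", 255.00, "Wayne C"),
        ("2106015", "Place Bb", 212.30, "Wayne C"),
        ("2106015", "Place Bb", 1212.30, "Wayne C"),
        ("2106015", "Place Bb", 212.30, "Wayne C"),
        ("21080", "Place J", 57212.30, "Billy J"),
    ],
)

# Group each row under its five-character base ID, and join to the row whose
# full ID equals that base ID to pick up the "parent" Name and Person.
query = """
SELECT substr(t1.ID, 1, 5) AS BaseID,
       t2.Name,
       ROUND(SUM(t1.Amount), 2) AS Amount,
       t2.Person
FROM tmpInput t1
JOIN tmpInput t2 ON t2.ID = substr(t1.ID, 1, 5)
GROUP BY substr(t1.ID, 1, 5), t2.Name, t2.Person
ORDER BY BaseID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```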

Related

Restrict string_agg order by in postgres

While working with a Postgres database, I came across a situation where I have to display column names based on their ids, which are stored in a table as a comma-separated list. Here is a sample:
table1 name: labelprint
id  field_id
1   2,1
table2 name: datafields
id  field_name
1   Age
2   Name
3   Sex
Now, when I display the field names by picking the ids from table1 (i.e. 2,1 from the field_id column), I want the field names to appear in the same order as their ids:
Expected result:
id field_id field_name
1 2,1 Name,Age
To achieve the above result, I have written the following query:
select l.id,l.field_id ,string_agg(d.field_name,',') as field_names
from labelprint l
join datafields d on d.id = ANY(string_to_array(l.field_id::text,','))
group by l.id
order by l.id
However, string_agg() returns the names in ascending id order, so the output looks like this:
id  field_id  field_name
1   2,1       Age,Name
As you can see the order is not maintained in the field_name column which I want to display as per field_id value order.
Any suggestion/help is highly appreciated.
Thanks in advance!

While this will probably be horrible for performance, as well as readability and maintainability, you can dynamically compute the order you want:
select l.id, l.field_id,
       string_agg(d.field_name, ','
                  order by array_position(string_to_array(l.field_id::text, ','), d.id::text)
       ) as field_names
from labelprint l
join datafields d on d.id::text = any(string_to_array(l.field_id::text, ','))
group by l.id, l.field_id
order by l.id;
You should at least store your array as an actual array, not as a comma delimited string. Or maybe use an intermediate table and don't store arrays at all.
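SQLite has no `array_position()`, and ordered aggregates only arrived in recent versions (3.44+), so this sketch, using Python's `sqlite3`, reproduces the idea differently: `instr()` finds each id's offset in the comma-wrapped list, the rows are sorted by that offset, and Python joins the names. It assumes `field_id` holds plain integer ids separated by commas with no spaces.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE labelprint (id INTEGER PRIMARY KEY, field_id TEXT);
CREATE TABLE datafields (id INTEGER PRIMARY KEY, field_name TEXT);
INSERT INTO labelprint VALUES (1, '2,1');
INSERT INTO datafields VALUES (1, 'Age'), (2, 'Name'), (3, 'Sex');
""")

# Wrapping the list and the id in commas means ',1,' only matches the whole
# element 1 (never 11 or 21); instr() then returns that element's character
# offset, which increases with its position, much like array_position().
query = """
SELECT l.id, l.field_id, d.field_name,
       instr(',' || l.field_id || ',', ',' || d.id || ',') AS pos
FROM labelprint l
JOIN datafields d
  ON instr(',' || l.field_id || ',', ',' || d.id || ',') > 0
ORDER BY l.id, pos
"""
rows = conn.execute(query).fetchall()

# Concatenate the names per label in list order. This is done in Python because
# SQLite's group_concat() does not promise any order without an aggregate ORDER BY.
result = [
    (label_id, field_id, ",".join(name for _, _, name, _ in group))
    for (label_id, field_id), group in groupby(rows, key=lambda r: (r[0], r[1]))
]
print(result)
```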

With a small modification to your existing query (casting d.id so it can be compared with the text array), you could do it as follows:
select l.id, l.field_id, string_agg(d.field_name,',') as field_names
from labelprint l
join datafields d on d.id::varchar = ANY(string_to_array(l.field_id,','))
group by l.id, l.field_id
order by l.id

Find equal twin record postgresql

I have a company table (co) with 60 columns. The goal is to create a tool to find, compare, and eliminate duplicates in this table.
Example: I have a record with id 22 and I know it has a twin because I run this (simplified code):
SELECT min(co_id),co_name,count(*) FROM co
GROUP BY co_name
HAVING count(*) > 1
The result shows there is one twin (count 2), and I get the oldest id with min(co_id).
My question is: how do I search for the twin's co_id, given just the oldest id?
Something like:
SELECT co_id FROM co
WHERE co_name EQUAL TO co_id='22'
LIMIT 2
Sample data:
id co_name
22 Volvo
23 Volvo
24 Ford
25 Ford
I know id 22 and I want to search for the twin 23 based on the content of 22.
The closest I have found is this, which is far from generic and a nightmare when comparing 60 fields:
SELECT id,
(SELECT max(b.id) from co b
WHERE a.co_name = b.co_name
LIMIT 1) as twin
FROM co a
WHERE id='22'
How can I do this in a simpler, more generic way? I just want the twin record's co_id.
Thank you in advance!

select max_co, co_name
from (
    select max(co_id) as max_co, min(co_id) as min_co, co_name
    from co
    group by co_name
    having count(*) > 1
) t
where min_co = 22;  -- replace 22 with your known (oldest) co_id
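A sketch of this answer on the question's sample rows, using SQLite through Python's `sqlite3` as a stand-in for PostgreSQL. The derived table carries an alias (`t`), which PostgreSQL requires before version 16. With two copies per name, the group whose oldest id is 22 returns 23 as the twin; with three or more copies this query only returns the newest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE co (co_id INTEGER PRIMARY KEY, co_name TEXT)")
conn.executemany(
    "INSERT INTO co VALUES (?, ?)",
    [(22, "Volvo"), (23, "Volvo"), (24, "Ford"), (25, "Ford")],
)

# Per duplicated name keep the oldest and newest id, then pick the group whose
# oldest id is the one we already know (passed in as a parameter).
query = """
SELECT max_co, co_name
FROM (
    SELECT MAX(co_id) AS max_co, MIN(co_id) AS min_co, co_name
    FROM co
    GROUP BY co_name
    HAVING COUNT(*) > 1
) t
WHERE min_co = ?
"""
twin = conn.execute(query, (22,)).fetchone()
print(twin)
```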

You can join your table with itself:
SELECT c1.*
FROM co c1
INNER JOIN co c2
    ON c1.co_name = c2.co_name
   AND c1.id > c2.id
This returns all duplicated records (but not the original record with the lowest id). Or, since you're using PostgreSQL, you can use a window function:
SELECT *
FROM (
    SELECT id,
           co_name,
           row_number() OVER (PARTITION BY co_name ORDER BY id) AS rn
    FROM co
) s
WHERE rn > 1;
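A quick check of the self-join and window-function variants on the question's four sample rows, with SQLite via Python's `sqlite3` standing in for PostgreSQL. `DISTINCT` is added to the self-join in this sketch because, with three or more copies, a row would otherwise appear once for each lower-id copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE co (id INTEGER PRIMARY KEY, co_name TEXT)")
conn.executemany(
    "INSERT INTO co VALUES (?, ?)",
    [(22, "Volvo"), (23, "Volvo"), (24, "Ford"), (25, "Ford")],
)

# Number the rows within each co_name by id; every row numbered above 1 is a
# later copy of the lowest-id record.
window_query = """
SELECT id, co_name
FROM (
    SELECT id, co_name,
           ROW_NUMBER() OVER (PARTITION BY co_name ORDER BY id) AS rn
    FROM co
) s
WHERE rn > 1
ORDER BY id
"""

# Self-join variant: keep each row that has a lower-id row with the same name.
join_query = """
SELECT DISTINCT c1.id, c1.co_name
FROM co c1
JOIN co c2 ON c1.co_name = c2.co_name AND c1.id > c2.id
ORDER BY c1.id
"""

by_window = conn.execute(window_query).fetchall()
by_join = conn.execute(join_query).fetchall()
print(by_window, by_join)
```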
If you want to compare multiple columns, the JOIN solution is more flexible. I don't know exactly how you want to compare your columns or how you define "twin" rows, but a query like this should help:
SELECT c1.*
FROM co c1
INNER JOIN co c2
    ON (
        c1.co_name = c2.co_name
        OR c1.co_city = c2.co_city
        OR c1.co_owner = c2.co_owner
        OR ...
    ) AND c1.id > c2.id
If you only want the duplicates of id=22, you can try this:
SELECT c1.*
FROM co c1
INNER JOIN co c2
    ON c1.co_name = c2.co_name
   AND c1.id > c2.id
WHERE
    c2.id = 22
Or, if you want a single twin while comparing 60 columns, try this query:
SELECT MIN(c1.id) AS Twin /* or MAX(c1.id), depending on what you're after */
FROM co c1
INNER JOIN co c2
    ON (
        c1.co_name = c2.co_name
        OR c1.co_city = c2.co_city
        OR c1.co_owner = c2.co_owner
        OR ...
    ) AND c1.id > c2.id
WHERE
    c2.id = 22
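A sketch of the single-twin query using SQLite through Python's `sqlite3`. The `co_city` and `co_owner` columns come from the answer's placeholder list, but the rows below are hypothetical, made up only to show that a later row matching on any one listed column counts as a twin. If a twin must match on every column, the `OR`s would become `AND`s.

```python
import sqlite3

# Hypothetical rows: co_city/co_owner values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE co (id INTEGER PRIMARY KEY, co_name TEXT, co_city TEXT, co_owner TEXT)"
)
conn.executemany(
    "INSERT INTO co VALUES (?, ?, ?, ?)",
    [
        (22, "Volvo", "CityA", "OwnerA"),
        (23, "Volvo", "CityB", "OwnerB"),  # same name as 22
        (24, "Ford", "CityA", "OwnerC"),   # same city as 22
        (25, "Ford", "CityC", "OwnerD"),   # shares nothing with 22
    ],
)

# Join condition from the answer: a later row matching on ANY listed column.
join_on = """
  ON (c1.co_name = c2.co_name
      OR c1.co_city = c2.co_city
      OR c1.co_owner = c2.co_owner)
 AND c1.id > c2.id
"""

# All later rows related to the known record (c2 is the record with id 22).
matches = [
    r[0]
    for r in conn.execute(
        "SELECT c1.id FROM co c1 JOIN co c2" + join_on + "WHERE c2.id = ? ORDER BY c1.id",
        (22,),
    )
]

# The single twin, as in the answer: the lowest such id.
twin = conn.execute(
    "SELECT MIN(c1.id) AS twin FROM co c1 JOIN co c2" + join_on + "WHERE c2.id = ?",
    (22,),
).fetchone()[0]
print(matches, twin)
```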

I found one solution that works on 60 columns if I use variables instead of hard-coding the columns in the query. Thanks, everybody, for all the input; several of the answers were on the same track.
SELECT id,
(SELECT max(b.id) from co b
WHERE concat(a.co_name,etc) = concat(b.co_name,etc)
LIMIT 1) as twin
FROM co a
WHERE id='22'
It is not the best solution, since it fetches one twin at a time, and it is far from generic. Thanks for pointing me in the right direction; a generic solution would be nicer.

Select rows based on grouping in same table

Sorry about the lame title. If I could summarize this in a few words, I might have had better luck finding an existing solution here!
I have a table that simplified looks like this:
ID PRODUCT
___ _________
100 Savings
200 Mortgage
200 Visa
300 Mortgage
300 Savings
I need to select rows based on the product of each ID. For example, I can do this:
SELECT DISTINCT ID
FROM table1
WHERE Product NOT IN ('Savings', 'Chequing')
This would return:
ID
___
200
300
However, ID 300 does have Savings, so I do not want it returned. In plain English, I want to select the IDs for which neither 'Savings' nor 'Chequing' is the product on any row with that ID.
Desired result in this case would be one row with ID 200 since they do not have Savings or Chequing.
How can I do this?

Select the IDs of the rows that match the products you want to exclude, then keep only the IDs that are not in that list, e.g.:
select distinct id from table1 where id not in (
SELECT ID
FROM table1
WHERE Product IN ('Savings', 'Chequing')
)

You can use NOT EXISTS:
SELECT DISTINCT t1.ID
FROM dbo.Table1 t1
WHERE NOT EXISTS
(
SELECT 1 FROM dbo.Table1 t2
    WHERE t2.Product IN ('Savings', 'Chequing')
AND t2.ID = t1.ID
)
Worth reading: Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
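Both answers can be checked on the question's sample data with SQLite via Python's `sqlite3`. They agree here, but they differ if the subquery can return a NULL ID: `x NOT IN (..., NULL)` is never true, so the `NOT IN` version would return no rows at all, while `NOT EXISTS` is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER, Product TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(100, "Savings"), (200, "Mortgage"), (200, "Visa"),
     (300, "Mortgage"), (300, "Savings")],
)

excluded = ("Savings", "Chequing")

# NOT EXISTS: keep an ID only if no row for that same ID has an excluded product.
not_exists = conn.execute("""
SELECT DISTINCT t1.ID
FROM Table1 t1
WHERE NOT EXISTS (
    SELECT 1 FROM Table1 t2
    WHERE t2.Product IN (?, ?) AND t2.ID = t1.ID
)
ORDER BY t1.ID
""", excluded).fetchall()

# NOT IN: collect the IDs that have an excluded product, then drop them.
not_in = conn.execute("""
SELECT DISTINCT ID
FROM Table1
WHERE ID NOT IN (SELECT ID FROM Table1 WHERE Product IN (?, ?))
ORDER BY ID
""", excluded).fetchall()

print(not_exists, not_in)
```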

Eliminating matching values in a SQL result set

I have a table with a list of transactions (invoices and credits) and I need to get a list of all the rows where the invoices and credits don't match up.
e.g.:
user product value
bill ThingA 200
jim ThingA -200
sue ThingB 100
liz ThingC 50
I only want to see the third and fourth rows, as the values of the others match off.
I can do this if I select product, sum(value)
...
group by product
having sum(value) <> 0
which works well, but I want to return the user name as well.
As soon as I add the user to the select, I need to group by it as well, which messes it up as the amounts don't match up by user AND product.
Any ideas? I am using MS SQL Server 2000.
Cheers

You can do it like this:
SELECT tab2.[user], tab1.product, tab1.sum_val
FROM
    (SELECT product, SUM(value) AS sum_val
     FROM your_table
     GROUP BY product
     HAVING SUM(value) <> 0) tab1
INNER JOIN your_table tab2
    ON tab1.product = tab2.product
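A sketch of this join-back approach on the question's data, using SQLite through Python's `sqlite3` instead of SQL Server. The `user` column is double-quoted in this sketch; in T-SQL it is a reserved word and needs brackets, as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE your_table ("user" TEXT, product TEXT, value INTEGER)')
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [("bill", "ThingA", 200), ("jim", "ThingA", -200),
     ("sue", "ThingB", 100), ("liz", "ThingC", 50)],
)

# Aggregate per product first and keep the products whose total is not zero,
# then join back to the detail rows to recover the user names.
query = """
SELECT tab2."user", tab1.product, tab1.sum_val
FROM (SELECT product, SUM(value) AS sum_val
      FROM your_table
      GROUP BY product
      HAVING SUM(value) <> 0) tab1
JOIN your_table tab2 ON tab1.product = tab2.product
ORDER BY tab1.product
"""
rows = conn.execute(query).fetchall()
print(rows)
```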

@LolCoder's solution is good, but if both sue and liz had a ThingB row worth 100, you could retrieve the following result set with my query (note that FOR XML PATH requires SQL Server 2005 or later):
| product | value | users   |
+---------+-------+---------+
| ThingB  | 200   | sue,liz |
Here is the query:
select Q.product
     , sum(Q.value) as value
     , stuff((select ',' + convert(varchar(40), SQ.[user])
              from YourTable SQ
              where Q.product = SQ.product
              for xml path('')
             ), 1, 1, '') as users
from YourTable Q
group by Q.product
having sum(Q.value) <> 0
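SQLite has no `FOR XML PATH`, but its `group_concat()` performs the same string aggregation, so this sketch (SQLite via Python's `sqlite3`) checks the idea on the scenario the answer describes: sue and liz both booking 100 against ThingB, while ThingA's two rows cancel out. The order of names inside `users` is not guaranteed in either version.

```python
import sqlite3

# Scenario from the answer: sue and liz both have a ThingB row worth 100,
# while bill's and jim's ThingA rows cancel out.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE YourTable ("user" TEXT, product TEXT, value INTEGER)')
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [("bill", "ThingA", 200), ("jim", "ThingA", -200),
     ("sue", "ThingB", 100), ("liz", "ThingB", 100)],
)

# group_concat() collapses each product's users into one comma-separated
# string; HAVING drops the products whose values net to zero.
query = """
SELECT product, SUM(value) AS value, group_concat("user", ',') AS users
FROM YourTable
GROUP BY product
HAVING SUM(value) <> 0
"""
rows = conn.execute(query).fetchall()
print(rows)
```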

TSQL Group By with an "OR"?

This query for creating a list of Candidate duplicates is easy enough:
SELECT Count(*), Can_FName, Can_HPhone, Can_EMail
FROM Can
GROUP BY Can_FName, Can_HPhone, Can_EMail
HAVING Count(*) > 1
But if the actual rule I want to check against is FName and (HPhone OR Email) - how can I adjust the GROUP BY to work with this?
I'm fairly certain I'm going to end up with a UNION SELECT here (i.e. do FName, HPhone on one and FName, EMail on the other and combine the results) - but I'd love to know if anyone knows an easier way to do it.
Thank you in advance for any help.
Scott in Maine

Before I can advise anything, I need to know the answer to this question:
name  phone      email
John  555-00-00  john@example.com
John  555-00-01  john@example.com
John  555-00-01  john-other@example.com
What COUNT(*) do you want for this data?
Update:
If you just want to know that a record has any duplicates, use this:
WITH q AS (
    SELECT 1 AS id, 'John' AS name, '555-00-00' AS phone, 'john@example.com' AS email
    UNION ALL
    SELECT 2 AS id, 'John', '555-00-01', 'john@example.com'
    UNION ALL
    SELECT 3 AS id, 'John', '555-00-01', 'john-other@example.com'
    UNION ALL
    SELECT 4 AS id, 'James', '555-00-00', 'james@example.com'
    UNION ALL
    SELECT 5 AS id, 'James', '555-00-01', 'james-other@example.com'
)
SELECT *
FROM q qo
WHERE EXISTS
(
    SELECT NULL
    FROM q qi
    WHERE qi.id <> qo.id
      AND qi.name = qo.name
      AND (qi.phone = qo.phone OR qi.email = qo.email)
)
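The `EXISTS` version can be tried outside SQL Server too; this sketch loads the same five rows into SQLite through Python's `sqlite3`. John's three rows are flagged because each is directly related to at least one other (1 and 2 share an email, 2 and 3 share a phone), while the two James rows share neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO q VALUES (?, ?, ?, ?)",
    [
        (1, "John", "555-00-00", "john@example.com"),
        (2, "John", "555-00-01", "john@example.com"),
        (3, "John", "555-00-01", "john-other@example.com"),
        (4, "James", "555-00-00", "james@example.com"),
        (5, "James", "555-00-01", "james-other@example.com"),
    ],
)

# A row is flagged when some OTHER row has the same name and shares either
# the phone or the email: the "FName and (HPhone OR Email)" rule.
query = """
SELECT qo.id
FROM q qo
WHERE EXISTS (
    SELECT NULL
    FROM q qi
    WHERE qi.id <> qo.id
      AND qi.name = qo.name
      AND (qi.phone = qo.phone OR qi.email = qo.email)
)
ORDER BY qo.id
"""
dup_ids = [r[0] for r in conn.execute(query)]
print(dup_ids)
```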
This query is more efficient, but it doesn't tell you where the duplicate chain started.
The following query selects all entries along with a special field, chainid, that indicates where the duplicate chain started.
WITH q AS (
    SELECT 1 AS id, 'John' AS name, '555-00-00' AS phone, 'john@example.com' AS email
    UNION ALL
    SELECT 2 AS id, 'John', '555-00-01', 'john@example.com'
    UNION ALL
    SELECT 3 AS id, 'John', '555-00-01', 'john-other@example.com'
    UNION ALL
    SELECT 4 AS id, 'James', '555-00-00', 'james@example.com'
    UNION ALL
    SELECT 5 AS id, 'James', '555-00-01', 'james-other@example.com'
),
dup AS (
    SELECT id AS chainid, id, name, phone, email, 1 AS d
    FROM q
    UNION ALL
    SELECT chainid, qo.id, qo.name, qo.phone, qo.email, d + 1
    FROM dup
    JOIN q qo
        ON qo.name = dup.name
       AND (qo.phone = dup.phone OR qo.email = dup.email)
       AND qo.id > dup.id
),
chains AS
(
    SELECT *
    FROM dup do
    WHERE chainid NOT IN
    (
        SELECT id
        FROM dup di
        WHERE di.chainid < do.chainid
    )
)
SELECT *
FROM chains
ORDER BY
    chainid

None of these answers is correct. Quassnoi's is a decent approach, but you will notice one fatal flaw in the expressions "qo.id > dup.id" and "di.chainid < do.chainid": comparisons made by ID! This is ALWAYS bad practice because it depends on some inherent ordering in the IDs. IDs should NEVER be given any implicit meaning and should ONLY participate in equality or null testing. You can easily break Quassnoi's solution in this example by simply reordering the IDs in the data.
The essential problem is a disjunctive condition with a grouping, which leads to the possibility of two records being related through an intermediate, though they are not directly relatable.
e.g., you stated these records should all be grouped:
(1) John 555-00-00 john@example.com
(2) John 555-00-01 john@example.com
(3) John 555-00-01 john-other@example.com
You can see that #1 and #2 are relatable, as are #2 and #3, but clearly #1 and #3 are not directly relatable as a group.
This establishes that a recursive or iterative solution is the ONLY possible solution.
However, plain recursion is not viable, since you can easily end up in a loop. This is what Quassnoi was trying to avoid with his ID comparisons, but in doing so he broke the algorithm. You could try limiting the recursion depth, but then you may not complete all relations, and you will still potentially follow loops back on yourself, leading to excessive data size and prohibitive inefficiency.
The best solution is ITERATIVE: Start a result set by tagging each ID as a unique group ID, and then spin through the result set and update it, combining IDs into the same unique group ID as they match on the disjunctive condition. Repeat the process on the updated set each time until no further updates can be made.
I will create example code for this soon.
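The promised example code does not follow in this thread, so here is a minimal sketch of the iterative approach described above, written in Python rather than T-SQL for brevity. `related` and `group_duplicates` are hypothetical names: every row starts in its own group, the groups of matching rows are merged, and passes repeat until nothing changes. Group labels are only compared for equality, never ordered, so reordering the IDs does not change which rows end up together.

```python
from itertools import combinations

# Sample rows from the example above: (id, name, phone, email).
rows = [
    (1, "John", "555-00-00", "john@example.com"),
    (2, "John", "555-00-01", "john@example.com"),
    (3, "John", "555-00-01", "john-other@example.com"),
    (4, "James", "555-00-00", "james@example.com"),
    (5, "James", "555-00-01", "james-other@example.com"),
]


def related(a, b):
    """Same name and (same phone OR same email): the disjunctive rule."""
    return a[1] == b[1] and (a[2] == b[2] or a[3] == b[3])


def group_duplicates(rows):
    # Start with every row in its own group, labelled by its own id.
    group = {r[0]: r[0] for r in rows}
    changed = True
    while changed:  # repeat passes until a full pass makes no update
        changed = False
        for a, b in combinations(rows, 2):
            ga, gb = group[a[0]], group[b[0]]
            if ga != gb and related(a, b):
                # Merge: move every member of b's group into a's group.
                # Which label survives is arbitrary; only equality matters.
                for rid, g in group.items():
                    if g == gb:
                        group[rid] = ga
                changed = True
    return group


groups = group_duplicates(rows)
print(groups)
```

On the sample rows, records 1, 2, and 3 end up in one group even though 1 and 3 are not directly related, and the two James rows stay separate. Comparing every pair makes each pass quadratic in the number of rows.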

GROUP BY doesn't support OR: it's implicitly AND, and it must include every non-aggregated column in the select list.

I assume you also have a unique ID integer as the primary key on this table. If you don't, it's a good idea to have one, for this purpose and many others.
Find those duplicates by a self-join:
select
    c1.ID
  , c1.Can_FName
  , c1.Can_HPhone
  , c1.Can_Email
  , c2.ID
  , c2.Can_FName
  , c2.Can_HPhone
  , c2.Can_Email
from
(
    select
        min(ID) as ID,
        Can_FName,
        Can_HPhone,
        Can_Email
    from Can
    group by
        Can_FName,
        Can_HPhone,
        Can_Email
) c1
inner join Can c2 on c1.ID < c2.ID
where
    c1.Can_FName = c2.Can_FName
    and (c1.Can_HPhone = c2.Can_HPhone OR c1.Can_Email = c2.Can_Email)
order by
    c1.ID
The query gives you N-1 rows for each group of N duplicates. If you want just a count along with each unique combination, count the rows grouped by the "left" side:
select
    count(1) + 1
  , c1.Can_FName
  , c1.Can_HPhone
  , c1.Can_Email
from
(
    select
        min(ID) as ID,
        Can_FName,
        Can_HPhone,
        Can_Email
    from Can
    group by
        Can_FName,
        Can_HPhone,
        Can_Email
) c1
inner join Can c2 on c1.ID < c2.ID
where
    c1.Can_FName = c2.Can_FName
    and (c1.Can_HPhone = c2.Can_HPhone OR c1.Can_Email = c2.Can_Email)
group by
    c1.Can_FName
  , c1.Can_HPhone
  , c1.Can_Email
Granted, this is more involved than a union - but I think it illustrates a good way of thinking about duplicates.

Project the desired transformation first from a derived table, then do the aggregation:
SELECT COUNT(*)
, CAN_FName
, Can_HPhoneOrEMail
FROM (
SELECT Can_FName
, ISNULL(Can_HPhone,'') + ISNULL(Can_EMail,'') AS Can_HPhoneOrEMail
FROM Can) AS Can_Transformed
GROUP BY Can_FName, Can_HPhoneOrEMail
HAVING Count(*) > 1
Adjust the 'OR' operation as needed in the derived table's projection list.

I know this answer will be criticised for its use of a temp table, but it will work anyway:
-- create a temp table to give the rows a unique key
create table #tmp (
    ID int identity,
    can_Fname varchar(200) null, -- real type and length here
    can_HPhone varchar(200) null, -- real type and length here
    can_Email varchar(200) null -- real type and length here
)
-- copy only the rows whose fname has a duplicate
-- (better performance, especially for a big table)
insert into #tmp (can_Fname, can_HPhone, can_Email)
select can_fname, can_hphone, can_email
from Can
where can_fname in (select can_fname from Can
                    group by can_fname having count(*) > 1)
-- select the rows that have the same fname and
-- at least the same phone or email
select can_Fname, can_HPhone, can_Email
from #tmp a
where exists
    (select * from #tmp b
     where a.ID <> b.ID
       and a.can_Fname = b.can_Fname
       and (isnull(a.can_HPhone, '') = isnull(b.can_HPhone, '')
            or isnull(a.can_Email, '') = isnull(b.can_Email, '')))

Try this:
SELECT Can_FName, COUNT(*)
FROM (
SELECT
rank() over(partition by Can_FName order by Can_FName,Can_HPhone) rnk_p,
rank() over(partition by Can_FName order by Can_FName,Can_EMail) rnk_m,
Can_FName
FROM Can
) X
WHERE rnk_p=1 or rnk_m =1
GROUP BY Can_FName
HAVING COUNT(*)>1

