Select Single Record if ID/Name is Duplicated - tsql

This seems like a really simple problem but I can't seem to figure it out right now...
Here is a simplified view of the data that I am fetching from my current stored proc:
ID   Name        Class  Desc
---  ----------  -----  -----------------
84   Calvin J.   2B
53   Fred D.     3B
53   Fred D.     ADJ    Change/Correction
47   Mary F.     3A
47   Mary F.     ADJ    New Product
09   Donald M.   ADJ    Cancelled
21   Richard G.  ADJ    Bad Debt
21   Richard G.  ADJ    Cancelled
I need to modify my procedure to select only one record per individual. If a person has an adjustment, I only want to select the record with the adjustment and disregard the other record. Based on the above, this is the result set that I am trying to return:
ID   Name        Class  Desc
---  ----------  -----  -----------------
84   Calvin J.   2B
53   Fred D.     ADJ    Change/Correction
47   Mary F.     ADJ    New Product
09   Donald M.   ADJ    Cancelled
21   Richard G.  ADJ    Cancelled
Help please!
UPDATE
I just realized that there is an additional requirement for this query; if there are two adjustments where one has a description of "Bad Debt" and the other "Cancelled", the record with the "Cancelled" description needs to be selected (see updated data above).

This should do the trick:
SELECT ID, Name, Class, [Desc]
FROM (
    SELECT ID, Name, Class, [Desc],
           ROW_NUMBER() OVER (PARTITION BY ID
                              ORDER BY CASE WHEN Class = 'ADJ'
                                            THEN 0 ELSE 1 END) rn
    FROM Table1
) A
WHERE rn = 1
It looks scarier than it really is. The inner query adds an extra column computed with ROW_NUMBER(), which numbers your rows, starting over at 1 for each distinct ID (specified in the PARTITION BY). The ORDER BY, which tells ROW_NUMBER() how to order the rows within each partition, is a CASE expression saying that rows with Class = 'ADJ' should come before all other rows. Then at the end we keep only the rows numbered 1. The result is the ADJ row if there is one for that ID, or the regular row otherwise.
Edit in response to updated requirements
If you have additional prioritization criteria then you can add those into the ORDER BY, just like you would ORDER BY in a regular query. Often it's helpful to execute just the inner query without filtering down to rn = 1 so you can see exactly how row numbers are being assigned.
Here's the updated query that should satisfy your new requirement:
SELECT ID, Name, Class, [Desc]
FROM (
    SELECT ID, Name, Class, [Desc],
           ROW_NUMBER() OVER (PARTITION BY ID
                              ORDER BY
                                  CASE WHEN Class = 'ADJ'
                                       THEN 0 ELSE 1 END,
                                  CASE WHEN [Desc] = 'Cancelled'
                                       THEN 0 ELSE 1 END) rn
    FROM Table1
) A
WHERE rn = 1
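To see how the row numbers are assigned, the updated query can be tried against the sample data from the question. The following is only a minimal sketch, not part of the original answer. It assumes Python's built-in sqlite3 module instead of SQL Server (purely so the example is self-contained), so [Desc] is written as "Desc", the empty descriptions are stored as NULL, and the SQLite build must support window functions (3.25 or later). The helper names are hypothetical.

```python
import sqlite3

# Sample rows from the question; empty descriptions are stored as NULL (an assumption).
ROWS = [
    ("84", "Calvin J.", "2B", None),
    ("53", "Fred D.", "3B", None),
    ("53", "Fred D.", "ADJ", "Change/Correction"),
    ("47", "Mary F.", "3A", None),
    ("47", "Mary F.", "ADJ", "New Product"),
    ("09", "Donald M.", "ADJ", "Cancelled"),
    ("21", "Richard G.", "ADJ", "Bad Debt"),
    ("21", "Richard G.", "ADJ", "Cancelled"),
]

# The inner query on its own: every row plus its row number within its ID.
INNER_QUERY = """
SELECT ID, Name, Class, "Desc",
       ROW_NUMBER() OVER (PARTITION BY ID
                          ORDER BY CASE WHEN Class = 'ADJ' THEN 0 ELSE 1 END,
                                   CASE WHEN "Desc" = 'Cancelled' THEN 0 ELSE 1 END) AS rn
FROM Table1
"""


def load_sample(conn):
    """Create Table1 and fill it with the sample rows."""
    conn.execute('CREATE TABLE Table1 (ID TEXT, Name TEXT, Class TEXT, "Desc" TEXT)')
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", ROWS)


def numbered_rows(conn):
    """Run only the inner query to inspect how rn is assigned."""
    return conn.execute(INNER_QUERY + " ORDER BY ID, rn").fetchall()


def one_row_per_id(conn):
    """Run the full query: keep the row numbered 1 for each ID."""
    sql = f'SELECT ID, Name, Class, "Desc" FROM ({INNER_QUERY}) A WHERE rn = 1 ORDER BY ID'
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
load_sample(conn)
for row in numbered_rows(conn):
    print(row)
picked = one_row_per_id(conn)
print(picked)
```

Printing the numbered rows first makes it easy to check that each ADJ row sorts ahead of the regular row, and that "Cancelled" sorts ahead of "Bad Debt" for ID 21.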

OrientDB Traverse Sum and Group By Top-Most Record

We have Orders that include "caused_order" edges from Order to Order because friends can refer other friends to make purchases. We know from the links we generate for the friends that Order ID 42 caused Order ID 47, so we create a "caused_order" edge between the two Order vertices.
We're looking to identify the people that are generating the most referral business. Right now we just loop through in C# and figure it out because our datasets are relatively small. But I'd like to figure out if there's a way to use the Traverse SQL to accomplish this instead.
The problem I'm running into is getting an accurate count/sum for each Original Order ID.
Consider the following scenario:
Order 42 caused four other Orders, including Order 47. Order 47 caused 2 additional Orders. And Order 51, unrelated to 42 or 47, caused 3 Orders.
I can run the following SQL to get the best referrers for this specific {ProductId}:
select in_caused_order[0].id as OrderID, count(*) as ReferCount, sum(amount) as ReferSum
from ( traverse out('caused_order') from Order )
where out_includes.id = '{ProductId}' and $depth >= 1
group by in_caused_order[0].id
EDIT: the schema is a bit more complex than this, I was just including the out_includes WHERE clause to show that there's a bit of filtering of the Orders. But it's a bit like:
Product(V) <-- includes(E) <-- Order(V) --> caused_order(E) --> Order(V)
(the Order vertex has "amount" as a property, which stores the money spent and is being SUM'd in the SELECT, along with a few fields like date which aren't important)
But that will result in something like:
OrderID | ReferCount | ReferSum
42 | 4 | 525
47 | 2 | 130
51 | 3 | 250
Except that's not quite right, is it? Because Order 42 also technically caused 47's two orders. So we'd want to see something like:
OrderID | ReferCount | ReferSum | ExtendedCount | ExtendedSum
42 | 4 | 525 | 2 | 130
47 | 2 | 130 | 0 | 0
51 | 3 | 250 | 0 | 0
I recognize that the two "Extended" count/sum columns might be tricky. We might have to run the query twice, once with $depth = 1, and again with $depth > 1, and then assemble the results of those two queries in C#, which is fine.
But I can't even figure out how to get the overall total calculated correctly. The first step would even be to see something like:
OrderID | ReferCount | ReferSum
42 | 6 | 635 <-- includes its 4 orders + 47's 2 orders
47 | 2 | 130
51 | 3 | 250
And since this can be n levels deep, I can't just chain in_caused_order.in_caused_order.in_caused_order in the SQL, because I don't know how many levels deep it will go. Order 83 could be caused by Order 47, and Order 105 could be caused by Order 83, and so on.
Any help would be much appreciated. Or maybe the answer is, Traverse can't handle this, and we'll have to figure something else out entirely.
I tried your use case; here is my test data:
create class caused_order extends e
create class Order extends v
create property Order.id integer
create property Order.amount integer
begin
create vertex Order set id=1 ,amount=1
create vertex Order set id=2 ,amount=5
create vertex Order set id=3 ,amount=11
create vertex Order set id=4 ,amount=23
create vertex Order set id=5 ,amount=31
create vertex Order set id=6 ,amount=49
create vertex Order set id=7 ,amount=4
create vertex Order set id=8 ,amount=74
create vertex Order set id=9 ,amount=87
create edge caused_order from (select from Order where id=1) to (select from Order where id=2)
create edge caused_order from (select from Order where id=1) to (select from Order where id=3)
create edge caused_order from (select from Order where id=2) to (select from Order where id=4)
create edge caused_order from (select from Order where id=2) to (select from Order where id=5)
create edge caused_order from (select from Order where id=6) to (select from Order where id=7)
create edge caused_order from (select from Order where id=6) to (select from Order where id=8)
commit retry 20
Then I wrote these two queries to show each order with its ReferSum and ReferCount.
The first one includes the head order in the count:
select id as OrderID, $a[0].Amount as ReferSum, $a[0].Count as ReferCount from Order
let $a=(select sum(amount) as Amount, count(*) as Count from (traverse out('caused_order') from $parent.$current) group by Amount)
The second one excludes the head:
select id as OrderID, $a[0].Amount as ReferSum, $a[0].Count as ReferCount from Order
let $a=(select sum(amount) as Amount, count(*) as Count from (select from (traverse out('caused_order') from $parent.$current) where $depth>=1) group by Amount)
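As a cross-check outside the database, the expected numbers for this test data can also be computed in plain Python. This is only a sketch of the kind of in-application loop the question mentions, not OrientDB code. The function name referral_totals is hypothetical, and the split into direct and extended columns follows the question's desired output.

```python
from collections import defaultdict

# Test data copied from the answer above: order id -> amount, and caused_order edges.
AMOUNT = {1: 1, 2: 5, 3: 11, 4: 23, 5: 31, 6: 49, 7: 4, 8: 74, 9: 87}
CAUSED = [(1, 2), (1, 3), (2, 4), (2, 5), (6, 7), (6, 8)]


def referral_totals(amount, caused):
    """Return direct (depth 1) and extended (depth > 1) counts and sums per order."""
    children = defaultdict(set)
    for parent, child in caused:
        children[parent].add(child)

    totals = {}
    for head in amount:
        direct = set(children.get(head, ()))
        # Depth-first walk over everything reachable from the head (depth >= 1).
        reached, stack = set(), list(direct)
        while stack:
            order = stack.pop()
            if order in reached or order == head:
                continue  # guards against cycles and repeated visits
            reached.add(order)
            stack.extend(children.get(order, ()))
        extended = reached - direct  # orders caused only indirectly
        totals[head] = {
            "ReferCount": len(direct),
            "ReferSum": sum(amount[o] for o in direct),
            "ExtendedCount": len(extended),
            "ExtendedSum": sum(amount[o] for o in extended),
        }
    return totals


totals = referral_totals(AMOUNT, CAUSED)
for order_id in sorted(totals):
    print(order_id, totals[order_id])
```

For order 1 the direct and extended parts add up to 4 orders and a sum of 70, which is what the query that excludes the head is expected to report for the same data.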
EDIT
I've added this to my data:
create class includes extends E
create class Product extends V
create property Product.id Integer
create vertex Product set id = 101
create vertex Product set id = 102
create vertex Product set id = 103
create vertex Product set id = 104
create edge includes from (select from Order where id=1) to (select from Product where id=101)
create edge includes from (select from Order where id=2) to (select from Product where id=102)
create edge includes from (select from Order where id=3) to (select from Product where id=103)
create edge includes from (select from Order where id=4) to (select from Product where id=104)
create edge includes from (select from Order where id=5) to (select from Product where id=101)
create edge includes from (select from Order where id=6) to (select from Product where id=102)
create edge includes from (select from Order where id=7) to (select from Product where id=103)
create edge includes from (select from Order where id=8) to (select from Product where id=104)
create edge includes from (select from Order where id=9) to (select from Product where id=101)
create edge includes from (select from Order where id=1) to (select from Product where id=102)
create edge includes from (select from Order where id=1) to (select from Product where id=103)
create edge includes from (select from Order where id=2) to (select from Product where id=104)
And these are the modified queries (I added while out('includes').id contains {prodID_number} to the traverse, and where out('includes').id contains {prodID_number} to the outer query):
select id as OrderID, $a[0].Amount as ReferSum, $a[0].Count as ReferCount from Order
let $a=(select sum(amount) as Amount, count(*) as Count from (traverse out('caused_order') from $parent.$current while out('includes').id contains 102) group by Amount)
where out('includes').id contains 102
select id as OrderID, $a[0].Amount as ReferSum, $a[0].Count as ReferCount from Order
let $a=(select sum(amount) as Amount, count(*) as Count from (traverse out('caused_order') from $parent.$current while out('includes').id contains 102) where $depth >= 1 group by Amount)
where out('includes').id contains 102

Combine similar rows using case statement

I have a query currently populating a report which has a few rows of "duplicate" information. Similar IDs are being passed through which should be combined, but they are unique enough that we do not want to Concat/Insert them within our model. In order for the report to be processed correctly, I need to sum their $ values. (The only information I actually need to preserve is the name, the final summed amount, and the ID.)
Is there a simple way to achieve this with a CASE expression that solely sums the Amount field? I tried using a SUM(CASE WHEN ...) expression, but I do not want a new column since my report only uses that field to populate $$ information. Here is a sample of my issue:
ID       Name       Amount     Person
-------  ---------  ---------  --------
21011    Place A      -210.30  John Doe
210115   Place A-a    6500.70  John Doe
21060    Place B       255.00  Wayne C
2106015  Place Bb      212.30  Wayne C
2106015  Place Bb     1212.30  Wayne C
2106015  Place Bb      212.30  Wayne C
21080    Place J     57212.30  Billy J
My desired result for this would be:
ID       Name       Amount     Person
-------  ---------  ---------  --------
21011    Place A      6290.40  John Doe
21060    Place B      1891.90  Wayne C
21080    Place J     57212.30  Billy J
Is there a simplified way to combine these rows in TSQL without modifying the db?
You can try this (provided your ID column is a number and not a character field):
;WITH cte_getsum AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY Person ORDER BY ID) AS RowNum,
           ID,
           Name,
           (SELECT SUM(Amount) FROM TableName WHERE TableName.Person = t1.Person) AS SumAmount,
           Person
    FROM TableName t1
)
SELECT * FROM cte_getsum
WHERE RowNum = 1
You can try the script below. I created a temp table just for the sample data, but in your case you can refer directly to the table you have.
SELECT * INTO #tmpInput
FROM (VALUES ('21011',   'Place A',    -210.30, 'John Doe'),
             ('210115',  'Place A-a',  6500.70, 'John Doe'),
             ('21060',   'Place B',     255.00, 'Wayne C'),
             ('2106015', 'Place Bb',    212.30, 'Wayne C'),
             ('2106015', 'Place Bb',   1212.30, 'Wayne C'),
             ('2106015', 'Place Bb',    212.30, 'Wayne C'),
             ('21080',   'Place J',   57212.30, 'Billy J')
     ) Input (ID, Name, Amount, Person)
-- In T-SQL, SUBSTRING(ID, 0, 6) returns the first 5 characters,
-- because position 0 lies just before the first character.
SELECT SUBSTRING(t1.ID, 0, 6) AS ID,
       t2.Name,
       SUM(t1.Amount) AS Amount,
       t2.Person
FROM #tmpInput t1
INNER JOIN #tmpInput t2 ON t2.ID = SUBSTRING(t1.ID, 0, 6)
GROUP BY SUBSTRING(t1.ID, 0, 6), t2.Name, t2.Person
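The prefix-grouping idea can be tried without a SQL Server instance. The sketch below is an illustration only: it assumes Python's built-in sqlite3 module, where substr(ID, 1, 5) plays the role of the T-SQL SUBSTRING call, and it assumes (as the script above does) that the first 5 characters of the ID identify the parent row. The table name tmpInput and the function name are hypothetical.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    ("21011", "Place A", -210.30, "John Doe"),
    ("210115", "Place A-a", 6500.70, "John Doe"),
    ("21060", "Place B", 255.00, "Wayne C"),
    ("2106015", "Place Bb", 212.30, "Wayne C"),
    ("2106015", "Place Bb", 1212.30, "Wayne C"),
    ("2106015", "Place Bb", 212.30, "Wayne C"),
    ("21080", "Place J", 57212.30, "Billy J"),
]

# Group every row under the 5-character prefix of its ID and take the
# Name and Person from the row whose ID is exactly that prefix.
QUERY = """
SELECT substr(t1.ID, 1, 5) AS ID,
       t2.Name,
       SUM(t1.Amount) AS Amount,
       t2.Person
FROM tmpInput t1
JOIN tmpInput t2 ON t2.ID = substr(t1.ID, 1, 5)
GROUP BY substr(t1.ID, 1, 5), t2.Name, t2.Person
ORDER BY 1
"""


def combined_rows():
    """Load the sample data into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tmpInput (ID TEXT, Name TEXT, Amount REAL, Person TEXT)")
    conn.executemany("INSERT INTO tmpInput VALUES (?, ?, ?, ?)", ROWS)
    return conn.execute(QUERY).fetchall()


result = combined_rows()
for row in result:
    print(row[0], row[1], f"{row[2]:.2f}", row[3])
```

Note that the join only finds a Name and Person when a row with the bare 5-character ID exists, so child rows without such a parent would drop out of the result.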

Find equal twin record postgresql

I have a table company with 60 columns. The goal is to create a tool to find, compare and eliminate duplicates in this table.
Example: I have a record with id 22 and I know it has a twin because I run this (simplified code):
SELECT min(co_id),co_name,count(*) FROM co
GROUP BY co_name
HAVING count(*) > 1
The result shows there is one twin (count 2), and I get the oldest id with min(co_id).
My question is: how do I search for the twin co_id? Just by passing the oldest id?
Something like:
SELECT co_id FROM co
WHERE co_name EQUAL TO co_id='22'
LIMIT 2
Sample data:
id co_name
22 Volvo
23 Volvo
24 Ford
25 Ford
I know id 22 and I want to search for the twin 23 based on the content of 22.
The closest I found is this, which is far from generic and a nightmare for comparing 60 fields:
SELECT id,
(SELECT max(b.id) from co b
WHERE a.co_name = b.co_name
LIMIT 1) as twin
FROM co a
WHERE id='22'
How do I do this in a more simple and generic way? I just want the twin record co_id.
Thank you in advance!
select max_co, co_name
from (
    select max(co_id) as max_co, min(co_id) as min_co, co_name
    from co
    group by co_name
    having count(*) > 1
) t  -- older PostgreSQL versions require an alias on a derived table
where min_co = (your old co id as input);
You can join your table with itself:
SELECT c1.*
FROM co_name c1
INNER JOIN co_name c2
    ON c1.co_name = c2.co_name
   AND c1.id > c2.id
This returns all duplicated records (but not the original record with the lowest id). Or, since you're using PostgreSQL, you can use a window function:
SELECT *
FROM (
    SELECT id,
           co_name,
           row_number() OVER (PARTITION BY co_name ORDER BY id) AS rn
    FROM co_name
) s
WHERE rn > 1;
If you want to compare multiple columns, the JOIN solution would be more flexible. I don't know exactly how you want to compare your columns or how exactly you define "twin" rows, but a query like this should help:
SELECT c1.*
FROM co_name c1
INNER JOIN co_name c2
    ON (
        c1.co_name = c2.co_name
        OR c1.co_city = c2.co_city
        OR c1.co_owner = c2.co_owner
        OR ...
    ) AND c1.id > c2.id
If you just want the duplicated records of id=22, then you can try this:
SELECT c1.*
FROM co_name c1
INNER JOIN co_name c2
    ON c1.co_name = c2.co_name
   AND c1.id > c2.id
WHERE c2.id = 22
Or, if you just want a single twin, comparing 60 columns, you can try this query:
SELECT MIN(c1.id) AS Twin /* or MAX(c1.id), depending on what you're after */
FROM co_name c1
INNER JOIN co_name c2
    ON (
        c1.co_name = c2.co_name
        OR c1.co_city = c2.co_city
        OR c1.co_owner = c2.co_owner
        OR ...
    ) AND c1.id > c2.id
WHERE c2.id = 22
I found one solution that works on 60 columns if I use variables instead of hard-coding the column list in the query. Thanks, everybody, for all the input; some of the answers were on the same track.
SELECT id,
(SELECT max(b.id) from co b
WHERE concat(a.co_name,etc) = concat(b.co_name,etc)
LIMIT 1) as twin
FROM co a
WHERE id='22'
It is not the best solution: it fetches only one twin at a time, and it is far from generic. Thanks for pointing me in the right direction. A generic solution would be nicer.
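One way to get closer to a generic solution is to build the column comparison from the table's own column list instead of typing 60 conditions by hand. The sketch below is an assumption-laden illustration, not a PostgreSQL recipe: it uses Python's built-in sqlite3 module, where PRAGMA table_info lists the columns and IS compares values so that two NULLs count as equal. In PostgreSQL the column list would come from information_schema.columns and the comparison would be IS NOT DISTINCT FROM. The function name find_twins and the city values are hypothetical, and here a twin means a row whose non-id columns all match.

```python
import sqlite3


def find_twins(conn, table, id_col, known_id):
    """Return ids of rows whose non-id columns all equal those of known_id.

    table and id_col are interpolated into the SQL text, so they must be
    trusted identifiers, never user input.
    """
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    compared = [c for c in columns if c != id_col]
    # "IS" treats two NULLs as equal, unlike "=".
    same_values = " AND ".join(f"a.{c} IS b.{c}" for c in compared)
    sql = (
        f"SELECT b.{id_col} FROM {table} a "
        f"JOIN {table} b ON {same_values} AND b.{id_col} <> a.{id_col} "
        f"WHERE a.{id_col} = ? ORDER BY b.{id_col}"
    )
    return [row[0] for row in conn.execute(sql, (known_id,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE co (co_id INTEGER PRIMARY KEY, co_name TEXT, co_city TEXT)")
conn.executemany(
    "INSERT INTO co VALUES (?, ?, ?)",
    [
        (22, "Volvo", "Gothenburg"),
        (23, "Volvo", "Gothenburg"),
        (24, "Ford", "Detroit"),
        (25, "Ford", "Detroit"),
        (26, "Volvo", "Ghent"),  # same name, different city: not a twin of 22
    ],
)

print(find_twins(conn, "co", "co_id", 22))
print(find_twins(conn, "co", "co_id", 26))
```

Because the condition list is generated, adding or removing a column in the table does not require touching the query.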

counting in sql in subquery in the table

Table deptt:
DNO  DNAME
---  --------
1    Research
2    Finance
Table empl:
EN  ENAME  CITY     SALARY  DNO  JOIN_DATE
--  -----  -------  ------  ---  ---------
E1  Ashim  Kolkata   10000    1  01-JUN-02
E2  Kamal  Mumbai    18000    2  02-JAN-02
E3  Tamal  Chennai    7000    1  07-FEB-04
E4  Asha   Kolkata    8000    2  01-MAR-07
E5  Timir  Delhi      7000    1  11-JUN-05
Task: find all departments that have more than 3 employees.
My attempt:
select deptt.dname
from deptt,empl
where deptt.dno=empl.dno and (select count(empl.dno) from empl group by empl.dno)>3;
Here is the solution:
select deptt.dname
from deptt,empl
where deptt.dno=empl.dno
group by deptt.dname having count(1)>3;
select *
from departments d
inner join (
    select dno from employees group by dno having count(*) > 3
) e on d.dno = e.dno
There are many approaches to this problem but almost all will use GROUP BY and the HAVING clause. That clause allows you to filter results of aggregate functions. Here it is used to choose only those records where the count is greater than 3.
In the query structure used above the group by is handled on the employee table only, then the result (which is known as a derived table) is joined by an INNER JOIN to the departments table. This inner join only allows matching records so this has the effect of filtering the departments table to only those which have a count() of greater than 3.
An advantage of this query structure is that fewer records are joined, and that all columns of the departments table are available for reporting. A disadvantage is that the count() of employees per department isn't visible.
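If the count should be visible as well, the derived table can return it alongside dno. The following is a hedged sketch rather than the answer's own code: it assumes Python's built-in sqlite3 module and the question's table names deptt and empl. The column alias emp_count and the function name are hypothetical.

```python
import sqlite3


def departments_with_more_than(conn, min_count):
    """Return (dname, employee count) for departments above the threshold."""
    sql = """
    SELECT d.dname, e.emp_count
    FROM deptt d
    INNER JOIN (
        SELECT dno, COUNT(*) AS emp_count   -- expose the count to the outer query
        FROM empl
        GROUP BY dno
        HAVING COUNT(*) > ?
    ) e ON d.dno = e.dno
    ORDER BY d.dname
    """
    return conn.execute(sql, (min_count,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deptt (dno INTEGER, dname TEXT)")
conn.execute(
    "CREATE TABLE empl (en TEXT, ename TEXT, city TEXT, salary INTEGER, dno INTEGER, join_date TEXT)"
)
conn.executemany("INSERT INTO deptt VALUES (?, ?)", [(1, "Research"), (2, "Finance")])
conn.executemany(
    "INSERT INTO empl VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("E1", "Ashim", "Kolkata", 10000, 1, "01-JUN-02"),
        ("E2", "Kamal", "Mumbai", 18000, 2, "02-JAN-02"),
        ("E3", "Tamal", "Chennai", 7000, 1, "07-FEB-04"),
        ("E4", "Asha", "Kolkata", 8000, 2, "01-MAR-07"),
        ("E5", "Timir", "Delhi", 7000, 1, "11-JUN-05"),
    ],
)

print(departments_with_more_than(conn, 3))
print(departments_with_more_than(conn, 2))
```

With the five sample employees no department exceeds 3, so the first call returns nothing; lowering the threshold to 2 shows the filter picking up Research with its 3 employees.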

Using fields from select query in where clause in subqueries

I have a list of people and there are 4 types that can occur as well as 5 resolutions for each type. I'm trying to write a single query so that I can pull each type/resolution combination for each person but am running into problems. This is what I have so far:
SELECT person,
TypeRes1 = (SELECT COUNT(*) FROM table1 where table1.status = 45)
JOIN personTbl ON personTbl.personid = table1.personid
WHERE person LIKE 'A0%'
GROUP BY person
I have adjusted column names to make it more...generic, but basically the person table has several hundred people in it and I just want A01 through A09, so the like statement is the easiest way to do this. The problem is that my results end up being something like this:
Person TypeRes1
A06 48
A04 48
A07 48
A08 48
A05 48
Which is incorrect. I can't figure out how to get the column count correct for each person. I tried doing something like:
SELECT person as p,
TypeRes1= (SELECT COUNT(*) FROM table1
JOIN personTbl ON personTbl.personid = table1.personid
WHERE table1.status = 45 AND personTbl.person = p)
FROM table1
JOIN personTbl ON personTbl.personid = table1.personid
WHERE personTbl.person LIKE 'A0%'
GROUP BY personTbl.person
But that gives me the error: Invalid Column name 'p'. Is it possible to pass p into the subquery or is there another way to do it?
EDIT: There are 19 different statuses as well, so there will be 19 different TypeRes, for brevity I just put the one as if I can find the one, I think I can do the rest on my own.
Maybe something like this:
SELECT
    person,
    (
        SELECT COUNT(*)
        FROM table1
        WHERE table1.status = 45
          AND personTbl.personid = table1.personid
    ) AS TypeRes1
FROM personTbl
WHERE person LIKE 'A0%'
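With 19 statuses, repeating the correlated subquery 19 times gets long. A different technique, conditional aggregation, counts several statuses in one pass. The sketch below compares the two on made-up data. It assumes Python's built-in sqlite3 module, and the rows, the second status value 12, the column names Status45 and Status12, and the function name counts_by_status are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personTbl (personid INTEGER PRIMARY KEY, person TEXT);
CREATE TABLE table1 (personid INTEGER, status INTEGER);
INSERT INTO personTbl VALUES (1, 'A01'), (2, 'A02'), (3, 'A03'), (4, 'B01');
INSERT INTO table1 VALUES (1, 45), (1, 45), (1, 12), (2, 45), (4, 45), (4, 45);
""")

# The correlated subquery from the answer: one count per outer person row.
CORRELATED = """
SELECT person,
       (SELECT COUNT(*)
        FROM table1
        WHERE table1.status = 45
          AND personTbl.personid = table1.personid) AS TypeRes1
FROM personTbl
WHERE person LIKE 'A0%'
ORDER BY person
"""


def counts_by_status(conn, statuses):
    """Conditional aggregation: one SUM(CASE ...) column per requested status."""
    columns = ", ".join(
        f"SUM(CASE WHEN t.status = {int(s)} THEN 1 ELSE 0 END) AS Status{int(s)}"
        for s in statuses
    )
    sql = f"""
    SELECT p.person, {columns}
    FROM personTbl p
    LEFT JOIN table1 t ON t.personid = p.personid   -- keep people with no rows
    WHERE p.person LIKE 'A0%'
    GROUP BY p.person
    ORDER BY p.person
    """
    return conn.execute(sql).fetchall()


correlated = conn.execute(CORRELATED).fetchall()
aggregated = counts_by_status(conn, [45, 12])
print(correlated)
print(aggregated)
```

Both approaches give the same per-person count for status 45; the aggregated form simply adds one column per extra status instead of one subquery.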