Query to find duplicates in Access table - ms-access-2003

I have exhausted all the examples I can find and I am still not getting the results I need.
I have an invoice table in which I need to find every instance where the same invoice number is used for more than one customer. An invoice number can occur multiple times in the table, and each record is distinguished by invoice number and invoice sequence number. There should be only one customer per invoice, but some errors have crept in.
What I need is a report of only those invoices that have two or more customers. In the data below, Cust 45001301 should not be the customer number for Invoice 708, seq 3.
Cust No   Invoice No  Seq No  Input Date
700180    708         1       9/30/2007
700180    708         2       9/30/2007
45001301  708         3       9/30/2007
700180    708         4       9/30/2007
700190    709         1       9/30/2007
700190    709         2       9/30/2007
What I have tried to do is just get a simple group by query to show me only those invoices with more than one customer much like this-
[Invoice No] [Cust no]
708 700180
708 45001301
But I ONLY want to see those with two or more customers, so in the above example I would not see the entry for invoice 709 because it only has the one customer.
[Invoice No] [Cust no]
708 700180
708 45001301
709 700190

First create a query which returns all distinct combinations of [Invoice No] and [Cust no]. Then use that as a subquery where you count the number of customers per [Invoice No] and add a HAVING clause to limit the output to only those where the count is greater than one.
SELECT sub.[Invoice No], Count(*) AS customers
FROM
    (
        SELECT DISTINCT [Invoice No], [Cust no]
        FROM Invoices
    ) AS sub
GROUP BY sub.[Invoice No]
HAVING Count(*) > 1;
If you then need to see which customers were duplicated for those invoices, INNER JOIN that query back to the Invoices table.
SELECT DISTINCT i.[Invoice No], sub2.customers, i.[Cust No]
FROM
    Invoices AS i
    INNER JOIN
    (
        SELECT sub.[Invoice No], Count(*) AS customers
        FROM
            (
                SELECT DISTINCT [Invoice No], [Cust no]
                FROM Invoices
            ) AS sub
        GROUP BY sub.[Invoice No]
        HAVING Count(*) > 1
    ) AS sub2
    ON i.[Invoice No] = sub2.[Invoice No];
With Access 2007 and your sample data in a table named Invoices, that query gives me this result set:
Invoice No customers Cust No
708 2 700180
708 2 45001301
If you actually wanted to see all the data for those duplicate invoice numbers, change the first line of the second query to this:
SELECT i.[Invoice No], sub2.customers, i.[Cust No], i.[Seq No], i.[Input Date]
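For readers who want to try the two queries without Access, here is a minimal sketch that runs them against an in-memory SQLite database from Python. SQLite is only a stand-in (an assumption, not what the answer was tested on); it happens to accept the same square-bracket identifiers, and the sample rows are the ones from the question.

```python
import sqlite3

# In-memory SQLite table standing in for the Access table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoices ([Cust No] INTEGER, [Invoice No] INTEGER, "
    "[Seq No] INTEGER, [Input Date] TEXT)"
)
conn.executemany(
    "INSERT INTO Invoices VALUES (?, ?, ?, ?)",
    [
        (700180, 708, 1, "9/30/2007"),
        (700180, 708, 2, "9/30/2007"),
        (45001301, 708, 3, "9/30/2007"),
        (700180, 708, 4, "9/30/2007"),
        (700190, 709, 1, "9/30/2007"),
        (700190, 709, 2, "9/30/2007"),
    ],
)

# Count distinct (invoice, customer) pairs per invoice; keep invoices with > 1.
counts = conn.execute("""
    SELECT sub.[Invoice No], COUNT(*) AS customers
    FROM (SELECT DISTINCT [Invoice No], [Cust No] FROM Invoices) AS sub
    GROUP BY sub.[Invoice No]
    HAVING COUNT(*) > 1
""").fetchall()

# Join back to Invoices to list the customers attached to each flagged invoice.
details = sorted(conn.execute("""
    SELECT DISTINCT i.[Invoice No], sub2.customers, i.[Cust No]
    FROM Invoices AS i
    INNER JOIN (
        SELECT sub.[Invoice No], COUNT(*) AS customers
        FROM (SELECT DISTINCT [Invoice No], [Cust No] FROM Invoices) AS sub
        GROUP BY sub.[Invoice No]
        HAVING COUNT(*) > 1
    ) AS sub2 ON i.[Invoice No] = sub2.[Invoice No]
""").fetchall())

print(counts)   # invoices with more than one customer
print(details)  # one row per (invoice, customer) for those invoices
```

Invoice 709 does not appear in either result because it has only one distinct customer.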

How about:
SELECT inv.[cust no],
       inv.[invoice no],
       inv.[seq no],
       inv.[input date]
FROM inv
INNER JOIN (SELECT q.[invoice no],
                   Count(q.[invoice no]) AS [CountOfInvoice No]
            FROM (SELECT inv.[invoice no],
                         inv.[cust no]
                  FROM inv
                  GROUP BY inv.[invoice no],
                           inv.[cust no]) AS q
            GROUP BY q.[invoice no]
            HAVING (( ( Count(q.[invoice no]) ) > 1 ))) AS q2
ON inv.[invoice no] = q2.[invoice no]
Where inv is the name of your table.
The query returns all details and rows for invoice numbers that have more than one associated customer number.
cust no invoice no seq no input date
700180 708 4 30/09/2007
45001301 708 3 30/09/2007
700180 708 2 30/09/2007
700180 708 1 30/09/2007

Related

INNER JOIN change SUM result

CREATE TABLE beauty.customer_payments
(
    customer_id integer,
    date date,
    amount numeric(10,2),
    CONSTRAINT customer_payments_customer_id_fkey FOREIGN KEY (customer_id)
        REFERENCES beauty.customers (customer_id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);
CREATE TABLE beauty.sales
(
    product_id integer,
    customer_id integer,
    sell_date date NOT NULL,
    qty integer NOT NULL,
    sell_price numeric(10,2) NOT NULL,
    expiry_date date NOT NULL,
    CONSTRAINT sales_customer_id_fkey FOREIGN KEY (customer_id)
        REFERENCES beauty.customers (customer_id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT sales_product_id_fkey FOREIGN KEY (product_id)
        REFERENCES beauty.products (product_id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);
The payments in beauty.customer_payments for customer_id=6 sum to 0:
SELECT * FROM beauty.customer_payments
WHERE customer_id=6;
customer_id  date        amount
6            2020-11-14   75.00
6            2020-11-14  -75.00
SELECT * FROM beauty.sales
WHERE customer_id=6;
product_id  customer_id  sell_date   qty  sell_price  expiry_date
76          6            2020-11-14  1    75.00       2022-03-03
83          6            2020-11-14  1    10.00       2022-06-23
85          6            2020-11-14  1    10.00       2022-06-23
44          6            2020-11-14  1    12.00       2022-06-23
41          6            2020-11-14  1    15.00       2022-03-26
96          6            2020-11-14  1    75.00       2022-03-15
28          6            2020-11-14  1     4.00       2022-01-22
33          6            2020-11-14  1     4.00       2023-01-23
37          6            2020-11-14  1     4.00       2023-01-23
40          6            2020-11-14  1     4.00       2023-08-13
(10 rows)
SELECT customer_id, SUM(qty * sell_price) AS purchased
FROM beauty.sales
WHERE customer_id=6
GROUP BY customer_id;
customer_id  purchased
6            213.00
SELECT s.customer_id,
SUM(qty * sell_price) AS purchased,
SUM(cp.amount) AS paid,
SUM(qty * sell_price - cp.amount) AS balance
FROM beauty.sales s
INNER JOIN beauty.customer_payments cp
ON cp.customer_id = s.customer_id
WHERE s.customer_id=6
GROUP BY s.customer_id;
customer_id  purchased  paid  balance
6            1065.00    0.00  1065.00
Please advise WHY the calculation goes wrong after adding a JOIN (INNER, LEFT or RIGHT), and how to solve this multi-calculation issue.
The similar questions I have seen do not involve cross-table calculations like SUM(qty * sell_price - cp.amount).
Delete payments
DELETE FROM beauty.customer_payments
WHERE customer_id=6;
Add new ZERO payment
INSERT INTO beauty.customer_payments(
customer_id, date, amount)
VALUES (6, '2020-11-17', 0);
customer_id  purchased  paid  balance
6            213.00     0.00  213.00
Add payment 10
INSERT INTO beauty.customer_payments(
customer_id, date, amount)
VALUES (6, '2020-11-17', 10);
SELECT * FROM beauty.customer_payments WHERE customer_id=6;
customer_id  date        amount
6            2020-11-17   0.00
6            2020-11-17  10.00
SELECT s.customer_id,
.......
INNER JOIN beauty.customer_payments cp
.......
customer_id  purchased  paid    balance
6            426.00     100.00  326.00
Correct payment with negative amount
INSERT INTO beauty.customer_payments(
customer_id, date, amount)
VALUES (6, '2020-11-17', -10);
SELECT * FROM beauty.customer_payments WHERE customer_id=6;
customer_id  date        amount
6            2020-11-17    0.00
6            2020-11-17   10.00
6            2020-11-17  -10.00
SELECT s.customer_id,
.......
INNER JOIN beauty.customer_payments cp
.......
customer_id  purchased  paid  balance
6            639.00     0.00  639.00
What does this INNER JOIN actually calculate?

In general, you probably wouldn't want to match each payment to each sale (unless there's an additional identifier matching specific payments to specific sales, not just matching each to a customer).
If a customer has 2 sales for $10 and $15, and two payments for $9 and $14, your join is going to match each payment to each sale for that customer, creating something like
Sale  Payment
$10   $9
$10   $14
$15   $9
$15   $14
So the sum of the sales after the join will be $50, not $25 (as you might be expecting). I think the above answers your question about why the join doesn't do what you expect.
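The row multiplication described above can be reproduced with a small sketch. It uses Python's sqlite3 module and two simplified, hypothetical tables (sales and payments, each with just a customer id and an amount) rather than the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables: two sales and two payments for one customer.
conn.executescript("""
CREATE TABLE sales (customer_id INTEGER, amount INTEGER);
CREATE TABLE payments (customer_id INTEGER, amount INTEGER);
INSERT INTO sales VALUES (1, 10), (1, 15);
INSERT INTO payments VALUES (1, 9), (1, 14);
""")

# Joining only on customer_id pairs every sale with every payment.
pairs = conn.execute("""
    SELECT s.amount, p.amount
    FROM sales s
    INNER JOIN payments p ON p.customer_id = s.customer_id
    ORDER BY s.amount, p.amount
""").fetchall()

# Sum of sales on the table itself versus after the join.
sum_before = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
sum_after = conn.execute("""
    SELECT SUM(s.amount)
    FROM sales s
    INNER JOIN payments p ON p.customer_id = s.customer_id
""").fetchone()[0]

print(pairs)                  # four (sale, payment) combinations
print(sum_before, sum_after)  # each sale is counted once per payment row
```

Each sale appears once per payment row, so the summed sales are multiplied by the number of payments for that customer.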
The exact query you want might be a little different (do you want all customers even if they have no sales? is it possible for customers to have payments if they don't have a sale?), but in general I'd expect something like the following to work. There are multiple ways of doing this, but I think the following is easy to understand since it aggregates the data into one payment row per customer and one sales row per customer before joining them.
SELECT
    s.customer_id,
    s.purchased,
    cp.amount as paid,
    s.purchased - cp.amount as balance
FROM
    (SELECT s.customer_id,
            SUM(s.qty * s.sell_price) AS purchased
     FROM beauty.sales s
     GROUP BY s.customer_id) s
LEFT OUTER JOIN
    (SELECT cp.customer_id,
            SUM(cp.amount) AS amount
     FROM beauty.customer_payments cp
     GROUP BY cp.customer_id) cp
    ON s.customer_id = cp.customer_id
WHERE s.customer_id = 6
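As a rough check of the aggregate-before-join idea, the sketch below loads the ten sales rows from the question and the final three payments (0, 10 and -10) into SQLite via Python. The tables are simplified assumptions: no beauty schema, no date columns, and REAL instead of numeric(10,2). It compares the direct join with the pre-aggregated version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for beauty.sales and beauty.customer_payments.
conn.executescript("""
CREATE TABLE sales (product_id INTEGER, customer_id INTEGER,
                    qty INTEGER, sell_price REAL);
CREATE TABLE customer_payments (customer_id INTEGER, amount REAL);
INSERT INTO sales VALUES
    (76, 6, 1, 75.0), (83, 6, 1, 10.0), (85, 6, 1, 10.0), (44, 6, 1, 12.0),
    (41, 6, 1, 15.0), (96, 6, 1, 75.0), (28, 6, 1, 4.0), (33, 6, 1, 4.0),
    (37, 6, 1, 4.0), (40, 6, 1, 4.0);
INSERT INTO customer_payments VALUES (6, 0.0), (6, 10.0), (6, -10.0);
""")

# Direct join: every sale row is repeated once per payment row.
naive = conn.execute("""
    SELECT s.customer_id,
           SUM(s.qty * s.sell_price),
           SUM(cp.amount),
           SUM(s.qty * s.sell_price - cp.amount)
    FROM sales s
    INNER JOIN customer_payments cp ON cp.customer_id = s.customer_id
    WHERE s.customer_id = 6
    GROUP BY s.customer_id
""").fetchone()

# Aggregate each table to one row per customer first, then join.
fixed = conn.execute("""
    SELECT s.customer_id, s.purchased, cp.amount AS paid,
           s.purchased - cp.amount AS balance
    FROM (SELECT customer_id, SUM(qty * sell_price) AS purchased
          FROM sales GROUP BY customer_id) s
    LEFT OUTER JOIN (SELECT customer_id, SUM(amount) AS amount
                     FROM customer_payments GROUP BY customer_id) cp
        ON s.customer_id = cp.customer_id
    WHERE s.customer_id = 6
""").fetchone()

print(naive)  # purchased is tripled by the three payment rows
print(fixed)  # purchased matches the sum over sales alone
```

With three payment rows the direct join triples the purchased total, while the pre-aggregated join reports each total once.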

PostgreSQL SELECT COUNT returning a bunch of 1s

The following is my code that returns the correct number of rows of nameids that I am looking for (75). Then, when I do COUNT(DISTINCT nameid) at the top instead, it just returns 145 1s instead of the number of rows in my query (75). It just says
1
1
1
..
1
(145 rows)
What am I doing wrong?
SELECT
DISTINCT nameid
FROM
shop
WHERE
yearid >= 2000
GROUP BY
nameid,
yearid
HAVING
SUM(spend) > 98;

You should not use the same column both in GROUP BY and in an aggregate function. Doing so can only yield 1, because the distinct count of a value within a group defined by that same value is always 1.
If you want to count the DISTINCT nameid values for each year with SUM(spend) > 98, you should use:
SELECT yearid, COUNT(DISTINCT nameid)
FROM shop
WHERE
yearid >= 2000
GROUP BY yearid
HAVING SUM(spend) > 98;
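The effect is easy to reproduce on a toy table. The sketch below uses Python's sqlite3 module with made-up rows (the shop data from the question is not available, so every value here is an assumption). It also shows one possible way to get a single total, by counting the rows of the original query in a derived table; that variant is a suggestion of this sketch, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample rows: (nameid, yearid, spend).
conn.executescript("""
CREATE TABLE shop (nameid INTEGER, yearid INTEGER, spend INTEGER);
INSERT INTO shop VALUES
    (1, 2000, 60), (1, 2000, 50), (1, 2001, 100),
    (2, 2000, 99), (2, 1999, 500), (3, 2001, 10);
""")

# Shared tail of the query from the question.
base = """
    FROM shop
    WHERE yearid >= 2000
    GROUP BY nameid, yearid
    HAVING SUM(spend) > 98
"""

# Original query: distinct nameids having at least one qualifying year.
distinct_names = conn.execute(
    "SELECT DISTINCT nameid" + base + " ORDER BY nameid"
).fetchall()

# COUNT(DISTINCT nameid) is evaluated per (nameid, yearid) group: always 1.
ones = conn.execute("SELECT COUNT(DISTINCT nameid)" + base).fetchall()

# Counting the rows of the original query via a derived table.
total = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT nameid" + base + ") AS t"
).fetchone()[0]

print(distinct_names, ones, total)
```

Three (nameid, yearid) groups pass the HAVING filter, so the per-group count returns three 1s even though only two distinct nameid values qualify.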

Calculate past 3 month average for every past 3rd month

I am using SQL Server 2014. I have a table like this
create table revenue (id varchar(2), trasdate date, revenue int);
insert into revenue(id, trasdate, revenue)
values ('aa', '2018/09/01', 1234.5),
('aa' , '2018/08/04', 450),
('aa', '2018/07/03',500),
('aa', '2018/06/04',600),
('ab', '2018/09/01', 1234.5),
('ab' , '2018/08/04', 450),
('ab', '2018/07/03',500),
('ab', '2018/06/04',600),
('ab', '2018/05/03', 200),
('ab', '2018/04/02', 150),
('ab', '2018/03/01', 350),
('ab', '2018/02/05', 700),
('aa', '2018/01/07', 400)
;
I am preparing a SQL query for an SSRS report. I want to calculate a past-3-month average for the current month and for every third month before it. As we are currently in September, the result should look something like this:
id  Period     Revenue_3Mon
aa  March-May  233
aa  June-Aug   516
ab  March-May  233
ab  June-Aug   516
I can figure out the Period column myself, so I mainly focused on getting Revenue_3Mon. After some googling I first tried the query below. It fails with "Incorrect syntax near 'rows'"; if I remove rows from the query, it fails with "Incorrect syntax near the keyword 'between'" and "Incorrect syntax near 'i'".
select i.id,i.mon,
avg([i.mon_revenue]) over (partition by i.id, i.mon order by [i.id],
[i.mon] rows between 3 preceding and 1 preceding row) as revenue_3mon --
-- using 3 preceding and 1 preceding row you exclude the current row
from (select a.id, month(a.trasdate) as mon,
sum(a.revenue) as mon_revenue
from revenue a
group by a.id, month(a.trasdate)) i
group by i.id, i.mon
order by i.id,i.mon;
After few efforts, I gave up on this query and came up with new solution which was a bit close to my expectation (after lots of trial and errors).
Declare @count as int;
declare @max as int;
set @count = 4
declare @temp as table (id varchar(2), monthoftrasdate int, revenue int,
[3monavg] int);
SET @MAX = (SELECT distinct MAX(a.ROWNUM) FROM (SELECT id, month(trasdate)
as mon, SUM(revenue) TotalRevenue,
-- sum(revenue) as mon_revenue,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY MONTH(TRASDATE)) AS ROWNUM
FROM revenue
GROUP BY ID, MONTH(TRASDATE)
) A GROUP BY A.ID);
while (@count <= @max )
begin
WITH CTE AS (
SELECT id, month(trasdate) as mon, SUM(revenue) TotalRevenue,
-- sum(revenue) as mon_revenue,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY MONTH(TRASDATE)) AS
ROWNUM
FROM revenue
GROUP BY ID, MONTH(TRASDATE)
)
insert into @temp
SELECT A.ID,A.MON, a.TotalRevenue
,( SELECT avg(b.TotalRevenue) as avgrev
FROM CTE B
WHERE B.ROWNUM BETWEEN A.ROWNUM-3 AND A.ROWNUM-1
AND A.ID = B.ID --AND A.mon = B.mon
--and b.ROWNUM < a.ROWNUM
and (a.mon > 3 and a.ROWNUM > 3)
GROUP BY B.id
) AS REVENUE_3MON
FROM CTE A
set @count = @count + 1
end
select distinct a.* from @temp a
The reason I had to use 'distinct' is because the query was showing duplicate records for every id and every month. So far the result shows like below
id MonthofTrasdate Revenue 3MonAvg
aa 1 400 NULL
aa 2 700 NULL
aa 3 350 NULL
aa 4 150 483
aa 5 200 400
aa 6 600 233
aa 7 500 316
aa 8 450 433
aa 9 1234 516
ab 1 400 NULL
ab 2 700 NULL
ab 3 350 NULL
ab 4 150 483
ab 5 200 400
ab 6 600 233
ab 7 500 316
ab 8 450 433
ab 9 1234 516
This gives the past 3-month average for every month, and I will do the rest of the manipulation in SSRS.
My table currently has no data for the previous year, so this shows the appropriate result for the next couple of months. My concern is next year: when I have to show my boss January, February and March, the query should also pull the averages for those months, i.e. for Oct-Dec (previous year), Nov-Jan and Dec-Feb. I am struggling to figure out the proper way to handle this in my query.
Can you please help me with this query, and also let me know what is wrong with my first query?

Problems with your first attempt:
You enclosed some of the aliases and column names in square brackets like [i.mon_revenue]. There is no need for square brackets, but if you want to use them, you have to break them up at the dot: [i].[mon_revenue].
In your window function expression, there is one row too many at the end: the frame clause should end with 1 preceding, not 1 preceding row.
Window functions are applied at the very end (after the rest of the respective query), so you also have to include i.mon_revenue in your GROUP BY clause of the outer query.
Knowing that the inner query will produce one row per id and mon, there will never be preceding rows in an id-mon partition. Therefore, you must not partition by both, but only by id.
To simplify the query after resolving the issues: ordering by a partition column generally makes no sense, and since (as already mentioned) the inner query returns unique id-mon combinations, you don't have to group by these in the outer query. Looking at that query, we see that the outer query just directly selects and uses the values from the inner query, which makes a separation into two queries unnecessary. So, in fact, you wanted to perform the following query, which will produce the rolling 3-month average (I added the monthly TotalRevenue as well):
SELECT id, MONTH(trasdate) AS mon, SUM(revenue) AS TotalRevenue,
AVG(SUM(revenue)) OVER (PARTITION BY id ORDER BY MONTH(trasdate) ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS revenue_3mon
FROM revenue
GROUP BY id, MONTH(trasdate)
ORDER BY id, MONTH(trasdate);
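To see what the ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING frame produces, here is a minimal sketch that runs an equivalent query on the sample data in SQLite from Python. This is an approximation of the SQL Server query, under stated assumptions: SQLite 3.25 or later is required for window functions, strftime replaces MONTH(), dates are stored as ISO text, revenue is stored as REAL, and the monthly totals are computed in a derived table rather than with a nested AVG(SUM(...)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (id TEXT, trasdate TEXT, revenue REAL)")
conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)", [
    ("aa", "2018-09-01", 1234.5), ("aa", "2018-08-04", 450),
    ("aa", "2018-07-03", 500), ("aa", "2018-06-04", 600),
    ("ab", "2018-09-01", 1234.5), ("ab", "2018-08-04", 450),
    ("ab", "2018-07-03", 500), ("ab", "2018-06-04", 600),
    ("ab", "2018-05-03", 200), ("ab", "2018-04-02", 150),
    ("ab", "2018-03-01", 350), ("ab", "2018-02-05", 700),
    ("aa", "2018-01-07", 400),
])

# Monthly totals per id, then the average over the up-to-three preceding rows.
rows = conn.execute("""
    SELECT id, mon, total,
           AVG(total) OVER (PARTITION BY id ORDER BY mon
                            ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
               AS revenue_3mon
    FROM (
        SELECT id,
               CAST(strftime('%m', trasdate) AS INTEGER) AS mon,
               SUM(revenue) AS total
        FROM revenue
        GROUP BY id, CAST(strftime('%m', trasdate) AS INTEGER)
    )
    ORDER BY id, mon
""").fetchall()

# Map (id, month) to the rounded rolling average; None when no row precedes.
avg_by_month = {
    (r[0], r[1]): (None if r[3] is None else round(r[3], 2)) for r in rows
}
for key in sorted(avg_by_month):
    print(key, avg_by_month[key])
```

Note that the frame counts rows, not calendar months: for id aa, whose data jumps from January to June, the June value is simply the January total.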
Suggestions on your second attempt:
When calculating the @MAX value, you rely on each id having revenues for the same number of months. Are you sure?
The code inside the WHILE loop does not depend on @count, so it adds the same data to the @temp table multiple times, which is probably why you thought you needed a DISTINCT. Therefore: no need for the variables, no need for a loop and a @temp, no need for DISTINCT.
The conditions A.mon > 3 and A.rownum > 3 are redundant with your current data. In general, I guess, you don't want to explicitly exclude the months from January to March, so A.mon > 3 should be removed. A.rownum > 3 could be removed, too, unless you really don't want to see a 3-month average when there are only 2 preceding months or fewer.
As the subquery for the average is restricted to only one id, there's no need for a GROUP BY.
Since the ROW_NUMBER function doesn't care about gaps in the months, I suggest to use a different numbering function, for example DATEDIFF(month, MAX(trasdate), GETDATE()) AS mnum. Of course, the comparison in the WHERE clause of the subquery then has to be changed to B.mnum BETWEEN A.mnum+1 AND A.mnum+3.
So, your second attempt can be reduced to this, which will produce the same result as the above, at least with your sample data, where no gaps in the months exist:
WITH CTE AS (
SELECT id, MONTH(trasdate) AS mon, SUM(revenue) AS TotalRevenue,
DATEDIFF(month, MAX(trasdate), GETDATE()) AS mnum
FROM revenue
GROUP BY id, MONTH(trasdate)
)
SELECT id, mon, TotalRevenue
, (SELECT AVG(B.TotalRevenue)
FROM CTE B
WHERE B.mnum BETWEEN A.mnum+1 AND A.mnum+3
AND A.id = B.id
) AS revenue_3mon
FROM CTE A
ORDER BY id, mnum DESC;
Now, guess what, an expression like my mnum using DATEDIFF increases by one every month as we move to the past, regardless of a change of years, so this might be useful for grouping as well, whether you want to (or can?) use Window functions or not:
With OVER()
SELECT id, MONTH(MIN(trasdate)) AS mon, YEAR(MIN(trasdate)) AS yr, SUM(revenue) AS TotalRevenue,
AVG(SUM(revenue)) OVER (PARTITION BY id ORDER BY MIN(trasdate) ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS revenue_3mon
FROM revenue
GROUP BY id, DATEDIFF(month, trasdate, GETDATE())
ORDER BY id, DATEDIFF(month, trasdate, GETDATE()) DESC;
Without OVER()
WITH CTE AS (
SELECT id, MIN(trasdate) AS min_dt, SUM(revenue) AS TotalRevenue,
DATEDIFF(month, trasdate, GETDATE()) AS mnum
FROM revenue
GROUP BY id, DATEDIFF(month, trasdate, GETDATE())
)
SELECT id, MONTH(min_dt) AS mon, YEAR(min_dt) AS yr, TotalRevenue
, (SELECT AVG(B.TotalRevenue)
FROM CTE B
WHERE B.mnum BETWEEN A.mnum+1 AND A.mnum+3
AND A.id = B.id
) AS revenue_3mon
FROM CTE A
ORDER BY id, mnum DESC;
Both queries allow for retrieving the minimum and maximum date for each period (including month and year).
If you instead wanted what you originally posted under "The result should show something like this" (just grouping by previous 3-month intervals), you would just have to group your original revenue table by id and (DATEDIFF(month, trasdate, GETDATE())-1)/3 (filtering WHERE DATEDIFF(month, trasdate, GETDATE()) > 0). If so, this kind of grouping and aggregation could, of course, also be done by the Report Server.
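The 3-month bucketing suggested in the last paragraph can be sketched as follows, again in SQLite via Python. Several things here are assumptions made so the example is reproducible: the current month is fixed to September 2018 instead of calling GETDATE(), the month difference is computed as year*12+month arithmetic in place of DATEDIFF, and AVG(revenue) is used as the aggregate, which equals the average of monthly totals only because the sample has one row per id and month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (id TEXT, trasdate TEXT, revenue REAL)")
conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)", [
    ("aa", "2018-09-01", 1234.5), ("aa", "2018-08-04", 450),
    ("aa", "2018-07-03", 500), ("aa", "2018-06-04", 600),
    ("ab", "2018-09-01", 1234.5), ("ab", "2018-08-04", 450),
    ("ab", "2018-07-03", 500), ("ab", "2018-06-04", 600),
    ("ab", "2018-05-03", 200), ("ab", "2018-04-02", 150),
    ("ab", "2018-03-01", 350), ("ab", "2018-02-05", 700),
    ("aa", "2018-01-07", 400),
])

# mdiff = whole months between the row and the (assumed) current month.
# (mdiff - 1) / 3 is integer division: months 1-3 back -> bucket 0, 4-6 -> 1, ...
rows = conn.execute("""
    SELECT id, (mdiff - 1) / 3 AS bucket,
           MIN(trasdate) AS first_date, MAX(trasdate) AS last_date,
           AVG(revenue) AS revenue_3mon
    FROM (
        SELECT id, trasdate, revenue,
               (:y * 12 + :m)
               - (CAST(strftime('%Y', trasdate) AS INTEGER) * 12
                  + CAST(strftime('%m', trasdate) AS INTEGER)) AS mdiff
        FROM revenue
    )
    WHERE mdiff > 0          -- exclude the current month
    GROUP BY id, (mdiff - 1) / 3
    ORDER BY id, bucket
""", {"y": 2018, "m": 9}).fetchall()

buckets = {(r[0], r[1]): round(r[4], 2) for r in rows}
for r in rows:
    print(r)
```

Because the bucket number is derived from a month difference rather than from the month number, the grouping keeps working across a change of year.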

I think this should do what you want:
select r.*,
       avg(r.mon_revenue) over (partition by r.id
                                order by r.mon_min
                                rows between 3 preceding and 1 preceding
                               ) as revenue_3mon
       -- using 3 preceding and 1 preceding you exclude the current row
from (select r.id, month(r.trasdate) as mon,
             min(r.trasdate) as mon_min,
             sum(r.revenue) as mon_revenue
      from revenue r
      group by r.id, year(r.trasdate), month(r.trasdate)
     ) r
order by r.id, r.mon, r.mon_min;
Notes:
I fixed the code so it recognizes years as well as dates.
The expression [i.mon_revenue] is not a valid column reference (in your case). You have no column with the name "i.mon_revenue" (with the . in the name).
I changed the column alias to r to match the table.
I added a date column for each month to make it easier to express the ordering.
The outer group by is not necessary.

There are several syntax errors in your code. This should give you what you need. The inner query is the important bit but hopefully this will be enough to get you on your way.
I switched out the temp table for a table variable and changed the revenue column from INT to FLOAT, as you have decimal values in there; other than that, your original sample table is unchanged.
DECLARE @revenue table (id varchar(2), trasdate date, revenue float)
insert into @revenue(id, trasdate, revenue)
values ('aa', '2018/09/01', 1234.5),
       ('aa', '2018/08/04', 450),
       ('aa', '2018/07/03', 500),
       ('aa', '2018/06/04', 600),
       ('ab', '2018/09/01', 1234.5),
       ('ab', '2018/08/04', 450),
       ('ab', '2018/07/03', 500),
       ('ab', '2018/06/04', 600),
       ('ab', '2018/05/03', 200),
       ('ab', '2018/04/02', 150),
       ('ab', '2018/03/01', 350),
       ('ab', '2018/02/05', 700),
       ('aa', '2018/01/07', 400)
SELECT
    *
FROM
    (
        SELECT
            *
            , MONTH(trasdate) as MonthNumber
            , AVG(revenue) OVER (PARTITION BY id
                                 ORDER BY
                                     id
                                     , MONTH(trasdate) ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) as ThreeMonthAvg
        FROM @revenue
    ) a
WHERE MONTH(GETDATE()) - MonthNumber IN (0, 3, 6, 9)
This gives the following results
aa 2018-06-04 600 6 400
aa 2018-09-01 1234.5 9 516.666666666667
ab 2018-03-01 350 3 700
ab 2018-06-04 600 6 233.333333333333
ab 2018-09-01 1234.5 9 516.666666666667

PGSQL duplicate record in same column

I have a table and I want to know where duplicate records are present for the same columns. These are my columns; I want to get the records where group_id or week differ for the same code, fweek and newcode.
Id newcode fweek code group_id week
1 343001 2016-01 343 100 8
2 343002 2016-01 343 100 8
3 343001 2016-01 343 101 08
Required record is
Id newcode fweek code group_id week
3 343001 2016-01 343 101 08

To find the duplicate values, I joined the table with itself.
We then need to group the results by code, fweek and newcode, so that we catch cases where more than one duplicate row exists. I used max() to get the last inserted row.
You don't have to use IS DISTINCT FROM; it behaves like the inequality operator but also treats NULL as a comparable value. If you don't want NULLs to be compared, use the <> operator instead.
select r.*
from your_table r
where r.id in (select max(r.id)
               from your_table r
               join your_table r2 on r2.code = r.code and r2.fweek = r.fweek and r2.newcode = r.newcode
               where
                   r2.group_id is distinct from r.group_id or
                   r2.week is distinct from r.week
               group by r.code,
                        r.fweek,
                        r.newcode
               having count(*) > 1)
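The self-join can be tried on the three sample rows with the sketch below, which uses SQLite from Python as a stand-in for PostgreSQL. Assumptions: week is stored as text (so that 8 and 08 differ, as in the question), the inner aliases are renamed to a and b for readability, and SQLite's IS NOT operator is used in place of IS DISTINCT FROM because it gives the same NULL-safe inequality there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (id INTEGER, newcode INTEGER, fweek TEXT,
                         code INTEGER, group_id INTEGER, week TEXT);
INSERT INTO your_table VALUES
    (1, 343001, '2016-01', 343, 100, '8'),
    (2, 343002, '2016-01', 343, 100, '8'),
    (3, 343001, '2016-01', 343, 101, '08');
""")

rows = conn.execute("""
    SELECT r.*
    FROM your_table r
    WHERE r.id IN (
        -- Pair each row with rows sharing code, fweek and newcode,
        -- keep pairs that differ in group_id or week,
        -- and report the highest id of each such group.
        SELECT MAX(a.id)
        FROM your_table a
        JOIN your_table b
          ON b.code = a.code AND b.fweek = a.fweek AND b.newcode = a.newcode
        WHERE b.group_id IS NOT a.group_id
           OR b.week IS NOT a.week
        GROUP BY a.code, a.fweek, a.newcode
        HAVING COUNT(*) > 1
    )
""").fetchall()

print(rows)
```

Rows 1 and 3 share code, fweek and newcode but differ in group_id and week, so the query returns the later of the two, row 3.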

Retrieve information dynamically from multiple CTE

I have multiple CTEs and I want to retrieve some information from a couple of them into next CTE.
So, I have this information from one of the CTEs:
PeriodID  StartDate
1         2006-01-01
2         2007-04-25
3         2008-08-16
4         2009-12-08
5         2011-04-17
and this from other:
RecordID Date
100 2007-04-15
101 2008-05-21
102 2008-06-06
103 2008-07-01
104 2009-11-12
And I need to show in next one:
RecordID Date PeriodID
100 2007-04-15 1
101 2008-05-21 2
102 2008-06-06 2
103 2008-07-01 2
104 2009-11-12 3
I could use a CASE/WHEN statement to determine whether a record's date falls in period 1, 2, 3, 4 or 5, but in some situations the first CTE can return a different number of periods.
Is there a way to do this in the above context?

You can have multiple CTEs defined as follows, and then select from and join them as you would any other table.
with cte1 as (select * ...),
cte2 as (select * ...)
select
cte2.*,
periodid
from cte2
cross apply
(select top 1 * from cte1 where cte2.recorddate> cte1.startdate order by startdate desc) v
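The lookup of the latest period that starts before each record date can be checked with the sketch below. It runs in SQLite from Python, so it deviates from the answer in labeled ways: SQLite has no CROSS APPLY or TOP, so a correlated subquery with ORDER BY ... LIMIT 1 is used instead, and the CTE names (periods, records) and column names (StartDate, RecordDate) are illustrative choices filled with the sample values from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

rows = conn.execute("""
    WITH periods(PeriodID, StartDate) AS (
        VALUES (1, '2006-01-01'), (2, '2007-04-25'), (3, '2008-08-16'),
               (4, '2009-12-08'), (5, '2011-04-17')
    ),
    records(RecordID, RecordDate) AS (
        VALUES (100, '2007-04-15'), (101, '2008-05-21'), (102, '2008-06-06'),
               (103, '2008-07-01'), (104, '2009-11-12')
    )
    SELECT r.RecordID, r.RecordDate,
           -- Latest period whose start date lies before the record date.
           (SELECT p.PeriodID
            FROM periods p
            WHERE p.StartDate < r.RecordDate
            ORDER BY p.StartDate DESC
            LIMIT 1) AS PeriodID
    FROM records r
    ORDER BY r.RecordID
""").fetchall()

for row in rows:
    print(row)
```

Because the period is found by comparison rather than by a fixed CASE expression, the query keeps working when the first CTE returns a different number of periods.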
