Removing duplicate data from the query with different condition

In a view, I've created this query:
SELECT DISTINCT
ClientCode, ClientName, Address1, Address2, City, Country, CreatedBy, CreatedDate
FROM
Contact
GROUP BY
ClientCode, ClientName, Address1, Address2, City, Country, CreatedBy, CreatedDate
which gives me this result:
001 ABC Lot No, Road, City B, US Alice 03/04/2012
001 ABC Lot No, Road, City B, US Benny 04/04/2012
How should I design my query to filter out the duplicate data? I want to ignore the two fields CreatedBy and CreatedDate when deciding what counts as a duplicate, and show only one row. This is the result I want to get:
001 ABC Lot No, Road, City B, US Alice 03/04/2012
OR
001 ABC Lot No, Road, City B, US Benny 04/04/2012
I want the query to filter out duplicates by comparing only ClientCode, ClientName, Address1, Address2, City and Country. I'm keeping CreatedBy and CreatedDate because I have to include them in another interface.

You can use a ranking function to get the most recent contact details for each address, based on the most recent date:
with recentContact as
(
select *
, mostRecentRank = row_number() over
(
partition by ClientCode
,ClientName
,Address1
,Address2
,City
,Country
order by CreatedDate desc
)
from Contact
)
select ClientCode
,ClientName
,Address1
,Address2
,City
,Country
,CreatedBy
,CreatedDate
from recentContact
where mostRecentRank = 1
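To try the ranking approach without a SQL Server instance, here is a minimal sketch that runs the same ROW_NUMBER pattern through Python's built-in sqlite3 module. Several things are assumptions: SQLite (3.25 or later, for window functions) stands in for SQL Server, the sample row is split into Address1/Address2/City/Country by guesswork, and the dates are read as dd/mm/yyyy and stored as ISO strings so that they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contact (
    ClientCode TEXT, ClientName TEXT, Address1 TEXT, Address2 TEXT,
    City TEXT, Country TEXT, CreatedBy TEXT, CreatedDate TEXT
);
-- The two rows from the question (column split and date format are assumed).
INSERT INTO Contact VALUES
    ('001', 'ABC', 'Lot No', 'Road', 'City B', 'US', 'Alice', '2012-04-03'),
    ('001', 'ABC', 'Lot No', 'Road', 'City B', 'US', 'Benny', '2012-04-04');
""")

# Number the rows inside each address group, newest first, and keep row 1.
rows = conn.execute("""
WITH recentContact AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY ClientCode, ClientName, Address1, Address2, City, Country
               ORDER BY CreatedDate DESC
           ) AS mostRecentRank
    FROM Contact
)
SELECT ClientCode, ClientName, CreatedBy, CreatedDate
FROM recentContact
WHERE mostRecentRank = 1
""").fetchall()

print(rows)
```

With this data only the Benny row survives, because it has the later CreatedDate. Changing the ORDER BY to ascending would keep the Alice row instead.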

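Try adding
GROUP BY field
where field is any column whose value is the same across the duplicate rows, such as clientname.
In your case, you can add multiple fields to GROUP BY, like
GROUP BY clientname, clientfoo ..
Any selected column that is left out of GROUP BY (here CreatedBy and CreatedDate) must then be wrapped in an aggregate such as MIN or MAX.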

Related

Creating query to reference prior row over grouped list of users

We have a collection of users with duplicates, and I'm writing a process to merge them. The process selects all users with matching names and DOBs; I then need a list of user IDs to merge them together. Here's an example:
CREATE TABLE #tmpUsers (UserID Integer NOT NULL PRIMARY KEY, FullName NVARCHAR(50), Birthdate DATE);
INSERT INTO #tmpUsers (UserID, FullName, Birthdate)
VALUES
(120,'John Michael','1985-03-02'),
(45,'John Michael','1985-03-02'),
(60,'John Michael','1985-03-02'),
(33,'John Michael','1985-03-02'),
(12,'Tim Smith','1973-01-02'),
(16,'Tim Smith','1973-01-02'),
(29,'Jane Thomas','1990-06-20'),
(43,'Jane Thomas','1990-06-20'),
(8,'Jane Thomas','1990-06-20');
The process I'm building needs to have a new table ordered by the Fullname and DOB, but have the current and prior ID so it can merge together, like this:
Name          DOB         Merge From  Merge To
Jane Thomas   1990-06-20  8           29
Jane Thomas   1990-06-20  29          43
John Michael  1985-03-02  33          45
John Michael  1985-03-02  45          60
John Michael  1985-03-02  60          120
Tim Smith     1973-01-02  12          16
The process basically merges or collapses the oldest values into the newest one, so in the end we will have only one user for each. I'm just unable to find a good way to do this, though I'm sure there's a simple T-SQL method. I hoped someone had advice on how to build it.
In the end, after my process runs, there will be three users, with IDs 16, 43 and 120. The others will either be removed or deactivated, but just getting the query to start the process is where I'm stuck.
Thanks.

This will do it:
SELECT *
FROM (
SELECT FullName as Name, BirthDate as DOB, UserID as [Merge From],
LEAD(UserID) OVER(PARTITION BY fullname, birthdate
ORDER BY fullname, birthdate, userid) as [Merge To]
from #tmpUsers
) t
WHERE [Merge To] IS NOT NULL
ORDER BY Name, DOB, [Merge From];
See it work here:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=c2557bc038cab44ab000a1b35ab1563b
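As a quick local check of the LEAD query, the sketch below loads the question's sample users into an in-memory SQLite database through Python's sqlite3 module. SQLite (3.25+) is an assumption standing in for SQL Server, so the temp table becomes a plain table named tmpUsers and the bracketed aliases become MergeFrom and MergeTo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tmpUsers (UserID INTEGER PRIMARY KEY, FullName TEXT, Birthdate TEXT)"
)
conn.executemany(
    "INSERT INTO tmpUsers (UserID, FullName, Birthdate) VALUES (?, ?, ?)",
    [
        (120, "John Michael", "1985-03-02"), (45, "John Michael", "1985-03-02"),
        (60, "John Michael", "1985-03-02"), (33, "John Michael", "1985-03-02"),
        (12, "Tim Smith", "1973-01-02"), (16, "Tim Smith", "1973-01-02"),
        (29, "Jane Thomas", "1990-06-20"), (43, "Jane Thomas", "1990-06-20"),
        (8, "Jane Thomas", "1990-06-20"),
    ],
)

# LEAD looks one row ahead inside each (name, DOB) group ordered by UserID;
# the last row of a group has no successor and is filtered out.
pairs = conn.execute("""
SELECT * FROM (
    SELECT FullName AS Name, Birthdate AS DOB, UserID AS MergeFrom,
           LEAD(UserID) OVER (PARTITION BY FullName, Birthdate
                              ORDER BY UserID) AS MergeTo
    FROM tmpUsers
) t
WHERE MergeTo IS NOT NULL
ORDER BY Name, DOB, MergeFrom
""").fetchall()

for row in pairs:
    print(row)
```

The six rows printed match the Merge From / Merge To pairs requested in the question.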

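This is commonly solved with row_number, selecting the max value per group (partitioning by both name and birth date, since that is how the question defines a duplicate):
with u as (
select UserId, FullName, BirthDate,
Row_Number() over(partition by FullName, BirthDate order by UserId desc) keepMe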
from #tmpUsers
)
select UserId, FullName, BirthDate
from u
where KeepMe=1

Though my suggestion follows a different approach to solve the underlying problem, why not run this simple query,
SELECT
MIN(UserID) AS MergeFrom,
MAX(UserID) AS MergeTo,
FullName,
BirthDate
FROM #tmpusers
GROUP BY
FullName,
BirthDate
HAVING MIN(UserID)<>MAX(UserID)
shift users as indicated and do this in a loop until the query returns an empty result set?
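The loop suggested above could look roughly like the following sketch, again using Python's sqlite3 as a stand-in for SQL Server. The "merge" step here is only a placeholder that deletes the old row; a real process would first move whatever data hangs off the old user, which the question does not describe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tmpUsers (UserID INTEGER PRIMARY KEY, FullName TEXT, Birthdate TEXT)"
)
conn.executemany(
    "INSERT INTO tmpUsers VALUES (?, ?, ?)",
    [
        (120, "John Michael", "1985-03-02"), (45, "John Michael", "1985-03-02"),
        (60, "John Michael", "1985-03-02"), (33, "John Michael", "1985-03-02"),
        (12, "Tim Smith", "1973-01-02"), (16, "Tim Smith", "1973-01-02"),
        (29, "Jane Thomas", "1990-06-20"), (43, "Jane Thomas", "1990-06-20"),
        (8, "Jane Thomas", "1990-06-20"),
    ],
)

MERGE_QUERY = """
SELECT MIN(UserID) AS MergeFrom, MAX(UserID) AS MergeTo, FullName, Birthdate
FROM tmpUsers
GROUP BY FullName, Birthdate
HAVING MIN(UserID) <> MAX(UserID)
"""

passes = 0
while True:
    batch = conn.execute(MERGE_QUERY).fetchall()
    if not batch:
        break  # no group has more than one user left
    for merge_from, merge_to, _name, _dob in batch:
        # Placeholder merge: drop the oldest ID; merge_to is the surviving user.
        conn.execute("DELETE FROM tmpUsers WHERE UserID = ?", (merge_from,))
    passes += 1

remaining = [r[0] for r in conn.execute("SELECT UserID FROM tmpUsers ORDER BY UserID")]
print(passes, remaining)
```

Each pass removes one user per duplicated group, so the number of passes equals the size of the largest group minus one. Unlike the LEAD approach, every old ID is merged straight into the highest ID rather than chained through its neighbour.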

How to efficiently count categories (250) of a categorical attribute? PostgreSQL or Python

I have a big database with 50 attributes (8 categorical) and I need to create a summary with a count of all categories of each variable grouped by city and state. One of the attributes has over 250 categories.
So far I have been able to create a query that counts one category per attribute at a time, grouped by city, and export it to CSV.
(select city as "City", COUNT(use4) as "use2056"
from demo
where use4 = '2056'
group by city
order by city asc)
I was thinking about manually copying and pasting (I know it will take forever), but the outputs have different numbers of rows. Also, there are cities within the US that share the same name (I will eventually need to visualize it). I tried to use several SELECTs per query but I cannot make it work.
Select
(select city as "City", COUNT(use4) as "use2056"
from demo
where use4 = '2056'
group by city
order by city asc),
(COUNT(use4) as "use2436"
from demo
where use4 = '2436'
group by city
order by city asc),
(COUNT(use4) as "use9133"
from demo
where use4 = '9133'
group by city
order by city asc)
I also tried to add the city and county and additional counts
(select zip as "ZIPCODE", city, county, COUNT(use4) as "Use4count1466", COUNT(use4) as "Use4count9133"
from demo
where use4 = '1466',
where use4 = '9133'
group by zip, city, county
order by zip asc)
Is there any way to do this efficiently? Could I create a loop that keeps counting every category of each attribute? How many SELECTs can you have in a query? I need to find a way to display zipcode, county, city and the count of every category of each categorical attribute.

You can use filtered aggregation to do this in a single query:
select city,
count(*) filter (where use4 = '2056') as use2056,
count(*) filter (where use4 = '2436') as use2436,
count(*) filter (where use4 = '9133') as use9133
from demo
where use4 in ('2056', '2436', '9133')
group by city;
You can apply the same for the second query:
select zip as "ZIPCODE",
city,
county,
count(*) filter (where use4 = '1466') as use4count1466,
count(*) filter (where use4 = '9133') as use4count9133
from demo
where use4 in ('1466','9133')
group by zip, city, county
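With 250 categories, writing one filter expression per category by hand gets tedious. A different technique, not the filtered aggregation shown above, is to group by the category column as well and pivot the long result afterwards. The sketch below does that with Python's sqlite3; the rows in demo are made up for illustration, and in practice state or zip would be added to the GROUP BY so that cities sharing a name stay separate.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (city TEXT, use4 TEXT)")
# Hypothetical sample rows; the real table has many more columns.
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [
        ("Austin", "2056"), ("Austin", "2056"), ("Austin", "2436"),
        ("Boston", "9133"), ("Boston", "2056"),
    ],
)

# One query counts every category that occurs, one row per (city, category).
long_rows = conn.execute("""
SELECT city, use4, COUNT(*) AS n
FROM demo
GROUP BY city, use4
ORDER BY city, use4
""").fetchall()

# Pivot to one entry per city with a "use<category>" key per category.
summary = defaultdict(dict)
for city, category, n in long_rows:
    summary[city]["use" + category] = n

print(dict(summary))
```

Categories that never occur in a city are simply absent from that city's dictionary; fill them with 0 when writing the CSV if a rectangular table is needed.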

Using "UNION ALL" and "GROUP BY" to implement "Intersect"

I've written the following query to find the common records in two data sets, but it's difficult for me to verify that it is correct because I have a lot of records in my DB.
Is it OK to implement Intersect between "Customers" & "Employees" tables using UNION ALL and apply GROUP BY on the result like below?
SELECT D.Country, D.Region, D.City
FROM (SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
So can we say that any record which exists in the result of this query also exists in the Intersect set between "Customers & Employees" tables AND any record that exists in Intersect set between "Customers & Employees" tables will be in the result of this query too?

So is it right to say any record in result of this query is in "Intersect" set between "Customers & Employees" "AND" any record that exist in "Intersect" set between "Customers & Employees" is in result of this query too?
Yes, but it won't be as efficient, because you are filtering out duplicates three times instead of once. In your query you're:
1. Using DISTINCT to pull unique records from employees
2. Using DISTINCT to pull unique records from customers
3. Combining both queries using UNION ALL
4. Using GROUP BY in your outer query to filter the records you retrieved in steps 1, 2 and 3.
Using INTERSECT will return identical results but more efficiently. To see for yourself you can create the sample data below and run both queries:
use tempdb
go
if object_id('dbo.customers') is not null drop table dbo.customers;
if object_id('dbo.employees') is not null drop table dbo.employees;
create table dbo.customers
(
customerId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
create table dbo.employees
(
employeeId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
insert dbo.customers(country, region, city)
values ('us', 'N/E', 'New York'), ('us', 'N/W', 'Seattle'),('us', 'Midwest', 'Chicago');
insert dbo.employees
values ('us', 'S/E', 'Miami'), ('us', 'N/W', 'Portland'),('us', 'Midwest', 'Chicago');
Run these queries:
SELECT D.Country, D.Region, D.City
FROM
(
SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
SELECT Country, Region, City
FROM dbo.customers
INTERSECT
SELECT Country, Region, City
FROM dbo.employees;
Results:
Country Region City
----------- ---------- ----------
us Midwest Chicago
Country Region City
----------- ---------- ----------
us Midwest Chicago
If using INTERSECT is not an option OR you want a faster query you could improve the query you posted a couple different ways, such as:
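Option 1: let GROUP BY handle ALL the de-duplication like this:
This is the same as what you posted but without the DISTINCTs. Note that it only matches INTERSECT when neither table contains duplicate (Country, Region, City) rows on its own; two identical rows in Customers alone would also give COUNT(*) = 2.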
SELECT D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City
FROM Customers
UNION ALL
SELECT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
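Option 2: Use ROW_NUMBER
This would be my preference and will likely be most efficient. It relies on the same no-duplicates assumption, since rn = 2 is also reached by two identical rows coming from one table.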
SELECT Country, Region, City
FROM
(
SELECT
rn = row_number() over (partition by D.Country, D.Region, D.City order by (SELECT null)),
D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City
FROM Customers
UNION ALL
SELECT Country, Region, City
FROM Employees
) AS D
) uniquify
WHERE rn = 2;
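The difference between counting rows and counting sources can be checked with a small experiment. The sketch below uses Python's sqlite3 as a stand-in for SQL Server and a hypothetical variation of the sample data in which New York appears twice in customers and never in employees. The src column and the COUNT(DISTINCT src) variant are additions for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (country TEXT, region TEXT, city TEXT);
CREATE TABLE employees (country TEXT, region TEXT, city TEXT);
-- Hypothetical data: New York is duplicated within customers only.
INSERT INTO customers VALUES
    ('us', 'N/E', 'New York'), ('us', 'N/E', 'New York'), ('us', 'Midwest', 'Chicago');
INSERT INTO employees VALUES
    ('us', 'S/E', 'Miami'), ('us', 'Midwest', 'Chicago');
""")

# Shared UNION ALL source; src records which table each row came from.
combined = """
    (SELECT 'C' AS src, country, region, city FROM customers
     UNION ALL
     SELECT 'E' AS src, country, region, city FROM employees) AS d
"""

def run(sql):
    return conn.execute(sql).fetchall()

with_intersect = run("""
    SELECT country, region, city FROM customers
    INTERSECT
    SELECT country, region, city FROM employees
""")
# Counting rows: two identical customer rows are mistaken for a match.
count_rows = run(f"""
    SELECT country, region, city FROM {combined}
    GROUP BY country, region, city HAVING COUNT(*) = 2
    ORDER BY city
""")
# Counting distinct sources: a match requires a row from each table.
count_sources = run(f"""
    SELECT country, region, city FROM {combined}
    GROUP BY country, region, city HAVING COUNT(DISTINCT src) = 2
    ORDER BY city
""")

print(with_intersect)
print(count_rows)
print(count_sources)
```

Here INTERSECT and the COUNT(DISTINCT src) variant both return only Chicago, while the plain COUNT(*) = 2 version also returns New York. The original query in the question avoids this because its inner DISTINCTs guarantee at most one row per table.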

psql, display column that is not in the group by clause

I'm having problems with a query. I have two tables, country and city, and I want to display the city with the highest population per country.
Here's the query:
select country.name as coname, city.name as ciname, max(city.population) as pop
from city
join country on city.countrycode=country.code
group by country.name
order by pop;
Error
column "city.name" must appear in the GROUP BY clause or be used in an aggregate function.
I don't know how to solve this; I tried to write a subquery but it didn't work out.
How can I make it work?

You can easily get it using the rank function:
select * from
(
select country.name as coname,
city.name as ciname,
city.population,
rank() over (partition by country.name order by city.population desc) as ranking
from
city
join
country
on city.countrycode=country.code
) A
where ranking = 1
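One property of rank worth seeing in action is that tied cities all receive rank 1. The sketch below runs the same query shape through Python's sqlite3 (SQLite 3.25+ assumed, in place of PostgreSQL) on invented data; the country and city names are fictional placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE city (name TEXT, countrycode TEXT, population INTEGER);
-- Fictional data: Sylvania has two cities tied for the largest population.
INSERT INTO country VALUES ('FR', 'Freedonia'), ('SY', 'Sylvania');
INSERT INTO city VALUES
    ('Alpha', 'FR', 500), ('Beta', 'FR', 300),
    ('Gamma', 'SY', 200), ('Delta', 'SY', 200);
""")

largest = conn.execute("""
SELECT coname, ciname, population FROM (
    SELECT country.name AS coname,
           city.name AS ciname,
           city.population AS population,
           RANK() OVER (PARTITION BY country.name
                        ORDER BY city.population DESC) AS ranking
    FROM city
    JOIN country ON city.countrycode = country.code
) a
WHERE ranking = 1
ORDER BY coname, ciname
""").fetchall()

for row in largest:
    print(row)
```

Sylvania yields two rows because of the tie. If exactly one row per country is required, row_number() with a tie-breaking column in its ORDER BY is the usual substitute.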

access SQL error with order by and group by

I am working on a homework exercise using MS Access 2013, and I am writing an SQL query that is supposed to show the LastName and FirstName of all customers who have had an order with an Item named 'Dress Shirt'. Use a subquery. Present the results sorted by LastName in ascending order and then FirstName in descending order. This is the code I wrote, following what I have learned from the book:
SELECT LastName, FirstName
FROM CUSTOMER, INVOICE_ITEM
WHERE Item In
(SELECT Item
FROM INVOICE_ITEM
WHERE Item="Dress Shirt")
GROUP BY LastName
ORDER BY FirstName DESC;
I get an error saying that my query does not include the specified expression 'FirstName' as part of an aggregate function.

Every field in the SELECT list must also appear in GROUP BY unless it is wrapped in an aggregate such as COUNT or SUM.
Try the following, which also sorts by LastName first, as the exercise asks:
SELECT LastName, FirstName
FROM CUSTOMER, INVOICE_ITEM
WHERE Item In (SELECT Item FROM INVOICE_ITEM WHERE Item="Dress Shirt")
GROUP BY LastName, FirstName
ORDER BY LastName ASC, FirstName DESC;
