Let's say I have a table with ID int, VALUE string:
ID | VALUE
1 | abc
2 | abc
3 | def
4 | abc
5 | abc
6 | abc
If I do select value, count(*) group by value I should get
VALUE | COUNT
abc | 5
def | 1
Now the tricky part: if a group's count is 1, I need to get that row's ID from the first table. Should I be using a CTE? Or should I create a result set with an extra ID column set to null and then run an update like b.ID = a.ID where count = 1?
Or is there another easier way?
EDIT:
I want to have result table like this:
ID | VALUE | count
null | abc | 5
3 | def | 1
If your ID values are unique, you can simply check to see if the max(id) = min(id). If so, then use either one, otherwise you can return null. Like this:
Select Case When Min(id) = Max(id) Then Min(id) Else Null End As Id,
Value, Count(*) As [Count]
From YourTable
Group By Value
Since you are already performing an aggregate, including the MIN and Max function is not likely to take any extra (noticeable) time. I encourage you to give this a try.
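As a minimal sketch, assuming SQL Server 2008 or later and a hypothetical temp table #demo loaded with the sample rows from the question, the MIN/MAX check can be tried end to end like this:

```sql
-- Hypothetical temp table holding the sample rows from the question.
CREATE TABLE #demo (ID int, VALUE varchar(3));

INSERT INTO #demo (ID, VALUE)
VALUES (1, 'abc'), (2, 'abc'), (3, 'def'), (4, 'abc'), (5, 'abc'), (6, 'abc');

-- MIN(ID) = MAX(ID) holds only when the group contains a single distinct ID.
-- A CASE without ELSE yields NULL, so multi-row groups get a NULL ID.
SELECT CASE WHEN MIN(ID) = MAX(ID) THEN MIN(ID) END AS ID,
       VALUE,
       COUNT(*) AS [Count]
FROM #demo
GROUP BY VALUE
ORDER BY VALUE;

DROP TABLE #demo;
```

With these six rows the query should return (NULL, abc, 5) and (3, def, 1), which is the shape asked for in the edit to the question.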
The way I would do it would indeed be a CTE:
WITH grp AS (SELECT value, COUNT(*) AS cnt FROM MyTable GROUP BY value HAVING COUNT(*) = 1)
SELECT MyTable.ID, grp.value, grp.cnt FROM MyTable
JOIN grp ON grp.value = MyTable.value
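When using GROUP BY, you can add a HAVING clause after it to filter the groups. Because each remaining group holds exactly one row, MIN([ID]) is simply that row's ID.
So
SELECT MIN([ID]) AS [ID]
FROM [table]
GROUP BY [VALUE]
HAVING COUNT(*) = 1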
Edit: with regards to your edited question: this uses some fun joins and unions
CREATE TABLE #table
(ID int IDENTITY,
VALUE varchar(3))
INSERT INTO #table (VALUE)
VALUES('abc'),('abc'),('def'),('abc'),('abc'),('abc')
SELECT * FROM (
SELECT Null as ID,VALUE, COUNT(*) as [Count]
FROM #table
GROUP BY VALUE
HAVING COUNT(*) > 1
UNION ALL
SELECT t.ID,t.VALUE,p.Count FROM
#table t
JOIN
(SELECT VALUE, COUNT(*) as [Count]
FROM #table
GROUP BY VALUE
HAVING COUNT(*) = 1) p
ON t.VALUE=p.VALUE
) a
DROP TABLE #table
Maybe not the most efficient, but something like this works (the count filter has to go in HAVING, since aggregates are not allowed in WHERE):
SELECT MAX(Id) AS ID, Value FROM [Table] GROUP BY Value HAVING COUNT(*) = 1
Related
I have a test table:
ID | V_ID
1 | 1
1 | 2
I want max(V_ID), and the result should be V_ID 2:
select Id,max(V_ID) from test
group by Id,value
I am trying a simple query, but it still returns two records. Is there (1) another simple query I can use, or (2) should we try RANK?
You should be grouping only by the ID column:
SELECT ID, MAX(V_ID)
FROM test
GROUP BY ID;
A more general pattern for this type of problem uses ROW_NUMBER to find the entire record for each Id having the max value of V_ID:
SELECT ID, V_ID
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY V_ID DESC) rn
FROM test
) t
WHERE rn = 1;
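Since the question also mentions rank, here is a small sketch of how ROW_NUMBER and RANK differ when the maximum is tied. It assumes SQL Server 2008 or later and a hypothetical temp table #test_demo with one extra tied row added to the question's data:

```sql
-- Hypothetical data: the question's two rows plus a second row with V_ID = 2.
CREATE TABLE #test_demo (ID int, V_ID int);
INSERT INTO #test_demo (ID, V_ID) VALUES (1, 1), (1, 2), (1, 2);

-- ROW_NUMBER numbers tied rows 1, 2, ... so exactly one row per ID survives.
SELECT ID, V_ID
FROM (
    SELECT ID, V_ID,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY V_ID DESC) AS rn
    FROM #test_demo
) AS t
WHERE rn = 1;

-- RANK gives tied rows the same rank, so both rows with V_ID = 2 survive.
SELECT ID, V_ID
FROM (
    SELECT ID, V_ID,
           RANK() OVER (PARTITION BY ID ORDER BY V_ID DESC) AS rk
    FROM #test_demo
) AS t
WHERE rk = 1;

DROP TABLE #test_demo;
```

If only the maximum value is needed, the plain GROUP BY ID with MAX(V_ID) is the simpler choice. The window-function form is useful when other columns of the winning row are needed too.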
I made two queries that I thought should have the same result:
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
WHERE table2.value = '1')
AS result1 ORDER BY id1)
AS result2;
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
)
AS result1 ORDER BY id1)
AS result2
WHERE value = '1';
The only difference being that one had the WHERE clause inside SELECT DISTINCT ON, and the other outside that, but inside SELECT COUNT. But the results were not the same. I don't understand why the position of the WHERE clause should make a difference in this case. Can anyone explain? Or is there a better way to phrase this question?
Here's a good way to look at this:
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a;
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a
WHERE value = 2;
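The difference comes from when the filter runs relative to DISTINCT ON. DISTINCT ON keeps one row per id. A WHERE at the same query level (or inside the subquery) removes rows before that one row is chosen, while a WHERE in an outer query runs afterwards and can discard an id whose kept row happens to have a different value. It is behavior by design.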
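To make the ordering concrete, here is a minimal PostgreSQL sketch that mirrors the two queries from the question, using an inline VALUES list as hypothetical data. An explicit ORDER BY is added so that the row DISTINCT ON keeps is deterministic:

```sql
-- Filter OUTSIDE: DISTINCT ON first keeps (1, 1) for id 1,
-- then the outer WHERE removes it, so no rows come back.
SELECT *
FROM (
    SELECT DISTINCT ON (id) id, value
    FROM (VALUES (1, 1), (1, 2)) AS a(id, value)
    ORDER BY id, value
) AS picked
WHERE value = 2;

-- Filter INSIDE: WHERE leaves only (1, 2) before DISTINCT ON runs,
-- so exactly that row comes back.
SELECT DISTINCT ON (id) id, value
FROM (VALUES (1, 1), (1, 2)) AS a(id, value)
WHERE value = 2
ORDER BY id, value;
```

Wrapping each query in SELECT COUNT(*) gives 0 and 1 respectively, which is the same kind of mismatch the question describes.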
I have two tables (one for quarter one, one for quarter two), each of which contains the employees who received a bonus in that quarter. Every employee has a unique ID in the company.
I want to get all employees who have a bonus in either q1 or q2, without duplicate employees. Both the ID and the amount are required.
Below is my solution. I want to find out if there is a better one.
create table #q1 (
EmployeeID int identity(1,1) primary key not null,
amount int
)
create table #q2 (
EmployeeID int identity(1,1) primary key not null,
amount int
)
insert into #q1
(amount)
select 1
insert into #q1
(amount)
select 2
select * from #q1
insert into #q2
(amount)
select 1
insert into #q2
(amount)
select 11
insert into #q2
(amount)
select 22
select * from #q2
My Solution:
;with both as
(
select EmployeeID
from #q1
union
select EmployeeID
from #q2
)
select a.EmployeeID, a.amount
from #q1 as a
where a.EmployeeID in (select EmployeeID from both)
union all
select b.EmployeeID, b.amount
from #q2 as b
where b.EmployeeID in (select EmployeeID from both) and b.EmployeeID NOT in (select EmployeeID from #q1)
Result:
EmployeeID | Amount
1 | 1
2 | 2
3 | 22
SELECT EmployeeID, SUM(amount) AS TotalBonus
FROM
(SELECT EmployeeID, amount
from #q1
UNION ALL
SELECT EmployeeID, amount
from #q2) AS combined
GROUP BY EmployeeID
The subselect combines both tables with UNION ALL. The GROUP BY gives you one row per employee, and the SUM means that if someone got lucky in both quarters you get the total. I'm guessing that's the right thing for you.
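As a rough end-to-end sketch of that approach, assuming SQL Server 2008 or later and two hypothetical temp tables (#bonus_q1, #bonus_q2) filled with the same values the question inserts:

```sql
-- Hypothetical temp tables with the question's sample data.
CREATE TABLE #bonus_q1 (EmployeeID int PRIMARY KEY, amount int);
CREATE TABLE #bonus_q2 (EmployeeID int PRIMARY KEY, amount int);

INSERT INTO #bonus_q1 (EmployeeID, amount) VALUES (1, 1), (2, 2);
INSERT INTO #bonus_q2 (EmployeeID, amount) VALUES (1, 1), (2, 11), (3, 22);

-- UNION ALL keeps every row from both quarters; GROUP BY then collapses
-- them to one row per employee and SUM adds up the amounts.
SELECT EmployeeID, SUM(amount) AS TotalBonus
FROM (
    SELECT EmployeeID, amount FROM #bonus_q1
    UNION ALL
    SELECT EmployeeID, amount FROM #bonus_q2
) AS combined
GROUP BY EmployeeID
ORDER BY EmployeeID;

DROP TABLE #bonus_q1;
DROP TABLE #bonus_q2;
```

With this data the totals are 2, 13 and 22 for employees 1, 2 and 3. That differs from the result shown in the question (1, 2, 22), which keeps the q1 amount when an employee appears in both quarters, so summing only fits if a combined total is what is wanted.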
try this one:
SELECT EmployeeID
FROM EmployeeList
WHERE EmployeeID IN
(SELECT EmployeeID From QuarterOne
UNION
SELECT EmployeeID From QuarterTwo)
OR by using JOIN
SELECT a.EmployeeID
FROM EmployeeList a LEFT JOIN QuarterOne b
ON a.EmployeeID = b.EmployeeID
LEFT JOIN QuarterTwo c
ON a.EmployeeID = c.EmployeeID
WHERE b.EmployeeID IS NOT NULL OR c.EmployeeID IS NOT NULL
This will return every EmployeeID that has a record in either quarter.
Try:
SELECT COALESCE(q1.EmployeeID, q2.EmployeeID) AS EmployeeID -- the ID from whichever quarter has the employee
, COALESCE(q1.amount, q2.amount) AS amount -- the q1 amount wins when both quarters have one
FROM #q1 AS q1
FULL OUTER JOIN #q2 AS q2
ON q1.EmployeeID = q2.EmployeeID
I am using this query to update a column with ascending values:
DECLARE @counter NUMERIC(10, 0)
SET @counter = 1400000
UPDATE SomeTable
SET @counter = SomeColumn = @counter + 1
The question is: how can I avoid creating duplicates? For example, the column may already contain 1400002. Normally it holds NULLs, but sometimes it doesn't. I could add
where SomeColumn is null
but this would not avoid duplicates. Any ideas?
Thanks
I am not sure whether this will help, but you can put your existing data into a temp table and then use that temp table to skip values that are already taken, like:
WHERE (@counter + 1) not in ( select SomeColumn from #temp)
If above is not correct then please explain your question a little more.
This worked for me in SQL Server 2008:
DECLARE @StartNumber int, @EndNumber int;
SET @StartNumber = 100;
SELECT @EndNumber = @StartNumber + COUNT(*) - 1 FROM SomeTable;
WITH numbers AS (
SELECT @StartNumber AS Value
UNION ALL
SELECT
Value + 1
FROM numbers
WHERE Value < @EndNumber
),
validnumbers AS (
SELECT
n.Value,
rownum = ROW_NUMBER() OVER (ORDER BY n.Value)
FROM numbers n
LEFT JOIN SomeTable t ON n.Value = t.Value
WHERE t.Value IS NULL
),
RowsToUpdate AS (
SELECT
Value,
rownum = ROW_NUMBER() OVER (ORDER BY Value)
FROM SomeTable
WHERE Value IS NULL
OR Value NOT IN (SELECT Value FROM numbers)
)
UPDATE r
SET Value = v.Value
FROM RowsToUpdate r
INNER JOIN validnumbers v ON v.rownum = r.rownum;
Basically, it implements the following steps:
1. Create a number table.
2. Exclude the numbers present in SomeTable.
3. Rank the rest of the rows.
4. Exclude the values from SomeTable that are present in the number table.
5. Rank the rest of the rows.
6. Update the ranked rows of SomeTable from the ranked number list.
Not sure how good this solution would be for big tables, though...
I am trying to find duplicate rows in my DB, like this:
SELECT emailid, COUNT(emailid) AS NumOccurrences
FROM users
GROUP BY emailid HAVING ( COUNT(emailid) > 1 )
This returns the emailid and the number of matches found. Now what I want to do is compare the ID column to another table I have and set a column there with the count.
The other table has a column named duplicates, which should contain the number of duplicates from the select. So let's say we have 3 rows with the same emailid. Currently the duplicates column would have a "3" in all 3 rows. What I want is a "2" in the first 2 and nothing or 0 in the last of the 3 matching ID rows.
Is this possible?
Update:
I managed to have a temporary table now, which looks like this:
mailid | rowcount | AmountOfDups
643921 | 1 | 3
643921 | 2 | 3
643921 | 3 | 3
Now, how could I decide that only the first 2 should be updated (by mailid) in the other table? The other table has mailid as well.
SELECT ...
ROW_NUMBER() OVER (PARTITION BY email ORDER BY emailid DESC) AS RN
FROM ...
...is a great starting point for such a problem. Never underestimate the power of ROW_NUMBER()!
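One way that starting point might be fleshed out, as a sketch only: it assumes SQL Server 2008 or later, a hypothetical temp table #users_demo, and that rows are grouped by emailid and ordered by a hypothetical id column. ROW_NUMBER gives each row its position within the group, and a windowed COUNT(*) gives the group size.

```sql
-- Hypothetical sample: three rows sharing one emailid, plus one unique row.
CREATE TABLE #users_demo (id int IDENTITY(1,1), emailid int);
INSERT INTO #users_demo (emailid) VALUES (643921), (643921), (643921), (100);

SELECT id,
       emailid,
       -- Every row except the last one in its group gets (group size - 1);
       -- the last row, and any row without duplicates, gets 0.
       CASE WHEN rn < cnt THEN cnt - 1 ELSE 0 END AS duplicates
FROM (
    SELECT id,
           emailid,
           ROW_NUMBER() OVER (PARTITION BY emailid ORDER BY id) AS rn,
           COUNT(*)     OVER (PARTITION BY emailid)             AS cnt
    FROM #users_demo
) AS ranked
ORDER BY id;

DROP TABLE #users_demo;
```

For the three 643921 rows this yields 2, 2 and 0, and 0 for the unique row. The same derived table could then be joined to the other table on mailid to drive an UPDATE.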
Using Sql Server 2005+ you could try something like (full example)
CREATE TABLE #Table (
ID INT IDENTITY(1,1),
Email VARCHAR(20)
)
INSERT INTO #Table (Email) SELECT 'a'
INSERT INTO #Table (Email) SELECT 'b'
INSERT INTO #Table (Email) SELECT 'c'
INSERT INTO #Table (Email) SELECT 'a'
INSERT INTO #Table (Email) SELECT 'b'
INSERT INTO #Table (Email) SELECT 'a'
; WITH Duplicates AS (
SELECT Email,
COUNT(ID) TotalDuplicates
FROM #Table
GROUP BY Email
HAVING COUNT(ID) > 1
)
, Counts AS (
SELECT t.ID,
ROW_NUMBER() OVER(PARTITION BY t.Email ORDER BY t.ID) EmailID,
d.TotalDuplicates
FROM #Table t INNER JOIN
Duplicates d ON t.Email = d.Email
)
SELECT ID,
CASE
WHEN EmailID = TotalDuplicates
THEN 0
ELSE TotalDuplicates - 1
END Dups
FROM Counts