where exists - all - group by? - tsql

I use SQL Server 2008 R2.
I have a weird problem, as follows. I have a table like the one shown in the sample data further down.
I need to write a query like this:
SELECT DISTINCT Field1
FROM MYTABLE
WHERE Field2 IN (96,102)
In this query, WHERE Field2 IN (96,102) gives me Field1 values that have 96 or 102 or both.
However, I would like to return only the Field1 values that have both 96 and 102 at the same time.
Any suggestions? Please keep it result oriented.

I have made a sqlfiddle for this..
create table a (id int, val int)
go
insert into a select 1, 22
insert into a select 1, 122
insert into a select 2, 22
insert into a select 3, 122
insert into a select 4, 22
insert into a select 4, 122
Then select like this:
select id
from a
where val in (22, 122)
group by id
having count(distinct val) = 2
EDIT: count(distinct val) is needed so that an id appearing twice with the same value (but never the other value) is not counted as a match.
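To see why the distinct count matters, consider a hypothetical extra id (5) that has 22 twice but never 122:
insert into a select 5, 22
insert into a select 5, 22
-- having count(id) > 1 would wrongly return id 5;
-- having count(distinct val) = 2 correctly excludes it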

EDIT:
Here's a sqlfiddle example (thanks to Mark Kremers):
http://sqlfiddle.com/#!3/df201/1
create table mytable (field1 int, field2 int)
go
insert into mytable values (199201, 84)
insert into mytable values (199201, 96)
insert into mytable values (199201, 102)
insert into mytable values (199201, 103)
insert into mytable values (581424, 96)
insert into mytable values (581424, 84)
insert into mytable values (581424, 106)
insert into mytable values (581424, 122)
insert into mytable values (687368, 79)
insert into mytable values (687368, 96)
insert into mytable values (687368, 102)
insert into mytable values (687368, 104)
insert into mytable values (687368, 106)
Here's the query:
select distinct a.field1 from
( select field1 from mytable where field2=96) a
inner join
( select field1 from mytable where field2=102) b
on a.field1 = b.field1
And here are the results:
FIELD1
199201
687368
Finally, here's a simplified version of the query (thanks to pst):
select distinct a.field1 from mytable a
inner join mytable b
on a.field1 = b.field1
where a.field2=96 and b.field2=102

Use a self-join? Not the tidiest, but I think it works well for 2 values:
SELECT *
FROM T R1
JOIN T R2 -- join table with itself
ON R1.F1 = R2.F1 -- where the first field is the same
WHERE R1.F2 = 96 AND R2.F2 = 102 -- and each has one of the required values
(T = Table, Rx = Relation Alias, Fx = Field)
If there can be an arbitrary number of values, this can be solved as follows:
CREATE TABLE #T (id int, val int)
GO
INSERT INTO #T (id, val)
VALUES
(1, 22), (1, 22), -- no, only 22 (but 2 records)
(2, 22), (2, 122), -- yes, both values (only)
(3, 122), -- no, only 122
(4, 22), (4,122), -- yes, both values ..
(4, 444), (4, null), -- and extra values
(5, 555) -- no, neither value
GO
-- Using DISTINCT over filtered results first, as
-- SQL Server 2008 does not support HAVING COUNT(DISTINCT F1, F2)
SELECT id
FROM (SELECT DISTINCT id, val
FROM #T
WHERE val IN (22, 122)) AS R1
GROUP BY id
HAVING COUNT(id) >= 2 -- or 3 or ..
GO
-- Or a similar variation, as can COUNT(DISTINCT ..)
-- in the SELECT of a GROUP BY
SELECT id
FROM (SELECT id, COUNT(DISTINCT val) as ct
FROM #T
WHERE val IN (22, 122)
GROUP BY id) AS R1
WHERE ct >= 2 -- or 3 or ..
GO
For larger IN (..) lists, say above 20 values, it may be advisable to use a separate table or table-value constructor and a JOIN, for performance reasons.
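As a sketch of that approach (the @required table variable is an assumption for illustration, reusing #T from above): join against the required values and demand that every one of them is matched.
DECLARE @required TABLE (val int)
INSERT INTO @required (val) VALUES (22), (122)

SELECT t.id
FROM #T AS t
JOIN @required AS r
  ON r.val = t.val         -- only required values survive the join
GROUP BY t.id
HAVING COUNT(DISTINCT t.val) = (SELECT COUNT(*) FROM @required)  -- all of them matched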

Try this variation of your original query:
SELECT DISTINCT Field1
FROM MYTABLE
WHERE rtrim(ltrim(cast(Field2 as varchar(20)))) IN ('96','102')

How to get values to dense table with columns of categories using PostgreSQL (crosstab)?

I have this toy example which gives me a sparse table of values separated into their different categories. I want a dense matrix instead, where each column is ordered individually.
drop table if exists temp_table;
create temp table temp_table(
rowid int
, category text
, score int
);
insert into temp_table values (0, 'cat1', 10);
insert into temp_table values (1, 'cat2', 21);
insert into temp_table values (2, 'cat3', 32);
insert into temp_table values (3, 'cat2', 23);
insert into temp_table values (4, 'cat2', 24);
insert into temp_table values (5, 'cat3', 35);
insert into temp_table values (6, 'cat1', 16);
insert into temp_table values (7, 'cat1', 17);
insert into temp_table values (8, 'cat2', 28);
insert into temp_table values (9, 'cat2', 29);
Which gives this temporary table:
rowid  category  score
0      cat1      10
1      cat2      21
2      cat3      32
3      cat2      23
4      cat2      24
5      cat3      35
6      cat1      16
7      cat1      17
8      cat2      28
9      cat2      29
Then I order the score values into different columns based on their category:
select "cat1", "cat2", "cat3"
from crosstab(
$$ select rowid, category, score from temp_table $$ -- as source_sql
, $$ select distinct category from temp_table order by category $$ -- as category_sql
) as (rowid int, "cat1" int, "cat2" int, "cat3" int)
That outputs:
cat1  cat2  cat3
10    null  null
null  21    null
null  null  32
null  23    null
null  24    null
null  null  35
16    null  null
17    null  null
null  28    null
null  29    null
But I would want the result of the query to be dense, like:
cat1  cat2  cat3
10    21    32
16    23    35
17    24    null
null  28    null
null  29    null
Maybe PostgreSQL's crosstab is not even the right tool for this, but it comes to mind first since it produces a sparse table close to the result I need.
This should work for the exact given example data and expected output.
select max(cat1), max(cat2), max(cat3)
from crosstab(
$$ select rank() over(partition by category order by rowid) as ranking,
rowid,
category,
score
from temp_table
order by rowid, category asc$$ -- as source_sql
, $$ select distinct category
from temp_table
order by category $$ -- as category_sql
) as (ranking int, rowid int, "cat1" int, "cat2" int, "cat3" int)
group by ranking
order by ranking asc
You can test the solution here - https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f198e40a18a282cc0d65fa6ecdf797cb
Edit:
Improvements made to your query to arrive at the solution:
In the source SQL query, I have ranked the category values based on the rowid order, which determines the order of the expected values, as per your requirement.
select rank() over(partition by category order by rowid) as ranking, rowid, category, score from temp_table order by rowid, category asc
In the outer query, I am effectively picking the max() value of each category for each of the rankings obtained in the source SQL query.
An alternative that avoids crosstab altogether uses row_number() and aggregate FILTER clauses:
with cte as (
select category, score, row_number() over (
partition by category order by score
) as r
from temp_table
)
select
sum(score) filter (where category = 'cat1') as cat1,
sum(score) filter (where category = 'cat2') as cat2,
sum(score) filter (where category = 'cat3') as cat3
from cte
group by r
order by r
;
If the number of columns is known and it is reasonably small, FILTER might be a better option than CROSSTAB, which requires an extension.
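For reference, crosstab lives in the tablefunc extension, which has to be enabled once per database before any of the crosstab queries above will run:
CREATE EXTENSION IF NOT EXISTS tablefunc;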

Converting Traditional IF EXIST UPDATE ELSE INSERT into MERGE is not working?

I am going to use MERGE to insert into or update a table depending on whether the row already exists. This is my query:
create table #t
(
id int,
name varchar(10)
)
insert into #t values(1,'a')
MERGE INTO #t t1
USING (SELECT id FROM #t WHERE ID = 2) t2 ON (t1.id = t2.id)
WHEN MATCHED THEN
UPDATE SET name = 'd', id = 3
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (2, 'b');
select * from #t;
The result is,
id name
1 a
I think it should be,
id name
1 a
2 b
You have your USING part slightly messed up; that's where you put what you want to match against (although in this case you're only using id):
create table #t
(
id int,
name varchar(10)
)
insert into #t values(1,'a')
MERGE INTO #t t1
USING (SELECT 2, 'b') AS t2 (id, name) ON (t1.id = t2.id)
WHEN MATCHED THEN
UPDATE SET name = 'd', id = 3
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (2, 'b');
select * from #t;
As Mikhail pointed out, your query in the USING clause doesn't contain any rows.
If you want to do an upsert, put the new data into the USING clause:
MERGE INTO #t t1
USING (SELECT 2 as id, 'b' as name) t2 ON (t1.id = t2.id) --This no longer has an artificial dependency on #t
WHEN MATCHED THEN
UPDATE SET name = t2.name
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (t2.id, t2.name);
This query won't return anything:
SELECT id FROM #t WHERE ID = 2
because there are no rows in the table with ID = 2, so there is nothing to merge into the table.
Besides, in the WHEN MATCHED clause you are updating the id field on which you are joining the tables; I think that's not allowed.
For each DML operation you have to commit (marking the end of a successful transaction); only then will you be able to see the latest data.
For example :
GO
BEGIN TRANSACTION;
GO
DELETE FROM HumanResources.JobCandidate
WHERE JobCandidateID = 13;
GO
COMMIT TRANSACTION;
GO

Select value from an enumerated list in PostgreSQL

I want to select from an enumeration that is not in the database.
E.g. SELECT id FROM my_table returns values like 1, 2, 3
I want to display 1 -> 'chocolate', 2 -> 'coconut', 3 -> 'pizza' etc. SELECT CASE works, but it is too verbose and hard to keep track of with many values. I am thinking of something like
SELECT id, array['chocolate','coconut','pizza'][id] FROM my_table
But I couldn't get the array syntax to work. Is there an easy solution, as a plain query rather than a plpgsql script or something like that?
with food (fid, name) as (
values
(1, 'chocolate'),
(2, 'coconut'),
(3, 'pizza')
)
select t.id, f.name
from my_table t
join food f on f.fid = t.id;
or without a CTE (but using the same idea):
select t.id, f.name
from my_table t
join (
values
(1, 'chocolate'),
(2, 'coconut'),
(3, 'pizza')
) f (fid, name) on f.fid = t.id;
This is the correct syntax:
SELECT id, (array['chocolate','coconut','pizza'])[id] FROM my_table
But you should really create a reference table holding those values.
What about creating another table that enumerates all the cases, and joining to it?
CREATE TABLE table_case
(
case_id bigserial NOT NULL,
case_name character varying,
CONSTRAINT table_case_pkey PRIMARY KEY (case_id)
)
WITH (
OIDS=FALSE
);
and when you select from your table:
SELECT id, case_name FROM my_table
inner join table_case on case_id = my_table.id;

SQL GROUP BY HAVING issue

I have two tables of records and I need to find all of the matches between them. The tables use different primary key identifiers, but the data points are exactly the same. I need a fast query that shows me which records are duplicated from the first table to the second. Here is an example of what I am trying to do:
CREATE TABLE #Table1 (ID INT, Value INT)
CREATE TABLE #Table2 (ID INT, Value INT)
INSERT INTO #Table1 VALUES (1, 500)
INSERT INTO #Table1 VALUES (2, 500)
INSERT INTO #Table2 VALUES (3, 500)
INSERT INTO #Table2 VALUES (4, 500)
SELECT MAX(x.T1ID)
,MAX(x.T2ID)
FROM (
SELECT T1ID = t1.ID
,T2ID = 0
,t1.Value
FROM #Table1 t1
UNION ALL
SELECT T1ID = 0
,T2ID = t2.ID
,t2.Value
FROM #Table2 t2
) x
GROUP BY x.Value
HAVING COUNT(*) >= 2
The problem with this code is that it returns record 2 in table 1 correlated to record 4 in table 2. I really need it to return record 1 in table 1 correlated to record 3 in table 2. I tried the following:
SELECT MIN(x.T1ID)
,MIN(x.T2ID)
FROM (
SELECT T1ID = t1.ID
,T2ID = 0
,t1.Value
FROM #Table1 t1
UNION ALL
SELECT T1ID = 0
,T2ID = t2.ID
,t2.Value
FROM #Table2 t2
) x
GROUP BY x.Value
HAVING COUNT(*) >= 2
This code does not work either. It returns 0,0.
Is there a way to return the MIN value greater than 0 for both tables?
Might answer my own question. This seems to work. Are there any reasons why I would not do this?
SELECT MIN(t1.ID)
,MIN(t2.ID)
FROM #Table1 t1
INNER JOIN #Table2 t2 ON t1.Value = t2.Value
GROUP BY t1.Value
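As a side note, the literal "MIN greater than 0" question from the earlier attempt can also be answered with NULLIF, since aggregate functions ignore NULLs; a sketch against the UNION ALL query above:
SELECT MIN(NULLIF(x.T1ID, 0))   -- NULLIF turns the 0 placeholders into NULLs,
,MIN(NULLIF(x.T2ID, 0))         -- so MIN sees only the real IDs
FROM (
SELECT T1ID = t1.ID, T2ID = 0, t1.Value FROM #Table1 t1
UNION ALL
SELECT T1ID = 0, T2ID = t2.ID, t2.Value FROM #Table2 t2
) x
GROUP BY x.Value
HAVING COUNT(*) >= 2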
If you want to see the records in table1 that have matches in table2 then
select *
from #Table1 T1
where exists (select * from #Table2 T2
where T1.Value = T2.Value
-- you would put the complete join clause that defines a match here
)

TSQL: Remove duplicates based on max(date)

I am searching for a query that selects the maximum date (a datetime column) for each id and keeps its id and row_id; the goal is to DELETE all of the other rows from the source table.
Source Data
id date row_id(unique)
1 11/11/2009 1
1 12/11/2009 2
1 13/11/2009 3
2 1/11/2009 4
Expected Survivors
1 13/11/2009 3
2 1/11/2009 4
What query would I need to achieve the results I am looking for?
Tested on PostgreSQL:
delete from yourtable where (id, date) not in (select id, max(date) from yourtable group by id);
There are various ways of doing this, but the basic idea is the same:
- Identify the rows you want to keep
- Compare each row in your table to the ones you want to keep
- Delete any that don't match
DELETE
[source]
FROM
yourTable AS [source]
LEFT JOIN
yourTable AS [keep]
ON [keep].id = [source].id
AND [keep].date = (SELECT MAX(date) FROM yourTable WHERE id = [keep].id)
WHERE
[keep].id IS NULL
DELETE
[yourTable]
FROM
[yourTable]
LEFT JOIN
(
SELECT id, MAX(date) AS date FROM yourTable GROUP BY id
)
AS [keep]
ON [keep].id = [yourTable].id
AND [keep].date = [yourTable].date
WHERE
[keep].id IS NULL
DELETE
[source]
FROM
yourTable AS [source]
WHERE
[source].row_id != (SELECT TOP 1 row_id FROM yourTable WHERE id = [source].id ORDER BY date DESC)
DELETE
[source]
FROM
yourTable AS [source]
WHERE
EXISTS (SELECT id FROM yourTable GROUP BY id HAVING id = [source].id AND MAX(date) != [source].date)
Because you are using SQL Server 2000, you're not able to use the ROW_NUMBER() OVER technique to number the rows and identify the top row for each unique id.
So, your proposed technique is to use a datetime column to get the top 1 row to remove duplicates. That might work, but there is a possibility that you might still get duplicates having the same datetime value. But that's easy enough to check for.
First check the assumption that all rows are unique based on the id and date columns:
CREATE TABLE #TestTable (rowid INT IDENTITY(1,1), thisid INT, thisdate DATETIME)
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '11/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/12/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
SELECT COUNT(*) AS thiscount
FROM #TestTable
GROUP BY thisid, thisdate
HAVING COUNT(*) > 1
This example returns a value of 2 - indicating that you will still end up with duplicates even after using the date column to remove duplicates. If you return 0, then you have proven that your proposed technique will work.
When de-duping production data, I think one should take some precautions and test before and after. You should create a table to hold the rows you plan to remove so you can recover them easily if you need to after the delete statement has been executed.
Also, it's a good idea to know beforehand how many rows you plan to remove so you can verify the count before and after - and you can gauge the magnitude of the delete operation. Based on how many rows will be affected, you can plan when to run the operation.
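For example, a minimal sketch of those precautions (assuming the yourTable/id/date names used above): copy the doomed rows aside and check the count before deleting.
-- Capture the rows that the DELETE would remove, so they can be restored if needed
SELECT [source].*
INTO #RemovedRows
FROM yourTable AS [source]
WHERE [source].date <> (SELECT MAX(date) FROM yourTable WHERE id = [source].id)

-- Verify the magnitude of the operation before running the real DELETE
SELECT COUNT(*) AS rows_to_remove FROM #RemovedRows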
To test before the de-duping process, find the occurrences.
-- Get occurrences of duplicates
SELECT thisid, COUNT(*) AS thiscount
FROM
#TestTable
GROUP BY thisid
HAVING COUNT(*) > 1
ORDER BY thisid
That gives you the rows with more than one row with the same id. Capture the rows from this query into a temporary table and then run a query using the SUM to get the total number of rows that are not unique based on your key.
To get the number of rows you plan to delete, you need the count of rows that are duplicate based on your unique key, and the number of distinct rows based on your unique key. You subtract the distinct rows from the count of occurrences. All that is pretty straightforward - so I'll leave you to it.
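For instance, a sketch of that arithmetic against #TestTable above: the total number of rows in duplicated groups, minus the number of those groups, gives the rows to delete.
SELECT SUM(cnt) - COUNT(*) AS rows_to_delete   -- occurrences minus distinct keys
FROM (SELECT thisid, COUNT(*) AS cnt
FROM #TestTable
GROUP BY thisid
HAVING COUNT(*) > 1) AS dups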
Try this
declare #t table (id int, dt DATETIME,rowid INT IDENTITY(1,1))
INSERT INTO #t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO #t (id,dt) VALUES (2, '11/01/2009')
Query:
delete from #t where rowid not in(
select t.rowid from #t t
inner join(
select id, MAX(dt) as maxdate
from #t
group by id) X
on t.id = X.id and t.dt = X.maxdate )
select * from #t
Output:
id dt rowid
1 2009-11-13 00:00:00.000 3
2 2009-11-01 00:00:00.000 4
delete from temp where row_id not in (
select t.row_id from temp t
right join
(select id,MAX(dt) as dt from temp group by id) d
on t.dt = d.dt and t.id = d.id)
I have tested this answer..
INSERT INTO #t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO #t (id,dt) VALUES (2, '11/01/2009')
select * from #t
;WITH T AS (
select dense_rank() over (partition by id order by dt desc) AS NO, dt, id, rowid
from #t
)
DELETE T WHERE NO > 1