I need to work out how to do the following.
I have two columns.
One is the Transaction Reference, which is a unique number; in my example I have 4, 5 and 6.
The other is an Analysis Code 9 field. This will only ever be A, O or N.
When an Analysis Code 9 value does not exist for a Transaction Reference (in this case O and A are missing), I need to create a row for it per Transaction Reference. This is because I then need to use a combination of those to output a file in SSIS, and the customer requires a blank file even if, say, Analysis Code O is not available.
So I would expect 6 rows to be created in this example:
An O row for each of batches 4, 5 and 6.
An A row for each of batches 4, 5 and 6.
You basically want to find all the distinct transaction references and cross them with all possible codes. You then need to filter this to find out the missing pairs.
The LEFT JOIN at the end will do the filtering for you.
CREATE TABLE [#trans]
(
[Transaction Reference] int,
[Analysis 9 Code] char(1)
)
CREATE TABLE [#codes]
(
[Code] char(1)
)
-- Create a table with all potential code values
INSERT INTO [#codes] ([Code]) VALUES ('A')
INSERT INTO [#codes] ([Code]) VALUES ('N')
INSERT INTO [#codes] ([Code]) VALUES ('O')
-- Insert your dummy data
INSERT INTO [#trans] ([Transaction Reference], [Analysis 9 Code]) VALUES (6, 'N')
INSERT INTO [#trans] ([Transaction Reference], [Analysis 9 Code]) VALUES (6, 'N')
INSERT INTO [#trans] ([Transaction Reference], [Analysis 9 Code]) VALUES (4, 'N')
INSERT INTO [#trans] ([Transaction Reference], [Analysis 9 Code]) VALUES (4, 'N')
INSERT INTO [#trans] ([Transaction Reference], [Analysis 9 Code]) VALUES (5, 'N')
INSERT INTO [#trans] ([Transaction Reference], [Analysis 9 Code]) VALUES (6, 'N')
SELECT [ExistingRefs].[Transaction Reference] AS [Transaction Reference],
[#codes].[Code] AS [Analysis 9 Code]
FROM
(
SELECT DISTINCT([Transaction Reference]) [Transaction Reference] FROM [#trans]
) [ExistingRefs]
CROSS JOIN [#codes]
LEFT JOIN [#trans] ON [ExistingRefs].[Transaction Reference] = [#trans].[Transaction Reference]
AND [#codes].[Code] = [#trans].[Analysis 9 Code]
WHERE [#trans].[Analysis 9 Code] IS NULL
DROP TABLE [#trans]
DROP TABLE [#codes]
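The same cross-join-then-anti-join idea can be tried without a SQL Server instance. What follows is only a minimal sketch, run against an in-memory SQLite database through Python's standard sqlite3 module; the table names trans and codes, the column names ref and code, and the helper missing_pairs are made up for illustration and are not part of the original schema.

```python
import sqlite3

def missing_pairs(rows, codes=("A", "N", "O")):
    """Return the (reference, code) pairs that have no row in `rows`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE trans (ref INTEGER, code TEXT)")
    con.execute("CREATE TABLE codes (code TEXT)")
    con.executemany("INSERT INTO trans VALUES (?, ?)", rows)
    con.executemany("INSERT INTO codes VALUES (?)", [(c,) for c in codes])
    result = con.execute(
        """
        SELECT refs.ref, codes.code
        FROM (SELECT DISTINCT ref FROM trans) AS refs
        CROSS JOIN codes                      -- every reference x every code
        LEFT JOIN trans
               ON trans.ref = refs.ref
              AND trans.code = codes.code
        WHERE trans.code IS NULL              -- keep only pairs with no match
        ORDER BY refs.ref, codes.code
        """
    ).fetchall()
    con.close()
    return result

# Same dummy data as the T-SQL example: only code N exists.
sample = [(6, "N"), (6, "N"), (4, "N"), (4, "N"), (5, "N"), (6, "N")]
pairs = missing_pairs(sample)
print(pairs)
```

With the dummy data this yields six pairs, an A and an O for each of 4, 5 and 6, which matches the six rows the question expects.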
Please try the below.
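-- For each reference, keep only the codes that this reference does not already have
select distinct t.[Transaction Reference], c.Code
from #trans t
cross apply (select Code from #codes) c
where not exists (select 1 from #trans tt where tt.[Transaction Reference] = t.[Transaction Reference] and tt.[Analysis 9 Code] = c.Code)
order by t.[Transaction Reference]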
Thanks
Apologies if my question seems to be naive:
I cannot get my head around the 2 statements below, can someone please explain the difference:
OUTPUT $ACTION, INSERTED.BuildRequestID, ..... and
PRINT @@ROWCOUNT
Apparently, both can be used to print something to the window. With OUTPUT in the example above, the records that have been inserted will be displayed. PRINT @@ROWCOUNT returns the number of rows affected by the last executed statement in the batch, so if that statement was an insert, will it show the inserted records?
Thank you,
In its simplest terms, OUTPUT will give you the actual records affected by a DML statement (INSERT, UPDATE, DELETE, MERGE), whereas @@ROWCOUNT will just tell you how many rows were affected by the previous statement (not limited to DML).
This is probably easiest understood with a working example that you can run yourself and see both in action:
IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
DROP TABLE #T;
-- CHECK @@ROWCOUNT
DECLARE @RowCountFromDropTable INT = @@ROWCOUNT;
-- CREATE A TABLE
CREATE TABLE #T (ID INT NOT NULL PRIMARY KEY, Col CHAR(1) NOT NULL);
-- INSERT SOME VALUES AND CHECK THE OUTPUT
INSERT #T (ID, Col)
OUTPUT inserted.*
VALUES (1, 'A'), (2, 'B'), (3, 'C');
-- CHECK @@ROWCOUNT
DECLARE @RowCountFromInsert INT = @@ROWCOUNT;
-- DELETE A VALUE AND INSPECT THE DELETED RECORD WITH OUTPUT
DELETE #T
OUTPUT deleted.*
WHERE ID = 3;
-- CHECK @@ROWCOUNT
DECLARE @RowCountFromDelete INT = @@ROWCOUNT;
-- UPDATE A RECORD AND VIEW BEFORE AND AFTER VALUES
UPDATE #T
SET Col = 'X'
OUTPUT inserted.ID AS ID,
inserted.Col AS UpdatedTo,
deleted.Col AS UpdatedFrom
WHERE ID = 2;
-- CHECK @@ROWCOUNT
DECLARE @RowCountFromUpdate INT = @@ROWCOUNT;
-- USE MERGE, AND CAPTURE ACTION:
MERGE #T AS t
USING (VALUES (2, 'B'), (3, 'C')) AS s (ID, Col)
ON s.ID = t.ID
WHEN NOT MATCHED THEN INSERT (ID, Col) VALUES (s.ID, s.Col)
WHEN MATCHED THEN UPDATE SET Col = s.Col
WHEN NOT MATCHED BY SOURCE THEN DELETE
OUTPUT $Action AS DMLAction,
inserted.ID AS InsertedID,
inserted.Col AS InsertedCol,
deleted.ID AS DeletedID,
deleted.Col AS DeletedCol;
-- CHECK @@ROWCOUNT
DECLARE @RowCountFromMerge INT = @@ROWCOUNT;
SELECT RowCountFromDropTable = @RowCountFromDropTable,
RowCountFromInsert = @RowCountFromInsert,
RowCountFromDelete = @RowCountFromDelete,
RowCountFromUpdate = @RowCountFromUpdate,
RowCountFromMerge = @RowCountFromMerge;
The recordsets output from each of the DML statements are:
INSERT
ID  Col
-------
1   A
2   B
3   C
DELETE
ID  Col
-------
3   C
UPDATE
ID  UpdatedTo  UpdatedFrom
---------------------------
2   X          B
MERGE
DMLAction  InsertedID  InsertedCol  DeletedID  DeletedCol
------------------------------------------------------------
INSERT     3           C            NULL       NULL
DELETE     NULL        NULL         1          A
UPDATE     2           B            2          X
INSPECT @@ROWCOUNTS
RowCountFromDropTable  RowCountFromInsert  RowCountFromDelete  RowCountFromUpdate  RowCountFromMerge
-----------------------------------------------------------------------------------------------------
0                      3                   1                   1                   3
A quick point on some wording in the question too: you cannot use OUTPUT directly to print something to the window; it returns records much like a SELECT statement. @@ROWCOUNT can be used like any scalar function, so you can use it in consecutive statements. You could do something like this:
SELECT TOP (1) *
FROM (VALUES (1), (2), (3)) AS t (ID);
SELECT TOP (@@ROWCOUNT + 1) *
FROM (VALUES (1), (2), (3)) AS t (ID);
SELECT TOP (@@ROWCOUNT + 1) *
FROM (VALUES (1), (2), (3)) AS t (ID);
Which returns 1, then 1,2, then 1,2,3 respectively. I have no idea why you would want to do this, but it demonstrates the scope of @@ROWCOUNT a bit better than the above, and how it can be used elsewhere.
I want to write triggers for insert, update and delete. I have one table named (tbl_rank) which has a primary key (ID).
ID Name Rank
1 A 1
2 B 2
3 C 3
4 D 4
5 E 5
Now I want to insert a new rank, but the conditions are:
1) if I enter 6 it will be 6
2) if I enter 7 it should also be 6 (I mean, kept in sequence)
3) if I enter 2 then the entered rank will be 2, the existing 2 will become 3, and so on
For the delete trigger:
1) if I delete 5 the ranks should be 1 to 4
2) if I delete 2 the ranks would be rearranged: 3 should become 2, 4 would become 3, and so on
For the update trigger:
1) if I update 3 to 5 then 4 would become 3 and 5 would become 4
2) if I update 5 to 3 then 3 would become 4 and 4 would become 5
I wrote the insert and delete triggers and they work fine, but for update I am getting inconsistent results.
Can you not just have tbl_rank as a view? Then you don't need any triggers. To rank the rows in the view you can use a windowed function: row_number() over (order by Id).
How is the initial update performed? If you know it is an update, then you need to do a delete and insert just for the affected range. E.g. changing 3 to 5: you delete the records for ranks 3 to 5, then insert those 3 records again with their new ranks. An update statement essentially does this anyway.
There is an assumption that id is not an auto-identity column.
CREATE TRIGGER trg_tbl_rank
ON tbl_rank
INSTEAD OF INSERT,DELETE,UPDATE
AS
BEGIN
SET NOCOUNT ON;
DECLARE @v_deleted_rank INT;
DECLARE @v_inserted_rank INT;
DECLARE @v_max_rank INT;
SELECT @v_deleted_rank = COALESCE(rank, 0) FROM deleted;
SELECT @v_inserted_rank = COALESCE(rank, 0) FROM inserted;
SELECT @v_max_rank = COALESCE(MAX(rank), 0) FROM tbl_rank;
IF @v_deleted_rank > 0
BEGIN
DELETE FROM tbl_rank
WHERE id = (SELECT id FROM deleted);
UPDATE tbl_rank
SET rank = rank - 1
WHERE rank > @v_deleted_rank;
END
IF @v_inserted_rank > 0
BEGIN
IF @v_inserted_rank <= @v_max_rank
BEGIN
UPDATE tbl_rank
SET rank = rank + 1
WHERE rank >= @v_inserted_rank;
INSERT INTO tbl_rank (id, name, rank)
SELECT id, name, @v_inserted_rank FROM inserted;
END
ELSE
INSERT INTO tbl_rank (id, name, rank)
SELECT id, name, @v_max_rank + 1 FROM inserted;
END
END
GO
Here are queries to test:
INSERT INTO tbl_rank (id, name, rank) VALUES (1, 'A', 1);
INSERT INTO tbl_rank (id, name, rank) VALUES (2, 'B', 2);
INSERT INTO tbl_rank (id, name, rank) VALUES (3, 'C', 3);
INSERT INTO tbl_rank (id, name, rank) VALUES (4, 'D', 4);
INSERT INTO tbl_rank (id, name, rank) VALUES (5, 'E', 5);
SELECT * FROM tbl_rank;
INSERT INTO tbl_rank (id, name, rank) VALUES (6, 'F', 7);
SELECT * FROM tbl_rank;
INSERT INTO tbl_rank (id, name, rank) VALUES (7, 'G', 2);
SELECT * FROM tbl_rank;
DELETE FROM tbl_rank WHERE rank = 7;
SELECT * FROM tbl_rank;
DELETE FROM tbl_rank WHERE rank = 2;
SELECT * FROM tbl_rank;
UPDATE tbl_rank SET rank = 5 WHERE rank = 3;
SELECT * FROM tbl_rank;
UPDATE tbl_rank SET rank = 3 WHERE rank = 5;
SELECT * FROM tbl_rank;
TRUNCATE TABLE tbl_rank;
I want to write trigger for insert update and Delete I have one table named(tbl_rank)which have primary key(ID)
Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Learn how to follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect.
This is minimal behavior on SQL forums. Putting “tbl_” on table name is a classic design flaw called “tbling” and the column names are violations of ISO-11179 rules, too. Now we have to guess at keys, data types, etc. Here is my guess and clean up.
CREATE TABLE Prizes
(prize_id INTEGER NOT NULL PRIMARY KEY,
prize_name CHAR(1) NOT NULL,
prize_rank INTEGER NOT NULL);
INSERT INTO Prizes
VALUES
(1, 'A', 1),
(2, 'B', 2),
(3, 'C', 3),
(4, 'D', 4),
(5, 'E', 5);
Why triggers? An RDBMS has virtual tables and columns. This is not a deck of punch cards or a magnetic tape file. A VIEW is always current and correct.
CREATE VIEW Prize_List
AS
SELECT prize_id, prize_name,
ROW_NUMBER() OVER (ORDER BY prize_id)
AS prize_rank
FROM Prizes;
But it might be better to drop the prize_id column completely and re-arrange the display order based on the prize_rank column:
CREATE TABLE Prizes
(prize_name CHAR(1) NOT NULL,
prize_rank INTEGER NOT NULL PRIMARY KEY);
Now use procedures to manipulate the table as needed.
CREATE PROCEDURE Swap_Prize_Ranks (@old_prize_rank INTEGER, @new_prize_rank INTEGER)
AS
UPDATE Prizes
SET prize_rank
= CASE prize_rank
WHEN @old_prize_rank
THEN @new_prize_rank
ELSE prize_rank + SIGN(@old_prize_rank - @new_prize_rank)
END
WHERE prize_rank BETWEEN @old_prize_rank AND @new_prize_rank
OR prize_rank BETWEEN @new_prize_rank AND @old_prize_rank;
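To see what the CASE expression does to each row, here is a small trace in plain Python. It is only a sketch of the arithmetic, not of the database behaviour; the helper names sign and move_rank are invented for this illustration.

```python
def sign(n):
    """Mirror SQL's SIGN(): -1, 0 or 1."""
    return (n > 0) - (n < 0)

def move_rank(ranks, old_rank, new_rank):
    """Apply the UPDATE's CASE expression to every rank in the list."""
    lo, hi = min(old_rank, new_rank), max(old_rank, new_rank)
    moved = []
    for r in ranks:
        if not lo <= r <= hi:
            moved.append(r)              # outside the WHERE range: untouched
        elif r == old_rank:
            moved.append(new_rank)       # the row being moved
        else:
            # rows in between shift one step towards the vacated slot
            moved.append(r + sign(old_rank - new_rank))
    return moved

print(move_rank([1, 2, 3, 4, 5], 3, 5))  # [1, 2, 5, 3, 4]
print(move_rank([1, 2, 3, 4, 5], 5, 3))  # [1, 2, 4, 5, 3]
```

Moving 3 to 5 turns 4 into 3 and 5 into 4, and moving 5 to 3 turns 3 into 4 and 4 into 5, which are the two update cases the question asks for. The ranks stay a gap-free 1..n sequence either way.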
When you want to drop a few rows, remember to close the gaps with this:
CREATE PROCEDURE Close_Prize_Gaps
AS
UPDATE Prizes
SET prize_rank
= (SELECT COUNT (P1.prize_rank)
FROM Prizes AS P1
WHERE P1.prize_rank <= Prizes.prize_rank);
I use SQL Server 2008 R2.
I have a weird problem as following. I have a table as shown in
I need to write such a query like:
SELECT DISTINCT Field1
FROM MYTABLE
WHERE Field2 IN (96,102)
In this query, WHERE Field2 IN (96,102) gives me rows with 96 or 102 or both!
Moreover, I would like to return only the Field1 values that have both 96 and 102 at the same time.
Are there any suggestions? Please write result-oriented answers...
I have made a sqlfiddle for this..
create table a (id int, val int)
go
insert into a select 1, 22
insert into a select 1, 122
insert into a select 2, 22
insert into a select 3, 122
insert into a select 4, 22
insert into a select 4, 122
then select like this
select count(distinct id), id
from a
where val in (22, 122)
group by id
having count(id) > 1
EDIT: count(distinct id) will only show distinct counts..
EDIT:
Here's a sqlfiddle example (thanks to Mark Kremers):
http://sqlfiddle.com/#!3/df201/1
create table mytable (field1 int, field2 int)
go
insert into mytable values (199201, 84)
insert into mytable values (199201, 96)
insert into mytable values (199201, 102)
insert into mytable values (199201, 103)
insert into mytable values (581424, 96)
insert into mytable values (581424, 84)
insert into mytable values (581424, 106)
insert into mytable values (581424, 122)
insert into mytable values (687368, 79)
insert into mytable values (687368, 96)
insert into mytable values (687368, 102)
insert into mytable values (687368, 104)
insert into mytable values (687368, 106)
Here's the query:
select distinct a.field1 from
( select field1 from mytable where field2=96) a
inner join
( select field1 from mytable where field2=102) b
on a.field1 = b.field1
And here are the results:
FIELD1
199201
687368
Finally, here's a simplified version of the query (thanks to pst):
select distinct a.field1 from mytable a
inner join mytable b
on a.field1 = b.field1
where a.field2=96 and b.field2=102
Use a self-join? Not the most tidy, but I think it works well for 2 values
SELECT *
FROM T R1
JOIN T R2 -- join table with itself
ON R1.F1 = R2.F1 -- where the first field is the same
WHERE R1.F2 = 96 AND R2.F2 = 102 -- and each has one of the required values
(T = Table, Rx = Relation Alias, Fx = Field)
If there can be an arbitrary number of required values, this can be solved as
CREATE TABLE #T (id int, val int)
GO
INSERT INTO #T (id, val)
VALUES
(1, 22), (1, 22), -- no, only 22 (but 2 records)
(2, 22), (2, 122), -- yes, both values (only)
(3, 122), -- no, only 122
(4, 22), (4,122), -- yes, both values ..
(4, 444), (4, null), -- and extra values
(5, 555) -- no, neither value
GO
-- Using DISTINCT over filtered results first, as
-- SQL Server 2008 does not support HAVING COUNT(DISTINCT F1, F2)
SELECT id
FROM (SELECT DISTINCT id, val
FROM #T
WHERE val IN (22, 122)) AS R1
GROUP BY id
HAVING COUNT(id) >= 2 -- or 3 or ..
GO
-- Or a similar variation, as you can use COUNT(DISTINCT ..)
-- in the SELECT of a GROUP BY
SELECT id
FROM (SELECT id, COUNT(DISTINCT val) as ct
FROM #T
WHERE val IN (22, 122)
GROUP BY id) AS R1
WHERE ct >= 2 -- or 3 or ..
GO
For larger IN (..) sizes, say above 20 values, it may be advisable to use a separate table or table-value and a JOIN for performance reasons.
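As a rough illustration of the separate-table-and-JOIN idea, the sketch below puts the required values in their own table and compares each id's distinct matches against that table's row count. It runs on an in-memory SQLite database via Python's sqlite3 module rather than on SQL Server, and the table name wanted and the helper ids_with_all are assumptions made for this example.

```python
import sqlite3

def ids_with_all(rows, wanted):
    """Return the ids that have every value in `wanted` at least once."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
    con.execute("CREATE TABLE wanted (val INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    con.executemany("INSERT INTO wanted VALUES (?)", [(v,) for v in set(wanted)])
    found = con.execute(
        """
        SELECT t.id
        FROM t
        JOIN wanted ON wanted.val = t.val     -- replaces the IN (..) list
        GROUP BY t.id
        HAVING COUNT(DISTINCT t.val) = (SELECT COUNT(*) FROM wanted)
        ORDER BY t.id
        """
    ).fetchall()
    con.close()
    return [row[0] for row in found]

# Same sample rows as the #T example above.
rows = [(1, 22), (1, 22), (2, 22), (2, 122), (3, 122),
        (4, 22), (4, 122), (4, 444), (4, None), (5, 555)]
print(ids_with_all(rows, [22, 122]))
```

Counting distinct values means the duplicated (1, 22) rows do not make id 1 qualify, and adding another required value only means inserting another row into the wanted table.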
Try from your original query:
SELECT DISTINCT Field1
FROM MYTABLE
WHERE rtrim(ltrim(cast(Field2 as varchar))) IN ('96','102')
Let me frame my question ....
I have say
Name
A
B
C
A
D
B
What I want is
ID Name
1 A
2 B
3 C
4 A
5 D
6 B
If I write
SELECT name, (SELECT COUNT(*) FROM #t AS i2 WHERE i2.Name <= i1.Name) As rn FROM #t AS i1
it will work fine if all the names are distinct/unique...What if they are not(as in this example)
Even NEWID() does not make the trick as it varies overtime?
I am using sql server 2000...
Please help
Here are 2 ways of solving it
1.
DECLARE @t TABLE ([ID] [int] IDENTITY(1,1), name CHAR)
INSERT @t VALUES ('b')
INSERT @t VALUES ('a')
INSERT @t VALUES ('c')
INSERT @t VALUES ('b')
SELECT * FROM @t
2.
-- Note: ROW_NUMBER() needs SQL Server 2005 or later, so on SQL Server 2000 use option 1
DECLARE @t2 TABLE (name CHAR)
INSERT @t2 (name) VALUES ('b')
INSERT @t2 (name) VALUES ('a')
INSERT @t2 (name) VALUES ('c')
INSERT @t2 (name) VALUES ('b')
SELECT ID = ROW_NUMBER() OVER (ORDER BY b), name
FROM (SELECT name, null b FROM @t2) temp
I am searching for a query to select the maximum date (a datetime column) and keep its id and row_id. The desire is to DELETE the rows in the source table.
Source Data
id date row_id(unique)
1 11/11/2009 1
1 12/11/2009 2
1 13/11/2009 3
2 1/11/2009 4
Expected Survivors
1 13/11/2009 3
2 1/11/2009 4
What query would I need to achieve the results I am looking for?
Tested on PostgreSQL:
delete from table where (id, date) not in (select id, max(date) from table group by id);
There are various ways of doing this, but the basic idea is the same:
- Identify the rows you want to keep
- Compare each row in your table to the ones you want to keep
- Delete any that don't match
-- Option 1: a row survives only if it is itself the latest row for its id
DELETE
[source]
FROM
yourTable AS [source]
LEFT JOIN
yourTable AS [keep]
ON [keep].row_id = [source].row_id
AND [keep].date = (SELECT MAX(date) FROM yourTable WHERE id = [keep].id)
WHERE
[keep].id IS NULL
DELETE
[yourTable]
FROM
[yourTable]
LEFT JOIN
(
SELECT id, MAX(date) AS date FROM yourTable GROUP BY id
)
AS [keep]
ON [keep].id = [yourTable].id
AND [keep].date = [yourTable].date
WHERE
[keep].id IS NULL
DELETE
[source]
FROM
yourTable AS [source]
WHERE
[source].row_id != (SELECT TOP 1 row_id FROM yourTable WHERE id = [source].id ORDER BY date DESC)
DELETE
[source]
FROM
yourTable AS [source]
WHERE
NOT EXISTS (SELECT id FROM yourTable GROUP BY id HAVING id = [source].id AND MAX(date) = [source].date)
Because you are using SQL Server 2000, you're not able to use the ROW_NUMBER() OVER technique of setting up a sequence to identify the top row for each unique id.
So, your proposed technique is to use a datetime column to get the top 1 row to remove duplicates. That might work, but there is a possibility that you might still get duplicates having the same datetime value. But that's easy enough to check for.
First check the assumption that all rows are unique based on the id and date columns:
CREATE TABLE #TestTable (rowid INT IDENTITY(1,1), thisid INT, thisdate DATETIME)
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '11/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/12/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
SELECT COUNT(*) AS thiscount
FROM #TestTable
GROUP BY thisid, thisdate
HAVING COUNT(*) > 1
This example returns a value of 2 - indicating that you will still end up with duplicates even after using the date column to remove duplicates. If the query returns no rows, then you have shown that your proposed technique will work.
When de-duping production data, I think one should take some precautions and test before and after. You should create a table to hold the rows you plan to remove so you can recover them easily if you need to after the delete statement has been executed.
Also, it's a good idea to know beforehand how many rows you plan to remove so you can verify the count before and after - and you can gauge the magnitude of the delete operation. Based on how many rows will be affected, you can plan when to run the operation.
To test before the de-duping process, find the occurrences.
-- Get occurrences of duplicates
SELECT COUNT(*) AS thiscount
FROM
#TestTable
GROUP BY thisid
HAVING COUNT(*) > 1
ORDER BY thisid
That gives you the rows with more than one row with the same id. Capture the rows from this query into a temporary table and then run a query using the SUM to get the total number of rows that are not unique based on your key.
To get the number of rows you plan to delete, you need the count of rows that are duplicate based on your unique key, and the number of distinct rows based on your unique key. You subtract the distinct rows from the count of occurrences. All that is pretty straightforward - so I'll leave you to it.
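For anyone who wants to rehearse the count, back up, delete, verify sequence first, here is one possible sketch. It uses an in-memory SQLite database through Python's sqlite3 module instead of SQL Server, the table names src and removed are invented, and the dates from the question are rewritten in ISO format so that text comparison orders them correctly. Rows tied on the latest date would all survive under this rule.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (row_id INTEGER PRIMARY KEY, id INTEGER, dt TEXT)")
con.executemany(
    "INSERT INTO src (id, dt) VALUES (?, ?)",
    [(1, "2009-11-11"), (1, "2009-11-12"), (1, "2009-11-13"), (2, "2009-11-01")],
)

# Rows older than the latest date for their id are the ones to remove.
doomed = "SELECT s.row_id FROM src AS s WHERE s.dt < (SELECT MAX(dt) FROM src WHERE id = s.id)"

# Know the numbers before touching anything.
total_before = con.execute("SELECT COUNT(*) FROM src").fetchone()[0]
planned = con.execute(f"SELECT COUNT(*) FROM ({doomed})").fetchone()[0]

# Keep a copy of the doomed rows so they can be restored if needed.
con.execute(f"CREATE TABLE removed AS SELECT * FROM src WHERE row_id IN ({doomed})")
deleted = con.execute(f"DELETE FROM src WHERE row_id IN ({doomed})").rowcount

# Verify afterwards: planned, deleted and the drop in row count should agree.
total_after = con.execute("SELECT COUNT(*) FROM src").fetchone()[0]
backed_up = con.execute("SELECT COUNT(*) FROM removed").fetchone()[0]
survivors = con.execute("SELECT id, dt, row_id FROM src ORDER BY row_id").fetchall()
print(planned, deleted, total_before - total_after, backed_up)
print(survivors)
```

If the planned count, the deleted count and the size of the backup table ever disagree, that is the signal to stop and restore from the backup before going further.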
Try this
declare @t table (id int, dt DATETIME,rowid INT IDENTITY(1,1))
INSERT INTO @t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO @t (id,dt) VALUES (2, '11/01/2009')
Query:
delete from @t where rowid not in(
select t.rowid from @t t
inner join(
select id, MAX(dt) maxdate
from @t
group by id) X
on t.id = X.id and t.dt = X.maxdate ) -- match on id too, so one id's max date cannot protect another id's rows
select * from @t
Output:
id dt rowid
1 2009-11-13 00:00:00.000 3
2 2009-11-01 00:00:00.000 4
delete from temp where row_id not in (
select t.row_id from temp t
right join
(select id,MAX(dt) as dt from temp group by id) d
on t.dt = d.dt and t.id = d.id)
I have tested this answer..
INSERT INTO @t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO @t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO @t (id,dt) VALUES (2, '11/01/2009')
select * from @t
;WITH T AS(
select dense_rank() over(partition by id order by dt desc)NO,DT,ID,rowid from @t )
DELETE T WHERE NO>1