How can I add values to a column that show the ranking by a random value? - tsql

I can't see what is going wrong here:
DECLARE @cData TABLE(cID NVARCHAR(1), cSeed DECIMAL(8,8), cRank INT)
INSERT INTO @cData (cID, cSeed) SELECT 'W', RAND()
INSERT INTO @cData (cID, cSeed) SELECT 'X', RAND()
INSERT INTO @cData (cID, cSeed) SELECT 'Y', RAND()
INSERT INTO @cData (cID, cSeed) SELECT 'Z', RAND()
SELECT cID, cSeed, (RANK() OVER (ORDER BY cSeed)) AS cRank FROM @cData
UPDATE @cData
SET cRank = (SELECT (RANK() OVER (ORDER BY cSeed)))
SELECT * FROM @cData
Why do I get different results from my first SELECT statement than from my second? Why didn't my UPDATE statement put the same data into the table that my first SELECT displayed?

SELECT (RANK() OVER (ORDER BY cSeed))
This is a statement on its own, correlated with the outer UPDATE only through the column used in the OVER (ORDER BY ...) clause.
It operates over an implied rowset of exactly one record (the current record from @cData) and hence always returns 1, since the rank of the only record in a set is 1 by definition.
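You can see the same effect outside the UPDATE (a minimal sketch against the same @cData table):
-- The correlated subquery ranks a one-row set, so it is 1 for every row
SELECT cID, cSeed,
       (SELECT RANK() OVER (ORDER BY cSeed)) AS always_one
FROM @cData;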
I believe you want to run this instead:
WITH t AS
(
    SELECT *,
           RANK() OVER (ORDER BY cSeed) AS rnk
    FROM @cData
)
UPDATE t
SET cRank = rnk
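If you prefer a single statement without a CTE, an UPDATE ... FROM against a ranked derived table works too (a sketch; it assumes cID is unique, as in the sample data):
UPDATE d
SET d.cRank = r.rnk
FROM @cData AS d
JOIN (
    SELECT cID, RANK() OVER (ORDER BY cSeed) AS rnk
    FROM @cData
) AS r ON r.cID = d.cID;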

Related

What would be the Postgres equivalent of this T-SQL?

I want to run this either interactively from psql or from code.
create proc recent_orders_by_region
as
select top(3) * from view_orders where region = 'NA' order by order_date desc
select top(3) * from view_orders where region = 'WE' order by order_date desc
select top(3) * from view_orders where region = 'EE' order by order_date desc
You can use rank() over (), but remember that the ORDER BY must be unique; otherwise RANK() will assign the same number to multiple rows in a group.
Query with example data
SELECT a.* FROM (
    SELECT *,
           rank() OVER (
               PARTITION BY region
               ORDER BY order_date DESC, ident DESC
           ) AS rnk
    FROM (
        VALUES
            (1, current_date,     'NA'),  (2, current_date - 1, 'NA'),
            (3, current_date - 1, 'NA'),  (4, current_date - 4, 'NA'),
            (5, current_date,     'NA1'), (6, current_date - 1, 'NA1'),
            (7, current_date - 1, 'NA1'), (8, current_date - 4, 'NA1')
    ) view_orders (ident, order_date, region)
) a WHERE a.rnk <= 3;
ident is only for demonstration purposes and should be replaced by a real column from the view_orders table.
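To call it from psql or application code the way the original proc was used, you can wrap the query in a function. Here is a sketch; the column names and types for view_orders are assumptions, so adjust them to your schema:
CREATE FUNCTION recent_orders_by_region()
RETURNS TABLE (ident int, order_date date, region text)
LANGUAGE sql AS
$$
    SELECT a.ident, a.order_date, a.region
    FROM (
        SELECT v.ident, v.order_date, v.region,
               rank() OVER (PARTITION BY v.region
                            ORDER BY v.order_date DESC, v.ident DESC) AS rnk
        FROM view_orders v
        WHERE v.region IN ('NA', 'WE', 'EE')
    ) a
    WHERE a.rnk <= 3;
$$;
-- SELECT * FROM recent_orders_by_region();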

T-SQL - Pivot/Crosstab - variable number of values

I have a simple data set that looks like this:
Name Code
A A-One
A A-Two
B B-One
C C-One
C C-Two
C C-Three
I want to output it so it looks like this:
Name Code1 Code2 Code3 Code4 Code...n ...
A A-One A-Two
B B-One
C C-One C-Two C-Three
For each of the 'Name' values, there can be an undetermined number of 'Code' values.
I have been looking at various examples of pivot SQL (including simple PIVOT queries and ones using the XML functions), but I have not been able to figure this out, or to understand whether it is even possible.
I would appreciate any help or pointers.
Thanks!
Try it like this:
DECLARE #tbl TABLE([Name] VARCHAR(100),Code VARCHAR(100));
INSERT INTO #tbl VALUES
('A','A-One')
,('A','A-Two')
,('B','B-One')
,('C','C-One')
,('C','C-Two')
,('C','C-Three');
SELECT p.*
FROM
(
SELECT *
,CONCAT('Code',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
FROM #tbl
)t
PIVOT
(
MAX(Code) FOR ColumnName IN (Code1,Code2,Code3,Code4,Code5 /*add as many as you need*/)
)p;
This line
,CONCAT('Code',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
will use a partitioned ROW_NUMBER to create numbered column names per code. The rest is a simple PIVOT.
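With the sample data this returns the following (note that ORDER BY Code sorts alphabetically, so C-Three lands before C-Two):
Name  Code1  Code2    Code3  Code4  Code5
A     A-One  A-Two    NULL   NULL   NULL
B     B-One  NULL     NULL   NULL   NULL
C     C-One  C-Three  C-Two  NULL   NULL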
UPDATE: A dynamic approach to reflect the maximum number of codes per group
CREATE TABLE TblTest([Name] VARCHAR(100),Code VARCHAR(100));
INSERT INTO TblTest VALUES
('A','A-One')
,('A','A-Two')
,('B','B-One')
,('C','C-One')
,('C','C-Two')
,('C','C-Three');
DECLARE #cols VARCHAR(MAX);
WITH GetMaxCount(mc) AS(SELECT TOP 1 COUNT([Code]) FROM TblTest GROUP BY [Name] ORDER BY COUNT([Code]) DESC)
SELECT #cols=STUFF(
(
SELECT CONCAT(',Code',Nmbr)
FROM
(SELECT TOP((SELECT mc FROM GetMaxCount)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) t(Nmbr)
FOR XML PATH('')
),1,1,'');
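-- With the sample data above, @cols now contains 'Code1,Code2,Code3'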
DECLARE #sql VARCHAR(MAX)=
'SELECT p.*
FROM
(
SELECT *
,CONCAT(''Code'',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
FROM TblTest
)t
PIVOT
(
MAX(Code) FOR ColumnName IN (' + #cols + ')
)p;';
EXEC(#sql);
GO
DROP TABLE TblTest;
As you can see, the only part which changes to reflect the actual number of columns is the list in PIVOT's IN() clause.
You can create a string which looks like Code1,Code2,Code3,...CodeN, build the statement dynamically, and execute it with EXEC().
I'd prefer the first approach. Dynamically created SQL is very powerful, but can be a pain in the neck too...

Multi-INSERT with unchangeable param

Is there any way to INSERT multiple rows where one value comes from the database and is the same for every row?
I tried a WITH clause, but without success:
WITH t as (SELECT date_trunc('hour', NOW()))
INSERT INTO my_table(ID, TIME) VALUES (1,t),(2,t);
No need for the CTE, just use a plain SELECT as the source for the insert:
insert into my_table (id, time)
select i, date_trunc('hour', NOW())
from generate_series(1,2) i;
If you really want the CTE, you need to select from it in the values clause:
WITH t as (
SELECT date_trunc('hour', NOW()) hour_t
)
INSERT INTO my_table(ID, TIME)
VALUES
(1, (select hour_t from t)),
(2, (select hour_t from t));
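A CROSS JOIN against the one-row CTE also avoids repeating the scalar subquery per row (a sketch using a VALUES list for the ids):
WITH t AS (
    SELECT date_trunc('hour', now()) AS hour_t
)
INSERT INTO my_table (id, time)
SELECT v.id, t.hour_t
FROM (VALUES (1), (2)) AS v(id)
CROSS JOIN t;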

Way to ignore an intermediate column in an INSERT SELECT statement?

For an INSERT ... SELECT statement, is there a way to use a column that is required for the SELECT but should not be inserted?
I'm using a rank function to select which data to insert. Unfortunately the rank function can't be called in the where clause.
insert into target_table(v1
,v2
,v3
)
select v1
,v2
,v3
,RANK() over (partition by group_col
order by order_col desc
) as my_rank
from source_table
where my_rank > 1
The result is the following error:
Msg 121, Level 15, State 1, Line 1
The select list for the INSERT statement contains more items than the insert list. The number of SELECT values must match the number of INSERT columns.
I know I can do this using a temporary table but would like to keep it in a single statement if possible.
Wrap your main query inside a subquery.
INSERT INTO target_table
(v1, v2, v3)
SELECT q.v1, q.v2, q.v3
FROM (SELECT v1, v2, v3,
RANK() OVER (PARTITION BY group_col ORDER BY order_col DESC) AS my_rank
FROM source_table) q
WHERE q.my_rank > 1;
For SQL Server 2005+, you could also use a CTE:
WITH cteRank AS (
SELECT v1, v2, v3,
RANK() OVER (PARTITION BY group_col ORDER BY order_col DESC) AS my_rank
FROM source_table
)
INSERT INTO target_table
(v1, v2, v3)
SELECT v1, v2, v3
FROM cteRank
WHERE my_rank > 1;

TSQL: Remove duplicates based on max(date)

I am searching for a query that selects the maximum date (a datetime column) per id and keeps that row's id and row_id; the other rows should then be DELETEd from the source table.
Source Data
id date row_id(unique)
1 11/11/2009 1
1 12/11/2009 2
1 13/11/2009 3
2 1/11/2009 4
Expected Survivors
1 13/11/2009 3
2 1/11/2009 4
What query would I need to achieve the results I am looking for?
Tested on PostgreSQL:
delete from table where (id, date) not in (select id, max(date) from table group by id);
There are various ways of doing this, but the basic idea is the same:
- Identify the rows you want to keep
- Compare each row in your table to the ones you want to keep
- Delete any that don't match
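Each of the variants below follows that pattern.
-- Variant 1: self LEFT JOIN, keeping rows that carry the per-id MAX(date)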
DELETE
[source]
FROM
yourTable AS [source]
LEFT JOIN
yourTable AS [keep]
ON [keep].id = [source].id
AND [keep].date = (SELECT MAX(date) FROM yourTable WHERE id = [keep].id)
WHERE
[keep].id IS NULL
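-- Variant 2: LEFT JOIN against an aggregated derived table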
DELETE
[yourTable]
FROM
[yourTable]
LEFT JOIN
(
SELECT id, MAX(date) AS date FROM yourTable GROUP BY id
)
AS [keep]
ON [keep].id = [yourTable].id
AND [keep].date = [yourTable].date
WHERE
[keep].id IS NULL
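-- Variant 3: compare each row's row_id to that of the newest row for its id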
DELETE
[source]
FROM
yourTable AS [source]
WHERE
[source].row_id != (SELECT TOP 1 row_id FROM yourTable WHERE id = [source].id ORDER BY date DESC)
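-- Variant 4: EXISTS test against the per-id MAX(date)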
DELETE
[source]
FROM
yourTable AS [source]
WHERE
EXISTS (SELECT id FROM yourTable GROUP BY id HAVING id = [source].id AND MAX(date) != [source].date)
Because you are using SQL Server 2000, you're not able to use the ROW_NUMBER() OVER technique to set up a sequence and identify the top row for each unique id.
So your proposed technique is to use a datetime column to get the TOP 1 row and remove duplicates. That might work, but there is a possibility that you might still get duplicates having the same datetime value. That's easy enough to check for, though.
First check the assumption that all rows are unique based on the id and date columns:
CREATE TABLE #TestTable (rowid INT IDENTITY(1,1), thisid INT, thisdate DATETIME)
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '11/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (1, '12/12/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
INSERT INTO #TestTable (thisid,thisdate) VALUES (2, '1/11/2009')
SELECT COUNT(*) AS thiscount
FROM #TestTable
GROUP BY thisid, thisdate
HAVING COUNT(*) > 1
This example returns a value of 2, indicating that you will still end up with duplicates even after using the date column to remove duplicates. If the query returns no rows, then you have proven that your proposed technique will work.
When de-duping production data, I think one should take some precautions and test before and after. You should create a table to hold the rows you plan to remove so you can recover them easily if you need to after the delete statement has been executed.
Also, it's a good idea to know beforehand how many rows you plan to remove so you can verify the count before and after - and you can gauge the magnitude of the delete operation. Based on how many rows will be affected, you can plan when to run the operation.
To test before the de-duping process, find the occurrences.
-- Get occurrences of duplicates
SELECT thisid, COUNT(*) AS thiscount
FROM
#TestTable
GROUP BY thisid
HAVING COUNT(*) > 1
ORDER BY thisid
That gives you the ids that occur on more than one row. Capture the rows from this query into a temporary table, then run a query using SUM to get the total number of rows that are not unique based on your key.
To get the number of rows you plan to delete, take the count of rows in the duplicated groups and subtract the number of distinct keys. All that is pretty straightforward, so I'll leave you to it.
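For example, against the #TestTable sample above, a sketch of that before-count (it returns 3 for the sample rows):
-- rows to delete = rows in duplicated groups minus one survivor per group
SELECT SUM(cnt) - COUNT(*) AS rows_to_delete
FROM (
    SELECT thisid, COUNT(*) AS cnt
    FROM #TestTable
    GROUP BY thisid
    HAVING COUNT(*) > 1
) dupes;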
Try this
declare #t table (id int, dt DATETIME,rowid INT IDENTITY(1,1))
INSERT INTO #t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO #t (id,dt) VALUES (2, '11/01/2009')
Query:
delete from #t where rowid not in(
    select t.rowid from #t t
    inner join(
        select id, MAX(dt) AS maxdate
        from #t
        group by id) X
    on t.id = X.id and t.dt = X.maxdate )
select * from #t
Output:
id dt rowid
1 2009-11-13 00:00:00.000 3
2 2009-11-01 00:00:00.000 4
delete from temp where row_id not in (
select t.row_id from temp t
right join
(select id,MAX(dt) as dt from temp group by id) d
on t.dt = d.dt and t.id = d.id)
I have tested this answer (it reuses the #t table from the previous answer):
INSERT INTO #t (id,dt) VALUES (1, '11/11/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/12/2009')
INSERT INTO #t (id,dt) VALUES (1, '11/13/2009')
INSERT INTO #t (id,dt) VALUES (2, '11/01/2009')
select * from #t
;WITH T AS (
    SELECT dense_rank() OVER (PARTITION BY id ORDER BY dt DESC) AS NO,
           dt, id, rowid
    FROM #t
)
DELETE T WHERE NO > 1
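-- Verify: only the newest row per id survives (rowid 3 and 4 with the sample data)
SELECT * FROM #t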