Update Postgresql table using rank() - postgresql

I'm trying to update a column (pop_1_rank) in a postgresql table with the results from a rank() like so:
UPDATE database_final_form_merge
SET
pop_1_rank = r.rnk
FROM (
SELECT pop_1, RANK() OVER ( ORDER BY pop_1 DESC) FROM database_final_form_merge WHERE territory_name != 'north' AS rnk)r
The SELECT query by itself works fine, but I just can't get it to update correctly. What am I doing wrong here?

I'd rather use the CTE notation.
WITH cte as (
SELECT pop_1,
RANK() OVER ( ORDER BY pop_1 DESC) AS rnk
FROM database_final_form_merge
WHERE territory_name <> 'north'
)
UPDATE database_final_form_merge
SET pop_1_rank = cte.rnk
FROM cte
WHERE database_final_form_merge.pop_1 = cte.pop_1

As far as I know, Postgres updates tables, not subqueries. So, you can join back to the table:
UPDATE database_final_form_merge
SET pop_1_rank = r.rnk
FROM (SELECT pop_1, RANK() OVER ( ORDER BY pop_1 DESC) as rnk
FROM database_final_form_merge
WHERE territory_name <> 'north'
) r
WHERE database_final_form_merge.pop_1 = r.pop_1;
In addition:
The column alias (AS rnk) goes right after the window expression, not after the WHERE clause as in your original query.
This assumes that pop_1 is the id connecting the two tables.

You're missing a WHERE clause on the UPDATE query, because when doing UPDATE ... FROM you're basically doing a join.
So you need to select the primary key in the subquery and then match on that key, so you update only the rows you are computing the rank over, as sketched below.
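A minimal sketch of that suggestion, assuming a hypothetical primary key column id on database_final_form_merge (the question does not show the table's key):
UPDATE database_final_form_merge AS t
SET pop_1_rank = r.rnk
FROM (
    SELECT id,  -- carry the key through the subquery ("id" is a placeholder for the real primary key)
           RANK() OVER (ORDER BY pop_1 DESC) AS rnk
    FROM database_final_form_merge
    WHERE territory_name <> 'north'
) r
WHERE t.id = r.id;  -- join back on the key so each row gets its own rank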

Related

Optimizing row_number() to avoid scanning the whole table

I have a table created as
CREATE TABLE T0
(
    id text,
    kind_of_datetime text,
    -- ... 10 more text fields
    PRIMARY KEY (id, kind_of_datetime)
);
The table has about 31M rows, with about 800K unique id values and 13K unique kind_of_datetime values.
I want to run this query:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER() over(PARTITION BY id ORDER BY kind_of_datetime DESC) as rn_col
FROM T0
WHERE kind_of_datetime <= 'some_value'
) as tmp
WHERE rn_col = 1
It produces a WindowAgg that actually reads the whole table plus a sort, and it runs for a really long time (minutes).
I tried creating an index
CREATE INDEX index_name ON T0 (id, kind_of_datetime DESC NULLS LAST)
and it works better, but only if the final select consists of just the two key columns id + kind_of_datetime. Otherwise it's always a full scan.
Maybe I should change the way the data is stored? Or create some other index?
What I don't want to do is INCLUDE the 10 other columns in the index, because it will take too much RAM.
Try a subquery:
SELECT *,
ROW_NUMBER() over(PARTITION BY id ORDER BY kind_of_datetime DESC) as rn_col
FROM (SELECT * FROM T0
WHERE kind_of_datetime <= 'some_value'
) AS t
That will definitely apply the filter first.
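Putting that together with the original rn_col = 1 filter, using the question's table and column names, the full query would be (a sketch):
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY kind_of_datetime DESC) AS rn_col
    FROM (SELECT * FROM T0
          WHERE kind_of_datetime <= 'some_value') AS t  -- filter applied in the inner subquery
) AS tmp
WHERE rn_col = 1;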

Update a deleted_at column on partition in PostgreSQL

Quick question: I'm trying to update a column only when there are duplicates (partition row number > 1) in the table, and I've selected them using a window partition, but the current query updates the whole table! Please check the query below. Any leads would be greatly appreciated :)
UPDATE public.database_tag
SET deleted_at= '2022-04-25 19:33:29.087133+00'
FROM (
SELECT *,
row_number() over (partition by title order by created_at) as RN
FROM public.database_tag
ORDER BY RN DESC) X
WHERE X.RN > 1
Thanks very much!
Assuming that every row has a unique ID, it can be done like below.
UPDATE database_tag
SET deleted_at= '2022-04-25 19:33:29.087133+00'
WHERE <some_unique_id> in (
select <some_unique_id> from (
SELECT <some_unique_id>,
row_number() over (partition by title order by created_at) as RN
FROM public.database_tag
) X
WHERE X.RN > 1
)
Or we can reverse the query and update all rows except a set of IDs:
UPDATE database_tag
SET deleted_at= '2022-04-25 19:33:29.087133+00'
WHERE <some_unique_id> not in (
select distinct on (title)
<some_unique_id> from database_tag
order by title, created_at
)
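For illustration only, if the table had a single-column primary key named id (a hypothetical name, standing in for <some_unique_id>), the first variant would read:
UPDATE database_tag
SET deleted_at = '2022-04-25 19:33:29.087133+00'
WHERE id IN (  -- "id" is a placeholder for the table's real primary key
    SELECT id FROM (
        SELECT id,
               row_number() OVER (PARTITION BY title ORDER BY created_at) AS rn
        FROM public.database_tag
    ) x
    WHERE x.rn > 1
);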

TSQL -- display results of two queries on one row in SSMS

I am using T-SQL with SSMS v17.9.1. The underlying database is Microsoft SQL Server 2014 SP3.
For display purposes, I want to concatenate the results of two queries:
SELECT TOP 1 colA as 'myCol1' FROM tableA
--
SELECT TOP 1 colB as 'myCol2' FROM tableB
and display the results from the queries in one row in SSMS.
(The TOP 1 directive would hopefully guarantee the same number of results from each query, which would assist displaying them together. If this could be generalized to TOP 10 per query that would help also)
This should work for any number of rows; it assumes you want to pair the rows ordered by the values in the displayed columns:
With
TableA_CTE AS
(
SELECT TOP 1 colA as myCol1
,Row_Number() OVER (ORDER BY ColA DESC) AS RowOrder
FROM tableA
),
TableB_CTE AS
(
SELECT TOP 1 colB as myCol2
,Row_Number() OVER (ORDER BY ColB DESC) AS RowOrder
FROM tableB
)
SELECT A.myCol1, B.MyCol2
FROM TableA_CTE AS A
INNER JOIN TableB_CTE AS B
ON A.RowOrder = B.RowOrder
There are currently two issues with the accepted answer:
I) a missing comma before the line: "Table B As"
II) TSQL seems to find it recursive as written, so I re-wrote it in a non-recursive way:
This is a re-working of the accepted answer that actually works in T-SQL:
USE [Database_1];
With
CTE_A AS
(
SELECT TOP 1 [Col1] as myCol1
,Row_Number() OVER (ORDER BY [Col1] desc) AS RowOrder
FROM [TableA]
)
,
CTE_B AS
(
SELECT TOP 1 [Col2] as myCol2
,Row_Number() OVER (ORDER BY [Col2] desc) AS RowOrder
FROM [TableB]
)
SELECT A.myCol1, B.myCol2
FROM CTE_A AS A
INNER JOIN CTE_B AS B
ON ( A.RowOrder = B.RowOrder)
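To generalize this to the top 10 rows per table, as the question mentions, the same pattern should work with TOP (10) in both CTEs; the INNER JOIN on RowOrder then pairs the rows one-to-one (a sketch, reusing the column names above):
With
CTE_A AS
(
SELECT TOP (10) [Col1] as myCol1
,Row_Number() OVER (ORDER BY [Col1] desc) AS RowOrder
FROM [TableA]
)
,
CTE_B AS
(
SELECT TOP (10) [Col2] as myCol2
,Row_Number() OVER (ORDER BY [Col2] desc) AS RowOrder
FROM [TableB]
)
SELECT A.myCol1, B.myCol2
FROM CTE_A AS A
INNER JOIN CTE_B AS B
ON ( A.RowOrder = B.RowOrder)
If one table can return fewer rows than the other, a FULL OUTER JOIN on RowOrder keeps the unpaired rows instead of dropping them.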

How to translate SQL to DAX, Need to add FILTER

I want to create a calculated table that will summarize In_Force Premium from the existing table fact_Premium.
How can I filter the result by saying:
TODAY() has to be between `fact_Premium[EffectiveDate]` and (SELECT TOP 1 fact_Premium[ExpirationDate] ORDER BY QuoteID DESC)
In SQL I'd do that like this:
WHERE CONVERT(date, getdate()) between CONVERT(date, tblQuotes.EffectiveDate)
and (
    select top 1 q2.ExpirationDate
    from Table2 Q2
    where q2.ControlNo = Table1.controlno
    order by quoteid desc
)
Here is my DAX statement so far:
In_Force Premium =
FILTER(
ADDCOLUMNS(
SUMMARIZE(
//Grouping necessary columns
fact_Premium,
fact_Premium[QuoteID],
fact_Premium[Division],
fact_Premium[Office],
dim_Company[CompanyGUID],
fact_Premium[LineGUID],
fact_Premium[ProducerGUID],
fact_Premium[StateID],
fact_Premium[ExpirationDate]
),
"Premium", CALCULATE(
SUM(fact_Premium[Premium])
),
"ControlNo", CALCULATE(
DISTINCTCOUNT(fact_Premium[ControlNo])
)
), // Here I need to make sure TODAY() falls between fact_Premium[EffectiveDate] and (SELECT TOP 1 fact_Premium[ExpirationDate] ORDER BY QuoteID DESC)
)
Also, which would be the more efficient way: to create a calculated table from fact_Premium, or to create the same table using a SQL statement (--> Get Data --> SQL Server)?
There are 2 potential ways in T-SQL to get the next effective date. One is to use LEAD() and the other is to use an APPLY operator. As there are few facts to work with, here are samples:
select *
from (
select *
, lead(EffectiveDate) over(partition by CompanyGUID order by quoteid desc) as NextEffectiveDate
from Table1
join Table2 on ...
) d
or
select table1.*, oa.NextEffectiveDate
from Table1
outer apply (
select top(1) q2.ExpirationDate AS NextEffectiveDate
from Table2 Q2
where q2.ControlNo = Table1.controlno
order by quoteid desc
) oa
NB: an outer apply is a little like a left join in that it will allow rows with a NULL NextEffectiveDate to be returned by the query; if that is not needed, then use cross apply instead.
In both these approaches you may refer to NextEffectiveDate in a final where clause, but I would prefer to avoid using the convert function if that is feasible (this depends on the data).
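For example, the outer apply variant with the final where clause from the original SQL would look like this (a sketch; the convert calls can be dropped if the columns are already of a date type, as noted above):
select table1.*, oa.NextEffectiveDate
from Table1
outer apply (
    select top(1) q2.ExpirationDate AS NextEffectiveDate
    from Table2 Q2
    where q2.ControlNo = Table1.controlno
    order by quoteid desc
) oa
where convert(date, getdate()) between convert(date, Table1.EffectiveDate)
                                   and convert(date, oa.NextEffectiveDate);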

How to update a table with a list of values at a time?

I have
update NewLeaderBoards set MonthlyRank=(Select RowNumber() (order by TotalPoints desc) from LeaderBoards)
I tried it this way -
(Select RowNumber() (order by TotalPoints desc) from LeaderBoards) as NewRanks
update NewLeaderBoards set MonthlyRank = NewRanks
But it doesn't work for me. Can anyone suggest how I can perform an update in such a way?
You need to use the WITH statement and a full CTE:
;With Ranks As
(
Select PrimaryKeyColumn, Row_Number() Over( Order By TotalPoints Desc ) As Num
From LeaderBoards
)
Update NewLeaderBoards
Set MonthlyRank = T2.Num
From NewLeaderBoards As T1
Join Ranks As T2
On T2.PrimaryKeyColumn = T1.PrimaryKeyColumn