SQL multiple calculation in table creation - tsql

I'm trying to create a table with multiple calculations.
I have a base table from which I would like to collect data and insert it into the new table. The first few columns come from the base table: some are copied as they are, others are calculated from it.
These work fine, but the last 2 columns do not. They should be calculated from a calculated field of the new table.
Can this be solved in one step? Should I use UPDATE? As far as I know, ranking does not work with that.
INSERT INTO [RAW_NBA_TeamSimpleRating]
(
[Team]
,[Game_total]
,[ORtg_avg]
,[DRtg_avg]
,[ORtg_rank]
,[ORtg_cluster]
)
SELECT
[Team]
,[Game]
,AVG ([ORtg]) OVER (PARTITION BY Team ORDER BY RowNumber rows between 81 preceding and current row) as ORtg_avg
,AVG ([DRtg]) OVER (PARTITION BY Team ORDER BY RowNumber rows between 81 preceding and current row) as DRtg_avg
,RANK () OVER (PARTITION BY [RAW_NBA_TeamSimpleRating].[Game_total] ORDER BY [RAW_NBA_TeamSimpleRating].[ORtg_avg] Desc)
,CASE
WHEN RANK () OVER (PARTITION BY [RAW_NBA_TeamSimpleRating].[Game_total] ORDER BY [RAW_NBA_TeamSimpleRating].[ORtg_avg] DESC) > 10 THEN 'Bottom'
WHEN RANK () OVER (PARTITION BY [RAW_NBA_TeamSimpleRating].[Game_total] ORDER BY [RAW_NBA_TeamSimpleRating].[ORtg_avg] DESC) <= 10 THEN 'TOP'
END
FROM [WRK_NBA_TeamTable]

If you wrap your query in a derived table, the outer query can use the values computed by the inner select. Window functions cannot be nested, so the rank over the averaged column needs its own level, such as
select Team, Game, ORtg_avg, DRtg_avg, [Rank],
case
when [Rank] > 10 then 'Bottom'
when [Rank] <= 10 then 'TOP'
end as ORtg_cluster
from (
select Team, Game, ORtg_avg, DRtg_avg
,Rank () over (partition by Game order by ORtg_avg desc) as [Rank]
from (
select Team, Game
,Avg (ORtg) over (partition by Team order by RowNumber rows between 81 preceding and current row) as ORtg_avg
,Avg (DRtg) over (partition by Team order by RowNumber rows between 81 preceding and current row) as DRtg_avg
from WRK_NBA_TeamTable
)a
)s
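The layering of a rolling average, then a rank on that average, then a label on the rank can be tried on a small scale. The following is a minimal sketch, not the asker's actual schema: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) instead of T-SQL, and the table `wrk`, its columns, the sample rows, the 2-row window and the top-1 cutoff are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample rows: (team, game, ortg). Not data from the question.
SAMPLE_GAMES = [
    ("A", 1, 100.0), ("A", 2, 110.0),
    ("B", 1, 104.0), ("B", 2, 100.0),
]

RANK_SQL = """
SELECT team, game, ortg_avg, rnk,
       CASE WHEN rnk <= 1 THEN 'TOP' ELSE 'Bottom' END AS cluster
FROM (
    -- Level 2: rank the averages computed one level below.
    SELECT team, game, ortg_avg,
           RANK() OVER (PARTITION BY game ORDER BY ortg_avg DESC) AS rnk
    FROM (
        -- Level 1: rolling average per team (current row plus 1 preceding).
        SELECT team, game,
               AVG(ortg) OVER (PARTITION BY team ORDER BY game
                               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS ortg_avg
        FROM wrk
    ) AS averaged
) AS ranked
ORDER BY game, rnk
"""

def rank_rolling_averages(rows):
    """Load the rows into an in-memory table and return the ranked result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE wrk (team TEXT, game INTEGER, ortg REAL)")
    con.executemany("INSERT INTO wrk VALUES (?, ?, ?)", rows)
    result = con.execute(RANK_SQL).fetchall()
    con.close()
    return result

ranked_rows = rank_rolling_averages(SAMPLE_GAMES)
```

Each level only refers to columns produced by the level beneath it, which is why the rank can see `ortg_avg` here while it could not in a single flat select.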

Related

Update a deleted_at column on partition in PostgreSQL

Quick question: I'm trying to update a column only for duplicate rows (row number within the partition > 1), and I select those rows with a window function. But the current query updates the whole table! Please check the query below. Any leads would be greatly appreciated :)
UPDATE public.database_tag
SET deleted_at= '2022-04-25 19:33:29.087133+00'
FROM (
SELECT *,
row_number() over (partition by title order by created_at) as RN
FROM public.database_tag
ORDER BY RN DESC) X
WHERE X.RN > 1
Thanks very much!
Assuming that every row has a unique ID, it can be done as below.
UPDATE database_tag
SET deleted_at= '2022-04-25 19:33:29.087133+00'
WHERE <some_unique_id> in (
select <some_unique_id> from (
SELECT <some_unique_id>,
row_number() over (partition by title order by created_at) as RN
FROM public.database_tag
) X
WHERE X.RN > 1
)
Or we can reverse the query and update all rows except a set of IDs (the first row per title):
UPDATE database_tag
SET deleted_at= '2022-04-25 19:33:29.087133+00'
WHERE <some_unique_id> not in (
select distinct on (title)
<some_unique_id> from database_tag
order by title, created_at
)
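The question's query touches every row because the subquery X in the FROM clause is never tied back to the row being updated, so the condition X.RN > 1 is true for every target row as soon as any duplicate exists. Filtering by a unique key avoids that. Below is a small sketch of the first approach, run on SQLite through Python's sqlite3 module rather than PostgreSQL; the `id` column and the sample rows are assumptions, standing in for whatever unique key the real table has.

```python
import sqlite3

# Hypothetical rows: (id, title, created_at). "a" appears three times.
SAMPLE_TAGS = [
    (1, "a", "2022-01-01"),
    (2, "a", "2022-01-02"),
    (3, "b", "2022-01-01"),
    (4, "a", "2022-01-03"),
]

def soft_delete_duplicates(rows, deleted_at):
    """Mark every row except the oldest one per title as deleted."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE database_tag ("
        "id INTEGER PRIMARY KEY, title TEXT, created_at TEXT, deleted_at TEXT)"
    )
    con.executemany(
        "INSERT INTO database_tag (id, title, created_at) VALUES (?, ?, ?)", rows
    )
    con.execute(
        """
        UPDATE database_tag
        SET deleted_at = ?
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (PARTITION BY title ORDER BY created_at) AS rn
                FROM database_tag
            ) AS numbered
            WHERE rn > 1   -- only the 2nd, 3rd, ... row of each title
        )
        """,
        (deleted_at,),
    )
    result = con.execute("SELECT id, deleted_at FROM database_tag ORDER BY id").fetchall()
    con.close()
    return result

updated_tags = soft_delete_duplicates(SAMPLE_TAGS, "2022-04-25")
```

Only ids 2 and 4 receive a `deleted_at` value; the first "a" row and the single "b" row stay untouched.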

Fetch minimum value for each NTILE bucket in Hive

I am trying to partition the data into percentiles (100 equal buckets) using NTILE window function for each merchant_id ordered by score column. The output of the query will contain merchant_id, score, and percentile for every record in the source table. (Sample code below)
CREATE TABLE merchant_score_ntiles
AS
SELECT merchant_id, score, NTILE(100) OVER (PARTITION BY merchant_id ORDER BY score DESC) as percentile
FROM merch_table
This will return sample output as follows:
merchant_id,score,percentile
1001,900,1
1001,800,1
1001,760,1
1002,900,2
1002,800,2
1002,750,2
Is there a way we can return only the minimum score for each merchant_id based on percentile column such as below?
merchant_id,score,percentile
1001,760,1
1002,750,2
You can try using the ROW_NUMBER window function in a subquery, keeping the lowest score per merchant_id, before applying the NTILE window function:
SELECT merchant_id,
score,
NTILE(100) OVER (PARTITION BY merchant_id ORDER BY score DESC) as percentile
FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY merchant_id ORDER BY score) rn
FROM merch_table
) t1
WHERE rn = 1
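If the goal is instead the minimum score inside every bucket, another possible reading of the question is to compute NTILE first and aggregate afterwards. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than Hive, it uses NTILE(2) instead of NTILE(100) to keep the data small, and the sample scores are invented.

```python
import sqlite3

# Hypothetical rows: (merchant_id, score). Scores are distinct per merchant.
SAMPLE_SCORES = [
    (1001, 900), (1001, 800), (1001, 760), (1001, 700),
    (1002, 900), (1002, 750),
]

MIN_PER_BUCKET_SQL = """
SELECT merchant_id, percentile, MIN(score) AS min_score
FROM (
    -- Assign each row to a bucket first ...
    SELECT merchant_id, score,
           NTILE(2) OVER (PARTITION BY merchant_id ORDER BY score DESC) AS percentile
    FROM merch_table
) AS bucketed
-- ... then keep one row per merchant and bucket.
GROUP BY merchant_id, percentile
ORDER BY merchant_id, percentile
"""

def min_score_per_bucket(rows):
    """Return (merchant_id, bucket, minimum score in that bucket)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE merch_table (merchant_id INTEGER, score INTEGER)")
    con.executemany("INSERT INTO merch_table VALUES (?, ?)", rows)
    result = con.execute(MIN_PER_BUCKET_SQL).fetchall()
    con.close()
    return result

bucket_minimums = min_score_per_bucket(SAMPLE_SCORES)
```

The window function has to live in the inner query because its result cannot be used directly in GROUP BY at the same level.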

Adding days to the select query result dates in oracle

select from_date_su_rela,to_date_su_rela from RELATION_T;
The expectation is to subtract 56 days from from_date_su_rela in the first row and to add 56 days to to_date_su_rela in the last (3rd) row, as below.
I have written the query as:
select from_date_su_rela-56,to_date_su_rela+56 from RELATION_T;
But it adds and subtracts the days on all the rows, as below.
How can I make it work as in the 2nd image above?
One option is to use the row_number analytic function, sorting the data both ascending and descending to find the first and last row, and then perform the addition and subtraction in a case expression:
select case when rn_asc = 1
then from_date_su_rela - 56
else from_date_su_rela
end from_date_su_rela,
case when rn_desc = 1
then to_date_su_rela + 56
else to_date_su_rela
end to_date_su_rela
from (
select from_date_su_rela,
to_date_su_rela,
row_number() over (order by from_date_su_rela desc) rn_desc,
row_number() over (order by from_date_su_rela asc) rn_asc
from relation_t
)
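The same idea can be checked on three rows. The sketch below is an approximation in SQLite through Python's sqlite3 module, not Oracle: dates are stored as ISO text and shifted with SQLite's date() modifiers instead of Oracle's date arithmetic, and the three sample periods are made up.

```python
import sqlite3

# Hypothetical rows: (from_date_su_rela, to_date_su_rela) as ISO date strings.
SAMPLE_PERIODS = [
    ("2020-03-01", "2020-03-31"),
    ("2020-04-01", "2020-04-30"),
    ("2020-05-01", "2020-05-31"),
]

WIDEN_SQL = """
SELECT CASE WHEN rn_asc = 1
            THEN date(from_date_su_rela, '-56 days')  -- first row only
            ELSE from_date_su_rela
       END AS from_date_su_rela,
       CASE WHEN rn_desc = 1
            THEN date(to_date_su_rela, '+56 days')    -- last row only
            ELSE to_date_su_rela
       END AS to_date_su_rela
FROM (
    SELECT from_date_su_rela, to_date_su_rela,
           ROW_NUMBER() OVER (ORDER BY from_date_su_rela DESC) AS rn_desc,
           ROW_NUMBER() OVER (ORDER BY from_date_su_rela ASC)  AS rn_asc
    FROM relation_t
) AS numbered
ORDER BY rn_asc
"""

def widen_first_and_last(rows):
    """Shift only the earliest start date and the latest end date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE relation_t (from_date_su_rela TEXT, to_date_su_rela TEXT)")
    con.executemany("INSERT INTO relation_t VALUES (?, ?)", rows)
    result = con.execute(WIDEN_SQL).fetchall()
    con.close()
    return result

widened_periods = widen_first_and_last(SAMPLE_PERIODS)
```

The middle row passes through unchanged because neither of its row numbers equals 1.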

Selecting the 1st and 10th Records Only

I have a table with 3 columns: ID, Signature, and Datetime. The query below groups it by Signature and keeps only the signatures having Count(*) > 9.
select * from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
order by o.Signature desc, o.DateTime
I now want to select the 1st and 10th records only, per Signature. What determines rank is the Datetime descending. Thus, I would expect every Signature to have 2 rows.
Thanks,
I would go with a couple of common table expressions.
The first will select all records from the table as well as a count of records per signature, and the second one will select from the first where the record count > 9 and add row_number partitioned by signature - and then just select from that where the row_number is either 1 or 10:
With cte1 AS
(
SELECT ID, Signature, Datetime, COUNT(*) OVER(PARTITION BY Signature) As NumberOfRows
FROM #Sigs
), cte2 AS
(
SELECT ID, Signature, Datetime, ROW_NUMBER() OVER(PARTITION BY Signature ORDER BY DateTime DESC) As Rn
FROM cte1
WHERE NumberOfRows > 9
)
SELECT ID, Signature, Datetime
FROM cte2
WHERE Rn IN (1, 10)
ORDER BY Signature desc
Because I don't know what your data looks like, this might need some adjustment.
The simplest way here, since you already know your sort order (DateTime DESC) and partitioning (Signature), is probably to assign row numbers and then select the rows you want.
SELECT *
FROM
(
select o.Signature
,o.DateTime
,ROW_NUMBER() OVER (PARTITION BY o.Signature ORDER BY o.DateTime DESC) [Row]
from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
) r
WHERE [Row] IN (1,10)
ORDER BY Signature desc, DateTime
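Both answers come down to numbering the rows per Signature and keeping numbers 1 and 10. A compact sketch that combines the count and the row number in one derived table is shown here. It runs on SQLite through Python's sqlite3 module rather than T-SQL, and the table `sigs`, the column `signed_at` and the generated rows are assumptions for the example.

```python
import sqlite3

# Hypothetical data: signature "a" has 10 rows, signature "b" only 3.
SAMPLE_SIGS = [("a", f"2021-01-{day:02d}") for day in range(1, 11)] + [
    ("b", "2021-02-01"), ("b", "2021-02-02"), ("b", "2021-02-03"),
]

FIRST_AND_TENTH_SQL = """
SELECT signature, signed_at, rn
FROM (
    SELECT signature, signed_at,
           ROW_NUMBER() OVER (PARTITION BY signature ORDER BY signed_at DESC) AS rn,
           COUNT(*)     OVER (PARTITION BY signature)                         AS n
    FROM sigs
) AS numbered
WHERE n > 9            -- same role as HAVING COUNT(*) > 9
  AND rn IN (1, 10)    -- newest row and 10th newest row
ORDER BY signature, rn
"""

def first_and_tenth(rows):
    """Return the 1st and 10th most recent rows per signature."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sigs (signature TEXT, signed_at TEXT)")
    con.executemany("INSERT INTO sigs VALUES (?, ?)", rows)
    result = con.execute(FIRST_AND_TENTH_SQL).fetchall()
    con.close()
    return result

picked_rows = first_and_tenth(SAMPLE_SIGS)
```

Signature "b" drops out entirely because it has fewer than 10 rows.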

How to select corresponding record alongside aggregate function with having clause

Let's say I have an orders table with customer_id, order_total, and order_date columns. I'd like to build a report that shows all customers who haven't placed an order in the last 30 days, with a column for the total amount their last order was.
This gets all of the customers who should be on the report:
select customer, max(order_date), (select order_total from orders o2 where o2.customer = orders.customer order by order_date desc limit 1)
from orders
group by 1
having max(order_date) < NOW() - '30 days'::interval
Is there a better way to do this that doesn't require a subquery but instead uses a window function or other more efficient method in order to access the total amount from the most recent order? The techniques from How to select id with max date group by category in PostgreSQL? are related, but the extra having restriction seems to stop me from using something like DISTINCT ON.
Solution with the row_number window function (https://www.postgresql.org/docs/current/static/tutorial-window.html):
SELECT
customer, order_date, order_total
FROM (
SELECT
*,
first_value(order_date) OVER w as last_order,
first_value(order_total) OVER w as last_total,
row_number() OVER w as row_count
FROM orders
WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
) s
WHERE row_count = 1 AND order_date < CURRENT_DATE - 30
Solution with DISTINCT ON (https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT):
SELECT
customer, order_date, order_total
FROM (
SELECT DISTINCT ON (customer)
*,
first_value(order_date) OVER w as last_order,
first_value(order_total) OVER w as last_total
FROM orders
WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
ORDER BY customer, order_date DESC
) s
WHERE order_date < CURRENT_DATE - 30
Explanation:
In both solutions I am working with the first_value window function. The window's partition is defined by customer. The rows within each customer's partition are ordered descending by date, which puts the latest row first (last_value often does not behave as expected here, because with an ORDER BY the default frame ends at the current row). So it is possible to get the last order_date and the order_total of that order.
The difference between the two solutions is the filtering. I showed both versions because sometimes one of them is significantly faster.
The window function style creates a row count within each partition by adding a row_number window function, so every first row can be filtered later. The benefit of this solution shows when you want the first two or three rows per group: you simply change the filter from WHERE row_count = 1 to WHERE row_count <= 2.
But if you want only a single row per group, you just need to ensure that the expected row is ordered to be the first row in the group. Then the DISTINCT ON clause drops all following rows: DISTINCT ON (customer) gives the first (ordered) row per customer group.
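The row_number variant can be reduced to its core: number each customer's orders from newest to oldest, keep number 1, and apply the age filter to that row. The sketch below does this in SQLite through Python's sqlite3 module rather than PostgreSQL. The cutoff is passed in as a parameter instead of being derived from the current date, and the sample orders are invented.

```python
import sqlite3

# Hypothetical rows: (customer, order_date, order_total).
SAMPLE_ORDERS = [
    ("alice", "2024-01-01", 10),
    ("alice", "2024-03-01", 20),
    ("bob", "2024-01-15", 30),
    ("bob", "2024-01-10", 5),
]

LAPSED_SQL = """
SELECT customer, order_date, order_total
FROM (
    SELECT customer, order_date, order_total,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_date DESC) AS row_count
    FROM orders
) AS s
WHERE row_count = 1      -- the most recent order of each customer
  AND order_date < ?     -- keep it only if it is older than the cutoff
ORDER BY customer
"""

def lapsed_customers(rows, cutoff):
    """Return the latest order of each customer whose latest order predates cutoff."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, order_total INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = con.execute(LAPSED_SQL, (cutoff,)).fetchall()
    con.close()
    return result

lapsed = lapsed_customers(SAMPLE_ORDERS, "2024-02-01")
```

Because the date filter is applied after the row numbering, it plays the role of the HAVING max(order_date) condition from the question: alice's older January order is never considered once her March order has taken row number 1.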
Try joining the table to itself, matching each customer's rows against that customer's latest order date:
select o1.customer, o1.order_date, o1.order_total
from orders o1
join (select customer, max(order_date) as last_order from orders group by customer) o2
on o1.customer = o2.customer and o1.order_date = o2.last_order
where o2.last_order < NOW() - '30 days'::interval
A subquery in the select list can be a bad idea, because the DB may execute it once for each row.
If you use Postgres you can also try a CTE:
https://www.postgresql.org/docs/9.6/static/queries-with.html
WITH t as (
select customer, max(order_date) as last_order from orders
group by customer
having max(order_date) < NOW() - '30 days'::interval
) select o1.customer, o1.order_date, o1.order_total
from orders o1
join t on t.customer = o1.customer and t.last_order = o1.order_date