I'm pretty new to T-SQL.
I saw this T-SQL script:
SELECT [Date], ClosePrice, ROW_NUMBER() over (partition by 1 order by [Date])rn
FROM NIFTY_SMALLCAP_250_STOCKS
source: https://youtu.be/vE8UcS8U_xE?t=2882 (with my own small change)
Works as expected:
Expected result (and sample data)
Then I changed the script to partition by [Date] instead of 1:
SELECT [Date], ClosePrice, ROW_NUMBER() over (partition by [Date] order by [Date])rn
FROM NIFTY_SMALLCAP_250_STOCKS
The result: every value in the computed rn column equals 1.
My question:
Why doesn't partitioning by [Date] (which is obviously unique) work here?
What am I missing about the combination of ROW_NUMBER() and PARTITION BY?
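The behaviour described in the question can be reproduced with a small sketch. ROW_NUMBER() restarts at 1 in every partition, so when the partition key is unique each partition holds exactly one row. The example below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; it needs SQLite 3.25 or later for window functions), and the table name stocks, the column trade_date, and the sample rows are hypothetical.

```python
import sqlite3


def row_numbers(partition_expr):
    """Return the rn values produced when partitioning by partition_expr."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data: one row per (unique) date.
    conn.execute("CREATE TABLE stocks (trade_date TEXT, ClosePrice REAL)")
    conn.executemany(
        "INSERT INTO stocks VALUES (?, ?)",
        [("2020-01-01", 100.0), ("2020-01-02", 101.5), ("2020-01-03", 99.8)],
    )
    sql = (
        "SELECT ROW_NUMBER() OVER (PARTITION BY " + partition_expr
        + " ORDER BY trade_date) AS rn FROM stocks ORDER BY trade_date"
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return [r[0] for r in rows]


# A constant puts every row in one partition, so the numbering keeps counting.
print(row_numbers("1"))
# A unique column gives one partition per row, so every row is number 1.
print(row_numbers("trade_date"))
```

Partitioning only makes sense on a column that repeats, for example a stock symbol, so that the numbering restarts once per group rather than once per row.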
I just started learning Postgres, and I'm trying to make an aggregation table that has the columns:
user_id
booking_sequence
booking_created_time
booking_paid_time
booking_price_amount
total_spent
All columns are provided, except for the booking_sequence column. I need to make a query that shows the first five flights of each user that has at least x purchases and has spent more than a certain amount of money, then sort it by the amount of money spent by the user, and then sort it by the booking sequence column.
I've tried:
select user_id,
row_number() over(partition by user_id order by user_id) as booking_sequence,
booking_created_time as booking_created_date,
booking_price_amount,
sum(booking_price_amount) as total_booking_price_amount
from fact_flight_sales
group by user_id, booking_created_time, booking_price_amount
having count(user_id) > 5
and total_booking_price_amount > 1000
order by total_booking_price_amount;
I got 0 rows when I added count(user_id) > 5, and total_booking_price_amount was not found when I added the second condition to the HAVING clause.
Edit:
I managed to make the code function correctly, for those who are curious:
select x.user_id, row_number() over(partition by x.user_id)
as booking_sequence, x.booking_created_time::date as booking_created_date, x.booking_price_amount,
sum(y.booking_price_amount) as total_booking_price_amount from
(
select user_id, booking_created_time, booking_price_amount from fact_flight_sales
group by user_id, booking_created_time, booking_price_amount
) as x
join
(
select user_id, booking_price_amount
from fact_flight_sales group by user_id, booking_price_amount
) as y
on x.user_id = y.user_id
group by x.user_id, x.booking_created_time, x.booking_price_amount
having count(x.user_id) >= 1 and sum(y.booking_price_amount) >250000
order by total_booking_price_amount desc, booking_sequence asc;
Big thanks to Laurenz for the help!
About count(user_id) > 5:
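HAVING is evaluated before window functions, so rows excluded by the HAVING clause are never seen by the window function. Because the query groups by user_id, booking_created_time and booking_price_amount, each group probably contains a single booking, so count(user_id) > 5 filters out every group.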
About total_booking_price_amount in HAVING:
You cannot use aliases from the SELECT list in the HAVING clause. You will have to repeat the expression (or use a subquery).
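The subquery route mentioned above can be sketched as follows. This is not the query from the question's edit: it swaps in per-user window aggregates inside a derived table and filters on their aliases in the outer WHERE, which is allowed because the aliases already exist at that level. It runs on Python's sqlite3 rather than Postgres (an assumption; SQLite 3.25 or later is needed), and the sample rows and the parameters min_bookings and min_total are hypothetical.

```python
import sqlite3


def first_bookings(min_bookings, min_total):
    """First five bookings of each user that passes both thresholds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_flight_sales ("
        "user_id INTEGER, booking_created_time TEXT, booking_price_amount INTEGER)"
    )
    # Hypothetical sample data: user 1 has 3 bookings, user 2 has 2, user 3 has 1.
    conn.executemany(
        "INSERT INTO fact_flight_sales VALUES (?, ?, ?)",
        [
            (1, "2021-01-01", 100), (1, "2021-01-02", 200), (1, "2021-01-03", 300),
            (2, "2021-01-01", 50), (2, "2021-01-05", 60),
            (3, "2021-01-02", 1000),
        ],
    )
    sql = """
        SELECT user_id, booking_sequence, booking_price_amount, total_spent
        FROM (
            SELECT user_id,
                   booking_price_amount,
                   ROW_NUMBER() OVER (PARTITION BY user_id
                                      ORDER BY booking_created_time) AS booking_sequence,
                   COUNT(*) OVER (PARTITION BY user_id) AS n_bookings,
                   SUM(booking_price_amount) OVER (PARTITION BY user_id) AS total_spent
            FROM fact_flight_sales
        ) AS s
        -- The aliases are visible here because they were defined one level down.
        WHERE n_bookings >= ? AND total_spent > ? AND booking_sequence <= 5
        ORDER BY total_spent DESC, user_id, booking_sequence
    """
    rows = conn.execute(sql, (min_bookings, min_total)).fetchall()
    conn.close()
    return rows


for row in first_bookings(2, 500):
    print(row)
```

Ordering the window by booking_created_time, rather than by user_id as in the first attempt, is what makes booking_sequence follow the order in which the bookings were made.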
I'm trying to create a table with multiple calculations.
I have a base table from which I would like to collect data and insert it into the new table. The first few columns are based on the original table: some are copied as they are, others are calculated.
These work fine, but the last two columns do not. Their calculation would be based on calculated fields of the new table.
Can this be solved in one step? Should I use UPDATE? As far as I know, ranking does not work with that.
INSERT INTO [RAW_NBA_TeamSimpleRating]
(
[Team]
,[Game_total]
,[ORtg_avg]
,[DRtg_avg]
,[ORtg_rank]
,[ORtg_cluster]
)
SELECT
[Team]
,[Game]
,AVG ([ORtg]) OVER (PARTITION BY Team ORDER BY RowNumber rows between 81 preceding and current row) as ORtg_avg
,AVG ([DRtg]) OVER (PARTITION BY Team ORDER BY RowNumber rows between 81 preceding and current row) as DRtg_avg
,RANK () OVER (PARTITION BY [RAW_NBA_TeamSimpleRating].[Game_total] ORDER BY [RAW_NBA_TeamSimpleRating].[ORtg_avg] Desc)
,CASE
WHEN RANK () OVER (PARTITION BY [RAW_NBA_TeamSimpleRating].[Game_total] ORDER BY [RAW_NBA_TeamSimpleRating].[ORtg_avg] DESC) > 10 THEN 'Bottom'
WHEN RANK () OVER (PARTITION BY [RAW_NBA_TeamSimpleRating].[Game_total] ORDER BY [RAW_NBA_TeamSimpleRating].[ORtg_avg] DESC) <= 10 THEN 'TOP'
END
FROM [WRK_NBA_TeamTable]
If you wrap your query in a derived table, the outer select can use the values computed by the inner select. The rank has to refer to the inner Game and ORtg_avg columns rather than to the target table, so it gets its own level, such as
select Team, Game, ORtg_avg, DRtg_avg, [Rank],
case
when [Rank] > 10 then 'Bottom'
when [Rank] <= 10 then 'TOP'
end as ORtg_cluster
from (
-- rank the averages computed one level down
select Team, Game, ORtg_avg, DRtg_avg
,Rank () over (partition by Game order by ORtg_avg desc) as [Rank]
from (
select Team, Game
,Avg (ORtg) over (partition by Team order by RowNumber rows between 81 preceding and current row) as ORtg_avg
,Avg (DRtg) over (partition by Team order by RowNumber rows between 81 preceding and current row) as DRtg_avg
from WRK_NBA_TeamTable
)a
)s
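The wrapping idea can be tried out on a small scale with the sketch below. It assumes Python's sqlite3 (SQLite 3.25 or later) instead of SQL Server, a hypothetical table wrk with made-up ratings, a two-row averaging window instead of 82 rows, and a top_n cutoff parameter in place of the fixed 10. Each derived table only uses columns produced by the level beneath it.

```python
import sqlite3


def cluster_teams(top_n):
    """Rank teams per game by their running ORtg average and label them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wrk (Team TEXT, Game INTEGER, RowNumber INTEGER, ORtg REAL)")
    # Hypothetical sample data: three teams, two games each.
    conn.executemany(
        "INSERT INTO wrk VALUES (?, ?, ?, ?)",
        [
            ("A", 1, 1, 100), ("A", 2, 2, 110),
            ("B", 1, 1, 90), ("B", 2, 2, 130),
            ("C", 1, 1, 95), ("C", 2, 2, 85),
        ],
    )
    sql = """
        SELECT Team, Game, Rnk,
               CASE WHEN Rnk <= ? THEN 'TOP' ELSE 'Bottom' END AS ORtg_cluster
        FROM (
            -- Level 2: rank the averages computed in level 1.
            SELECT Team, Game, ORtg_avg,
                   RANK() OVER (PARTITION BY Game ORDER BY ORtg_avg DESC) AS Rnk
            FROM (
                -- Level 1: running average per team.
                SELECT Team, Game,
                       AVG(ORtg) OVER (PARTITION BY Team ORDER BY RowNumber
                                       ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS ORtg_avg
                FROM wrk
            ) AS a
        ) AS s
        ORDER BY Game, Rnk
    """
    rows = conn.execute(sql, (top_n,)).fetchall()
    conn.close()
    return rows


for row in cluster_teams(1):
    print(row)
```

The same SELECT could then feed an INSERT INTO ... SELECT, so the whole load stays a single statement.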
I have a table with 3 columns: ID, Signature, and Datetime, and it's grouped by Signature having count(*) > 9.
select * from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
order by o.Signature desc, o.DateTime
I now want to select the 1st and 10th records only, per Signature. What determines rank is the Datetime descending. Thus, I would expect every Signature to have 2 rows.
Thanks,
I would go with a couple of common table expressions.
The first will select all records from the table as well as a count of records per signature, and the second one will select from the first where the record count > 9 and add row_number partitioned by signature - and then just select from that where the row_number is either 1 or 10:
With cte1 AS
(
SELECT ID, Signature, Datetime, COUNT(*) OVER(PARTITION BY Signature) As NumberOfRows
FROM #Sigs
), cte2 AS
(
SELECT ID, Signature, Datetime, ROW_NUMBER() OVER(PARTITION BY Signature ORDER BY DateTime DESC) As Rn
FROM cte1
WHERE NumberOfRows > 9
)
SELECT ID, Signature, Datetime
FROM cte2
WHERE Rn IN (1, 10)
ORDER BY Signature desc
Because I don't know what your data looks like, this might need some adjustment.
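A scaled-down sketch of the two-CTE approach, assuming Python's sqlite3 (SQLite 3.25 or later) in place of SQL Server, a regular table Sigs instead of the temp table #Sigs, and hypothetical sample rows. The parameter n plays the role of 10: a signature needs at least n rows, and rows 1 and n are returned.

```python
import sqlite3


def first_and_nth(n):
    """Return the 1st and n-th row (newest first) of each signature with >= n rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sigs (ID INTEGER, Signature TEXT, Datetime TEXT)")
    # Hypothetical sample data: signature 'a' has 3 rows, 'b' has 2.
    conn.executemany(
        "INSERT INTO Sigs VALUES (?, ?, ?)",
        [
            (1, "a", "2021-01-01"), (2, "a", "2021-01-02"), (3, "a", "2021-01-03"),
            (4, "b", "2021-01-01"), (5, "b", "2021-01-02"),
        ],
    )
    sql = """
        WITH cte1 AS (
            SELECT ID, Signature, Datetime,
                   COUNT(*) OVER (PARTITION BY Signature) AS NumberOfRows
            FROM Sigs
        ), cte2 AS (
            SELECT ID, Signature, Datetime,
                   ROW_NUMBER() OVER (PARTITION BY Signature
                                      ORDER BY Datetime DESC) AS Rn
            FROM cte1
            WHERE NumberOfRows >= ?
        )
        SELECT ID, Signature, Datetime
        FROM cte2
        WHERE Rn IN (1, ?)
        ORDER BY Signature DESC, Rn
    """
    rows = conn.execute(sql, (n, n)).fetchall()
    conn.close()
    return rows


# With n = 3 only signature 'a' qualifies; its newest and 3rd-newest rows come back.
print(first_and_nth(3))
```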
The simplest way here, since you already know your sort order (DateTime DESC) and partitioning (Signature), is probably to assign row numbers and then select the rows you want.
SELECT *
FROM
(
select o.Signature
,o.DateTime
,ROW_NUMBER() OVER (PARTITION BY o.Signature ORDER BY o.DateTime DESC) [Row]
from (
select s.Signature
from #Sigs s
group by s.Signature
having count(*) > 9
) b
join #Sigs o
on o.Signature = b.Signature
) r -- a derived table needs an alias
WHERE [Row] IN (1,10)
-- ORDER BY is not allowed inside a derived table without TOP, so it goes on the outer query
ORDER BY Signature desc, DateTime
I have the following query:
SELECT DISTINCT Summed, ROW_NUMBER () OVER (order by Summed desc) as Rank from table1
I have to write it in Apache Beam (Beam SQL). Below is my code:
PCollection<BeamRecord> rec_2_part2 = rec_2.apply(BeamSql.query("SELECT DISTINCT Summed, ROW_NUMBER(Summed) OVER (ORDER BY Summed) Rank1 from PCOLLECTION "));
But I'm getting the following error:
Caused by: java.lang.UnsupportedOperationException: Operator: ROW_NUMBER is not supported yet!
Any idea how to implement ROW_NUMBER() in Beam SQL?
Here is one way you can approximate your current query without using ROW_NUMBER:
SELECT
t1.Summed,
(SELECT COUNT(*) FROM (SELECT DISTINCT Summed FROM table1) t2
WHERE t2.Summed >= t1.Summed) AS Rank
FROM
(
SELECT DISTINCT Summed
FROM table1
) t1
The basic idea is to first subquery to get a table with only distinct Summed values. Then, use a correlated subquery to simulate the row number. This isn't a very efficient method, but if ROW_NUMBER is not available, then you're stuck with some alternative.
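The correlated-subquery idea can be checked on a toy table. The sketch below runs the same query shape through Python's sqlite3 purely to show the result; it says nothing about which constructs Beam SQL accepts, and the sample values are hypothetical. The alias Rank1 follows the question's code.

```python
import sqlite3


def rank_without_window():
    """Rank distinct Summed values (largest first) without ROW_NUMBER."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (Summed INTEGER)")
    # Hypothetical sample data with one duplicate value.
    conn.executemany("INSERT INTO table1 VALUES (?)", [(10,), (20,), (20,), (30,)])
    sql = """
        SELECT t1.Summed,
               -- Count how many distinct values are at least as large as this one.
               (SELECT COUNT(*)
                FROM (SELECT DISTINCT Summed FROM table1) t2
                WHERE t2.Summed >= t1.Summed) AS Rank1
        FROM (SELECT DISTINCT Summed FROM table1) t1
        ORDER BY Rank1
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(rank_without_window())
```

The subquery is evaluated once per distinct value, which is why the approach scales poorly compared with a real ROW_NUMBER().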
The solution which worked for the above query:
PCollection<BeamRecord> rec_2 = rec_1.apply(BeamSql.query("SELECT max(Summed) as maxed, max(Summed)-10 as least, 'a' as Dummy from PCOLLECTION"));
Since I am using DB2, in order to select a portion of a database in the middle (like a limit/offset pairing), I need to do a different kind of prepare statement. The example I was given was this:
SELECT *
FROM (SELECT col1, col2, col3, ROW_NUMBER() OVER () AS RN FROM table) AS cols
WHERE RN BETWEEN 1 AND 10000;
Which I adapted to this:
SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY 2,3,4,6,7 ASC) AS rownum FROM TRANSACTIONS) AS foo WHERE rownum >= 500 AND rownum <1000
When I call fetchall_arrayref(), I do get 500 results as intended, but it only returns an array of references to the row number, not all of the data I want to pull. I know that this is what the code is supposed to do as it's written, and I have tried a bunch of permutations to get my desired result, with no luck.
All I want is to grab all of the columns like my previous prepare statement into an array of arrays:
SELECT * FROM TU_TRANSACTIONS ORDER BY 2, 3, 4, 6, 7
but only for a designated section. There is a fundamental thing I am missing, and I just can't see it.
Any help is appreciated, even if it's paired with some constructive criticism.
Your table expression:
(SELECT ROW_NUMBER() OVER (ORDER BY 2,3,4,6,7 ASC) AS rownum FROM TRANSACTIONS) as foo
Has only one column - rownum - so when you select "*" from "foo" you get only the one column.
Your table expression needs to include all of the columns you want, just like the example you posted.
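A minimal sketch of that fix: the derived table selects every column plus the row number, so the outer SELECT * returns the data as well. It uses Python's sqlite3 (SQLite 3.25 or later) rather than DB2, and the three-column transactions table, its sample rows, and the named ordering columns are hypothetical stand-ins for the real schema.

```python
import sqlite3


def page(lo, hi):
    """Return the rows whose row number is in [lo, hi), with all columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (id INTEGER, account TEXT, amount INTEGER)")
    # Hypothetical sample data.
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [(1, "b", 5), (2, "a", 7), (3, "a", 3), (4, "c", 1), (5, "b", 2)],
    )
    sql = """
        SELECT *
        FROM (
            -- t.* carries the data columns along with the row number.
            SELECT t.*,
                   ROW_NUMBER() OVER (ORDER BY account, amount) AS rn
            FROM transactions t
        ) AS foo
        WHERE rn >= ? AND rn < ?
        ORDER BY rn
    """
    rows = conn.execute(sql, (lo, hi)).fetchall()
    conn.close()
    return rows


# Each tuple holds id, account, amount and the row number.
print(page(2, 4))
```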
I don't use DB2 so I could be off-base but it seems that:
SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY 2,3,4,6,7 ASC) AS rownum FROM TRANSACTIONS) AS foo WHERE rownum >= 500 AND rownum <1000
Would only return the row numbers, because the sub-query references the table but the main query does not. The main query would only see the set of numbers, so it returns a single column filled with them.
Perhaps this would work, selecting the table's columns inside the sub-query alongside the row number (joining TRANSACTIONS to foo with a plain comma would pair every row with every row number):
SELECT * FROM (SELECT t.*, ROW_NUMBER() OVER (ORDER BY 2,3,4,6,7 ASC) AS rownum FROM TRANSACTIONS t) AS foo WHERE rownum >= 500 AND rownum <1000