Sorting in CTE expression - tsql

I am retreiving all users from DB ordered by number of followers for each user DESC
with TH_Users as
(
SELECT [ID]
,[FullName]
,[UserName]
,[ImageName]
,dbo.GetUserFollowers(ID) AS Followers
, ROW_NUMBER() OVER (order by ID ) AS 'RowNumber'
from dbo.TH_Users
Where CultureID = #cultureID
)
Select ID,[FullName]
,[UserName]
,[ImageName], Followers from TH_Users
Where RowNumber BETWEEN #startIdx AND #endIdx
Order BY Followers DESC
I am using a function to get number of followers for each user. now is I user Followers column as the column order for ROW_NUMBER() OVER (order by Followers ) AS 'RowNumber'
I get a compilation error.
Putting Order BY Followers DESC at the end of the query will not give the intended result.
Any suggestions ?
Thanks

When you use AS to give an alias to a column, that alias is not available within the query - logically, applying aliases to columns is (almost) the very last part of evaluating a query.
So if you want your ROW_NUMBER with the CTE to be OVER what you alias as Followers, you must express it in the same terms as the column itself:
;with TH_Users as
(
SELECT [ID]
,[FullName]
,[UserName]
,[ImageName]
,dbo.GetUserFollowers(ID) AS Followers
, ROW_NUMBER() OVER (order by dbo.GetUserFollowers(ID) ) AS 'RowNumber'
from dbo.TH_Users
Where CultureID = #cultureID
)
Note that this will not cause the function to be evaluated any more times than it is currently.
(I have not tested this)

Related

How to use new created column in where column in sql?

Hi I have a query which looks like the following :
SELECT device_id, tag_id, at, _deleted, data,
row_number() OVER (PARTITION BY device_id ORDER BY at DESC) AS row_num
FROM mdb_history.devices_tags_mapping_history
WHERE at <= '2019-04-01'
AND _deleted = False
AND (tag_id = '275674' or tag_id = '275673')
AND row_num = 1
However when I run the following query, I get the following error :
ERROR: column "row_num" does not exist
Is there any way to go about this. One way I tried was to use it in the following way:
SELECT * from (SELECT device_id, tag_id, at, _deleted, data,
row_number() OVER (PARTITION BY device_id ORDER BY at DESC) AS row_num
FROM mdb_history.devices_tags_mapping_history
WHERE at <= '2019-04-01'
AND _deleted = False
AND (tag_id = '275674' or tag_id = '275673')) tag_deleted
WHERE tag_deleted.row_num = 1
But this becomes way too complicated as I do it with other queries as I have number of join and I have to select the column as stated from so it causes alot of select statement. Any smart way of doing that in a more simpler way. Thanks
You can't refer to the row_num alias which you defined in the same level of the select in your query. So, your main option here would be to subquery, where row_num would be available. But, Postgres actually has an option to get what you want in another way. You could use DISTINCT ON here:
SELECT DISTINCT ON (device_id), device_id, tag_id, at, _deleted, data
FROM mdb_history.devices_tags_mapping_history
WHERE
at <= '2019-04-01' AND
_deleted = false AND
tag_id IN ('275674', '275673')
ORDER BY
device_id,
at DESC;
Too long/ formatted for a comment. There is a reason behind #TimBiegeleisen statement "alias which you defined in the same level of the select". That reason is that all SQL statement follow the same sequence for evaluation. Unfortunately that sequence does NOT follow the sequence of clauses within the statement presentation. that sequence is in order:
from
where
group by
having
select
limits
You will notice that what actually gets selected fall well after evaluation of the where clause. Since your alias is defined within the select phase it does not exist during the where phase.

select row number for given record id with postgres row_number() function

Sry, if my question isn't new but i can't find answer. I want to find row number for given id in postgres table.I have the folowing Postgres query
"SELECT row_number() over (ORDER BY id DESC) FROM
(SELECT id, row_number() over () FROM user ORDER BY id DESC) AS sub
WHERE id = ?1"
?1 - user id
This query always return 1 for any user id, but i need that it return actual record row number. For example, if i have 50 records in my database with ids from 1 to 50 and i execute query with id = 30, i want to retun row_number = 30. Thanks in advance.
If I followed you correctly, you could just use an aggregate query:
select count(*) rn
from user
where id <= ?1
This counts how many records have an id that is smaller (or equal) to the given parameter.

How to select corresponding record alongside aggregate function with having clause

Let's say I have an orders table with customer_id, order_total, and order_date columns. I'd like to build a report that shows all customers who haven't placed an order in the last 30 days, with a column for the total amount their last order was.
This gets all of the customers who should be on the report:
select customer, max(order_date), (select order_total from orders o2 where o2.customer = orders.customer order by order_date desc limit 1)
from orders
group by 1
having max(order_date) < NOW() - '30 days'::interval
Is there a better way to do this that doesn't require a subquery but instead uses a window function or other more efficient method in order to access the total amount from the most recent order? The techniques from How to select id with max date group by category in PostgreSQL? are related, but the extra having restriction seems to stop me from using something like DISTINCT ON.
demo:db<>fiddle
Solution with row_number window function (https://www.postgresql.org/docs/current/static/tutorial-window.html)
SELECT
customer, order_date, order_total
FROM (
SELECT
*,
first_value(order_date) OVER w as last_order,
first_value(order_total) OVER w as last_total,
row_number() OVER w as row_count
FROM orders
WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
) s
WHERE row_count = 1 AND order_date < CURRENT_DATE - 30
Solution with DISTINCT ON (https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT):
SELECT
customer, order_date, order_total
FROM (
SELECT DISTINCT ON (customer)
*,
first_value(order_date) OVER w as last_order,
first_value(order_total) OVER w as last_total
FROM orders
WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
ORDER BY customer, order_date DESC
) s
WHERE order_date < CURRENT_DATE - 30
Explanation:
In both solutions I am working with the first_value window function. The window function's frame is defined by customers. The rows within the customers' groups are ordered descending by date which gives the latest row first (last_value is not working as expected every time). So it is possible to get the last order_date and the last order_total of this order.
The difference between both solutions is the filtering. I showed both versions because sometimes one of them is significantly faster
The window function style is creating a row count within the frames. Every first row can be filtered later. This is done by adding a row_number window function. The benefit of this solution comes out when you are trying to filter the first two or three data sets. You simply have to change the filter from WHERE row_count = 1 to WHERE row_count = 2
But if you want only one single row per group you just need to ensure that the expected row per group is ordered to be the first row in the group. Then the DISTINCT ON function can delete all following rows. DISTINCT ON (customer) gives the first (ordered) row per customer group.
Try to join table on itself
select o1.customer, max(order_date),
from orders o1
join orders o2 on o1.id=o2.id
group by o1.customer
having max(o1.order_date) < NOW() - '30 days'::interval
Subqueries in select is a bad idea, because DB will execute a query for each row
If you use postgres you can also try to use CTE
https://www.postgresql.org/docs/9.6/static/queries-with.html
WITH t as (
select id, order_total from orders o2 where o2.customer = orders.customer
order by order_date desc limit 1
) select o1.customer, max(order_date),
from orders o1
join t t.id=o2.id
group by o1.customer
having max(order_date) < NOW() - '30 days'::interval

Postgres json_agg Limit

I'm using json_agg in Postgres like this
json_agg((e."name",e."someOtherColum",e."createdAt") order by e."createdAt" DESC )
But I want to limit how many rows will be aggregated into JSON. I want to write something like this
json_agg((e."name",e."someOtherColum",e."createdAt") order by e."createdAt" DESC LIMIT 3)
Is it possible in some way?
This is full query
SELECT e."departmentId",
json_agg((e."name",e."someOtherColum",e."createdAt") order by e."createdAt" DESC ) as "employeeJSON"
FROM "Employee" e
GROUP BY e."departmentId"
So I want to achieve department with first three employees for each department.
You need a sub-select that returns only three rows per departmentid and then aggregate the result of that:
select "departmentId",
json_agg(("name","someOtherColum","createdAt") order by "createdAt" DESC) as "employeeJSON"
FROM (
SELECT "departmentId",
"name"
"someOtherColum",
"createdAt",
row_number() over (partition by "departmentId" order by "createdAt") as rn
FROM "Employee"
) t
WHERE rn <= 3
GROUP BY "departmentId"
Note that using quoted identifiers is in general not such a good idea. In the long run it's more trouble than they are worth it.

multiple extract() with WHERE clause possible?

So far I have come up with the below:
WHERE (extract(month FROM orders)) =
(SELECT min(extract(month from orderdate))
FROM orders)
However, that will consequently return zero to many rows, and in my case, many, because many orders exist within that same earliest (minimum) month, i.e. 4th February, 9th February, 15th Feb, ...
I know that a WHERE clause can contain multiple columns, so why wouldn't the below work?
WHERE (extract(day FROM orderdate)), (extract(month FROM orderdate)) =
(SELECT min(extract(day from orderdate)), min(extract(month FROM orderdate))
FROM orders)
I simply get: SQL Error: ORA-00920: invalid relational operator
Any help would be great, thank you!
Sample data:
02-Feb-2012
14-Feb-2012
22-Dec-2012
09-Feb-2013
18-Jul-2013
01-Jan-2014
Output:
02-Feb-2012
14-Feb-2012
Desired output:
02-Feb-2012
I recreated your table and found out you just messed up the brackets a bit. The following works for me:
where
(extract(day from OrderDate),extract(month from OrderDate))
=
(select
min(extract(day from OrderDate)),
min(extract(month from OrderDate))
from orders
)
Use something like this:
with cte1 as (
select
extract(month from OrderDate) date_month,
extract(day from OrderDate) date_day,
OrderNo
from tablename
), cte2 as (
select min(date_month) min_date_month, min(date_day) min_date_day
from cte1
)
select cte1.*
from cte1
where (date_month, date_day) = (select min_date_month, min_date_day from cte2)
A common table expression enables you to restructure your data and then use this data to do your select. The first cte-block (cte1) selects the month and the day for each of your table rows. Cte2 then selects min(month) and min(date). The last select then combines both ctes to select all rows from cte1 that have the desired month and day.
There is probably a shorter solution to that, however I like common table expressions as they are almost all the time better to understand than the "optimal, shortest" query.
If that is really what you want, as bizarre as it seems, then as a different approach you could forget the extracts and the subquery against the table to get the minimums, and use an analytic approach instead:
select orderdate
from (
select o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
from orders o
)
where rn = 1;
ORDERDATE
---------
01-JAN-14
The row_number() effectively adds a pseudo-column to every row in your original table, based on the month and day in the order date. The rn values are unique, so there will be one row marked as 1, which will be from the earliest day in the earliest month. If you have multiple orders with the same day/month, say 01-Jan-2013 and 01-Jan-2014, then you'll still only get exactly one with rn = 1, but which is picked is indeterminate. You'd need to add further order by conditions to make it deterministic, but I have no idea what you might want.
That is done in the inner query; the outer query then filters so that only the records marked with rn = 1 is returned; so you get exactly one row back from the overall query.
This also avoids the situation where the earliest day number is not in the earliest month number - say if you only had 01-Jan-2014 and 02-Feb-2014; comparing the day and month separately would look for 01-Feb-2014, which doesn't exist.
SQL Fiddle (with Thomas Tschernich's anwer thrown in too, giving the same result for this data).
To join the result against your invoice table, you don't need to join to the orders table again - especially not with a cross join, which is skewing your results. You can do the join (at least) two ways:
SELECT
o.orderno,
to_char(o.orderdate, 'DD-MM-YYYY'),
i.invno
FROM
(
SELECT o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
FROM orders o
) o, invoices i
WHERE i.invno = o.invno
AND rn = 1;
Or:
SELECT
o.orderno,
to_char(o.orderdate, 'DD-MM-YYYY'),
i.invno
FROM
(
SELECT orderno, orderdate, invno
FROM
(
SELECT o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
FROM orders o
)
WHERE rn = 1
) o, invoices i
WHERE i.invno = o.invno;
The first looks like it does more work but the execution plans are the same.
SQL Fiddle with your pastebin-supplied query that gets two rows back, and these two that get one.