can I rank results got from different WHERE clause? - postgresql

Say I want to select the posts that has certain tags or matches the keyword.
select t1.*
from (
select p.*, count(p.id) from plainto_tsquery('hElLo') AS q , post p
left join post_tag pt on pt.post_id = p.id
left join tag t on t.id = pt.tag_id
WHERE (tsv ## q) or t.id in (2,3)
group by p.id
) as t1
order by count desc, ts_rank_cd(t1.tsv, plainto_tsquery('hElLo')) desc
limit 5;
the above does select what I want. In tsv, I gave title A weight and description D weight. it now becomes pretty pointless when sorting by count because each entry has the same weight. Is it possible to do things like if this row is picked from t.id in (2,3), they get to sorted to the first, then sort by ts_rank_cd, or give each match tag 'A' weight, title become B weight and description is D?

Try CASE WHEN
select t1.*
from (
select p.*, count(p.id),
(CASE WHEN t.id in (2,3) THEN 1 ELSE 2 END) as ranking
from plainto_tsquery('hElLo') AS q , post p
left join post_tag pt on pt.post_id = p.id
left join tag t on t.id = pt.tag_id
WHERE (tsv ## q) or t.id in (2,3)
group by p.id
) as t1
order by count desc, ranking asc, ts_rank_cd(t1.tsv, plainto_tsquery('hElLo')) desc
limit 5;
Edited(Correct Answer):
select t1.*
from (
select p.*, count(p.id),
COUNT(1) filter(where t.id in (2,3)) ranking
from plainto_tsquery('hElLo') AS q , post p
left join post_tag pt on pt.post_id = p.id
left join tag t on t.id = pt.tag_id
WHERE (tsv ## q) or t.id in (2,3)
group by p.id
) as t1
order by count desc, ranking asc, ts_rank_cd(t1.tsv, plainto_tsquery('hElLo')) desc
limit 5;

I'm not sure why the count would be the same, but you can add more keys to the order by:
order by count desc,
(t.id in (2, 3)) desc,
ts_rank_cd(t1.tsv, plainto_tsquery('hElLo')) desc ;
The desc is because true > false, and you want the true values to be first.

Related

How to use DISTINCT ON in ARRAY_AGG()?

I have the following query:
SELECT array_agg(DISTINCT p.id) AS price_ids,
array_agg(p.name) AS price_names
FROM items
LEFT JOIN prices p on p.item_id = id
LEFT JOIN third_table t3 on third_table.item_id = id
WHERE id = 1;
When I LEFT JOIN the third_table all my prices are duplicated.
I'm using DISTINCT inside ARRAY_AGG() to get the ids without dups, but I want the names without dups aswell.
If I use array_agg(DISTINCT p.name) AS price_names, it will return distinct values based on the name, not the id.
I want to do something similar to array_agg(DISTINCT ON (p.id) p.name) AS price_names, but it is invalid.
How can I use DISTINCT ON inside ARRAY_AGG()?
Aggregate first, then join:
SELECT p.price_ids,
p.price_names,
t3.*
FROM items
LEFT JOIN (
SELECT pr.item_id,
array_agg(pr.id) AS price_ids,
array_agg(pr.name) AS price_names
FROM prices pr
GROUP BY pr.item_id
) p on p.item_id = items.id
LEFT JOIN third_table t3 on third_table.item_id = id
WHERE items.id = 1;
Using a lateral join might be faster if you only pick a single item:
SELECT p.price_ids,
p.price_names,
t3.*
FROM items
LEFT JOIN LATERAL (
SELECT array_agg(pr.id) AS price_ids,
array_agg(pr.name) AS price_names
FROM prices pr
WHERE pr.item_id = items.id
) p on true
LEFT JOIN third_table t3 on third_table.item_id = id
WHERE items.id = 1;

Avoiding Order By in T-SQL

Below sample query is a part of my main query. I found SORT operator in below query is consuming 30% of the cost.
To avoid SORT, there is need of creation of Indexes. Is there any other way to optimize this code.
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA
WHERE ID = r.ID
AND Status = 3
AND TableA_ID >ISNULL((
SELECT TOP 1 TableA_ID
FROM TableA
WHERE ID = r.ID
AND Status <> 3
ORDER BY T_Date DESC
), 0)
ORDER BY T_Date ASC
Looks like you can use not exists rather than the sorts. I think you'll probably get a better performance boost by use a CTE or derived table instead of the a scalar subquery.
select *
from r ... left outer join
(
select ID, min(t_date) as min_date from TableA t1
where status = 3 and not exists (
select 1 from TableA t2
where t2.ID = t1.ID
and t2.status <> 3 and t2.t_date > t1.t_date
)
group by ID
) as md on md.ID = r.ID ...
or
select *
from r ... left outer join
(
select t1.ID, min(t1.t_date) as min_date
from TableA t1 left outer join TableA t2
on t2.ID = t1.ID and t2.status <> 3
where t1.status = 3 and t1.t_date < t2.t_date
group by t1.ID
having count(t2.ID) = 0
) as md on md.ID = r.ID ...
It also appears that you're relying on an identity column but it's not clear what those values mean. I'm basically ignoring it and using the date column instead.
Try this:
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA a1
LEFT JOIN (
SELECT ID, MAX(TableA_ID) AS MaxAID
FROM TableA
WHERE Status <> 3
GROUP BY ID
) a2 ON a2.ID = a1.ID AND a1.TableA_ID > coalesce(a2.MAXAID,0)
WHERE a1.ID = r.ID AND a1.Status = 3
ORDER BY T_Date ASC
The use of TOP 1 in combination with the unexplained r alias concern me. There's almost certainly a MUCH better way to get this data into your results that doesn't involve doing this in a sub query (unless this is for an APPLY operation).

Select the last record for the date and time columns

I need to select the last record in the academic table which has two columns for date and time. When I run the query I get an error. Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
USE PCUnitTest
SELECT C.ACCOUNTNO, C.CONTACT, C.LASTNAME, C.KEY4, A.PEOPLE_ID, A.APP_STATUS, A.APP_DECISION, A.REVISION_DATE, A.REVISION_TIME
FROM ACADEMIC AS A INNER JOIN
GM.dbo.CONTACT1 AS C ON A.PEOPLE_ID = C.KEY4
WHERE A.REVISION_DATE = (SELECT TOP (1) REVISION_DATE, REVISION_TIME, PEOPLE_CODE, PEOPLE_ID, PEOPLE_CODE_ID, ACADEMIC_YEAR, ACADEMIC_TERM, ACADEMIC_SESSION, PROGRAM, DEGREE, CURRICULUM
FROM PCUnitTest.dbo.ACADEMIC
ORDER BY REVISION_DATE DESC, REVISION_TIME DESC)
You can Join the query you are using in the where
USE PowerCampusUnitTest
SELECT C.ACCOUNTNO, C.CONTACT, C.LASTNAME, C.KEY4, A.PEOPLE_ID, A.APP_STATUS, A.APP_DECISION, A.REVISION_DATE, A.REVISION_TIME
FROM ACADEMIC AS A
INNER JOIN GoldMineUnitTest.dbo.CONTACT1 AS C
ON A.PEOPLE_ID = C.KEY4
INNER JOIN (
SELECT TOP 1 A2.REVISION_DATE,A2.REVISION_TIME FROM PowerCampusUnitTest.dbo.ACADEMIC A2
ORDER BY REVISION_DATE DESC, REVISION_TIME DESC
)AS A2
ON A.REVISION_DATE = A2.REVISION_DATE AND A.REVISION_TIME = A2.REVISION_TIME
Use ROW_NUMBER()
USE PCUnitTest
SELECT
R.ACCOUNTNO, R.CONTACT, R.LASTNAME, R.KEY4, R.PEOPLE_ID, R.APP_STATUS, R.APP_DECISION, R.REVISION_DATE, R.REVISION_TIME
FROM
(
SELECT C.ACCOUNTNO, C.CONTACT, C.LASTNAME, C.KEY4, A.PEOPLE_ID, A.APP_STATUS, A.APP_DECISION, A.REVISION_DATE, A.REVISION_TIME
,ROW_NUMBER() OVER (ORDER BY A.REVISION_DATE DESC, A.REVISION_TIME DESC) RN
FROM ACADEMIC AS A INNER JOIN
GMUnitTest.dbo.CONTACT1 AS C ON A.PEOPLE_ID = C.KEY4
) R
WHERE RN=1
If you want to get the latest row for each PEOPLE_ID, then add PARTITION BY
SELECT
R.ACCOUNTNO, R.CONTACT, R.LASTNAME, R.KEY4, R.PEOPLE_ID, R.APP_STATUS, R.APP_DECISION, R.REVISION_DATE, R.REVISION_TIME
FROM
(
SELECT C.ACCOUNTNO, C.CONTACT, C.LASTNAME, C.KEY4, A.PEOPLE_ID, A.APP_STATUS, A.APP_DECISION, A.REVISION_DATE, A.REVISION_TIME
,ROW_NUMBER() OVER (PARTITION BY A.PEOPLE_ID ORDER BY A.REVISION_DATE DESC, A.REVISION_TIME DESC) RN
FROM ACADEMIC AS A INNER JOIN
GMUnitTest.dbo.CONTACT1 AS C ON A.PEOPLE_ID = C.KEY4
) R
WHERE RN=1

Faster left join with last non-empty

Table1:
Shop
Manager
Date
Table2:
Shop
Date
Sales
I need to get Table2 with Manager field from Table1. I did the following trick:
select
t1.[Shop]
,t1.[Date]
,t1.[Sum]
,t2.[Manager]
from t1
left join t2
on t1.[Shop] = t2.[Shop]
and t2.[Date] = (select max(t2.[Date]) from t2
where t2.[Shop] = t1.[Shop]
and t2.[Date] < t1.[Date])
It works, but subquerying is very slow, so I wonder if there is more elegant and fast way to do so?
Some sample data to play around: http://pastebin.com/uLN6x5JE
may seem like a round about way but join on a single condition is typically faster
select t12.[Shop], t12.[Date], t12.[Sum]
, t12.[Manager]
from
( select t1.[Shop], t1.[Date], t1.[Sum]
, t2.[Manager]
, row_number() over (partition by t2.[Shop] order by t2.[Date] desc) as rn
from t1
join t2
on t2.[Shop] = t1.[Shop]
and t1.[Date] < t1.[Date]
) as t12
where t12.rn = 1
union
select t1.[Shop], t1.[Date], t1.[Sum]
, null as [Manager]
from t1
left join t2
on t2.[Shop] = t1.[Shop]
and t1.[Date] < t1.[Date]
group by t1.[Shop], t1.[Date], t1.[Sum]
having count(*) = 1
You may get much better performance by adding a covering index on t2 if you don't already have one:
create index T2ShopDate on t2 ([Shop], [Date]) include ([Manager])
Here is a version that uses a CTE to find all maximum manager dates first and then join back to t2 to get the manager:
;with MaxDates ([Shop], [Date], [Sum], [MaxMgrDate]) as
(
select
t1.[Shop]
,t1.[Date]
,t1.[Sum]
,max(t2.[Date])
from t1
left join t2
on t2.[Shop] = t1.[Shop]
and t2.[Date] < t1.[Date]
group by
t1.[Shop]
,t1.[Date]
,t1.[Sum]
)
select
MaxDates.[Shop]
,MaxDates.[Date]
,MaxDates.[Sum]
,t2.[Manager]
from MaxDates
inner join t2
on t2.[Date] = MaxDates.[MaxMgrDate]
You might be able to remove the second join back to t2 by using row_number():
;with MaxDates ([Shop], [Date], [Sum], [Manager], [RowNum]) as
(
select
t1.[Shop]
,t1.[Date]
,t1.[Sum]
,t2.[Manager]
,row_number() over (partition by (t1.[Shop]) order by t2.[Date] desc)
from t1
left join t2
on t2.[Shop] = t1.[Shop]
and t2.[Date] < t1.[Date]
)
select *
from MaxDates
where RowNum = 1

Cannot perform an aggregate function on a subquery

Can someone help me with this query?
SELECT p.OwnerName, SUM(ru.MonthlyRent) AS PotentinalRent, SUM(
(SELECT COUNT(t.ID) * ru.MonthlyRent FROM tblTenant t
WHERE t.UnitID = ru.ID)
) AS ExpectedRent
FROM tblRentalUnit ru
LEFT JOIN tblProperty p ON p.ID = ru.PropertyID
GROUP BY p.OwnerName
I'm having problems with the second sum, it won't let me do it. Evidently SUM won't work on subqueries, but I need to calculate the expected rent (MonthlyRent if there is a tenant assigned to the RentalUnit's id, 0 of they're not). How can I make this work?
SELECT p.OwnerName, SUM(ru.MonthlyRent) AS PotentialRent, SUM(cnt) AS ExpectedRent
FROM tblRentalUnit ru
LEFT JOIN
tblProperty p
ON p.ID = ru.PropertyID
OUTER APPLY
(
SELECT COUNT(t.id) * ru.MonthlyRent AS cnt
FROM tblTenant t
WHERE t.UnitID = ru.ID
) td
GROUP BY p.OwnerName
Here's a test script to check:
WITH tblRentalUnit AS
(
SELECT 1 AS id, 100 AS MonthlyRent, 1 AS PropertyID
UNION ALL
SELECT 2 AS id, 300 AS MonthlyRent, 2 AS PropertyID
),
tblProperty AS
(
SELECT 1 AS id, 'Owner 1' AS OwnerName
UNION ALL
SELECT 2 AS id, 'Owner 2' AS OwnerName
),
tblTenant AS
(
SELECT 1 AS id, 1 AS UnitID
UNION ALL
SELECT 2 AS id, 1 AS UnitID
)
SELECT p.OwnerName, SUM(ru.MonthlyRent) AS PotentialRent, SUM(cnt) AS ExpectedRent
FROM tblRentalUnit ru
LEFT JOIN
tblProperty p
ON p.ID = ru.PropertyID
OUTER APPLY
(
SELECT COUNT(t.id) * ru.MonthlyRent AS cnt
FROM tblTenant t
WHERE t.UnitID = ru.ID
) td
GROUP BY p.OwnerName
What is the meaning of the sum of the unitMonthlyRent times the number of tenants, for some partiicular rental unit (COUNT(t.ID) * ru.MonthlyRent )?
Is it the case that all you are trying to do is see the difference between the total potential rent from all untis versus the expected rent (From only occcupied units) ? If so, then try this
Select p.OwnerName,
Sum(r.MonthlyRent) AS PotentinalRent,
Sum(Case t.Id When Null Then 0
Else r.MonthlyRent End) ExpectedRent
From tblRentalUnit r
Left Join tblTenant t
On t.UnitID = r.ID
left Join tblProperty p
On p.ID = r.PropertyID)
Group By p.OwnerName