Over Partition By not working for Weighted AVG Calc - tsql

I don't know how I am missing this, but I am sure it's from late nights!
Any help is appreciated.
Let's say we are using Northwind to calculate a weighted average:
USE NORTHWIND
Select OD.UnitPrice,OD.Quantity,
sum(OD.UnitPrice*OD.Quantity)/sum(OD.Quantity) OVER (PARTITION BY
OD.UnitPrice, OD.Quantity) as[ W-AVERAGE]
From [Order Details] OD
What am I missing? Why does SQL Server keep saying "Column 'Order Details.UnitPrice' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause"? I thought using PARTITION BY removes the need for a GROUP BY.

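Resolved: Every aggregate in the expression needs its own OVER clause. An aggregate without OVER is an ordinary grouped aggregate, so SQL Server expects a GROUP BY, which makes the error message a bit misleading. In the original query only the second SUM had OVER (PARTITION BY ...), so the first SUM was treated as a grouped aggregate. For a weighted average, the numerator and the denominator should also use the same partition:
Select OD.UnitPrice,OD.Quantity,
sum(OD.UnitPrice*OD.Quantity) OVER (PARTITION BY OD.UnitPrice, OD.Quantity)
/sum(OD.Quantity) OVER (PARTITION BY OD.UnitPrice, OD.Quantity) as [W-AVERAGE]
From [Order Details] OD
Note that every row in a partition on UnitPrice and Quantity has the same values in those columns, so this result simply equals UnitPrice. Partitioning by a grouping column such as ProductID gives a more meaningful weighted average.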
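As a rough illustration of the "one OVER clause per aggregate" rule, here is a minimal sketch that runs the same pattern in SQLite through Python's `sqlite3` module. SQLite stands in for SQL Server here, and the `order_details` table, its `product_id` column, the sample rows, and the function name `weighted_avg_per_product` are all assumptions made for the example, not part of Northwind as queried above.

```python
import sqlite3


def weighted_avg_per_product(rows):
    """Return each row plus the quantity-weighted average price of its product.

    rows: iterable of (product_id, unit_price, quantity) tuples (hypothetical data).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE order_details "
        "(product_id INTEGER, unit_price REAL, quantity INTEGER)"
    )
    con.executemany("INSERT INTO order_details VALUES (?, ?, ?)", rows)
    # Both SUMs carry their own OVER clause, and both use the same partition.
    query = """
        SELECT product_id, unit_price, quantity,
               SUM(unit_price * quantity) OVER (PARTITION BY product_id)
               / SUM(quantity) OVER (PARTITION BY product_id) AS w_average
        FROM order_details
        ORDER BY product_id, unit_price
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    sample = [(1, 10.0, 2), (1, 20.0, 6), (2, 5.0, 4)]
    for row in weighted_avg_per_product(sample):
        print(row)
```

With the sample rows, product 1 gets (10*2 + 20*6) / (2 + 6) = 17.5 on both of its rows, and product 2 gets 5.0. No GROUP BY is needed because neither SUM is a plain grouped aggregate.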

Related

How to get latest data for a column when using grouping in postgres

I am using Postgres alongside Sequelize. I have encountered a case where I need to write a custom query that groups the records by a particular field. I know that for the remaining columns that are not used for grouping, I need to use an aggregate function like SUM. But the problem is that for some columns I need to get the value from the latest row (sorted by created_at DESC). I see no function in SQL to do so. Is my only option to write subqueries, or is there a better way? Thanks.
For better understanding: if you look at the picture below, I want to group the records by address. So after the query there should be only two records, one with sydney and the other with new york. But when it comes to the distance, I want the result of the query to contain the distance from the row that was most recently created, i.e. the one with the latest created_at.
So the final two query results should be:
sydney 100 2022-09-05 18:14:53.492131+05:45
new york 40 2022-09-05 18:14:46.23328+05:45
select address, distance, created_at
from (
    select address, distance, created_at,
           row_number() over (partition by address order by created_at desc) as rn
    from table
) x
where rn = 1
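The ROW_NUMBER approach above can be tried out with a small sketch like the following, which runs the query in SQLite via Python's `sqlite3` module rather than in Postgres. The table name `places`, the function name `latest_per_address`, and the two older rows are assumptions for illustration; the two newest rows reuse the values from the expected result in the question.

```python
import sqlite3


def latest_per_address(rows):
    """Return one (address, distance, created_at) row per address: the newest one.

    rows: iterable of (address, distance, created_at) tuples. created_at is
    stored as text, which only sorts correctly here because every value uses
    the same timestamp format and UTC offset.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE places (address TEXT, distance INTEGER, created_at TEXT)")
    con.executemany("INSERT INTO places VALUES (?, ?, ?)", rows)
    query = """
        SELECT address, distance, created_at
        FROM (
            SELECT address, distance, created_at,
                   ROW_NUMBER() OVER (
                       PARTITION BY address ORDER BY created_at DESC
                   ) AS rn
            FROM places
        ) x
        WHERE rn = 1
        ORDER BY address
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    sample = [
        ("sydney", 70, "2022-09-04 10:00:00.000000+05:45"),   # hypothetical older row
        ("sydney", 100, "2022-09-05 18:14:53.492131+05:45"),
        ("new york", 15, "2022-09-03 09:30:00.000000+05:45"),  # hypothetical older row
        ("new york", 40, "2022-09-05 18:14:46.23328+05:45"),
    ]
    for row in latest_per_address(sample):
        print(row)
```

Each address keeps only the row numbered 1 within its partition, which is the most recently created one. In Postgres itself, `created_at` would normally be a timestamp column, so the text-ordering caveat in the docstring would not apply.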

Combine count and max in postgresql sql

I have a problem formulating an SQL query in PostgreSQL and am hoping to get some help here.
I have a table called visitor that contains a column called fk_employee_id. fk_employee_id contains numbers between 1 and 10, for example:
1,3,4,6,4,6,7,3,2,1,6,7,6
Now I want to find out which value is the most frequent in this column (in this case 6). I have written a query that seems to answer my question:
SELECT fk_employee_id
FROM visitor
GROUP BY fk_employee_id
ORDER BY COUNT(fk_employee_id) DESC
LIMIT 1
But this query does not give the right result if two values are tied for most frequent. So instead I tried to write a query that uses the max function, but I can't figure out how. Does anyone know how to do this?
We can use RANK here to slightly modify your current query:
WITH cte AS (
SELECT
fk_employee_id,
RANK() OVER (ORDER BY COUNT(*) DESC) rank
FROM visitor
GROUP BY fk_employee_id
)
SELECT fk_employee_id
FROM cte
WHERE rank = 1;
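To see how RANK handles ties compared with LIMIT 1, here is a small sketch that runs an equivalent query in SQLite through Python's `sqlite3` module. It is not the Postgres query verbatim: as an assumption for portability, the counting and the ranking are split into two CTEs (`counts` and `ranked`), the rank column is named `rnk`, and the function name `most_frequent_employee_ids` is made up for the example.

```python
import sqlite3


def most_frequent_employee_ids(ids):
    """Return every fk_employee_id tied for the highest count, in ascending order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE visitor (fk_employee_id INTEGER)")
    con.executemany("INSERT INTO visitor VALUES (?)", [(i,) for i in ids])
    query = """
        WITH counts AS (
            -- one row per employee id with its number of occurrences
            SELECT fk_employee_id, COUNT(*) AS n
            FROM visitor
            GROUP BY fk_employee_id
        ),
        ranked AS (
            -- RANK gives tied counts the same rank, unlike ROW_NUMBER or LIMIT 1
            SELECT fk_employee_id, RANK() OVER (ORDER BY n DESC) AS rnk
            FROM counts
        )
        SELECT fk_employee_id FROM ranked WHERE rnk = 1 ORDER BY fk_employee_id
    """
    result = [row[0] for row in con.execute(query)]
    con.close()
    return result


if __name__ == "__main__":
    # The values from the question: 6 occurs four times, more than any other id.
    print(most_frequent_employee_ids([1, 3, 4, 6, 4, 6, 7, 3, 2, 1, 6, 7, 6]))
    # A tie: 1 and 2 both occur twice, so both are returned.
    print(most_frequent_employee_ids([1, 1, 2, 2, 3]))
```

The first call returns only 6, while the second returns both 1 and 2, which is the case the LIMIT 1 version gets wrong.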

How do I sort partition data when using query binding on an SSAS cube?

I'm trying to implement various sorts as described in this article.
I have a typical Sales Measure Group partitioned by fiscal period. If I try to add an order by clause to the query it fails when processing because SSAS wraps the query into a subquery. Is there a way to prevent this from happening? How do I ensure the sort order in a case like this?
Here is the code that is generated for a partition:
SELECT *
FROM
(
SELECT *
FROM [Sales]
WHERE SaleDate between '1/1/2015' and '1/28/2015'
order by SaleDate
)
AS [Sales]
I replaced the field names with * for clarity.
SELECT TOP 100 PERCENT * FROM Sales ORDER BY SaleDate
That is not guaranteed to work. The best way to order it is to ensure the clustered index is on the column you want to order by.

Understanding a simple DISTINCT ON in postgresql

I am having a small difficulty understanding the below simple DISTINCT ON query:
SELECT DISTINCT ON (bcolor)
bcolor,
fcolor
FROM t1
ORDER BY bcolor, fcolor;
I have this table here:
What is the order of execution of the above query, and why am I getting the following result:
As I understand it, since ORDER BY is used it will display both columns in alphabetical order, and since ON is used it will return the first matching row for each duplicate, but I am still confused about how the resulting table is produced.
Can somebody take me through how exactly this query is executed?
This is an odd one since you would think that the SELECT would happen first, then the ORDER BY like any normal RDBMS, but the DISTINCT ON is special. It needs to know the order of the records in order to properly determine which records should be dropped.
So, in this case, it orders first by the bcolor, then by the fcolor. Then it determines distinct bcolors, and drops any but the first record for each distinct group.
In short, it does ORDER BY then applies the DISTINCT ON to drop the appropriate records. I think it would be most helpful to think of 'DISTINCT ON' as being special functionality that differs greatly from DISTINCT.
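The "sort first, then keep the first row of each group" behaviour described above can be mimicked in a few lines of plain Python. This is only a sketch of the semantics, not how Postgres implements DISTINCT ON; the function name `distinct_on_first` and the sample colour rows are assumptions, since the original table was an image that did not survive.

```python
from itertools import groupby


def distinct_on_first(rows, key_index=0):
    """Emulate SELECT DISTINCT ON (col) ... ORDER BY col, other for tuples.

    rows: iterable of (bcolor, fcolor) tuples with no NULL (None) values.
    """
    # Step 1: ORDER BY bcolor, fcolor (tuples sort column by column).
    ordered = sorted(rows)
    # Step 2: within each run of equal bcolor values, keep only the first row.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[key_index])]


if __name__ == "__main__":
    t1 = [
        ("red", "red"),
        ("red", "green"),
        ("blue", "blue"),
        ("blue", "red"),
        ("green", "green"),
    ]
    print(distinct_on_first(t1))
```

For the sample rows, the sorted order is blue/blue, blue/red, green/green, red/green, red/red, so the surviving rows are blue/blue, green/green and red/green. Changing the secondary sort column or direction changes which row survives for each bcolor, which is why the ORDER BY matters so much with DISTINCT ON.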
Added after initial post:
This could be done using window functions and a subquery as well:
SELECT
bcolor,
fcolor
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY bcolor ORDER BY fcolor ASC) as rownumber,
bcolor,
fcolor
FROM t1
) t2
WHERE rownumber = 1

group by date aggregate function in postgresql

I'm getting an error running this query
SELECT date(updated_at), count(updated_at) as total_count
FROM "persons"
WHERE ("persons"."updated_at" BETWEEN '2012-10-17 00:00:00.000000' AND '2012-11-07 12:25:04.082224')
GROUP BY date(updated_at)
ORDER BY persons.updated_at DESC
I get the error: ERROR: column "persons.updated_at" must appear in the GROUP BY clause or be used in an aggregate function LINE 5: ORDER BY persons.updated_at DESC
This works if I remove the date( function from the GROUP BY call; however, I'm using the date function because I want to group by date, not datetime.
Any ideas?
At the moment it is unclear what you want Postgres to return. You say it should order by persons.updated_at but you do not retrieve that field from the database.
I think, what you want to do is:
SELECT date(updated_at), count(updated_at) as total_count
FROM "persons"
WHERE ("persons"."updated_at" BETWEEN '2012-10-17 00:00:00.000000' AND '2012-11-07 12:25:04.082224')
GROUP BY date(updated_at)
ORDER BY count(updated_at) DESC -- this line changed!
Now you are explicitly telling the DB to sort by the resulting value of the COUNT aggregate. You could also use ORDER BY 2 DESC, which tells the database to sort by the second column in the result set. However, I strongly prefer stating the column explicitly for clarity.
Note that I'm currently unable to test this query, but I do think this should work.
The problem is that, because you are grouping by date(updated_at), the value of updated_at may not be unique within a group: different values of updated_at can return the same value for date(updated_at). You need to tell the database which of the possible values it should use, or alternatively order by the value returned by the GROUP BY, probably one of
SELECT date(updated_at) FROM persons GROUP BY date(updated_at)
ORDER BY date(updated_at)
or
SELECT date(updated_at) FROM persons GROUP BY date(updated_at)
ORDER BY min(updated_at)
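A minimal sketch of the first variant, grouping and ordering by the same date(updated_at) expression, run in SQLite through Python's `sqlite3` module. SQLite is only a stand-in here: it is more lenient than Postgres and would not raise the original error, so this shows the shape of the working query rather than reproducing the failure. The sample timestamps and the function name `counts_per_day` are assumptions.

```python
import sqlite3


def counts_per_day(timestamps):
    """Return (day, row_count) pairs, newest day first.

    timestamps: iterable of 'YYYY-MM-DD HH:MM:SS' strings (hypothetical data).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE persons (updated_at TEXT)")
    con.executemany("INSERT INTO persons VALUES (?)", [(t,) for t in timestamps])
    # ORDER BY uses the grouped expression, not the raw updated_at column.
    query = """
        SELECT date(updated_at) AS day, COUNT(updated_at) AS total_count
        FROM persons
        GROUP BY date(updated_at)
        ORDER BY date(updated_at) DESC
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    sample = [
        "2012-10-17 08:00:00",
        "2012-10-17 21:30:00",
        "2012-10-18 09:15:00",
    ]
    for row in counts_per_day(sample):
        print(row)
```

The two rows from 2012-10-17 collapse into a single group with a count of 2, and the ordering is well defined because each group has exactly one date(updated_at) value.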