Postgres - Using window function in grouped rows

The Postgres documentation at https://www.postgresql.org/docs/9.4/queries-table-expressions.html#QUERIES-WINDOW states:
If the query contains any window functions (...), these functions are evaluated after any grouping, aggregation, and HAVING filtering is performed. That is, if the query uses any aggregates, GROUP BY, or HAVING, then the rows seen by the window functions are the group rows instead of the original table rows from FROM/WHERE.
I don't understand the part "then the rows seen by the window functions are the group rows instead of the original table rows from FROM/WHERE". Let me use an example to explain my doubt.
Consider this ready-to-run example:
with cte as (
select 1 as primary_id, 1 as foreign_id, 10 as begins
union
select 2 as primary_id, 1 as foreign_id, 20 as begins
union
select 3 as primary_id, 1 as foreign_id, 30 as begins
union
select 4 as primary_id, 2 as foreign_id, 40 as begins
)
select foreign_id, count(*) over () as window_rows_count, count(*) as grouped_rows_count
from cte
group by foreign_id
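You may notice that the result is:
> foreign_id | window_rows_count | grouped_rows_count
> ---------: | ----------------: | -----------------:
> 1 | 2 | 3
> 2 | 2 | 1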
So if "the rows seen by the window functions are the group rows", why is window_rows_count returning a different value from grouped_rows_count?

If you remove the window function from the query:
select foreign_id, count(*) as grouped_rows_count
from cte
group by foreign_id
the result, as expected, is:
> foreign_id | grouped_rows_count
> ---------: | -----------------:
> 1 | 3
> 2 | 1
The window function count(*) over () is then applied to this 2-row result, so it returns 2: the OVER clause is empty, so there is no partition and it counts every row of the result set.
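A minimal sketch to reproduce this, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Postgres (it assumes the bundled SQLite is 3.25 or newer, the first version with window functions). SQLite also evaluates window functions after GROUP BY, so count(*) over () sees the 2 group rows while the aggregate count(*) counts the original rows in each group. The final order by is only there to make the output deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres.
conn = sqlite3.connect(":memory:")

query = """
with cte as (
    select 1 as primary_id, 1 as foreign_id, 10 as begins
    union select 2, 1, 20
    union select 3, 1, 30
    union select 4, 2, 40
)
select foreign_id,
       count(*) over () as window_rows_count,  -- counts the group rows
       count(*) as grouped_rows_count          -- counts table rows per group
from cte
group by foreign_id
order by foreign_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 2, 3)
# (2, 2, 1)
```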

You should follow the last comment on your post.
For further analysis, you can run the following query:
with cte as (
select 1 as primary_id, 1 as foreign_id, 10 as begins
union
select 2 as primary_id, 1 as foreign_id, 20 as begins
union
select 3 as primary_id, 1 as foreign_id, 30 as begins
union
select 4 as primary_id, 2 as foreign_id, 40 as begins
)
select foreign_id, count(*) over (PARTITION BY foreign_id) as window_rows_count, count(*) as grouped_rows_count
from cte
group by foreign_id ;
This time you'll see that window_rows_count is 1 for each foreign_id, because each partition contains a single group row.
Check out the PostgreSQL documentation at this URL:
https://www.postgresql.org/docs/13/tutorial-window.html
The window function is applied to the whole result set produced by the grouped query.
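The same kind of sketch (Python's sqlite3 module, an in-memory SQLite database standing in for Postgres, SQLite 3.25+ assumed) with PARTITION BY foreign_id shows why the window count drops to 1: each partition of the grouped result holds exactly one group row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

query = """
with cte as (
    select 1 as primary_id, 1 as foreign_id, 10 as begins
    union select 2, 1, 20
    union select 3, 1, 30
    union select 4, 2, 40
)
select foreign_id,
       -- each partition of the grouped result contains one group row
       count(*) over (partition by foreign_id) as window_rows_count,
       count(*) as grouped_rows_count
from cte
group by foreign_id
order by foreign_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 1, 3)
# (2, 1, 1)
```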

Related

I want to select 2 movies from the database whose total duration is less than 150

I have a problem with my SQL command. I want to select 2 movies whose combined duration is less than 150. I wrote this SQL command:
Select
movie_title,Sum(movie_time) as sum_movie
From
movie_movie
Group By
movie_title
Having
Sum(movie_time)<100
Order By
sum_movie DESC
You can get the two movies with the smallest movie_time values using order by movie_time ASC limit 2 in a CTE, and then use their total movie_time in the condition.
with two_min_movie as (
select *
from movie_movie
order by movie_time ASC limit 2
)
select *
from two_min_movie
where (select sum(movie_time) from two_min_movie) < 150
Demo in DBfiddle
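Here is a small runnable sketch of the CTE approach, using Python's sqlite3 module with an in-memory SQLite database instead of the asker's real database. The movie_movie rows below are hypothetical sample data, and the 150 limit is passed as a query parameter so the same query can be tried with different thresholds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; the real movie_movie table may have more columns.
conn.execute("create table movie_movie (movie_title text, movie_time integer)")
conn.executemany(
    "insert into movie_movie values (?, ?)",
    [("A", 90), ("B", 55), ("C", 120), ("D", 70)],
)

query = """
with two_min_movie as (
    select *
    from movie_movie
    order by movie_time asc
    limit 2                     -- the two shortest movies
)
select *
from two_min_movie
where (select sum(movie_time) from two_min_movie) < ?  -- total must be below the limit
order by movie_time
"""


def pick_two_shortest(limit):
    return conn.execute(query, (limit,)).fetchall()


print(pick_two_shortest(150))  # B (55) + D (70) = 125 < 150: both rows returned
print(pick_two_shortest(120))  # 125 is not < 120: no rows
```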

Get the ID of each row and its modulo with respect to the total number of rows in the same table in Postgres

While trying to map some data to a table, I wanted to obtain each ID of a table together with that ID modulo the total number of rows in the same table. For example, given this table:
id
--
1
3
10
12
I would like this result:
id | mod
---+----
1 | 1 <- 1 mod 4
3 | 3 <- 3 mod 4
10 | 2 <- 10 mod 4
12 | 0 <- 12 mod 4
Is there an easy way to achieve this dynamically (as in, without counting the rows beforehand or doing it in an atomic way)?
So far I've tried something like this:
SELECT t1.id, t1.id % COUNT(t1.id) mod FROM tbl t1, tbl t2 GROUP BY t1.id;
This works, but it requires both the GROUP BY and the self-join with tbl t2; otherwise the mod column is 0. That makes sense, because I think the query works by multiplying the table by itself, so each ID gets a full copy of the table in its group. I guess this is fine for small tables, but I can see it becoming problematic for larger ones.
Edit: Found another hack-ish way:
WITH total AS (
SELECT COUNT(*) cnt FROM tbl
)
SELECT t1.id, t1.id % t2.cnt mod FROM tbl t1, total t2
It is similar to the previous query, but it "collapses" the multiplication to a single row containing the precomputed count.
You can use COUNT() window function:
SELECT id,
id % COUNT(*) OVER () mod
FROM tbl;
I'm sure that the optimizer is smart enough to calculate the result of the window function only once.
See the demo.
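A minimal sketch of the COUNT() OVER () answer, run against an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25+ assumed for window function support) with the four ids from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (id integer primary key)")
conn.executemany("insert into tbl (id) values (?)", [(1,), (3,), (10,), (12,)])

rows = conn.execute("""
select id,
       id % count(*) over () as mod  -- count(*) over () is the total row count (4)
from tbl
order by id
""").fetchall()

for row in rows:
    print(row)
# (1, 1)
# (3, 3)
# (10, 2)
# (12, 0)
```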

How to normalize group by count results?

How can the results of a "group by" count be normalized by the count's sum?
For example, given:
User Rating (1-5)
----------------------
1 3
1 4
1 2
3 5
4 3
3 2
2 3
The result will be:
User Count Percentage
---------------------------
1 3 .42 (=3/7)
2 1 .14 (=1/7)
3 2 .28 (...)
4 1 .14
So for each user the number of ratings they provided is given as the percentage of the total ratings provided by everyone.
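SELECT DISTINCT ON ("user") "user", count(*) OVER (PARTITION BY "user") AS cnt,
count(*) OVER (PARTITION BY "user")::numeric / count(*) OVER () AS percentage -- ::numeric avoids integer division; "user" is quoted because user is a reserved word
FROM ratings; -- table name assumed, the question does not give one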
The count(*) OVER (PARTITION BY user) is a so-called window function. Window functions let you perform some operation over a "window" created by some "partition" which is here made over the user id. In plain and simple English: the partitioned count(*) is calculated for each distinct user value, so in effect it counts the number of rows for each user value.
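The window-function idea can be checked with a small sketch that uses Python's sqlite3 module and an in-memory SQLite database (SQLite 3.25+ assumed). SQLite has no DISTINCT ON, so this swaps in plain DISTINCT, which works here because every row of a given user carries the same window values. The table name ratings and the column name user_id are hypothetical; user_id sidesteps the reserved word user. Multiplying by 1.0 avoids integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the rows are the question's sample data.
conn.execute("create table ratings (user_id integer, rating integer)")
conn.executemany(
    "insert into ratings values (?, ?)",
    [(1, 3), (1, 4), (1, 2), (3, 5), (4, 3), (3, 2), (2, 3)],
)

query = """
select distinct
       user_id,
       count(*) over (partition by user_id) as cnt,  -- ratings per user
       count(*) over (partition by user_id) * 1.0
         / count(*) over () as percentage            -- share of all ratings
from ratings
order by user_id
"""

rows = conn.execute(query).fetchall()
for user_id, cnt, percentage in rows:
    print(user_id, cnt, round(percentage, 2))
# 1 3 0.43
# 2 1 0.14
# 3 2 0.29
# 4 1 0.14
```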
Without window functions or variables, you need to cross join the grouped subquery with a second subquery that sums the per-user counts, and then select from the result to get a set you can work with.
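SELECT
B.UserID,
B.UserCount,
C.CountAll,
Percentage=B.UserCount * 1.0 / C.CountAll -- * 1.0 avoids integer division
FROM
(
-- total number of ratings: the sum of the per-user counts
SELECT
CountAll=SUM(UserCount)
FROM
(
SELECT
UserCount=COUNT(*)
FROM
MyTable
GROUP BY
UserID
) AS A
)AS C
CROSS JOIN(
-- number of ratings per user
SELECT
UserID,
UserCount=COUNT(*)
FROM
MyTable
GROUP BY
UserID
)AS B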
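A sketch of the cross-join approach, run in an in-memory SQLite database through Python's sqlite3 module. SQLite does not accept the T-SQL alias form UserCount=COUNT(*), so the sketch uses COUNT(*) AS UserCount instead; the MyTable rows are the sample data from the question, and the Rating column name is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table MyTable (UserID integer, Rating integer)")
conn.executemany(
    "insert into MyTable values (?, ?)",
    [(1, 3), (1, 4), (1, 2), (3, 5), (4, 3), (3, 2), (2, 3)],
)

query = """
select b.UserID,
       b.UserCount,
       c.CountAll,
       b.UserCount * 1.0 / c.CountAll as Percentage  -- avoid integer division
from (
    -- total number of ratings = sum of the per-user counts
    select sum(UserCount) as CountAll
    from (select count(*) as UserCount from MyTable group by UserID) as a
) as c
cross join (
    -- number of ratings per user
    select UserID, count(*) as UserCount from MyTable group by UserID
) as b
order by b.UserID
"""

rows = conn.execute(query).fetchall()
for user_id, user_count, count_all, percentage in rows:
    print(user_id, user_count, count_all, round(percentage, 2))
```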

TSQL: Inserting missing records into table

I am stuck at this T-SQL query.
I have the table below:
Age SectioName Cost
---------------------
1 Section1 100
2 Section1 200
1 Section2 500
3 Section2 100
4 Section2 200
Let's say each section can have at most 5 ages. In the table above, some ages are missing. How do I insert the missing ages for each section (possibly without using a cursor)? The cost should be zero for the missing ages.
So after the insertion the table should look like this:
Age SectioName Cost
---------------------
1 Section1 100
2 Section1 200
3 Section1 0
4 Section1 0
5 Section1 0
1 Section2 500
2 Section2 0
3 Section2 100
4 Section2 200
5 Section2 0
EDIT1
I should have been clearer in my question. The maximum age is a dynamic value: it could be 5, 6, 10 or some other value, but it will always be less than 25.
I think I got it
;WITH tally AS
(
SELECT 1 AS r
UNION ALL
SELECT r + 1 AS r
FROM tally
WHERE r < 5 -- this value could be dynamic now
)
select n.r, t.SectionName, 0 as Cost
from (select distinct SectionName from TempFormsSectionValues) t
cross join
(select ta.r FROM tally ta) n
where not exists
(select * from TempFormsSectionValues where YearsAgo = n.r and SectionName = t.SectionName)
order by t.SectionName, n.r
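A sketch of the tally idea with the maximum age supplied as a parameter, run in an in-memory SQLite database through Python's sqlite3 module. SQLite needs WITH RECURSIVE (SQL Server does not), and the sample rows from the question are loaded into a TempFormsSectionValues table using the asker's YearsAgo and SectionName column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table TempFormsSectionValues (YearsAgo integer, SectionName text, Cost integer)"
)
conn.executemany(
    "insert into TempFormsSectionValues values (?, ?, ?)",
    [(1, "Section1", 100), (2, "Section1", 200),
     (1, "Section2", 500), (3, "Section2", 100), (4, "Section2", 200)],
)

missing_sql = """
with recursive tally(r) as (
    select 1
    union all
    select r + 1 from tally where r < ?  -- maximum age passed as a parameter
)
select n.r, t.SectionName, 0 as Cost
from (select distinct SectionName from TempFormsSectionValues) t
cross join tally n
where not exists (
    select * from TempFormsSectionValues v
    where v.YearsAgo = n.r and v.SectionName = t.SectionName
)
order by t.SectionName, n.r
"""


def missing_rows(max_age):
    return conn.execute(missing_sql, (max_age,)).fetchall()


for row in missing_rows(5):
    print(row)
```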
You can use this query to select the missing values:
select n.num, t.SectioName, 0 as Cost
from (select distinct SectioName from table1) t
cross join
(select 1 as num union select 2 union select 3 union select 4 union select 5) n
where not exists
(select * from table1 where table1.age = n.num and table1.SectioName = t.SectioName)
It creates a Cartesian product of the sections and the numbers 1 to 5, and then selects the combinations that don't exist yet. You can then use this query as the source of an INSERT into your table.
SQL Fiddle (it has order by added to check the results easier but it's not necessary for inserting).
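The NOT EXISTS query can be used directly as the source of an INSERT. This sketch runs it against the question's sample data in an in-memory SQLite database via Python's sqlite3 module and then reads the table back; SQLite evaluates the SELECT in full before inserting, so reading from table1 while inserting into it is safe here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (age integer, SectioName text, Cost integer)")
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [(1, "Section1", 100), (2, "Section1", 200),
     (1, "Section2", 500), (3, "Section2", 100), (4, "Section2", 200)],
)

# Insert every (age, section) pair from the Cartesian product that is not present yet.
conn.execute("""
insert into table1 (age, SectioName, Cost)
select n.num, t.SectioName, 0
from (select distinct SectioName from table1) t
cross join (select 1 as num union select 2 union select 3 union select 4 union select 5) n
where not exists (
    select * from table1
    where table1.age = n.num and table1.SectioName = t.SectioName
)
""")

rows = conn.execute(
    "select age, SectioName, Cost from table1 order by SectioName, age"
).fetchall()
for row in rows:
    print(row)
```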
Use the query below to generate the missing rows:
SELECT t1.Age,t1.Section,ISNULL(t2.Cost,0) as Cost
FROM
(
SELECT 1 as Age,'Section1' as Section,0 as Cost
UNION
SELECT 2,'Section1',0
UNION
SELECT 3,'Section1',0
UNION
SELECT 4,'Section1',0
UNION
SELECT 5,'Section1',0
UNION
SELECT 1,'Section2',0
UNION
SELECT 2,'Section2',0
UNION
SELECT 3,'Section2',0
UNION
SELECT 4,'Section2',0
UNION
SELECT 5,'Section2',0
) as t1
LEFT JOIN test t2
ON t1.Age=t2.Age AND t1.Section=t2.Section
ORDER BY Section,Age
SQL Fiddle
You can use the result set above to insert the missing rows, with the EXCEPT operator excluding the rows that already exist in the table:
INSERT INTO test
SELECT t1.Age,t1.Section,ISNULL(t2.Cost,0) as Cost
FROM
(
SELECT 1 as Age,'Section1' as Section,0 as Cost
UNION
SELECT 2,'Section1',0
UNION
SELECT 3,'Section1',0
UNION
SELECT 4,'Section1',0
UNION
SELECT 5,'Section1',0
UNION
SELECT 1,'Section2',0
UNION
SELECT 2,'Section2',0
UNION
SELECT 3,'Section2',0
UNION
SELECT 4,'Section2',0
UNION
SELECT 5,'Section2',0
) as t1
LEFT JOIN test t2
ON t1.Age=t2.Age AND t1.Section=t2.Section
EXCEPT
SELECT Age,Section,Cost
FROM test;
SELECT * FROM test
ORDER BY Section,Age
http://www.sqlfiddle.com/#!3/d9035/11
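A sketch of the EXCEPT approach in an in-memory SQLite database via Python's sqlite3 module. Two substitutions, named plainly: SQLite has no ISNULL function, so coalesce is used, and the hard-coded list of ten (Age, Section) pairs is built as a cross join of ages 1 to 5 with the two section names, which yields the same pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test (Age integer, Section text, Cost integer)")
conn.executemany(
    "insert into test values (?, ?, ?)",
    [(1, "Section1", 100), (2, "Section1", 200),
     (1, "Section2", 500), (3, "Section2", 100), (4, "Section2", 200)],
)

conn.execute("""
insert into test (Age, Section, Cost)
select t1.Age, t1.Section, coalesce(t2.Cost, 0) as Cost
from (
    -- all expected (Age, Section) pairs
    select a.Age, s.Section
    from (select 1 as Age union select 2 union select 3 union select 4 union select 5) a
    cross join (select 'Section1' as Section union select 'Section2') s
) as t1
left join test t2
  on t1.Age = t2.Age and t1.Section = t2.Section
except                      -- drop rows that already exist
select Age, Section, Cost
from test
""")

rows = conn.execute("select * from test order by Section, Age").fetchall()
for row in rows:
    print(row)
```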

T-SQL End of Month sum

I have a table of transactions whose primary key is a CUSTomer field plus a TXN_DATE. For two of its fields, NOM_AMOUNT and GRS_AMOUNT, I need an end-of-month SUM (not rolling, just per month; it can be 0 if there are no transactions in the month). How can I do it? I also need a 0 reported for months with no transactions.
Thank you!
If you group by the expression month(txn_date), you can calculate the sum. If you left join a table of month numbers (here a derived table) to your data on the month, you can determine which months have no records and report 0 for them (or NULL if you don't use the coalesce function).
This is the end result. I assume you can add any other columns you need to sum and adapt it to your schema.
select mnt as month
, sum(coalesce(NOM_AMOUNT ,0)) as NOM_AMOUNT_EOM
, sum(coalesce(GRS_AMOUNT ,0)) as GRS_AMOUNT_EOM
from (
select 1 as mnt
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9
union all select 10
union all select 11
union all select 12) as m
left outer join Table1 as t
on m.mnt = month(txn_date)
group by mnt
Here is the initial working sqlfiddle
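A minimal sketch of the month-table approach in an in-memory SQLite database through Python's sqlite3 module. SQLite has no month() function, so the month is taken with strftime('%m', ...) and cast to an integer, and the twelve month numbers come from a recursive CTE instead of the UNION ALL list. The Table1 rows are hypothetical. As in the original query, rows are grouped by month number only, so the same month of different years is added together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Table1 (CUST text, TXN_DATE text, NOM_AMOUNT integer, GRS_AMOUNT integer)"
)
# Hypothetical transactions: two in January, none in February, one in March.
conn.executemany(
    "insert into Table1 values (?, ?, ?, ?)",
    [("C1", "2021-01-15", 100, 110),
     ("C1", "2021-01-31", 50, 55),
     ("C2", "2021-03-10", 20, 25)],
)

rows = conn.execute("""
with recursive m(mnt) as (
    select 1 union all select mnt + 1 from m where mnt < 12  -- months 1..12
)
select m.mnt as month,
       sum(coalesce(t.NOM_AMOUNT, 0)) as NOM_AMOUNT_EOM,  -- 0 for empty months
       sum(coalesce(t.GRS_AMOUNT, 0)) as GRS_AMOUNT_EOM
from m
left outer join Table1 as t
  on m.mnt = cast(strftime('%m', t.TXN_DATE) as integer)
group by m.mnt
order by m.mnt
""").fetchall()

for row in rows:
    print(row)
```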