Window functions partition and order without subquery - postgresql

Given a simple table like so in postgres:
CREATE TABLE products (
product_id serial PRIMARY KEY,
group_id INT NOT NULL,
price DECIMAL (11, 2)
);
INSERT INTO products (product_id, group_id,price)
VALUES
(1, 1, 200),
(2, 1, 400),
(3, 1, 500),
(4, 1, 900),
(5, 2, 1200),
(6, 2, 700),
(7, 2, 700),
(8, 2, 800),
(9, 3, 700),
(10, 3, 150),
(11, 3, 200);
How do I use window functions to query the group_id and its avg_price, ordered by avg_price? So far I can only get the result via a subquery:
select * from (
select
distinct group_id,
avg(price) over (partition by group_id) avg_price
from products)
a order by avg_price desc;
But I believe there are more elegant solutions to this.

Window functions can be used in the ORDER BY clause, in addition to the SELECT clause, so the following query is valid:
SELECT
group_id,
AVG(price) OVER (PARTITION BY group_id) avg_price
FROM products
ORDER BY
AVG(price) OVER (PARTITION BY group_id) DESC;
However, given that you seem to want to use DISTINCT, I suspect that what you really want here is a GROUP BY query:
SELECT
group_id,
AVG(price) AS avg_price
FROM products
GROUP BY
group_id
ORDER BY
AVG(price) DESC;
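If you do want to keep the window function and avoid the subquery, here is a minimal sketch. It assumes PostgreSQL's rule that ORDER BY may refer to an output column by its alias, which also satisfies the requirement that a SELECT DISTINCT query orders only by expressions in its select list:

```sql
-- One row per group, highest average first, no subquery needed.
SELECT DISTINCT
    group_id,
    AVG(price) OVER (PARTITION BY group_id) AS avg_price
FROM products
ORDER BY avg_price DESC;
```

With the sample data this should list group 2 (average 850) first, then group 1 (500), then group 3 (350).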

Related

How to add number on next row based on condition (consecutive hours calculation)

I am trying to calculate consecutive hours based on the conditions below.
If an employee works continuously, with an interval of less than 1.5 hours (90 minutes) between each punch out and the next punch in, those punch hours are added up as consecutive hours.
However, if the interval between a punch out and the next punch in is more than 90 minutes, those punch hours are not added up.
I have illustrated this in a screenshot.
Here is the dataset:
select *
into #temp
from
(values
(1, 100001, '2021-12-12 23:31', '2021-12-12 23:59', '2021-12-13 00:00', 1, 0.47, 'solo/add'),
(2, 100001, '2021-12-13 00:00', '2021-12-13 03:07', '2021-12-13 03:37', 30, 3.12, 'solo/add'),
(3, 100001, '2021-12-13 03:37', '2021-12-13 07:07', '2021-12-13 23:17', 970, 3.5, 'no add'),
(4, 100001, '2021-12-13 23:17', '2021-12-13 23:59', NULL, NULL, 0.7, 'solo/add'),
(5, 100003, '2021-12-12 05:50', '2021-12-12 11:00', '2021-12-12 11:30', 30, 5.17, 'solo/add'),
(6, 100003, '2021-12-12 11:30', '2021-12-12 14:25', '2021-12-13 05:51', 926, 2.92, 'no add'),
(7, 100003, '2021-12-13 05:51', '2021-12-13 11:05', '2021-12-13 11:36', 31, 5.23, 'solo/add'),
(8, 100003, '2021-12-13 11:36', '2021-12-13 14:25', NULL, NULL, 2.81, 'solo/add')
)
t1
(id, EmployeeID, punch_start, punch_end, next_punch_start, MinuteDiff, punch_hr, Decide)
The Excel file's screenshot shows the expected output in "ConsecutiveHours" column.
In this example, there are two cases where two punch_hr values were added together (illustrated in green and bold):
0.47 + 3.12 = 3.59
5.23 + 2.81 = 8.04
I have two different employees here and id was created (ordered) by EmployeeID and punch_start asc.
How do we go about writing this logic in T-SQL?
You need to group those consecutive rows together. You can use the window function LAG() to identify where a new group starts, i.e. where Decide differs from the previous row. Once you have that, perform a cumulative sum partitioned by EmployeeID and the group:
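with cte as
(
    -- g = 1 on every row whose Decide differs from the previous row's (start of a new group)
    select *,
           g = case when Decide <> lag(Decide, 1, '') over (partition by EmployeeID
                                                            order by punch_start)
                    then 1
                    else 0
               end
    from #temp
),
cte2 as
(
    -- the running total of g numbers the groups within each employee
    select *, grp = sum(g) over (partition by EmployeeID order by punch_start)
    from cte
)
-- cumulative punch hours within each employee and group
select *,
       Hours = sum(punch_hr) over (partition by EmployeeID, grp order by punch_start)
from cte2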
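The query above relies on the precomputed MinuteDiff and Decide columns. If only the raw punches were available, they could be derived along these lines. This is a sketch under assumptions: the name gap_min is made up, the punch columns are cast to datetime2 because #temp stores them as strings, and a gap of exactly 90 minutes is treated as "add", which the question leaves open:

```sql
with gaps as
(
    -- minutes between this punch out and the same employee's next punch in
    select id, EmployeeID, punch_start, punch_end, punch_hr,
           gap_min = datediff(minute,
                              cast(punch_end as datetime2(0)),
                              cast(lead(punch_start) over (partition by EmployeeID
                                                           order by punch_start) as datetime2(0)))
    from #temp
)
-- the last punch of each employee has a NULL gap and falls through to 'solo/add'
select *,
       Decide = case when gap_min > 90 then 'no add' else 'solo/add' end
from gaps
```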

postgres aggregate subset from group by rows

I'm trying to compute users' loyalty bonus balances, where bonuses burn after half a year of inactivity. For user 1, I want the sum to consist of ord 4, 5 and 6.
create table transactions (
"user" int, -- user is a reserved word in postgres, so it must be quoted
ord int, -- transaction date replacement
amount int,
lag interval -- after previous transaction
);
insert into transactions values
(1, 1, 10, '1h'::interval),
(1, 2, 10, '.5y'::interval),
(1, 3, 10, '1h'::interval),
(1, 4, 10, '.5y'::interval),
(1, 5, 10, '.1h'::interval),
(1, 6, 10, '.1h'::interval),
(2, 1, 10, '1h'::interval),
(2, 2, 10, '.5y'::interval),
(2, 3, 10, '.1h'::interval),
(2, 4, 10, '.1h'::interval),
(3, 1, 10, '1h'::interval);
select "user", sum(
amount -- but starting from the last '.5y'::interval if any, otherwise everything counts
) from transactions group by "user"
user | sum(amount)
--------------------
1 | 30 -- (4+5+6), not 50, not 60
2 | 30 -- (2+3+4), not 40
3 | 10
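Try this (note that the query uses a table named test and a column named user_ rather than the names in the question):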
with cte as (
    select *,
           case when (lead(lag) over (partition by user_ order by ord)) >= interval '.5 year'
                then 1 else 0 end "flag"
    from test
),
cte1 as (
    select *,
           case when flag = (lag(flag, 1) over (partition by user_ order by ord)) then 0 else 1 end "flag1"
    from cte
)
select distinct on (user_) user_,
       sum(amount) over (partition by user_, grp order by ord)
from (
    select *, sum(flag1) over (partition by user_ order by ord) "grp"
    from cte1
) t1
order by user_, ord desc
It is complicated and slow, but it solves your problem.
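Is this what you're looking for?
with last_5y as (
    -- ord of the most recent half-year gap per user, if any
    select "user", max(ord) as ord
    from transactions
    where lag >= '.5y'::interval
    group by "user"
)
select t."user", sum(t.amount)
from transactions t
left join last_5y t2 on t."user" = t2."user"
-- users without a half-year gap have no row in last_5y, so everything counts
where t2.ord is null or t.ord >= t2.ord
group by t."user"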

multiple named windows in a postgres query

The postgres docs specify a window definition clause thus:
[ WINDOW window_name AS ( window_definition ) [, ...] ]
The [,...] specifies that multiple windows are possible. I find nothing else in the docs to confirm or deny it's possible. How do I make this work?
In this query, I can use either window clause on its own but I can't use both even though the syntax follows the spec:
select q.*
, min(value) over w_id as min_id_val
--, min(value) over w_kind as min_kind_val
from (
select 1 as id, 1 as kind, 3.0 as value
union select 1, 2, 1.0
union select 2, 1, 2.0
union select 2, 2, 0.5
) as q
window w_id as (partition by id)
-- ,
-- window w_kind as (partition by kind)
I can get the technical effect by not using window definitions, but that gets tiresome for a complex query where windows are re-used:
select q.*
, min(value) over (partition by id) as min_id_val
, min(value) over (partition by kind) as min_kind_val
from (
select 1 as id, 1 as kind, 3.0 as value
union select 1, 2, 1.0
union select 2, 1, 2.0
union select 2, 2, 0.5
) as q
Don't repeat the window keyword; separate the window definitions with commas:
select q.*,
       min(value) over w_id as min_id_val,
       min(value) over w_kind as min_kind_val
from (
    values
        (1, 1, 3.0),
        (1, 2, 1.0),
        (2, 1, 2.0),
        (2, 2, 0.5)
) as q(id, kind, value)
window w_id as (partition by id),
       w_kind as (partition by kind)
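As a related sketch for queries that re-use windows: the window_definition syntax in the docs also allows a definition to start with the name of an existing window, so one named window can build on another. The names w_id_by_value and pos_in_id below are made up for illustration. The derived window may not add its own PARTITION BY, and may add an ORDER BY only if the base window has none:

```sql
select q.*,
       min(value) over w_id as min_id_val,
       row_number() over w_id_by_value as pos_in_id
from (
    values
        (1, 1, 3.0),
        (1, 2, 1.0),
        (2, 1, 2.0),
        (2, 2, 0.5)
) as q(id, kind, value)
window w_id as (partition by id),
       -- re-uses the partitioning of w_id and adds an ordering
       w_id_by_value as (w_id order by value)
```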

PostgreSQL Get holes in index column

I suppose it is not easy to query a table for data that doesn't exist, but maybe there is some trick to find the holes in an integer column (rowindex).
Here is a small table to illustrate the concrete situation:
DROP TABLE IF EXISTS examtable1;
CREATE TABLE examtable1
(rowindex integer primary key, mydate timestamp, num1 integer);
INSERT INTO examtable1 (rowindex, mydate, num1)
VALUES (1, '2015-03-09 07:12:45', 1),
(3, '2015-03-09 07:17:12', 4),
(5, '2015-03-09 07:22:43', 1),
(6, '2015-03-09 07:25:15', 3),
(7, '2015-03-09 07:41:46', 2),
(10, '2015-03-09 07:42:05', 1),
(11, '2015-03-09 07:45:16', 4),
(14, '2015-03-09 07:48:38', 5),
(15, '2015-03-09 08:15:44', 2);
SELECT rowindex FROM examtable1;
With the query shown I get all used indexes listed.
But I would like to get (say) the first five missing indexes, so I can use them to insert new data at the desired rowindex.
In this concrete example the result would be 2, 4, 8, 9, 12, which are the indexes that are not used.
Is there any trick to build a query which will return n missing indexes?
In reality, such a table may contain many rows and "holes" can be anywhere.
You can do this by generating a list of all numbers using generate_series() and then check which numbers don't exist in your table.
This can either be done using an outer join:
select nr.i as missing_index
from (
select i
from generate_series(1, (select max(rowindex) from examtable1)) i
) nr
left join examtable1 t1 on nr.i = t1.rowindex
where t1.rowindex is null;
or a NOT EXISTS query:
select i
from generate_series(1, (select max(rowindex) from examtable1)) i
where not exists (select 1
from examtable1 t1
where t1.rowindex = i.i);
I have used a hardcoded lower bound for generate_series() so that you would also detect a missing rowindex that is smaller than the lowest number.
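To return only the first n missing values, as the question asks, either query can be ordered and limited. A minimal sketch based on the NOT EXISTS variant, with n = 5:

```sql
select i as missing_index
from generate_series(1, (select max(rowindex) from examtable1)) as i
where not exists (select 1
                  from examtable1 t1
                  where t1.rowindex = i)
order by i   -- smallest missing values first
limit 5;     -- n = 5
```

With the sample data this should return 2, 4, 8, 9 and 12. Note that holes above the current maximum rowindex are not reported, because the series stops at max(rowindex).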

Select all but sort by count in postgresql

I have a table myTable with a lot of columns (keep in mind that this table is very big), and one of those columns is a geometry point, which we'll call mySortColumn. I need to sort my select by the number of rows that share the same mySortColumn value.
One example could be this
myTable
id, mySortColumn
----------------
1, ASD12321F
2, ASD12321G
3, ASD12321F
4, ASD12321G
5, ASD12321H
6, ASD12321F
I have a query that does what I want; the problem is the time. Currently it takes about 30 seconds, and it looks like this:
SELECT
id,
mySortColumn
FROM
myTable
JOIN (
SELECT
mySortColumn,
ST_Y(mySortColumn) AS lat,
ST_X(mySortColumn) AS lng,
COUNT(*)
FROM myTable
GROUP BY mySortColumn
HAVING COUNT(*) > 1
) AS myPosition ON (
ST_X(myTable.mySortColumn) = myPosition.lng
AND ST_Y(myTable.mySortColumn) = myPosition.lat
)
WHERE
<some filters>
ORDER BY COUNT DESC
The result must be this:
id, mySortColumn
----------------
1, ASD12321F
3, ASD12321F
6, ASD12321F
2, ASD12321G
4, ASD12321G
5, ASD12321H
I hope you can help me.
Here you are:
select * from myTable order by count(1) over (partition by mySortColumn) desc;
For more info about the aggregate OVER () construct, have a look at:
http://www.postgresql.org/docs/9.4/static/tutorial-window.html
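One caveat, offered as a sketch rather than a tested fix: if two different mySortColumn values occur the same number of times, ordering by the count alone does not guarantee that their rows stay together. Adding tie-breakers is one way to get the layout shown in the question, assuming mySortColumn is of a type that can be sorted (PostGIS geometry should be, but that is an assumption here):

```sql
select *
from myTable
order by count(*) over (partition by mySortColumn) desc,
         mySortColumn,  -- keeps rows with the same value adjacent
         id;            -- stable order within each value
```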