PostgreSQL: Find percentages of total_films_rented

The code below gives me the following results
Early: 7738
Late: 6586
On Time: 1720
How would I take this a step further and add a third column that finds the percentages?
Here is a link to the ERD and database set-up: https://www.postgresqltutorial.com/postgresql-sample-database/
WITH t1 AS (
    SELECT *, DATE_PART('day', return_date - rental_date) AS days_rented
    FROM rental
),
t2 AS (
    SELECT rental_duration, days_rented,
           CASE WHEN rental_duration > days_rented THEN 'Early'
                WHEN rental_duration = days_rented THEN 'On Time'
                ELSE 'Late'
           END AS rental_return_status
    FROM film f, inventory i, t1
    WHERE f.film_id = i.film_id AND t1.inventory_id = i.inventory_id
)
SELECT rental_return_status, COUNT(*) AS total_films_rented
FROM t2
GROUP BY 1
ORDER BY 2 DESC;

You can use a window function with one CTE table (instead of 2):
WITH raw_status AS (
SELECT rental_duration - DATE_PART('day', return_date - rental_date) AS days_remaining
FROM rental r
JOIN inventory i ON r.inventory_id=i.inventory_id
JOIN film f on f.film_id=i.film_id
)
SELECT CASE WHEN days_remaining > 0 THEN 'Early'
WHEN days_remaining = 0 THEN 'On Time'
ELSE 'Late' END AS rental_status,
count(*),
(100*count(*))/sum(count(*)) OVER () AS percentage
FROM raw_status
GROUP BY 1;
 rental_status | count |     percentage
---------------+-------+---------------------
 Early         |  7738 | 48.2298678633757168
 On Time       |  1720 | 10.7205185739217153
 Late          |  6586 | 41.0496135627025679
(3 rows)
Disclosure: I work for EnterpriseDB (EDB)

Use a window function to get the sum of the count column (sum(count(*)) over ()), then just divide the count by that (count(*)/sum(count(*)) over ()). Multiply by 100 to make it a percentage.
psql (12.1 (Debian 12.1-1))
Type "help" for help.
testdb=# CREATE TABLE faket2 AS (
SELECT 'early' AS rental_return_status UNION ALL
SELECT 'early' UNION ALL
SELECT 'ontime' UNION ALL
SELECT 'late');
SELECT 4
testdb=# SELECT
rental_return_status,
COUNT(*) as total_films_rented,
(100*count(*))/sum(count(*)) over () AS percentage
FROM faket2
GROUP BY 1
ORDER BY 2 DESC;
 rental_return_status | total_films_rented |     percentage
----------------------+--------------------+---------------------
 early                |                  2 | 50.0000000000000000
 late                 |                  1 | 25.0000000000000000
 ontime               |                  1 | 25.0000000000000000
(3 rows)
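If two decimal places are enough, the same count-over-total idea can be wrapped in ROUND. The sketch below is only an illustration: it runs the query through Python's sqlite3 module as a stand-in for PostgreSQL (an assumption made so the example is self-contained), computes the counts in a subquery, and then divides each count by SUM(...) OVER (). The function name status_percentages is made up for this example; the faket2 table mirrors the one in the session above. It needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

def status_percentages(statuses):
    """Return (status, count, percentage) rows, largest count first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE faket2 (rental_return_status TEXT)")
    conn.executemany("INSERT INTO faket2 VALUES (?)", [(s,) for s in statuses])
    rows = conn.execute(
        """
        SELECT rental_return_status,
               total_films_rented,
               -- 100.0 forces floating-point division in SQLite
               ROUND(100.0 * total_films_rented
                     / SUM(total_films_rented) OVER (), 2) AS percentage
        FROM (SELECT rental_return_status, COUNT(*) AS total_films_rented
              FROM faket2
              GROUP BY rental_return_status) AS counts
        ORDER BY total_films_rented DESC, rental_return_status
        """
    ).fetchall()
    conn.close()
    return rows

rows = status_percentages(["early", "early", "ontime", "late"])
for row in rows:
    print(row)
```

In PostgreSQL the 100.0 is not strictly needed, because sum() over a bigint count already returns numeric, as the output above shows.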

Related

MySQL group by timestamp difference

I need to write a MySQL query that groups results by the difference between timestamps.
Is it possible?
I have a table of locations; every row has a created_at timestamp, and I want to split the results into groups wherever the gap between rows is > 1 min.
Example:
id | lat | lng | created_at
1. | ... | ... | 2020-05-03 06:11:35
2. | ... | ... | 2020-05-03 06:11:37
3. | ... | ... | 2020-05-03 06:11:46
4. | ... | ... | 2020-05-03 06:12:48
5. | ... | ... | 2020-05-03 06:12:52
Result of this data should be 2 groups (1,2,3) and (4,5)

It depends on what you actually want. If you want to group together records that belong to the same minute, regardless of the difference from the previous record, then simple aggregation is enough:
select
date_format(created_at, '%Y-%m-%d %H:%i:00') date_minute,
min(id) min_id,
max(id) max_id,
min(created_at) min_created_at,
max(created_at) max_created_at,
count(*) no_records
from mytable
group by date_minute
On the other hand, if you want to build groups of consecutive records with less than a 1-minute gap between them, this is a gaps-and-islands problem. Here is one way to solve it using window functions (available in MySQL 8.0):
select
min(id) min_id,
max(id) max_id,
min(created_at) min_created_at,
max(created_at) max_created_at,
count(*) no_records
from (
select
t.*,
sum(case when created_at < lag_created_at + interval 1 minute then 0 else 1 end)
over(order by created_at) grp
from (
select
t.*,
lag(created_at) over(order by created_at) lag_created_at
from mytable t
) t
) t
group by grp
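To make the gaps-and-islands logic easier to follow, here is a small Python sketch of the same rule outside the database. It is not part of the MySQL solution; the function name group_by_gap and the tuple layout are assumptions for illustration. A row joins the current group only when its timestamp is less than one minute after the previous row's, which mirrors the CASE expression inside the running sum.

```python
from datetime import datetime, timedelta

def group_by_gap(rows, max_gap=timedelta(minutes=1)):
    """Split (id, created_at) rows into islands of consecutive rows.

    A row whose gap to the previous row is >= max_gap starts a new island.
    """
    groups = []
    previous = None
    for row_id, created_at in sorted(rows, key=lambda r: r[1]):
        # The first row always opens a group (its "lag" is missing).
        if previous is None or created_at >= previous + max_gap:
            groups.append([])
        groups[-1].append(row_id)
        previous = created_at
    return groups

# Sample data from the question.
sample = [
    (1, datetime(2020, 5, 3, 6, 11, 35)),
    (2, datetime(2020, 5, 3, 6, 11, 37)),
    (3, datetime(2020, 5, 3, 6, 11, 46)),
    (4, datetime(2020, 5, 3, 6, 12, 48)),
    (5, datetime(2020, 5, 3, 6, 12, 52)),
]
groups = group_by_gap(sample)
print(groups)  # [[1, 2, 3], [4, 5]]
```

Rows 3 and 4 are 62 seconds apart, so row 4 opens the second group.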

How to force query to return only first row from window?

I have data:
id | price | date
1 | 25 | 2019-01-01
2 | 35 | 2019-01-01
1 | 27 | 2019-02-01
2 | 37 | 2019-02-01
Is it possible to write a query that returns only the first row from each window? Something like LIMIT 1, but for the window OVER( date )?
I expect the following result:
id | price | date
1 | 25 | 2019-01-01
1 | 27 | 2019-02-01
Or ignore the whole window if its first row has a NULL price:
id | price | date
1 | NULL | 2019-01-01
2 | 35 | 2019-01-01
1 | 27 | 2019-02-01
2 | 37 | 2019-02-01
result:
1 | 27 | 2019-02-01

Order the rows by date and id, and take only the first row per date.
Then remove those where the price is NULL.
SELECT *
FROM (SELECT DISTINCT ON (date)
id, price, date
FROM mytable
ORDER BY date, id
) AS q
WHERE price IS NOT NULL;
@Laurenz, let me provide a bit more explanation.
select distinct on (<fldlist>) * from <table> order by <fldlist+>;
is equivalent to this much more complex query:
select * from (
select row_number() over (partition by <fldlist> order by <fldlist+>) as rn,*
from <table>) as t
where rn = 1;
Here <fldlist> must be a leading part of (or equal to) <fldlist+>.
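The row_number() form can be tried on the question's data without a PostgreSQL server. The sketch below uses Python's sqlite3 module as a stand-in (an assumption for illustration; SQLite has no DISTINCT ON, which is why the row_number() variant is used, and it needs SQLite 3.25 or newer). The helper name first_row_per_date is hypothetical.

```python
import sqlite3

def first_row_per_date(rows):
    """Keep the lowest-id row per date, then drop it if its price is NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, price INTEGER, date TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT id, price, date
        FROM (SELECT id, price, date,
                     ROW_NUMBER() OVER (PARTITION BY date ORDER BY id) AS rn
              FROM mytable) AS q
        WHERE rn = 1 AND price IS NOT NULL
        ORDER BY date
        """
    ).fetchall()
    conn.close()
    return result

plain = first_row_per_date([
    (1, 25, "2019-01-01"), (2, 35, "2019-01-01"),
    (1, 27, "2019-02-01"), (2, 37, "2019-02-01"),
])
with_null = first_row_per_date([
    (1, None, "2019-01-01"), (2, 35, "2019-01-01"),
    (1, 27, "2019-02-01"), (2, 37, "2019-02-01"),
])
print(plain)      # [(1, 25, '2019-01-01'), (1, 27, '2019-02-01')]
print(with_null)  # [(1, 27, '2019-02-01')]
```

Filtering on price after picking row 1 is what discards the whole 2019-01-01 window in the second case, rather than falling back to the row with id 2.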

As Myon on IRC said:
if you want to use a window function in WHERE, you need to put it into a subselect first
So the target query is:
select * from (
select
*,
agg_function( my_field ) OVER( PARTITION BY other_field ) as agg_field
from sometable
) x
WHERE agg_field <condition>
In my case the query is:
SELECT * FROM (
SELECT *,
FIRST_VALUE( p.price ) over( PARTITION BY crate.app_period ORDER BY st.DEPTH ) AS first_price,
ROW_NUMBER() over( PARTITION BY crate.app_period ORDER BY st.DEPTH ) AS row_number
FROM st
LEFT JOIN price p ON <COND>
LEFT JOIN currency_rate crate ON <COND>
) p
WHERE p.row_number = 1 AND p.first_price IS NOT null
Here I select only the first row of each group, and only when its price IS NOT NULL.

How do I join multiple select results into a single table?

I have a query that returns monthly averages from a table for one pressure_level value:
SELECT some_id, avg(exposure_value) monthly_avg_1000
FROM mytable
WHERE pressure_level = 1000
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
I then have the same query, but for a different pressure_level:
SELECT some_id, avg(exposure_value) monthly_avg_925
FROM mytable
WHERE pressure_level = 925
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
Both queries return 12 rows (1 per month) with the ID and the average value for the month:
some_id | monthly_avg_1000
--------------------------
1 | 0.000023
1 | 0.000051
1 | 0.000009
some_id | monthly_avg_925
--------------------------
1 | 0.000014
1 | 0.000007
1 | 0.000131
I would like to combine the two queries so that the monthly_avg_* columns all appear in the final table:
some_id | monthly_avg_1000 | monthly_avg_925
--------------------------
1 | 0.000023 | 0.000014
1 | 0.000051 | 0.000007
1 | 0.000009 | 0.000131
How can I do this?

If both queries share the same id, you can join them on the id and the month:
with a as (
SELECT some_id, avg(exposure_value) monthly_avg_1000,date_trunc('month', measurement_time) d
FROM mytable
WHERE pressure_level = 1000
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
)
, b as (
SELECT some_id, avg(exposure_value) monthly_avg_925, date_trunc('month', measurement_time) d
FROM mytable
WHERE pressure_level = 925
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
)
select distinct a.some_id, monthly_avg_1000,monthly_avg_925
from a
join b on a.some_id = b.some_id and a.d = b.d

Iterate through rows, compare them against each other and store results in another table

I have a table that contains the following rows:
product_id | order_date
A | 12/04/12
A | 01/11/13
A | 01/21/13
A | 03/05/13
B | 02/14/13
B | 03/09/13
What I now need is a monthly overview showing how many products were bought for the first time (= not bought the month before), how many are existing products (= bought the month before), and how many were not purchased in a given month. Taking the sample above as input, the script should deliver the following result, regardless of the period the data covers:
month | new | existing | nopurchase
12/2012 | 1 | 0 | 0
01/2013 | 0 | 1 | 0
02/2013 | 1 | 0 | 1
03/2013 | 1 | 1 | 0
Would be great to get a first hint how this could be solved so I'm able to continue.
Thanks!

with t as (
    select product_id pid, date_trunc('month', order_date)::date od
    from t
    group by 1, 2
)
select od,
    sum(is_new::integer) "new",
    sum(is_existing::integer) existing,
    sum(not_purchased::integer) nopurchase
from (
    select od,
        lag(t_pid) over(partition by s_pid order by od) is null and t_pid is not null is_new,
        lag(t_pid) over(partition by s_pid order by od) is not null and t_pid is not null is_existing,
        lag(t_pid) over(partition by s_pid order by od) is not null and t_pid is null not_purchased
    from (
        select t.pid t_pid, s.pid s_pid, s.od
        from
            t
            right join
            (
                select pid, s.od
                from
                    t
                    cross join
                    (
                        select date_trunc('month', d)::date od
                        from
                            generate_series(
                                (select min(od) from t),
                                (select max(od) from t),
                                '1 month'
                            ) s(d)
                    ) s
                group by pid, s.od
            ) s on t.od = s.od and t.pid = s.pid
    ) s
) s
group by 1
order by 1
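The classification rule the query applies can be checked in a few lines of Python. This is only a sketch of the logic, not a replacement for the SQL: for every month between the first and last order, a product counts as new if it was bought this month but not the month before, existing if it was bought in both, and nopurchase if it was bought the month before but not this month. The function name monthly_overview and the month encoding are assumptions for illustration.

```python
from datetime import date

def monthly_overview(orders):
    """orders: non-empty iterable of (product_id, order_date).

    Returns (month_label, new, existing, nopurchase) per calendar month.
    """
    # Distinct (product, month) pairs; a month is encoded as year * 12 + month - 1.
    bought = {(pid, d.year * 12 + d.month - 1) for pid, d in orders}
    months = [m for _, m in bought]
    overview = []
    for m in range(min(months), max(months) + 1):
        now = {pid for pid, mm in bought if mm == m}
        before = {pid for pid, mm in bought if mm == m - 1}
        label = f"{m % 12 + 1:02d}/{m // 12}"
        overview.append(
            (label, len(now - before), len(now & before), len(before - now))
        )
    return overview

# Sample data from the question.
orders = [
    ("A", date(2012, 12, 4)), ("A", date(2013, 1, 11)),
    ("A", date(2013, 1, 21)), ("A", date(2013, 3, 5)),
    ("B", date(2013, 2, 14)), ("B", date(2013, 3, 9)),
]
overview = monthly_overview(orders)
for line in overview:
    print(line)
```

On the question's sample this reproduces the expected table, including product A counting as new again in 03/2013 after skipping 02/2013.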

How do I get min, median and max from my query in postgresql?

I have written a query in which one column is a month. From that I have to get min month, max month, and median month. Below is my query.
select ext.employee,
pl.fromdate,
ext.FULL_INC as full_inc,
prevExt.FULL_INC as prevInc,
(extract(year from age (pl.fromdate))*12 +extract(month from age (pl.fromdate))) as month,
case
when prevExt.FULL_INC is not null then (ext.FULL_INC -coalesce(prevExt.FULL_INC,0))
else 0
end as difference,
(case when prevExt.FULL_INC is not null then (ext.FULL_INC - prevExt.FULL_INC) / prevExt.FULL_INC*100 else 0 end) as percent
from pl_payroll pl
inner join pl_extpayfile ext
on pl.cid = ext.payrollid
and ext.FULL_INC is not null
left outer join pl_extpayfile prevExt
on prevExt.employee = ext.employee
and prevExt.cid = (select max (cid) from pl_extpayfile
where employee = prevExt.employee
and payrollid = (
select max(p.cid)
from pl_extpayfile,
pl_payroll p
where p.cid = payrollid
and pl_extpayfile.employee = prevExt.employee
and p.fromdate < pl.fromdate
))
and coalesce(prevExt.FULL_INC, 0) > 0
where ext.employee = 17
and (exists (
select employee
from pl_extpayfile preext
where preext.employee = ext.employee
and preext.FULL_INC <> ext.FULL_INC
and payrollid in (
select cid
from pl_payroll
where cid = (
select max(p.cid)
from pl_extpayfile,
pl_payroll p
where p.cid = payrollid
and pl_extpayfile.employee = preext.employee
and p.fromdate < pl.fromdate
)
)
)
or not exists (
select employee
from pl_extpayfile fext,
pl_payroll p
where fext.employee = ext.employee
and p.cid = fext.payrollid
and p.fromdate < pl.fromdate
and fext.FULL_INC > 0
)
)
order by employee,
ext.payrollid desc
If that is not possible, is it at least possible to get the max month and min month?

To calculate the median in PostgreSQL, simply take the 50th percentile (no extra functions needed; percentile_cont is built in since PostgreSQL 9.4):
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY x) FROM t;
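For readers wondering what the continuous percentile returns when the row count is even, here is a rough Python sketch of linear interpolation between the two neighbouring sorted values, which is how percentile_cont behaves. The function below is a hypothetical re-implementation for illustration, not PostgreSQL's code; the salaries are the four values from the dummystats example elsewhere in this thread.

```python
import statistics

def percentile_cont(values, fraction):
    """Linearly interpolated percentile of a non-empty list, 0 <= fraction <= 1."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)   # fractional 0-based row position
    lower = int(position)                      # row at or below that position
    upper = min(lower + 1, len(ordered) - 1)   # next row, clamped at the end
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight

salaries = [5200, 4200, 5555, 9999999]
median_salary = percentile_cont(salaries, 0.5)
print(min(salaries), max(salaries), median_salary)  # 4200 9999999 5377.5
```

With four rows the position falls halfway between 5200 and 5555, so the result is their average, the same value statistics.median gives.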

You want the aggregate functions named min and max. See the PostgreSQL documentation and tutorial:
http://www.postgresql.org/docs/current/static/tutorial-agg.html
http://www.postgresql.org/docs/current/static/functions-aggregate.html
There's no built-in median in PostgreSQL, however one has been implemented and contributed to the wiki:
http://wiki.postgresql.org/wiki/Aggregate_Median
It's used the same way as min and max once you've loaded it. Being written in PL/PgSQL it'll be a fair bit slower, but there's even a C version there that you could adapt if speed was vital.
UPDATE After comment:
It sounds like you want to show the statistical aggregates alongside the individual results. You can't do this with a plain aggregate function because you can't reference columns not in the GROUP BY in the result list.
You will need to fetch the stats from subqueries, or use your aggregates as window functions.
Given dummy data:
CREATE TABLE dummystats ( depname text, empno integer, salary integer );
INSERT INTO dummystats(depname,empno,salary) VALUES
('develop',11,5200),
('develop',7,4200),
('personell',2,5555),
('mgmt',1,9999999);
... and after adding the median aggregate from the PG wiki:
You can do this with an ordinary aggregate:
regress=# SELECT min(salary), max(salary), median(salary) FROM dummystats;
 min  |   max   |        median
------+---------+----------------------
 4200 | 9999999 | 5377.5000000000000000
(1 row)
but not this:
regress=# SELECT depname, empno, min(salary), max(salary), median(salary)
regress-# FROM dummystats;
ERROR: column "dummystats.depname" must appear in the GROUP BY clause or be used in an aggregate function
because it doesn't make sense in the aggregation model to show the averages alongside individual values. You can show groups:
regress=# SELECT depname, min(salary), max(salary), median(salary)
regress-# FROM dummystats GROUP BY depname;
  depname  |   min   |   max   |        median
-----------+---------+---------+-----------------------
 personell |    5555 |    5555 | 5555.0000000000000000
 develop   |    4200 |    5200 | 4700.0000000000000000
 mgmt      | 9999999 | 9999999 |  9999999.000000000000
(3 rows)
... but it sounds like you want the individual values. For that, you must use a window, a feature new in PostgreSQL 8.4.
regress=# SELECT depname, empno,
min(salary) OVER (),
max(salary) OVER (),
median(salary) OVER ()
FROM dummystats;
  depname  | empno | min  |   max   |        median
-----------+-------+------+---------+-----------------------
 develop   |    11 | 4200 | 9999999 | 5377.5000000000000000
 develop   |     7 | 4200 | 9999999 | 5377.5000000000000000
 personell |     2 | 4200 | 9999999 | 5377.5000000000000000
 mgmt      |     1 | 4200 | 9999999 | 5377.5000000000000000
(4 rows)
See also:
http://www.postgresql.org/docs/current/static/tutorial-window.html
http://www.postgresql.org/docs/current/static/functions-window.html

One more option for the median (with an even row count this returns the upper of the two middle values):
SELECT x
FROM t
ORDER BY x
LIMIT 1 OFFSET (SELECT count(*) FROM t) / 2;

To find the median:
For instance, suppose the table has 6000 rows. The median is the middle value, so first take half of the rows from the sorted table: half of 6000 is 3000, and taking 3001 rows captures both middle values. Averaging those two values gives the median.
SELECT *
FROM (SELECT column_name
FROM Table_name
ORDER BY column_name
LIMIT 3001) AS Table1
ORDER BY column_name DESC -- descending order puts the last two of the 3001 rows first,
-- so LIMIT 2 returns the 3000th and 3001st rows of the 6000
LIMIT 2;