Calculating ratios in PostgreSQL

I am new to PostgreSQL and I am trying to calculate a ratio in a table like this:
class | phase
------+------
a     | sold
b     | stock
c     | idle
d     | sold
I want to calculate the total count of sold phases / total like this:
2/4 = 50%
I was trying:
with t as ( select count(class) as total_sold from table where phase='sold')
select total_sold / count(*) from t
group by total_sold
but the result is wrong. How can I do this?

Use the AVG() aggregate function:
SELECT 100 * AVG((phase = 'sold')::int) AS avg_sold
FROM tablename;
The boolean expression phase = 'sold' is cast to the integer 1 for true or 0 for false, and the average of these values is the ratio you want.
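The cast-and-average trick is easy to check outside SQL; a minimal Python sketch of the same arithmetic, using the four rows from the question:

```python
# Mimic (phase = 'sold')::int followed by AVG(): cast each boolean
# to 1 or 0 and average the result to get the sold ratio.
phases = ["sold", "stock", "idle", "sold"]

flags = [1 if phase == "sold" else 0 for phase in phases]  # (phase = 'sold')::int
avg_sold = 100 * sum(flags) / len(flags)                   # 100 * AVG(...)
print(avg_sold)  # 50.0
```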

Related

Get the ID of a table and its value modulo the total number of rows in the same table in Postgres

While trying to map some data to a table, I wanted to obtain the ID of each row and its value modulo the total number of rows in the same table. For example, given this table:
id
--
1
3
10
12
I would like this result:
id | mod
---+----
1 | 1 <- 1 mod 4
3 | 3 <- 3 mod 4
10 | 2 <- 10 mod 4
12 | 0 <- 12 mod 4
Is there an easy way to achieve this dynamically (as in, not counting the rows beforehand or doing it in an atomic way)?
So far I've tried something like this:
SELECT t1.id, t1.id % COUNT(t1.id) mod FROM tbl t1, tbl t2 GROUP BY t1.id;
This works, but you must have the GROUP BY and the extra tbl t2; otherwise it returns 0 for the mod column. That makes sense, because the cross join multiplies the table by itself so each ID gets a full copy of the table. For small enough tables this is fine, but I can see how it becomes problematic for larger ones.
Edit: Found another hack-ish way:
WITH total AS (
SELECT COUNT(*) cnt FROM tbl
)
SELECT t1.id, t1.id % t2.cnt mod FROM tbl t1, total t2
It's similar to the previous query, but it "collapses" the multiplication into a single row holding the previous count.
You can use the COUNT() window function:
SELECT id,
       id % COUNT(*) OVER () AS mod
FROM tbl;
I'm sure that the optimizer is smart enough to calculate the result of the window function only once.
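Outside SQL, the same computation is a single pass once the total is known; a small Python sketch with the sample ids:

```python
# COUNT(*) OVER () attaches the total row count to every row,
# so each id can be taken modulo that total in a single scan.
ids = [1, 3, 10, 12]

total = len(ids)                        # COUNT(*) OVER ()
result = [(i, i % total) for i in ids]  # id % total
print(result)  # [(1, 1), (3, 3), (10, 2), (12, 0)]
```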

How to get a percentage from two different columns in postgres?

My table is in Postgres and I have this table1 in my database:
category | status | value
---------+--------+------
type a   | open   | 4
type a   | close  | 5
type b   | open   | 3
type b   | close  | 5
type c   | open   | 2
type c   | close  | 4
and I want to calculate the percentage of open status for each category.
The formula is:
% type x (open) = (open / (open + close)) * 100
with the query, I expect to get:
category percentage
type a 44,44%
type b 60%
type c 50%
How can I get the desired result with query?
Thanks in advance.
You can create a window that partitions your data on category as follows:
window w as ( partition by category )
Then you can aggregate over that window to get the number of open per category using the defined window:
sum(value) filter (where status = 'open') over w
In the same way you get the total per category using the defined window; the nullif is there to avoid division by 0:
nullif(sum(value) over w, 0)
To put it all together:
select distinct on (category)
category,
100 * sum(value) filter (where status = 'open') over w / nullif(sum(value) over w, 0) as percentage
from your_table
window w as ( partition by category );
As we are using window functions rather than grouping, we need to remove duplicates by adding the distinct on (category).
I think aggregates would be most efficient:
SELECT category,
100.0 * -- use a decimal point to force floating point arithmetic
sum(value) FILTER (WHERE status = 'open') /
nullif(sum(value), 0) AS percentage -- avoid division by zero
FROM your_table
GROUP BY category;
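For reference, the same arithmetic in plain Python (note that with the sample rows the formula gives 37.5% for type b and about 33.33% for type c, not the 60% and 50% shown in the question):

```python
from collections import defaultdict

# (category, status, value) rows from the question
rows = [
    ("type a", "open", 4), ("type a", "close", 5),
    ("type b", "open", 3), ("type b", "close", 5),
    ("type c", "open", 2), ("type c", "close", 4),
]

open_sum = defaultdict(int)   # sum(value) FILTER (WHERE status = 'open')
total_sum = defaultdict(int)  # sum(value)
for category, status, value in rows:
    total_sum[category] += value
    if status == "open":
        open_sum[category] += value

# 100.0 * open / total; skip empty totals, as nullif(..., 0) would
percentage = {c: 100.0 * open_sum[c] / total_sum[c]
              for c in total_sum if total_sum[c]}
print(percentage)
```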

PostgreSQL MIN and MAX function doesn't return one result

In a PostgreSQL DB, I have a table called Trip.
There is a column called id and a column called meta in the table.
An example of id in one row looks like:
id = 123456
An example of meta in one row looks like:
meta = {"runTime": 3922000, "distance": 85132, "duration": 4049000, "fuelUsed": 19.595927498516176}
I want to select the trip that has the minimum kph from the Trip table and show the trip id and its minimum kph. This is my query:
select tp."id" tripid, MIN((3600 * (tp."meta"->>'distance')::numeric)
/ ((tp."meta"->>'runTime')::NUMERIC)) minkph FROM "Trip" tp
WHERE tp."createdAt" BETWEEN '2020-04-01 00:00:00+00'
and '2020-04-30 00:00:00+00'
GROUP BY tp."id"
However, this query returns every trip's id and calculated value, not just one row.
Could you please help?
You can order by the calculated kph field and return only the first:
select tp."id" tripid, MIN((3600 * (tp."meta"->>'distance')::numeric)
/ ((tp."meta"->>'runTime')::NUMERIC)) minkph FROM "Trip" tp
WHERE tp."createdAt" BETWEEN '2020-04-01 00:00:00+00'
and '2020-04-30 00:00:00+00'
GROUP BY tp."id"
order by 2
limit 1
Approach 1 - General min by id:
You are selecting the column tp."id" in your query, so your SELECT runs the MIN() for every group of id. If you want the global MIN() for your query, just do this:
SELECT MIN((3600 * (tp."meta"->>'distance')::numeric) / ((tp."meta"->>'runTime')::NUMERIC)) minkph
FROM "Trip" tp
WHERE tp."createdAt" BETWEEN '2020-04-01 00:00:00+00'
AND '2020-04-30 00:00:00+00'
Aggregate functions group over sets of distinct data; if you don't select any column other than the MIN(), the query returns a single global result computed over all rows.
Approach 2 - General min:
If you want to get the MIN() and the corresponding id, you can order by the computed value and add LIMIT 1, as follows:
SELECT tp."id" AS tripid, ((3600 * (tp."meta"->>'distance')::numeric) / ((tp."meta"->>'runTime')::NUMERIC)) minkph
FROM "Trip" tp
WHERE tp."createdAt" BETWEEN '2020-04-01 00:00:00+00'
AND '2020-04-30 00:00:00+00'
ORDER BY 2
LIMIT 1
Incidentally, you could also use window functions, but that is a bit more complex.
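The same selection expressed outside the database, with hypothetical rows mirroring the question's schema (the ids and the second row's meta values are made up for illustration):

```python
import json

# Hypothetical (id, meta) rows; meta is stored as JSON text
trips = [
    (123456, '{"runTime": 3922000, "distance": 85132}'),
    (123457, '{"runTime": 3600000, "distance": 50000}'),
]

def kph(meta_json):
    meta = json.loads(meta_json)
    # same arithmetic as the SQL: 3600 * distance / runTime
    return 3600 * meta["distance"] / meta["runTime"]

# ORDER BY the computed value, LIMIT 1  ->  min() with a key function
trip_id, meta = min(trips, key=lambda t: kph(t[1]))
print(trip_id, kph(meta))  # 123457 50.0
```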

SQL to select users into groups based on group percentage

To keep this simple, let's say I have a table with 100 records that include:
userId
pointsEarned
I would like to group these 100 records (or whatever the total is based on other criteria) into several groups as follows:
Group 1, 15% of total records
Group 2, 25% of total records
Group 3, 10% of total records
Group 4, 10% of total records
Group 5, 40% (remaining of total records, percentage doesn't really matter)
In addition to the above, there will be a minimum of 3 groups and a maximum of 5 groups with varying percentages that always total 100%. If it makes it easier, the last group will always be the remainder not picked by the other groups.
I'd like the results to be as follows:
groupNbr
userId
pointsEarned
To do this sort of breakup, you need a way to rank the records so that you can decide which group they belong in. If you do not want to randomise the group allocation and userId is a contiguous sequence, then using userId would be sufficient. However, you probably can't guarantee that, so you need to create some sort of ranking and then use it to split your data into groups. Here is a simple example.
Declare @Total int
Set @Total = (Select COUNT(*) From dataTable)

Select case
         when ranking <= 0.15 * @Total then 1
         when ranking <= 0.4 * @Total then 2
         when ranking <= 0.5 * @Total then 3
         when ranking <= 0.6 * @Total then 4
         else 5 end as groupNbr,
       userId,
       pointsEarned
FROM (Select userId, pointsEarned, ROW_NUMBER() OVER (ORDER BY userId) as ranking From dataTable) A
If you need to randomise which group rows end up in, allocate a random number to each row first, rank by that random number, and then split as above.
If you need to make the splits more flexible, you could design a split table that has columns like minPercentage, maxPercentage, groupNbr, fill it with the splits and do something like this
Declare @Total int
Set @Total = (Select COUNT(*) From dataTable)

Select S.groupNbr,
       B.userId,
       B.pointsEarned
FROM (Select ranking * 100.0 / @Total as rankPercent, userId, pointsEarned
      FROM (Select userId, pointsEarned, ROW_NUMBER() OVER (ORDER BY userId) as ranking From dataTable) A
     ) B
inner join splitTable S on S.minPercentage <= B.rankPercent and S.maxPercentage >= B.rankPercent
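The cumulative-cutoff logic from the CASE expression can be sketched in Python (cutoff fractions taken from the question's 15/25/10/10/40 split):

```python
from collections import Counter

# Cumulative fractions for groups 1-4; everything past 60% is group 5.
cutoffs = [(0.15, 1), (0.40, 2), (0.50, 3), (0.60, 4)]

def group_for(ranking, total):
    for frac, group in cutoffs:
        if ranking <= frac * total:
            return group
    return 5

users = list(range(1, 101))  # 100 user ids; ranking = position in order
groups = {u: group_for(rank, len(users)) for rank, u in enumerate(users, 1)}
print(sorted(Counter(groups.values()).items()))
# [(1, 15), (2, 25), (3, 10), (4, 10), (5, 40)]
```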

How to calculate median in AWS Redshift?

Most databases have a built in function for calculating the median but I don't see anything for median in Amazon Redshift.
You could calculate the median using a combination of the nth_value() and count() analytic functions but that seems janky. I would be very surprised if an analytics db didn't have a built in method for computing median so I'm assuming I'm missing something.
http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_NTH_WF.html
http://docs.aws.amazon.com/redshift/latest/dg/c_Window_functions.html
And as of 2014-10-17, Redshift supports the MEDIAN window function:
# select min(median) from (select median(num) over () from temp);
min
-----
4.0
Try the NTILE function.
You would divide your data into 2 ranked groups and pick the minimum value from the first group. That's because in datasets with an odd number of values, the first ntile will have 1 more value than the second. This approximation should work very well for large datasets.
create table temp (num smallint);
insert into temp values (1),(5),(10),(2),(4);
select num, ntile(2) over(order by num desc) from temp ;
num | ntile
-----+-------
10 | 1
5 | 1
4 | 1
2 | 2
1 | 2
select min(num) as median from (select num, ntile(2) over(order by num desc) as ntile from temp) t where ntile = 1;
median
--------
4
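What NTILE(2) does here can be mimicked in Python; when the row count is odd, the first tile gets the extra row, so its minimum approximates the median:

```python
# Split the descending-sorted values into two tiles; the first tile
# receives the extra row when the count is odd (as NTILE does).
nums = sorted([1, 5, 10, 2, 4], reverse=True)  # ORDER BY num DESC
half = (len(nums) + 1) // 2                    # size of the first tile
first_tile = nums[:half]                       # [10, 5, 4]
median_approx = min(first_tile)
print(median_approx)  # 4
```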
I had difficulty with this also, but got some help from Amazon. Since the 2014-06-30 version of Redshift, you can do this with the PERCENTILE_CONT or PERCENTILE_DISC window functions.
They're slightly weird to use, as they will tack the median (or whatever percentile you choose) onto every row. You put that in a subquery and then take the MIN (or whatever) of the median column.
# select count(num), min(median) as median
from
(select num, percentile_cont (0.5) within group (order by num) over () as median from temp);
count | median
-------+--------
5 | 4.0
(The reason it's complicated is that window functions can also do their own mini-group-by and ordering to give you the median of many groups all at once, and other tricks.)
In the case of an even number of values, CONT(inuous) will interpolate between the two middle values, where DISC(rete) will pick one of them.
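A small Python sketch of that difference, following the usual row-position rule pos = p * (N - 1) for the continuous variant (this mirrors the SQL-standard definitions; treat it as an illustration, not Redshift's exact implementation):

```python
import math

def percentile_cont(values, p=0.5):
    # Continuous percentile: interpolate between the two rows that
    # straddle the fractional position p * (N - 1).
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])

def percentile_disc(values, p=0.5):
    # Discrete percentile: pick the first row whose cumulative
    # fraction reaches p (no interpolation).
    xs = sorted(values)
    idx = max(math.ceil(p * len(xs)) - 1, 0)
    return xs[idx]

print(percentile_cont([1, 2, 4, 5, 9, 10]))  # 4.5 (midpoint of 4 and 5)
print(percentile_disc([1, 2, 4, 5, 9, 10]))  # 4   (picks one middle row)
```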
I typically use the NTILE function to split the data into two groups if I’m looking for an answer that’s close enough. However, if I want the exact median (e.g. the midpoint of an even set of rows), I use a technique suggested on the AWS Redshift Discussion Forum.
This technique numbers the rows in both ascending and descending order. If there is an odd number of rows, it returns the average of just the middle row (the one where row_num_asc = row_num_desc), which is simply that row's value.
CREATE TABLE temp (num SMALLINT);
INSERT INTO temp VALUES (1),(5),(10),(2),(4);
SELECT
AVG(num) AS median
FROM
(SELECT
num,
SUM(1) OVER (ORDER BY num ASC) AS row_num_asc,
SUM(1) OVER (ORDER BY num DESC) AS row_num_desc
FROM
temp) AS ordered
WHERE
row_num_asc IN (row_num_desc, row_num_desc - 1, row_num_desc + 1);
median
--------
4
If there is an even number of rows, it returns the average of the two middle rows.
INSERT INTO temp VALUES (9);
SELECT
AVG(num) AS median
FROM
(SELECT
num,
SUM(1) OVER (ORDER BY num ASC) AS row_num_asc,
SUM(1) OVER (ORDER BY num DESC) AS row_num_desc
FROM
temp) AS ordered
WHERE
row_num_asc IN (row_num_desc, row_num_desc - 1, row_num_desc + 1);
median
--------
4.5
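The rank-matching trick translates directly to Python; a sketch over the same sample values, keeping rows whose ascending and descending ranks differ by at most one:

```python
def median_by_ranks(values):
    xs = sorted(values)
    n = len(xs)
    # row_num_asc = i, row_num_desc = n - i + 1; keep rows where
    # row_num_asc IN (row_num_desc, row_num_desc - 1, row_num_desc + 1)
    middle = [x for i, x in enumerate(xs, 1)
              if i in (n - i + 1, n - i, n - i + 2)]
    return sum(middle) / len(middle)   # AVG(num) over the middle rows

print(median_by_ranks([1, 5, 10, 2, 4]))     # 4.0 (odd count)
print(median_by_ranks([1, 5, 10, 2, 4, 9]))  # 4.5 (even count)
```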