I'd like to find ratios after grouping data by QY and Column2.
For every quarter the ratio B/(B+C) should be calculated from the counts; e.g. in 1-2021 the ratio is 2000/(2000+3000) = 0.4.
The grouped data is as follows:
QY      Column2  Count
1-2021  A         1000
1-2021  B         2000
1-2021  C         3000
2-2021  A         5000
2-2021  B         3000
2-2021  C         4000
Could you kindly help me with a query for this in Postgres?
Use SUM and FILTER to get the records that you need. Something like this should work:
WITH table_name(q, x, y) AS (
SELECT *
FROM
(VALUES
('1-2021','A', 1000),
('1-2021','B', 2000),
('1-2021','C', 3000),
('2-2021','A', 5000),
('2-2021','B', 3000),
('2-2021','C', 4000)
)s
)
SELECT q
, ROUND(
SUM(y) FILTER(WHERE x = 'B')
/ CAST(SUM(y) AS DECIMAL)
,2) AS ratio
FROM table_name
WHERE x IN('B','C')
GROUP BY q
ORDER BY q;
The CTE is only there to simulate an existing table, for testing.
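With the sample data in the CTE, the query should return:
   q    | ratio
--------+-------
 1-2021 |  0.40
 2-2021 |  0.43
(For 2-2021 the ratio is 3000/(3000+4000) ≈ 0.43.)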
For each entry in the loc_of_interest table, I want to find the 15-minute timeslot (from the data in the other CTE) with the maximum decrease in the count of nearby points. I do not know how to proceed beyond the 'pseudocode' part, and indeed I am uncertain whether I am going in the right direction with the existing code as well.
Here is my code:
-- I have two cte's already made
subset_cr -- many rows of data
(device_id, points_geom, time_created)
loc_of_interest -- 2 rows of data
(loc_id, points_geom)
-- here is how I wish to proceed:
with temp as (
    SELECT loi.loc_id,
           routes.fifteen_min_slot,
           routes.count_of_near_points
    FROM loc_of_interest AS loi
    CROSS JOIN LATERAL (
        SELECT date_trunc('hour', r.time_created)
                 + date_part('minute', r.time_created)::int / 15 * interval '15 min' AS fifteen_min_slot,
               -- count only the points within 100 m; count() over the bare
               -- boolean would count every non-null row instead
               count(*) FILTER (WHERE ST_DWithin(
                   loi.points_geom::geography,
                   st_transform(r.points_geom, 4326)::geography,
                   100)) AS count_of_near_points
        FROM subset_cr AS r
        GROUP BY 1
    ) AS routes
)
--pseudocode below
for each loc_id
select fifteen_min_slot
from temp
where difference in count_of_near_points is max
Code update:
I have added the following code for the pseudocode I wrote earlier:
, tempy as (
    select loc_id,
           fifteen_min_slot,
           count_of_near_points
             - lag(count_of_near_points) over (partition by loc_id order by fifteen_min_slot) as lagging_diff
    from temp
)
select loc_id, fifteen_min_slot
from tempy
where lagging_diff = (select max(lagging_diff) from tempy)
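I am also unsure about the final subselect: it returns the single largest lagging_diff across all locations, and since lagging_diff is the current count minus the previous one, its maximum is the biggest increase rather than decrease. A sketch of what I think would pick the slot with the largest decrease separately for each loc_id:
select distinct on (loc_id) loc_id, fifteen_min_slot
from tempy
order by loc_id, lagging_diff asc; -- most negative diff = largest decrease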
I'm new to PostgreSQL and I am trying to calculate a rate in a table like this:
class  phase
a      sold
b      stock
c      idle
d      sold
I want to calculate the count of 'sold' phases divided by the total count, like this:
2/4 = 50%
I was trying:
with t as ( select count(class) as total_sold from table where phase='sold')
select total_sold / count(*) from t
group by total_sold
but the result is wrong. How can I do this?
Use AVG() aggregate function:
SELECT 100 * AVG((phase = 'sold')::int) AS avg_sold
FROM tablename;
The boolean expression phase = 'sold' is converted to an integer, 1 for true or 0 for false, and the average of these values is the ratio that you want. With the sample data the four rows yield 1, 0, 0, 1, whose average is 0.5, i.e. 50%.
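An equivalent approach, if you prefer an explicit count (a sketch, assuming Postgres 9.4+ for the FILTER clause):
SELECT 100.0 * count(*) FILTER (WHERE phase = 'sold') / count(*) AS pct_sold
FROM tablename;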
I want to create different intervals:
0 to 10 in steps of 1
10 to 100 in steps of 10
100 to 1,000 in steps of 100
1,000 to 10,000 in steps of 1,000
so that I can query a table and count the items in each interval.
with "series" as (
(SELECT generate_series(0, 10, 1) AS r_from)
union
(select generate_series(10, 90, 10) as r_from)
union
(select generate_series(100, 900, 100) as r_from)
union
(select generate_series(1000, 9000, 1000) as r_from)
order by r_from
)
, "range" as ( select r_from
, case
when r_from < 10 then r_from + 1
when r_from < 100 then r_from + 10
when r_from < 1000 then r_from + 100
else r_from + 1000
end as r_to
from series)
select r_from, r_to,(SELECT count(*) FROM "my_table" WHERE "my_value" BETWEEN r_from AND r_to) as "Anz."
FROM "range";
I think generate_series is the right way to go, but there is another option: we can use simple math to calculate the boundaries.
SELECT 0 as r_from,1 as r_to
UNION ALL
SELECT power(10, steps ) * v ,
power(10, steps ) * v + power(10, steps )
FROM generate_series(1, 9, 1) v
CROSS JOIN generate_series(0, 3, 1) steps
so the full query might look like this:
with "range" as
(
SELECT 0 as r_from,1 as r_to
UNION ALL
SELECT power(10, steps) * v ,
power(10, steps) * v + power(10, steps)
FROM generate_series(1, 9, 1) v
CROSS JOIN generate_series(0, 3, 1) steps
)
select r_from, r_to,(SELECT count(*) FROM "my_table" WHERE "my_value" BETWEEN r_from AND r_to) as "Anz."
FROM "range";
Rather than generate_series you could create defined integer range types (int4range), then test whether your value is included within the range (see Range/Multirange Functions and Operators). So:
with ranges (range_set) as
( values ( int4range(0,10,'[)') )
, ( int4range(10,100,'[)') )
, ( int4range(100,1000,'[)') )
, ( int4range(1000,10000,'[)') )
) --select * from ranges;
select lower(range_set) range_start
, upper(range_set) - 1 range_end
, count(my_value) cnt
from ranges r
left join my_table mt
on (mt.my_value <@ r.range_set)
group by r.range_set
order by lower(r.range_set);
Note the 3rd parameter in creating the ranges: '[)' makes each range inclusive of its lower bound and exclusive of its upper bound, so adjacent ranges do not double-count the boundary values.
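For example, the boundary value 10 falls only into the second range:
select 10 <@ int4range(0, 10, '[)')   as in_first,   -- false: upper bound exclusive
       10 <@ int4range(10, 100, '[)') as in_second;  -- true: lower bound inclusive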
Creating a CTE as above is fine if your ranges are static; however, if dynamic ranges are required, you can put the ranges into a table. Changing the ranges then becomes a matter of managing that table: not trivial, but it does not require code updates. The query then reduces to just the main part of the above:
select lower(range_set) range_start
, upper(range_set) - 1 range_end
, count(my_value) cnt
from range_tab r
left join my_table mt
on (mt.my_value <@ r.range_set)
group by r.range_set
order by lower(r.range_set);
I have t-sql as follows:
SELECT (COUNT(Intakes.fk_ClientID) * 100) / (
    SELECT COUNT(*)
    FROM Intakes
    WHERE Intakes.AdmissionDate >= @StartDate
)
FROM Intakes
WHERE Intakes.fk_ReleasedFromID = '1'
  AND Intakes.AdmissionDate >= @StartDate;
I'm trying to get the percentage of clients who have ReleasedFromID = 1 out of the subset of clients with admission dates in a certain range, but I get rows of 1's and 0's instead. I can get the percentage if I take out the WHERE clauses; this works:
SELECT (COUNT(Intakes.fk_ClientID) * 100) / (
SELECT count(*)
FROM INTAKES
)
FROM Intakes
WHERE Intakes.fk_ReleasedFromID = '1';
That works fine: it selects the ClientIDs where ReleasedFromID = 1, multiplies the count by 100, and divides by the total number of rows in Intakes. But how do you compute the percentage with the WHERE clauses as above?
After reading the comment from @Anssssss:
SELECT (COUNT(Intakes.fk_ClientID) * 100.0) / (
SELECT count(*)
FROM INTAKES
) 'percentage'
FROM Intakes
WHERE Intakes.fk_ReleasedFromID = '1';
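Applying the same 100.0 change to the original filtered query should then give the intended percentage (a sketch, assuming @StartDate is the date parameter from the question):
SELECT (COUNT(Intakes.fk_ClientID) * 100.0) / (
    SELECT COUNT(*)
    FROM Intakes
    WHERE Intakes.AdmissionDate >= @StartDate
) AS percentage
FROM Intakes
WHERE Intakes.fk_ReleasedFromID = '1'
  AND Intakes.AdmissionDate >= @StartDate;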
Most databases have a built-in function for calculating the median, but I don't see anything for median in Amazon Redshift.
You could calculate the median using a combination of the nth_value() and count() analytic functions, but that seems janky. I would be very surprised if an analytics DB didn't have a built-in method for computing the median, so I'm assuming I'm missing something.
http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_NTH_WF.html
http://docs.aws.amazon.com/redshift/latest/dg/c_Window_functions.html
And as of 2014-10-17, Redshift supports the MEDIAN window function:
# select min(median) from (select median(num) over () from temp);
min
-----
4.0
Try the NTILE function.
You would divide your data into 2 ranked groups and pick the minimum value from the first group. That's because in datasets with an odd number of values, the first ntile will have 1 more value than the second. This approximation should work very well for large datasets.
create table temp (num smallint);
insert into temp values (1),(5),(10),(2),(4);
select num, ntile(2) over(order by num desc) from temp ;
num | ntile
-----+-------
10 | 1
5 | 1
4 | 1
2 | 2
1 | 2
select min(num) as median from (select num, ntile(2) over(order by num desc) from temp) where ntile = 1;
median
--------
4
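One caveat (a sketch continuing the temp example above): with an even number of rows, this approximation returns the upper of the two middle values rather than their average.
insert into temp values (9);  -- temp now holds 1, 2, 4, 5, 9, 10
select min(num) as approx_median
from (select num, ntile(2) over(order by num desc) from temp)
where ntile = 1;
-- returns 5, while the interpolated median is 4.5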
I had difficulty with this also, but got some help from Amazon. Since the 2014-06-30 version of Redshift, you can do this with the PERCENTILE_CONT or PERCENTILE_DISC window functions.
They're slightly weird to use, as they will tack the median (or whatever percentile you choose) onto every row. You put that in a subquery and then take the MIN (or whatever) of the median column.
# select count(num), min(median) as median
from
(select num, percentile_cont (0.5) within group (order by num) over () as median from temp);
count | median
-------+--------
5 | 4.0
(The reason it's complicated is that window functions can also do their own mini-group-by and ordering to give you the median of many groups all at once, and other tricks.)
In the case of an even number of values, CONT(inuous) will interpolate between the two middle values, whereas DISC(rete) will pick one of them.
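To illustrate the difference (a sketch continuing the temp example, after adding a sixth value):
insert into temp values (9);  -- temp now holds 1, 2, 4, 5, 9, 10
select min(cont_median) as cont, min(disc_median) as disc
from
(select percentile_cont(0.5) within group (order by num) over () as cont_median,
        percentile_disc(0.5) within group (order by num) over () as disc_median
 from temp);
-- cont = 4.5 (interpolated), disc = 4 (an actual value from the set)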
I typically use the NTILE function to split the data into two groups if I'm looking for an answer that's close enough. However, if I want the exact median (e.g. the midpoint of an even set of rows), I use a technique suggested on the AWS Redshift Discussion Forum.
This technique orders the rows in both ascending and descending order (SUM(1) OVER (ORDER BY ...) acts as a running row number in each direction). If there is an odd number of rows, it returns the single middle row, i.e. the row where row_num_asc = row_num_desc, and the AVG of that one row is simply the row itself.
CREATE TABLE temp (num SMALLINT);
INSERT INTO temp VALUES (1),(5),(10),(2),(4);
SELECT
AVG(num) AS median
FROM
(SELECT
num,
SUM(1) OVER (ORDER BY num ASC) AS row_num_asc,
SUM(1) OVER (ORDER BY num DESC) AS row_num_desc
FROM
temp) AS ordered
WHERE
row_num_asc IN (row_num_desc, row_num_desc - 1, row_num_desc + 1);
median
--------
4
If there is an even number of rows, it returns the average of the two middle rows.
INSERT INTO temp VALUES (9);
SELECT
AVG(num) AS median
FROM
(SELECT
num,
SUM(1) OVER (ORDER BY num ASC) AS row_num_asc,
SUM(1) OVER (ORDER BY num DESC) AS row_num_desc
FROM
temp) AS ordered
WHERE
row_num_asc IN (row_num_desc, row_num_desc - 1, row_num_desc + 1);
median
--------
4.5