How to divide COUNT(CASE ) by COUNT() - postgresql

I am trying to calculate a percentage with the following expression:
(COUNT(CASE
WHEN col1 > 0
THEN my_id
ELSE null
END)/COUNT(my_id))*100 AS my_percent
The output column, my_percent, contains only zeros.
Individually, both COUNTs return non-negative integers as expected, and almost all are > 0.
COUNT(CASE
WHEN col1 > 0
THEN my_id
ELSE null
END) AS count_case
COUNT(my_id) AS simple_count
Why does the percentage expression return zeros rather than positive numbers? How can I modify the code to give the expected output (positive numbers, not zeros)?

count returns bigint, and PostgreSQL divides two integers using integer division, which truncates the fractional digits:
SELECT 7 / 3;
?column?
══════════
2
(1 row)
To avoid that, cast to double precision or numeric:
CAST(count(CASE WHEN col1 > 0 THEN my_id ELSE null END) AS double precision)
/
CAST(COUNT(my_id) AS double precision)
* 100 AS my_percent
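The same effect can be reproduced outside the database. The sketch below uses Python, with made-up counts standing in for the two COUNT() results; Python's // operator floors rather than truncates, but for non-negative counts the result is the same.

```python
# Hypothetical counts standing in for the two COUNT() results.
count_case = 3      # rows where col1 > 0
simple_count = 7    # rows with a non-null my_id

# Integer division truncates first, so the factor of 100 arrives too late.
truncated_percent = (count_case // simple_count) * 100

# Converting to floating point before dividing keeps the fraction.
float_percent = float(count_case) / float(simple_count) * 100

# Multiplying before dividing also avoids the zero, but still truncates.
integer_percent = count_case * 100 // simple_count

print(truncated_percent, round(float_percent, 2), integer_percent)  # 0 42.86 42
```

Casting only one of the two operands would be enough, since the other is then promoted before the division.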

Related

How can I find the percentage of records in a table in DB2?

I have a single table - drivers
I want to know what percentage of the drivers have been terminated this month compared to all the active drivers. I think I have made it very complicated, and while it does return a result, it is just a 0. I tried casting to decimal, but that doesn't work for me either, as the calculation still results in 0.
WITH X AS
(
SELECT
CAST(COUNT(*) AS DECIMAL (5,2)) TERMINATED,
0 ACTIVE
FROM DRIVER WHERE
MONTH(TERMINATION_DATE) = MONTH(CURRENT TIMESTAMP) AND YEAR(TERMINATION_DATE) = YEAR(CURRENT TIMESTAMP)
UNION ALL
SELECT
0 TERMINATED,
CAST(COUNT(*) AS FLOAT) ACTIVE
FROM DRIVER WHERE
ACTIVE_IN_DISP = 'True'
)
SELECT
CAST(SUM(TERMINATED)/(SUM(ACTIVE) + SUM(TERMINATED)*100) AS DECIMAL (10,2))
FROM X
Your parentheses are a bit off: instead of SUM(TERMINATED)/(SUM(ACTIVE) + SUM(TERMINATED)*100)
it should read
SUM(TERMINATED)/(SUM(ACTIVE) + SUM(TERMINATED))*100
With the original grouping the result was so small that it rounded to 0 at two decimal places.
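To see how much the misplaced parenthesis changes the result, here is a small arithmetic sketch in Python. The two counts are invented for illustration and do not come from the question.

```python
terminated = 2.0    # hypothetical: drivers terminated this month
active = 400.0      # hypothetical: active drivers

# Original grouping: the multiplication by 100 lands inside the divisor.
wrong = terminated / (active + terminated * 100)

# Corrected grouping: divide by the total first, then scale to a percentage.
right = terminated / (active + terminated) * 100

print(round(wrong, 2), round(right, 2))  # 0.0 0.5
```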
I would rewrite your query like this:
WITH begin_of_month(begin_of_month) AS (VALUES CURRENT DATE - (DAY(CURRENT DATE)-1) DAYS),
X AS (
SELECT COUNT(*) AS terminated, 0 AS active
FROM driver, begin_of_month --the CTE has to appear in FROM before its column can be used
WHERE termination_date >= begin_of_month
AND termination_date < begin_of_month + 1 MONTH --index, if present, can be used!
UNION ALL
SELECT 0 AS terminated, COUNT(*) AS active
FROM driver
WHERE active_in_disp = 'True'
)
SELECT CAST(FLOAT(SUM(TERMINATED))/(SUM(ACTIVE) + SUM(TERMINATED))*100 AS DECIMAL (10,2)) --CAST only if you wish to display with 2 decimal places
FROM X

postgresql: datatype numeric with limited digits

I am looking for a numeric data type with a limited total number of digits
(before and after the decimal point).
The function below removes only digits after the decimal point (PG version >= 13):
create function num_flex( v numeric, d int) returns numeric as
$$
select case when v=0 then 0
when v < 1 and v > -1 then trim_scale(round(v, d - 1 ) )
else trim_scale(round(v, d - 1 - least(log(abs(v))::int,d-1) ) ) end;
$$
language sql ;
For testing:
select num_flex( 0, 6)
union all
select num_flex( 1.22000, 6)
union all
select num_flex( (-0.000000123456789*10^x)::numeric,6)
from generate_series(1,15,3) t(x)
union all
select num_flex( (0.0000123456789*10^x)::numeric,6)
from generate_series(1,15,3) t(x) ;
It runs,
but does anyone have a better idea, or can anyone find a bug (a case that is not handled)?
The next step is to integrate this in PG, so that I can write
select 12.123456789::num_flex6 ;
select 12.123456789::num_flex7 ;
for a num_flex data type with 6 or 7 digits,
with types ranging from num_flex2 to num_flex9. Is this possible?
There are a few problems with your function:
Accepting negative digit counts (parameter d). num_flex(1234,-2) returns 1200 - you specified you want the function to only kill digits after decimal point, so 1234 would be expected.
Incorrect results between -1 and 1. num_flex(0.123,3) returns 0.12 instead of 0.123. I guess this might also be desired effect if you do want to count 0 to the left of decimal point. Normally, that 0 is ignored when a number's precision and scale are considered.
Your counting of digits to the left of decimal point is incorrect due to how ::int rounding works. log(abs(11))::int is 1 but log(abs(51))::int is 2. ceil(log(abs(v)))::int returns 2 in both cases, while keeping int type to still work as 2nd parameter in round().
create or replace function num_flex(
input_number numeric,
digit_count int,
is_counting_unit_zero boolean default false)
returns numeric as
$$
select trim_scale(
case
when input_number=0
then 0
when digit_count<=0 --avoids negative rounding
then round(input_number,0)
when (input_number between -1 and 1) and is_counting_unit_zero
then round(input_number,digit_count-1)
when (input_number between -1 and 1)
then round(input_number,digit_count)
else
round( input_number,
greatest( --avoids negative rounding
digit_count - (ceil(log(abs(input_number))))::int,
0)
)
end
);
$$
language sql;
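The difference between the two ways of counting digits in the third point above can be checked with a quick Python sketch. It only mirrors the arithmetic: math.log10 stands in for PostgreSQL's base-10 log(), and round() stands in for the ::int cast (the two differ only on exact .5 ties, which do not occur here).

```python
import math

def digits_by_round(v):
    # Mirrors log(abs(v))::int, which rounds to the nearest integer.
    return round(math.log10(abs(v)))

def digits_by_ceil(v):
    # Mirrors ceil(log(abs(v)))::int as used in the rewritten function.
    return math.ceil(math.log10(abs(v)))

for v in (11, 51, 99):
    print(v, digits_by_round(v), digits_by_ceil(v))
```

Both 11 and 51 have two digits before the decimal point, but only the ceil variant reports 2 for both.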
Here's a test
select *,"result"="should_be"::numeric as "is_correct" from
(values
('num_flex(0.1234 ,4)',num_flex(0.1234 ,4), '0.1234'),
('num_flex(1.234 ,4)',num_flex(1.234 ,4), '1.234'),
('num_flex(1.2340000 ,4)',num_flex(1.2340000 ,4), '1.234'),
('num_flex(0001.234 ,4)',num_flex(0001.234 ,4), '1.234'),
('num_flex(123456 ,5)',num_flex(123456 ,5), '123456'),
('num_flex(0 ,5)',num_flex(0 ,5), '0'),
('num_flex(00000.00000 ,5)',num_flex(00000.00000 ,5), '0'),
('num_flex(00000.00001 ,5)',num_flex(00000.00001 ,5), '0.00001'),
('num_flex(12345678901 ,5)',num_flex(12345678901 ,5), '12345678901'),
('num_flex(123456789.1 ,5)',num_flex(123456789.1 ,5), '123456789'),
('num_flex(1.234 ,-4)',num_flex(1.234 ,-4), '1')
) as t ("operation","result","should_be");
-- operation | result | should_be | is_correct
----------------------------+-------------+-------------+------------
-- num_flex(0.1234 ,4) | 0.1234 | 0.1234 | t
-- num_flex(1.234 ,4) | 1.234 | 1.234 | t
-- num_flex(1.2340000 ,4) | 1.234 | 1.234 | t
-- num_flex(0001.234 ,4) | 1.234 | 1.234 | t
-- num_flex(123456 ,5) | 123456 | 123456 | t
-- num_flex(0 ,5) | 0 | 0 | t
-- num_flex(00000.00000 ,5) | 0 | 0 | t
-- num_flex(00000.00001 ,5) | 0.00001 | 0.00001 | t
-- num_flex(12345678901 ,5) | 12345678901 | 12345678901 | t
-- num_flex(123456789.1 ,5) | 123456789 | 123456789 | t
-- num_flex(1.234 ,-4) | 1 | 1 | t
--(11 rows)
You can declare the precision (total number of digits) of your numeric data type in the column definition. Only digits after decimal point will be rounded. If there are too many digits before the decimal point, you'll get an error.
The downside is that numeric(n) is actually numeric(n,0), which is dictated by the SQL standard. So if by limiting the column's number of digits to 5 you want to have 12345.0 as well as 0.12345, there's no way you can configure numeric to hold both. numeric(5) will round 0.12345 to 0, numeric(5,5) will dedicate all digits to the right of decimal point and reject 12345.
create table test (numeric_column numeric(5));
insert into test values (12345.123);
table test;
-- numeric_column
------------------
-- 12345
--(1 row)
insert into test values (123456.123);
--ERROR: numeric field overflow
--DETAIL: A field with precision 5, scale 0 must round to an absolute value less than 10^5.
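The rounding and overflow behaviour described above can be imitated with Python's decimal module. This is only a rough model of numeric(precision, scale), not PostgreSQL's implementation; the function name store_numeric is made up for the illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def store_numeric(value, precision, scale=0):
    """Round to `scale` fractional digits, then reject values that need
    more than `precision - scale` digits before the decimal point."""
    quantum = Decimal(1).scaleb(-scale)           # e.g. scale=2 -> 0.01
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    if abs(rounded) >= Decimal(10) ** (precision - scale):
        raise OverflowError("numeric field overflow")
    return rounded

print(store_numeric("12345.123", 5))     # fraction is rounded away
print(store_numeric("0.12345", 5))       # scale 0 leaves nothing of the value
print(store_numeric("0.12345", 5, 5))    # all five digits go to the fraction
```

Calling store_numeric("123456.123", 5) or store_numeric("12345", 5, 5) raises OverflowError, which corresponds to the two rejected cases in the text.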

Update Numeric field without Decimal point and Zeros

I am trying to update a numeric field, but the field cannot have zeros after the decimal point. The table I am pulling values from contains data such as 87.00, 90.00, 100.00, etc. How do I update without the decimal point and zeros?
Example: percent is a numeric field.
The values available for the update are 100.00, 90.00, etc.
update table1
set percent =(tmpercent as integer)
from table2
where table2.custid=table1.custoid;
;
This gives an error.
Table1:
CustID Percent(numeric)
1 90
2 80
Table2:
CustomID tmpPercent(varchar)
1 87.00
2 90.00
I often use the cast ::FLOAT::NUMERIC to get rid of extra fractional zeros in numerics,
or you can use the TRUNC() function to force truncation of the fraction.
Try
update table1
set percent = tmpercent::FLOAT::NUMERIC
from table2
where table2.custid=table1.custoid;
or
update table1
set percent = TRUNC(tmpercent::NUMERIC)
from table2
where table2.custid=table1.custoid;
It is going to depend on how the numeric field is specified in the table. From here:
https://www.postgresql.org/docs/current/datatype-numeric.html
We use the following terms below: The precision of a numeric is the total count of significant digits in the whole number, that is, the number of digits to both sides of the decimal point. The scale of a numeric is the count of decimal digits in the fractional part, to the right of the decimal point. So the number 23.5141 has a precision of 6 and a scale of 4. Integers can be considered to have a scale of zero.
NUMERIC(precision, scale)
So if your field has a scale > 0 then you will see 0 to the right of the decimal point, unless you set scale to 0. As example:
create table numeric_test (num_fld numeric(5,2), num_fld_0 numeric(5,0));
insert into numeric_test (num_fld, num_fld_0) values ('90.0', '90.0');
select * from numeric_test ;
num_fld | num_fld_0
---------+-----------
90.00 | 90
insert into numeric_test (num_fld, num_fld_0) values ('90.5', '90.5');
select * from numeric_test ;
num_fld | num_fld_0
---------+-----------
90.00 | 90
90.50 | 91
insert into numeric_test (num_fld, num_fld_0) values ('90.0'::float, '90.0'::float);
select * from numeric_test ;
num_fld | num_fld_0
---------+-----------
90.00 | 90
90.50 | 91
90.00 | 90
Using scale 0 means you have basically created an integer field. If you have a scale > 0 then you are going to get decimals in the field.
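The distinction between a value and the scale it is displayed with also shows up outside the database. As a loose analogy only, assuming the source values arrive as strings like those in Table2, Python's decimal module keeps the trailing zeros until they are removed explicitly; drop_fraction_zeros is a made-up helper name.

```python
from decimal import Decimal

def drop_fraction_zeros(text):
    """Parse a string such as '87.00' and return it without a fractional
    part when that part is all zeros; otherwise return the value unchanged."""
    value = Decimal(text)
    integral = value.to_integral_value()
    return integral if value == integral else value

for raw in ("87.00", "90.00", "100.00", "90.50"):
    print(raw, "->", drop_fraction_zeros(raw))
```

Unlike a scale of 0 in the column definition, this leaves 90.50 intact instead of rounding it.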

TSQL - replace isnumeric = 0

I have a select statement and in that select statement I have a few columns on which I perform basic calculations (e.g. [Col1] * 3.14). However, occasionally I run into non-numeric values and when that happens, the whole stored procedure fails because of one row.
I've thought about using a WHERE ISNUMERIC(Col1) <> 0, but then I would be excluding information in the other columns.
Is there a way in TSQL to somehow replace all strings with NULL or 0?
Something like...
SELECT blah1, blah2, blah3,
CASE WHEN ISNUMERIC(Col1) = 1 THEN [Col1] * 3.14 ELSE NULL END as whatever
FROM your_table
A case can also be made that..
The non-numeric values should be converted to numeric or NULL if that's what's expected in the column, and
If numbers are expected then the column should be a numeric data type in the first place and not a character data type, which allows for these types of errors.
I prefer Try_Cast:
SELECT
someValue
,TRY_CAST(someValue as int) * 3.14 AS TRY_CAST_to_int
,TRY_CAST(someValue as decimal) * 3.14 AS TRY_CAST_to_decimal
,IIF(ISNUMERIC(someValue) = 1, someValue, null) * 3.14 as IIF_IS_NUMERIC
FROM (values
( 'asdf'),
( '2' ),
( '1.55')
) s(someValue)
ISNUMERIC is a terrible way to do this, as there are far too many things that identify as NUMERIC which are not able to be multiplied by a non-MONEY data type.
https://www.brentozar.com/archive/2018/02/fifteen-things-hate-isnumeric/
This fails miserably, as '-' is a numeric...
DECLARE @example TABLE (numerics VARCHAR(10));
INSERT INTO @example VALUES ('-');
SELECT CASE WHEN ISNUMERIC(numerics) = 1 THEN numerics * 3.14 ELSE NULL END
FROM @example;
Try TRY_CAST instead (albeit amend your DECIMAL precision to suit your needs):
DECLARE @example TABLE (numerics VARCHAR(10));
INSERT INTO @example VALUES ('-');
SELECT TRY_CAST(numerics AS decimal(10,2)) * 3.14 FROM @example;
TRY_CAST (or TRY_CONVERT) tests for a specific type:
declare @T table (num varchar(20));
insert into @T values ('12'), ('3.14'), ('5.6E12'), ('$120'), ('-'), (''), ('cc'), ('aa'), ('bb'), ('1/5');
select t.num, ISNUMERIC(t.num) as isnumeric
, isnull(TRY_CONVERT(smallmoney, t.num), 0) as smallmoney
, TRY_CONVERT(float, t.num) as float
, TRY_CONVERT(decimal(18,4), t.num) as decimal
, isnull(TRY_CONVERT(smallmoney, t.num), TRY_CONVERT(float, t.num)) as mix
from @T t
num     isnumeric  smallmoney  float          decimal
------  ---------  ----------  -------------  -------
12      1          12.00       12             12.0000
3.14    1          3.14        3.14           3.1400
5.6E12  1          0.00        5600000000000  NULL
$120    1          120.00      NULL           NULL
-       1          0.00        NULL           NULL
        0          0.00        0              NULL
cc      0          0.00        NULL           NULL
aa      0          0.00        NULL           NULL
bb      0          0.00        NULL           NULL
1/5     0          0.00        NULL           NULL
Interestingly, the last one still fails.
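The "convert or return NULL" pattern behind TRY_CAST can be sketched in Python as follows. This is an analogy only: try_decimal is a made-up helper, and Python's Decimal parser does not accept exactly the same strings as SQL Server (for example, it parses '5.6E12', which the decimal column above shows as NULL).

```python
from decimal import Decimal, InvalidOperation

def try_decimal(text):
    """Return a Decimal, or None when the text cannot be parsed (like a NULL)."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return None
    # Treat NaN and Infinity as unparseable too.
    return value if value.is_finite() else None

for raw in ("2", "1.55", "-", "asdf"):
    value = try_decimal(raw)
    print(repr(raw), None if value is None else value * Decimal("3.14"))
```

A bad row then yields None for that one calculation instead of aborting the whole run, which is the behaviour the question asks for.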

how to get average that ignores outliers?

say I have a postgresql table with the following values:
id | value
----------
1 | 4
2 | 8
3 | 100
4 | 5
5 | 7
If I use PostgreSQL to calculate the average, I get 24.8, because the high value of 100 has a large impact on the calculation. In fact, I would like to find an average somewhere around 6 and eliminate the extreme(s).
I am looking for a way to eliminate extremes, and I want to do this in a statistically sound way. The extremes cannot be defined with a fixed threshold: I cannot say that a value over X has to be eliminated.
I have been racking my brain over the PostgreSQL aggregate functions but cannot figure out which one is right for me. Any suggestions?
Postgresql can also calculate the standard deviation.
You could keep only the data points that lie within avg() +/- 2*stddev(), which for normally distributed data corresponds to roughly the 95% of data points closest to the average.
Of course 2 can also be 3 (about 99.7%) or 6 (over 99.9999%), but do not get hung up on the numbers, because in the presence of a collection of outliers you are no longer dealing with a normal distribution.
Be very careful and validate that it works as expected.
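The warning is worth taking literally. A minimal Python sketch with the five values from the question shows that a single large outlier inflates the standard deviation enough to survive a 2-stddev cut. It uses the sample standard deviation, which is what PostgreSQL's stddev() computes; the helper name within_k_stddev is made up.

```python
import statistics

values = [4, 8, 100, 5, 7]

def within_k_stddev(data, k):
    """Keep the points within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    spread = statistics.stdev(data)   # sample standard deviation
    return [x for x in data if abs(x - mean) <= k * spread]

print(within_k_stddev(values, 2))     # the outlier inflates the spread and survives
print(within_k_stddev(values, 1.5))   # a tighter band drops it
print(statistics.mean(within_k_stddev(values, 1.5)))
```

With so few rows the choice of multiplier decides the outcome, so the cut-off has to be validated against the actual data.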
I cannot say; If a value is over X, it has to be eliminated.
Well, you could use having and a subselect to eliminate outliers, something like:
HAVING value < (
SELECT 2 * avg(value)
FROM mytable
GROUP BY ...
)
(Or, for that matter, use a more complex version to eliminate anything above 2 or 3 standard deviations if you want something that will be better at eliminating only outliers.)
The other option is to look at generating a median value, which is a fairly statistically sound way of accounting for outliers; happily, there are three reasonable examples of just that: one from the PostgreSQL Wiki, one built as an Oracle compatibility layer, and another from the PostgreSQL Journal. Note the caveats around how precisely/accurately they implement medians.
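For the sample data in the question, the effect of switching from the mean to the median is easy to check. The Python sketch below only illustrates the statistic and is not one of the three implementations mentioned above.

```python
import statistics

values = [4, 8, 100, 5, 7]

print(statistics.mean(values))     # 24.8, pulled up by the single 100
print(statistics.median(values))   # 7, the middle of 4, 5, 7, 8, 100
```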
Here's an aggregate function which will calculate the trimmed mean for a set of values, excluding values outside N standard deviations from the mean.
Example:
DROP TABLE IF EXISTS foo;
CREATE TEMPORARY TABLE foo (x FLOAT);
INSERT INTO foo VALUES (1);
INSERT INTO foo VALUES (2);
INSERT INTO foo VALUES (3);
INSERT INTO foo VALUES (4);
INSERT INTO foo VALUES (100);
SELECT avg(x), tmean(x, 2), tmean(x, 1.5) FROM foo;
-- avg | tmean | tmean
-- -----+-------+-------
-- 22 | 22 | 2.5
Code:
DROP TYPE IF EXISTS tmean_stype CASCADE;
CREATE TYPE tmean_stype AS (
deviations FLOAT,
count INT,
acc FLOAT,
acc2 FLOAT,
vals FLOAT[]
);
CREATE OR REPLACE FUNCTION tmean_sfunc(tmean_stype, float, float)
RETURNS tmean_stype AS $$
SELECT $3, $1.count + 1, $1.acc + $2, $1.acc2 + ($2 * $2), array_append($1.vals, $2);
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION tmean_finalfunc(tmean_stype)
RETURNS float AS $$
DECLARE
fcount INT;
facc FLOAT;
mean FLOAT;
stddev FLOAT;
lbound FLOAT;
ubound FLOAT;
val FLOAT;
BEGIN
mean := $1.acc / $1.count;
stddev := sqrt(($1.acc2 / $1.count) - (mean * mean));
lbound := mean - stddev * $1.deviations;
ubound := mean + stddev * $1.deviations;
-- RAISE NOTICE 'mean: % stddev: % lbound: % ubound: %', mean, stddev, lbound, ubound;
fcount := 0;
facc := 0;
FOR i IN array_lower($1.vals, 1) .. array_upper($1.vals, 1) LOOP
val := $1.vals[i];
IF val >= lbound AND val <= ubound THEN
fcount := fcount + 1;
facc := facc + val;
END IF;
END LOOP;
IF fcount = 0 THEN
return NULL;
END IF;
RETURN facc / fcount;
END;
$$ LANGUAGE plpgsql;
CREATE AGGREGATE tmean(float, float)
(
SFUNC = tmean_sfunc,
STYPE = tmean_stype,
FINALFUNC = tmean_finalfunc,
INITCOND = '(-1, 0, 0, 0, {})'
);
Gist (which should be identical): https://gist.github.com/4458294
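For readers who want to check the arithmetic without installing the aggregate, here is a rough Python rendering of the same logic. It follows the structure of the code above (running count, sum and sum of squares, then a filtering pass), but the empty-input guard and the clamp against a slightly negative variance are additions made for this sketch.

```python
import math

def tmean(values, deviations):
    """Trimmed mean: average of the values within `deviations` population
    standard deviations of the mean, or None if nothing qualifies."""
    count, acc, acc2, vals = 0, 0.0, 0.0, []
    for x in values:                      # plays the role of tmean_sfunc
        count, acc, acc2 = count + 1, acc + x, acc2 + x * x
        vals.append(x)
    if count == 0:                        # guard added for this sketch
        return None
    mean = acc / count                    # plays the role of tmean_finalfunc
    stddev = math.sqrt(max(acc2 / count - mean * mean, 0.0))
    lbound, ubound = mean - stddev * deviations, mean + stddev * deviations
    kept = [x for x in vals if lbound <= x <= ubound]
    return sum(kept) / len(kept) if kept else None

data = [1, 2, 3, 4, 100]
print(tmean(data, 2), tmean(data, 1.5))   # 22.0 2.5
```

The output matches the example table above: at 2 deviations the value 100 is still inside the band, at 1.5 it is not.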
Consider using the ntile window function. It allows you to easily isolate extreme values from the result set.
Let's say you want to cut 10% from both sides of the result set. Then passing the value of 10 to ntile and looking for values between 2 and 9 would give you the desired result. Keep in mind that if you have fewer than 10 records, you might accidentally cut more than 20%, so be sure to check the total number of records as well.
WITH yyy AS (
SELECT
id,
value,
NTILE(10) OVER (ORDER BY value) AS ntiled,
COUNT(*) OVER () AS counted
FROM
xxx)
SELECT
*
FROM
yyy
WHERE
counted < 10 OR ntiled BETWEEN 2 AND 9;
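How the bucket numbers come about can be sketched in Python. The ntile function below follows the rule that bucket sizes differ by at most one and the earlier buckets take the extra rows; both function names are made up for the illustration.

```python
def ntile(row_count, buckets):
    """1-based bucket numbers for `row_count` ordered rows; earlier
    buckets take the extra rows when the split is uneven."""
    base, extra = divmod(row_count, buckets)
    labels = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        labels.extend([bucket] * size)
    return labels

def trim_tails(values, buckets=10):
    ordered = sorted(values)
    if len(ordered) < buckets:            # same guard as "counted < 10"
        return ordered
    labels = ntile(len(ordered), buckets)
    return [v for v, b in zip(ordered, labels) if 2 <= b <= buckets - 1]

print(trim_tails(range(1, 21)))   # 20 rows: drops 1, 2, 19 and 20
```

With 12 rows the first two buckets hold two rows each, so the low end loses two rows while the high end loses only one; this is the kind of asymmetry to keep in mind for small tables.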
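You can use the interquartile range (IQR) to filter outliers. In plain SQL, with mytable standing in for your table name:
-- first and third quartile of the whole column
WITH quartiles AS (
SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY value) AS q1,
percentile_cont(0.75) WITHIN GROUP (ORDER BY value) AS q3
FROM mytable
)
-- keep values inside [q1 - 1.5*IQR, q3 + 1.5*IQR], where IQR = q3 - q1;
-- select avg(value) instead of value to get the outlier-free average
SELECT value
FROM mytable, quartiles
WHERE value >= q1 - 1.5 * (q3 - q1)
AND value <= q3 + 1.5 * (q3 - q1);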
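Applied to the five values from the question, the IQR rule can be traced by hand or with a short Python sketch. The local percentile_cont helper is an assumption of this example: it interpolates linearly between neighbouring sorted values, which is how the SQL function of the same name is documented to behave.

```python
import math

def percentile_cont(ordered, fraction):
    """Linear interpolation between neighbouring values of a sorted list."""
    position = fraction * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

def iqr_filter(values):
    ordered = sorted(values)
    q1 = percentile_cont(ordered, 0.25)
    q3 = percentile_cont(ordered, 0.75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]

kept = iqr_filter([4, 8, 100, 5, 7])
print(kept, sum(kept) / len(kept))   # [4, 8, 5, 7] 6.0
```

Here q1 is 5 and q3 is 8, so the accepted range is 0.5 to 12.5 and only the value 100 is dropped.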