Create function to compute an average of 3 values - postgresql

I'm trying to write a function in postgre sql to take an average across three columns. I have written the following function:
create function xcol_avg (col1, col2, col3)
returns numeric as $$
begin
return (coalesce(col1, 0) + coalesce(col2,0) +coalesce(col3, 0))/
case when (col 1 is null or col1 = 0 then 0 else 1 end +
case when (col 2 is null or col2 = 0 then 0 else 1 end +
case when (col 3 is null or col3 = 0 then 0 else 1 end;
end
What is the problem with my code? Also, is there a way to get the function to return null if it ends up dividing by 0? Any help is really appreciated.
Thanks!

Actually, you can make a function that will use a variable number of arguments and depending on their number compute the average. In Postgres there's a word VARIADIC for such things:
SQL functions can be declared to accept variable numbers of arguments, so long as all the "optional" arguments are of the same data type
Function code:
CREATE FUNCTION xcol_avg(numeric, VARIADIC numeric[])
RETURNS numeric
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
RETURN (SELECT AVG(vals) FROM unnest($2 || ARRAY[$1]) t(vals));
END;
$$;
Use case with different number of arguments:
select xcol_avg(1,6); -- returns 3.5
select xcol_avg(1,5.5,4); -- returns 3.5
select xcol_avg(1,2,3,4,5,6,7); -- returns 4
Click on this Button to try this online.
Explanation:
Marking a function as IMMUTABLE improves the execution time by allowing the optimizer to pre-evaluate the function. Immutable functions cannot modify the database and are guaranteed to always return the same results when called with the same input.
Declaring the last parameter of a function as VARIADIC which has to be of an array type lets you provide optional arguments that will be passed to the function as an array. Note that you don't explicitly write the array, you just list your parameters as you normally would.
unnest() is a function that returns a set of rows by expanding an array. In other words it's "unpacking" the array elements into separate rows
|| is an array operator that provides the array-to-array concatenation. Here it serves the purpose of connecting the first (required) argument with the rest given in a VARIADIC array.
AVG() is an aggregate function that computes an average of all input values. In our case it would take "unpacked" rows from a column named vals and compute the average.
With this solution you don't need to worry about dividing by zero, as at least one argument is required and avg() is doing the job you wanted to do manually by building up the denominator.
Apply it in a query:
This function would also work for computing an average of multiple columns in a row. Consider a table tbl with columns name, cost1, cost2, cost3 and below statement:
SELECT
name, cost1, cost2, cost3,
xcol_avg(cost1, cost2, cost3) AS average_cost
FROM tbl
For more general information about CREATE FUNCTION check the resourceful documentation.

Related

How to create a user-defined function in Postgresql?

I have the following query in Postgres. I want to have a function where the user can define the value of record_year and record_month using the function call, dynamically without having to specify in the query statement. That way the same query can be reused for different user input values for record_year and record_month (in this case). Can anybody help me out?
SELECT x.sid, x.record_year, x.record_month, y.addr
FROM x
FULL OUTER JOIN y
ON x.sid = y.sid
WHERE x.record_time='23:50'
and x.record_year='2020'
and x.record_month='1'
group by x.station_id, x.record_year, x.record_month, y.addr;
The function call could be something like, select function_name(record_year, record_month);
As you are returning a result the function should be defined as returns table(). PL/pgSQL is not required if all you want to do is to return a result. A SQL function will be enough.
create function get_data(p_time time, p_year int, p_month int)
returns table (sid int, record_year int, record_month int, addr text)
as
$$
SELECT x.sid, x.record_year, x.record_month, y.addr
FROM x
FULL OUTER JOIN y ON x.sid = y.sid
WHERE x.record_time p_time
and x.record_year = p_year
and x.record_month = p_month
group by x.station_id, x.record_year, x.record_month, y.addr
$$
language sql
stable;
You have to adjust the data types of the returned columns - I have only guessed them based on name.
Note that your full outer join is really a left join because of the conditions on table x
As it is a set returning function, use it like a table in the FROM clause:
select *
from get_data(time '23:50', 2020, 1);

postgres `order by` argument type

What is the argument type for the order by clause in Postgresql?
I came across a very strange behaviour (using Postgresql 9.5). Namely, the query
select * from unnest(array[1,4,3,2]) as x order by 1;
produces 1,2,3,4 as expected. However the query
select * from unnest(array[1,4,3,2]) as x order by 1::int;
produces 1,4,3,2, which seems strange. Similarly, whenever I replace 1::int with whatever function (e.g. greatest(0,1)) or even case operator, the results are unordered (on the contrary to what I would expect).
So which type should an argument of order by have, and how do I get the expected behaviour?
This is expected (and documented) behaviour:
A sort_expression can also be the column label or number of an output column
So the expression:
order by 1
sorts by the first column of the result set (as defined by the SQL standard)
However the expression:
order by 1::int
sorts by the constant value 1, it's essentially the same as:
order by 'foo'
By using a constant value for the order by all rows have the same sort value and thus aren't really sorted.
To sort by an expression, just use that:
order by
case
when some_column = 'foo' then 1
when some_column = 'bar' then 2
else 3
end
The above sorts the result based on the result of the case expression.
Actually I have a function with an integer argument which indicates the column to be used in the order by clause.
In a case when all columns are of the same type, this can work: :
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1
WHEN 2 THEN column2
.....
WHEN 1235 THEN column1235
END
If columns are of different types, you can try:
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1::varchar
WHEN 2 THEN column2::varchar
.....
WHEN 1235 THEN column1235::varchar
END
But these "workarounds" are horrible. You need some other approach than the function returning a column number.
Maybe a dynamic SQL ?
I would say that dynamic SQL (thanks #kordirko and the others for the hints) is the best solution to the problem I originally had in mind:
create temp table my_data (
id serial,
val text
);
insert into my_data(id, val)
values (default, 'a'), (default, 'c'), (default, 'd'), (default, 'b');
create function fetch_my_data(col text)
returns setof my_data as
$f$
begin
return query execute $$
select * from my_data
order by $$|| quote_ident(col);
end
$f$ language plpgsql;
select * from fetch_my_data('val'); -- order by val
select * from fetch_my_data('id'); -- order by id
In the beginning I thought this could be achieved using case expression in the argument of the order by clause - the sort_expression. And here comes the tricky part which confused me: when sort_expression is a kind of identifier (name of a column or a number of a column), the corresponding column is used when ordering the results. But when sort_expression is some value, we actually order the results using that value itself (computed for each row). This is #a_horse_with_no_name's answer rephrased.
So when I queried ... order by 1::int, in a way I have assigned value 1 to each row and then tried to sort an array of ones, which clearly is useless.
There are some workarounds without dynamic queries, but they require writing more code and do not seem to have any significant advantages.

Trying to create aggregate function in PostgreSQL

I'm trying to create new aggregate function in PostgreSQL to use instead of the sum() function
I started my journey in the manual here.
Since I wanted to create a function that takes an array of double precision values, sums them and then does some additional calculations I first created that final function:
takes double precision as input and gives double precision as output
DECLARE
v double precision;
BEGIN
IF tax > 256 THEN
v := 256;
ELSE
v := tax;
END IF;
RETURN v*0.21/0.79;
END;
Then I wanted to create the aggregate function that takes an array of double precision values and puts out a single double precision value for my previous function to handle.
CREATE AGGREGATE aggregate_ee_income_tax (float8[]) (
sfunc = array_agg
,stype = float8
,initcond = '{}'
,finalfunc = eeincometax);
What I get when I run that command is:
ERROR: function array_agg(double precision, double precision[]) does
not exist
I'm somewhat stuck here, because the manual lists array_agg() as existing function. What am I doing wrong?
Also, when I run:
\da
List of aggregate functions
Schema | Name | Result data type | Argument data types | Description
--------+------+------------------+---------------------+-------------
(0 rows)
My installation has no aggregate functions at all? Or does only list user defined functions?
Basically what I'm trying to understand:
1) Can I use an existing functions to sum up my array values?
2) How can I find out about input and ouptut data types of functions? Docs claim that array_agg() takes any kind of input.
3) What is wrong with my own aggregate function?
Edit 1
To give more information and clearer picture of what I'm trying to achieve:
I have one huge query over several tables which goes something like this:
SELECT sum(tax) ... from (SUBQUERY) as foo group by id
I want to replace that sum function with my own aggregate function so I don't have to do additional calculations on backend - since they can all be done on database level.
Edit 2
Accepted Ants's answer. Since final solution comes from comments I post it here for reference:
CREATE AGGREGATE aggregate_ee_income_tax (float8)
(
sfunc = float8pl
,stype = float8
,initcond = '0.0'
,finalfunc = eeincometax
);
Array agg is an aggregate function not a regular function, so it can't be used as a state transition function for a new aggregate. What you want to do is to create an aggregate function which has a state transition function that is identical to array_agg and a custom final func.
Unfortunately the state transition function of array_agg is defined in terms of an internal datatype so it can't be reused. Fortunately there is an existing function in core that already does what you want.
CREATE AGGREGATE aggregate_ee_income_tax (float8)(
sfunc = array_append,
stype = float8[],
initcond = '{}',
finalfunc = eeincometax);
Also note that you had your types mixed up, you probably want aggregate a set of floats to an array, not a set of arrays to a float.
In addition to #Ants excellent advice:
1.) Your final function could be simplified to:
CREATE FUNCTION eeincometax(float8)
RETURNS float8 LANGUAGE SQL AS
$func$
SELECT (least($1, 256) * 21) / 79
$func$;
2.) It seems like you are dealing with money? In this case I would strongly advise to use the type numeric (preferred) or money for the purpose. Floating point operations are often not precise enough.
3.) The initial condition of the aggregate can simply be just 0:
CREATE AGGREGATE aggregate_ee_income_tax(float8)
(
sfunc = float8pl
,stype = float8
,initcond = 0
,finalfunc = eeincometax
);
4.) In your case (least(sum(tax), 256) * 21) / 79 is probably faster than your custom aggregate. Aggregate functions provided by PostgreSQL are written in C and optimized for performance. I would use that instead.

calculation in postgresql function

I'm trying to create function that serves as replacement for postgresql sum function in my query.
query itself it LOOONG and something like this
SELECT .... sum(tax) as tax ... from (SUBQUERY) as bar group by id;
I want to replace this sum function with my own which alters the end value by tax calculation.
So I created pgsql function like this:
DECLARE
v double precision;
BEGIN
IF val > 256 THEN
v := 256;
ELSE
v := val;
END IF;
RETURN (v*0.21)/0.79;
END;
That function uses double precision as input. Using this function gives me error though:
column "bar.tax" must appear in the GROUP BY clause or be used in an aggregate function
So I tried to use double precision[] as input type for function - in that case the error was telling me that there was no postgresql function matching the name and given input type.
So is there a way to replace sum function in my query by using pgsql functions or not - I don't want to change any other parts of my query unless I really, Really, REALLY have to.
Look into CREATE AGGREGATE for aggregate functions. And in your expression you can do something like sum(LEAST(tax,256)*21/79) if I read you right.

SQL invalid conversion return null instead of throwing error

I have a table with a varchar column, and I want to find values that match a certain number. So lets say that column contains the following entries (except with millions of rows in real life):
123456789012
2345678
3456
23 45
713?2
00123456789012
So I decide I want all the rows which are numerically 123456789012 write a statement that looks something like this:
SELECT * FROM MyTable WHERE CAST(MyColumn as bigint) = 123456789012
It should return the first and last row, but instead the whole query blows up because it can't convert the "23 45" and "713?2" to bigint.
Is there another way to do the conversion that will return NULL for values that can't convert?
SQL Server does NOT guarantee boolean operator short-circuit, see On SQL Server boolean operator short-circuit. So all solution using ISNUMERIC(...) AND CAST(...) are fundamentally flawed (they may work, but hey can arbitrarily fail later dependiong on the generated plan). A better solution is using CASE, as Thomas suggests: CASE ISNUMERIC(...) WHEN 1 THEN CAST(...) ELSE NULL END. But, as gbn pointed out, ISNUMERIC is notoriously finicky in identifying what 'numeric' means and many cases where one would expect it to return 0 it returns 1. So mixing the CASE with the LIKE:
CASE WHEN MyRow NOT LIKE '%[^0-9]%' THEN CAST(MyRow as bigint) ELSE NULL END
But the real problem is that if you have millions of rows and you have to search them like this, you'll always end up scanning end-to-end since the expression is not SARG-able (no matter how we rewrite it). The real issue here is data purity, and should be addressed at the appropriate level, where the data is populated. Another thing to consider is if is possible to create a persisted computed column with this expression and create a filtered index on it which eliminates NULL (ie. non-numeric). That would speed up things a little.
If you are using SQL Server 2012 you can use the 2 new methods:
TRY_CAST()
TRY_CONVERT()
Both methods are equivalent. They return a value cast to the specified data type if the cast succeeds; otherwise, returns null. The only difference is that CONVERT is SQL Server specific, CAST is ANSI. using CAST will make your code more portable (although not sure if any other database provider implements TRY_CAST)
ISNUMERIC will accept empty string and values like 1.23 or 5E-04 so could be unreliable.
And you don't know what order things will be evaluated in so it could still fail (SQL is declarative, not procedural, so the WHERE clause probably won't be evaluated left to right)
So:
you want to accept value that consist only of the characters 0-9
you need to materialise the "number" filter so it's applied before CAST
Something like:
SELECT
*
FROM
(
SELECT TOP 2000000000 *
FROM MyTable
WHERE MyColumn NOT LIKE '%[^0-9]%' --double negative rejects anything except 0-9
ORDER BY MyColumn
) foo
WHERE
CAST(MyColumn as bigint) = 123456789012 --applied after number check
Edit: quick example that fails.
CREATE TABLE #foo (bigintstring varchar(100))
INSERT #foo (bigintstring )VALUES ('1.23')
INSERT #foo (bigintstring )VALUES ('1 23')
INSERT #foo (bigintstring )VALUES ('123')
SELECT * FROM #foo
WHERE
ISNUMERIC(bigintstring) = 1
AND
CAST(bigintstring AS bigint) = 123
SELECT *
FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as float) = 123456789012
The ISNUMERIC() function should give you what you need.
SELECT * FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as bigint) = 123456789012
And to add a case statement like Thomas suggested:
SELECT * FROM MyTable
WHERE CASE(ISNUMERIC(MyRow)
WHEN 1 THEN CAST(MyRow as bigint)
ELSE NULL
END = 123456789012
http://msdn.microsoft.com/en-us/library/ms186272.aspx
SELECT *
FROM MyTable
WHERE (ISNUMERIC(MyColumn) = 1) AND (CAST(MyColumn as bigint) = 123456789012)
Additionally you can use a CASE statement in order to get null values.
SELECT
CASE
WHEN (ISNUMERIC(MyColumn) = 1) THEN CAST(MyColumn as bigint)
ELSE NULL
END AS 'MyColumnAsBigInt'
FROM tableName
If you require additional filtering, for numerics which are not valid to be cast to bigint, you can use the following instead of ISNUMERIC:
PATINDEX('%[^0-9]%',MyColumn)) = 0
If you need decimal values instead of integers, cast to float instead and change the regex to '%[^0-9.]%'