Can someone help me figure out Oracle's (10g) AND/OR short-circuit evaluation?

Consider the following query and notice the CALCULATE_INCENTIVE function:
SELECT EMP.* FROM EMPLOYEES EMP
WHERE
EMP.STATUS = 1 AND
EMP.HIRE_DATE > TO_DATE('1/1/2010') AND
EMP.FIRST_NAME = 'JOHN' AND
CALCULATE_INCENTIVE(EMP.ID) > 1000
ORDER BY EMP.ID DESC;
I was under the impression that Oracle uses the same (or similar) short-circuit evaluation that .NET uses in its AND/OR logic. For example, if EMP.STATUS = 2, it won't bother evaluating the rest of the expression, since the entire expression would return false anyway.
In my case, the CALCULATE_INCENTIVE function is being called for every employee in the database rather than only for the 9 records that the first three WHERE expressions return. I've even tried putting parentheses around the specific expressions that I want to group together for short-circuit evaluation, but I can't figure it out.
Does anyone have any ideas how to get CALCULATE_INCENTIVE not to be evaluated if any of the previous expressions returns false?

One way is to put the primary criteria into a subquery that Oracle can't optimize away, then put the secondary criteria into the outer query. The easiest way to ensure that Oracle doesn't optimize out the subquery is to include rownum in the select statement:
SELECT * FROM (
SELECT EMP.*, ROWNUM
FROM EMPLOYEES EMP
WHERE
EMP.STATUS = 1
AND EMP.HIRE_DATE > TO_DATE('1/1/2010')
AND EMP.FIRST_NAME = 'JOHN')
WHERE CALCULATE_INCENTIVE(ID) > 1000
ORDER BY ID DESC;
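Another approach sometimes suggested is to wrap the expensive call in a CASE expression, since Oracle documents short-circuit evaluation for CASE: the function branch is only reached when the cheap predicates hold. A sketch (the optimizer can still transform the query, and the explicit TO_DATE format mask is added here for safety):
SELECT EMP.*
FROM EMPLOYEES EMP
WHERE CASE
        WHEN EMP.STATUS = 1
         AND EMP.HIRE_DATE > TO_DATE('1/1/2010', 'MM/DD/YYYY')
         AND EMP.FIRST_NAME = 'JOHN'
        THEN CALCULATE_INCENTIVE(EMP.ID)
      END > 1000
ORDER BY EMP.ID DESC;
When the WHEN condition is false, the CASE yields NULL, NULL > 1000 is not true, and the row is rejected without the function ever being called.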

Oracle supports short-circuit evaluation in PL/SQL. In SQL, however, the optimizer is free to evaluate the predicates in whatever order it desires, to push predicates into views and subqueries, and to otherwise transform the SQL statement as it sees fit. This means that you should not rely on predicates being applied in a particular order, and it makes the order in which predicates appear in the WHERE clause essentially irrelevant. The indexes that are available, the optimizer statistics that are present, the optimizer parameters, and system statistics are all vastly more important than the order of predicates in the WHERE clause.
In PL/SQL, for example, you can demonstrate this with a function that throws an error if it's actually called.
SQL> ed
Wrote file afiedt.buf
1 create function throw_error( p_parameter IN NUMBER )
2 return number
3 as
4 begin
5 raise_application_error( -20001, 'The function was called' );
6 return 1;
7* end;
SQL> /
Function created.
SQL> ed
Wrote file afiedt.buf
1 declare
2 l_num NUMBER;
3 begin
4 l_num := 1;
5 if( l_num = 2 and throw_error( l_num ) = 2 )
6 then
7 null;
8 else
9 dbms_output.put_line( 'Short-circuited the AND' );
10 end if;
11 if( l_num = 1 or throw_error( l_num ) = 2 )
12 then
13 dbms_output.put_line( 'Short-circuited the OR' );
14 end if;
15* end;
16 /
Short-circuited the AND
Short-circuited the OR
PL/SQL procedure successfully completed.
In SQL, on the other hand, the order of operations is determined by the optimizer, not by you, so the optimizer is free to short-circuit or not short-circuit however it wants. Jonathan Gennick has a great article, Subquery Madness!, that discusses this in some detail. In your particular case, if you had a composite index on (FIRST_NAME, HIRE_DATE, STATUS) along with appropriate statistics, the optimizer would almost certainly use the index to evaluate the first three conditions and then call the CALCULATE_INCENTIVE function only for the IDs that met those three criteria. If you created a function-based index on CALCULATE_INCENTIVE(ID), the optimizer would likely use that rather than calling the function at runtime at all. But the optimizer would be perfectly free to decide to call the function for every row in either case if it decided that it would be more efficient to do so.
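For reference, the supporting objects mentioned above could be created roughly like this (a sketch; the index names are illustrative, and the function-based index requires CALCULATE_INCENTIVE to be declared DETERMINISTIC):
CREATE INDEX EMP_NAME_HIRE_STATUS_IX
  ON EMPLOYEES (FIRST_NAME, HIRE_DATE, STATUS);

-- only possible if the function is declared DETERMINISTIC
CREATE INDEX EMP_INCENTIVE_FBI
  ON EMPLOYEES (CALCULATE_INCENTIVE(ID));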

Related

Create function to compute an average of 3 values

I'm trying to write a function in PostgreSQL to take an average across three columns. I have written the following function:
create function xcol_avg (col1, col2, col3)
returns numeric as $$
begin
return (coalesce(col1, 0) + coalesce(col2,0) +coalesce(col3, 0))/
case when (col 1 is null or col1 = 0 then 0 else 1 end +
case when (col 2 is null or col2 = 0 then 0 else 1 end +
case when (col 3 is null or col3 = 0 then 0 else 1 end;
end
What is the problem with my code? Also, is there a way to get the function to return null if it ends up dividing by 0? Any help is really appreciated.
Thanks!
Actually, you can make a function that accepts a variable number of arguments and computes the average over however many are passed. In Postgres there's the keyword VARIADIC for such things:
SQL functions can be declared to accept variable numbers of arguments, so long as all the "optional" arguments are of the same data type
Function code:
CREATE FUNCTION xcol_avg(numeric, VARIADIC numeric[])
RETURNS numeric
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
RETURN (SELECT AVG(vals) FROM unnest($2 || ARRAY[$1]) t(vals));
END;
$$;
Use case with different number of arguments:
select xcol_avg(1,6); -- returns 3.5
select xcol_avg(1,5.5,4); -- returns 3.5
select xcol_avg(1,2,3,4,5,6,7); -- returns 4
Explanation:
Marking a function as IMMUTABLE improves the execution time by allowing the optimizer to pre-evaluate the function. Immutable functions cannot modify the database and are guaranteed to always return the same results when called with the same input.
Declaring the last parameter of a function as VARIADIC which has to be of an array type lets you provide optional arguments that will be passed to the function as an array. Note that you don't explicitly write the array, you just list your parameters as you normally would.
unnest() is a function that returns a set of rows by expanding an array. In other words, it "unpacks" the array elements into separate rows.
|| is an array operator that provides the array-to-array concatenation. Here it serves the purpose of connecting the first (required) argument with the rest given in a VARIADIC array.
AVG() is an aggregate function that computes an average of all input values. In our case it would take "unpacked" rows from a column named vals and compute the average.
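To see those pieces working together outside the function, here is a stand-alone query using the values from the second usage example above (1, 5.5 and 4):
SELECT AVG(vals) AS result
FROM unnest(ARRAY[5.5, 4]::numeric[] || ARRAY[1]::numeric[]) t(vals);
-- returns 3.5, the same value as xcol_avg(1, 5.5, 4)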
With this solution you don't need to worry about dividing by zero: at least one argument is required, and avg() does the job of building up the denominator that you were trying to do manually.
Apply it in a query:
This function would also work for computing an average of multiple columns in a row. Consider a table tbl with columns name, cost1, cost2, cost3 and the statement below:
SELECT
name, cost1, cost2, cost3,
xcol_avg(cost1, cost2, cost3) AS average_cost
FROM tbl
For more general information about CREATE FUNCTION, see the documentation.

sp_executesql vs user defined scalar function

In one table I am storing a set of comparison conditions (one operator per row).
In a second table I have records with pairs of values, and what I need is to compare those values using the right condition and store the result (let's say '0' for false and '1' for true) in an additional column.
I am going to do this in a stored procedure, and basically I am going to compare anywhere from several to hundreds of records.
One possible solution is to use sp_executesql for each row, building dynamic statements; the other is to create my own scalar function and call it for each row using CROSS APPLY.
Could anyone tell me which is the more efficient way?
Note: I know that the best way to answer this is to build both solutions and test, but I am hoping there is a known answer based on things like caching and SQL Server's internal optimizations. That would save me a lot of time, because this is only part of a bigger problem.
I don't see the need for sp_executesql in this case. You can obtain the result for all records at once in a single statement:
select Result = case
when ct.Abbreviation='=' and t.ValueOne=t.ValueTwo then 1
when ct.Abbreviation='>' and t.ValueOne>t.ValueTwo then 1
when ct.Abbreviation='>=' and t.ValueOne>=t.ValueTwo then 1
when ct.Abbreviation='<=' and t.ValueOne<=t.ValueTwo then 1
when ct.Abbreviation='<>' and t.ValueOne<>t.ValueTwo then 1
when ct.Abbreviation='<' and t.ValueOne<t.ValueTwo then 1
else 0 end
from YourTable t
join ConditionType ct on ct.ID = t.ConditionTypeID
and update the additional column with something like:
;with cte as (
select t.AdditionalColumn, Result = case
when ct.Abbreviation='=' and t.ValueOne=t.ValueTwo then 1
when ct.Abbreviation='>' and t.ValueOne>t.ValueTwo then 1
when ct.Abbreviation='>=' and t.ValueOne>=t.ValueTwo then 1
when ct.Abbreviation='<=' and t.ValueOne<=t.ValueTwo then 1
when ct.Abbreviation='<>' and t.ValueOne<>t.ValueTwo then 1
when ct.Abbreviation='<' and t.ValueOne<t.ValueTwo then 1
else 0 end
from YourTable t
join ConditionType ct on ct.ID = t.ConditionTypeID
)
update cte
set AdditionalColumn = Result
If the above logic is supposed to be applied in many places, not just over one table, then yes, you may think about a function. However, I would rather use an inline table-valued function (not a scalar one), because there is overhead imposed by user-defined scalar functions (the call and return add cost, and the more rows are processed, the more time is wasted).
create function ftComparison
(
@v1 float,
@v2 float,
@cType int
)
returns table
as return
select
Result = case
when ct.Abbreviation='=' and @v1=@v2 then 1
when ct.Abbreviation='>' and @v1>@v2 then 1
when ct.Abbreviation='>=' and @v1>=@v2 then 1
when ct.Abbreviation='<=' and @v1<=@v2 then 1
when ct.Abbreviation='<>' and @v1<>@v2 then 1
when ct.Abbreviation='<' and @v1<@v2 then 1
else 0
end
from ConditionType ct
where ct.ID = @cType
which can then be applied as:
select f.Result
from YourTable t
cross apply ftComparison(ValueOne, ValueTwo, t.ConditionTypeID) f
or
select f.Result
from YourAnotherTable t
cross apply ftComparison(SomeValueColumn, SomeOtherValueColumn, @someConditionType) f
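And since the original goal was to store the flag in an additional column, the same function can feed an UPDATE; a sketch reusing the hypothetical YourTable / AdditionalColumn names from above:
UPDATE t
SET t.AdditionalColumn = f.Result
FROM YourTable t
CROSS APPLY ftComparison(t.ValueOne, t.ValueTwo, t.ConditionTypeID) f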

UDTF returning a Table on DB2 V5R4 with Dynamic SQL

I need to write a UDF that returns a table. I've done it with static SQL.
I've created procedures that prepare a dynamic, complex SQL statement and return a cursor.
But now I need to create a UDF that uses dynamic SQL and returns a table that can be used with an IN clause inside another SELECT.
Is this possible on DB2 V5R4? Do you have an example?
Thanks in advance...
I don't have V5R4, but I have i 6.1 and V5R3. I have a 6.1 example, and I poked around in V5R3 to find how to make the same example work there. I can't guarantee V5R4, but this ought to be extremely close. Generating the working V5R3 code into 'Run SQL Scripts' gives this:
DROP SPECIFIC FUNCTION SQLEXAMPLE.DYNTABLE ;
SET PATH "QSYS","QSYS2","SYSPROC","SYSIBMADM","SQLEXAMPLE" ;
CREATE FUNCTION SQLEXAMPLE.DYNTABLE (
SELECTBY VARCHAR( 64 ) )
RETURNS TABLE (
CUSTNBR DECIMAL( 6, 0 ) ,
CUSTFULLNAME VARCHAR( 12 ) ,
CUSTBALDUE DECIMAL( 6, 0 ) )
LANGUAGE SQL
NO EXTERNAL ACTION
MODIFIES SQL DATA
NOT FENCED
DISALLOW PARALLEL
CARDINALITY 100
BEGIN
DECLARE DYNSTMT VARCHAR ( 512 ) ;
DECLARE GLOBAL TEMPORARY TABLE SESSION.TCUSTCDT
( CUSTNBR DECIMAL ( 6 , 0 ) NOT NULL ,
CUSTNAME VARCHAR ( 12 ) ,
CUSTBALDUE DECIMAL ( 6 , 2 ) )
WITH REPLACE ;
SET DYNSTMT = 'INSERT INTO Session.TCustCDt SELECT t2.CUSNUM , (t2.INIT CONCAT '' '' CONCAT t2.LSTNAM) as FullName , t2.BALDUE FROM QIWS.QCUSTCDT t2 ' CONCAT CASE WHEN SELECTBY = '' THEN '' ELSE SELECTBY END ;
EXECUTE IMMEDIATE DYNSTMT ;
RETURN SELECT * FROM SESSION . TCUSTCDT ;
END ;
COMMENT ON SPECIFIC FUNCTION SQLEXAMPLE.DYNTABLE
IS 'UDTF returning dynamic table' ;
And in 'Run SQL Scripts', the function can be called like this:
SELECT t1.* FROM TABLE(sqlexample.dyntable('WHERE STATE = ''TX''')) t1
The example is intended to work over IBM's sample QCUSTCDT table in library QIWS. Most systems will have that table available. The table function returns values from two QCUSTCDT columns, CUSNUM and BALDUE, directly through two of the table function's columns, CUSTNBR and CUSTBALDUE. The third table function column, CUSTFULLNAME, gets its value from a concatenation of INIT and LSTNAM from QCUSTCDT.
However, the part that apparently relates to the question is the SELECTBY parameter of the function. The usage example shows that a WHERE clause is passed in and used to help build a dynamic INSERT INTO ... SELECT ... statement. The example shows that rows containing STATE = 'TX' will be returned. A more complex clause could be passed in, or the needed condition(s) could be retrieved from somewhere else, e.g., from another table.
The dynamic statement inserts rows into a GLOBAL TEMPORARY TABLE named SESSION.TCUSTCDT. The temporary table is defined in the function. The temporary column definitions are guaranteed (by the developer) to match the RETURNS TABLE columns of the table function, because no dynamic changes can be made to any of those elements. This allows SQL to handle the columns returned from the function reliably, and that lets it compile the function.
The RETURN statement simply returns whatever rows are in the temporary table after the dynamic statement completes.
The various field definitions take into account the somewhat unusual definitions in the QCUSTCDT file. Those don't make great sense, but they're useful enough.
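To tie this back to the question, the table function can then feed an IN predicate inside another SELECT; a sketch against the same QIWS.QCUSTCDT sample table used above (untested on V5R4):
SELECT c.*
FROM QIWS.QCUSTCDT c
WHERE c.CUSNUM IN (
  SELECT t1.CUSTNBR
  FROM TABLE(SQLEXAMPLE.DYNTABLE('WHERE STATE = ''TX''')) t1
);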

Multiple ordering options

It is very common for a web page to have multiple ordering options for a table. Right now I have a case where there are 12 options (orderable columns). The easiest way I know of is to build the SQL query by concatenating strings, but I'm wondering if it is the best approach. The string concatenation is something like this (Python code):
order = {
1: "c1 desc, c2",
2: "c2, c3",
...
12: "c10, c9 desc"
}
...
query = """
select c1, c2
from the_table
order by %(order)s
"""
...
cursor.execute(query, {'order': AsIs(order[order_option])})
...
My alternative solution until now is to place a series of cases in the order by clause:
select c1, c2
from the_table
order by
case %(order_option)s
when 1 then array[c1 * -1, c2]
when 2 then array[c2, c3]
else array[0.0, 0.0]
end
,
case %(order_option)s
when 3 then c4
else ''
end
,
...
,
case when %(order_option)s < 1 or %(order_option)s > 12 then c5 end
;
What is the best practice concerning multiple ordering choices? What happens with index utilization in my alternative code?
First of all, @order is not valid PostgreSQL syntax. You probably borrowed the syntax style from MS SQL Server or MySQL. You cannot use variables in a plain SQL query like that.
In PostgreSQL you would probably create a function. You can use variables there, just drop the @.
Sorting by ARRAY is generally rather slow - and not necessary in your case. You could simplify to:
ORDER BY
CASE _order
WHEN 1 THEN c2
WHEN 2 THEN c3 * -1
ELSE NULL -- undefined!
END
, c1
However, a CASE expression like this cannot use plain indexes. So, if you are looking for performance, one way (of several) would be a plpgsql function like this:
CREATE OR REPLACE FUNCTION foo(int)
RETURNS TABLE(c1 int, c2 int) AS
$BODY$
BEGIN
CASE $1
WHEN 1 THEN
RETURN QUERY
SELECT t.c1, t.c2
FROM tbl t
ORDER BY t.c2, t.c1;
WHEN 2 THEN
RETURN QUERY
SELECT t.c1, t.c2
FROM tbl t
ORDER BY t.c3 DESC, t.c1;
ELSE
RAISE WARNING 'Unexpected parameter: "%"', $1;
END CASE;
END;
$BODY$
LANGUAGE plpgsql STABLE;
This way, even plain indexes can be used.
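A call then looks like any other table source, e.g. (using the foo function defined above):
SELECT * FROM foo(1);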
If you actually only have two alternatives for ORDER BY, you could also just write two functions.
Create multi-column indexes on (c2, c1) and (c3 DESC, c1) for maximum performance. But be aware that maintaining indexes carries a cost, too, especially if your table sees a lot of write operations.
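Those two indexes could be created roughly like this (a sketch; the names are illustrative and assume the hypothetical tbl used in the function above):
CREATE INDEX tbl_c2_c1_idx ON tbl (c2, c1);
CREATE INDEX tbl_c3desc_c1_idx ON tbl (c3 DESC, c1);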
Additional answer for rephrased question
As I said, the CASE construct will not use plain indexes. Indexes on expressions would be an option, but what you have in your example goes beyond what they can cover.
So, if you want performance, build the query in your app (your first approach) or write a server-side function (possibly with dynamic SQL and EXECUTE) that does something similar inside PostgreSQL. An ORDER BY clause with a complex CASE expression works, but it is slower.
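For completeness, a sketch of such a server-side function with dynamic SQL (same hypothetical tbl and ordering options as above; the ORDER BY fragments come from a fixed whitelist, so no user input is concatenated into the statement):
CREATE OR REPLACE FUNCTION foo_dyn(_order int)
  RETURNS TABLE(c1 int, c2 int) AS
$BODY$
BEGIN
   RETURN QUERY EXECUTE
      'SELECT t.c1, t.c2 FROM tbl t ORDER BY '
      || CASE _order
            WHEN 1 THEN 't.c2, t.c1'
            WHEN 2 THEN 't.c3 DESC, t.c1'
            ELSE 't.c1'
         END;
END;
$BODY$
LANGUAGE plpgsql STABLE;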

Unexpected SQL results: string vs. direct SQL

Working SQL
The following code works as expected, returning two columns of data (a row number and a valid value):
sql_amounts := '
SELECT
row_number() OVER (ORDER BY taken)::integer,
avg( amount )::double precision
FROM
x_function( '|| id || ', 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken';
FOR r, amount IN EXECUTE sql_amounts LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;
Non-Working SQL
The following code does not work as expected; the first column is a row number, the second column is NULL.
FOR r, amount IN
SELECT
row_number() OVER (ORDER BY taken)::integer,
avg( amount )::double precision
FROM
x_function( id, 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken
LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;
Question
Why does the non-working code return a NULL value for the second column when the query itself returns two valid columns? (This question is mostly academic; if there is a way to express the query without resorting to wrapping it in a text string, that would be great to know.)
Full Code
http://pastebin.com/hgV8f8gL
Software
PostgreSQL 8.4
Thank you.
The two statements aren't strictly equivalent.
Assuming id = 4, the first one gets planned/prepared on each pass, and behaves like:
prepare dyn_stmt as '... x_function( 4, 25 ) ...'; execute dyn_stmt;
The other gets planned/prepared on the first pass only, and behaves more like:
prepare stc_stmt as '... x_function( $1, 25 ) ...'; execute stc_stmt(4);
(The loop will actually make it prepare a cursor for the above, but that's beside the point for our purposes.)
A number of factors can make the two yield different results.
Search path changes before calling the procedure will be ignored by the second call, in particular if this makes x_table point to something different.
Constants of all kinds and calls to immutable functions are "hard-wired" in the second call's plan.
Consider this as an illustration of these side-effects:
deallocate all;
begin;
prepare good as select now();
prepare bad as select current_timestamp;
execute good; -- yields the current timestamp
execute bad; -- yields the current timestamp
commit;
execute good; -- yields the current timestamp
execute bad; -- yields the timestamp at which it was prepared
Why the two aren't returning the same results in your case would depend on the context (you only posted part of your pl/pgsql function, so it's hard to tell), but my guess is you're running into a variation of the above kind of problem.
From Tom Lane:
I think the problem is that you're assuming "amount" will refer to a table column of the query, when actually it's a local variable of the plpgsql function. The second interpretation will take precedence unless you qualify the column reference with the table's name/alias.
Note: PG 9.0 will throw an error by default when there is an ambiguity of this type.
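Following that note, a sketch of the non-dynamic loop with the column reference qualified (same x_table / x_function names as in the question); alternatively, renaming the local variable avoids the collision:
FOR r, amount IN
SELECT
row_number() OVER (ORDER BY m.taken)::integer,
avg( m.amount )::double precision
FROM
x_function( id, 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken
LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;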