In PostgreSQL, suppose we have a query like this:
SELECT --
FROM --
WHERE --
GROUP BY --
HAVING
func1 AND func2;
I think there could be three strategies in the planner:
func1 is evaluated first on the target list, then func2 on the same target list
func1 is evaluated first on the target list, producing a smaller result set, and then func2 is evaluated on that smaller result set
supposing func1 costs c1, func2 costs c2, and c1 > c2: func2 is evaluated first on the target list, producing a smaller result set, and then func1 is evaluated on that smaller result set
Which one is the actual approach in PostgreSQL?
If either func is a non-aggregate and non-VOLATILE expression, the planner may effectively move it to the WHERE clause.
Otherwise, (func1 AND func2) would be applied as a single filter expression on the resulting groups. At this point, the executor's lazy boolean evaluation rules kick in; if the first condition evaluates to false, it will not bother to execute the second. So the behaviour is closest to your second or third options, but performed in a single pass of the result set.
The order of evaluation is up to the planner, so in theory it may decide to execute func2 first. However, I'm not sure what might trigger this behaviour; even when func1 has a cost of 1000000000, it still seems to favour left-to-right evaluation.
The EXPLAIN ANALYSE output will show you where in the execution plan these conditions are applied, and by adding some RAISE NOTICE statements to the body of the functions, you can observe the exact sequence of function calls.
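For example, here is a minimal sketch of that experiment, assuming a hypothetical table some_table(grp int, val int). These PL/pgSQL functions are VOLATILE by default and take an aggregate as their argument, so (func1 AND func2) stays in the HAVING filter, and the NOTICE output reveals the call order:
-- Two cheap test functions that announce each call, making the call order visible.
CREATE FUNCTION func1(x bigint) RETURNS boolean LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'func1(%)', x;
    RETURN x > 0;
END;
$$;

CREATE FUNCTION func2(x bigint) RETURNS boolean LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'func2(%)', x;
    RETURN x < 100;
END;
$$;

-- The aggregate node's Filter line shows where the HAVING conditions are applied;
-- the notices show whether func2 is still called when func1 returns false.
EXPLAIN ANALYSE
SELECT grp, sum(val)
FROM some_table
GROUP BY grp
HAVING func1(sum(val)) AND func2(sum(val));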
I have rewritten some of my database access code in order to save some cycles. My main goal was to achieve as much server-side evaluation of my LINQ queries as possible.
To do so, I replaced this:
data = ...some LINQ...
if(condition){
data = data.Where(element => filter-condition)
}
with this:
data = ...some LINQ...
.Where(element => !condition || filter-condition)
condition in this case is an expression that does not depend on the current element, so you could say it is practically a constant for the whole query: it either evaluates to true for all elements in data, or to false for all of them.
On the other hand, filter-condition is an expression that depends on the current element, as you would expect from your usual Where clause condition.
This optimization works like a charm, because it enables server-side evaluation in SQL on the database, and the LINQ to SQL compiler is intelligent enough to even short-circuit the generated SQL if my condition evaluates to false.
My question is: what happens if this code is not evaluated in SQL on the server side? Let's say I did the following:
data = ...some LINQ...
.AsEnumerable() //Enforces client-side query evaluation
.Where(element => !condition || filter-condition)
Now my Where clause gets evaluated client-side, which is not a problem on the functional side. Of course, the performance is weaker for client-side execution. But what about the custom optimization I did beforehand? Is there a performance penalty for evaluating condition for every element in my data sequence? Or is LINQ on the client side also intelligent enough to short-circuit the "constant" expression condition?
Or is LINQ on the client side also intelligent enough to short-circuit the "constant" expression condition?
Sort of. || always gets short-circuit evaluation in C#, but LINQ on the client does not have any sort of query optimizer, so the !condition || filter-condition predicate will be evaluated for every entity.
I have a SQL SELECT statement that reads items. There are conditions determining which items to display, but when one condition fails, I don't need to check the others.
For example:
where item like 'M%'
and item_class='B'
and fGetOnHand(item)>0
If either of the first two fails, I do NOT want to evaluate the last one (a call to a user-defined function).
From what I have read on this site, SQL Server's AND and OR operators are not guaranteed to short-circuit. This means that the call to the UDF could happen first, or perhaps not at all, should one of the other conditions be evaluated first and fail.
You could try rewriting your logic using a CASE expression, in which the order of evaluation is fixed:
WHERE
    CASE WHEN item NOT LIKE 'M%' OR item_class <> 'B'
              THEN 0
         WHEN fGetOnHand(item) <= 0
              THEN 0
         ELSE 1
    END = 1
The above logic forces the check on item and item_class to happen first. Should either fail, the first branch of the CASE expression evaluates to 0, and the condition fails. Only if both of those checks pass will the UDF be evaluated.
This is very verbose, but if the UDF call is a serious penalty, then phrasing your WHERE clause as above may be worth the trade-off of verbose code for better performance.
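In context, the full query might look like this (a sketch; dbo.items is a hypothetical table name standing in for whatever the original statement selects from). Note that fGetOnHand in the SELECT list is only computed for rows that survive the WHERE clause:
SELECT item, item_class, fGetOnHand(item) AS qty_on_hand
FROM dbo.items
WHERE
    CASE WHEN item NOT LIKE 'M%' OR item_class <> 'B'
              THEN 0
         WHEN fGetOnHand(item) <= 0
              THEN 0
         ELSE 1
    END = 1;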
I'm querying a large historical database (HDB) to select records by alphabet character, then upserting the selected records to another kdb table...
pullRecords:{[x]select from `ts where sym like x}
pushRecords:{[x]`newTS upsert x}
The actual database contains millions of rows of records. If I were to run this simultaneously for each character, it would result in an abort error, since it requires more memory than is available.
The ts and newTS tables, which I set up for testing, are below. I also set up a metadata table called metaTable, which has a flag column to signal when the query has finished running...
ts:([]sym:1000000?`A`Ab`B`Bc`C`Ca`X`Xz`Y`Yx`Z`Zy;price:1000000?100.0;num:til 1000000)
newTS:([]sym:`$(); price:`float$(); amt:`int$())
metaTable:([id:`char$()]flag:`boolean$())
I'd like to stop the script from running, based on the value of the flag column. If a value of 1b is found, it means the script is locked on that character, and no other characters can run their queries until the lock is reset to 0b. If all values are equal to 0b, then the character checking the flag column acquires the lock (updates the value to 1b) and runs its functions. Once the queries have completed, the lock will be reset.
What I'd like to do is the following...
(1) Declare 2 variables.
setflag:1b
resetflag:0b
(2) Check flag column in metaTable and set to 1b if 0b.
if[select flag from metaTable where id like "A*"=resetflag;update flag:setflag from metaTable where id="A";'"Flag set"]
if[select flag from metaTable;'"Flag already set for char "A""]
(2a) The above fails with a type error. I can store the select query in a variable and then index into the variable but this doesn't return the updated value once it's been set.
chkflg:select flag from metaTable
if[chkflg.flag[0];...]
(3) Run pullRecords query, count rows of data pulled for character, run pushRecords query.
if[select flag from metaTable;pulled::pullRecords["A*"];'"Pulling data"]
amt:count pulled
if[select flag from metaTable;pushRecords[pulled];'"Pushing data"]
(4) Check amount of data pulled from ts equals amount of data pushed to newTS. If so, update the flag in metaTable from 1b to 0b. Unlock the script and start process for next character.
if[amt~count select from newTS where sym like "A*";update flag:resetflag from `metaTable where id="A";'"Lock released"]
You are looking for something similar to transaction behaviour. You could do this without using metaTable, by using global variables (or variables in a namespace) to serve as a lock.
Below is an example template to set up on the master service (the service which handles the concurrent requests). Modify it according to your setup.
Define two global variables: lock (a boolean) to serve as the lock, and lock_char to store the currently locked character.
q) lock:0b
q) lock_char:""
Define a function which first checks whether the lock can be acquired (lock value = 0b). If yes, it acquires the lock and performs the rest of the operations; otherwise it shows a message and returns.
q) transaction:{[ch] if[lock;show "Currently locked for character: ",lock_char;:0b];
/ else acquire lock and perform other operations
`lock set 1b; `lock_char set ch; s:ch,"*";
`newTS upsert t:select from ts where sym like s;
if[not count[t]=count select from newTS where sym like s;call_rollback_function[]];
/ reset lock
`lock set 0b; `lock_char set "";
:1b;
}
Call function:
q) transaction "A"
I want to create a function that takes an initial condition as an argument and uses a set of values to compute a final result. In my specific case (which has to do with geometry processing in PostGIS), it's important that each member of the set is processed against the current (which might be the initial) state one at a time, to keep the result clean. (I need to deal with sliver and gap issues, and have had a very difficult time doing so any way other than one element at a time.) The processing I need to do is already defined as a function that takes two appropriate arguments (where the first can be the current state and the second can be a value from the set).
So I want something similar to what you would expect is intended by this:
SELECT my_func('some initial condition', my_table.some_column) FROM my_table;
Aggregates seem like a natural fit for this, but I can't figure out how to get the function to accept an initial state. An iterative approach in PL/pgSQL would be fairly straightforward:
-- note: "values" is a reserved word in SQL, so the array parameter is named vals
CREATE FUNCTION my_func(initial sometype, vals sometype[])
RETURNS sometype
LANGUAGE plpgsql
AS $$
DECLARE
    current sometype := initial;
    v sometype;
BEGIN
    FOREACH v IN ARRAY vals LOOP
        current := SomeBinaryOperation(current, v);
    END LOOP;
    RETURN current;
END
$$;
But it would require rolling the values up into an array manually:
SELECT my_func('some initial condition', ARRAY_AGG(my_table.some_column)) FROM my_table;
You can create aggregates with multiple arguments, but the arguments that follow the first one are used as additional arguments to the transition function. I can see no way that one of them could be turned into an initial condition. (At least not without a remarkably hacky function that treats its third argument as an initial condition if the first argument is NULL or similar. And that's only if the aggregate argument can be a constant instead of a column.)
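For illustration, that hacky variant might look roughly like this (a sketch only, reusing the sometype and SomeBinaryOperation placeholders from above; not tested):
-- Hypothetical two-argument aggregate: the second argument is only consulted on the
-- first row, while the state is still NULL, and acts as the initial condition.
CREATE FUNCTION my_trans(state sometype, val sometype, init sometype)
RETURNS sometype
LANGUAGE sql
AS $$ SELECT SomeBinaryOperation(coalesce(state, init), val); $$;

CREATE AGGREGATE my_agg(sometype, sometype) (
    SFUNC = my_trans,
    STYPE = sometype
);

-- The initial condition has to be passed as the same constant on every row:
SELECT my_agg(my_table.some_column, 'some initial condition') FROM my_table;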
Am I best off just using the PL/pgSQL iterative approach, or is there a way to create an aggregate that accepts its initial condition as an argument? Or is there something I haven't thought of?
I'm on PostgreSQL 9.3 at the moment, but upgrading may be an option if there's new stuff that would help.
I am calling one stored procedure from another, and the procedure I am calling has an output parameter. I am then piping the output value into a local variable. That's all well and good, but the problem is that this procedure also has a select statement in it, so when I exec, the results of the procedure are being returned in the final results set.
Is there a way to simply get the value of the output parameter, and ignore everything else?
While technically yes, you shouldn't do it. The engine consumes resources to produce the result set you then ignore, and you may also introduce unnecessary contention. If you don't need the result set, you need another procedure that produces only the output you desire.
I'm sure there are some tricks for doing this - but the obvious solution that springs to mind is:
INSERT INTO #my_rubbish_temp_table_that_i_CREATEd_earlier
EXEC dbo.mySproc @a, @b, @c OUTPUT
...as per Remus' response, this is a waste of CPU, I/O, etc.
If you can add an additional parameter to your stored procedure that allows the suppression of the resultset, that'd be grand.
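For instance, a rough sketch of that idea (hypothetical procedure body and parameter name; adapt to the real procedure):
-- Inner procedure gains a switch that suppresses its result set.
CREATE PROCEDURE dbo.mySproc
    @a int,
    @b int,
    @c int OUTPUT,
    @suppressResultset bit = 0
AS
BEGIN
    SET @c = @a + @b;                         -- stand-in for the real work that sets the output value

    IF @suppressResultset = 0
        SELECT @a AS a, @b AS b, @c AS c;     -- only return rows when not suppressed
END;
GO

-- Caller: capture the output parameter without receiving any result set.
DECLARE @result int;
EXEC dbo.mySproc @a = 1, @b = 2, @c = @result OUTPUT, @suppressResultset = 1;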