Unexpected SQL results: string vs. direct SQL - postgresql

Working SQL
The following code works as expected, returning two columns of data (a row number and a valid value):
sql_amounts := '
SELECT
row_number() OVER (ORDER BY taken)::integer,
avg( amount )::double precision
FROM
x_function( '|| id || ', 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken';
FOR r, amount IN EXECUTE sql_amounts LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;
Non-Working SQL
The following code does not work as expected; the first column is a row number, the second column is NULL.
FOR r, amount IN
SELECT
row_number() OVER (ORDER BY taken)::integer,
avg( amount )::double precision
FROM
x_function( id, 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken
LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;
Question
Why does the non-working code return a NULL value for the second column when the query itself returns two valid columns? (This question is mostly academic; if there is a way to express the query without resorting to wrapping it in a text string, that would be great to know.)
Full Code
http://pastebin.com/hgV8f8gL
Software
PostgreSQL 8.4
Thank you.

The two statements aren't strictly equivalent.
Assuming id = 4, the first one gets planned/prepared on each pass, and behaves like:
prepare dyn_stmt as '... x_function( 4, 25 ) ...'; execute dyn_stmt;
The other gets planned/prepared on the first pass only, and behaves more like:
prepare stc_stmt as '... x_function( $1, 25 ) ...'; execute stc_stmt(4);
(The loop will actually make it prepare a cursor for the above, but that's beside the point here.)
A number of factors can make the two yield different results.
Search path changes made before calling the procedure will be ignored by the second call, which matters in particular if they make x_table point to something different.
Constants of all kinds and calls to immutable functions are "hard-wired" in the second call's plan.
Consider this as an illustration of these side-effects:
deallocate all;
begin;
prepare good as select now();
prepare bad as select current_timestamp;
execute good; -- yields the current timestamp
execute bad; -- yields the current timestamp
commit;
execute good; -- yields the current timestamp
execute bad; -- yields the timestamp at which it was prepared
Why the two aren't returning the same results in your case would depend on the context (you only posted part of your pl/pgsql function, so it's hard to tell), but my guess is you're running into a variation of the above kind of problem.

From Tom Lane:
I think the problem is that you're assuming "amount" will refer to a table column of the query, when actually it's a local variable of the plpgsql function. The second interpretation will take precedence unless you qualify the column reference with the table's name/alias.
Note: PG 9.0 will throw an error by default when there is an ambiguity of this type.
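To make the name clash concrete, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in (an assumption for illustration; it is not PL/pgSQL). Roughly speaking, pre-9.0 PL/pgSQL replaced any identifier matching a function variable with a query parameter, so avg( amount ) effectively became avg( $1 ) with the still-NULL loop variable bound to it. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x_table (taken TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO x_table VALUES (?, ?)",
    [("2000-01-01", 1.5), ("2000-01-01", 2.5), ("2001-01-01", 4.0)],
)

# Hypothetical stand-in for the PL/pgSQL loop variable, still NULL on entry.
amount = None

# What the static query effectively turned into: the identifier "amount"
# is replaced by a bound parameter holding the variable's value.
shadowed = conn.execute(
    "SELECT avg(?) FROM x_table m GROUP BY m.taken ORDER BY m.taken",
    (amount,),
).fetchall()

# Qualifying the column with the table alias keeps it a column reference.
qualified = conn.execute(
    "SELECT avg(m.amount) FROM x_table m GROUP BY m.taken ORDER BY m.taken"
).fetchall()

print(shadowed)   # averages of NULL
print(qualified)  # averages of the real column
```

In the original function, the corresponding fix would be to write avg( m.amount ), or to rename the loop variable so it no longer matches a column name.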

Related

In DB2 SQL, is it possible to set a variable in the SELECT statement to use multiple times..?

In DB2 SQL, is it possible to SET a variable with the contents of a returned field in the SELECT statement, to use multiple times for calculated fields and criteria further along in the same SELECT statement?
The purpose is to shrink and streamline the code, by doing a calculation once at the beginning and using it multiple times later on...including the HAVING, WHERE, and ORDER BY.
To be honest, I'm not sure this is possible in any version of SQL, much less DB2.
This is on an IBM iSeries 8202 with DB2 SQL v6, which unfortunately is not a candidate for upgrade at this time. This is a very old & messy database, which I have no control over. I must regularly include "cleanup functions" in my SQL.
To clarify the question, note the following pseudocode. Actual working code follows further below.
DECLARE smnum INTEGER --Not sure if this is correct.
SELECT
-- This is where I'm not sure what to do.
SET CAST((CASE WHEN %smnum%='' THEN '0' ELSE %smnum% END) AS INTEGER) INTO smnum,
%smnum% AS sm,
invdat,
invno,
daqty,
dapric,
dacost,
(dapric-dacost)*daqty AS profit
FROM
saleshistory
WHERE
%smNum% = 30
ORDER BY
%smnum%
Below is my actual working SQL. When adjusted for 2017 or 2016, it can return >10K rows, depending on the salesperson. The complete table has >22M rows.
That buttload of CAST((CASE... expressions is what I wish to replace with a variable. This is not the only example of this. If I can make it work, I have many other queries that could benefit from the technique.
SELECT
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER) AS DASM,
DAIDAT,
DAINV# AS DAINV,
DALIN# AS DALIN,
CAST(TRIM(DAITEM) AS INTEGER) AS DAITEM,
TRIM(DABSW) AS DABSW,
TRIM(DAPCLS) AS DAPCLS,
DAQTY,
DAPRIC,
DAICOS,
DADPAL,
(DAPRIC-DAICOS+DADPAL)*DAQTY AS PROFIT
FROM
VIPDTAB.DAILYV
WHERE
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER)=30 AND
TRIM(DABSW)='B' AND
DAIDAT BETWEEN (YEAR(CURDATE())*10000) AND (((YEAR(CURDATE())+1)*10000)-1) AND
CAST(TRIM(DACOMP) AS INTEGER)=1
ORDER BY
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER),
DAIDAT,
DAINV#,
DALIN#
Just use a subquery or CTE. I can't figure out the actual logic you want, but the structure looks like this:
select . . .
from (select d.*,
(CASE . . . END) as calc_field
from VIPDTAB.DAILYV d
) d
No variable declaration is needed.
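As a rough illustration of that structure, the sketch below runs a CTE against an in-memory SQLite database from Python (SQLite stands in for DB2 here, which is an assumption; the table, the column name dasm_raw, and the sample rows are hypothetical). The cleanup expression is written once and then reused by name in SELECT, WHERE, and ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified version of the sales table.
conn.execute(
    "CREATE TABLE dailyv (dasm_raw TEXT, daqty INTEGER, dapric REAL, daicos REAL)"
)
conn.executemany(
    "INSERT INTO dailyv VALUES (?, ?, ?, ?)",
    [
        ("30 ", 2, 10.0, 7.0),
        ("   ", 1, 5.0, 4.0),   # blank salesperson, cleaned up to 0
        (" 30", 3, 8.0, 7.0),
        ("12", 1, 9.0, 3.0),
    ],
)

query = """
WITH d AS (
    SELECT daqty, dapric, daicos,
           -- the cleanup expression appears exactly once
           CAST(CASE WHEN TRIM(dasm_raw) = '' THEN '0'
                     ELSE TRIM(dasm_raw) END AS INTEGER) AS sm
    FROM dailyv
)
SELECT sm, (dapric - daicos) * daqty AS profit
FROM d
WHERE sm = 30
ORDER BY sm, profit
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The same shape works with a derived table in the FROM clause instead of WITH if the database version does not support CTEs.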
Here is what your SQL would look like with the sub-query that Gordon suggested:
SELECT
DASM,
DAIDAT,
DAINV# AS DAINV,
DALIN# AS DALIN,
CAST(DAITEM AS INTEGER) AS DAITEM,
TRIM(DABSW) AS DABSW,
TRIM(DAPCLS) AS DAPCLS,
DAQTY,
DAPRIC,
DAICOS,
DADPAL,
(DAPRIC-DAICOS+DADPAL)*DAQTY AS PROFIT
FROM
(SELECT
D.*,
CAST((CASE WHEN D.DASM#='' THEN '0' ELSE D.DASM# END) AS INTEGER) AS DASM
FROM VIPDTAB.DAILYV D
) D
WHERE
DASM=30 AND
TRIM(DABSW)='B' AND
DAIDAT BETWEEN (YEAR(CURDATE())*10000) AND (((YEAR(CURDATE())+1)*10000)-1) AND
CAST(DACOMP AS INTEGER)=1
ORDER BY
DASM,
DAIDAT,
DAINV#,
DALIN#
Notice that I removed a lot of the trim() functions, and you could likely remove the rest. The way IBM resolves the Varchar vs. Char comparison thing is by ignoring trailing blanks. So trim(anything) = '' is the same as anything = ''. And since cast(' 123 ' as integer) = 123, I have removed trims from within the cast functions as well. In addition trim(dabsw) = 'B' is the same as dabsw = 'B' as long as the 'B' is the first character in dabsw. So you could even remove that trim if all you are concerned with is trailing blanks.
Here are some additional notes based on comments. The above paragraph is not talking about auto-trim. Fixed length fields will always return as fixed length fields, the trailing blanks will remain. But in comparisons and expressions where trailing blanks are unimportant, or even a hindrance, they are ignored. In expressions where trailing blanks are important, like concatenation, the trailing blanks are not ignored. Another thing, trim() removes both leading and trailing blanks. If you are using trim() to read a fixed length character field into a Varchar, then rtrim() is likely the better choice as it only removes the trailing blanks.
Also, I didn't go through your fields to make sure I got everything you need, I just used * in the sub-query. For performance, it would be best to only return the fields you need. So if you replace D.* with an actual field list, you can remove the correlation name in the from clause of the sub-query. But, the sub-query itself still needs a correlation clause.
My verification was done using IBM i v7.1.
You can encapsulate the CASE expression in a view. I even included the fancy profit calculation so you can order by profit. Now the biggest issue you have is the CCSID on the view for calculated columns, but that's another question.
create or replace view VIPDTAB.DAILYVQ as
SELECT
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER) AS DASM,
DAIDAT,
DAINV# AS DAINV,
DALIN# AS DALIN,
CAST(TRIM(DAITEM) AS INTEGER) AS DAITEM,
TRIM(DABSW) AS DABSW,
TRIM(DAPCLS) AS DAPCLS,
DAQTY,
DAPRIC,
DAICOS,
DADPAL,
(DAPRIC-DAICOS+DADPAL)*DAQTY AS PROFIT
FROM
VIPDTAB.DAILYV
Now you can run:
select dasm, count(*) from vipdtab.dailyvq where dasm = 0 group by dasm order by dasm
or
select * from vipdtab.dailyvq order by profit desc

postgres `order by` argument type

What is the argument type for the order by clause in Postgresql?
I came across a very strange behaviour (using Postgresql 9.5). Namely, the query
select * from unnest(array[1,4,3,2]) as x order by 1;
produces 1,2,3,4 as expected. However the query
select * from unnest(array[1,4,3,2]) as x order by 1::int;
produces 1,4,3,2, which seems strange. Similarly, whenever I replace 1::int with any function call (e.g. greatest(0,1)) or even a case expression, the results are unordered (contrary to what I would expect).
So which type should an argument of order by have, and how do I get the expected behaviour?
This is expected (and documented) behaviour:
A sort_expression can also be the column label or number of an output column
So the expression:
order by 1
sorts by the first column of the result set (as defined by the SQL standard)
However the expression:
order by 1::int
sorts by the constant value 1, it's essentially the same as:
order by 'foo'
By using a constant value for the order by all rows have the same sort value and thus aren't really sorted.
To sort by an expression, just use that:
order by
case
when some_column = 'foo' then 1
when some_column = 'bar' then 2
else 3
end
The above sorts the result based on the result of the case expression.
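The distinction can be reproduced in a small sketch using Python's sqlite3 module (an assumption for illustration; SQLite follows the same rule that a bare integer in ORDER BY is a column position, while anything else is an ordinary expression). The table t and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (4,), (3,), (2,)])

# A bare integer literal means "sort by the first output column".
by_position = [r[0] for r in conn.execute("SELECT x FROM t ORDER BY 1")]

# A cast is an expression: every row gets the same sort key 1,
# so the resulting order is whatever the engine happens to produce.
by_constant = [
    r[0] for r in conn.execute("SELECT x FROM t ORDER BY CAST(1 AS INTEGER)")
]

# A CASE expression that depends on the row gives a real ordering.
by_case = [
    r[0]
    for r in conn.execute(
        "SELECT x FROM t "
        "ORDER BY CASE WHEN x = 3 THEN 1 WHEN x = 4 THEN 2 ELSE 3 END, x"
    )
]

print(by_position)
print(by_constant)
print(by_case)
```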
Actually I have a function with an integer argument which indicates the column to be used in the order by clause.
When all columns are of the same type, this can work:
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1
WHEN 2 THEN column2
.....
WHEN 1235 THEN column1235
END
If columns are of different types, you can try:
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1::varchar
WHEN 2 THEN column2::varchar
.....
WHEN 1235 THEN column1235::varchar
END
But these "workarounds" are horrible. You need some other approach than the function returning a column number.
Maybe dynamic SQL?
I would say that dynamic SQL (thanks @kordirko and the others for the hints) is the best solution to the problem I originally had in mind:
create temp table my_data (
id serial,
val text
);
insert into my_data(id, val)
values (default, 'a'), (default, 'c'), (default, 'd'), (default, 'b');
create function fetch_my_data(col text)
returns setof my_data as
$f$
begin
return query execute $$
select * from my_data
order by $$|| quote_ident(col);
end
$f$ language plpgsql;
select * from fetch_my_data('val'); -- order by val
select * from fetch_my_data('id'); -- order by id
In the beginning I thought this could be achieved using a case expression as the argument of the order by clause (the sort_expression). And here comes the tricky part that confused me: when sort_expression is an identifier of some kind (the name or the number of a column), the corresponding column is used to order the results. But when sort_expression is any other value, the results are ordered by that value itself (computed for each row). This is @a_horse_with_no_name's answer rephrased.
So when I queried ... order by 1::int, in a way I have assigned value 1 to each row and then tried to sort an array of ones, which clearly is useless.
There are some workarounds without dynamic queries, but they require writing more code and do not seem to have any significant advantages.
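When the same idea is driven from client code rather than PL/pgSQL, a comparable sketch is to check the requested column against an allow-list before splicing it into the statement. The version below uses Python's sqlite3 module and a hypothetical ALLOWED_SORT_COLUMNS set; it plays the role that quote_ident plays above, but it is a different safeguard, not an equivalent of it.

```python
import sqlite3

# Hypothetical allow-list: only these names may be spliced into ORDER BY.
ALLOWED_SORT_COLUMNS = {"id", "val"}


def fetch_my_data(conn, col):
    """Return all rows of my_data sorted by the named column."""
    if col not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"cannot sort by {col!r}")
    # The identifier cannot be passed as a bound parameter, so it is
    # inserted into the SQL text only after the check above.
    return conn.execute(
        f'SELECT id, val FROM my_data ORDER BY "{col}"'
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO my_data (val) VALUES (?)", [("a",), ("c",), ("d",), ("b",)]
)

by_val = fetch_my_data(conn, "val")
by_id = fetch_my_data(conn, "id")
print(by_val)
print(by_id)
```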

Postgresql function executed much longer than the same query

I'm using PostgreSQL 9.2.9 and have the following problem.
There is a function:
CREATE OR REPLACE FUNCTION report_children_without_place(text, date, date, integer)
RETURNS TABLE (department_name character varying, kindergarten_name character varying, a1 bigint) AS $BODY$
BEGIN
RETURN QUERY WITH rh AS (
SELECT (array_agg(status ORDER BY date DESC))[1] AS status, request
FROM requeststatushistory
WHERE date <= $3
GROUP BY request
)
SELECT
w.name,
kgn.name,
COUNT(*)
FROM kindergarten_request_table_materialized kr
JOIN rh ON rh.request = kr.id
JOIN requeststatuses s ON s.id = rh.status AND s.sysname IN ('confirmed', 'need_meet_completion', 'kindergarten_need_meet')
JOIN workareas kgn ON kr.kindergarten = kgn.id AND kgn.tree <@ CAST($1 AS LTREE) AND kgn.active
JOIN organizationforms of ON of.id = kgn.organizationform AND of.sysname IN ('state','municipal','departmental')
JOIN workareas w ON w.tree @> kgn.tree AND w.active
JOIN workareatypes mt ON mt.id = w.type AND mt.sysname = 'management'
WHERE kr.requestyear = $4
GROUP BY kgn.name, w.name
ORDER BY w.name, kgn.name;
END
$BODY$ LANGUAGE PLPGSQL STABLE;
EXPLAIN ANALYZE SELECT * FROM report_children_without_place('83.86443.86445', '14-04-2015', '14-04-2015', 2014);
Total runtime: 242805.085 ms.
But the query from the function's body executes much faster:
EXPLAIN ANALYZE WITH rh AS (
SELECT (array_agg(status ORDER BY date DESC))[1] AS status, request
FROM requeststatushistory
WHERE date <= '14-04-2015'
GROUP BY request
)
SELECT
w.name,
kgn.name,
COUNT(*)
FROM kindergarten_request_table_materialized kr
JOIN rh ON rh.request = kr.id
JOIN requeststatuses s ON s.id = rh.status AND s.sysname IN ('confirmed', 'need_meet_completion', 'kindergarten_need_meet')
JOIN workareas kgn ON kr.kindergarten = kgn.id AND kgn.tree <@ CAST('83.86443.86445' AS LTREE) AND kgn.active
JOIN organizationforms of ON of.id = kgn.organizationform AND of.sysname IN ('state','municipal','departmental')
JOIN workareas w ON w.tree @> kgn.tree AND w.active
JOIN workareatypes mt ON mt.id = w.type AND mt.sysname = 'management'
WHERE kr.requestyear = 2014
GROUP BY kgn.name, w.name
ORDER BY w.name, kgn.name;
Total runtime: 2156.740 ms.
Why does the function take so much longer than the same query? Thanks.
Your query runs faster because the "variables" are not actually variable: they are static values (i.e. strings in quotes). This means the execution planner can leverage indexes. Within your stored procedure, your variables are actual variables, and the planner cannot make assumptions about indexes. For example, you might have a partial index on requeststatushistory where "date" is <= '2012-12-31'. The index can only be used if the value of $3 is known. Since it might hold a date from 2015, the partial index would be of no use. In fact, it would be detrimental.
I frequently construct a string within my functions where I concatenate my variables as literals and then execute the resulting query using something like the following:
DECLARE
my_dynamic_sql TEXT;
BEGIN
my_dynamic_sql := $$
SELECT *
FROM my_table
WHERE $$ || quote_literal($3) || $$::TIMESTAMPTZ BETWEEN start_time
AND end_time;$$;
/* You can only see this if client_min_messages = DEBUG */
RAISE DEBUG '%', my_dynamic_sql;
RETURN QUERY EXECUTE my_dynamic_sql;
END;
The dynamic SQL is VERY useful because you can actually get an explain of the query: when client_min_messages is set to DEBUG, you can scrape the query from the screen, paste it back in after EXPLAIN or EXPLAIN ANALYZE, and see what the execution planner is doing. This also allows you to construct very different queries as needed to optimize for variables (i.e. exclude unnecessary tables if warranted) and maintain a common API for your clients.
You may be tempted to avoid the dynamic SQL for fear of performance issues (I was at first) but you will be amazed at how LITTLE time is spent in planning compared to some of the cost of a couple of table scans on your seven-table join!
Good luck!
Follow-up: You might experiment with Common Table Expressions (CTEs) for performance as well. If you have a table that has a low signal-to-noise ratio (has many, many more records in it than you actually want to return) then a CTE can be very helpful. PostgreSQL executes CTEs early in the query, and materializes the resulting rows in memory. This allows you to use the same result set multiple times and in multiple places in your query. The benefit can really be surprising if you design it correctly.
sql_txt := $$
WITH my_cte as (
select fk1 as moar_data1
, field1
, field2 /*do not need all other fields taking up RAM!*/
from my_table
where field3 between $$ || quote_literal(input_start_ts) || $$::timestamptz
and $$ || quote_literal(input_end_ts) || $$::timestamptz
),
keys_cte as ( select key_field
from big_look_up_table
where look_up_name = ANY($$ ||
QUOTE_LITERAL(input_array_of_names) || $$::VARCHAR[])
)
SELECT field1, field2, moar_data1, moar_data2
FROM moar_data_table
INNER JOIN my_cte
USING (moar_data1)
WHERE moar_data_table.moar_data_key in (select key_field from keys_cte) $$;
An execution plan is likely to show that it chooses to use an index on moar_data_table.moar_data_key. This would appear to go against what I said in my prior answer, except for the fact that the keys_cte results are materialized (and therefore cannot be changed by another transaction in a race condition): you have your own little copy of the data for use in this query.
Oh - and CTEs can use other CTEs that are declared earlier in the same query. I have used this "trick" to replace sub-queries in very complex joins and seen great improvements.
Happy Hacking!

sp_executesql vs user defined scalar function

In the table below I am storing some conditions like this:
Then, generally, in second table, I am having the following records:
and what I need is to compare these values using the right condition and store the result ( let's say '0' for false, and '1' for true in additional column).
I am going to do this in a stored procedure, and I will be comparing anywhere from several to hundreds of records.
One possible solution is to use sp_executesql for each row, building dynamic statements; the other is to create my own scalar function and call it for each row using cross apply.
Could anyone tell which is the more efficient way?
Note: I know that the best way to answer this is to build the two solutions and test, but I am hoping there is an answer based on other factors such as caching and SQL internal optimizations, which would save me a lot of time because this is only part of a bigger problem.
I don't see the need for sp_executesql in this case. You can obtain the result for all records at once in a single statement:
select Result = case
when ct.Abbreviation='=' and t.ValueOne=t.ValueTwo then 1
when ct.Abbreviation='>' and t.ValueOne>t.ValueTwo then 1
when ct.Abbreviation='>=' and t.ValueOne>=t.ValueTwo then 1
when ct.Abbreviation='<=' and t.ValueOne<=t.ValueTwo then 1
when ct.Abbreviation='<>' and t.ValueOne<>t.ValueTwo then 1
when ct.Abbreviation='<' and t.ValueOne<t.ValueTwo then 1
else 0 end
from YourTable t
join ConditionType ct on ct.ID = t.ConditionTypeID
and update additional column with something like:
;with cte as (
select t.AdditionalColumn, Result = case
when ct.Abbreviation='=' and t.ValueOne=t.ValueTwo then 1
when ct.Abbreviation='>' and t.ValueOne>t.ValueTwo then 1
when ct.Abbreviation='>=' and t.ValueOne>=t.ValueTwo then 1
when ct.Abbreviation='<=' and t.ValueOne<=t.ValueTwo then 1
when ct.Abbreviation='<>' and t.ValueOne<>t.ValueTwo then 1
when ct.Abbreviation='<' and t.ValueOne<t.ValueTwo then 1
else 0 end
from YourTable t
join ConditionType ct on ct.ID = t.ConditionTypeID
)
update cte
set AdditionalColumn = Result
If the above logic is supposed to be applied in many places, not just over one table, then yes, you may think about a function. I would rather use an inline table-valued function (not a scalar one), because user-defined scalar functions impose call-and-return overhead, and the more rows are processed, the more time is wasted.
create function ftComparison
(
@v1 float,
@v2 float,
@cType int
)
returns table
as return
select
Result = case
when ct.Abbreviation='=' and @v1=@v2 then 1
when ct.Abbreviation='>' and @v1>@v2 then 1
when ct.Abbreviation='>=' and @v1>=@v2 then 1
when ct.Abbreviation='<=' and @v1<=@v2 then 1
when ct.Abbreviation='<>' and @v1<>@v2 then 1
when ct.Abbreviation='<' and @v1<@v2 then 1
else 0
end
from ConditionType ct
where ct.ID = @cType
which can be applied then as:
select f.Result
from YourTable t
cross apply ftComparison(ValueOne, ValueTwo, t.ConditionTypeID) f
or
select f.Result
from YourAnotherTable t
cross apply ftComparison(SomeValueColumn, SomeOtherValueColumn, @someConditionType) f
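The set-based CASE comparison from this answer can be tried end to end with the sketch below, which uses Python's sqlite3 module in place of SQL Server (an assumption for illustration). The table names follow the answer; the sample rows and the ID values given to each operator are hypothetical, since the question's original tables are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ConditionType (ID INTEGER PRIMARY KEY, Abbreviation TEXT)")
conn.executemany(
    "INSERT INTO ConditionType VALUES (?, ?)",
    [(1, "="), (2, ">"), (3, ">="), (4, "<="), (5, "<>"), (6, "<")],
)
conn.execute(
    "CREATE TABLE YourTable (ValueOne REAL, ValueTwo REAL, ConditionTypeID INTEGER)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (1, 2, 3), (1, 2, 6), (3, 3, 5)],
)

# One pass over all rows: the stored operator selects which comparison applies.
query = """
SELECT CASE
         WHEN ct.Abbreviation = '='  AND t.ValueOne =  t.ValueTwo THEN 1
         WHEN ct.Abbreviation = '>'  AND t.ValueOne >  t.ValueTwo THEN 1
         WHEN ct.Abbreviation = '>=' AND t.ValueOne >= t.ValueTwo THEN 1
         WHEN ct.Abbreviation = '<=' AND t.ValueOne <= t.ValueTwo THEN 1
         WHEN ct.Abbreviation = '<>' AND t.ValueOne <> t.ValueTwo THEN 1
         WHEN ct.Abbreviation = '<'  AND t.ValueOne <  t.ValueTwo THEN 1
         ELSE 0
       END AS Result
FROM YourTable t
JOIN ConditionType ct ON ct.ID = t.ConditionTypeID
ORDER BY t.rowid
"""
results = [r[0] for r in conn.execute(query)]
print(results)
```

No per-row dynamic SQL is involved, so the number of statements executed stays constant as the row count grows.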

Recursive replace from a table of characters

In short, I am looking for a single recursive query that can perform multiple replaces over one string. I have a notion it can be done, but am failing to wrap my head around it.
Granted, I'd prefer the biz-layer of the application, or even the CLR, to do the replacing, but these are not options in this case.
More specifically, I want to replace the below mess - which is C&P in 8 different stored procedures - with a TVF.
SET @temp = REPLACE(RTRIM(@target), '~', '-')
SET @temp = REPLACE(@temp, '''', '-')
SET @temp = REPLACE(@temp, '!', '-')
SET @temp = REPLACE(@temp, '#', '-')
SET @temp = REPLACE(@temp, '#', '-')
-- 23 additional lines redacted
SET @target = @temp
Here is where I've started:
-- I have a split string TVF called tvf_SplitString that takes a string
-- and a splitter, and returns a table with one row for each element.
-- EDIT: tvf_SplitString returns a two-column table: pos, element, of which
-- pos is simply the row_number of the element.
SELECT REPLACE('A~B!C#D#C!B~A', MM.ELEMENT, '-') TGT
FROM dbo.tvf_SplitString('~-''-!-#-#', '-') MM
Notice I've joined all the offending characters into a single string separated by '-' (knowing that '-' will never be one of the offending characters), which is then split. The result from this query looks like:
TGT
------------
A-B!C#D#C!B-A
A~B!C#D#C!B~A
A~B-C#D#C-B~A
A~B!C-D-C!B~A
A~B!C#D#C!B~A
So, the replace clearly works, but now I want it to be recursive so I can pull the top 1 and eventually come out with:
TGT
------------
A-B-C-D-C-B-A
Any ideas on how to accomplish this with one query?
EDIT: Well, actual recursion isn't necessary if there's another way. I'm pondering the use of a table of numbers here, too.
You can use this in a scalar function. I use it to remove all control characters from some external input.
SELECT @target = REPLACE(@target, invalidChar, '-')
FROM (VALUES ('~'),(''''),('!'),('#'),('#')) AS T(invalidChar)
I figured it out. I failed to mention that the tvf_SplitString function returns a row number as "pos" (although a subquery assigning row_number could also have worked). With that fact, I could control cross join between the recursive call and the split.
-- the cast to varchar(max) matches the output of the TVF, otherwise error.
-- The iteration counter is joined to the row number value from the split string
-- function to ensure each iteration only replaces on one character.
WITH XX AS (SELECT CAST('A~B!C#D#C!B~A' AS VARCHAR(MAX)) TGT, 1 RN
UNION ALL
SELECT REPLACE(XX.TGT, MM.ELEMENT, '-'), RN + 1 RN
FROM XX, dbo.tvf_SplitString('~-''-!-#-#', '-') MM
WHERE XX.RN = MM.pos)
SELECT TOP 1 XX.TGT
FROM XX
ORDER BY RN DESC
Still, I'm open to other suggestions.
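For comparison, the same recursive idea can be sketched against SQLite from Python (an assumption for illustration; the syntax differs slightly from T-SQL). A hypothetical bad_chars table stands in for the output of tvf_SplitString, with pos numbering the characters, and each recursion step replaces exactly one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the split-string TVF: one row per offending character.
conn.execute("CREATE TABLE bad_chars (pos INTEGER PRIMARY KEY, element TEXT)")
conn.executemany(
    "INSERT INTO bad_chars VALUES (?, ?)",
    [(1, "~"), (2, "'"), (3, "!"), (4, "#")],
)

query = """
WITH RECURSIVE xx(tgt, rn) AS (
    SELECT ?, 1
    UNION ALL
    -- step rn replaces the character whose pos equals rn
    SELECT REPLACE(xx.tgt, mm.element, '-'), xx.rn + 1
    FROM xx
    JOIN bad_chars mm ON mm.pos = xx.rn
)
SELECT tgt FROM xx ORDER BY rn DESC LIMIT 1
"""
cleaned = conn.execute(query, ("A~B!C#D#C!B~A",)).fetchone()[0]
print(cleaned)
```

The recursion stops on its own once rn exceeds the highest pos, so the row with the largest rn has had every listed character replaced.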