PostgreSQL Recursive Query Performance - postgresql

I am a noob when it comes to PostgreSQL, but I was able to get it to produce what I needed: take a hierarchy that is up to 30 levels deep and create a flattened, 'jagged' list view with the topmost level and every intervening level down to each end node. The recursive function just pushes every parent found into an array and then returns the final flattened list for each node (using LIMIT 1).
The following bit of SQL generates the table I need. My question is whether the function that returns the array of values I use to populate the row columns is called once per row, or once for each of the 30 columns in each row.
Can someone guide me on how I would determine that? And if it is blatantly obvious that my SQL is inefficient, what might be a better way of putting the statements together?
Thanks in advance for having a look.
DROP FUNCTION IF EXISTS fnctreepath(nodeid NUMERIC(10,0));
CREATE FUNCTION fnctreepath(nodeid NUMERIC(10,0))
RETURNS TABLE (endnode NUMERIC, depth INTEGER, path NUMERIC[]) AS
$$
WITH RECURSIVE ttbltreepath(endnode, nodeid, parentid, depth, path) AS (
SELECT src.nodeid AS endnode, src.nodeid, src.parentid, 1::INT AS depth,
ARRAY[src.nodeid::NUMERIC(10,0)]::NUMERIC(10,0)[] AS path
FROM tree AS src WHERE nodeid = $1
UNION
SELECT ttbl.endnode, src.nodeid, src.parentid, ttbl.depth + 1 AS depth,
ARRAY_PREPEND(src.nodeid::NUMERIC(10,0), ttbl.path::NUMERIC(10,0)[])::NUMERIC(10,0)[] AS path
FROM tree AS src, ttbltreepath AS ttbl WHERE ttbl.parentid = src.nodeid
)
SELECT endnode, depth, path FROM ttbltreepath GROUP BY endnode, depth, path ORDER BY endnode, depth DESC LIMIT 1;
$$ LANGUAGE SQL;
DROP TABLE IF EXISTS treepath;
SELECT parentid, nodeid, name,
(fnctreepath(tree.nodeid)).depth,
(fnctreepath(tree.nodeid)).path[1] as nodeid01,
(fnctreepath(tree.nodeid)).path[2] as nodeid02,
(fnctreepath(tree.nodeid)).path[3] as nodeid03,
(fnctreepath(tree.nodeid)).path[4] as nodeid04,
(fnctreepath(tree.nodeid)).path[5] as nodeid05,
(fnctreepath(tree.nodeid)).path[6] as nodeid06,
(fnctreepath(tree.nodeid)).path[7] as nodeid07,
(fnctreepath(tree.nodeid)).path[8] as nodeid08,
(fnctreepath(tree.nodeid)).path[9] as nodeid09,
(fnctreepath(tree.nodeid)).path[10] as nodeid10,
(fnctreepath(tree.nodeid)).path[11] as nodeid11,
(fnctreepath(tree.nodeid)).path[12] as nodeid12,
(fnctreepath(tree.nodeid)).path[13] as nodeid13,
(fnctreepath(tree.nodeid)).path[14] as nodeid14,
(fnctreepath(tree.nodeid)).path[15] as nodeid15,
(fnctreepath(tree.nodeid)).path[16] as nodeid16,
(fnctreepath(tree.nodeid)).path[17] as nodeid17,
(fnctreepath(tree.nodeid)).path[18] as nodeid18,
(fnctreepath(tree.nodeid)).path[19] as nodeid19,
(fnctreepath(tree.nodeid)).path[20] as nodeid20,
(fnctreepath(tree.nodeid)).path[21] as nodeid21,
(fnctreepath(tree.nodeid)).path[22] as nodeid22,
(fnctreepath(tree.nodeid)).path[23] as nodeid23,
(fnctreepath(tree.nodeid)).path[24] as nodeid24,
(fnctreepath(tree.nodeid)).path[25] as nodeid25,
(fnctreepath(tree.nodeid)).path[26] as nodeid26,
(fnctreepath(tree.nodeid)).path[27] as nodeid27,
(fnctreepath(tree.nodeid)).path[28] as nodeid28,
(fnctreepath(tree.nodeid)).path[29] as nodeid29,
(fnctreepath(tree.nodeid)).path[30] as nodeid30
INTO treepath
FROM tree;

You should check the volatility attribute of your function.
By default a function is VOLATILE, meaning any call to the function may alter the database, so the query optimiser cannot reuse the result when you use the function several times in the same statement.
Your function is not IMMUTABLE (2+2=4 is immutable), but you should declare it with the STABLE volatility keyword. That way the optimiser can treat the repeated calls to fnctreepath(tree.nodeid) in the same statement as a stable result and share it (run it only once).
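For illustration, a minimal sketch (untested, assuming the table and function from the question): the only change to the function itself is the volatility keyword on its last line, i.e. LANGUAGE SQL STABLE. Independently of that, if your PostgreSQL version supports LATERAL (9.3+), you can make the single call per row explicit and index into the returned array, so the function body runs once per row regardless of volatility:
SELECT tree.parentid, tree.nodeid, tree.name,
tp.depth,
tp.path[1] AS nodeid01,
tp.path[2] AS nodeid02,
-- nodeid03 through nodeid29 follow the same pattern
tp.path[30] AS nodeid30
INTO treepath
FROM tree
CROSS JOIN LATERAL fnctreepath(tree.nodeid) AS tp;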

Related

How to convert a jsonb array and use stats moments

how are you?
I needed to store an array of numbers as JSONB in PostgreSQL.
Now I'm trying to calculate stats moments from this JSON, and I'm facing some issues.
Sample of my data:
I already was able to convert a JSON into a float array.
I used a function to convert jsonb to float array.
CREATE OR REPLACE FUNCTION jsonb_array_castdouble(jsonb) RETURNS float[] AS $f$
SELECT array_agg(x)::float[] || ARRAY[]::float[] FROM jsonb_array_elements_text($1) t(x);
$f$ LANGUAGE sql IMMUTABLE;
Using this SQL:
with data as (
select
s.id as id,
jsonb_array_castdouble(s.snx_normalized) as serie
FROM
spectra s
)
select * from data;
I found a function that can do these calculations and I need to pass an array for that: https://github.com/ellisonch/PostgreSQL-Stats-Aggregate/
But this function requires the array in another form: unnested.
I already tried to use unnest, but it will get only one value, not the entire array :(.
My goal is:
Be able to apply stats moment (kurtosis, skewness) for each row.
like:
index  skewness
1      21.2131
2      1.123
Bonus: There is a way to not use this 'with data', use the transformation in the select statement?
snx_wavelengths is JSON, right? Also, you provided it as a picture and not as text :( The data looks like (id, snx_wavelengths) - I believe you meant id when you said index (not a good idea to use a keyword; it would require double-quoted identifiers):
1,[1,2,3,4]
2,[373,232,435,84]
If that is right:
select id, (stats_agg(v::float)).skewness
from myMeasures,
lateral json_array_elements_text(snx_wavelengths) v
group by id;
DBFiddle demo
BTW, you don't need "with data" in the original sample if you don't want to use it; you could replace it with a subquery, i.e.:
select (stats_agg(n)).* from (select unnest(array[16,22,33,24,15])) data(n)
union all
select (stats_agg(n)).* from (select unnest(array[416,622,833,224,215])) data(n);
EDIT: And if you needed other stats too:
select id, "count","min","max","mean","variance","skewness","kurtosis"
from myMeasures,
lateral (select (stats_agg(v::float)).* from json_array_elements_text(snx_wavelengths) v) foo
group by id,"count","min","max","mean","variance","skewness","kurtosis";
DBFiddle demo
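If you prefer to keep your jsonb_array_castdouble() helper, a roughly equivalent sketch (untested, reusing the spectra table and snx_normalized column from your sample query, and the stats_agg aggregate from the linked repository) unnests the cast array per row and aggregates:
select s.id,
(stats_agg(x)).skewness,
(stats_agg(x)).kurtosis
from spectra s,
lateral unnest(jsonb_array_castdouble(s.snx_normalized)) as x
group by s.id;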

How to create a user-defined function in Postgresql?

I have the following query in Postgres. I want a function where the user can define the values of record_year and record_month in the function call, dynamically, without having to specify them in the query statement. That way the same query can be reused for different user-supplied values of record_year and record_month (in this case). Can anybody help me out?
SELECT x.sid, x.record_year, x.record_month, y.addr
FROM x
FULL OUTER JOIN y
ON x.sid = y.sid
WHERE x.record_time='23:50'
and x.record_year='2020'
and x.record_month='1'
group by x.station_id, x.record_year, x.record_month, y.addr;
The function call could be something like, select function_name(record_year, record_month);
As you are returning a result, the function should be defined as returns table(). PL/pgSQL is not required if all you want to do is return a result; a SQL function will be enough.
create function get_data(p_time time, p_year int, p_month int)
returns table (sid int, record_year int, record_month int, addr text)
as
$$
SELECT x.sid, x.record_year, x.record_month, y.addr
FROM x
FULL OUTER JOIN y ON x.sid = y.sid
WHERE x.record_time = p_time
and x.record_year = p_year
and x.record_month = p_month
group by x.station_id, x.record_year, x.record_month, y.addr
$$
language sql
stable;
You have to adjust the data types of the returned columns - I have only guessed them based on name.
Note that your full outer join is really a left join because of the conditions on table x.
As it is a set returning function, use it like a table in the FROM clause:
select *
from get_data(time '23:50', 2020, 1);

postgres `order by` argument type

What is the argument type for the order by clause in Postgresql?
I came across a very strange behaviour (using Postgresql 9.5). Namely, the query
select * from unnest(array[1,4,3,2]) as x order by 1;
produces 1,2,3,4 as expected. However the query
select * from unnest(array[1,4,3,2]) as x order by 1::int;
produces 1,4,3,2, which seems strange. Similarly, whenever I replace 1::int with some function (e.g. greatest(0,1)) or even a case expression, the results are unordered (contrary to what I would expect).
So which type should an argument of order by have, and how do I get the expected behaviour?
This is expected (and documented) behaviour:
A sort_expression can also be the column label or number of an output column
So the expression:
order by 1
sorts by the first column of the result set (as defined by the SQL standard)
However the expression:
order by 1::int
sorts by the constant value 1; it's essentially the same as:
order by 'foo'
By using a constant value for the order by, all rows have the same sort value and thus aren't really sorted.
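To make that concrete with the array from the question (a quick sketch): reference the output column itself, or an expression on it, instead of a constant:
select * from unnest(array[1,4,3,2]) as x order by x;      -- 1,2,3,4
select * from unnest(array[1,4,3,2]) as x order by x::int; -- also 1,2,3,4, since the expression references the column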
To sort by an expression, just use that:
order by
case
when some_column = 'foo' then 1
when some_column = 'bar' then 2
else 3
end
The above sorts the result based on the result of the case expression.
Actually I have a function with an integer argument which indicates the column to be used in the order by clause.
In the case when all columns are of the same type, this can work:
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1
WHEN 2 THEN column2
.....
WHEN 1235 THEN column1235
END
If columns are of different types, you can try:
SELECT ....
ORDER BY
CASE function_to_get_a_column_number()
WHEN 1 THEN column1::varchar
WHEN 2 THEN column2::varchar
.....
WHEN 1235 THEN column1235::varchar
END
But these "workarounds" are horrible. You need some other approach than the function returning a column number.
Maybe dynamic SQL?
I would say that dynamic SQL (thanks #kordirko and the others for the hints) is the best solution to the problem I originally had in mind:
create temp table my_data (
id serial,
val text
);
insert into my_data(id, val)
values (default, 'a'), (default, 'c'), (default, 'd'), (default, 'b');
create function fetch_my_data(col text)
returns setof my_data as
$f$
begin
return query execute $$
select * from my_data
order by $$|| quote_ident(col);
end
$f$ language plpgsql;
select * from fetch_my_data('val'); -- order by val
select * from fetch_my_data('id'); -- order by id
In the beginning I thought this could be achieved using a case expression in the argument of the order by clause - the sort_expression. And here comes the tricky part which confused me: when the sort_expression is a kind of identifier (the name or number of an output column), the corresponding column is used when ordering the results. But when the sort_expression is some other value, the results are actually ordered by that value itself (computed for each row). This is #a_horse_with_no_name's answer rephrased.
So when I queried ... order by 1::int, I essentially assigned the value 1 to each row and then tried to sort an array of ones, which clearly is useless.
There are some workarounds without dynamic queries, but they require writing more code and do not seem to have any significant advantages.
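For completeness, here is a sketch of one such non-dynamic workaround for the my_data example above (the function name fetch_my_data_static is made up; note that casting id to text makes it sort lexicographically, which is part of why these workarounds are unattractive):
create function fetch_my_data_static(col text)
returns setof my_data as
$f$
select * from my_data
order by case when col = 'id' then id::text end,
case when col = 'val' then val end;
$f$ language sql stable;
select * from fetch_my_data_static('val'); -- order by val
select * from fetch_my_data_static('id');  -- order by id, compared as text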

Unpack a PostGIS/PostgreSQL record with SQLAlchemy

How would I write the following query using SQLAlchemy's ORM?
SELECT filename, (stats).*
FROM (
SELECT filename, ST_SummaryStats(rast, 1, TRUE) AS stats FROM tiles
) AS stats_table;
Here, ST_SummaryStats is a PostGIS function that returns the record that I wish to unpack. tiles is a PostGIS table with filename and rast (raster) columns. My attempt is as follows:
sub_q = db_session.query(
Tiles.filename,
func.ST_SummaryStats(Tiles.rast, 1, True).label('stats'),
).subquery()
q = db_session.query(
sub_q.columns.filename,
sub_q.columns.stats,
)
However, I don't know how to write the (stats).* expression -- and hence unpack the record -- with SQLAlchemy's ORM. Consequently, stats appears to be a tuple.
Thanks in advance for any help.
ST_SummaryStats() returns a record, so rather than using it as a SELECT expression (which would return the whole record), use it in the FROM clause and pick the desired statistics at the SELECT level, so the query becomes very simple:
SELECT filename, count, sum, mean, stddev, min, max
FROM tiles, ST_SummaryStats(tiles.rast, 1, true);
This results in a so-called LATERAL JOIN, and since ST_SummaryStats() returns only a single row for the indicated raster in tiles, you do not need a join condition, filter, or anything else.
I am not sure about SQLAlchemy's ability to use the result of a function as a class, but a sure-fire way of making this work is to wrap the above SELECT into a VIEW and then access the view from SQLAlchemy:
CREATE VIEW raster_stats AS
SELECT filename, count, sum, mean, stddev, min, max
FROM tiles, ST_SummaryStats(tiles.rast, 1, true);
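Once the view exists it can be queried like any table, which also means SQLAlchemy can map or reflect it as an ordinary table. A trivial sketch of the SQL side (the filename literal is just an example value):
SELECT filename, mean, stddev
FROM raster_stats
WHERE filename = 'some_tile.tif';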

UDTF returning a Table on DB2 V5R4 with Dynamic SQL

I must write a UDF returning a table. I've done it with static SQL.
I've created procedures that prepare a dynamic, complex SQL statement and return a cursor.
But now I must create a UDF with dynamic SQL that returns a table, to be used in an IN clause inside another select.
Is it possible on DB2 V5R4? Do you have an example?
Thanks in advance...
I don't have V5R4, but I have i 6.1 and V5R3. I have a 6.1 example, and I poked around in V5R3 to find how to make the same example work there. I can't guarantee V5R4, but this ought to be extremely close. Generating the working V5R3 code into 'Run SQL Scripts' gives this:
DROP SPECIFIC FUNCTION SQLEXAMPLE.DYNTABLE ;
SET PATH "QSYS","QSYS2","SYSPROC","SYSIBMADM","SQLEXAMPLE" ;
CREATE FUNCTION SQLEXAMPLE.DYNTABLE (
SELECTBY VARCHAR( 64 ) )
RETURNS TABLE (
CUSTNBR DECIMAL( 6, 0 ) ,
CUSTFULLNAME VARCHAR( 12 ) ,
CUSTBALDUE DECIMAL( 6, 0 ) )
LANGUAGE SQL
NO EXTERNAL ACTION
MODIFIES SQL DATA
NOT FENCED
DISALLOW PARALLEL
CARDINALITY 100
BEGIN
DECLARE DYNSTMT VARCHAR ( 512 ) ;
DECLARE GLOBAL TEMPORARY TABLE SESSION.TCUSTCDT
( CUSTNBR DECIMAL ( 6 , 0 ) NOT NULL ,
CUSTNAME VARCHAR ( 12 ) ,
CUSTBALDUE DECIMAL ( 6 , 2 ) )
WITH REPLACE ;
SET DYNSTMT = 'INSERT INTO Session.TCustCDt SELECT t2.CUSNUM , (t2.INIT CONCAT '' '' CONCAT t2.LSTNAM) as FullName , t2.BALDUE FROM QIWS.QCUSTCDT t2 ' CONCAT CASE WHEN SELECTBY = '' THEN '' ELSE SELECTBY END ;
EXECUTE IMMEDIATE DYNSTMT ;
RETURN SELECT * FROM SESSION . TCUSTCDT ;
END ;
COMMENT ON SPECIFIC FUNCTION SQLEXAMPLE.DYNTABLE
IS 'UDTF returning dynamic table' ;
And in 'Run SQL Scripts', the function can be called like this:
SELECT t1.* FROM TABLE(sqlexample.dyntable('WHERE STATE = ''TX''')) t1
The example is intended to work over IBM's sample QCUSTCDT table in library QIWS. Most systems will have that table available. The table function returns values from two QCUSTCDT columns, CUSNUM and BALDUE, directly through two of the table function's columns, CUSTNBR and CUSTBALDUE. The third table function column, CUSTFULLNAME, gets its value by concatenating INIT and LSTNAM from QCUSTCDT.
However, the part that apparently relates to the question is the SELECTBY parameter of the function. The usage example shows that a WHERE clause is passed in and used to help build a dynamic INSERT INTO ... SELECT ... statement. The example shows that rows containing STATE='TX' will be returned. A more complex clause could be passed in, or the needed condition(s) could be retrieved from somewhere else, e.g., from another table.
The dynamic statement inserts rows into a GLOBAL TEMPORARY TABLE named SESSION.TCUSTCDT. The temporary table is defined in the function. The temporary column definitions are guaranteed (by the developer) to match the RETURNS TABLE columns of the table function because no dynamic changes can be made to any of those elements. This allows SQL to reliably handle the columns returned from the function, and that lets it compile the function.
The RETURN statement simply returns whatever rows are in the temporary table after the dynamic statement completes.
The various field definitions take into account the somewhat unusual definitions in the QCUSTCDT file. Those don't make great sense, but they're useful enough.