PostgreSQL CTE records as parameters to function - postgresql

I have a function that accepts two integers as parameters my_function(input_a, input_b). Is there an easy way to pass the results of a CTE (that returns records of input_a, input_b) into the function?
Should I be looking into writing a custom function with a for loop or is there a better approach?

If the function returns a single record then:
WITH cte AS (SELECT 1 a, 2 b)
SELECT my_function(a, b) FROM cte;
will work. However, if the function is an SRF (Set-Returning-Function), then you need to use LATERAL, to let the database know that you want to feed the results of the prior tables in the JOIN statement to the functions later on in the JOIN. This is accomplished like so:
WITH cte AS (SELECT 1 a, 2 b)
SELECT * FROM cte, LATERAL my_function(a, b);
The LATERAL will cause PostgreSQL to take each row from the CTE and run "my_function" with the values from that row, returning the results of that function to the overall SELECT statement.

Related

It's a function or table in this later join case?

It is often particularly handy to LEFT JOIN to a LATERAL subquery, so that source rows will appear in the result even if the LATERAL subquery produces no rows for them. For example, if get_product_names() returns the names of products made by a manufacturer, but some manufacturers in our table currently produce no products, we could find out which ones those are like this:
SELECT m.name
FROM manufacturers m LEFT JOIN LATERAL get_product_names(m.id) pname ON true
WHERE pname IS NULL;
All contents extract from PostgreSQL manual. LINK
Now I finally probably get what does LATERAL mean. In this case,
Overall I am Not sure get_product_names is a table or function. The following is my understanding.
A: get_product_names(m.id) is a function, and using m.id as a input parameter returns a table. The return table alias as pname. Overall it's a table m join a null (where condition) table.
B: get_product_names is a table, table m left join table get_product_names on m.id. pname is alias for get_product_names. Overall it's a table m join a null (where condition) table.
get_product_names is a table function (also known as set returning function or SRF in PostgreSQL slang). Such a function does not necessarily return a single result row, but arbitrarily many rows.
Since the result of such a function is a table, you typically use it in SQL statements where you would use a table, that is in the FROM clause.
A simple example is
SELECT * FROM generate_series(1, 5);
generate_series
-----------------
1
2
3
4
5
(5 rows)
You can also use normal functions in this way, they are then treated as a table function that returns exactly one row.

PostgreSQL reusing computed result as input to other select computations

Is there any way we can take a computed result inside the select clause and insert it into another computation inside the select clause?
For example this is what I want to have but can't so far:
select trim(leading https://www.amazon.com for url) as trimmedURL,
substring(trimmedURL, from position('/' in trimmedURL) for position ('html' in trimmedURL))....
As you can see I have used trimmedURL 3 times inside the substring function. I know how to naively do that be copy/paste of trim(leading https://www.amazon.com for url) into the substring function.
Is there any way to avoid that and not create really large function calls as the first value computed might be placed many times inside other functions. This will improve code readability and usability.
you could use a lateral join and place the computed fields i the lateral query. the lateral fields are then accessible from the main query.
Postgres documentation for lateral join
i.e.
SELECT
trimmedUrl
, SUBSTRING(trimmedURL,10,20) url_part
FROM mytable
LEFT JOIN LATERAL (SELECT trim(leading https://www.amazon.com for url) as trimmedURL) trmd
ON TRUE
also, note that postgresql ignores casing in the naming of columns / tables etc unless they are quoted.
Here's a self-contained example:
WITH x(col) AS (Values ('abc://cdf/def'), ('abc://xyz/pqr'))
SELECT x.col, SUBSTRING(y.col2 from position('/' in y.col2)) resuing_computation
FROM x
LEFT JOIN LATERAL (SELECT trim(leading 'abc://' from col) col2) y ON TRUE

Retrieval of columns from functions that returns table (or setof record)

All time I have an variation of this problem, and not remember how to workaround, only "oop was so simple, but how to?"... Perhaps there are some patterns and best way to work with each pattern. Let's see the main one, examplefying by unnest() and ts_stat().
First, good examples, no problems, because unnest() returns only one column:
SELECT * FROM unnest(array[1,2,3]) t(id); -- is ok, the int columns there!
SELECT unnest(array[1,2,3]) t(id); -- is ok, the int columns
WITH t AS (SELECT unnest(array[1,2,3]) as id)
SELECT id, unnest(array[4,id]) as x
FROM t; -- more complex, but ok!
Now a function that returns a defined SETOF RECORD,
SELECT * FROM ts_stat('SELECT kx FROM terms where id=2') -- GOOD
-- show all word|ndoc|nentry columns
SELECT ts_stat('SELECT kx FROM terms where id=2') as x -- BAD
-- because lost columns, show only "x" column... but works
-- NOTE: you can imagine any other function, as json_each(), etc.
See GOOD/BAD considerations... So, this is the problem: a SETOF RECORD with more tham one column. In the simplest (unnest above) case, the solution is to use in the "FROM side", as a table; but, when RECORD have multiple fields, arises the problem.
--MAIN EXAMPLE FOR THE DISCUSSION:
WITH t AS (SELECT unnest(array[1,2,3]) as id)
SELECT id, ts_stat('SELECT kx FROM terms where id='||id) as x
FROM t; -- BAD, but works...
Now, in this main example, is not possible to use ts_stat() in the "FROM side", so, characterizing the pattern: a function that returns a TABLE or a SETOF RECORD, in a query where we need columns, but the function can't in the "FROM side".
QUESTION: What the generic (and most elegant) solution to this pattern? How (syntax pattern) to show columns?
NOTE: another problem is that, if you not remember exactly the syntax of solution, you try things that not works... In this case an error:
WITH t AS (SELECT unnest(array[1,2,3]) as id)
SELECT id, x.word, x.ndoc, x.nentry
FROM (
SELECT t.nsid,
ts_stat('SELECT kx FROM terms where id='||id) as x
FROM t
) s;
SQL PARSER ERROR (PostgreSQL 9.5): no table "x" in the FROM clause.
You should never use a set-returning-function (SRF) in a SELECT list. The main example should be written with an implicit LATERAL JOIN:
SELECT v.id, x.*
FROM (VALUES (1),(2),(3)) v(id)
JOIN ts_stat('SELECT kx FROM terms where id=' || v.id) x ON true;
The lateral join is implicit here because an SRF can refer to columns from relations specified before it the FROM clause without using the keyword LATERAL. In the example above the SRF ts_stat() makes a lateral reference to column and relation v(id). You can also do this with e.g. sub-queries but then you have to explicitly use the keyword LATERAL.
Note that while you can use a SRF in a select list, its use is discouraged. You provide the example of unnest(anyarray) which is interesting because there is also the overloaded variant unnest(anyarray, ...) (i.e. unnest multiple arrays in one call) which will throw an error when used in a select list; in can only be used as a row source. The reason why you should not use SRFs in a select list is that there is no obvious solution when using multiple SRFs each producing a different number of rows.

Does inner join effect order by?

I have a function a() which gives result in a specific order.
I want to do:
select final.*,tablex.name
from a() as final
inner join tablex on (a.key=tablex.key2)
My question is, can I guarantee that the join won't effect the order of rows as a() set it?
a() is:
select ....
from....
joins...
order by x,y,z
The short version:
The order of rows returned by a SQL query is not guaranteed in any way unless you use an order by
Any order you see without an order by is pure coincidence and can not be relied upon.
So how did I always get the correct order so far? when I did Select * from a()
If your function is a SQL function, then the query inside the function is executed "as is" (it's essentially "inlined") so you only run a single query that does have an order by. If it's a PL/pgSQL function and the only thing it does is a RETURN QUERY ... then you again only have a single query that is executed which does have an order by.
Assuming you do use a SQL function, then running:
select final.*,tablex.name
from a() as final
join tablex on a.key=tablex.key2
is equivalent to:
select final.*,tablex.name
from (
-- this is your query inside the function
select ...
from ...
join ...
order by x,y,z
) as final
join tablex on a.key=tablex.key2;
In this case the order by inside the derived table doesn't make sense as it might be "overruled" by an overall order by statement. In fact some databases would outright reject this query (and I sometime wish Postgres would do as well).
Without an order by on the **overall* query, the database is free to choose any order of rows that it wants.
So to get back to the initial question:
can I guarantee that the join won't effect the order of rows as a() set it?
The answer to that is a clear: NO - the order of the rows for that query is in no way guaranteed. If you need an order that you can rely on, you have to specify an order by.
I would even go so far to remove the order by from the function - what if someone runs: select * from a() order by z,y,x - I don't think Postgres will be smart enough to remove the order by inside the function.

Selecting all rows who belong to a group with some properties

I use PostgreSQL for a web application, and I've run into a type of query I can't think of a way to write efficiently.
What I'm trying to do is select all rows from a table which, when grouped a certain way, the group meets some criteria. For example, the naive way to structure this query might be something like this:
SELECT *
FROM table T
JOIN (
SELECT iT.a, iT.b, SUM(iT.c) AS sum
FROM table iT
GROUP BY iT.a, iT.b
) TG ON (TG.a = T.a AND TG.b = T.b)
WHERE TG.sum > 100;
The problem I'm having is that this effectively doubles the time it takes the query to execute, since it's essentially selecting the rows from that table twice.
How can I structure queries of this type efficiently?
You can try a window function although I don't know if it is more efficient. I guess it is as it avoids the join. Test this and your query with explain
select *
from (
select
a, b,
sum(c) over(partition by a, b) as sum
from t
) s
where "sum" > 100