I have the following procedure:
CREATE OR REPLACE FUNCTION findKNN()
RETURNS Text AS $body$
DECLARE
cur refcursor;
tempcur refcursor;
gid_ integer;
_var1 integer;
_var2 integer;
BEGIN
open cur for execute('select gid from polygons');
loop
fetch cur into gid_;
open tempcur for SELECT g1.gid , g2.gid FROM polygons AS g1, polygons AS g2
WHERE g1.gid = gid_ and g1.gid <> g2.gid ORDER BY g1.gid , ST_Distance(g1.the_geom,g2.the_geom)
LIMIT 5;
loop
fetch tempcur into _var1 , _var2;
-- how to return _var1 , _var2 here ?
end loop;
end loop;
close cur;
END;
$body$
LANGUAGE plpgsql;
But I don't know how to return the result from this procedure. The query returns 5 rows for each iteration of the outer cursor loop. How can I retrieve these five rows for each query execution?
Unless you are trying to do something more complicated that is not in your question, you can radically simplify to:
CREATE OR REPLACE FUNCTION find_knn()
RETURNS TABLE(gid1 integer, gid2 integer) AS
$body$
BEGIN
RETURN QUERY
SELECT g1.gid , g2.gid
FROM polygons g1
JOIN polygons g2 ON g1.gid <> g2.gid
-- WHERE g1.gid = <some_condition> -- ???
ORDER BY g1.gid, st_distance(g1.the_geom, g2.the_geom)
LIMIT 5;
END;
$body$ LANGUAGE plpgsql;
Or even:
CREATE OR REPLACE FUNCTION find_knn()
RETURNS TABLE(gid1 integer, gid2 integer) AS
$body$
SELECT g1.gid , g2.gid
FROM polygons g1
JOIN polygons g2 ON g1.gid <> g2.gid
-- WHERE g1.gid = <some_condition> -- ???
ORDER BY g1.gid, st_distance(g1.the_geom, g2.the_geom)
LIMIT 5;
$body$ LANGUAGE sql;
Call:
SELECT * FROM find_knn();
The manual about Returning From a Function.
The manual about CREATE FUNCTION.
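Note that without the WHERE condition, the LIMIT 5 above applies to the whole result, so the function returns five rows in total. If the goal is five neighbors for every polygon, as in the nested cursor loops of the question, a LATERAL subquery (PostgreSQL 9.3 or later) is one way to sketch it. The function name find_knn_per_polygon is made up for this illustration; the table and column names are taken from the question:

```sql
-- Sketch only: find_knn_per_polygon is a hypothetical name.
-- For each polygon g1, the LATERAL subquery picks its 5 nearest other polygons.
CREATE OR REPLACE FUNCTION find_knn_per_polygon()
  RETURNS TABLE(gid1 integer, gid2 integer) AS
$body$
SELECT g1.gid, n.gid
FROM   polygons g1
CROSS  JOIN LATERAL (
   SELECT g2.gid
        , ST_Distance(g1.the_geom, g2.the_geom) AS dist
   FROM   polygons g2
   WHERE  g2.gid <> g1.gid
   ORDER  BY dist
   LIMIT  5                 -- applied once per row of g1
   ) n
ORDER  BY g1.gid, n.dist;
$body$ LANGUAGE sql;
```

Called the same way: SELECT * FROM find_knn_per_polygon();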
Retrieve a small slice of a huge join
(Answer to comment.)
There are many ways to pick a small slice of a huge join without actually evaluating the whole join. In most cases you don't even have to worry about it. For instance, run this at home:
EXPLAIN ANALYZE
SELECT *
FROM huge_tbl t1
CROSS JOIN huge_tbl t2
LIMIT 5
You will see that only 5 rows will be processed, not the whole cross join.
The same is true for a CTE:
WITH a AS (
SELECT *
FROM huge_tbl t1
CROSS JOIN huge_tbl t2
)
SELECT *
FROM a
LIMIT 5
Some limitations apply. I quote the excellent manual:
PostgreSQL's implementation evaluates only as many rows of a WITH
query as are actually fetched by the parent query.
To make absolutely sure, you could apply the LIMIT (or a fitting WHERE clause) at the source:
SELECT *
FROM (SELECT * FROM huge_tbl LIMIT 1) t1
CROSS JOIN (SELECT * FROM huge_tbl LIMIT 5) t2;
I have table1, which contains around 900k rows, and table2, which has around 500k rows and 26 columns.
I want to update table1 with the total number of unique combinations from table2.
I tried different types of counts, but the performance is still extremely slow. Are there any alternative options to improve the performance?
query #1, the completion time for just 10 lines is 1 min 47 secs.
do $$
declare
rec record;
current_joins text;
current_result int;
begin for rec in (select joins from table1 where line<=10) loop
select rec.joins into current_joins;
execute format('select count(*) from (select 1 from table2 group by %1$s) as some_alias;', current_joins) into current_result;
update table1 set result = current_result where joins=current_joins;
end loop;
end $$;
query #2, the completion time for just 10 lines is 1 min 48 secs
do $$
declare
rec record;
current_joins text;
current_result int;
begin for rec in (select joins from table1 where line<=10) loop
select rec.joins into current_joins;
execute format('select count(*) from (select 1 from table2 group by %1$s having count(*)>=1) as some_alias;', current_joins) into current_result;
update table1 set result = current_result where joins=current_joins;
end loop;
end $$;
query #3, the completion time for just 10 lines is 1 min 52 secs
do $$
declare
rec record;
current_joins text;
current_result int;
begin for rec in (select joins from table1 where line<=10) loop
select rec.joins into current_joins;
execute format('select count(distinct (%1$s)) from table2', current_joins) into current_result;
update table1 set result = current_result where joins=current_joins;
end loop;
end $$;
query #4, the completion time for just 10 lines is 1 min 55 secs
do $$
declare
rec record;
current_joins text;
current_result int;
begin for rec in (select joins from table1 where line<=10) loop
select rec.joins into current_joins;
execute format('select count(*) from (select distinct (%1$s) from table2) as temp', current_joins) into current_result;
update table1 set result = current_result where joins=current_joins;
end loop;
end $$;
As another option, I tried Microsoft Excel. The completion time for just 10 lines is under 10 seconds, which is much faster than all of the SQL above, but it is still slow with 500k rows of data.
=LET(A,Data!$C$2:$AB$12,B,ROWS(A),C,FILTERXML("<A><B>"&SUBSTITUTE(C2,",","</B><B>")&"</B></A>","//B"),ROWS(UNIQUE(INDEX(A,SEQUENCE(B),TRANSPOSE(C)))))
The following functions were created for housekeeping within the database (PostgreSQL 11.4).
entity_with_multiple_taskexec: returns a list of entities for which the housekeeping should be done.
row_id_to_delete: returns tuples of IDs to delete.
Just for completeness, the function which works fine:
CREATE OR REPLACE FUNCTION entity_with_multiple_taskexec()
RETURNS TABLE(entitykey varchar) AS
$func$
BEGIN
RETURN QUERY select distinct task.entitykey from
(select task.entitykey from task where dtype = 'PropagationTask' group by task.entitykey having count(*) > (select count(*) from conninstance)) more_than_one_entry
inner join task on task.entitykey = more_than_one_entry.entitykey
inner join taskexec on taskexec.task_id = task.id order by task.entitykey asc;
END
$func$ LANGUAGE plpgsql;
But with the second function, I'm not able to return a table created by looping through the results of the entity_with_multiple_taskexec function:
CREATE OR REPLACE FUNCTION row_id_to_delete()
RETURNS TABLE(task_id varchar, taskexec_id varchar) AS
$func$
DECLARE
entityrow RECORD;
resultset RECORD;
BEGIN
FOR entityrow IN SELECT entitykey FROM entity_with_multiple_taskexec() LOOP
insert into resultset select task.id as task_id, taskexec.id as taskexec_id from task
inner join taskexec on taskexec.task_id = task.id where taskexec.entitykey = entityrow.entitykey order by taskexec.enddate desc offset 1
END LOOP;
RETURN resultset;
END
$func$ LANGUAGE plpgsql;
This breaks with the following error
ERROR: syntax error at or near "END"
LINE 12: END LOOP;
I've tried different approaches. What would be a good solution to return the table?
You don't need a loop, just join to the function as if it is a table.
There is also no need to use PL/pgSQL for this, a simple language sql function will be more efficient.
CREATE OR REPLACE FUNCTION row_id_to_delete()
RETURNS TABLE(task_id varchar, taskexec_id varchar) AS
$func$
select task.id as task_id, taskexec.id as taskexec_id
from task
join taskexec on taskexec.task_id = task.id
join entity_with_multiple_taskexec() as mt on mt.entitykey = taskexec.entitykey
order by taskexec.enddate desc
offset 1
$func$
LANGUAGE sql;
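One caveat: the loop in the question applies "order by taskexec.enddate desc offset 1" once per entity, while the single query above skips only one row overall. If the intent is to keep the newest taskexec row of each entity and return all older ones, a window function is one possible sketch. This assumes that reading of the question; the alias rn is introduced here only for illustration:

```sql
-- Sketch only, assuming "skip the newest row per entitykey" is the intent.
CREATE OR REPLACE FUNCTION row_id_to_delete()
  RETURNS TABLE(task_id varchar, taskexec_id varchar) AS
$func$
select t.task_id, t.taskexec_id
from (
   select task.id     as task_id
        , taskexec.id as taskexec_id
          -- rn = 1 marks the most recent taskexec row of each entity
        , row_number() over (partition by taskexec.entitykey
                             order by taskexec.enddate desc) as rn
   from task
   join taskexec on taskexec.task_id = task.id
   join entity_with_multiple_taskexec() as mt on mt.entitykey = taskexec.entitykey
   ) t
where t.rn > 1
$func$
LANGUAGE sql;
```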
I am developing an application whose database is Postgres 9.5.
I have the following SQL function:
CREATE OR REPLACE FUNCTION public.getall_available_products(
IN start_day_id integer,
IN end_day_id integer)
RETURNS TABLE(id integer) AS
$BODY$
SELECT product_id As id
FROM product_days
WHERE available > 0
AND days_id BETWEEN start_day_id AND end_day_id
$BODY$
LANGUAGE sql VOLATILE
I need to use the result of the above function in a join query inside a PL/pgSQL function:
CREATE OR REPLACE FUNCTION public.get_available_product_details(
IN start_day_id integer,
IN end_day_id integer)
RETURNS SETOF record AS
$BODY$declare
begin
SELECT pd.days_id As pd_days_id, pd.id AS p_id, pd.name AS p_name
FROM product p JOIN product_days pd
Using(id)
WHERE pd.id in
Select * from
//here I need to use the result of the getall_available_products
//function
end;
$BODY$
LANGUAGE plpgsql VOLATILE
How should I use the result of the first function in the second function, at the place marked with comments?
You can select from set / table returning functions like tables or views. In your case:
SELECT pd.days_id As pd_days_id, pd.id AS p_id, pd.name AS p_name
FROM product p JOIN product_days pd USING(id)
WHERE pd.id IN
(SELECT a.id FROM public.getall_available_products(start_day_id, end_day_id) a);
You may even join with functions:
SELECT pd.days_id As pd_days_id, pd.id AS p_id, pd.name AS p_name
FROM product p JOIN product_days pd USING(id)
JOIN public.getall_available_products(start_day_id, end_day_id) a ON pd.id = a.id;
This should give the same result.
Note: If you want to pass column values as function arguments, you should take a look at the LATERAL keyword (available since PostgreSQL 9.3).
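As a rough sketch of what that could look like: the table booking and its columns below are hypothetical and only stand in for any table that stores a start and end day per row. The function is then called once for each row of that table:

```sql
-- Sketch only: "booking" and its columns are hypothetical, not from the question.
SELECT b.booking_id, a.id AS product_id
FROM   booking b
CROSS  JOIN LATERAL public.getall_available_products(b.start_day_id, b.end_day_id) a;
```

Rows of booking for which the function returns nothing drop out of the result; LEFT JOIN LATERAL ... ON true would keep them.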
In a Firebird SQL stored procedure I use a 'select into' in a 'for do' loop, and I can't find the equivalent for a PostgreSQL function.
for select purchase.quantity, purchase.purchasevalue, purchase.purchased, purchase.id from purchase
join cellarbook cb on purchase.fk_cellarbook_id = cb.id
join bottle bot on cb.fk_bottle_id = bot.id
where bot.id = :bottleid
order by purchase.purchased ASC
into :purquantity, :purvalue, :purdate, :purid
do
begin
/* calculate quantity on hand at point of purchase
here come some more 'select' and calculations and
then and 'update' */
select sum(psum.quantity) as purquantitysum from purchase
join cellarbook cb on psum.fk_cellarbook_id = cb.id
join bottle bot on cb.fk_bottle_id = bot.id
where bot.id = bottleid and psum.purchased <= pur.purchased and psum.id <> pur.id
into :purquantitysum
end
I think it is a 'for in loop' but I am hung up on what the equivalent for the 'select into' is.
You need to use a record variable for this:
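declare
  r record;
  total numeric;  -- receives the result of the inner query
begin
  for r in
    select col_1, col_2 from some_table  -- no semicolon before LOOP
  loop
    -- inside PL/pgSQL a SELECT needs a target, hence INTO
    select sum(x)
      into total
      from other_table
     where id = r.col_1;
  end loop;
end;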
More examples are in the manual:
http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING
Running update or select statements inside a loop is usually a code smell ("row-by-row processing"). In most cases it is much more efficient to process everything in bulk with a single statement.
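To illustrate the set-based alternative with the placeholder tables from the loop above: suppose the loop were meant to store the computed sum back into some_table. The column total is an assumption made up for this sketch. A single UPDATE with a grouped subquery could then replace the whole loop:

```sql
-- Sketch only: the column some_table.total is hypothetical.
-- One statement computes all sums and writes them back, instead of one query per row.
UPDATE some_table s
SET    total = o.total
FROM  (
   SELECT id, sum(x) AS total
   FROM   other_table
   GROUP  BY id
   ) o
WHERE  o.id = s.col_1;
```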
I'd use a cursor. There are several variations on the same theme available. I normally use this:
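declare
  mycursor cursor for select a, b from c;
  d bigint;  -- PL/pgSQL takes one variable per declaration
  e bigint;
begin
  open mycursor;  -- a bound cursor must be opened before fetching
  loop
    fetch mycursor into d, e;
    exit when not found;
    -- do your thing here
  end loop;
  close mycursor;
  -- maybe do some other stuff
end;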
I am writing the following PL/pgSQL procedure:
CREATE OR REPLACE FUNCTION KNN(gid_ integer)
RETURNS Text AS $body$
DECLARE
row_ RECORD;
BEGIN
SELECT g1.gid As SOURCE, g2.gid As Neighbors FROM polygons as g1, polygons as g2 WHERE g1.gid = $1 and g1.gid <> g2.gid ORDER BY g1.gid,
ST_Distance(g1.the_geom,g2.the_geom) limit 5;
END
$body$
LANGUAGE plpgsql;
The query returns 5 rows for each value of the argument supplied to the procedure. How can I return those 5 rows? Also, how can I execute the procedure for all values of the argument stored in the gid column of the polygons table? Could somebody please give the full code? Thank you.
You can use the RETURNS TABLE syntax to implicitly create OUT variables:
CREATE OR REPLACE FUNCTION KNN(
gid_ integer
) RETURNS TABLE (
source integer,
neighbor integer
) LANGUAGE SQL AS $$
SELECT g1.gid As SOURCE
, g2.gid As Neighbors
FROM polygons AS g1,
polygons AS g2
WHERE g1.gid = $1
AND g1.gid <> g2.gid
ORDER BY g1.gid
, ST_Distance(g1.the_geom,g2.the_geom)
LIMIT 5;
$$;
To use it, use SELECT * FROM KNN(42) and you will get back up to five two-column rows.
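For the second part of the question, running the function for every gid stored in polygons, one option is to call it in a LATERAL join (PostgreSQL 9.3 or later). This is a sketch that assumes the KNN function defined above with its output columns source and neighbor:

```sql
-- Sketch only: calls KNN() once per row of polygons
-- and returns up to five neighbors for each gid.
SELECT k.source, k.neighbor
FROM   polygons p
CROSS  JOIN LATERAL KNN(p.gid) AS k
ORDER  BY k.source;
```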