I am writing the following PL/pgSQL function:
CREATE OR REPLACE FUNCTION KNN(gid_ integer)
RETURNS Text AS $body$
DECLARE
row_ RECORD;
BEGIN
SELECT g1.gid As SOURCE, g2.gid As Neighbors FROM polygons as g1, polygons as g2 WHERE g1.gid = $1 and g1.gid <> g2.gid ORDER BY g1.gid,
ST_Distance(g1.the_geom,g2.the_geom) limit 5;
END
$body$
LANGUAGE plpgsql;
The query returns 5 rows for each value of the argument supplied to the function. How can I return those 5 rows? Also, how can I execute the function for every value stored in the gid column of the polygons table? Could somebody please give the full code? Thank you.
You can use the RETURNS TABLE syntax to implicitly create OUT variables:
CREATE OR REPLACE FUNCTION KNN(
gid_ integer
) RETURNS TABLE (
source integer,
neighbor integer
) LANGUAGE SQL AS $$
SELECT g1.gid As SOURCE
, g2.gid As Neighbors
FROM polygons AS g1,
polygons AS g2
WHERE g1.gid = $1
AND g1.gid <> g2.gid
ORDER BY g1.gid
, ST_Distance(g1.the_geom,g2.the_geom)
LIMIT 5;
$$;
To use it, use SELECT * FROM KNN(42) and you will get back up to five two-column rows.
I have a table Answer and a many-to-many table Link (Answer n-n Answer).
Link has 2 columns, from_id and to_id, which both reference answer_id.
I want to get all descendants of an answer by answer_id (from_id in Link).
I have written the function below:
CREATE OR REPLACE FUNCTION getAllChild(_answer_id BIGINT)
RETURNS SETOF BIGINT AS $$
DECLARE r link;
BEGIN
FOR r IN
SELECT * FROM link
WHERE from_id = _answer_id
LOOP
RETURN NEXT r.to_id;
RETURN QUERY SELECT * FROM getAllChild(r.to_id);
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql STRICT;
SELECT * FROM getAllChild(1);
The result is fine as long as no to_id points back to a from_id that has already been visited; otherwise I get infinite recursion.
My question is: how can I make the loop skip a to_id that has already been seen, instead of calling getAllChild() for it in RETURN QUERY?
I'd suggest you do this with a recursive CTE, you could use the same approach in a function though.
You can use an array to keep track of all the from_ids you've dealt with, and then in the next iteration ignore any records whose from_id is already in that array. In the code below I'm using the path array to track all the from_ids already seen.
with recursive t as
(
select l.from_id,l.to_id, ARRAY[l.from_id] as path, 1 as depth
from link l where from_id = 2
union all
select l.from_id,l.to_id, array_append(t.path,l.from_id), t.depth+1
from link l
inner join t on l.from_id = t.to_id
where not (l.from_id = ANY (t.path)) -- ignore records already processed
)
select * from t;
Fiddle at: http://sqlfiddle.com/#!15/024e80/1
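The path-tracking idea can also be modeled outside the database. The following is a minimal Python sketch, not part of the original answer: it assumes the link table is available as a list of (from_id, to_id) tuples, and the function name all_children is made up for illustration. Each queue entry mirrors one row of the CTE.

```python
from collections import deque


def all_children(links, start):
    """Return the to_id of every row the recursive CTE would produce.

    links: list of (from_id, to_id) tuples (a stand-in for the link table).
    start: the from_id to start from.
    """
    # Index the links by from_id so the "join" is a dictionary lookup.
    by_from = {}
    for from_id, to_id in links:
        by_from.setdefault(from_id, []).append(to_id)

    results = []
    # Non-recursive term: rows whose from_id is the start, path = [from_id].
    queue = deque((start, to_id, (start,)) for to_id in by_from.get(start, []))
    while queue:
        from_id, to_id, path = queue.popleft()
        results.append(to_id)
        # Recursive term: follow links whose from_id equals this to_id,
        # unless that from_id is already on the path (cycle guard).
        if to_id in path:
            continue
        for next_to in by_from.get(to_id, []):
            queue.append((to_id, next_to, path + (to_id,)))
    return results
```

With the cyclic links 1 -> 2 -> 3 -> 1, all_children(links, 1) yields 2, 3 and 1 and then stops, because from_id 1 is already on the path.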
Updated: As a function
CREATE OR REPLACE FUNCTION getAllChild(_answer_id BIGINT)
RETURNS SETOF BIGINT AS $$
BEGIN
return query
with recursive t as
(
select l.from_id,l.to_id, ARRAY[l.from_id] as path, 1 as depth from link l where from_id = _answer_id
union all
select l.from_id,l.to_id, array_append(t.path,l.from_id), t.depth+1 from link l
inner join t on l.from_id = t.to_id
where not (l.from_id = ANY (t.path))
)
select to_id from t;
END;
$$ LANGUAGE plpgsql STRICT;
Arrays documentation: https://www.postgresql.org/docs/current/static/arrays.html
CTEs: https://www.postgresql.org/docs/current/static/queries-with.html
I have multiple tables, each with two columns of interest: connection_node_start_id and connection_node_end_id. My goal is to get a collection of all those IDs, either as a flat ARRAY or as a new TABLE consisting of one column.
Example output ARRAY:
result = {1,4,7,9,2,5}
Example output TABLE:
IDS
-------
1
4
7
9
2
5
My first attempt is somewhat clumsy and does not work properly, as the SELECT statement just returns one row. It seems there must be a simple way to do this. Can someone point me in the right direction?
CREATE OR REPLACE FUNCTION get_connection_nodes(anyarray)
RETURNS anyarray AS
$$
DECLARE
table_name varchar;
result integer[];
sel integer[];
BEGIN
FOREACH table_name IN ARRAY $1
LOOP
RAISE NOTICE 'table_name(%)',table_name;
EXECUTE 'SELECT ARRAY[connection_node_end_id,
connection_node_start_id] FROM ' || table_name INTO sel;
RAISE NOTICE 'sel(%)',sel;
result := array_cat(result, sel);
END LOOP;
RETURN result;
END
$$
LANGUAGE 'plpgsql';
Test table:
connection_node_start_id | connection_node_end_id
--------------------------------------------------
1 | 4
7 | 9
Call:
SELECT get_connection_nodes(ARRAY['test_table']);
Result:
{1,4} -- only 1st row, rest is missing
For Postgres 9.3+
CREATE OR REPLACE FUNCTION get_connection_nodes(text[])
RETURNS TABLE (ids int) AS
$func$
DECLARE
_tbl text;
BEGIN
FOREACH _tbl IN ARRAY $1
LOOP
RETURN QUERY EXECUTE format('
SELECT t.id
FROM %I, LATERAL (VALUES (connection_node_start_id)
, (connection_node_end_id)) t(id)'
, _tbl);
END LOOP;
END
$func$ LANGUAGE plpgsql;
Related answer on dba.SE:
SELECT DISTINCT on multiple columns
Or drop the loop and concatenate a single query. Probably fastest:
CREATE OR REPLACE FUNCTION get_connection_nodes2(text[])
RETURNS TABLE (ids int) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT string_agg(format(
'SELECT t.id FROM %I, LATERAL (VALUES (connection_node_start_id)
, (connection_node_end_id)) t(id)'
, tbl), ' UNION ALL ')
FROM unnest($1) tbl
);
END
$func$ LANGUAGE plpgsql;
Related:
Loop through like tables in a schema
LATERAL was introduced with Postgres 9.3.
For older Postgres
You can use the set-returning function unnest() in the SELECT list, too:
CREATE OR REPLACE FUNCTION get_connection_nodes2(text[])
RETURNS TABLE (ids int) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT string_agg(
'SELECT unnest(ARRAY[connection_node_start_id
, connection_node_end_id]) FROM ' || tbl
, ' UNION ALL '
)
FROM (SELECT quote_ident(tbl) AS tbl FROM unnest($1) tbl) t
);
END
$func$ LANGUAGE plpgsql;
Should work with pg 8.4+ (or maybe even older). Works with current Postgres (9.4) as well, but LATERAL is much cleaner.
Or make it very simple:
CREATE OR REPLACE FUNCTION get_connection_nodes3(text[])
RETURNS TABLE (ids int) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT string_agg(format(
'SELECT connection_node_start_id FROM %1$I
UNION ALL
SELECT connection_node_end_id FROM %1$I'
, tbl), ' UNION ALL ')
FROM unnest($1) tbl
);
END
$func$ LANGUAGE plpgsql;
format() was introduced with pg 9.1.
Might be a bit slower with big tables because each table is scanned once for every column (so 2 times here). Sort order in the result is different, too - but that does not seem to matter for you.
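To make the remark about sort order concrete, here is a small Python sketch (not from the original answer) that models the two ways of flattening the test table. The function names row_wise and column_wise are invented for illustration, and SQL guarantees no particular order without ORDER BY, so this only shows the order one would typically see.

```python
# (connection_node_start_id, connection_node_end_id) rows of the test table
rows = [(1, 4), (7, 9)]


def row_wise(rows):
    """Emit both ids of each row together, like LATERAL (VALUES ...) or unnest(ARRAY[...])."""
    return [node_id for start_id, end_id in rows for node_id in (start_id, end_id)]


def column_wise(rows):
    """Emit all start ids, then all end ids, like the two UNION ALL branches."""
    return [start_id for start_id, _ in rows] + [end_id for _, end_id in rows]
```

Both functions return the same set of ids; only the sequence differs.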
Be sure to escape (sanitize) identifiers to defend against SQL injection and other illegal syntax. Details:
Table name as a PostgreSQL function parameter
The EXECUTE ... INTO statement can only return data from a single row:
If multiple rows are returned, only the first will be assigned to the INTO variable.
In order to concatenate values from all rows you have to aggregate them first by column and then append the arrays:
EXECUTE 'SELECT array_agg(connection_node_end_id) ||
array_agg(connection_node_start_id) FROM ' || table_name INTO sel;
You're probably looking for something like this:
CREATE OR REPLACE FUNCTION d (tblname TEXT [])
RETURNS TABLE (c INTEGER) AS $$
DECLARE sql TEXT;
BEGIN
WITH x
AS (SELECT unnest(tblname) AS tbl),
y AS ( -- %I quotes each table name as an identifier (guards against SQL injection)
SELECT FORMAT('
SELECT connection_node_end_id
FROM %I
UNION ALL
SELECT connection_node_start_id
FROM %I
', tbl, tbl) AS s
FROM x)
SELECT string_agg(s, ' UNION ALL ')
INTO sql
FROM y;
RETURN QUERY EXECUTE sql;
END;$$
LANGUAGE plpgsql;
CREATE TABLE a (connection_node_end_id INTEGER, connection_node_start_id INTEGER);
INSERT INTO A VALUES (1,2);
CREATE TABLE b (connection_node_end_id INTEGER, connection_node_start_id INTEGER);
INSERT INTO B VALUES (100, 101);
SELECT * from d(array['a','b']);
c
-----
1
2
100
101
(4 rows)
I am trying to get 25 random samples of 15,000 IDs from a table. Instead of manually pressing run every time, I'm trying to do a loop. Which I fully understand is not the optimum use of Postgres, but it is the tool I have. This is what I have so far:
for i in 1..25 LOOP
insert into playtime.meta_random_sample
select i, ID
from tbl
order by random() limit 15000
end loop
Procedural elements like loops are not part of the SQL language and can only be used inside the body of a procedural language function, procedure (Postgres 11 or later) or a DO statement, where such additional elements are defined by the respective procedural language. The default is PL/pgSQL, but there are others.
Example with plpgsql:
DO
$do$
BEGIN
FOR i IN 1..25 LOOP
INSERT INTO playtime.meta_random_sample
(col_i, col_id) -- declare target columns!
SELECT i, id
FROM tbl
ORDER BY random()
LIMIT 15000;
END LOOP;
END
$do$;
For many tasks that can be solved with a loop, there is a shorter and faster set-based solution around the corner. Pure SQL equivalent for your example:
INSERT INTO playtime.meta_random_sample (col_i, col_id)
SELECT t.*
FROM generate_series(1,25) i
CROSS JOIN LATERAL (
SELECT i, id
FROM tbl
ORDER BY random()
LIMIT 15000
) t;
About generate_series():
What is the expected behaviour for multiple set-returning functions in SELECT clause?
About optimizing performance of random selections:
Best way to select random rows PostgreSQL
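As a rough model of what the set-based query computes, the following Python sketch (an illustration, not part of the original answer) draws one fresh sample per run number, the way the LATERAL subquery is re-evaluated for every i. The function name repeated_samples and its seed parameter are assumptions made for the example.

```python
import random


def repeated_samples(ids, runs, sample_size, seed=None):
    """Return (run_number, id) pairs: `runs` independent samples of `sample_size` ids."""
    rng = random.Random(seed)
    out = []
    for i in range(1, runs + 1):
        # A new random sample for every run, like ORDER BY random() LIMIT n
        # being executed once per value of i.
        for picked in rng.sample(ids, sample_size):
            out.append((i, picked))
    return out
```

Calling repeated_samples(ids, 25, 15000) corresponds to the 25 samples of 15,000 IDs from the question, provided ids holds at least 15,000 values.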
Below is an example you can use:
create temp table test2 (
id1 numeric,
id2 numeric,
id3 numeric,
id4 numeric,
id5 numeric,
id6 numeric,
id7 numeric,
id8 numeric,
id9 numeric,
id10 numeric)
with (oids = false);
do
$do$
declare
i int;
begin
for i in 1..100000
loop
insert into test2 values (random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random());
end loop;
end;
$do$;
I just ran into this question and, while it is old, I figured I'd add an answer for the archives. The OP asked about for loops, but their goal was to gather a random sample of rows from the table. For that task, Postgres 9.5+ offers the TABLESAMPLE clause in FROM. Here's a good rundown:
https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9-5-2/
I tend to use Bernoulli as it's row-based rather than page-based, but the original question is about a specific row count. For that, there's a built-in extension:
https://www.postgresql.org/docs/current/tsm-system-rows.html
CREATE EXTENSION tsm_system_rows;
Then you can grab whatever number of rows you want:
select * from playtime tablesample system_rows (15);
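The difference between a probability-based method and a fixed row count can be sketched in Python. This is only a model under stated assumptions: the function names are invented, and the real SYSTEM_ROWS method reads whole blocks, so its rows are clustered rather than uniformly random as they are here.

```python
import random


def bernoulli_sample(rows, probability, rng):
    """Keep each row independently with the given probability; the result size varies."""
    return [row for row in rows if rng.random() < probability]


def fixed_count_sample(rows, count, rng):
    """Return exactly `count` distinct rows (or all rows if there are fewer)."""
    return rng.sample(rows, min(count, len(rows)))
```

With a Bernoulli-style sample the number of returned rows fluctuates around probability * len(rows), whereas the fixed-count variant always returns the requested number when enough rows exist.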
I find it more convenient to make a connection using a procedural programming language (like Python) and do these types of queries.
import psycopg2

connection_psql = psycopg2.connect( user="admin_user"
                                  , password="***"
                                  , port="5432"
                                  , database="myDB"
                                  , host="[ENDPOINT]")
cursor_psql = connection_psql.cursor()

myList = [...]
for item in myList:
    cursor_psql.execute('''
        -- The query goes here
    ''')
# commit once after all statements have run
connection_psql.commit()
cursor_psql.close()
Here is a more complex Postgres function involving a UUID array, a FOREACH loop, a CASE expression and an enum update. It visits each order, checks the condition and updates that row individually.
CREATE OR REPLACE FUNCTION order_status_update() RETURNS void AS $$
DECLARE
uid_list uuid[];
uid uuid;
BEGIN
-- "order" is a reserved word, so the table name must be double-quoted
SELECT array_agg(order_id) INTO uid_list FROM "order";
-- array_agg() returns NULL for an empty table, and FOREACH rejects a NULL array
FOREACH uid IN ARRAY coalesce(uid_list, '{}'::uuid[])
LOOP
WITH status_cmp AS (SELECT COUNT(sku)=0 AS empty,
COUNT(sku)<COUNT(sku_order_id) AS partial,
COUNT(sku)=COUNT(sku_order_id) AS full
FROM fulfillment
WHERE order_id=uid)
UPDATE "order"
SET status=CASE WHEN status_cmp.empty THEN 'EMPTY'::orderstatus
WHEN status_cmp.full THEN 'FULL'::orderstatus
WHEN status_cmp.partial THEN 'PARTIAL'::orderstatus
ELSE null
END
FROM status_cmp
WHERE order_id=uid;
END LOOP;
END;
$$ LANGUAGE plpgsql;
To run the above function
SELECT order_status_update();
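The decision made by the CASE expression can be modeled in a few lines of Python. This is a sketch for illustration only: it assumes the fulfillment rows of one order are given as (sku, sku_order_id) tuples, with None standing in for SQL NULL, and the function name order_status is made up.

```python
def order_status(fulfillment_rows):
    """Classify one order the way the CASE expression does.

    fulfillment_rows: list of (sku, sku_order_id) tuples; None models NULL.
    """
    # COUNT(column) only counts non-NULL values.
    sku_count = sum(1 for sku, _ in fulfillment_rows if sku is not None)
    line_count = sum(1 for _, line_id in fulfillment_rows if line_id is not None)
    if sku_count == 0:
        return "EMPTY"
    if sku_count == line_count:
        return "FULL"
    if sku_count < line_count:
        return "PARTIAL"
    return None  # corresponds to ELSE null
```

Because the empty check comes first, an order without any fulfillment rows is classified as EMPTY even though both counts are equal.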
Using a procedure (a BEGIN ATOMIC body requires Postgres 14 or later):
CREATE or replace PROCEDURE pg_temp_3.insert_data()
LANGUAGE SQL
BEGIN ATOMIC
INSERT INTO meta_random_sample(col_serial, parent_id)
SELECT t.*
FROM generate_series(1,25) i
CROSS JOIN LATERAL (
SELECT i, parent_id
FROM parent_tree order by random() limit 2
) t;
END;
Call the procedure.
call pg_temp_3.insert_data();
PostgreSQL manual: https://www.postgresql.org/docs/current/sql-createprocedure.html
I have a web based system that has several tables (postgres/pgsql) that hold many to many relationships such as;
table x
column_id1 smallint FK
column_id2 smallint FK
In this scenario the update is made based on column_id2
At first to update these records we would run the following function;
-- edited to protect the innocent
CREATE FUNCTION associate_id1_with_id2(integer[], integer) RETURNS integer
AS $_$
DECLARE
a alias for $1;
b alias for $2;
i integer;
BEGIN
delete from tablex where user_id = b;
FOR i IN array_lower(a,1) .. array_upper(a,1) LOOP
INSERT INTO tablex (
column_id2,
column_id1)
VALUES (
b,
a[i]);
end loop;
RETURN i;
END;
$_$
LANGUAGE plpgsql;
That seemed sloppy, and now with the addition of auditing it really shows.
What I am trying to do now is delete and insert only the necessary rows.
I have been trying various forms of the following with no luck:
CREATE OR REPLACE FUNCTION associate_id1_with_id2(integer[], integer) RETURNS integer
AS $_$
DECLARE
a alias for $1;
b alias for $2;
c varchar;
i integer;
BEGIN
c = array_to_string($1,',');
INSERT INTO tablex (
column_id2,
column_id1)
(
SELECT column_id2, column_id1
FROM tablex
WHERE column_id2 = b
AND column_id1 NOT IN (c)
);
DELETE FROM tablex
WHERE column_id2 = b
AND column_id1 NOT IN (c);
RETURN i;
END;
$_$
LANGUAGE plpgsql;
Depending on the version of the function I'm attempting, there are various errors; the current version complains about explicit type casts (I'm guessing it doesn't like c being varchar?).
First off, is my approach correct, or is there a more elegant solution, given that a couple of tables require this type of handling? If not, could you please point me in the right direction?
If this is the right approach, could you please assist with the array conversion for the NOT IN portion of the WHERE clause?
Instead of array_to_string, use unnest to transform the array into a set of rows (as if it was a table), and the problem can be solved with vanilla SQL:
INSERT INTO tablex(column_id1,column_id2)
select ai,b from unnest(a) as ai where not exists
(select 1 from tablex where column_id1=ai and column_id2=b);
DELETE FROM tablex
where column_id2=b and column_id1 not in
(select ai from unnest(a) as ai);
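The two statements amount to a set difference in each direction. A minimal Python sketch of that logic, assuming the current column_id1 values for one column_id2 and the wanted values are both given as lists (the function name sync_plan is invented for illustration):

```python
def sync_plan(existing_id1s, wanted_id1s):
    """Return (to_insert, to_delete) so that existing becomes equal to wanted."""
    existing = set(existing_id1s)
    wanted = set(wanted_id1s)
    to_insert = sorted(wanted - existing)  # the INSERT ... WHERE NOT EXISTS part
    to_delete = sorted(existing - wanted)  # the DELETE ... NOT IN part
    return to_insert, to_delete
```

Rows that are already present and still wanted are left untouched, which is what keeps the audit trail quiet. Note that the model uses sets, so duplicate values in the input collapse into one.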
I have the following procedure :
CREATE OR REPLACE FUNCTION findKNN()
RETURNS Text AS $body$
DECLARE
cur refcursor;
tempcur refcursor;
gid_ integer;
_var1 integer;
_var2 integer;
BEGIN
open cur for execute('select gid from polygons');
loop
fetch cur into gid_;
open tempcur for SELECT g1.gid , g2.gid FROM polygons AS g1, polygons AS g2
WHERE g1.gid = gid_ and g1.gid <> g2.gid ORDER BY g1.gid , ST_Distance(g1.the_geom,g2.the_geom)
LIMIT 5;
loop
fetch tempcur into _var1 , _var2;
-- how to return _var1 , _var2 here ?
end loop;
end loop;
close cur;
END;
$body$
LANGUAGE plpgsql;
But I don't know how to return the result out of this procedure. The query returns 5 rows for each execution within outer cursor loop. How can I retrieve these five rows for each query execution?
Unless you are trying to do something more complicated that is not in your question, you can radically simplify to:
CREATE OR REPLACE FUNCTION find_knn()
RETURNS TABLE(gid1 integer, gid2 integer) AS
$body$
BEGIN
RETURN QUERY
SELECT g1.gid , g2.gid
FROM polygons g1
JOIN polygons g2 ON g1.gid <> g2.gid
-- WHERE g1.gid = <some_condition> -- ???
ORDER BY g1.gid, st_distance(g1.the_geom, g2.the_geom)
LIMIT 5;
END;
$body$ LANGUAGE plpgsql;
Or even:
CREATE OR REPLACE FUNCTION find_knn()
RETURNS TABLE(gid1 integer, gid2 integer) AS
$body$
SELECT g1.gid , g2.gid
FROM polygons g1
JOIN polygons g2 ON g1.gid <> g2.gid
-- WHERE g1.gid = <some_condition> -- ???
ORDER BY g1.gid, st_distance(g1.the_geom, g2.the_geom)
LIMIT 5;
$body$ LANGUAGE sql;
Call:
SELECT * FROM find_knn();
The manual about Returning From a Function.
The manual about CREATE FUNCTION.
Retrieve a small slice of a huge join
(Answer to comment.)
There are many ways to pick a small slice of a huge join without actually evaluating the whole join. In most cases you don't even have to worry about it. For instance, run this at home:
EXPLAIN ANALYZE
SELECT *
FROM huge_tbl t1
CROSS JOIN huge_tbl t2
LIMIT 5
You will see that only 5 rows will be processed, not the whole cross join.
The same is true for a CTE:
WITH a AS (
SELECT *
FROM huge_tbl t1
CROSS JOIN huge_tbl t2
)
SELECT *
FROM a
LIMIT 5
Some limitations apply. I quote the excellent manual:
PostgreSQL's implementation evaluates only as many rows of a WITH
query as are actually fetched by the parent query.
To make absolutely sure, you could apply the LIMIT (or a fitting WHERE clause) at the source:
SELECT *
FROM (SELECT * FROM huge_tbl LIMIT 1) t1
CROSS JOIN (SELECT * FROM huge_tbl LIMIT 5) t2;