Using nested loops in Redshift - amazon-redshift

I am trying to create a nested for loop within a Redshift procedure but getting following error; SQL Error [24000]: ERROR: opening multiple cursors from within the same client connection is not allowed.
Question: What is the best way to get around it?
Example code:
create or replace procedure test.nested_loop as $$
declare
row RECORD;
row2 RECORD;
begin
for row in
select
*
from
(
select
col_name
from
tbl_name1
)
loop
execute 'some code';
for row2 in
select
*
from
(
select
col_name2
from
tbl_name2
)
loop
execute 'some code';
end loop;
end loop;
return;
end;
$$ language plpgsql;

You need to create a single cursor with all the info you need. This can be done by cross joining your tables.
create or replace procedure test.nested_loop as $$
declare
row RECORD;
begin
for row in
select
*
from
(
select
tbl_name1.col_name, tbl_name2.col_name2
from tbl_name1
cross join tbl_name2
)
loop
execute 'some code that parses row.col_name and row.col_name2';
end loop;
return;
end;
$$ language plpgsql;
If the number of loops in this code is large you will want to address this in a different way. This process runs very slow as the number of rows being pulled from the cursor grows.

Related

Postgresql query across different tables with dynamic query

I'm trying to get a customer id which can be placed in one of ten different tables. I don't want to hard code those table names to find it so I tried postgresql function as follows.
create or replace FUNCTION test() RETURNS SETOF RECORD AS $$
DECLARE
rec record;
BEGIN
select id from schema.table_0201_0228 limit 1 into rec;
return next rec;
select id from schema.table_0301_0331 limit 1 into rec;
return next rec;
END $$ language plpgsql;
select * from test() as (id int)
As I'm not familiar with postgresql function usage, how can I improve the code to replace 'schema.table1' with a variable, loop each table and return the result?
NOTE: table names may change overtime. For example, table_0201_0228 and table_0301_0331 are for February and March respectively.
You need dynamic SQL for that:
create or replace FUNCTION test(p_schema text)
RETURNS table(id int)
AS $$
DECLARE
l_tab record;
l_sql text;
BEGIN
for l_tab in (select schemaname, tablename
from pg_tables
where schemaname = p_schema)
loop
l_sql := format('select id from %I.%I limit 1', l_tab.schemaname, l_tab.tablename);
return query execute l_sql;
end loop;
END $$
language plpgsql;
I made the schema name a parameter, but of course you can hard-code it. As the function is defined as returns table there is no need to specify the column name when using it:
select *
from test('some_schema');

"INSERT INTO ... FETCH ALL FROM ..." can't be compiled

I have some function on PostgreSQL 9.6 returning a cursor (refcursor):
CREATE OR REPLACE FUNCTION public.test_returning_cursor()
RETURNS refcursor
IMMUTABLE
LANGUAGE plpgsql
AS $$
DECLARE
_ref refcursor = 'test_returning_cursor_ref1';
BEGIN
OPEN _ref FOR
SELECT 'a' :: text AS col1
UNION
SELECT 'b'
UNION
SELECT 'c';
RETURN _ref;
END
$$;
I need to write another function in which a temp table is created and all data from this refcursor are inserted to it. But INSERT INTO ... FETCH ALL FROM ... seems to be impossible. Such function can't be compiled:
CREATE OR REPLACE FUNCTION public.test_insert_from_cursor()
RETURNS table(col1 text)
IMMUTABLE
LANGUAGE plpgsql
AS $$
BEGIN
CREATE TEMP TABLE _temptable (
col1 text
) ON COMMIT DROP;
INSERT INTO _temptable (col1)
FETCH ALL FROM "test_returning_cursor_ref1";
RETURN QUERY
SELECT col1
FROM _temptable;
END
$$;
I know that I can use:
FOR _rec IN
FETCH ALL FROM "test_returning_cursor_ref1"
LOOP
INSERT INTO ...
END LOOP;
But is there better way?
Unfortunately, INSERT and SELECT don't have access to cursors as a whole.
To avoid expensive single-row INSERT, you could have intermediary functions with RETURNS TABLE and return the cursor as table with RETURN QUERY. See:
Return a query from a function?
CREATE OR REPLACE FUNCTION f_cursor1_to_tbl()
RETURNS TABLE (col1 text) AS
$func$
BEGIN
-- MOVE BACKWARD ALL FROM test_returning_cursor_ref1; -- optional, see below
RETURN QUERY
FETCH ALL FROM test_returning_cursor_ref1;
END
$func$ LANGUAGE plpgsql; -- not IMMUTABLE
Then create the temporary table(s) directly like:
CREATE TEMP TABLE t1 ON COMMIT DROP
AS SELECT * FROM f_cursor1_to_tbl();
See:
Creating temporary tables in SQL
Still not very elegant, but much faster than single-row INSERT.
Note: Since the source is a cursor only the first call succeeds. Executing the function a second time would return an empty set. You would need a cursor with the SCROLL option and move to the start for repeated calls.
This function does INSERT INTO from refcursor. It is universal for all the tables. The only requirement is that all columns of table corresponds to columns of refcursor by types and order (not necessary by names).
to_json() does the trick to convert any primitive data types to string with double-quotes "", which are later replaced with ''.
CREATE OR REPLACE FUNCTION public.insert_into_from_refcursor(_table_name text, _ref refcursor)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
_sql text;
_sql_val text = '';
_row record;
_hasvalues boolean = FALSE;
BEGIN
LOOP --for each row
FETCH _ref INTO _row;
EXIT WHEN NOT found; --there are no rows more
_hasvalues = TRUE;
SELECT _sql_val || '
(' ||
STRING_AGG(val.value :: text, ',') ||
'),'
INTO _sql_val
FROM JSON_EACH(TO_JSON(_row)) val;
END LOOP;
_sql_val = REPLACE(_sql_val, '"', '''');
_sql_val = TRIM(TRAILING ',' FROM _sql_val);
_sql = '
INSERT INTO ' || _table_name || '
VALUES ' || _sql_val;
--RAISE NOTICE 'insert_into_from_refcursor(): SQL is: %', _sql;
IF _hasvalues THEN --to avoid error when trying to insert 0 values
EXECUTE (_sql);
END IF;
END;
$$;
Usage:
CREATE TABLE public.table1 (...);
PERFORM my_func_opening_refcursor();
PERFORM public.insert_into_from_refcursor('public.table1', 'name_of_refcursor_portal'::refcursor);
where my_func_opening_refcursor() contains
DECLARE
_ref refcursor = 'name_of_refcursor_portal';
OPEN _ref FOR
SELECT ...;

Create a function to get column from multiple tables in PostgreSQL

I'm trying to create a function to get a field value from multiple tables in my database. I made script like this:
CREATE OR REPLACE FUNCTION get_all_changes() RETURNS SETOF RECORD AS
$$
DECLARE
tblname VARCHAR;
tblrow RECORD;
row RECORD;
BEGIN
FOR tblrow IN SELECT tablename FROM pg_catalog.pg_tables WHERE schemaname='public' LOOP /*FOREACH tblname IN ARRAY $1 LOOP*/
RAISE NOTICE 'r: %', tblrow.tablename;
FOR row IN SELECT MAX("lastUpdate") FROM tblrow.tablename LOOP
RETURN NEXT row;
END LOOP;
END LOOP;
END
$$
LANGUAGE 'plpgsql' ;
SELECT get_all_changes();
But it is not working, everytime it shows this error
tblrow.tablename" not defined in line "FOR row IN SELECT MAX("lastUpdate") FROM tblrow.tablename LOOP"
Your inner FOR loop must use the FOR...EXECUTE syntax as shown in the manual:
FOR target IN EXECUTE text_expression [ USING expression [, ... ] ] LOOP
statements
END LOOP [ label ];
In your case something along this line:
FOR row IN EXECUTE 'SELECT MAX("lastUpdate") FROM ' || quote_ident(tblrow.tablename) LOOP
RETURN NEXT row;
END LOOP
The reason for this is explained in the manual somewhere else:
Oftentimes you will want to generate dynamic commands inside your PL/pgSQL functions, that is, commands that will involve different tables or different data types each time they are executed. PL/pgSQL's normal attempts to cache plans for commands (as discussed in Section 39.10.2) will not work in such scenarios. To handle this sort of problem, the EXECUTE statement is provided[...]
Answer to your new question (mislabeled as answer):
This can be much simpler. You do not need to create a table just do define a record type.
If at all, you would better create a type with CREATE TYPE, but that's only efficient if you need the type in multiple places. For just a single function, you can use RETURNS TABLE instead :
CREATE OR REPLACE FUNCTION get_all_changes(text[])
RETURNS TABLE (tablename text
,"lastUpdate" timestamp with time zone
,nums integer) AS
$func$
DECLARE
tblname text;
BEGIN
FOREACH tblname IN ARRAY $1 LOOP
RETURN QUERY EXECUTE format(
$f$SELECT '%I', MAX("lastUpdate"), COUNT(*)::int FROM %1$I
$f$, tblname)
END LOOP;
END
$func$ LANGUAGE plpgsql;
A couple more points:
Use RETURN QUERY EXECUTE instead of the nested loop. Much simpler and faster.
Column aliases would only serve as documentation, those names are discarded in favor of the names declared in the RETURNS clause (directly or indirectly).
Use format() with %I to replace the concatenation with quote_ident() and %1$I to refer to the same parameter another time.
count() usually returns type bigint. Cast the integer, since you defined the column in the return type as such: count(*)::int.
Thanks,
I finally made my script like:
CREATE TABLE IF NOT EXISTS __rsdb_changes (tablename text,"lastUpdate" timestamp with time zone, nums bigint);
CREATE OR REPLACE FUNCTION get_all_changes(varchar[]) RETURNS SETOF __rsdb_changes AS /*TABLE (tablename varchar(40),"lastUpdate" timestamp with time zone, nums integer)*/
$$
DECLARE
tblname VARCHAR;
tblrow RECORD;
row RECORD;
BEGIN
FOREACH tblname IN ARRAY $1 LOOP
/*RAISE NOTICE 'r: %', tblrow.tablename;*/
FOR row IN EXECUTE 'SELECT CONCAT('''|| quote_ident(tblname) ||''') AS tablename, MAX("lastUpdate") AS "lastUpdate",COUNT(*) AS nums FROM ' || quote_ident(tblname) LOOP
/*RAISE NOTICE 'row.tablename: %',row.tablename;*/
/*RAISE NOTICE 'row.lastUpdate: %',row."lastUpdate";*/
/*RAISE NOTICE 'row.nums: %',row.nums;*/
RETURN NEXT row;
END LOOP;
END LOOP;
RETURN;
END
$$
LANGUAGE 'plpgsql' ;
Well, it works. But it seems I can only create a table to define the return structure instead of just RETURNS SETOF RECORD. Am I right?
Thanks again.

postgresql copy with schema support

I'm trying to load some data from CSV using the postgresql COPY command. The trick is that I'd like to implement multi-tenancy on a userid (which is contained in the CSV). Is there an easy way to tell the postgres copy command to filter based on this userid when loading the csv?
i.e. all rows with userid=x go to schema=x, rows with userid=y go to schema=y.
There is not a way of doing this with just the COPY command, but you could copy all your data into a master table, and then put together a simple PL/PGSQL function that does this for you. Something like this -
CREATE OR REPLACE FUNCTION public.spike()
RETURNS void AS
$BODY$
DECLARE
user_id integer;
destination_schema text;
BEGIN
FOR user_id IN SELECT userid FROM master_table GROUP BY userid LOOP
CASE user_id
WHEN 1 THEN
destination_schema := 'foo';
WHEN 2 THEN
destination_schema := 'bar';
ELSE
destination_schema := 'baz';
END CASE;
EXECUTE 'INSERT INTO '|| destination_schema ||'.my_table SELECT * FROM master_table WHERE userid=$1' USING user_id;
-- EXECUTE 'DELETE FROM master_table WHERE userid=$1' USING user_id;
END LOOP;
TRUNCATE TABLE master_table;
RETURN;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE
COST 100;
This gets all unique user_ids from the master_table, uses a CASE statement to determine the destination schema, and then executes an INSERT SELECT to move rows, and finally deletes the moved rows.

I want to have my pl/pgsql script output to the screen

I have the following script that I want output to the screen from.
CREATE OR REPLACE FUNCTION randomnametest() RETURNS integer AS $$
DECLARE
rec RECORD;
BEGIN
FOR rec IN SELECT * FROM my_table LOOP
SELECT levenshtein('mystring',lower('rec.Name')) ORDER BY levenshtein;
END LOOP;
RETURN 1;
END;
$$ LANGUAGE plpgsql;
I want to get the output of the levenshein() function in a table along with the rec.Name. How would I do that? Also, it is giving me an error about the line where I call levenshtein(), saying that I should use perform instead.
Assuming that you want to insert the function's return value and the rec.name into a different table. Here is what you can do (create the table new_tab first)-
SELECT levenshtein('mystring',lower(rec.Name)) AS L_val;
INSERT INTO new_tab (L_val, rec.name);
The usage above is demonstrated below.
I guess, you can use RAISE INFO 'This is %', rec.name; to view the values.
CREATE OR REPLACE FUNCTION randomnametest() RETURNS integer AS $$
DECLARE
rec RECORD;
BEGIN
FOR rec IN SELECT * FROM my_table LOOP
SELECT levenshtein('mystring',lower(rec.Name))
AS L_val;
RAISE INFO '% - %', L_val, rec.name;
END LOOP;
RETURN 1;
END;
$$ LANGUAGE plpgsql;
Note- the FROM clause is optional in case you select from a function in a select like netxval(sequence_name) and don't have any actual table to select from i.e. like SELECT nextval(sequence_name) AS next_value;, in Oracle terms it would be SELECT sequence_name.nextval FROM dual; or SELECT function() FROM dual;. There is no dual in postgreSQL.
I also think that the ORDER BY is not necessary since my assumption would be that your function levenshtein() will most likely return only one value at any point of time, and hence wouldn't have enough data to ORDER.
If you want the output from a plpgsql function like the title says:
CREATE OR REPLACE FUNCTION randomnametest(_mystring text)
RETURNS TABLE (l_dist int, name text) AS
$BODY$
BEGIN
RETURN QUERY
SELECT levenshtein(_mystring, lower(t.name)), t.name
FROM my_table t
ORDER BY 1;
END;
$$ LANGUAGE plpgsql;
Declare the table with RETURNS TABLE.
Use RETURN QUERY to return records from the function.
Avoid naming conflicts between column names and OUT parameters (from the RETURNS TABLE clause) by table-qualifying column names in queries. OUT parameters are visible everywhere in the function body.
I made the string to compare to a parameter to the function to make this more useful.
There are other ways, but this is the most effective for the task. You need PostgreSQL 8.4 or later.
For a one-time use I would consider to just use a plain query (= function body without the RETURN QUERY above).