Stored procedure in Redshift to use string replacement with unload - amazon-redshift

I want to achieve something along the following lines:
create or replace procedure my_sp_name(p_date date) as
$$
begin
unload ('
select *
from my_table
where my_date = ''DATE_GOES_HERE''
')
to 's3://my-bucket/some_key/DATE_GOES_HERE/some_file'
iam_role 'my_iam_role'
allowoverwrite csv parallel off;
end ;
$$
language plpgsql;
I have been trying different things, but I can't seem to find the correct way to do the string replacement in both parts of the UNLOAD. What is the correct way of doing this?

You need to EXECUTE the SQL statement as a string in the procedure. See the "Dynamic SQL" section of https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html.
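For example, a minimal sketch of the procedure from the question (untested; it keeps the hypothetical bucket, table and IAM role names, and assumes quote_literal() and to_char() behave in Redshift plpgsql as they do in Postgres) builds the UNLOAD command as one string and runs it with EXECUTE:
create or replace procedure my_sp_name(p_date date) as
$$
declare
  v_date text := to_char(p_date, 'YYYY-MM-DD');
  v_sql  text;
begin
  -- build the whole UNLOAD command as a string;
  -- quote_literal() handles the nested quoting of the inner SELECT
  v_sql := 'unload ('
        || quote_literal('select * from my_table where my_date = ' || quote_literal(v_date))
        || ') to '
        || quote_literal('s3://my-bucket/some_key/' || v_date || '/some_file')
        || ' iam_role ''my_iam_role'' allowoverwrite csv parallel off';
  execute v_sql;
end;
$$
language plpgsql;
Called with something like CALL my_sp_name('2021-01-01'), the date then ends up both in the WHERE clause and in the S3 key.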

Related

Declare and return value for DELETE and INSERT

I am trying to remove duplicated data from some of our databases based upon unique id's. All deleted data should be stored in a separate table for auditing purposes. Since it concerns quite a few databases and different schemas and tables, I wanted to start using variables to reduce the chance of errors and the amount of work it will take me.
This is the best example query I could think of, but it doesn't work:
do $$
declare #source_schema varchar := 'my_source_schema';
declare #source_table varchar := 'my_source_table';
declare #target_table varchar := 'my_target_schema' || source_table || '_duplicates'; --target schema and appendix are always the same, source_table is a variable input.
declare #unique_keys varchar := ('1', '2', '3')
begin
select into #target_table
from #source_schema.#source_table
where id in (#unique_keys);
delete from #source_schema.#source_table where export_id in (#unique_keys);
end ;
$$;
The query syntax works with hard-coded values.
Most of the time my variables are perceived as columns or not recognized at all. :(
You need to create and then call a plpgsql procedure with input parameters:
CREATE OR REPLACE PROCEDURE duplicates_suppress
(my_target_schema text, my_source_schema text, my_source_table text, unique_keys text[])
LANGUAGE plpgsql AS
$$
BEGIN
EXECUTE FORMAT(
'WITH list AS (INSERT INTO %1$I.%3$I_duplicates SELECT * FROM %2$I.%3$I WHERE array[id] <@ %4$L :: integer[] RETURNING id)
DELETE FROM %2$I.%3$I AS t USING list AS l WHERE t.id = l.id', my_target_schema, my_source_schema, my_source_table, unique_keys :: text) ;
END ;
$$ ;
The procedure duplicates_suppress inserts into my_target_schema.my_source_table || '_duplicates' the rows from my_source_schema.my_source_table whose id is in the array unique_keys and then deletes these rows from the table my_source_schema.my_source_table .
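Called with the question's sample values, that could look like this (hypothetical names):
CALL duplicates_suppress('my_target_schema', 'my_source_schema', 'my_source_table', ARRAY['1', '2', '3']);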
See the test result in dbfiddle.
As has been commented, you need some kind of dynamic SQL: in a FUNCTION, a PROCEDURE or a DO statement to do it on the server.
You should be comfortable with PL/pgSQL. Dynamic SQL is no beginners' toy.
Example with a PROCEDURE, like Edouard already suggested. You'll need a FUNCTION instead to wrap it in an outer transaction (like you very well might). See:
When to use stored procedure / user-defined function?
CREATE OR REPLACE PROCEDURE pg_temp.f_archive_dupes(_source_schema text, _source_table text, _unique_keys int[], OUT _row_count int)
LANGUAGE plpgsql AS
$proc$
-- target schema and appendix are always the same, source_table is a variable input
DECLARE
_target_schema CONSTANT text := 's2'; -- hardcoded
_target_table text := _source_table || '_duplicates';
_sql text := format(
'WITH del AS (
DELETE FROM %I.%I
WHERE id = ANY($1)
RETURNING *
)
INSERT INTO %I.%I TABLE del', _source_schema, _source_table
, _target_schema, _target_table);
BEGIN
RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _unique_keys; -- execute
GET DIAGNOSTICS _row_count = ROW_COUNT;
END
$proc$;
Call:
CALL pg_temp.f_archive_dupes('s1', 't1', '{1, 3}', 0);
db<>fiddle here
I made the procedure temporary, since I assume you don't need to keep it permanently. Create it once per database. See:
How to create a temporary function in PostgreSQL?
Passed schema and table names are case-sensitive strings! (Unlike unquoted identifiers in plain SQL.) Either way, be wary of SQL-injection when concatenating SQL dynamically. See:
Are PostgreSQL column names case-sensitive?
Table name as a PostgreSQL function parameter
Made _unique_keys type int[] (array of integer) since your sample values look like integers. Use the actual data type of your id columns!
The variable _sql holds the query string, so it can easily be debugged before actually executing; RAISE NOTICE '%', _sql; serves that purpose.
I suggest commenting out the EXECUTE line until you are sure.
I made the PROCEDURE return the number of processed rows. You didn't ask for that, but it's typically convenient. At hardly any cost. See:
Dynamic SQL (EXECUTE) as condition for IF statement
Best way to get result count before LIMIT was applied
Last, but not least, use DELETE ... RETURNING * in a data-modifying CTE. Since that has to find rows only once it comes at about half the cost of separate SELECT and DELETE. And it's perfectly safe. If anything goes wrong, the whole transaction is rolled back anyway.
Two separate commands can also run into concurrency issues or race conditions which are ruled out this way, as DELETE implicitly locks the rows to delete. Example:
Replicating data between Postgres DBs
Or you can build the statements in a client program. Like psql, and use \gexec. Example:
Filter column names from existing table for SQL DDL statement
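A rough sketch of that \gexec route (hypothetical schema names s1/s2, and it assumes the *_duplicates tables already exist): the query generates one archiving statement per table, and \gexec then executes each result row as a statement.
SELECT format('WITH del AS (DELETE FROM s1.%I WHERE id = ANY (''{1,3}'') RETURNING *) INSERT INTO s2.%I TABLE del'
            , tablename, tablename || '_duplicates')
FROM   pg_tables
WHERE  schemaname = 's1';
\gexec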
Based on Erwin's answer, minor optimization...
create or replace procedure pg_temp.p_archive_dump
(_source_schema text, _source_table text,
_unique_key int[],_target_schema text)
language plpgsql as
$$
declare
_row_count bigint;
_target_table text := '';
BEGIN
_target_table := quote_ident(_source_table) || '_' || array_to_string(_unique_key, '_');
raise notice 'the deleted table records will be stored in %.%', _target_schema, _target_table;
execute format('create table %I.%I as select * from %I.%I limit 0',_target_schema, _target_table,_source_schema,_source_table );
execute format('with mm as ( delete from %I.%I where id = any (%L) returning * ) insert into %I.%I table mm'
,_source_schema,_source_table,_unique_key, _target_schema, _target_table);
GET DIAGNOSTICS _row_count = ROW_COUNT;
RAISE notice 'rows affected: %', _row_count;
end
$$;
If your _unique_key array is not too long, this solution also creates the target table (named after the keys) for you. Obviously you need to create the target schema yourself.
If your unique_key array is too long, you can customize the procedure to rename the dumped table properly.
Let's call it.
call pg_temp.p_archive_dump('s1','t1', '{1,2}','s2');
s1 is the source schema, t1 is the source table, {1,2} are the unique keys you want to extract to the new table, and s2 is the target schema.

How to create a stored procedure including the "SELECT" in Oracle SQL Developer?

I am a novice Oracle user and I am creating a stored procedure to display data from a table, because my learning process requires it. At first I ran my query as follows.
Create or replace procedure p_mostrar
Is
Begin
Select ID_MODULO, NOMBRE, URL, ESTADO, ICONO FROM MODULO WHERE ESTADO=1 ;
Commit;
End p_mostrar;
And it throws me the following error:
"an INTO clause is expected in this SELECT statement". After some research I changed the syntax and ran it as follows:
Create or replace procedure p_mostrar (C1 out sys_refcursor)
Is
Begin Open C1 for Select ID_MODULO, NOMBRE, URL, ESTADO, ICONO
FROM MODULO
WHERE ESTADO=1 ;
Commit;
End p_mostrar;
And I think it runs correctly. But now I do not know how to run the procedure. I thank you in advance and hope for a prompt response. Remember, I'm learning with Oracle SQL Developer.
When you are dealing with a SELECT statement inside a stored procedure, you need to add an INTO clause to the SELECT statement to store the selected values in variables. Try this:
CREATE OR REPLACE PROCEDURE p_mostrar
IS
v_id_modulo modulo.id_modulo%TYPE;
v_nombre modulo.nombre%TYPE;
v_url modulo.url%TYPE;
v_estado modulo.estado%TYPE;
v_icono modulo.icono%TYPE;
BEGIN
SELECT id_modulo, nombre, url, estado, icono
INTO v_id_modulo, v_nombre, v_url, v_estado, v_icono --needed to catch the values selected and store it to declared variables
FROM modulo
WHERE estado=1 ;
Commit; -- I don't think this is necessary; you only need COMMIT when DML statements change data
dbms_output.put_line(v_id_modulo||' '|| v_nombre||' '||v_url||' '||v_estado||' '||v_icono); --used to display the values stored in the variables
END;
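To run it and see the DBMS_OUTPUT line in SQL*Plus or the SQL Developer worksheet, something like this should do (a sketch):
SET SERVEROUTPUT ON
BEGIN
  p_mostrar;
END;
/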
This should work:
var result refcursor
execute p_mostrar(:result)
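To actually see the rows held by the cursor, PRINT the bind variable afterwards (works in SQL*Plus and the SQL Developer worksheet):
print result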

Use function variable in dynamic COPY statement

According to the PostgreSQL docs, it is possible to copy data to a CSV file right from a query, without using an intermediate table. I am curious how to do that.
CREATE OR REPLACE FUNCTION m_tbl(my_var integer)
RETURNS void AS
$BODY$
DECLARE
BEGIN
COPY (
select my_var
)
TO 'c:/temp/out.csv';
END;
$BODY$ LANGUAGE plpgsql;
I get an error: no such column 'my_var'.
Yes, it is possible to COPY from any query, whether or not it refers to a table.
However, COPY is a non-plannable statement, a utility statement. It doesn't support query parameters - and query parameters are how PL/PgSQL implements the insertion of variables into statements.
So you can't use PL/PgSQL variables with COPY.
You must instead use dynamic SQL with EXECUTE. See the Pl/PgSQL documentation for examples. There are lots of examples here on Stack Overflow and on https://dba.stackexchange.com/ too.
Something like:
EXECUTE format('
COPY (
select %L
)
TO ''c:/temp/out.csv'';
', my_var);
The same applies if you want the file path to be dynamic - you'd use:
EXECUTE format('
COPY (
select %L
)
TO %L;
', my_var, 'file_name.csv');
It also works for dynamic column names but you would use %I (for identifier, like "my_name") instead of %L for literal like 'my_value'. For details on %I and %L, see the documentation for format.
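Putting that back into the original function, a minimal sketch (same hypothetical output path as the question; remember that server-side COPY ... TO writes a file on the database server and needs the corresponding file permissions there):
CREATE OR REPLACE FUNCTION m_tbl(my_var integer)
RETURNS void AS
$BODY$
BEGIN
  -- format() inlines the variable as a properly quoted literal,
  -- because COPY cannot take query parameters
  EXECUTE format('COPY (SELECT %L::int AS my_var) TO %L',
                 my_var, 'c:/temp/out.csv');
END;
$BODY$ LANGUAGE plpgsql;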

How to get postgres (8.4) query results with unknown columns

Edit: After posting I found Erwin Brandstetter's answer to a similar question. It sounds like in 9.2+ I could use the last option he listed, but none of the other alternatives sound workable for my situation. However, the comment from Jakub Kania, reiterated by Craig Ringer, suggesting I use COPY, or \copy, in psql appears to solve my problem.
My goal is to get the results of executing a dynamically created query into a text file.
The names and number of columns are unknown; the query generated at run time is a 'pivot' one, and the names of columns in the SELECT list are taken from values stored in the database.
What I envision is being able, from the command line to run:
$ psql -o "myfile.txt" -c "EXECUTE mySQLGeneratingFunction(param1, param2)"
But what I'm finding is that I can't get results from an EXECUTEd query unless I know the number of columns and their types that are in the results of the query.
create or replace function carrier_eligibility.createSQL() returns varchar AS
$$
begin
return 'SELECT * FROM carrier_eligibility.rule_result';
-- actual procedure writes a pivot query whose columns aren't known til run time
end
$$ language plpgsql;
create or replace function carrier_eligibility.RunSQL() returns setof record AS
$$
begin
return query EXECUTE carrier_eligibility.createSQL();
end
$$ language plpgsql;
-- this works, but I want to be able to get the results into a text file without knowing
-- the number of columns
select * from carrier_eligibility.RunSQL() AS (id int, uh varchar, duh varchar, what varchar)
Using psql isn't a requirement. I just want to get the results of the query into a text file, with the column names in the first row.
What format of a text file do you want? Something like csv?
How about something like this:
CREATE OR REPLACE FUNCTION sql_to_csv(in_sql text) returns setof text
SECURITY INVOKER -- CRITICAL DO NOT CHANGE THIS TO SECURITY DEFINER
LANGUAGE PLPGSQL AS
$$
DECLARE t_row RECORD;
t_out text;
BEGIN
FOR t_row IN EXECUTE in_sql LOOP
t_out := t_row::text;
t_out := regexp_replace(regexp_replace(t_out, E'^\\(', ''), E'\\)$', '');
return next t_out;
END LOOP;
END;
$$;
This should create properly quoted csv strings without the header. Embedded newlines may be a problem but you could write a quick Perl script to connect and write the data or something.
Note this presumes that the tuple structure (parenthesized csv) does not change with future versions, but it currently should work with 8.4 at least through 9.2.
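For example, from psql you could switch to tuples-only output (so psql adds no header or footer of its own) and redirect the rows to a file; carrier_eligibility.createSQL() here is the question's hypothetical query generator:
\t on
\o myfile.txt
SELECT * FROM sql_to_csv(carrier_eligibility.createSQL());
\o
\t off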

dynamic sql query in postgres

I was attempting to use Dynamic SQL to run some queries in postgres.
Example:
EXECUTE format('SELECT * from result_%s_table', quote_ident((select id from ids where condition = some_condition)))
I have to query a table, which is of the form result_%s_table, wherein I need to substitute the correct table name (an id) from another table.
I get the error ERROR: prepared statement "format" does not exist
Link: string substitution with query result postgresql
EXECUTE ... USING only works in PL/PgSQL - ie within functions or DO blocks written in the PL/PgSQL language. It does not work in plain SQL; the EXECUTE in plain SQL is completely different, for executing prepared statements. You cannot use dynamic SQL directly in PostgreSQL's SQL dialect.
Compare:
PL/PgSQL's EXECUTE ... USING; to
SQL's EXECUTE
See the second-to-last paragraph in my prior answer.
In addition to not running except in PL/PgSQL, your SQL statement is wrong; it won't do what you expect. If (select id from ids where condition = some_condition) returns, say, 42, the statement would fail if id is an integer. If it's cast to text you'd get:
EXECUTE format('SELECT * from result_%s_table', quote_ident('42'));
EXECUTE format('SELECT * from result_%s_table', '"42"');
EXECUTE 'SELECT * from result_"42"_table';
That's invalid. You actually want result_42_table or "result_42_table". You'd have to write something more like:
EXECUTE format('SELECT * from %s', quote_ident('result_'||(select id from ids where condition = some_condition)||'_table'))
... if you must use quote_ident.
CREATE OR REPLACE FUNCTION public.exec(text)
RETURNS SETOF RECORD
LANGUAGE plpgsql
AS $BODY$
BEGIN
RETURN QUERY EXECUTE $1 ;
END
$BODY$;
usage:
select * from exec('select now()') as t(dt timestamptz)
Try using
RETURN QUERY EXECUTE '<SQL Command>'
This will return the data in the form of a table. You have to use this inside a stored function in PostgreSQL.
I have already created a full demonstration of custom filtering and custom sorting using dynamic queries in PostgreSQL.
Please visit this URL:
http://www.dbrnd.com/2015/05/postgresql-dynamic-sql/
These all look more complicated than the OP's question. A different formatting should do the trick... but it could absolutely be the case that I don't understand.
From how I read the OP's question, I think others in a similar situation may benefit from how I solved it.
I am using Postgres on Redshift, and I ran into this issue and found a solution.
I was trying to create a dynamic query, putting in my own date.
import datetime as dt
my_date = dt.date(2018, 10, 30)
query = ''' select * from table where date >= ''' + str(my_date) + ''' order by date '''
But the query entirely ignores the condition when written this way.
However, if you use the percent sign (%), you can insert the date correctly.
One correct way to write the above statement is:
query = ''' select * from table where date >= ''' + ''' '%s' ''' % my_date + ''' order by date '''
So, maybe this is helpful, or maybe it is not. I hope it helps at least one person in my situation!
Best wishes.
EXECUTE will work only in a PL/pgSQL environment.
Instead of EXECUTE, try SELECT:
SELECT format('SELECT * from result_%s_table', quote_ident((select id from ids where condition = some_condition)))
The output would be the dynamic query.