Deleting old rows in a schema using stored procedure - postgresql

Is there a way to select all tables in a schema and delete rows that do not fit the condition (older than some date) in one procedure? I can do the same thing using 2 separate queries,
It would look something like: SELECT table_name FROM information_schema.tables WHERE table_schema = 'schemaName' and then DELETE FROM table_name WHERE time < now()-'12 months'::interval;" but cannot put my head on how to do the same using one stored procedure, I assume I should use for loop on some type of select query, but since I never realy worked with loops in postgres I always get some type of exception trying to do this.
Any help appreciated a lot

You can typically combine both SELECT and DELETE statements using a CTE :
WITH list AS
( SELECT ...
FROM ...
WHERE ...
RETURNING ...
)
DELETE FROM ...
USING list
WHERE ...
Except the fact that in your case, the returned parameter of the SELECT statement corresponds to the name of the table where rows must be deleted. This means that the table name is not known before the run time, and this implies to use a dynamic sql command within a plpgsql FUNCTION :
CREATE OR REPLACE FUNCTION delete_rows_from_table(table_name text)
RETURNS void LANGUAGE plpgsql AS
$$
BEGIN
EXECUTE E'
DELETE FROM ' || quote_ident(table_name) || E'
WHERE time < now()- \'12 months\':: interval' ;
END ;
$$ ;
Then you can use the FUNCTION delete_rows_from_table in your SELECT statement :
SELECT delete_rows_from_table(table_name)
FROM information_schema.tables
WHERE table_schema = 'schemaName'

Related

how to iterate in all schemas and find count from all tables present in all schemas with same table name for every 5mins?

imagine there are 5 schemas in my database and in every schema there is a common name table (ex:- table1) after every 5mins records get inserted in table1, how I can iterate in all schemas n calculate the count of table1[i have to automate the process so i am going to write the code in function and call that function after every 5mins using crontab].
Basically 2 options: Hard code schema.table and union the results. So something like:
create or replace function count_rows_in_each_table1()
returns table (schema_name text, number_or_rows integer)
language sql
as $$
select 'schema1', count(*) from schema1.table1 union all
select 'schema2', count(*) from schema2.table1 union all
select 'schema3', count(*) from schema3.table1 union all
...
select 'scheman', count(*) from scheman.table1;
$$;
The alternative being building the query dynamically from information_scheme.
create or replace function count_rows_in_each_table1()
returns table (schema_name text, number_of_rows bigint)
language plpgsql
as $$
declare
c_rows_count cursor is
select table_schema::text
from information_schema.tables
where table_name = 'table1';
l_tbl record;
l_sql_statement text = '';
l_connector text = '';
l_base_select text = 'select ''%s'', count(*) from %I.table1';
begin
for l_tbl in c_rows_count
loop
l_sql_statement = l_sql_statement ||
l_connector ||
format (l_base_select, l_tbl.table_schema, l_tbl.table_schema);
l_connector = ' union all ';
end loop;
raise notice E'Running Query: \n%', l_sql_statement;
return query execute l_sql_statement;
end;
$$;
Which is better. With few schema and few schema add/drop, opt for the first. It is direct and easily shows what you are doing. If you add/drop schema often then opt for the second. If you have many schema, but seldom add/drop them then modify the second to generate the first, save and schedule execution of the generated query.
NOTE: Not tested

Get PostgreSQL resultset column types without executing query using psycopg2

Given an arbitrary PostgreSQL query, e.g. SELECT * FROM (...) AS T, how can I get resultset column names and types WITHOUT actually executing the query using psycopg2 Python3 library?
I saw JDBC solution using getMetaData(), but I cannot figure out how to get that same information in psycopg2.
PreparedStatement pstmt = con.prepareStatement("select ... ");
ResultSetMetaData meta = pstmt.getMetaData();
for (int i=1; i <= meta.getColumnCount(); i++) { ... }
As far as I understand, that's not possible without executing(running cur.execute())
But, If you want a Postgres solution using a function that can be used by Psycopg2 as a query, you may use this solution. As you were expecting, this will not execute your query, it simply creates a temporary View which allows us to query it's metadata using the catalog information_schema.columns
CREATE OR REPLACE function define_query(query text)
RETURNS TABLE( column_name text,data_type text)
LANGUAGE plpgsql AS
$$
DECLARE
v_view_n TEXT := 'temp_view$';
BEGIN
EXECUTE format( 'CREATE OR REPLACE TEMP VIEW %I AS %s', v_view_n,query);
RETURN QUERY select i.column_name::text, i.data_type ::text
from information_schema.columns i where i.table_name = v_view_n;
END $$;
Once you've got this function, you can get the definition of any query by simply calling this function and not executing it.
knayak=# select * from define_query('select 1::int as a,''TWO''::text as b');
column_name | data_type
-------------+-----------
a | integer
b | text
(2 rows)
I think you have to execute something at least.
If you don't want any rows returned, you can query like select * from XXX where false. This query will return types of columns to client with 0 rows.

How to use variable as table name in plpgsql

I'm new to plpgsql. I'm trying to run a simple query in plpgsql using a variable as table name in plpgsql. But the variable is being interpreted as the table name instead of the value of the variable being interpreted as variable name.
DECLARE
v_table text;
z_table text;
max_id bigint;
BEGIN
FOR v_table IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'my_database'
AND table_schema = 'public'
AND table_name not like 'z_%'
LOOP
z_table := 'z_' || v_table;
SELECT max(id) from z_table INTO max_id;
DELETE FROM v_table where id > max_id;
END LOOP;
Some background information. For every table in my database, I have another table starting with "z_". E.g. for a table called "employee" I have identical table called "z_employee". z_employee contains the same set of data as employee. I use it to restore the employee table at the start of every test.
When I run this function I get the following error:
ERROR: relation "z_table" does not exist
LINE 1: SELECT max(id) from z_table
My guess is that I'm not allowed to use the variable z_table in the SQL query. At least not the way I'm using it here. But I don't know how it's supposed to be done.
Use dynamic SQL with EXECUTE, simplify, and escape identifiers properly:
CREATE OR REPLACE FUNCTION f_test()
RETURNS void AS
$func$
DECLARE
v_table text;
BEGIN
FOR v_table IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'my_database'
AND table_schema = 'public'
AND table_name NOT LIKE 'z_%'
LOOP
EXECUTE format('DELETE FROM %I v WHERE v.id > (SELECT max(id) FROM %I)'
, v_table, 'z_' || v_table);
END LOOP;
END
$func$ LANGUAGE plpgsql;
Table names may need to be quoted to defend against syntax errors or even SQL injection! I use the convenient format() to concatenate the DELETE statement and escape identifiers properly.
A separate SELECT would be more expensive. You can do it all with a single DELETE statement.
Related:
Table name as a PostgreSQL function parameter
Aside:
You might use the (slightly faster) system catalog pg_tables instead:
SELECT tablename
FROM pg_catalog.pg_tables
WHERE schemaname = 'public'
AND tablename NOT LIKE 'z_%'
See:
How to check if a table exists in a given schema
table_catalog in information_schema.tables has no equivalent here. Only tables of the current database are visible anyway. So the above predicate WHERE table_catalog = 'my_database' produces an empty result set when connected to the wrong database.

Saving the output of a dynamic query that uses prepare statement into a table

In continuation to a previous case (a solution by #Erwin Brandstetter), in which a dynamic SELECT query that uses a 'prepare' statement was created and then was executed by calling it, as below:
--the function for making the prepare statement:
CREATE OR REPLACE FUNCTION f_prep_query (_tbl regclass, _prefix text)
RETURNS void AS
$func$
DECLARE
_prep_qry text := (
SELECT 'PREPARE stmt_dyn AS SELECT '
|| string_agg(quote_ident(attname), ',' ORDER BY attname)
|| ' FROM ' || _tbl
FROM pg_attribute
WHERE attrelid = _tbl
AND attname LIKE _prefix || '%'
AND attnum > 0
AND NOT attisdropped
);
BEGIN
EXECUTE _prep_qry;
EXCEPTION WHEN duplicate_prepared_statement THEN
DEALLOCATE stmt_dyn;
EXECUTE _prep_qry;
END
$func$ LANGUAGE plpgsql;
--the calling:
BEGIN; -- optional
SELECT f_prep_query('dkj_p_k27ac'::regclass, 'enri'::text);
EXECUTE stmt_dyn;
I would like to ask the following:
The desired output that we get from the indicated procedure is outputted into the DataOutput.
I would like to find a way to store the data into a new table in the db.
Generally, if you just want to write to a table, don't use a prepared SELECT statement (or a cursor). That's very inefficient for the purpose.
Write to the table directly like explained in the previous answer:
Saving the output of a dynamic query that uses refcursor into a table
The complete INSERT could be the prepared statement. But not CREATE TABLE AS. Per documentation:
Any SELECT, INSERT, UPDATE, DELETE, or VALUES statement.

How to execute a string result of a stored procedure in postgres

I have created the following stored procedure, which basically receives a name of table, and a prefix. The function then finds all columns that share this prefix and returns as an output a 'select' query command ('myoneliner').
as follows:
CREATE OR REPLACE FUNCTION mytext (mytable text, myprefix text)
RETURNS text AS $myoneliner$
declare
myoneliner text;
BEGIN
SELECT 'SELECT ' || substr(cols,2,length(cols)-2) ||' FROM '||mytable
INTO myoneliner
FROM (
SELECT array(
SELECT DISTINCT quote_ident(column_name::text)
FROM information_schema.columns
WHERE table_name = mytable
AND column_name LIKE myprefix||'%'
order by quote_ident
)::text cols
) sub;
RETURN myoneliner;
END;
$myoneliner$ LANGUAGE plpgsql;
Call:
select mytext('dkj_p_k27ac','enri');
As a result of running this stored procedure and the 'select' that is following it, I get the following output at the Data Output window (all within one cell, named "mytext text"):
'SELECT enrich_d_dkj_p_k27ac,enrich_lr_dkj_p_k27ac,enrich_r_dkj_p_k27ac
FROM dkj_p_k27ac'
I would like to basically be able to take the output command line that I received as an output and execute it. In other words, I would like to be able and execute the output of my stored procedure.
How can I do so?
I tried the following:
CREATE OR REPLACE FUNCTION mytext (mytable text, myprefix text)
RETURNS SETOF RECORD AS $$
declare
smalltext text;
myoneliner text;
BEGIN
SELECT 'SELECT ' || substr(cols,2,length(cols)-2) ||' FROM '||mytable
INTO myoneliner
FROM (
SELECT array(
SELECT DISTINCT quote_ident(column_name::text)
FROM information_schema.columns
WHERE table_name = mytable
AND column_name LIKE myprefix||'%'
order by quote_ident
)::text cols
) sub;
smalltext=lower(myoneliner);
raise notice '%','my additional text '||smalltext;
RETURN QUERY EXECUTE smalltext;
END;
$$ LANGUAGE plpgsql;
Call function:
SELECT * from mytext('dkj_p_k27ac','enri');
But I'm getting the following error message, could you please advise what should I change in order for it to execute?:
ERROR: a column definition list is required for functions returning "record"
LINE 26: SELECT * from mytext('dkj_p_k27ac','enri');
********** Error **********
ERROR: a column definition list is required for functions returning "record"
SQL state: 42601
Character: 728
Your first problem was solved by using dynamic SQL with EXECUTE like Craig advised.
But the rabbit hole goes deeper:
CREATE OR REPLACE FUNCTION myresult(mytable text, myprefix text)
RETURNS SETOF RECORD AS
$func$
DECLARE
smalltext text;
myoneliner text;
BEGIN
SELECT INTO myoneliner
'SELECT '
|| string_agg(quote_ident(column_name::text), ',' ORDER BY column_name)
|| ' FROM ' || quote_ident(mytable)
FROM information_schema.columns
WHERE table_name = mytable
AND column_name LIKE myprefix||'%'
AND table_schema = 'public'; -- schema name; might be another param
smalltext := lower(myoneliner); -- nonsense
RAISE NOTICE 'My additional text: %', myoneliner;
RETURN QUERY EXECUTE myoneliner;
END
$func$ LANGUAGE plpgsql;
Major points
Don't cast the whole statement to lower case. Column names might be double-quoted with upper case letters, which are case-sensitive in this case (no pun intended).
You don't need DISTINCT in the query on information_schema.columns. Column names are unique per table.
You do need to specify the schema, though (or use another way to single out one schema), or you might be mixing column names from multiple tables of the same name in multiple schemas, resulting in nonsense.
You must sanitize all identifiers in dynamic code - including table names: quote_ident(mytable). Be aware that your text parameter to the function is case sensitive! The query on information_schema.columns requires that, too.
I untangled your whole construct to build the list of column names with string_agg() instead of the array constructor. Related answer:
Update multiple columns that start with a specific string
The assignment operator in plpgsql is :=.
Simplified syntax of RAISE NOTICE.
Core problem impossible to solve
All of this still doesn't solve your main problem: SQL demands a definition of the columns to be returned. You can circumvent this by returning anonymous records like you tried. But that's just postponing the inevitable. Now you have to provide a column definition list at call time, just like your error message tells you. But you just don't know which columns are going to be returned. Catch 22.
Your call would work like this:
SELECT *
FROM myresult('dkj_p_k27ac','enri') AS f (
enrich_d_dkj_p_k27ac text -- replace with actual column types
, enrich_lr_dkj_p_k27ac text
, enrich_r_dkj_p_k27ac text);
But you don't know number, names (optional) and data types of returned columns, not at creation time of the function and not even at call time. It's impossible to do exactly that in a single call. You need two separate queries to the database.
You could return all columns of any given table dynamically with a function using polymorphic types, because there is a well defined type for the whole table. Last chapter of this related answer:
Refactor a PL/pgSQL function to return the output of various SELECT queries