How to iterate over all schemas and count the rows of the same-named table in each schema every 5 minutes? - PostgreSQL

Imagine there are 5 schemas in my database, and every schema has a table with the same name (e.g. table1). Records get inserted into table1 every 5 minutes. How can I iterate over all schemas and calculate the row count of table1 in each? [I have to automate the process, so I am going to write the code in a function and call that function every 5 minutes using crontab.]

Basically 2 options: Hard code schema.table and union the results. So something like:
create or replace function count_rows_in_each_table1()
-- count(*) returns bigint, so the result column must be bigint as well
returns table (schema_name text, number_of_rows bigint)
language sql
as $$
select 'schema1', count(*) from schema1.table1 union all
select 'schema2', count(*) from schema2.table1 union all
select 'schema3', count(*) from schema3.table1 union all
-- ... (one line per remaining schema)
select 'scheman', count(*) from scheman.table1;
$$;
The alternative is to build the query dynamically from information_schema.
create or replace function count_rows_in_each_table1()
returns table (schema_name text, number_of_rows bigint)
language plpgsql
as $$
declare
c_rows_count cursor is
select table_schema::text
from information_schema.tables
where table_name = 'table1';
l_tbl record;
l_sql_statement text = '';
l_connector text = '';
-- %L quotes the schema name as a string literal, %I quotes it as an identifier
l_base_select text = 'select %L::text, count(*) from %I.table1';
begin
for l_tbl in c_rows_count
loop
l_sql_statement = l_sql_statement ||
l_connector ||
format (l_base_select, l_tbl.table_schema, l_tbl.table_schema);
l_connector = ' union all ';
end loop;
raise notice E'Running Query: \n%', l_sql_statement;
return query execute l_sql_statement;
end;
$$;
Which is better? With few schemas that are seldom added or dropped, opt for the first: it is direct and clearly shows what you are doing. If you add or drop schemas often, opt for the second. If you have many schemas but seldom add or drop them, modify the second to generate the first, then save the generated query and schedule its execution.
NOTE: Not tested
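The third option (using the dynamic approach to generate the static query) might look like the sketch below. The function name generate_table1_count_query is hypothetical, and like the code above this is untested against your database; it only returns the query text and runs nothing.

```sql
-- Hypothetical helper: returns the text of a static UNION ALL query,
-- one branch per schema that currently contains a base table named table1.
create or replace function generate_table1_count_query()
returns text
language sql
stable
as $$
    select string_agg(
               -- %L quotes the schema name as a literal, %I as an identifier
               format('select %L::text as schema_name, count(*) as number_of_rows from %I.table1',
                      table_schema, table_schema),
               E'\nunion all\n'
               order by table_schema)
    from information_schema.tables
    where table_name = 'table1'
      and table_type = 'BASE TABLE';
$$;

-- Print the generated statement so it can be reviewed and saved:
select generate_table1_count_query();
```

The result is NULL when no schema contains a table1. The saved statement can then be scheduled as-is and regenerated whenever a schema is added or dropped.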

Related

Function to return dynamic set of columns for given table

I have a fields table to store column information for other tables:
CREATE TABLE public.fields (
schema_name varchar(100),
table_name varchar(100),
column_text varchar(100),
column_name varchar(100),
column_type varchar(100) default 'varchar(100)',
column_visible boolean
);
And I'd like to create a function to fetch data for a specific table.
I tried something like this:
create or replace function public.get_table(schema_name text,
table_name text,
active boolean default true)
returns setof record as $$
declare
entity_name text default schema_name || '.' || table_name;
r record;
begin
for r in EXECUTE 'select * from ' || entity_name loop
return next r;
end loop;
return;
end
$$
language plpgsql;
With this function I have to specify columns when I call it!
select * from public.get_table('public', 'users') as dept(id int, uname text);
I want to pass schema_name and table_name as parameters to function and get record list, according to column_visible field in public.fields table.
Solution for the simple case
As explained in the referenced answers below, you can use registered (row) types, and thus implicitly declare the return type of a polymorphic function:
CREATE OR REPLACE FUNCTION public.get_table(_tbl_type anyelement)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE format('TABLE %s', pg_typeof(_tbl_type));
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM public.get_table(NULL::public.users); -- note the syntax!
Returns the complete table (with all user columns).
Wait! How?
Detailed explanation in this related answer, chapter
"Various complete table types":
Refactor a PL/pgSQL function to return the output of various SELECT queries
TABLE foo is just short for SELECT * FROM foo:
Is there a shortcut for SELECT * FROM?
2 steps for completely dynamic return type
But what you are trying to do is strictly impossible in a single SQL command.
I want to pass schema_name and table_name as parameters to function and get record list, according to column_visible field in public.fields table.
There is no direct way to return an arbitrary selection of columns (return type not known at call time) from a function - or any SQL command. SQL demands to know number, names and types of resulting columns at call time. More in the 2nd chapter of this related answer:
How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?
There are various workarounds. You could wrap the result in one of the standard document types (json, jsonb, hstore, xml).
Or you generate the query with one function call and execute the result with the next:
CREATE OR REPLACE FUNCTION public.generate_get_table(_schema_name text, _table_name text)
RETURNS text AS
$func$
SELECT format('SELECT %s FROM %I.%I'
, string_agg(quote_ident(column_name), ', ')
, schema_name
, table_name)
FROM fields
WHERE column_visible
AND schema_name = _schema_name
AND table_name = _table_name
GROUP BY schema_name, table_name
ORDER BY schema_name, table_name;
$func$ LANGUAGE sql;
Call:
SELECT public.generate_get_table('public', 'users');
This creates a query of the form:
SELECT usr_id, usr FROM public.users;
Execute it in the 2nd step. (You might want to add column numbers and order columns.)
Or append \gexec in psql to execute the return value immediately. See:
How to force evaluation of subquery before joining / pushing down to foreign server
Be sure to defend against SQL injection:
INSERT with dynamic table name in trigger function
Define table and column names as arguments in a plpgsql function?
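The document-type workaround mentioned above could be sketched as follows: each row is wrapped in jsonb, so the function has a fixed return type no matter which columns are visible. The function name get_table_jsonb is hypothetical; the sketch assumes the public.fields table from the question and PostgreSQL 9.5 or later (for to_jsonb).

```sql
-- Hypothetical function: returns one jsonb document per row, containing only
-- the columns marked column_visible in public.fields.
CREATE OR REPLACE FUNCTION public.get_table_jsonb(_schema_name text, _table_name text)
  RETURNS SETOF jsonb
  LANGUAGE plpgsql AS
$func$
DECLARE
   _cols text;
BEGIN
   -- Build a safely quoted column list from the metadata table
   SELECT string_agg(quote_ident(column_name), ', ')
   INTO   _cols
   FROM   public.fields
   WHERE  column_visible
   AND    schema_name = _schema_name
   AND    table_name  = _table_name;

   IF _cols IS NULL THEN
      RAISE EXCEPTION 'no visible columns registered for %.%', _schema_name, _table_name;
   END IF;

   -- %s inserts the already quoted column list, %I quotes schema and table names
   RETURN QUERY EXECUTE format(
      'SELECT to_jsonb(t) FROM (SELECT %s FROM %I.%I) t'
    , _cols, _schema_name, _table_name);
END
$func$;

-- Call (no column definition list needed):
SELECT * FROM public.get_table_jsonb('public', 'users');
```

The trade-off is that the caller receives documents rather than typed columns and has to extract values with the jsonb operators.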
Asides
varchar(100) does not make much sense for identifiers, which are limited to 63 characters in standard Postgres:
Maximum characters in labels (table names, columns etc)
If you understand how the object identifier type regclass works, you might replace schema and table name with a single regclass column.
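As a rough illustration of the regclass idea: a regclass value is validated against the system catalogs when it is cast, and it is rendered with quoting and schema qualification where needed, so it can be inserted into dynamic SQL with %s. The function name count_rows is hypothetical.

```sql
-- Hypothetical helper taking a table as regclass instead of two text arguments.
CREATE OR REPLACE FUNCTION public.count_rows(_tbl regclass)
  RETURNS bigint
  LANGUAGE plpgsql AS
$func$
DECLARE
   _n bigint;
BEGIN
   -- The text form of a regclass is already a safely quoted table reference
   EXECUTE format('SELECT count(*) FROM %s', _tbl) INTO _n;
   RETURN _n;
END
$func$;

-- The string is cast to regclass; a name that does not exist raises an error
-- at the cast, before any dynamic SQL runs.
SELECT public.count_rows('pg_catalog.pg_class');
```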
I think you just need another query to get the list of columns you want.
Maybe something like (this is untested):
create or replace function public.get_table(_schema_name text, _table_name text, active boolean default true) returns setof record as $$
declare
-- format() with %I quotes the schema and table names as identifiers
entity_name text := format('%I.%I', _schema_name, _table_name);
column_list text;
begin
-- Get the list of visible columns, each quoted as an identifier
SELECT string_agg(quote_ident(column_name), ', ')
INTO column_list
FROM public.fields
WHERE fields.schema_name = _schema_name
AND fields.table_name = _table_name
AND fields.column_visible = TRUE;
-- Return rows from the specified table
-- (the function returns setof record, so the caller must still supply a column definition list)
RETURN QUERY EXECUTE 'select ' || column_list || ' from ' || entity_name;
RETURN;
end
$$
language plpgsql;
Keep in mind that column/table references may need to be surrounded by double quotes if they have certain characters in them.
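A small sketch of the quoting point, using made-up identifiers ('first name', 'Users'): format() with %I, like quote_ident(), adds double quotes only where an identifier needs them.

```sql
-- %I double-quotes identifiers that contain spaces or upper-case letters
SELECT format('SELECT %I FROM %I.%I', 'first name', 'public', 'Users') AS quoted_query;
-- quoted_query: SELECT "first name" FROM public."Users"

-- %s inserts the text verbatim; the resulting statement no longer refers to
-- the column "first name" or the table "Users"
SELECT format('SELECT %s FROM %s.%s', 'first name', 'public', 'Users') AS unquoted_query;
```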

Perform query using tables and columns from information_schema

I'm trying to use information_schema.columns to find all of the columns in my database that have a geometry type, and then check the SRID of the data in those columns.
I can do this with multiple queries where I first find the table names and column names
SELECT table_name, column_name
FROM information_schema.columns
WHERE udt_name = 'geometry';
and then (manually)
SELECT ST_SRID(column_name)
FROM table_name;
for each entry.
Does anyone know how to streamline this into a single query?
Table names can't be variable; Postgres needs to be able to come up with an execution plan before it knows the parameter values. So you can't do this in a simple SQL statement.
Instead, you need to construct a dynamic query string using a procedural language like PL/pgSQL:
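CREATE FUNCTION SRIDs() RETURNS TABLE (
tablename TEXT,
columnname TEXT,
srid INTEGER
) AS $$
DECLARE
schemaname TEXT;
BEGIN
FOR schemaname, tablename, columnname IN
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE udt_name = 'geometry'
LOOP
-- %I quotes identifiers; schema-qualifying makes this work outside the search_path.
-- LIMIT 1 makes explicit that only one row per column is sampled.
EXECUTE format(
'SELECT ST_SRID(%I) FROM %I.%I LIMIT 1',
columnname, schemaname, tablename
) INTO srid;
RETURN NEXT;
END LOOP;
END
$$
LANGUAGE plpgsql;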
SELECT * FROM SRIDs();

Postgresql 9.1 select from all schemas

I have a Postgresql 9.1 database with couple hundred schemas. All have same structure, just different data. I need to perform a select on a table and get data from each schema. Unfortunately I haven't found a decent way to do it.
I tried setting the search path to schema_1,schema_2, etc and then perform a select on the table but it only selects data from the first schema.
The only way I managed to do it so far is by generating a big query like:
select * from schema_1.table
union
select * from schema_2.table
union
(...another 100 lines....)
Is there any other way to do this in a more reasonable fashion? If this is not possible, can I at least find out which of the schemas has records in that table without performing this select?
Different schemas mean different tables, so if you have to stick to this structure, it'll mean unions, one way or the other. That can be pretty expensive. If you're after partitioning through the convenience of search paths, it might make sense to reverse your schema:
Store a big table in the public schema, and then provision views in each of the individual schemas.
Check out this sqlfiddle that demonstrates my concept:
http://sqlfiddle.com/#!12/a326d/1
Also pasted inline for posterity, in case sqlfiddle is inaccessible:
Schema:
CREATE SCHEMA customer_1;
CREATE SCHEMA customer_2;
CREATE TABLE accounts(id serial, name text, value numeric, customer_id int);
CREATE INDEX ON accounts (customer_id);
CREATE VIEW customer_1.accounts AS SELECT id, name, value FROM public.accounts WHERE customer_id = 1;
CREATE VIEW customer_2.accounts AS SELECT id, name, value FROM public.accounts WHERE customer_id = 2;
INSERT INTO accounts(name, value, customer_id) VALUES('foo', 100, 1);
INSERT INTO accounts(name, value, customer_id) VALUES('bar', 100, 1);
INSERT INTO accounts(name, value, customer_id) VALUES('biz', 150, 2);
INSERT INTO accounts(name, value, customer_id) VALUES('baz', 75, 2);
Queries:
SELECT SUM(value) FROM public.accounts;
SET search_path TO 'customer_1';
SELECT * FROM accounts;
SET search_path TO 'customer_2';
SELECT * FROM accounts;
Results:
425

1 foo 100
2 bar 100

3 biz 150
4 baz 75
If you need to know something about the data in the tables, you have to run a SELECT; there is no other way. A schema is just logical addressing. In your case that matters: you use a lot of tables, so you have to do a massive UNION.
search_path works as expected. It does not mean "return data from all the listed schemas"; it specifies the order in which schemas are searched for a table name that is not fully qualified. The search ends at the first table that has the requested name.
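The search_path behavior described above can be seen in a small sketch. The schemas s1 and s2 and the table t are made-up names for illustration only.

```sql
-- Hypothetical demo objects: two schemas, each with a table named t
CREATE SCHEMA s1;
CREATE SCHEMA s2;
CREATE TABLE s1.t (v text);
CREATE TABLE s2.t (v text);
INSERT INTO s1.t VALUES ('from s1');
INSERT INTO s2.t VALUES ('from s2');

SET search_path TO s1, s2;
SELECT v FROM t;       -- resolves to s1.t only: the search stops at the first match
SELECT v FROM s2.t;    -- s2.t has to be named explicitly
RESET search_path;
```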
Attention: massive unions can require a lot of memory.
You can use dynamic SQL in a DO block (or a stored procedure) with a temp table:
postgres=# DO $$
declare r record;
begin
drop table if exists result;
create temp table result as select * from x.a limit 0; -- first table;
for r in select table_schema, table_name
from information_schema.tables
where table_name = 'a'
loop
raise notice '%', r;
execute format('insert into result select * from %I.%I',
r.table_schema,
r.table_name);
end loop;
end; $$;
result:
NOTICE: (y,a)
NOTICE: (x,a)
DO
postgres=# select * from result;
a
----
1
2
3
4
5
..
Here's one approach. You will need to pre-feed it all the schema names you are targeting. You could change this to just loop through all the schemas as Pavel shows if you know you want every schema. In my example I have three schemas that I care about each containing a table called bar. The logic will run a select on each schema's bar table and insert the value into a result table. At the end you have a table with all the data from all the tables. You could change this to update, delete, or do DDL. I chose to keep it simple and just collect the data from each table in each schema.
--START SETUP AKA Run This Section Once
create table schema3.bar(bar_id SERIAL PRIMARY KEY,
bar_name VARCHAR(50) NOT NULL);
insert into schema1.bar(bar_name) select 'One';
insert into schema2.bar(bar_name) select 'Two';
insert into schema3.bar(bar_name) select 'Three';
--END SETUP
DO $$
declare r record;
begin
drop table if exists public.result;
create table public.result (bar_id INTEGER, bar_name TEXT);
drop table if exists public.schemas;
create table public.schemas (id serial PRIMARY KEY, schema_name text NOT NULL);
INSERT INTO public.schemas(schema_name)
VALUES ('schema1'),('schema2'),('schema3');
for r in select schema_name
from public.schemas
order by id
loop
raise notice '%', r.schema_name;
-- The source table is schema-qualified, so search_path does not need to change.
-- %I quotes the schema name as an identifier.
EXECUTE format('INSERT into public.result(bar_id, bar_name) select bar_id, bar_name from %I.bar', r.schema_name);
end loop;
end; $$;
--DEBUG
select * from schema1.bar;
select * from schema2.bar;
select * from schema3.bar;
select * from public.result;
select * from public.schemas;
--CLEANUP
--DROP TABLE public.result;
--DROP TABLE public.schemas;

PostgreSQL equivalent of Oracle "bulk collect"

Is there a way in PostgreSQL to write a statement using bulk collect into, like in Oracle?
Example in Oracle:
create or replace procedure prc_tst_bulk_test is
type typ_person is table of tb_person%rowtype;
v_tb_person typ_person;
begin
select *
bulk collect into v_tb_person
from tb_person;
-- make a selection in v_tb_person, for instance
select name, count(*) from v_tb_person where age > 50
union
select name, count(*) from v_tb_person where gender = 1
end;
In PostgreSQL 10 you can use array_agg:
declare
v_ids int[];
begin
select array_agg(id) INTO v_ids
from mytable1
where host = p_host;
--use v_ids...
end;
You then have an array, and you can select from it using unnest:
select * from unnest(v_ids) where ...
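Put together, the array_agg / unnest pattern might look like this self-contained sketch. The temp table demo_person and its sample rows are invented for illustration; whole rows are collected into an array of the table's row type and then queried again.

```sql
-- Hypothetical sample data
CREATE TEMP TABLE demo_person (id int, name text, age int);
INSERT INTO demo_person VALUES (1, 'a', 60), (2, 'b', 40), (3, 'c', 70);

DO $$
DECLARE
   v_people demo_person[];   -- a table name can be used as a row type
   v_count  bigint;
BEGIN
   -- Collect every row into the array (roughly what BULK COLLECT INTO does)
   SELECT array_agg(p) INTO v_people FROM demo_person p;

   -- Query the in-memory array as if it were a table
   SELECT count(*) INTO v_count
   FROM   unnest(v_people) AS u
   WHERE  u.age > 50;

   RAISE NOTICE 'people over 50: %', v_count;   -- 2 for the sample rows above
END
$$;
```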
There is no such syntax in PostgreSQL, nor a close functional equivalent.
You can create a temporary table in your PL/PgSQL code and use that for the desired purpose. Temp tables in PL/PgSQL are a little bit annoying because the names are global within the session, but they work correctly in PostgreSQL 8.4 and up.
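A minimal sketch of the temp-table approach, with made-up names (demo_temp_collect, tmp_person) and generated sample data instead of a real tb_person:

```sql
CREATE OR REPLACE FUNCTION demo_temp_collect()   -- hypothetical name
  RETURNS TABLE (bucket text, rowcount bigint)
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- Temp table names are global within the session, so clear any leftover copy
   DROP TABLE IF EXISTS pg_temp.tmp_person;

   -- Materialize the rows once; ON COMMIT DROP removes the table at transaction end
   CREATE TEMP TABLE tmp_person ON COMMIT DROP AS
   SELECT g AS id, 20 + g * 10 AS age, g % 2 AS gender
   FROM   generate_series(1, 5) g;

   -- Reuse the materialized rows in several branches
   RETURN QUERY
   SELECT 'age > 50'::text, count(*) FROM tmp_person t WHERE t.age > 50
   UNION ALL
   SELECT 'gender = 1'::text, count(*) FROM tmp_person t WHERE t.gender = 1;
END
$func$;

SELECT * FROM demo_temp_collect();
```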
A better alternative for when you're doing all the work within a single SQL statement is to use a common table expression (CTE, or WITH query). This won't be suitable for all situations.
The example above would be much better solved by a simple RETURN QUERY in PL/PgSQL, but I presume your real examples are more complex.
Assuming that tb_person is some kind of expensive-to-generate view that you don't just want to scan in each branch of the union, you could do something like:
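CREATE OR REPLACE FUNCTION prc_tst_bulk()
RETURNS TABLE (name text, rowcount bigint) AS  -- count(*) returns bigint
$$
BEGIN
RETURN QUERY
WITH v_tb_person AS (SELECT * FROM tb_person)
-- table-qualify the columns to avoid ambiguity with the output column "name";
-- count(*) next to a plain column requires GROUP BY
select p.name::text, count(*) from v_tb_person p where p.age > 50 group by p.name
union
select p.name::text, count(*) from v_tb_person p where p.gender = 1 group by p.name;
END;
$$ LANGUAGE plpgsql;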
This particular case can be further simplified into a plain SQL function:
CREATE OR REPLACE FUNCTION prc_tst_bulk()
RETURNS TABLE (name text, rowcount bigint) AS  -- count(*) returns bigint
$$
WITH v_tb_person AS (SELECT * FROM tb_person)
select name, count(*) from v_tb_person where age > 50 group by name
union
select name, count(*) from v_tb_person where gender = 1 group by name;
$$ LANGUAGE sql;
You can use PostgreSQL arrays too; they are similar to Oracle's collections:
postgres=# create table _foo(a int, b int);
CREATE TABLE
postgres=# insert into _foo values(10,20);
INSERT 0 1
postgres=# create or replace function multiply()
returns setof _foo as $$
/*
* two tricks are here
* table name can be used as type name
* table name can be used as fictive column that packs all fields
*/
declare a _foo[] = (select array(select _foo from _foo));
begin
return query select * from unnest(a)
union all
select * from unnest(a);
end;
$$ language plpgsql;
CREATE FUNCTION
postgres=# select * from multiply();
a | b
----+----
10 | 20
10 | 20
(2 rows)
But in your case Craig Ringer's proposal is perfect and should be preferable.
-- Fetch the next 5 rows in the cursor_01:
FETCH FORWARD 5 FROM cursor_01;
This works in PostgreSQL 10 and later.
https://www.postgresql.org/docs/10/sql-fetch.html

postgresql copy with schema support

I'm trying to load some data from CSV using the postgresql COPY command. The trick is that I'd like to implement multi-tenancy on a userid (which is contained in the CSV). Is there an easy way to tell the postgres copy command to filter based on this userid when loading the csv?
i.e. all rows with userid=x go to schema=x, rows with userid=y go to schema=y.
There is not a way of doing this with just the COPY command, but you could copy all your data into a master table, and then put together a simple PL/PGSQL function that does this for you. Something like this -
CREATE OR REPLACE FUNCTION public.spike()
RETURNS void AS
$BODY$
DECLARE
user_id integer;
destination_schema text;
BEGIN
FOR user_id IN SELECT userid FROM master_table GROUP BY userid LOOP
CASE user_id
WHEN 1 THEN
destination_schema := 'foo';
WHEN 2 THEN
destination_schema := 'bar';
ELSE
destination_schema := 'baz';
END CASE;
-- %I quotes the schema name as an identifier; $1 is still bound via USING
EXECUTE format('INSERT INTO %I.my_table SELECT * FROM master_table WHERE userid=$1', destination_schema) USING user_id;
-- EXECUTE 'DELETE FROM master_table WHERE userid=$1' USING user_id;
END LOOP;
TRUNCATE TABLE master_table;
RETURN;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE
COST 100;
This gets all unique user IDs from master_table, uses a CASE statement to determine the destination schema, executes an INSERT ... SELECT to copy the matching rows, and finally truncates master_table once all rows have been moved.