PostgreSQL 9.1: select from all schemas

I have a PostgreSQL 9.1 database with a couple hundred schemas. All have the same structure, just different data. I need to perform a select on a table and get data from each schema. Unfortunately I haven't found a decent way to do it.
I tried setting the search path to schema_1,schema_2, etc. and then performing a select on the table, but it only selects data from the first schema.
The only way I managed to do it so far is by generating a big query like:
select * from schema_1.table
union
select * from schema_2.table
union
(...another 100 lines....)
Is there any other way to do this in a more reasonable fashion? If this is not possible, can I at least find out which of the schemas has records in that table without performing this select?

Different schemas mean different tables, so if you have to stick to this structure, it'll mean unions, one way or the other. That can be pretty expensive. If you're after partitioning through the convenience of search paths, it might make sense to reverse your schema:
Store a big table in the public schema, and then provision views in each of the individual schemas.
Check out this sqlfiddle that demonstrates my concept:
http://sqlfiddle.com/#!12/a326d/1
Also pasted inline for posterity, in case sqlfiddle is inaccessible:
Schema:
CREATE SCHEMA customer_1;
CREATE SCHEMA customer_2;
CREATE TABLE accounts(id serial, name text, value numeric, customer_id int);
CREATE INDEX ON accounts (customer_id);
CREATE VIEW customer_1.accounts AS SELECT id, name, value FROM public.accounts WHERE customer_id = 1;
CREATE VIEW customer_2.accounts AS SELECT id, name, value FROM public.accounts WHERE customer_id = 2;
INSERT INTO accounts(name, value, customer_id) VALUES('foo', 100, 1);
INSERT INTO accounts(name, value, customer_id) VALUES('bar', 100, 1);
INSERT INTO accounts(name, value, customer_id) VALUES('biz', 150, 2);
INSERT INTO accounts(name, value, customer_id) VALUES('baz', 75, 2);
Queries:
SELECT SUM(value) FROM public.accounts;
SET search_path TO 'customer_1';
SELECT * FROM accounts;
SET search_path TO 'customer_2';
SELECT * FROM accounts;
Results:
425
1 foo 100
2 bar 100
3 biz 150
4 baz 75
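With this layout, the original "select from all schemas" collapses into a single query against the base table; a small sketch using the tables above:
-- all customers at once, one row per customer
SELECT customer_id, SUM(value) FROM public.accounts GROUP BY customer_id;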

If you need to know anything about the data in those tables, you have to run a SELECT; there is no other way. A schema is just logical addressing. What matters in your case is that the data lives in many separate tables, so you have to do a massive UNION. Attention: massive unions can require a lot of memory.
search_path works as designed. It does not mean "return data from all listed schemas"; it specifies the order in which schemas are searched when a table name is not fully qualified, and the search stops at the first table that has the requested name.
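A tiny sketch of that resolution rule, assuming two hypothetical schemas s1 and s2 that both contain a table t:
CREATE SCHEMA s1; CREATE TABLE s1.t(v text); INSERT INTO s1.t VALUES ('from s1');
CREATE SCHEMA s2; CREATE TABLE s2.t(v text); INSERT INTO s2.t VALUES ('from s2');
SET search_path TO s1, s2;
SELECT * FROM t;  -- returns only 'from s1'; the first schema in the path wins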
You can use dynamic SQL in a DO block together with a temp table:
postgres=# DO $$
declare r record;
begin
  drop table if exists result;
  create temp table result as select * from x.a limit 0; -- first table
  for r in select table_schema, table_name
             from information_schema.tables
            where table_name = 'a'
  loop
    raise notice '%', r;
    execute format('insert into result select * from %I.%I',
                   r.table_schema,
                   r.table_name);
  end loop;
end; $$;
result:
NOTICE: (y,a)
NOTICE: (x,a)
DO
postgres=# select * from result;
a
----
1
2
3
4
5
..
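For the follow-up question (which schemas have any rows in that table at all), the same loop can collect per-schema counts instead of copying data; a sketch, assuming the table is again named a:
DO $$
declare
  r record;
  n bigint;
begin
  for r in select table_schema
             from information_schema.tables
            where table_name = 'a'
  loop
    execute format('select count(*) from %I.a', r.table_schema) into n;
    if n > 0 then
      raise notice 'schema % has % rows', r.table_schema, n;
    end if;
  end loop;
end; $$;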

Here's one approach. You will need to pre-feed it all the schema names you are targeting. You could change this to just loop through all the schemas as Pavel shows if you know you want every schema. In my example I have three schemas that I care about each containing a table called bar. The logic will run a select on each schema's bar table and insert the value into a result table. At the end you have a table with all the data from all the tables. You could change this to update, delete, or do DDL. I chose to keep it simple and just collect the data from each table in each schema.
--START SETUP AKA Run This Section Once
create schema schema1;
create schema schema2;
create schema schema3;
create table schema1.bar(bar_id SERIAL PRIMARY KEY,
                         bar_name VARCHAR(50) NOT NULL);
create table schema2.bar(bar_id SERIAL PRIMARY KEY,
                         bar_name VARCHAR(50) NOT NULL);
create table schema3.bar(bar_id SERIAL PRIMARY KEY,
                         bar_name VARCHAR(50) NOT NULL);
insert into schema1.bar(bar_name) select 'One';
insert into schema2.bar(bar_name) select 'Two';
insert into schema3.bar(bar_name) select 'Three';
--END SETUP
DO $$
declare
  r record;
  l_id INTEGER = 1;
  l_schema_name TEXT;
begin
  drop table if exists public.result;
  create table public.result (bar_id INTEGER, bar_name TEXT);
  drop table if exists public.schemas;
  create table public.schemas (id serial PRIMARY KEY, schema_name text NOT NULL);
  INSERT INTO public.schemas(schema_name)
  VALUES ('schema1'),('schema2'),('schema3');
  for r in select *
             from public.schemas
  loop
    raise notice '%', r;
    SELECT schema_name into l_schema_name
      FROM public.schemas
     WHERE id = l_id;
    raise notice '%', l_schema_name;
    EXECUTE 'set search_path TO ' || l_schema_name;
    EXECUTE 'INSERT into public.result(bar_id, bar_name) select bar_id, bar_name from ' || l_schema_name || '.bar';
    l_id = l_id + 1;
  end loop;
end; $$;
--DEBUG
select * from schema1.bar;
select * from schema2.bar;
select * from schema3.bar;
select * from public.result;
select * from public.schemas;
--CLEANUP
--DROP TABLE public.result;
--DROP TABLE public.schemas;

Related

How to iterate over all schemas and count rows of a table with the same name in every schema, every 5 minutes?

Imagine there are 5 schemas in my database, and every schema has a table with a common name (e.g. table1). Every 5 minutes records get inserted into table1. How can I iterate over all schemas and calculate the row count of table1? I have to automate the process, so I am going to write the code in a function and call that function every 5 minutes using crontab.
Basically there are 2 options: hard-code schema.table and union the results, so something like:
create or replace function count_rows_in_each_table1()
returns table (schema_name text, number_of_rows bigint)
language sql
as $$
select 'schema1', count(*) from schema1.table1 union all
select 'schema2', count(*) from schema2.table1 union all
select 'schema3', count(*) from schema3.table1 union all
...
select 'scheman', count(*) from scheman.table1;
$$;
The alternative is building the query dynamically from information_schema:
create or replace function count_rows_in_each_table1()
returns table (schema_name text, number_of_rows bigint)
language plpgsql
as $$
declare
    c_rows_count cursor is
        select table_schema::text
          from information_schema.tables
         where table_name = 'table1';
    l_tbl record;
    l_sql_statement text = '';
    l_connector text = '';
    l_base_select text = 'select ''%s'', count(*) from %I.table1';
begin
    for l_tbl in c_rows_count
    loop
        l_sql_statement = l_sql_statement ||
                          l_connector ||
                          format(l_base_select, l_tbl.table_schema, l_tbl.table_schema);
        l_connector = ' union all ';
    end loop;
    raise notice E'Running Query: \n%', l_sql_statement;
    return query execute l_sql_statement;
end;
$$;
Which is better? With few schemas and infrequent schema adds/drops, opt for the first: it is direct and clearly shows what you are doing. If you add/drop schemas often, opt for the second. If you have many schemas but seldom add/drop them, modify the second to generate the first, save the result, and schedule execution of the generated query.
NOTE: Not tested
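To run it every 5 minutes as the question asks, call the function from psql and schedule that call with crontab; a sketch (the database name and log path are placeholders):
-- ad-hoc call
select * from count_rows_in_each_table1();
-- crontab entry (every 5 minutes); adjust paths to your environment
-- */5 * * * * psql mydb -c "select * from count_rows_in_each_table1();" >> /tmp/table1_counts.log 2>&1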

How can I ALTER TABLE all columns from NOT NULL to NULL with one query in PostgreSQL 9.1?

I migrated my data from a MySQL database to a PostgreSQL database, and by mistake I have all my columns set NOT NULL in the PostgreSQL database.
Because of this I am facing issues when inserting data and have to drop the NOT NULL constraints manually. Is there any way to do this for all columns in a table (except id, the primary key)?
I have this query for a single column, but doing it column by column is time consuming:
ALTER TABLE <table name> ALTER COLUMN <column name> DROP NOT NULL;
I don't think there is built-in functionality for this, but it's easy to write a function to do it. For example:
CREATE OR REPLACE FUNCTION set_nullable(relation TEXT)
RETURNS VOID AS
$$
DECLARE
    rec RECORD;
BEGIN
    FOR rec IN (SELECT * FROM pg_attribute
                WHERE attnotnull = TRUE
                  AND attrelid = relation::regclass::oid
                  AND attnum > 0
                  AND attname != 'id')
    LOOP
        EXECUTE format('ALTER TABLE %s ALTER COLUMN %s DROP NOT NULL', relation, rec.attname);
        RAISE NOTICE 'Set %.% nullable', relation, rec.attname;
    END LOOP;
END
$$
LANGUAGE plpgsql;
Use it like this:
SELECT set_nullable('my_table');
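You can verify the result afterwards from is_nullable in information_schema.columns:
SELECT column_name, is_nullable
  FROM information_schema.columns
 WHERE table_name = 'my_table';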
Alternatively, create a new table noNULL with the same structure as your current table, but with the columns you need made nullable.
Then insert from the old table:
INSERT INTO noNULL
SELECT *
FROM oldTable
Then drop oldTable and rename noNULL to oldTable.
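A minimal sketch of that sequence, assuming a hypothetical oldTable with an id primary key and one data column:
-- same structure as oldTable, but without the NOT NULL constraints
CREATE TABLE noNULL (id integer PRIMARY KEY, col1 text NULL);
INSERT INTO noNULL SELECT * FROM oldTable;
DROP TABLE oldTable;
ALTER TABLE noNULL RENAME TO oldTable;
-- if id was a serial column, recreate or resync its sequence afterwards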

PostgreSQL equivalent of Oracle "bulk collect"

Does PostgreSQL have a way to write a statement using BULK COLLECT INTO like in Oracle?
Example in Oracle:
create or replace procedure prc_tst_bulk_test is
type typ_person is table of tb_person%rowtype;
v_tb_person typ_person;
begin
select *
bulk collect into v_tb_person
from tb_person;
-- make a selection in v_tb_person, for instance
select name, count(*) from v_tb_person where age > 50
union
select name, count(*) from v_tb_person where gender = 1
end;
In PostgreSQL 10 you can use array_agg:
declare
v_ids int[];
begin
select array_agg(id) INTO v_ids
from mytable1
where host = p_host;
--use v_ids...
end;
You'll have an array, and it can be used to select from via unnest:
select * from unnest(v_ids) where ...
There is no such syntax in PostgreSQL, nor a close functional equivalent.
You can create a temporary table in your PL/PgSQL code and use that for the desired purpose. Temp tables in PL/PgSQL are a little bit annoying because the names are global within the session, but they work correctly in PostgreSQL 8.4 and up.
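A minimal sketch of that temp-table approach (the function name is made up; tb_person comes from the question):
CREATE OR REPLACE FUNCTION prc_tst_bulk_tmp() RETURNS void AS
$$
BEGIN
    -- materialize the rows once, then query the temp table as often as needed
    CREATE TEMP TABLE v_tb_person ON COMMIT DROP AS
        SELECT * FROM tb_person;
    -- ... further processing against v_tb_person ...
END;
$$ LANGUAGE plpgsql;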
A better alternative for when you're doing all the work within a single SQL statement is to use a common table expression (CTE, or WITH query). This won't be suitable for all situations.
The example above would be much better solved by a simple RETURN QUERY in PL/PgSQL, but I presume your real examples are more complex.
Assuming that tb_person is some kind of expensive-to-generate view that you don't just want to scan in each branch of the union, you could do something like:
CREATE OR REPLACE FUNCTION prc_tst_bulk()
RETURNS TABLE (name text, rowcount bigint) AS
$$
BEGIN
RETURN QUERY
WITH v_tb_person AS (SELECT * FROM tb_person)
-- columns are qualified to avoid a conflict with the OUT parameter "name"
select p.name, count(*) from v_tb_person p where p.age > 50 group by p.name
union
select p.name, count(*) from v_tb_person p where p.gender = 1 group by p.name;
END;
$$ LANGUAGE plpgsql;
This particular case can be further simplified into a plain SQL function:
CREATE OR REPLACE FUNCTION prc_tst_bulk()
RETURNS TABLE (name text, rowcount bigint) AS
$$
WITH v_tb_person AS (SELECT * FROM tb_person)
select name, count(*) from v_tb_person where age > 50 group by name
union
select name, count(*) from v_tb_person where gender = 1 group by name;
$$ LANGUAGE sql;
You can use PostgreSQL arrays too - they are similar to Oracle's collections:
postgres=# create table _foo(a int, b int);
CREATE TABLE
postgres=# insert into _foo values(10,20);
INSERT 0 1
postgres=# create or replace function multiply()
returns setof _foo as $$
/*
* two tricks are here
* table name can be used as type name
* table name can be used as fictive column that packs all fields
*/
declare a _foo[] = (select array(select _foo from _foo));
begin
return query select * from unnest(a)
             union all
             select * from unnest(a);
end;
$$ language plpgsql;
CREATE FUNCTION
postgres=# select * from multiply();
a | b
----+----
10 | 20
10 | 20
(2 rows)
But in your case Craig Ringer's proposal is perfect and should be preferred.
You can also use a server-side cursor and FETCH:
-- Fetch the next 5 rows from cursor_01:
FETCH FORWARD 5 FROM cursor_01;
This works in PostgreSQL 10+:
https://www.postgresql.org/docs/10/sql-fetch.html
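For context, a cursor such as cursor_01 would first be declared inside a transaction; a minimal sketch using the question's tb_person table:
BEGIN;
DECLARE cursor_01 CURSOR FOR SELECT * FROM tb_person;
FETCH FORWARD 5 FROM cursor_01;  -- next 5 rows
CLOSE cursor_01;
COMMIT;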

PostgreSQL COPY with schema support

I'm trying to load some data from CSV using the postgresql COPY command. The trick is that I'd like to implement multi-tenancy on a userid (which is contained in the CSV). Is there an easy way to tell the postgres copy command to filter based on this userid when loading the csv?
i.e. all rows with userid=x go to schema=x, rows with userid=y go to schema=y.
There is not a way of doing this with just the COPY command, but you could copy all your data into a master table, and then put together a simple PL/PGSQL function that does this for you. Something like this -
CREATE OR REPLACE FUNCTION public.spike()
  RETURNS void AS
$BODY$
DECLARE
    user_id integer;
    destination_schema text;
BEGIN
    FOR user_id IN SELECT userid FROM master_table GROUP BY userid LOOP
        CASE user_id
            WHEN 1 THEN
                destination_schema := 'foo';
            WHEN 2 THEN
                destination_schema := 'bar';
            ELSE
                destination_schema := 'baz';
        END CASE;
        EXECUTE 'INSERT INTO '|| destination_schema ||'.my_table SELECT * FROM master_table WHERE userid=$1' USING user_id;
        -- EXECUTE 'DELETE FROM master_table WHERE userid=$1' USING user_id;
    END LOOP;
    TRUNCATE TABLE master_table;
    RETURN;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
This gets all distinct user_ids from master_table, uses a CASE statement to determine the destination schema, executes an INSERT ... SELECT to copy each user's rows, and finally truncates the master table (a per-user DELETE is left commented out as an alternative).
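For completeness, the initial load into the master table is just a plain COPY; a sketch, assuming a CSV file with a header row whose columns match master_table (the file path is a placeholder):
COPY master_table FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);
-- or, from the psql client side:
-- \copy master_table FROM 'data.csv' WITH (FORMAT csv, HEADER true)
SELECT public.spike();  -- then route the rows into the per-user schemas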

Massive insertions from one big table to other related tables

Intro:
Currently I have scraped all the data into one PostgreSQL 'Bigtable' table (there are about 1.2M rows). Now I need to split the design into separate tables which all depend on the Bigtable. Some of the tables might have subtables. The model looks pretty much like a snowflake schema.
Problem:
What would be the best way to insert the data into those tables? I thought about doing the insertion with functions written in SQL or PL/pgSQL, but the problem remains the auto-generated IDs.
Also, if you know of tools that might make solving this easier, please post!
Edit: I have added an example; it's not the real case, just an illustration.
1.2M rows is not too much. The best tool is an SQL script executed from the psql console. If you have a newer version of Pg, you can use inline functions (DO statements) where necessary, but probably the most useful command is the INSERT INTO ... SELECT statement.
-- file conversion.sql
DROP TABLE IF EXISTS f1 CASCADE;
CREATE TABLE f1(a int, b int);
INSERT INTO f1
SELECT x1, y1
FROM data
WHERE x1 = 10;
...
-- end file
psql mydb -f conversion.sql
If I understand your question, you can use a PL/pgSQL function like this:
CREATE OR REPLACE FUNCTION migration() RETURNS integer AS
$BODY$
DECLARE
    currentProductId INTEGER;
    currentUserId INTEGER;
    currentReg RECORD;
BEGIN
    FOR currentReg IN
        SELECT * FROM bigtable
    LOOP
        -- Product
        SELECT productid INTO currentProductId
          FROM product
         WHERE name = currentReg.product_name;
        IF currentProductId IS NULL THEN
            EXECUTE 'INSERT INTO product (name) VALUES (''' || currentReg.product_name || ''') RETURNING productid'
               INTO currentProductId;
        END IF;
        -- User
        SELECT userid INTO currentUserId
          FROM user
         WHERE first_name = currentReg.first_name and last_name = currentReg.last_name;
        IF currentUserId IS NULL THEN
            EXECUTE 'INSERT INTO user (first_name, last_name) VALUES (''' || currentReg.first_name || ''', ''' || currentReg.last_name || ''') RETURNING userid'
               INTO currentUserId;
            -- Insert into userAdded too with: currentUserId and currentProductId
            [...]
        END IF;
        -- Rest of tables
        [...]
    END LOOP;
    RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;
select * from migration();
In this case it's assumed that each table has its own primary key sequence, and I have reduced the number of fields in the tables to keep things simple.
I hope this has been helpful.
No need to use a function for this (unless I misunderstood your problem)
If your id columns are all defined as serial column (i.e. they automatically generate the values), then this can be done with simple INSERT statements. This assumes that the target tables are all empty.
INSERT INTO users (firstname, lastname)
SELECT DISTINCT firstname, lastname
FROM bigtable;
INSERT INTO category (name)
SELECT DISTINCT category_name
FROM bigtable;
-- the following assumes a column categoryid in the product table
-- which is not visible from your screenshot
INSERT INTO product (product_name, description, categoryid)
SELECT DISTINCT b.product_name, b.description, c.categoryid
FROM bigtable b
JOIN category c ON c.category_name = b.category_name;
INSERT INTO product_added (product_productid, user_userid)
SELECT p.productid, u.userid
FROM bigtable b
JOIN product p ON p.product_name = b.product_name
JOIN users u ON u.firstname = b.firstname AND u.lastname = b.lastname;
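A quick sanity check after these inserts is to compare the distinct counts in the big table with the row counts of the new tables; a small sketch:
SELECT
  (SELECT count(*) FROM (SELECT DISTINCT firstname, lastname FROM bigtable) s) AS distinct_users,
  (SELECT count(*) FROM users) AS users_rows,
  (SELECT count(DISTINCT category_name) FROM bigtable) AS distinct_categories,
  (SELECT count(*) FROM category) AS category_rows;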