PostgreSQL COPY with schema support

I'm trying to load some data from a CSV file using the PostgreSQL COPY command. The catch is that I'd like to implement multi-tenancy keyed on a userid column that is contained in the CSV. Is there an easy way to tell COPY to route rows based on this userid when loading the file?
i.e. all rows with userid=x go to schema=x, rows with userid=y go to schema=y.

There is no way to do this with the COPY command alone, but you can copy all your data into a master table and then write a simple PL/pgSQL function that distributes the rows for you. Something like this:
CREATE OR REPLACE FUNCTION public.spike()
RETURNS void AS
$BODY$
DECLARE
user_id integer;
destination_schema text;
BEGIN
FOR user_id IN SELECT userid FROM master_table GROUP BY userid LOOP
CASE user_id
WHEN 1 THEN
destination_schema := 'foo';
WHEN 2 THEN
destination_schema := 'bar';
ELSE
destination_schema := 'baz';
END CASE;
-- %I quotes the schema name as an identifier; $1 is bound through USING
EXECUTE format('INSERT INTO %I.my_table SELECT * FROM master_table WHERE userid = $1', destination_schema) USING user_id;
-- EXECUTE 'DELETE FROM master_table WHERE userid=$1' USING user_id;
END LOOP;
TRUNCATE TABLE master_table;
RETURN;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
This gets all distinct userid values from master_table, uses a CASE statement to determine the destination schema, executes an INSERT ... SELECT to copy that user's rows into the schema, and finally truncates master_table once every user has been processed.
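For completeness, the load step that precedes the function call might look like the sketch below. The file path is a placeholder, and master_table is assumed to have the same column layout as the CSV (including a header row); neither detail comes from the question.

```sql
-- Server-side load: the file must be readable by the PostgreSQL server process.
-- '/path/to/data.csv' is a hypothetical path.
COPY master_table FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);

-- Distribute the staged rows into the per-user schemas, then empty the staging table.
SELECT public.spike();
```

If the file lives on the client machine rather than on the server, psql's \copy meta-command accepts the same options and reads the file locally.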

Related

postgresql for loop script in text form can not be executed

I am trying to write a function in PostgreSQL that creates a temp table with columns table_name text and table_rec jsonb and fills it, through a for loop, with table names taken from my table of table names and with their records as JSON. I have the for loop in a string and I want to execute it, but it doesn't work.
I have the variables rec record, sql_query text and tab_name text, and I want to do this:
CREATE OR REPLACE FUNCTION public.test51(
)
RETURNS TABLE(tabel_name text, record_json jsonb)
LANGUAGE 'plpgsql'
COST 100
VOLATILE
ROWS 1000
AS $BODY$
declare
rec record;
tabel_name text;
tabel_names text[];
counter integer := 1;
sql_query text;
limit_for_sending integer;
rec_count integer;
begin
select into tabel_names array(select "TABLE_NAME" from public."TABLES");
create temp table temp_tab(tab_nam text, recik jsonb);
while array_length(tabel_names, 1) >= counter loop
tabel_name := '"' || tabel_names[counter] || '"';
select into limit_for_sending "TABLE_LIMIT_FOR_SENDING_DATA" from public."TABLES" where "TABLE_NAME" = tabel_name;
sql_query := 'select count(*) from public.' || tabel_name;
execute sql_query into rec_count;
if (rec_count >= limit_for_sending and limit_for_sending is not null) then
sql_query := 'for rec in select * from public.' || tabel_name || '
loop
insert into temp_tab
select ' || tabel_name || ', to_jsonb(rec);
end loop';
execute sql_query;
end if;
counter := counter + 1;
end loop;
return query
select * from temp_tabik;
drop table temp_tabik;
end;
$BODY$;
Thank you for response.
It seems you have some table that contains the information for which tables you want to return all rows as JSONB. And that meta-table also contains a column that sets a threshold under which the rows should not be returned.
You don't need the temp table or an array to store the table names. You can iterate through the query on the TABLES table and run the dynamic SQL directly in that loop.
return query in PL/pgSQL doesn't terminate the function; it just appends the result of the query to the result of the function.
Dynamic SQL is best created using the format() function because it is easier to read, and the %I placeholder properly deals with quoted identifiers (which is really important, as you are using those dreaded upper case table names).
As far as I can tell, your function can be simplified to:
CREATE OR REPLACE FUNCTION public.test51()
RETURNS TABLE(tabel_name text, record_json jsonb)
LANGUAGE plpgsql
AS
$BODY$
declare
rec record;
sql_query text;
rec_count bigint;
begin
for rec in
select "TABLE_NAME" as table_name, "TABLE_LIMIT_FOR_SENDING_DATA" as rec_limit
from public."TABLES"
loop
if rec.rec_limit is not null then
execute format('select count(*) from %I', rec.table_name)
into rec_count;
end if;
if (rec.rec_limit is not null and rec_count >= rec.rec_limit) then
sql_query := format('select %L, to_jsonb(t) from %I as t', rec.table_name, rec.table_name);
return query execute sql_query;
end if;
end loop;
end;
$BODY$;
Some notes:
The language name is an identifier and should not be enclosed in single quotes. That syntax is deprecated and might be removed in a future version, so don't get used to it.
You should really avoid those dreaded quoted identifiers. They are much more trouble than they are worth. See the Postgres wiki for details.
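To make the difference between the two placeholders concrete, here is a small sketch that can be run in any session. The table name Orders is hypothetical and no table needs to exist, since format() only builds a string.

```sql
-- %I formats its argument as an identifier and adds double quotes only when needed
SELECT format('select count(*) from %I', 'Orders');   -- select count(*) from "Orders"
SELECT format('select count(*) from %I', 'orders');   -- select count(*) from orders

-- %L formats its argument as a string literal
SELECT format('select %L, to_jsonb(t) from %I as t', 'Orders', 'Orders');
-- select 'Orders', to_jsonb(t) from "Orders" as t
```

This is why the value read from "TABLE_NAME" should be stored without surrounding double quotes: %I adds them when the name requires it.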

Using a variable on a PostgreSQL function to drop a schema

I'm trying to create a function in PostgreSQL, and I'm having trouble using a local variable. Here's my code:
DECLARE query RECORD;
DECLARE schema_name TEXT;
BEGIN
FOR query IN SELECT * FROM context WHERE created_at + make_interval(days => duration) <= CURRENT_TIMESTAMP LOOP
SELECT lower(quote_ident(query.title)) INTO schema_name;
DROP SCHEMA schema_name CASCADE;
DELETE FROM context WHERE id = query.id;
END LOOP;
RETURN 1;
END;
$$ LANGUAGE plpgsql;
The select and delete queries work fine, and I've made a test returning the value of schema_name variable, and it's OK.
My problem is with this line :
DROP SCHEMA schema_name CASCADE;
I get an error as "the schema 'schema_name' doesn't exist".
I'd really appreciate any ideas for how to use this variable to do the drop query.
You need dynamic SQL for this:
DECLARE
query RECORD;
BEGIN
FOR query IN SELECT id, lower(title) as title
FROM context
WHERE created_at + make_interval(days => duration) <= CURRENT_TIMESTAMP
LOOP
execute format('DROP SCHEMA %I CASCADE', query.title);
DELETE FROM context WHERE id = query.id;
END LOOP;
RETURN 1;
END;
$$ LANGUAGE plpgsql;
I also removed the unnecessary SELECT statement that made the title lower case; this is better done directly in the query.
Also: variable assignment is faster with := than with SELECT, so:
schema_name := lower(quote_ident(query.title));
would be better if the variable were needed.
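The static statement fails because PL/pgSQL substitutes variables only where a data value is allowed, never where an identifier is expected, so DROP SCHEMA schema_name looks for a schema literally called schema_name. A minimal sketch that can be tried in isolation follows; the schema name 'Expired Demo' is made up for illustration.

```sql
DO $$
DECLARE
    schema_name text := 'Expired Demo';  -- hypothetical name containing a space and upper case
BEGIN
    -- Create a scratch schema so the DROP below has something to remove.
    EXECUTE format('CREATE SCHEMA IF NOT EXISTS %I', schema_name);

    -- The identifier is spliced into the command text and quoted by %I.
    EXECUTE format('DROP SCHEMA %I CASCADE', schema_name);
END
$$;
```

Because %I already quotes the name when necessary, there is no need to call quote_ident() on the value before passing it to format().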

Postgresql query across different tables with dynamic query

I'm trying to get a customer id which can be placed in one of ten different tables. I don't want to hard code those table names to find it so I tried postgresql function as follows.
create or replace FUNCTION test() RETURNS SETOF RECORD AS $$
DECLARE
rec record;
BEGIN
select id from schema.table_0201_0228 limit 1 into rec;
return next rec;
select id from schema.table_0301_0331 limit 1 into rec;
return next rec;
END $$ language plpgsql;
select * from test() as (id int)
As I'm not familiar with PostgreSQL functions, how can I improve the code to replace 'schema.table1' with a variable, loop over each table and return the result?
NOTE: table names may change over time. For example, table_0201_0228 and table_0301_0331 are for February and March respectively.
You need dynamic SQL for that:
create or replace FUNCTION test(p_schema text)
RETURNS table(id int)
AS $$
DECLARE
l_tab record;
l_sql text;
BEGIN
for l_tab in (select schemaname, tablename
from pg_tables
where schemaname = p_schema)
loop
l_sql := format('select id from %I.%I limit 1', l_tab.schemaname, l_tab.tablename);
return query execute l_sql;
end loop;
END $$
language plpgsql;
I made the schema name a parameter, but of course you can hard-code it. As the function is defined as returns table, there is no need to specify a column definition list when calling it:
select *
from test('some_schema');
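The function above visits every table in the schema, so it raises an error as soon as it reaches a table without an id column. If the monthly tables share a common name prefix, as table_0201_0228 and table_0301_0331 suggest, one possible variant restricts the loop to that prefix. The function name test_by_prefix and the parameter p_prefix are hypothetical, and the sketch assumes id is an integer column in every matching table.

```sql
create or replace function test_by_prefix(p_schema text, p_prefix text)
  returns table(id int)
as $$
declare
  l_tab record;
begin
  for l_tab in
    select schemaname, tablename
    from pg_tables
    where schemaname = p_schema
      -- keep only tables whose name starts with the given prefix
      and left(tablename::text, length(p_prefix)) = p_prefix
    order by tablename
  loop
    -- append one row per matching table to the function result
    return query execute
      format('select id from %I.%I limit 1', l_tab.schemaname, l_tab.tablename);
  end loop;
end
$$ language plpgsql;

select * from test_by_prefix('some_schema', 'table_');
```

Comparing with left() instead of LIKE avoids having to escape the underscore, which LIKE treats as a single-character wildcard.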

Best way to fill array from SELECT

I am creating a function to delete a product category from a database. First it SELECTs all child tables that inherit from the table to be deleted from, and will return JSON object of table names if child categories depend on it. What's the most efficient way to fill array with values from SELECT query?
CREATE OR REPLACE FUNCTION delete_category(catid INT)
RETURNS json AS $$
DECLARE
depend "catalog";
dependlist "catalog"[];
BEGIN
FOR depend IN SELECT * FROM catalog LOOP
dependlist:=dependlist || depend;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Appending with the || operator is a slow way to do this; on some older PostgreSQL versions it is terribly slow. There are two alternatives: array_agg, as already mentioned, or the ARRAY(subselect) constructor. In both, the name in the select list must be a table alias (c here) so that it denotes the whole row, and it must not clash with a PL/pgSQL variable of the same name:
dependlist := ARRAY(SELECT c FROM catalog AS c);
or
dependlist := (SELECT array_agg(c) FROM catalog AS c);
or
SELECT array_agg(c) FROM catalog AS c INTO dependlist;
Performance should be the same for all three.
I used the array_agg() function, and this worked fine. The only thing I am unsure of is why I needed to add depend after the table name in FROM.
CREATE OR REPLACE FUNCTION delete_category(catid INT)
RETURNS jsonb AS $$
DECLARE
--depend "catalog";
dependlist "catalog"[];
cnt INT;
BEGIN
--Check no other tables dependent
SELECT array_agg(depend) INTO dependlist FROM catalog depend
WHERE inherit_from=catid;
IF array_length(dependlist,1) IS NOT NULL THEN
RETURN jsonb_build_object('error',
'children','tables',to_jsonb(dependlist));
END IF;
--Check table is empty of products
SELECT COUNT(*) INTO cnt FROM (SELECT tablename
FROM catalog WHERE catalogid=catid) AS tname;
RETURN jsonb_build_object('error','none');
END;
$$ LANGUAGE plpgsql;
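On the open question about depend after the table name: there it is a table alias, and naming a table alias in the select list yields the whole row as a single composite value, which is what array_agg needs in order to build a catalog[] array. A self-contained sketch follows; catalog_demo and its rows are invented for illustration and only mimic the columns used above.

```sql
-- Hypothetical stand-in for the catalog table
CREATE TEMP TABLE catalog_demo (catalogid int, inherit_from int, tablename text);
INSERT INTO catalog_demo VALUES
    (1, NULL, 'root'),
    (2, 1,    'child_a'),
    (3, 1,    'child_b');

-- "d" is a table alias; referencing it in the select list returns the whole row,
-- so the aggregate below produces a value of type catalog_demo[]
SELECT array_agg(d ORDER BY d.catalogid) AS children
FROM catalog_demo AS d
WHERE d.inherit_from = 1;
```

Inside a PL/pgSQL function the alias should not share its name with a declared variable, which is presumably why the depend declaration ended up commented out in the working version.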

How to select from a variable that is a table name in PostgreSQL >= 9.2

I have a variable that holds the name of a table. How can I select from or update that table using the variable in a query? For example:
create or replace function pg_temp.testtst ()
returns varchar(255) as
$$
declare
r record; t_name name;
begin
for r in SELECT tablename FROM pg_tables WHERE schemaname = 'public' limit 100 loop
t_name = r.tablename;
update t_name set id = 10 where id = 15;
end loop;
return seq_name;
end;
$$
language plpgsql;
it shows
ERROR: relation "t_name" does not exist
The correct reply is a comment from Anton Kovalenko.
You can never use a variable as a table or column name in embedded SQL:
UPDATE dynamic_table_name SET ....
PostgreSQL uses prepared, cached plans for embedded SQL, and references to the target objects (tables) are hard-coded deep inside those plans. The characteristics of a table have a significant impact on the plan: an index may be usable for one table and not for another. Query planning is relatively slow, so PostgreSQL does not re-plan transparently (with a few exceptions).
You should use dynamic SQL; situations like this are one of the things it exists for. A new SQL string is generated every time, and its plan is not saved.
DO $$
DECLARE r record;
BEGIN
FOR r IN SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
LOOP
EXECUTE format('UPDATE %I SET id = 10 WHERE id = 15', r.table_name);
END LOOP;
END $$;
Attention: dynamic SQL is unsafe (there is a risk of SQL injection) unless the parameters are sanitized. I used the format function for that. Another way is to use the quote_ident function:
EXECUTE 'UPDATE ' || quote_ident(r.table_name) || ' SET ...
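Identifiers and data values are best handled differently: identifiers go through %I (or quote_ident), while values can be passed as parameters with USING so they never become part of the SQL text. The sketch below combines both. It is an illustration rather than the answer's code: it assumes the update should touch only base tables in the public schema that have an integer column named id.

```sql
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN
        SELECT c.table_schema, c.table_name
        FROM information_schema.columns AS c
        JOIN information_schema.tables AS t
          ON t.table_schema = c.table_schema
         AND t.table_name = c.table_name
        WHERE c.table_schema = 'public'
          AND c.column_name = 'id'
          AND c.data_type = 'integer'
          AND t.table_type = 'BASE TABLE'
    LOOP
        -- %I quotes the identifiers; $1 and $2 are bound as values by USING
        EXECUTE format('UPDATE %I.%I SET id = $1 WHERE id = $2',
                       r.table_schema, r.table_name)
        USING 10, 15;
    END LOOP;
END
$$;
```

Filtering on the column name up front keeps the loop from failing on tables that have no id column at all.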