Moving large data between PostgreSQL tables

I have two tables with two fields each in a PostgreSQL 12.4 database.
table_one (id serial, rec text) - 10 mil recs
table_two (id serial, rec jsonb)
I need to move data from table_one to table_two and transform the text into jsonb.
The following method worked fairly well until the number of records reached around 400k:
INSERT INTO table_two (rec) SELECT rec::jsonb FROM table_one;
I tried the following two COPY commands, but ran into syntax errors with both:
COPY (SELECT item_records::jsonb FROM mdm_impt_json_raw_text) TO mdm_impt_json_raw;
COPY mdm_impt_json_raw(item_records) FROM (SELECT item_records::jsonb FROM mdm_impt_json_raw_text);
Could someone either help me with the syntax errors in the COPY commands or suggest a better method for moving the data?
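A note on the errors: COPY ... FROM only reads from a file, a program, or STDIN, so a SELECT cannot appear in that position, and COPY (SELECT ...) TO needs a quoted file name or STDOUT rather than a table name. A corrected round trip through a (hypothetical) server-side file would look like:
COPY (SELECT rec::jsonb FROM table_one) TO '/tmp/recs.out';
COPY table_two (rec) FROM '/tmp/recs.out';
As for a better method: since this is PostgreSQL 12, one option is to batch the INSERT inside a procedure so each chunk commits on its own. A minimal sketch, assuming table_one.id has no huge gaps (the procedure name move_recs and the batch size are made up for illustration):
CREATE OR REPLACE PROCEDURE move_recs(_batch bigint DEFAULT 50000)
LANGUAGE plpgsql AS $$
DECLARE
_start bigint := 0;
_max bigint;
BEGIN
SELECT coalesce(max(id), 0) INTO _max FROM table_one;
WHILE _start < _max LOOP
INSERT INTO table_two (rec)
SELECT rec::jsonb FROM table_one
WHERE id > _start AND id <= _start + _batch;
COMMIT; -- procedures (PostgreSQL 11+) may commit, making each batch its own transaction
_start := _start + _batch;
END LOOP;
END
$$;
CALL move_recs();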

Related

Delete duplicate rows in several PostgreSQL tables

I have a Postgres database with several tables like table1, table2, table3 - more than 1000 tables.
I imported all of these tables with a script, and apparently the script had import issues.
Many tables have duplicate rows (all values exactly the same).
I am able to go into each table and delete the duplicate rows using DBeaver, but with over 1000 tables this is very time consuming.
Example of tables:
table1
name gender age
a m 20
a m 20
b f 21
b f 21
table2
fruit hobby
x running
x running
y stamp
y stamp
How can I do the following:
Identify tables in postgres with duplicate rows.
Delete all duplicate rows, leaving 1 record.
I need to do this on all 1000+ tables at once.
Since you want to automate deduplication across all tables, you need a PL/pgSQL function in which you can build dynamic queries.
Try this function:
create or replace function func_dedup(_schemaname varchar) returns void as
$$
declare
_rec record;
begin
for _rec in select table_name from information_schema.tables
            where table_schema = _schemaname
              and table_type = 'BASE TABLE' -- skip views, which cannot be truncated
loop
execute format('create temp table tab_temp as select distinct * from %I.%I', _schemaname, _rec.table_name);
execute format('truncate %I.%I', _schemaname, _rec.table_name);
execute format('insert into %I.%I select * from tab_temp', _schemaname, _rec.table_name);
execute 'drop table tab_temp';
end loop;
end;
$$
language plpgsql;
Now call your function like below:
select * from func_dedup('your_schema');
Steps:
Get the list of all tables in your schema with the query below, and loop over each table:
select table_name from information_schema.tables where table_schema = _schemaname
Insert all distinct records into a TEMP TABLE.
Truncate your main table.
Insert all rows from the TEMP TABLE back into the main table.
Drop the TEMP TABLE. (Dropping the temp table here is important, because it has to be recreated on the next loop iteration.)
Note: if your tables are very large, consider using a regular table instead of a TEMP TABLE (see the sketch below).
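A hedged sketch of that regular-table variant for a single table (my_schema.my_table and tab_scratch are hypothetical names; UNLOGGED skips WAL, which suits a throwaway scratch table):
-- one loop iteration using a regular (unlogged) scratch table instead of a TEMP table
CREATE UNLOGGED TABLE tab_scratch AS
SELECT DISTINCT * FROM my_schema.my_table;
TRUNCATE my_schema.my_table;
INSERT INTO my_schema.my_table
SELECT * FROM tab_scratch;
DROP TABLE tab_scratch;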

How does SELECT INTO work with SAS?

I'm new to SAS and I'm trying to port my code from Access VBA into SAS.
In Access I often use the SELECT INTO function, but it seems this function doesn't exist in SAS.
I have two tables, and each day I get new data; I want to update my table with the new lines. So I need to check whether new lines have appeared, and if so, insert those lines into the old table.
I tried some code from Stack Overflow and other results from Google, but I didn't find anything that works.
INSERT INTO OLD_TABLE T
VALUES (GRVID = VTGONR)
FROM NEW_TABLE V
WHERE not exists (SELECT V.VTGONR FROM NEW_TABLE V WHERE T.GRVID = V.VTGONR);
Not sure what the purpose of using the VALUES keyword is in your example. PROC SQL uses VALUES() to list static values. Like:
VALUES (100)
SAS just uses normal SQL syntax instead. See for example: https://www.techonthenet.com/sql/insert.php
To specify the observations to insert, just use SELECT. You can add a WHERE clause as part of the SELECT to limit the rows you insert. To tell INSERT which columns to fill, list them inside () after the table name; otherwise it expects the columns listed in the SELECT statement to match the order of the columns in the target table.
insert into old_table(GRVID)
select VTGONR from new_table
where VTGONR not in (select GRVID from old_table)
;
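In SAS this statement runs inside PROC SQL; a minimal sketch using the table and column names from the question:
proc sql;
insert into old_table (GRVID)
select VTGONR
from new_table
where VTGONR not in (select GRVID from old_table);
quit;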

Postgres Crosstab query - "duplicate category" error where 2 values have the first 62 characters in common

I'm a newbie working on a Postgres 9.5 (dynamic) crosstab query that has been working fine in general, but I've run into a peculiar issue with long, nearly identical category names, and I hope there's an easy solution/explanation.
Requires tablefunc:
CREATE EXTENSION IF NOT EXISTS tablefunc;
Schema:
CREATE TABLE temp_table (id integer, name text, data text);
INSERT INTO temp_table VALUES (1, 'ThisSentenceIsExactlySixtyTwoCharactersLongPlusNumbersAtTheEnd', 'data1');
INSERT INTO temp_table VALUES (2, 'ThisSentenceIsExactlySixtyTwoCharactersLongPlusNumbersAtTheEnd1', 'data2');
Query:
SELECT * FROM crosstab(
  $$SELECT id, name, data FROM temp_table ORDER BY 1, 2$$,
  $$SELECT DISTINCT name FROM temp_table$$
) AS ct (row integer, col_1 text, col_2 text);
Instead of the result I expect, I get:
ERROR: duplicate category name
SQL state: 42710
Can anyone please tell me what's going on here, and if there's a simple fix?
Thanks!
I'm guessing it has something to do with the fact that PostgreSQL truncates identifiers (including column names and, apparently, crosstab category names) to 63 bytes, i.e. NAMEDATALEN - 1 with the default NAMEDATALEN of 64. There may also be an off-by-one error somewhere in crosstab. Do your names need to be so long? Shortening them is probably the easiest fix. You could also try increasing NAMEDATALEN and recompiling Postgres.
https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
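If shortening the names in the source data isn't an option, a hedged workaround is to crosstab on a derived category key that stays unique within the length limit. The left(..., 20) || md5(...) scheme below is an arbitrary illustration, not anything crosstab requires:
SELECT * FROM crosstab(
  $$SELECT id, left(name, 20) || '_' || md5(name), data FROM temp_table ORDER BY 1, 2$$,
  $$SELECT DISTINCT left(name, 20) || '_' || md5(name) FROM temp_table$$
) AS ct (row integer, col_1 text, col_2 text);
The md5 suffix keeps the derived names distinct even when their first characters collide, and 20 + 1 + 32 characters is comfortably under the 63-byte limit.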

Create a temp table (if not exists) for use in a custom procedure

I'm trying to get the hang of using temp tables:
CREATE OR REPLACE FUNCTION test1(user_id BIGINT) RETURNS BIGINT AS
$BODY$
BEGIN
create temp table temp_table1
ON COMMIT DELETE ROWS
as SELECT table1.column1, table1.column2
FROM table1
INNER JOIN -- ............
if exists (select * from temp_table1) then
-- work with the result
return 777;
else
return 0;
end if;
END;
$BODY$
LANGUAGE plpgsql;
I want the rows of temp_table1 to be deleted immediately, or as soon as possible; that's why I added ON COMMIT DELETE ROWS. But on the next call in the same session I got the error:
ERROR: relation "temp_table1" already exists
I tried to add IF NOT EXISTS, but I couldn't get it to work; I simply couldn't find a working example that does what I'm looking for.
Your suggestions?
DROP the table each time before creating the TEMP table, as below:
BEGIN
DROP TABLE IF EXISTS temp_table1;
create temp table temp_table1
-- the rest of your code goes here
The problem with temp tables is that dropping and recreating them bloats pg_attribute heavily, so one sunny morning you will find database performance dead and pg_attribute at 200+ GB while your database is more like 10 GB.
We are very heavy on temp tables (>500 requests per second, with async I/O via Node.js) and experienced exactly this severe bloating of pg_attribute. All you are left with is very aggressive vacuuming, which halts performance.
The other answers given here do not solve this, because they all bloat pg_attribute heavily.
So the solution is, elegantly, this:
create temp table if not exists my_temp_table (description) on commit delete rows;
This way you can go on playing with temp tables while saving your pg_attribute.
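Applied to the function from the question, a minimal sketch might look like this (the column list is assumed, since the original SELECT is elided):
CREATE OR REPLACE FUNCTION test1(user_id BIGINT) RETURNS BIGINT AS
$BODY$
BEGIN
-- created once per session; ON COMMIT DELETE ROWS merely empties it at each
-- commit, so pg_attribute is not bloated by repeated drop/recreate
CREATE TEMP TABLE IF NOT EXISTS temp_table1 (
column1 integer, -- assumed types; match them to table1
column2 text
) ON COMMIT DELETE ROWS;
INSERT INTO temp_table1
SELECT table1.column1, table1.column2
FROM table1; -- the INNER JOIN from the question is elided here as well
IF EXISTS (SELECT 1 FROM temp_table1) THEN
-- work with the result
RETURN 777;
ELSE
RETURN 0;
END IF;
END;
$BODY$
LANGUAGE plpgsql;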
If you want the temp table DROPped at commit (not just its rows deleted), then:
begin
create temp table temp_table1
on commit drop
...
See the CREATE TABLE documentation for the ON COMMIT options.

How to find table creation time?

How can I find the table creation time in PostgreSQL?
For example: when I create a file, I can find the file's creation time. In the same way, I want to know when a table was created.
I had a look through the pg_* tables, and I couldn't find any creation times in there. It's possible to locate the table files, but on Linux you can't get a file's creation time. So I think the answer is that you can only find this information on Windows, using the following steps:
get the database id with select datname, datdba from pg_database;
get the table filenode id with select relname, relfilenode from pg_class;
find the table file and look up its creation time; I think the location should be something like <PostgreSQL folder>/main/base/<database id>/<table filenode id> (not sure what it is on Windows).
You can't - the information isn't recorded anywhere. And looking at the table files won't necessarily give you the right information: some table operations (for example TRUNCATE, CLUSTER, or VACUUM FULL) create a new file for the table, in which case the date would reset.
I don't think it's possible from within PostgreSQL, but you'll probably find it in the underlying table file's creation time.
Suggested here:
SELECT oid FROM pg_database WHERE datname = 'mydb';
Then (assuming the oid is 12345):
ls -l $PGDATA/base/12345/PG_VERSION
This workaround assumes that PG_VERSION is the file least likely to be modified after creation.
NB: If PGDATA is not defined, check Where does PostgreSQL store the database?
Check the data directory location:
SHOW data_directory;
Check the Postgres relation file path:
SELECT pg_relation_filepath('table_name');
This gives you the file path of your relation; check the creation time of the file at <data-dir>/<relation-file-path>.
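Both steps can be combined into one statement for convenience (reading data_directory typically requires superuser or pg_read_all_settings privileges):
SELECT current_setting('data_directory') || '/' || pg_relation_filepath('table_name') AS table_file;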
I tried a different approach to get the table creation date, which can help when keeping track of dynamically created tables. Suppose you have an inventory table in your database in which you save the creation date of tables.
CREATE TABLE inventory (id SERIAL, tablename CHARACTER VARYING (128), created_at DATE);
Then, when a table you want to keep track of is created, it's added to your inventory:
CREATE TABLE temp_table_1 (id SERIAL); -- A dynamic table is created
INSERT INTO inventory VALUES (1, 'temp_table_1', '2020-10-07 10:00:00'); -- We add it into the inventory
Then you can take advantage of pg_tables and run something like this to get the creation dates of existing tables:
SELECT pg_tables.tablename, inventory.created_at
FROM pg_tables
INNER JOIN inventory
ON pg_tables.tablename = inventory.tablename;
/*
tablename | created_at
--------------+------------
temp_table_1 | 2020-10-07
*/
For my use-case it is ok because I work with a set of dynamic tables that I need to keep track of.
P.S.: Replace inventory with whatever table name fits your database.
I followed a different way to obtain this.
Starting from this discussion, my solution was:
DROP TABLE IF EXISTS t_create_history CASCADE;
CREATE TABLE t_create_history (
gid serial primary key,
object_type varchar(20),
schema_name varchar(50),
object_identity varchar(200),
creation_date timestamp without time zone
);
--delete event trigger before dropping function
DROP EVENT TRIGGER IF EXISTS t_create_history_trigger;
--create history function
DROP FUNCTION IF EXISTS public.t_create_history_func();
CREATE OR REPLACE FUNCTION t_create_history_func()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
DECLARE
obj record;
BEGIN
FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands() WHERE command_tag in ('SELECT INTO','CREATE TABLE','CREATE TABLE AS')
LOOP
INSERT INTO public.t_create_history (object_type, schema_name, object_identity, creation_date) SELECT obj.object_type, obj.schema_name, obj.object_identity, now();
END LOOP;
END;
$$;
--ALTER EVENT TRIGGER t_create_history_trigger DISABLE;
--DROP EVENT TRIGGER t_create_history_trigger;
CREATE EVENT TRIGGER t_create_history_trigger ON ddl_command_end
WHEN TAG IN ('SELECT INTO','CREATE TABLE','CREATE TABLE AS')
EXECUTE PROCEDURE t_create_history_func();
In this way you obtain a table that records all table creations (from the moment the event trigger is installed).
The query below will help accomplish the desired result (tried on Greenplum; note that pg_stat_last_operation is a Greenplum catalog and is not available in stock PostgreSQL):
select pslo.stasubtype, pc.relname, pslo.statime
from pg_stat_last_operation pslo
join pg_class pc on (pc.relfilenode = pslo.objid)
and pslo.staactionname = 'CREATE'
order by pslo.statime desc;
You can get this from pg_stat_last_operation. Here is how to do it:
select * from pg_stat_last_operation where objid = 'table_name'::regclass order by statime;
This table stores the following operations:
select distinct staactionname from pg_stat_last_operation;
staactionname
---------------
ALTER
ANALYZE
CREATE
PARTITION
PRIVILEGE
VACUUM
(6 rows)