PostgreSQL pg_dump/COPY - postgresql

I have a requirement to dump the contents of a definable selection of tables as CSVs for an initial load of systems that are not able to connect to PostgreSQL for various reasons.
I have written a script to do this which runs through a list of tables using psql with the -c flag to run psql's \COPY command to dump the corresponding table to a file like this:
COPY table_name TO table_name.csv WITH (FORMAT 'csv', HEADER, QUOTE '\"', DELIMITER '|');
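In outline, the script is just a loop of this shape (a sketch; tables.txt and the connection options are stand-ins for the real ones):
while read -r t; do
    psql -d mydb -c "\\copy $t TO '${t}.csv' WITH (FORMAT csv, HEADER, QUOTE '\"', DELIMITER '|')"
done < tables.txt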
It works fine. But I am sure you have already spotted the problem: as the process takes ~57 minutes for ~60-odd tables, the likelihood of consistency is quite close to absolute zero.
I had a think about it and suspected I could make a few lightweight changes to pg_dump to do what I want, i.e., create multiple CSVs from pg_dump while having some hope of integrity between the tables, and being able to specify parallel dumps too.
I have added a few flags to allow me to apply a file postfix (the date), set the format options and pass in a path for the relevant output file.
However my modified pg_dump was failing when writing to a file, like:
COPY table_name (pkey_id, field1, field2 ... fieldn) TO table_name.csv WITH (FORMAT 'csv', HEADER, QUOTE '"', DELIMITER '|')
Note: Within pg_dump, the column list is expanded
So I cast around for further information and found these COPY Tips.
It looks like writing to a file is a no-no over the network; however I am on the same machine (for now). I felt writing to /tmp would be OK as it is writable by anyone.
So I tried cheating with:
seingramp#seluonkeydb01:~$ ./tp_dump -a -t table_name -D /tmp/ -k "FORMAT 'csv', HEADER, QUOTE '\"', DELIMITER '|'" -K "_$DATE_POSTFIX"
tp_dump: warning: there are circular foreign-key constraints on this table:
tp_dump: table_name
tp_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
tp_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
--
-- PostgreSQL database dump
--
-- Dumped from database version 12.3
-- Dumped by pg_dump version 14devel
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;
--
-- Data for Name: material_master; Type: TABLE DATA; Schema: mm; Owner: postgres
--
COPY table_name (pkey_id, field1, field2 ... fieldn) FROM stdin;
tp_dump: error: query failed:
tp_dump: error: query was: COPY table_name (pkey_id, field1, field2 ... fieldn) TO PROGRAM 'gzip > /tmp/table_name_20200814.csv.gz' WITH (FORMAT 'csv', HEADER, QUOTE '"', DELIMITER '|')
I have neutered the data as it is customer specific.
I didn't find pg_dump's error message very helpful; do you have any ideas as to what I am doing wrong?
The changes really are quite small (excuse the code!) starting ~line 1900, ignoring the flags added around getopt().
    /*
     * Use COPY (SELECT ...) TO when dumping a foreign table's data, and when
     * a filter condition was specified. For other cases a simple COPY
     * suffices.
     */
    if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
    {
        /* Note: this syntax is only supported in 8.2 and up */
        appendPQExpBufferStr(q, "COPY (SELECT ");
        /* klugery to get rid of parens in column list */
        if (strlen(column_list) > 2)
        {
            appendPQExpBufferStr(q, column_list + 1);
            q->data[q->len - 1] = ' ';
        }
        else
            appendPQExpBufferStr(q, "* ");

        if (copy_from_spec)
        {
            if (copy_from_postfix)
            {
                appendPQExpBuffer(q, "FROM %s %s) TO PROGRAM 'gzip > %s%s%s.csv.gz' WITH (%s)",
                                  fmtQualifiedDumpable(tbinfo),
                                  tdinfo->filtercond ? tdinfo->filtercond : "",
                                  copy_from_dest ? copy_from_dest : "",
                                  fmtQualifiedDumpable(tbinfo),
                                  copy_from_postfix,
                                  copy_from_spec);
            }
            else
            {
                appendPQExpBuffer(q, "FROM %s %s) TO PROGRAM 'gzip > %s%s.csv.gz' WITH (%s)",
                                  fmtQualifiedDumpable(tbinfo),
                                  tdinfo->filtercond ? tdinfo->filtercond : "",
                                  copy_from_dest ? copy_from_dest : "",
                                  fmtQualifiedDumpable(tbinfo),
                                  copy_from_spec);
            }
        }
        else
        {
            appendPQExpBuffer(q, "FROM %s %s) TO stdout;",
                              fmtQualifiedDumpable(tbinfo),
                              tdinfo->filtercond ? tdinfo->filtercond : "");
        }
    }
    else
    {
        if (copy_from_spec)
        {
            if (copy_from_postfix)
            {
                appendPQExpBuffer(q, "COPY %s %s TO PROGRAM 'gzip > %s%s%s.csv.gz' WITH (%s)",
                                  fmtQualifiedDumpable(tbinfo),
                                  column_list,
                                  copy_from_dest ? copy_from_dest : "",
                                  fmtQualifiedDumpable(tbinfo),
                                  copy_from_postfix,
                                  copy_from_spec);
            }
            else
            {
                appendPQExpBuffer(q, "COPY %s %s TO PROGRAM 'gzip > %s%s.csv.gz' WITH (%s)",
                                  fmtQualifiedDumpable(tbinfo),
                                  column_list,
                                  copy_from_dest ? copy_from_dest : "",
                                  fmtQualifiedDumpable(tbinfo),
                                  copy_from_spec);
            }
        }
        else
        {
            appendPQExpBuffer(q, "COPY %s %s TO stdout;",
                              fmtQualifiedDumpable(tbinfo),
                              column_list);
}
I tried a couple of other cheats too, like specifying a directory owned by postgres. I know it's a quick hack but I hope you can help, and thanks for looking.

This is a use case for pg_restore -f.
So:
-- Create custom format dump file
pg_dump -d some_db -U some_user -Fc -f dump.out
-- Move that file to where you need it
-- Dump data only from named table to a file from the dump file.
pg_restore -a -t table_1 -f table_1_data.sql dump.out
pg_dump will create a consistent snapshot of the tables, so you have the database in a 'frozen' state in dump.out. Then you can use pg_restore to 'thaw out' those parts you need on your own schedule. By using -a you will get the COPY output you want.
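If you need one file per table, you can run pg_restore repeatedly against the same dump file, for example (a sketch; the table list stands in for whatever selection you need):
pg_dump -d some_db -U some_user -Fc -f dump.out
for t in table_1 table_2 table_3; do
    pg_restore -a -t "$t" -f "${t}_data.sql" dump.out
done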

Related

Postgres - limit number of rows COPY FROM

Is there a way to limit the Postgres COPY FROM syntax to only the first row? There doesn't seem to be an option listed in the documentation.
I know there's that functionality in SQL Server, see the FIRSTROW and LASTROW options below:
BULK INSERT sometable
FROM 'E:\filefromabove.txt'
WITH
(
FIRSTROW = 2,
LASTROW = 4,
FIELDTERMINATOR= '|',
ROWTERMINATOR = '\n'
)
You could use the PROGRAM option and have COPY read from the standard output of a program that preprocesses the file.
To load only the first line, use:
Unix/Linux/Mac
COPY sometable from PROGRAM 'head -1 filefromabove.txt' ;
Windows
COPY sometable from PROGRAM 'set /p var= <filefromabove.txt && echo %var%' ;
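If you want something closer to the FIRSTROW/LASTROW range from the SQL Server example, the same trick extends to a row range on Unix-like systems. A sketch (the psql wrapper and the row numbers are only illustrative): tail -n +2 skips the header line and head -n 3 then keeps rows 2 through 4 of the file.
psql -d mydb -c "COPY sometable FROM PROGRAM 'tail -n +2 filefromabove.txt | head -n 3'"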

How to skip empty line in psql \COPY in PostgreSQL

In PostgreSQL's psql, how can I make the \copy command ignore empty lines in the input file?
Here is the code to reproduce it:
create table t1(
n1 int
);
echo "1
2
" > m.csv
psql> \copy t1(n1) FROM 'm.csv' (delimiter E'\t', NULL 'NULL', FORMAT CSV, HEADER false);
ERROR: invalid input syntax for integer: ""
CONTEXT: COPY t1, line 3, column n1: ""
There is an empty line in file m.csv
cat m.csv
1
2
<< empty line
PostgreSQL's COPY is very strict, so there is no way to run COPY in a tolerant mode. If you can, use COPY FROM PROGRAM instead:
[pavel#nemesis ~]$ cat ~/data.csv
10,20,30
40,50,60
70,80,90
psql -c "\copy f from program ' sed ''/^\s*$/d'' ~/data.csv ' csv" postgres

Convert Tables in Postgresql to Shapefile

So far I have loaded all the parcel tables (with geometry information) in Alaska into PostgreSQL. The tables were originally stored in dump format. Now, I want to convert each table in Postgres to a shapefile through the cmd interface using ogr2ogr.
My code is something like below:
ogr2ogr -f "ESRI Shapefile" "G:\...\Projects\Dataset\Parcel\test.shp" PG:"dbname=parceldb host=localhost port=5432 user=postgres password=postgres" -sql "SELECT * FROM ak_fairbanks"
However, the system kept returning this error:
Unable to open datasource PG:dbname='parceldb' host='localhost' port='5432' user='postgres' password='postgres' with the following drivers.
There is also the pgsql2shp option. For this you need to have that utility installed on your system.
The command to use for this conversion is:
pgsql2shp -u <username> -h <hostname> -P <password> -p 5434 -f <file path to save shape file> <database> [<schema>.]<table_name>
This command has other options as well, which can be seen at this link.
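With the connection details from the question, the call would look something like this (a sketch; the output path is abbreviated and the port and credentials are the ones given in the question):
pgsql2shp -u postgres -h localhost -P postgres -p 5432 -f "G:\...\Projects\Dataset\Parcel\ak_fairbanks" parceldb ak_fairbanks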
Exploring this case based on the comments in another answer, I decided to share my Bash scripts and my ideas.
Exporting multiple tables
To export many tables from a specific schema, I use the following script.
#!/bin/bash
source ./pgconfig
export PGPASSWORD=$password

# if you want a filter, set the table names in the FILTER variable below and remove the leading # to uncomment it.
# FILTER=("table_name_a" "table_name_b" "table_name_c")
#
# Set the output directory
OUTPUT_DATA="/tmp/pgsql2shp/$database"
#
# Remove the Shapefiles after ZIP
RM_SHP="yes"

# Define where pgsql2shp is and format the base command
PG_BIN="/usr/bin"
PG_CON="-d $database -U $user -h $host -p $port"

# creating output directory to put files
mkdir -p "$OUTPUT_DATA"

SQL_TABLES="select table_name from information_schema.tables where table_schema = '$schema'"
SQL_TABLES="$SQL_TABLES and table_type = 'BASE TABLE' and table_name != 'spatial_ref_sys';"
TABLES=($($PG_BIN/psql $PG_CON -t -c "$SQL_TABLES"))

export_shp(){
    SQL="$1"
    TB="$2"
    pgsql2shp -f "$OUTPUT_DATA/$TB" -h $host -p $port -u $user $database "$SQL"
    zip -j "$OUTPUT_DATA/$TB.zip" "$OUTPUT_DATA/$TB.shp" "$OUTPUT_DATA/$TB.shx" "$OUTPUT_DATA/$TB.prj" "$OUTPUT_DATA/$TB.dbf" "$OUTPUT_DATA/$TB.cpg"
}

for TABLE in ${TABLES[@]}
do
    DATA_QUERY="SELECT * FROM $schema.$TABLE"
    SHP_NAME="$TABLE"

    if [[ ${#FILTER[@]} -gt 0 ]]; then
        echo "Has filter by table name"
        if [[ " ${FILTER[@]} " =~ " ${TABLE} " ]]; then
            export_shp "$DATA_QUERY" "$SHP_NAME"
        fi
    else
        export_shp "$DATA_QUERY" "$SHP_NAME"
    fi

    # remove intermediate files
    if [[ "$RM_SHP" = "yes" ]]; then
        rm -f $OUTPUT_DATA/$SHP_NAME.{shp,shx,prj,dbf,cpg}
    fi
done
Splitting data into multiple files
To work around the problem with large tables, where pgsql2shp does not manage to write all the data to the shapefile, we can split the data using a paging strategy. In Postgres we can use LIMIT, OFFSET and ORDER BY for paging.
This method assumes that your table has a primary key, which my example script uses to sort the data.
#!/bin/bash
source ./pgconfig
export PGPASSWORD=$password

# if you want a filter, set the table names in the FILTER variable below and remove the leading # to uncomment it.
# FILTER=("table_name_a" "table_name_b" "table_name_c")
#
# Set the output directory
OUTPUT_DATA="/tmp/pgsql2shp/$database"
#
# Remove the Shapefiles after ZIP
RM_SHP="yes"

# Define where pgsql2shp is and format the base command
PG_BIN="/usr/bin"
PG_CON="-d $database -U $user -h $host -p $port"

# creating output directory to put files
mkdir -p "$OUTPUT_DATA"

SQL_TABLES="select table_name from information_schema.tables where table_schema = '$schema'"
SQL_TABLES="$SQL_TABLES and table_type = 'BASE TABLE' and table_name != 'spatial_ref_sys';"
TABLES=($($PG_BIN/psql $PG_CON -t -c "$SQL_TABLES"))

export_shp(){
    SQL="$1"
    TB="$2"
    pgsql2shp -f "$OUTPUT_DATA/$TB" -h $host -p $port -u $user $database "$SQL"
    zip -j "$OUTPUT_DATA/$TB.zip" "$OUTPUT_DATA/$TB.shp" "$OUTPUT_DATA/$TB.shx" "$OUTPUT_DATA/$TB.prj" "$OUTPUT_DATA/$TB.dbf" "$OUTPUT_DATA/$TB.cpg"
}

for TABLE in ${TABLES[@]}
do
    # primary key of the current table, used to order the pages
    GET_PK="SELECT a.attname "
    GET_PK="${GET_PK}FROM pg_index i "
    GET_PK="${GET_PK}JOIN pg_attribute a ON a.attrelid = i.indrelid AND a.attnum = ANY(i.indkey) "
    GET_PK="${GET_PK}WHERE i.indrelid = '$schema.$TABLE'::regclass AND i.indisprimary"
    PK=($($PG_BIN/psql $PG_CON -t -c "$GET_PK"))

    MAX_ROWS=($($PG_BIN/psql $PG_CON -t -c "SELECT COUNT(*) FROM $schema.$TABLE"))
    LIMIT=10000
    OFFSET=0

    # base query
    DATA_QUERY="SELECT * FROM $schema.$TABLE"

    # continue until all data are fetched.
    while [ $OFFSET -le $MAX_ROWS ]
    do
        DATA_QUERY_P="$DATA_QUERY ORDER BY $PK OFFSET $OFFSET LIMIT $LIMIT"
        OFFSET=$(( OFFSET+LIMIT ))
        SHP_NAME="${TABLE}_${OFFSET}"

        if [[ ${#FILTER[@]} -gt 0 ]]; then
            echo "Has filter by table name"
            if [[ " ${FILTER[@]} " =~ " ${TABLE} " ]]; then
                export_shp "$DATA_QUERY_P" "$SHP_NAME"
            fi
        else
            export_shp "$DATA_QUERY_P" "$SHP_NAME"
        fi

        # remove intermediate files
        if [[ "$RM_SHP" = "yes" ]]; then
            rm -f $OUTPUT_DATA/$SHP_NAME.{shp,shx,prj,dbf,cpg}
        fi
    done
done
Common config file
Configuration file for PostgreSQL connection used in both examples (pgconfig):
user="postgres"
host="my_ip_or_hostname"
port="5432"
database="my_database"
schema="my_schema"
password="secret"
Another strategy is to choose GeoPackage as the output format: it supports larger file sizes than the shapefile format, maintains portability across operating systems, and has sufficient support in GIS software.
ogr2ogr -f GPKG output_file.gpkg PG:"host=my_ip_or_hostname user=postgres dbname=my_database password=secret" -sql "SELECT * FROM my_schema.my_table"
References:
Retrieve primary key columns - Postgres
LIMIT, OFFSET, ORDER BY and Pagination in PostgreSQL
ogr2ogr - GDAL

Conditionally construct schema depending on database version in PostgreSQL

Assume the following table and custom range type:
create table booking (
identifier integer not null primary key,
room uuid not null,
start_time time without time zone not null,
end_time time without time zone not null
);
create type timerange as range (subtype = time);
In PostgreSQL v10, you can do:
alter table booking add constraint overlapping_times
exclude using gist
(
room with =,
timerange(start_time, end_time) with &&
);
In PostgreSQL v9.5/v9.6, you have to manually cast the uuid column, as btree_gist does not support uuid:
alter table booking add constraint overlapping_times
exclude using gist
(
(room::text) with =,
timerange(start_time, end_time) with &&
);
I would like to support v9.5, v9.6 and v10 for my customers. Is there a way to conditionally add the above constraint in the same .sql file, depending on the version of the current database?
You can use dynamic SQL.
For example:
do
$block$
declare
    l_version text;
begin
    select setting into l_version from pg_settings where name = 'server_version';
    execute
        format(
            $script$
            alter table booking add constraint overlapping_times
                exclude using gist
                (
                    %s with =,
                    timerange(start_time, end_time) with &&
                )
            $script$,
            case when (l_version like '9.5.%' or l_version like '9.6.%') then '(room::text)' else 'room' end
        );
end;
$block$
language plpgsql;
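A variant of the same idea is to branch on the server_version_num setting, which can be compared numerically instead of pattern-matching the version string. A sketch wrapped in a psql call (the database name is a placeholder):
psql -d some_db <<'SQL'
do
$block$
begin
    if current_setting('server_version_num')::int >= 100000 then
        -- v10 and later: the uuid column can be used directly
        execute 'alter table booking add constraint overlapping_times
                 exclude using gist (room with =, timerange(start_time, end_time) with &&)';
    else
        -- 9.5 / 9.6: cast the uuid column to text
        execute 'alter table booking add constraint overlapping_times
                 exclude using gist ((room::text) with =, timerange(start_time, end_time) with &&)';
    end if;
end;
$block$
language plpgsql;
SQL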
Here is a proof of concept:
#!/bin/sh
#THE_HOST="192.168.0.104"
THE_HOST="192.168.0.101"
get_version ()
{
psql -t -h ${THE_HOST} -U postgres postgres <<OMG | awk -e '{ print $2; }'
select version();
OMG
}
# ############################################################################
# main
#
# - Connect to database to retrieve version
# - use the retrieved version to create a symlink "versioned" to
# one of our subdirs
# - call an sql script that includes this symlink/some.sql
# symlinking to a non-existing directory will cause a dead link
# (, and the script to fail.)
# ergo: there should be a subdir "verX.Y.Z" for every supported version X.Y.Z
# ############################################################################
pg_version=`get_version`
#echo "version=${pg_version}"
rm versioned
ln -fs "ver${pg_version}" versioned
if [ -f versioned/alter.sql ]; then
    echo created link versioned to "ver${pg_version}"
else
    echo "version ${pg_version} not supported today..."
    echo "Failed!"
    exit 1
fi
psql -t -h ${THE_HOST} -U postgres postgres <<LETS_GO
\i common/create.sql
\i versioned/alter.sql
\echo done!
LETS_GO
#eof

PostgreSQL, perl and dojo special character issue (æ,ø and å)

I have a webpage made in Perl and Dojo using a PostgreSQL database. I have to search for available people in the database, and since I'm from Denmark the letters æ, ø and å have to be usable in the search. I thought this was standard when using UTF-8, and since I normally program in PHP over MySQL I didn't think it would be that hard.
I have properly tried every trick I know to convert this search_word to the right encoding so I can search the PostgreSQL database for the correct names with æ, ø and å... but it still fails.
I have my Perl code making the fetch, but this fetch returns 0 rows, while when I insert the same command in the psql terminal I get 46 rows returned (I copy the STDERR statement from the "tail -f log" terminal and paste it into another terminal connected to the database through the psql command)... the Perl code is:
sub dbSearchPersons {
    my $search_word = escapeSql($_[0]);
    $search_word = Encode::decode_utf8($search_word);

    $statement = "SELECT id,name,initials,email FROM person WHERE name ilike '\%".$search_word."\%' OR email ilike '\%".$search_word."\%' OR initials ilike '\%".$search_word."\%' ORDER BY name ASC";
    $sth = $dbh->prepare($statement);
    $num_rows = $sth->execute();

    print STDERR "Statement: " . $statement;

    if($num_rows > 0){
        $persons = $dbh->selectall_hashref($statement,'id');
    }

    dbFinish($sth);
    webdie($DBI::errstr) if($DBI::errstr);
}
As you can see, I write the SQL statement to STDERR, which outputs the following:
[Fri Apr 27 11:24:26 2012] [error] [client 10.254.0.1] Statement: SELECT id,name,initials,email FROM person WHERE name ilike '%Jørgen%' OR email ilike '%Jørgen%' OR initials ilike '%Jørgen%' ORDER BY name ASC, referer: https://xx.xxx.xxx.xx/cgi-bin/users.cgi
The SQL is correctly written (as I can see from the terminal output above), and if I copy the statement from the terminal and paste it directly into the psql terminal, I get 46 rows returned as I should... but the Perl still won't return any rows.
I don't get it. When I format the string to display "ø" and not "Ã¸" (which is what Perl turns the UTF-8 encoding into, from the "J%C3%B8rgen" that gets sent through dojo.xhr.post), should I not be able to use it in an SQL statement? Is it because the PostgreSQL database can have a certain encoding that I have to take into account somehow? Or could it be something completely different?
Hope someone can help me. I have been struggling with this problem for two days now, and since things look like they should work but don't, I'm getting a little sad :/
Regards,
Thor Astrup Pedersen
You probably forgot pg_enable_utf8. The database interface will then return Perl character data to you.
$ createdb -e -E UTF-8 -l en_US.UTF-8 -T template0 so10349280
CREATE DATABASE so10349280 ENCODING 'UTF-8' TEMPLATE template0 LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8';
$ echo 'create table person (id int, name varchar, initials varchar, email varchar)'|psql so10349280
CREATE TABLE
$ echo "insert into person (id, name) values (1, 'Jørgensen')"|psql so10349280
INSERT 0 1
$ echo 'select * from person'|psql so10349280
id | name | initials | email
----+-----------+----------+-------
1 | Jørgensen | |
$ perl -Mutf8 -Mstrictures -MDBI -MDevel::Peek -E'
my $dbh = DBI->connect(
"DBI:Pg:dbname=so10349280", $ENV{LOGNAME}, "", { RaiseError => 1, AutoCommit => 1, pg_enable_utf8 => 1}
);
my $r = $dbh->selectall_hashref("select * from person where name = ?", "id", undef, "Jørgensen");
Dump $r->{1}{name};
'
SV = PV(0x836e20) at 0xa58dc8
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0xa5a000 "J\303\270rgensen"\0 [UTF8 "J\x{f8}rgensen"]
CUR = 10
LEN = 16
You don't say it quite clearly, but I think you eventually intend to send the character data out as JSON for use with Dojo. You need to encode it into UTF-8 octets; the various JSON libraries take care of that automatically for you, so there is no need to invoke Encode functions manually.
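For example, with the core JSON::PP module, encode_json already returns UTF-8 encoded octets that you can hand straight to Dojo (a sketch with a made-up record, assuming the data came back as Perl character strings via pg_enable_utf8):
$ perl -Mutf8 -MJSON::PP -E 'say encode_json({ name => "Jørgensen" })'
{"name":"Jørgensen"}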