Convert Tables in PostgreSQL to Shapefile - postgresql

So far I have loaded all the parcel tables (with geometry information) in Alaska into PostgreSQL. The tables were originally stored in dump format. Now I want to convert each table in Postgres to a shapefile through the command-line interface using ogr2ogr.
My command looks something like this:
ogr2ogr -f "ESRI Shapefile" "G:\...\Projects\Dataset\Parcel\test.shp" PG:"dbname=parceldb host=localhost port=5432 user=postgres password=postgres" -sql "SELECT * FROM ak_fairbanks"
However, the system keeps returning this error:
Unable to open datasource
PG:dbname='parceldb' host='localhost' port='5432' user='postgres' password='postgres'
with the following drivers.
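One thing worth checking for that error is whether the GDAL build in use includes the PostgreSQL driver at all; listing the vector drivers and looking for a PostgreSQL entry is a quick first step:
ogrinfo --formats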

There is also the pgsql2shp option. For this you need to have the utility installed on your system.
The command to follow for this conversion is:
pgsql2shp -u <username> -h <hostname> -P <password> -p <port> -f <file path to save shape file> <database> [<schema>.]<table_name>
This command has other options as well, which can be seen at this link.
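For example, a sketch for the table from the question could look like the following (the output path and the public schema are assumptions; the credentials mirror the ogr2ogr command above):
pgsql2shp -u postgres -h localhost -P postgres -p 5432 -f C:\output\ak_fairbanks parceldb public.ak_fairbanks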

Exploring this case based on the comments in another answer, I decided to share my Bash scripts and my ideas.
Exporting multiple tables
To export many tables from a specific schema, I use the following script.
#!/bin/bash
source ./pgconfig
export PGPASSWORD=$password

# To filter by table name, set the names in the FILTER variable below and remove the # to uncomment it.
# FILTER=("table_name_a" "table_name_b" "table_name_c")

# Set the output directory
OUTPUT_DATA="/tmp/pgsql2shp/$database"

# Remove the Shapefiles after ZIP
RM_SHP="yes"

# Define where pgsql2shp/psql are and format the base connection options
PG_BIN="/usr/bin"
PG_CON="-d $database -U $user -h $host -p $port"

# Create the output directory for the files
mkdir -p "$OUTPUT_DATA"

SQL_TABLES="select table_name from information_schema.tables where table_schema = '$schema'"
SQL_TABLES="$SQL_TABLES and table_type = 'BASE TABLE' and table_name != 'spatial_ref_sys';"
TABLES=($($PG_BIN/psql $PG_CON -t -c "$SQL_TABLES"))

export_shp(){
    SQL="$1"
    TB="$2"
    pgsql2shp -f "$OUTPUT_DATA/$TB" -h $host -p $port -u $user $database "$SQL"
    zip -j "$OUTPUT_DATA/$TB.zip" "$OUTPUT_DATA/$TB.shp" "$OUTPUT_DATA/$TB.shx" "$OUTPUT_DATA/$TB.prj" "$OUTPUT_DATA/$TB.dbf" "$OUTPUT_DATA/$TB.cpg"
}

for TABLE in "${TABLES[@]}"
do
    DATA_QUERY="SELECT * FROM $schema.$TABLE"
    SHP_NAME="$TABLE"

    if [[ ${#FILTER[@]} -gt 0 ]]; then
        echo "Has filter by table name"
        if [[ " ${FILTER[*]} " =~ " ${TABLE} " ]]; then
            export_shp "$DATA_QUERY" "$SHP_NAME"
        fi
    else
        export_shp "$DATA_QUERY" "$SHP_NAME"
    fi

    # remove intermediate files
    if [[ "$RM_SHP" = "yes" ]]; then
        rm -f $OUTPUT_DATA/$SHP_NAME.{shp,shx,prj,dbf,cpg}
    fi
done
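A minimal way to run it (assuming the script is saved as export_tables.sh next to the pgconfig file shown further below):
chmod +x export_tables.sh
./export_tables.sh
# the zipped shapefiles land in /tmp/pgsql2shp/<database>/, one ZIP per table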
Splitting data into multiple files
To avoid the problem of pgsql2shp failing to write data for very large tables, we can split the data using a paging strategy. In Postgres, LIMIT, OFFSET and ORDER BY give us paging.
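As a quick illustration of one page of that strategy (my_schema.parcels and the key column gid below are placeholder names, not taken from the question):
# a sketch: fetch the third page of 10000 rows, sorted by the primary key
psql -d my_database -t -c "SELECT * FROM my_schema.parcels ORDER BY gid OFFSET 20000 LIMIT 10000"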
The script below applies this method, assuming that each table has a primary key that can be used to sort the data.
#!/bin/bash
source ./pgconfig
export PGPASSWORD=$password

# To filter by table name, set the names in the FILTER variable below and remove the # to uncomment it.
# FILTER=("table_name_a" "table_name_b" "table_name_c")

# Set the output directory
OUTPUT_DATA="/tmp/pgsql2shp/$database"

# Remove the Shapefiles after ZIP
RM_SHP="yes"

# Define where pgsql2shp/psql are and format the base connection options
PG_BIN="/usr/bin"
PG_CON="-d $database -U $user -h $host -p $port"

# Create the output directory for the files
mkdir -p "$OUTPUT_DATA"

SQL_TABLES="select table_name from information_schema.tables where table_schema = '$schema'"
SQL_TABLES="$SQL_TABLES and table_type = 'BASE TABLE' and table_name != 'spatial_ref_sys';"
TABLES=($($PG_BIN/psql $PG_CON -t -c "$SQL_TABLES"))

export_shp(){
    SQL="$1"
    TB="$2"
    pgsql2shp -f "$OUTPUT_DATA/$TB" -h $host -p $port -u $user $database "$SQL"
    zip -j "$OUTPUT_DATA/$TB.zip" "$OUTPUT_DATA/$TB.shp" "$OUTPUT_DATA/$TB.shx" "$OUTPUT_DATA/$TB.prj" "$OUTPUT_DATA/$TB.dbf" "$OUTPUT_DATA/$TB.cpg"
}

for TABLE in "${TABLES[@]}"
do
    # find the primary key column of the current table
    GET_PK="SELECT a.attname "
    GET_PK="${GET_PK}FROM pg_index i "
    GET_PK="${GET_PK}JOIN pg_attribute a ON a.attrelid = i.indrelid AND a.attnum = ANY(i.indkey) "
    GET_PK="${GET_PK}WHERE i.indrelid = '$schema.$TABLE'::regclass AND i.indisprimary"
    PK=($($PG_BIN/psql $PG_CON -t -c "$GET_PK"))

    MAX_ROWS=($($PG_BIN/psql $PG_CON -t -c "SELECT COUNT(*) FROM $schema.$TABLE"))
    LIMIT=10000
    OFFSET=0

    # base query
    DATA_QUERY="SELECT * FROM $schema.$TABLE"

    # continue until all rows have been fetched
    while [ $OFFSET -le $MAX_ROWS ]
    do
        DATA_QUERY_P="$DATA_QUERY ORDER BY $PK OFFSET $OFFSET LIMIT $LIMIT"
        OFFSET=$(( OFFSET+LIMIT ))
        SHP_NAME="${TABLE}_${OFFSET}"

        if [[ ${#FILTER[@]} -gt 0 ]]; then
            echo "Has filter by table name"
            if [[ " ${FILTER[*]} " =~ " ${TABLE} " ]]; then
                export_shp "$DATA_QUERY_P" "$SHP_NAME"
            fi
        else
            export_shp "$DATA_QUERY_P" "$SHP_NAME"
        fi

        # remove intermediate files
        if [[ "$RM_SHP" = "yes" ]]; then
            rm -f $OUTPUT_DATA/$SHP_NAME.{shp,shx,prj,dbf,cpg}
        fi
    done
done
Common config file
Configuration file for PostgreSQL connection used in both examples (pgconfig):
user="postgres"
host="my_ip_or_hostname"
port="5432"
database="my_database"
schema="my_schema"
password="secret"
Another strategy is to choose GeoPackage as the output format; it supports much larger files than the shapefile format while remaining portable across operating systems and well supported by GIS software.
ogr2ogr -f GPKG output_file.gpkg PG:"host=my_ip_or_hostname user=postgres dbname=my_database password=secret" -sql "SELECT * FROM my_schema.my_table"
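A further sketch, assuming the schemas= option of GDAL's PostgreSQL driver: omitting -sql exports every spatial table of the schema into a single GeoPackage.
ogr2ogr -f GPKG all_tables.gpkg PG:"host=my_ip_or_hostname user=postgres dbname=my_database password=secret schemas=my_schema"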
References:
Retrieve primary key columns - Postgres
LIMIT, OFFSET, ORDER BY and Pagination in PostgreSQL
ogr2ogr - GDAL

Related

Batch file for reading SQL scripts from a file and exporting results to CSV

I want to make a batch file that will take a query from a .sql script in a directory and export the results in .csv format. I need to connect to a Postgres server.
So I'm trying to do this using this answer: https://stackoverflow.com/a/39049102/9631920.
My file:
#!/bin/bash
# sql_to_csv.sh
echo test1
CONN="psql -U my_user -d my_db -h host -port"
QUERY="$(sed 's/;//g;/^--/ d;s/--.*//g;' 'folder/folder/folder/file.sql' | tr '\n' ' ')"
echo test2
echo "$QUERY"
echo test3
echo "\\copy ($QUERY) to 'folder/folder/folder/file.csv' with csv header" | $CONN > /dev/null
echo query in progress
It prints the query from the script and test3, and then stops. What am I doing wrong?
Edit:
My file:
#!/bin/bash
PSQL = "psql -h 250.250.250.250 -p 5432 -U user -d test"
${PSQL} << OMG2
CREATE TEMP VIEW xyz AS
`cat C:\Users\test\Documents\my_query.sql`
;
\copy (select * from xyz) TO 'C:\Users\test\Documents\res.csv';
OMG2
But it's not asking for a password, and I'm not getting any result file.
A shell HERE-document will solve most of your quoting woes.
A temp view will solve the single-query-on-a-single-line problem.
Example (using a multi-line two-table JOIN):
#!/bin/bash
PSQL="psql -U www twitters"
${PSQL} << OMG
-- Some comment here
CREATE TEMP VIEW xyz AS
SELECT twp.name, twt.*
FROM tweeps twp
JOIN tweets twt
ON twt.user_id = twp.id
AND twt.in_reply_to_id > 3
WHERE 1=1
AND (False OR twp.screen_name ilike '%omg%' )
;
\copy (select * from xyz) TO 'omg.csv';
OMG
If you want the contents of an existing .sql file, you can cat it inside the here document, using a backtick-expansion:
#!/bin/bash
PSQL="psql -X -n -U www twitters"
${PSQL} << OMG2
-- Some comment here
CREATE TEMP VIEW xyz AS
-- ... more comment
-- cat the original file here
`cat /home/dir1/dir2/dir3/myscript.sql`
;
\copy (select * from xyz) TO 'omg.csv';
OMG2
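Regarding the edit in the question ("it's not asking password"): a minimal sketch, assuming password authentication, is to export PGPASSWORD (or use a ~/.pgpass entry) before the call, and to drop the spaces around = in the PSQL assignment; host, user, database and paths below are placeholders.
#!/bin/bash
export PGPASSWORD='secret'                                    # or keep the credentials in ~/.pgpass
PSQL="psql -X -n -h 250.250.250.250 -p 5432 -U user -d test"  # note: no spaces around '='
${PSQL} << OMG3
CREATE TEMP VIEW xyz AS
`cat /path/to/my_query.sql`
;
\copy (select * from xyz) TO '/path/to/res.csv' WITH CSV HEADER;
OMG3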

PostgreSQL pg_dump/COPY

I have a requirement to dump the contents of a definable selection of tables as CSVs for an initial load of systems that are not able to connect with PostgreSQL for various reasons.
I have written a script to do this which runs through a list of tables using psql with the -c flag to run psql's \COPY command to dump the corresponding table to a file like this:
COPY table_name TO table_name.csv WITH (FORMAT 'csv', HEADER, QUOTE '\"', DELIMITER '|');
It works fine. But I am sure you have already spotted the problem: as the process takes ~57 minutes for ~60-odd tables, the likelihood of consistency is quite close to absolute zero.
I had a think about it and suspected I could make a few lightweight changes to pg_dump to do what I want, i.e., create multiple CSVs from pg_dump whilst having a hope of integrity between the tables - and being able to specify parallel dumps too.
I have added a few flags to allow me to apply a file postfix (the date), set the format options and pass in a path for the relevant output file.
However, my modified pg_dump was failing when writing to a file, like:
COPY table_name (pkey_id, field1, field2 ... fieldn) TO table_name.csv WITH (FORMAT 'csv', HEADER, QUOTE '"', DELIMITER '|')
Note: Within pg_dump, the column list is expanded
So I cast around for further information and found these COPY Tips.
It looks like writing to a file is a no-no over the network; however I am on the same machine (for now). I felt writing to /tmp would be OK as it is writable by anyone.
So I tried cheating with:
seingramp#seluonkeydb01:~$ ./tp_dump -a -t table_name -D /tmp/ -k "FORMAT 'csv', HEADER, QUOTE '\"', DELIMITER '|'" -K "_$DATE_POSTFIX"
tp_dump: warning: there are circular foreign-key constraints on this table:
tp_dump: table_name
tp_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
tp_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
--
-- PostgreSQL database dump
--
-- Dumped from database version 12.3
-- Dumped by pg_dump version 14devel
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;
--
-- Data for Name: material_master; Type: TABLE DATA; Schema: mm; Owner: postgres
--
COPY table_name (pkey_id, field1, field2 ... fieldn) FROM stdin;
tp_dump: error: query failed:
tp_dump: error: query was: COPY table_name (pkey_id, field1, field2 ... fieldn) TO PROGRAM 'gzip > /tmp/table_name_20200814.csv.gz' WITH (FORMAT 'csv', HEADER, QUOTE '"', DELIMITER '|')
I have neutered the data as it is customer specific.
I didn't find pg_dump's error message very helpful; do you have any ideas as to what I am doing wrong?
The changes really are quite small (excuse the code!) starting ~line 1900, ignoring the flags added around getopt().
/*
 * Use COPY (SELECT ...) TO when dumping a foreign table's data, and when
 * a filter condition was specified. For other cases a simple COPY
 * suffices.
 */
if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
{
    /* Note: this syntax is only supported in 8.2 and up */
    appendPQExpBufferStr(q, "COPY (SELECT ");
    /* klugery to get rid of parens in column list */
    if (strlen(column_list) > 2)
    {
        appendPQExpBufferStr(q, column_list + 1);
        q->data[q->len - 1] = ' ';
    }
    else
        appendPQExpBufferStr(q, "* ");

    if ( copy_from_spec )
    {
        if ( copy_from_postfix )
        {
            appendPQExpBuffer(q, "FROM %s %s) TO PROGRAM 'gzip > %s%s%s.csv.gz' WITH (%s)",
                              fmtQualifiedDumpable(tbinfo),
                              tdinfo->filtercond ? tdinfo->filtercond : "",
                              copy_from_dest ? copy_from_dest : "",
                              fmtQualifiedDumpable(tbinfo),
                              copy_from_postfix,
                              copy_from_spec);
        }
        else
        {
            appendPQExpBuffer(q, "FROM %s %s) TO PROGRAM 'gzip > %s%s.csv.gz' WITH (%s)",
                              fmtQualifiedDumpable(tbinfo),
                              tdinfo->filtercond ? tdinfo->filtercond : "",
                              copy_from_dest ? copy_from_dest : "",
                              fmtQualifiedDumpable(tbinfo),
                              copy_from_spec);
        }
    }
    else
    {
        appendPQExpBuffer(q, "FROM %s %s) TO stdout;",
                          fmtQualifiedDumpable(tbinfo),
                          tdinfo->filtercond ? tdinfo->filtercond : "");
    }
}
else
{
    if ( copy_from_spec )
    {
        if ( copy_from_postfix )
        {
            appendPQExpBuffer(q, "COPY %s %s TO PROGRAM 'gzip > %s%s%s.csv.gz' WITH (%s)",
                              fmtQualifiedDumpable(tbinfo),
                              column_list,
                              copy_from_dest ? copy_from_dest : "",
                              fmtQualifiedDumpable(tbinfo),
                              copy_from_postfix,
                              copy_from_spec);
        }
        else
        {
            appendPQExpBuffer(q, "COPY %s %s TO PROGRAM 'gzip > %s%s.csv.gz' WITH (%s)",
                              fmtQualifiedDumpable(tbinfo),
                              column_list,
                              copy_from_dest ? copy_from_dest : "",
                              fmtQualifiedDumpable(tbinfo),
                              copy_from_spec);
        }
    }
    else
    {
        appendPQExpBuffer(q, "COPY %s %s TO stdout;",
                          fmtQualifiedDumpable(tbinfo),
                          column_list);
    }
I tried a couple of other cheats too, like specifying a directory owned by postgres. I know it's a quick hack but I hope you can help, and thanks for looking.
This is a use case for pg_restore -f.
So:
# Create a custom-format dump file
pg_dump -d some_db -U some_user -Fc -f dump.out
# Move that file to where you need it
# Dump data only from the named table to a file, from the dump file
pg_restore -a -t table_1 -f table_1_data.sql dump.out
pg_dump will create a consistent snapshot of the tables, so you have the database in a 'frozen' state in dump.out. Then you can use pg_restore to 'thaw out' the parts you need on your own schedule. By using -a you will get the COPY statements you want.
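If parallelism is also needed (one of the goals mentioned in the question), a related sketch is the directory output format, which still takes a single consistent snapshot; the job count and table name below are assumptions.
# a sketch: consistent, parallel dump to a directory, then per-table data-only extraction
pg_dump -d some_db -U some_user -Fd -j 4 -f dump_dir
pg_restore -a -t table_1 -f table_1_data.sql dump_dir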

ERROR: invalid input syntax for type timestamp with time zone

I have a PostgreSQL query, shown below, which runs fine. I am calling it from a shell script like this:
Result=$(psql -U username -d database -t -c
$'SELECT round(sum(i.total), 2) AS "ROUND(sum(i.total), 2)"
FROM invoice i
WHERE i.create_datetime = '2019-03-01 00:00:00-06'
AND i.is_review = '1' AND i.user_id != 60;')
Now I want to replace the value I have hard-coded as i.create_datetime = '2019-03-01 00:00:00-06' with a variable date value.
I have tried two ways.
Way 1:
Result=$(psql -U username -d database -t -c
$'WITH var(reviewMonth) as (values(\'$reviewMonth\'))
SELECT round(sum(i.total),2) AS "ROUND(sum(i.total),2)"
FROM var,invoice i
WHERE i.create_datetime = var.reviewMonth::timestamp
AND i.is_review = \'1\' AND i.user_id != 60;')
and
Way 2:
Result=$(psql -U username -d database -t -c
$'SELECT round(sum(i.total),2) AS "ROUND(sum(i.total),2)"
FROM invoice i
WHERE i.create_datetime = \'$reviewMonth\'
AND i.is_review = \'1\' AND i.user_id != 60;')
But both ways throw errors.
Way 1 throws:
ERROR: operator does not exist: timestamp with time zone = text
Way 2 throws:
ERROR: invalid input syntax for type timestamp with time zone: "$reviewMonth"
Please suggest what my approach should be.
You should try using psql variables. Here's an example:
# Put the query in a file, with the variable TSTAMP:
> echo "SELECT :'TSTAMP'::timestamp with time zone;" > query.sql
> export TSTAMP='2019-03-01 00:00:00-06'
> RESULT=$(psql -U postgres -t --variable=TSTAMP="$TSTAMP" -f query.sql )
> echo $RESULT
2019-03-01 06:00:00+00
Note how we format the string literal substitution in the query: :'TSTAMP'
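Applied to the query from the question, a sketch could look like this (the table and column names are taken from the question; the connection details and the sample value are placeholders):
reviewMonth='2019-03-01 00:00:00-06'
Result=$(psql -U username -d database -t --variable=reviewMonth="$reviewMonth" <<'EOF'
SELECT round(sum(i.total), 2)
FROM invoice i
WHERE i.create_datetime = :'reviewMonth'
  AND i.is_review = '1' AND i.user_id != 60;
EOF
)
echo "$Result"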
You could also do the substitution yourself. Here's an example using a heredoc:
> export TSTAMP='2019-03-01 00:00:01-06'
> RESULT=$(psql -U postgres -t << EOF
SELECT '$TSTAMP'::timestamp with time zone;
EOF
)
> echo $RESULT
2019-03-01 06:00:01+00
In this case, we aren't using psql's variable substitution, so we have to quote the variable like '$TSTAMP'. Using a heredoc makes the quoting much simpler than using -c because you aren't trying to quote the whole command.
EDIT: more examples, because it appears this wasn't clear enough. TSTAMP does not have to be hard-coded; it's just a bash variable that can be set like any other bash variable.
> TSTAMP=$(date -d 'now' +'%Y-%m-01 00:00:00')
> RESULT=$(psql -U postgres -t << EOF
SELECT '$TSTAMP'::timestamp with time zone;
EOF
)
> echo $RESULT
2019-06-01 00:00:00+00
However, if you're really just looking for the start of the month, there's no need for shell variables at all
> RESULT=$(psql -U postgres -t << EOF
SELECT date_trunc('month', now());
EOF
)
> echo $RESULT
2019-06-01 00:00:00+00

Redis Mass Insertion from PostgreSQL file

Hello, I am trying to migrate from MySQL to PostgreSQL.
I have an SQL file which queries some records, and I want to put these into Redis with a mass insert.
In MySQL it was working with the sample command below:
sudo mysql -h $DB_HOST -u $DB_USERNAME -p$DB_PASSWORD $DB_DATABASE --skip-column-names --raw < test.sql | redis-cli --pipe
I rewrote the test.sql file in PostgreSQL syntax:
SELECT
'*3\r\n' ||
'$' || length(redis_cmd::text) || '\r\n' || redis_cmd::text || '\r\n' ||
'$' || length(redis_key::text) || '\r\n' || redis_key::text || '\r\n' ||
'$' || length(sval::text) || '\r\n' || sval::text || '\r\n'
FROM (
SELECT
'SET' as redis_cmd,
'ddi_lemmas:' || id::text AS redis_key,
lemma AS sval
FROM ddi_lemmas
) AS t
and one line of its output looks like:
"*3\r\n$3\r\nSET\r\n$11\r\nddi_lemmas:1\r\n$22\r\nabil+abil+neg+aor+pagr\r\n"
But I couldn't find any example of piping directly from the command line like the MySQL command.
There are some examples that use two stages instead of piping directly (first writing to a txt file and then loading it into Redis):
sudo PGPASSWORD=$PASSWORD psql -U $USERNAME -h $HOSTNAME -d $DATABASE -f test.sql > data.txt
The above command works, but it includes column names, which I don't want.
I am trying to send the output of the PostgreSQL query directly to Redis.
Could you help me please?
Solution:
Inserting with RESP commands from an SQL file (with the help of @teppic):
echo -e "$(psql -U $USERNAME -h $HOSTNAME -d $DATABASE -AEt -f test.sql)" | redis-cli --pipe
From the psql man page, -t will "Turn off printing of column names and result row count footers, etc."
-A turns off alignment, and -q sets "quiet" mode.
It looks like you're outputting RESP commands, in which case you'll have to use the escaped string format to get the newline/carriage return pairs, e.g. E'*3\r\n' (note the E).
It might be simpler to pipe SET commands to redis-cli:
psql -At -c "SELECT 'SET ddi_lemmas:' || id :: TEXT || ' ' || lemma FROM ddi_lemmas" | redis-cli
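A hedged variant of that last pipe, filling in the connection variables used in the question (the Redis host and port are assumptions):
# a sketch; the simple SET form assumes lemma values contain no spaces or newlines
PGPASSWORD=$PASSWORD psql -U $USERNAME -h $HOSTNAME -d $DATABASE -At \
  -c "SELECT 'SET ddi_lemmas:' || id::text || ' ' || lemma FROM ddi_lemmas" \
  | redis-cli -h 127.0.0.1 -p 6379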

How to write PL/pgSQL DDL to create schemas in Redshift and then loop over the DDLs to create tables in the respective schemas?

I am trying to create multiple schemas using DDL that will execute in Redshift, and then loop over those schemas to execute the DDLs I have already created for creating tables in them. Can anybody suggest the right approach?
Redshift does not support PL/pgSQL. You will need to do this externally, e.g., with a shell script.
#!/bin/bash
# Create schemas in schema list
# * Generate check SQL
# * Check if schema exists
# * Create schema if it doesn't
echo " --------------------------------------------"
echo " CREATING SCHEMAS >>"
while read p; do
    echo " '$p' >>";
    sql="SELECT 1 as exists FROM information_schema.schemata WHERE schema_name = '$p';"
    check=`psql -d "$1" -t -c "$sql"`;
    # If $check is `null`
    if [ -z "$check" ]; then
        sql="CREATE SCHEMA $p;";
        create=`psql -d "$1" -c "$sql";`
        # If $create = `CREATE SCHEMA`
        if [ "$create" = "CREATE SCHEMA" ]; then
            echo " << '$p' - CREATED";
        else
            echo " << '$p' - ERROR";
        fi
    else
        echo " << '$p' - EXISTS";
    fi
done < ../Configuration/SETUP_Schemas.txt
echo " << DONE"
echo " --------------------------------------------"