Get mysqldump to dump data suitable for psql input (escaped single quotes) - postgresql

I'm trying to port a database from MySQL to PostgreSQL. I've rebuilt the schema in Postgres, so all I need to do is get the data across, without recreating the tables.
I could do this with code that iterates over all the records and inserts them one at a time, but I tried that and it's waaayyyy too slow for our database size, so I'm trying to use mysqldump and a pipe into psql instead (once per table, which I may parallelize once I get it working).
I've had to jump through various hoops to get this far, turning various flags on and off to get a dump that is vaguely sane. Again, this only dumps the INSERT statements, since I've already prepared the empty schema to get the data into:
/usr/bin/env \
PGPASSWORD=mypassword \
mysqldump \
-h mysql-server \
-u mysql-username \
--password=mysql-password \
mysql-database-name \
table-name \
--compatible=postgresql \
--compact \
-e -c -t \
--default-character-set=utf8 \
| sed "s/\\\\\\'/\\'\\'/g" \
| psql \
-h postgresql-server \
--username=postgresql-username \
postgresql-database-name
Everything except that ugly sed command is manageable. I'm doing that sed to try to convert MySQL's approach to quoting single quotes inside strings ('O\'Connor') to PostgreSQL's quoting requirements ('O''Connor'). It works, until there are strings like this in the dump: 'String ending with a backslash \\'... and yes, it seems there is some user input in our database that has this format, which is perfectly valid but doesn't survive my sed command. I could add a lookbehind to the sed command, but I feel like I'm crawling into a rabbit hole. Is there a way to either:
a) Tell mysqldump to quote single quotes by doubling them up
b) Tell psql to expect backslashes to be interpreted as quoting escapes?
I have another issue with BINARY and bytea differences, but I've worked around that with a base64 encoding/decoding phase.
EDIT | Looks like I can do (b) with set backslash_quote = on; set standard_conforming_strings = off;, though I'm not sure how to inject that into the start of the piped output.
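One possible way to inject them (an untested sketch) is to group the commands so those SET statements reach psql ahead of the dump; with them in effect, the sed workaround should no longer be needed:
( echo "SET backslash_quote = on;"
  echo "SET standard_conforming_strings = off;"
  mysqldump \
    -h mysql-server \
    -u mysql-username \
    --password=mysql-password \
    mysql-database-name \
    table-name \
    --compatible=postgresql \
    --compact \
    -e -c -t \
    --default-character-set=utf8
) | psql \
  -h postgresql-server \
  --username=postgresql-username \
  postgresql-database-name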

Dump the tables to TSV using mysqldump's --tab option and then import using psql's COPY method.
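A rough sketch of that approach (every name and path below is a placeholder; note that mysqldump --tab generally has to run on the MySQL host itself, because the server process writes the data file):
mysqldump -u mysql-username --password=mysql-password \
  --tab=/tmp/dump --no-create-info mysql-database-name mytable

# /tmp/dump/mytable.txt is tab-separated, uses \N for NULL and backslash
# escapes, which lines up with PostgreSQL's default text COPY format.
psql -h postgresql-server --username=postgresql-username postgresql-database-name \
  -c "\copy mytable FROM '/tmp/dump/mytable.txt'"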

The files psqlrc and ~/.psqlrc may contain SQL commands to be executed when the client starts. You can put these three lines, or any other settings you would like, in that file.
SET standard_conforming_strings = 'off';
SET backslash_quote = 'on';
SET escape_string_warning = 'off';
These settings for psql, combined with the following mysqldump command, successfully migrated the data only (no schema) from MySQL 5.1 to PostgreSQL 9.1 with UTF-8 text (Chinese in my case). This method may be the only reasonable way to migrate a large database if an intermediate file would be too large or too time-consuming to create. It requires you to migrate the schema manually, since the two databases' data types are vastly different. Plan on typing out some DDL to get it right (see the sketch after the command below).
mysqldump \
--host=<hostname> \
--user=<username> \
--password=<password> \
--default-character-set=utf8 \
--compatible=postgresql \
--complete-insert \
--extended-insert \
--no-create-info \
--skip-quote-names \
--skip-comments \
--skip-lock-tables \
--skip-add-locks \
--verbose \
<database> <table> | psql -n -d <database>
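As an illustration of the manual schema step (widget and its columns are made-up names, and the type mappings are common choices rather than an exhaustive recipe), a MySQL table such as:
CREATE TABLE widget (
  id INT AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  quantity TINYINT NOT NULL,
  created_at DATETIME,
  notes TEXT
);
might be recreated in PostgreSQL as:
CREATE TABLE widget (
  id serial PRIMARY KEY,
  title varchar(255) NOT NULL,
  quantity smallint NOT NULL,
  created_at timestamp,
  notes text
);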

Try this:
sed -e "s/\\\\'/\\\\\\'/g" -e "s/\([^\\]\)\\\\'/\1\\'\\'/g"
Yeah, "Leaning Toothpick Syndrome", I know.

Related

Syntax error when running Query: "syntax error at or near "\""

I've generated a PostgreSQL script that I want to use to restore a database. When I go to my backup server to try to restore, I get the error: syntax error at or near "\".
It's getting stuck on the following characters \.
These appear like this:
COPY admin.roles (role_id, role_name, is_role_auto) from stdin;
\.
What's wrong with this statement? Is there config I missed? I'm on PostgreSQL 11.4 on Windows, the backup was taken with pg_dump, and I restore it using pgAdmin.
You cannot use pgAdmin to restore a "plain format" dump taken with pg_dump. It doesn't understand the psql syntax where COPY and its data are interleaved.
You will have to use psql to restore the dump:
psql -h server -p 5432 -U user -d database -f dumpfile.sql
It's really hard to know the specific error without seeing your backup and restore commands in their entirety, but if it helps, here is the boilerplate I use when I want to copy a table from production to a backup server:
$BIN/pg_dump -h production_server -p 5432 \
--dbname=postgres \
--superuser=postgres \
--verbose \
--format=c \
--compress=9 \
--table=admin.roles > backup.sql
$BIN/pg_restore \
--host=backup_server \
--port=5432 \
--username=postgres \
--dbname=postgres \
--clean \
--format=custom \
backup.sql
The format=c (or --format=custom) makes the content completely unreadable, but on the plus side it also avoids any weird errors with delimiters and the like, and it also perfectly copies complex data structures like arrays and BLOBs.

migrating from Postgres to MonetDB

I need to know how to migrate from Postgres to MonetDB. Postgres is getting slow and we are trying to change to MonetDB. Does anyone know if there is already a script or some other tool for migrating to MonetDB?
Is there something equivalent to PL/pgSQL in MonetDB?
Are there materialized views in MonetDB?
The following booklet may be relevant to quickly identify some syntactic feature differences. https://en.wikibooks.org/wiki/SQL_Dialects_Reference
And CitusDB vs. MonetDB performance is covered in a blog post:
https://www.monetdb.org/content/citusdb-postgresql-column-store-vs-monetdb-tpc-h-shootout
First, you can export data from Postgres like this:
psql -h xxxxx -U xx -p xx -d postgres -c "copy (select * from db40.xxx) to '/tmp/xxx.csv' delimiter ';'"
Second, you must replace the NULL markers (\N) with NULL, like this:
sed 's/\\N/NULL/g' xxx.csv >newxxx.csv
Finally, you can copy the data into MonetDB like this:
mclient -u monetdb -d voc -h 192.168.205.8 -p 50000 -s "COPY INTO newxxx from '/tmp/newxxx.csv' using delimiters ';';"

postgres psql error trying to pass parameters in sql script

In PostgreSQL, I'm using psql with the -v option for variable input that I can reference within a SQL file.
For example, from a bash script, it looks like this:
"$PSQL_HOME"/psql -h $HOST_NM \
-p $PORT \
-U postgres \
-v v1=$1 \
-f Test.sql
...
..
From the sql file, it looks like this:
GRANT ALL ON TABLE mytable TO mra_dev_:v1;
GRANT ALL ON TABLE mytable TO mra_dev_:v1_load;
The first statement works, but the 2nd statement fails:
psql:Test.sql:472: ERROR: syntax error at or near ":"
LINE 1: GRANT ALL ON TABLE mytable TO mra_dev_:v1_load
^
How do I get around this? Is there some kind of escape or concatenation feature I can use for this?
My workaround was to append the string I needed to the parameter when calling it on the command line, like this:
"$PSQL_HOME"/psql -h $HOST_NM \
-p $PORT \
-U postgres \
-v v1=$1 \
-v v2="${1}_load" \
-f Test.sql
Then, within the SQL file, I changed the second statement to this:
GRANT ALL ON TABLE mytable TO mra_dev_:v2;
It works now.
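For example, if the script is invoked with the single argument abc, psql receives v1=abc and v2=abc_load, so the two GRANT statements expand to:
GRANT ALL ON TABLE mytable TO mra_dev_abc;
GRANT ALL ON TABLE mytable TO mra_dev_abc_load;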

How to restore postgres database into another database name

I'm using Postgres today and ran into a problem.
I dumped the database this way:
pg_dump zeus_development -U test > zeus_development.dump.out
What if I want to restore it to another database, zeus_production?
How can I do that?
Simple, first create your database using template0 as your template database:
createdb -U test -T template0 zeus_production
Then, restore your dump on this database:
psql -U test zeus_production -f /path/to/zeus_development.dump.out
When restoring, always use template0 explicitly, as it is always an empty and unmodifiable database. If you don't use an explicit template, PostgreSQL will assume template1, and if it contains objects, like a table or function, that your dumped database already has, you will get errors while restoring.
Nonetheless, even if you were restoring to a database with the same name (zeus_development), you should create (or recreate) it the same way. The exception is if you used the -C option while dumping (or -C with pg_restore if using a binary dump), which I don't recommend, because it gives you less flexibility (like restoring under a different database name).
The PostgreSQL documentation has influenced me to use the custom format. I've been using it for years and it seems to have various advantages but your mileage may vary. That said, here is what worked for me:
pg_restore --no-owner --dbname postgres --create ~/Desktop/pg_dump
psql --dbname postgres -c 'ALTER DATABASE foodog_production RENAME TO foodog_development'
Neither the foodog_development nor the foodog_production database existed before this sequence.
This restores the database from the dump (~/Desktop/pg_dump), creating it with the name it was dumped as. The rename then changes the DB name to whatever you want.
The --no-owner may not be needed if your user name is the same on both machines. In my case, the dump was done as user1 and the restore done as user2. The new objects need to be owned by user2 and --no-owner achieves this.
Isn't it easier to simply do the following?
createdb -U test -T zeus_development zeus_production
This has an answer on dba.stackexchange, which I reproduce here:
Let's define a few variables to make the rest easier to copy/paste
old_db=my_old_database
new_db=new_database_name
db_dump_file=backups/my_old_database.dump
user=postgres
The following assumes that your backup was created with the "custom" format like this:
pg_dump -U $user -F custom $old_db > "$db_dump_file"
To restore $db_dump_file to a new database name $new_db :
dropdb -U $user --if-exists $new_db
createdb -U $user -T template0 $new_db
pg_restore -U $user -d $new_db "$db_dump_file"
Here's a hacky way of doing it, that only works if you can afford the space and time to use regular .sql format, and if you can safely sed out your database name and user.
$ pg_dump -U my_production_user -h localhost my_production > my_prod_dump.sql
$ sed -i 's/my_production_user/my_staging_user/g' my_prod_dump.sql
$ sed -i 's/my_production/my_staging/g' my_prod_dump.sql
$ mv my_prod_dump.sql my_staging_dump.sql
$ sudo su postgres -c psql
psql> drop database my_staging;
psql> create database my_staging owner my_staging_user;
psql> \c my_staging;
psql> \i my_staging_dump.sql
If your dump does not include the database name, the restore will use the DB defined in DESTINATION. Both SOURCE and DESTINATION are connection URLs.
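For example (every value below is a placeholder; the dump path is a directory because of --format=directory):
SOURCE=postgres://user:password@old-host:5432/source_db
DESTINATION=postgres://user:password@new-host:5432/target_db
dump_path=/tmp/source_db_dump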
Dump without --create
pg_dump \
--clean --if-exists \
--file ${dump_path} \
--format=directory \
--jobs 5 \
--no-acl \
--no-owner \
${SOURCE}
Restore without --create
pg_restore \
--clean --if-exists \
--dbname=${DESTINATION} \
--format=directory \
--jobs=5 \
--no-acl \
--no-owner \
$dump_path

remove tablespaces from pg_dump

I am trying to write a script to import a database schema from a remote machine that only accepts ssh connections to a local one.
I managed to do everything except keep the same encoding as the remote database.
I found out that the solution was to use pg_dump with -C (create), so that I would be able to create the database with the same encoding, but I faced a problem... there is a tablespace in the remote database and I don't want to import it.
I know that recent versions of pg_dump already have the --no-tablespaces argument... but unluckily for me, I'm not allowed to upgrade the Postgres version.
Could someone tell me a way to remove all the tablespace occurrences in an SQL dump? Like with sed or something.
Thanks a lot!
I used to switch tablespaces between installations by piping pg_dump through sed where I altered the TABLESPACE clause.
You can also just remove it and additionally remove CREATE TABLESPACE ... from the dump file with any editor and you are good to load it to another DB cluster.
I have long since moved on to newer versions where I can use the --no-tablespaces option. Depending on your setup, a shell command could look something like this on Linux - off the top of my head, only tested cursorily:
pg_dump -h 123.456.7.89 -p 5432 mydb \
| sed \
-e' /^CREATE TABLESPACE / d' \
-e 's/ *TABLESPACE .*;/;/' \
-e "s/SET default_tablespace = .*;/SET default_tablespace = '';/"
| psql -p5432 mylocaldb
-e' /^CREATE TABLESPACE / d' ... delete lines beginning with "CREATE TABLESPACE ".
-e 's/ *TABLESPACE .*;/;/' ... trim the tablespace clause (always at the end of the line in pg_dump output) from CREATE TABLE or CREATE INDEX statements.
-e "s/SET default_tablespace = .*;/SET default_tablespace = '';" .. do away with any other default tablespace than the empty string - which signifies the default tablespace of the current db. Note the use of double quote ", so I can easily enter single quotes '.
If you know the name of the tablespace involved you can narrow this down. There is a theoretical possibility that a data line could start like one of the search terms. I have never encountered problems myself, though.
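For example, if the only tablespace involved were named fastspace (a made-up name), the middle expression could be narrowed to:
-e 's/ *TABLESPACE fastspace;/;/'
so there is no chance of it touching a data line that merely contains the word TABLESPACE.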
Check out a page like this for more info on sed.