How do you change the character encoding of a postgres database?

How do you change the character encoding of a postgres database? - postgresql

I have a database that was set up with the default character set SQL_ASCII. I want to switch it to UNICODE. Is there an easy way to do that?

First off, Daniel's answer is the correct, safe option.
For the specific case of changing from SQL_ASCII to something else, you can cheat and simply poke the pg_database catalogue to reassign the database encoding. This assumes you've already stored any non-ASCII characters in the expected encoding (or that you simply haven't used any non-ASCII characters).
Then you can do:
update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'thedb'
This will not change the collation of the database, just how the encoded bytes are converted into characters (so now length('£123') will return 4 instead of 5). If the database uses 'C' collation, there should be no change to ordering for ASCII strings. You'll likely need to rebuild any indices containing non-ASCII characters though.
Caveat emptor. Dumping and reloading provides a way to check your database content is actually in the encoding you expect, and this doesn't. And if it turns out you did have some wrongly-encoded data in the database, rescuing is going to be difficult. So if you possibly can, dump and reinitialise.

To change the encoding of your database:
Dump your database
Drop your database,
Create new database with the different encoding
Reload your data.
Make sure the client encoding is set correctly during all this.
Source: http://archives.postgresql.org/pgsql-novice/2006-03/msg00210.php

Dumping a database with a specific encoding and try to restore it on another database with a different encoding could result in data corruption.
Data encoding must be set BEFORE any data is inserted into the database.
Check this :
When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data.
And this :
Some locale categories must have their values fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. LC_COLLATE and LC_CTYPE are these categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns would become corrupt. (But you can alleviate this restriction using collations, as discussed in Section 22.2.) The default values for these categories are determined when initdb is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE command.
I would rather rebuild everything from the begining properly with a correct local encoding on your debian OS as explained here :
su root
Reconfigure your local settings :
dpkg-reconfigure locales
Choose your locale (like for instance for french in Switzerland : fr_CH.UTF8)
Uninstall and clean properly postgresql :
apt-get --purge remove postgresql\*
rm -r /etc/postgresql/
rm -r /etc/postgresql-common/
rm -r /var/lib/postgresql/
userdel -r postgres
groupdel postgres
Re-install postgresql :
aptitude install postgresql-9.1 postgresql-contrib-9.1 postgresql-doc-9.1
Now any new database will be automatically be created with correct encoding, LC_TYPE (character classification), and LC_COLLATE (string sort order).

Daniel Kutik's answer is correct, but it can be even more safe, with database renaming.
So, the truly safe way is:
Create new database with the different encoding and name
Dump your database
Restore dump to the new DB
Test that your application runs correctly with the new DB
Rename old DB to something meaningful
Rename new DB
Test application again
Drop the old database
In case of emergency, just rename DBs back

# dump into file
pg_dump myDB > /tmp/myDB.sql
# create an empty db with the right encoding (on older versions the escaped single quotes are needed!)
psql -c 'CREATE DATABASE "tempDB" WITH OWNER = "myself" LC_COLLATE = '\''de_DE.utf8'\'' TEMPLATE template0;'
# import in the new DB
psql -d tempDB -1 -f /tmp/myDB.sql
# rename databases
psql -c 'ALTER DATABASE "myDB" RENAME TO "myDB_wrong_encoding";'
psql -c 'ALTER DATABASE "tempDB" RENAME TO "myDB";'
# see the result
psql myDB -c "SHOW LC_COLLATE"

I had the same issue in postgres 11 and I did change the database encoding using the below steps,
to update all the list of encoding
SET client_encoding = 'UTF8';
UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='postgres';
update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'dbname' ;
make sure to apply the update statment in template0 and template1 and postgres Database
postgres=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='postgres';
UPDATE 1
postgres=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='template0';
UPDATE 1
postgres=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='template1';
UPDATE 1

Related

pg_dump custom format file contains 'DROP DATABASE'

I wish to pg_dump a specific schema from one database and pg_restore it into an already existing database without dropping it.
The command I have been using for the pg_dump is as follows:
pg_dump -n mySchema -Z 9 -b -f mySchema.sql.gz -F c -U ${db_user} -h ${db_host} ${db_name}
-F c generates a custom format file suitable for pg_restore.
I'm aware that if I included the --clean flag it would 'clean (drop) database objects prior to outputting the commands for creating them.' according to the documentation here.
I did not include this flag.
However, when I run head on the generated file, I can see the following within it: DROP DATABASE myDatabase. Why is it here? I'm afraid that if I do a pg_restore with this file that it will drop my existing database.
Here is what head returns:
PGDMP
ymyDatabase11.8"11.10 (Ubuntu 11.10-1.pgdg18.04+1)�0ENCODINENCODINGSET client_encoding = 'UTF8';
false�00
STDSTRINGS
STDSTRINGS(SET standard_conforming_strings = 'on';
false�00
SEARCHPATH
SEARCHPATH8SELECT pg_catalog.set_config('search_path', '', false);
false�126221356smarDATABASEwCREATE DATABASE myDatabase WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';
DROP DATABASE myDatabase;
myUserfalse�00DATABASE smartACL/GRANT CONNECT ON DATABASE myDatabase TO readaccess;

A custom format dump includes the DDL to drop the listed objects.
That is because with a custom format dump, you can specify the --clean option with pg_restore, that is, you don't need if you need the DROP statements until the dump is restored.
However, if you don't use --clean with pg_restore, the DROP statements won't be executed, so you don't have to worry.

Restore one table from a database in PostgreSQL and save it with plain/text data format

I have a table (tst2) in a database (tweets) in PostgreSQL and I need to have a plain/text format file out of it, I was wondering if there is any possible solution with pg_dump ? something like :
pg_dump -t tst2 tweets -f plain >...
also if I am in a wrong way please let me know?!

There are a couple of ways to dump a table into a text file.
First, you can use pg_dump, as you intended. In that case you'll get a SQL script to restore the tables. Just fix your command a bit:
pg_dump -t tst2 -t tweets -F plain >...
Second, you can dump contents of a table with copy command. There are either SQL version of the command (files will be created on the server):
copy tst2 to 'tst2.txt';
copy tweets to 'tweets.txt';
Or client-side psql version (files will be created on your client computer):
\copy tst2 to 'tst2.txt';
\copy tweets to 'tweets.txt';

pg_dump works for me, though there is some clutter before and after the table (after all, dumps are supposed to be used to fill up the table at recovery time).
I'm not sure what you use the > operator for; the dump goes to the file `plane'.
Your error message would help, of course.
On the other hand, what's wrong with using psql with a .pgpass password file and setting the PGDATABASE, PGHOST, PGPORT, and PGUSER anvironment variables, e.g.:
export PGDATABASE=tweet
# PGHOST, PGPORT, and PGUSER as per your setup
psql -c 'select * from tst2'

Constraints missing after pg_restore

After dumping a table and importing it to another postgres db constraints are missing.
I'm using this to dump:
pg_dump --host=local --username=user -W --encoding=UTF-8 -j 10 --file=dump_test --format=d -s --dbname=mydb -t addendum
This to import:
pg_restore -d myOtherdb --host=local -n public --username=user -W --exit-on-error --format=d -j 10 -t addendum dump_test/
What I can see in the resulting toc.dat is something like this:
ADD CONSTRAINT pk_addendum PRIMARY KEY (addendum_id);
> ALTER TABLE ONLY public.addendum DROP CONSTRAINT pk_addendum;
That looks like its creating and destroying the PK, but I'm not sure if my interpretation is correct as the file is binary.
edit: I'm using PostgreSQL 9.3

From the documentation:
Note: When -t is specified, pg_dump makes no attempt to dump any other database objects that the selected table(s) might depend upon. Therefore, there is no guarantee that the results of a specific-table dump can be successfully restored by themselves into a clean database.
You thus have some admittedly unattractive choices:
You can rebuild the constraints manually, especially if you still have the DDL which created them.
You can do a database-wide pg_dump to text, obtain the constraint DDL from there, see step 1.
You can do a database-wide pg_dump, and restore it fully.

I had the situation where the table already exists but using pg_restore deleted the constraints of the table.
There is an accepted answer already but I will try to provide an answer for those cases where the table to be restored is already available. In such cases, the constraints are deleted, only if you are trying to drop and recreate the table (-c or -C). Whereas if you only want the data from the dump you can perform delete all records on the table (DELETE FROM tableName) and then use pg_restore with -a flag. You can thus exclude -c or -C flag from you pg_restore command.

A little late to the party but here's something that may help.
If you're restoring a single table from a large dump file and having trouble getting the indexes with pg_restore (-t doesn't do indexes and constraints)
pg_restore db_dump_file.dump | awk '/table_name/{nr[NR]; nr[NR+1]}; NR in nr' > table_name_indexes_tmp.psql
You also need the subsequent line after a match for indexes and constraints. The awk command above gets line + 1 after every match.
This output file should contain your indexes (assuming the dump file actually contains them, plus data). Then you can apply them back to the table you restored as individual commands.
Not a perfect solution but better than trying to re-create them manually.

remove tablespaces from pg_dump

I am trying to write a script to import a database schema from a remote machine that only accepts ssh connections to a local one.
I managed to do anything except keep the same encoding has the remote database.
I found out that the solution was using pg_dump with -C (create) and that way I would be able to create the database with the same encoding but I faced a problem... there is a table space in the remote database and I dont want to import it.
I know that recent versions of psql already have the no-tablespace argument... but unlucky me, I'm not allowed to upgrade the postgres version.
Could someone tell me a way to remove all the tablespace ocurrences on a sql dump? like with sed or something.
Thanks a lot!

I used to switch tablespaces between installations by piping pg_dump through sed where I altered the TABLESPACE clause.
You can also just remove it and additionally remove CREATE TABLESPACE ... from the dump file with any editor and you are good to load it to another DB cluster.
I long since moved on to newer versions where I can use the --no-tablespaces option. Depending on your setup, a shell command could look something like this in Linux - from the top of my head, only tested cursory:
pgdump -h 123.456.7.89 -p 5432 mydb \
| sed \
-e' /^CREATE TABLESPACE / d' \
-e 's/ *TABLESPACE .*;/;/' \
-e "s/SET default_tablespace = .*;/SET default_tablespace = '';/"
| psql -p5432 mylocaldb
-e' /^CREATE TABLESPACE / d' ... delete lines beginning with "CREATE TABLESPACE ".
-e 's/ *TABLESPACE .*;/;/' ... trim the tablespace clause (always at the end of the line in pg_dump output) from CREATE TABLE or CREATE INDEX statements.
-e "s/SET default_tablespace = .*;/SET default_tablespace = '';" .. do away with any other default tablespace than the empty string - which signifies the default tablespace of the current db. Note the use of double quote ", so I can easily enter single quotes '.
If you know the name of the tablespace involved you can narrow this down. There is a theoretical possibility that a data line could start like one of the search terms. I have never encountered problems myself, though.
Check out a page like this for more info on sed.

I want to restore the database with a different schema

I have taken a dump of a database named temp1, by using the follwing command
$ pg_dump -i -h localhost -U postgres -F c -b -v -f pub.backup temp1
Now I want to restore the dump in a different database called "db_temp" , but in that I just want that all the tables should be created in a "temp_schema" ( not the default schema which is in the fms temp1 database ) which is in the "db_temp" database.
Is there any way to do this using pg_restore command?
Any other method also be appreciated!

A quick and dirty way:
1) rename default schema:
alter schema public rename to public_save;
2) create new schema as default schema:
create schema public;
3) restore data
pg_restore -f pub.backup db_temp [and whatever other options]
4) rename schemas according to need:
alter schema public rename to temp_schema;
alter schema public_save rename to public;

There is a simple solution:
Create your backup dump in plain SQL format (format "p" using the parameter --format=p or -F p)
Edit your pub.backup.sql dump with your favorite editor and add the following two lines at the top of your file:
create schema myschema;
SET search_path TO myschema;
Now you can restore your backup dump with the command
psql -f pub.backup.sql
The set search_path to <schema> command will set myschema as the default, so that new tables and other objects are created in this schema, independently of the "default" schema where they lived before.

There's no way in pg_restore itself. What you can do is use pg_restore to generate SQL output, and then send this through for example a sed script to change it. You need to be careful about how you write that sed script though, so it doesn't match and change things inside your data.

Probably the easiest method would be to simply rename the schema after restore, ie with the following SQL:
ALTER SCHEMA my_schema RENAME TO temp_schema
I believe that because you're using the compressed archive format for the output of pg_dump you can't alter it before restoring. The option would be to use the default output and do a search and replace on the schema name, but that would be risky and could perhaps cause data to be corrupted if you were not careful.

If you only have a few tables then you can restore one table at a time, pg_restore accepts -d database when you specify -t tablename. Of course, you'll have to set up the schema before restoring the tables and then sort out the indexes and constraints when you're done restoring the tables.
Alternatively, set up another server on a different port, restore using the new PostgreSQL server, rename the schema, dump it, and restore into your original database. This is a bit of a kludge of course but it will get the job done.
If you're adventurous you might be able to change the database name in the dump file using a hex editor. I think it is only mentioned in one place in the dump and as long as the new and old database names are the same it should work. YMMV, don't do anything like this in a production environment, don't blame me if this blows up and levels your home town, and all the rest of the usual disclaimers.

Rename the schema in a temporary database.
Export the schema:
pg_dump --schema-only --schema=prod > prod.sql
Create a new database. Restore the export:
psql -f prod.sql
ALTER SCHEMA prod RENAME TO somethingelse;
pg_dump --schema-only --schema=somethingelse > somethingelse.sql
(delete the database)
For the data you can just modify the set search_path at the top.

As noted, there's no direct support in pg_dump, psql or pg_restore to change the schema name during a dump/restore process. But it's fairly straightforward to export using "plain" format then modify the .sql file. This Bash script does the basics:
rename_schema () {
# Change search path so by default everything will go into the specified schema
perl -pi -e "s/SET search_path = $2, pg_catalog/SET search_path = $3, pg_catalog, $2;/" "$1"
# Change 'ALTER FUNCTION foo.' to 'ALTER FUNCTION bar.'
perl -pi -e 's/^([A-Z]+ [A-Z]+) '$2'\./$1 '$3'./' "$1"
# Change the final GRANT ALL ON SCHEMA foo TO PUBLIC
perl -pi -e 's/SCHEMA '$2'/SCHEMA '$3'/' "$1"
}
Usage:
pg_dump --format plain --schema=foo --file dump.sql MYDB
rename_schema dump.sql foo bar
psql -d MYDB -c 'CREATE SCHEMA bar;'
psql -d MYDB -f dumpsql

The question is pretty old, but maybe can help some one.
Streaming the output of pg_restore to sed and replace the schema name in order to import the dump to a different schema.
Something like:
pg_restore ${dumpfile} | \
sed -e "s/OWNER TO ${source_owner}/OWNER TO ${target_owner}/" \
-e "s/${source_schema}/${target_schema}/" | \
psql -h ${pgserver} -d ${dbname} -U ${pguser}

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse