Cannot restore data from pg_dump due to blank strings being treated as nonexistent data - postgresql

I have a database currently on a PostgreSQL 9.3.9 server that I am backing up with pg_dump in the simplest possible fashion, e.g. pg_dump orb > mar_9_2018.db.
One of those tables (linktags) has the following definition:
CREATE TABLE linktags (
linktagid integer NOT NULL,
linkid integer,
tagval character varying(1000)
);
When attempting to restore the database on PostgreSQL 11.2 via
cat mar_9_2018.db | docker exec -i pg-docker psql -U postgres
(a restore into a Docker container), the table ends up empty because of the following error:
ERROR: missing data for column "tagval"
CONTEXT: COPY linktags, line 737: "1185 9325"
I checked the dump file and found that there are missing tabs where I would expect some sort of information, and evidently the restore process expects them there as well.
I also verified that the value in the database is a blank string.
So -
Is there an idiomatic method to back up and restore a Postgres database that I am missing?
Is my version old enough that its pg_dump needs some special considerations?
Am I just restoring this wrong?
Edit:
I did some further research and found that my original check for NULLs was incorrect; it is blank strings that are causing the issue.
If I make an example table with NULL values and then blank strings, I can see the NULLs get an explicit \N in the dump but the blanks get nothing at all.
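A quick way to see the difference is a throwaway table (a minimal sketch; copy_demo is a made-up name):

CREATE TABLE copy_demo (id integer, tagval character varying(1000));
INSERT INTO copy_demo VALUES (1, NULL), (2, '');
COPY copy_demo TO STDOUT;
-- output:
-- 1	\N
-- 2	
-- The NULL is spelled out as \N, while the empty string is only a trailing
-- tab, which editors and clipboard tools tend to strip silently.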

pg_dump has an option to use INSERT statements instead of COPY:
pg_dump -d db_name --inserts
As the manual warns, this can make restoration slow (and the dump file much larger), but even in the case of some inconsistencies the tables will be filled with the valid rows.
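For comparison, with --inserts the row from the COPY error above would come out as a self-contained statement, so a stripped tab can no longer shift columns (a sketch; the values are taken from the error message):

INSERT INTO linktags VALUES (1185, 9325, '');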
Another problem is with empty tables: pg_dump generates an empty COPY statement like
COPY config (key, value) FROM stdin;
\.
in this case you'll get errors on reimport like:
ERROR: invalid input syntax for type smallint: " "
CONTEXT: COPY config, line 1, column group: " "
which doesn't happen with the --inserts option (no INSERT statement is generated for an empty table).

Related

What's the difference between COPY and pg_dump --data-only

I'm planning to migrate some tables out of an existing database, so the plan is to apply the schema on the new database and then COPY the data for each table.
What's the difference between that and using pg_dump for the data and then pg_restore?
Would COPY need re-establishing indexes etc.?
If you use pg_dump --data-only it will output the data as COPY statements, unless you override that with --inserts or --column-inserts, so there is no difference in that case. In either case, if the tables in the new database were not created with indexes, they would need to be added. You could solve that with either -s -t <some_table> to get just the table schema, or -t <some_table> to get the table schema and data.
pg_dump --data-only will produce a complete SQL script that can be run with psql. That script contains both the COPY statement and the data:
COPY laurenz.data_2020 (id, d, x) FROM stdin;
1499906 2020-11-07 13:26:00 x
1499907 2020-11-07 13:27:00 x
1499908 2020-11-07 13:28:00 x
\.
So it is all in one, and there is no danger that you restore a file to the wrong table, for example.
Other than convenience, there is no fundamental difference to running COPY directly.
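For example, these two commands extract the same rows (a sketch reusing the table from above; the file names are made up):

pg_dump --data-only -t laurenz.data_2020 mydb > data_2020.sql
psql mydb -c "\copy laurenz.data_2020 TO 'data_2020.tsv'"

The first file wraps the rows in a COPY ... FROM stdin; block and restores with a plain psql mydb < data_2020.sql; the second holds only the raw rows and needs a matching \copy ... FROM on the way back in.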

Dump fails to re-create index over array of hstore column

I'm dumping a large database with pg_dump -O -U <user> <db> >dump.sql.
Here is the gist of dump.sql with everything irrelevant stripped.
When importing the dump into another Postgres instance (identical setup) with psql -f dump.sql -U <user> <db>, the following error happens on the CREATE INDEX line 147:
psql:dumped.sql:147: ERROR: type "hstore" does not exist
LINE 5: element hstore;
^
QUERY:
DECLARE
arrHstore ALIAS FOR $1;
key ALIAS FOR $2;
element hstore;
string text;
BEGIN
FOREACH element IN ARRAY arrHstore LOOP
string := concat(string, ' ', element->key);
END LOOP;
RETURN trim(leading from string, ' ');
END;
CONTEXT: compilation of PL/pgSQL function "immutable_array_to_string" near line 5
So everything but the final CREATE INDEX has worked.
Now I connect to the database with psql -U <user> <db> and paste the previously failing CREATE INDEX command... the index is created without any problem.
A few things I've tried:
Dumping only the structure with --schema-only creates a dump that imports just fine. The problem only occurs if there is at least one row inserted as part of the dump.
I tend to rule out owner/permissions as the cause because I create the dump with -O.
This particular index is years old and similar dumps have worked fine until recently. Therefore, it might be related to upgrading to Postgres 10.3. However, I have no easy way to test the dump on older versions since it's not downwards compatible.
Any idea what's going on here? Thanks a bunch in advance!
That's because of the security fixes concerning the public schema in the latest point releases of PostgreSQL (10.3 among them).
Either change the function so that it refers to the type hstore with its schema: public.hstore, or add SET search_path = public to the CREATE FUNCTION statement.
To schema-qualify the hstore operator ->, you can replace
element -> key
with
element OPERATOR(public.->) key
Similar for other operators.
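Put together, a repaired version of the function could look like this (a sketch reconstructed from the error message; the exact signature of immutable_array_to_string is an assumption):

CREATE OR REPLACE FUNCTION immutable_array_to_string(arrhstore public.hstore[], key text)
   RETURNS text
   LANGUAGE plpgsql IMMUTABLE
   SET search_path = public
AS $$
DECLARE
   element public.hstore;
   string  text;
BEGIN
   -- both the type and the -> operator are schema-qualified, so the
   -- function no longer depends on the search path during the restore
   FOREACH element IN ARRAY arrhstore LOOP
      string := concat(string, ' ', element OPERATOR(public.->) key);
   END LOOP;
   RETURN trim(leading ' ' from string);
END;
$$;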

PostgreSQL: ERROR: relation "sequence" does not exist while restoring from dump file

I get the following error while restoring database from dump file on server:
ERROR: relation "table_id_seq" does not exist
LINE 1: SELECT pg_catalog.setval('table_id_seq', 362, true);
my local psql version is 10.2
server psql version is 9.6.8
Here is my dump command:
pg_dump -U username -h localhost db_name > filename.sql
Here is my restore command on server:
psql -U username -h localhost db_name < filename.sql
Please help, Thanks.
After I got information from @clemens and did some research, I found that in my dump file the CREATE SEQUENCE table_id_seq section has an AS integer clause. That clause is PostgreSQL 10 syntax; the 9.6 server rejects it, so the sequence never gets created and the later setval() call fails. If I remove AS integer from the CREATE SEQUENCE section, it works fine.
In my dump file:
CREATE SEQUENCE table_id_seq
AS integer
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
Remove AS integer from dump file
CREATE SEQUENCE table_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
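If the dump contains many sequences like that, the clause can be stripped in one pass instead of by hand (a hedged one-liner; it blindly deletes every line that consists of AS integer, so check the result before loading):

sed '/^ *AS integer$/d' filename.sql > filename_96.sql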
You can open the dump file with any text editor (Notepad, Vim, etc.). Search for table_id_seq. You should find a statement like
CREATE SEQUENCE table_id_seq ...
If it is missing then there is something strange with your dump. You might fix that by adding
CREATE SEQUENCE table_id_seq;
immediately in front of the statement
SELECT pg_catalog.setval('table_id_seq', 362, true);
from the error message.
But this is just a hack. You should still find out why the dump made that mistake, but that requires more information.
In my case, the sequence name is case-sensitive, and that's why I was getting the relation error. Maybe this helps someone like me who ends up desperately here. I used double quotes inside the single quotation marks in the SQL statement:
SELECT nextval('"USER_ID_seq"');
There are some examples in the official documentation:
https://www.postgresql.org/docs/9.1/functions-sequence.html

Postgres: Combining multiple COPY TO outputs to a postgres-importable file

I have my database hosted on heroku, and I want to download specific parts of the database (e.g. all the rows with id > x from table 1, all the rows with name = x from table 2, etc.) in a single file.
From some research and asking a question here, it seems that some kind of modified pg_dump would solve my problem. However, I won't be able to use pg_dump because I won't have access to the command line (basically, I want to be able to click a button in my web app that generates and downloads the database file).
So my new strategy is to use the Postgres COPY command. I'll go through the various tables in my server database and run COPY (SELECT * FROM ... WHERE ...) TO filename, where filename is just a temporary file that I will download when complete.
The issue is that this file will contain only the rows, so I can't just turn around and import it into pgAdmin. Assuming I have an 'empty' database set up (the schema, indexes, and so on are all already in place), is there a way I can format the file so that it can be easily imported into a Postgres database?
Building on my comment about TO/FROM STDOUT/STDIN, and answering the actual question about including multiple tables in one file: you can construct the output file to interleave copy ... from stdin statements with the actual data and load it via psql. For example, psql will accept input files that look like this:
copy my_table (col1, col2, col3) from stdin;
foo bar baz
fizz buzz bizz
\.
(Note the trailing \. and that the separators should be tabs; you could also specify the delimiter option in the copy command).
psql will treat everything between the ; and the \. as COPY data arriving on stdin. This essentially emulates what pg_dump does when you export table data without the schema (e.g., pg_dump -a -t my_table).
The resulting load could be as simple as psql mydb < output.dump.
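To build such a file for several filtered tables in one go, you can interleave the copy headers with COPY ... TO STDOUT output (a shell sketch; database, table, and column names are placeholders):

{
  echo "copy table1 (id, name) from stdin;"
  psql mydb -c "copy (select id, name from table1 where id > 100) to stdout"
  printf '%s\n' '\.'
  echo "copy table2 (id, name) from stdin;"
  psql mydb -c "copy (select id, name from table2 where name = 'x') to stdout"
  printf '%s\n' '\.'
} > output.dump

psql mydb < output.dump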

PostgreSQL - batch + script + variable

I am not a programmer, and I am struggling a bit with this.
I have a batch file connecting to my PostgreSQL server and then opening a SQL script. Everything works as expected. My question is how to pass a variable (if possible) from one to the other.
Here is my batch file:
set PGPASSWORD=xxxx
cls
@echo off
C:\Progra~1\PostgreSQL\8.3\bin\psql -d Total -h localhost -p 5432 -U postgres -f C:\TotalProteinImport.sql
And here's the script:
copy totalprotein from 'c:/TP.csv' DELIMITERS ',' CSV HEADER;
update anagrafica
set pt=(select totalprotein.resultvalue from totalprotein where totalprotein.accessionnbr=anagrafica.id)
where data_analisi = '12/23/2011';
delete from totalprotein;
This is working great. Now the question is how I could pass a variable that would carry the date for data_analisi.
Like in the batch file: "Please enter date", and then the value is passed to the SQL script.
You could create a function out of your SQL script like this:
CREATE OR REPLACE FUNCTION f_myfunc(date)
RETURNS void AS
$BODY$
CREATE TEMP TABLE t_tmp ON COMMIT DROP AS
SELECT * FROM totalprotein LIMIT 0; -- copy table-structure from table
COPY t_tmp FROM 'c:/TP.csv' DELIMITERS ',' CSV HEADER;
UPDATE anagrafica a
SET pt = t.resultvalue
FROM t_tmp t
WHERE a.data_analisi = $1
AND t.accessionnbr = a.id;
-- Temp table is dropped automatically at end of session
-- In this case (ON COMMIT DROP) after the transaction
$BODY$
LANGUAGE sql;
You can use LANGUAGE sql for this kind of simple SQL batch.
As you can see I have made a couple of modifications to your script that should make it faster, cleaner and safer.
Major points
For reading data into an empty table temporarily, use a temporary table. This saves a lot of disk writes and is much faster.
To simplify the process I use your existing table totalprotein as template for the creation of the (empty) temp table.
If you want to delete all rows of a table, use TRUNCATE instead of DELETE FROM; it is much faster. In this particular case you need neither: the temporary table is dropped automatically. See the comments in the function.
The way you updated anagrafica.pt, you would set the column to NULL if anything went wrong in the process (date not found, wrong date, id not found, ...). The way I rewrote the UPDATE, that only happens if matching data is found. I assume that is what you actually want.
Then ask for user input in your shell script and call the function with the date as parameter. That's how it could work in a Linux shell (as user postgres, with password-less access using the IDENT method in pg_hba.conf):
#! /bin/sh
# Ask for date. 'YYYY-MM-DD' = ISO date-format, valid with any postgres locale.
echo -n "Enter date in the form YYYY-MM-DD and press [ENTER]: "
read date
# check validity of $date ...
psql db -p5432 -c "SELECT f_myfunc('$date')"
-c makes psql execute a single SQL command and then exit. I wrote a lot more on psql and its command-line options yesterday in a somewhat related answer.
The creation of the corresponding Windows batch file remains as an exercise for you (a rough sketch follows below).
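For reference, a Windows counterpart might look like this (an untested sketch; %mydate% is an assumed variable name, the connection details are taken from the question):

@echo off
set PGPASSWORD=xxxx
set /p mydate=Enter date in the form YYYY-MM-DD and press [ENTER]: 
C:\Progra~1\PostgreSQL\8.3\bin\psql -d Total -h localhost -p 5432 -U postgres -c "SELECT f_myfunc('%mydate%')"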
Call under Windows
The error message tells you:
Function tpimport(unknown) does not exist
Note the lower-case letters: tpimport. I suspect you used mixed-case letters to create the function. So now you have to enclose the function name in double quotes every time you use it.
Try this one (edited quotes!):
C:\Progra~1\PostgreSQL\8.3\bin\psql -d Total -h localhost -p 5432 -U postgres
-c "SELECT ""TPImport""('%dateimport%')"
Note how I use single and double quotes here. I guess this could work under Windows. See here.
You made it hard for yourself when you chose to use mixed case identifiers in PostgreSQL - a folly which I never tire of warning against. Now you have to double quote the function name "TPImport" every time you use it. While perfectly legit, I would never do that. I use lower case letters for identifiers. Always. This way I never mix up lower / upper case and I never have to use double quotes.
The ultimate fix would be to recreate the function with a lower case name (just leave away the double quotes and it will be folded to lower case automatically). Then the function name will just work without any quoting.
Read the basics about identifiers here.
Also, consider upgrading to a more recent version of PostgreSQL; 8.3 is a bit rusty by now.
psql supports textual replacement variables. Within psql they can be set using \set and referenced using :varname.
\set xyz 'abcdef'
select :'xyz';
 ?column?
----------
 abcdef
These variables can be set using command line arguments also:
psql -v xyz=value
The only problem is that these textual replacements always need some fiddling with quoting, as shown by the \set and SELECT above.
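Applied to the batch file from the question, that could look like this (a sketch; the single quotes are embedded in the variable value so the script can use it as a string literal):

@echo off
set PGPASSWORD=xxxx
set /p dateimport=Please enter date (YYYY-MM-DD): 
C:\Progra~1\PostgreSQL\8.3\bin\psql -d Total -h localhost -p 5432 -U postgres -v dateimport="'%dateimport%'" -f C:\TotalProteinImport.sql

Inside TotalProteinImport.sql the date is then referenced as :dateimport, e.g. where data_analisi = :dateimport;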
After creating the function in Postgres, you must create a .bat file in the bin directory of your Postgres version, for example C:\Program Files\PostgreSQL\9.3\bin. There you write:
@echo off
cd C:\Program Files\PostgreSQL\9.3\bin
psql -p 5432 -h localhost -d myDataBase -U postgres -c "select * from myFunction()"