What’s the difference between COPY and pg_dump --data-only - postgresql

I’m planning to migrate some tables out of an existing database, so the plan is to apply the schema on the new database and then COPY the data for each table.
What’s the difference between that and using pg_dump for the data followed by pg_restore?
Would COPY need re-establishing indexes, etc.?

If you use pg_dump --data-only it will output the data as COPY statements, unless you override that with --inserts or --column-inserts. So there is no difference in that case. Either way, if the tables in the new database were not created with indexes, the indexes would need to be added. You could solve that with either -s -t <some_table> to get just the table schema, or -t <some_table> to get the table schema and data.
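For example, a sketch of both invocations (the database name olddb and the output file names are placeholders):
pg_dump -s -t some_table olddb > some_table_schema.sql    # schema only, including indexes
pg_dump -t some_table olddb > some_table_full.sql         # schema plus data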

pg_dump --data-only will produce a complete SQL script that can be run with psql. That script contains both the COPY statement and the data:
COPY laurenz.data_2020 (id, d, x) FROM stdin;
1499906 2020-11-07 13:26:00 x
1499907 2020-11-07 13:27:00 x
1499908 2020-11-07 13:28:00 x
\.
So it is all in one file, and there is no danger that you restore a file to the wrong table, for example.
Other than convenience, there is no fundamental difference from running COPY directly.
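To illustrate, a rough sketch of the two approaches (olddb, newdb and the file names are placeholders):
pg_dump --data-only -t laurenz.data_2020 olddb > data_2020.sql
psql -d newdb -f data_2020.sql
versus running COPY yourself through psql's \copy:
psql -d olddb -c "\copy laurenz.data_2020 to 'data_2020.tsv'"
psql -d newdb -c "\copy laurenz.data_2020 from 'data_2020.tsv'"
Both move the same rows; the dump just bundles the COPY command and its data into a single file.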

Related

How to use pgbench?

I have a table on pgadmin4 which consists of 100,000 rows and 23 columns. I need to benchmark PostgreSQL on this specific table using pgbench, but I can't understand what parameters I should use. The database name is desdb and the table is called test.
PgAdmin4 is not a database server, it is a client. You don't have tables "on" pgadmin4; pgadmin4 is just one way of accessing tables which are on an actual server.
You don't benchmark tables, you benchmark queries. Knowing nothing about the table other than its name, all I could propose for a query is something like:
select * from test
Or
select count(*) from test
You could put that in a file test.sql, then run:
pgbench -n -f test.sql -T60 -P5 desdb
If you are like me and don't like littering your filesystem with bunches of tiny files of no particular interest, and if you use the bash shell, you can skip creating a test.sql file and instead make it dynamic:
pgbench -n -f <(echo 'select * from test') -T60 -P5 desdb
Whether that is a meaningful query to be benchmarking, I don't know. Do you care about how fast you can read (and then throw away) all columns for all rows in the table?
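If the real workload is closer to single-row lookups than full scans, a custom pgbench script with a random parameter might be more representative. This is only a sketch and assumes the table has an integer column named id with values roughly in the range 1 to 100000:
\set row_id random(1, 100000)
select * from test where id = :row_id;
Save that as lookup.sql (or use the same process-substitution trick as above) and run it the same way:
pgbench -n -f lookup.sql -T60 -P5 desdb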
You can find more details about pgbench at https://www.cloudbees.com/blog/tuning-postgresql-with-pgbench.

Slow psql import

I have a set of backups from my databases and they are in SQL format. I am using the following command to import them:
Restore: $ psql -U {user-name} -d {destination_db} -f {dumpfilename.sql}
It works well, but I noticed that it prints out logs and it seems to be importing them row by row. For a 200 MB database it takes a long time to import, and I have several databases which are around 20 GB. Is there any faster way to import them? This method does not seem practical at all.
They are imported in whatever fashion was encoded in the sql file. That is generally going to be with COPY, but you could have done it with individual INSERTs if that is what you told pg_dump to do.
You should use the custom format (-F c) or the directory format (-F d) to dump your data. Then you can parallelize restore with the -j option of pg_restore.
This parallelizes the COPY statements that load the data and the CREATE INDEX statements. If your database consists of a single large table, that won't help you, but otherwise you should see a performance improvement.
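A sketch of what that could look like (database names, the dump directory and the number of jobs are placeholders):
pg_dump -F d -j 4 -f /path/to/dumpdir source_db
pg_restore -d destination_db -j 4 /path/to/dumpdir
The custom format (-F c) can also be restored with pg_restore -j, but only the directory format supports dumping in parallel (-j on pg_dump) as well.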

Cannot restore data from pg_dump due to blank strings being treated as nonexistent data

I have a database currently on a PostgreSQL 9.3.9 server that I am backing up with pg_dump in the simplest possible fashion, e.g. pg_dump orb > mar_9_2018.db.
One of those tables (linktags) has the following definition:
CREATE TABLE linktags (
    linktagid integer NOT NULL,
    linkid integer,
    tagval character varying(1000)
);
When attempting to restore the database on PostgreSQL 11.2 via
cat mar_9_2018.db | docker exec -i pg-docker psql -U postgres
(a docker-container restore), the table comes back empty because of the following error:
ERROR: missing data for column "tagval"
CONTEXT: COPY linktags, line 737: "1185 9325"
I checked the dump file and found that there are missing tabs where I would expect some sort of information, and clearly the restore process notices that as well.
I also verified that the value in the database is a blank string.
So -
Is there an idiomatic method to back up and restore a Postgres database that I am missing?
Is my version old enough that this version of pg_dump should have some special considerations?
Am I just restoring this wrong?
Edit:
I did some further research and found that I was incorrect in my original check for NULLs; it is actually blank strings that are causing the issue.
If I make an example table with NULL values and then blank strings, I can see that the NULLs get an explicit marker in the dump but the blanks do not.
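For reference, the difference is visible in COPY's text format; a quick sketch (the table t is made up):
CREATE TABLE t (a text, b text);
INSERT INTO t VALUES (NULL, 'x'), ('', 'x');
COPY t TO STDOUT;
-- NULL is written as \N, an empty string as nothing at all before the tab separator:
-- \N	x
-- 	x
So a well-formed dump does distinguish the two, and missing tab separators suggest the file was altered somewhere between pg_dump and the restore.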
pg_dump has an option to use INSERT instead of COPY
pg_dump -d db_name --inserts
As the manual warns, it might make restoration slow (and the dump file much larger). Even in case of some inconsistencies, the tables will be filled with valid rows.
Another problem is with empty tables: pg_dump generates an empty COPY statement like:
COPY config (key, value) FROM stdin;
\.
In this case you'll be getting errors on reimport like:
ERROR: invalid input syntax for type smallint: " "
CONTEXT: COPY config, line 1, column group: " "
which doesn't happen with the --inserts option (no INSERT statement is generated for an empty table).

Placeholder in PostgreSQL sql file

I have multiple tables that are created in the same way (same columns, indexes, etc.).
I would like to have one sql file for creating them all without duplicating the create statements.
Is there a way to use some kind of placeholder in sql file which would be substituted when executing the sql file with a parameter?
For example I would like to have below sql statement:
drop table if exists schema.%PLACEHOLDER%;
create table schema.%PLACEHOLDER%(id text, data text);
And execute such a script with:
psql -f mysqlfile.sql -magic_parameter my_desired_table_name
Is this possible when executing PostgreSQL sql files, or is there maybe another way to achieve the same (other than using sed)?
Since you are using psql, you can use variables as follows:
drop table if exists schema.:placeholder;
The invocation is:
psql -f mysqlfile.sql -v placeholder=table_name
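A fuller sketch of the script, using psql's :"placeholder" syntax so that the value is quoted as an identifier (the schema name and table name are placeholders):
-- mysqlfile.sql
drop table if exists schema.:"placeholder";
create table schema.:"placeholder" (id text, data text);
The invocation stays the same, for example:
psql -d mydb -v placeholder=my_desired_table_name -f mysqlfile.sql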

Postgres: Combining multiple COPY TO outputs to a postgres-importable file

I have my database hosted on heroku, and I want to download specific parts of the database (e.g. all the rows with id > x from table 1, all the rows with name = x from table 2, etc.) in a single file.
From some research and asking a question here it seems that some kind of modified pg_dump would solve my problem. However, I won't be able to use pg_dump because I won't have access to the command line (basically I want to be able to click a button in my web app and it will generate + download the database file).
So my new strategy is to use the Postgres COPY command. I'll go through the various tables in my server database and run COPY (SELECT * FROM ... WHERE ...) TO filename, where filename is just a temporary file that I will download when complete.
The issue is that this filename file will just have the rows, so I can't just turn around and import it into pgadmin. Assuming I have an 'empty' database set up (the schema, indices, and stuff are all already set up), is there a way I can format my filename file so that it can be easily imported into a postgres db?
Building on my comment about to/from stdout/stdin, and answering the actual question about including multiple tables in one file: you can construct the output file to interleave copy ... from stdin with the actual data and load it via psql. For example, psql will accept input files that look like this:
copy my_table (col1, col2, col3) from stdin;
foo bar baz
fizz buzz bizz
\.
(Note the trailing \. and that the separators should be tabs; you could also specify the delimiter option in the copy command).
psql will treat everything between the ';' and the '\.' as stdin for that COPY. This essentially emulates what pg_dump does when you export table data and no schema (e.g., pg_dump -a -t my_table).
The resulting load could be as simple as psql mydb < output.dump.
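One way to build such a file from a shell, as a rough sketch (the table and column names, the WHERE clauses, and the database names sourcedb/newdb are all placeholders):
{
  echo 'copy table1 (id, col1) from stdin;'
  psql -d sourcedb -c "\copy (select id, col1 from table1 where id > 100) to stdout"
  printf '%s\n' '\.'
  echo 'copy table2 (id, name) from stdin;'
  psql -d sourcedb -c "\copy (select id, name from table2 where name = 'x') to stdout"
  printf '%s\n' '\.'
} > output.dump
Each \copy ... to stdout emits tab-separated rows in exactly the format the surrounding copy ... from stdin; and \. lines expect, so the whole file can then be loaded with psql newdb < output.dump.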