Slow psql import - postgresql

I have a set of backups of my databases in SQL format. I am using the following command to import them:
Restore: $ psql -U {user-name} -d {destination_db} -f {dumpfilename.sql}
It works well, but I noticed that it prints out logs, and it seems to be importing the data row by row. For a 200 MB database it takes a long time to import, and I have several databases that are around 20 GB. Is there any faster way to import them? This method does not seem practical at all.

They are imported in whatever fashion was encoded in the sql file. That is generally going to be with COPY, but you could have done it with individual INSERTs if that is what you told pg_dump to do.

You should use the custom format (-F c) or the directory format (-F d) to dump your data. Then you can parallelize restore with the -j option of pg_restore.
This parallelizes the COPY statements that load the data and the CREATE INDEX statements. If your database consists of a single large table, that won't help you, but otherwise you should see a performance improvement.
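
As a rough sketch of the dump-and-parallel-restore workflow described in this answer, the Python helpers below only build the pg_dump and pg_restore argument lists and hand them to subprocess. The helper names, the default job count of 4 and the example database names and paths are made up for illustration. Note that pg_dump itself can only write in parallel (-j) with the directory format, while pg_restore -j works with both custom and directory dumps.

```python
import subprocess


def dump_args(dbname: str, dump_dir: str, jobs: int = 4) -> list[str]:
    """Arguments for a directory-format dump (-F d) written with `jobs` parallel workers."""
    return ["pg_dump", "-F", "d", "-j", str(jobs), "-f", dump_dir, dbname]


def restore_args(target_db: str, dump_path: str, jobs: int = 4) -> list[str]:
    """Arguments for a parallel restore of a custom or directory dump into target_db."""
    return ["pg_restore", "-j", str(jobs), "-d", target_db, dump_path]


def run(args: list[str]) -> None:
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    subprocess.run(args, check=True)


# Hypothetical names; replace them with your own databases and paths.
print(" ".join(dump_args("source_db", "/tmp/source_db.dump")))
print(" ".join(restore_args("target_db", "/tmp/source_db.dump")))
```

A sensible job count depends on the number of CPU cores and on how many tables and indexes can actually be processed independently, so treat 4 as a placeholder.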

Related

What’s the difference between COPY and pg_dump --data-only

I’m planning to migrate some tables out of an existing database, so the plan is to apply the schema on the new database and then COPY the data for each table.
What’s the difference between that and using pg_dump for the data and then pg_restore?
Would COPY need the indexes etc. to be re-established?

If you use pg_dump --data-only it will output the data as COPY statements, unless you override that with --inserts or --column-inserts. So there is no difference in that case. In either case, if the tables in the new database were not created with indexes, the indexes would need to be added. You could solve that with either -s -t <some_table> to get just the table schema or -t <some_table> to get the table schema and data.

pg_dump --data-only will produce a complete SQL script that can be run with psql. That script contains both the COPY statement and the data:
COPY laurenz.data_2020 (id, d, x) FROM stdin;
1499906 2020-11-07 13:26:00 x
1499907 2020-11-07 13:27:00 x
1499908 2020-11-07 13:28:00 x
\.
So it is all in one, and there is no danger that you restore a file to the wrong table, for example.
Other than convenience, there is no fundamental difference to running COPY directly.
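
To make the "all in one" layout concrete, here is a minimal Python sketch that reads such a script and pulls out the target table, the column list and the rows of the first COPY block. It assumes the default text format (tab-separated fields, data terminated by a line containing only \.) and deliberately ignores escape sequences such as \N for NULL. The function name is made up, and the sample reuses the rows from this answer with explicit tabs.

```python
def parse_copy_block(script: str):
    """Return (table, columns, rows) for the first COPY ... FROM stdin block, or None."""
    lines = iter(script.splitlines())
    for line in lines:
        if line.startswith("COPY ") and line.endswith("FROM stdin;"):
            header = line[len("COPY "):-len("FROM stdin;")].strip()
            table, _, cols = header.partition(" (")
            columns = [c.strip() for c in cols.rstrip(")").split(",")] if cols else []
            rows = []
            for data in lines:
                if data == "\\.":  # end-of-data marker
                    break
                rows.append(data.split("\t"))  # text format: tab-separated fields
            return table, columns, rows
    return None


sample = (
    "COPY laurenz.data_2020 (id, d, x) FROM stdin;\n"
    "1499906\t2020-11-07 13:26:00\tx\n"
    "1499907\t2020-11-07 13:27:00\tx\n"
    "\\.\n"
)
table, columns, rows = parse_copy_block(sample)
print(table, columns, len(rows))
```

This is only meant to show how the statement and its data sit together in one file; for real restores, feed the script to psql rather than parsing it yourself.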

How to use pgbench?

I have a table on pgadmin4 which consists of 100,000 rows and 23 columns. I need to benchmark PostgreSQL on this specific table using pgbench, but I can't understand what parameters I should use. The database name is desdb and the table is called test.

PgAdmin4 is not a database server, it is a client. You don't have tables "on" pgadmin4, pgadmin4 is just one way of accessing tables which are on an actual server.
You don't benchmark tables, you benchmark queries. Knowing nothing about the table other than its name, all I could propose for a query is something like:
select * from test
Or
select count(*) from test
You could put that in a file test.sql, then run:
pgbench -n -f test.sql -T60 -P5 desdb
If you are like me and don't like littering your filesystem with bunches of tiny files with contents of no particular interest, and if you use the bash shell, you could skip creating a test.sql file and instead make it dynamic:
pgbench -n -f <(echo 'select * from test') -T60 -P5 desdb
Whether that is a meaningful query to be benchmarking, I don't know. Do you care about how fast you can read (and then throw away) all columns for all rows in the table?
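
If you would rather drive this from a script than from bash, the same idea can be sketched in Python: write the query to a temporary file, run pgbench against it, and let the file disappear afterwards. The function names are hypothetical; the flags are the ones used in the commands above (-n to skip vacuuming pgbench's own tables, -f for the script, -T for the duration in seconds, -P for the progress interval). This assumes a POSIX system, where a named temporary file can be opened by another process while it is still open.

```python
import subprocess
import tempfile


def pgbench_args(script_path: str, dbname: str, seconds: int = 60, progress: int = 5) -> list[str]:
    """Build the pgbench command line for a custom script file."""
    return ["pgbench", "-n", "-f", script_path, "-T", str(seconds), "-P", str(progress), dbname]


def run_benchmark(query: str, dbname: str) -> None:
    # The temporary file is removed automatically when the with-block exits.
    with tempfile.NamedTemporaryFile("w", suffix=".sql") as script:
        script.write(query + ";\n")
        script.flush()  # make sure pgbench sees the full contents
        subprocess.run(pgbench_args(script.name, dbname), check=True)


# Example (needs a running server and the desdb database from the question):
# run_benchmark("select count(*) from test", "desdb")
```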

You can find more details about pgbench at https://www.cloudbees.com/blog/tuning-postgresql-with-pgbench.

How to restore one database from a .sql file in which there are two databases?

I was sent a .sql file in which there are two databases. Previously, I only dealt with .sql files in which there is one database. I also can't ask for the databases to be sent as separate files.
Earlier I used this command:
psql -d first_db < /Users/colibri/Desktop/first_db.sql
Databases on the server and locally have different names.
Tell me, please, how can I now restore a specific database from a file in which there are several?

You have two choices:
Use an editor to delete everything except the database you want from the SQL file.
Restore the whole file and then drop the database you don't need.
The file was probably generated with pg_dumpall. Use pg_dump to dump a single database.

If this is the output of pg_dumpall and the file is too big to edit with something like vi, you can use a stream editor to isolate just what you want.
perl -ne 'print if /^\\connect foobar/.../^\\connect/' < old.sql > new.sql
The last dozen or so lines that this captures will be setting up for and creating the next database it wants to restore, so you might need to tinker with this a bit to get rid of those if you don't want it to attempt to create that database while you replay. You could change the ending landmark to something like the below so that it ends earlier, but that is more likely to hit false positives (where the data itself contains the magic string) than the '^\connect' landmark is.
perl -ne 'print if /^\\connect foobar/.../^-- PostgreSQL database dump complete/'
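
For anyone who prefers Python to Perl, the range extraction can be sketched as a small streaming generator. It mirrors the three-dot range in the one-liners above: printing starts at the first line matching \connect followed by the database name, the end pattern is only checked on later lines, and the line that matches the end pattern is still printed. The function name and the default end pattern are my own choices, and like the Perl version it will have to be adapted if your dump writes the \connect line differently.

```python
import re
from typing import Iterable, Iterator


def extract_section(lines: Iterable[str], dbname: str,
                    end_pattern: str = r"^\\connect") -> Iterator[str]:
    """Yield the lines from '\\connect dbname' through the next line matching end_pattern."""
    start = re.compile(r"^\\connect " + re.escape(dbname))
    end = re.compile(end_pattern)
    inside = False
    for line in lines:
        if not inside:
            if start.search(line):
                inside = True
                yield line  # the start line itself is never tested against end
        else:
            yield line
            if end.search(line):
                inside = False  # the end line is included, as with Perl's range
```

Because it reads line by line, it never holds the whole dump in memory. A typical use would be to open old.sql for reading and new.sql for writing and pass the input file object to extract_section, writing each yielded line to the output.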

What is the purpose of the sql script file in a tar dump?

In a tar dump
$ tar -tf dvdrental.tar
toc.dat
2163.dat
...
2189.dat
restore.sql
After extraction
$ file *
2163.dat: ASCII text
...
2189.dat: ASCII text
restore.sql: ASCII text, with very long lines
toc.dat: PostgreSQL custom database dump - v1.12-0
What is the purpose of restore.sql?
toc.dat is binary, but I can open it and it looks like an SQL script too. How do the purposes of restore.sql and toc.dat differ?
The following quote from the documentation doesn't answer my question:
with one file for each table and blob being dumped, plus a so-called Table of Contents file describing the dumped objects in a machine-readable format that pg_restore can read.
Since a tar dump contains restore.sql besides the .dat files, what is the difference between the SQL script files restore.sql and toc.dat in a tar dump and a plain dump (which has only one SQL script file)?
Thanks.

restore.sql is not used by pg_restore. See this comment from src/bin/pg_dump/pg_backup_tar.c:
* The tar format also includes a 'restore.sql' script which is there for
* the benefit of humans. This script is never used by pg_restore.
toc.dat is the table of contents. It contains commands to create and drop each object in the dump and is used by pg_restore to create the objects. It also contains COPY statements that load the data from the *.dat files.
You can extract the table of contents in human-readable form with pg_restore -l, and you can edit the result to restore only specific objects with pg_restore -L.
The <number>.dat files contain the table data; they are used by the COPY statements in toc.dat and restore.sql.
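
Editing the list produced by pg_restore -l can also be scripted. In that list, lines starting with a semicolon are comments, so an entry is disabled by putting a semicolon in front of it. The Python sketch below comments out every entry that does not mention one of the wanted object names as a whole word. The function name is hypothetical, the matching is deliberately crude (a wanted name that equals a schema or owner name would match every entry), and the sample lines in the usage are invented to resemble typical list output.

```python
def select_toc_entries(toc_lines, wanted_names):
    """Comment out every TOC entry that does not mention one of wanted_names."""
    result = []
    for line in toc_lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            result.append(line)  # blank line or existing comment: keep unchanged
            continue
        fields = stripped.split()
        keep = any(name in fields for name in wanted_names)
        result.append(line if keep else ";" + line)
    return result


# Invented sample in the style of pg_restore -l output.
toc = [
    "; Selected TOC Entries:\n",
    "215; 1259 16395 TABLE public actor postgres\n",
    "216; 1259 16401 TABLE public film postgres\n",
]
print("".join(select_toc_entries(toc, {"actor"})), end="")
```

The filtered list would then be saved to a file and passed to pg_restore -L together with the dump.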

This looks like a script to restore the data to PostgreSQL. The script was created using pg_dump.
If you'd like to restore, please have a look at pg_restore.
The .dat files contain the data to be restored by the \copy commands in the SQL script.
The toc.dat file is not referenced inside the SQL file. If you peek inside it using cat toc.dat | strings, you'll find that it contains data very similar to the SQL file, but with a few more internal IDs.
I think it might have been intended to work without the SQL at some point, but that's not how it works right now. See the code to generate toc here.

Estimate/Print csv COPY status to postgresql table

I want to get an idea of how long it will take to copy a csv to a postgresql table. Is there a way to print the rows copied in a reasonable fashion or is there another way to somehow display the progress of the copy?
Perhaps there is a verbose setting, or I should use --echo or -qecho?
I am using:
psql -U postgres -d nyc_data -h localhost -c "\COPY rides FROM nyc_data_rides.csv CSV"

As of Postgres 14, it's possible to query the status of an active COPY via the internal pg_stat_progress_copy view.
For example, to watch progress in terms of both bytes and lines processed:
select * from pg_stat_progress_copy \watch 1
Refs:
https://www.postgresql.org/docs/14/progress-reporting.html#COPY-PROGRESS-REPORTING
https://www.depesz.com/2021/01/12/waiting-for-postgresql-14-report-progress-of-copy-commands/
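
To turn those numbers into a time estimate, a simple linear extrapolation is usually enough. The Python sketch below assumes you have read bytes_processed from the view and know how long the COPY has been running; the function name is made up. One caveat: the view's bytes_total column is 0 when the server does not know the size of the source, which is the case when the data arrives over STDIN, as it does with psql's \copy. In that situation, use the size of the local CSV file (for example from os.path.getsize) as the total instead.

```python
def estimate_remaining_seconds(bytes_processed: int, bytes_total: int, elapsed_seconds: float):
    """Linear extrapolation of the remaining time; None when no estimate is possible."""
    if bytes_processed <= 0 or bytes_total <= 0 or elapsed_seconds <= 0:
        return None
    rate = bytes_processed / elapsed_seconds  # average bytes per second so far
    remaining = max(bytes_total - bytes_processed, 0)
    return remaining / rate


# 250 bytes done out of 1000 after 10 seconds: 750 bytes left at 25 bytes/s.
print(estimate_remaining_seconds(250, 1000, 10.0))
```

The estimate assumes a constant load rate, so it tends to be optimistic for tables with many indexes or triggers.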

There is no such thing unfortunately.
One idea would be to divide the input into chunks of 1000 or 10000 lines, which you then import one after the other. That wouldn't slow processing considerably, and you can quickly get an estimate of how long the whole import is going to take.
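
The chunking idea can be sketched in a few lines of Python. The helper names and the chunk size are placeholders, and the actual import of each chunk (for example by feeding it to COPY ... FROM STDIN) is left out. Splitting on lines assumes that no CSV field contains an embedded newline and that there is no header row to skip.

```python
import math
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def estimate_total_seconds(first_chunk_seconds: float, total_lines: int, chunk_size: int) -> float:
    """Extrapolate the full import time from the time taken by the first chunk."""
    return first_chunk_seconds * math.ceil(total_lines / chunk_size)


print(list(chunked(["a", "b", "c", "d", "e"], 2)))
print(estimate_total_seconds(2.0, 25000, 10000))
```

Timing the first chunk or two and multiplying by the number of chunks gives the quick estimate; the last chunk is usually smaller, so the result is a slight overestimate.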

Use the pv tool:
pv /tmp/some_table.csv | sudo -u postgres psql -d some_db -c "copy some_table from stdin delimiter ',' null '';"
As a result, it will show:
1.42GiB 0:11:42 [2.06MiB/s] [===================================================================================================================================================================>] 100%

As Laurenz Albe said, there's no way to measure how much time remains until the entire process finishes. But one thing I did today to get a good approximation was:
Start the "System Monitor" on my Linux machine.
This application has a counter showing how much data has been uploaded since it was started.
Using the size of the file I was uploading, I made a good prediction of how much data was left to send to the server.
