Estimate/Print CSV COPY status to PostgreSQL table - postgresql

I want to get an idea of how long it will take to copy a CSV to a PostgreSQL table. Is there a way to print the number of rows copied in a reasonable fashion, or is there another way to display the progress of the copy?
Perhaps there is a verbose setting or I should use --echo or -qecho
I am using:
psql -U postgres -d nyc_data -h localhost -c "\COPY rides FROM nyc_data_rides.csv CSV"

In Postgres 14, it's now possible to query the status of an active COPY via the internal pg_stat_progress_copy view.
e.g. to watch progress in terms of both bytes and lines processed:
select * from pg_stat_progress_copy \watch 1
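For a rough percent-complete figure, a sketch like this works too (columns per the docs in the refs below; note that bytes_total is reported as 0 for a client-side \copy, since the server can't see the file size, so the pct column is only useful for a server-side COPY FROM 'file'):
select relid::regclass as table_name,
       tuples_processed,
       pg_size_pretty(bytes_processed) as processed,
       round(100.0 * bytes_processed / nullif(bytes_total, 0), 1) as pct
from pg_stat_progress_copy;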
Refs:
https://www.postgresql.org/docs/14/progress-reporting.html#COPY-PROGRESS-REPORTING
https://www.depesz.com/2021/01/12/waiting-for-postgresql-14-report-progress-of-copy-commands/

There is no such thing, unfortunately.
One idea would be to divide the input into chunks of 1000 or 10000 lines, which you then import one after the other. That wouldn't slow down processing considerably, and you can quickly get an estimate of how long the whole import is going to take; see the sketch below.
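A minimal sketch of that idea in bash, assuming the file and table names from the question and a CSV without a header row:
split -l 10000 nyc_data_rides.csv chunk_
for f in chunk_*; do
    time psql -U postgres -d nyc_data -h localhost -c "\copy rides FROM $f CSV"
done
Timing the first chunk or two gives you a per-chunk rate to extrapolate from.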

Use the pv tool:
pv /tmp/some_table.csv | sudo -u postgres psql -d some_db -c "copy some_table from stdin delimiter ',' null '';"
and as a result, it will show
1.42GiB 0:11:42 [2.06MiB/s] [===================================================================================================================================================================>] 100%

As Laurenz Albe said, there's no way to measure how much time remains before the entire process concludes. But one thing I did today to get a good approximation was:
Start the System Monitor in my Linux desktop.
This application has a counter showing how much data has been uploaded since it was started.
Using the size of the file I was uploading, I could make a good prediction of how much data was left to send to the server. For example, if a 2 GB file shows 500 MB sent after 5 minutes, roughly 15 more minutes remain.

Related

How to use pgbench?

I have a table in pgAdmin 4 which consists of 100,000 rows and 23 columns. I need to benchmark PostgreSQL on this specific table using pgbench, but I can't understand what parameters I should use. The database is named desdb and the table is called test.
PgAdmin4 is not a database server; it is a client. You don't have tables "on" pgadmin4; pgadmin4 is just one way of accessing tables which live on an actual server.
You don't benchmark tables, you benchmark queries. Knowing nothing about the table other than its name, all I could propose for a query is something like:
select * from test
Or
select count(*) from test
You could put that in a file test.sql, then run:
pgbench -n -f test.sql -T60 -P5 desdb
If you are like me and don't like littering your filesystem with bunches of tiny files of no particular interest, and you use the bash shell, you can skip creating a test.sql file and instead supply the query inline:
pgbench -n -f <(echo 'select * from test') -T60 -P5 desdb
Whether that is a meaningful query to be benchmarking, I don't know. Do you care about how fast you can read (and then throw away) all columns for all rows in the table?
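If the table has an indexed integer key (an assumption; the question doesn't say what the columns are), a point-lookup script may be a more representative workload. pgbench's \set with random() draws a fresh value per transaction; the column name here is hypothetical:
\set row_id random(1, 100000)
select * from test where id = :row_id;
Save that as, say, lookup.sql and run it with the same pgbench invocation as above.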
You can find more details about pgbench here: https://www.cloudbees.com/blog/tuning-postgresql-with-pgbench

Slow psql import

I have a set of backups from my databases and they are in SQL format. I am using the following command to import them:
Restore: $ psql -U {user-name} -d {destination_db} -f {dumpfilename.sql}
It works well, but I noticed that it prints out logs and seems to be importing row by row. For a 200 MB database it takes a long time to import, and I have several databases which are around 20 GB. Is there any faster way to import them? This method seems not to be practical at all.
They are imported in whatever fashion was encoded in the sql file. That is generally going to be with COPY, but you could have done it with individual INSERTs if that is what you told pg_dump to do.
You should use the custom format (-F c) or the directory format (-F d) to dump your data. Then you can parallelize restore with the -j option of pg_restore.
This parallelizes the COPY statements that load the data and the CREATE INDEX statements. If your database consists of a single large table, that won't help you, but otherwise you should see a performance improvement.
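A sketch of that workflow, with placeholder names for the database and dump file:
pg_dump -F c -f mydb.dump mydb
pg_restore -j 4 -d destination_db mydb.dump
Here -j 4 runs four parallel jobs; a value around the number of CPU cores is a common starting point.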

On a PostgreSQL 12 / PostGIS 3.0 installation with MacPorts under MacOS, how to use postgis_restore.pl script?

I have a fresh installation of PostgreSQL 12 / PostGIS 3.0, done with MacPorts under macOS (Mojave), and I am trying to restore a PostGIS-enabled DB with the traditional script postgis_restore.pl.
I first created an empty database with PostGIS 3.0 enabled on it.
My database dump comes from a PostgreSQL 9.4 with PostGIS 2.5. It was done with the -Fc format.
I try :
perl /opt/local/share/postgresql12/contrib/postgis-3.0/postgis_restore.pl /Users/me/Documents/db/dump_file.dump | psql -h localhost -U postgres target_db 2>errors.txt
I get the following output:
Converting /Users/me/Documents/db/dump_file.dump to ASCII on stdout...
Reading list of functions to ignore...
Writing manifest of things to read from dump file...
Writing ASCII to stdout...
ALTER TABLE
ALTER TABLE
pg_restore: error: one of -d/--dbname and -f/--file must be specified
Done.
SELECT 8500
DELETE 8500
UPDATE 0
INSERT 0 8500
DROP TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
How do I solve:
pg_restore: error: one of -d/--dbname and -f/--file must be specified
Thank you
In PostgreSQL 12, the behavior of pg_restore was changed to demand either -f or -d. Previously, if neither was specified then it streamed its output to stdout, which behavior is now obtained by specifying -f -. This was changed because many people were confused by the old behavior (although the change itself is also confusing).
Apparently postgis_restore.pl never got updated to reflect this change. You should be able to find the spot that calls pg_restore and add -f - to it. Although given the fact that this script is apparently never tested, I'd be cautious about using it without further vetting.
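For illustration, outside the script, the modern equivalent of the old streaming behavior looks like this (paths from the question; note this is just pg_restore's streaming, not a replacement for the filtering the script does):
pg_restore -f - /Users/me/Documents/db/dump_file.dump | psql -h localhost -U postgres target_db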
It seems you forgot that a dump can have more than one database in it. Just specify the one you want restored, even if your file has only one database in it. Looking at the pg_restore documentation may help too.
As an updated version of PostGIS 3 seems not to be available in MacPorts at the moment, I corrected the script as follows, and it solved my issue:
open( INPUT, "pg_restore -L $manifest $dumpfile
replaced by
open( INPUT, "pg_restore -f - -L $manifest $dumpfile

How to restore one database from a .sql file in which there are two databases?

I was sent a .sql file in which there are two databases. Previously, I have only dealt with .sql files containing a single database. I also can't ask for the databases to be sent in separate files.
Earlier I used this command:
psql -d first_db < /Users/colibri/Desktop/first_db.sql
Databases on the server and locally have different names.
Tell me, please: how can I restore a specific database from a file that contains several?
You have two choices:
Use an editor to delete everything except the database you want from the SQL file.
Restore the whole file and then drop the database you don't need.
The file was probably generated with pg_dumpall. Use pg_dump to dump a single database.
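For future backups, dumping each database separately avoids the problem entirely (database name taken from the question):
pg_dump first_db > first_db.sql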
If this is the output of pg_dumpall and the file is too big to edit with something like vi, you can use a stream editor to isolate just what you want.
perl -ne 'print if /^\\connect foobar/.../^\\connect/' < old.sql > new.sql
The last dozen or so lines that this captures will be setting up for and creating the next database it wants to restore, so you might need to tinker with this a bit to get rid of those if you don't want it to attempt to create that database while you replay. You could change the ending landmark to something like the below so that it ends earlier, but that is more likely to hit false positives (where the data itself contains the magic string) than the '^\connect' landmark is.
perl -ne 'print if /^\\connect foobar/.../^-- PostgreSQL database dump complete/'
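Once isolated, the extracted file restores the same way as a single-database dump (database name assumed, as in the question):
psql -d first_db < new.sql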

PostgreSQL - batch + script + variable

I am not a programmer, and I am struggling a bit with this.
I have a batch file that connects to my PostgreSQL server and then runs a SQL script. Everything works as expected. My question is how to pass a variable (if possible) from one to the other.
Here is my batch file:
set PGPASSWORD=xxxx
cls
@echo off
C:\Progra~1\PostgreSQL\8.3\bin\psql -d Total -h localhost -p 5432 -U postgres -f C:\TotalProteinImport.sql
And here's the script:
copy totalprotein from 'c:/TP.csv' DELIMITERS ',' CSV HEADER;
update anagrafica
set pt=(select totalprotein.resultvalue from totalprotein where totalprotein.accessionnbr=anagrafica.id)
where data_analisi = '12/23/2011';
delete from totalprotein;
This is working great; now the question is how I could pass a variable that would carry the date for data_analisi.
Like in the batch file: "Please enter date", and then the value is passed to the SQL script.
You could create a function out of your SQL script like this:
CREATE OR REPLACE FUNCTION f_myfunc(date)
RETURNS void AS
$BODY$
CREATE TEMP TABLE t_tmp ON COMMIT DROP AS
SELECT * FROM totalprotein LIMIT 0; -- copy table-structure from table
COPY t_tmp FROM 'c:/TP.csv' DELIMITERS ',' CSV HEADER;
UPDATE anagrafica a
SET pt = t.resultvalue
FROM t_tmp t
WHERE a.data_analisi = $1
AND t.accessionnbr = a.id;
-- Temp table is dropped automatically at end of session
-- In this case (ON COMMIT DROP) after the transaction
$BODY$
LANGUAGE sql;
You can use language SQL for this kind of simple SQL batch.
As you can see I have made a couple of modifications to your script that should make it faster, cleaner and safer.
Major points
For reading data into an empty table temporarily, use a temporary table. Saves a lot of disc writes and is much faster.
To simplify the process I use your existing table totalprotein as template for the creation of the (empty) temp table.
If you want to delete all rows of a table use TRUNCATE instead of DELETE FROM. Much faster. In this particular case, you need neither. The temporary table is dropped automatically. See comments in function.
The way you updated anagrafica.pt, you would set the column to NULL if anything went wrong in the process (date not found, wrong date, id not found ...). The way I rewrote the UPDATE, it only happens if matching data is found. I assume that is what you actually want.
Then ask for user input in your shell script and call the function with the date as parameter. That's how it could work in a Linux shell (as user postgres, with password-less access using the IDENT method in pg_hba.conf):
#! /bin/sh
# Ask for date. 'YYYY-MM-DD' = ISO date-format, valid with any postgres locale.
echo -n "Enter date in the form YYYY-MM-DD and press [ENTER]: "
read date
# check validity of $date ...
psql db -p5432 -c "SELECT f_myfunc('$date')"
-c makes psql execute a single SQL command and then exit. I wrote a lot more on psql and its command line options yesterday in a somewhat related answer.
The creation of the according Windows batch file remains as exercise for you.
Call under Windows
The error message tells you:
Function tpimport(unknown) does not exist
Note the lower-case letters: tpimport. I suspect you used mixed-case letters to create the function. So now you have to enclose the function name in double quotes every time you use it.
Try this one (edited quotes!):
C:\Progra~1\PostgreSQL\8.3\bin\psql -d Total -h localhost -p 5432 -U postgres
-c "SELECT ""TPImport""('%dateimport%')"
Note how I use single and double quotes here. I guess this could work under Windows. See here.
You made it hard for yourself when you chose to use mixed case identifiers in PostgreSQL - a folly which I never tire of warning against. Now you have to double quote the function name "TPImport" every time you use it. While perfectly legit, I would never do that. I use lower case letters for identifiers. Always. This way I never mix up lower / upper case and I never have to use double quotes.
The ultimate fix would be to recreate the function with a lower-case name (just leave out the double quotes and the name will be folded to lower case automatically). Then the function name will just work without any quoting.
Read the basics about identifiers here.
Also, consider upgrading to a more recent version of PostgreSQL; 8.3 is a bit rusty by now.
psql supports textual replacement variables. Within psql they can be set using \set and referenced as :varname.
\set xyz 'abcdef'
select :'xyz';
?column?
----------
abcdef
These variables can be set using command line arguments also:
psql -v xyz=value
The only problem is that these textual replacements always need some fiddling with quoting, as shown by the first \set and the select.
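Applied to the original batch file, a sketch (paths and names from the question; set /p prompts for input, and :'dateimport' interpolates the value as a properly quoted literal):
@echo off
set /p dateimport=Please enter date (YYYY-MM-DD):
C:\Progra~1\PostgreSQL\8.3\bin\psql -d Total -h localhost -p 5432 -U postgres -v dateimport=%dateimport% -f C:\TotalProteinImport.sql
and in TotalProteinImport.sql, reference the variable instead of the hard-coded date:
update anagrafica
set pt=(select totalprotein.resultvalue from totalprotein where totalprotein.accessionnbr=anagrafica.id)
where data_analisi = :'dateimport';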
After creating the function in Postgres, you must create a .bat file in the bin directory of your Postgres version, for example C:\Program Files\PostgreSQL\9.3\bin. There you write:
@echo off
cd C:\Program Files\PostgreSQL\9.3\bin
psql -p 5432 -h localhost -d myDataBase -U postgres -c "select * from myFunction()"