How to use pgbench? - postgresql

I have a table on pgadmin4 which consists of 100,000 rows and 23 columns. I need to benchmark PostgreSQL on this specific table using pgbench, but I can't work out which parameters I should use. The database name is desdb and the table is called test.

PgAdmin4 is not a database server, it is a client. You don't have tables "on" pgadmin4, pgadmin4 is just one way of accessing tables which are on an actual server.
You don't benchmark tables, you benchmark queries. Knowing nothing about the table other than its name, all I could propose for a query is something like:
select * from test
Or
select count(*) from test
You could put that in a file test.sql, then run:
pgbench -n -f test.sql -T60 -P5 desdb
If you are like me and don't like littering your filesystem with tiny files of no particular interest, and if you use the bash shell, you can skip test.sql and feed the query in through process substitution instead:
pgbench -n -f <(echo 'select * from test') -T60 -P5 desdb
Whether that is a meaningful query to be benchmarking, I don't know. Do you care about how fast you can read (and then throw away) all columns for all rows in the table?
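If single-row lookups matter more to you than full scans, a custom script with a random key is a more typical thing to benchmark. This is a minimal sketch, assuming (hypothetically) that test has an integer column id with values from 1 to 100000; the column name and the client and thread counts are not from the question. The random() function in \set needs pgbench 9.6 or later.

```shell
#!/bin/sh
# Hypothetical: assumes table "test" has an integer column "id" in 1..100000.
script=$(mktemp)
cat > "$script" <<'EOF'
\set id random(1, 100000)
SELECT * FROM test WHERE id = :id;
EOF

# -n: skip vacuuming the standard pgbench tables (they do not exist in desdb)
# -c 4 / -j 2: four client connections served by two worker threads
# -T 60 / -P 5: run for 60 seconds, print progress every 5 seconds
cmd="pgbench -n -f $script -c 4 -j 2 -T 60 -P 5 desdb"
echo "$cmd"
# Uncomment to run once pgbench is installed and desdb is reachable:
# $cmd
```

Each transaction then picks a fresh id, so the result reflects index lookups across the whole table rather than one cached row.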

For more details on pgbench, see: https://www.cloudbees.com/blog/tuning-postgresql-with-pgbench.

Related

What’s the difference between COPY and pg_dump --data-only

I’m planning to migrate some tables out of existing database so the plan is to apply the schema on new database and then COPY data for each table.
What’s the difference of that versus pg_dump the data and then pg_restore?
Would COPY need the indexes etc. to be re-established?
If you use pg_dump --data-only it will output the data as COPY statements, unless you override with --inserts or --column-inserts. So there is no difference in that case. In either case, if the tables in the new database were not created with indexes, they would need to be added. You could solve that with either -s -t <some_table> to get just the table schema or -t <some_table> to get the table schema and data.
pg_dump --data-only will produce a complete SQL script that can be run with psql. That script contains both the COPY statement and the data:
COPY laurenz.data_2020 (id, d, x) FROM stdin;
1499906 2020-11-07 13:26:00 x
1499907 2020-11-07 13:27:00 x
1499908 2020-11-07 13:28:00 x
\.
So it is all in one, and there is no danger that you restore a file to the wrong table, for example.
Other than that convenience, there is no fundamental difference to running COPY directly.

Placeholder in PostgreSQL sql file

I have multiple tables that are created in the same way (same columns, indexes, etc.)
I would like to have one sql file for creating them all without duplicating the create statements.
Is there a way to use some kind of placeholder in sql file which would be substituted when executing the sql file with a parameter?
For example I would like to have below sql statement:
drop table if exists schema.%PLACEHOLDER%;
create table schema.%PLACEHOLDER%(id text, data text);
And execute such script with:
psql -f mysqlfile.sql -magic_parameter my_desired_table_name
Is this possible when executing PostgreSQL sql files, or maybe other way to achieve the same (except using sed)?
Since you are using psql, you can use variables as follows:
drop table if exists schema.:placeholder;
The invocation is:
psql -f mysqlfile.sql -v placeholder=table_name
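To create several identical tables, the same file can be run once per table name. A rough sketch, assuming hypothetical table names orders_2020 and orders_2021; it only prints the psql commands, so remove the echo to actually run them against your database:

```shell
#!/bin/sh
# The SQL file uses the psql variable :placeholder from the answer above.
sql_file=$(mktemp)
cat > "$sql_file" <<'EOF'
drop table if exists schema.:placeholder;
create table schema.:placeholder (id text, data text);
EOF

# Hypothetical table names; one psql run per table.
count=0
for t in orders_2020 orders_2021; do
  echo psql -f "$sql_file" -v placeholder="$t"
  count=$((count + 1))
done
```

psql also accepts :"placeholder", which inserts the value quoted as an identifier. That is the safer form if a table name could contain upper-case letters or other unusual characters.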

Estimate/Print csv COPY status to postgresql table

I want to get an idea of how long it will take to copy a csv to a postgresql table. Is there a way to print the rows copied in a reasonable fashion or is there another way to somehow display the progress of the copy?
Perhaps there is a verbose setting or I should use --echo or -qecho
I am using:
psql -U postgres -d nyc_data -h localhost -c "\COPY rides FROM nyc_data_rides.csv CSV"
Since PostgreSQL 14, you can query the status of an active COPY through the pg_stat_progress_copy system view.
For example, to watch progress in terms of both bytes and lines processed:
select * from pg_stat_progress_copy \watch 1
Refs:
https://www.postgresql.org/docs/14/progress-reporting.html#COPY-PROGRESS-REPORTING
https://www.depesz.com/2021/01/12/waiting-for-postgresql-14-report-progress-of-copy-commands/
Before PostgreSQL 14 there was no such thing, unfortunately.
One idea would be to divide the input into chunks of 1000 or 10000 lines, which you then import one after the other. That wouldn't slow processing considerably, and you can quickly get an estimate of how long the whole import is going to take.
Use the pv tool:
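The chunking idea could look roughly like this. It is a sketch under assumptions: the file name, the 10-line chunk size, and the generated stand-in data are only there so the script runs on its own, and the actual import line is left as a comment. It also assumes a CSV without a header row and without line breaks inside quoted fields, since split cuts purely by line.

```shell
#!/bin/sh
# Hypothetical paths; the generated rows stand in for the real CSV.
workdir=$(mktemp -d)
csv="$workdir/rides.csv"
awk 'BEGIN { for (i = 1; i <= 25; i++) print i ",x" }' > "$csv"

# Cut the file into pieces of 10 lines each (use 10000 or more for real data).
split -l 10 "$csv" "$workdir/chunk_"

# Put the chunk files into the positional parameters to count and loop over them.
set -- "$workdir"/chunk_*
total=$#
n=0
for f in "$@"; do
  n=$((n + 1))
  # Real import, one chunk at a time:
  # psql -d nyc_data -c "\copy rides from '$f' csv"
  echo "chunk $n of $total: $f"
done
```

Timing the first few chunks and multiplying by the chunk count gives the estimate. Note that each chunk is committed separately, so a failure halfway leaves the earlier chunks loaded.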
pv /tmp/some_table.csv | sudo -u postgres psql -d some_db -c "copy some_table from stdin delimiter ',' null '';"
and as a result, it will show
1.42GiB 0:11:42 [2.06MiB/s] [===================================================================================================================================================================>] 100%
As Laurenz Albe said, there's no way to measure how much time remains until the whole process finishes. But one thing I did today to get a good approximation was:
Start the "System Monitor" application on my Linux machine.
This application has a counter showing how much data has been sent since it was started.
Using the size of the file I was uploading, I could make a good estimate of how much data was still left to send to the server.

Postgres: Combining multiple COPY TO outputs to a postgres-importable file

I have my database hosted on heroku, and I want to download specific parts of the database (e.g. all the rows with id > x from table 1, all the rows with name = x from table 2, etc.) in a single file.
From some research and asking a question here it seems that some kind of modified pg_dump would solve my problem. However, I won't be able to use pg_dump because I won't have access to the command line (basically I want to be able to click a button in my web app and it will generate + download the database file).
So my new strategy is to use the postgres copy command. I'll go through the various tables in my server database and run COPY (SELECT * FROM ... WHERE ...) TO filename, where filename is just a temporary file that I will download when complete.
The issue is that this filename file will just have the rows, so I can't just turn around and import it into pgadmin. Assuming I have an 'empty' database set up (the schema, indices, and stuff are all already set up), is there a way I can format my filename file so that it can be easily imported into a postgres db?
Building on my comment about to/from stdout/stdin, and answering the actual question about including multiple tables in one file; you can construct the output file to interleave copy ... from stdin with actual data and load it via psql. For example, psql will support input files that look like this:
copy my_table (col1, col2, col3) from stdin;
foo bar baz
fizz buzz bizz
\.
(Note the trailing \. and that the separators should be tabs; you could also specify the delimiter option in the copy command).
psql will treat everything between the ';' and the '\.' as stdin. This essentially emulates what pg_dump does when you export table data and no schema (e.g., pg_dump -a -t my_table).
The resulting load could be as simple as psql mydb < output.dump.
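As an illustration of the file layout, here is a small sketch that writes two such sections into one file. The table names, columns, and rows are made up; in the web app the data lines would come from COPY (SELECT ...) TO STDOUT rather than from printf.

```shell
#!/bin/sh
# Hypothetical tables and rows. Fields are separated by tabs,
# which is COPY's default delimiter in text format.
out=$(mktemp)
{
  printf 'copy table1 (id, name) from stdin;\n'
  printf '1\tfoo\n2\tbar\n'
  printf '\\.\n'                      # end-of-data marker for table1
  printf 'copy table2 (id, label) from stdin;\n'
  printf '10\tfizz\n'
  printf '\\.\n'                      # end-of-data marker for table2
} > "$out"
cat "$out"
# Load into a database that already has the schema:
# psql mydb < "$out"
```

Each section is self-describing, so the order of tables only matters if foreign keys require parent rows to be loaded first.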

PostgreSQL - batch + script + variable

I am not a programmer, I am struggling a bit with this.
I have a batch file that connects to my PostgreSQL server and then opens a SQL script. Everything works as expected. My question is how to pass a variable (if possible) from one to the other.
Here is my batch file:
set PGPASSWORD=xxxx
cls
@echo off
C:\Progra~1\PostgreSQL\8.3\bin\psql -d Total -h localhost -p 5432 -U postgres -f C:\TotalProteinImport.sql
And here's the script:
copy totalprotein from 'c:/TP.csv' DELIMITERS ',' CSV HEADER;
update anagrafica
set pt=(select totalprotein.resultvalue from totalprotein where totalprotein.accessionnbr=anagrafica.id)
where data_analisi = '12/23/2011';
delete from totalprotein;
This is working great, now the question is how could I pass a variable that would carry the date for data_analisi?
Like in the batch file, "Please enter date", and then the value is passed to the sql script.
You could create a function out of your SQL script like this:
CREATE OR REPLACE FUNCTION f_myfunc(date)
RETURNS void AS
$BODY$
BEGIN
CREATE TEMP TABLE t_tmp ON COMMIT DROP AS
SELECT * FROM totalprotein LIMIT 0; -- copy table structure from existing table
COPY t_tmp FROM 'c:/TP.csv' DELIMITERS ',' CSV HEADER;
UPDATE anagrafica a
SET pt = t.resultvalue
FROM t_tmp t
WHERE a.data_analisi = $1
AND t.accessionnbr = a.id;
-- With ON COMMIT DROP the temp table is dropped at the end of the transaction
-- (without it, at the end of the session)
END;
$BODY$
LANGUAGE plpgsql;
Use PL/pgSQL rather than LANGUAGE sql here: the entire body of an SQL-language function is parsed before any of it is executed, so the UPDATE would not see a temp table created earlier in the same function.
As you can see I have made a couple of modifications to your script that should make it faster, cleaner and safer.
Major points
For reading data into an empty table temporarily, use a temporary table. Saves a lot of disc writes and is much faster.
To simplify the process I use your existing table totalprotein as template for the creation of the (empty) temp table.
If you want to delete all rows of a table use TRUNCATE instead of DELETE FROM. Much faster. In this particular case, you need neither. The temporary table is dropped automatically. See comments in function.
The way you updated anagrafica.pt you would set the column to NULL, if anything goes wrong in the process (date not found, wrong date, id not found ...). The way I rewrote the UPDATE, it only happens if matching data are found. I assume that is what you actually want.
Then ask for user input in your shell script and call the function with the date as parameter. That's how it could work in a Linux shell (as user postgres, with password-less access using the IDENT method in pg_hba.conf):
#! /bin/sh
# Ask for date. 'YYYY-MM-DD' = ISO date-format, valid with any postgres locale.
echo -n "Enter date in the form YYYY-MM-DD and press [ENTER]: "
read date
# check validity of $date ...
psql db -p5432 -c "SELECT f_myfunc('$date')"
-c makes psql execute a single SQL command and then exit. I wrote a lot more on psql and its command line options yesterday in a somewhat related answer.
Creating the corresponding Windows batch file is left as an exercise for you.
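The validity check left open in the script matters, because the date is pasted straight into the SQL string. One possible shape check, as a sketch: is_iso_date is a hypothetical helper name, and it only verifies the YYYY-MM-DD pattern, leaving impossible dates such as 2011-13-45 for PostgreSQL to reject.

```shell
#!/bin/sh
# is_iso_date is a hypothetical helper. A case pattern must match the
# whole string, so anything with extra characters (quotes, SQL) is refused.
is_iso_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

date_input="2011-12-23"   # in the real script this comes from: read date_input
if is_iso_date "$date_input"; then
  # This is the string that would be handed to psql -c
  echo "SELECT f_myfunc('$date_input')"
else
  echo "invalid date: $date_input" >&2
  exit 1
fi
```

Because only digits and hyphens get through, the value cannot break out of the single quotes in the command.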
Call under Windows
The error message tells you:
Function tpimport(unknown) does not exist
Note the lower case letters: tpimport. I suspect you used mixed case letters to create the function. So now you have to enclose the function name in double quotes every time you use it.
Try this one (edited quotes!):
C:\Progra~1\PostgreSQL\8.3\bin\psql -d Total -h localhost -p 5432 -U postgres -c "SELECT ""TPImport""('%dateimport%')"
Note how I use single and double quotes here. I guess this could work under Windows. See here.
You made it hard for yourself when you chose to use mixed case identifiers in PostgreSQL - a folly which I never tire of warning against. Now you have to double quote the function name "TPImport" every time you use it. While perfectly legit, I would never do that. I use lower case letters for identifiers. Always. This way I never mix up lower / upper case and I never have to use double quotes.
The ultimate fix would be to recreate the function with a lower case name (just leave away the double quotes and it will be folded to lower case automatically). Then the function name will just work without any quoting.
Read the basics about identifiers here.
Also, consider upgrading to a more recent version of PostgreSQL. 8.3 is a bit rusty by now.
psql supports textual replacement variables. Within psql they can be set using \set and used using :varname.
\set xyz 'abcdef'
select :'xyz';
?column?
----------
abcdef
These variables can be set using command line arguments also:
psql -v xyz=value
The only problem is that these textual replacements always need some fiddling with quoting as shown by the first \set and select.
After creating the function in Postgres, you must create a .bat file in the bin directory of your Postgres version, for example C:\Program Files\PostgreSQL\9.3\bin. Here you write:
@echo off
cd C:\Program Files\PostgreSQL\9.3\bin
psql -p 5432 -h localhost -d myDataBase -U postgres -c "select * from myFunction()"