pgAdmin slow when displaying data? - postgresql

I am seeing a huge degradation in performance after moving some tables from SQL Server 2008 to Postgres, and I'm wondering whether I'm missing a configuration step or whether it is normal for Postgres to behave this way.
The query used is a simple SELECT from the table. No joins, no ordering, nothing.
The table itself has only about 12K rows.
I have tried this on 3 machines:
Machine A: 50 GB RAM, SSD disks, CPU: Xeon E5-2620 v3 (OS: Ubuntu Server 16), DBMS: Postgres 9.5
Machine B: 8 GB RAM, SATA disks, CPU: Xeon E5-4640 (OS: Ubuntu Server 12), DBMS: Postgres 9.4
Machine C: 4 GB RAM, IDE disks, CPU: Xeon E3-1220 v2 (OS: Windows Server 2008), DBMS: SQL Server 2008 R2
The performance I am seeing is similar between the 2 Postgres databases, despite the vast difference in hardware and configuration. How can this be?
Machine A query. Notice that I'm excluding the geometry column in order to work with "pure" datatypes:
EXPLAIN ANALYZE VERBOSE SELECT id, "OID", type, name, orofos, xrisi_orofoy, area_sqm,
perimeter_m, syn1_, syn1_id, str_name, str_no, katanomh, linkc,
xrcode, kat, ot, use, syn, notes, "MinX", "MinY", "MaxX", "MaxY"
FROM public."korydallos_Land_Uses";
Results:
"Seq Scan on public."korydallos_Land_Uses" (cost=0.00..872.41 rows=12841 width=209) (actual time=0.025..13.450 rows=12841 loops=1)"
" Output: id, "OID", type, name, orofos, xrisi_orofoy, area_sqm, perimeter_m, syn1_, syn1_id, str_name, str_no, katanomh, linkc, xrcode, kat, ot, use, syn, notes, "MinX", "MinY", "MaxX", "MaxY""
"Planning time: 0.137 ms"
"Execution time: 14.788 ms"
This is 14 seconds for a simple select!! Wtf? Compare this with SQL Server:
Query Profile Statistics
Number of INSERT, DELETE and UPDATE statements 0
Rows affected by INSERT, DELETE, or UPDATE statements 0
Number of SELECT statements 1
Rows returned by SELECT statements 12840
Number of transactions 0
Network Statistics
Number of server roundtrips 1
TDS packets sent from client 1
TDS packets received from server 1040
Bytes sent from client 1010
Bytes received from server 2477997
Time Statistics (milliseconds)
Client processing time 985
Total execution time 1022
Wait time on server replies 37
I am at a loss as to what could be happening. I also tried:
Checking for dead rows: 0
Vacuuming
Simply querying the primary key (!). This takes 500 ms to execute.
With each column I add to the select, around 500 more ms are added to the query.
Machine A Postgres performance settings:
max_connections = 200
shared_buffers = 12800MB
effective_cache_size = 38400MB
work_mem = 32MB
maintenance_work_mem = 2GB
min_wal_size = 4GB
max_wal_size = 8GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
Machine B Postgres performance settings:
max_connections = 200
shared_buffers = 128MB
#effective_cache_size = 4GB
#work_mem = 4MB
#maintenance_work_mem = 64MB
#min_wal_size = 80MB
#max_wal_size = 1GB
#checkpoint_completion_target = 0.5
#wal_buffers = -1
#default_statistics_target = 100
Table definition in Postgres:
CREATE TABLE public."korydallos_Land_Uses"
(
    id integer NOT NULL DEFAULT nextval('"korydallos_Land_Uses_id_seq"'::regclass),
    wkb_geometry geometry(Polygon,4326),
    "OID" integer,
    type character varying(255),
    name character varying(255),
    orofos character varying(255),
    xrisi_orofoy character varying(255),
    area_sqm numeric,
    perimeter_m numeric,
    syn1_ numeric,
    syn1_id numeric,
    str_name character varying(255),
    str_no character varying(255),
    katanomh numeric,
    linkc numeric,
    xrcode character varying(255),
    kat numeric,
    ot character varying(255),
    use character varying(255),
    syn numeric,
    notes character varying(255),
    "MinX" numeric,
    "MinY" numeric,
    "MaxX" numeric,
    "MaxY" numeric,
    CONSTRAINT "korydallos_Land_Uses_pkey" PRIMARY KEY (id)
)
WITH (
    OIDS=FALSE
);
ALTER TABLE public."korydallos_Land_Uses"
    OWNER TO root;
CREATE INDEX "sidx_korydallos_Land_Uses_wkb_geometry"
    ON public."korydallos_Land_Uses"
    USING gist
    (wkb_geometry);
EDIT: Removed the irrelevant SQL Server definition as suggested in the comments. Keeping the time as I think it's still relevant.
As per the comments, more info using:
explain (analyze, verbose, buffers, timing) SELECT id, "OID", type, name, orofos, xrisi_orofoy, area_sqm,
perimeter_m, syn1_, syn1_id, str_name, str_no, katanomh, linkc,
xrcode, kat, ot, use, syn, notes, "MinX", "MinY", "MaxX", "MaxY"
FROM public."korydallos_Land_Uses"
Results:
"Seq Scan on public."korydallos_Land_Uses" (cost=0.00..872.41 rows=12841 width=209) (actual time=0.019..11.207 rows=12841 loops=1)"
" Output: id, "OID", type, name, orofos, xrisi_orofoy, area_sqm, perimeter_m, syn1_, syn1_id, str_name, str_no, katanomh, linkc, xrcode, kat, ot, use, syn, notes, "MinX", "MinY", "MaxX", "MaxY""
" Buffers: shared hit=744"
"Planning time: 1.073 ms"
"Execution time: 12.269 ms"
pgAdmin shows the same plan in its "Explain" tab (screenshot of the graphical plan omitted).
How I measure the 14 seconds:
The status window of pgAdmin 3, bottom right corner, when running the query. (It says 14.3 secs, for the trolls here.)

https://www.postgresql.org/docs/current/static/using-explain.html
Note that the “actual time” values are in milliseconds of real time,
so in your case
actual time=0.019..11.207
means running the query took about 11 milliseconds.
pgadmin "explain tab" says the same... Now if you see 14.3 sec in right bottom corner and the time it took is indeed 14 seconds (measured with watches) I assume it is some awful delay on network level or pgadmin itself. Try running this in psql for instance:
select clock_timestamp();
explain analyze select * FROM public."korydallos_Land_Uses";
select clock_timestamp();
This will show the server-side time intervals plus the time needed to send the command from psql to the server. If it still takes 14 seconds, talk to your network admin; if not, try upgrading pgAdmin.
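A rough way to separate server time from transfer/rendering time, sketched below using the table from the question: psql's \timing reports the full client-side round-trip per statement, so comparing a tiny result with the full result set makes the transfer overhead visible.
\timing on
-- tiny result: essentially server-side work only
SELECT count(*) FROM public."korydallos_Land_Uses";
-- full result: server work plus network transfer and client rendering
SELECT * FROM public."korydallos_Land_Uses";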

Related

How to speed up large copy into a Postgres table

I'm trying to load a large dataset (25 GB) into a Postgres table. The command below works, but it takes 2.5 hours and fully utilizes 3-4 cores on the machine the entire time. Can I speed it up significantly? The problem is that I want to insert another 1700 such 25 GB files into the table as well (as separate partitions). 2.5 hours per file is too slow; or rather, it's not too slow in itself, but it makes me suspect subsequent queries against the data will be too slow.
Probably my dataset is too big for Postgres, but the idea of being able to run an optimized query (once the data is all inserted and later indexed and partitioned) and get a result back in < 3 seconds is appealing, hence I wanted to try.
I'm mostly following the guidelines from here. I don't have an index on the table yet. I don't use foreign keys, I'm using a bulk copy, etc.
-- some of the numeric columns have "nan" values so they need to start as strings in raw form
CREATE TABLE raw_eqtl_allpairs (
    gene_id varchar,
    variant_id varchar,
    chr varchar,
    bp bigint,
    ref_ varchar,
    alt varchar,
    tss_distance bigint,
    ma_samples int,
    ma_count int,
    maf varchar,
    pval_nominal varchar,
    slope varchar,
    slope_se varchar,
    chromosome_ varchar,
    pos bigint,
    af varchar
);
COPY raw_eqtl_allpairs(gene_id, variant_id, chr, bp, ref_, alt, tss_distance, ma_samples, ma_count, maf, pval_nominal, slope, slope_se, chromosome_, pos, af)
FROM '/Downloads/genomics_data.tsv'
DELIMITER E'\t'
CSV HEADER;
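One additional speedup worth trying, sketched below under the assumption that the table can be truncated (or created) in the same transaction as the load: COPY's FREEZE option skips some post-load housekeeping, and it pairs well with wal_level = minimal (see the config sketch further down).
BEGIN;
-- FREEZE requires the table to have been created or truncated
-- in the same transaction that runs the COPY
TRUNCATE raw_eqtl_allpairs;
COPY raw_eqtl_allpairs (gene_id, variant_id, chr, bp, ref_, alt, tss_distance,
                        ma_samples, ma_count, maf, pval_nominal, slope, slope_se,
                        chromosome_, pos, af)
FROM '/Downloads/genomics_data.tsv'
WITH (FORMAT csv, DELIMITER E'\t', HEADER true, FREEZE true);
COMMIT;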
Edit 1:
I'm running on Docker on my Mac, which has 4 Intel i7 cores and 16 GB of RAM. I have 500 GB of flash storage.
Edit 2:
I'm using the default /var/lib/postgresql/data/postgresql.conf that comes with the Docker Hub Postgres 14.2 tag. For simplicity, I grepped it for the key details. It seems most of the important settings are commented out:
#maintenance_work_mem = 64MB # min 1MB
#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem
max_wal_size = 1GB
#wal_level = replica # minimal, replica, or logical
#checkpoint_timeout = 5min # range 30s-1d
#archive_mode = off # enables archiving; off, on, or always
#max_wal_senders = 10 # max number of walsender processes
Maybe I could speed things up a lot if I changed wal_level to minimal, and changed max_wal_size to something larger, like 4GB?
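For reference, a sketch of what those changes (plus related bulk-load settings) might look like; the values are illustrative guesses for a 16 GB machine, and wal_level = minimal also requires max_wal_senders = 0 and archive_mode = off, plus a server restart.
# postgresql.conf sketch for a bulk load (illustrative values, restart required)
wal_level = minimal           # with max_wal_senders = 0, COPY into a table
max_wal_senders = 0           #   created/truncated in the same txn can skip WAL
archive_mode = off            # required with wal_level = minimal
max_wal_size = 8GB            # fewer forced checkpoints during the load
checkpoint_timeout = 30min    # same goal
maintenance_work_mem = 2GB    # speeds up the index builds afterwards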

Query speed very slow on TimescaleDB

I created a table with the command:
CREATE TABLE public.g_dl (
    id bigint NOT NULL,
    id_tram integer,
    site_id character varying(40),
    user_id character varying(40),
    app_code character varying(40),
    time TIMESTAMP WITHOUT TIME ZONE NOT NULL
);
SELECT create_hypertable('g_dl','time');
Then I inserted about 34 million records.
Query speed was very good at first, and the Docker container used about 500 MB-1.2 GB of RAM. But queries became a problem after I inserted a further 1.8 million records whose time field was out of order relative to the data I had already inserted.
I use DBeaver and got the message "You might need to increase max_locks_per_transaction", so I raised that value:
max_locks_per_transaction = 1000
Now query speed is very slow and Docker uses 10-12 GB of RAM. What am I doing wrong? Please let me know.
Output of EXPLAIN:
EXPLAIN analyze select * from g_dlquantracts gd where id_tram = 300
Raw JSON explain:
https://drive.google.com/file/d/1bA5EwcWMUn5oTUirHFD25cSU1cYGwGLo/view?usp=sharing
Formatted execution plan: https://explain.depesz.com/s/tM57
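(For reference, a sketch of how such a change is normally applied; ALTER SYSTEM is available in Postgres 9.4+, and max_locks_per_transaction only takes effect after a server restart:)
ALTER SYSTEM SET max_locks_per_transaction = 1000;
-- restart the server, then verify:
SHOW max_locks_per_transaction;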

When comparing insert performance between postgres and timescaledb, timescaledb didn't perform that well?

I tried an insert query performance test. Here are the numbers:
Postgres
Insert : Avg Execution Time For 10 inserts of 1 million rows : 6260 ms
Timescale
Insert : Avg Execution Time For 10 inserts of 1 million rows : 10778 ms
Insert Queries:
-- Testing SQL Queries
-- Join table
CREATE TABLE public.sensors (
    id SERIAL PRIMARY KEY,
    type VARCHAR(50),
    location VARCHAR(50)
);
-- Postgres table
CREATE TABLE sensor_data (
    time TIMESTAMPTZ NOT NULL,
    sensor_id INTEGER,
    temperature DOUBLE PRECISION,
    cpu DOUBLE PRECISION,
    FOREIGN KEY (sensor_id) REFERENCES sensors (id)
);
CREATE INDEX idx_sensor_id
    ON sensor_data(sensor_id);
-- TimescaleDB table
CREATE TABLE sensor_data_ts (
    time TIMESTAMPTZ NOT NULL,
    sensor_id INTEGER,
    temperature DOUBLE PRECISION,
    cpu DOUBLE PRECISION,
    FOREIGN KEY (sensor_id) REFERENCES sensors (id)
);
SELECT create_hypertable('sensor_data_ts', 'time');
-- Insert Data
INSERT INTO sensors (type, location) VALUES
('a','floor'),
('a', 'ceiling'),
('b','floor'),
('b', 'ceiling');
-- Postgres
EXPLAIN ANALYSE
INSERT INTO sensor_data (time, sensor_id, cpu, temperature)
SELECT
time,
sensor_id,
random() AS cpu,
random()*100 AS temperature
FROM generate_series(now() - interval '125 week', now(), interval '5 minute') AS g1(time), generate_series(1,4,1) AS g2(sensor_id);
-- TimescaleDB
EXPLAIN ANALYSE
INSERT INTO sensor_data_ts (time, sensor_id, cpu, temperature)
SELECT
time,
sensor_id,
random() AS cpu,
random()*100 AS temperature
FROM generate_series(now() - interval '125 week', now(), interval '5 minute') AS g1(time), generate_series(1,4,1) AS g2(sensor_id);
Am I overlooking any optimizations?
By default, a hypertable creates a chunk per week (that's configurable in the create_hypertable call; see the sketch below). So with the above setting, you created 125 chunks for TimescaleDB, each with 8000 rows. There is overhead to this chunk creation, as well as to the logic handling it. With the dataset being so small, you are seeing that overhead, which is typically amortized over much larger datasets: in most "natural" settings, we typically see on the order of millions (or at least hundreds of thousands) of rows per chunk.
The place you start to see insert performance differences between a partitioned architecture like TimescaleDB and a single table is when the dataset (and in particular, the indexes you are maintaining) no longer fits naturally in memory.
In the above, 1M rows easily fit in memory, and the only index on your vanilla PG table is on sensor_id, so it's pretty tiny. (On the TimescaleDB hypertable, you by default have an index on the timestamp, distinct per chunk, so you actually have 125 indexes, each covering 8000 rows of distinct timestamps.)
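As a sketch of the configurability mentioned above (the interval value is illustrative; here it would collapse the 125 weekly chunks into roughly two):
-- set a larger chunk interval at creation time:
SELECT create_hypertable('sensor_data_ts', 'time',
                         chunk_time_interval => INTERVAL '2 years');
-- or change it on an existing hypertable (affects new chunks only):
SELECT set_chunk_time_interval('sensor_data_ts', INTERVAL '2 years');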
For a visual, see this older blog post: https://blog.timescale.com/blog/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c/#result-15x-improvement-in-insert-rate
Note that inserts to a single PG table start out at about the same rate, but then fall off as the table gets bigger and data/indexes start spilling to disk.
If you want to do larger performance tests, I might suggest trying out the Time Series Benchmark Suite: https://github.com/timescale/tsbs
With TimescaleDB version 1.7 running on Docker, I was able to insert around 600,000 rows per second on my laptop using https://github.com/vincev/tsdbperf:
$ ./tsdbperf --workers 8 --measurements 200000
[2020-11-03T20:25:48Z INFO tsdbperf] Number of workers: 8
[2020-11-03T20:25:48Z INFO tsdbperf] Devices per worker: 10
[2020-11-03T20:25:48Z INFO tsdbperf] Metrics per device: 10
[2020-11-03T20:25:48Z INFO tsdbperf] Measurements per device: 200000
[2020-11-03T20:26:15Z INFO tsdbperf] Wrote 16000000 measurements in 26.55 seconds
[2020-11-03T20:26:15Z INFO tsdbperf] Wrote 602750 measurements per second
[2020-11-03T20:26:15Z INFO tsdbperf] Wrote 6027500 metrics per second

How to shrink pg_toast table?

I am running Postgres 9.3 on Mac OS X and I have a database which grew out of control. I used to have a table with one column that stored large data. Then I noticed that the DB size had grown to around 19 GB, just because of a pg_toast table. I then removed the column in question and ran VACUUM to get the DB back to a smaller size, but it stayed the same. So how can I shrink the database size?
SELECT nspname || '.' || relname AS "relation"
,pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
results in
pg_toast.pg_toast_700305 | 18 GB
pg_toast.pg_toast_700305_index | 206 MB
public.catalog_hotelde_images | 122 MB
public.routes | 120 MB
VACUUM VERBOSE ANALYZE pg_toast.pg_toast_700305;
INFO: vacuuming "pg_toast.pg_toast_700305"
INFO: index "pg_toast_700305_index" now contains 9601330 row versions in 26329 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.06s/0.02u sec elapsed 0.33 sec.
INFO: "pg_toast_700305": found 0 removable, 0 nonremovable row versions in 0 out of 2393157 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.06s/0.07u sec elapsed 0.37 sec.
VACUUM
structure of the routes table
id serial NOT NULL,
origin_id integer,
destination_id integer,
total_time integer,
total_distance integer,
speed_id integer,
uid bigint,
created_at timestamp without time zone,
updated_at timestamp without time zone,
CONSTRAINT routes_pkey PRIMARY KEY (id)
You can use one of the two types of vacuuming: standard or full.
standard:
VACUUM table_name;
full:
VACUUM FULL table_name;
Keep in mind that VACUUM FULL locks the table it is working on until it's finished.
You may want to perform a standard VACUUM more frequently on tables with frequent update/delete activity. It may not reclaim as much space as VACUUM FULL does, but you can keep running operations like SELECT, INSERT, UPDATE and DELETE while it works, and it takes less time to complete.
In my case, when pg_toast (along with other tables) got out of control, a standard VACUUM made a slight difference but was not enough. I used VACUUM FULL to reclaim more disk space, which was very slow on large relations. I decided to tune autovacuum and use standard VACUUM more often on my frequently updated tables, as sketched below.
If you need to use VACUUM FULL, you should do it when your users are less active.
Also, do not turn off autovacuum.
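A sketch of the kind of per-table autovacuum tuning I mean (the table name and thresholds are only illustrative):
ALTER TABLE public.routes SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- vacuum after ~5% of rows are dead
    autovacuum_vacuum_threshold = 1000      -- plus a small fixed floor
);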
You can get some additional information by adding verbose to your commands:
VACUUM FULL VERBOSE table_name;
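To find which table a given pg_toast relation belongs to (VACUUM FULL on the owning table rewrites its toast data as well), a lookup like this should work; the toast name is taken from the question:
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.reltoastrelid = 'pg_toast.pg_toast_700305'::regclass;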
Try the following:
vacuum full

Optimize SELECT query with ORDER BY, OFFSET and LIMIT of postgresql

This is my table schema
Column | Type | Modifiers
-------------+------------------------+------------------------------------------------------
id | integer | not null default nextval('message_id_seq'::regclass)
date_created | bigint |
content | text |
user_name | character varying(128) |
user_id | character varying(128) |
user_type | character varying(8) |
user_ip | character varying(128) |
user_avatar | character varying(128) |
chatbox_id | integer | not null
Indexes:
"message_pkey" PRIMARY KEY, btree (id)
"idx_message_chatbox_id" btree (chatbox_id)
"indx_date_created" btree (date_created)
Foreign-key constraints:
"message_chatbox_id_fkey" FOREIGN KEY (chatbox_id) REFERENCES chatboxes(id) ON UPDATE CASCADE ON DELETE CASCADE
This is the query
SELECT *
FROM message
WHERE chatbox_id=$1
ORDER BY date_created
OFFSET 0
LIMIT 20;
($1 will be replaced by the actual ID)
It runs pretty well, but when the table reaches 3.7 million records, all SELECT queries start consuming a lot of CPU and RAM, and then the whole system goes down. I have to temporarily back up all the current messages and truncate that table. I am not sure what is going on, because everything is OK when I have about 2 million records.
I am using PostgreSQL Server 9.1.5 with default options.
Update: the output of EXPLAIN ANALYZE:
Limit (cost=0.00..6.50 rows=20 width=99) (actual time=0.107..0.295 rows=20 loops=1)
-> Index Scan Backward using indx_date_created on message (cost=0.00..3458.77 rows=10646 width=99) (actual time=0.105..0.287 rows=20 loops=1)
Filter: (chatbox_id = 25065)
Total runtime: 0.376 ms
(4 rows)
Update: server specification
Intel Xeon 5620 8x2.40GHz+HT
12GB DDR3 1333 ECC
SSD Intel X25-E Extreme 64GB
Final solution
Finally I can go above 3 million messages. I had to optimize the PostgreSQL configuration as wildplasser suggested, and also create a new index as A.H. suggested.
You could try to give PostgreSQL a better index for that query. I propose something like this:
create index invent_suitable_name on message(chatbox_id, date_created);
or
create index invent_suitable_name on message(chatbox_id, date_created desc);
Try adding an index on (chatbox_id, date_created). For this particular query it will give you maximum performance.
For the case when Postgres "starts consuming a lot of CPU and RAM", try to get more details. It could be a bug (with the default configuration Postgres normally doesn't consume much RAM).
UPD: My guess for the reason of the bad performance:
At some point the table became too big for a full scan to collect accurate statistics, so after another ANALYZE, PostgreSQL had bad statistics for the table. As a result it chose a bad plan that consisted of:
an index scan on chatbox_id;
sorting the returned records to get the top 20.
Because of the default configuration and the large number of records returned by step 1, Postgres was forced to do the sorting in files on disk. As a result: bad performance. (A sketch for spotting on-disk sorts follows below.)
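A sketch of how to verify the on-disk sorting theory (the chatbox_id value is taken from the question's EXPLAIN output):
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM message
WHERE chatbox_id = 25065
ORDER BY date_created
LIMIT 20;
-- A line like "Sort Method: external merge  Disk: ..." means the sort
-- spilled to disk; try a larger work_mem for the session and re-run:
SET work_mem = '64MB';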
UPD2: EXPLAIN ANALYZE shows a 0.376 ms runtime and a good plan. Can you give details about a case with bad performance?