Running upsert queries in parallel takes a lot of time in postgres

I'm running 50 parallel containers that ingest rows into Postgres; each container has one connection to Postgres.
For the first half hour, each insert query takes about 200 ms.
After half an hour, the queries become really slow, taking about 5-8 seconds.
The query I use to insert (~100 rows each time):
INSERT INTO schema_name.table_name(id, source, field3)
VALUES
(value1, value2, value3),
(value1, value2, value3),
...
ON CONFLICT (id, source) DO UPDATE
SET field3 = excluded.field3;
I use:
self.conn.autocommit = True
Also, when I run SELECT * FROM pg_stat_activity; I don't see any stuck queries.
Any suggestions?
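One thing that often explains this kind of gradual slowdown with repeated ON CONFLICT ... DO UPDATE statements is dead-tuple bloat: every update leaves a dead row version behind, and if autovacuum can't keep up, the table and its indexes keep growing. A minimal diagnostic sketch (reusing the schema_name/table_name placeholders from the query above) would be:
-- Hypothetical check: are dead tuples accumulating faster than autovacuum cleans them?
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE schemaname = 'schema_name'
  AND relname = 'table_name';
If n_dead_tup keeps climbing while the inserts slow down, tuning autovacuum for that table is a reasonable place to start.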

Related

How to decrease insert query time for 20k rows in postgresql

I have written a query
INSERT INTO "public"."users" ("userId", "name", "age") VALUES ('1234', 'nop', '11'),....
I am trying to insert 20k rows, but it is taking 80+ seconds.
What could be the reason for this?
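For loads of this size, COPY is usually much faster than one large multi-row INSERT. A minimal sketch using psql's \copy, assuming the rows are available in a hypothetical users.csv file with the columns in this order, would be:
-- Hypothetical bulk load from CSV instead of a 20k-row VALUES list (users.csv is assumed).
\copy "public"."users" ("userId", "name", "age") FROM 'users.csv' WITH (FORMAT csv)
If the table has many indexes or triggers, those can also dominate the load time, whichever method is used.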

Very long query planning times for database with lots of partitions in PostgreSQL

I have a PostgreSQL 10 database that contains two tables which both have two levels of partitioning (by list).
The data is now stored within 5K to 10K partitioned tables (grand-children of the two tables mentioned above) depending on the day.
There are three indexes per grand-child partition table, but the two columns on which partitioning is done aren't indexed (since I don't think that's needed, right?).
The issue I'm observing is that the query planning time is very slow but the query execution time is very fast, even when the partition values are hard-coded in the query.
Researching the issue, I thought that the linear search used by PostgreSQL 10 to find the partition metadata was the cause.
cf: https://blog.2ndquadrant.com/partition-elimination-postgresql-11/
So I decided to try out PostgreSQL 11, which includes the two aforementioned patches:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=499be013de65242235ebdde06adb08db887f0ea5
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fdb675fc5d2de825414e05939727de8b120ae81
Alas, it seems the version change doesn't make any difference.
Now, I know that PostgreSQL doesn't cope well with lots of partitions, but I would still like to understand why the query planner is so slow in PostgreSQL 10 and still in PostgreSQL 11.
An example of a query would be:
EXPLAIN ANALYZE
SELECT
    table_a.a,
    table_b.a
FROM
    (
        SELECT a, b
        FROM table_a
        WHERE partition_level_1_column = 'foo'
          AND partition_level_2_column = 'bar'
    ) AS table_a
    INNER JOIN
    (
        SELECT a, b
        FROM table_b
        WHERE partition_level_1_column = 'baz'
          AND partition_level_2_column = 'bat'
    ) AS table_b
    ON table_b.b = table_a.b
LIMIT 10;
Running it on a database with 5K partitions returns Planning Time: 7155.647 ms but Execution Time: 2.827 ms.
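As an aside, the planning cost can be measured without paying for execution at all: EXPLAIN with the SUMMARY option (available since PostgreSQL 10) reports the planning time without running the query. A sketch against one of the inner queries above:
-- Plans the query and reports "Planning Time" only; nothing is executed.
EXPLAIN (SUMMARY ON)
SELECT a, b
FROM table_a
WHERE partition_level_1_column = 'foo'
  AND partition_level_2_column = 'bar';
That makes it easier to compare planner overhead across partition counts and server versions.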

Postgres table partitioning not improving query speed

I have a query that runs on a table (TABLE_A) that's partitioned by recTime…
WITH subquery AS (
select count(*) AS cnt, date_trunc('day',recTime) AS recTime
from TABLE_A
WHERE (recTime >= to_timestamp('2018-Nov-03 00:00:00','YYYY-Mon-DD HH24:MI:SS')
AND recTime <= to_timestamp('2018-Nov-03 23:59:59','YYYY-Mon-DD HH24:MI:SS'))
GROUP BY date_trunc('day',recTime)
)
UPDATE sumDly
SET fixes = subquery.cnt
FROM subquery
WHERE
sumDly.Day = subquery.recTime
If I do an EXPLAIN on the query as shown above, it's apparent that the database is doing an index scan on each of the partitions in the parent table. The associated cost is high and the elapsed time is ridiculous.
If I explicitly force the use of the partition that actually contains the data by replacing…
from TABLE_A
with…
from TABLE_A_20181103
then the EXPLAIN only uses the required partition, and the query takes only a few minutes (and the returned results are the same as before).
QUESTION - Why does the database want to scan all the partitions in the table? I thought the whole idea of partitioning was to help the database eliminate vast swathes of unneeded data in the first pass, rather than forcing a scan of all the indexes in the individual partitions.
UPDATE - I am using version 10.5 of postgres
SET constraint_exclusion = on and make sure to hard-code the constraints into the query. Note that you have indeed hard-coded the constraints in the query:
...
WHERE (recTime >= to_timestamp('2018-Nov-03 00:00:00','YYYY-Mon-DD HH24:MI:SS')
AND recTime <= to_timestamp('2018-Nov-03 23:59:59','YYYY-Mon-DD HH24:MI:SS'))
...
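One caveat: to_timestamp() is a STABLE function rather than an immutable one, so the planner may not be able to fold those boundaries into constants when deciding which partitions it can skip. A sketch of the same filter written with plain timestamp literals (assuming recTime is a timestamp column) would be:
...
-- Same range expressed as literal constants, which plan-time pruning can work with.
WHERE (recTime >= TIMESTAMP '2018-11-03 00:00:00'
AND recTime <= TIMESTAMP '2018-11-03 23:59:59')
...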

Postgres: truncate / load causes basic queries to take seconds

I have a Postgres table that on a nightly basis gets truncated, and then reloaded in a bulk insert (a million or so records).
This table is behaving very strangely: basic queries such as "SELECT * from mytable LIMIT 10" are taking 40+ seconds. Records are narrow, just a couple integer columns.
Perplexed… I'd very much appreciate your advice.
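A quick sanity check that sometimes explains this pattern is to compare the table's physical size with its row count; if the heap is far larger than a million narrow rows should need, the nightly reload is leaving dead space behind that even a LIMIT 10 scan has to wade through. A sketch (using mytable from the question):
-- Compare on-disk size against the planner's estimated row count for mytable.
SELECT pg_size_pretty(pg_relation_size('mytable')) AS heap_size,
       pg_size_pretty(pg_total_relation_size('mytable')) AS total_size,
       reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'mytable';
If the size looks wildly out of proportion, a VACUUM (or making sure the load really uses TRUNCATE rather than DELETE) is worth a try.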

Update with join condition is taking too long in redshift

I have a table in a Redshift cluster with 5 billion rows. I have a job that tries to update some column values based on some filter. Updating anything at all in this table is incredibly slow. Here's an example:
Update tbl1
set price=tbl2.price, flag=true
from tbl2 join tbl1 on tbl1.id=tbl2.id
where tbl1.time between (some value) and
tbl2.createtime between (some value)
I have a sort key on time and a dist key on id. When I checked the stl_scan table, it shows that my query is scanning 50 million rows on each slice and only returning 50K rows per slice. I stopped the query after 20 minutes.
For testing, I created the same table with 1 billion rows, and the same update query took 3 minutes.
When I run a SELECT with the same conditions, I get the results in a few seconds. Is there anything I am doing wrong?
I believe the correct syntax is:
Update tbl1
set price = tbl2.price,
flag = true
from tbl2
where tbl1.id = tbl2.id and
tbl1.time between (some value) and
tbl2.createtime between (some value);
Note that tbl1 is only mentioned once, in the UPDATE clause. There is no join, just a correlation clause.