Very slow INSERT and strange behaviour with TABLOCK - tsql

I have a table like this
CREATE TABLE dbo.X(VID int NOT NULL, CID int NOT NULL)
There is a clustered columnstore index on this table.
I'm inserting data in batches, each batch consisting of 45 to 200 million rows.
A ~50 million row insert takes about 2:40 minutes, and the server uses only one core.
If I add WITH (TABLOCK) (or TABLOCKX; see the example below), the duration drops to about 1:50. The server uses all cores for 10-20 seconds and after that only one core. Why is that?
Inserting the same data into a temp table takes 4 seconds.
Is there anything I can do to speed things up?
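For reference, the hinted version of the insert looks roughly like this (the staging source is only an assumption to make the example concrete; the WITH (TABLOCK) hint is the relevant part):

-- bulk insert into the clustered columnstore table while holding a table lock,
-- which allows SQL Server to use parallel insert and bulk-load optimizations
INSERT INTO dbo.X WITH (TABLOCK) (VID, CID)
SELECT VID, CID
FROM #batch;  -- hypothetical temp table holding one 45-200 million row batch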
Edit: These are my wait types

Related

High speed single row inserts with PostgreSQL & TimescaleDB

I have a case with a TSDB Hypertable looking approximately like this:
CREATE TABLE data (
pool_id INTEGER NOT NULL,
ts TIMESTAMP NOT NULL,
noise_err DECIMAL,
noise_val DECIMAL,
signal_err DECIMAL,
signal_val DECIMAL,
high_val DECIMAL,
low_val DECIMAL,
CONSTRAINT data_pid_fk FOREIGN KEY (pool_id) REFERENCES pools (id) ON DELETE CASCADE
);
CREATE UNIQUE INDEX data_pts_idx ON data (pool_id, ts);
SELECT create_hypertable('data', 'ts', 'pool_id', 100);
There are ~1000 pools, data contains >1 year of minute records for every pool, and quite a few analytical queries working with the last 3 to 5 days of data. New data is coming with arbitrary delay: 10ms to 30s depending on the pool.
Now the problem: I need to run analytical queries as fast as possible after the new record has been received, hence I can't insert in batches, and I need to speed up single row insertions.
I've run timescaledb-tune, turned off synchronous commits (synchronous_commit = off), played with unlogged table mode, and tried disabling autovacuum; none of it helped much.
The best insert time I get is ~37ms, and it degrades to ~110ms once concurrent inserts start.
What else except removing indexes/constraints can I do to speed up single row inserts?
First, why use TimescaleDB for this table in the first place? What are you getting from it that is worth this slowdown?
Second, you have about 5200 partitions per year worth of data (100 space partitions times roughly 52 default one-week chunks). That is approaching an unmanageable number of partitions.
I question the requirement for analytical queries that need to see the latest split second of data.
Anyway, the way to speed up single row inserts is:
Set synchronous_commit to off.
But be aware that this means losing roughly the last half second of committed transactions in the event of a crash! If that is unacceptable, play with commit_siblings and commit_delay; that will also reduce the number of WAL flushes.
Use prepared statements (see the sketch after this list). With single row inserts, planning time will be significant.
Don't use unlogged tables unless you don't mind losing the data after a crash.
Don't disable autovacuum.
Increase max_wal_size so that checkpoints don't happen more often than is healthy.
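A minimal sketch of the prepared-statement approach, using the table and columns from the question (the statement name and the sample values are made up):

PREPARE data_ins (integer, timestamp, numeric, numeric, numeric, numeric, numeric, numeric) AS
    INSERT INTO data (pool_id, ts, noise_err, noise_val, signal_err, signal_val, high_val, low_val)
    VALUES ($1, $2, $3, $4, $5, $6, $7, $8);

-- each incoming record is then a single, already-planned statement:
EXECUTE data_ins(42, now()::timestamp, 0.01, 1.5, 0.02, 2.5, 3.1, 1.2);

-- and, if roughly half a second of potential loss on a crash is acceptable:
SET synchronous_commit = off;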

PostgreSQL 9.6 deletes suddenly became slow

I have a database table where debug log entries are recorded. There are no foreign keys - it is a single standalone table.
I wrote a utility to delete a number of entries starting with the oldest.
There are 65 million entries so I deleted them 100,000 at a time to give some progress feedback to the user.
There is a primary key column called id
All was going fine until it got to about 5,000,000 records remaining. Then it started taking over 1 minute to execute.
What is more, if I use pgAdmin and type the query in myself, but use an id that I know is less than the minimum id, it still takes over one minute to execute!
For example: delete from public.inettklog where id <= 56301001
And I know the min(id) is 56301002
Here is the result of an explain analyze
Your stats are way out of date. It thinks it will find 30 million rows, but instead finds zero. ANALYZE the table.
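Concretely, using the table name from the question, that is just:

-- refresh the planner statistics so it knows the old ids are gone
ANALYZE public.inettklog;

-- or, to also clean up the dead rows left behind by the bulk deletes:
VACUUM ANALYZE public.inettklog;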

Postgres multi-column index is taking forever to complete

I have a table with around 270,000,000 rows and this is how I created it.
CREATE TABLE init_package_details AS
SELECT pcont.package_content_id as package_content_id,
pcont.activity_id as activity_id,
pc.org_id as org_id,
pc.bed_type as bed_type,
pc.is_override as is_override,
pmmap.package_id as package_id,
pcont.activity_qty as activity_qty,
pcont.charge_head as charge_head,
pcont.activity_charge as charge,
COALESCE(pc.charge,0) - COALESCE(pc.discount,0) as package_charge
FROM a pc
JOIN b od ON
(od.org_id = pc.org_id AND od.status='A')
JOIN c pm ON
(pc.package_id=pm.package_id)
JOIN d pmmap ON
(pmmap.pack_master_id=pm.package_id)
JOIN e pcont ON
(pcont.package_id=pmmap.package_id);
I need to build index on the init_package_details table.
This table gets created in around 5-6 minutes.
I have created a btree index like
CREATE INDEX init_package_details_package_content_id_idx
ON init_package_details(package_content_id);
which takes 10 minutes (more than the time it takes to create and populate the table itself).
And, when I create another index like,
CREATE INDEX init_package_details_package_act_org_bt_id_idx
ON init_package_details(activity_id,org_id,bed_type);
It just freezes and takes forever to complete. I waited for around 30 minutes before I manually cancelled it.
Below are stats from iotop -o, if they help:
When I created the table: averaging around 110-120 MB/s (this is how 270 million rows got inserted in 5-6 minutes)
When I created the first index: averaging around 70 MB/s
On the second index: crawling along at 5-7 MB/s
Could someone explain why this is happening? Is there any way I can speed up the index creation here?
EDIT 1: There are no other connections accessing the table, and pg_stat_activity shows active as the status throughout the running time. This happens inside a transaction (between BEGIN and COMMIT; the same .sql file contains many other scripts).
EDIT 2:
postgres=# show work_mem ;
work_mem
----------
5MB
(1 row)
postgres=# show maintenance_work_mem;
maintenance_work_mem
----------------------
16MB
Building indexes on a table this size takes a long time; that's normal.
If you are not bottlenecked on I/O, you are probably bottlenecked on CPU. Also, with maintenance_work_mem at only 16MB the index sorts have to spill to disk, and the larger keys of the three-column index make that external sort considerably more expensive, which likely explains why it is so much slower than the single-column index.
There are a few things you can do to improve performance (a sketch follows the list):
Set maintenance_work_mem very high.
Use PostgreSQL v11 or later, where CREATE INDEX can use several parallel workers.
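A sketch of what that could look like in the session that builds the index (the 2GB value and the worker count are illustrative, not tuned recommendations):

SET maintenance_work_mem = '2GB';          -- let the index sort use far more memory than the current 16MB
SET max_parallel_maintenance_workers = 4;  -- PostgreSQL 11+: allows parallel CREATE INDEX

CREATE INDEX init_package_details_package_act_org_bt_id_idx
ON init_package_details (activity_id, org_id, bed_type);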

Postgres: truncate / load causes basic queries to take seconds

I have a Postgres table that gets truncated on a nightly basis and then reloaded in a bulk insert (a million or so records).
This table is behaving very strangely: basic queries such as "SELECT * from mytable LIMIT 10" are taking 40+ seconds, even though the records are narrow, just a couple of integer columns.
Perplexed... I'd very much appreciate your advice.

fast growing table in postgresql

We run PostgreSQL 9.5.2 on an RDS instance. One thing we noticed is that a certain table sometimes grows very rapidly in size.
The table in question has only 33k rows and ~600 columns, all numeric (decimal(25, 6)). After a VACUUM FULL, the "total_bytes" reported by the following query
select c.relname, pg_total_relation_size(c.oid) AS total_bytes
from pg_class c;
is about 150MB. However, we have observed it grow to 71GB at one point. In a recent episode, total_bytes grew by 10GB within a 30-minute period.
During that episode we had a batch update query running ~4 times per minute that updates every record in the table. However, at other times the table size remained constant despite similar update activity.
I understand this is probably caused by "dead records" left over from the updates. Indeed, when this table grows too big, simply running VACUUM FULL shrinks it back to its normal size (150MB). My questions are:
Have other people experienced similarly rapid growth in table size in PostgreSQL, and is this normal?
If our batch update queries are causing the rapid growth in table size, why doesn't it happen every time? In fact, I tried to reproduce it manually by running something like
update my_table set x = x * 2
but couldn't -- table size remained the same before and after the query.
The problem is having 600 columns in a single table, which is never a good idea. This is going to cause a lot of problems, table size is just one of them.
From the PostgreSQL docs...
The actual storage requirement [for numeric values] is two bytes for each group of four decimal digits, plus three to eight bytes overhead.
So a decimal(25, 6) value is something like 8 + (25 / 4 * 2), or roughly 20-24 bytes per column at full precision (an upper bound, since numeric is variable-length). At 600 columns per row that's on the order of 12-14 KB per row, and at 33,000 rows roughly 400-475 MB for one full copy of the table.
If you're updating every row 4 times per minute, that can leave on the order of 1.5-2 GB of dead rows per minute.
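If you want to see this happening, the standard statistics view shows dead tuples accumulating between autovacuum runs (the table name is the placeholder from the question):

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'my_table';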
You should fix your schema design.
You shouldn't need to touch every row of a table 4 times a minute.
You should ask a question about redesigning that table and process.