I have a table with 16 million rows that needs to be replicated. The table has a total of 115 columns; 50 of them are varchar, and the primary key is a 36-character UUID. There are 15 indexes, some dependencies and some functions.
The steps I took:
Make a copy of the table on the subscriber, so it doesn't have to go through the initial data-copy phase of the replication process.
Create the replica identity: default.
Create the publication.
Create the subscription: CREATE SUBSCRIPTION try1 CONNECTION 'host=10.100.9.40 port=5432 user=jhon password=gr3AT dbname=db_profile_20210714' PUBLICATION pub_try1 WITH (copy_data = false);
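For reference, the publisher-side commands behind the replica-identity and publication steps would look roughly like this (big_table stands in for my table name; pub_try1 is the publication referenced by the subscription above):
ALTER TABLE big_table REPLICA IDENTITY DEFAULT;   -- big_table is a placeholder; DEFAULT uses the primary key
CREATE PUBLICATION pub_try1 FOR TABLE big_table;  -- publication the subscription connects to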
No data gets sent to the subscriber.
I check the WAL difference on the publisher at a 3-second interval using: SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) FROM pg_stat_replication;
The results of running it 3 times are 10000, 5000 and 0. It then cycles again over time, following a more or less similar pattern.
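I can dig further with queries along these lines (try1 is the subscription created above):
-- on the publisher: is there a slot and an active walsender for the subscription?
SELECT slot_name, active, confirmed_flush_lsn FROM pg_replication_slots;
SELECT application_name, state, sent_lsn, replay_lsn FROM pg_stat_replication;
-- on the subscriber: has the apply worker started and what has it received?
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;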
Here are my database settings:
name |setting |unit
-------------------------+----------+----
archive_command |(disabled)|
archive_mode |off |
archive_timeout |0 |s
hot_standby |on |
max_replication_slots |10 |
max_wal_senders |5 |
max_wal_size |8192 |MB
min_wal_size |2048 |MB
synchronous_standby_names|* |
wal_level |logical |
wal_log_hints |off |
wal_sender_timeout |60000 |ms
I tried replicating another copy of the same table with less data (5000 rows, and also empty) on the publisher, with the same number of columns, indexes etc. but without functions. I created 2 subscriptions, one with copy_data = false and one with copy_data = true. Then I entered some data into the table using a query on the publisher: INSERT INTO table_publisher SELECT * FROM table. Both work normally; the data gets inserted on the subscriber.
Questions:
Does size matter in replication? I don't think so.
The table actually gets its data from an app. Does it matter that the data comes from an app?
Do functions block the replication?
How can I make this replication work?
Thanks
I have a big Postgres database (around 75 GB) and queries are very slow. Is there any way to make them faster?
About the database:
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+-------------------+----------+----------+-------------+---------------+------------+-------------
public | fingerprints | table | postgres | permanent | heap | 35 GB |
public | songs | table | postgres | permanent | heap | 26 MB |
public | songs_song_id_seq | sequence | postgres | permanent | | 8192 bytes |
\d+ fingerprints
Table "public.fingerprints"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
hash | bytea | | not null | | extended | | |
song_id | integer | | not null | | plain | | |
offset | integer | | not null | | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"ix_fingerprints_hash" hash (hash)
"uq_fingerprints" UNIQUE CONSTRAINT, btree (song_id, "offset", hash)
Foreign-key constraints:
"fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
\d+ songs
Table "public.songs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------------+-----------------------------+-----------+----------+----------------------------------------+----------+-------------+--------------+-------------
song_id | integer | | not null | nextval('songs_song_id_seq'::regclass) | plain | | |
song_name | character varying(250) | | not null | | extended | | |
fingerprinted | smallint | | | 0 | plain | | |
file_sha1 | bytea | | | | extended | | |
total_hashes | integer | | not null | 0 | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"pk_songs_song_id" PRIMARY KEY, btree (song_id)
Referenced by:
TABLE "fingerprints" CONSTRAINT "fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
[Screenshots omitted: DB scheme and DB row counts]
No need to write to the database, only read. All queries are very simple, along the lines of:
SELECT song_id FROM fingerprints WHERE hash = X
EXPLAIN(analyze, buffers, format text) SELECT "song_id", "offset" FROM "fingerprints" WHERE "hash" = decode('eeafdd7ce9130f9697','hex');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_fingerprints_hash on fingerprints (cost=0.00..288.28 rows=256 width=8) (actual time=0.553..234.257 rows=871 loops=1)
Index Cond: (hash = '\xeeafdd7ce9130f9697'::bytea)
Buffers: shared hit=118 read=749
Planning Time: 0.225 ms
Execution Time: 234.463 ms
(5 rows)
234 ms looks fine when it is a single query. But in reality there are 3000 queries at a time, which takes about 600 seconds. It is an audio recognition application, so that's how the algorithm works.
About indexes:
CREATE INDEX "ix_fingerprints_hash" ON "fingerprints" USING hash ("hash");
As a connection pooler I use Odyssey.
A little bit of info from the config:
shared_buffers = 4GB
huge_pages = try
work_mem = 582kB
maintenance_work_mem = 2GB
effective_io_concurrency = 200
max_worker_processes = 24
max_parallel_workers_per_gather = 12
max_parallel_maintenance_workers = 4
max_parallel_workers = 24
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 16GB
min_wal_size = 4GB
random_page_cost = 1.1
effective_cache_size = 12GB
Info about hardware:
Xeon 12 core (24 threads)
RAM DDR4 16 GB ECC
NVME disk
Will the database be faster if I buy more RAM so the whole DB fits in memory (128 GB, for example)? And what parameters should I change to tell Postgres to keep the DB in RAM?
I have read several topics about pg_tune etc., but experiments don't show any good results.
Increasing the RAM so that everything can stay in cache (perhaps after using pg_prewarm to get it into cache in the first place) would certainly work. But it is expensive and shouldn't be necessary.
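If you do go that route, a minimal pg_prewarm sketch (the extension ships with Postgres; object names are from your schema):
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('fingerprints');          -- 'buffer' mode (default) loads into shared_buffers
SELECT pg_prewarm('ix_fingerprints_hash');  -- prewarm the index as well
-- note: with shared_buffers = 4GB the 35 GB table won't fit there;
-- pg_prewarm('fingerprints', 'read') at least pulls it into the OS cache instead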
Having a hash index on something which is already a hashed value is probably not very helpful. Have you tried just a default (btree) index instead?
If you CLUSTER the table on the index over the column named "hash" (which you can only do if it is a btree index) then rows with the same hash code should mostly share the same table page, which would greatly cut down on the number of different buffer reads needed to fetch them all.
If you could get it to do a bitmap heap scan instead of an index scan, then it should be able to have a large number of read requests outstanding at a time, due to effective_io_concurrency. But the planner does not account for effective_io_concurrency when doing planning, which means it won't choose a bitmap heap scan specifically to get that benefit. Normally an index read finding hundreds of rows on different pages would automatically choose a bitmap heap scan method, but in your case it is probably the low setting of random_page_cost which is inhibiting it from doing so. The low setting of random_page_cost is probably reasonable in itself, but it does have this unfortunate side effect.
A problem with this strategy is that it doesn't reduce the overall amount of IO needed, it just allows the reads to overlap and so make better use of multiple IO channels. But if many sessions are running many instances of this query at the same time, they will start filling up those channels and so start competing with each other. So the CLUSTER method is probably superior, as it gets the same answer with less IO.
If you want to play around with bitmap scans, you could temporarily increase random_page_cost or temporarily set enable_indexscan to off.
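A minimal way to run that experiment in a single session (nothing here is persistent, and the query is the one from your EXPLAIN above):
SET enable_indexscan = off;   -- or: SET random_page_cost = 4;
EXPLAIN (ANALYZE, BUFFERS)
SELECT song_id, "offset" FROM fingerprints
WHERE hash = decode('eeafdd7ce9130f9697', 'hex');
RESET enable_indexscan;       -- back to the normal plan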
No need to write to the database, only read.
So the DB is read-only.
And in comments:
the db worked fine on a small amount of data (a few GB), but after I filled it up, the database started to slow down.
So indexes have been built up incrementally.
Indexes
UNIQUE CONSTRAINT on (song_id, "offset", hash)
I would replace that with:
ALTER TABLE fingerprints
DROP CONSTRAINT uq_fingerprints
, ADD CONSTRAINT uq_fingerprints UNIQUE(hash, song_id, "offset") WITH (FILLFACTOR = 100)
This enforces the same constraint, but the leading hash column in the underlying B-tree index now supports the filter on hash in your displayed query. And the fact that all needed columns are included in the index further allows much faster index-only scans. The (smaller) index should also be more easily cached than the (bigger) table (plus index).
See:
Is a composite index also good for queries on the first field?
This also rewrites the index in pristine condition, with FILLFACTOR 100 for the read-only DB (instead of the default 90 for a B-tree index).
Hash index on (hash) and CLUSTER
The name of the column "hash" has nothing to do with the name of the index type, which also happens to be "hash". (The column should probably not be named "hash" to begin with.)
If (and only if) you also have other queries centered around one or a few hash values, that cannot use index-only scans (and you actually see faster queries than without it), keep the hash index additionally. And optimize it. (Else drop it!)
ALTER INDEX ix_fingerprints_hash SET (FILLFACTOR = 100);
An incrementally grown index may end up with bloat or, in the case of a hash index, unbalanced overflow pages. REINDEX takes care of that. While being at it, increase FILLFACTOR to 100 (from the default 75 for a hash index) for your read-only (!) DB. You can REINDEX to make the change effective.
REINDEX INDEX ix_fingerprints_hash;
Or you can CLUSTER (like jjanes already suggested) on the rearranged B-tree index from above:
CLUSTER fingerprints USING uq_fingerprints;
Rewrites the table and all indexes; rows are physically sorted according to the given index, so "clustered" around the leading column(s). Effects are permanent for your read-only DB. But index-only scans do not benefit from this.
When done optimizing, run once:
VACUUM ANALYZE fingerprints;
work_mem
The tiny setting for work_mem stands out:
work_mem = 582kB
Even the (very conservative!) default is 4MB.
But after reading your question again, it would seem you only have tiny queries. So maybe that's ok after all.
Else, with 16 GB of RAM you can typically afford 100 times as much. It depends on your workload, of course.
Many small queries, many parallel workers --> keep small work_mem (like 4MB?)
Few big queries, few parallel workers --> go high (like 256MB? or more)
Large amounts of temporary files written in your database over time, and mentions of "disk" in the output of EXPLAIN ANALYZE would indicate the need for more work_mem.
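A quick way to check the former (the counters in pg_stat_database are cumulative since the last stats reset):
SELECT temp_files, temp_bytes
FROM pg_stat_database
WHERE datname = current_database();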
Additional questions
Will the database be faster if I buy more RAM so the whole DB fits in memory (128 GB, for example)?
More RAM almost always helps until the whole DB can be cached in RAM and all processes can afford all the work_mem they desire.
And what parameters should I change to tell Postgres to keep the DB in RAM?
Everything that's read from the database is cached automatically in system cache and Postgres cache, up to the limit of available RAM. (Setting work_mem too high competes for that same resource.)
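If you want to see what the Postgres cache currently holds, a sketch using the pg_buffercache extension (ships with Postgres) lists which relations occupy the most shared_buffers:
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
SELECT c.relname, count(*) AS buffers
FROM   pg_buffercache b
JOIN   pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE  b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
GROUP  BY c.relname
ORDER  BY buffers DESC
LIMIT  10;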
Problem Statement
In order to ensure the disk size isn't growing unnecessarily, I want to be able to delete rows from my outbox table once they have been replicated.
Context
Postgres is at v12
We are using a Kafka source connector to stream changes made to a Postgres table. These changes are insert-only and thus are no longer needed once written to Kafka. The source connector uses logical replication to stream the changes, and the state of the replication can be seen in pg_replication_slots.
When looking at pg_replication_slots you can see the data it stores in order to know which WAL it has to keep so that replication can still happen for the client.
For example when I run:
select * from pg_replication_slots;
I might see:
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+----------+-----------+--------+--------------------+-----------+--------+------------+------+--------------+-------------+---------------------
debezium | wal2json | logical | 26593 | database_name | f | t | 7404 | | 26729 | 0/DCD98E8 | 0/DCD9920
(1 row)
What I'm interested in knowing is whether I can reliably use that data, together with the PostgreSQL metadata on the table, to select all rows that have already been replicated from that slot.
For example, this doesn't work as far as I can tell, but ideally would return rows that have been replicated and are now safe to prune from the table:
select * from outbox where age(xmin) < (select age(catalog_xmin) from pg_replication_slots);
Any guidance would be sweet! Cheers!
I have been implementing the Outbox pattern using Debezium with MySQL, and I delete the outbox record straight after inserting it, which I saw done here: https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/ The insert is picked up and sent, and the delete is ignored. So essentially there should never be anything in the outbox table (outside of the transaction).
I also pre-generate the primary keys for the entries (which I use for the event ID in Kafka) so I can bulk insert and delete.
Circling back around to this, I had to think a bit differently about how we could tie the replication's progress to our outbox table. Previously in my question I was trying to glean progress from pg_replication_slots, but in this working example I switched to using pg_stat_replication. This view can be queried by the slot_name we care about and can return lag results. For example:
SELECT * FROM outbox WHERE created_at < (SELECT(NOW() - COALESCE(replay_lag, interval '60 seconds')) as stale_time from pg_stat_replication where pg_stat_replication.slot_name = 'outbox_slot');
So this will return the rows from our outbox table that were inserted longer ago than the current replay_lag, or 1 minute as a fallback.
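The pruning itself is then just that predicate wrapped in a DELETE. A sketch of how I'd write it, joining through pg_replication_slots (where the slot name lives) to the walsender's stats row:
DELETE FROM outbox
WHERE created_at < (
    SELECT now() - COALESCE(r.replay_lag, interval '60 seconds')
    FROM   pg_replication_slots s
    JOIN   pg_stat_replication r ON r.pid = s.active_pid
    WHERE  s.slot_name = 'outbox_slot'
);
-- if the slot has no active walsender the subquery yields NULL and nothing is deleted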
I have a large database with the largest tables having more than 30 million records. The database server is a dedicated server with 64 cores and 128 GB RAM, running Ubuntu and Postgres 12. So the server is more powerful than we normally need. The server receives around 300-400 new records every second.
The problem is that after almost 1 week or 10 days of use the database becomes extremely slow, so we have to perform VACUUM FULL ANALYZE, and after this everything goes back to normal. But we have to put our server into maintenance mode and then perform this operation every week, which is a pain.
I came up with the idea that we don't need a VACUUM FULL and can just run ANALYZE on the database, as it can run in parallel, but this didn't work. There were no performance gains after running it. Even when I run a plain VACUUM on the whole database and then run ANALYZE after it, it still doesn't give the kind of performance boost that we get from VACUUM FULL ANALYZE.
I know that VACUUM FULL copies the data from the old table to a new table and deletes the old table. But what else does it do?
Update:
So I have also reindexed the 15 largest tables, to confirm whether this would speed up the database. But this didn't work either.
So I had to execute VACUUM FULL ANALYZE, as I didn't see any other way. Now I am trying to identify the slow queries.
Thanks to jjanes, I was able to enable track_io_timing and also identified a few queries where indexes can be added. I am using it like this:
SELECT * FROM pg_stat_statements ORDER BY total_time DESC;
And I get this result:
userid | 10
dbid | 16401
queryid | -3264485807545194012
query | update events set field1 = $1, field2 = $2 , field3= $3, field4 = $4 , field5 =$5 where id = $6
calls | 104559
total_time | 106180828.60536088
min_time | 3.326082
max_time | 259055.09376800002
mean_time | 1015.5111334783633
stddev_time | 1665.0715182035976
rows | 104559
shared_blks_hit | 4456728574
shared_blks_read | 4838722113
shared_blks_dirtied | 879809
shared_blks_written | 326809
local_blks_hit | 0
local_blks_read | 0
local_blks_dirtied | 0
local_blks_written | 0
temp_blks_read | 0
temp_blks_written | 0
blk_read_time | 15074237.05887792
blk_write_time | 15691.634870000113
This query simply updates 1 record, and the table size is around 30 million records.
Question: This query already uses an index. Can you guide me on what the next step should be, and why is this slow? Also, what does the IO information show?
As you say, VACUUM FULL is an expensive command. Postgres's secret weapon is autovacuum, which monitors database stats and attempts to target tables with dead tuples. Read about how to tune it for the database as a whole, and possibly for big tables.
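A hedged example of what per-table tuning can look like (the table name events is taken from your pg_stat_statements output; the numbers are only starting points to experiment with):
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor  = 0.01,   -- vacuum after ~1% dead rows instead of the default 20%
    autovacuum_analyze_scale_factor = 0.005   -- keep planner stats fresher on this busy table
);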
I have a large table "measurement" with 4 columns:
measurement-service=> \d measurement
Table "public.measurement"
Column | Type | Collation | Nullable | Default
-----------------------+-----------------------------+-----------+----------+---------
hour | timestamp without time zone | | not null |
config_id | bigint | | not null |
sensor_id | bigint | | not null |
event_id | uuid | | not null |
Partition key: RANGE (hour)
Indexes:
"hour_config_id_sensor_id_event_id_key" UNIQUE CONSTRAINT, btree (hour, config_id, sensor_id, event_id)
Number of partitions: 137 (Use \d+ to list them.)
An example of a partition name: "measurement_y2019m12d04"
And then I insert a lot of events as CSV via COPY into a temporary table, and from there I copy the data directly into the partition using ON CONFLICT DO NOTHING.
Example:
CREATE TEMPORARY TABLE tmp_measurement_y2019m12d04T02_12345 (
  hour timestamp without time zone,
  config_id bigint,
  sensor_id bigint,
  event_id uuid
) ON COMMIT DROP;
[...]
COPY tmp_measurement_y2019m12d04T02_12345 FROM STDIN DELIMITER ',' CSV HEADER;
INSERT INTO measurement_y2019m12d04 (SELECT * FROM tmp_measurement_y2019m12d04T02_12345) ON CONFLICT DO NOTHING;
I think I help Postgres by sending CSVs with data of the same hour only. Also, within that hour, I remove all duplicates from the CSV. Therefore the CSV only contains unique rows.
But I send many batches for different hours. There is no order; it can be an hour of today, yesterday, last week, etc.
My approach has worked alright so far, but I think I have reached a limit now. The insertion speed has become very slow. While the CPU is idle, I have 25% I/O wait. The subsystem is a RAID with several TB, using spinning disks, not SSDs.
maintenance_work_mem = 32GB
max_wal_size = 1GB
fsync = off
max_worker_processes = 256
wal_buffers = -1
shared_buffers = 64GB
temp_buffers = 4GB
effective_io_concurrency = 1000
effective_cache_size = 128GB
Each daily partition is around 20 GB and contains no more than 500M rows. And by maintaining the unique index per partition, I just duplicate the data once more.
The lookup speed, on the other hand, is quick.
I think the limit is the maintenance of the B-tree with the rather random UUIDs in (hour, config_id, sensor_id, event_id). I constantly change it; it gets written out and has to be re-read.
I am wondering if there is another approach. Basically I want uniqueness on (hour, config_id, sensor_id, event_id) and then a quick lookup per (hour, config_id, sensor_id).
I am considering removing the unique index and only having an index over (hour, config_id, sensor_id), and then providing the uniqueness on the reader side. But it may slow down reading, as the event_id can no longer be delivered via the index when I look up via (hour, config_id, sensor_id); it has to access the actual row to get the event_id.
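One variant I could try (just a sketch; the index name is made up, and INCLUDE needs Postgres 11 or newer) is a non-unique covering index, so the lookup can still return event_id from the index alone:
CREATE INDEX measurement_lookup_idx
    ON measurement (hour, config_id, sensor_id) INCLUDE (event_id);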
Or I provide uniqueness via a hash index.
Any other ideas are welcome!
Thank you.
When you do the insert, you should specify an ORDER BY which matches the index of the table being inserted into:
INSERT INTO measurement_y2019m12d04
SELECT * FROM tmp_measurement_y2019m12d04T02_12345
order by hour, config_id, sensor_id, event_id
Only if this fails to give enough improvement would I consider any of the other options you list.
Hash indexes don't provide uniqueness. You can simulate it with an exclusion constraint, but I think they are less efficient. Exclusion constraints do support DO NOTHING, but do not support DO UPDATE. So as long as your use case does not evolve to want DO UPDATE, you would be good on that front, but I still doubt it would actually solve the problem. If your bottleneck is IO from updating the index, hash would only make it worse, as it is designed to scatter your data all over the place rather than focus it in a small cacheable area.
You also mention parallel processing. For inserting into the temp table, that might be fine. But I wouldn't do the INSERT...SELECT in parallel. If IO is your bottleneck, that would probably just make it worse. Of course if IO is no longer the bottleneck after my ORDER BY suggestion, then ignore this part.
I have a table with a few million records.
___________________________________________________________
| col1 | col2 | col3 | some_indicator | last_updated_date |
-----------------------------------------------------------
| | | | yes | 2009-06-09.12.2345|
-----------------------------------------------------------
| | | | yes | 2009-07-09.11.6145|
-----------------------------------------------------------
| | | | no | 2009-06-09.12.2345|
-----------------------------------------------------------
I have to delete records older than a month with some_indicator = no.
I also have to delete records older than a year with some_indicator = yes. This job will run every day.
Can I use the DB2 partitioning feature for the above requirement?
How can I partition the table using the last_updated_date column and the two some_indicator values above?
One partition should contain records falling under the monthly delete criterion, whereas the other should contain the yearly delete criterion records.
Are there any performance issues associated with table partitioning if this table is frequently read and upserted?
Any other best practices for the above requirement would surely help.
I haven't done much with partitioning (I've mostly worked with DB2 on the iSeries), but from what I understand, you don't generally want to be shuffling things between partitions (i.e. making the partition '1 month ago'). I'm not even sure it's possible. If it were, you'd have to scan some (potentially large) portion of your table every day, just to move the rows (select, insert, delete, in a transaction).
Besides which, partitioning is a DB Admin problem, and it sounds like you just have a DB User problem - namely, deleting 'old' records. I'd just do this in a couple of statements:
DELETE FROM myTable
WHERE some_indicator = 'no'
AND last_updated_date < TIMESTAMP(CURRENT_DATE - 1 MONTH, TIME('00:00:00'))
and
DELETE FROM myTable
WHERE some_indicator = 'yes'
AND last_updated_date < TIMESTAMP(CURRENT_DATE - 1 YEAR, TIME('00:00:00'))
.... and you can pretty much ignore using a transaction, as you want the rows gone.
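If those daily deletes ever get big enough to strain your logs, a commonly cited DB2 idiom is to delete in chunks through a deletable fullselect. I haven't verified this on every DB2 flavour, so treat it as a sketch (10000 is an arbitrary batch size; repeat until no rows are affected):
DELETE FROM (
    SELECT 1 FROM myTable
    WHERE some_indicator = 'no'
      AND last_updated_date < TIMESTAMP(CURRENT_DATE - 1 MONTH, TIME('00:00:00'))
    FETCH FIRST 10000 ROWS ONLY
);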
(As a side note, using 'yes' and 'no' for indicators is terrible. If you're not on a version that has a logical (boolean) type, store character '0' (false) and '1' (true).)