How to stop clustering a table in PostgreSQL

I've clustered a table using the following:
CLUSTER foos USING idx_foos_on_bar;
Now every time I run CLUSTER it reclusters that table (and all other tables with clustering) appropriately.
Now I want to stop reordering that one table (but still reorder all the others with a single CLUSTER command).
I don't see anything in the documentation about how to uncluster. Is this possible? Or do I have to completely drop and recreate the table?

http://www.postgresql.org/docs/9.3/static/sql-cluster.html
When a table is clustered, PostgreSQL remembers which index it was
clustered by. The form CLUSTER table_name reclusters the table using
the same index as before. You can also use the CLUSTER or SET WITHOUT
CLUSTER forms of ALTER TABLE to set the index to be used for future
cluster operations, or to clear any previous setting.
I think older versions didn't support the SET WITHOUT CLUSTER option.
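A minimal sketch of clearing the setting, using the table name from the question. The catalog query is just one way to check which indexes a bare CLUSTER would still use.

```sql
-- Forget the clustering index for foos; a bare CLUSTER will then skip this table.
-- The rows already on disk are not reordered or touched by this statement.
ALTER TABLE foos SET WITHOUT CLUSTER;

-- List the tables (and their remembered indexes) that a bare CLUSTER still processes.
SELECT indrelid::regclass   AS table_name,
       indexrelid::regclass AS cluster_index
FROM pg_index
WHERE indisclustered;
```

After this, foos should no longer appear in the query result, while the other clustered tables remain.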

pg_repack and logical replication: any risk to missing out on changes from the table while running pg_repack?

As I understand it, pg_repack creates a temporary 'mirror' table (table B), copies the rows from the original table (table A) into it, re-indexes them, and then replaces the original with the mirror. The mirroring step creates a lot of noise with logical replication (a lot of inserts at once), so I'd like to exclude the mirror table from replication.
I'm a bit confused about what happens during the switchover, though. Is there a risk of losing some changes? I don't think there is, since all actual writes still go to the original table before and after the switch, so it should be safe, right?
We're running Postgres 10.7 on AWS Aurora, using wal2json as the output plugin for replication.
I have used neither pg_repack nor logical replication, but according to the pg_repack GitHub repository there is a possible issue when using pg_repack with logical replication; see
https://github.com/reorg/pg_repack/issues/135
To perform a repack, pg_repack will:
1. create a log table to record changes made to the original table.
2. add a trigger onto the original table, logging INSERTs, UPDATEs, and DELETEs into our log table.
3. create a new table containing all the rows in the old table.
4. build indexes on this new table.
5. apply all changes which have occurred in the log table to the new table.
6. swap the tables, including indexes and toast tables, using the system catalogs.
7. drop the original table.
In my experience, the log table captures all changes, and they are applied to the new table after the indexes are built. Besides, if the repack has to be rolled back, those changes have already been applied to the original table as well.
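The first three steps of that list can be approximated in plain SQL. This is only an illustrative sketch of the log-table-plus-trigger pattern, not pg_repack's actual implementation. The table foos and the names foos_log, foos_log_changes, foos_log_trg and foos_new are all hypothetical.

```sql
-- Hypothetical log table: one row per change made to foos while the copy runs.
CREATE TABLE foos_log (
    id      bigserial PRIMARY KEY,  -- preserves the order of the changes
    op      text NOT NULL,          -- 'INSERT', 'UPDATE' or 'DELETE'
    old_row jsonb,                  -- row image before the change, if any
    new_row jsonb                   -- row image after the change, if any
);

-- Hypothetical trigger function that records every row change.
CREATE FUNCTION foos_log_changes() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO foos_log (op, new_row) VALUES (TG_OP, to_jsonb(NEW));
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO foos_log (op, old_row, new_row)
        VALUES (TG_OP, to_jsonb(OLD), to_jsonb(NEW));
    ELSE
        INSERT INTO foos_log (op, old_row) VALUES (TG_OP, to_jsonb(OLD));
    END IF;
    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$;

CREATE TRIGGER foos_log_trg
AFTER INSERT OR UPDATE OR DELETE ON foos
FOR EACH ROW EXECUTE PROCEDURE foos_log_changes();

-- Copy the existing rows; writes made during the copy keep going to foos
-- and are also recorded in foos_log for later replay.
CREATE TABLE foos_new AS SELECT * FROM foos;
```

The point of the pattern is that the original table keeps receiving every write throughout, which is why the log has to be replayed onto the copy before the swap.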

Create a customized slave of postgresql

I need to create a slave for BI purposes, and I need to modify some tables (e.g., remove all passwords or sensitive data). My database is PostgreSQL. I wonder whether I can do this at the database layer or whether I should do it programmatically by writing code to do the replication.
You could use logical replication with replica-enabled triggers (which fire only when replicated changes are applied) that modify the data on the way in:
ALTER TABLE mytab DISABLE TRIGGER mytrig;
ALTER TABLE mytab ENABLE REPLICA TRIGGER mytrig;
You have to make sure that no replication conflicts can arise from these modifications. For example, never modify a key column.
Replication conflicts would stop replication and break your system.
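As a sketch of what such a trigger could look like on the subscriber: the function name scrub_sensitive and the nullable column password are assumptions for illustration, while mytab and mytrig are the names used above.

```sql
-- Hypothetical scrubbing function: blank out a sensitive, non-key column.
CREATE FUNCTION scrub_sensitive() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.password := NULL;  -- assumed column; must be nullable and not part of a key
    RETURN NEW;            -- store the modified row
END;
$$;

-- Row-level BEFORE trigger so the modified row is what gets stored.
CREATE TRIGGER mytrig
BEFORE INSERT OR UPDATE ON mytab
FOR EACH ROW EXECUTE PROCEDURE scrub_sensitive();

-- Fire only while replicated changes are applied, not for local sessions.
ALTER TABLE mytab ENABLE REPLICA TRIGGER mytrig;
```

Because only a non-key column is overwritten, later replicated UPDATEs and DELETEs can still find the row by its key.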
The traditional way to solve this problem is to use an ETL process. That way you can have a different data model on the target database and, for example, pre-aggregate data, so that the data warehouse doesn't grow too big and has a data model optimized for analytical queries.

How do I cluster my PRIMARY KEY in postgres

I noticed that when we create a table in Postgres, it seems to automatically create a btree index on the PRIMARY KEY constraint. Looking at the properties of the constraint, it appears that it is not clustered. How do I cluster it, and should I cluster it?
You have to use the CLUSTER command:
CLUSTER stone.unitloaddetail USING pk10;
Remember that this rewrites the table and holds an ACCESS EXCLUSIVE lock on it, so all other reads and writes are blocked during that time.
Also, clustering is not maintained when table data are modified, so you have to schedule regular CLUSTER runs if you want to keep the table clustered.
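One way to judge when a re-run is due is to look at the planner statistics. A sketch using the table and index names from the command above:

```sql
-- Re-cluster on the remembered index, then refresh the statistics.
CLUSTER stone.unitloaddetail USING pk10;
ANALYZE stone.unitloaddetail;

-- correlation near 1 or -1 means the physical row order still follows
-- that column; values drifting toward 0 suggest the ordering has decayed.
SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'stone'
  AND tablename  = 'unitloaddetail';
```

The statistics are only as fresh as the last ANALYZE, so run it (or rely on autovacuum) before reading the numbers.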
Addressing the "should you" part, it depends on the likelihood of queries needing to access multiple rows having adjacent values of the clustering key.
For a table with a synthetic primary key, it probably makes more sense to cluster on a foreign key column.
Imagine that you have a table of products. Are you more likely to request multiple products having:
consecutive product_id?
the same location_id?
the same type_id?
the same manufacturer_id?
If it would solve a problem for you to improve the performance of the system for one of these particular cases, then that is the column by which you should consider clustering.
If doing so would not solve a problem, then do not do it.
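For instance, if lookups by location turned out to be the case worth optimizing, clustering on that foreign key might look like the sketch below. The table name products and the index name idx_products_on_location_id are hypothetical.

```sql
-- CLUSTER needs an index on the column(s) that define the desired order.
CREATE INDEX idx_products_on_location_id ON products (location_id);

-- Rewrite the table in location_id order and remember this index.
CLUSTER products USING idx_products_on_location_id;

-- Refresh planner statistics after the rewrite.
ANALYZE products;
```

Rows sharing a location_id then sit on adjacent pages, so a query for one location touches fewer pages until later writes scatter them again.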

Organize shards with citus data when tenant_id of a record is updated

I want to use Citus Data to shard my Postgres database. Before jumping into it, I want to fully understand its behaviour in different scenarios. The docs explain most of the cases.
What I want to know is: how would I go about moving data to a different shard when I update the tenant_id of a record?
Citus errors out when you try to update the value of the partition column. You can move the data with INSERT INTO ... SELECT ... followed by a DELETE FROM ... inside a transaction.
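A sketch of that approach, assuming a hypothetical table events distributed by tenant_id with columns id and payload. The tenant and row ids are made up for illustration.

```sql
BEGIN;

-- Write a copy of the row under the new tenant_id (42 here).
INSERT INTO events (tenant_id, id, payload)
SELECT 42, id, payload
FROM events
WHERE tenant_id = 7 AND id = 1001;

-- Remove the original row from the old tenant.
DELETE FROM events
WHERE tenant_id = 7 AND id = 1001;

COMMIT;
```

Running both statements in one transaction means other sessions never see the row missing or duplicated.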

Stop BDR from replicating DROP TABLE or CREATE TABLE

I have two databases with tables that I want to sync. I don't want to sync any other table. I'm using Postgres-BDR to do that.
Those tables are part of replication set common. There are some circumstances where other tables share a name across nodes (but are NOT in common), and a node will call DROP TABLE and then CREATE TABLE. Even though those tables aren't part of the common replication set, these commands are still replicated to the other nodes, causing the other node to lose all of its data in its table and then create an empty table.
How can I stop this? I only want commands that affect common to be replicated to the other nodes.
Nevermind, I found it. It's available with bdr.skip_ddl_replication.
I just put bdr.skip_ddl_replication = on in postgresql.conf, restarted the server, and BOOM! Works like a charm.
EDIT
It would be prudent of me to point out that the documentation warns that this option could break database replication if used improperly. But since I'll be VERY tightly controlling the table schema, it shouldn't cause any problems.