Adding a partition from another DB to existing partitioned table in Postgres 12 - postgresql

Short description of a problem
I need to build a partitioned table "users" with 2 partitions located on separate servers (Moscow and Hamburg), each partition is table with columns:
id - integer primary key with auto increment
region - smallint partition key, which equals either 100 for Hamburg or 200 for Moscow
login - unique character varying with length of 100.
I intended to make sequences for id as n*1000 + 100 for Hamburg, and n*1000 + 200 for Moscow, so just looking on primary key I will know which partition it belongs to.
region is intended to be read only and never change after creation, so no records will move between partitions.
SELECT queries must be able to return records from all partitions and UPDATE queries must be able to modify records on all partitions, INSERT/DELETE queries must be able to add/delete records only to local partition, so data stored in them is not completely isolated.
What was done
Using pgAdmin4
I created a "test" table on Hamburg server, added all column info, marked it as partitioned table with partition key region and partition type List.
I created a "hamburg" partition in this table, adding primary key constraint as id,region and unique key constraint as login,region.
I created a "moscow" table on Moscow server with the same column info as "test"
I added postgres_fdw extension to Hamburg server, created Foreign server pointing to DB on Moscow server and User mapping.
I added "moscow" foreign table to Hamburg server pointing to "moscow" table on Moscow server.
What is my problem
I couldn't figure out how to attach this foreign table as second partition to "test" table.
When I tried to attach partition through pgAdmin dialog in "test" table partitions properties it shows me an error: cannot unpack non-iterable Response object
When I tried to add partition with query as follows:
ALTER TABLE public.test ATTACH PARTITION public.moscow FOR VALUES IN (200);
It shows me an error:
ERROR: cannot attach foreign table "moscow" as partition of partitioned table "test"
DETAIL: Table "test" contains unique indexes.
SQL state: 42809
I removed unique constraint from login column but it shows the same error.
When I make partitioned table with the same properties and both partitions initially located on the same server all works well, except for postgres watch for login uniqueness per-partition rather than in whole table, but I suggest this is its limitation.
So, how can I attach a table located on the second server as partition to partitioned table located on the first one?

The error message is pretty clear: Since you cannot create an index on a partitioned table, PostgreSQL cannot create a partition of the unique index. But the unique index is required to implement the constraint.
See this source comment:
/*
* If we're attaching a foreign table, we must fail if any of the indexes
* is a constraint index; otherwise, there's nothing to do here. Do this
* before starting work, to avoid wasting the effort of building a few
* non-unique indexes before coming across a unique one.
*/
Either drop the unique constraint or don't use foreign tables as partitions.

Ok, I was finally able to add a foreign table as partition to partitioned table.
It was necessary to drop primary key property on id and unique property on login columns for partitioned table
After that I was able to attach foreign table as partition to partitioned table
Later I have added primary key property on id and unique property on login columns for each local partition.
So in the end I have unique global id as it is generated by sequences for each DB with never intersected values. For login uniqueness I have to manually check global table if there is any record with it before inserting.
P.S. Hopefully, this partitioning mechanism in postgres is suitable for geographically distant regions.

Related

postgresql partitioned table primary key issue

I am working on a requirement to create partition for customer name table in Postgresql DB. Primary Key of this table is an auto generated sequence number.
Since regular table cannot be turned into a partitioned table, I am planning to create a new partitioned table with primary key as composite key (sequence number + first letter of the customer's name).
I am not sure if it is required to define different sequence number one for each partition of customer name or defining the sequence number for partitioned table once will work.

Adding indexes to a postgres table after it is created and and having partitions

I have a Postgres table named: services, and it has columns called id, mac_addr, dns_name, hash, and it is partitioned based on mac_addr, so the partition tables names look like: services_3eeeea123e3 and so on. And there are around 20K partition based on mac_addrs
Q1: there was no index created when the tables were created. so now, when I am trying to add an index CREATE INDEX idx_services_id on services (id), it throws an error ERROR: cannot create an index on partitioned table "services"
But I am able to add indexes to individual partitioned tables CREATE INDEX idx_services_3eeeea123e3 on services_3eeeea123e3 (id).
So do I have to create an index on each partition table now? Is there a way to create an index on the base table(services) itself, which will automatically create an index on each partition table?
Q2: When I run a select query, it is fast when I use the direct partition table; however, using the base table is very slow. Any idea what could be the reason.
Fast: SELECT id, dns_name, hash from services_3eeeea123e3 where id='123232'
very slow: SELECT id,dns_name, hash from services where mac_addr='3eeeea123e3' and id='123232'

Partition Based on Date is Problematic for Table's Primary Key - Any Workaround?

I have a table that I would like to partition based on the "deleted" date.
create table resources(
id int,
type varchar(30),
deleted date
)
I want to have a foreign key from another table pointing to this table's id column. However since I have the partition based on the deleted date, I must include it in the primary key. Adding the deleted column to the primary key does not make sense and will also prevent the other table's FK from pointing to this table. Is there a workaround?
Thanks
No, there is no workaround for foreign keys to a partitioned table. You have to add deleted to the primary key or unique constraint and also to the table that references the partitioned table.
If you don't want that, you'll have to go without a foreign key constraint.
Perhaps it would be useful to add deleted to the referencing table and partition that along the same boundaries as resources. Then you could get rid of old data in a coordinated fashion, and if you add deleted to the join condition, you can profit from enable_partitionwise_join = on.

How do I verify in CQL that all the rows have successfully copied from a CSV to a Cassandra table? ***SELECT statements are not returning all results

I am trying to understand Cassandra by playing with a public dataset.
I had inserted 1.5M rows from CSV to a table on my local instance of Cassandra, WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }
The table was created with one field as a partition key, and one more as primary key
I had a confirmation that 1.5M rows were processed.
COPY Completed
But when I run SELECT or SELECT COUNT(*) on the table, I always get a max of 182 rows.

Secondly, the number of records returned with clustered columns seem to higher than single columns which is not making sense to me. What am I missing from Cassandra's architecture and querying point of view.
Lastly I have also tried reading the same Cassandra table from pyspark shell, and it seems to be reading 182 rows too.
Your primary key is PRIMARY KEY (state, severity). With this primary key definition, all rows for accidents in the same state of same severity will overwrite each other. You probably only have 182 different (state, severity) combinations in your dataset.
You could include another clustering column to record the unique accident, like an accident_id
This blog highlights the importance of the primary key, and has some examples:
https://www.datastax.com/blog/2016/02/most-important-thing-know-cassandra-data-modeling-primary-key

Postgresql 11 Partitioning detail tables based on column in master table in foreign-key relationship

The improvements to declarative range-based partitioning in version 11 seem like they can really work for my use case, but I'm not completely sure how foreign keys work with partitions.
I have tables Files -< Segments -< Entries in which there are hundreds of Segments per File and hundreds of Entries per segment, so Entries is on the order of 10,000 times the size of Files. Files have a CreationDate field and customers will define a retention period so they'll drop old Entries. All of this clearly points to date-based partitions so it's quicker to query the most recent entries first, and easy to drop old ones.
The first issue I encounter is that when I try to create the Files table, it sounds like I have to include createdDate as part of the primary key in order to have it in the RANGE partition:
CREATE TABLE Files2
(
FileId BIGSERIAL PRIMARY KEY,
createdDate timestamp with time zone,
filepath character varying COLLATE pg_catalog."default" NOT NULL
) PARTITION BY RANGE (createdDate)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ERROR: insufficient columns in PRIMARY KEY constraint definition
DETAIL: PRIMARY KEY constraint on table "files2" lacks column "createddate" which is part of the partition key.
If I drop the "PRIMARY KEY" from the definition of FileId, I don't get the error, but does that affect the efficiency of child lookups?
I also don't know how to declare the partitions for the child table. PARTITION BY RANGE (Files.createdDate) doesn't work.
Since it's only in version 11 that this use case would even be possible, I haven't found much information about it and would appreciate any pointers! Thanks!
I believe it is a new feature of the version 11. There is a document from which you can get the following information:
"A primary key constraint must include a partition key column. Attempting to create a primary key constraint that does not contain partitioned columns results in an error"
https://h50146.www5.hpe.com/products/software/oe/linux/mainstream/support/lcc/pdf/PostgreSQL11NewFeaturesGAen20181022-1.pdf