I'm trying to learn how sharding is configured in Postgres.
My Postgres setup has a temperature table with 4 partitions, each covering a different range of "timestamp" values.
postgres=# \d+ temperature
Partitioned table "public.temperature"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------+-----------------------------+-----------+----------+-----------------------------------------+---------+--------------+-------------
id | bigint | | not null | nextval('temperature_id_seq'::regclass) | plain | |
city_id | integer | | not null | | plain | |
timestamp | timestamp without time zone | | not null | | plain | |
temp | numeric(5,2) | | not null | | main | |
Partition key: RANGE ("timestamp")
Partitions: temperature_201901 FOR VALUES FROM ('2019-01-01 00:00:00') TO ('2019-02-01 00:00:00'),
temperature_201902 FOR VALUES FROM ('2019-02-01 00:00:00') TO ('2019-03-01 00:00:00'),
temperature_201903 FOR VALUES FROM ('2019-03-01 00:00:00') TO ('2019-04-01 00:00:00'),
temperature_201904 FOR VALUES FROM ('2019-04-01 00:00:00') TO ('2019-05-01 00:00:00')
The temperature_201904 table, in particular, is a foreign table:
postgres=# \d+ temperature_201904
Foreign table "public.temperature_201904"
Column | Type | Collation | Nullable | Default | FDW options | Storage | Stats target | Description
-----------+-----------------------------+-----------+----------+-----------------------------------------+-------------+---------+--------------+-------------
id | bigint | | not null | nextval('temperature_id_seq'::regclass) | | plain | |
city_id | integer | | not null | | | plain | |
timestamp | timestamp without time zone | | not null | | | plain | |
temp | numeric(5,2) | | not null | | | main | |
Partition of: temperature FOR VALUES FROM ('2019-04-01 00:00:00') TO ('2019-05-01 00:00:00')
Partition constraint: (("timestamp" IS NOT NULL) AND ("timestamp" >= '2019-04-01 00:00:00'::timestamp without time zone) AND ("timestamp" < '2019-05-01 00:00:00'::timestamp without time zone))
Server: shard02
Insert works as expected. If I insert the following value and check from the remote host shard02, then the value exists. Fantastic!
postgres=# select * from temperature_201904;
id | city_id | timestamp | temp
----+---------+---------------------+-------
1 | 1 | 2019-04-02 00:00:00 | 12.30
(1 row)
However, if I update the timestamp of this row so that it no longer falls within the range defined for the partition, I'd expect the row to be moved into the correct partition, temperature_201901, but it is not.
postgres=# update temperature set timestamp = '2019-01-04' where id=1;
UPDATE 1
postgres=# select * from temperature_201904 ;
id | city_id | timestamp | temp
----+---------+---------------------+-------
1 | 1 | 2019-01-04 00:00:00 | 12.30
Again, just to reiterate, this table has a range temperature_201904 FOR VALUES FROM ('2019-04-01 00:00:00') TO ('2019-05-01 00:00:00') and is a foreign table.
Feels like I'm missing something here.
Is this expected behavior? If so, is there a way to configure Postgres so that rows are automatically moved between nodes when they no longer satisfy their partition constraint?
Thanks in advance!
postgres=# SELECT version();
version
------------------------------------------------------------------------------------------------------------------
PostgreSQL 12.2 (Debian 12.2-2.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
This seems to be expected. From the docs:
While rows can be moved from local partitions to a foreign-table partition (provided the foreign data wrapper supports tuple routing), they cannot be moved from a foreign-table partition to another partition.
Now, I would have expected an ERROR rather than a silent violation of the implied constraint, but I wouldn't expect this to have worked the way you want it to.
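If the misplaced row has to be repaired by hand, one option is to delete it from the foreign partition and re-insert it through the parent, so that tuple routing chooses the partition. The following is only a sketch, assuming the foreign data wrapper is postgres_fdw (the question does not say which one is used) and that the partition the row belongs to is local. The remote delete and the local insert are not coordinated with two-phase commit, so the move is not strictly atomic across nodes.

```sql
-- Find rows in the foreign partition that fall outside its bounds.
-- Query the partition directly: a query through the parent may prune it.
SELECT id, city_id, "timestamp", temp
FROM temperature_201904
WHERE "timestamp" <  '2019-04-01 00:00:00'
   OR "timestamp" >= '2019-05-01 00:00:00';

-- Delete those rows remotely and re-insert them through the parent table,
-- which routes each row to the partition matching its "timestamp".
BEGIN;
WITH moved AS (
    DELETE FROM temperature_201904
    WHERE "timestamp" <  '2019-04-01 00:00:00'
       OR "timestamp" >= '2019-05-01 00:00:00'
    RETURNING id, city_id, "timestamp", temp
)
INSERT INTO temperature (id, city_id, "timestamp", temp)
SELECT id, city_id, "timestamp", temp
FROM moved;
COMMIT;
```

If a deleted row matches no partition at all, the INSERT raises an error and the whole statement is rolled back, so nothing is lost. The same DELETE plus INSERT pattern can be used instead of UPDATE whenever the partition key of a row in a foreign partition has to change.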
I have two geographies in a postgres database, one represents the America/Detroit timezone, the other represents Etc/GMT+1.
These clearly do not intersect, however my call to ST_Intersects returns true.
Why is this? The areas do not share any points with each other.
more information below...
mapped out in pgAdmin
WITH detroit AS (SELECT * FROM timezone_boundary WHERE tzid='America/Detroit')
SELECT tb.tzid FROM timezone_boundary tb
INNER JOIN detroit d
ON ST_Intersects(tb.geo_coords, d.geo_coords) IS TRUE AND tb.tzid='Etc/GMT+1';
-[ RECORD 1 ]---
tzid | Etc/GMT+1
my schema...
Table "public.timezone_boundary"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+--------------------
id | uuid | | not null | uuid_generate_v4()
tzid | text | | not null |
geo_coords | geography | | not null |
created_at | timestamp without time zone | | not null | now()
updated_at | timestamp without time zone | | not null | now()
Indexes:
"timezone_boundary_pkey" PRIMARY KEY, btree (id)
"timezone_boundary_geo_coords_idx" gist (geo_coords)
"timezone_boundary_tzid_key" UNIQUE CONSTRAINT, btree (tzid)
the envelopes...
SELECT ST_AsGeoJSON(ST_Envelope(geo_coords::geometry)) as envelope FROM timezone_boundary WHERE tzid='America/Detroit' OR tzid='Etc/GMT+1';
-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------------------------------------------
envelope | {"type":"Polygon","coordinates":[[[-90.41862,41.696128],[-90.41862,48.306063],[-82.122806,48.306063],[-82.122806,41.696128],[-90.41862,41.696128]]]}
-[ RECORD 2 ]--------------------------------------------------------------------------------------------------------------------------------------------------
envelope | {"type":"Polygon","coordinates":[[[-22.5,-72.222222],[-22.5,90],[-7.5,90],[-7.5,-72.222222],[-22.5,-72.222222]]]}
postgres version
-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------
version | PostgreSQL 12.7 (Debian 12.7-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
postgis version
-[ RECORD 1 ]--------+----------------------------------------------------------------------------------------------------------------------------------------------------------
postgis_full_version | POSTGIS="3.0.3 6660953" [EXTENSION] PGSQL="120" GEOS="3.7.1-CAPI-1.11.1 27a5e771" PROJ="Rel. 5.2.0, September 15th, 2018" LIBXML="2.9.4" LIBJSON="0.12.1"
I'm using ksqlDB version 0.11 (I cannot upgrade to a newer version at the moment), and I want to replicate the data of a TABLE into MySQL using the JDBC Sink connector. ksqlDB v0.11 does not support multiple TABLE keys, and my data needs to be grouped using multiple GROUP BY expressions.
Using this statement I create the table:
CREATE TABLE estads AS SELECT
STID AS stid,
ASIG AS asig,
COUNT(*) AS np,
MIN(NOTA) AS min,
MAX(NOTA) AS max,
AVG(NOTA) AS med,
LATEST_BY_OFFSET(FECHREG) AS fechreg
FROM estads_stm GROUP BY stid, asig EMIT CHANGES;
The resulting table has the following schema:
Name : ESTADS
Field | Type
---------------------------------------------
KSQL_COL_0 | VARCHAR(STRING) (primary key)
NP | BIGINT
MIN | DOUBLE
MAX | DOUBLE
MED | DOUBLE
FECHREG | VARCHAR(STRING)
As you can see, the two primary key columns (stid and asig) have been merged into a field called KSQL_COL_0, which is the expected behavior for version 0.11. The problem is that I need to use the JDBC Sink connector to replicate the data into a MySQL table with the following schema:
+---------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+-------------------+-----------------------------+
| stid | varchar(15) | NO | PRI | NULL | |
| asig | varchar(10) | NO | PRI | NULL | |
| np | smallint(6) | YES | | NULL | |
| min | decimal(5,2) | YES | | NULL | |
| max | decimal(5,2) | YES | | NULL | |
| med | decimal(5,2) | YES | | NULL | |
| fechreg | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+---------+--------------+------+-----+-------------------+-----------------------------+
I don't know how to "unmerge" the automatically generated KSQL_COL_0 in order to tell JDBC that both stid and asig are primary keys in the MySQL table. Any ideas how to manage this? I know that since ksqlDB version 0.15 this is no longer a problem, as ksqlDB tables support multiple keys, but as I said, upgrading is not an option in my case.
Thanks!
I figured it out.
Basically, you need to use the AS_VALUE() function in the table creation query. This way you copy the values of both primary key columns into new value columns, while the newly created primary key stays in its own column. Then simply configure the JDBC Sink connector to take the values of all the columns except the newly created primary key.
CREATE TABLE estads AS SELECT
STID AS k1,
ASIG AS k2,
AS_VALUE(STID) AS stid,
AS_VALUE(ASIG) AS asig,
COUNT(*) AS np,
MIN(NOTA) AS min,
MAX(NOTA) AS max,
AVG(NOTA) AS med,
LATEST_BY_OFFSET(FECHREG) AS fechreg
FROM estads_stm GROUP BY k1, k2 EMIT CHANGES;
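For the connector side, a sketch of what the sink definition could look like when created from ksqlDB. The connector name, connection URL, credentials and database name are placeholders, and the topic name ESTADS assumes the default topic that ksqlDB creates for the table. It also assumes the topic is written in a format that carries a schema (for example Avro), which the JDBC Sink connector needs. With pk.mode set to record_value the connector builds the MySQL primary key from value fields, so the merged record key is simply ignored.

```sql
-- Sketch only: names, URL and credentials below are placeholders.
CREATE SINK CONNECTOR estads_mysql_sink WITH (
  'connector.class'     = 'io.confluent.connect.jdbc.JdbcSinkConnector',
  'connection.url'      = 'jdbc:mysql://mysql-host:3306/mydb',
  'connection.user'     = 'user',
  'connection.password' = 'password',
  'topics'              = 'ESTADS',        -- assumed default topic of the table
  'table.name.format'   = 'estads',        -- target MySQL table
  'insert.mode'         = 'upsert',        -- update the row when the aggregate changes
  'pk.mode'             = 'record_value',  -- take the key columns from the value
  'pk.fields'           = 'STID,ASIG'      -- the columns copied with AS_VALUE()
);
```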
Let's say I have this table:
ams=# \d player
Table "public.player"
Column | Type | Collation | Nullable | Default
-------------+--------------------------+-----------+----------+-------------------
id | integer | | not null |
created | timestamp with time zone | | not null | CURRENT_TIMESTAMP
player_info | jsonb | | not null |
And then I have this:
ams=# \d report
Table "public.report"
Column | Type | Collation | Nullable | Default
---------+--------------------------+-----------+----------+---------
id | integer | | not null |
created | timestamp with time zone | | not null |
data | jsonb[] | | not null |
How can I take the player_info from all the rows in the player table and insert that into a single row in the report table (into the data jsonb[] field)? My attempts with jsonb_agg() return a jsonb, and I can't for the life of me figure out how to go from jsonb to jsonb[]. Any pointers would be very much appreciated! Thanks in advance.
If you simply want to copy the values, treat jsonb like any other data type and use ARRAY_AGG.
SELECT ARRAY_AGG(player_info)
FROM player
WHERE id IN (...)
should return something of type jsonb[], because player_info is of type jsonb.
jsonb[] is a PostgreSQL array whose elements are jsonb values, not a JSON array stored in a single jsonb value, so use array_agg() instead of jsonb_agg().
insert into report
select 1 as id, now() as created, array_agg(player_info)
from player
;
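The difference between the two aggregates can be checked with pg_typeof(). The second query is a sketch of the reverse direction asked about in the question, turning a JSON array held in a single jsonb value into jsonb[]. The literal in it is made up for illustration.

```sql
-- jsonb_agg builds one jsonb value (a JSON array);
-- array_agg builds a PostgreSQL array of jsonb values.
SELECT pg_typeof(jsonb_agg(player_info)) AS jsonb_agg_type,  -- jsonb
       pg_typeof(array_agg(player_info)) AS array_agg_type   -- jsonb[]
FROM player;

-- Unpack a JSON array into a jsonb[] value, one array element per JSON element.
SELECT ARRAY(
    SELECT jsonb_array_elements('[{"a": 1}, {"b": 2}]'::jsonb)
) AS as_jsonb_array;
```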
I recently did a migration from RDS PostgreSQL to Aurora PostgreSQL. The tables were migrated successfully, but they are missing their defaults, constraints and references. The sequences were not migrated either.
Table in source database:
Table "public.addons_snack"
Column | Type | Collation | Nullable | Default
---------------+--------------------------+-----------+----------+------------------------------------------
id | integer | | not null | nextval('addons_snack_id_seq'::regclass)
name | character varying(100) | | not null |
snack_type | character varying(2) | | not null |
price | integer | | not null |
created | timestamp with time zone | | not null |
modified | timestamp with time zone | | not null |
date | date | | |
Indexes:
"addons_snack_pkey" PRIMARY KEY, btree (id)
Check constraints:
"addons_snack_price_check" CHECK (price >= 0)
Referenced by:
TABLE "addons_snackreservation" CONSTRAINT "addons_snackreservation_snack_id_373507cf_fk_addons_snack_id" FOREIGN KEY (snack_id) REFERENCES addons_snack(id) DEFERRABLE INITIALLY DEFERRED
Tables in target database
Table "public.addons_snack"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+---------
id | integer | | not null |
name | character varying(100) | | not null |
snack_type | character varying(2) | | not null |
price | integer | | not null |
created | timestamp(6) with time zone | | not null |
modified | timestamp(6) with time zone | | not null |
date | date | | |
Indexes:
"addons_snack_pkey" PRIMARY KEY, btree (id)
Did I do something wrong, or is DMS not capable of doing this?
This SQL snippet should answer your question.
You can restore the indexes and constraints with pg_dump and pg_restore; the snippet consists of running them.
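For a single table, the missing objects can also be recreated by hand on the target. The statements below are a sketch built only from the source-side \d output in the question. They assume the data is already loaded and that addons_snackreservation exists on the target with a snack_id column.

```sql
-- Recreate the sequence and tie its lifetime to the id column.
CREATE SEQUENCE addons_snack_id_seq OWNED BY addons_snack.id;

-- Start the sequence just past the highest migrated id.
SELECT setval('addons_snack_id_seq',
              COALESCE((SELECT max(id) FROM addons_snack), 0) + 1,
              false);

-- Restore the column default shown in the source table.
ALTER TABLE addons_snack
    ALTER COLUMN id SET DEFAULT nextval('addons_snack_id_seq'::regclass);

-- Restore the check constraint.
ALTER TABLE addons_snack
    ADD CONSTRAINT addons_snack_price_check CHECK (price >= 0);

-- Restore the foreign key that references this table.
ALTER TABLE addons_snackreservation
    ADD CONSTRAINT addons_snackreservation_snack_id_373507cf_fk_addons_snack_id
    FOREIGN KEY (snack_id) REFERENCES addons_snack (id)
    DEFERRABLE INITIALLY DEFERRED;
```

For more than a handful of tables, generating these definitions from the source with pg_dump is less error-prone than writing them by hand.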
I apologize for the lengthy post. I'm trying to get in all the details.
We recently upgraded our Postgres AWS RDS from 9.5 to 11.1.
We have several large partitioned tables implemented using inheritance that we are considering converting to declarative partitioning.
(I'm talking about 5TB of partitioned data). I want to be sure of my methodology before I push forward.
For example here is how we would have created a partitioned table with inheritance. The table has a primary key and an index. The inherited partition has a check constraint and an index. (Not shown is the trigger on the primary table that would put the new rows in the correct partition.)
CREATE TABLE test
(
date_key numeric(15,0) NOT NULL,
metric numeric(15,0) NOT NULL,
value numeric(28,5) NOT NULL,
CONSTRAINT test_pkey PRIMARY KEY (date_key,metric)
)
TABLESPACE pg_default;
CREATE INDEX test_idx1
ON test USING btree
(metric)
TABLESPACE pg_default;
CREATE TABLE test_201908
(
CONSTRAINT const_test_chk CHECK (date_key >= 20190801::numeric AND date_key <= 20190831::numeric)
)
INHERITS (test)
TABLESPACE pg_default;
CREATE INDEX test_idx1_201908
ON test_201908 USING btree
(metric)
TABLESPACE pg_default;
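For readers unfamiliar with inheritance partitioning, the routing trigger that is not shown usually looks something like the sketch below. The function and trigger names are hypothetical and the body is a guess at the usual pattern, not the poster's actual trigger.

```sql
-- Hypothetical routing function: redirect each row inserted into the
-- parent table to the child table whose CHECK constraint it satisfies.
CREATE OR REPLACE FUNCTION test_insert_router() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF NEW.date_key >= 20190801 AND NEW.date_key <= 20190831 THEN
        INSERT INTO test_201908 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'date_key % has no matching partition', NEW.date_key;
    END IF;
    RETURN NULL;  -- the row has been stored in the child, skip the parent
END;
$$;

CREATE TRIGGER test_insert_trg
    BEFORE INSERT ON test
    FOR EACH ROW EXECUTE PROCEDURE test_insert_router();
```

With declarative partitioning this trigger is no longer needed, because the server routes rows itself.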
AMZGQ3DW=> \d+ edibben.test
Table "edibben.test"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+---------------+-----------+----------+---------+---------+--------------+-------------
date_key | numeric(15,0) | | not null | | main | |
metric | numeric(15,0) | | not null | | main | |
value | numeric(28,5) | | not null | | main | |
Indexes:
"test_pkey" PRIMARY KEY, btree (date_key, metric)
"test_idx1" btree (metric)
Child tables: edibben.test_201908
AMZGQ3DW=> \d+ edibben.test_201908
Table "edibben.test_201908"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+---------------+-----------+----------+---------+---------+--------------+-------------
date_key | numeric(15,0) | | not null | | main | |
metric | numeric(15,0) | | not null | | main | |
value | numeric(28,5) | | not null | | main | |
Indexes:
"test_idx1_201908" btree (metric)
Check constraints:
"const_test_chk" CHECK (date_key >= 20190801::numeric AND date_key <= 20190831::numeric)
Inherits: edibben.test
I know that I can convert this table into a declarative partitioned table by doing the following:
Create a new partitioned table:
CREATE TABLE test_part
(
date_key numeric(15,0) NOT NULL,
metric numeric(15,0) NOT NULL,
value numeric(28,5) NOT NULL,
CONSTRAINT test_part_pkey PRIMARY KEY (date_key,metric)
) PARTITION BY RANGE (date_key)
TABLESPACE pg_default;
CREATE INDEX test_part_idx1
ON test_part USING btree
(metric)
TABLESPACE pg_default;
Drop the inheritance on the test_201908 table:
alter table test_201908 no inherit test;
And then add this table to the partitioned table. The doco says to keep the check constraint in place until after the data is loaded.
alter table test_part
attach partition test_201908
for VALUES FROM (20190801) TO (20190831);
The partition shows up attached to the table:
\d+ edibben.test_part
Table "edibben.test_part"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+---------------+-----------+----------+---------+---------+--------------+-------------
date_key | numeric(15,0) | | not null | | main | |
metric | numeric(15,0) | | not null | | main | |
value | numeric(28,5) | | not null | | main | |
Partition key: RANGE (date_key)
Indexes:
"test_part_pkey" PRIMARY KEY, btree (date_key, metric)
"test_part_idx1" btree (metric)
Partitions: edibben.test_201908 FOR VALUES FROM ('20190801') TO ('20190831')
My question is about what happens to the indexes. When you examine the partition, you see the primary key inherited from the partitioned table and the original index (test_idx1_201908).
AMZGQ3DW-> \d+ edibben.test_201908
Table "edibben.test_201908"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+---------------+-----------+----------+---------+---------+--------------+-------------
date_key | numeric(15,0) | | not null | | main | |
metric | numeric(15,0) | | not null | | main | |
value | numeric(28,5) | | not null | | main | |
Partition of: edibben.test_part FOR VALUES FROM ('20190801') TO ('20190831')
Partition constraint: ((date_key IS NOT NULL) AND (date_key >= '20190801'::numeric(15,0)) AND (date_key < '20190831'::numeric(15,0)))
Indexes:
"test_201908_pkey" PRIMARY KEY, btree (date_key, metric)
"test_idx1_201908" btree (metric)
Check constraints:
"const_test_chk" CHECK (date_key >= 20190801::numeric AND date_key <= 20190831::numeric)
If I add a new partition to the test_part table
CREATE TABLE test_201909 PARTITION OF test_part
FOR VALUES FROM ('20190901') TO ('20190930');
The new table has the primary key and the index but the index has a system generated name.
$\d+ edibben.test_201909
Table "edibben.test_201909"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+---------------+-----------+----------+---------+---------+--------------+-------------
date_key | numeric(15,0) | | not null | | main | |
metric | numeric(15,0) | | not null | | main | |
value | numeric(28,5) | | not null | | main | |
Partition of: edibben.test_part FOR VALUES FROM ('20190901') TO ('20190930')
Partition constraint: ((date_key IS NOT NULL) AND (date_key >= '20190901'::numeric(15,0)) AND (date_key < '20190930'::numeric(15,0)))
Indexes:
"test_201909_pkey" PRIMARY KEY, btree (date_key, metric)
"test_201909_metric_idx" btree (metric)
Looking at pg_class for the objects I just created:
AMZGQ3DW=> select relname, reltype, relkind,relowner from pg_class where relname like 'test%';
relname | reltype | relkind | relowner
------------------------+---------+---------+----------
test_201908 | 365444 | r | 98603
test_201908_pkey | 0 | i | 98603
test_idx1_201908 | 0 | i | 98603
test_201909 | 366498 | r | 98603
test_201909_metric_idx | 0 | i | 98603
test_201909_pkey | 0 | i | 98603
test_part | 365449 | p | 98603
test_part_idx1 | 0 | I | 98603
test_part_pkey | 0 | I | 98603
The indexes on the partitioned table have a relkind of I and the indexes on the partitions have a relkind of i. Looking at pg_indexes, there are no entries for the indexes on the primary table:
AMZGQ3DW=> select schemaname, tablename, indexname from pg_indexes where schemaname = 'edibben' and tablename = 'test_part';
schemaname | tablename | indexname
------------+-----------+-----------
(0 rows)
The indexes on the partitions do show up:
AMZGQ3DW=> select schemaname, tablename, indexname from pg_indexes where schemaname = 'edibben' and tablename like 'test%' order by tablename;
schemaname | tablename | indexname
------------+-------------+------------------------
edibben | test_201908 | test_201908_pkey
edibben | test_201908 | test_idx1_201908
edibben | test_201909 | test_201909_pkey
edibben | test_201909 | test_201909_metric_idx
So, is this partitioned table properly indexed? (Yes, there was a question buried in this mess.) I can't find any documentation of how partitioned indexes work, but it appears that the partitioned index is just a definition and that the real indexes are on the partitions themselves. Is there a way to list all of the indexes associated with a partitioned index? Is there a way to see if the partitioned index is valid?
Also, the doco talks about creating the index on the partitioned table with the CREATE INDEX ON ONLY option. I don't think this applies to what I need to do. Am I right?
"As explained above, it is possible to create indexes on partitioned tables and they are applied automatically
to the entire hierarchy. This is very convenient, as not only the existing partitions will become indexed,
but also any partitions that are created in the future will. One limitation is that it's not possible to use
the CONCURRENTLY qualifier when creating such a partitioned index. To overcome long lock times,
it is possible to use CREATE INDEX ON ONLY the partitioned table; such an index is marked invalid,
and the partitions do not get the index applied automatically. The indexes on partitions can be created
separately using CONCURRENTLY, and later attached to the index on the parent using ALTER INDEX .. ATTACH PARTITION.
Once indexes for all partitions are attached to the parent index, the parent index is marked valid automatically."
When test_201908 was attached, its existing index test_idx1_201908 automatically became a partition of the partitioned index test_part_idx1. It does not matter that its name differs from those of the other index partitions.
You can verify that with the following query:
SELECT relispartition FROM pg_class WHERE relname = 'test_idx1_201908';
The result should be TRUE, signifying that the index is a partition of a partitioned index.
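To list the index partitions that belong to each partitioned index, and to check validity, the catalogs can be joined through pg_inherits, which records the parent/child relationship for indexes as well as for tables. A sketch using the index names from the question:

```sql
-- One row per index partition, with the table it is built on.
SELECT parent.relname AS partitioned_index,
       child.relname  AS index_partition,
       tbl.relname    AS on_table,
       ci.indisvalid  AS partition_index_valid
FROM pg_inherits inh
JOIN pg_class parent ON parent.oid = inh.inhparent
JOIN pg_class child  ON child.oid  = inh.inhrelid
JOIN pg_index ci     ON ci.indexrelid = child.oid
JOIN pg_class tbl    ON tbl.oid = ci.indrelid
WHERE parent.relkind = 'I'
  AND parent.relname IN ('test_part_pkey', 'test_part_idx1')
ORDER BY parent.relname, child.relname;

-- Validity of the partitioned indexes themselves.
SELECT c.relname, i.indisvalid
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname IN ('test_part_pkey', 'test_part_idx1');
```

If the same index names exist in several schemas, add a join to pg_namespace and filter on the schema name as well.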
I have two remarks unrelated to your question:
I notice that you defined the upper bound of the range for the partitioning key incorrectly.
The upper bound you specify is excluded, so you should write
CREATE TABLE test_201909 PARTITION OF test_part
FOR VALUES FROM ('20190901') TO ('20191001');
It is probably too late for that, but you should have chosen date rather than numeric for the partitioning column.
That would make everything simpler and more readable, and it would be impossible to enter incorrect dates like 20190335.
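Both remarks can be illustrated together. The sketch below uses a hypothetical table test_by_date that mirrors test_part but with a date partitioning column and half-open monthly ranges.

```sql
-- Hypothetical variant of test_part with a date partitioning column.
CREATE TABLE test_by_date
(
    date_key date          NOT NULL,
    metric   numeric(15,0) NOT NULL,
    value    numeric(28,5) NOT NULL,
    CONSTRAINT test_by_date_pkey PRIMARY KEY (date_key, metric)
) PARTITION BY RANGE (date_key);

-- The upper bound is exclusive, so each month ends at the first day of the next.
CREATE TABLE test_by_date_201908 PARTITION OF test_by_date
    FOR VALUES FROM ('2019-08-01') TO ('2019-09-01');
CREATE TABLE test_by_date_201909 PARTITION OF test_by_date
    FOR VALUES FROM ('2019-09-01') TO ('2019-10-01');

-- The last day of August is routed to test_by_date_201908.
INSERT INTO test_by_date VALUES ('2019-08-31', 1, 1.0);

-- An impossible date is rejected when the literal is parsed:
-- INSERT INTO test_by_date VALUES ('2019-03-35', 1, 1.0);
```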