ERROR: could not create exclusion constraint - postgresql

I have a table:
CREATE TABLE attendances
(
id_attendance serial PRIMARY KEY,
id_user integer NOT NULL
REFERENCES users (user_id) ON UPDATE CASCADE ON DELETE CASCADE,
entry_date timestamp with time zone DEFAULT NULL,
departure_date timestamp with time zone DEFAULT NULL,
created_at timestamp with time zone DEFAULT current_timestamp
);
I want to add an exclusion constraint that prevents attendances from overlapping (there can be multiple rows for the same day, but their time ranges cannot overlap).
So I wrote this code to add the constraint:
ALTER TABLE attendances
ADD CONSTRAINT check_attendance_overlaps
EXCLUDE USING GIST (box(
point(
extract(epoch from entry_date at time zone 'UTC'),
id_user
),
point(
extract(epoch from departure_date at time zone 'UTC') - 0.5,
id_user + 0.5
)
)
WITH && );
But when I tried to run it on the database I got this error:
Error: could not create exclusion constraint "check_attendance_overlaps"

To exclude overlapping time ranges per user, work with a multicolumn constraint on id_user and a timestamptz range (tstzrange).
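You need the additional module btree_gist once per database. It provides the GiST operator class that lets a plain integer column like id_user take part in the constraint with =: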
CREATE EXTENSION IF NOT EXISTS btree_gist;
Then:
ALTER TABLE attendances ADD CONSTRAINT check_attendance_overlaps
EXCLUDE USING gist (id_user WITH =
, tstzrange(entry_date, departure_date) WITH &&);
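A minimal sketch of the constraint in action, assuming a scratch session where btree_gist can be installed. The temporary table is a cut-down, hypothetical copy of attendances (no foreign key to users), and the rows are invented. Adjacent ranges are accepted because the two-argument tstzrange() builds '[)' ranges; the overlapping insert for the same user is rejected.

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Cut-down copy of attendances, without the FK to users, for a self-contained demo
CREATE TEMP TABLE attendances (
  id_attendance  serial PRIMARY KEY,
  id_user        integer NOT NULL,
  entry_date     timestamptz,
  departure_date timestamptz,
  CONSTRAINT check_attendance_overlaps
    EXCLUDE USING gist (id_user WITH =, tstzrange(entry_date, departure_date) WITH &&)
);

-- Same user, same day, no overlap: '[)' bounds mean 12:00 belongs only to the second row
INSERT INTO attendances (id_user, entry_date, departure_date) VALUES
  (1, '2024-03-01 08:00+00', '2024-03-01 12:00+00'),
  (1, '2024-03-01 12:00+00', '2024-03-01 17:00+00'),
  -- Same time slot for a different user: also accepted
  (2, '2024-03-01 08:00+00', '2024-03-01 12:00+00');

-- Overlapping attendance for user 1: rejected by check_attendance_overlaps
DO $$
BEGIN
  INSERT INTO attendances (id_user, entry_date, departure_date)
  VALUES (1, '2024-03-01 11:00+00', '2024-03-01 13:00+00');
  RAISE NOTICE 'unexpected: overlapping row was accepted';
EXCEPTION WHEN exclusion_violation THEN
  RAISE NOTICE 'overlapping row rejected as expected';
END $$;
```

One thing to keep in mind with the DEFAULT NULL columns: tstzrange(entry_date, NULL) is unbounded on the right, so an attendance whose departure_date is still NULL overlaps every later attendance of the same user.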
See:
Store the day of the week and time?
Postgres constraint for unique datetime range
Or maybe SP-GiST instead of GiST; it might be faster. See:
Perform this hours of operation query in PostgreSQL
Of course, the existing rows must not already overlap, or adding the constraint will fail with the same kind of error as in the question.
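To find the rows that block the constraint, a self-join along these lines lists each conflicting pair once. This is a sketch: the temporary table and its rows are hypothetical sample data, and against the real table only the final SELECT is needed.

```sql
-- Hypothetical sample data standing in for the real attendances table
CREATE TEMP TABLE attendances (
  id_attendance  serial PRIMARY KEY,
  id_user        integer NOT NULL,
  entry_date     timestamptz,
  departure_date timestamptz
);

INSERT INTO attendances (id_user, entry_date, departure_date) VALUES
  (1, '2024-03-01 08:00+00', '2024-03-01 12:00+00'),
  (1, '2024-03-01 11:00+00', '2024-03-01 13:00+00'),  -- overlaps the first row
  (2, '2024-03-01 08:00+00', '2024-03-01 12:00+00');  -- other user, no conflict

-- Pairs of rows for the same user whose time ranges overlap;
-- a.id_attendance < b.id_attendance lists each pair only once
SELECT a.id_attendance AS first_id,
       b.id_attendance AS second_id,
       a.id_user
FROM   attendances a
JOIN   attendances b
  ON   a.id_user = b.id_user
 AND   a.id_attendance < b.id_attendance
 AND   tstzrange(a.entry_date, a.departure_date) && tstzrange(b.entry_date, b.departure_date);
```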

Related

Multicolumn index vs single-column index for time series data in Postgres

This table started out as short-term storage for meter data before it was validated and added to some long-term storage tables.
It turns out the client wants to keep this data for a long time since we saved it, and it is growing fast.
create table metering_meterreading
(
id bigserial primary key, -- Primary Key
created_at timestamp with time zone not null,
updated_at timestamp with time zone not null,
timestamp timestamp with time zone not null, -- BTREE index
value numeric(15, 3) not null,
meter_device_id uuid not null, -- FK to meter_device, BTREE index
series_id uuid not null, -- FK to series, BTREE index
organization_id uuid not null -- FK to organization, BTREE index
);
I am planning to drop the primary key, since (org_id, meter_device_id, series_id, timestamp) already makes each row unique. It was just added by my ORM (Django), and I didn't care when we started.
But since I almost always filter on organization, meter_device and series to get a range of time series data, I am wondering if it would be more efficient to have one multicolumn index on (organization_id, meter_device_id, series_id, timestamp) instead of the separate indexes.
I read somewhere that a column queried by range should be the rightmost one in the index.
This is still not a super efficient table for time series data, since it will grow large, but I am planning on fixing that by partitioning on range, or maybe even using Timescale. Before partitioning, though, I would like lookups in it to be as efficient as possible.
I also saw an example somewhere that used a separate table to identify the metric:
create table metric
(
id bigserial primary key,
organization_id uuid not null,
meter_device_id uuid not null,
series_id uuid not null,
unique (organization_id, meter_device_id, series_id)
);
create table metering_meterreading
(
metric_id bigint not null references metric (id), -- FK to metric, BTREE index
timestamp timestamp with time zone not null, -- BTREE index
value numeric(15, 3) not null,
created_at timestamp with time zone not null,
updated_at timestamp with time zone not null
);
But I am not sure if that is actually better than just putting them all in one table. It might hurt the ingestion rate, since there is another table involved now.
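If (org_id, meter_device_id, series_id, timestamp) uniquely determine a table row, you should use a multicolumn primary key over all of them. Then you automatically have a 4-column index on these columns. Just make sure that timestamp is last in the list: entries with equal values in the three ID columns are then stored next to each other in the index, sorted by timestamp, so a time range for one series is a single contiguous index scan. That index will support your query ideally.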
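A minimal sketch of that layout, using the column list from the question: the four-column primary key with "timestamp" last, and the kind of lookup it is meant to serve. The UUIDs and readings are made-up sample data, and the temporary table only stands in for the real one.

```sql
CREATE TEMP TABLE metering_meterreading (
  organization_id uuid NOT NULL,
  meter_device_id uuid NOT NULL,
  series_id       uuid NOT NULL,
  "timestamp"     timestamptz NOT NULL,
  value           numeric(15, 3) NOT NULL,
  created_at      timestamptz NOT NULL DEFAULT now(),
  updated_at      timestamptz NOT NULL DEFAULT now(),
  -- Equality-filtered columns first, the range-filtered column last
  PRIMARY KEY (organization_id, meter_device_id, series_id, "timestamp")
);

-- Hypothetical sample data: 48 hourly readings for one series
INSERT INTO metering_meterreading (organization_id, meter_device_id, series_id, "timestamp", value)
SELECT '00000000-0000-0000-0000-000000000001'::uuid,
       '00000000-0000-0000-0000-000000000002'::uuid,
       '00000000-0000-0000-0000-000000000003'::uuid,
       ts,
       1.5
FROM   generate_series(timestamptz '2024-01-01 00:00+00',
                       timestamptz '2024-01-02 23:00+00',
                       interval '1 hour') AS ts;

-- Typical lookup: equality on the three IDs, half-open range on "timestamp"
SELECT "timestamp", value
FROM   metering_meterreading
WHERE  organization_id = '00000000-0000-0000-0000-000000000001'
AND    meter_device_id = '00000000-0000-0000-0000-000000000002'
AND    series_id       = '00000000-0000-0000-0000-000000000003'
AND    "timestamp" >= '2024-01-01 00:00+00'
AND    "timestamp" <  '2024-01-02 00:00+00'
ORDER  BY "timestamp";
```

On a table this small the planner may still pick a sequential scan; running EXPLAIN on the real, large table is the way to confirm the primary key index is used.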

How to enforce a one-to-many relationship in PostgreSQL where there is no exact foreign key match between child and parent rows?

I'm having trouble modeling data that has a parent table with a start and end date in its primary key, and a child table with a timestamp in its primary key that must fall within the range of the parent table's start and end dates. In fact, this problem is nested, as that parent table is actually the child to another table - a "grandparent" table - which also has start and end dates in its primary key; the parent table's start and end dates must likewise fit within the range of the grandparent table's start and end dates.
For background, I work at a water treatment company. We treat water by deploying water treatment machines to various sites as part of treatment contracts. In more specific terms:
There are various sites that need their water treated.
The sites create contracts with us so that we can treat water. The contracts always have a known start date, but the contracts can be for either a specific period of time or indefinitely, so the end date can be known or unknown (so NULLable end dates)
A single water treatment machine is deployed to a site at a time in order to fulfill contract requirements. If a machine breaks down in the middle of a contract and it needs to be replaced, we replace it with another machine under the same contract.
While machines are treating water under a contract, we collect treatment data from them.
Thus, we have to keep track of sites, treatment_contracts, machine_deployments, machines, and treatment_datapoints. A site can have multiple treatment_contracts, a treatment_contract can have multiple machine_deployments and multiple treatment_datapoints, and a machine can have multiple machine_deployments.
So a simplified version of the data I'm trying to model is this:
CREATE TABLE public.site
(
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE public.treatment_contract
(
site_id integer NOT NULL,
start_date date NOT NULL,
end_date date,
PRIMARY KEY (site_id, start_date, end_date),
CONSTRAINT fk_treatment_contract__site FOREIGN KEY (site_id)
REFERENCES public.site (id) MATCH SIMPLE
);
CREATE TABLE public.machine_deployment
(
site_id integer NOT NULL,
machine_id integer NOT NULL,
start_date date NOT NULL,
end_date date,
PRIMARY KEY (site_id, machine_id, start_date, end_date),
CONSTRAINT fk_machine_deployment__machine FOREIGN KEY (machine_id)
REFERENCES public.machine (id) MATCH SIMPLE,
<some provision to require that machine_deployment.start_date and machine_deployment.end_date are between treatment_contract.start_date and treatment_contract.end_date, and that machine_deployment.site_id matches treatment_contract.site_id>
);
CREATE TABLE public.treatment_datapoint
(
site_id integer NOT NULL,
time_stamp timestamp NOT NULL,
PRIMARY KEY (site_id, time_stamp),
<some provision to require time_stamp is between treatment_contract.start_date and treatment_contract.end_date, and that treatment_datapoint.site_id matches treatment_contract.site_id>
);
CREATE TABLE public.machine
(
id integer NOT NULL,
PRIMARY KEY (id)
);
I'm not sure how to proceed because PostgreSQL can only enforce foreign key relationships where there is an exact match between all foreign key fields - there is no provision in foreign key constraints that can enforce something like child.timestamp BETWEEN parent.start AND parent.end. treatment_datapoint should have a foreign key to treatment_contract, as a treatment_datapoint without a treatment_contract would make no sense, but there seems to be no way to enforce this foreign key relationship. Is the answer just to use triggers instead? I've always been told to avoid using triggers to define parent:child relationships, as that's what foreign keys are for.
Either way, though, there's got to be a way to model this, as I can't imagine that I'm the only one who's ever needed to enforce that a date within a child table is within a range defined in the parent table.
In short: to enforce a relationship where there is no foreign key, make one.
For your model to work you need a foreign key to treatment_contract. Since the primary key of treatment_contract consists of site_id, start_date and end_date, you have to add contract_start_date and contract_end_date to the tables that need to reference a contract, namely machine_deployment and treatment_datapoint.
To make your life easier, I'd advise against using NULL for a not-yet-known end date of a contract or machine deployment. Use a "magic" value that means "infinity" instead; PostgreSQL's date type even has a special value 'infinity' for this. Since end_date is part of the primary key, it cannot be NULL anyway (PostgreSQL makes primary key columns NOT NULL), and avoiding NULL also makes the checks simpler.
Also I'd add a check constraint to ensure a contract ends after it starts.
And lastly you can use a check constraint to validate deployment start and end and datapoint timestamp.
In the example below I use daterange and range operators in my checks. This is for convenience; you can achieve the same result with comparison operators (<, <=, ...).
My proposed variant of your schema is:
CREATE TABLE public.site
(
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE public.treatment_contract
(
site_id integer NOT NULL,
start_date date NOT NULL,
end_date date NOT NULL,
PRIMARY KEY (site_id, start_date, end_date),
CONSTRAINT fk_treatment_contract__site FOREIGN KEY (site_id)
REFERENCES public.site (id) MATCH SIMPLE,
CONSTRAINT chk_treatment_contract_period CHECK (start_date <= end_date)
);
CREATE TABLE public.machine
(
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE public.machine_deployment
(
site_id integer NOT NULL,
machine_id integer NOT NULL,
contract_start_date date NOT NULL,
contract_end_date date NOT NULL,
start_date date NOT NULL,
end_date date NOT NULL,
PRIMARY KEY (site_id, machine_id, start_date, end_date),
CONSTRAINT fk_machine_deployment__machine FOREIGN KEY (machine_id)
REFERENCES public.machine (id) MATCH SIMPLE,
CONSTRAINT fk_machine_deployment__treatment_contract FOREIGN KEY (site_id, contract_start_date, contract_end_date)
REFERENCES public.treatment_contract(site_id, start_date, end_date),
CONSTRAINT chk_machine_deployment_period CHECK (start_date <= end_date),
CONSTRAINT chk_machine_deployment_in_contract CHECK (pg_catalog.daterange(start_date, end_date, '[]') <@ pg_catalog.daterange(contract_start_date, contract_end_date, '[]'))
);
CREATE TABLE public.treatment_datapoint
(
site_id integer NOT NULL,
contract_start_date date NOT NULL,
contract_end_date date NOT NULL,
time_stamp timestamp NOT NULL,
PRIMARY KEY (site_id, time_stamp),
CONSTRAINT fk_treatment_datapoint__treatment_contract FOREIGN KEY (site_id, contract_start_date, contract_end_date)
REFERENCES public.treatment_contract(site_id, start_date, end_date),
CONSTRAINT chk_datapoint_in_contract CHECK (time_stamp::date <@ pg_catalog.daterange(contract_start_date, contract_end_date, '[]'))
);
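A quick way to try the scheme is to insert a contract and a few data points. This cut-down sketch recreates only site, treatment_contract (with a start/end check) and treatment_datapoint as temporary tables; the IDs and dates are invented. The open-ended contract uses PostgreSQL's special date value 'infinity' instead of NULL.

```sql
CREATE TEMP TABLE site (
  id integer PRIMARY KEY
);

CREATE TEMP TABLE treatment_contract (
  site_id    integer NOT NULL REFERENCES site (id),
  start_date date NOT NULL,
  end_date   date NOT NULL,
  PRIMARY KEY (site_id, start_date, end_date),
  CONSTRAINT chk_treatment_contract_period CHECK (start_date <= end_date)
);

CREATE TEMP TABLE treatment_datapoint (
  site_id             integer NOT NULL,
  contract_start_date date NOT NULL,
  contract_end_date   date NOT NULL,
  time_stamp          timestamp NOT NULL,
  PRIMARY KEY (site_id, time_stamp),
  FOREIGN KEY (site_id, contract_start_date, contract_end_date)
    REFERENCES treatment_contract (site_id, start_date, end_date),
  CONSTRAINT chk_datapoint_in_contract
    CHECK (time_stamp::date <@ daterange(contract_start_date, contract_end_date, '[]'))
);

INSERT INTO site VALUES (1);
-- Open-ended contract: 'infinity' instead of NULL
INSERT INTO treatment_contract VALUES (1, '2024-01-01', 'infinity');

-- Inside the contract period: accepted
INSERT INTO treatment_datapoint VALUES (1, '2024-01-01', 'infinity', '2024-06-15 10:30');

-- Before the contract started: rejected by chk_datapoint_in_contract
DO $$
BEGIN
  INSERT INTO treatment_datapoint VALUES (1, '2024-01-01', 'infinity', '2023-12-31 23:00');
  RAISE NOTICE 'unexpected: datapoint outside the contract was accepted';
EXCEPTION WHEN check_violation THEN
  RAISE NOTICE 'datapoint outside the contract rejected as expected';
END $$;
```

The CHECK only compares the row with the contract dates copied into it; the composite foreign key is what guarantees that those dates belong to a real contract of the same site.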

Applying unique constraint of date on TIMESTAMP column in postgresql

I have a PostgreSQL table:
CREATE TABLE IF NOT EXISTS table_name
(
expiry_date DATE NOT NULL,
created_at TIMESTAMP with time zone NOT NULL DEFAULT CURRENT_TIMESTAMP(0),
CONSTRAINT user_review_uniq_key UNIQUE (expiry_date, created_at::date) -- my wrong attempt of using ::
)
I want to put a unique constraint on this table such that the combination of expiry_date and the date of created_at is unique. The problem is that created_at is a timestamp column, not a date.
So is there any way to put a unique constraint on expiry_date and created_at::date?
My attempt was to use
CONSTRAINT user_review_uniq_key UNIQUE (expiry_date, created_at::date) which is not valid.
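If you do not need a time zone for your created date (i.e. created_at can be timestamp without time zone), create a unique index as follows. A UNIQUE constraint only accepts plain column names, but a unique index can contain expressions: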
create unique index idx_user_review_uniq_key on table_name (expiry_date, cast(created_at as date));
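If you really need timestamp with time zone, you have to use a little trick (https://gist.github.com/cobusc/5875282). Casting a timestamptz to date depends on the session's time zone, so it is not immutable and cannot be used in an index; converting to a fixed time zone first makes the expression immutable: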
create unique index idx_user_review_uniq_key on table_name (expiry_date, date(created_at at TIME zone 'UTC'));
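A small sketch of the UTC-based index in action, on a temporary copy of table_name with invented timestamps: two rows with the same expiry_date collide when their created_at values fall on the same UTC day, even if they were written with different UTC offsets.

```sql
CREATE TEMP TABLE table_name (
  expiry_date DATE NOT NULL,
  created_at  TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP(0)
);

-- AT TIME ZONE 'UTC' yields a timestamp without time zone, so date() of it
-- no longer depends on the session's TimeZone setting and can be indexed
CREATE UNIQUE INDEX idx_user_review_uniq_key
  ON table_name (expiry_date, date(created_at AT TIME ZONE 'UTC'));

INSERT INTO table_name (expiry_date, created_at)
VALUES ('2024-12-31', '2024-05-01 08:00+00');

-- 2024-05-01 20:00-03 is 2024-05-01 23:00 UTC: same UTC day, so it is a duplicate
DO $$
BEGIN
  INSERT INTO table_name (expiry_date, created_at)
  VALUES ('2024-12-31', '2024-05-01 20:00-03');
  RAISE NOTICE 'unexpected: duplicate accepted';
EXCEPTION WHEN unique_violation THEN
  RAISE NOTICE 'duplicate (same UTC day) rejected as expected';
END $$;

-- 2024-05-01 22:00-03 is 2024-05-02 01:00 UTC: a different UTC day, accepted
INSERT INTO table_name (expiry_date, created_at)
VALUES ('2024-12-31', '2024-05-01 22:00-03');
```

Which "day" counts is now fixed to UTC; if the business day is defined in another zone, that zone's name can replace 'UTC' in the index expression.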

How can I use a WHERE BETWEEN clause in an INSERT query?

I have the following table.
CREATE TABLE public.ad
(
id integer NOT NULL DEFAULT nextval('ad_id_seq'::regclass),
uuid uuid NOT NULL DEFAULT uuid_generate_v4(),
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
cmdb_id integer,
platform character varying(100),
bidfloor numeric(15,6),
views integer NOT NULL DEFAULT 1,
year integer,
month integer,
day integer,
CONSTRAINT ad_pkey PRIMARY KEY (id),
CONSTRAINT ad_cmdb_id_foreign FOREIGN KEY (cmdb_id)
REFERENCES public.cmdb (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT ad_id_unique UNIQUE (uuid)
)
WITH (
OIDS=FALSE
);
Without going into too much detail: this table logs all requests and impressions of advertisements on electronic screens throughout the country. It is also used to generate reports and contains roughly 50 million records.
Currently, the reports are filtered on the created_at timestamp. As you can imagine, with roughly 50 million records the query gets slow, even with an index on the created_at column. Users choose the date range for a report in the system's UI.
The year, month and day columns are new columns that I just added to make the reporting more efficient. Instead of indexing on the date, I want the system to index on a year, month and day, all separate values.
The newly added columns are still empty. I want to run a query that inserts a value where the created_at column is between two dates. For example:
INSERT INTO ad (year) VALUES (2016) WHERE created_at BETWEEN '2016-01-01 00:00:00' AND '2016-12-31 23:59:59';
This doesn't work, of course. I cannot find anything on the internet where an INSERT statement uses a WHERE BETWEEN clause. I also tried subqueries and a WITH clause that used generate_series to generate the years between 2012 and 2020. None of it worked.
You don't want to insert new rows, you should update your table.
UPDATE table_name
SET column1=value1,column2=value2,...
WHERE column_name BETWEEN value1 AND value2;
Otherwise you'll end up with 100 million rows ;)
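Applied to the ad table, a single UPDATE can derive all three new columns from created_at itself. A minimal sketch on a cut-down temporary copy of ad with invented rows. It uses a half-open range (>= start of 2016, < start of 2017) instead of BETWEEN ... '23:59:59', which would miss rows in the last fraction of a second of the year. On 50 million rows it may be preferable to run such an UPDATE in smaller batches (for example one created_at range at a time) to keep each transaction short.

```sql
-- Cut-down copy of the ad table with only the columns involved
CREATE TEMP TABLE ad (
  id         serial PRIMARY KEY,
  created_at timestamp without time zone NOT NULL,
  year       integer,
  month      integer,
  day        integer
);

INSERT INTO ad (created_at) VALUES
  ('2016-03-14 09:26:53'),
  ('2016-12-31 23:59:59.5'),  -- would be missed by BETWEEN ... '2016-12-31 23:59:59'
  ('2017-01-01 00:00:00');    -- outside 2016, stays untouched

-- Fill year/month/day for all of 2016 from created_at
UPDATE ad
SET    year  = extract(year  FROM created_at)::int,
       month = extract(month FROM created_at)::int,
       day   = extract(day   FROM created_at)::int
WHERE  created_at >= '2016-01-01'
AND    created_at <  '2017-01-01';
```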

Postgres LIKE unique constraint possible?

I'm new to Postgres and am creating a table (metrics_reaches) using pgAdmin III.
In my table, I have a column insertion_timestamp of type timestamp with time zone.
I'd like to create a UNIQUE constraint that, amongst other fields, checks only the date portion of the insertion_timestamp and not the time.
Is there a way to do that? Here's what my script looks like at the moment (see the last CONSTRAINT).
-- Table: metrics_reaches
-- DROP TABLE metrics_reaches;
CREATE TABLE metrics_reaches
(
organizations_id integer NOT NULL,
applications_id integer NOT NULL,
countries_id integer NOT NULL,
platforms_id integer NOT NULL,
...
insertion_timestamp timestamp with time zone NOT NULL,
id serial NOT NULL,
CONSTRAINT metrics_reaches_pkey PRIMARY KEY (id),
CONSTRAINT metrics_reaches_applications_id_fkey FOREIGN KEY (applications_id)
REFERENCES applications (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT metrics_reaches_countries_id_fkey FOREIGN KEY (countries_id)
REFERENCES countries (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT metrics_reaches_organizations_id_fkey FOREIGN KEY (organizations_id)
REFERENCES organizations (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT metrics_reaches_platforms_id_fkey FOREIGN KEY (platforms_id)
REFERENCES platforms (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT metrics_reaches_organizations_id_key UNIQUE (organizations_id, applications_id, countries_id, platforms_id, insertion_timestamp)
)
WITH (
OIDS=FALSE
);
ALTER TABLE metrics_reaches
OWNER TO postgres;
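Try a CAST(). A table-level UNIQUE constraint only accepts plain column names, so the expression has to go into a unique index:
CREATE UNIQUE INDEX metrics_reaches_day_key ON metrics_reaches (
organizations_id,
applications_id,
countries_id,
platforms_id,
CAST(insertion_timestamp AS date)
);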
This is really a comment to Frank's answer, but it's too long for the comment box.
If you are being paranoid, you need to watch the local timezone carefully when dealing with date casts:
bookings=> SET timezone='GMT';
SET
bookings=> SELECT now() at time zone 'GMT', (now() at time zone 'GMT')::date, now(), now()::date;
timezone | timezone | now | now
---------------------------+------------+------------------------------+------------
2013-05-30 19:36:04.23684 | 2013-05-30 | 2013-05-30 19:36:04.23684+00 | 2013-05-30
(1 row)
bookings=> set timezone='GMT-7';
SET
bookings=> SELECT now() at time zone 'GMT', (now() at time zone 'GMT')::date, now(), now()::date;
timezone | timezone | now | now
----------------------------+------------+-------------------------------+------------
2013-05-30 19:36:13.723558 | 2013-05-30 | 2013-05-31 02:36:13.723558+07 | 2013-05-31
Now, PG is smart enough to know this is a problem, and if you try to create a constraint with a date cast then you should see something like:
ERROR: functions in index expression must be marked IMMUTABLE
If you try to cast after applying "at time zone" then it really is immutable and you can have your constraint.
Of course the other option is to wrap the cast in a function and mark the function as immutable. If you're going to lie to the system like that though, don't come complaining when your database behaves oddly a year from now.
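To reproduce the effect without depending on now(), the sketch below fixes one instant (an arbitrary sample value) and compares a plain ::date cast with a cast after AT TIME ZONE 'UTC' under two session time zones. It then creates the time-zone-pinned unique index on a hypothetical, cut-down metrics_reaches table (other columns and foreign keys omitted).

```sql
-- One fixed instant, 2013-05-30 19:36:13 UTC, seen from two session time zones
SET TIME ZONE 'UTC';
SELECT '2013-05-30 19:36:13+00'::timestamptz::date                      AS plain_cast,  -- 2013-05-30
       ('2013-05-30 19:36:13+00'::timestamptz AT TIME ZONE 'UTC')::date AS utc_cast;    -- 2013-05-30

-- As in the session above, 'GMT-7' is POSIX notation and means UTC+7
SET TIME ZONE 'GMT-7';
SELECT '2013-05-30 19:36:13+00'::timestamptz::date                      AS plain_cast,  -- 2013-05-31
       ('2013-05-30 19:36:13+00'::timestamptz AT TIME ZONE 'UTC')::date AS utc_cast;    -- 2013-05-30

-- Hypothetical cut-down copy of metrics_reaches
CREATE TEMP TABLE metrics_reaches (
  organizations_id    integer NOT NULL,
  applications_id     integer NOT NULL,
  countries_id        integer NOT NULL,
  platforms_id        integer NOT NULL,
  insertion_timestamp timestamptz NOT NULL
);

-- The time-zone-pinned expression is immutable, so it is accepted in a unique index
CREATE UNIQUE INDEX metrics_reaches_uniq_day
  ON metrics_reaches (organizations_id, applications_id, countries_id, platforms_id,
                      ((insertion_timestamp AT TIME ZONE 'UTC')::date));
```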