I have many time series stored in a PostgreSQL database, spread over multiple tables. I would like to create a table 'anomalies' that references time series with particular behaviour, for instance a value that is exceptionally high.
My question is the following: what is the best way to link the entries of 'anomalies' with other tables?
I could create a foreign key in each table referencing an entry in 'anomalies', but then it would not be obvious how to go from the anomaly back to the entry referencing it.
The other possibility I see is to store the name of the corresponding table in the entries of 'anomalies', but that does not seem like a good idea, as the table name might change or the table might get deleted.
Is there a more elegant solution to do this?
CREATE TABLE type_1(
type_1_id SERIAL PRIMARY KEY,
type_1_name TEXT NOT NULL,
unique(type_1_name)
);
CREATE TABLE type_1_ts(
date DATE NOT NULL,
value REAL NOT NULL,
type_1_id INTEGER REFERENCES type_1(type_1_id) NOT NULL,
PRIMARY KEY(type_1_id, date)
);
CREATE TABLE type_2(
type_2_id SERIAL PRIMARY KEY,
type_2_name TEXT NOT NULL,
unique(type_2_name)
);
CREATE TABLE type_2_ts(
date DATE NOT NULL,
value REAL NOT NULL,
state INTEGER NOT NULL,
type_2_id INTEGER REFERENCES type_2(type_2_id) NOT NULL,
PRIMARY KEY(type_2_id, date)
);
CREATE TABLE anomalies(
anomaly_id SERIAL PRIMARY KEY,
date DATE NOT NULL,
property TEXT NOT NULL,
value REAL NOT NULL,
-- reference to a table_name and an entry id?
table_name TEXT,
data_id INTEGER
);
What I'd like to be able to do in the end is:
SELECT * FROM anomalies WHERE table_name = 'type_1';
or simply list the data type corresponding to each entry.
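For comparison, the first option I mentioned (a foreign key in each time-series table pointing at 'anomalies') would look roughly like the sketch below; the anomaly id 42 is just an illustration.
-- Each time-series table gets an optional reference to an anomaly
ALTER TABLE type_1_ts ADD COLUMN anomaly_id INTEGER REFERENCES anomalies(anomaly_id);
ALTER TABLE type_2_ts ADD COLUMN anomaly_id INTEGER REFERENCES anomalies(anomaly_id);
-- Going from an anomaly back to the data point then means checking every *_ts table
SELECT 'type_1' AS source, type_1_id AS data_id, date, value
FROM type_1_ts WHERE anomaly_id = 42
UNION ALL
SELECT 'type_2', type_2_id, date, value
FROM type_2_ts WHERE anomaly_id = 42;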
I'm having trouble modeling data that has a parent table with a start and end date in its primary key, and a child table with a timestamp in its primary key that must fall within the range of the parent table's start and end dates. In fact, this problem is nested, as that parent table is actually the child to another table - a "grandparent" table - which also has start and end dates in its primary key; the parent table's start and end dates must likewise fit within the range of the grandparent table's start and end dates.
For background, I work at a water treatment company. We treat water by deploying water treatment machines to various sites as part of treatment contracts. In more specific terms:
There are various sites that need their water treated.
The sites create contracts with us so that we can treat their water. Contracts always have a known start date, but a contract can run either for a specific period of time or indefinitely, so the end date can be known or unknown (hence the NULLable end dates).
A single water treatment machine is deployed to a site at a time in order to fulfill contract requirements. If a machine breaks down in the middle of a contract and it needs to be replaced, we replace it with another machine under the same contract.
While machines are treating water under a contract, we collect treatment data from them.
Thus, we have to keep track of sites, treatment_contracts, machine_deployments, machines, and treatment_datapoints. A site can have multiple treatment_contracts, a treatment_contract can have multiple machine_deployments and multiple treatment_datapoints, and a machine can have multiple machine_deployments.
So a simplified version of the data I'm trying to model is this:
CREATE TABLE public.site
(
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE public.treatment_contract
(
site_id integer NOT NULL,
start_date date NOT NULL,
end_date date,
PRIMARY KEY (site_id, start_date, end_date),
CONSTRAINT fk_treatment_contract__site FOREIGN KEY (site_id)
REFERENCES public.site (id) MATCH SIMPLE
);
CREATE TABLE public.machine_deployment
(
site_id integer NOT NULL,
machine_id integer NOT NULL,
start_date date NOT NULL,
end_date date,
PRIMARY KEY (site_id, machine_id, start_date, end_date),
CONSTRAINT fk_machine_deployment__machine FOREIGN KEY (machine_id)
REFERENCES public.machine (id) MATCH SIMPLE,
<some provision to require that machine_deployment.start_date and machine_deployment.end_date are between treatment_contract.start_date and treatment_contract.end_date, and that machine_deployment.site_id matches treatment_contract.site_id>
);
CREATE TABLE public.treatment_datapoint
(
site_id integer NOT NULL,
time_stamp timestamp NOT NULL,
PRIMARY KEY (site_id, time_stamp),
<some provision to require time_stamp is between treatment_contract.start_date and treatment_contract.end_date, and that treatment_datapoint.site_id matches treatment_contract.site_id>
);
CREATE TABLE public.machine
(
id integer NOT NULL,
PRIMARY KEY (id)
);
I'm not sure how to proceed because PostgreSQL can only enforce foreign key relationships where there is an exact match between all foreign key fields - there is no provision in foreign key constraints that can enforce something like child.timestamp BETWEEN parent.start AND parent.end. treatment_datapoint should have a foreign key to treatment_contract, as a treatment_datapoint without a treatment_contract would make no sense, but there seems to be no way to enforce this foreign key relationship. Is the answer just to use triggers instead? I've always been told to avoid using triggers to define parent:child relationships, as that's what foreign keys are for.
Either way, though, there's got to be a way to model this, as I can't imagine that I'm the only one who's ever needed to enforce that a date within a child table is within a range defined in the parent table.
In short: to enforce a relationship where there is no foreign key, make one.
For your model to work you need a foreign key to treatment_contract. Since the primary key of treatment_contract contains the fields site_id, start_date and end_date, you have to add contract_start_date and contract_end_date columns to the tables that need to reference the contract, namely machine_deployment and treatment_datapoint.
To make your life easier I'd advise against using NULL for the not-yet-known end date of a contract or machine deployment. Instead I would use a far-future "magic" date that means "infinity". This is not required, but it makes the checks simpler.
Also I'd add a check constraint to ensure a contract ends after it starts.
And lastly, you can use check constraints to validate the deployment start and end dates and the datapoint timestamp against the contract period.
In the example below I use daterange and range operators in my checks. This is purely for convenience; you can achieve the same result with comparison operators (<, <=, ...).
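For example, the two expressions below state the same containment condition, once with range operators and once with plain comparisons (the dates are made up):
-- Both return true: Feb-Jun 2023 lies within calendar year 2023
SELECT daterange(DATE '2023-02-01', DATE '2023-06-30', '[]')
         <@ daterange(DATE '2023-01-01', DATE '2023-12-31', '[]') AS with_ranges,
       (DATE '2023-02-01' >= DATE '2023-01-01'
        AND DATE '2023-06-30' <= DATE '2023-12-31')               AS with_comparisons;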
My proposed variant of your schema is:
CREATE TABLE public.site
(
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE public.treatment_contract
(
site_id integer NOT NULL,
start_date date NOT NULL,
end_date date NOT NULL,
PRIMARY KEY (site_id, start_date, end_date),
CONSTRAINT fk_treatment_contract__site FOREIGN KEY (site_id)
REFERENCES public.site (id) MATCH SIMPLE
);
CREATE TABLE public.machine
(
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE public.machine_deployment
(
site_id integer NOT NULL,
machine_id integer NOT NULL,
contract_start_date date NOT NULL,
contract_end_date date NOT NULL,
start_date date NOT NULL,
end_date date NOT NULL,
PRIMARY KEY (site_id, machine_id, start_date, end_date),
CONSTRAINT fk_machine_deployment__machine FOREIGN KEY (machine_id)
REFERENCES public.machine (id) MATCH SIMPLE,
CONSTRAINT fk_machine_deployment__treatment_contract FOREIGN KEY (site_id, contract_start_date, contract_end_date)
REFERENCES public.treatment_contract(site_id, start_date, end_date),
CONSTRAINT chk_machine_deployment_period CHECK (start_date <= end_date),
CONSTRAINT chk_machine_deployment_in_contract CHECK (pg_catalog.daterange(start_date, end_date, '[]') <@ pg_catalog.daterange(contract_start_date, contract_end_date, '[]'))
);
CREATE TABLE public.treatment_datapoint
(
site_id integer NOT NULL,
contract_start_date date NOT NULL,
contract_end_date date NOT NULL,
time_stamp timestamp NOT NULL,
PRIMARY KEY (site_id, time_stamp),
CONSTRAINT fk_treatment_datapoint__treatment_contract FOREIGN KEY (site_id, contract_start_date, contract_end_date)
REFERENCES public.treatment_contract(site_id, start_date, end_date),
CONSTRAINT chk_datapoint_in_contract CHECK (time_stamp::date <@ pg_catalog.daterange(contract_start_date, contract_end_date, '[]'))
);
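As a quick sanity check of the constraints, with made-up sample data (site 1, machine 10 and the 2023 contract are purely illustrative):
INSERT INTO public.site (id) VALUES (1);
INSERT INTO public.machine (id) VALUES (10);
INSERT INTO public.treatment_contract (site_id, start_date, end_date)
VALUES (1, '2023-01-01', '2023-12-31');
-- Accepted: the deployment period lies inside the contract period
INSERT INTO public.machine_deployment
       (site_id, machine_id, contract_start_date, contract_end_date, start_date, end_date)
VALUES (1, 10, '2023-01-01', '2023-12-31', '2023-02-01', '2023-06-30');
-- Rejected by chk_machine_deployment_in_contract: the deployment outlives the contract
INSERT INTO public.machine_deployment
       (site_id, machine_id, contract_start_date, contract_end_date, start_date, end_date)
VALUES (1, 10, '2023-01-01', '2023-12-31', '2023-06-01', '2024-01-15');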
I have a trigger function that automatically creates child tables based on a date column from the parent table (table 1). However, I need to modify it to do that based on a date column from another table (table 2)!
Is this possible at all? I have a foreign key in table 1 which is linked to an id column in table 2.
I have searched the internet but mostly found scripts for the task I have already solved (date column in the parent table, not in another table).
EXAMPLE: Make monthly partitions of table invoice_details based on invoice_date in table invoice (foreign key invoice_details.invoice_id -> invoice.invoice_id)
CREATE TABLE public.invoice_details
(
id integer NOT NULL,
invoice_id integer NOT NULL,
charge_type integer,
charge_amount numeric(15,5),
charge_tax numeric(15,5),
charge_status character varying COLLATE pg_catalog."default"
)
TABLESPACE pg_default;
CREATE TABLE public.invoice
(
invoice_id integer NOT NULL,
customer character varying COLLATE pg_catalog."default",
invoice_date date NOT NULL
);
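For what it's worth, one way such a trigger can look is sketched below. This is not the poster's existing function: the names are made up, it uses old-style inheritance partitioning, and it looks the partitioning date up on the referenced invoice row before routing the insert.
CREATE OR REPLACE FUNCTION public.invoice_details_partition_insert()
RETURNS trigger AS
$$
DECLARE
    inv_date date;
    child    text;
BEGIN
    -- Look up the partitioning date on the referenced invoice row
    SELECT invoice_date INTO inv_date
    FROM public.invoice
    WHERE invoice_id = NEW.invoice_id;

    IF inv_date IS NULL THEN
        RAISE EXCEPTION 'No invoice found for invoice_id %', NEW.invoice_id;
    END IF;

    child := 'invoice_details_' || to_char(inv_date, 'YYYY_MM');

    -- Create the monthly child table on first use
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS public.%I () INHERITS (public.invoice_details)',
        child);

    -- Route the row into the child table and skip the insert into the parent
    EXECUTE format('INSERT INTO public.%I SELECT ($1).*', child) USING NEW;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER invoice_details_partition_insert
BEFORE INSERT ON public.invoice_details
FOR EACH ROW EXECUTE PROCEDURE public.invoice_details_partition_insert();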
I'm creating a database of assessments for courses using PostgreSQL.
I'd like assessment names to be unique within the course, but two courses can have assessments with the same name.
-- assessment contains the different assignments & labs that
-- students may submit their code to.
CREATE TABLE assessment (
id SERIAL PRIMARY KEY,
name VARCHAR(255) UNIQUE NOT NULL,
comments TEXT NOT NULL,
type ASSESSMENT_TYPE NOT NULL,
course_id SERIAL NOT NULL,
FOREIGN KEY (course_id) REFERENCES courses(id)
);
-- courses contains the information about a course. Since
-- the same course can run multiple times, a single course
-- is uniquely identified by (course_code, year, period)
CREATE TABLE courses (
id SERIAL PRIMARY KEY,
name VARCHAR(255) UNIQUE NOT NULL, -- Unique within all courses. Wrong!
course_code VARCHAR(20) NOT NULL,
period PERIOD NOT NULL,
year INTEGER NOT NULL
);
Two main points:
Can I do this without changing the schema?
If so, is there a more idiomatic solution that may include schema changes?
1. Can I do this without changing the schema?
No, since you have multiple issues here.
Your assessments are globally unique by name and not within a course.
assessment.course_id has its own sequence which is useless (SERIAL is just INTEGER + SEQUENCE)
Table courses defines a column data type that does not exist: PERIOD (at least not up to version 11)
2. If so, is there a more idiomatic solution that may include schema changes?
A modified schema that should do what you want would look like the following:
CREATE TABLE courses (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
course_code VARCHAR(20) NOT NULL,
period tstzrange NOT NULL
);
-- the following is required to build the proper unique constraint...
CREATE EXTENSION IF NOT EXISTS btree_gist;
-- the unique constraint: no two courses with same name at any point in time
ALTER TABLE courses
ADD CONSTRAINT idx_unique_courses
EXCLUDE USING GIST (name WITH =, period WITH &&);
CREATE TABLE assessment (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
comments TEXT NOT NULL,
type ASSESSMENT_TYPE NOT NULL,
course_id INTEGER NOT NULL REFERENCES courses(id),
UNIQUE (course_id, name)
);
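A few made-up rows show how the exclusion constraint behaves (the course name, code and dates are purely illustrative):
INSERT INTO courses (name, course_code, period)
VALUES ('Databases', 'DB101', tstzrange('2019-02-01', '2019-06-30'));
-- Rejected: same course name in an overlapping period
INSERT INTO courses (name, course_code, period)
VALUES ('Databases', 'DB101', tstzrange('2019-05-01', '2019-09-30'));
-- Accepted: same name again, but the periods do not overlap
INSERT INTO courses (name, course_code, period)
VALUES ('Databases', 'DB101', tstzrange('2019-07-01', '2019-11-30'));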
I want to add revisioning for records in an existing application which stores data in a PostgreSQL database. I read about strategies e.g. in this question, this question and this blog post.
I think that the approach of creating a second history table, which will rarely be queried, will work best. However, I do have some practical problems. Let's say this is the table I want to add revision control to:
create table people(
id serial not null primary key,
name varchar(255) not null
);
For this very simple table my history table could look like this:
create table people_history(
peopleId int not null references people(id) on delete cascade on update restrict,
revision int not null,
revisionTimestamp timestamptz not null default current_timestamp,
name character varying(255) not null,
primary key(peopleId, revision)
);
And this brings up the first problems:
How do I generate the revision number?
Of course I could create a sequence from which I request revision numbers, which would be easy. However, that would leave large gaps between revisions per person, as many people share the same sequence, and it would feel more natural if the revision numbers were ascending numbers without gaps per person.
So I am tempted to find my revision number with select max(revision)+1 from ... where peopleId=.... However, that could lead to a race condition if two threads ask for the next revision number and try to insert. That is very unlikely, I have to admit (especially in my case, where only few updates happen anyway), and it would not corrupt data, since the duplicate primary key would cause a transaction rollback, but it is not pretty either. I wonder if there is a prettier solution.
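A single-statement variant of that max()+1 approach might look like the sketch below (person id 42 is made up; under concurrent writers the primary key still acts as the final arbiter, so the caller should be prepared to retry on conflict):
INSERT INTO people_history (peopleId, revision, name)
SELECT p.id,
       COALESCE(MAX(h.revision), 0) + 1,
       p.name
FROM people p
LEFT JOIN people_history h ON h.peopleId = p.id
WHERE p.id = 42
GROUP BY p.id, p.name;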
How do I insert data into the history table?
Two ways come to mind: manually, on every statement that updates the main table, or using a trigger. A trigger sounds less error-prone, as it is less likely that I forget about a query somewhere. However, I cannot communicate to the application exactly which revision number was just created, can I? So if I want to create a couple of event tables like this:
create table peopleUserEditEvent (
peopleId int not null,
revision int not null,
userId int not null references users(id) on delete set null on update restrict,
comment text not null default '',
primary key(peopleId, revision),
foreign key (peopleId, revision) references people_history
);
That lists some metadata for a revision, explaining why the change was made. In this case a user with a specific ID edited the data and might have supplied a comment.
In another case (and another event table) a cronjob might have changed something and documents the event which probably has no userId and no comment but other metadata.
To add that event data I need the revision id, and if the revision id was created by a trigger it will be difficult to find out (or is there a practical way to do so?).
Well, you need one revisioning strategy for all the tables and columns you have. You can create a single table to maintain all changes and insert into it any time you run an UPDATE, INSERT or DELETE statement. Maybe this example of the iDempiere framework's change log can help you:
CREATE TABLE adempiere.ad_changelog (
ad_changelog_id NUMERIC(10,0) NOT NULL,
ad_session_id NUMERIC(10,0) NOT NULL,
ad_table_id NUMERIC(10,0) NOT NULL,
ad_column_id NUMERIC(10,0) NOT NULL,
isactive CHAR(1) DEFAULT 'Y'::bpchar NOT NULL,
created TIMESTAMP WITHOUT TIME ZONE DEFAULT now() NOT NULL,
createdby NUMERIC(10,0) NOT NULL,
updated TIMESTAMP WITHOUT TIME ZONE DEFAULT now() NOT NULL,
updatedby NUMERIC(10,0) NOT NULL,
record_id NUMERIC(10,0) NOT NULL,
oldvalue VARCHAR(2000),
newvalue VARCHAR(2000),
undo CHAR(1),
redo CHAR(1),
iscustomization CHAR(1) DEFAULT 'N'::bpchar NOT NULL,
description VARCHAR(255),
ad_changelog_uu VARCHAR(36) DEFAULT NULL::character varying,
CONSTRAINT adcolumn_adchangelog FOREIGN KEY (ad_column_id)
REFERENCES adempiere.ad_column(ad_column_id)
MATCH PARTIAL
ON DELETE CASCADE
ON UPDATE NO ACTION
DEFERRABLE
INITIALLY DEFERRED,
CONSTRAINT adsession_adchangelog FOREIGN KEY (ad_session_id)
REFERENCES adempiere.ad_session(ad_session_id)
MATCH PARTIAL
ON DELETE NO ACTION
ON UPDATE NO ACTION
DEFERRABLE
INITIALLY DEFERRED,
CONSTRAINT adtable_adchangelog FOREIGN KEY (ad_table_id)
REFERENCES adempiere.ad_table(ad_table_id)
MATCH PARTIAL
ON DELETE CASCADE
ON UPDATE NO ACTION
DEFERRABLE
INITIALLY DEFERRED
)
WITH (oids = false);
CREATE INDEX ad_changelog_speed ON adempiere.ad_changelog
USING btree (ad_table_id, record_id);
CREATE UNIQUE INDEX ad_changelog_uu_idx ON adempiere.ad_changelog
USING btree (ad_changelog_uu COLLATE pg_catalog."default");
I am learning PostgreSQL and databases in general. I have a simple statement like this and I want to understand what it does:
CREATE TABLE adempiere.c_mom(
c_mom_id NUMERIC(10,0) NOT NULL,
isactive character(1) DEFAULT 'Y'::bpchar NOT NULL,
start_date date NOT NULL,
start_time timestamp without time zone NOT NULL,
end_time timestamp without time zone NOT NULL,
CONSTRAINT c_mom_pkey PRIMARY KEY (c_mom_id)
);
So after I execute this I get:
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "c_mom_pkey" for table "c_mom"
Now I know that my PK is c_mom_id, but what is the purpose of creating an implicit index for it under the name c_mom_pkey?
What does DEFAULT 'Y'::bpchar mean, and in general, what does :: do in PostgreSQL?
Thank you
The :: notation is a PostgreSQL-specific type cast notation, in this case to type bpchar (blank-padded char).
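For example, both forms below write the same cast:
-- :: is PostgreSQL's shorthand for the standard CAST syntax
SELECT 'Y'::bpchar          AS pg_style_cast,
       CAST('Y' AS bpchar)  AS standard_cast;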
An index is created to back the primary key so that it can be enforced efficiently. If there weren't an index to back it, each INSERT statement would have to scan the whole table just to figure out whether the insertion would create a duplicate key or not. Using an index speeds that up (dramatically so if the table is large).
This is not PostgreSQL specific. A lot of relational databases will create unique indexes to back primary keys.
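If you are curious, the implicit index shows up in the catalogs like any other index, for example:
-- Lists the automatically created index that backs the primary key
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'adempiere'
  AND tablename  = 'c_mom';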