What would be the best strategy for partitioning a 'users' table that looks like this:
create table users
(
id uuid default gen_random_uuid() primary key,
username varchar not null unique,
password varchar not null,
email varchar not null unique,
created_at timestamp with time zone default now(),
updated_at timestamp with time zone default now()
)
The requirements for the table are:
support UPDATE of username, password, email, updated_at; and DELETE of records.
foreign keys referencing it, including the ones with on delete cascade clause.
the table has unique constraints.
NOTE: the last 2 requirements can be dropped by implementing the logic at the application level.
The objective is to make the standard authentication flow queries faster.
This table started out as short-term storage for meter data before it was validated and added to some long-term storage tables.
It turns out the clients want to keep this data for a long time, since we saved it, and it is growing fast.
create table metering_meterreading
(
id bigserial not null primary key,
created_at timestamp with time zone not null,
updated_at timestamp with time zone not null,
timestamp timestamp with time zone not null, -- BTREE index
value numeric(15, 3) not null,
meter_device_id uuid not null, -- FK to meter_device, BTREE index
series_id uuid not null, -- FK to series, BTREE index
organization_id uuid not null -- FK to org, BTREE index
);
I am planning on dropping the primary key, since (org_id, meter_device_id, series_id, timestamp) makes a row unique. The id column was just added by my ORM (Django) and I didn't care when we started.
But since I pretty much always filter on organization, meter_device, and series to get a range of time series data, I am wondering if it would be more efficient to have a multicolumn index on (organization_id, meter_device_id, series_id, timestamp) instead of the separate indexes.
I read somewhere that if I had a range it should be the rightmost in the index.
This is still not a super efficient table for time series data, since it will grow large, but I am planning on fixing that by partitioning on range, or maybe even using Timescale. But before partitioning I would like it to be as efficient as possible to look up data in it.
I also saw an example somewhere that used a separate table to identify the metric:
create table metric
(
id bigserial primary key,
organization_id uuid not null,
meter_device_id uuid not null,
series_id uuid not null,
unique (organization_id, meter_device_id, series_id)
);
create table metering_meterreading
(
metric_id bigint not null, -- FK to metric, BTREE index
timestamp timestamp with time zone not null, -- BTREE index
value numeric(15, 3) not null,
created_at timestamp with time zone not null,
updated_at timestamp with time zone not null
);
But I am not sure if that is actually better than just putting them all in one table. It might impact the ingestion rate, since there is another table involved now.
If (org_id, meter_device_id, series_id, timestamp) uniquely determine a table row, you need to use a multi-column primary key over all of them. So you automatically have a 4-column index on these columns. Just make sure that timestamp is last in the list, then that index will support your query ideally.
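As a rough sketch of that advice, assuming the table from the question, that no duplicate rows exist yet, and that the current primary key constraint carries PostgreSQL's default name metering_meterreading_pkey (an assumption; check the real name first), the key could be swapped as follows. The UUIDs and dates in the query are placeholders.

```sql
-- Assumption: the existing primary key constraint has the default name.
alter table metering_meterreading
    drop constraint metering_meterreading_pkey;

-- Equality columns first, the range column ("timestamp") last.
alter table metering_meterreading
    add primary key (organization_id, meter_device_id, series_id, "timestamp");

-- A typical lookup that the primary key index can serve (placeholder values).
select "timestamp", value
from metering_meterreading
where organization_id = '00000000-0000-0000-0000-000000000001'
  and meter_device_id = '00000000-0000-0000-0000-000000000002'
  and series_id = '00000000-0000-0000-0000-000000000003'
  and "timestamp" >= '2024-01-01'
  and "timestamp" < '2024-02-01'
order by "timestamp";
```

Whether the separate single-column indexes can then be dropped depends on the other queries that run against the table.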
I'm having trouble modeling data that has a parent table with a start and end date in its primary key, and a child table with a timestamp in its primary key that must fall within the range of the parent table's start and end dates. In fact, this problem is nested, as that parent table is actually the child to another table - a "grandparent" table - which also has start and end dates in its primary key; the parent table's start and end dates must likewise fit within the range of the grandparent table's start and end dates.
For background, I work at a water treatment company. We treat water by deploying water treatment machines to various sites as part of treatment contracts. In more specific terms:
There are various sites that need their water treated.
The sites create contracts with us so that we can treat water. The contracts always have a known start date, but they can run either for a specific period of time or indefinitely, so the end date can be known or unknown (hence NULLable end dates).
A single water treatment machine is deployed to a site at a time in order to fulfill contract requirements. If a machine breaks down in the middle of a contract and it needs to be replaced, we replace it with another machine under the same contract.
While machines are treating water under a contract, we collect treatment data from them.
Thus, we have to keep track of sites, treatment_contracts, machine_deployments, machines, and treatment_datapoints. A site can have multiple treatment_contracts, a treatment_contract can have multiple machine_deployments and multiple treatment_datapoints, and a machine can have multiple machine_deployments.
So a simplified version of the data I'm trying to model is this:
CREATE TABLE public.site
(
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE public.treatment_contract
(
site_id integer NOT NULL,
start_date date NOT NULL,
end_date date,
PRIMARY KEY (site_id, start_date, end_date),
CONSTRAINT fk_treatment_contract__site FOREIGN KEY (site_id)
REFERENCES public.site (id) MATCH SIMPLE
);
CREATE TABLE public.machine_deployment
(
site_id integer NOT NULL,
machine_id integer NOT NULL,
start_date date NOT NULL,
end_date date,
PRIMARY KEY (site_id, machine_id, start_date, end_date),
CONSTRAINT fk_machine_deployment__machine FOREIGN KEY (machine_id)
REFERENCES public.machine (id) MATCH SIMPLE,
<some provision to require that machine_deployment.start_date and machine_deployment.end_date are between treatment_contract.start_date and treatment_contract.end_date, and that machine_deployment.site_id matches treatment_contract.site_id>
);
CREATE TABLE public.treatment_datapoint
(
site_id integer NOT NULL,
time_stamp timestamp NOT NULL,
PRIMARY KEY (site_id, time_stamp),
<some provision to require time_stamp is between treatment_contract.start_date and treatment_contract.end_date, and that treatment_datapoint.site_id matches treatment_contract.site_id>
);
CREATE TABLE public.machine
(
id integer NOT NULL,
PRIMARY KEY (id)
);
I'm not sure how to proceed because PostgreSQL can only enforce foreign key relationships where there is an exact match between all foreign key fields - there is no provision in foreign key constraints that can enforce something like child.timestamp BETWEEN parent.start AND parent.end. treatment_datapoint should have a foreign key to treatment_contract, as a treatment_datapoint without a treatment_contract would make no sense, but there seems to be no way to enforce this foreign key relationship. Is the answer just to use triggers instead? I've always been told to avoid using triggers to define parent:child relationships, as that's what foreign keys are for.
Either way, though, there's got to be a way to model this, as I can't imagine that I'm the only one who's ever needed to enforce that a date within a child table is within a range defined in the parent table.
In short: to enforce a relationship where there is no foreign key, make one.
For your model to work, you need a foreign key to treatment_contract. Since the primary key of treatment_contract consists of site_id, start_date, and end_date, you have to add contract_start_date and contract_end_date to the tables that need to reference the contract, namely machine_deployment and treatment_datapoint.
To make your life easier, I'd advise against using NULL for the not-yet-known end date of a contract or machine deployment. Instead, I would store a "magic" value that stands for "infinity" (PostgreSQL's date type accepts the special value 'infinity'). This is not required, but it makes the checks simpler.
Also I'd add a check constraint to ensure a contract ends after it starts.
And lastly you can use a check constraint to validate deployment start and end and datapoint timestamp.
In the example below I use daterange and range operators in my checks. This is for convenience. You can achieve the same result with comparison operators (<, <=, ...).
My proposed variant of your schema is:
CREATE TABLE public.site
(
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE public.treatment_contract
(
site_id integer NOT NULL,
start_date date NOT NULL,
end_date date NOT NULL,
PRIMARY KEY (site_id, start_date, end_date),
CONSTRAINT fk_treatment_contract__site FOREIGN KEY (site_id)
REFERENCES public.site (id) MATCH SIMPLE
);
CREATE TABLE public.machine
(
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE public.machine_deployment
(
site_id integer NOT NULL,
machine_id integer NOT NULL,
contract_start_date date NOT NULL,
contract_end_date date NOT NULL,
start_date date NOT NULL,
end_date date NOT NULL,
PRIMARY KEY (site_id, machine_id, start_date, end_date),
CONSTRAINT fk_machine_deployment__machine FOREIGN KEY (machine_id)
REFERENCES public.machine (id) MATCH SIMPLE,
CONSTRAINT fk_machine_deployment__treatment_contract FOREIGN KEY (site_id, contract_start_date, contract_end_date)
REFERENCES public.treatment_contract(site_id, start_date, end_date),
CONSTRAINT chk_machine_deployment_period CHECK (start_date <= end_date),
CONSTRAINT chk_machine_deployment_in_contract CHECK (pg_catalog.daterange(start_date, end_date, '[]') <@ pg_catalog.daterange(contract_start_date, contract_end_date, '[]'))
);
CREATE TABLE public.treatment_datapoint
(
site_id integer NOT NULL,
contract_start_date date NOT NULL,
contract_end_date date NOT NULL,
time_stamp timestamp NOT NULL,
PRIMARY KEY (site_id, time_stamp),
CONSTRAINT fk_treatment_datapoint__treatment_contract FOREIGN KEY (site_id, contract_start_date, contract_end_date)
REFERENCES public.treatment_contract(site_id, start_date, end_date),
CONSTRAINT chk_datapoint_in_contract CHECK (time_stamp::date <@ pg_catalog.daterange(contract_start_date, contract_end_date, '[]'))
);
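To illustrate how these constraints are meant to behave, here is a small sketch with made-up sample rows. It assumes the tables above were created with the containment checks written using the range operator <@ ("is contained by"), and it uses 'infinity' as the open-ended contract end date; all IDs and dates are hypothetical.

```sql
-- Hypothetical sample data: one site, one machine, one open-ended contract.
INSERT INTO public.site (id) VALUES (1);
INSERT INTO public.machine (id) VALUES (10);
INSERT INTO public.treatment_contract (site_id, start_date, end_date)
VALUES (1, DATE '2024-01-01', DATE 'infinity');

-- Should be accepted: the deployment lies inside the contract period.
INSERT INTO public.machine_deployment
    (site_id, machine_id, contract_start_date, contract_end_date, start_date, end_date)
VALUES
    (1, 10, DATE '2024-01-01', DATE 'infinity', DATE '2024-02-01', DATE '2024-03-01');

-- Should be rejected by the containment CHECK: the deployment starts
-- before the contract does.
INSERT INTO public.machine_deployment
    (site_id, machine_id, contract_start_date, contract_end_date, start_date, end_date)
VALUES
    (1, 10, DATE '2024-01-01', DATE 'infinity', DATE '2023-12-01', DATE '2024-03-01');

-- Should be rejected by the foreign key: no contract with these dates exists.
INSERT INTO public.treatment_datapoint
    (site_id, contract_start_date, contract_end_date, time_stamp)
VALUES
    (1, DATE '2020-01-01', DATE '2020-12-31', TIMESTAMP '2020-06-01 12:00:00');
```

The foreign key ties each child row to one specific contract, and the CHECK then only has to compare columns within the same row, which is something a check constraint is allowed to do.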
I'm creating a database of assessments for courses using PostgreSQL.
I'd like assessment names to be unique within the course, but two courses can have assessments with the same name.
-- assessment contains the different assignments & labs that
-- students may submit their code to.
CREATE TABLE assessment (
id SERIAL PRIMARY KEY,
name VARCHAR(255) UNIQUE NOT NULL,
comments TEXT NOT NULL,
type ASSESSMENT_TYPE NOT NULL,
course_id SERIAL NOT NULL,
FOREIGN KEY (course_id) REFERENCES courses(id)
);
-- courses contains the information about a course. Since
-- the same course can run multiple times, a single course
-- is uniquely identified by (course_code, year, period)
CREATE TABLE courses (
id SERIAL PRIMARY KEY,
name VARCHAR(255) UNIQUE NOT NULL, -- Unique within all courses. Wrong!
course_code VARCHAR(20) NOT NULL,
period PERIOD NOT NULL,
year INTEGER NOT NULL
);
Two main points:
Can I do this without changing the schema?
If so, is there a more idiomatic solution that may include schema changes?
1. Can I do this without changing the schema?
No, since you have multiple issues here.
Your assessments are globally unique by name and not within a course.
assessment.course_id has its own sequence which is useless (SERIAL is just INTEGER + SEQUENCE)
Table courses defines a column data type that does not exist: PERIOD (at least not up to version 11)
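To make the SERIAL point concrete, this is roughly what a SERIAL column expands to, shown on a hypothetical table t with a column c (names chosen only for illustration):

```sql
-- Roughly equivalent to: CREATE TABLE t (c SERIAL);
CREATE SEQUENCE t_c_seq;
CREATE TABLE t (
    c integer NOT NULL DEFAULT nextval('t_c_seq')
);
-- Ties the sequence's lifetime to the column.
ALTER SEQUENCE t_c_seq OWNED BY t.c;
```

A foreign key column such as assessment.course_id gets its values from the referenced table, so the extra sequence and default serve no purpose there; a plain INTEGER is enough.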
2. If so, is there a more idiomatic solution that may include schema changes?
A modified schema that should do what you want would look like the following:
CREATE TABLE courses (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
course_code VARCHAR(20) NOT NULL,
period tstzrange NOT NULL
);
-- the following is required to build the proper unique constraint...
CREATE EXTENSION IF NOT EXISTS btree_gist;
-- the unique constraint: no two courses with same name at any point in time
ALTER TABLE courses
ADD CONSTRAINT idx_unique_courses
EXCLUDE USING GIST (name WITH =, period WITH &&);
CREATE TABLE assessment (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
comments TEXT NOT NULL,
type ASSESSMENT_TYPE NOT NULL,
course_id INTEGER NOT NULL REFERENCES courses(id),
UNIQUE (course_id, name)
);
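A short sketch of how the UNIQUE (course_id, name) constraint is expected to behave, using made-up rows. It assumes the ASSESSMENT_TYPE type already exists and has a value 'lab' (hypothetical; the source does not show its definition), and it sets the course ids explicitly only to keep the example readable.

```sql
-- Hypothetical sample courses with explicit ids.
INSERT INTO courses (id, name, course_code, period) VALUES
    (1, 'Databases',  'CS101', tstzrange('2024-01-01', '2024-06-01')),
    (2, 'Algorithms', 'CS102', tstzrange('2024-01-01', '2024-06-01'));

-- Should be accepted: the same assessment name in two different courses.
INSERT INTO assessment (name, comments, type, course_id) VALUES
    ('Lab 1', '', 'lab', 1),
    ('Lab 1', '', 'lab', 2);

-- Should be rejected with a unique violation: 'Lab 1' already exists
-- in course 1.
INSERT INTO assessment (name, comments, type, course_id) VALUES
    ('Lab 1', 'duplicate', 'lab', 1);
```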
I'm trying to implement an Audit table design in PostgreSQL, where I have different types of user id's that can be audited.
Let's say I have a table named admins (which belong to an organization), and table superadmins (which don't).
CREATE TABLE example.organizations (
id SERIAL UNIQUE,
company_name varchar(50) NOT NULL UNIQUE,
phone varchar(20) NOT NULL check (phone ~ '^[0-9]+$')
);
and an example of a potential admin design
CREATE TABLE example.admins (
id serial primary key,
admin_type varchar not null,
-- ... shared data
unique (id, admin_type), -- required as the target of the composite foreign keys below
check (admin_type in ('super_admins', 'regular_admins'))
);
CREATE TABLE example.regular_admins (
id integer primary key,
admin_type varchar not null default 'regular_admins',
organization_id integer references example.organizations(id),
-- ... other regular admin fields
foreign key (id, admin_type) references example.admins (id, admin_type),
check (admin_type = 'regular_admins')
);
CREATE TABLE example.super_admins (
id integer primary key,
admin_type varchar not null default 'super_admins',
-- ... other super admin fields
foreign key (id, admin_type) references example.admins (id, admin_type),
check (admin_type = 'super_admins')
);
Now an audit table
CREATE TABLE audit.organizations (
audit_timestamp timestamp not null default now(),
operation text,
admin_id integer primary key,
before jsonb,
after jsonb
);
This calls for inheritance or polymorphism at some level, but I'm curious about how to design it. I've heard that using PostgreSQL's inheritance functionality is not always a great way to go, although I'm finding it to fit this use case.
I'll need to be able to reference a single admin id in the trigger that updates the audit table, and it would be nice to be able to get the admin information when selecting from the audit table without using multiple queries.
Would it be better to use PostgreSQL inheritance or are there other ideas I haven't considered?
I wouldn't say that it calls for inheritance or polymorphism. Admins and superadmins are both types of user, whose only difference is that the former belong to an organization. You can represent this with a single table and a nullable foreign key. No need to overcomplicate matters. Especially if you're using a serial as your primary key type: bad things happen if you confuse admin #2 for superadmin #2.
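A minimal sketch of that single-table idea, reusing example.organizations from the question. The column choices are assumptions for illustration, and unlike the question's audit table, admin_id is not the primary key here, so one admin can have many audit rows.

```sql
CREATE TABLE example.admins (
    id serial PRIMARY KEY,
    -- NULL organization_id marks a superadmin; otherwise a regular admin.
    organization_id integer REFERENCES example.organizations (id)
    -- ... other shared fields would go here
);

CREATE TABLE audit.organizations (
    audit_timestamp timestamp NOT NULL DEFAULT now(),
    operation text,
    admin_id integer NOT NULL REFERENCES example.admins (id),
    before jsonb,
    after jsonb
);

-- Audit rows together with the admin's details in a single query.
SELECT a.audit_timestamp, a.operation, ad.id, ad.organization_id
FROM audit.organizations AS a
JOIN example.admins AS ad ON ad.id = a.admin_id;
```

With one admins table there is only one id space, so the audit trigger can store a single admin_id without also having to record which kind of admin it refers to.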
Suppose I need to extend a table from mysql.sql with an extra field. Take the users table, which by default has the following three fields:
username varchar (250) PRIMARY KEY,
password text NOT NULL,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
If I need to add an extra field here, like this
username varchar (250) PRIMARY KEY,
password text NOT NULL,
sex tinyint NOT NULL, -- note: extra field added here
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
What do I need to do in ejabberd for this to work?
Thank you!
You can simply alter your schema to add the extra needed field. It will not be used by ejabberd, but should not cause any issue.
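For an existing MySQL database, a sketch of that change could look like the statement below. The DEFAULT 0 is an assumption added here: since ejabberd's own INSERT statements will not mention the new column, a NOT NULL column without a default may cause those inserts to fail when the server runs in strict SQL mode.

```sql
-- Adds the extra column; the default keeps inserts that omit it working.
ALTER TABLE users
    ADD COLUMN sex tinyint NOT NULL DEFAULT 0;
```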