Prevent date overlap postgresql - postgresql

Is there a way to add a constraint to a PostgreSQL table to prevent date ranges from overlapping? For example, I have a table called workouts that has date columns week_start and week_end. I want to make sure that no week_start to week_end range overlaps with any existing range. However, the end date of one range may be the same as the start date of another.
Can someone help?
Thanks in advance!

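You can do this with an exclusion constraint, using the overlap operator (&&) for the daterange type. The two-argument daterange() constructor builds a half-open range that includes week_start and excludes week_end, so one range may end on the date the next one starts:
CREATE TABLE workouts (
week_start DATE,
week_end DATE,
EXCLUDE USING gist (daterange(week_start, week_end) WITH &&)
);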
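To see the constraint in action, here is a small sketch that assumes the workouts table defined above already exists and is empty. The dates are made up for illustration.

```sql
-- Two back-to-back weeks: the second starts on the day the first ends.
INSERT INTO workouts (week_start, week_end) VALUES ('2021-01-04', '2021-01-11');
INSERT INTO workouts (week_start, week_end) VALUES ('2021-01-11', '2021-01-18');
-- Both inserts are accepted: [2021-01-04, 2021-01-11) and
-- [2021-01-11, 2021-01-18) have no day in common.

-- This range overlaps the first two rows, so it is expected to fail
-- with an exclusion constraint violation.
INSERT INTO workouts (week_start, week_end) VALUES ('2021-01-08', '2021-01-15');
```

Run the last statement on its own (outside a transaction that you want to keep), since the error aborts the surrounding transaction.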

You can add an EXCLUDE table constraint to your table definition and then work with ranges to detect overlaps. This works especially well if you can change your table definition to turn the columns week_start and week_end into a single range column, say weeks.
CREATE TABLE workouts (
...
weeks daterange,
EXCLUDE USING gist (weeks WITH &&)
);
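As a rough sketch of how a single range column might be used, the following assumes a hypothetical table workouts_by_range with a daterange column (the table name and the sample dates are illustrations, not part of the original schema):

```sql
-- Hypothetical table holding each week as one daterange value.
CREATE TABLE workouts_by_range (
    weeks daterange NOT NULL,
    EXCLUDE USING gist (weeks WITH &&)
);

-- A range can be written as a literal or built with the constructor.
INSERT INTO workouts_by_range VALUES ('[2021-01-04,2021-01-11)');
INSERT INTO workouts_by_range VALUES (daterange('2021-01-11', '2021-01-18'));

-- lower() and upper() recover the original start and end dates.
SELECT lower(weeks) AS week_start, upper(weeks) AS week_end
FROM workouts_by_range;

-- Expected to fail: this range overlaps the second row.
INSERT INTO workouts_by_range VALUES ('[2021-01-15,2021-01-20)');
```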

Related

PostgreSQL indexed columns choice

I have these two tables :
CREATE TABLE ref_dates(
date_id SERIAL PRIMARY KEY,
month int NOT NULL,
year int NOT NULL,
month_name CHAR(255)
);
CREATE TABLE util_kpi(
kpi_id SERIAL PRIMARY KEY,
kpi_description int NOT NULL,
kpi_value float,
date_id int NOT NULL,
dInsertion timestamp default CURRENT_TIMESTAMP,
CONSTRAINT fk_ref_kpi FOREIGN KEY (date_id) REFERENCES ref_dates(date_id)
);
The queries I typically run are:
Selecting kpi_description and kpi_value for a specified month and year:
SELECT kpi_description, kpi_value FROM util_kpi u JOIN ref_dates r ON u.date_id = r.date_id WHERE month=X AND year=XXXX
Selecting kpi_description and kpi_value for a specified kpi_description, month and year:
SELECT kpi_description, kpi_value FROM util_kpi u JOIN ref_dates r ON u.date_id = r.date_id WHERE month=X AND year=XXXX AND kpi_description='XXXXXXXXXXX'
I thought about creating these indexes:
CREATE INDEX idx_ref_date_year_month ON ref_dates(year, month);
CREATE INDEX idx_util_kpi_date ON util_kpi(date_id);
First, I want to know whether creating these indexes is a good idea.
Second, I was wondering whether it's a good idea to add kpi_description to the index on the util_kpi table.
Can you give me your opinion?
Regards
It's not possible to give an exact answer without looking at the data, so I can only offer an opinion.
A. ref_dates
This table looks very similar to a date dimension in ROLAP schemas.
So the first thing I would do is change date_id from SERIAL to either:
the DATE datatype, or
a "smart integer": an integer in the form YYYYMMDD, e.g. 20210430. It may look strange, but such identifiers are not uncommon in date dimensions.
The main point of using this form is that date_id in fact tables becomes informative even without joining to the date dimension.
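For illustration, a "smart integer" key can be derived from a date, and turned back into one, with standard PostgreSQL formatting functions. This is only a sketch with a sample date; the column aliases are arbitrary.

```sql
-- Date to YYYYMMDD integer key.
SELECT to_char(DATE '2021-04-30', 'YYYYMMDD')::integer AS date_id;  -- 20210430

-- YYYYMMDD integer key back to a date.
SELECT to_date(20210430::text, 'YYYYMMDD') AS full_date;            -- 2021-04-30
```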
B. util_kpi
I suppose that:
ref_dates is a date dimension, so it holds about 365 rows per year. It could be populated once for 20-30 years into the future and still would not be really big.
util_kpi is a fact table, which is expected to be really big: millions of records or more.
For util_kpi I expected the id of a time dimension but did not find one, so I assume no hourly stats are planned yet.
I see util_kpi.dInsertion, which I suppose is planned to be used as the time dimension. I would consider extracting it into a time_id that holds hours, minutes and seconds (if milliseconds are not needed).
C. Indexing
ref_dates: it does not matter much how you index ref_dates because it's a relatively small table. A unique index on date_id with an INCLUDE clause covering all the other fields would probably be best. Don't create individual indexes on fields with low selectivity like year or month: they will not help much, though they will not do much harm either.
util_kpi: you need an index on date_id (as for any foreign keys to other dimension tables that appear in the future).
These are my thoughts, based on what I have assumed.
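The indexing suggestions above could be sketched roughly as follows, using the table definitions from the question. The index name idx_ref_dates_covering and the sample month and year are made up for illustration, and the INCLUDE clause assumes PostgreSQL 11 or later.

```sql
-- Covering unique index on the small dimension table (PostgreSQL 11+).
CREATE UNIQUE INDEX idx_ref_dates_covering
    ON ref_dates (date_id) INCLUDE (year, month, month_name);

-- Index on the foreign key column of the fact table.
CREATE INDEX idx_util_kpi_date ON util_kpi (date_id);

-- Check whether the planner actually uses the indexes on real data.
EXPLAIN ANALYZE
SELECT u.kpi_description, u.kpi_value
FROM util_kpi u
JOIN ref_dates r ON u.date_id = r.date_id
WHERE r.month = 4 AND r.year = 2021;
```

Whether the planner picks these indexes depends on the table sizes and statistics, so the EXPLAIN ANALYZE output on representative data is the thing to trust.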

How do I make a simple day dimension table for data warehousing star schema with postgresql?

How would I go about creating and populating a simple DAY dimension table for a star schema in postgreSQL ?
It is for an intro course to data warehousing and so it only has a few fields but most of the examples online are very involved and seem very complicated for a beginner. This isn't for an assignment - it is for studying because I am trying to make my own simple Star Schema with a fact table so I can start getting comfortable with it.
Can anyone give me a simple example of how I'd create the table with just a few fields (day_key as the surrogate key, a string describing the day, and some integer values representing the days or months for example) so I can at least get started on understanding?
A very simple DAY dimension table that should work for most versions of PostgreSQL (I am using 10.5). This is just something that should help someone newer to Data Warehousing make a basic day dimension for use when just getting started.
Create a Day Table
CREATE TABLE day (
day_key SERIAL PRIMARY KEY, -- SERIAL is an integer that will auto-increment as new rows added
description VARCHAR(40), -- a 'string' for a description
full_date DATE, -- an actual date type
month_number INTEGER,
month_name VARCHAR(40),
year INTEGER
);
Inserting Rows into the Day dimension
INSERT INTO day(description, full_date, month_number, month_name, year)
SELECT
to_char(days.d, 'FMMonth DD, YYYY'),
days.d::DATE,
to_char(days.d, 'MM')::integer,
to_char(days.d, 'FMMonth'),
to_char(days.d, 'YYYY')::integer
from (
SELECT generate_series(
('2019-01-01')::date, -- 'start' date
('2019-12-31')::date, -- 'end' date
interval '1 day' -- one row for each day between the start and end dates
)) as days(d);
Notes:
Basically you are just using the rows generated by the nested SELECT generate_series(... to insert into the Day table.
I used the FM prefix twice above to remove the blank padding that to_char adds by default to some of these formats.
I'd recommend removing the INSERT INTO day(...) line the first time you do this just to make sure the format of each column is what you're after before inserting it into your table.
This is just what I've seen commonly used. The PostgreSQL documentation has more thorough examples of ways to format date types and derive all kinds of useful dimension fields.
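Following the suggestion to preview the rows before inserting them, a dry run might look like the sketch below. It limits the series to five days and uses extract() for the numeric fields, which is an alternative to the to_char(...)::integer casts above rather than what the original answer used.

```sql
-- Preview only: nothing is inserted into the day table.
SELECT
    to_char(days.d, 'FMMonth DD, YYYY')  AS description,
    days.d::date                         AS full_date,
    extract(month FROM days.d)::integer  AS month_number,
    to_char(days.d, 'FMMonth')           AS month_name,
    extract(year FROM days.d)::integer   AS year
FROM generate_series(
         DATE '2019-01-01',
         DATE '2019-01-05',
         interval '1 day'
     ) AS days(d);
```

Once the columns look right, put the INSERT INTO day(...) line back in front of the SELECT and widen the date range.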

Iterate through every row and compare to table in PostgreSQL

I have a large data table in PostgreSQL containing, among other information, a column for start date and end date. For each row, I am hoping to tally the number of other rows in the same table which contain an overlapping date range and create a new table with these values.
To calculate overlapping date range, I had been planning to use:
where (start_date between [target_start_date] and [target_end_date]) or
(end_date between [target_start_date] and [target_end_date])
But I have no idea how to do this for each row while comparing to its own table and tallying a count of times this condition is met.
Any ideas would be greatly appreciated, thank you!
One possibility is to use a subquery and the OVERLAPS operator (elbat stands in for your table name, and the columns are assumed to be called start_date and end_date as in the question).
SELECT t1.*,
(SELECT count(*)
FROM elbat t2
WHERE (t2.start_date,
t2.end_date) OVERLAPS (t1.start_date,
t1.end_date))
FROM elbat t1;
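The question asks for the number of other rows and for the result to go into a new table. A minimal sketch of that, assuming the columns are named start_date and end_date as in the question and that the table has a unique id column (the id column and the overlap_counts table name are assumptions for illustration), could be:

```sql
-- Store, for each row, how many OTHER rows have an overlapping period.
CREATE TABLE overlap_counts AS
SELECT t1.id,
       (SELECT count(*)
          FROM elbat t2
         WHERE t2.id <> t1.id  -- do not count the row itself
           AND (t2.start_date, t2.end_date)
               OVERLAPS (t1.start_date, t1.end_date)) AS overlap_count
  FROM elbat t1;
```

OVERLAPS treats each period as half-open (start <= time < end), so two periods that only touch at a boundary are not counted, whereas BETWEEN is inclusive at both ends. Note also that the BETWEEN condition from the question misses rows whose period completely contains the target period, which OVERLAPS does catch.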

Exclusion constraint that allows overlapping at the boundaries

I tried to create a PostgreSQL constraint so that there is no overlap between two date intervals. My requirement is that c_from for one entry can be the same as c_until for another entry.
E.g. "01/12/2019 12/12/2019" and "12/12/2019 31/12/2019" are date ranges that should not conflict. I have '[]' in my constraint, but it does not seem to work.
user_no INTEGER NOT NULL REFERENCES usr,
c_from DATE DEFAULT NOW(),
c_until DATE DEFAULT 'INFINITY',
CONSTRAINT unique_user_per_daterange EXCLUDE USING gist (user_no WITH =, daterange(c_from, c_until, '[]') WITH && )
When I insert the date ranges above, I get this error:
(psycopg2.IntegrityError) conflicting key value violates exclusion constraint "unique_user_per_daterange"
Could you please help?
Use ranges that exclude one of the ends. With '[]' both bounds are inclusive, so two ranges that share a boundary date overlap on that date.
daterange(c_from, c_until, '[)')
Then they won't conflict, even if one interval ends at the same point where another begins.
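Put together, the corrected constraint might look like the sketch below. The table name user_periods is hypothetical, the foreign key to usr is left out so the example stands alone, and CURRENT_DATE replaces NOW() as the default. The btree_gist extension is assumed to be available, since a GiST exclusion constraint needs it for the integer equality part.

```sql
-- Needed so that "user_no WITH =" can be used inside a GiST index.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE user_periods (  -- hypothetical table name
    user_no INTEGER NOT NULL,
    c_from  DATE DEFAULT CURRENT_DATE,
    c_until DATE DEFAULT 'infinity',
    CONSTRAINT unique_user_per_daterange
        EXCLUDE USING gist (user_no WITH =,
                            daterange(c_from, c_until, '[)') WITH &&)
);

-- Adjacent ranges for the same user: both are accepted.
INSERT INTO user_periods VALUES (1, '2019-12-01', '2019-12-12');
INSERT INTO user_periods VALUES (1, '2019-12-12', '2019-12-31');

-- Expected to fail: overlaps both existing ranges for user 1.
INSERT INTO user_periods VALUES (1, '2019-12-10', '2019-12-20');
```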

Date Table/Dimension Querying and Indexes

I'm creating a robust date table and want to know the best way to link to it. The Primary Key Clustered Index will be on the smart date integer key (per Kimball spec) with a name of DateID. Until now I have been running queries against it like so:
select Foo.orderdate -- a bunch of fields from Foo
,DTE.FiscalYearName
,DTE.FiscalPeriod
,DTE.FiscalYearPeriod
,DTE.FiscalYearWeekName
,DTE.FiscalWeekName
FROM SomeTable Foo
INNER JOIN
DateDatabase.dbo.MyDateTable DTE
ON DTE.date = CAST(FLOOR(CAST(Foo.forderdate AS FLOAT)) AS DATETIME)
Keep in mind that Date is a nonclustered index field with values such as:
2000-01-01 00:00:00.000
It just occurred to me that since I have a clustered integer index (DateID), perhaps I should be converting the datetime in my database field to match it and linking based upon that field.
What do you folks think?
Also, depending on your first answer, if I am typically pulling those fields from the date table, what kind of index would optimize the retrieval of those fields? A covering index?
Even without changing the database structure, you'd get much better performance using a date range join like this:
select Foo.orderdate -- a bunch of fields from Foo
,DTE.FiscalYearName
,DTE.FiscalPeriod
,DTE.FiscalYearPeriod
,DTE.FiscalYearWeekName
,DTE.FiscalWeekName
FROM SomeTable Foo
INNER JOIN
DateDatabase.dbo.MyDateTable DTE
ON Foo.forderdate >= DTE.date AND Foo.forderdate < DATEADD(dd, 1, DTE.date)
However, if you can change it so that your Foo table includes a DateID field then, yes, you'd get the best performance by joining with that instead of any converted date value or date range.
If you change it to join on DateID and DateID is the first column of the clustered index of the MyDateTable then it's already covering (a clustered index always includes all other fields).
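One way to give Foo a DateID without changing how rows are written is a persisted computed column. The T-SQL below is only a sketch: the column name DateID on SomeTable, the index name, and the YYYYMMDD arithmetic are assumptions that must match how the smart key in MyDateTable is actually built.

```sql
-- T-SQL (SQL Server). GO is the batch separator used by SSMS and sqlcmd.
-- Derive a YYYYMMDD integer from the existing datetime column.
ALTER TABLE SomeTable
    ADD DateID AS (YEAR(forderdate) * 10000
                   + MONTH(forderdate) * 100
                   + DAY(forderdate)) PERSISTED;
GO

-- Index the new key so the join can seek on both sides.
CREATE NONCLUSTERED INDEX IX_SomeTable_DateID ON SomeTable (DateID);
GO

-- Join on the integer key instead of a converted datetime.
SELECT Foo.forderdate,
       DTE.FiscalYearName,
       DTE.FiscalPeriod
FROM SomeTable Foo
INNER JOIN DateDatabase.dbo.MyDateTable DTE
    ON DTE.DateID = Foo.DateID;
```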