How can I use a WHERE BETWEEN clause in an INSERT query? - postgresql

I have the following table.
CREATE TABLE public.ad
(
id integer NOT NULL DEFAULT nextval('ad_id_seq'::regclass),
uuid uuid NOT NULL DEFAULT uuid_generate_v4(),
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
cmdb_id integer,
platform character varying(100),
bidfloor numeric(15,6),
views integer NOT NULL DEFAULT 1,
year integer,
month integer,
day integer,
CONSTRAINT ad_pkey PRIMARY KEY (id),
CONSTRAINT ad_cmdb_id_foreign FOREIGN KEY (cmdb_id)
REFERENCES public.cmdb (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT ad_id_unique UNIQUE (uuid)
)
WITH (
OIDS=FALSE
);
Without going into too much detail, this table logs all the requests and impressions of advertisements on electronic screens throughout the country. It is also used to generate reports and holds roughly 50 million records.
Currently, the reports are filtered on the created_at timestamp. You can imagine that with roughly 50 million records the query gets slow, even with an index on the created_at column. Reports are generated by selecting a date range in the UI of the system.
The year, month and day columns are new columns that I just added to make the reporting more efficient. Instead of indexing on the date, I want the system to index on year, month and day as separate values.
The newly added columns are still empty. I want to run a query that inserts a value where the created_at column is between two dates. For example:
INSERT INTO ad (year) VALUES (2016) WHERE created_at BETWEEN '2016-01-01 00:00:00' AND '2016-12-31 23:59:59';
This doesn't work, of course. I cannot seem to find anything on the internet where an INSERT statement makes use of a WHERE BETWEEN clause. I also tried using subqueries and the WITH clause to generate a series of years between 2012 and 2020 using generate_series. None of it worked out.

You don't want to insert new rows; you should update your table.
UPDATE table_name
SET column1=value1,column2=value2,...
WHERE column_name BETWEEN value1 AND value2;
Otherwise you'll have 100 million rows ;)
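For instance, all three new columns can be backfilled in one pass with EXTRACT instead of one statement per year (a sketch against the ad table above; the half-open range is safer than BETWEEN ... '23:59:59' because it cannot miss sub-second timestamps):
UPDATE ad
SET year  = EXTRACT(YEAR  FROM created_at)::int,
    month = EXTRACT(MONTH FROM created_at)::int,
    day   = EXTRACT(DAY   FROM created_at)::int
WHERE created_at >= '2016-01-01 00:00:00'
  AND created_at <  '2017-01-01 00:00:00';
Running this per year (or per month) keeps each transaction and its row locks reasonably small on a 50-million-row table.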

Related

ERROR: could not create exclusion constraint

I have a table:
CREATE TABLE attendances
(
id_attendance serial PRIMARY KEY,
id_user integer NOT NULL
REFERENCES users (user_id) ON UPDATE CASCADE ON DELETE CASCADE,
entry_date timestamp with time zone DEFAULT NULL,
departure_date timestamp with time zone DEFAULT NULL,
created_at timestamp with time zone DEFAULT current_timestamp
);
I want to add an exclusion constraint preventing attendances from overlapping (there can be multiple rows for the same day, but time ranges cannot overlap).
So I wrote this code to add the constraint:
ALTER TABLE attendances
ADD CONSTRAINT check_attendance_overlaps
EXCLUDE USING GIST (box(
point(
extract(epoch from entry_date at time zone 'UTC'),
id_user
),
point(
extract(epoch from departure_date at time zone 'UTC') - 0.5,
id_user + 0.5
)
)
WITH && );
But when I tried to run it on the database I got this error:
Error: could not create exclusion constraint "check_attendance_overlaps"
To exclude overlapping time ranges per user, work with a multicolumn constraint on id_user and a timestamptz range (tstzrange).
You need the additional module btree_gist once per database:
CREATE EXTENSION IF NOT EXISTS btree_gist;
Then:
ALTER TABLE attendances ADD CONSTRAINT check_attendance_overlaps
EXCLUDE USING gist (id_user WITH =
, tstzrange(entry_date, departure_date) WITH &&);
See:
Store the day of the week and time?
Postgres constraint for unique datetime range
Or maybe spgist instead of gist; it might be faster. See:
Perform this hours of operation query in PostgreSQL
Of course, there cannot be overlapping rows in the table, or adding the constraint will fail with a similar error message.
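A quick way to verify the constraint behaves as intended (hypothetical rows; the second insert should fail because the two ranges for user 1 overlap):
INSERT INTO attendances (id_user, entry_date, departure_date)
VALUES (1, '2024-05-01 09:00+00', '2024-05-01 12:00+00');  -- OK
INSERT INTO attendances (id_user, entry_date, departure_date)
VALUES (1, '2024-05-01 11:00+00', '2024-05-01 13:00+00');  -- rejected by check_attendance_overlaps
Since tstzrange defaults to inclusive-lower / exclusive-upper bounds ('[)'), a shift starting exactly when another ends does not count as an overlap.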

Multicolumn index vs single column index for time series data in Postgres

This table started out as short-term storage for meter data before it was validated and moved to some long-term storage tables.
It turns out the client wants to keep this data for a long time now that we have saved it, and it is growing fast.
create table metering_meterreading
(
    id bigserial not null,                        -- primary key
    created_at timestamp with time zone not null,
    updated_at timestamp with time zone not null,
    timestamp timestamp with time zone not null,  -- BTREE index
    value numeric(15, 3) not null,
    meter_device_id uuid not null,                -- FK to meter_device, BTREE index
    series_id uuid not null,                      -- FK to series, BTREE index
    organization_id uuid not null                 -- FK to org, BTREE index
);
I am planning on dropping the primary key since (org_id, meter_device_id, series_id, timestamp) makes it unique. It was just added by my ORM (django) and I didn't care when we started.
But since I pretty much always filter on organization, meter_device, and series to get a range of time series data, I am wondering if it would be more efficient to have a multicolumn index on (organization_id, meter_device_id, series_id, timestamp) instead of the separate indexes.
I read somewhere that if I filter on a range, the range column should be the rightmost in the index.
This is still not a super-efficient table for time series data, since it will grow large, but I am planning on fixing that by partitioning on range, or maybe even using Timescale. But before partitioning I would like lookups in it to be as efficient as possible.
I also saw an example somewhere that used a separate table to identify the metric:
create table metric
(
    id bigserial primary key,
    organization_id uuid not null,
    meter_device_id uuid not null,
    series_id uuid not null,
    unique (organization_id, meter_device_id, series_id)
);
create table metering_meterreading
(
    metric_id bigint not null,                    -- FK to metric, BTREE index
    timestamp timestamp with time zone not null,  -- BTREE index
    value numeric(15, 3) not null,
    created_at timestamp with time zone not null,
    updated_at timestamp with time zone not null
);
But I am not sure if that is actually better than just putting them all in one table. It might hurt the ingestion rate since there is another table involved now.
If (org_id, meter_device_id, series_id, timestamp) uniquely determines a table row, use a multi-column primary key over all of them. That automatically gives you a four-column index on these columns. Just make sure that timestamp is last in the list; then that index will support your query ideally.
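A sketch of that change (the constraint name metering_meterreading_pkey is just the typical Django/PostgreSQL default; check the actual name with \d metering_meterreading first):
ALTER TABLE metering_meterreading
    DROP CONSTRAINT metering_meterreading_pkey;
ALTER TABLE metering_meterreading
    ADD PRIMARY KEY (organization_id, meter_device_id, series_id, "timestamp");
The quotes around "timestamp" are optional in PostgreSQL, but they make it obvious that the column, not the type, is meant.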

Use index to speed up query using values from different tables

I have a table products, a table orders and a table orderProducts.
Products have a name as a PK (apple, banana, mango) and a price.
orders have a created_at date and an id as a PK.
orderProducts connects orders and products, so they have a product_name and an order_id. Now I would like to show all orders for a given product that happened in the last 24 hours.
I use the following query:
SELECT
orders.id,
orders.created_at,
products.name,
products.price
FROM
orderProducts
JOIN products ON
products.name=orderProducts.product
JOIN orders ON
orders.id=orderProducts.order
WHERE
products.name='banana'
AND
orders.created_at BETWEEN NOW() - INTERVAL '24 HOURS' AND NOW()
ORDER BY
orders.created_at
This works, but I would like to optimize this query with an index. This index would need to be ordered first by
the product name, so it can be filtered,
then by the created_at of the order in descending order, so it can select only the orders from the last 24 hours.
The problem is that, from what I have seen, indexes can only be created on a single table, without the possibility of joining another table's values to it. Since two individual indexes do not solve this problem either, I was wondering if there is an alternative way to optimize this particular query.
Here are the table scripts:
CREATE TABLE products
(
    name text PRIMARY KEY,
    price integer
);
CREATE TABLE orders
(
    id SERIAL PRIMARY KEY,
    created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE orderProducts
(
    product text REFERENCES products(name),
    "order" integer REFERENCES orders(id)
);
First of all, please do not put indexes everywhere - that leads to slower data-changing operations...
As proposed by Laurenz Albe - do not guess, check.
Other than that: note that you already know the product name, and the price is repeated per product, so you can query that once. Whether in your case two queries are going to be faster than a single one is something to check.
Please read the docs. I would try this index:
create index orders_created_at_id on orders (created_at desc, id);
Normally id should go first, since it is unique, but here the system should be able to filter on both predicates - the WHERE and the join. Just guessing here.
On orderProducts I would like to see an index on both columns, although for this query only one should be needed. In practice you either go from products to orders or the other way - both paths are possible, which is why I wrote about indexing both columns. I would use two separate indexes:
create index orderproducts_product on orderproducts (product) include ("order");
create index orderproducts_order on orderproducts ("order") include (product);
Probably that is not changing much, but... the idea is to use only the index and not touch the table itself (an index-only scan).
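Either way, do not guess whether the planner actually produces an index-only scan - check, for example with:
EXPLAIN (ANALYZE, BUFFERS)
SELECT orders.id, orders.created_at, products.name, products.price
FROM orderProducts
JOIN products ON products.name = orderProducts.product
JOIN orders ON orders.id = orderProducts."order"
WHERE products.name = 'banana'
  AND orders.created_at BETWEEN NOW() - INTERVAL '24 HOURS' AND NOW()
ORDER BY orders.created_at;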
These rules are important in terms of performance:
An integer index is faster than a string index, so you should try to make primary keys integers whenever possible; joins between the tables use the primary keys too.
If your WHERE clauses always use the same two fields together, create an index on both fields.
Foreign keys are not indexed automatically; you must create indexes on foreign-key fields manually.
So the recommended table scripts are:
CREATE TABLE products
(
id serial primary key,
name text,
price integer
);
CREATE UNIQUE INDEX products_name_idx ON products USING btree (name);
CREATE TABLE orders
(
id SERIAL PRIMARY KEY,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX orders_created_at_idx ON orders USING btree (created_at);
CREATE TABLE orderProducts
(
product_id integer REFERENCES products(id),
order_id integer REFERENCES orders(id)
);
CREATE INDEX orderproducts_product_id_idx ON orderproducts USING btree (product_id, order_id);
---- OR ----
CREATE INDEX orderproducts_product_id ON orderproducts (product_id);
CREATE INDEX orderproducts_order_id ON orderproducts (order_id);
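With this surrogate-key schema the original query has to be adapted to join on the integer ids (a sketch):
SELECT o.id, o.created_at, p.name, p.price
FROM orderProducts op
JOIN products p ON p.id = op.product_id
JOIN orders o ON o.id = op.order_id
WHERE p.name = 'banana'
  AND o.created_at BETWEEN NOW() - INTERVAL '24 HOURS' AND NOW()
ORDER BY o.created_at;
The filter on products.name still benefits from products_name_idx, and the joins then proceed over the indexed integer columns.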

Applying a unique constraint on the date part of a TIMESTAMP column in PostgreSQL

I have a PostgreSQL table:
CREATE TABLE IF NOT EXISTS table_name
(
expiry_date DATE NOT NULL,
created_at TIMESTAMP with time zone NOT NULL DEFAULT CURRENT_TIMESTAMP(0),
CONSTRAINT user_review_uniq_key UNIQUE (expiry_date, created_at::date) -- my wrong attempt of using ::
)
I want to put a unique constraint on this table in such a way that expiry_date and the date of created_at are unique together. The problem is that the created_at column is a timestamp, not a date.
So is there any way to put a unique constraint such that expiry_date and created_at::date are unique?
My attempt was to use
CONSTRAINT user_review_uniq_key UNIQUE (expiry_date, created_at::date) which is not valid.
If you do not need a time zone for your created date, create a unique index as follows:
create unique index idx_user_review_uniq_key on table_name (expiry_date, cast(created_at as date));
If you really need the time zone, then you can use a little trick (https://gist.github.com/cobusc/5875282):
create unique index idx_user_review_uniq_key on table_name (expiry_date, date(created_at at TIME zone 'UTC'));
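A quick check with the UTC variant of the index (hypothetical rows; the second insert should fail because both timestamps fall on the same UTC date for the same expiry_date):
INSERT INTO table_name (expiry_date, created_at)
VALUES ('2024-06-01', '2024-05-01 08:00+00');
INSERT INTO table_name (expiry_date, created_at)
VALUES ('2024-06-01', '2024-05-01 17:30+00');  -- duplicate key value violates unique constraint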

DB2 Table with CAST or SUBSTR of TIME as part of the Primary Key?

I am trying to create a table which could be updated very often by different apps, but I only care about the last entry for each hour, regardless of what updated it. With that in mind, I was thinking that HOUR(CURRENT TIME) could be part of the primary key (just a part, not the whole thing). It could look something like the following (if it worked):
NON-working sample
create table foo (
col1 varchar(10),
col2 varchar(10),
lastdate date not null with default,
lasthour varchar(2) not null with default cast(hour(current time) as varchar(2))
);
or
if I create lastvalue as: lastvalue time not null with default
create unique index foo_inx on foo (col1, col2, lastdate, hour(lastvalue));
I hope that's a reasonably interesting problem. :-)
Edit: the DDL above does not work. I've added lastdate to the unique index.
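One approach that should work on Db2 for LUW (a sketch, assuming a version with generated columns; unlike HOUR(CURRENT TIME), HOUR(lastvalue) is deterministic, so it is allowed in a generated column):
create table foo (
    col1 varchar(10) not null,
    col2 varchar(10) not null,
    lastdate date not null with default,
    lastvalue time not null with default,
    lasthour smallint not null generated always as (hour(lastvalue))
);
create unique index foo_inx on foo (col1, col2, lastdate, lasthour);
Writers would then upsert (e.g. with MERGE) against (col1, col2, lastdate, lasthour), so only the last entry per hour survives.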