Exclusion constraint on a bitstring column with bitwise AND operator - postgresql

I read about Exclusion Constraints in PostgreSQL but can't seem to find a way to use bitwise operators on bitstrings.
I have two columns (name text, value bit(8)). And I want to create a constraint that basically says this:
ADD CONSTRAINT route_method_overlap
EXCLUDE USING gist(name WITH =, value WITH &)
But this doesn't work since:
operator &(bit,bit) is not a member of operator family "gist_bit_ops"
I assume this is because the bit_ops operator & doesn't return boolean. But is there a way to do what I'm trying to do? Is there a way to coerce operator & to cast its return value as a boolean?
Currently using Postgres 9.1.4 with the "btree_gist" extension installed, all from the Ubuntu 12.04 repos. But the version doesn't matter. If there's fixes/updates upstream, I can install from the repos. I'm still in the design phase.

You installed the extension btree_gist. Without it, the example would already fail at name WITH =.
CREATE EXTENSION btree_gist;
The operator classes installed by btree_gist cover many operators. Unfortunately, the & operator is not among them. Obviously, because it does not return a boolean which would be expected of an operator to qualify.
Alternative solution
I would use a combination of a b-tree multi-column index (for speed) and a trigger instead. Consider this demo, tested on PostgreSQL 9.1:
CREATE TABLE t (
name text
, value bit(8)
);
INSERT INTO t VALUES ('a', B'10101010');
CREATE INDEX t_name_value_idx ON t (name, value);
CREATE OR REPLACE FUNCTION trg_t_name_value_inversion_prohibited()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
IF EXISTS (
SELECT FROM t
WHERE (name, value) = (NEW.name, ~ NEW.value) -- example: exclude inversion
) THEN
RAISE EXCEPTION 'Your text here!';
END IF;
RETURN NEW;
END
$func$;
CREATE TRIGGER insup_bef_t_name_value_inversion_prohibited
BEFORE INSERT OR UPDATE OF name, value -- only involved columns relevant!
ON t
FOR EACH ROW
EXECUTE FUNCTION trg_t_name_value_inversion_prohibited();
INSERT INTO t VALUES ('a', ~ B'10101010'); -- fails with your error msg.
In Postgres 10 or older use instead:
...
EXECUTE PROCEDURE trg_t_name_value_inversion_prohibited();
See:
Trigger function does not exist, but I am pretty sure it does
~ is the inversion operator.
The extension btree_gist is not required in this scenario.
I restricted the trigger to INSERT OR UPDATE OF relevant columns for efficiency.
A check constraint wouldn't work. I quote the manual on CREATE TABLE:
Currently, CHECK expressions cannot contain subqueries nor refer to
variables other than columns of the current row.
Bold emphasis mine.
Should perform very well, actually better than the exclusion constraint, because maintenance of a b-tree index is cheaper than a GiST index. And the look-up with basic = operators should be faster than hypothetical look-ups with the & operator.
This solution is not as bullet-proof as an exclusion constraint, because triggers can more easily be circumvented - in a subsequent trigger on the same event for instance, or if the trigger is disabled temporarily. Be prepared to run extra checks on the whole table if such conditions apply.
More complex condition
The example trigger only catches the inversion of value. As you clarified in your comment, you actually need a condition like this instead:
IF EXISTS (
SELECT FROM t
WHERE name = NEW.name
AND value & NEW.value <> B'00000000'::bit(8)
) THEN
This condition is slightly more expensive, but can still use an index. The multi-column index from above would work - if you have use for it anyway. Or, more efficiently, a simple index on name:
CREATE INDEX t_name_idx ON t (name);
You commented that there can only be a maximum of 8 distinct rows per name, fewer in practice. So this should still be fast.
Ultimate INSERT performance
If INSERT performance is paramount, especially if many attempted INSERTs fail the condition, you could do more: create a materialized view that pre-aggregated value per name:
CREATE TABLE mv_t AS
SELECT name, bit_or(value) AS value
FROM t
GROUP BY 1
ORDER BY 1;
name is guaranteed to be unique here. I'd use a PRIMARY KEY on name to provide the index we're after:
ALTER TABLE mv_t SET (FILLFACTOR=90);
ALTER TABLE mv_t
ADD CONSTRAINT mv_t_pkey PRIMARY KEY(name);
Then your INSERT could look like this:
WITH i(n,v) AS (SELECT 'a'::text, B'10101010'::bit(8))
INSERT INTO t (name, value)
SELECT n, v
FROM i
LEFT JOIN mv_t m ON m.name = i.n
AND m.value & i.v <> B'00000000'::bit(8)
WHERE m.n IS NULL; -- alternative syntax for EXISTS (...)
The fillfactor is only useful if your table gets a lot of updates.
Update rows in the materialized view in a TRIGGER AFTER INSERT OR UPDATE OF name, value OR DELETE to keep it current. Cost and gain of additional objects have to be weighed carefully. Largely depends on your typical load.

Related

Does Postgres support virtual columns? [duplicate]

Does PostgreSQL support computed / calculated columns, like MS SQL Server? I can't find anything in the docs, but as this feature is included in many other DBMSs I thought I might be missing something.
Eg: http://msdn.microsoft.com/en-us/library/ms191250.aspx
Postgres 12 or newer
STORED generated columns are introduced with Postgres 12 - as defined in the SQL standard and implemented by some RDBMS including DB2, MySQL, and Oracle. Or the similar "computed columns" of SQL Server.
Trivial example:
CREATE TABLE tbl (
int1 int
, int2 int
, product bigint GENERATED ALWAYS AS (int1 * int2) STORED
);
fiddle
VIRTUAL generated columns may come with one of the next iterations. (Not in Postgres 15, yet).
Related:
Attribute notation for function call gives error
Postgres 11 or older
Up to Postgres 11 "generated columns" are not supported.
You can emulate VIRTUAL generated columns with a function using attribute notation (tbl.col) that looks and works much like a virtual generated column. That's a bit of a syntax oddity which exists in Postgres for historic reasons and happens to fit the case. This related answer has code examples:
Store common query as column?
The expression (looking like a column) is not included in a SELECT * FROM tbl, though. You always have to list it explicitly.
Can also be supported with a matching expression index - provided the function is IMMUTABLE. Like:
CREATE FUNCTION col(tbl) ... AS ... -- your computed expression here
CREATE INDEX ON tbl(col(tbl));
Alternatives
Alternatively, you can implement similar functionality with a VIEW, optionally coupled with expression indexes. Then SELECT * can include the generated column.
"Persisted" (STORED) computed columns can be implemented with triggers in a functionally equivalent way.
Materialized views are a related concept, implemented since Postgres 9.3.
In earlier versions one can manage MVs manually.
YES you can!! The solution should be easy, safe, and performant...
I'm new to postgresql, but it seems you can create computed columns by using an expression index, paired with a view (the view is optional, but makes makes life a bit easier).
Suppose my computation is md5(some_string_field), then I create the index as:
CREATE INDEX some_string_field_md5_index ON some_table(MD5(some_string_field));
Now, any queries that act on MD5(some_string_field) will use the index rather than computing it from scratch. For example:
SELECT MAX(some_field) FROM some_table GROUP BY MD5(some_string_field);
You can check this with explain.
However at this point you are relying on users of the table knowing exactly how to construct the column. To make life easier, you can create a VIEW onto an augmented version of the original table, adding in the computed value as a new column:
CREATE VIEW some_table_augmented AS
SELECT *, MD5(some_string_field) as some_string_field_md5 from some_table;
Now any queries using some_table_augmented will be able to use some_string_field_md5 without worrying about how it works..they just get good performance. The view doesn't copy any data from the original table, so it is good memory-wise as well as performance-wise. Note however that you can't update/insert into a view, only into the source table, but if you really want, I believe you can redirect inserts and updates to the source table using rules (I could be wrong on that last point as I've never tried it myself).
Edit: it seems if the query involves competing indices, the planner engine may sometimes not use the expression-index at all. The choice seems to be data dependant.
One way to do this is with a trigger!
CREATE TABLE computed(
one SERIAL,
two INT NOT NULL
);
CREATE OR REPLACE FUNCTION computed_two_trg()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $BODY$
BEGIN
NEW.two = NEW.one * 2;
RETURN NEW;
END
$BODY$;
CREATE TRIGGER computed_500
BEFORE INSERT OR UPDATE
ON computed
FOR EACH ROW
EXECUTE PROCEDURE computed_two_trg();
The trigger is fired before the row is updated or inserted. It changes the field that we want to compute of NEW record and then it returns that record.
PostgreSQL 12 supports generated columns:
PostgreSQL 12 Beta 1 Released!
Generated Columns
PostgreSQL 12 allows the creation of generated columns that compute their values with an expression using the contents of other columns. This feature provides stored generated columns, which are computed on inserts and updates and are saved on disk. Virtual generated columns, which are computed only when a column is read as part of a query, are not implemented yet.
Generated Columns
A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables.
CREATE TABLE people (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm * 2.54) STORED
);
db<>fiddle demo
Well, not sure if this is what You mean but Posgres normally support "dummy" ETL syntax.
I created one empty column in table and then needed to fill it by calculated records depending on values in row.
UPDATE table01
SET column03 = column01*column02; /*e.g. for multiplication of 2 values*/
It is so dummy I suspect it is not what You are looking for.
Obviously it is not dynamic, you run it once. But no obstacle to get it into trigger.
Example on creating an empty virtual column
,(SELECT *
From (values (''))
A("virtual_col"))
Example on creating two virtual columns with values
SELECT *
From (values (45,'Completed')
, (1,'In Progress')
, (1,'Waiting')
, (1,'Loading')
) A("Count","Status")
order by "Count" desc
I have a code that works and use the term calculated, I'm not on postgresSQL pure tho we run on PADB
here is how it's used
create table some_table as
select category,
txn_type,
indiv_id,
accum_trip_flag,
max(first_true_origin) as true_origin,
max(first_true_dest ) as true_destination,
max(id) as id,
count(id) as tkts_cnt,
(case when calculated tkts_cnt=1 then 1 else 0 end) as one_way
from some_rando_table
group by 1,2,3,4 ;
A lightweight solution with Check constraint:
CREATE TABLE example (
discriminator INTEGER DEFAULT 0 NOT NULL CHECK (discriminator = 0)
);

How to check a sequence efficiently for used and unused values in PostgreSQL

In PostgreSQL (9.3) I have a table defined as:
CREATE TABLE charts
( recid serial NOT NULL,
groupid text NOT NULL,
chart_number integer NOT NULL,
"timestamp" timestamp without time zone NOT NULL DEFAULT now(),
modified timestamp without time zone NOT NULL DEFAULT now(),
donotsee boolean,
CONSTRAINT pk_charts PRIMARY KEY (recid),
CONSTRAINT chart_groupid UNIQUE (groupid),
CONSTRAINT charts_ichart_key UNIQUE (chart_number)
);
CREATE TRIGGER update_modified
BEFORE UPDATE ON charts
FOR EACH ROW EXECUTE PROCEDURE update_modified();
I would like to replace the chart_number with a sequence like:
CREATE SEQUENCE charts_chartnumber_seq START 16047;
So that by trigger or function, adding a new chart record automatically generates a new chart number in ascending order. However, no existing chart record can have its chart number changed and over the years there have been skips in the assigned chart numbers. Hence, before assigning a new chart number to a new chart record, I need to be sure that the "new" chart number has not yet been used and any chart record with a chart number is not assigned a different number.
How can this be done?
Consider not doing it. Read these related answers first:
Gap-less sequence where multiple transactions with multiple tables are involved
Compacting a sequence in PostgreSQL
If you still insist on filling in gaps, here is a rather efficient solution:
1. To avoid searching large parts of the table for the next missing chart_number, create a helper table with all current gaps once:
CREATE TABLE chart_gap AS
SELECT chart_number
FROM generate_series(1, (SELECT max(chart_number) - 1 -- max is no gap
FROM charts)) chart_number
LEFT JOIN charts c USING (chart_number)
WHERE c.chart_number IS NULL;
2. Set charts_chartnumber_seq to the current maximum and convert chart_number to an actual serial column:
SELECT setval('charts_chartnumber_seq', max(chart_number)) FROM charts;
ALTER TABLE charts
ALTER COLUMN chart_number SET NOT NULL
, ALTER COLUMN chart_number SET DEFAULT nextval('charts_chartnumber_seq');
ALTER SEQUENCE charts_chartnumber_seq OWNED BY charts.chart_number;
Details:
How to reset postgres' primary key sequence when it falls out of sync?
Safely and cleanly rename tables that use serial primary key columns in Postgres?
3. While chart_gap is not empty fetch the next chart_number from there.
To resolve possible race conditions with concurrent transactions, without making transactions wait, use advisory locks:
WITH sel AS (
SELECT chart_number, ... -- other input values
FROM chart_gap
WHERE pg_try_advisory_xact_lock(chart_number)
LIMIT 1
FOR UPDATE
)
, ins AS (
INSERT INTO charts (chart_number, ...) -- other target columns
TABLE sel
RETURNING chart_number
)
DELETE FROM chart_gap c
USING ins i
WHERE i.chart_number = c.chart_number;
Alternatively, Postgres 9.5 or later has the handy FOR UPDATE SKIP LOCKED to make this simpler and faster:
...
SELECT chart_number, ... -- other input values
FROM chart_gap
LIMIT 1
FOR UPDATE SKIP LOCKED
...
Detailed explanation:
Postgres UPDATE ... LIMIT 1
Check the result. Once all rows are filled in, this returns 0 rows affected. (you could check in plpgsql with IF NOT FOUND THEN ...). Then switch to a simple INSERT:
INSERT INTO charts (...) -- don't list chart_number
VALUES (...); -- don't provide chart_number
In PostgreSQL, a SEQUENCE ensures the two requirements you mention, that is:
No repeats
No changes once assigned
But because of how a SEQUENCE works (see manual), it can not ensure no-skips. Among others, the first two reasons that come to mind are:
How a SEQUENCE handles concurrent blocks with INSERTS (you could also add that the concept of Cache also makes this impossible)
Also, user triggered DELETEs are an uncontrollable aspect that a SEQUENCE can not handle by itself.
In both cases, if you still do not want skips, (and if you really know what you're doing) you should have a separate structure that assign IDs (instead of using SEQUENCE). Basically a system that has a list of 'assignable' IDs stored in a TABLE that has a function to pop out IDs in a FIFO way. That should allow you to control DELETEs etc.
But again, this should be attempted, only if you really know what you're doing! There's a reason why people don't do SEQUENCEs themselves. There are hard corner-cases (for e.g. concurrent INSERTs) and most probably you're over-engineering your problem case, that probably can be solved in a much better / cleaner way.
Sequence numbers usually have no meaning, so why worry? But if you really want this, then follow the below, cumbersome procedure. Note that it is not efficient; the only efficient option is to forget about the holes and use the sequence.
In order to avoid having to scan the charts table on every insert, you should scan the table once and store the unused chart_number values in a separate table:
CREATE TABLE charts_unused_chart_number AS
SELECT seq.unused
FROM (SELECT max(chart_number) FROM charts) mx,
generate_series(1, mx(max)) seq(unused)
LEFT JOIN charts ON charts.chart_number = seq.unused
WHERE charts.recid IS NULL;
The above query generates a contiguous series of numbers from 1 to the current maximum chart_number value, then LEFT JOINs the charts table to it and find the records where there is no corresponding charts data, meaning that value of the series is unused as a chart_number.
Next you create a trigger that fires on an INSERT on the charts table. In the trigger function, pick a value from the table created in the step above:
CREATE FUNCTION pick_unused_chart_number() RETURNS trigger AS $$
BEGIN
-- Get an unused chart number
SELECT unused INTO NEW.chart_number FROM charts_unused_chart_number LIMIT 1;
-- If the table is empty, get one from the sequence
IF NOT FOUND THEN
NEW.chart_number := next_val(charts_chartnumber_seq);
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tr_charts_cn
BEFORE INSERT ON charts
FOR EACH ROW EXECUTE PROCEDURE pick_unused_chart_number();
Easy. But the INSERT may fail because of some other trigger aborting the procedure or any other reason. So you need a check to ascertain that the chart_number was indeed inserted:
CREATE FUNCTION verify_chart_number() RETURNS trigger AS $$
BEGIN
-- If you get here, the INSERT was successful, so delete the chart_number
-- from the temporary table.
DELETE FROM charts_unused_chart_number WHERE unused = NEW.chart_number;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tr_charts_verify
AFTER INSERT ON charts
FOR EACH ROW EXECUTE PROCEDURE verify_chart_number();
At a certain point the table with unused chart numbers will be empty whereupon you can (1) ALTER TABLE charts to use the sequence instead of an integer for chart_number; (2) delete the two triggers; and (3) the table with unused chart numbers; all in a single transaction.
While what you want is possible, it can't be done using only a SEQUENCE and it requires an exclusive lock on the table, or a retry loop, to work.
You'll need to:
LOCK thetable IN EXCLUSIVE MODE
Find the first free ID by querying for the max id then doing a left join over generate_series to find the first free entry. If there is one.
If there is a free entry, insert it.
If there is no free entry, call nextval and return the result.
Performance will be absolutely horrible, and transactions will be serialized. There'll be no concurrency. Also, unless the LOCK is the first thing you run that affects that table, you'll face deadlocks that cause transaction aborts.
You can make this less bad by using an AFTER DELETE .. FOR EACH ROW trigger that keeps track of entries you delete by INSERTing them into a one-column table that keeps track of spare IDs. You can then SELECT the lowest ID from the table in your ID assignment function on the default for the column, avoiding the need for the explicit table lock, the left join on generate_series and the max call. Transactions will still be serialized on a lock on the free IDs table. In PostgreSQL you can even solve that using SELECT ... FOR UPDATE SKIP LOCKED. So if you're on 9.5 you can actually make this non-awful, though it'll still be slow.
I strongly advise you to just use a SEQUENCE directly, and not bother with re-using values.

Postgres Rules Preventing CTE Queries

Using Postgres 9.3:
I am attempting to automatically populate a table when an insert is performed on another table. This seems like a good use for rules, but after adding the rule to the first table, I am no longer able to perform inserts into the second table using the writable CTE. Here is an example:
CREATE TABLE foo (
id INT PRIMARY KEY
);
CREATE TABLE bar (
id INT PRIMARY KEY REFERENCES foo
);
CREATE RULE insertFoo AS ON INSERT TO foo DO INSERT INTO bar VALUES (NEW.id);
WITH a AS (SELECT * FROM (VALUES (1), (2)) b)
INSERT INTO foo SELECT * FROM a
When this is run, I get the error
"ERROR: WITH cannot be used in a query that is rewritten by rules
into multiple queries".
I have searched for that error string, but am only able to find links to the source code. I know that I can perform the above using row-level triggers instead, but it seems like I should be able to do this at the statement level. Why can I not use the writable CTE, when queries like this can (in this case) be easily re-written as:
INSERT INTO foo SELECT * FROM (VALUES (1), (2)) a
Does anyone know of another way that would accomplish what I am attempting to do other than 1) using rules, which prevents the use of "with" queries, or 2) using row-level triggers? Thanks,
        
TL;DR: use triggers, not rules.
Generally speaking, prefer triggers over rules, unless rules are absolutely necessary. (Which, in practice, they never are.)
Using rules introduces heaps of problems which will needlessly complicate your life down the road. You've run into one here. Another (major) one is, for instance, that the number of affected rows will correspond to that of the very last query -- if you're relying on FOUND somewhere and your query is incorrectly reporting that no rows were affected by a query, you'll be in for painful bugs.
Moreover, there's occasional talk of deprecating Postgres rules outright:
http://postgresql.nabble.com/Deprecating-RULES-td5727689.html
As the other answer I definitely recommend using INSTEAD OF triggers before RULEs.
However if for some reason you don't want to change existing VIEW RULEs and still want use WITH you can do so by wrapping the VIEW in a stored procedure:
create function insert_foo(int) returns void as $$
insert into foo values ($1)
$$ language sql;
WITH a AS (SELECT * FROM (VALUES (1), (2)) b)
SELECT insert_foo(a.column1) from a;
This could be useful when using some legacy db through some system that wraps statements with CTEs.

PostgreSQL, triggers, and concurrency to enforce a temporal key

I want to define a trigger in PostgreSQL to check that the inserted row, on a generic table, has the the property: "no other row exists with the same key in the same valid time" (the keys are sequenced keys). In fact, I has already implemented it. But since the trigger has to scan the entire table, now i'm wondering: is there a need for a table-level lock? Or this is managed someway by the PostgreSQL itself?
Here is an example.
In the upcoming PostgreSQL 9.0 I would have defined the table in this way:
CREATE TABLE medicinal_products
(
aic_code CHAR(9), -- sequenced key
full_name VARCHAR(255),
market_time PERIOD,
EXCLUDE USING gist
(aic_code CHECK WITH =,
market_time CHECK WITH &&)
);
but in fact I have been defined it like this:
CREATE TABLE medicinal_products
(
PRIMARY KEY (aic_code, vs),
aic_code CHAR(9), -- sequenced key
full_name VARCHAR(255),
vs DATE NOT NULL,
ve DATE,
CONSTRAINT valid_time_range
CHECK (ve > vs OR ve IS NULL)
);
Then, I have written a trigger that check the costraint: "two distinct medicinal products can have the same code in two different periods, but not in same time".
So the code:
INSERT INTO medicinal_products VALUES ('1','A','2010-01-01','2010-04-01');
INSERT INTO medicinal_products VALUES ('1','A','2010-03-01','2010-06-01');
return an error.
One solution is to have a second table to use for detecting clashes, and populate that with a trigger. Using the schema you added into the question:
CREATE TABLE medicinal_product_date_map(
aic_code char(9) NOT NULL,
applicable_date date NOT NULL,
UNIQUE(aic_code, applicable_date));
(note: this is the second attempt due to misreading your requirement the first time round. hope it's right this time).
Some functions to maintain this table:
CREATE FUNCTION add_medicinal_product_date_range(aic_code_in char(9), start_date date, end_date date)
RETURNS void STRICT VOLATILE LANGUAGE sql AS $$
INSERT INTO medicinal_product_date_map
SELECT $1, $2 + offset
FROM generate_series(0, $3 - $2)
$$;
CREATE FUNCTION clr_medicinal_product_date_range(aic_code_in char(9), start_date date, end_date date)
RETURNS void STRICT VOLATILE LANGUAGE sql AS $$
DELETE FROM medicinal_product_date_map
WHERE aic_code = $1 AND applicable_date BETWEEN $2 AND $3
$$;
And populate the table first time with:
SELECT count(add_medicinal_product_date_range(aic_code, vs, ve))
FROM medicinal_products;
Now create triggers to populate the date map after changes to medicinal_products: after insert calls add_, after update calls clr_ (old values) and add_ (new values), after delete calls clr_.
CREATE FUNCTION sync_medicinal_product_date_map()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
IF TG_OP = 'UPDATE' OR TG_OP = 'DELETE' THEN
PERFORM clr_medicinal_product_date_range(OLD.aic_code, OLD.vs, OLD.ve);
END IF;
IF TG_OP = 'UPDATE' OR TG_OP = 'INSERT' THEN
PERFORM add_medicinal_product_date_range(NEW.aic_code, NEW.vs, NEW.ve);
END IF;
RETURN NULL;
END;
$$;
CREATE TRIGGER sync_date_map
AFTER INSERT OR UPDATE OR DELETE ON medicinal_products
FOR EACH ROW EXECUTE PROCEDURE sync_medicinal_product_date_map();
The uniqueness constraint on medicinal_product_date_map will trap any products being added with the same code on the same day:
steve#steve#[local] =# INSERT INTO medicinal_products VALUES ('1','A','2010-01-01','2010-04-01');
INSERT 0 1
steve#steve#[local] =# INSERT INTO medicinal_products VALUES ('1','A','2010-03-01','2010-06-01');
ERROR: duplicate key value violates unique constraint "medicinal_product_date_map_aic_code_applicable_date_key"
DETAIL: Key (aic_code, applicable_date)=(1 , 2010-03-01) already exists.
CONTEXT: SQL function "add_medicinal_product_date_range" statement 1
SQL statement "SELECT add_medicinal_product_date_range(NEW.aic_code, NEW.vs, NEW.ve)"
PL/pgSQL function "sync_medicinal_product_date_map" line 6 at PERFORM
This depends on the values being checked for having a discrete space- which is why I asked about dates vs timestamps. Although timestamps are, technically, discrete since Postgresql only stores microsecond-resolution, adding an entry to the map table for every microsecond the product is applicable for is not practical.
Having said that, you could probably also get away with something better than a full-table scan to check for overlapping timestamp intervals, with some trickery on looking for only the first interval not after or not before... however, for easy discrete spaces I prefer this approach which IME can also be handy for other things too (e.g. reports that need to quickly find which products are applicable on a certain day).
I also like this approach because it feels right to leverage the database's uniqueness-constraint mechanism this way. Also, I feel it will be more reliable in the context of concurrent updates to the master table: without locking the table against concurrent updates, it would be possible for a validation trigger to see no conflict and allow inserts in two concurrent sessions, that are then seen to conflict when both transaction's effects are visible.
Just a thought, in case the valid time blocks could be coded with a number or something, creating a UNIQUE index on Id+TimeBlock would be blazingly fast and resolve all table lock problems.
It is managed by PostgreSQL itself. On a select it acquires an ACCESS_SHARE lock which means that you can query the table but do not perform updates.
A radical solution which might help you is to use a cache like ehcache or memcached to store the id/timeblock info and not use the postgresql at all. Many can be persisted so they would survive a server restart and they do not exhibit this locking behavior.
Why can't you use a UNIQUE constraint? Will be much faster (it's an index) and easier.

how to emulate "insert ignore" and "on duplicate key update" (sql merge) with postgresql?

Some SQL servers have a feature where INSERT is skipped if it would violate a primary/unique key constraint. For instance, MySQL has INSERT IGNORE.
What's the best way to emulate INSERT IGNORE and ON DUPLICATE KEY UPDATE with PostgreSQL?
With PostgreSQL 9.5, this is now native functionality (like MySQL has had for several years):
INSERT ... ON CONFLICT DO NOTHING/UPDATE ("UPSERT")
9.5 brings support for "UPSERT" operations.
INSERT is extended to accept an ON CONFLICT DO UPDATE/IGNORE clause. This clause specifies an alternative action to take in the event of a would-be duplicate violation.
...
Further example of new syntax:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
Edit: in case you missed warren's answer, PG9.5 now has this natively; time to upgrade!
Building on Bill Karwin's answer, to spell out what a rule based approach would look like (transferring from another schema in the same DB, and with a multi-column primary key):
CREATE RULE "my_table_on_duplicate_ignore" AS ON INSERT TO "my_table"
WHERE EXISTS(SELECT 1 FROM my_table
WHERE (pk_col_1, pk_col_2)=(NEW.pk_col_1, NEW.pk_col_2))
DO INSTEAD NOTHING;
INSERT INTO my_table SELECT * FROM another_schema.my_table WHERE some_cond;
DROP RULE "my_table_on_duplicate_ignore" ON "my_table";
Note: The rule applies to all INSERT operations until the rule is dropped, so not quite ad hoc.
For those of you that have Postgres 9.5 or higher, the new ON CONFLICT DO NOTHING syntax should work:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT field_one, field_two, field_three
FROM source_table
ON CONFLICT (field_one) DO NOTHING;
For those of us who have an earlier version, this right join will work instead:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT source_table.field_one, source_table.field_two, source_table.field_three
FROM source_table
LEFT JOIN target_table ON source_table.field_one = target_table.field_one
WHERE target_table.field_one IS NULL;
Try to do an UPDATE. If it doesn't modify any row that means it didn't exist, so do an insert. Obviously, you do this inside a transaction.
You can of course wrap this in a function if you don't want to put the extra code on the client side. You also need a loop for the very rare race condition in that thinking.
There's an example of this in the documentation: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html, example 40-2 right at the bottom.
That's usually the easiest way. You can do some magic with rules, but it's likely going to be a lot messier. I'd recommend the wrap-in-function approach over that any day.
This works for single row, or few row, values. If you're dealing with large amounts of rows for example from a subquery, you're best of splitting it into two queries, one for INSERT and one for UPDATE (as an appropriate join/subselect of course - no need to write your main filter twice)
To get the insert ignore logic you can do something like below. I found simply inserting from a select statement of literal values worked best, then you can mask out the duplicate keys with a NOT EXISTS clause. To get the update on duplicate logic I suspect a pl/pgsql loop would be necessary.
INSERT INTO manager.vin_manufacturer
(SELECT * FROM( VALUES
('935',' Citroën Brazil','Citroën'),
('ABC', 'Toyota', 'Toyota'),
('ZOM',' OM','OM')
) as tmp (vin_manufacturer_id, manufacturer_desc, make_desc)
WHERE NOT EXISTS (
--ignore anything that has already been inserted
SELECT 1 FROM manager.vin_manufacturer m where m.vin_manufacturer_id = tmp.vin_manufacturer_id)
)
INSERT INTO mytable(col1,col2)
SELECT 'val1','val2'
WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE col1='val1')
As #hanmari mentioned in his comment. when inserting into a postgres tables, the on conflict (..) do nothing is the best code to use for not inserting duplicate data.:
query = "INSERT INTO db_table_name(column_name)
VALUES(%s) ON CONFLICT (column_name) DO NOTHING;"
The ON CONFLICT line of code will allow the insert statement to still insert rows of data. The query and values code is an example of inserted date from a Excel into a postgres db table.
I have constraints added to a postgres table I use to make sure the ID field is unique. Instead of running a delete on rows of data that is the same, I add a line of sql code that renumbers the ID column starting at 1.
Example:
q = 'ALTER id_column serial RESTART WITH 1'
If my data has an ID field, I do not use this as the primary ID/serial ID, I create a ID column and I set it to serial.
I hope this information is helpful to everyone.
*I have no college degree in software development/coding. Everything I know in coding, I study on my own.
Looks like PostgreSQL supports a schema object called a rule.
http://www.postgresql.org/docs/current/static/rules-update.html
You could create a rule ON INSERT for a given table, making it do NOTHING if a row exists with the given primary key value, or else making it do an UPDATE instead of the INSERT if a row exists with the given primary key value.
I haven't tried this myself, so I can't speak from experience or offer an example.
This solution avoids using rules:
BEGIN
INSERT INTO tableA (unique_column,c2,c3) VALUES (1,2,3);
EXCEPTION
WHEN unique_violation THEN
UPDATE tableA SET c2 = 2, c3 = 3 WHERE unique_column = 1;
END;
but it has a performance drawback (see PostgreSQL.org):
A block containing an EXCEPTION clause is significantly more expensive
to enter and exit than a block without one. Therefore, don't use
EXCEPTION without need.
On bulk, you can always delete the row before the insert. A deletion of a row that doesn't exist doesn't cause an error, so its safely skipped.
For data import scripts, to replace "IF NOT EXISTS", in a way, there's a slightly awkward formulation that nevertheless works:
DO
$do$
BEGIN
PERFORM id
FROM whatever_table;
IF NOT FOUND THEN
-- INSERT stuff
END IF;
END
$do$;