Is it possible to create a char column which is always lowercase? - postgresql

I want to create a table like:
create table project_types (
    id char(20) not null unique default 'xxx'
);
To use it from other tables as:
create table other_table (
    ...
    fk_ptype char(20),
    constraint fk_ptype_on_other_table foreign key (fk_ptype) references project_types(id)
);
The catch is that I'd want all values inserted into project_types to become lowercase automatically: I don't want to be making the conversion in every possible query; I want a table that, no matter what I throw at it, returns lowercase tokens.
I'm thinking about making a trigger on insert and on update, but I'm wondering if there's a better way to impose such a restriction. Also, this solution means I'll have to make the conversion on deletions.
For those who might suggest I do this with enums: the types are dynamic, so I prefer this approach.
UPDATE 2017.04.17: The idea of this question is not to put controls/transformations everywhere in the stack: if the database can handle whatever you throw at it, then you don't have to 1. check/transform in the front-end, 2. check/transform in the back-end code, and finally 3. check/transform in the database. You just avoid doing 1 and 2, because you know the database will handle whatever you throw at it and that you'll have correct data when you select from it.
I'm tempted to choose #herbert-pimentel's answer, but it seems the same approach cannot be used for delete (I tried setting an on-delete trigger using the same function, but it didn't work).

How about a trigger with BEFORE INSERT OR UPDATE to transform your data to lower case?
CREATE OR REPLACE FUNCTION public.fun_trg_lowercase()
RETURNS trigger AS
$BODY$
begin
    -- lower() is the built-in function; there is no lowercase() in PostgreSQL
    NEW.my_char_field := lower(NEW.my_char_field);
    RETURN NEW;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;

CREATE TRIGGER biu_lowercase_field
BEFORE INSERT OR UPDATE
ON mytable
FOR EACH ROW
EXECUTE PROCEDURE fun_trg_lowercase();
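A quick sanity check, assuming the trigger above is installed on mytable:

INSERT INTO mytable (my_char_field) VALUES ('MiXeD CaSe');
SELECT my_char_field FROM mytable;  -- returns 'mixed case'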

Check constraint:
create table project_types (
    id char(20) not null unique default 'xxx'
        check (id = lower(id))
);
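Note that, unlike the trigger, a CHECK rejects mixed-case input instead of converting it, so it only works if you can fix the writers. For example:

INSERT INTO project_types (id) VALUES ('MiXeD');  -- rejected: violates the check constraint
INSERT INTO project_types (id) VALUES ('mixed');  -- accepted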

You can use a special data type for this purpose, called CITEXT (= case-insensitive text). It is an additional supplied module that ships with standard PostgreSQL.
Citing the PostgreSQL documentation on CITEXT:
F.8.1. Rationale
The standard approach to doing case-insensitive matches in PostgreSQL has been to use the lower function when comparing values, for example
SELECT * FROM tab WHERE lower(col) = LOWER(?);
This works reasonably well, but has a number of drawbacks:
It makes your SQL statements verbose, and you always have to remember to use lower on both the column and the query value.
It won't use an index, unless you create a functional index using lower.
If you declare a column as UNIQUE or PRIMARY KEY, the implicitly generated index is case-sensitive. So it's useless for case-insensitive searches, and it won't enforce uniqueness case-insensitively.
The citext data type allows you to eliminate calls to lower in SQL queries, and allows a primary key to be case-insensitive. citext is locale-aware, just like text, which means that the matching of upper case and lower case characters is dependent on the rules of the database's LC_CTYPE setting. Again, this behavior is identical to the use of lower in queries. But because it's done transparently by the data type, you don't have to remember to do anything special in your queries.
So, in your specific case, you just would need to do:
One time:
CREATE EXTENSION citext;
CREATE TABLE project_types
(
    id citext PRIMARY KEY default 'xxx'
);
CREATE TABLE other_table
(
    /* ... */
    fk_ptype citext,
    constraint fk_ptype_on_other_table foreign key (fk_ptype) references project_types(id)
);
... and then do nothing to your queries: no extra constraints and no (apparently feared) triggers.
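One caveat worth spelling out: citext stores whatever case you insert and only compares case-insensitively, so you get back the original token, not a lowercased one. A quick demo against the table above:

INSERT INTO project_types (id) VALUES ('MyType');
SELECT * FROM project_types WHERE id = 'MYTYPE';   -- matches: citext compares case-insensitively
INSERT INTO project_types (id) VALUES ('MYTYPE');  -- fails: duplicate key, 'MyType' already exists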

Related

How can a relational database with foreign key constraints ingest data that may be in the wrong order?

The database is ingesting data from a stream, and all the rows needed to satisfy a foreign key constraint may be late or never arrive.
This could likely be accomplished by using another datastore, one without foreign key constraints, and then, when all the needed data is available, loading it into the database that has the FK constraints. However, this adds complexity and I'd like to avoid it.
We're working on a solution that creates "placeholder" rows to point the foreign key to. When the real data comes in, the placeholder is replaced with real values. Again, this adds complexity, but it's the best solution we've found so far.
How do people typically solve this problem?
Edit: Some sample data which might help explain the problem:
Let's say we have these tables:
CREATE TABLE "order" (  -- ORDER is a reserved word, so the table name must be quoted
    id INTEGER NOT NULL,
    order_number INTEGER,
    PRIMARY KEY (id),
    UNIQUE (order_number)
);
CREATE TABLE line_item (
    id INTEGER NOT NULL,
    order_number INTEGER REFERENCES "order" (order_number),
    PRIMARY KEY (id)
);
If I insert an order first, not a problem! But let's say I try:
INSERT INTO line_item (order_number) VALUES (123) before order 123 was inserted. This will fail the FK constraint, of course. But this might be the order in which I receive the data, since it's being read from a stream that collects this data from multiple sources.
Also, to address #philpxy's question: I didn't really find much on this. One thing that was mentioned was deferred constraints. That is a mechanism which waits to check foreign key constraints until the end of a transaction. I don't think that is possible in my case, however, since these insert statements will be run at random times, whenever the data is received.
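For reference, this is roughly what a deferred constraint would look like (a sketch: the constraint name is made up, and it assumes the FK is declared separately rather than inline as above). It only helps when the out-of-order statements share one transaction:

ALTER TABLE line_item
    ADD CONSTRAINT line_item_order_number_fkey
    FOREIGN KEY (order_number) REFERENCES "order" (order_number)
    DEFERRABLE INITIALLY DEFERRED;

BEGIN;
INSERT INTO line_item (id, order_number) VALUES (1, 123);  -- no FK error yet
INSERT INTO "order" (id, order_number) VALUES (1, 123);
COMMIT;  -- the FK is checked here and passes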
You have a business workflow problem, because line items of individual orders are coming in before the orders themselves have come in. One workaround, perhaps not ideal, would be to create a before insert trigger which checks, for every incoming insert to the line_item table, whether that order already exists in the order table. If not, then it will first insert the order record before trying the insert on line_item.
CREATE OR REPLACE FUNCTION "public"."fn_insert_order" () RETURNS trigger AS $$
BEGIN
    -- Insert a placeholder order unless one with this order_number already exists.
    INSERT INTO "order" (order_number)
    SELECT NEW.order_number
    WHERE NOT EXISTS (SELECT 1 FROM "order" WHERE order_number = NEW.order_number);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- trigger
CREATE TRIGGER "trigger_insert_order"
BEFORE INSERT ON line_item FOR EACH ROW
EXECUTE PROCEDURE fn_insert_order();
Note: I am assuming that the id column of the order table is in fact auto-increment, in which case Postgres would automatically assign a value to it when inserting as above. Most likely this is what you want, as having two id columns which both need to be manually assigned does not make much sense.
You could accomplish that with a BEFORE INSERT trigger on line_item.
In that trigger, you query order to check whether a matching row exists and, if not, you insert a dummy row.
That will allow the INSERT to succeed, at the cost of some performance.
To insert rows into order, use
INSERT INTO "order" ...
ON CONFLICT (order_number) DO UPDATE SET
    id = EXCLUDED.id;
Updating a primary key is problematic and may lead to conflicts. One way you could get around that is to use negative ids for artificially generated orders (assuming that the real ids are positive). If you have any references to that primary key, you'd have to define the constraint with ON UPDATE CASCADE.
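Combining both ideas, a placeholder trigger that assigns negative ids might look like this (a hypothetical sketch; the sequence, function, and trigger names are made up):

-- Placeholder orders draw negative ids from their own sequence,
-- so they never collide with real (positive) ids.
CREATE SEQUENCE placeholder_order_id_seq INCREMENT BY -1 START WITH -1;

CREATE OR REPLACE FUNCTION fn_placeholder_order() RETURNS trigger AS $$
BEGIN
    INSERT INTO "order" (id, order_number)
    SELECT nextval('placeholder_order_id_seq'), NEW.order_number
    WHERE NOT EXISTS (SELECT 1 FROM "order" WHERE order_number = NEW.order_number);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_placeholder_order
BEFORE INSERT ON line_item FOR EACH ROW
EXECUTE PROCEDURE fn_placeholder_order();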

Getting error for auto increment fields when inserting records without specifying columns

We're in process of converting over from SQL Server to Postgres. I have a scenario that I am trying to accommodate. It involves inserting records from one table into another, WITHOUT listing out all of the columns. I realize this is not recommended practice, but let's set that aside for now.
drop table if exists pk_test_table;
create table public.pk_test_table
(
    recordid SERIAL PRIMARY KEY NOT NULL,
    name text
);

-- example 1: works and will insert a record with an id of 1
insert into pk_test_table values (default, 'puppies');

-- example 2: fails
insert into pk_test_table
select first_name from person_test;
The error I receive in the second example:

    column "recordid" is of type integer but expression is of type character varying
    Hint: You will need to rewrite or cast the expression.
The DEFAULT keyword tells the database to grab the next value from the sequence.
Is there any way to utilize this keyword in the second example? Or some way to tell the database to ignore auto-incremented columns and just let them be populated as normal?
I would prefer to not use a subquery to grab the next "id".
This functionality works in SQL Server and hence the question.
Thanks in advance for your help!
If you can't list column names, you should instead use the DEFAULT keyword, as you've done in the simple insert example. But this won't work with an insert into ... select ....
For that, you need to invoke nextval. A subquery is not required, just:
insert into pk_test_table
select nextval('pk_test_table_recordid_seq'), first_name from person_test;
(A SERIAL column's implicit sequence is named table_column_seq, hence pk_test_table_recordid_seq here.)
You do need to know the sequence name. You could get that from information_schema based on the table name and inferring its primary key, using a function that takes just the table name as an argument. It'd be ugly, but it'd work. I don't think there's any way around needing to know the table name.
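That said, pg_get_serial_sequence can look the sequence name up from the table and column names, so it doesn't have to be hard-coded (you still need to know the table name):

insert into pk_test_table
select nextval(pg_get_serial_sequence('pk_test_table', 'recordid')), first_name
from person_test;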
Your insert puts the selected value into the first column (recordid), but you need it to go into the second one (name).
Therefore you should use the INSERT INTO table(field) VALUES(value) syntax.
Since you need to fetch the values from another table, you replace VALUES with the subquery:
insert into pk_test_table(name)
select first_name from person_test;
I hope it helps
I do it this way via a separate function, though I think I'm really getting around the issue at the table level, by giving the fields DEFAULT settings on a per-field basis.
CREATE SEQUENCE pk_test_table_id_seq;  -- the DEFAULT below assumes this sequence already exists

create table public.pk_test_table
(
    recordid integer NOT NULL DEFAULT nextval('pk_test_table_id_seq'),
    name text,
    field3 integer NOT NULL DEFAULT 64,
    null_field_if_not_set integer,
    CONSTRAINT pk_test_table_pkey PRIMARY KEY ("recordid")
);
With function:
CREATE OR REPLACE FUNCTION func_pk_test_table() RETURNS void AS
$BODY$
INSERT INTO pk_test_table (name)
SELECT first_name FROM person_test;
$BODY$
LANGUAGE sql VOLATILE;
Then just execute the function via SELECT func_pk_test_table();
Notice it doesn't have to specify all the fields, as long as the constraints allow it.

Manipulate rows automatically before the `INSERT` statement

I'm looking for a way to manipulate rows automatically before adding them to a table in postgreSQL. Say for instance we have the following table:
CREATE TABLE foo (
    id serial NOT NULL,
    value integer NOT NULL,
    CONSTRAINT "Foo_pkey" PRIMARY KEY (id),
    CONSTRAINT "Foo_value_check" CHECK (value >= 0)
);
Now one can insert rows:
INSERT INTO foo (id,value) VALUES ('0','2')
And when one enters:
INSERT INTO foo (id,value) VALUES ('1','-2')
An error will occur. Is it possible to define a "rewrite rule" that given the value column contains a value less than zero, zero is used (for instance)?
Yes, it is possible. One way is to use triggers. A trigger causes a procedure to be run on particular actions, which can allow you to modify the data to be inserted (amongst other things).
To set up a trigger, you first create a function that will perform the checks and modifications you want. The variable new in your function will be implicitly declared and contain the new row to be inserted / updated so you can check and modify the values before they reach the table.
You then specify that this function is to be called before insert or update on one or more tables.
Example:
CREATE FUNCTION validate_foo_row()
RETURNS TRIGGER AS $$
BEGIN
    IF NEW.value < 0 THEN
        NEW.value := 0;  -- clamp negative values to zero before they reach the table
    END IF;
    RETURN NEW;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER trig_validate_foo BEFORE INSERT ON foo
FOR EACH ROW EXECUTE PROCEDURE validate_foo_row();
The above simplistic example only triggers for inserts; you might want to have it trigger for updates as well.
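For instance, declaring the trigger like this instead would cover both cases:

CREATE TRIGGER trig_validate_foo BEFORE INSERT OR UPDATE ON foo
FOR EACH ROW EXECUTE PROCEDURE validate_foo_row();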
You can read more about triggers in the postgresql manual. They are powerful and are capable of a lot more than this simple example shows.

PostgreSQL: How to revalidate CHECKs

My database has the following structure:
CREATE TYPE instrument_type AS ENUM (
    'Stock',
    ...
    'Currency',
    ...
);

CREATE FUNCTION get_instrument_type(instrument_id bigint) RETURNS instrument_type
    LANGUAGE plpgsql STABLE RETURNS NULL ON NULL INPUT
    AS $$
BEGIN
    RETURN (SELECT instr_type FROM instruments WHERE id = instrument_id);
END
$$;

CREATE TABLE instruments (
    id bigserial PRIMARY KEY,
    instr_type instrument_type NOT NULL,
    ...
);

CREATE TABLE countries_currencies (
    ...
    curr bigint NOT NULL
        REFERENCES instruments (id)
        ON UPDATE CASCADE ON DELETE CASCADE
        CHECK (get_instrument_type(curr) = 'Currency'),
    ...
);
As you can see, I use one common table for instruments. There are a lot of foreign keys referencing that table. But some tables, like countries_currencies, require that the referenced item is a 'Currency'. Since I can't use subqueries in CHECK constraints, I have to use a function.
One day it could happen that some bad man changes instrument_type from 'Currency' to something else. If there is a row in countries_currencies referencing the modified instrument, the CHECK becomes invalid for that row. But CHECKs are applied to new rows only, not to already existing ones.
Is there any standard way to revalidate CHECKs? I want to run such a procedure as part of a general data integrity test.
P.S. I know I could write a trigger on the instruments table and forbid the change if something would break. But that requires making sure I cover all referencing tables and their constraints, so it is error prone anyway.
You could simply update all rows in place to trigger the CHECK:
UPDATE countries_currencies SET curr = curr;
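If you just want to test integrity without leaving any side effects, you could wrap that in a transaction and roll it back (note that any UPDATE triggers on the table would still fire):

BEGIN;
UPDATE countries_currencies SET curr = curr;  -- re-evaluates the CHECK for every row; errors out on the first violation
ROLLBACK;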

PostgreSQL ON INSERT CASCADE

I've got two tables: one is Product and one is ProductSearchResult.
Whenever someone tries to insert a SearchResult with a product that is not listed in the Product table, the foreign key constraint is violated, hence I get an error.
I would like to know how I could get my database to automatically create that missing product in the Product table (just the ProductID; all other attributes can be left blank).
Is there such a thing as CASCADE ON INSERT? If there is, I was not able to get it working.
Rules are executed after the insert, so because we get an error beforehand they are useless if you use a DO ALSO. If you use DO INSTEAD and add the INSERT command at the end, you end up with endless recursion.
I reckon a trigger is the way to go, but all my attempts to write one failed.
Any recommendations?
The Table Structure:
CREATE TABLE Product (
    ID char(10) PRIMARY KEY,
    Title varchar(150),
    Manufacturer varchar(80),
    Category smallint,
    FOREIGN KEY (Category) REFERENCES Category(ID) ON DELETE CASCADE);

CREATE TABLE ProductSearchResult (
    SearchTermID smallint NOT NULL,
    ProductID char(10) NOT NULL,
    DateFirstListed date NOT NULL DEFAULT current_date,
    DateLastListed date NOT NULL DEFAULT current_date,
    PRIMARY KEY (SearchTermID, ProductID),
    FOREIGN KEY (SearchTermID) REFERENCES SearchTerm(ID) ON DELETE CASCADE,
    FOREIGN KEY (ProductID) REFERENCES Product ON DELETE CASCADE);
Yes, triggers are the way to go. But before you can start to use triggers in plpgsql, you have to enable the language. As user postgres, run the command createlang with the proper parameters.
Once you've done that, you have to:
write a function in plpgsql, and
create a trigger to invoke that function.
See example 39-3 for a basic example.
Note that a function body in Postgres is a string, with a special quoting mechanism: 2 dollar signs with an optional word in between them, as the quotes. (The word allows you to quote other similar quotes.)
Also note that you can reuse a trigger procedure for multiple tables, as long as they have the columns your procedure uses.
So the function has to:
check whether the value of NEW.ProductID already exists in the Product table, with a select statement (you ought to be able to use SELECT count(*) ... INTO someint, or SELECT EXISTS(...) INTO somebool), and
if not, insert a new row into the Product table; a sketch follows below.
If you still get stuck, come back here.
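A minimal sketch of what that could look like, using the table names from the question (the function and trigger names are made up, and the new Product row gets only its ID):

CREATE OR REPLACE FUNCTION fn_insert_missing_product() RETURNS trigger AS $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM Product WHERE ID = NEW.ProductID) THEN
        INSERT INTO Product (ID) VALUES (NEW.ProductID);  -- all other attributes left blank
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_insert_missing_product
BEFORE INSERT ON ProductSearchResult
FOR EACH ROW EXECUTE PROCEDURE fn_insert_missing_product();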
In any case (rules OR triggers), the insert needs to create a new key (and new values for the attributes) in the Product table. In most cases, this implies that a surrogate primary key (serial/sequence) should be used in the Product table, and that the "real world" product_id ("product number") should default to NULL and be demoted to a candidate key.
BTW: a rule can be used; rules are just tricky to implement correctly for N:1 relations (they need the same kind of EXISTS logic as in Bart's answer above).
Maybe cascading on INSERT is not such a good idea after all. What do you want to happen if someone inserts a ProductSearchResult record for a non-existing product? [IMO an FK is always a domain; you cannot extend a domain just by referring to a non-existent value of it; that would make the FK constraint meaningless.]