UNACCENT when checking for UNIQUE constraint violations in PostgreSQL

We have a UNIQUE constraint on a table to prevent our city_name and state_id combinations from being duplicated. The problem we have found is that accents circumvent this.
Example:
"Montréal" "Quebec"
and
"Montreal" "Quebec"
We need a way to have the unique constraint run UNACCENT() and preferably wrap it in LOWER() as well for good measure. Is this possible?

You can create an immutable version of unaccent:
CREATE FUNCTION noaccent(text) RETURNS text
LANGUAGE sql IMMUTABLE STRICT AS
'SELECT unaccent(lower($1))';
and use that in a unique index on the column.
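For example, a sketch assuming the unaccent extension is installed and the table is called cities (the table name is an assumption; the columns are from the question):
-- unique per unaccented, lower-cased city name within a state
CREATE UNIQUE INDEX cities_city_name_state_id_uniq
ON cities (noaccent(city_name), state_id);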
An alternative is to use a BEFORE INSERT OR UPDATE trigger that fills a new column with the unaccented value and put a unique constraint on that column.
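A minimal sketch of that alternative, assuming the same hypothetical cities table and a shadow column city_name_norm:
ALTER TABLE cities ADD COLUMN city_name_norm text;
CREATE FUNCTION cities_fill_norm_name() RETURNS trigger
LANGUAGE plpgsql AS
$BODY$
begin
  -- keep the shadow column in sync on every insert and update
  NEW.city_name_norm := unaccent(lower(NEW.city_name));
  RETURN NEW;
end;
$BODY$;
CREATE TRIGGER trg_cities_fill_norm_name
BEFORE INSERT OR UPDATE ON cities
FOR EACH ROW
EXECUTE PROCEDURE cities_fill_norm_name();
ALTER TABLE cities
ADD CONSTRAINT cities_norm_name_state_uniq UNIQUE (city_name_norm, state_id);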

You can create unique indexes on expressions, see the Postgres manual:
https://www.postgresql.org/docs/9.3/indexes-expressional.html
So in your case it could be something like
CREATE UNIQUE INDEX idx_foo ON my_table ( UNACCENT(LOWER(city_name)), state_id );
Note, however, that unaccent() is not marked IMMUTABLE, so PostgreSQL will reject it in an index expression; in practice you need an immutable wrapper such as the noaccent() function from the answer above.

Related

Is it possible to create a char column which is always lowercase?

I want to create a table like:
create table project_types (
id char(20) not null unique default 'xxx'
);
To use it from other tables as:
create table other_table (
...
fk_ptype char(20),
fk_ptype_on_other_table" foreign key (fk_ptype) references project_type(id)
);
The catch is I'd want all values inserted into project_types to become automatically lowercase: I don't want to be making the conversion on each possible query, I want a table that no matter what I throw at it, it returns back lowercase tokens.
I'm thinking about making a trigger on insert and on update, but I'm wondering if there's a better way to impose such a restriction. Also, this solution means I'll have to make the conversion on deletions.
For the ones that may suggest that I do this with enums: the types are dynamic, so I prefer this approach.
UPDATE 2017.04.17: The idea of this question is not to put controls/transformations everywhere in the stack: if database can handle whatever you throw at it, then you don't have to 1. check/transform in the front-end, 2. check/transform in the back-end code, and finally 3. check/transform in the database. You just avoid doing 1 and 2 because you know database will handle whatever you throw at it and that you'll have correct data when you select from it.
I'm tempted to choose herbert-pimentel's answer, but it seems the same approach cannot be used for delete (I tried setting an on-delete trigger using the same function but it didn't work).
How about a trigger with BEFORE INSERT OR UPDATE to ensure/transform your data to lower case:
CREATE OR REPLACE FUNCTION public.fun_trg_lowercase()
RETURNS trigger AS
$BODY$
begin
NEW.my_char_field := lower(NEW.my_char_field);
RETURN NEW;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;
CREATE TRIGGER biu_lowercase_field
BEFORE INSERT OR UPDATE
ON mytable
FOR EACH ROW
EXECUTE PROCEDURE fun_trg_lowercase();
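For example, with a text column my_char_field on mytable (the placeholder names from the snippet above), any mixed-case value is stored lower-cased:
INSERT INTO mytable (my_char_field) VALUES ('MiXeD CaSe');
SELECT my_char_field FROM mytable;
-- returns 'mixed case'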
Check constraint:
create table project_types (
id char(20) not null unique default 'xxx'
check (id = lower(id))
);
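With that check constraint in place, mixed-case values are rejected at insert time rather than converted, for example:
insert into project_types (id) values ('web');  -- accepted
insert into project_types (id) values ('Web');  -- violates the check constraint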
You can use a special data type for this purpose, called CITEXT (= case-insensitive text). It is an additional supplied module that ships with standard PostgreSQL.
Citing the PostgreSQL documentation on CITEXT:
F.8.1. Rationale
The standard approach to doing case-insensitive matches in PostgreSQL has been to use the lower function when comparing values, for example
SELECT * FROM tab WHERE lower(col) = LOWER(?);
This works reasonably well, but has a number of drawbacks:
It makes your SQL statements verbose, and you always have to remember to use lower on both the column and the query value.
It won't use an index, unless you create a functional index using lower.
If you declare a column as UNIQUE or PRIMARY KEY, the implicitly generated index is case-sensitive. So it's useless for case-insensitive searches, and it won't enforce uniqueness case-insensitively.
The citext data type allows you to eliminate calls to lower in SQL queries, and allows a primary key to be case-insensitive. citext is locale-aware, just like text, which means that the matching of upper case and lower case characters is dependent on the rules of the database's LC_CTYPE setting. Again, this behavior is identical to the use of lower in queries. But because it's done transparently by the data type, you don't have to remember to do anything special in your queries.
So, in your specific case, you just would need to do:
One time:
CREATE EXTENSION citext ;
CREATE TABLE project_types
(
id citext PRIMARY KEY default 'xxx'
);
CREATE TABLE other_table
(
/* ... */
fk_ptype citext,
CONSTRAINT fk_ptype_on_other_table FOREIGN KEY (fk_ptype) REFERENCES project_types (id)
);
... and then do nothing else to your queries: no extra constraints and no (apparently feared) triggers.
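For example, comparisons against the citext column ignore case both for lookups and for the uniqueness check:
INSERT INTO project_types (id) VALUES ('Web');
SELECT * FROM project_types WHERE id = 'WEB';   -- finds the 'Web' row
INSERT INTO project_types (id) VALUES ('WEB');  -- rejected: duplicate primary key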

PostgreSQL: autoincrement for varchar type field

I'm switching from MongoDB to PostgreSQL and was wondering how I can implement the same concept as used in MongoDB for uniquely identifying each row by its MongoId.
After the migration, the already existing unique fields in our database are stored as a character type. I am looking for minimal source code changes.
So, is there any way in PostgreSQL to generate an auto-incrementing unique ID for each row inserted into a table?
The closest thing to MongoDB's ObjectId in PostgreSQL is the uuid type. Note that ObjectId has only 12 bytes, while UUIDs have 128 bits (16 bytes).
You can convert your existing IDs by appending (or prepending) e.g. '00000000' to them.
alter table some_table
alter id_column
type uuid
using (id_column || '00000000')::uuid;
It would be best if you could do this while migrating the schema + data. If you can't do it during the migration, you need to update your IDs (while they are still varchars, so the change can be carried over to the referencing columns), drop the foreign keys, do the ALTER TYPE, and then re-apply the foreign keys.
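A rough sketch of those steps, using a hypothetical child_table and constraint name (one workable ordering is to drop the foreign key up front so the rewritten values never conflict with it):
-- 1. drop the foreign key so the IDs can be rewritten freely
alter table child_table drop constraint child_table_some_table_fkey;
-- 2. pad the IDs on both sides while they are still varchars
update some_table  set id_column     = id_column     || '00000000';
update child_table set some_table_id = some_table_id || '00000000';
-- 3. change both columns to uuid and re-apply the foreign key
alter table some_table  alter id_column     type uuid using id_column::uuid;
alter table child_table alter some_table_id type uuid using some_table_id::uuid;
alter table child_table
add constraint child_table_some_table_fkey
foreign key (some_table_id) references some_table (id_column);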
You can generate various UUIDs (for default values of the column) with the uuid-ossp module.
create extension "uuid-ossp";
alter table some_table
alter id_column
set default uuid_generate_v4();
Use a sequence as a default for the column:
create sequence some_id_sequence
start with 100000
owned by some_table.id_column;
The start with value should be bigger than your current maximum number.
Then use that sequence as a default for your column:
alter table some_table
alter id_column set default nextval('some_id_sequence')::text;
The better solution would be to change the column to an integer column. Storing numbers in a text (or varchar) column is a really bad idea.
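If all the existing values are purely numeric, that conversion could look like this (a sketch):
alter table some_table
alter id_column type integer using id_column::integer;
alter table some_table
alter id_column set default nextval('some_id_sequence');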

Getting error for auto increment fields when inserting records without specifying columns

We're in process of converting over from SQL Server to Postgres. I have a scenario that I am trying to accommodate. It involves inserting records from one table into another, WITHOUT listing out all of the columns. I realize this is not recommended practice, but let's set that aside for now.
drop table if exists pk_test_table;
create table public.pk_test_table
(
recordid SERIAL PRIMARY KEY NOT NULL,
name text
);
--example 1: works and will insert a record with an id of 1
insert into pk_test_table values(default,'puppies');
--example 2: fails
insert into pk_test_table
select first_name from person_test;
Error I receive in the second example:
column "recordid" is of type integer but expression is of type
character varying Hint: You will need to rewrite or cast the
expression.
The default keyword will tell the database to grab the next value.
Is there any way to utilize this keyword in the second example? Or some way to tell the database to ignore auto-incremented columns and just let them be populated like normal?
I would prefer to not use a subquery to grab the next "id".
This functionality works in SQL Server and hence the question.
Thanks in advance for your help!
If you can't list column names, you should instead use the DEFAULT keyword, as you've done in the simple insert example. This won't work with an insert into ... select ....
For that, you need to invoke nextval. A subquery is not required, just:
insert into pk_test_table
select nextval('pk_test_table_recordid_seq'), first_name from person_test;
You do need to know the sequence name. You could get that from information_schema based on the table name and inferring its primary key, using a function that takes just the table name as an argument. It'd be ugly, but it'd work. I don't think there's any way around needing to know the table name.
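If you'd rather not hard-code the sequence name, pg_get_serial_sequence can resolve it from the table and column names (you still need the column name, so this is only a partial workaround):
insert into pk_test_table
select nextval(pg_get_serial_sequence('pk_test_table', 'recordid')), first_name
from person_test;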
You're inserting the value into the first column, but you need it to go into the second column.
Therefore you can use the INSERT INTO table(column) VALUES(value) syntax.
Since you need to fetch the values from another table, you have to replace VALUES with the subquery:
insert into pk_test_table(name)
select first_name from person_test;
I hope it helps
I do it this way via a separate function, though I think I'm getting around the issue because the table defines DEFAULT settings on a per-field basis.
create table public.pk_test_table
(
recordid integer NOT NULL DEFAULT nextval('pk_test_table_id_seq'),
name text,
field3 integer NOT NULL DEFAULT 64,
null_field_if_not_set integer,
CONSTRAINT pk_test_table_pkey PRIMARY KEY ("recordid")
);
With function:
CREATE OR REPLACE FUNCTION func_pk_test_table() RETURNS void AS
$BODY$
INSERT INTO pk_test_table (name)
SELECT first_name FROM person_test;
$BODY$
LANGUAGE sql VOLATILE;
Then just execute the function with SELECT func_pk_test_table();
Notice it doesn't have to specify all the fields, as long as the constraints allow it.

Using TSQL, CAST() with COLLATE is non-deterministic. How to make it deterministic? What is the work-around?

I have a function that includes:
SELECT @pString = CAST(@pString AS VARCHAR(255)) COLLATE SQL_Latin1_General_Cp1251_CS_AS
This is useful, for example, to remove accents in french; for example:
UPPER(CAST('Éléctricité' AS VARCHAR(255)) COLLATE SQL_Latin1_General_Cp1251_CS_AS)
gives ELECTRICITE.
But using COLLATE makes the function non-deterministic and therefore I cannot use it as a computed persisted value in a column.
Q1. Is there another (quick and easy) way to remove accents like this, with a deterministic function?
Q2. (Bonus Question) The reason I do this computed persisted column is to search. For example the user may enter the customer's last name as either 'Gagne' or 'Gagné' or 'GAGNE' or 'GAGNÉ' and the app will find it using the persisted computed column. Is there a better way to do this?
EDIT: Using SQL Server 2012 and SQL-Azure.
You will find that it is in fact deterministic; it just behaves differently depending on the character you're trying to collate.
Check the page for the Windows-1251 encoding for its behavior on accepted and unacceptable characters.
Here is a collation chart for Cyrillic_General_CI_AI. This is codepage 1251 Case Insensitive and Accent Insensitive. This will show you the mappings for all acceptable characters within this collation.
As for the search question, as Keith said, I would investigate putting a full text index on the column you are going to be searching on.
The best answer I got was from Sebastian Sajaroff. I used his example to fix the issue. He suggested a VIEW with a UNIQUE INDEX. This gives a good idea of the solution:
create table Test(Id int primary key, Name varchar(20))
create view TestCIAI with schemabinding as
select ID, Name collate SQL_Latin1_General_CP1_CI_AI as NameCIAI from Test
create unique clustered index ix_Unique on TestCIAI (Id)
create unique nonclustered index ix_DistinctNames on TestCIAI (NameCIAI)
insert into Test values (1, 'Sébastien')
--Insertion 2 will fail because of the unique nonclustered indexed on the view
--(which is case-insensitive, accent-insensitive)
insert into Test values (2, 'Sebastien')

How to create a case insensitive index or constraint in HSQL

How can I create a case insensitive index or constraint in HSQL running in PostgreSQL (;sql.syntax_pgs=true) mode?
In Postgres, it can be done with lower() or lcase():
CREATE UNIQUE INDEX lower_username_index ON enduser_table ((lcase(name)));
PostgreSQL also has the CITEXT datatype, but unfortunately it does not seem to be supported in HSQL.
I'm currently at HSQL 2.2.8 and PostgreSQL 9.0.5. Alternatively, are there other in-memory databases that might be a better fit for testing PostgreSQL DDL and SQL?
Thanks in advance!
With HSQLDB, it's better to define a UNIQUE constraint, rather than a unique index.
There are two ways of achieving your aim:
Change the type of the column to VARCHAR_IGNORECASE, then use ALTER TABLE enduser_table ADD CONSTRAINT CONST_1 UNIQUE(name)
Alternatively, create a generated column, then create the UNIQUE constraint on that column: ALTER TABLE enduser_table ADD COLUMN lc_name VARCHAR(1000) GENERATED ALWAYS AS (LCASE(name))
With both methods, duplicate values in the NAME column are rejected. With the first method, the index is used for searches on the NAME column. With the second method, the index is used for searches on the lc_name column.
(UPDATE) If you want to use the PostgreSQL CITEXT type, define the type in HSQLDB, then use the first alternative.
CREATE TYPE CITEXT AS VARCHAR_IGNORECASE(2000)
CREATE TABLE enduser_table (name CITEXT, ...
ALTER TABLE enduser_table ADD CONSTRAINT CONST_1 UNIQUE(name)