Does tsvector work with citext? - postgresql

I have a citext column named email and a tsvector column named search for full-text search. My trigger is like this:
tsvector_update_trigger(search_vector, 'pg_catalog.english', name, email)
name is just a normal text column. However, because email is citext (which is text, just case-insensitive), it appears that the trigger will not work: I get an error saying email is not of "character type". I am wondering why Postgres has difficulty treating citext as just text, or casting it to text, and going about its business of tokenizing it.
How can email remain citext and still be full-text searchable?

Why not just typecast the field to text and life is good? email::text should do. Full-text search lowercases tokens anyway, so citext's case-insensitivity is not lost.
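Since the built-in tsvector_update_trigger only accepts columns of a character type (hence the error in the question), the cast has to happen in a hand-written trigger function. A minimal sketch, assuming the columns are named name, email (citext), and search (tsvector); the function name and the table name my_table are placeholders:
CREATE FUNCTION emails_search_update() RETURNS trigger AS $$
BEGIN
  -- cast citext to text so to_tsvector accepts it
  NEW.search := to_tsvector('pg_catalog.english',
                            coalesce(NEW.name, '') || ' ' || coalesce(NEW.email::text, ''));
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER search_update BEFORE INSERT OR UPDATE
  ON my_table FOR EACH ROW EXECUTE PROCEDURE emails_search_update();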


What should be the data type of field 'email' in Postgresql database in pgadmin 4?

You can see that I am getting 'No results found' when searching for varchar in pgAdmin 4's data type list.
I need to know the data type that I should select for 'email' in a PostgreSQL database.
In the past I used text, varchar, or character varying.
Apart from using VARCHAR (as suggested by @Maria), you might get some insight from this link:
https://www.dbrnd.com/2018/04/postgresql-how-to-validate-the-email-address-column/
and from this https://dba.stackexchange.com/questions/68266/what-is-the-best-way-to-store-an-email-address-in-postgresql
If you read some parts of them, you'll see they created their own functions and constraints, which will likely help you understand PostgreSQL better.
TL;DR, in case the links change in the future (shamelessly taken from one of the links):
CREATE EXTENSION citext;
CREATE DOMAIN domain_email AS citext
CHECK(
    VALUE ~ '^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$'
);
-- for valid samples
SELECT 'some_email@gmail.com'::domain_email;
SELECT 'accountant@dbrnd.org'::domain_email;
-- for an invalid sample
SELECT 'dba@aol.info'::domain_email;
As Neil pointed out, yeah, it's just like using custom types.
CREATE DOMAIN creates a new domain. A domain is essentially a data type with optional constraints (restrictions on the allowed set of values).
source
For those of you unfamiliar with the weird characters used to check the value, it's a regex pattern.
And an example used with a table:
CREATE TABLE sample_table ( id SERIAL PRIMARY KEY, email domain_email );
-- The following is invalid, because ".info" has 4 characters
-- and the regex pattern only allows 2-3 characters
INSERT INTO sample_table (email) VALUES ('sample_email@gmail.info');
ERROR: value for domain domain_email violates check constraint "domain_email_check"
-- The following query is valid
INSERT INTO sample_table (email) VALUES ('sample_email@gmail.com');
SELECT * FROM sample_table;
 id |         email
----+------------------------
  1 | sample_email@gmail.com
(1 row)
Thanks Neil for the suggestion.
I recommend you use the CITEXT type, which ignores case when comparing values. That's important for email columns to prevent duplicates like username@example.com and UserName@example.com.
This type is part of the citext extension, which can be enabled with the following statement:
CREATE EXTENSION citext;
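A minimal sketch of what that buys you (the table and column names are just for illustration):
CREATE TABLE users (
    id    serial PRIMARY KEY,
    email citext UNIQUE NOT NULL
);
INSERT INTO users (email) VALUES ('username@example.com');
-- fails with a unique violation, because citext compares case-insensitively
INSERT INTO users (email) VALUES ('UserName@example.com');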

Golang and Postgresql CREATE TABLE giving me problems

I am using and following the documentation:
https://godoc.org/github.com/lib/pq
but after hours and hours of research online I can't find any good example of passing variables to db.Exec().
I'm building a program that will create new tables depending on names entered as command-line arguments.
db.Exec(`CREATE TABLE $1(
ID INT PRIMARY KEY NOT NULL,
HOST TEXT NOT NULL,
PORTS TEXT,
BANNERS TEXT,
JAVASCRIPT TEXT,
HEADERS TEXT,
COMMENTS TEXT,
ROBOTS TEXT,
EMAILS TEXT,
CMS TEXT,
URLS TEXT,
BUSTIN TEXT,
VULN TEXT
)`, tablename)
But no luck. I have obviously tried to change things around; I even tried building the CREATE TABLE statement in a string and passing that to db.Exec(string), but no luck either...
Can someone give me a hand?
Thanks
You can check at https://golang.org/src/database/sql/sql.go?s=39599:39668#L1437, around line 1478, that SQL statements are first prepared and then executed.
In PostgreSQL, PREPARE is only valid for SELECT, INSERT, UPDATE, DELETE, or VALUES statements: https://www.postgresql.org/docs/10/static/sql-prepare.html .
Here you can use Go's fmt.Sprintf to support creating different tables, and check the table name manually. SQL table names can contain many special characters, but you can narrow that down; my validation is regexp.MustCompile("^[a-zA-Z_]+[0-9a-zA-Z_]*$").
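A minimal sketch of that approach (the helper name createTable, the shortened column list, and the connection string are illustrative, not taken from the question):
package main

import (
	"database/sql"
	"fmt"
	"regexp"

	_ "github.com/lib/pq"
)

// Only allow plain identifiers so the name can be spliced into the DDL safely.
var validTable = regexp.MustCompile("^[a-zA-Z_]+[0-9a-zA-Z_]*$")

func createTable(db *sql.DB, tablename string) error {
	if !validTable.MatchString(tablename) {
		return fmt.Errorf("invalid table name: %q", tablename)
	}
	// Identifiers cannot be bound as $1 parameters, so build the statement text.
	query := fmt.Sprintf(`CREATE TABLE %s (
		id   INT PRIMARY KEY NOT NULL,
		host TEXT NOT NULL,
		ports TEXT
	)`, tablename)
	_, err := db.Exec(query)
	return err
}

func main() {
	db, err := sql.Open("postgres", "dbname=test sslmode=disable") // connection string is an assumption
	if err != nil {
		panic(err)
	}
	defer db.Close()
	if err := createTable(db, "scan_results"); err != nil {
		panic(err)
	}
}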

Postgresql regular expression in type rather than check constraint

This question is loosely based on How can I create a constraint to check if an email is valid in postgres?
I know I can use a string type and constrain it via a check constraint:
CREATE TABLE emails (
    email varchar
    CONSTRAINT proper_email CHECK (email ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$')
);
However, I'd like to be able to create a custom type so that the syntax would be the following
create table emails (
email email_address
);
I would have thought that CREATE TYPE would be of use here but since this is not a composite, range nor enum type, I'm not sure how I'd approach it.
For the record, this is because I have multiple tables, all with the same check constraint. I'd like to tweak the constraint in one spot (via a type, perhaps) rather than go through all the tables one by one. I think it could also make the table definitions look a lot nicer (my case isn't actually emails, but the solution is directly applicable if it were solved for an "email_address" type).
The documentation says you can autobox a string to a certain type using an input and output function. Perhaps if I raise an exception upon receipt of an invalid cstring it could be made to work that way, but it seems like a sledgehammer especially considering I do still want it to be a string after all; just a little syntactic sugar/de-duplication.
Use a domain.
create domain email_address as text
    check (value ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$');
Examples:
select 'abc'::email_address;
ERROR: value for domain email_address violates check constraint "email_address_check"
select 'abc@mail.com'::email_address;
 email_address
---------------
 abc@mail.com
(1 row)
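Since the goal is to tweak the constraint in one spot, note that a domain's check can later be swapped out with ALTER DOMAIN and the change applies everywhere the domain is used. A sketch (the replacement pattern is just an example; email_address_check is the auto-generated constraint name seen in the error above):
alter domain email_address drop constraint email_address_check;
alter domain email_address add constraint email_address_check
    check (value ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+[.][A-Za-z]{2,}$');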

How to use COLLATE Latin1_General_CI_AI with Full text search column

How do I use 'COLLATE Latin1_General_CI_AI' with a full-text search column in SQL Server 2008 R2? The select query should show all employees with the first name Andres. It should also include names with accents.
The Accent Sensitivity option is ON for the full-text catalog; it works if I use the LIKE operator but doesn't work with CONTAINS.
select firstName from Employees
where firstName COLLATE Latin1_General_CI_AI like '%Andres%'
Results
Andres
André
Full Text Search
select firstName from Employees
where contains( FirstName , 'Andres')
Results
Andres
I have tried to alter the table and change the column to COLLATE Latin1_General_CI_AI, but with no success. (I had to drop the column from the catalog first, then alter the column, then rebuild the catalog.)
ALTER TABLE Employees ALTER COLUMN firstname NVARCHAR(50) COLLATE Latin1_General_CI_AI
You need to turn off accent sensitivity. Unfortunately that is what takes precedence in full-text searches. Collation is not used as you would expect.
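A sketch of how the catalog might be rebuilt accent-insensitive (the catalog name ftCatalog is an assumption):
ALTER FULLTEXT CATALOG ftCatalog REBUILD WITH ACCENT_SENSITIVITY = OFF;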

Using TSQL, CAST() with COLLATE is non-deterministic. How to make it deterministic? What is the work-around?

I have a function that includes:
SELECT @pString = CAST(@pString AS VARCHAR(255)) COLLATE SQL_Latin1_General_Cp1251_CS_AS
This is useful, for example, to remove accents in french; for example:
UPPER(CAST('Éléctricité' AS VARCHAR(255)) COLLATE SQL_Latin1_General_Cp1251_CS_AS)
gives ELECTRICITE.
But using COLLATE makes the function non-deterministic and therefore I cannot use it as a computed persisted value in a column.
Q1. Is there another (quick and easy) way to remove accents like this, with a deterministic function?
Q2. (Bonus Question) The reason I do this computed persisted column is to search. For example the user may enter the customer's last name as either 'Gagne' or 'Gagné' or 'GAGNE' or 'GAGNÉ' and the app will find it using the persisted computed column. Is there a better way to do this?
EDIT: Using SQL Server 2012 and SQL-Azure.
You will find that it is in fact deterministic; it just has different behavior depending on the character you're trying to collate.
Check the page for the Windows-1251 encoding for its behavior on accepted and unacceptable characters.
Here is a collation chart for Cyrillic_General_CI_AI. This is code page 1251, Case Insensitive and Accent Insensitive. It will show you the mappings for all acceptable characters within this collation.
As for the search question, as Keith said, I would investigate putting a full-text index on the column you are going to be searching on.
The best answer I got was from Sebastian Sajaroff. I used his example to fix the issue. He suggested a VIEW with a UNIQUE INDEX. This gives a good idea of the solution:
create table Test(Id int primary key, Name varchar(20))
create view TestCIAI with schemabinding as
select ID, Name collate SQL_Latin1_General_CP1_CI_AI as NameCIAI from Test
create unique clustered index ix_Unique on TestCIAI (Id)
create unique nonclustered index ix_DistinctNames on TestCIAI (NameCIAI)
insert into Test values (1, 'Sébastien')
--Insertion 2 will fail because of the unique nonclustered indexed on the view
--(which is case-insensitive, accent-insensitive)
insert into Test values (2, 'Sebastien')
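A sketch of how the case- and accent-insensitive lookup from the bonus question could then go through the view (the literal is just an example value):
-- matches 'Sébastien', 'SEBASTIEN', etc., because the view's column
-- carries the CI_AI collation
select Id, NameCIAI
from TestCIAI
where NameCIAI = 'sebastien'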