Include stop words on pre-defined phrases in Postgres tsvector - postgresql
I have built a search engine using Postgres that is working pretty well. I have used hunspell dictionaries for the main languages I support; this is how I set them up:
CREATE EXTENSION IF NOT EXISTS unaccent WITH SCHEMA public;
CREATE TEXT SEARCH CONFIGURATION english_unaccent_hunspell (
COPY = english_hunspell
);
ALTER TEXT SEARCH CONFIGURATION english_unaccent_hunspell
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part WITH unaccent,
english_hunspell,
english_stem;
CREATE TEXT SEARCH CONFIGURATION portuguese_brazil_unaccent_hunspell (
COPY = portuguese_brazil_hunspell
);
ALTER TEXT SEARCH CONFIGURATION portuguese_brazil_unaccent_hunspell
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part WITH unaccent,
portuguese_brazil_hunspell,
portuguese_stem;
CREATE TEXT SEARCH CONFIGURATION spanish_unaccent_hunspell (
COPY = spanish_hunspell
);
ALTER TEXT SEARCH CONFIGURATION spanish_unaccent_hunspell
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part WITH unaccent,
spanish_hunspell,
spanish_stem;
CREATE TEXT SEARCH CONFIGURATION italian_unaccent_hunspell (
COPY = italian_hunspell
);
ALTER TEXT SEARCH CONFIGURATION italian_unaccent_hunspell
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part WITH unaccent,
italian_hunspell,
italian_stem;
CREATE TEXT SEARCH CONFIGURATION russian_unaccent_hunspell (
COPY = russian_hunspell
);
ALTER TEXT SEARCH CONFIGURATION russian_unaccent_hunspell
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part WITH unaccent,
russian_hunspell,
russian_stem;
CREATE TEXT SEARCH CONFIGURATION french_unaccent_hunspell (
COPY = french_hunspell
);
ALTER TEXT SEARCH CONFIGURATION french_unaccent_hunspell
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part WITH unaccent,
french_hunspell,
french_stem;
CREATE TEXT SEARCH CONFIGURATION german_unaccent_hunspell (
COPY = german_hunspell
);
ALTER TEXT SEARCH CONFIGURATION german_unaccent_hunspell
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part WITH unaccent,
german_hunspell,
german_stem;
ALTER TABLE "earliest_search_indices"
ADD COLUMN "documentFts" tsvector;
ALTER TABLE "latest_search_indices"
ADD COLUMN "documentFts" tsvector;
UPDATE
"earliest_search_indices"
SET
"documentFts" = (setweight(to_tsvector('english_unaccent_hunspell', coalesce(title,'')), 'A') || setweight(to_tsvector('english_unaccent_hunspell', coalesce("directoryDescription",'')), 'B') || setweight(to_tsvector('english_unaccent_hunspell', coalesce(body,'')), 'C'))
WHERE
"language" = 'english';
UPDATE
"earliest_search_indices"
SET
"documentFts" = (setweight(to_tsvector('portuguese_brazil_unaccent_hunspell', coalesce(title,'')), 'A') || setweight(to_tsvector('portuguese_brazil_unaccent_hunspell', coalesce("directoryDescription",'')), 'B') || setweight(to_tsvector('portuguese_brazil_unaccent_hunspell', coalesce(body,'')), 'C'))
WHERE
"language" = 'portuguese';
UPDATE
"earliest_search_indices"
SET
"documentFts" = (setweight(to_tsvector('spanish_unaccent_hunspell', coalesce(title,'')), 'A') || setweight(to_tsvector('spanish_unaccent_hunspell', coalesce("directoryDescription",'')), 'B') || setweight(to_tsvector('spanish_unaccent_hunspell', coalesce(body,'')), 'C'))
WHERE
"language" = 'spanish';
UPDATE
"earliest_search_indices"
SET
"documentFts" = (setweight(to_tsvector('french_unaccent_hunspell', coalesce(title,'')), 'A') || setweight(to_tsvector('french_unaccent_hunspell', coalesce("directoryDescription",'')), 'B') || setweight(to_tsvector('french_unaccent_hunspell', coalesce(body,'')), 'C'))
WHERE
"language" = 'french';
UPDATE
"earliest_search_indices"
SET
"documentFts" = (setweight(to_tsvector('italian_unaccent_hunspell', coalesce(title,'')), 'A') || setweight(to_tsvector('italian_unaccent_hunspell', coalesce("directoryDescription",'')), 'B') || setweight(to_tsvector('italian_unaccent_hunspell', coalesce(body,'')), 'C'))
WHERE
"language" = 'italian';
UPDATE
"earliest_search_indices"
SET
"documentFts" = (setweight(to_tsvector('german_unaccent_hunspell', coalesce(title,'')), 'A') || setweight(to_tsvector('german_unaccent_hunspell', coalesce("directoryDescription",'')), 'B') || setweight(to_tsvector('german_unaccent_hunspell', coalesce(body,'')), 'C'))
WHERE
"language" = 'german';
UPDATE
"earliest_search_indices"
SET
"documentFts" = (setweight(to_tsvector('russian_unaccent_hunspell', coalesce(title,'')), 'A') || setweight(to_tsvector('russian_unaccent_hunspell', coalesce("directoryDescription",'')), 'B') || setweight(to_tsvector('russian_unaccent_hunspell', coalesce(body,'')), 'C'))
WHERE
"language" = 'russian';
CREATE INDEX entries_document_fts ON "earliest_search_indices" USING GIN ("documentFts");
The dictionaries I use live here:
https://github.com/ericmackrodt/hunspell_dicts
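For context, here is a simplified example of how this index gets queried (the English configuration and the id column are just for illustration; in practice the configuration matches the row's language):

SELECT id, title,
       ts_rank("documentFts", query) AS rank
FROM "earliest_search_indices",
     plainto_tsquery('english_unaccent_hunspell', 'doctor who') AS query
WHERE "documentFts" @@ query
  AND "language" = 'english'
ORDER BY rank DESC
LIMIT 20;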
That all works and behaves exactly how I want, but there are some issues caused by stop word elimination. For the most part it works great, but there are cases where keeping the stop words would be very relevant. Here are some examples:
The Sims - This results in a search for "sims" as the word "the" is eliminated.
Doctor Who - This results in a search for "doctor" as the word "who" is eliminated.
The Who - This results in a search for "" as both "the" and "who" are eliminated.
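These cases are easy to reproduce directly (shown here with the built-in english configuration for brevity; my hunspell configurations drop the same words):

SELECT to_tsvector('english', 'The Sims');   -- 'sim':2
SELECT to_tsvector('english', 'Doctor Who'); -- 'doctor':1
SELECT to_tsvector('english', 'The Who');    -- empty tsvector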
So my question is: how could I add those kinds of exceptions to my dictionaries? For example, if the word "who" is preceded by "doctor", index them together.
I don't mind having to add those exceptions by hand.
Thanks in advance.
You can change the list of stop words by configuring the needed dictionaries:
http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html
as answered in this SO answer.
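A sketch of what that looks like (the dictionary names and the DictFile/AffFile values here are illustrative; the custom .stop file has to live in $SHAREDIR/tsearch_data/):

-- english_custom.stop: start from english.stop and delete the words you want
-- to keep indexing (e.g. "the", "who").
CREATE TEXT SEARCH DICTIONARY english_hunspell_custom (
    TEMPLATE = ispell,
    DictFile = en_us,          -- whichever files you installed from the repo above
    AffFile = en_us,
    StopWords = english_custom
);
CREATE TEXT SEARCH DICTIONARY english_stem_custom (
    TEMPLATE = snowball,
    Language = english,
    StopWords = english_custom
);
ALTER TEXT SEARCH CONFIGURATION english_unaccent_hunspell
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH unaccent, english_hunspell_custom, english_stem_custom;

After changing the dictionaries you need to rebuild the documentFts column (re-run the UPDATE statements), since existing tsvectors are not recomputed automatically.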
Related
How to remove Firebird's triggers created automatically [duplicate]
I have a Firebird table like this:

CREATE TABLE events (
  event VARCHAR(6) NOT NULL CHECK (event IN ('deploy', 'revert', 'fail')),
  change_id CHAR(40) NOT NULL,
  change VARCHAR(512) NOT NULL
);

Now I need to add another value to the IN() list in the CHECK constraint. How do I do that? Things I've tried so far:

Updating the value in RDB$TRIGGERS.RDB$TRIGGER_SOURCE:

UPDATE RDB$TRIGGERS
SET RDB$TRIGGER_SOURCE = 'CHECK (event IN (''deploy'', ''revert'', ''fail'', ''merge''))'
WHERE RDB$TRIGGER_SOURCE = 'CHECK (event IN (''deploy'', ''revert'', ''fail''))';

Does not seem to work, as the trigger is compiled in RDB$TRIGGERS.RDB$TRIGGER_BLR.

Creating a new table with a new check, copying the data over, dropping the old table and renaming the new table. However, it seems that one cannot rename a Firebird table, so I can't make the new table have the same name as the old one.

I suspect updating RDB$TRIGGERS is the way to go (idk!), if only I could get Firebird to recompile the code. But maybe there's a better way?
You need to drop and then re-create the check constraint. As you didn't specify a name for your constraint, Firebird created one, so you first need to find that name:

select trim(cc.rdb$constraint_name), trg.rdb$trigger_source
from rdb$relation_constraints rc
join rdb$check_constraints cc on rc.rdb$constraint_name = cc.rdb$constraint_name
join rdb$triggers trg on cc.rdb$trigger_name = trg.rdb$trigger_name
where rc.rdb$relation_name = 'EVENTS'
  and rc.rdb$constraint_type = 'CHECK'
  and trg.rdb$trigger_type = 1;

I just added the trigger source for informational reasons. Once you have the name, you can drop it, e.g.

alter table events drop constraint integ_27;

and then add the new constraint:

alter table events add constraint check_event_type
  CHECK (event IN ('deploy', 'revert', 'fail', 'merge'));

In the future you won't need to look for the constraint name, because you already know it.
Here's how to do it dynamically:

SET AUTODDL OFF;
SET TERM ^;

EXECUTE BLOCK AS
  DECLARE trig VARCHAR(64);
BEGIN
  SELECT TRIM(cc.rdb$constraint_name)
  FROM rdb$relation_constraints rc
  JOIN rdb$check_constraints cc ON rc.rdb$constraint_name = cc.rdb$constraint_name
  JOIN rdb$triggers trg ON cc.rdb$trigger_name = trg.rdb$trigger_name
  WHERE rc.rdb$relation_name = 'EVENTS'
    AND rc.rdb$constraint_type = 'CHECK'
    AND trg.rdb$trigger_type = 1
  INTO trig;

  EXECUTE STATEMENT 'ALTER TABLE EVENTS DROP CONSTRAINT ' || trig;
END^

SET TERM ;^
COMMIT;

ALTER TABLE events ADD CONSTRAINT check_event_type
  CHECK ( event IN ('deploy', 'revert', 'fail', 'merge') );
COMMIT;

I had to disable AUTODDL and put in explicit commits, or else I got a deadlock on the ALTER TABLE ... ADD CONSTRAINT statement.
Here's how to do it dynamically:

EXECUTE BLOCK RETURNS (STMT VARCHAR(1000)) AS
BEGIN
  SELECT TRIM(R.RDB$CONSTRAINT_NAME)
  FROM RDB$RELATION_CONSTRAINTS R
  WHERE R.RDB$RELATION_NAME = 'TABLE_NAME'
    AND UPPER(R.RDB$CONSTRAINT_TYPE) = UPPER('PRIMARY KEY')
  INTO :STMT;

  IF (:STMT IS NOT NULL) THEN
  BEGIN
    EXECUTE STATEMENT 'ALTER TABLE TABLE_NAME DROP CONSTRAINT ' || :STMT || ';';
    EXECUTE STATEMENT 'ALTER TABLE TABLE_NAME ADD CONSTRAINT ' || :STMT || ' PRIMARY KEY (FIELD1, FIELD2, FIELD3);';
  END
  ELSE
  BEGIN
    EXECUTE STATEMENT 'ALTER TABLE TABLE_NAME ADD CONSTRAINT PK_PRIMARY_NAME PRIMARY KEY (FIELD1, FIELD2, FIELD3);';
  END
END;
How to change a default separator for postgresql arrays?
I want to import a csv with Postgres arrays into a Postgres table. This is my table:

create table dbo.countries (
  id char(2) primary key,
  name text not null,
  elements text[],
  CONSTRAINT const_dbo_countries_unique1 unique (id),
  CONSTRAINT const_dbo_countries_unique2 unique (name)
);

and I want to insert into it a csv which looks like this:

AC,ac,{xx yy}

When I type

copy dbo.mytable FROM '/home/file.csv' delimiter ',' csv;

then the array is read as one string: {"xx yy"}. How can I change the default separator for arrays from , to a space?
You cannot change the array's separator symbol. You can read the data into the table, and later run an update on it:

UPDATE dbo.countries
SET elements = string_to_array(elements[1], ' ')
WHERE strpos(elements[1], ' ') > 0;
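For reference, string_to_array splits on whatever delimiter you pass it:

SELECT string_to_array('xx yy', ' ');
-- {xx,yy}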
Why does postgresql recognize only numbers in full text search?
I am learning full text search in postgresql and I need to make an english dictionary for FTS. I made the dictionary mydict_en. I handle words with my dictionary and the other token types with the simple dictionary.

CREATE TEXT SEARCH DICTIONARY mydict_en (
  TEMPLATE = ispell,
  DictFile = english,
  AffFile = english,
  StopWords = english
);
CREATE TEXT SEARCH CONFIGURATION public.mydict_en (PARSER = default);
ALTER TEXT SEARCH CONFIGURATION mydict_en
  ADD MAPPING FOR email, url, url_path, host, file, version, sfloat, float, int, uint, numword, hword_numpart, numhword
  WITH simple;
ALTER TEXT SEARCH CONFIGURATION mydict_en
  ADD MAPPING FOR word, hword_part, hword
  WITH mydict_en;

My test table (I added an FTS field):

CREATE TABLE matches (
  id Serial NOT NULL,
  opponents Varchar(1024) NOT NULL,
  metaKeywords Varchar(2048),
  metaDescription Varchar(1024),
  score Varchar(100) NOT NULL,
  primary key (id)
);
ALTER TABLE matches ADD COLUMN fts tsvector;

When I insert data into this table, for example:

INSERT INTO matches (opponents, metaKeywords, metaDescription, score)
VALUES ('heat - thunder', 'nba, ball', 'Heat plays at home.', '99 - 85');

I update my fts field based on priority:

UPDATE matches SET fts =
  setweight( coalesce( to_tsvector('mydict_en', opponents),''),'A') ||
  setweight( coalesce( to_tsvector('mydict_en', metaKeywords),''),'B') ||
  setweight( coalesce( to_tsvector('mydict_en', metaDescription),''),'C') ||
  setweight( coalesce( to_tsvector('mydict_en', score),''),'D');

And my fts field contains this:

'85':2 '99':1

Why does it contain only numbers? Where are the words?
PostgreSQL - is there any way that I can find all of the foreign keys that reference a certain table?
I can find all of the foreign keys belonging to a certain table pretty well using the information_schema, but I can't figure out how to find the foreign keys from OTHER tables which reference a certain table. All I want to know is which rows from which tables in my database are referencing the primary key of one of my tables.
Is this what you're looking for?

SELECT * FROM pg_constraint WHERE confrelid = <oid of destination table>;

Or if you just want to see them interactively, they're shown in the output of \d <table name> in psql.
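For example, with a regclass cast so you don't have to look the oid up by hand (the table name here is just an example):

SELECT conrelid::regclass AS referencing_table, conname
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'mytable'::regclass;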
Let's make a few tables we can use for testing.

create table test (
  n integer primary key
);

-- There might be more than one schema.
create schema scratch;

create table scratch.a (
  test_n integer not null references test (n),
  incept_date date not null default current_date,
  primary key (test_n, incept_date)
);

create table b (
  test_n integer not null references test (n),
  incept_date date not null default current_date,
  primary key (test_n, incept_date)
);

-- The same table name can exist in different schemas.
create table scratch.b (
  test_n integer not null references test (n),
  incept_date date not null default current_date,
  primary key (test_n, incept_date)
);

I prefer to use information_schema views for this kind of stuff, because what I learn is portable to other database management systems. I usually leave concatenation up to application programs, but I think it's easier to understand the output here if I concatenate columns and give them aliases. The careful programmer will use the "full name" in all the joins: catalog (database), schema, and name.

select distinct
  KCU2.table_catalog || '.' || KCU2.table_schema || '.' || KCU2.table_name referenced_table,
  RC.constraint_catalog || '.' || RC.constraint_schema || '.' || RC.constraint_name full_constraint_name,
  KCU1.table_catalog || '.' || KCU1.table_schema || '.' || KCU1.table_name referencing_table
from information_schema.referential_constraints RC
inner join information_schema.key_column_usage KCU1
  on RC.constraint_catalog = KCU1.constraint_catalog
  and RC.constraint_schema = KCU1.constraint_schema
  and RC.constraint_name = KCU1.constraint_name
inner join information_schema.key_column_usage KCU2
  on RC.unique_constraint_catalog = KCU2.constraint_catalog
  and RC.unique_constraint_schema = KCU2.constraint_schema
  and RC.unique_constraint_name = KCU2.constraint_name
where KCU2.table_catalog = 'sandbox'
  and KCU2.table_schema = 'public'
  and KCU2.table_name = 'test'
order by referenced_table, referencing_table;

referenced_table      full_constraint_name             referencing_table
--
sandbox.public.test   sandbox.public.b_test_n_fkey     sandbox.public.b
sandbox.public.test   sandbox.scratch.a_test_n_fkey    sandbox.scratch.a
sandbox.public.test   sandbox.scratch.b_test_n_fkey    sandbox.scratch.b

I think that will get you started. A foreign key doesn't have to reference a primary key; it can reference any candidate key. This query tells you which tables have a foreign key to our test table, sandbox.public.test, which seems to be what you're looking for.
postgresql use column name value when quoted with single quotes
I'm trying to update an hstore key value with a column from another table. The syntax is as simple as:

SET misc = misc || ('domain' => temp.domain)

But I get an error, because everything in the parentheses should be quoted:

SET misc = misc || ('domain=>temp.domain')::hstore

But this inserts temp.domain as a literal string and not its value. How can I pass the value of temp.domain instead?
You can concatenate text with a subquery, and cast the result to type hstore.

create temp table temp (
  temp_id integer primary key,
  domain text
);
insert into temp values (1, 'wibble');

select ('domain => ' || (select domain from temp where temp_id = 1))::hstore as key_value
from temp;

key_value (hstore)
--
"domain"=>"wibble"

Updates would work in a similar way.
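A rough sketch of such an update, along the same lines (the target table and join column names here are assumptions, not from the question):

-- Illustrative only: "mytable" and "temp_id" are placeholder names.
UPDATE mytable m
SET misc = misc || ('domain => ' || t.domain)::hstore
FROM temp t
WHERE t.temp_id = m.temp_id;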