PostgreSQL - full text search index - unexpected query results

I have a table with a number of columns, and I created a full text index on it like this:
CREATE INDEX phrasetable_exp_idx ON msc.mytable
USING gin(to_tsvector('norwegian', coalesce(msc.mytable.col1,'') || ' ' ||
coalesce(msc.mytable.col2,'') || ' ' ||
coalesce(msc.mytable.col3,'') || ' ' ||
coalesce(msc.mytable.col4,'') || ' ' ||
coalesce(msc.mytable.col5,'') || ' ' ||
coalesce(msc.mytable.col6,'') || ' ' ||
coalesce(msc.mytable.col7,'')));
I try some searches and they are lightning fast, however, for one particular search I don't get the expected results.
I have a row in my table where both col1 and col2 have the exact value "Importkompetanse Oslo AS"
in col3 it has the value "9999".
Of the three queries below, only to_tsquery('9999') returns the row. The returned row does have the value "Importkompetanse Oslo AS" in both col1 and col2, yet the first two queries return no matches.
SELECT *
FROM msc.mytable
WHERE to_tsvector('norwegian', coalesce(msc.mytable.col1,'') || ' ' ||
coalesce(msc.mytable.col2,'') || ' ' ||
coalesce(msc.mytable.col3,'') || ' ' ||
coalesce(msc.mytable.col4,'') || ' ' ||
coalesce(msc.mytable.col5,'') || ' ' ||
coalesce(msc.mytable.col6,'') || ' ' ||
coalesce(msc.mytable.col7,''))
@@ --to_tsquery('Importkompetanse&Oslo&AS') -- nada
plainto_tsquery('Importkompetanse') -- nada
--to_tsquery('9999') -- OK!
Does anyone have an idea why my searches yield no results?
EDIT:
For some reason, to_tsvector returns something like this:
"'9999':9 'importkompetans':1,6"
The word importkompetanse seems to be cut off?
However, if I set it to simple instead of norwegian, I get the expected results and everything looks good. Why is that?

You mixed configurations: your tsvector is built with 'norwegian', but your tsquery falls back to the default configuration. Use the same configuration on both sides, like:
select to_tsvector('norwegian', 'Importkompetanse Oslo AS')
@@ to_tsquery('norwegian', 'Importkompetanse&Oslo&AS');
This is why it worked with the 'simple' configuration (that is your default).
Note: you can always debug text search with ts_debug(). For example, 'Importkompetanse' has not been cut off; 'importkompetans' is simply the lexeme for this word in the 'norwegian' configuration.
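A minimal sketch of how the mismatch can be inspected, assuming the 'norwegian' and 'simple' configurations that ship with PostgreSQL. The literal text comes from the question; the column aliases are only illustrative.

```sql
-- Lexemes produced by each configuration for the same input.
SELECT to_tsvector('norwegian', 'Importkompetanse Oslo AS') AS norwegian_vector,
       to_tsvector('simple',    'Importkompetanse Oslo AS') AS simple_vector;

-- Same configuration on both sides of @@ (the consistent form).
SELECT to_tsvector('norwegian', 'Importkompetanse Oslo AS')
       @@ to_tsquery('norwegian', 'Importkompetanse') AS same_config;

-- Mixed configurations: stemmed vector against an unstemmed query term.
SELECT to_tsvector('norwegian', 'Importkompetanse Oslo AS')
       @@ to_tsquery('simple', 'Importkompetanse') AS mixed_config;

-- Token-by-token view of what the parser and dictionaries do.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('norwegian', 'Importkompetanse Oslo AS');
```

If the lexemes in the first result differ between the two columns, a query built with one configuration cannot be expected to match a vector built with the other.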
Off-topic: you use a really long, expression-based index, which will only be used if your queries contain the exact same expression. You did that correctly in your example, but it makes your queries really long, and if you change the index expression later, you need to make sure every use of it is updated as well.
You could use a simple (sql) function, to simplify your queries:
create or replace function col_tsvector(mytable)
returns tsvector
immutable
language sql
as $function$
select to_tsvector('norwegian',
coalesce($1.col1, '') || ' ' ||
coalesce($1.col2, '') || ' ' ||
coalesce($1.col3, '') || ' ' ||
coalesce($1.col4, '') || ' ' ||
coalesce($1.col5, '') || ' ' ||
coalesce($1.col6, '') || ' ' ||
coalesce($1.col7, ''))
$function$;
With this, you can greatly simplify your index definition & your queries too. (You can even use the attribute notation.)
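As a rough sketch of what that simplification might look like, assuming the col_tsvector function above has been created and that msc.mytable is reachable through the search_path. The index name mytable_fts_idx is made up for illustration.

```sql
-- Index on the function result instead of the long inline expression.
-- mytable_fts_idx is a hypothetical name.
CREATE INDEX mytable_fts_idx ON msc.mytable
USING gin (col_tsvector(mytable));

-- Functional notation: the expression matches the index definition.
SELECT *
FROM msc.mytable
WHERE col_tsvector(mytable) @@ to_tsquery('norwegian', 'Importkompetanse & Oslo');

-- Attribute notation: mytable.col_tsvector is resolved as col_tsvector(mytable).
SELECT *
FROM msc.mytable
WHERE mytable.col_tsvector @@ to_tsquery('norwegian', 'Importkompetanse & Oslo');
```

Whether the planner actually picks the index depends on table size and statistics, so it is worth checking the plan with EXPLAIN.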

Related

Building a routine to generate CREATE TRIGGER code in Postgres

I haven't found a straightforward way to retrieve trigger definition code. I mean the trigger/binding declaration, not the trigger function. I figured I'd use some of the system catalogs to build up a script. The following, incomplete, version produces sensible output:
CREATE OR REPLACE FUNCTION dba.ddl_get_build_trigger_code(trigger_id oid) -- I'm starting from the trigger's OID.
RETURNS text
AS $BODY$
DECLARE
trigger_name_in text;
code text;
BEGIN
/*
What the original declaration looks like:
CREATE TRIGGER trigger_hsys_after_delete
AFTER DELETE
ON data.hsys
REFERENCING OLD TABLE AS deleted_rows
FOR EACH STATEMENT
EXECUTE PROCEDURE data.trigger_function_log_deletion_count();
*/
SELECT tgname FROM pg_trigger WHERE oid = trigger_id INTO trigger_name_in; -- information_schema tables don't use PG OIDs.
RETURN
'CREATE TRIGGER ' || trigger_name || chr(10) ||
chr(9) || action_timing || ' ' || event_manipulation || chr(10) ||
chr(9) || 'ON ' || event_object_schema || '.' || event_object_table ||
CASE WHEN action_reference_old_table IS NOT NULL THEN
chr(10) || chr(9) || 'REFERENCING OLD TABLE AS ' || action_reference_old_table || chr(10) END ||
-- CASE WHEN action_reference_new_table IS NOT NULL THEN
-- chr(10) || chr(9) || 'REFERENCING NEW TABLE AS ' || action_reference_new_table || chr(10) END ||
chr(9) || 'FOR EACH ' || action_orientation || chr(10) ||
chr(9) || action_statement || ';' as create_trigger_code
FROM information_schema.triggers
WHERE trigger_name = trigger_name_in;
END;
$BODY$
LANGUAGE plpgsql;
Here's a sample, matching my actual case:
CREATE TRIGGER trigger_hsys_after_delete
AFTER DELETE
ON data.hsys
REFERENCING OLD TABLE AS deleted_rows
FOR EACH STATEMENT
EXECUTE PROCEDURE trigger_function_log_deletion_count();
There are several more attributes in information_schema.triggers that may have values, such as action_reference_new_table. When I enable the lines below and action_reference_new_table is NULL, the script returns NULL:
CASE WHEN action_reference_new_table IS NOT NULL THEN
chr(10) || chr(9) || 'REFERENCING NEW TABLE AS ' || action_reference_new_table || chr(10) END ||
I don't understand why the NULL value for action_reference_new_table blows up my concatenation code and makes the entire result NULL.
Apart from help on this specific question, feel free to point out whatever I should do to write more sensible PL/PgSQL code. It's proving to be harder for me to master than I would have guessed.
Simply use
SELECT pg_get_triggerdef(oid)
FROM pg_trigger
WHERE tgname = trigger_name_in;
Besides, never use string concatenation when composing SQL code. The danger of SQL injection is too great. Use the format() function with the %I placeholder.
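On the NULL result itself: the || operator returns NULL as soon as one operand is NULL, and a CASE expression without an ELSE branch yields NULL when no WHEN matches. A small sketch of that behaviour and of format() with %I follows. The trigger, schema, and table names are taken from the sample above and serve only as illustration; concat() and format() assume PostgreSQL 9.1 or later.

```sql
-- || propagates NULL; concat() and coalesce() do not.
SELECT 'a' || NULL::text || 'b'               AS with_operator,  -- NULL
       concat('a', NULL::text, 'b')           AS with_concat,    -- ab
       'a' || coalesce(NULL::text, '') || 'b' AS with_coalesce;  -- ab

-- CASE without ELSE yields NULL when no branch matches, so add ELSE ''.
SELECT 'x' || CASE WHEN false THEN 'y' END         AS without_else,  -- NULL
       'x' || CASE WHEN false THEN 'y' ELSE '' END AS with_else;     -- x

-- format() with %I quotes identifiers only where quoting is needed.
SELECT format('CREATE TRIGGER %I AFTER DELETE ON %I.%I',
              'trigger_hsys_after_delete', 'data', 'hsys') AS ddl_start;
```

In the function from the question, adding ELSE '' to each optional CASE branch would keep one missing attribute from turning the whole string into NULL.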

Fulltext search missing words

I have a table with the following columns:
ordinance_number (text)
description (text)
keywords (text)
document_vectors (tsvector)
I insert into the column document_vectors by combining the other column data:
let ordinanceVecs = `${data.ordinance_number} ${keywords} ${entry} ${description}`;
I noticed that some words are not in the column document_vectors. For example I inserted the following keywords:
eric-test ordinance trash bin <p>data</p> ordinance out
but in the column I only have the following data inside:
'bin':6 'data':7 'eric':2 'eric-test':1 'ordin':4,8 'test':3 'trash':5
So when I want to search for the word 'Ordinance' :
select *
from ordinances.ordinance
where
(
document_vectors @@ to_tsquery('ordinance')
or
document_vectors @@ to_tsquery('simple', 'ordinance:*')
)
I get the result back. Partial search up to 'ordinan' works but 'ordinanc' returns 0 results:
select *
from ordinances.ordinance
where
(
document_vectors @@ to_tsquery('ordinanc')
or
document_vectors @@ to_tsquery('simple', 'ordinanc:*')
)
I'm assuming this is because of how PostgreSQL full-text search handles lexemes. But how can I fix it so that any part of a word is searchable and returns a result?
If you want to search for substrings, full text search is not the tool for you.
This will work much better using a trigram index:
CREATE EXTENSION pg_trgm;
CREATE INDEX ON ordinances.ordinance USING gin
((ordinance_number || ' ' || keywords || ' ' || entry || ' ' || description) gin_trgm_ops);
Then you can query:
SELECT * FROM ordinances.ordinance
WHERE (ordinance_number || ' ' || keywords || ' ' || entry || ' ' || description)
LIKE '%ordinanc%';
To search for a string that begins at a word boundary, you can use regular expressions:
WHERE (ordinance_number || ' ' || keywords || ' ' || entry || ' ' || description)
~ '\mordinanc'
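To see why the full text query in the question misses 'ordinanc', it can help to compare the lexemes each call produces. A minimal sketch, assuming the 'english' configuration was used to build document_vectors (the question does not say which configuration is the default):

```sql
-- Stemmed lexeme stored in the vector versus the terms being searched for.
SELECT to_tsvector('english', 'ordinance')  AS stored_lexeme,
       to_tsquery('english', 'ordinance')   AS full_word_query,
       to_tsquery('english', 'ordinanc')    AS partial_word_query,
       to_tsquery('simple',  'ordinanc:*')  AS prefix_query;
```

A prefix query such as 'ordinanc:*' can only match lexemes that start with 'ordinanc', so it cannot match a stored lexeme that stemming has shortened to 'ordin', as in the vector shown in the question.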

<column name> is not valid in the context where it is used.

I've been at this Create Trigger for a while...
I'm using IBM Data Studio 4.1.3 while making this Trigger. At first I had problems with ending statements with ';' but on the IBM website it says to use 'x' and it works.
My main problem, however, is that I get this message:
"N.ITEMNAME" is not valid in the context where it is used.. SQLCODE=-206, SQLSTATE=42703, DRIVER=3.69.56
This also applies to all the others: o.itemid, o.quantity, and n.quantity. I found this out when switching/swapping the names around each other.
The editor is telling me that it has no errors in the statement but when executing, problems arise.
-- <ScriptOptions statementTerminator="x" />
CREATE TRIGGER DB2ADMIN.SUPPLIES_I
AFTER UPDATE OF QUANTITY ON DB2ADMIN.SUPPLIES
REFERENCING NEW TABLE AS n
OLD TABLE AS o
FOR EACH ROW MODE DB2SQL NOT SECURED
BEGIN ATOMIC
INSERT INTO db2admin.tran_log VALUES (USER, CURRENT TIMESTAMP || ' ' || n.itemname || ' ( ' || o.itemid || ' ) from ' || CHAR(o.quantity) || ' to ' || CHAR(n.quantity));
END
Remove the TABLE word from the CREATE TRIGGER statement:
CREATE TRIGGER DB2ADMIN.SUPPLIES_I
AFTER UPDATE OF QUANTITY ON DB2ADMIN.SUPPLIES
REFERENCING NEW AS n
OLD AS o
FOR EACH ROW MODE DB2SQL NOT SECURED
BEGIN ATOMIC
INSERT INTO db2admin.tran_log VALUES (USER, CURRENT TIMESTAMP || ' ' || n.itemname || ' ( ' || o.itemid || ' ) from ' || CHAR(o.quantity) || ' to ' || CHAR(n.quantity));
END
You can't reference a table transition variable the way you are trying to. Suppose we build the new table as below.
This is possible:
with n(i) as (values 1, 2, 3)
select i
from n;
It's not possible, and you get the same error message:
with n(i) as (values 1, 2, 3)
values (n.i);
Alternative solution with a FOR EACH STATEMENT trigger
If your table has a key (one or more columns) that doesn't include the updated column QUANTITY:
CREATE TRIGGER DB2ADMIN.SUPPLIES_I2
AFTER UPDATE OF QUANTITY ON DB2ADMIN.SUPPLIES
REFERENCING NEW TABLE AS n
OLD TABLE AS o
FOR EACH STATEMENT
INSERT INTO db2admin.tran_log
SELECT USER, CURRENT TIMESTAMP || ' ' || n.itemname || ' ( ' || o.itemid || ' ) from ' || CHAR(o.quantity) || ' to ' || CHAR(n.quantity)
FROM n, o
WHERE n.<key>=o.<key>

How to use a WITH block with dynamic sql query

I've got a plpgsql function that needs to prepare data from 3 tables based on user input and export the data using COPY TO. The data are road accidents, so the 3 tables are accident, casualty and vehicle. Each accident links to zero or more records in the vehicle and casualty tables via an accidentid column that exists in all three tables. severity and local_authorities are input parameters (both text[]).
sql_query = 'SELECT COUNT(*) FROM accident WHERE severity = ANY(' || quote_literal(severity)
|| ') AND local_auth = ANY (' || quote_literal(local_authorities) || ')';
EXECUTE sql_query INTO result_count;
IF result_count > 0 THEN
-- replace Select Count(*) With Select *
sql_query = Overlay(sql_query placing '*' from 8 for 8);
-- copy the accident data first
EXECUTE 'COPY (' || sql_query || ') TO ' || quote_literal(file_path || file_name_a) ||
' CSV';
This first bit will get the relevant accidents, so I'm now looking for the most efficient way to use the accidentid's from the first query to download the related vehicle and casualty data.
I thought I'd be able to use a WITH block like this:
-- replace * with accidentid
sql_query = Overlay(sql_query placing 'accidentid' from 8 for 1);
WITH acc_ids AS (sql_query)
EXECUTE 'COPY (SELECT * FROM vehicle WHERE accidentid IN (SELECT accidentid FROM
acc_ids)) TO ' || out_path_and_vfilename || ' CSV';
EXECUTE 'COPY (SELECT * FROM casualty WHERE accidentid IN (SELECT accidentid FROM
acc_ids)) TO ' || out_path_and_cfilename || ' CSV';
but get an error:
ERROR: syntax error at or near "$1"
LINE 1: WITH acc_ids AS ( $1 ) EXECUTE 'COPY (SELECT * FROM accident....
I have tried the above in a non-dynamic test case e.g.
WITH acc_ids AS (
SELECT accidentid FROM accident
WHERE severity = ANY ('{3,2}')
AND local_auth = ANY ('{E09000001,E09000002}')
)
SELECT * FROM vehicle
WHERE accidentid IN (
SELECT accidentid FROM acc_ids);
which works. Unfortunately the server is still running Postgres 8.4 so I can't use format() for the time being.
Perhaps this isn't possible with a WITH block, but I hope it at least illustrates what I'm trying to achieve.
Edit/Update
The main goal is to get the relevant data from the 3 tables in 3 separate csv files, ideally without having to run the selection on the accident table 3 times
If you want to run a query (part) that is stored in a string variable, you need a dynamic query like
EXECUTE 'WITH acc_ids AS (' || sql_query || ')'
'SELECT ... ';
Either the whole query is a string executed by EXECUTE, or the whole query is static SQL. You cannot mix them.
Do you need a CTE? If you can express the query as a join, the optimizer has more options.
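A minimal sketch of the "whole query as one string" form, assuming the table and column names from the question. The function name export_vehicles_sketch and the out_file parameter are hypothetical. It sticks to quote_literal() because format() is not available on 8.4, and COPY to a server-side file normally requires superuser rights.

```sql
-- Hypothetical helper: exports the vehicle rows that belong to the selected accidents.
CREATE OR REPLACE FUNCTION export_vehicles_sketch(
    severity          text[],
    local_authorities text[],
    out_file          text      -- hypothetical: full path of the CSV file to write
) RETURNS void AS $body$
DECLARE
    sql_query text;
BEGIN
    -- Build the accident selection once, as text.
    sql_query := 'SELECT accidentid FROM accident'
              || ' WHERE severity = ANY (' || quote_literal(severity::text) || ')'
              || ' AND local_auth = ANY (' || quote_literal(local_authorities::text) || ')';

    -- The CTE and the statement that uses it live in the same dynamic string.
    EXECUTE 'COPY (WITH acc_ids AS (' || sql_query || ') '
         || 'SELECT * FROM vehicle WHERE accidentid IN (SELECT accidentid FROM acc_ids)) '
         || 'TO ' || quote_literal(out_file) || ' CSV';
END;
$body$ LANGUAGE plpgsql;
```

This still evaluates the accident selection once per exported file. Avoiding that would need the ids to be stored somewhere, for example in a temporary table, which is a different technique from the one described above.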
This does what I need without a CTE, but I doubt it is the most efficient solution, since I have to run the same query on the accident table 3 times:
sql_query = sql_query || which_tab || ' WHERE severity = ANY ('||
quote_literal(severity) ||') AND ' || date_start || ' AND ' ||
date_end || ' AND local_auth = ANY (' ||
quote_literal(local_authorities) || ')';
-- replace * with COUNT(*)
sql_query = Overlay(sql_query placing 'COUNT(*)' from 8 for 1);
EXECUTE sql_query INTO result_count;
IF result_count > 0 THEN
-- replace COUNT(*) with *
sql_query = Overlay(sql_query placing '*' from 8 for 8);
-- copy the accident data first
EXECUTE 'COPY (' || sql_query || ') TO ' || quote_literal(file_path ||
file_name_a) || ' CSV';
sql_query = Overlay(sql_query placing 'accidentid' from 8 for 1);
-- vehicles
EXECUTE 'COPY (SELECT * FROM vehicle WHERE accidentid IN (
SELECT accidentid FROM accident
WHERE severity = ANY (' || quote_literal(severity) || ')
AND local_auth = ANY (' || quote_literal(local_authorities) ||')))
TO ' || quote_literal(file_path || file_name_v) || ' CSV';
-- casualties
EXECUTE 'COPY (SELECT * FROM casualty WHERE accidentid IN (
SELECT accidentid FROM accident
WHERE severity = ANY (' || quote_literal(severity) || ')
AND local_auth = ANY (' || quote_literal(local_authorities) ||')))
TO ' || quote_literal(file_path || file_name_c) || ' CSV';
END IF;

plpgsql function text concatenation error

Can anyone correct the statement below?
strdir := 'copy '
|| t_name.relname
|| ' from E'''' || C: || '''''
|| t_name.relname || '''.txt'' using delimiters '|'';
strdir := 'copy '
|| t_name.relname
|| ' from E''C:"' -- one ' too many here; also included C:, which had no '
-- and I suspect you need a double quote here "
|| t_name.relname
|| '".txt'' using delimiters ''|'''; -- closing ", double ' around |
I think the single quotes around the last | should be two single quotes each.
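An alternative that sidesteps counting quotes by hand is to let quote_ident() and quote_literal() do the escaping. This is only a sketch: the inline t_name(relname) row stands in for the record variable from the question, my_table is a made-up table name, and the 'C:/' directory is an assumption, since the question does not show the full path.

```sql
-- t_name(relname) stands in for the record variable from the question.
SELECT 'copy ' || quote_ident(t_name.relname)
    || ' from ' || quote_literal('C:/' || t_name.relname || '.txt')  -- hypothetical path
    || ' using delimiters ' || quote_literal('|') AS strdir
FROM (VALUES ('my_table'::text)) AS t_name(relname);
-- strdir: copy my_table from 'C:/my_table.txt' using delimiters '|'
```

Inside the plpgsql function, the same expression could be assigned to strdir with := in place of the SELECT.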