Fulltext search missing words - postgresql

I have a table with the following columns:
ordinance_number (text)
description (text)
keywords (text)
document_vectors (tsvector)
I insert into the column document_vectors by combining the other column data:
let ordinanceVecs = `${data.ordinance_number} ${keywords} ${entry} ${description}`;
I noticed that some words are not in the column document_vectors. For example I inserted the following keywords:
eric-test ordinance trash bin <p>data</p> ordinance out
but in the column I only have the following data inside:
'bin':6 'data':7 'eric':2 'eric-test':1 'ordin':4,8 'test':3 'trash':5
So when I search for the word 'Ordinance':
select *
from ordinances.ordinance
where
(
document_vectors @@ to_tsquery('ordinance')
or
document_vectors @@ to_tsquery('simple', 'ordinance:*')
)
I get the result back. A partial search up to 'ordinan' works, but 'ordinanc' returns 0 results:
select *
from ordinances.ordinance
where
(
document_vectors @@ to_tsquery('ordinanc')
or
document_vectors @@ to_tsquery('simple', 'ordinanc:*')
)
I'm assuming this is because of the way PostgreSQL full-text search stems words into lexemes. But how can I fix it so that any part of a word is searchable and returns a result?

If you want to search for substrings, full text search is not the tool for you.
This will work much better using a trigram index:
CREATE EXTENSION pg_trgm;
CREATE INDEX ON ordinances.ordinance USING gin
((ordinance_number || ' ' || keywords || ' ' || entry || ' ' || description) gin_trgm_ops);
Then you can query:
SELECT * FROM ordinances.ordinance
WHERE (ordinance_number || ' ' || keywords || ' ' || entry || ' ' || description)
LIKE '%ordinanc%';
To search for a string that begins at a word boundary, you can use regular expressions:
WHERE (ordinance_number || ' ' || keywords || ' ' || entry || ' ' || description)
~ '\mordinanc'
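For completeness, the behaviour behind the missing matches can be inspected directly. A quick sketch, assuming the default 'english' configuration (which matches the lexemes shown in the question):
SELECT to_tsvector('english', 'ordinance');
-- expected output: 'ordin':1
The stemmer reduces 'ordinance' to the lexeme 'ordin', so a query for 'ordinanc' (with or without the :* prefix marker) finds no lexeme to match. Only whole lexemes and lexeme prefixes are searchable, never arbitrary substrings, which is why the trigram approach above is the better fit.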

<column name> is not valid in the context where it is used.

I've been at this Create Trigger for a while...
I'm using IBM Data Studio 4.1.3 while making this Trigger. At first I had problems with ending statements with ';', but the IBM website says to use 'x' as the statement terminator, and that works.
My main problem, however, is that I get this message:
"N.ITEMNAME" is not valid in the context where it is used. SQLCODE=-206, SQLSTATE=42703, DRIVER=3.69.56
This also applies to all the other references: o.itemid, o.quantity, and n.quantity. I found this out by swapping the names around.
The editor tells me there are no errors in the statement, but problems arise when executing it.
-- <ScriptOptions statementTerminator="x" />
CREATE TRIGGER DB2ADMIN.SUPPLIES_I
AFTER UPDATE OF QUANTITY ON DB2ADMIN.SUPPLIES
REFERENCING NEW TABLE AS n
OLD TABLE AS o
FOR EACH ROW MODE DB2SQL NOT SECURED
BEGIN ATOMIC
INSERT INTO db2admin.tran_log VALUES (USER, CURRENT TIMESTAMP || ' ' || n.itemname || ' ( ' || o.itemid || ' ) from ' || CHAR(o.quantity) || ' to ' || CHAR(n.quantity));
END
Remove the word TABLE from the CREATE TRIGGER statement:
CREATE TRIGGER DB2ADMIN.SUPPLIES_I
AFTER UPDATE OF QUANTITY ON DB2ADMIN.SUPPLIES
REFERENCING NEW AS n
OLD AS o
FOR EACH ROW MODE DB2SQL NOT SECURED
BEGIN ATOMIC
INSERT INTO db2admin.tran_log VALUES (USER, CURRENT TIMESTAMP || ' ' || n.itemname || ' ( ' || o.itemid || ' ) from ' || CHAR(o.quantity) || ' to ' || CHAR(n.quantity));
END
You can't reference a transition table (defined with REFERENCING NEW TABLE AS n) the way you tried. Imagine that we build the transition table n as below.
It's possible:
with n(i) as (values 1, 2, 3)
select i
from n;
It's not possible, and you get the same error message:
with n(i) as (values 1, 2, 3)
values (n.i);
Alternative solution with a FOR EACH STATEMENT trigger
If your table has a key (one or more columns) that doesn't include the updated column QUANTITY:
CREATE TRIGGER DB2ADMIN.SUPPLIES_I2
AFTER UPDATE OF QUANTITY ON DB2ADMIN.SUPPLIES
REFERENCING NEW TABLE AS n
OLD TABLE AS o
FOR EACH STATEMENT
INSERT INTO db2admin.tran_log
SELECT USER, CURRENT TIMESTAMP || ' ' || n.itemname || ' ( ' || o.itemid || ' ) from ' || CHAR(o.quantity) || ' to ' || CHAR(n.quantity)
FROM n, o
WHERE n.<key>=o.<key>

How to use a WITH block with dynamic sql query

I've got a plpgsql function that needs to prepare data from 3 tables based on user input, and export the data using COPY TO. The data are road accidents: the 3 tables are accident, casualty and vehicle, and each accident links to zero or more records in the vehicle and casualty tables via an accidentid column that exists in all three tables. severity and local_authorities are input parameters (both text[]).
sql_query = 'SELECT COUNT(*) FROM accident WHERE severity = ANY(' || quote_literal(severity)
|| ') AND local_auth = ANY (' || quote_literal(local_authorities) || ')';
EXECUTE sql_query INTO result_count;
IF result_count > 0 THEN
-- replace Select Count(*) With Select *
sql_query = Overlay(sql_query placing '*' from 8 for 8);
-- copy the accident data first
EXECUTE 'COPY (' || sql_query || ') TO ' || quote_literal(file_path || file_name_a) ||
' CSV';
This first bit will get the relevant accidents, so I'm now looking for the most efficient way to use the accidentid values from the first query to download the related vehicle and casualty data.
I thought I'd be able to use a WITH block like this:
-- replace * with accidentid
sql_query = Overlay(sql_query placing 'accidentid' from 8 for 1);
WITH acc_ids AS (sql_query)
EXECUTE 'COPY (SELECT * FROM vehicle WHERE accidentid IN (SELECT accidentid FROM
acc_ids)) TO ' || out_path_and_vfilename || ' CSV';
EXECUTE 'COPY (SELECT * FROM casualty WHERE accidentid IN (SELECT accidentid FROM
acc_ids)) TO ' || out_path_and_cfilename || ' CSV';
but get an error:
ERROR: syntax error at or near "$1"
LINE 1: WITH acc_ids AS ( $1 ) EXECUTE 'COPY (SELECT * FROM accident....
I have tried the above in a non-dynamic test case e.g.
WITH acc_ids AS (
SELECT accidentid FROM accident
WHERE severity = ANY ('{3,2}')
AND local_auth = ANY ('{E09000001,E09000002}')
)
SELECT * FROM vehicle
WHERE accidentid IN (
SELECT accidentid FROM acc_ids);
which works. Unfortunately the server is still running Postgres 8.4 so I can't use format() for the time being.
Perhaps this isn't possible with a WITH block, but I hope it at least illustrates what I'm trying to achieve.
Edit/Update
The main goal is to get the relevant data from the 3 tables in 3 separate CSV files, ideally without having to run the selection on the accident table 3 times.
If you want to run a query (part) that is stored in a string variable, you need a dynamic query like
EXECUTE 'WITH acc_ids AS (' || sql_query || ') '
|| 'SELECT ... ';
Either the whole query is a string executed by EXECUTE, or the whole query is static SQL. You cannot mix them.
Do you need a CTE? If you can express the query as a join, the optimizer has more options.
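For example, the vehicle export could be assembled as one single string (a sketch only, reusing sql_query and the file name variables from the question; whether COPY accepts a WITH query on 8.4 should be verified, and if not, the subquery can be inlined with IN (...) instead):
-- sql_query is assumed to already select accidentid from accident with the user's filters
EXECUTE 'COPY (WITH acc_ids AS (' || sql_query || ') '
|| 'SELECT v.* FROM vehicle v '
|| 'WHERE v.accidentid IN (SELECT accidentid FROM acc_ids)) '
|| 'TO ' || quote_literal(file_path || file_name_v) || ' CSV';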
This does what I need without a CTE, but I can't see it being the most efficient way of solving this, since I have to run the same query on the accident table 3 times:
sql_query = sql_query || which_tab || ' WHERE severity = ANY ('||
quote_literal(severity) ||') AND ' || date_start || ' AND ' ||
date_end || ' AND local_auth = ANY (' ||
quote_literal(local_authorities) || ')';
-- replace * with COUNT(*)
sql_query = Overlay(sql_query placing 'COUNT(*)' from 8 for 1);
EXECUTE sql_query INTO result_count;
IF result_count > 0 THEN
-- replace COUNT(*) with *
sql_query = Overlay(sql_query placing '*' from 8 for 8);
-- copy the accident data first
EXECUTE 'COPY (' || sql_query || ') TO ' || quote_literal(file_path ||
file_name_a) || ' CSV';
sql_query = Overlay(sql_query placing 'accidentid' from 8 for 1);
-- vehicles
EXECUTE 'COPY (SELECT * FROM vehicle WHERE accidentid IN (
SELECT accidentid FROM accident
WHERE severity = ANY (' || quote_literal(severity) || ')
AND local_auth = ANY (' || quote_literal(local_authorities) ||')))
TO ' || quote_literal(file_path || file_name_v) || ' CSV';
-- casualties
EXECUTE 'COPY (SELECT * FROM casualty WHERE accidentid IN (
SELECT accidentid FROM accident
WHERE severity = ANY (' || quote_literal(severity) || ')
AND local_auth = ANY (' || quote_literal(local_authorities) ||')))
TO ' || quote_literal(file_path || file_name_c) || ' CSV';
END IF;
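If running the selection on the accident table three times turns out to be a problem, another option is to materialise the matching ids once in a temporary table and reuse them for all three COPY statements. This is only a sketch, using the variable names from the question (the date filters are omitted for brevity):
-- build the id list once; ON COMMIT DROP removes it at the end of the transaction
EXECUTE 'CREATE TEMP TABLE acc_ids ON COMMIT DROP AS
SELECT accidentid FROM accident
WHERE severity = ANY (' || quote_literal(severity) || ')
AND local_auth = ANY (' || quote_literal(local_authorities) || ')';
-- reuse the id list for each export
EXECUTE 'COPY (SELECT a.* FROM accident a JOIN acc_ids USING (accidentid)) TO '
|| quote_literal(file_path || file_name_a) || ' CSV';
EXECUTE 'COPY (SELECT v.* FROM vehicle v JOIN acc_ids USING (accidentid)) TO '
|| quote_literal(file_path || file_name_v) || ' CSV';
EXECUTE 'COPY (SELECT c.* FROM casualty c JOIN acc_ids USING (accidentid)) TO '
|| quote_literal(file_path || file_name_c) || ' CSV';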

Postgresql - full text search index - unexpected query results

I have a table with a bunch of cols
I have created a full text index on a table like this:
CREATE INDEX phrasetable_exp_idx ON msc.mytable
USING gin(to_tsvector('norwegian', coalesce(msc.mytable.col1,'') || ' ' ||
coalesce(msc.mytable.col2,'') || ' ' ||
coalesce(msc.mytable.col3,'') || ' ' ||
coalesce(msc.mytable.col4,'') || ' ' ||
coalesce(msc.mytable.col5,'') || ' ' ||
coalesce(msc.mytable.col6,'') || ' ' ||
coalesce(msc.mytable.col7,'')));
I try some searches and they are lightning fast, however, for one particular search I don't get the expected results.
I have a row in my table where both col1 and col2 have the exact value "Importkompetanse Oslo AS",
and col3 has the value "9999".
Only the query with to_tsquery('9999') returns the row, which shows me that it does have the value "Importkompetanse Oslo AS" in both col1 and col2, but the first two queries return no matches.
SELECT *
FROM msc.mytable
WHERE to_tsvector('norwegian', coalesce(msc.mytable.col1,'') || ' ' ||
coalesce(msc.mytable.col2,'') || ' ' ||
coalesce(msc.mytable.col3,'') || ' ' ||
coalesce(msc.mytable.col4,'') || ' ' ||
coalesce(msc.mytable.col5,'') || ' ' ||
coalesce(msc.mytable.col6,'') || ' ' ||
coalesce(msc.mytable.col7,''))
@@ --to_tsquery('Importkompetanse&Oslo&AS') -- nada
plainto_tsquery('Importkompetanse') -- nada
--to_tsquery('9999') -- OK!
Does anyone have an idea why my searches yield no results?
EDIT:
For some reason, to_tsvector returns something like this:
"'9999':9 'importkompetans':1,6"
The word importkompetanse seems to be cut off?
However, if I set it to simple instead of norwegian, I get the expected results and everything looks good. Why is that?
You used different text search configurations for your tsvector and tsquery values. You should use a consistent configuration, like:
select to_tsvector('norwegian', 'Importkompetanse Oslo AS')
@@ to_tsquery('norwegian', 'Importkompetanse&Oslo&AS');
This is why it worked with the 'simple' configuration (that is your default).
Note: you can always debug text search with ts_debug(): for example, 'Importkompetanse' has not been cut off; 'importkompetans' is just the appropriate lexeme for this word (in the 'norwegian' configuration).
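A quick sketch of such a check (the exact output depends on the dictionaries installed, but the lexemes column should show the stemmed form):
SELECT token, lexemes
FROM ts_debug('norwegian', 'Importkompetanse Oslo AS');
-- 'Importkompetanse' should come back with the lexeme {importkompetans}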
Off-topic: you use a really long expression-based index, which will only be used if your queries repeat the exact same expression. You used it correctly in your example, but this makes your queries really long, and if you later change the index expression, you need to make sure every place that uses it is updated as well.
You could use a simple (SQL) function to simplify your queries:
create or replace function col_tsvector(mytable)
returns tsvector
immutable
language sql
as $function$
select to_tsvector('norwegian',
coalesce($1.col1, '') || ' ' ||
coalesce($1.col2, '') || ' ' ||
coalesce($1.col3, '') || ' ' ||
coalesce($1.col4, '') || ' ' ||
coalesce($1.col5, '') || ' ' ||
coalesce($1.col6, '') || ' ' ||
coalesce($1.col7, ''))
$function$;
With this, you can greatly simplify your index definition & your queries too. (You can even use the attribute notation.)
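A sketch of how the index and a query could then look (assuming the function above is created in a schema on the search_path; note that the query must use the same expression for the index to be considered):
CREATE INDEX phrasetable_func_idx ON msc.mytable
USING gin (col_tsvector(mytable));

SELECT *
FROM msc.mytable
WHERE col_tsvector(mytable) @@ to_tsquery('norwegian', 'Importkompetanse & Oslo & AS');
-- with attribute notation: WHERE mytable.col_tsvector @@ to_tsquery('norwegian', ...)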

automatically selecting databases and columns and searching data

Is it possible to write a query that automatically selects all database names and column names from the dbc.Columns table in Teradata, and searches for a particular set of values?
Set of values:
WHERE abc in (1,2,3)
Selecting dbc.columns:
SELECT DatabaseName, TableName FROM dbc.COLUMNS
WHERE ColumnName LIKE '%abc%'
How can I combine these and write a query that returns only those combinations of DatabaseName and TableName where the column contains a specific subset of values?
UPDATE:
This query finds all database/table combinations:
SELECT TRIM(BOTH FROM a.DatabaseName) || '.' || TRIM( BOTH FROM a.TableName)
FROM dbc.COLUMNS AS a
WHERE ColumnName LIKE '%abc%'
Is it possible to define some variables or something else?
You need to write Dynamic SQL statements like
SELECT
'SELECT ''' || DatabaseName || '.' || TableName || '.' || ColumnName || '''' ||
' WHERE EXISTS (SELECT * FROM ' || DatabaseName || '.' || TableName ||
' WHERE ' || ColumnName || ' IN (1,2,3));'
FROM dbc.ColumnsVX
WHERE ColumnName LIKE '%abc%';
Running the resulting queries will return one result set with zero or one row for each table.
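With hypothetical names (say a database MyDb, a table MyTable and a column abc_code), one generated statement would look like:
SELECT 'MyDb.MyTable.abc_code'
WHERE EXISTS (SELECT * FROM MyDb.MyTable WHERE abc_code IN (1,2,3));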
To get a single result set you need to write a Stored Procedure with a cursor over the dbc.ColumnsVX result (adding an INSERT INTO temptable to each generated statement), and EXECUTE IMMEDIATE each row. Finally, return the rows of the temptable.
Unless you're an experienced SQL programmer your DBA will not grant you the right to create SPs.
But why do you actually need this kind of info? Looking for a needle in a haystack?

postgres 9.1 full text search returning no results

I have searched the web for many days, but it seems the internet has never heard of my problem:
I have a postal address database table holding about 37M records for the United Kingdom, which has a geospatial index and a derived full text index created like so:
create index on gb_locations using gin(to_tsvector('english', "Postcode" || ' ' || "Postcode_outcode" || ' ' || "Road" || ' ' || "Neighbourhood" || ' ' || "Admin2" || ' ' || "Admin3"));
My full text search is in the form:
SELECT * FROM gb_locations
WHERE
to_tsvector('english', "Postcode" || ' ' || "Postcode_outcode" || ' ' || "Road" || ' ' || "Neighbourhood" || ' ' || "Admin2" || ' ' || "Admin3") ## plainto_tsquery('english', 'greenham road rg14')
The query works fine for most UK addresses, especially in the London area, but for locations further afield the query returns no results.
I have verified that the record exists in the table, as I can find it using a geospatial search, but for full text searches it seems like the database is not aware of it.
This is the EXPLAIN output:
Bitmap Heap Scan on gb_locations (cost=52.04..56.10 rows=1 width=521)
Recheck Cond: (to_tsvector('english'::regconfig, ((((((((((("Postcode")::text || ' '::text) || ("Postcode_outcode")::text) || ' '::text) || "Road") || ' '::text) || ("Neighbourhood")::text) || ' '::text) || ("Admin2")::text) || ' '::text) || ("Admin3")::text)) @@ '''greenham'' & ''road'' & ''rg14'''::tsquery)
-> Bitmap Index Scan on text_search_index (cost=0.00..52.04 rows=1 width=0)
Index Cond: (to_tsvector('english'::regconfig, ((((((((((("Postcode")::text || ' '::text) || ("Postcode_outcode")::text) || ' '::text) || "Road") || ' '::text) || ("Neighbourhood")::text) || ' '::text) || ("Admin2")::text) || ' '::text) || ("Admin3")::text)) @@ '''greenham'' & ''road'' & ''rg14'''::tsquery)
Any pointers would be much appreciated.
If certain fields can be NULL, you need to apply coalesce(field, '') to them in the global concatenation that produces the string to search.
Otherwise it seems to work with the example values given in the comments:
select to_tsvector('english','RG147SW RG14 Greenham Road Newbury West Berkshire')
@@ plainto_tsquery('english', 'greenham road rg14');
?column?
----------
t
(1 row)
But this one won't match (the result is NULL), and this would be the case when Admin2 is NULL, or more generally when any other field is passed as-is to the || operator.
select to_tsvector('english','RG147SW RG14 Greenham Road ' || NULL || ' Newbury West Berkshire')
@@ plainto_tsquery('english', 'greenham road rg14');
?column?
----------
(1 row)
Just to add to what Daniel Vérité said,
a full text index must be created as follows if any of the fields are expected to be NULL:
create index [index name] on [table name] using gin(to_tsvector('english', coalesce("Field1",'') || ' ' || coalesce("Field2",'') || ' ' || coalesce("Field3",'') ....));
Furthermore, the same template must be used in the query itself, as follows:
SELECT * FROM [table name] WHERE to_tsvector('english', coalesce("Field1",'') || ' ' || coalesce("Field2",'') || ' ' || coalesce("Field3",'') ....) @@ plainto_tsquery('english', '[your search sentence/phrase]');