PostgreSQL prevent non-matching tsqueries from matching tsvector - postgresql

Given the following query:
select to_tsvector('fat cat ate rat') ## plainto_tsquery('cats ate');
This query will return true as a result. Now, what if I don't want "cats" to also match the word "cat", is there any way I can prevent this?
Also, is there any way I can make sure that the tsquery matches the entire string in that particular order (e.g. the "cats ate" is counted as a single token rather than two). At the moment the following query will also match:
select to_tsvector('fat cat ate rat') ## plainto_tsquery('ate cats');

cat matching cats is due to english stemming, english being probably your default text search configuration. See the result of show default_text_search_config to be sure.
It can be avoided by using the simple configuration. Try the function calls with explicit text configurations:
select to_tsvector('simple', 'fat cat ate rat') ## plainto_tsquery('simple', 'cats ate');
Or change it with:
set default_text_search_config='simple';

Related

Prefix/wildcard searches with 'websearch_to_tsquery' in PostgreSQL Full Text Search?

I'm currently using the websearch_to_tsquery function for full text search in PostgreSQL. It all works well except for the fact that I no longer seem to be able to do partial matches.
SELECT ts_headline('english', q.\"Content\", websearch_to_tsquery('english', {request.Text}), 'MaxFragments=3,MaxWords=25,MinWords=2') Highlight, *
FROM (
SELECT ts_rank_cd(f.\"SearchVector\", websearch_to_tsquery('english', {request.Text})) AS Rank, *
FROM public.\"FileExtracts\" f, websearch_to_tsquery('english', {request.Text}) as tsq
WHERE f.\"SearchVector\" ## tsq
ORDER BY rank DESC
) q
Searches for customer work but cust* and cust:* do not.
I've had a look through the documentation and a number of articles but I can't find a lot of info on it. I haven't worked with it before so hopefully it's just something simple that I'm doing wrong?
You can't do this with websearch_to_tsquery but you can do it with to_tsquery (because ts_query allows to add a :* wildcard) and add the websearch syntax yourself in in your backend.
For example in a node.js environment you could do smth. like this:
let trimmedSearch = req.query.search.trim()
let searchArray = trimmedSearch.split(/\s+/) //split on every whitespace and remove whitespace
let searchWithStar = searchArray.join(' & ' ) + ':*' //join word back together adds AND sign in between an star on last word
let escapedSearch = yourEscapeFunction(searchWithStar)
and than use it in your SQL
search_column ## to_tsquery('english', ${escapedSearch})
You need to write the tsquery directly if you want to use partial matching. plainto_tsquery doesn't pass through partial match notation either, so what were you doing before you switched to websearch_to_tsquery?
Anything that applies a stemmer is going to have hard time handling partial match. What is it supposed to do, take off the notation, stem the part, then add it back on again? Not do stemming on the whole string? Not do stemming on just the token containing the partial match indicator? And how would it even know partial match was intended, rather than just being another piece of punctuation?
To add something on top of the other good answers here, you can also compose your query with both websearch_to_tsquery and to_tsquery to have everything from both worlds:
select * from your_table where ts_vector_col ## to_tsquery('simple', websearch_to_tsquery('simple', 'partial query')::text || ':*')
Another solution I have come up with is to do the text transform as part of the query so building the tsquery looks like this
to_tsquery(concat(regexp_replace(trim(' all the search terms here '), '\W+', ':* & '), ':*'));
(trim) Removes leading/trailing whitespace
(regexp_replace) Splits the search string on non word chars and adds trailing wildcards to each term, then ANDs the terms (:* & )
(concat) Adds a trailing wildcard to the final term
(to_tsquery) Converts to a ts_query
You can test the string manipulation by running
SELECT concat(regexp_replace(trim(' all the search terms here '), '\W+', ':* & ', 'gm'), ':*')
the result should be
all:* & the:* & search:* & terms:* & here:*
So you have multi word partial matches e.g. searching spi ma would return results matching spider man

In DB2 SQL RegEx, how can a conditional replacement be done without CASE WHEN END..?

I have a DB2 v7r3 SQL SELECT statement with three instances of REGEXP_SUBSTR(), all with the same regex pattern string, each of which extract one of three groups.
I'd like to change the first SUBSTR to REGEXP_REPLACE() to do a conditional replacement if there's no match, to insert a default value similarly to the ELSE section of a CASE...END. But I can't make it work. I could easily use a CASE, but it seems more compact & efficient to use RegEx.
For example, I have descriptions of food containers sizes, in various states of completeness:
12X125
6X350
1X1500
1500ML
1000
The last two don't have the 'nnX' part at the beginning, in which case '1X' is assumed and needs to be inserted.
This is my current working pattern string:
^(?:(\d{1,3})(?:X))?((?:\d{1,4})(?:\.\d{1,3})?)(L|ML|PK|Z|)$
The groups returned are: quantity, size, and unit.
But only the first group needs the conditional replacement:
(?:(\d{1,3})(?:X))?
This RexEgg webpage describes the (?=...) operator, and it seems to be what I need, but I'm not sure. It's in the list of operators for my version of DB2, but I can't make it work. Frankly, it's a bit deeper than my regex knowledge, and I can't even make it work in my favorite online regex tester, Regex101.
So...does anyone have any idea or suggestions..? Thanks.
Try this (replace "digits not followed by X_or_digit"):
with t(s) as (values
'12X125'
, '6X350'
, '1X1500'
, '1500'
, '1125'
)
select regexp_replace(s, '^([\d]+(?![X\d]))', '1X\1')
from t;

PostgreSQL similar to operator behaviour

I find PostgreSQL similar to operator works little strange. I accidentally checked for space in below query but surprised with the result.
select 'Device Reprocessing' similar to '%( )%' --return true
select 'Device Reprocessing' similar to '%()%' --return true
select 'DeviceReprocessing' similar to '%()%' --return true
Why 2nd and the 3rd query returns true? Is empty pattern always return true?
What I understand about SIMILAR TO operator is returns true or false depending on whether its pattern matches the given string.
You have defined a group with nothing in it, meaning anything will match. I think you will find any string matches %()%, even an empty string.
Normally you would use this grouping to list options so:
select 'DeviceReprocessing' similar to '%(Davinci|Dog)%'
Would return false since it contains neither "Davinci" nor "Dog", but this:
select 'DeviceReprocessing' similar to '%(vice|Dog)%'
would return true since it does contain at least one of the options.
Your first condition is true because the expression does contain a space.
I actually prefer the Regular Expression notation that does not require the % wildcards:
select 'DeviceReprocessing' ~ 'vice|Dog'

postgres full text search like operator

what is the query to use to search for text that matches the like operator.
I am asking about full text search query, which is of the form
SELECT * FROM eventlogging WHERE description_tsv ## plainto_tsquery('mess');
I want results with "message" as well but right not it does not return anything
If I read the description in the manual correctly, you'll need to use to_tsquery() instead of plainto_tsquery together with a wildcard to allow prefix matching:
SELECT *
FROM eventlogging
WHERE description_tsv ## to_tsquery('mess:*');
You can use LIKE for exact case matches or ILIKE for case insensitive. Use the % as your wildcard.
This will match anything with 'SomeThing' in it (case sensitive).
SELECT * FROM table WHERE name LIKE '%SomeThing%'
This will match anything ending with "ing" (case insensitive).
SELECT * FROM table WHERE name ILIKE '%ing'

Use multiple words in FullText Search input string

I have basic stored procedure that performs a full text search against 3 columns in a table by passing in a #Keyword parameter. It works fine with one word but falls over when I try pass in more than one word. I'm not sure why. The error says:
Syntax error near 'search item' in the full-text search condition 'this is a search item'
SELECT S.[SeriesID],
S.[Name] as 'SeriesName',
P.[PackageID],
P.[Name]
FROM [Series] S
INNER JOIN [PackageSeries] PS ON S.[SeriesID] = PS.[PackageID]
INNER JOIN [Package] P ON PS.[PackageID] = P.[PackageID]
WHERE CONTAINS ((S.[Name],S.[Description], S.[Keywords]),#Keywords)
AND (S.[IsActive] = 1) AND (P.[IsActive] = 1)
ORDER BY [Name] ASC
You will have to do some pre-processing on your #Keyword parameter before passing it into the SQL statement. SQL expects that keyword searches will be separated by boolean logic or surrounded in quotes. So, if you are searching for the phrase, it will have to be in quotes:
SET #Keyword = '"this is a search item"'
If you want to search for all the words then you'll need something like
SET #Keyword = '"this" AND "is" AND "a" AND "search" AND "item"'
For more information, see the T-SQL CONTAINS syntax, looking in particular at the Examples section.
As an additional note, be sure to replace the double-quote character (with a space) so you don't mess up your full-text query. See this question for details on how to do that: SQL Server Full Text Search Escape Characters?
Further to Aaron's answer, provided you are using SQL Server 2016 or greater (130), you could use the in-built string fuctions to pre-process your input string. E.g.
SELECT
#QueryString = ISNULL(STRING_AGG('"' + value + '*"', ' AND '), '""')
FROM
STRING_SPLIT(#Keywords, ' ');
Which will produce a query string you can pass to CONTAINS or FREETEXT that looks like this:
'"this*" AND "is*" AND "a*" AND "search*" AND "item*"'
or, when #Keywords is null:
""