Is it possible to achieve regular expression search (~) in PostgreSQL using CriteriaBuilder? - postgresql

I am currently attempting to search by regular expression using CriteriaBuilder.
Here is the query I want to generate
select * from table where column ~ '.*';
Then I tried using 'criteriaBuilder.function()'.
Expression<String> expression = root.get("column");
Expression<String> regex = criteriaBuilder.function("~", String.class, expression.as(String.class),criteriaBuilder.literal(regexValue));
This was an error because '~' is an operator, not a function.
Is there any way to achieve a regular expression search with ~ in CriteriaBuilder?

Related

postgres regexp_matches strange behavior

Following the short docs on regexp_matches:
Return all captured substrings resulting from matching a POSIX regular expression against the string.
Example: regexp_matches('foobarbequebaz', '(bar)(beque)') returns {bar,beque}
With that in mind, I'd expect the result of regexp_matches('barbarbar', '(bar)') to be {bar,bar,bar}
However, only {bar} is returned.
Is this the expected behavior? Am I missing something?
Note:
calling regexp_matches('barbarbar', '(bar)', 'g') does return all 3 bars, but in table form:
regexp_matches text[]
{bar}
{bar}
{bar}
This behavior is described more in details in 9.7.3. POSIX Regular Expressions :
The regexp_matches function returns a set of text arrays of captured
substring(s) resulting from matching a POSIX regular expression
pattern to a string. It has the same syntax as regexp_match. This
function returns no rows if there is no match, one row if there is a
match and the g flag is not given, or N rows if there are N matches
and the g flag is given. Each returned row is a text array containing
the whole matched substring or the substrings matching parenthesized
subexpressions of the pattern, just as described above for
regexp_match. regexp_matches accepts all the flags shown in Table
9.24, plus the g flag which commands it to return all matches, not just the first one.
This is expected behavior. The function returns a set of text[] which means that multiple matches are presented in multiple rows. Why is it organized this way? The goal is to make it possible to find more than one token from a single match. In this case, they are presented in the form of an array. The documentation delivers a telling example:
SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
regexp_matches
----------------
{bar,beque}
{bazil,barf}
(2 rows)
The query returns two matches, each of them containing two tokens found.

Prefix/wildcard searches with 'websearch_to_tsquery' in PostgreSQL Full Text Search?

I'm currently using the websearch_to_tsquery function for full text search in PostgreSQL. It all works well except for the fact that I no longer seem to be able to do partial matches.
SELECT ts_headline('english', q.\"Content\", websearch_to_tsquery('english', {request.Text}), 'MaxFragments=3,MaxWords=25,MinWords=2') Highlight, *
FROM (
SELECT ts_rank_cd(f.\"SearchVector\", websearch_to_tsquery('english', {request.Text})) AS Rank, *
FROM public.\"FileExtracts\" f, websearch_to_tsquery('english', {request.Text}) as tsq
WHERE f.\"SearchVector\" ## tsq
ORDER BY rank DESC
) q
Searches for customer work but cust* and cust:* do not.
I've had a look through the documentation and a number of articles but I can't find a lot of info on it. I haven't worked with it before so hopefully it's just something simple that I'm doing wrong?
You can't do this with websearch_to_tsquery but you can do it with to_tsquery (because ts_query allows to add a :* wildcard) and add the websearch syntax yourself in in your backend.
For example in a node.js environment you could do smth. like this:
let trimmedSearch = req.query.search.trim()
let searchArray = trimmedSearch.split(/\s+/) //split on every whitespace and remove whitespace
let searchWithStar = searchArray.join(' & ' ) + ':*' //join word back together adds AND sign in between an star on last word
let escapedSearch = yourEscapeFunction(searchWithStar)
and than use it in your SQL
search_column ## to_tsquery('english', ${escapedSearch})
You need to write the tsquery directly if you want to use partial matching. plainto_tsquery doesn't pass through partial match notation either, so what were you doing before you switched to websearch_to_tsquery?
Anything that applies a stemmer is going to have hard time handling partial match. What is it supposed to do, take off the notation, stem the part, then add it back on again? Not do stemming on the whole string? Not do stemming on just the token containing the partial match indicator? And how would it even know partial match was intended, rather than just being another piece of punctuation?
To add something on top of the other good answers here, you can also compose your query with both websearch_to_tsquery and to_tsquery to have everything from both worlds:
select * from your_table where ts_vector_col ## to_tsquery('simple', websearch_to_tsquery('simple', 'partial query')::text || ':*')
Another solution I have come up with is to do the text transform as part of the query so building the tsquery looks like this
to_tsquery(concat(regexp_replace(trim(' all the search terms here '), '\W+', ':* & '), ':*'));
(trim) Removes leading/trailing whitespace
(regexp_replace) Splits the search string on non word chars and adds trailing wildcards to each term, then ANDs the terms (:* & )
(concat) Adds a trailing wildcard to the final term
(to_tsquery) Converts to a ts_query
You can test the string manipulation by running
SELECT concat(regexp_replace(trim(' all the search terms here '), '\W+', ':* & ', 'gm'), ':*')
the result should be
all:* & the:* & search:* & terms:* & here:*
So you have multi word partial matches e.g. searching spi ma would return results matching spider man

What does ~ ‘^[0-9]+$’ mean in PostgreSQL

I’m trying to understand what ~ '^[0-9]+$' means. Would it be any integer that contains 0-9? Or doesn’t contain 0-9?
Is the ~ equivalent to LIKE in MS SQL?
Looking at https://www.postgresql.org/docs/9.6/static/functions-matching.html#FUNCTIONS-POSIX-TABLE, you will find, that ~ means:
"Matches regular expression, case sensitive"
'^[0-9]+$' is a regular expression, with:
^: from string start
$: til string end
[0-9]+: one or more digits.
I don't know how you define integer, but e.g. '0000' is matched as well.
SqlServer does not support complete regular expression syntax out of the box and like does not handle regular expressions in PostgreSQL as well, therefore it is not equivalent to ~.

LotusScript - How to escape double quotes in the SELECT statement?

I have a multiple documents which are containing the field named organization.
Almost every second document contains double quotes in this field, for example: Medical Center "James Goodwin Corp." e.t.c
I have a search query with the name of some organization, which also contains quotes and trying to use this name in the search query to find all needed documents.
I have tryed many variants and each time I am getting the query syntax error about double quotes.
Can you please give some small example or some advise how to escape double quotes in the SELECT statement?
Thank you!
Update:
Yes, I am using Replace function like this:
searchValue = Replace(docByUi.search(0),{"},{|"|})
to change this double quotes to |"|.
And I am getting an error in my select query
Or maybe I am wrong in something?
Update #2:
My query looks like this:
query = {Form="Person" & #Contains(} & docByUi.fields(0) & {;"} & searchValue & {")}
I meaned that I am already using {} to create a part-to-part query.
You can use curly braces {} in your search statement. You don't need to escape double quotes inside braces.
Here is example of your search query:
Form = "Person" & #Contains(Level0; {Filia "Department of Y"})
In your lotus script you can use | symbol to make your string:
query$ = |Form="Person" & #Contains(| & docByUi.fields(0) & |; {| & searchValue & |})|
Instead of using double quotes, use the pipe character:
Select #Contains(Organization; |"|);
Is that what you are trying to do?

Use multiple words in FullText Search input string

I have basic stored procedure that performs a full text search against 3 columns in a table by passing in a #Keyword parameter. It works fine with one word but falls over when I try pass in more than one word. I'm not sure why. The error says:
Syntax error near 'search item' in the full-text search condition 'this is a search item'
SELECT S.[SeriesID],
S.[Name] as 'SeriesName',
P.[PackageID],
P.[Name]
FROM [Series] S
INNER JOIN [PackageSeries] PS ON S.[SeriesID] = PS.[PackageID]
INNER JOIN [Package] P ON PS.[PackageID] = P.[PackageID]
WHERE CONTAINS ((S.[Name],S.[Description], S.[Keywords]),#Keywords)
AND (S.[IsActive] = 1) AND (P.[IsActive] = 1)
ORDER BY [Name] ASC
You will have to do some pre-processing on your #Keyword parameter before passing it into the SQL statement. SQL expects that keyword searches will be separated by boolean logic or surrounded in quotes. So, if you are searching for the phrase, it will have to be in quotes:
SET #Keyword = '"this is a search item"'
If you want to search for all the words then you'll need something like
SET #Keyword = '"this" AND "is" AND "a" AND "search" AND "item"'
For more information, see the T-SQL CONTAINS syntax, looking in particular at the Examples section.
As an additional note, be sure to replace the double-quote character (with a space) so you don't mess up your full-text query. See this question for details on how to do that: SQL Server Full Text Search Escape Characters?
Further to Aaron's answer, provided you are using SQL Server 2016 or greater (130), you could use the in-built string fuctions to pre-process your input string. E.g.
SELECT
#QueryString = ISNULL(STRING_AGG('"' + value + '*"', ' AND '), '""')
FROM
STRING_SPLIT(#Keywords, ' ');
Which will produce a query string you can pass to CONTAINS or FREETEXT that looks like this:
'"this*" AND "is*" AND "a*" AND "search*" AND "item*"'
or, when #Keywords is null:
""