What is the syntax for AND OR NOT in Postgres trigram search? - postgresql

I have implemented Postgres 9.6 trigram search https://www.postgresql.org/docs/9.6/static/pgtrgm.html into my application which works fine for a single search term.
I can't see how to allow my users to do AND OR NOT searches though.
Currently, if I put "perl" into the search field, it will return hundreds of results. That's great and works fine.
Now if I want to search for documents containing "perl" and also containing "javascript", no matter what search term I put in, no results come back.
I have tried for example:
"perl javascript"
"perl AND javascript"
"perl && javascript"
So I am trying to work out how I can provide to my end users a more sophisticated search than single term only. I would like my application users to be able to do full text searches with and/or/not.
Is it possible? If yes, what is the syntax?

This query finds documents containing "perl" and also containing "javascript":
SELECT *
FROM tbl
WHERE document ~ 'perl'
AND document ~ 'javascript';
Note that "perlane" or "javascripting" or "Kaperl" also qualify. To search for whole words, you might be interested in text search instead. Overview:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
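The same pattern extends to OR and NOT by joining conditions with OR and by negating the match. Below is a minimal Python sketch of how an application might assemble such a WHERE clause from user-supplied terms. It swaps the ~ operator for ILIKE (which pg_trgm indexes also support) so that terms are matched literally rather than as regular expressions. The function names, the document column and the %s placeholder style (as used by psycopg2) are assumptions for illustration, not part of the answer above.

```python
def like_pattern(term):
    """Wrap a term in % wildcards, escaping LIKE metacharacters (\\, %, _)."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return "%" + escaped + "%"


def build_where(all_of=(), any_of=(), none_of=(), column="document"):
    """Return (where_sql, params) for an AND / OR / NOT substring search.

    all_of:  every term must occur        (AND)
    any_of:  at least one term must occur (OR)
    none_of: no term may occur            (NOT)
    """
    clauses, params = [], []
    for term in all_of:
        clauses.append(f"{column} ILIKE %s")
        params.append(like_pattern(term))
    if any_of:
        ors = " OR ".join(f"{column} ILIKE %s" for _ in any_of)
        clauses.append(f"({ors})")
        params.extend(like_pattern(term) for term in any_of)
    for term in none_of:
        clauses.append(f"{column} NOT ILIKE %s")
        params.append(like_pattern(term))
    # With no terms at all, fall back to a condition that matches every row.
    return " AND ".join(clauses) or "TRUE", params
```

The returned string goes after WHERE and is executed together with the parameter list, for example cursor.execute("SELECT * FROM tbl WHERE " + where, params). As with the query above, this matches substrings, not whole words.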

Related

Is there any way to replicate regex term extraction with a Sphinx query?

Using a simple regex:
Status: (.*?),(.*?)\s
I can easily extract "Updated" and "In-Progress" from
Status: In-Progress,Updated
see https://regex101.com/r/mV7gF5/1
I am trying to do something similar with Sphinx since it is much faster. Is there any way to do this with SphinxQL? I don't even mind if it requires post-processing, but I can't for the life of me figure out a SphinxQL query, since it seems far more literal.
Well, Sphinx could give you a list of documents containing the word 'Status', and even ones matching Status: .*,.* if you were to add : and , to charset_table.
But it can't do any sort of term extraction; you would need to post-process those documents (and probably execute the regular expression against them!). The closest would be CALL SNIPPETS, which sort of does text matching, but it doesn't have a regex syntax.
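The post-processing step could look roughly like the Python sketch below. It assumes the matching documents have already been fetched from Sphinx as plain strings; the function name is hypothetical, and the regex is the one from the question with the trailing \s relaxed so that a match at the very end of the text also counts.

```python
import re

# Pattern from the question; (?:\s|$) also accepts the end of the string.
STATUS_RE = re.compile(r"Status: (.*?),(.*?)(?:\s|$)")


def extract_status_terms(documents):
    """Yield the two captured status terms for each document that matches."""
    for text in documents:
        match = STATUS_RE.search(text)
        if match:
            yield match.groups()
```

Sphinx narrows the candidate set quickly, and the regular expression then only has to run over the documents that came back.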

Override a stemmed word on the fly in a query with Sphinx?

If I turn on stemming/lemmatizer in Sphinx, can I push a term to it "as needed" that does not utilize stemming? I know I can use wordforms to always exclude that word from stemming, e.g. Radiology > Radiology, but that results in never stemming the word. I'm looking for a way to not add a wordform exception but to be able to say in a query, in essence, 'look exactly for "Radiology" and do not stem/lemmatize'. I have tried "Radiology" instead of Radiology to no avail.
Enable index_exact_words on the index:
http://sphinxsearch.com/docs/current.html#conf-index-exact-words
:)
Then you can search for
=Radiology
(in extended match mode)

overpass-api: regex on keys

According to http://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_QL
queries can use regular expressions on both the values and the keys. While I have no trouble using regex on the values, I'm having a problem with the keys.
The example on the wiki referenced above says (among other examples):
/* finds addr:* tags with value exactly "Foo" */
node[~"^addr:.*$"~"^Foo$"];
So, that's an example of using regex on the keys and the values.
What I am interested in is the name key. Specifically the name:en key. There are a couple problems with searching by name. Not all names are in English, and for those nodes/way/relations whose names are not in English, there is no guarantee there will be a name:en tag with an English version of the name.
In general, there is no way to know in advance if the name will be in English or that there is a name:en tag. If you only ask for name or name:en, you run the risk of finding no hit. (Of course, searching for both is no guarantee of success, either.)
I have a case where I know name fails, but name:en succeeds. That is my test case. I can query the overpass-api.de/api/interpreter using this:
[out:json][timeout:25][bbox:33.465530,36.156006,33.608615,36.574516];
(
node[name~"duma",i][place];
way[name~"duma",i][place];
>;
relation[name~"duma",i][place];
node["name:en"~"duma",i][place];
way["name:en"~"duma",i][place];
>;
relation["name:en"~"duma",i][place];
);
out center;
see it on overpass
and it works fine ("duma" is not found through name, but it is found with name:en), but I find it lengthy and somewhat repetitive.
I would like to use a regular expression involving the name and name:en tags, but either the server does not understand the query or I simply am using an incorrect regex.
Using the example shown in the wiki: node[~"^addr:.*$"~"^Foo$"]
I have tried:
[~"name|name:en"~"duma",i]
[~"name.*"~"duma",i]
[~"^name.*$"~"duma",i]
and several others. I even mimicked the example with [~"^name:.*"~"duma",i] just to see if anything would be returned.
Does overpass-api.de recognize regular expressions on the keys or do I just have the regex wrong? I don't get an error from overpass-api.de, just the coordinates of the bbox and an empty result. It's usually very strict about reacting to a poorly formatted query. Thanks in advance.
That's really a bug in the Overpass API implementation concerning case-insensitive key regex matching, see this Github ticket for details.
For the time being, you can already test the patch on the development box:
http://overpass-turbo.eu/s/b1l
BTW: If you don't need case-insensitive regexp matching, this should already work on overpass-api.de as of today.
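Until case-insensitive key regexes work on the server you use, one workaround for the repetition is to generate the explicit per-key union from a script. This is only a sketch in Python: the function name and parameters are hypothetical, and it leaves out the >; recursion statements from the question's query.

```python
def build_overpass_query(keys, pattern, bbox,
                         element_types=("node", "way", "relation")):
    """Build an Overpass QL union with one statement per key and element type.

    bbox is (south, west, north, east); pattern is matched case-insensitively.
    """
    south, west, north, east = bbox
    lines = [
        f"[out:json][timeout:25][bbox:{south},{west},{north},{east}];",
        "(",
    ]
    for key in keys:
        for element in element_types:
            # Same filter shape as in the question: key ~ pattern, plus [place].
            lines.append(f'  {element}["{key}"~"{pattern}",i][place];')
    lines += [");", "out center;"]
    return "\n".join(lines)
```

Calling build_overpass_query(["name", "name:en"], "duma", (33.465530, 36.156006, 33.608615, 36.574516)) produces the six filter statements of the original query without writing them out by hand.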

What are the possible values of regconfig in postgresql?

http://www.postgresql.org/docs/9.3/static/textsearch-controls.html often refers to an optional regconfig parameter, but I can't find its possible values and their meanings. Where is it documented? If this can't be answered (e.g. because it depends on my installed database components or the like), how can I determine it myself?
I'd like a "plain text" regconfig, without any human-language transformation. What is the argument for it?
A text search configuration is a grouping of configuration objects: parsers, templates, and dictionaries. As far as I can tell, the built-in configuration choices are the language-specific ones like 'english' or 'finnish', plus the language-neutral 'simple'.
You can see a list of configurations in your database via the \dF command in the psql command line tool, or via a query:
select * from pg_catalog.pg_ts_config;
Regarding a "plain text" configuration (#2), I'm not sure what you need, but the built-in 'simple' configuration may be close. Otherwise, look at creating a custom dictionary. Perhaps start with the built-in "simple" dictionary and remove the stop words?
COMMENT ON TEXT SEARCH DICTIONARY simple IS 'simple dictionary: just lower case and check for stopword';
Reference: Configurations
The blog article Mastering PostgreSQL Tools: Full-Text Search and Phrase Search answers your question in full.
To summarize:
The regconfig is the text search configuration object, which is itself a collection of templates, parsers, dictionaries and stop words. You can list the available configurations with the command \dF. To find out more about a particular regconfig, such as english or dutch, use the command \dF+ dutch
To set up your own regconfig, you'll need access to the postgresql.conf file, which isn't always allowed.

Sphinx search configuration for words ending with apostrophes

I am trying to improve my Sphinx configuration and I am having trouble with words ending with apostrophes.
For example, for the result Surfin' USA, searching with "Surfin USA" returns a match but "Surfing USA" doesn't return anything. How can I set up Sphinx to return the result in such a situation?
Hmm, that's an interesting one. Not sure Sphinx can deal with this automatically, because it has no way of knowing what the apostrophe is meant to represent. I suppose there are cases where it could be multiple things.
The only way I can think of would be to list them in exceptions. You can build a list of all the words you want to support:
Surfin' => Surfing
You have to use exceptions to be able to use the apostrophe.
You might want to add
Surfin => Surfing
too, so you can search without the apostrophe as well.