postgres stop a recursive search - postgresql

I'm doing a search using a postgres "with recursive" construct, but I want to stop after finding a single occurrence of something in the tree. I'm not confident the "limit 1" on the calling select clause will do it. What is the conventional approach?
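For reference, a minimal sketch of the pattern being described, using a hypothetical nodes(id, parent_id, payload) table; whether the recursion actually stops early depends on how many rows the outer query ends up fetching:
with recursive search_tree as (
    select id, parent_id, payload
    from nodes
    where id = 1                    -- hypothetical starting node
  union all
    select n.id, n.parent_id, n.payload
    from nodes n
    join search_tree st on n.parent_id = st.id
)
select *
from search_tree
where payload = 'needle'            -- the single occurrence being searched for
limit 1;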

Related

Kotlin and PostgreSQL full text search prepared statement (Ktorm)

I'm using Ktorm in Kotlin to run queries against a PostgreSQL database, and I'm currently implementing a full-text search feature. Naturally I'd like to avoid SQL injection attacks, and while Ktorm doesn't appear to have full-text search built in, I also can't figure out how to implement this properly with a prepared statement given that the number of search terms varies. Manually sanitizing the input is janky and fragile at best.
Example query for input "foo bar" I'd like to sanitize:
select *, ts_rank_cd(field, query) as rank
from table, to_tsquery('foo:* & bar:*') as query
where query @@ field
order by rank desc;
Any thoughts on how to approach this?
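One way to sidestep manual sanitizing (a sketch, not Ktorm-specific; the table name docs is hypothetical, the column field is taken from the example) is to pass the raw user input as a single bound parameter. plainto_tsquery or, on PostgreSQL 11+, websearch_to_tsquery parse the text themselves (though without the :* prefix matching from the example); alternatively, the assembled 'foo:* & bar:*' string can itself be bound as the parameter to to_tsquery, which still prevents injection, although malformed tsquery syntax would raise an error.
-- $1 is the single bound parameter holding the raw input, e.g. 'foo bar'
select *, ts_rank_cd(field, query) as rank
from docs, websearch_to_tsquery($1) as query
where query @@ field
order by rank desc;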

PostgreSQL: to_tsquery starts with and ends with search

Recently, I implemented a PostgreSQL 11 full-text search on a huge table I have in a system to solve the problem of hitting LIKE queries in it. This table has over 200 million rows and querying using to_tsquery worked pretty well for the column of type tsvector.
Now I need to support the following queries, but reading the documentation I couldn't find how to do it (or it's there and I didn't understand it, because full-text search is still new to me):
Starts with
Ends with
How can I make the query below return true only if the text starts with "The cat" and ends with "the book", if that's possible with full-text search?
select to_tsvector('The cat is on the book') @@ to_tsquery('Cat')
I implemented a PostgreSQL 11 full-text search on a huge table I have in a system to solve the problem of hitting LIKE queries in it.
How did you do that? FTS doesn't apply for LIKE queries. It applies for FTS queries, such as @@.
You can't directly look for strings starting and ending with certain words. You can use the index to filter on cat and book, then refilter those rows for ones having them in the right place.
select * from whatever where tsv_col @@ to_tsquery('cat & book') and text_col LIKE 'The cat % the book';
Unless you want to match something like 'The cathe book', you would have to do something else, with two separate LIKE conditions.
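A sketch of the two-LIKE variant mentioned above, with the same hypothetical table and column names:
select *
from whatever
where tsv_col @@ to_tsquery('cat & book')
  and text_col LIKE 'The cat %'
  and text_col LIKE '% the book';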

Use Postgresql full text search to fuzzy match all search terms

I have 2 tables (projects and tasks) that both contain a name field. I want users to be able to search both tables at the same time when entering a new item. I want to rank results based on all the terms entered. A user should be able to enter text in any order he/she chooses.
For example, searching on:
office bmt
should yield these results:
PR BMT Time - Office
BMT Office - Development
BMT Office - Development
...
The following search should also work:
BMT canter
should contain this result:
Canterburry - BMT time
So partial matches need to work too.
Ideally, if the user types a small error like:
ofice bmt
The results should still appear.
I now use something like this:
where to_tsvector(projects.name || ' - ' || tasks.name) @@ to_tsquery('OFF:*&BMT:*')
I build the search string itself in the Ruby backend by splitting the user entry according to its spaces.
This works fine, however in some cases it doesn't and I believe that's because it interprets it like English and ignores some words like of, off, in, etc...
For example searching for:
off bmt
Gives results that don't contain Off at all because off is ignored completely.
Is there a way to avoid this but still have good performance and fuzzy search? I'm not keen on having to sync my PG with ElasticSearch for this.
I could do it by building a list of AND statements in the WHERE clause with LIKE '% ... %', but that would probably hurt performance and doesn't support fuzzy search.
Ideally, if the user types a small error like:
ofice bmt
The results should still appear.
This could be very hard to do on more than a best-effort basis. If someone enters "Canter", how should the system know if they meant a shortening of Canterburry, or a misspelling of "cancer", or of "cantor", or if they really meant a horse's gait? Perhaps you can create a dictionary of common typos for your specific field? Also, without the specific knowledge that time zones are expected and common, "bmt" seems like a misspelling of, well, something.
This works fine, however in some cases it doesn't and I believe that's because it interprets it like English and ignores some words like of, off, in, etc...
Don't just believe, check and see!
select to_tsquery('english','OFF:*&BMT:*');
to_tsquery
------------
'bmt':*
Yes indeed, to_tsquery does omit stop words, even with the :* thingy.
One option is to use 'simple' rather than 'english' as your configuration:
select to_tsquery('simple','OFF:*&BMT:*');
to_tsquery
-------------------
'off':* & 'bmt':*
Another option is to write tsquery directly rather than processing through to_tsquery. Note that in this case, you have to lower-case it yourself:
select 'off:*&bmt:*'::tsquery;
tsquery
-------------------
'off':* & 'bmt':*
Also note that if you do this with 'office:*', you will never get a match in an 'english' configuration, because 'office' in the document gets stemmed to 'offic', while no stemming occurs when you write 'office:*'::tsquery. So you could use 'simple' rather than 'english' to avoid both stemming and stop words. Or you could test each word in the query individually to see if it gets stemmed before deciding to add :* to it.
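The stemming mismatch described above can be checked directly; assuming a default setup, I would expect something like:
select to_tsvector('english', 'office') @@ 'office:*'::tsquery;   -- false: the document lexeme is stemmed to 'offic'
select to_tsvector('simple', 'office') @@ 'office:*'::tsquery;    -- true: 'simple' does no stemming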
Is there a way to avoid this but still have good performance and fuzzy search? I'm not keen on having to sync my PG with ElasticSearch for this.
What do you mean by fuzzy search? You don't seem to be using that now. You are just using prefix matching, and accidentally using stemming and stop words. How large is your table to be searched, and what kind of performance is acceptable?
If you did use ElasticSearch, how would you then phrase your searches? If you explained how you would phrase the search in ES, maybe someone can help you do the same thing in PostgreSQL. I don't think we can take it as a given that switching to ES will just magically do the right thing.
I could do it by building a list of AND statements in the WHERE clause with LIKE '% ... %', but that would probably hurt performance and doesn't support fuzzy search.
Have you looked into pg_trgm? It can make those types of queries quite fast. Also, LIKE '%...%' is a lot more fuzzy than what you are currently doing, so I don't understand how you would lose that. pg_trgm also provides the '<->' operator, which is even fuzzier and might be your best bet. It can deal with typos fairly well when they are embedded in long strings, but in short strings they can really be a problem.
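For reference, a minimal pg_trgm sketch along those lines (the index name and table/column names are hypothetical):
create extension if not exists pg_trgm;
create index projects_name_trgm_idx on projects using gin (name gin_trgm_ops);
-- '%' matches by trigram similarity above the current threshold, '<->' is a distance usable for ordering
select name
from projects
where name % 'ofice bmt'
order by name <-> 'ofice bmt'
limit 10;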
In your case, to_tsquery() needs to indicate that all words are required; you can use to_tsquery('english', 'off & bmt') and specify a dictionary that keeps the word 'off', as described in link 4 below.
Some tips for using tsvector:
1. Create a column on your table that gathers all the fields with terms you want to search; this column should be of type tsvector.
2. Your search should use tsquery, as you mentioned in your question. Within the search you can apply some useful tricks, as follows:
2.a. Create a rank with ts_rank(), indicating the search priority; this shows how closely the tsquery approximates the original terms.
2.b. If you have domain-specific words (in my case, chemical terms), you can create a dictionary of the commonly used words; these words can be used to extract the stem or parts of a term to compare similarity.
2.c. About performance: tsquery works very well with GIN and GiST indexes. I have used full-text search on a table with over 200k rows and searches return in under 0.4 seconds.
If you need fuzzier matching of individual words, you can also use fuzzy matching. Together with tsquery I used levenshtein_less_equal with a distance of 3; it matches words that differ from the search term by at most 3 letters, which is a good approach for single words (see the sketch after the links below).
1. tsquery and tsvector: https://www.postgresql.org/docs/10/datatype-textsearch.html
2. Text search controls and ranking: https://www.postgresql.org/docs/10/textsearch-controls.html#TEXTSEARCH-RANKING
3. Fuzzy string matching: https://www.postgresql.org/docs/11/fuzzystrmatch.html#id-1.11.7.24.6
4. Dictionaries (lexize): https://www.postgresql.org/docs/10/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY
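A sketch of the levenshtein_less_equal() idea from the fuzzystrmatch extension, with a hypothetical words table:
create extension if not exists fuzzystrmatch;
-- keep words that differ from the (possibly misspelled) search term by at most 3 edits
select word
from words
where levenshtein_less_equal(lower(word), 'ofice', 3) <= 3;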

Understanding SQL query complexity

I'm currently having trouble understanding why a seemingly simple query is taking much longer to return results than a much more complicated (looking) query.
I have a view, performance_summary (which in turn selects from another view). Currently, within psql, when I run a query like
SELECT section
FROM performance_summary
LIMIT 1;
it takes a minute or so to return a result, whereas a query like
SELECT section, version, weighted_approval_rate
FROM performance_summary
WHERE version in ('1.3.10', '1.3.11') AND section ~~ '%WEST'
ORDER BY 1,2;
gets results almost instantly. Without knowing how the view is defined, is there any obvious or common reason why this is?
Not really, without knowing how the view is defined. It could be that the "more complex" query uses an index to select just two rows and then performs some trivial grouping and sorting on them. The query without the WHERE clause might have Postgres operating on millions of rows and trillions of operations, producing a single row only after discarding 999999999 rows; we just don't know unless you post the view definition and the explain plan output for both queries.
You might be falling into the trap of thinking that a view is somehow a cache of information. It isn't. It's a stored query that is inserted into the larger query when you select from it or include it in another query, which means the whole thing must be planned and executed from scratch. Creating a view does not do any pre-planning onto which further improvement is built; it's more that the definition of the view is pasted into any query that uses it, and the query is then run as if it had just been written there and then.
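As the answers suggest, the way to see what is really happening is to compare the plans for both queries, e.g.:
explain (analyze, buffers)
select section
from performance_summary
limit 1;

explain (analyze, buffers)
select section, version, weighted_approval_rate
from performance_summary
where version in ('1.3.10', '1.3.11') and section ~~ '%WEST'
order by 1, 2;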

HIVE HQL is there a way to use user variable in SELECT as MySQL does?

Under MySQL, when I am doing :
SELECT @a:=concat("he","llo"), concat(@a," world")
I get "hello world".
It is very useful because you don't have to rebuild "hello" in the second field, and if I wish to modify the first "concat", it will affect the second...
With complex constructions, the gain is obvious, I think...
But I couldn't find an equivalent in HiveQL, and so far I am forced to rewrite each field construction...
I know that a subselect would do the same thing but I would prefer to avoid this way (for some practical reasons...).
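For reference, the subselect workaround the question alludes to might look like this in HiveQL (recent Hive versions allow a SELECT without a FROM clause in the inner query):
-- build the value once in an inner query, then reuse it in the outer select
SELECT greeting, concat(greeting, ' world')
FROM (SELECT concat('he', 'llo') AS greeting) t;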