Searching for a user named "Don" with PostgreSQL full text search - postgresql

We hit a bug with our PostreSQL full text search system where a user whose first name is "Don" was not being included in search results. After some digging, we found that "don" is listed as a stopword in the default full text search dictionary in PostgreSQL (https://github.com/postgres/postgres/blob/master/src/backend/snowball/stopwords/english.stop).
We are using a hosted DB solution so we don't have access to the file system and thus can't create a modified version of the stopword file.
Are there any workarounds for this other than doing a string comparison check? Given that there can be multiple search tokens, it seems pretty bad to have to perform a string comparison of the name fields against every search token.
All the other words in the English stopword file seem pretty reasonable, but I'm really surprised I don't see any other Google/SO results complaining about users named "Don".

Maybe this makes it clear why don is a stopword:
SELECT to_tsvector('english', 'don''t');
to_tsvector
-------------
(1 row)
You wouldn't want to remove that stop word.
Full text search is not useful for proper names.
Normally, trigram indexes are better for that.

Related

How to search for approximate word in database

My client's requirement is, search and get the results for the word if he types approximately closely matching word from the DB.
Eg. He searches for Gogle, it should return Google OR if he searches for GOOOgle, it should return Google.
What I know is LIKE will definitely not going to work.But not sure how can I achieve this.
Try adding extension -
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
Then using some variation of levenshtein to see how closely the word matches eg
SELECT levenshtein('GUMBO', 'GAMBOL', 2,1,1); --- gives result = 3
I'm sure a bit of research into this area will meet your requirement

Mysql to Sphinx query conversion

how can i write this query on sphinx select * from vehicle_details where make LIKE "%john%" OR id IN (1,2,3,4), can anyone help me? I've search a lot and i can't find the answer. please help
Well if you really want to use sphinx, could perhaps make id into a fake keyword, so can use it in the MATCH, eg
sql_query = SELECT id, CONCAT('id',id) as _id, make, description FROM ...
Now you have a id based keyword you can match on.
SELECT * FROM index WHERE MATCH('(#make *john*) | (#_id id1|id2|id3|id4)')
But do read up on sphinx keyword matching, as sphinx by default only matches whole words, you need to enable part word matching with wildcards, (eg with min_infix_len) so you can get close to a simple LIKE %..% match (which doesnt take into account words)
Actually pretty hard to do, becuase you mixing a string search (the LIKE which will be a MATCH) - with an attribute filter.
Would suggest two seperate queries, one to sphinx for the text filter. And the IN filter just do directly in database (mysql?). Merge the results in the application.

Must be possible to filter table names in a single database?

As far as I can tell, the search filter in the navigator will only search available database names, not table names.
If you click on a table name and start typing, it appears that a simple search can be performed beginning with the first letter of the tables.
I'm looking for way to be able to search all table names in a selected database. Sometimes there can be a lot of tables to sort through. It seems like a feature that would likely be there and I can't find it.
Found out the answer...
If you type for example *.test_table or the schema name instead of the asterisk it will filter them. The key is that the schema/database must be specified in the search query. The asterisk notation works with the table names as well. For example *.*test* will filter any table in any schema with test anywhere in the table name.
You can use the command
SHOW TABLES like '%%';
To have it always in your tools, you can add it as a snippet to SQL aditions panel on the right.
Then you can always either bring it in your editor and type your search key between %%, or just execute it as it is (It will fetch all the tables of the database) and then just filter using the "filter rows" input of the result set.

Is it possible to perform a Sphinx search on one string attribute?

sql_query=SELECT id,headline,summary,body,tags,issues,published_at
FROM sphinx_search
I am working on the search feature of my Web site and I am using Sphinx, Perl and Sphinx::Search. As long as I want to search in all the attributes and I don't restrict it to just one, everything goes well. However when the user searches for a specific tag, I can't just give the result of a fuzzy search, I want to use the power of Sphinx to search only on tags or issues, maybe sometimes the user wants to search on headline and issues.
How can I perform such a task?
You need to put it in Extended Match Mode
https://metacpan.org/module/JJSCHUTZ/Sphinx-Search-0.27.2/lib/Sphinx/Search.pm#SetMatchMode
Then you can use Extended Query syntax
http://sphinxsearch.com/docs/current.html#extended-syntax
Which includes the field search operator
#tags keyword1
(Be careful with sphinx, the word "attribute" has a specific meaning - values attached to the document, useful for sorting/grouping/filtering and returning with the resultset. Whereas I think you are talking about fields. All the columns from the sql_query you dont mark as an attribute, are a field - and full text searchable)

PostgreSQL full-text search headlines do not contain enough context

I'm using PostgreSQL's full-text search capability to implement a search feature on a client's site. I'm using the ts_headline function to get the context that the search terms appear in, but the client is not happy with the selection of words displayed. In particular, the headline seems to consistently begin with the search term, whereas the client would like it to start a few words earlier.
Is there any way to either configure PostgreSQL to have this behavior, or modify the ts_headline call to get the desired results?
Edit: Apologies for not including some sample SQL in the first place.
SELECT
ts_headline('english', "text", plainto_tsquery('"endpoints"'))
FROM "Page"
WHERE to_tsvector("text") ## plainto_tsquery('"endpoints"')
ORDER BY ts_rank(to_tsvector("text"), plainto_tsquery('"endpoints"'))
Using the MaxFragments option, you might get better results. Similarly you can play with MinWords and MaxWords, e.g.
SELECT
ts_headline('english', "text", plainto_tsquery('"endpoints"'), 'MaxFragments=0, MinWords=5, MaxWords=9')
FROM "Page"
WHERE
to_tsvector("text") ## plainto_tsquery('"endpoints"')
ORDER BY
ts_rank(to_tsvector("text"), plainto_tsquery('"endpoints"'))
You will probably need to experiment.
See MinWords, MaxWords and MaxFragments in http://www.postgresql.org/docs/current/interactive/textsearch-controls.html