How to search for approximate word in database

How to search for approximate word in database - postgresql

My client's requirement is, search and get the results for the word if he types approximately closely matching word from the DB.
Eg. He searches for Gogle, it should return Google OR if he searches for GOOOgle, it should return Google.
What I know is LIKE will definitely not going to work.But not sure how can I achieve this.

Try adding extension -
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
Then using some variation of levenshtein to see how closely the word matches eg
SELECT levenshtein('GUMBO', 'GAMBOL', 2,1,1); --- gives result = 3
I'm sure a bit of research into this area will meet your requirement

Related

Searching for a user named "Don" with PostgreSQL full text search

We hit a bug with our PostreSQL full text search system where a user whose first name is "Don" was not being included in search results. After some digging, we found that "don" is listed as a stopword in the default full text search dictionary in PostgreSQL (https://github.com/postgres/postgres/blob/master/src/backend/snowball/stopwords/english.stop).
We are using a hosted DB solution so we don't have access to the file system and thus can't create a modified version of the stopword file.
Are there any workarounds for this other than doing a string comparison check? Given that there can be multiple search tokens, it seems pretty bad to have to perform a string comparison of the name fields against every search token.
All the other words in the English stopword file seem pretty reasonable, but I'm really surprised I don't see any other Google/SO results complaining about users named "Don".

Maybe this makes it clear why don is a stopword:
SELECT to_tsvector('english', 'don''t');
to_tsvector
-------------
(1 row)
You wouldn't want to remove that stop word.
Full text search is not useful for proper names.
Normally, trigram indexes are better for that.

Openedge SDO -> smart data browser - I want to filter the query results

I have an SDO supplying data to a read-only browser. The SDO query joins several tables and has calculated fields as well as natural data fields.
The users now want a search facility so the browser will only show rows where the search word appears in ANY of the text fields.
For example they want to see rows where
customer.name matches "*bob*" OR
customer.address1 matches "*bob*" OR
product.description matches "*bob*" OR
calc_field_1 matches "*bob*" OR
calc_field_2 matches "*bob*" OR ...
Ideally the answer will filter the SDO output as it is created - but I am also happy to filter the data on the way to the smartbrowser or in the smartbrowser.

The business problem you're trying to solve in fraught with performance issues if you implement it as written. I'd suggest
adding another character column to the table or db,
putting all the words from the other columns in it,
applying a word-index to the new column,
doing a search on that column, and then linking back to the source tables.
It'll be much faster and easier to use.

I used a very simple solution in the end. Users can enter a string they are looking for. If the string is in a cell in the browser then the cell is highlighted in yellow.
Before this the users had to scroll up and down trying to spot the cells of interest in hundreds of rows. We did not have the time or budget for anything fancier.
The important bit of code in the smartbrowser is like this...
on row-display of br_table in frame f-main
do:
if rowObject.field1 matches "*BOB*" then
rowObject.field1:BGCOLOR in browse br_table = 14.
if rowObject.field2 matches "*BOB*" then
rowObject.field2:BGCOLOR in browse br_table = 14.
if rowObject.field3 matches "*BOB*" then
rowObject.field3:BGCOLOR in browse br_table = 14.
... etc ...
it's not hard-coded to only look for Bob - but you should get the idea.

haystack/elasticsearch: trying to find "s04e07"

i have a kinda weird problem with haystack/elasticsearch trying to find tv episodes stored in my database based on a string like this: 's04e07' which means season 4 episode 7 and is a kind of standard format, but the search index has its problems with that.
Trying a few different things it looks like numbers are not indexed in EdgeNgramFields.
In a CharField i can only find exact word matches like '2013' if contained in the titel, but i have no luck finding 's04e07'.
How do i get my results out of the index?
How could i possibly change the hardcoded default mapping in haystack to index my stuff correctly?

I actually wrote about haystack a few days ago, I would suggest reading that one first:
Django Haystack Distinct Value for Field
It's not directly on point, but my advice here is the same. Stop using haystack.
Haystack comes with an outofthebox edgengram and ngram analyzers, which is cool, except these analyzers don't work in nearly all use cases.
They will especially not work in yours because you are mixing numbers and chars.
But my first question is, why can't you index the data like this:
"season":1
"episode":1
And then at search time break down the users search into the above format?
If that isn't possible, you can still PUT a mapping manually without letting haystack do it for you (which I recommend highly anyway because it's mappings are not correct). It's pretty easy to do with elasticutils.
Keep in mind, I don't think edgengram is exactly what you want here in any event. because it only grams from the edge and is most useful for autocompletes, for example if someone is typing s04e and you want to display a list of possible matches.
So, this depends on how users will query the data. Will it always be the above string whole, or parts of the string, or will they sometimes search for e07 and you want to show all seasons with episode 7's?
The last possibility here is to just index it as normal (haystack will choose snowball) and use prefix queries / regex queries to get what you want.

How to form an Endeca query where a field must start with certain letters

Is it possible to form an Endeca query to retrieve a field that must start with certain letters? Say like get all users who's first letter is A? I checked with Range filters but it is supporting only numerical fields as well as Wild card search. But nothing worked well so far.

Creating a dimension is one way of approaching the problem as Paul Lemke mentioned.Wildcard is not an option since the performance overhead as well as irrelevant records.
But we solved it using couple of other alternatives.
Create a new property for the Object called "StartWith", store the first letter of the Object and make it searchable. We found it easier than creating a Dimension.
There is a problem where letters like 'A' are usually stop words in Endeca. We can do you a couple of work around to solve this.
Get the ASCII value of the first letter and store the numerical value in to that property. One more advantage with this approach is that we can use Range Filters. But you can't search for 'AB' kind of requirements.
Pre-pend some characters like ^^^My name and search for ^^^M. The advantage with this approach is you can search conditions like letters starts with AB.

Endeca at it's current version (6.1) does not have a search filter that works like a "startswith" function in other programming languages.
I do have two options that might possibly get you close:
If you are truly only looking for the first letter you can setup a Dimension value for each letter of the Alphabet (A,B,C...). You can then refine on each letter and see only the values that start with letter A, B, C, etc. The only downside to this is you can only filter based on how many dimension values you setup. So if you added "A", you couldn't filter anything that started with "AB". You could go down the line and add "AB", "BA, "CA", and so on but that would get unwieldy very fast.
If you want something closer to a "startswith" function the only other option is to use a wildcard search. Basically you would do a property search like this: N=0&Ntk=Username&Ntt=ab*
The trick with wildcard searching is it will do that across multiple words in that property. So assuming you had a data set of these values:
Smithers Smith
Larry Smith
Jenna-Smith
Doing a search of sm* would actually return all 3 results because "sm" was in their last name. Even the one with the dash would return as Endeca think's that is a seperate word. (It might be possible to turn that off though, not sure).
So basically it comes down to this: Stick a one word in a property, set that property to allow wildcard search, then do a "blah*" against that property and you should have the results you're looking for.

Have you tried the First relevance rank module which is supposed to rank based on proximity to the beginning of the field?
It sounds similar to what you are looking for and together with a wild card may produce your intended results.

Is it possible to perform a Sphinx search on one string attribute?

sql_query=SELECT id,headline,summary,body,tags,issues,published_at
FROM sphinx_search
I am working on the search feature of my Web site and I am using Sphinx, Perl and Sphinx::Search. As long as I want to search in all the attributes and I don't restrict it to just one, everything goes well. However when the user searches for a specific tag, I can't just give the result of a fuzzy search, I want to use the power of Sphinx to search only on tags or issues, maybe sometimes the user wants to search on headline and issues.
How can I perform such a task?

You need to put it in Extended Match Mode
https://metacpan.org/module/JJSCHUTZ/Sphinx-Search-0.27.2/lib/Sphinx/Search.pm#SetMatchMode
Then you can use Extended Query syntax
http://sphinxsearch.com/docs/current.html#extended-syntax
Which includes the field search operator
#tags keyword1
(Be careful with sphinx, the word "attribute" has a specific meaning - values attached to the document, useful for sorting/grouping/filtering and returning with the resultset. Whereas I think you are talking about fields. All the columns from the sql_query you dont mark as an attribute, are a field - and full text searchable)

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

How to search for approximate word in database - postgresql

Try adding extension - CREATE EXTENSION IF NOT EXISTS fuzzystrmatch; Then using some variation of levenshtein to see how closely the word matches eg SELECT levenshtein('GUMBO', 'GAMBOL', 2,1,1); --- gives result = 3 I'm sure a bit of research into this area will meet your requirement

Related

Searching for a user named "Don" with PostgreSQL full text search

Openedge SDO -> smart data browser - I want to filter the query results

haystack/elasticsearch: trying to find "s04e07"

How to form an Endeca query where a field must start with certain letters

Is it possible to perform a Sphinx search on one string attribute?

Categories

Resources