I am trying to find the appropriate syntax within full text search to search for fragments. I know that something like this:
document ## to_tsquery('2161:*) will return anything that started with 2161, but if my token is ABC021613 it does not return that item. What is the syntax to wild card both before and after the 2161?
This is because https://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES supports prefix matching, but not "suffix"...
I believe it is answered well here: https://stackoverflow.com/a/13072165/5315974
Related
I'm using postgresql full text tsvector column.
But I found a problem:
When I search for "calça"
The results contains the following results:
1- calça red
2- calça blue
3- calçado red
Why "calçado" is being returned when I search for "calça" ?
Is there any configuration so I can solve this?
Thanks.
It isn't just a matter that one string contains the other. The Portuguese stemmer thinks this is the way they should be stemmed. If you turn the longer word into 'calçadot', for example, it no longer stems it, because (presumably) 'adot' is not recognized as a Portuguese suffix which ought to be removed the way 'ado' is.
If you don't want stemming at all, then you could change the config to 'simple', which doesn't stem. But at that point, maybe you don't want full text search at all, and could just use LIKE instead with a pg_trgm index.
If it is just this particular word that you don't want stemmed, I think you can set up a synonym dictionary which will map calçado to itself, which will bypass stemming.
I'm trying to search for the following text in a field called th.descripition. The text is
Previous ticket, Priority: STANDARD --> RELOCATE
Searching using '%RELOCATE%' at the beginning, and found everything I needed except the text above. I then started researching and the best that I could find suggested something using square brackets like th.description like '[a-zA-Z0-9]RELOCATE%'. That doesn't work either. What am I missing or should be doing?
I was experimenting with PostgreSQL's text search feature - particularly with the normalization function to_tsquery.
I was using english dictionary(config) and for some reason s and t won't normalize. I understand why i and a would not, but s and t? Interesting.
Are they matched to single space and tab?
Here is the query:
select
to_tsquery('english', 'a:*') as for_a,
to_tsquery('english', 's:*') as for_s,
to_tsquery('english', 't:*') as for_t,
to_tsquery('english', 'u:*') as for_u
fiddle just in case.
You would see 'u:*' is returning as 'u:*' and 'a:*' is not returning anything.
The letters s and t are considered stop words in the english text search dictionary, therefore they get discarded. You can read the stop word list under tsearch_data/english.stop in the postgres shared folder, which you can locate by typing pg_config --sharedir
With pg 11 on ubuntu/debian/mint, that would be
cat /usr/share/postgresql/11/tsearch_data/english.stop
Quoting from the docs,
Stop words are words that are very common, appear in almost every document, and have no discrimination value. Therefore, they can be ignored in the context of full text searching.
It is best to discard english grammar and think of words in a programmatic and logical way as described above. Full text search does not try to infer context based on sentence structuring so it has no use for these words. After all, it's called full text search and not natural language search.
As to how they arrived on the conclusion to add s and t to the stop word list, statistical analysis must have revealed these characters to be noise.
For example, searching for the word cat, I would want to match words with #*cat*. I already have the hash tag indexing setup, and I have sphinx setup for star searches.
Its not well documented and I've never tried it myself, but a post here:
http://sphinxsearch.com/forum/view.html?id=9847
from a Sphinx developer, suggests that using a plain index, and dict=keywords, would allow wildcard, so could search
#%cat*
For some reason % seems to be used as the wildcard in the middle of a string.
I have a tableview (linked to a database) and a search bar. When I type something in the search bar, I do a quick search in the database and display the results as I type.
The query looks like this:
SELECT * FROM MyTable WHERE name LIKE '%NAME%'
Everything works fine as long as I use only ASCII characters. What I want is to type ASCII characters and to match their equivalent with diacritics. For instance, if I type "Alizee" I would expect it to match "Alizée".
Is there a way to do make the query locale-insensitive? I've red about the COLLATE option in SQL, but there seems to be of no use with SQLite.I've also red that iPhone SDK 3.0 has "Localized collation" but I was unable to find any documentation about what this means...
Thank you.
There are a few options for solving this:
Replacing all accented chars in the
query before executing it, e.g.
"Psychédélices" => "Psychedelices"
"À contre-courant" => "A contre-courant"
"Tempête" => "Tempete"
etc.
but this only works for the input so
you must not have accented chars in
the database itself. Simple solution but
far from perfect.
Using a 3rd party library, namely ICU (links below). Not sure if it's the best choice for iPhone though.
Writing one or more custom C functions that will do the comparison. More in the links below.
A few posts here on StackOverflow that discuss the various options:
How to sort text in sqlite3 with specified locale?
Case-insensitive UTF-8 string collation for SQLite (C/C++)
How to implement the accent/diacritic insensitive search in Sqlite?
Also a couple of external links:
SQLite and native UNICODE LIKE support in C/C++
sqlite case and accent insensitive searches
I'm not sure about SQL, but I think you can definitely use the NSDiacriticInsensitivePredicateOption to compare in-memory NSStrings.
An example would be an NSArray full of the strings you're searching over. You could just iterate over the array comparing strings using the NSDiacriticInsensitivePredicateOption as your comparison option and displaying the successful matches.