I remember that in the old days one could use "?" as a wildcard for exactly one character. For example, "H?" would return "Hi" or "He". Is there an equivalent for this in Sphinx?
Thanks.
Actually, it is a ? in Sphinx too!
The main 'gotcha' is that you need to use dict=keywords, and it needs to be a recent version of Sphinx. You also need to enable 'substring matching' using min_infix_len/min_prefix_len.
See
http://sphinxsearch.com/docs/current.html#conf-dict
That section also mentions the wildcard syntax.
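To make the single-character semantics concrete, here is a small Python sketch. It does not talk to Sphinx at all; it only uses the standard library's fnmatch module, whose "?" also matches exactly one character, on a made-up word list.

```python
from fnmatch import fnmatchcase

# Hypothetical word list, only for illustration.
words = ["Hi", "He", "Hello", "H", "Ho"]

# "?" stands for exactly one character, so "H?" matches two-letter
# words starting with "H" and nothing longer or shorter.
matches = [w for w in words if fnmatchcase(w, "H?")]
print(matches)
```

"Hello" and "H" are left out because "?" must consume exactly one character.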
I'm using a PostgreSQL full text tsvector column.
But I found a problem:
When I search for "calça", the results contain the following:
1- calça red
2- calça blue
3- calçado red
Why is "calçado" being returned when I search for "calça"?
Is there any configuration that would solve this?
Thanks.
It isn't just a matter that one string contains the other. The Portuguese stemmer thinks this is the way they should be stemmed. If you turn the longer word into 'calçadot', for example, it no longer stems it, because (presumably) 'adot' is not recognized as a Portuguese suffix which ought to be removed the way 'ado' is.
If you don't want stemming at all, then you could change the config to 'simple', which doesn't stem. But at that point, maybe you don't want full text search at all, and could just use LIKE instead with a pg_trgm index.
If it is just this particular word that you don't want stemmed, I think you can set up a synonym dictionary which will map calçado to itself, which will bypass stemming.
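As a rough illustration of why the two words collide, here is a toy suffix stripper in Python. It is not the real Portuguese stemmer that PostgreSQL uses; the suffix list and the minimum-length rule are assumptions made up for this sketch, chosen only to mimic the behaviour described above.

```python
# Toy suffix stripper. NOT the real Snowball Portuguese stemmer;
# the suffix list below is a hypothetical stand-in.
TOY_SUFFIXES = ("ado", "a")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Both words reduce to the same toy stem, so a search for one finds the other.
print(toy_stem("calça"), toy_stem("calçado"))

# 'adot' is not a known suffix, so the word is left alone.
print(toy_stem("calçadot"))
```

With the 'simple' configuration no such reduction happens, which is why the two words would stay distinct there.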
I'm having an issue with specific entries in my wordforms file that are not being
interpreted as expected.
Here are a couple of examples:
1/48 > forty-eighth
1/96 > ninety-sixth
As you can see, these entries contain both slashes and hyphens, which may be related to
my issue.
For some reason, Sphinx doesn't correctly equate each fraction to the spelled out
version. Search results for "1/48" are not the same as for "forty-eighth", as they should
be. In other words, the mapping between these equivalent forms is not working.
In my Sphinx config, I have the forward slash (/) set as a blend character, so I assume
that the fraction is being recognized properly.
In support of that belief, the following wordforms entry does work correctly:
1/4 > fourth
Does anyone have any idea why my multi-term synonyms would not be working as expected?
I have tried replacing the hyphen with a space, but this doesn't change the result at
all. Would it help to change the order of the terms (i.e., on which side of the ">" they
should be placed)?
Thank you very much for any help.
When using special characters in Sphinx, it is always good to keep the following in mind:
By default, the Sphinx tokenizer handles unknown characters as whitespace
https://sphinxsearch.com/blog/2014/11/26/sphinx-text-processing-pipeline/
That has given me weird results too when using wordforms.
I would suggest you add the hyphen to charset_table so ninety-sixth becomes one word. ignore_chars is also an option, but then you will be looking for ninetysixth instead.
Much depends on the rest of your dataset and use cases, of course.
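To show the difference between the two options, here is a small Python sketch of a tokenizer that treats unknown characters as whitespace. It is only a model of the behaviour described above, not Sphinx's actual tokenizer; the function name and its extra_chars/ignore_chars parameters are made up to stand in for adding a character to charset_table or to ignore_chars.

```python
import string

# Characters treated as part of a word by default in this sketch.
BASE_CHARS = set(string.ascii_lowercase + string.digits)


def tokenize(text, extra_chars="", ignore_chars=""):
    """Split text into tokens; any unknown character acts as a separator."""
    allowed = BASE_CHARS | set(extra_chars)
    tokens, current = [], []
    for ch in text.lower():
        if ch in ignore_chars:
            continue  # dropped entirely, neighbours are glued together
        if ch in allowed:
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("ninety-sixth"))                    # hyphen unknown: two tokens
print(tokenize("ninety-sixth", extra_chars="-"))   # hyphen is a word character
print(tokenize("ninety-sixth", ignore_chars="-"))  # hyphen ignored
```

In the first case the wordform target is split into two keywords, which would explain why the mapping does not behave like a single-word target such as "fourth".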
registerModule() expects a submodule key as a third parameter.
I think it should probably not contain a space and only alphabetic characters (or alphanumeric?) and underscore ('_'), but I'm not really sure.
I could not find specific information for this.
The function makes use of \TYPO3\CMS\Core\Utility\GeneralUtility::underscoredToUpperCamelCase to generate the full module name, which combines the main module and the submodule, joined by an underscore ('_').
So you already guessed the right answer.
It's a bit complicated to answer!
The official API documentation does not provide exact information. I have worked with some extensions that have multiple submodules, and I'm quite sure that special characters are not allowed in the submodule key.
e.g. web_TestTestbe123 (mainModulename_subModuleKey)
I have noticed the following characteristics for the key:
Key must be lowercase
No space allowed
Numeric values are fine
Does this make sense?
I found this in the documentation just now:
Backend modules
1. The modkey is made up of alphanumeric characters only. It does not contain underscores and starts with a letter.
https://docs.typo3.org/m/typo3/reference-coreapi/master/en-us/ExtensionArchitecture/NamingConventions/Index.html
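For a quick check of candidate keys against that rule, here is a minimal Python sketch. The regular expression is just my reading of the quoted sentence (alphanumeric only, no underscores, starts with a letter); the function name is made up and nothing here comes from TYPO3 itself.

```python
import re

# Starts with a letter, followed by letters or digits only.
MODKEY_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_modkey(key):
    """Return True if key follows the naming rule quoted above."""
    return MODKEY_PATTERN.fullmatch(key) is not None


for candidate in ["testbe123", "test_be", "1test", "test be"]:
    print(candidate, is_valid_modkey(candidate))
```

If you also want to enforce the lowercase observation from the earlier answer, the character classes could be narrowed to a-z and 0-9.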
Hi, I'm building a RESTful app and can't find a recommended pattern for optional fuzzy or LIKE queries. For example, a strict query might be:
/place?city=New+York&state=NY
This corresponds to the SQL "... WHERE city='New York' AND state='NY'"
But what if I wanted to search the city field for any row with "York" in the city name?
"... WHERE city LIKE '%{parameter}%' AND state='{parameter2}'"
I'm thinking about just adding some kind of url-valid character to the request like this:
/place?city=*York*&state=NY
Is there an established or recommended pattern I should use? Thanks!
It's fine to use the query string for searching, but it's a little bit weird to put wildcard characters like "*" or "?" in it (unless you decide to build a really powerful search engine like Google). More importantly, search is usually assumed to be fuzzy by default, so it's redundant to append or prepend "*" to the keyword. If you do need an exact search, you could surround the exact (or strict) keyword with double quotes. Namely, instead of using /place?city=*York*&state=NY, I recommend /place?city=York&state="NY".
In fact, Google uses quotes to search for an exact word or set of words, and I also found that this site follows the same pattern.
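Here is a minimal Python sketch of how the server side might interpret that convention. The function name, the allowed_fields whitelist and the "?" placeholder style are assumptions for illustration; adapt them to your framework and database driver.

```python
from urllib.parse import parse_qs


def build_where(query_string, allowed_fields):
    """Turn a query string into a parameterized WHERE clause.

    Quoted values ("NY") become exact matches; bare values become LIKE
    matches. Field names come only from the allowed_fields whitelist.
    """
    params = parse_qs(query_string)
    clauses, values = [], []
    for field in allowed_fields:
        if field not in params:
            continue
        raw = params[field][0]
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            clauses.append(f"{field} = ?")
            values.append(raw[1:-1])
        else:
            # Escape LIKE wildcards so user input is matched literally.
            escaped = (
                raw.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
            )
            clauses.append(f"{field} LIKE ? ESCAPE '\\'")
            values.append(f"%{escaped}%")
    return " AND ".join(clauses), values


where, values = build_where('city=York&state="NY"', ["city", "state"])
print(where)
print(values)
```

Keeping the values separate from the SQL text means the user input is always passed as bound parameters rather than concatenated into the query.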
For example, when searching for the word cat, I would want to match words with #*cat*. I already have the hash tag indexing set up, and I have Sphinx set up for star searches.
It's not well documented and I've never tried it myself, but a post here:
http://sphinxsearch.com/forum/view.html?id=9847
from a Sphinx developer suggests that using a plain index with dict=keywords would allow wildcards, so you could search for
#%cat*
For some reason % seems to be used as the wildcard in the middle of a string.