I am trying to improve my Sphinx configuration and I have a trouble with words ending with apostrophes.
For example, for Surfin' USA result, searching with "Surfin USA" returns match but "Surfing USA" doesn't return anything. How can I set Sphinx to return result for such situation?
Hmm, thats an interesting one. Not sure sphinx can automatically deal with this, because it has no way of knowing what the Apostrophe is meant to represent. I suppose there are cases where it could be multiple things.
The only way I can think would be to list them in exceptions, you can build a list of all words want to support
Surfin' > Surfing
Have to use exceptions to be able to use the apostrophe
You might want to add
Surfin > Surfing
too, so can search without the apostrophe too.
Related
I am working on a large project in Xcode. I'm wanting to search, using the Find Navigator (See Below), for all arrays regardless of their name. I only care about any array that has this format, someArray[index].
Some Examples That Should Match
people[12]
section[0].rows[0]
Should Not Match
people[index]
section[section].row[row]
The regex should only return arrays, it should not return any dictionaries or other types that are not a subscripted array.
Why am I doing this? Well, it appears there have been some issues within our app where devs have not properly handled index out of bounds errors or nil values. There are far too many arrays for me to manually go through line by line to find them, so this is the best option I've come up with and it may not even be possible. If anyone has other recommendations, please feel free to share.
You can create a regex to match any word followed by another word with optional period enclosed by brackets. Something like:
\w+\[\w+(\.\w+)?\]
For more info about the regex above you can check this link
For numbers only use \d+ instead:
\w+\[\d+\]
For more info about the regex above you can check this link
If I turn on stemming/lemmatizer in sphinx can I push a term to it "as needed" that does not utilize stemming? I know I can use wordforms to always ignore that word from stemming e.g. Radiology > Radiology but that results in never stemming the word. I'm looking for a way to not add as a wordform exception but be able to in a query in essence say 'look exactly for "Radiology" and do not stem/lemmatize". I have tried "Radiology" instead of Radiology to no avail.
http://sphinxsearch.com/docs/current.html#conf-index-exact-words
:)
Then can do
=Radiology
(in extended match mode)
I am trying to convert my documents into vector-space format using doc2mat
On the website, it says I can use my specified text file where words are white-space separated or on multiple lines. So, I use some code similar to this one:
./doc2mat -mystoplist=stopword.txt -skipnumeric mydocuments.txt myvectorspace.txt
However, when I check the output .clabel file, it still has stop words that's in stopword.txt.
I really do not know how to do this. Someone help me out please? Thank you!
There's one important thing I should remember: I should include ALL the unwanted words in my stop list. This is somewhat difficult since there's always some variations available...
For example, if I want to exclude method I add it to my list. However, the resulting vocabulary may also contain method since there are words like methodist, methods, etc. Then doc2mat by default stems these words and I will still get method in the output.
Another thing is to make sure that "-nostop" option must be provided for user-specified stop list.
we are an anatomy platform and use sphinx for our search. We want to make our search more fuzzier and started to use metaphone to correct spelling mistakes. It finds for example phalanges even though the search word is falanges.
That's good but we want more. We want that the user could type in falange or even falang and we still find phalanges. Any ideas how to accomplish this?
If you are interested you can checkout our sphinx config file here.
Thanks!
Well you can enable both metaphone and min_prefix_len on an index at once. It will sort of work.
falange*
might then just work. (to match phalanges)
The problem is the 'stripped' letters may change the 'sound' of the word (because change the pronunciation)
eg falange becomes FLNJ, but falang acully becomes FLNK - so they no longer 'substrings' of one another. (ie phalanges becomes FLNJS, which FLNK* wont match)
... to be honest I dont know a good solution. You could perhaps get better results, if was to apply stemming, BEFORE metaphone. (so the endings that change the pronouncation of the words are removed.
Alas Sphinx can't do this. If you enable stemming and metaphone together, only ONE of the processors will ever fire.
Two possible solutions, implement stemming outside of sphinx (or maybe with regexp_filter. Not sure if say a porter stemmer can be implemnented purely with regular expressions)
or modify sphinx, so that ALL morphology processors apply. (rather than just the first one that changes the word)
I'd like to check hardcoded values in (a lot of) Smartforms and SAPScript forms.
I have found a way to read the source code of both of these, but it seems that i will have to go through a lot of parsing before I get anything reliable.
I've come across function module GET_LITERAL but that doesn't seem to help me much since i have to specify the offset of the value, if i got right what the function is doing in the first place.
I also found RS_LITERAL_LIST but that also doesn't do what i expect.
I also tried searching for reports and methods, but haven't found anything that seemed to help.
A backup plan would be to get some good parsing tool, so do you know of anything like that.
Anyway, any hints would be helpful and appreciated.
[EDIT]
Forgot to mention, the version of my system is 4.6C
If you have a fairly recent version of ABAP, you can use a regex.
Follow the pattern of this example, but use your source as the text and create your own regex. Have it look for any single quotes on the end of a word separated by spaces or any integers with spaces on either side. That's just a start, you might need to work on a better pattern.
String functions count, find, and match