How to get wordforms in sphinx? - sphinx

How i can get all morphology forms of the word?
For example, searching keyword is:
runner
Result should be:
run,running ... etc

You usually don't need one. Use a stemmer, which can do the reverse. ie it removes the "ending", so that matching works, rather than trying to figure out all the possible endings.
https://en.wikipedia.org/wiki/Stemming
ie use morphology, rather than worforms.
http://sphinxsearch.com/docs/current.html#conf-morphology

Related

Only autocomplete on an exact match in Sublime Text 2

I'm making a custom .tmLanguage file to highlight the syntax I'm using correctly and generally make coding with it easier. I'm almost done, and I got the autocompletion working using a .sublime-completions file.
There's just one minor flaw I'd like to change. I have a pretty long list of functions, and almost all of them contain an abbreviation of the word 'parameter', PAR. When I start typing that word, the following are all in the list of completions:
PAR command
DEFPAR command
JDATA command (because the description contains PAR)
SPAA command (because there's a P in the command and an A and an R in the description)
What I want is only for the commands that begin with PAR to show up, so from the list above, only the first item.
So, like this:
In other words, I want the completions to show up based on the literal string I'm typing, and only from the trigger part of my completions file, before the \t only.
That completions file looks like this:
Highlighted in orange is what I want my completions list to be based on.
I hope this is understandable. Any help is greatly appreciated.
This is not possible. By design Sublime's autocomplete feature uses fuzzy matching, so if there are a number of options that all contain the same pattern, but you don't quite remember which one you want, you can type the pattern and have all of the options available. The more you type, the smaller the list of possible options becomes. This is a good thingĀ®, otherwise you'd have to remember the exact command you're looking for, which kind of defeats the purpose of autocomplete and code hinting.

CLUTO doc2mat specified stop word list not working

I am trying to convert my documents into vector-space format using doc2mat
On the website, it says I can use my specified text file where words are white-space separated or on multiple lines. So, I use some code similar to this one:
./doc2mat -mystoplist=stopword.txt -skipnumeric mydocuments.txt myvectorspace.txt
However, when I check the output .clabel file, it still has stop words that's in stopword.txt.
I really do not know how to do this. Someone help me out please? Thank you!
There's one important thing I should remember: I should include ALL the unwanted words in my stop list. This is somewhat difficult since there's always some variations available...
For example, if I want to exclude method I add it to my list. However, the resulting vocabulary may also contain method since there are words like methodist, methods, etc. Then doc2mat by default stems these words and I will still get method in the output.
Another thing is to make sure that "-nostop" option must be provided for user-specified stop list.

Sphinx with metaphone and wildcard search

we are an anatomy platform and use sphinx for our search. We want to make our search more fuzzier and started to use metaphone to correct spelling mistakes. It finds for example phalanges even though the search word is falanges.
That's good but we want more. We want that the user could type in falange or even falang and we still find phalanges. Any ideas how to accomplish this?
If you are interested you can checkout our sphinx config file here.
Thanks!
Well you can enable both metaphone and min_prefix_len on an index at once. It will sort of work.
falange*
might then just work. (to match phalanges)
The problem is the 'stripped' letters may change the 'sound' of the word (because change the pronunciation)
eg falange becomes FLNJ, but falang acully becomes FLNK - so they no longer 'substrings' of one another. (ie phalanges becomes FLNJS, which FLNK* wont match)
... to be honest I dont know a good solution. You could perhaps get better results, if was to apply stemming, BEFORE metaphone. (so the endings that change the pronouncation of the words are removed.
Alas Sphinx can't do this. If you enable stemming and metaphone together, only ONE of the processors will ever fire.
Two possible solutions, implement stemming outside of sphinx (or maybe with regexp_filter. Not sure if say a porter stemmer can be implemnented purely with regular expressions)
or modify sphinx, so that ALL morphology processors apply. (rather than just the first one that changes the word)

Sphinx search configuration for words ending with apostrophes

I am trying to improve my Sphinx configuration and I have a trouble with words ending with apostrophes.
For example, for Surfin' USA result, searching with "Surfin USA" returns match but "Surfing USA" doesn't return anything. How can I set Sphinx to return result for such situation?
Hmm, thats an interesting one. Not sure sphinx can automatically deal with this, because it has no way of knowing what the Apostrophe is meant to represent. I suppose there are cases where it could be multiple things.
The only way I can think would be to list them in exceptions, you can build a list of all words want to support
Surfin' > Surfing
Have to use exceptions to be able to use the apostrophe
You might want to add
Surfin > Surfing
too, so can search without the apostrophe too.

How to find literals in source code of Smartforms and in SAPScripts (or reports, if the others can't be done)

I'd like to check hardcoded values in (a lot of) Smartforms and SAPScript forms.
I have found a way to read the source code of both of these, but it seems that i will have to go through a lot of parsing before I get anything reliable.
I've come across function module GET_LITERAL but that doesn't seem to help me much since i have to specify the offset of the value, if i got right what the function is doing in the first place.
I also found RS_LITERAL_LIST but that also doesn't do what i expect.
I also tried searching for reports and methods, but haven't found anything that seemed to help.
A backup plan would be to get some good parsing tool, so do you know of anything like that.
Anyway, any hints would be helpful and appreciated.
[EDIT]
Forgot to mention, the version of my system is 4.6C
If you have a fairly recent version of ABAP, you can use a regex.
Follow the pattern of this example, but use your source as the text and create your own regex. Have it look for any single quotes on the end of a word separated by spaces or any integers with spaces on either side. That's just a start, you might need to work on a better pattern.
String functions count, find, and match