CLUTO doc2mat specified stop word list not working - perl

I am trying to convert my documents into vector-space format using doc2mat
On the website, it says I can use my specified text file where words are white-space separated or on multiple lines. So, I use some code similar to this one:
./doc2mat -mystoplist=stopword.txt -skipnumeric mydocuments.txt myvectorspace.txt
However, when I check the output .clabel file, it still has stop words that's in stopword.txt.
I really do not know how to do this. Someone help me out please? Thank you!

There's one important thing I should remember: I should include ALL the unwanted words in my stop list. This is somewhat difficult since there's always some variations available...
For example, if I want to exclude method I add it to my list. However, the resulting vocabulary may also contain method since there are words like methodist, methods, etc. Then doc2mat by default stems these words and I will still get method in the output.
Another thing is to make sure that "-nostop" option must be provided for user-specified stop list.

Related

Should I use FIND function or use For Loop to search matching record

I am writing a function to search a record from sheet2 and return value to sheet1. I use the Find function and it works for most of the case. However, I encounter below issus.
The FIND function only support one matching key only. If I have more than one value to match, it didn't work.
Sometime, the search value may have space like "Key1 ". Then, if I search the word by "Key1", no record return. I can't use the wildcard search as I need exact match except the space.
To solve this, should I simply use a For loop to search the result instead of FIND? I have to loop through every row. I am not sure the performance is okay or not.
No related to FIND. As I need to copy the value as well as the backgroup color. I use the return cells.interior.colorindex to set the field. But the color is a little bit different. For example, the original is light green but it turn out to become darken color. I check both field has same colorindex. Any idea?
goodfit
Thanks,
see anyone has idea how to resolve it.

Lisp - Extracting info from a list of comma separated values

I've tried searching this but have yet to find something that suits anything close to my needs. I'm trying to create a Autocad LISP that takes a text file, which is a list of comma-separated values, and place a block at coordinates defined by the list. BUT, only for items on the list where the last entry starts with "HP"
So that's sounds a bit complex, but the text file is basically a UTM survey output, and looks like this:
1000,Easting,Northing,Elevation,Identifier
1001,Easting,Northing,Elevation,Identifier
Etc.
The identifier is a variety of values, but I want to extract the Northing,Easting,Elevation, and insert a block (this last part I've got) at that location when the identifier begins with "HP". The list can be long and the number of HPs can be 1 or 5000. I'm assuming there's a "for x=1:end, do" type of loop than can be made that reuses the same variables over and over.
I'm a newbie to LISP so I'm stuck in that spot between "here are I've-never-programmed-before tutorials to make hello world" and "here is a library of the 3000 different commands in alphabetical order"
I believe the functions you are needing to solve this question are open, read-line or read-char, close,strlen, and substr. The first four functions relate to AutoLisp writing and reading a file. The last two functions manipulate the string variables that were pulled from the file. With them, you can find the "HP" within the text. To loop through the same code, three come to my mind: repeat, while, and foreach.
For a list of variables to quickly reference with their descriptions, here's a good starting point. This particular page has the information broken up by category instead of alphabetical order.
https://help.solidworks.com/2022/English/api/draftsightlispreference/html/lisp_functions_overview.htm
Here are a few tutorials where AutoLisp code is used to write and read other files:
https://www.afralisp.net/autolisp/tutorials/file-handling.php
https://www.afralisp.net/autolisp/tutorials/external-data.php
Lastly, here's an example of AutoLisp writing and reading attributes from and to blocks.
https://github.com/GitHubUser5376/AttributeImportExport
You can use Lee-Mac's Reacd-CSV function to get a list of the csv values.
And for the "HP" detection yes you might have to go through(using loop options mentioned above like while, repeat,foreach) each and use
(substr Identifier 1 2)
to validate

Collision with two lines of code makes code does not work the way it is meant by me(?)

Try running the following code yourself, and you would notice that "/hello" changes to "/HELLO", but I want it to change it to "hi". On the other hand, I want to keep the 1.st line of code, which changes "hello" to "HELLO". How could I achieve this(?)
This code problem is very related to my last problem:
Collision with two lines of code make code does not work the way it is meant by me, what could I do different to get this work(?)
The soltuion for my last problem was good for that problem, but it is not working for the new above mentioned problem.
::hello::HELLO
::/hello::hi
That is interesting. I really expected it to work by removing / from the EndChars. But after looking at it for a while, it becomes obvious why it's behaving this way. When you type "/hello" it actually matches to both hotstrings, so AHK chooses the first one defined. Anyway, there are two solutions that I know of:
Reorder your hotstrings. Place ::/hello::hi above the other one and you'll always get the desired result. Additionally, you don't need to change the EndChars since / is the first character.
Use the asterisk option on the second hotstring. This will make it update immediately, which may or may not be desirable (I prefer it).

How to find literals in source code of Smartforms and in SAPScripts (or reports, if the others can't be done)

I'd like to check hardcoded values in (a lot of) Smartforms and SAPScript forms.
I have found a way to read the source code of both of these, but it seems that i will have to go through a lot of parsing before I get anything reliable.
I've come across function module GET_LITERAL but that doesn't seem to help me much since i have to specify the offset of the value, if i got right what the function is doing in the first place.
I also found RS_LITERAL_LIST but that also doesn't do what i expect.
I also tried searching for reports and methods, but haven't found anything that seemed to help.
A backup plan would be to get some good parsing tool, so do you know of anything like that.
Anyway, any hints would be helpful and appreciated.
[EDIT]
Forgot to mention, the version of my system is 4.6C
If you have a fairly recent version of ABAP, you can use a regex.
Follow the pattern of this example, but use your source as the text and create your own regex. Have it look for any single quotes on the end of a word separated by spaces or any integers with spaces on either side. That's just a start, you might need to work on a better pattern.
String functions count, find, and match

Test Suite for Double Metaphone?

I've translated Double-Metaphone into ActionScript3 and I want to test it (obviously) before I release the source to ... um ... the open.
I'm looking for a long list of names with the primary and secondary codes. Google does not find anything except one list with pairs of names (presumably they should match).
Thanks
You could find someone else's double metaphone implementation, run it on the same long list of words, and compare the results to your own.
For long lists of words, I like infochimps. They have lots of word lists, like this one of 350,000 english words or this one of place names, and many more.
Here are implementations you can compare your results against. Here is an online example, except that it tests only one word at a time - I guess you'll have to download and run one of the scripts to test a large list of words.
For each word, two codes will be returned; you'll probably want to test that both codes returned match the ones returned of another implementation. You probably know that the reference implementation is here with full source code here, but including the links anyway for others' benefit.