Java Text Preprocessing and Clean-Up [closed] - text-processing

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Could you please recommend Java libraries for text preprocessing and clean-up? The library should perform the following tasks:
convert all verbs to infinitive
convert all nouns to singular form
remove useless (for the sense of a text) words

Converting words to canonical forms (verbs to infinitives and nouns to singular, for example) is called lemmatization. One Java-based lemmatizer is Stanford CoreNLP.
For "useless words" you probably want "stop words". There's no standard list, but there are many floating around the Internet that work in more or less the same way, differing mainly in how many words they include (typically between 100 and 1000). I've known people to use this list before. When removing stop words, remember to ignore case when looking for matches.
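The case-insensitive matching can be sketched in plain Java roughly as follows. This is only a minimal illustration: the class name, the method name, and the tiny stop list are made up for the example and do not come from any library, and the tokenization is a naive whitespace split rather than a real tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordFilter {
    // Hypothetical, deliberately tiny stop list; real lists hold 100-1000 words.
    // Entries are stored in lower case so lookups can ignore case.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "the", "of", "and", "is", "to"));

    // Returns the tokens of the text that are not stop words, in original order.
    public static List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace produces an empty first token
            }
            // Lower-case the token before the lookup so "The" and "THE" match "the".
            if (!STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [cat, on, mat]
        System.out.println(removeStopWords("The cat is on THE mat"));
    }
}
```

Keeping the list in lower case and normalizing each token once keeps the check to a single set lookup per word. Punctuation handling and lemmatization are left out here on purpose.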

Not sure if this does everything you need, but check out mrsqg.
http://code.google.com/p/mrsqg/

Related

How to style pills in a Word document? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 2 years ago.
In an MS Word document, how can I style text into pills, for example like the Bootstrap pills or in the image below?
Doesn't have to be exactly like this, just something similar.
I can highlight a word but it is very limited.
Apologies if this is off topic. I could find a better location.
Additionally, I would rather keep the elements within the flow of the page, so that it can be scraped correctly by CV scanners.
That is, I don't want to insert a load of floating text boxes.
Use a combination of the range's Borders and Font properties:
Option Explicit

Public Sub MakePill(ByVal ipRange As Word.Range)
    ' Ensure a space before and after the text in the range
    ipRange.InsertBefore Text:=" "
    ipRange.InsertAfter Text:=" "
    ' Draw a border around the range and fill its background
    ipRange.Borders.Enable = True
    ipRange.Font.Shading.BackgroundPatternColor = wdColorAqua
End Sub

What does 'let' in swift mean? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
I'm learning Swift, but I'm not a native English speaker and just want to ask: what does 'let' mean? I know it declares a constant, but then why isn't it 'cons'?
Is 'let' an abbreviation of some word?
I won't die without knowing it, I'm just curious ;)
Thanks.
There are other languages where let is used as a keyword before a variable declaration, such as BASIC and LISP (or Scheme), and I presume it was taken from there. It's not an abbreviation; it's the normal English word "let", used to introduce a command, as in "Let there be light;" in mathematics it is common to announce a symbol this way, as in "Let x be the unknown number of years we are trying to calculate."
To answer your question a little more fully, though: in my view, there is nothing about this word that makes it particularly suitable for constants. They seem to have made an arbitrary choice. var makes sense for a "variable" that can vary (get it?), so they just needed another word, and they picked let. Personally, I think const would have been better.

Fast Perl Parser Modules? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
At my work I wrote a little parser for C-like expressions in one of our in-house Perl tools. I initially chose Parse::RecDescent because of its extreme ease of use and straightforward grammar syntax, but I'm finding that it's excessively slow (which is corroborated by general opinion found on the web). It's safe to assume that the grammar of the expressions is no more complicated than that of C.
What are the fastest lexer/parser modules (that still have a straightforward, uncumbersome grammar format) for the use case of thousands of simple expressions (I'd guesstimate the median length is 1 token, the mean is 2 or so, and the max is 30)? Additionally, thanks to unsavory IT choices, it must work in Perl 5.8.8, and it and any non-core dependencies must be pure Perl.
Parse::Eyapp looks like it satisfies the 5.8.8, pure-Perl, and dependency requirements. As for speed, it generates LALR parsers, which should be faster than recursive descent. A grammar for expressions is given in the docs. Hope it helps.

Correct way of deleting items in a numbered list? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 8 years ago.
I have a numbered list in org-mode like
1. A
2. B
3. C
4. D
Now when I kill the second line the list incorrectly gets ordered as,
1. A
3. C
4. D
instead of
1. A
2. C
3. D
I know I can always re-order the list before deleting something, but for long lists this becomes a hassle.
Is there a smarter way to avoid this?
You can kill such lines without worry. Afterwards, press C-c C-c with point on the list to renumber it, or use S-right followed by S-left to cycle the bullet style and come back to the previous one (with up-to-date numbers).

Fastest Perl Template Library [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
What's the fastest Perl template library that allows me to do the following:
variable substitution,
loops (Hashes & Arrays),
layout (wrapper templates)
and at least some conditional logic (< > != == %).
Also, has anybody used plTenjin? The benchmarks suggest it is pretty fast.
I recommend the Xslate template engine (http://xslate.org/). According to its own benchmarks, it's about 50-100 times faster than others. See these comparative benchmarks: http://xslate.org/benchmark.html
The engine supports template tags compatible with Template Toolkit (another template engine), '[%' and '%]', and you can use directives such as INCLUDE, FOREACH, WHILE, ...
No, I didn't use plTenjin. From my experience, this looks almost like HTML::Mason minus the nice block syntax of Mason.
What site do you manage which is able to saturate any modern CPU during template processing? I don't think this would happen easily.
In most cases, there are different bottlenecks to site performance than any CPU-bound template processing.
(BTW, from what I read in the plTenjin doc, you should give HTML::Mason a try.)
Regards
rbo