How can I get case-insensitive completion with Term::ReadLine::Gnu? - perl

I can't seem to get case-insensitive completion when using Term::ReadLine::Gnu. Take this example script:
use strict;
use warnings;
use 5.010;
use Term::ReadLine;
my $term = Term::ReadLine->new('test');
say "Using " . $term->ReadLine;
if (my $attr = $term->Attribs) {
$term->ornaments(0);
$attr->{basic_word_break_characters} = ". \t\n";
$attr->{completer_word_break_characters} = " \t\n";
$attr->{completion_function} = \&complete_word;
} # end if attributes
my #words = qw(apple approve Adam America UPPER UPPERCASE UNUSED);
sub complete_word
{
my ($text, $line, $start) = #_;
return grep(/^$text/i, #words);
} # end complete_word
while (1) {
$_ = $term->readline(']');
last unless /\S/; # quit on empty input
} # end while 1
Note that complete_word does case-insensitive matching. If I run this with Term::ReadLine::Perl (by doing PERL_RL=Perl perl script.pl) it works as I expect. Typing a<TAB><TAB> lists all 4 words. Typing u<TAB><TAB> converts u to U and lists 3 words.
When I use Term::ReadLine::Gnu instead (PERL_RL=Gnu perl script.pl or just perl script.pl), it only does case-sensitive completion. Typing a<TAB> gives app. Typing u<TAB><TAB> doesn't list any completions.
I even have set completion-ignore-case on in my /etc/inputrc, but it still doesn't work here. (It works fine in bash, though.)
Is there any way to get Term::ReadLine::Gnu to do case-insensitive completion?

It would appear the problem is in the Term::ReadLine::Gnu::XS::_trp_completion_function() (a wrapper for the user-defined completion function).
Your matches are retrieved correctly from your complete_word() function, but then the following snippet from the wrapper does its own case-sensitive match:
for (; $_i <= $#_matches; $_i++) {
return $_matches[$_i] if ($_matches[$_i] =~ /^\Q$text/);
}
where #_matches is the result of your complete_word() and $text is the completed text so far.
So it looks like the answer is no, there is no supported way to get Term::ReadLine::Gnu to do case-insensitive completion. You would have to to override the private Term::ReadLine::Gnu::XS::_trp_completion_function (an ugly hack to be sure) -- or modify XS.pm directly (arguably an even uglier hack).
EDIT: Term::ReadLine::Gnu version used: 1.20

Related

Does perl cache regex generation?

Suppose I have a function that dynamically generates regular expressions and then matches against them.
For example, in the following function match_here a \G anchor is inserted at the beginning of the regex. This simplifies the API because the caller does not need to remember to include the pos anchor in the pattern.
#!/usr/bin/env perl
use strict;
use warnings;
use Carp;
use Data::Dumper;
sub match_here {
my ($str, $index, $rg) = #_;
pos($str) = $index;
croak "index ($index) out of bounds" unless pos($str) == $index;
my $out;
if ($str =~ /\G$rg/) {
$out = $+[0];
}
return $out;
}
# no match starting at position 0
# prints '$VAR1 = undef;'
print Dumper(match_here("abc", 0, "b+"));
# match from 1 to 2
# prints '$VAR1 = 2;'
print Dumper(match_here("abc", 1, "b+"));
I'm wondering whether an anonymous regex object is "compiled" every time the function is evaluated or if there's some caching so that identical strings will not cause additional regex objects to be compiled.
Also, assuming that no caching is done by the Perl interpreter, is compiling a regex object expensive enough to be worth caching (possibly in an XS extension)?
From perlop(1), under the m// operator:
PATTERN may contain variables, which will be interpolated every time the pattern search is evaluated
[...]
Perl will not recompile the pattern unless an interpolated variable that it contains changes. You can force Perl to skip the test and never recompile by adding a "/o" (which stands for "once") after the trailing delimiter. Once upon a time, Perl would recompile regular expressions unnecessarily, and this modifier was useful to tell it not to do so, in the interests of speed.
So yes, there is a cache, and you can even force the use of the cache even when it's invalid by saying /o, but you really shouldn't do that.
But that cache only stores one compiled regexp per instance of the m// or s/// operator, so it only helps if the regexp is used with the same variables (e.g. your $rg) many times consecutively. If you alternate between calling it with $rg='b+' and $rg='c+' you will get a recompile every time.
For that kind of situation, you can do your own caching with the qr// operator. It explicitly compiles the regexp and returns an object that you can store and use to execute the regexp later. That could be incorporated into your match_here like this:
use feature 'state';
sub match_here {
my ($str, $index, $rg) = #_;
pos($str) = $index;
croak "index ($index) out of bounds" unless pos($str) == $index;
my $out;
state %rg_cache;
my $crg = $rg_cache{$rg} ||= qr/\G$rg/;
if ($str =~ /$crg/) {
$out = $+[0];
}
return $out;
}
To add more detail on the basic cache (when not using qr//): the fact that $rg is a newly allocated lexical variable each time makes no difference. It only matters that the value is the same as the previous one.
Here's an example to prove the point:
use re qw(Debug COMPILE);
while(<>) {
chomp;
# Insane interpolation. Do not use anything remotely like this in real code
print "MATCHED: $_\n" if /^${\(`cat refile`)}/;
}
Every time the match operator executes, it reads refile. The regular expression is ^ followed by the contents of refile. The debugging output shows that it is recompiled only if the contents of the file have changed. If the file still has the same contents as the last time, the operator notices that the same string is being passed to the regexp compiler again, and reuses the cached result.
Or try this less dramatic example:
use re qw(Debug COMPILE);
#patterns = (
'\d{3}',
'\d{3}',
'[aeiou]',
'[aeiou]',
'\d{3}',
'\d{3}'
);
for ('xyz', '123', 'other') {
for $i (0..$#patterns) {
if(/$patterns[$i]/) {
print "$_ matches $patterns[$i]\n";
} else {
print "$_ does not match $patterns[$i]\n";
}
}
}
in which there are 18 compilations and 11 of them are cache hits, even though the same "variable" (the same element of the #patterns array) is never used twice in a row.

Text::SpellChecker module and Unicode

#!/usr/local/bin/perl
use strict;
use warnings;
use Text::SpellChecker;
my $text = "coördinator";
my $checker = Text::SpellChecker->new( text => $text );
while ( my $word = $checker->next_word ) {
print "Bad word is $word\n";
}
Output: Bad word is rdinator
Desired: Bad word is coördinator
The module is breaking if I have Unicode in $text. Any idea how can this be solved?
I have Aspell 0.50.5 installed which is being used by this module. I think this might be the culprit.
Edit: As Text::SpellChecker requires either Text::Aspell or Text::Hunspell, I removed Text::Aspell and installed Hunspell, Text::Hunspell, then:
$ hunspell -d en_US -l < badword.txt
coördinator
Shows correct result. This means there's something wrong either with my code or Text::SpellChecker.
Taking Miller's suggestion in consideration I did the below
#!/usr/local/bin/perl
use strict;
use warnings;
use Text::SpellChecker;
use utf8;
binmode STDOUT, ":encoding(utf8)";
my $text = "coördinator";
my $flag = utf8::is_utf8($text);
print "Flag is $flag\n";
print "Text is $text\n";
my $checker = Text::SpellChecker->new(text => $text);
while (my $word = $checker->next_word) {
print "Bad word is $word\n";
}
OUTPUT:
Flag is 1
Text is coördinator
Bad word is rdinator
Does this mean the module is not able to handle utf8 characters properly?
It is Text::SpellChecker bug - the current version assumes ASCII only words.
http://cpansearch.perl.org/src/BDUGGAN/Text-SpellChecker-0.11/lib/Text/SpellChecker.pm
#
# next_word
#
# Get the next misspelled word.
# Returns false if there are no more.
#
sub next_word {
...
while ($self->{text} =~ m/([a-zA-Z]+(?:'[a-zA-Z]+)?)/g) {
IMHO the best fix would use per language/locale word splitting regular expression or leave word splitting to underlaying library used. aspell list reports coördinator as single word.
I've incorporated Chankey's solution and released version 0.12 to the CPAN, give it a try.
The validity of diaeresis in words like coördinator is interesting. The default aspell and hunspell dictionaries seem to mark it as incorrect, though some publications may disagree.
best,
Brian

How do you override substitution operations?

I'm playing around with Perl and creating a string object. I know that this is a very bad idea to do in the real world. I'm doing it purely for fun.
I'm using overload to overload standard Perl string operators with the standard operators you would find in most other languages.
use strict;
use warnings;
use feature qw(say);
my $obj_string1 = Object::String->new("foo");
my $obj_string2 = Object::String->new("bar");
my $reg_string1 = "foobar";
my $reg_string2 = "barfu";
# Object::String "stringifies" correctly inside quotes
say "$obj_string1 $obj_string2";
# Use "+" for concatenations
say $obj_string1 + $obj_string2; # Works
say $obj_string1 + $reg_string1 + $reg_string2 # Works
say $reg_string1 + $obj_string1 # Still works!
say $reg_string1 + $obj_string1 + $reg_string2; # Still works!
say $reg_string1 + $reg_string2 + $obj_string1; # Does't work, of course.
# Overload math booleans with their string boolean equivalents
my $forty = Object::String(40);
my $one_hundred = "100";
if ( $forty > $one_hundred ) { # Valid
say "$forty is bigger than $one_hundred (in strings!)";
}
if ( $one_hundred < $forty ) { # Also Valid
say "$one_hundred is less than $forty (In strings!)";
}
# Some standard "string" methods
say $forty->length # Prints 5
say $forty->reverse; # Prints "ytrof"
say $forty; # Prints "ytrof"
Now comes the hard part:
my $string = Object::String("I am the best programmer around!");
say $string; # Prints "I am the best programmer around"
say $string->get_value; # Prints "I am the best programmer around" with get_value Method
# But, it's time to speak the truth...
$string =~ s/best programer/biggest liar/;
say $string; # Prints "I am the biggest liar around"
say $string->get_value; # Whoops, no get_value method on scalar strings
As you can see, when I do my substitution, it works correctly, but returns a regular scalar string instead of an Object::String.
I am trying to figure out how to override the substitution operation. I've looked in the Perldoc, and I've gone through various Perl books (Advance Perl Programming, Intermediate Perl Programming, Perl Cookbook, etc.), but haven't found a way to override the substitution operation, so it returns an Object::String.
How do I override the substitution operation?
Unfortunately Perl's overload support isn't very universal in the area of strings. There's many operations that overloading isn't party to; and s/// is one of them.
I have started a module to fix this; overload::substr but as yet it's incomplete. It allows you to overload the substr() function for your object, but so far it doesn't yet have power to apply to m// or s///.
You might however, be able to use lvalue (or 4-argument) substr() on your objects as a way to cheat this; if the objects at least stringify into regular strings that can be matched upon, the substitution can be done using the substr()-rewrite trick.
Turn
$string =~ s/pattern/replacement/;
into
$string =~ m/pattern/ and substr($string, $-[0], $+[0]-$-[0]) = "replacement";
and then you'll have some code which will respect a substr() overload on the $string object, if you use my module above.
At some point of course it would be nice if overload::substr can perform that itself; I just haven't got around to writing it yet.

Perl - How to create commands that users can input in console?

I'm just starting in Perl and I'm quite enjoying it. I'm writing some basic functions, but what I really want to be able to do is to use those functions intelligently using console commands. For example, say I have a function adding two numbers. I'd want to be able to type in console "add 2, 4" and read the first word, then pass the two numbers as parameters in an "add" function. Essentially, I'm asking for help in creating some basic scripting using Perl ^^'.
I have some vague ideas about how I might do this in VB, but Perl, I have no idea where I'd start, or what functions would be useful to me. Is there something like VB.net's "Split" function where you can break down the contents of a scalar into an array? Is there a simple way to analyse one word at a time in a scalar, or iterate through a scalar until you hit a separator, for example?
I hope you can help, any suggestions are appreciated! Bear in mind, I'm no expert, I started Perl all of a few weeks ago, and I've only been doing VB.net half a year.
Thank you!
Edit: If you're not sure what to suggest and you know any simple/intuitive resources that might be of help, that would also be appreciated.
Its rather easy to make a script which dispatches to a command by name. Here is a simple example:
#!/usr/bin/env perl
use strict;
use warnings;
# take the command name off the #ARGV stack
my $command_name = shift;
# get a reference to the subroutine by name
my $command = __PACKAGE__->can($command_name) || die "Unknown command: $command_name\n";
# execute the command, using the rest of #ARGV as arguments
# and print the return with a trailing newline
print $command->(#ARGV);
print "\n";
sub add {
my ($x, $y) = #_;
return $x + $y;
}
sub subtract {
my ($x, $y) = #_;
return $x - $y;
}
This script (say its named myscript.pl) can be called like
$ ./myscript.pl add 2 3
or
$ ./myscript.pl subtract 2 3
Once you have played with that for a while, you might want to take it further and use a framework for this kind of thing. There are several available, like App::Cmd or you can take the logic shown above and modularize as you see fit.
You want to parse command line arguments. A space serves as the delimiter, so just do a ./add.pl 2 3 Something like this:
$num1=$ARGV[0];
$num2=$ARGV[1];
print $num1 + $num2;
will print 5
Here is a short implementation of a simple scripting language.
Each statement is exactly one line long, and has the following structure:
Statement = [<Var> =] <Command> [<Arg> ...]
# This is a regular grammar, so we don't need a complicated parser.
Tokens are seperated by whitespace. A command may take any number of arguments. These can either be the contents of variables $var, a string "foo", or a number (int or float).
As these are Perl scalars, there is no visible difference between strings and numbers.
Here is the preamble of the script:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
strict and warnings are essential when learning Perl, else too much weird stuff would be possible. The use 5.010 is a minimum version, it also defines the say builtin (like a print but appends a newline).
Now we declare two global variables: The %env hash (table or dict) associates variable names with their values. %functions holds our builtin functions. The values are anonymous functions.
my %env;
my %functions = (
add => sub { $_[0] + $_[1] },
mul => sub { $_[0] * $_[1] },
say => sub { say $_[0] },
bye => sub { exit 0 },
);
Now comes our read-eval-loop (we don't print by default). The readline operator <> will read from the file specified as the first command line argument, or from STDIN if no filename is provided.
while (<>) {
next if /^\s*\#/; # jump comment lines
# parse the line. We get a destination $var, a $command, and any number of #args
my ($var, $command, #args) = parse($_);
# Execute the anonymous sub specified by $command with the #args
my $value = $functions{ $command }->(#args);
# Store the return value if a destination $var was specified
$env{ $var } = $value if defined $var;
}
That was fairly trivial. Now comes some parsing code. Perl “binds” regexes to strings with the =~ operator. Regexes may look like /foo/ or m/foo/. The /x flags allows us to include whitespace in our regex that doesn't match actual whitespace. The /g flag matches globally. This also enables the \G assertion. This is where the last successful match ended. The /c flag is important for this m//gc style parsing to consume one match at a time, and to prevent the position of the regex engine in out string to being reset.
sub parse {
my ($line) = #_; # get the $line, which is a argument
my ($var, $command, #args); # declare variables to be filled
# Test if this statement has a variable declaration
if ($line =~ m/\G\s* \$(\w+) \s*=\s* /xgc) {
$var = $1; # assign first capture if successful
}
# Parse the function of this statement.
if ($line =~ m/\G\s* (\w+) \s*/xgc) {
$command = $1;
# Test if the specified function exists in our %functions
if (not exists $functions{$command}) {
die "The command $command is not known\n";
}
} else {
die "Command required\n"; # Throw fatal exception on parse error.
}
# As long as our matches haven't consumed the whole string...
while (pos($line) < length($line)) {
# Try to match variables
if ($line =~ m/\G \$(\w+) \s*/xgc) {
die "The variable $1 does not exist\n" if not exists $env{$1};
push #args, $env{$1};
}
# Try to match strings
elsif ($line =~ m/\G "([^"]+)" \s*/xgc) {
push #args, $1;
}
# Try to match ints or floats
elsif ($line =~ m/\G (\d+ (?:\.\d+)? ) \s*/xgc) {
push #args, 0+$1;
}
# Throw error if nothing matched
else {
die "Didn't understand that line\n";
}
}
# return our -- now filled -- vars.
return $var, $command, #args;
}
Perl arrays can be handled like linked list: shift removes and returns the first element (pop does the same to the last element). push adds an element to the end, unshift to the beginning.
Out little programming language can execute simple programs like:
#!my_little_language
$a = mul 2 20
$b = add 0 2
$answer = add $a $b
say $answer
bye
If (1) our perl script is saved in my_little_language, set to be executable, and is in the system PATH, and (2) the above file in our little language saved as meaning_of_life.mll, and also set to be executable, then
$ ./meaning_of_life
should be able to run it.
Output is obviously 42. Note that our language doesn't yet have string manipulation or simple assignment to variables. Also, it would be nice to be able to call functions with the return value of other functions directly. This requires some sort of parens, or precedence mechanism. Also, the language requires better error reporting for batch processing (which it already supports).

What's an easy way to print a multi-line string without variable substitution in Perl?

I have a Perl program that reads in a bunch of data, munges it, and then outputs several different file formats. I'd like to make Perl be one of those formats (in the form of a .pm package) and allow people to use the munged data within their own Perl scripts.
Printing out the data is easy using Data::Dump::pp.
I'd also like to print some helper functions to the resulting package.
What's an easy way to print a multi-line string without variable substitution?
I'd like to be able to do:
print <<EOL;
sub xyz {
my $var = shift;
}
EOL
But then I'd have to escape all of the $'s.
Is there a simple way to do this? Perhaps I can create an actual sub and have some magic pretty-printer print the contents? The printed code doesn't have to match the input or even be legible.
Enclose the name of the delimiter in single quotes and interpolation will not occur.
print <<'EOL';
sub xyz {
my $var = shift;
}
EOL
You could use a templating package like Template::Toolkit or Text::Template.
Or, you could roll your own primitive templating system that looks something like this:
my %vars = qw( foo 1 bar 2 );
Write_Code(\$vars);
sub Write_Code {
my $vars = shift;
my $code = <<'END';
sub baz {
my $foo = <%foo%>;
my $bar = <%bar%>;
return $foo + $bar;
}
END
while ( my ($key, $value) = each %$vars ) {
$code =~ s/<%$key%>/$value/g;
}
return $code;
}
This looks nice and simple, but there are various traps and tricks waiting for you if you DIY. Did you notice that I failed to use quotemeta on my key names in the substituion?
I recommend that you use a time-tested templating library, like the ones I mentioned above.
You can actually continue a string literal on the next line, like this:
my $mail = "Hello!
Blah blah.";
Personally, I find that more readable than heredocs (the <<<EOL thing mentioned elsewhere).
Double quote " interpolates variables, but you can use '. Note you'll need to escape any ' in your string for this to work.
Perl is actually quite rich in convenient things to make things more readable, e.g. other quote-operations. qq and q correspond to " and ' and you can use whatever delimiter makes sense:
my $greeting = qq/Hello there $name!
Nice to meet you/; # Interpolation
my $url = q|http://perlmonks.org/|; # No need to escape /
(note how the syntax coloring here didn't quite keep up)
Read perldoc perlop (find in page: "Quote and Quote-like Operators") for more information.
Use a data section to store the Perl code:
#!/usr/bin/perl
use strict;
use warnings;
print <DATA>;
#print munged data
__DATA__
package MungedData;
use strict;
use warnings;
sub foo {
print "foo\n";
}
Try writing your code as an actual perl subroutine, then using B::Deparse to get the source code at runtime.