Is there a better way to detab (expand tabs) using Perl? - perl

I wanted to detab my source files. (Please, no flame about WHY I wanted to detab my sources. That's not the point :-) I couldn't find a utility to do that. Eclipse didn't do it for me, so I implemented my own.
I couldn't fit it into a one liner (-e) program.
I came up with the following, which did the job just fine.
while( <> )
{
    while( /\t/ ) {
        s/^(([^\t]{4})*)\t/$1    /;
        s/^((([^\t]{4})*)[^\t]{1})\t/$1   /;
        s/^((([^\t]{4})*)[^\t]{2})\t/$1  /;
        s/^((([^\t]{4})*)[^\t]{3})\t/$1 /;
    }
    print;
}
However, it makes me wonder if Perl - the champion language of processing text - is the right tool. The code doesn't seem very elegant. If I had to detab source that assumes tab=8 spaces, the code would look even worse.
This bothers me specifically because I can think of a deterministic state machine with only 4 states that does the job.
I have a feeling that a more elegant solution exists. Am I missing a Perl idiom? In the spirit of TIMTOWTDI I'm curious about the other ways to do it.
u.

Whatever happened to the old Unix program "expand"? I used to use that all the time.

I remember a detabify script from one of the O'Reilly books, but I can't seem to find a link now.
I have had to solve this problem as well, and I settled on this concise solution to detabify a line:
1 while $line =~ s/\t/" " x ($tablength - ($-[0] % $tablength))/e ;
In this regular expression $-[0] is the length of the "pre-matched" portion of the line -- the number of characters before the tab character.
As a one-liner:
perl -pe '1 while s/\t/" "x(4-($-[0]%4))/e' input
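For use inside a larger program, the same substitution can be wrapped in a small sub. This is only a sketch: the name expand_tabs and its default width are made up for illustration, and it assumes every character before the tab occupies exactly one column.

```perl
use strict;
use warnings;

# Hypothetical helper: expand each tab in $line to the next multiple of $width.
sub expand_tabs {
    my ($line, $width) = @_;
    $width ||= 4;    # assumed default, matching the one-liner above
    # $-[0] is the offset where the match (the tab) starts, i.e. the current column.
    # Without /g the first remaining tab is replaced on each pass, so the
    # offsets always reflect the already-expanded text.
    1 while $line =~ s/\t/' ' x ($width - $-[0] % $width)/e;
    return $line;
}

print expand_tabs("a\tbc\tdef\tg\n", 4);
```

Changing the tab stop to 8 is then just a different second argument, which is what the four-substitution version in the question could not do easily.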

Whenever I want to expand tabs, I use Text::Tabs.
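A minimal sketch of what that looks like. Text::Tabs ships with Perl and exports expand(), unexpand() and $tabstop; the sample strings here are made up for illustration.

```perl
use strict;
use warnings;
use Text::Tabs;    # exports expand(), unexpand() and $tabstop

$tabstop = 4;      # Text::Tabs defaults to 8 columns

# expand() takes a list of lines and returns them with tabs replaced by spaces.
my @expanded = expand("a\tb", "\tindented");
print "$_\n" for @expanded;
```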

This can be easily done in vim:
:set expandtab
:retab
http://vim.wikia.com/wiki/Converting_tabs_to_spaces

Can't let vi be all alone here. Emacs:
M-x tabify
M-x untabify

I do this in vim with:
:%s/^V^I/ /g
(That's a literal ^V followed by a literal tab), and then :%gq to fix incorrect spacing. Perl is overkill.

The exact expression is:
1 while $line =~ s/\t/" " x ($tablength+1 - ($-[0] % $tablength))/e ;
Also, expand is useful on the command line, but not inside a program that needs to expand only some of the lines.

Related

Using an awk command inside a Perl script

This may not be the best way to do the below, so any comments are appreciated.
I'm currently tailing a number of log files and outputting them to screen, so that I get a quick overview of the system.
What I would like to do is to highlight different messages [INFO], [WARN] and [ERROR]
The following syntax works fine on the command line, but fails when being called from Perl
system ("tail -n 5 $ArchiverLog | awk '{if ($4 ~ /DEBUG/)print "\033[1;33m"$0"\033[0m"; else if ($6 ~ /ERROR/) print "\033[1;31m"$0"\033[0m"; else print $0}'");
I believe Perl can do this
Should I read in the file line by line, match on the words and print to screen (I only want the last 10 lines). Is that a better option?
I've also seen reference to a2p, which is an awk to Perl translator. Would that be people's preferred choice?
It seems crazy to use one powerful scripting language to call up another one so it can do something which the first one can do very well, so I would not persist with trying to call up awk from perl.
I have not had much experience with a2p, rather I tend to just translate such snippets by hand.
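#!/usr/bin/perl
use strict;
# $ArchiverLog is assumed to be declared and set earlier in the script
foreach (`tail -n 5 $ArchiverLog`) {
    my @f = split;
    # awk fields are 1-based, Perl arrays are 0-based: awk's $4 is $f[3], $6 is $f[5]
    if ($f[3] =~ /DEBUG/) {
        print "\033[1;33m$_\033[0m";
    } elsif ($f[5] =~ /ERROR/) {
        print "\033[1;31m$_\033[0m";
    } else {
        print $_;
    }
}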
(Hard to say if the above is completely correct without some sample input to test it with).
As Borodin says in the comments a more elegant solution would be to use a Tailing module from CPAN rather than calling up tail as a subprocess. But for a quick tool that might be overkill.
NB: if $ArchiverLog comes from anywhere you don't have control of, remember to sanitise it, otherwise you are creating a nice security hole.
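One way to sidestep the shell entirely is the list form of a piped open, which passes the file name to tail as a plain argument. The sketch below is untested against real logs: colorize and tail_colorized are made-up names, and the field positions (4th and 6th, as in the awk command) are assumptions about the log format.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a log line in ANSI colour codes based on its fields.
sub colorize {
    my ($line) = @_;
    my @f = split ' ', $line;
    # awk's $4 and $6 are $f[3] and $f[5] here (0-based arrays).
    return "\033[1;33m$line\033[0m" if defined $f[3] && $f[3] =~ /DEBUG/;
    return "\033[1;31m$line\033[0m" if defined $f[5] && $f[5] =~ /ERROR/;
    return $line;
}

# Hypothetical driver: list-form open runs tail directly, with no shell involved,
# so odd characters in $file are not interpreted as shell syntax.
sub tail_colorized {
    my ($file, $count) = @_;
    open(my $tail, '-|', 'tail', '-n', $count, $file)
        or die "Cannot run tail: $!";
    my @out = map { my $l = $_; chomp $l; colorize($l) . "\n" } <$tail>;
    close $tail;
    return @out;
}

# Usage (assuming $ArchiverLog holds the log path):
#   print tail_colorized($ArchiverLog, 5);
```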

Perl operator: $|++; dollar sign pipe plus plus

I'm working on a new version of an already released piece of Perl code, and found the line:
$|++;
AFAIK, $| is related to pipes, as explained in this link, and I understand this, but I cannot figure out what the ++ (plus plus) means here.
Thank you in advance.
EDIT: Found the answer in this link:
In short: It forces to print (flush) to your console before the next statement, in case the script is too fast.
Sometimes, if you put a print statement inside of a loop that runs really really quickly, you won’t see the output of your print statement until the program terminates. sometimes, you don’t even see the output at all. the solution to this problem is to “flush” the output buffer after each print statement; this can be performed in perl with the following command:
$|++;
[update]
as has been pointed out by r. schwartz, i’ve misspoken; the above command causes print to flush the buffer preceding the next output.
$| defaults to 0; doing $|++ thus increments it to 1. Setting it to nonzero enables autoflush on the currently-selected file handle, which is STDOUT by default, and is rarely changed.
So the effect is to ensure that print statements and the like output immediately. This is useful if you're outputting to a socket or the like.
$| is an abbreviation for $OUTPUT_AUTOFLUSH, as you had found out. The ++ increments this variable.
$| = 1 would be the clean way to do this (IMHO).
It's an old idiom, from the days before IO::Handle. In modern code this should be written as
use IO::Handle;
STDOUT->autoflush(1);
It increments autoflush, which is most probably equivalent to turning it on.
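A quick way to see why "increment" and "turn on" amount to the same thing: $| only ever reads back as 0 or 1, so incrementing it again does not push it to 2. The @seen array below is just a made-up name for recording the values.

```perl
use strict;
use warnings;
use IO::Handle;

my @seen;
{
    local $| = 0;          # start from the default; restored when the block exits
    push @seen, $| + 0;    # 0
    $|++;
    push @seen, $| + 0;    # 1, autoflush is now on for the selected handle
    $|++;
    push @seen, $| + 0;    # still 1
}
print "values of \$|: @seen\n";

# The more explicit spelling, which names the handle it applies to:
STDOUT->autoflush(1);
```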

How can I have Perl take input from STDIN one character at a time?

I am somewhat of a beginner at Perl (compared to people here). I know enough to be able to write programs that do many things through the command prompt. At one point, I decided to write a command prompt game that constructed a maze and let me solve it. Besides quality graphics, the only thing that it was missing was the ability for me to use the WASD controls without pressing enter after every move I made in the maze.
To make my game work, I want to be able to have Perl take a single character as input from STDIN, without requiring me to use something to separate my input, like the default \n. How would I accomplish this?
I have tried searching for a simple answer online and in a book that I have, but I didn't seem to find anything. I tried setting $/="", but that seemed to bypass all input. I think that there may be a really simple answer to my question, but I am also afraid that it might be impossible.
Also, does $/="" actually bypass input, or does it take input so quickly that it assumes there isn't any input if I'm not already pressing the key?
IO::Prompt can be used:
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Prompt;
my $key = prompt '', -1;
print "\nPressed key: $key\n";
Relevant excerpt from perldoc -v '$/' related to setting $/ = '':
The input record separator, newline by default. This influences Perl's
idea of what a "line" is. Works like awk's RS variable, including
treating empty lines as a terminator if set to the null string (an empty line cannot contain any spaces or tabs).
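So $/ = '' does not bypass input; it switches to paragraph mode, where a record only ends at a blank line. A small sketch using an in-memory filehandle (the sample text is made up) shows the effect:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n\n\nthird line\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = '';     # paragraph mode: records are separated by blank lines
    @records = <$fh>;
}
close $fh;

# Two records: the first two lines together, then the third line.
printf "%d records\n", scalar @records;
```

When reading from the keyboard in this mode, a read keeps waiting until an empty line is entered, which can look as if input is being ignored.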
The shortest way to achieve your goal is to use this special construct:
$/ = \1;
This tells perl to read one character at a time. The next time you read from any stream (not just STDIN)
my $char = <STREAM>;
it will read 1 character per assignment. From perlvar "Setting $/ to a reference to an integer, scalar containing an integer, or scalar that's convertible to an integer will attempt to read records instead of lines, with the maximum record size being the referenced integer number of characters."
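Here is a small sketch of that record-size behaviour, reading from an in-memory filehandle so it can be run without a keyboard (the "wasd" input is made up). Note that this only controls how much Perl reads per record; a terminal normally still hands input to the program one line at a time, so on a real console you would typically still need to press enter unless the terminal is switched out of line mode.

```perl
use strict;
use warnings;

my $input = "wasd";
open(my $fh, '<', \$input) or die "Cannot open in-memory file: $!";

my @keys;
{
    local $/ = \1;     # fixed-size records of length 1
    while (defined(my $char = <$fh>)) {
        push @keys, $char;
    }
}
close $fh;

print join(',', @keys), "\n";
```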
If you are using *nix, you will find Curses useful.
It has a getch method that does what you want.
Term::TermKey also looks like a potential solution.
IO::Prompt is no longer maintained but IO::Prompter
has a nice example (quoted from that site):
use IO::Prompter;
# This call has no automatically added options...
my $assent = prompt "Do you wish to take the test?", -yn;
{
    use IO::Prompter [-yesno, -single, -style=>'bold'];
    # These three calls all have: -yesno, -single, -style=>'bold' options
    my $ready = prompt 'Are you ready to begin?';
    my $prev = prompt 'Have you taken this test before?';
    my $hints = prompt 'Do you want hints as we go?';
}
# This call has no automatically added options...
scalar prompt 'Type any key to start...', -single;

Why is 'last' called 'last' in Perl?

What is the historical reason that last is called that in Perl, rather than break as it is called in C?
The design of Perl was influenced by C (in addition to awk, sed and sh - see man page below), so there must have been some reasoning behind not going with the familiar C-style naming of break/last.
A bit of history from the Perl 1.000 (released 18 December, 1987) man page:
[Perl] combines (in the author's opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of csh, Pascal, and even BASIC|PLUS.)
The semantics of 'break' or 'last' are defined by the language (in this case Perl), not by you.
Why not think of 'last' as "this is the last statement to run for the loop".
It's always struck me as odd that the 'continue' statement in 'C' starts the next pass of a loop. This is definitely a strange use of the concept of "continue". But it is the semantics of 'C', so I accept it.
By trying to map particular programming concepts into single English words with existing meaning there is always going to be some sort of mismatching oddity
Source
Plus, Larry Wall is kinda weird. Have you seen his picture?
(source: wired.com)
I expect that this is because Perl was created by a linguist, not a computer scientist. In normal English usage, the concept of declaring that you have completed your final pass through a loop is more strongly connected to the word "last" ("this is the last pass") than to the word "break" ("break the loop"? "break out of the loop"? - it's not even clear how "break" is intended to relate to exiting the loop).
The term 'last' makes more sense when you remember that you can use it with more than just the immediate looping control. You can apply it to labeled blocks one or more levels above
the block it is in:
LINE: while( <> ) {
    WORD: foreach ( split ) {
        last LINE if /^__END__\z/;
        ...
    }
}
It reads more naturally to say "last" in English when you read it as "last line if it matches ...".
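A runnable variant of the same idea, as a sketch with made-up sample data in place of the input stream:

```perl
use strict;
use warnings;

my @lines = ("alpha beta", "gamma __END__ delta", "epsilon");
my @seen;

LINE: for my $line (@lines) {
    WORD: for my $word (split ' ', $line) {
        # Leaves the outer loop, not just the inner one.
        last LINE if $word eq '__END__';
        push @seen, $word;
    }
}

print "@seen\n";    # prints "alpha beta gamma"
```

A plain last inside the WORD loop would only skip "delta" and then carry on with "epsilon".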
There's an additional reason you might want to consider:
Last does more than just loop control.
sub hello {
    my ( $arg ) = @_;
    scope: {
        foo();
        bar();
        last if $arg > 4;
        baz();
        quux();
    }
}
Last as such is a general flow control mechanism not limited to loops. While of course, you can generalise the above as a loop that runs at most once, the absence of a loop to me indicates "Break? What are we breaking out of?"
Instead, I think of "last" as "Jump to the position of the last brace", which is for this purpose, more semantically sensible.
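To make that concrete, here is a self-contained sketch of last leaving a bare block. The sub name run_steps and the step names are placeholders, recorded as strings so the control flow is visible.

```perl
use strict;
use warnings;

# Hypothetical example: record which steps ran before and after the block.
sub run_steps {
    my ($arg) = @_;
    my @done;
    STEPS: {
        push @done, 'foo';
        push @done, 'bar';
        last STEPS if $arg > 4;    # jump to just past the closing brace
        push @done, 'baz';
    }
    push @done, 'after';           # always runs; last did not leave the sub
    return @done;
}

print join(' ', run_steps(5)), "\n";    # prints "foo bar after"
print join(' ', run_steps(1)), "\n";    # prints "foo bar baz after"
```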
I once asked Damian Conway the same question about say. Perl 6 will introduce say, which is nothing more than print that automatically adds a newline. My question was why not simply use echo, because this is what echo does in Bash (and probably elsewhere).
His answer was: echo is 33% longer than say.
He has a point there. :)
Because it goes to the last of the loop. And because Larry Wall was a weird guy.

How to truncate STDIN line length?

I've been parsing through some log files and I've found that some of the lines are too long to display on one line so Terminal.app kindly wraps them onto the next line. However, I've been looking for a way to truncate a line after a certain number of characters so that Terminal doesn't wrap, making it much easier to spot patterns.
I wrote a small Perl script to do this:
#!/usr/bin/perl
die("need max length\n") unless $#ARGV == 0;
while (<STDIN>)
{
    $_ = substr($_, 0, $ARGV[0]);
    chomp($_);
    print "$_\n";
}
But I have a feeling that this functionality is probably built into some other tool (sed?) that I just don't know enough about to use for this task.
So my question is sort of a reverse question: how do I truncate a line of stdin without writing a program to do it?
Pipe output to:
cut -b 1-LIMIT
Where LIMIT is the desired line width.
Another tactic I use for viewing log files with very long lines is to pipe the file to "less -S". The -S option for less will print lines without wrapping, and you can view the hidden part of long lines by pressing the right-arrow key.
Not exactly answering the question, but if you want to stick with Perl and use a one-liner, a possibility is:
$ perl -pe's/(?<=.{25}).*//' filename
where 25 is the desired line length.
The usual way to do this would be
perl -wlne'print substr($_,0,80)'
Golfed (for 5.10):
perl -nE'say/(.{0,80})/'
(Don't think of it as programming, think of it as using a command line tool with a huge number of options.) (Yes, the python reference is intentional.)
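If the truncation is needed inside a longer script rather than on the command line, the substr approach fits in a tiny sub. This is a sketch; truncate_line is a made-up name, and it counts characters only if the input has been decoded, otherwise bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the trailing newline, then keep at most $max characters.
sub truncate_line {
    my ($line, $max) = @_;
    chomp $line;
    return substr($line, 0, $max);
}

print truncate_line("a fairly long log line\n", 8), "\n";    # prints "a fairly"
print truncate_line("short\n", 8), "\n";                     # prints "short"
```

Chomping before substr means the newline never counts against the limit, unlike the script in the question, where a line shorter than the limit keeps its newline until the later chomp.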
A Korn shell solution (truncating to 70 chars - easy to parameterize though):
typeset -L70 line
while read line
do
print $line
done
You can use a tied variable that clips its contents to a fixed length:
#! /usr/bin/perl -w
use strict;
use warnings;
use String::FixedLen;
tie my $str, 'String::FixedLen', 4;
while (defined($str = <>)) {
    chomp $str;
    print "$str\n";
}
This isn't exactly what you're asking for, but GNU Screen (included with OS X, if I recall correctly, and common on other *nix systems) lets you turn line wrapping on/off (C-a r and C-a C-r). That way, you can simply resize your terminal instead of piping stuff through a script.
Screen basically gives you "virtual" terminals within one toplevel terminal application.
Unless I'm missing the point, the UNIX "fold" command was designed to do exactly that:
$ cat file
the quick brown fox jumped over the lazy dog's back
$ fold -w20 file
the quick brown fox
jumped over the lazy
dog's back
$ fold -w10 file
the quick
brown fox
jumped ove
r the lazy
dog's bac
k
$ fold -s -w10 file
the quick
brown fox
jumped
over the
lazy
dog's back