How to remove leading comment whitespace in Perl::Tidy? - perl

I'm just configuring Perl::Tidy to match my preference. I have only one issue left which I can't find a fix.
Sample script:
#!/usr/bin/perl
# | | | | | < "|" indicates first five "tabs" (1 tab 4 spaces).
use strict; # Enable strict programming mode.
use warnings; # Enable Perl warnings.
use utf8; # This is an UTF-8 encoded script.
1;
perltidyrc:
# Perl Best Practices (plus errata) .perltidyrc file
-l=76 # Max line width is 76 cols
-i=4 # Indent level is 4 cols
-ci=4 # Continuation indent is 4 cols
-et=4 # 1 tab represent 4 cols
-st # Output to STDOUT
-se # Errors to STDERR
-vt=2 # Maximal vertical tightness
-cti=0 # No extra indentation for closing brackets
-pt=0 # Medium parenthesis tightness
-bt=1 # Medium brace tightness
-sbt=1 # Medium square bracket tightness
-bbt=1 # Medium block brace tightness
-nsfs # No space before semicolons
-nolq # Don't outdent long quoted strings
-wbb="% + - * / x != == >= <= =~ < > | & **= += *= &= <<= &&= -= /= |= >>= ||= .= %= ^= x="
# Break before all operators
# extras/overrides/deviations from PBP
#--maximum-line-length=100 # be slightly more generous
--warning-output # Show warnings
--maximum-consecutive-blank-lines=2 # default is 1
--nohanging-side-comments # troublesome for commented out code
-isbc # block comments may only be indented if they have some space characters before the #
# for the up-tight folk :)
-pt=2 # High parenthesis tightness
-bt=2 # High brace tightness
-sbt=2 # High square bracket tightness
Result:
#!/usr/bin/perl
# | | | | | < "|" indicates first five "tabs" (1 tab 4 spaces).
use strict; # Enable strict programming mode.
use warnings; # Enable Perl warnings.
use utf8; # This is an UTF-8 encoded script.
1;
As you can see there is a leading space which causes that the "#" doesn't match the forth tab.
How to remove this leading space?

Perltidy is only able to change perl code, as it knows the meaning of perl code. Comments can contain entirely arbitrary data and as such perltidy cannot touch it. So, this kind of thing you'll have to resolve yourself.

Related

Can somebody explain this obfuscated perl regexp script?

This code is taken from the HackBack DIY guide to rob banks by Phineas Fisher. It outputs a long text (The Sixth Declaration of the Lacandon Jungle). Where does it fetch it? I don't see any alphanumeric characters at all. What is going on here? And what does the -r switch do? It seems undocumented.
perl -Mre=eval <<\EOF
''
=~(
'(?'
.'{'.(
'`'|'%'
).("\["^
'-').('`'|
'!').("\`"|
',').'"(\\$'
.':=`'.(('`')|
'#').('['^'.').
('['^')').("\`"|
',').('{'^'[').'-'.('['^'(').('{'^'[').('`'|'(').('['^'/').('['^'/').(
'['^'+').('['^'(').'://'.('`'|'%').('`'|'.').('`'|',').('`'|'!').("\`"|
'#').('`'|'%').('['^'!').('`'|'!').('['^'+').('`'|'!').('['^"\/").(
'`'|')').('['^'(').('['^'/').('`'|'!').'.'.('`'|'%').('['^'!')
.('`'|',').('`'|'.').'.'.('`'|'/').('['^')').('`'|"\'").
'.'.('`'|'-').('['^'#').'/'.('['^'(').('`'|('$')).(
'['^'(').('`'|',').'-'.('`'|'%').('['^('(')).
'/`)=~'.('['^'(').'|</'.('['^'+').'>|\\'
.'\\'.('`'|'.').'|'.('`'|"'").';'.
'\\$:=~'.('['^'(').'/<.*?>//'
.('`'|"'").';'.('['^'+').('['^
')').('`'|')').('`'|'.').(('[')^
'/').('{'^'[').'\\$:=~/('.(('{')^
'(').('`'^'%').('{'^'#').('{'^'/')
.('`'^'!').'.*?'.('`'^'-').('`'|'%')
.('['^'#').("\`"| ')').('`'|'#').(
'`'|'!').('`'| '.').('`'|'/')
.'..)/'.('[' ^'(').'"})')
;$:="\."^ '~';$~='#'
|'(';$^= ')'^'[';
$/='`' |'.';
$,= '('
EOF
The basic idea of the code you posted is that each alphanumeric character has been replaced by a bitwise operation between two non-alphanumeric characters. For instance,
'`'|'%'
(5th line of the "star" in your code)
Is a bitwise or between backquote and modulo, whose codepoints are respectively 96 and 37, whose "or" is 101, which is the codepoint of the letter "e". The following few lines all print the same thing:
say '`' | '%' ;
say chr( ord('`' | '%') );
say chr( ord('`') | ord('%') );
say chr( 96 | 37 );
say chr( 101 );
say "e"
Your code starts with (ignore whitespaces which don't matter):
'' =~ (
The corresponding closing bracket is 28 lines later:
^'(').'"})')
(C-f this pattern to see it on the web-page; I used my editor's matching parenthesis highlighting to find it)
We can assign everything in between the opening and closing parenthesis to a variable that we can then print:
$x = '(?'
.'{'.(
'`'|'%'
).("\["^
'-').('`'|
'!').("\`"|
',').'"(\\$'
.':=`'.(('`')|
'#').('['^'.').
('['^')').("\`"|
',').('{'^'[').'-'.('['^'(').('{'^'[').('`'|'(').('['^'/').('['^'/').(
'['^'+').('['^'(').'://'.('`'|'%').('`'|'.').('`'|',').('`'|'!').("\`"|
'#').('`'|'%').('['^'!').('`'|'!').('['^'+').('`'|'!').('['^"\/").(
'`'|')').('['^'(').('['^'/').('`'|'!').'.'.('`'|'%').('['^'!')
.('`'|',').('`'|'.').'.'.('`'|'/').('['^')').('`'|"\'").
'.'.('`'|'-').('['^'#').'/'.('['^'(').('`'|('$')).(
'['^'(').('`'|',').'-'.('`'|'%').('['^('(')).
'/`)=~'.('['^'(').'|</'.('['^'+').'>|\\'
.'\\'.('`'|'.').'|'.('`'|"'").';'.
'\\$:=~'.('['^'(').'/<.*?>//'
.('`'|"'").';'.('['^'+').('['^
')').('`'|')').('`'|'.').(('[')^
'/').('{'^'[').'\\$:=~/('.(('{')^
'(').('`'^'%').('{'^'#').('{'^'/')
.('`'^'!').'.*?'.('`'^'-').('`'|'%')
.('['^'#').("\`"| ')').('`'|'#').(
'`'|'!').('`'| '.').('`'|'/')
.'..)/'.('[' ^'(').'"})';
print $x;
This will print:
(?{eval"(\$:=`curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)=~s|</p>|\\n|g;\$:=~s/<.*?>//g;print \$:=~/(SEXTA.*?Mexicano..)/s"})
The remaining of the code is a bunch of assignments into some variables; probably here only to complete the pattern: the end of the star is:
$:="\."^'~';
$~='#'|'(';
$^=')'^'[';
$/='`'|'.';
$,='(';
Which just assigns simple one-character strings to some variables.
Back to the main code:
(?{eval"(\$:=`curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)=~s|</p>|\\n|g;\$:=~s/<.*?>//g;print \$:=~/(SEXTA.*?Mexicano..)/s"})
This code is inside a regext which is matched against an empty string (don't forget that we had first '' =~ (...)). (?{...}) inside a regex runs the code in the .... With some whitespaces, and removing the string within the eval, this gives us:
# fetch an url from the web using curl _quitely_ (-s)
($: = `curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)
# replace end of paragraphs with newlines in the HTML fetched
=~ s|</p>|\n|g;
# Remove all HTML tags
$: =~ s/<.*?>//g;
# Print everything between SEXTA and Mexicano (+2 chars)
print $: =~ /(SEXTA.*?Mexicano..)/s
You can automate this unobfuscation process by using B::Deparse: running
perl -MO=Deparse yourcode.pl
Will produce something like:
'' =~ m[(?{eval"(\$:=`curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)=~s|</p>|\\n|g;\$:=~s/<.*?>//g;print \$:=~/(SEXTA.*?Mexicano..)/s"})];
$: = 'P';
$~ = 'h';
$^ = 'r';
$/ = 'n';
$, = '(';

perl - matching at even positions and remove non-printable chars

I have a hex2string from database table dump that is like
"41424320202020200A200B000C"
what I want to do is to match at the even positions and detect the control chars that could break the string when printed.. i.e remove ascii null \x00, \n, \r, \f and \x80 to \xFF, etc..
I tried removing ascii null like
perl -e ' $x="41424320202020200A200B000C"; $x=~s/00//g; print "$x\n" '
but the result is incorrect as it removed 0 from trailing hex value of space \x20 and leading 0 of newline \x0A i.e 20 0A to 2A
414243202020202A2B0C
what i wanted is
414243202020202020
say unpack("H*", pack("H*", "41424320202020200A200B000C") =~ s/[^\t[:print:]]//arg);
or
my $hex = "41424320202020200A200B000C";
my $bytes = pack("H*", $hex);
$bytes =~ s/[^\t[:print:]]//ag;
$hex = unpack("H*", $bytes);
say $hex;
or
my $hex = "41424320202020200A200B000C";
my $bytes = pack("H*", $hex);
$bytes =~ s/[^\t\x20-\x7E]//g;
$hex = unpack("H*", $bytes);
say $hex;
Solutions using /a and /r require Perl 5.14+.
The above starts with the following string:
41424320202020200A200B000C
It is converted into the following using pack:
ABC␠␠␠␠␠␊␠␋␀␌
The substitution removes all non-ASCII and all non-printable characters except TAB, leaving us with the following:
ABC␠␠␠␠␠␠
It is converted into the following using unpack:
414243202020202020
This solution is not only shorter than the previous solutions, it is also faster because it allocates far fewer variables and only starts the regex match once.
detect the control chars that could break the string when printed.. i.e remove ascii null \x00, \n, \r, \f and \x80 to \xFF, etc..
Building on Hakon's answer (Which only strips out nul bytes, not all the other ones):
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
my $x="41424320202020200A200B000C";
say $x;
say grep { chr(hex($_)) =~ /[[:print:]\t]/ && hex($_) < 128 } unpack("(A2)*", $x);
gives you
41424320202020200A200B000C
414243202020202020
The character class [:print:] inside a character set matches all printable characters including space (but not control characters like newline and linefeed), and I added in tab as well. Then it also checks to make sure the byte is in the ASCII range (Since higher characters are still printable in many locales).
It is possible to work directly with the hex form of the characters, but it's far more complicated. I recommend against using this approach. This answer serves to illustrate why this solution wasn't proposed.
You wish to exclude all characters except the following:
ASCII printables (2016 to 7E16)
TAB (0916)
That means you wish to exclude the following characters:
0016 to 0816
0A16 to 1F16
7F16 to FF16
If we group these by leading digits, we get
0016 to 0816, 0A16 to 0F16
1016 to 1F16
7F16
8016 to FF16
We can therefore use the following:
$hex =~ s/\G(?:..)*?\K(?:0[0-8A-Fa-f]|7F|[189A-Fa-f].)//sg; # 5.10+
$hex =~ s/\G((?:..)*?)(?:0[0-8A-Fa-f]|7F|[189A-Fa-f].)/$1/sg; # Slower
You can try split the string into 2 bytes substrings using unpack:
my $x="41424320202020200A200B000C";
say $x;
say join '', grep { $_ !~ /00/} unpack "(A2)*", $x;
Output:
41424320202020200A200B000C
41424320202020200A200B0C

perltidy indentation on method calls with or operator

I am attempting to get perltidy to indent correctly. It works almost perfectly, but there are issues with some lines of code.
For example:
$foo = something()
or Foo->throw(
'a string which is longer than -l line length. Gets wrapped to next line, but not indented further than line above'
);
which should be:
$foo = something()
or Foo->throw(
'a string which is longer than -l line length. Gets wrapped to next line, but not indented further than line above'
);
Also, if a line break already exists, it does not get the indentation right:
$foo = something()
or Foo->throw(
'string'
);
should be:
$foo = something()
or Foo->throw(
'string'
);
Funnily enough, it gets it right if the function call contains a hashref...
The perltidyrc:
# Line
-l=78 # Max line width is 78 cols
-ole=unix # Unix line endings
# Indentation
-i=4 # Indent level is 4 cols
-ci=4 # Continuation indent is 4 cols
-dt=4 # Default tab size is 4 cols
-noll # Don't outdent long quoted strings or lines
# Comments
-iscl # Ignore inline comment (side comments) length
# Blank lines
-blbs=1 # Ensure a blank line before methods
-bbb # Ensure a blank line before blocks
-mbl=1 # Maximum consecutive blank lines
# Braces/parens/brackets
-nbl # Opening braces on same line (incl. methods)
-pt=0 # Low parenthesis tightness
-sbt=0 # Low square bracket tightness
-bt=0 # Low brace tightness
-bbt=0 # Low block brace tightness
# Semicolons
-nsfs # No space for semicolons within for loops
-nsts # No space before terminating semicolons
# Spaces / Tightness
-baao # Break after all operators
-bbao # Break before all operators
-cti=0 # No extra indentation for closing brackets
# General perltidy settings
-conv # Use as many iterations as necessary to beautify, until successive runs produce identical output (converge)
-b # Backup files and modify in-place
-se # Errors to STDERR
I've gone back and forth a lot with varying degrees of success, but not managed to get it exactly right. Any pointers?

Add a new line in a variable using perl

I am trying to add a new line in a variable after certain number of words. For example: If we have a variable:
$x = "This a variable, start a new line here, This is a new line.";
If I print the above variable
print $x;
I should get the below output:
This is a variable,
start a new line here,
This is a new line.
How can I achieve this in Perl from the variable itself?
I do not agree to the formula "after certain number of words".
Note that the first target line has 4 words, whereas remaining 2 have
5 words each.
Actually you need to replace each comma and following sequence of
spaces (if any) with a comma and \n.
So the intuitive way to do it is:
$x =~ s/,\s*/,\n/g;
The simplest way is to split the string on comma followed by a space and then
join the word groups with a comma followed by a newline.
my $x = "This a variable, start a new line here, This is a new line.";
print join(",\n", split /, /, $x) . "\n";
output
This a variable,
start a new line here,
This is a new line.
For solving the general, how do I reformat this string with line breaks after n-columns? problem, use the Text::Wrap library (as suggested by #ikegami):
use Text::Wrap;
my $x = "The quick brown fox jumped over the lazy dog.";
$Text::Wrap::columns = 15;
# wrap() needs an array of words
my #words = split /\s+/, $x;
# Initial tab, subsequent tab values set to '' (think indent amount)
print wrap('', '', #words) . "\n";
output
The quick
brown fox
jumped over
the lazy dog.
You probably want to use regular expressions. You can do this:
$x =~ s/^(\S+\s+){3}\K/\n/;
Or if this is about the commas and not the spaces:
$x =~ s/^([^,]+,+){2}\s*\K/\n/;
(in this case I also remove any potential space that would be after the comma)
You can also configure separately how many words or comma you want, by putting this in a variable:
my $nbwords = 7; # add a line after the 7th word
$x =~ s/^(\S+\s+){$nbwords}\K/\n/;
Now, that would keep the last space so you may want to do this:
my $nbwords = 7; # add a line after the 7th word
$nbwords--; # becomes 6 because there is another word after that we match as well
$x =~ s/^(\S+\s+){$nbwords}\S+\K\s+/\n/;
You should probably learn to use Regexps but just to explain the above:
\s is any space character (like space, tab, line feed, etc)
\S (uppercase) is any character except a space character
+ means any number of characters of that type described with what is before. So \s+ means any number of consecutive space characters.
{123} means 123 times that type of character ...
{3,80} means 3 to 80 times. So + is equivalent to {1,} (one to unlimited)
\K means that whatever is before will not be replaced, only what is after.

Perltidy autoformat hashref as parameter

I have the following code snippet:
my $obj = $class->new({
schema => $schema,
reminder => $reminder,
action => $action,
dt => $dt,
});
My problem is, that perltidy tries to format it into something, like this:
my $obj = $class->new(
{ schema => $schema,
reminder => $reminder,
action => $action,
dt => $dt,
}
);
I don't like the curly brace placement. Can I somehow configure perltidy to format it like the first example? (Skipping the formatting for the block is not an option. I want to format every longer hashref into that format, so it is more compact and readable)
My perltidyrc so far:
-l=79 # Max line width is 78 cols
-i=4 # Indent level is 4 cols
-ci=4 # Continuation indent is 4 cols
-st # Output to STDOUT
-se # Errors to STDERR
-vt=2 # Maximal vertical tightness
-cti=0 # No extra indentation for closing brackets
-pt=1 # Medium parenthesis tightness
-bt=1 # Medium brace tightness
-sbt=1 # Medium square bracket tightness
-bbt=1 # Medium block brace tightness
-nsfs # No space before semicolons
-nolq # Don't outdent long quoted strings
If I remove the '{}' and pass the parameters as a list, it does the right thing btw. But i have to pass a hashref.
Or could you recommend a sane way of formatting such code?
How about this option?
perltidy -lp -vt=2 -vtc=1
which yields
my $obj = $class->new( { schema => $schema,
reminder => $reminder,
action => $action,
dt => $dt,
} );
which is here http://perltidy.sourceforge.net/perltidy.html#line_break_control
Closing tokens (except for block braces) are controlled by -vtc=n, or
--vertical-tightness-closing=n, where
-vtc=0 always break a line before a closing token (default), -vtc=1
do not break before a closing token which is followed
by a semicolon or another closing token, and is not in
a list environment. -vtc=2 never break before a closing token.
EDIT
I suspect you were missing the -lp (line up parameters) option which is also needed for vertical tightness (-vt and -vtc)
The following seems to solve the above problem and works for me:
# perltidy configuration file created Thu Sep 24 15:54:07 2015
# using: -
# I/O control
--standard-error-output # -se
--nostandard-output # -nst
# Basic formatting options
--indent-columns=4 # -i=4 [=default]
--maximum-line-length=140 # -l=140
# Code indentation control
--closing-brace-indentation=0 # -cbi=0 [=default]
--closing-paren-indentation=0 # -cpi=0 [=default]
--closing-square-bracket-indentation=0 # -csbi=0 [=default]
--continuation-indentation=4 # -ci=4
--nooutdent-labels # -nola
--nooutdent-long-quotes # -nolq
# Whitespace control
--block-brace-tightness=1 # -bbt=1
--brace-tightness=1 # -bt=1 [=default]
--paren-tightness=2 # -pt=2
--nospace-for-semicolon # -nsfs
--square-bracket-tightness=1 # -sbt=1 [=default]
--square-bracket-vertical-tightness=0 # -sbvt=0 [=default]
# Comment controls
--ignore-side-comment-lengths # -iscl
--minimum-space-to-comment=2 # -msc=2
--static-side-comment-prefix="#" # -sscp="#"
--static-side-comments # -ssc
# Linebreak controls
--brace-vertical-tightness=0 # -bvt=0 [=default]
--paren-vertical-tightness=0 # -pvt=0 [=default]
--stack-closing-hash-brace # -schb
--stack-closing-paren # -scp
--stack-closing-square-bracket # -scsb
--stack-opening-hash-brace # -sohb
--stack-opening-paren # -sop
--stack-opening-square-bracket # -sosb
--want-break-before="% + - * / x != == >= <= =~ < > | & **= += *= &= <<= &&= -= /= |= + >>= ||= .= %= ^= x=" # -wbb="% + - * / x != == >= <= =~ < > | & **= += *= &= <<= &&= -= /= |= + >>= ||= .= %= ^= x="
# Blank line control
--noblanks-before-comments # -nbbc