Deparsing/Decomposing - step-by-step this obfuscated perl script - perl

As the title - please can anyone explain how the next scripts works
this prints the text: "Perl guys are smart"
''=~('(?{'.('])##^{'^'-[).*[').'"'.('-[)#{:__({:)[{(-:)^}'^'}>[,[]*&[[[[>[[#[[*_').',$/})')
this prints only "b"
use strict;
use warnings;
''=~('(?{'.('_/).+{'^'/]##_[').'"'.('=^'^'_|').',$/})')
the perl -MO=Deparse shows only this:
use warnings;
use strict 'refs';
'' =~ m[(?{print "b",$/})];
but havent any idea why... ;(
What is the recommended way decomposing like scripts? How to start?
so, tried this:
'' =~
(
'(?{'
.
(
'])##^{' ^ '-[).*['
)
.
'"'
.
(
'-[)#{:__({:)[{(-:)^}' ^ '}>[,[]*&[[[[>[[#[[*_'
)
.
',$/})'
)
several parts are concatenated by .. And the result of the bitwise ^ probably gives the text parts. The:
perl -e "print '-[)#{:__({:)[{(-:)^}' ^ '}>[,[]*&[[[[>[[#[[*_'"
prints "Perl guys are smart" and the first ^ generating "print".
But when, i rewrite it to:
'' =~
(
'(?{'
.
(
'print'
)
.
'"'
.
(
'Perl guys are smart'
)
.
',$/})'
)
My perl told me:
panic: top_env
Strange, first time i saw like error message...
Thats mean: it isn't allowed replace the 'str1' ^ 'str2' with the result, (don't understand why) and why the perl prints the panic message?
my perl:
This is perl 5, version 12, subversion 4 (v5.12.4) built for darwin-multi-2level
Ps: examples are generated here

In the line
.('_/).+{' ^ '/]##_[
when you evaluate ']' ^ '-', the result will be the letter p. ^ is a bitwise string operation, so after that we follow letter by letter to get result string.
Check my script, it works like your example. I hope it will help you.
use v5.14;
# actually we obfuscated print and your word + "
# it looks like that (print).'"'.(yor_word")
my $print = 'print';
my $string = 'special for stackoverflow by fxzuz"';
my $left = get_obfuscated($print);
my $right = get_obfuscated($string);
# prepare result regexp
my $result = "'' =~ ('(?{'.($left).'\"'.($right).',\$/})');";
say 'result obfuscated ' . $result;
eval $result;
sub get_obfuscated {
my $string = shift;
my #letters = split //, $string;
# all symbols like :,&? etc (exclude ' and \)
# we use them for obfuscation
my #array = (32..38, 40..47, 58..64, 91, 93..95, 123..126);
my $left_str = '';
my $right_str = '';
# obfuscated letter by letter
for my $letter (#letters) {
my #result;
# get right xor letters
for my $symbol (#array) {
# prepare xor results
my $result = ord $letter ^ $symbol;
push #result, { left => $result, right => $symbol } if $result ~~ #array;
}
my $rand_elem = $result[rand $#result];
$left_str .= chr $rand_elem->{left};
$right_str .= chr $rand_elem->{right};
}
my $obfuscated = "'$left_str' ^ '$right_str'";
say "$string => $obfuscated";
return $obfuscated;
}

The trick to understanding what's going on here is to look at the string being constructed by the XORs and concatenations:
(?{print "Perl guys are smart",$/})
This is an experimental regular expression feature of the form (?{ code }). So what you see printed to the terminal is the result of
print "Perl guys are smart",$/
being invoked by ''=~.... $/ is Perl's input record separator, which by default is a newline.

Related

how to extract the subset from a special character string using perl

I need to get the subset of a string starting from a specific start word and end before the specified word. Store in the string variable.
Example: pre-wrap">test-for??maths/camp
I need to fetch the subset.
Expected output: test-for??maths
After: pre-wrap"> or may be starting with: test
and before: /camp
I have no clue how to achieve this in Perl.
Here is the code I tried. The output is not coming as expected:
#!/usr/bin/perl
use warnings;
use strict;
my $string = 'pre-wrap">test-for??maths/camp';
my $quoted_substring = quotemeta($string);
my ($quoted_substring1) = split('/camp*', $quoted_substring);
my (undef, $substring2) = split('>\s*', $quoted_substring1);
print $string, "\n";
print $substring2, "\n";
Output:
$ perl test.pl
pre-wrap">test-for??maths/camp
test\-for\?\?maths\ # but why this \ is coming
The following code extracts the part between $before and $after (which may contain regex metacharacters, they are treated as pure characters inside the \Q...\E expressions):
my $string = 'pre-wrap">test-for??maths/camp';
my $before = 'pre-wrap">';
my $after = '/camp';
if ($string =~ /\Q$before\E(.*?)\Q$after\E/) {
print $1; # prints 'test-for??maths'
}
pre-wrap">test-for??maths/camp is in 'd',
perl -ne '/((?<=pre-wrap">)|(?<=>)(?=test))\S+(?=\/camp)/ ; print $&' d

Perl printf to use commas as thousands-separator

Using awk, I can print a number with commas as thousands separators.
(with a export LC_ALL=en_US.UTF-8 beforehand).
awk 'BEGIN{printf("%\047d\n", 24500)}'
24,500
I expected the same format to work with Perl, but it does not:
perl -e 'printf("%\047d\n", 24500)'
%'d
The Perl Cookbook offers this solution:
sub commify {
my $text = reverse $_[0];
$text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
return scalar reverse $text;
}
However I am assuming that since the printf option works in awk, it should also work in Perl.
The apostrophe format modifier is a non-standard POSIX extension.
The documentation for Perl's printf has this to say about such extensions
Perl does its own "sprintf" formatting: it emulates the C
function sprintf(3), but doesn't use it except for
floating-point numbers, and even then only standard modifiers
are allowed. Non-standard extensions in your local sprintf(3)
are therefore unavailable from Perl.
The Number::Format module will do this for you, and it takes its default settings from the locale, so is as portable as it can be
use strict;
use warnings 'all';
use v5.10.1;
use Number::Format 'format_number';
say format_number(24500);
output
24,500
A more perl-ish solution:
$a = 12345678; # no comment
$b = reverse $a; # $b = '87654321';
#c = unpack("(A3)*", $b); # $c = ('876', '543', '21');
$d = join ',', #c; # $d = '876,543,21';
$e = reverse $d; # $e = '12,345,678';
print $e;
outputs 12,345,678.
I realize this question was from almost 4 years ago, but since it comes up in searches, I'll add an elegant native Perl solution I came up with. I was originally searching for a way to do it with sprintf, but everything I've found indicates that it can't be done. Then since everyone is rolling their own, I thought I'd give it a go, and this is my solution.
$num = 12345678912345; # however many digits you want
while($num =~ s/(\d+)(\d\d\d)/$1\,$2/){};
print $num;
Results in:
12,345,678,912,345
Explanation:
The Regex does a maximal digit search for all leading digits. The minimum number of digits in a row it'll act on is 4 (1 plus 3). Then it adds a comma between the two. Next loop if there are still 4 digits at the end (before the comma), it'll add another comma and so on until the pattern doesn't match.
If you need something safe for use with more than 3 digits after the decimal, use this modification: (Note: This won't work if your number has no decimal)
while($num =~ s/(\d+)(\d\d\d)([.,])/$1\,$2$3/){};
This will ensure that it will only look for digits that ends in a comma (added on a previous loop) or a decimal.
Most of these answers assume that the format is universal. It isn't. CLDR uses Unicode information to figure it out. There's a long thread in How to properly localize numbers?.
CPAN has the CLDR::Number module:
#!perl
use v5.10;
use CLDR::Number;
use open qw(:std :utf8);
my $locale = $ARGV[0] // 'en';
my #numbers = qw(
123
12345
1234.56
-90120
);
my $cldr = CLDR::Number->new( locale => $locale );
my $decf = $cldr->decimal_formatter;
foreach my $n ( #numbers ) {
say $decf->format($n);
}
Here are a few runs:
$ perl comma.pl
123
12,345
1,234.56
-90,120
$ perl comma.pl es
123
12.345
1234,56
-90.120
$ perl comma.pl bn
১২৩
১২,৩৪৫
১,২৩৪.৫৬
-৯০,১২০
It seems heavyweight, but the output is correct and you don't have to allow the user to change the locale you want to use. However, when it's time to change the locale, you are ready to go. I also prefer this to Number::Format because I can use a locale that's different from my local settings for my terminal or session, or even use multiple locales:
#!perl
use v5.10;
use CLDR::Number;
use open qw(:std :utf8);
my #locales = qw( en pt bn );
my #numbers = qw(
123
12345
1234.56
-90120
);
my #formatters = map {
my $cldr = CLDR::Number->new( locale => $_ );
my $decf = $cldr->decimal_formatter;
[ $_, $cldr, $decf ];
} #locales;
printf "%10s %10s %10s\n" . '=' x 32 . "\n", #locales;
foreach my $n ( #numbers ) {
printf "%10s %10s %10s\n",
map { $_->[-1]->format($n) } #formatters;
}
The output has three locales at once:
en pt bn
================================
123 123 ১২৩
12,345 12.345 ১২,৩৪৫
1,234.56 1.234,56 ১,২৩৪.৫৬
-90,120 -90.120 -৯০,১২০
Here's an elegant Perl solution I've been using for over 20 years :)
1 while $text =~ s/(.*\d)(\d\d\d)/$1\.$2/g;
And if you then want two decimal places:
$text = sprintf("%0.2f", $text);
1 liner: Use a little loop whith a regex:
while ($number =~ s/^(\d+)(\d{3})/$1,$2/) {}
Example:
use strict;
use warnings;
my #numbers = (12321, 12.12, 122222.3334, '1234abc', '1.1', '1222333444555,666.77');
for(#numbers) {
my $number = $_;
while ($number =~ s/^(\d+)(\d{3})/$1,$2/) {}
print "$_ -> $number\n";
}
Output:
12321 -> 12,321
12.12 -> 12.12
122222.3334 -> 122,222.3334
1234abc -> 1,234abc
1.1 -> 1.1
1222333444555,666.77 -> 1,222,333,444,555,666.77
Pattern:
(\d+)(\d{3})
-> Take all numbers but the last 3 in group 1
-> Take the remaining 3 numbers in group2 on the beginning of $number
-> Followed is ignored
Substitution
$1,$2
-> Put a seperator sign (,) between group 1 and 2
-> The rest remains unchanged
So if you have 12345.67 the numers the regex uses are 12345. The '.' and all followed is ignored.
1. run (12345.67):
-> matches: 12345
-> group 1: 12,
group 2: 345
-> substitute 12,345
-> result: 12,345.67
2. run (12,345.67):
-> does not match!
-> while breaks.
Parting from #Laura's answer, I tweaked the pure perl, regex-only solution to work for numbers with decimals too:
while ($formatted_number =~ s/^(-?\d+)(\d{3}(?:,\d{3})*(?:\.\d+)*)$/$1,$2/) {};
Of course this assumes a "," as thousands separator and a "." as decimal separator, but it should be trivial to use variables to account for that for your given locale(s).
I used the following but it does not works as of perl v5.26.1
sub format_int
{
my $num = shift;
return reverse(join(",",unpack("(A3)*", reverse int($num))));
}
The form that worked for me was:
sub format_int
{
my $num = shift;
return scalar reverse(join(",",unpack("(A3)*", reverse int($num))));
}
But to use negative numbers the code must be:
sub format_int
{
if ( $val >= 0 ) {
return scalar reverse join ",", unpack( "(A3)*", reverse int($val) );
} else {
return "-" . scalar reverse join ",", unpack( "(A3)*", reverse int(-$val) );
}
}
Did somebody say Perl?
perl -pe '1while s/(\d+)(\d{3})/$1,$2/'
This works for any integer.
# turning above answer into a function
sub format_float
# returns number with commas..... and 2 digit decimal
# so format_float(12345.667) returns "12,345.67"
{
my $num = shift;
return reverse(join(",",unpack("(A3)*", reverse int($num)))) . sprintf(".%02d",int(100*(.005+($num - int($num)))));
}
sub format_int
# returns number with commas.....
# so format_int(12345.667) returns "12,345"
{
my $num = shift;
return reverse(join(",",unpack("(A3)*", reverse int($num))));
}
I wanted to print numbers it in a currency format. If it turned out even, I still wanted a .00 at the end. I used the previous example (ty) and diddled with it a bit more to get this.
sub format_number {
my $num = shift;
my $result;
my $formatted_num = "";
my #temp_array = ();
my $mantissa = "";
if ( $num =~ /\./ ) {
$num = sprintf("%0.02f",$num);
($num,$mantissa) = split(/\./,$num);
$formatted_num = reverse $num;
#temp_array = unpack("(A3)*" , $formatted_num);
$formatted_num = reverse (join ',', #temp_array);
$result = $formatted_num . '.'. $mantissa;
} else {
$formatted_num = reverse $num;
#temp_array = unpack("(A3)*" , $formatted_num);
$formatted_num = reverse (join ',', #temp_array);
$result = $formatted_num . '.00';
}
return $result;
}
# Example call
# ...
printf("some amount = %s\n",format_number $some_amount);
I didn't have the Number library on my default mac OS X perl, and I didn't want to mess with that version or go off installing my own perl on this machine. I guess I would have used the formatter module otherwise.
I still don't actually like the solution all that much, but it does work.
This is good for money, just keep adding lines if you handle hundreds of millions.
sub commify{
my $var = $_[0];
#print "COMMIFY got $var\n"; #DEBUG
$var =~ s/(^\d{1,3})(\d{3})(\.\d\d)$/$1,$2$3/;
$var =~ s/(^\d{1,3})(\d{3})(\d{3})(\.\d\d)$/$1,$2,$3$4/;
$var =~ s/(^\d{1,3})(\d{3})(\d{3})(\d{3})(\.\d\d)$/$1,$2,$3,$4$5/;
$var =~ s/(^\d{1,3})(\d{3})(\d{3})(\d{3})(\d{3})(\.\d\d)$/$1,$2,$3,$4,$5$6/;
#print "COMMIFY made $var\n"; #DEBUG
return $var;
}
A solution that produces a localized output:
# First part - Localization
my ( $thousands_sep, $decimal_point, $negative_sign );
BEGIN {
my ( $l );
use POSIX qw(locale_h);
$l = localeconv();
$thousands_sep = $l->{ 'thousands_sep' };
$decimal_point = $l->{ 'decimal_point' };
$negative_sign = $l->{ 'negative_sign' };
}
# Second part - Number transformation
sub readable_number {
my $val = shift;
#my $thousands_sep = ".";
#my $decimal_point = ",";
#my $negative_sign = "-";
sub _readable_int {
my $val = shift;
# a pinch of PERL magic
return scalar reverse join $thousands_sep, unpack( "(A3)*", reverse $val );
}
my ( $i, $d, $r );
$i = int( $val );
if ( $val >= 0 ) {
$r = _readable_int( $i );
} else {
$r = $negative_sign . _readable_int( -$i );
}
# If there is decimal part append it to the integer result
if ( $val != $i ) {
( undef, $d ) = ( $val =~ /(\d*)\.(\d*)/ );
$r = $r . $decimal_point . $d;
}
return $r;
}
The first part gets the symbols used in the current locale to be used on the second part.
The BEGIN block is used to calculate the sysmbols only once at the beginning.
If for some reason there is need to not use POSIX locale, one can ommit the first part and uncomment the variables on the second part to hardcode the sysmbols to be used ($thousands_sep, $thousands_sep and $thousands_sep)

stripping off numbers and alphabetics in perl

I have an input variable, say $a. $a can be either number or string or mix of both.
My question is how can I strip off the variable to separate numeric digits and alphabetic characters?
Example;
$a can be 'AB9'
Here I should be able to store 'AB' in one variable and '9' in other.
How can I do that?
Check this version, it works with 1 or more numeric and alphabetic characters in a variable.
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $var = '11a';
my (#digits, #alphabetics);
while ($var =~ /([a-zA-Z]+)/g) {
push #alphabetics, $1;
}
while ($var =~ /(\d+)/g) {
push #digits, $1;
}
print Dumper(\#alphabetics);
print Dumper(\#digits);
Here's one way to express it very shortly:
my ($digits) = $input =~ /(\d+)/;
my ($alpha) = $input =~ /([a-z]+)/i;
say 'digits: ' . ($digits // 'none');
say 'non-digits: ' . ($alpha // 'none');
It's important to use the match operator in list context here, otherwise it would return if the match succeeded.
If you want to get all occurrences in the input string, simply change the scalar variables in list context to proper arrays:
my #digits = $input =~ /(\d+)/g;
my #alpha = $input =~ /([a-z]+)/gi;
say 'digits: ' . join ', ' => #digits;
say 'non-digits: ' . join ', ' => #alpha;
For my $input = '42AB17C', the output is
digits: 42, 17
non-digits: AB, C

Perl: Writing text into new line when a particular character is found

I have a big continuous text with characters like {, },//,; and white spaces in between. I want to read this text and write into new line wherever it finds these characters.
Input text is like :
apple{{mango } guava ; banana; // pear berry;}
Expected formatted output data should be as shown in image
apple
{
{
mango
}
guava ;
banana;
// pear
berry;
}
I want to do this in perl.Thanks in advance.
Of course you will have to adapt this for your needs (most notably loop while reading lines), but here is a way to do it that doesn't (really) rely on regexes. As others have said, this is a starting point, you may adapt to what you need.
#!/usr/bin/perl
use strict;
use warnings;
my $string = 'apple{{mango } guava ; banana; // pear berry;}';
my $new_string = join("\n", grep {/\S/} split(/(\W)/, $string));
print $new_string . "\n";
This splits the line into an array, splitting on non-word characters but keeps the element. Then greps for non-whitespace characters (removing array elements which contain whitespace). Then joins the array elements with newline characters into one string. From what your specification says you need // together, I leave that as an exercise to the reader.
Edit:
After looking at your request again, it looks like to have a specific but complicated structure that you are trying to parse. To do it correctly you may have to use something more powerful like the Regexp::Grammars module. It will take some learning, but you can define a very complicated set of parsing instructions to do exactly whatever you need.
Edit 2:
Since I have been looking for a reason to learn more about Regexp::Grammars, I took this opportunity. This is a basic example that I came up with. It prints the parsed data structure to a file named "log.txt". I know it doesn't look like the structure you asked for, but it contains all of that information and may be reconstituted however you like. I did so with a recursive function that is basically the opposite of the parser.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use Regexp::Grammars;
my $grammar = qr{
<nocontext:>
<Line>
<rule: Line> <[Element]>*
<rule: Element> <Words> | <Block> | <Command> | <Comment>
<rule: Command> <[Words]> ;
<rule: Block> \{ <[Element]>* \}
<rule: Comment> // .*? \s{2,} #/ Syntax Highlighter fix
<rule: Words> (?:\b\w+\b) ** \s
}x;
my $string = 'apple{{mango kiwi } guava ; banana; // pear berry;}';
if ($string =~ $grammar) {
open my $log, ">", "log.txt";
print $log Dumper \%/; #/
print elements($/{Line}{Element});
} else {
die "Did not match";
}
sub elements {
my #elements = #{ shift() };
my $indent = shift || 0;
my $output;
foreach my $element (#elements) {
$output .= "\t" x $indent;
foreach my $key (keys %$element) {
if ($key eq 'Words') {
$output .= $element->{$key} . "\n";
} elsif ($key eq 'Block') {
$output .= "{\n" . elements($element->{$key}->{Element}, $indent + 1) . ("\t" x $indent) . "}\n";
} elsif ($key eq 'Comment') {
$output .= $element->{$key} . "\n";
} elsif ($key eq 'Command') {
$output .= join(" ", #{ $element->{$key}->{Words} }) . ";\n";
} elsif ($key eq 'Element') {
$output .= elements($element->{$key}, $indent + 1);
}
}
}
return $output;
}
Edit 3: In light of the comments from the OP, I have adapted the above example to allow for multiple words on the same line, as of right now those words can only be separated by one space. I also made comments match anything that starts in // and ends in two or more spaces. Also since I was making changes, and since I believe this to be a code pretty-printer, I added tabbing to the block formatter. If this isn't desired it should be easy enough to strip away. Go now and learn Regexp::Grammars and make it fit your specific case. (I know I should have made the OP do even this change, but I am enjoying learning it as well)
Edit 4: One more thing, if in fact you are trying to recover useful code from serialized to a single line code, your only real problem is extracting the line comments and separating them from the useful code (assuming you are using a whitespace ignoring language which it looks as though you are). If that is the case, then perhaps try this variation on my original code:
#!/usr/bin/perl
use strict;
use warnings;
my $string = 'apple{{mango } guava ; banana; // pear berry;}';
my $new_string = join("\n", split(/((?:\/\/).*?\s{2,})/, $string));
print $new_string . "\n";
whose output is
apple{{mango } guava ; banana;
// pear
berry;}
Your specification sucks. Sometimes you want a newline before and after. Sometimes you want a newline after. Sometimes you want a newline before. You have "pear" and "berry" on separate lines, but it does not meet any of the conditions in your spec.
The quality of an answer is directly proportional to the care given in composing the question.
With a careless question, you are likely to get a careless answer.
#!/usr/bin/perl
use warnings;
use strict;
$_ = 'apple{{mango } guava ; banana; // pear berry;}';
s#([{}])#\n$1\n#g; # curlies
s#;#;\n#g; # semicolons
s#//#\n//#g; # double slashes
s#\s\s+#\n#g; # 2 or more whitespace
s#\n\n#\n#g; # no blank lines
print;
Not exactly what you want, but imho for the start will be enough:
echo 'apple{{mango } guava ; banana; // pear berry;}' |\
perl -ple 's/(\b\w+\b)/\n$1\n/g'
will produce:
apple
{{
mango
}
guava
;
banana
; //
pear
berry
;}
You can start improving it...
As you said this is not homework, something like the following comes to mind:
my $keeps = qr#(//\s+\w+)#; #special tokens to keep (e.g., // perl)
my $breaks = qr#(\s+|\[|\]|\{|\})#; #simple tokens to split words at
while ( my $text = <> )
{
#tokens = grep /\S/, split( qr($keeps|$breaks), $text );
print join(".\n.", #tokens ), "\n";
}
You will have to work out the actual rules yourself.

Why aren't newlines being printed in this Perl code?

I have some simple Perl code:
#!/usr/bin/perl
use strict; # not in the OP, recommended
use warnings; # not in the OP, recommended
my $val = 1;
for ( 1 .. 100 ) {
$val = ($val * $val + 1) % 8051;
print ($val / 8050) . " \n";
}
But when I run it, the output is:
bash-3.2$ perl ./rand.pl
0.0002484472049689440.000621118012422360.003229813664596270.08409937888198760.92
... <snipped for brevity> ...
2919250.9284472049689440.3526708074534160.1081987577639750.2295652173913040.1839
751552795030.433540372670807bash-3.2$
Am I doing something wrong?
C:\> perldoc -f print:
Also be careful not to follow the
print keyword with a left parenthesis
unless you want the corresponding
right parenthesis to terminate the
arguments to the print--interpose a +
or put parentheses around all the
arguments.
Therefore, what you need is:
print( ($val / 8050) . "\n" );
or
print +($val / 8050) . "\n";
The statement you have prints the result of $val / 8050 and then concatenates "\n" to the return value of print and then discards the resulting value.
Incidentally, if you:
use warnings;
then perl will tell you:
print (...) interpreted as function at t.pl line 5.
Useless use of concatenation (.) or string in void context at t.pl line 5.
This is more of a comment than an answer, but I don't know how else to make it and the question is already answered anyway.
Note that using say instead of print neatly sidesteps the whole issue. That is,
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
my $val = 1;
for ( 1 .. 100 ) {
$val = ($val * $val + 1) % 8051;
say ($val / 8050);
}
works as intended without the issue even coming up. I'm still amazed at how useful say is, given it's such a tiny difference.
It is possible that the line is interpreted as follows
(print($val / 8050)) . " \n";
i.e. the parentheses being used as delimiters for a function argument list, with the ."\n" being silently discarded. Try:
print( ($val/8050) . "\n" );