Can somebody explain this obfuscated Perl regexp script?

This code is taken from the HackBack DIY guide to rob banks by Phineas Fisher. It outputs a long text (The Sixth Declaration of the Lacandon Jungle). Where does it fetch it? I don't see any alphanumeric characters at all. What is going on here? And what does the -r switch do? It seems undocumented.
perl -Mre=eval <<\EOF
''
=~(
'(?'
.'{'.(
'`'|'%'
).("\["^
'-').('`'|
'!').("\`"|
',').'"(\\$'
.':=`'.(('`')|
'#').('['^'.').
('['^')').("\`"|
',').('{'^'[').'-'.('['^'(').('{'^'[').('`'|'(').('['^'/').('['^'/').(
'['^'+').('['^'(').'://'.('`'|'%').('`'|'.').('`'|',').('`'|'!').("\`"|
'#').('`'|'%').('['^'!').('`'|'!').('['^'+').('`'|'!').('['^"\/").(
'`'|')').('['^'(').('['^'/').('`'|'!').'.'.('`'|'%').('['^'!')
.('`'|',').('`'|'.').'.'.('`'|'/').('['^')').('`'|"\'").
'.'.('`'|'-').('['^'#').'/'.('['^'(').('`'|('$')).(
'['^'(').('`'|',').'-'.('`'|'%').('['^('(')).
'/`)=~'.('['^'(').'|</'.('['^'+').'>|\\'
.'\\'.('`'|'.').'|'.('`'|"'").';'.
'\\$:=~'.('['^'(').'/<.*?>//'
.('`'|"'").';'.('['^'+').('['^
')').('`'|')').('`'|'.').(('[')^
'/').('{'^'[').'\\$:=~/('.(('{')^
'(').('`'^'%').('{'^'#').('{'^'/')
.('`'^'!').'.*?'.('`'^'-').('`'|'%')
.('['^'#').("\`"| ')').('`'|'#').(
'`'|'!').('`'| '.').('`'|'/')
.'..)/'.('[' ^'(').'"})')
;$:="\."^ '~';$~='#'
|'(';$^= ')'^'[';
$/='`' |'.';
$,= '('
EOF

The basic idea of the code you posted is that each alphanumeric character has been replaced by a bitwise operation between two non-alphanumeric characters. For instance,
'`'|'%'
(the 5th line of the "star" in your code) is a bitwise OR between the backquote and the percent sign, whose code points are 96 and 37 respectively. 96 | 37 is 101, which is the code point of the letter "e". The following lines all print the same thing:
use feature 'say';
say '`' | '%';
say chr( ord('`' | '%') );
say chr( ord('`') | ord('%') );
say chr( 96 | 37 );
say chr( 101 );
say "e";
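To check the arithmetic, here is a small sketch that prints each operand in binary and then rebuilds the first word hidden in the code. The four character pairs are copied from the start of the obfuscated code ('`'|'%', '['^'-', '`'|'!', '`'|','); the variable names are only illustrative.

```perl
use strict;
use warnings;

# Show the bit patterns behind '`' | '%'
for my $ch ('`', '%', '`' | '%') {
    printf "%-3s %3d %08b\n", $ch, ord $ch, ord $ch;
}

# The first four character pairs of the obfuscated code, decoded
my $word = ('`' | '%') . ('[' ^ '-') . ('`' | '!') . ('`' | ',');
print "$word\n";
```

It prints eval, the first word of the hidden code: OR sets bits, XOR flips them, and the two non-alphanumeric operands are chosen so the result lands on a letter.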
Your code starts with (ignoring whitespace, which doesn't matter here):
'' =~ (
The corresponding closing parenthesis is 28 lines later:
^'(').'"})')
(Search for this pattern in the code above to find it; I used my editor's matching-parenthesis highlighting to locate it.)
We can assign everything between the opening and closing parentheses to a variable and then print it:
$x = '(?'
.'{'.(
'`'|'%'
).("\["^
'-').('`'|
'!').("\`"|
',').'"(\\$'
.':=`'.(('`')|
'#').('['^'.').
('['^')').("\`"|
',').('{'^'[').'-'.('['^'(').('{'^'[').('`'|'(').('['^'/').('['^'/').(
'['^'+').('['^'(').'://'.('`'|'%').('`'|'.').('`'|',').('`'|'!').("\`"|
'#').('`'|'%').('['^'!').('`'|'!').('['^'+').('`'|'!').('['^"\/").(
'`'|')').('['^'(').('['^'/').('`'|'!').'.'.('`'|'%').('['^'!')
.('`'|',').('`'|'.').'.'.('`'|'/').('['^')').('`'|"\'").
'.'.('`'|'-').('['^'#').'/'.('['^'(').('`'|('$')).(
'['^'(').('`'|',').'-'.('`'|'%').('['^('(')).
'/`)=~'.('['^'(').'|</'.('['^'+').'>|\\'
.'\\'.('`'|'.').'|'.('`'|"'").';'.
'\\$:=~'.('['^'(').'/<.*?>//'
.('`'|"'").';'.('['^'+').('['^
')').('`'|')').('`'|'.').(('[')^
'/').('{'^'[').'\\$:=~/('.(('{')^
'(').('`'^'%').('{'^'#').('{'^'/')
.('`'^'!').'.*?'.('`'^'-').('`'|'%')
.('['^'#').("\`"| ')').('`'|'#').(
'`'|'!').('`'| '.').('`'|'/')
.'..)/'.('[' ^'(').'"})';
print $x;
This will print:
(?{eval"(\$:=`curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)=~s|</p>|\\n|g;\$:=~s/<.*?>//g;print \$:=~/(SEXTA.*?Mexicano..)/s"})
The rest of the code is a series of assignments to some variables, probably there only to complete the shape; the end of the star is:
$:="\."^'~';
$~='#'|'(';
$^=')'^'[';
$/='`'|'.';
$,='(';
Which just assigns simple one-character strings to some variables.
Back to the main code:
(?{eval"(\$:=`curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)=~s|</p>|\\n|g;\$:=~s/<.*?>//g;print \$:=~/(SEXTA.*?Mexicano..)/s"})
This code is inside a regex that is matched against an empty string (remember that the code starts with '' =~ (...)). A (?{...}) group inside a regex runs the Perl code between the braces when the regex engine reaches that point of the match. Adding some whitespace, and unwrapping the string passed to eval, gives us:
# fetch a URL from the web using curl, quietly (-s)
($: = `curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)
# replace end of paragraphs with newlines in the HTML fetched
=~ s|</p>|\n|g;
# Remove all HTML tags
$: =~ s/<.*?>//g;
# Print everything between SEXTA and Mexicano (+2 chars)
print $: =~ /(SEXTA.*?Mexicano..)/s
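To experiment with these steps without running curl against the live site, you can feed the same three substitutions a local HTML string. The sample HTML below is made up for illustration; only the regex steps come from the decoded code.

```perl
use strict;
use warnings;

# Made-up HTML standing in for the fetched page
my $html = '<h1>Titulo</h1><p>Intro</p>'
         . '<p>SEXTA DECLARACION de la <b>Selva</b> Lacandona, '
         . 'hermanos del pueblo Mexicano.</p><p>Resto</p>';

(my $text = $html) =~ s|</p>|\n|g;                 # end of paragraph -> newline
$text =~ s/<.*?>//g;                                # strip the remaining tags
my ($excerpt) = $text =~ /(SEXTA.*?Mexicano..)/s;   # SEXTA ... Mexicano + 2 chars
print $excerpt;
```

Here the two characters after Mexicano are the period and the newline that replaced </p>; the /s modifier is what lets . match that newline.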
You can automate this deobfuscation with B::Deparse. Running
perl -MO=Deparse yourcode.pl
will produce something like:
'' =~ m[(?{eval"(\$:=`curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)=~s|</p>|\\n|g;\$:=~s/<.*?>//g;print \$:=~/(SEXTA.*?Mexicano..)/s"})];
$: = 'P';
$~ = 'h';
$^ = 'r';
$/ = 'n';
$, = '(';
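As for the "-r switch" in the question: there is no -r. -Mre=eval is -M loading the re pragma with the eval argument, the command-line equivalent of use re 'eval';. It is needed because the pattern here is built at run time from strings, and Perl refuses to run a (?{...}) block that comes from an interpolated pattern unless that pragma is in effect. The sketch below shows both cases; $main::ran is just a flag made up for the demonstration.

```perl
use strict;
use warnings;

our $ran = 0;
# A pattern assembled at run time that contains a code block
my $pattern = '(?{ $main::ran = 1 })';

# Without the pragma, compiling the interpolated code block dies
my $ok = eval { '' =~ /$pattern/; 1 };
my $status = $ok ? "allowed\n" : "refused: $@";
print $status;

{
    use re 'eval';       # what -Mre=eval turns on (lexically scoped)
    '' =~ /$pattern/;    # the code block runs while matching the empty string
}
print "ran = $ran\n";
```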

Related

Add a new line in a variable using perl

I am trying to add a new line in a variable after a certain number of words. For example, if we have a variable:
$x = "This is a variable, start a new line here, This is a new line.";
If I print the above variable
print $x;
I should get the below output:
This is a variable,
start a new line here,
This is a new line.
How can I achieve this in Perl from the variable itself?
I don't agree with the formulation "after a certain number of words".
Note that the first target line has 4 words, whereas the remaining 2 have
5 words each.
What you actually need is to replace each comma and the following sequence of
spaces (if any) with a comma and \n.
So the intuitive way to do it is:
$x =~ s/,\s*/,\n/g;
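A quick self-contained check of that substitution on the example sentence:

```perl
use strict;
use warnings;

my $x = "This is a variable, start a new line here, This is a new line.";
# Replace each comma and any following whitespace with a comma plus newline
$x =~ s/,\s*/,\n/g;
print "$x\n";
```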
The simplest way is to split the string on comma followed by a space and then
join the word groups with a comma followed by a newline.
my $x = "This a variable, start a new line here, This is a new line.";
print join(",\n", split /, /, $x) . "\n";
output
This a variable,
start a new line here,
This is a new line.
For the general problem of "how do I reformat this string with line breaks after n columns?", use the Text::Wrap module (as suggested by @ikegami):
use Text::Wrap;
my $x = "The quick brown fox jumped over the lazy dog.";
$Text::Wrap::columns = 15;
# wrap() takes a list of text fragments; here, the individual words
my @words = split /\s+/, $x;
# Initial and subsequent tab (indent) values are both set to ''
print wrap('', '', @words) . "\n";
output
The quick
brown fox
jumped over
the lazy dog.
You probably want to use regular expressions. You can do this:
$x =~ s/^(\S+\s+){3}\K/\n/;
Or if this is about the commas and not the spaces:
$x =~ s/^(?:[^,]+,){2}\K\s*/\n/;
(in this case the \s* after \K also removes any spaces that follow the comma)
You can also configure separately how many words or comma you want, by putting this in a variable:
my $nbwords = 7; # add a line after the 7th word
$x =~ s/^(\S+\s+){$nbwords}\K/\n/;
Now, that would keep the last space so you may want to do this:
my $nbwords = 7; # add a line after the 7th word
$nbwords--; # becomes 6 because there is another word after that we match as well
$x =~ s/^(\S+\s+){$nbwords}\S+\K\s+/\n/;
You should probably learn regular expressions, but to briefly explain the above:
\s is any whitespace character (space, tab, line feed, etc.)
\S (uppercase) is any character except a whitespace character
+ means one or more of whatever precedes it. So \s+ means one or more consecutive whitespace characters.
{123} means exactly 123 repetitions of whatever precedes it.
{3,80} means 3 to 80 repetitions. So + is equivalent to {1,} (one to unlimited).
\K means that whatever was matched before it is kept (not replaced); only what is matched after it is replaced.
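Putting those pieces together, here is a small sketch that wraps the last pattern in a helper. break_after_words is a hypothetical name, and the sketch assumes $n is at least 1; strings with fewer than $n + 1 words are returned unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the whitespace after the $n-th word with a newline
sub break_after_words {
    my ($text, $n) = @_;
    my $before = $n - 1;    # whole words (plus spaces) matched before the n-th one
    $text =~ s/^(?:\S+\s+){$before}\S+\K\s+/\n/;
    return $text;
}

print break_after_words("The quick brown fox jumped", 3), "\n";
```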

Perl - Convert integer to text Char(1,2,3,4,5,6)

I am after some help trying to convert the following log I have to plain text.
This is a URL, so there may be %20 (a space) and other escapes, but the main bit I am trying to convert is the char(1,2,3,4,5,6) to text.
Below is an example of what I am trying to convert.
select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)
What I have tried so far is the following, trying to capture what is inside char(...) and convert it with chr($2):
perl -pe "s/(char())/chr($2)/ge"
All this has managed to do is remove the char; now I am trying to convert the numbers to text and remove the commas and parentheses.
I may be way off with how I am doing this, as I am fairly new to Perl.
perl -pe "s/word to remove/word to change it to/ge"
"s/(char(what goes in here))/chr($2)/ge"
The output I am trying to achieve is:
select -x1-Q-,-x2-Q-,-x3-Q-
Or
select%20-x1-Q-,-x2-Q-,-x3-Q-
Thanks for any help
There's too much to do here for a reasonable one-liner. Also, a script is easier to adjust later.
use warnings;
use strict;
use feature 'say';
use URI::Escape 'uri_unescape';
my $string = q{select%20}
. q{char(45,120,49,45,81,45),char(45,120,50,45,81,45),}
. q{char(45,120,51,45,81,45)};
my $new_string = uri_unescape($string); # convert %20 and such
my @parts = $new_string =~ /(.*?)(char.*)/;
# split into the individual char(...) groups, then turn each group's numbers into characters
$parts[1] = join ',', map { join '', map { chr($_) } /([0-9]+)/g } split /\),/, $parts[1];
$new_string = join '', @parts;
say $new_string;
this prints
select -x1-Q-,-x2-Q-,-x3-Q-
Comments
Module URI::Escape is used to convert percent-encoded characters, per RFC 3986
It is unspecified whether anything can follow the part with char(...)s, and what that might be. If there can be more after the last char(...), adjust the splitting into @parts, or clarify this in the question.
In the part with char(...)s only the numbers are needed, which is what the regex in map extracts.
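If URI::Escape is not available, a rough core-only stand-in for the %XX decoding step is a single substitution. This sketch only handles %XX hex escapes (it does not, for example, turn + into a space), and percent_decode is a hypothetical name, not part of URI::Escape.

```perl
use strict;
use warnings;

# Hypothetical stand-in for uri_unescape: decode %XX hex escapes only
sub percent_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/eg;
    return $s;
}

print percent_decode('select%20char(45,120)'), "\n";
```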
If you are going to use regex you should read up on it. See
perlretut, a tutorial
perlrequick, a quick-start introduction
perlre, the full account of syntax
perlreref, a quick reference (its See Also section is useful on its own)
Alright, this is going to be a messy "one-liner". It assumes your text is in a variable called $text:
$text =~ s{char\( ( (?: (?:\d+,)* \d+ )? ) \)}{
my @arr = split /,/, $1;
my $temp = join('', map { chr($_) } @arr);
$temp =~ s/^|$/"/g;
$temp
}xeg;
The regular expression matches char(, followed by a comma-separated list of sequences of digits, followed by ). We capture the digits in capture group $1. In the substitution, we split $1 on commas (since chr converts a single number, not a whole list). Then we map chr over each number and concatenate the results into a string. The next line simply puts quotation marks at the start and end of the string (presumably you want the output quoted) and then returns the new string.
Input:
select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)
Output:
select%20"-x1-Q-","-x2-Q-","-x3-Q-"
If you want to replace the % escape sequences as well, I suggest doing that in a separate line. Trying to integrate both substitutions into one statement is going to get very hairy.
This will do as you ask. It performs the decoding in two stages: first the URI encoding is decoded using chr hex $1, and then each char() call is translated to the string formed by the characters corresponding to its decimal parameters.
use strict;
use warnings 'all';
use feature 'say';
my $s = 'select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)';
$s =~ s/%([0-9A-Fa-f]{2})/ chr hex $1 /eg;
$s =~ s{ char \s* \( ( [^()]+ ) \) }{ join '', map chr, $1 =~ /\d+/g }xge;
say $s;
output
select -x1-Q-,-x2-Q-,-x3-Q-

Append string in the beginning and the end of a line containing certain string

I want to know how to add a string at the beginning and the end of a line containing a certain string, using Perl.
So for example, my line contains:
%abc %efd;
and I want to append 123 at the beginning of the line and 456 at the end of the line, so it would look like this:
123 %abc %efd 456
8/30/16 UPDATE--------------------------------
So far I have done something like this:
foreach file (find . -type f)
perl -ne 's/^\%abc\s+(\S*)/**\%abc $1/; print;' $file > tmp; mv tmp $file
end
foreach file (find . -type f)
perl -ne 's/$\%def\;\s+(\S*)/\%def\;**\n $1/; print;' $file > tmp; mv tmp $file
end
So this works pretty well, except when abc and def are not on the same line.
for example:
%abc
something something something
%def
this would turn out to be
%abc
something something something
%def;
which is not what I want.
Thank you
In your case, you want to add strings to a line of the file when it matches a certain string, which means match and replace.
First, read each line of your input file.
Second, check whether it matches the string you want to wrap.
Then replace the matched string with a new string made of the beginning string, the matched string and the end string.
use strict;
use warnings;
my $input_file = 'your file name here';
my $search_string = '%abc %efd';
my $add_begin = '123';
my $add_end = '456';
# Read file
open(my $IN, '<', $input_file) or die "cannot open file $input_file: $!";
# Check each line of file
while (my $row = <$IN>) {
chomp $row;
# \Q...\E matches the search string literally, even if it contains regex metacharacters
$row =~ s/^(\Q$search_string\E)$/$add_begin $1 $add_end/;
print $row."\n";
}
close $IN;
Try with input file as below:
%abc %efd
asdahsd
234234
%abc
%efd
%abc%efd
You will get the expected result:
123 %abc %efd 456
asdahsd
234234
%abc
%efd
%abc%efd
Modify the code to fit your requirements, and let me know if there are any issues.
Use the /m modifier so that ^ and $ match at the beginning and end of each line:
s/^\%abc/123 $&/mg;
s/\%def$/$& 456/mg;
Used together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string. (from perlre)
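A small self-contained check of the /m approach on a multi-line string, using the example text from the question. This sketch uses capture groups instead of $&, which is only a stylistic choice.

```perl
use strict;
use warnings;

my $text = "%abc\nsomething something something\n%def\n";
# /m makes ^ and $ match at the start and end of every line, not just the string
$text =~ s/^(%abc)/123 $1/mg;
$text =~ s/(%def)$/$1 456/mg;
print $text;
```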
Welcome to StackOverflow. We strive to help people solve problems in their existing code and learn languages, rather than simply answer one-off questions, the solutions to which can be easily found in 101 tutorials and documentation. The type of question you've posted doesn't leave a lot of room for learning, and doesn't do much to help future learners. It would help us greatly if you could post a more complete example, including what you've tried so far to get it working.
All that being said, there are two main ways to prepend and append to a string in Perl: 1. the concatenation operator, . and 2. string interpolation.
Concatenation
Use a . to join two strings together. You can chain operations together to compose a longer string.
my $str = '%abc %efd';
$str = '123 ' . $str . ' 456';
say $str; # prints "123 %abc %efd 456" with a trailing newline
Interpolation
Enclose a string in double quotes to instruct Perl to interpolate (i.e. find and evaluate) any Perl-style variables enclosed within the string.
my $str = '%abc %efd';
$str = "123 $str 456";
say $str; # prints "123 %abc %efd 456" with a trailing newline
You'll notice that in both examples we prepended and appended to the existing string. You can also create new variable(s) to hold the result(s) of these operations. Other methods of manipulating and building strings include the printf and sprintf functions, the substr function, the join function, and regular expressions, all of which you will encounter as you continue learning Perl.
As far as looking to see if a string contains a certain substring before performing the operation, you can use the index function or a regular expression:
if (index($str, '%abc %efd') >= 0) { say 'found it' }
# or...
if ($str =~ /%abc %efd/) { say 'found it' }
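Combining the check with interpolation, a minimal sketch might look like this. wrap_if_contains is a hypothetical helper name; lines that don't contain the substring are returned unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: add $begin and $end around $line if it contains $needle
sub wrap_if_contains {
    my ($line, $needle, $begin, $end) = @_;
    return index($line, $needle) >= 0 ? "$begin $line $end" : $line;
}

for my $line ('%abc %efd', 'something else') {
    print wrap_if_contains($line, '%abc %efd', '123', '456'), "\n";
}
```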
Remember to put use strict; at the top of your Perl scripts and always (at least while you're learning) declare variables with my. If you're having trouble with the say function, you may need to add the statement use feature 'say'; to the top of your script.
You can find an index of excellent Perl tutorials at learn.perl.org. Good luck and have fun!
UPDATE Here is (I believe) a complete answer to your revised question:
find . -type f -exec perl -i.bak -pe's/^(%abc)\s+(\S*)\s+(%def;)$/**$1 $2 $3**/' {} +
This will modify the files in place and create backup files with the extension .bak. Keep in mind that the expression \S* will only match non-whitespace characters; if you need to match strings that contain whitespace, you will need to update this expression (something like .*? might be workable for you).

Unpack and x option in TEMPLATE

I have a row that looks like this (there are no newlines; it is one line, and I replaced the spaces with _ because otherwise they get trimmed):
46S990BZ6BRIG___1381TRANSOCEAN_LTD______________BCALL_FEB00025000__1000000000000000000000000000000000000000B90002132015000000099999900161100000000000000007500111214111414121714100003000_H8817H100015012200005000000000010000000000000000000000009920202020150213__20_________________________________________________OV__0203P
Using Perl's unpack function returns the following:
unpack("x49 A4", $line); # where $line is the above example line
returns: CALL
unpack("x68 A4", $line);
returns: 0122
unpack("x238 A4", $line);
returns: 2015
Apparently, the column numbers do not match the number given after 'x' in the TEMPLATE: x238 does not correspond to column 238 (which holds '0000'); '2015' is at column 251, not 238. The same goes for the other examples.
Please, explain how exactly the numbers given after 'x' in TEMPLATE work.
Thank you
First of all, your data isn't what you said it is. The data you provided produces the result you expected (0000), not 2015:
$ perl -E'
my $line = "46S990BZ6BRIG___1381TRANSOCEAN_LTD______________BCALL_FEB00025000__1000000000000000000000000000000000000000B90002132015000000099999900161100000000000000007500111214111414121714100003000_H8817H100015012200005000000000010000000000000000000000009920202020150213__20_________________________________________________OV__0203P";
$line =~ s/_/ /g;
say unpack("x238 A4", $line);
'
0000
Maybe your actual data contained non-printable characters. Or maybe some of the spaces were actually tabs.
$ perl -E'
$_ = "46S990BZ6BRIG___1381TRANSOCEAN_LTD______________BCALL_FEB00025000__1000000000000000000000000000000000000000B90002132015000000099999900161100000000000000007500111214111414121714100003000_H8817H100015012200005000000000010000000000000000000000009920202020150213__20_________________________________________________OV__0203P";
s/_/ /g;
s/LTD\K\s+/\t\t/; # If the spaces after LTD were tabs
say unpack("x238 A4", $_);
'
2015
If you want to extract the characters at based on how your viewer expands tabs, you will need to expand tabs to spaces in the same fashion before passing the string to unpack.
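To answer the "how does x work" part directly: in an unpack template, xN skips N characters forward, so "x238 A4" reads the four characters starting at zero-based offset 238, much like substr($line, 238, 4) (except that A also strips trailing spaces). The sketch below uses made-up sample strings and the core Text::Tabs module to show how a single tab shifts the offsets you see in a viewer.

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; exports expand() and $tabstop (default 8)

my $s = "abcdefgh";
# x3 skips "abc", then A2 reads the next two characters
print unpack("x3 A2", $s), " ", substr($s, 3, 2), "\n";

# A tab is one character to unpack, but a viewer shows it as several columns
my $tabbed = "ab\tc";
print unpack("x3 A1", $tabbed), "\n";     # "c" is the 4th character
my ($expanded) = expand($tabbed);         # tab padded out to column 8
print unpack("x8 A1", $expanded), "\n";   # after expansion "c" is at offset 8
```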

Perl regex replace first name last name with first name last initial

I want the output of $var below to be "John D":
my $var = "John Doe";
I have tried
$var =~ s/(.+\b.).+\z],'\1.'//g;
Here's a general solution (feel free to swap in \w where I used ., and \s where I used \s+):
my $var = "John Doe";
my ($fname, $linitial) = $var =~ /(.*)\s+(.).*/;
Then you have the values
$fname = 'John';
$linitial = 'D';
and you can do:
print "$fname $linitial";
to get
"John D"
EDIT
Each pair of capturing parentheses also sets a variable ($1 and $2, respectively), which keeps its value until the next successful match, so the whole thing can be shortened a bit as follows:
my $var = "John Doe";
print "$1 $2" if $var =~ /(.*)\s+(.).*/;
To replace the last sequence of non-whitespace characters with just its initial character, you could write this:
use strict;
use warnings;
my $var = "John Doe";
$var =~ s/(\S)\S*\s*$/$1/;
print $var;
output
John D
Assuming your string contains ASCII names, this will work:
$var =~ s/([a-zA-Z]+)\s([a-zA-Z]+)/$1." ".substr($2,0,1)/ge;
$var = "John Doe";
s/^(\w+)\s+(\w)\w*/$1 \u$2/ for $var;
A simple regex that solves this problem is the substitution
s/^\w+\s+\K(\w).*/\U$1/s
What does this do?
^ \w+ \s+ matches a word at the beginning of the string, plus whitespace towards the next word
\K is the keep escape. It keeps the currently matched part outside of that substring that is considered “matched” by the regex engine. This avoids an extra capture group, and is practically a look-behind.
(\w) matches and captures one “word” character. This is the leading character of the second word in the string.
.* matches the rest of the string. I do this to overwrite any other names that may come: you stated that Lester del Ray should be transformed to Lester D, not Lester D Ray as a solution with \w* instead of the .* part would have done. The /s modifier is relevant for this, as it enables . to match every character including newlines (who knows what's inside the string?).
The replacement uses the \U escape to uppercase the rest of the replacement string, which here is just the captured character.
Test:
$ perl -E'$_ = shift; s/^\w+\s+\K(\w).*/\U$1/s; say' "Lester del Ray"
Lester D
$ perl -E'$_ = shift; s/^\w+\s+\K(\w).*/\U$1/s; say' "John Doe"
John D
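To apply the same substitution to several names without modifying the originals, here is a sketch using the /r (non-destructive) modifier, which needs Perl 5.14 or later. last_initial is a hypothetical name; a single-word name comes back unchanged because the pattern does not match.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first word, reduce the rest to one uppercase initial
sub last_initial {
    my ($name) = @_;
    return $name =~ s/^\w+\s+\K(\w).*/\U$1/sr;
}

print last_initial($_), "\n" for "John Doe", "Lester del Ray";
```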
Something like this might be a little more usable/reusable in the long run.
my $initial = sub { return substr shift, 0, 1 };
This makes a "get initial" function.
$var =~ s/(\w+)\s+(\w+)/$1 . ' ' . $initial->($2)/e;
Then call it on the second word from the replacement side of the substitution, using the /e modifier to evaluate the replacement as Perl code.