How does this obfuscated Perl code work? - perl

How does this code work at all?
#!/usr/bin/perl
$i=4;$|=#f=map{("!"x$i++)."K$_^\x{0e}"}
"BQI!\\","BQI\\","BQI","BQ","B","";push
#f,reverse#f[1..5];#f=map{join"",undef,
map{chr(ord()-1)}split""}#f;{;$f=shift#
f;print$f;push#f,$f;select undef,undef,
undef,.25;redo;last;exit;print or die;}

Lets first put this through perltidy
$i = 5;
$| = #f = map { ("!" x $i++) . "9$_*\x{0e}" } ">>>E!)", ">>>E)", ">>>E", ">>>", ">>", ">", "";
push #f, reverse #f[ 1..5 ];
#f = map {
join "",
map { chr(ord() - 1) }
split //
} #f;
{
$f = shift #f;
print $f;
push #f, $f;
select undef, undef, undef, .25;
redo;
last;
exit;
print or die;
}
The first line is obvious.
The second line makes a list ">>>E!)", ">>>E)", ">>>E", ">>>", ">>", ">", "", and spaces them all to be equally long and appends an asterisk and a 'Shift Out' (the character after a carriage return).
The third line appends items 5 to 1 (in that order) to that list, , so it will be ">>>E!)", ">>>E)", ">>>E", ">>>", ">>", ">", "", ">", ">>", ">>>", ">>>E".
The map decrements the all characters by one, thus creating elements like 8===D ().
The second loop simply prints the elements in the list in a loop every 0.25 seconds. The carriage return causes them to overwrite each other, so that an animation is seen. The last couple of lines are never reached and thus bogus.

Data from the file is loaded into a program called a Perl interpreter. The interpreter parses the code and converts it to a series of "opcodes" -- a bytecode language that is sort of halfway between Perl code and the machine language that the code is running on. If there were no errors in the conversion process (called "compiling"), then the code is executed by another part of the Perl interpreter. During execution, the program may change various states of the machine, such as allocating, deallocating, reading, and writing to memory, or using the input/output and other features of the system.
(CW - More hardcore hackers than I are welcome to correct any errors or misconceptions and to add more information)

There's no magic going on here, just obfuscation. Let's take a high-level view. The first thing to notice is that later on, every character in strings is interpreted as if it were the previous character:
[1] map{chr(ord()-1)} ...
Thus, a string like "6qD" will result in "5rC" (the characters before '6', 'q', and 'D', respectively). The main point of interest is the array of strings near the beginning:
[2] ">>>E!)",">>>E)",">>>E",">>>",">>",">",""
This defines a sequence of "masks" that we will substitute later on, into this string:
[3] "9$_*\x{0e}"
They'll get inserted at the $_ point. The string \x{0e} represents a hex control character; notice that \x{0d}, the character just before it, is a carriage return. That's what'll get substituted into [3] when we do [1].
Before the [3] string is assembled, we prepend a number of ! equal to i to each element in [2]. Each successive element gets one more ! than the element before it. Notice that the character whose value is just before ! is a space .
The rest of the script iterates over each of the assembled array elements, which now look more like this:
[4] "!!!!!9>>>E!)\x{0e}", ---> " 8===D ("
"!!!!!!9>>>E)\x{0e}", ---> " 8===D("
"!!!!!!!9>>>E\x{0e}", ---> " 8===D"
"!!!!!!!!9>>>\x{0e}", ---> " 8==="
"!!!!!!!!!9>>\x{0e}", ---> " 8=="
"!!!!!!!!!!9>\x{0e}", ---> " 8="
"!!!!!!!!!!!9\x{0e}", ---> " 8"
Then the reverse operation appends the same elements in reverse, creating a loop.
At this point you should be able to see the pattern emerge that produces the animation. Now it's just a matter of moving through each step in the animation and back again, which is accomplished by the rest of the script. The timestep delay of each step is governed by the select statement:
[5] select undef, undef, undef, 0.25
which tells us to wait 250 milliseconds between each iteration. You can change this if you want to see it speed up or slow down.

Related

delete previous and next lines in perl

I have the following file:
#TWEETY:150:000000000-ACFKE:1:2104:27858:17965
AAATTAGCAAAAAACAATAACAAAACTGGGAAAATGCAATTTAACAACGAAAATTTTCCGAGAACTTGAAAGCGTACGAAAACGATACGCTCC
+
D1FFFB11FDG00EE0FFFA1110FAA1F/ABA0FGHEGDFEEFGDBGGGGFEHBFDDG/FE/EGH1#GF#F0AEEEEFHGGFEFFCEC/>EE
#TWEETY:150:000000000-ACFKE:1:1105:22044:20029
AAAAAATATTAAAACTACGAATGCATAAATTATTTCGTTCGAAATAAACTCACACTCGTAACATTGAACTACGCGCTCC
+
CCFDDDFGGGGGGGGGGHGGHHHHGHHHHHHHHHHHHHHHGHHGHHHHHHHHHHHHHGHGHGGHHHHHHGHHEGGGGGG
#TWEETY:150:000000000-ACFKE:1:2113:14793:7182
TATATAAAGCGAGAGTAGAAACTTTTTAATTGACGCGGCGAGAAAGTATATAGCAACAAGCGAGCACCCGCTCC
+
BBFFFFFGGGGFFGGFGHHHHHHHHHHHHHHHHHGGAEEEAFGGGHHFEGHHGHHHHHGHHGGGGFHHGG?EEG
#TWEETY:150:000000000-ACFKE:1:2109:5013:22093
AAAAAAATAATTCATATCGCCATATCGACTGACAGATAATCTATCTATAATCATAACTTTTCCCTCGCTCC
+
DAFAADDGF1EAGG3EG3A00ECGDFFAEGFCHHCAGHBGEAGBFDEDGGHBGHGFGHHFHHHBDG?/FA/
#TWEETY:150:000000000-ACFKE:1:2106:25318:19875
+
CCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
The lines are in groups of four (each time there is a name, starting with #TWEETY, a string of letters, a + character, and another string of letters).
The second and fourth lines should have the same number of characters.
But there are cases where the second line is empty, as in the last four lines.
In these cases, I would like to get rid of the whole block (the previous line before the empty line and the next two lines).
I have just started perl and have been trying to write a script for my problem, but am having a hard time. Does anyone have some feedback?
Thanks!
Keep an array buffer of the last four lines. When it's full, check the second line, print the lines or not, empty the buffer, repeat.
#!/usr/bin/perl
use warnings;
use strict;
my #buffer;
sub output {
print #buffer unless 1 == length $buffer[1];
#buffer = ();
}
while (<>) {
if (4 == #buffer) {
output();
}
push #buffer, $_;
}
output(); # Don't forget to process the last four lines.
Yes. Start with looking at $/ and set it so you can work on a chunk at a time. I would suggest you can treat # as a record separator in your example.
Then iterate your records using a while loop. E.g. while ( <> ) {
Use split on \n to turn the current chunk into an array of lines.
Perform your test on the appropriate lines, and either print - or not - depending on whether it passed.
If you get stuck with that, then I'm sure a specific question including your code and where you're having problems will be well received here.
If you chunk the data correctly, this becomes almost trivial.
#!/usr/bin/perl
use strict;
use warnings;
# Use '#TWEETY' as the record separator to make it
# easy to chunk the data.
local $/ = '#TWEETY';
while (<DATA>) {
# The first entry will be empty (as the separator
# is the first thing in the file). Skip that record.
next unless /\S/;
# Skip any records with two consecutive newlines
# (as they will be the ones with the empty line 2)
next if /\n\n/;
# Print the remaining records
# (with $/ stuck back on the front)
print "$/$_";
}
__DATA__
#TWEETY:150:000000000-ACFKE:1:2104:27858:17965
AAATTAGCAAAAAACAATAACAAAACTGGGAAAATGCAATTTAACAACGAAAATTTTCCGAGAACTTGAAAGCGTACGAAAACGATACGCTCC
+
D1FFFB11FDG00EE0FFFA1110FAA1F/ABA0FGHEGDFEEFGDBGGGGFEHBFDDG/FE/EGH1#GF#F0AEEEEFHGGFEFFCEC/>EE
#TWEETY:150:000000000-ACFKE:1:1105:22044:20029
AAAAAATATTAAAACTACGAATGCATAAATTATTTCGTTCGAAATAAACTCACACTCGTAACATTGAACTACGCGCTCC
+
CCFDDDFGGGGGGGGGGHGGHHHHGHHHHHHHHHHHHHHHGHHGHHHHHHHHHHHHHGHGHGGHHHHHHGHHEGGGGGG
#TWEETY:150:000000000-ACFKE:1:2113:14793:7182
TATATAAAGCGAGAGTAGAAACTTTTTAATTGACGCGGCGAGAAAGTATATAGCAACAAGCGAGCACCCGCTCC
+
BBFFFFFGGGGFFGGFGHHHHHHHHHHHHHHHHHGGAEEEAFGGGHHFEGHHGHHHHHGHHGGGGFHHGG?EEG
#TWEETY:150:000000000-ACFKE:1:2109:5013:22093
AAAAAAATAATTCATATCGCCATATCGACTGACAGATAATCTATCTATAATCATAACTTTTCCCTCGCTCC
+
DAFAADDGF1EAGG3EG3A00ECGDFFAEGFCHHCAGHBGEAGBFDEDGGHBGHGFGHHFHHHBDG?/FA/
#TWEETY:150:000000000-ACFKE:1:2106:25318:19875
+
CCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Thanks everyone for the feedback!
It was all really useful. Thanks to your suggestions, I explored all the options and learned the unless statement.
The easiest solution given my existing code, was just to add an unless statement at the end.
### Write to output, but remove non-desired Gs
open OUT, ">$outfile";
my #accorder = #{$store0{"accorder"}};
foreach my $acc (#accorder){
# retrieve seq(2nd line) and qual(4th line)
my $seq = $store0{$acc}{"seq"};
my $qual = $store0{$acc}{"qual"};
# clean out polyG at end
$seq =~ s/G{3,}.{0,1}$//;
my $lenseq = length($seq);
my $lenqual = length($qual);
my $startqual = $lenqual - $lenseq;
$qual = substr($qual, 0, $lenseq);
#the above was in order to remove multiple G characters at the end of the
#second line, which is what led to empty lines (lines that were made up of
#only Gs got cut out)
# print to output, unless sequence has become empty
unless($lenseq == 0){ #this is the unless statement I added
print OUT "\#$acc\n$seq\n+\n$qual\n";
}
}
close(OUT);

why the square bracket next to another square bracket is considered one single character?

my $str = "[[SELECT * FROM Personnel]]";
print substr($str, 0, 1); #that return [[ not [
print substr($str, 1, 6); #that return SELECT
Take the two square bracket like only one, this is my html return example:
[[SELECT
[[ is the first print. That is wrong must be only one [.
SELECT is the second print
When I run into these problems, I break the problem down and ensure I have ways to see where each part of the output comes from. My process through this particular problem would have gone something like this:
First, get rid of parts of the program until I stop seeing the problem. Once I've done that, I've probably found the point that's causing the problem since it's the bit I just removed. I would have removed the second output statement. Here I use Perl 5.10's say which appends a newline automatically:
#!/usr/bin/perl
use v5.10;
my $str = "[[SELECT * FROM Personnel]]";
say substr($str, 0, 1); #that return [[ not [
If I run that, I see that it's not that output statement that's giving the doubled ].
Now I put the next bit of code back and keep doing that until I see the problem again. I know that I removed the problem and got closer to the problem, but now it's time to zero in on the statement that causes it.
In this case, I have two output statements together that show the problem. I want to see what output comes from each statement. Adding some text around the output and numbering the strings can show me which statement did what:
#!/usr/bin/perl
use v5.10;
my $str = "[[SELECT * FROM Personnel]]";
say "1. <" . substr($str, 0, 1) . ">";
say "2. <" . substr($str, 1, 6) . ">";
Now I see what each say did:
1. <[>
2. <[SELEC>
Your first print statement only prints [, you just can't see it because you are not printing a newline. Your second print statement outputs [SELECT which is expected from your inputs to substr:
my $str = "[[SELECT * FROM Personnel]]";
print substr($str, 0, 1),"\n";
print substr($str, 1, 6),"\n";
output:
[
[SELEC

Correct use of input file in perl?

database.Win.txt is a file that contains a multiple of 3 lines. The second of every three lines is a number. The code is supposed to print out the three lines (in a new order) on one line separated by tabs, but only if the second line is 1.
Am I, by this code, actually getting the loop to create an array with three lines of database.Win.txt each time it runs through the loop? That's my goal, but I suspect this isn't what the code does, since I get an error saying that the int() function expects a numeric value, and doesn't find one.
while(<database.Win.txt>){
$new_entry[0] = <database.Win.txt>;
$new_entry[1] = <database.Win.txt>;
$new_entry[2] = <database.Win.txt>;
if(int($new_entry[1]) == 1) {
chomp($new_entry);
print "$new_entry[1], \t $new_entry[2], \t $new_entry[0], \n"
}
}
I am a total beginner with Perl. Please explain as simply as possible!
I think you've got a good start on the solution. However, your while reads one line right before the next three lines are read (if those were <$file_handles>). int isn't necessary, but chomp is--before you check the value of $new_entry[1] else there's still a record separator at the end.
Given this, consider the following:
use strict;
use warnings;
my #entries;
open my $fh, '<', 'database.Win.txt' or die $!;
while (1) {
last if eof $fh;
chomp( $entries[$_] = <$fh> ) for 0 .. 2;
if ( $entries[1] == 1 ) {
print +( join "\t", #entries ), "\n";
}
}
close $fh;
Always start with use strict; use warnings. Next, open the file using the three-argument form of open. A while (1) is used here, so three lines at a time can be read within the while loop. Since it's an 'infinite' while loop, the last if eof $fh; gives a way out, viz., if the next file read produces an end of file, it's the last. Right below that is a for loop that effectively does what you did: assign a file line to an array position. Note that chomp is used to remove the record separator during the assignment. The last part is also similar to yours, as it checks whether the second of the three lines is 1, and then the line is printed if it is.
Hope this helps!

Skipping particular positions in a string using substitution operator in perl

Yesterday, I got stuck in a perl script. Let me simplify it, suppose there is a string (say ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD), first I've to break it at every position where "E" comes, and secondly, break it specifically where the user wants to be at. But, the condition is, program should not cut at those sites where E is followed by P. For example there are 6 Es in this sequence, so one should get 7 fragments, but as 2 Es are followed by P one will get 5 only fragments in the output.
I need help regarding the second case. Suppose user doesn't wants to cut this sequence at, say 5th and 10th positions of E in the sequence, then what should be the corresponding script to let program skip these two sites only? My script for first case is:
my $otext = 'ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD';
$otext=~ s/([E])/$1=/g; #Main cut rule.
$otext=~ s/=P/P/g;
#output = split( /\=/, $otext);
print "#output";
Please do help!
To split on "E" except where it's followed by "P", you should use Negative look-ahead assertions.
From perldoc perlre "Look-Around Assertions" section:
(?!pattern)
A zero-width negative look-ahead assertion.
For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar".
my $otext = 'ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD';
# E E EP E EP E
my #output=split(/E(?!P)/, $otext);
use Data::Dumper; print Data::Dumper->Dump([\#output]);"
$VAR1 = [
'ABCD',
'ABCD',
'ABCDEPABCD',
'ABCDEPABCD',
'ABCD'
];
Now, in order to NOT cut at occurences #2 and #4, you can do 2 things:
Concoct a really fancy regex that automatically fails to match on given occurence. I will leave that to someone else to attempt in an answer for completeness sake.
Simply stitch together the correct fragments.
I'm too brain-dead to come up with a good idiomatic way of doing it, but the simple and dirty way is either:
my %no_cuts = map { ($_=>1) } (2,4); # Do not cut in positions 2,4
my #output_final;
for(my $i=0; $i < #output; $i++) {
if ($no_cuts{$i}) {
$output_final[-1] .= $output[$i];
} else {
push #output_final, $output[$i];
}
}
print Data::Dumper->Dump([\#output_final];
$VAR1 = [
'ABCD',
'ABCDABCDEPABCD',
'ABCDEPABCDABCD'
];
Or, simpler:
my %no_cuts = map { ($_=>1) } (2,4); # Do not cut in positions 2,4
for(my $i=0; $i < #output; $i++) {
$output[$i-1] .= $output[$i];
$output[$i]=undef; # Make the slot empty
}
my #output_final = grep {$_} #output; # Skip empty slots
print Data::Dumper->Dump([\#output_final];
$VAR1 = [
'ABCD',
'ABCDABCDEPABCD',
'ABCDEPABCDABCD'
];
Here's a dirty trick that exploits two facts:
normal text strings never contain null bytes (if you don't know what a null byte is, you should as a programmer: http://en.wikipedia.org/wiki/Null_character, and nb. it is not the same thing as the number 0 or the character 0).
perl strings can contain null bytes if you put them there, but be careful, as this may screw up some perl internal functions.
The "be careful" is just a point to be aware of. Anyway, the idea is to substitute a null byte at the point where you don't want breaks:
my $s = "ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD";
my #nobreak = (4,9);
foreach (#nobreak) {
substr($s, $_, 1) = "\0";
}
"\0" is an escape sequence representing a null byte like "\t" is a tab. Again: it is not the character 0. I used 4 and 9 because there were E's in those positions. If you print the string now it looks like:
ABCDABCDABCDEPABCDEABCDEPABCDEABCD
Because null bytes don't display, but they are there, and we are going to swap them back out later. First the split:
my #a = split(/E(?!P)/, $s);
Then swap the zero bytes back:
$_ =~ s/\0/E/g foreach (#a);
If you print #a now, you get:
ABCDEABCDEABCDEPABCD
ABCDEPABCD
ABCD
Which is exactly what you want. Note that split removes the delimiter (in this case, the E); if you intended to keep those you can tack them back on again afterward. If the delimiter is from a more dynamic regex it is slightly more complicated, see here:
http://perlmeme.org/howtos/perlfunc/split_function.html
"Example 9. Keeping the delimiter"
If there is some possibility that the #nobreak positions are not E's, then you must also keep track of those when you swap them out to make sure you replace with the correct character again.

How do I build a 2d matrix using STDIN in Perl?

How do I build a 2d matrix using STDIN?
If I input a matrix like so:
1 2 3
4 5 6
7 5 6
7 8 9
4 5 6
3 3 3
how do I input this and create two matrices out of this?
Here's my code so far
while (defined ($a=<STDIN>)) {
chomp ($a);
push #a,($a);
}
This is just for the input.
My understanding is I can just add each row to a stack. When the matrices are all put in I can take each line, break by space to create an array. I then need to create an array reference and push this reference into an array to create my matrix. How the heck do I do this? Is there an easier way? I've been bashing my head on this for 3 days now. I feel pretty damn stupid right now...
Let's make that code you have a little more Perl-y, and we'll do everything you need done in one pass:
my #a = ();
while(<>) {
push #a, [ split ];
}
This is taking a lot out of your answer, so I'll opt to explain it, rather than aiming for John Wayne-like answering reflexes. We'll start with your line here:
while(defined(my $a = <STDIN>))
Perl users know that many loops will implicitly use the $_ variable. If you need lots of nested loops, you should avoid using that variable, and use well-named variables for each level of looping, but in this case we only have one level, so let's go ahead and use it:
while(defined($_ = <STDIN>))
Now, Perl is kind enough to understand that we want to test for defined()ness a lot, so it will allow us to shorten that to this:
while(<STDIN>)
This is implicitly translated by Perl as assigning the line read to $_ and returning true as long as the result is defined (and therefore until end-of-file occurs). However, Perl gives us one more trick:
while(<>)
This will loop over STDIN or, if arguments are given on the command line, it will open those as files and loop over them. So this still reads from STDIN:
./myscript.pl
But we can also read from one or more files:
./myscript.pl myfile [myfile2 [myfile3 ...]]
It's easier and more intuitive than using the shell to do the same (though this will still work):
cat myfile [myfile2 [myfile3 ...]] | ./myscript.pl
If you don't want this behavior, you can change it back to <STDIN>, but consider keeping it.
The loop is:
push #a, [ split ];
First, split() with no arguments is identical to split /\s+/, $_ (i.e. it splits the $_ string on occurrences of whitespace characters), and due to the subtleties of split empty trailing fields are removed, so a chomp() is unnecessary. Then, [] creates an anonymous array reference (which, in this case, contains the contents of our split $_ string). Then, we push that array reference onto #a. Simple as pie, you now have a two-dimensional matrix from your standard input.
Try this:
use strict;
use warnings;
use Data::Dumper;
my #matrix;
while (my $line = <>) {
chomp $line;
my #row = split /\s+/, $line, 3;
push #matrix, \#row;
}
print Dumper(\#matrix);
Instead of using <STDIN> explicitly, you can read from either stdin or a piped file with <>.
Inputting one matrix gives the result:
$VAR1 = [
[
'1',
'2',
'3'
],
[
'4',
'5',
'6'
],
[
'7',
'8',
'9'
]
];
From here you should be able to see what you need to do to read in two matrices.
The other answers seem to be missing the requirement to read multiple matrices from the same input, breaking on a blank line. There are a few different ways to go about this, including frobbing $/, but here's one that appeals to me.
# Read a matrix from a handle, with columns delimited by whitespace
# and rows delimited by newlines. A matrix ends at a blank line
# (which is consumed) or EOF.
sub read_matrix_from {
my ($handle) = #_;
my #out;
while (<$handle>) {
last unless /\S/;
push #out, [ split ];
}
return \#out;
}
my #matrices;
push #matrices, read_matrix_from(\*ARGV) until eof();
Season the last part to taste, of course -- you might be using an explicitly opened handle instead of ARGV magic, and you might know in advance how many things you're reading instead of going to EOF, etc.