Given code like this:
Line1
Line2
..
LineN
I want to automatically produce
Line1
JunkLines2-100
Line2
JunkLines102-200
Line3
etc.
where JunkLines is Perl junk code that does not change the state of the program but looks like a continuation of the legitimate program.
I don't want to obfuscate my code with obfuscators that, say, rename variables to something unreadable: that unreadability is itself a signal that the code is obfuscated.
I don't understand why so many downvotes: this looks like an interesting problem.
I had some spare time and tried something on my own:
Garbage code lives in a separate file, in blocks separated by #---
The obfuscator renames variables and functions in the garbage code if the same name exists in the original good code (I assumed you declare variables with my and functions with sub)
The obfuscator randomly adds content from the garbage file (looping if needed)
It looks like this:
Good code (original.pl)
#!perl
use strict;
my $var1='hello';
my $var2='there';
sub f {
print "This is a function";
++$var1
}
print $var1;
print "\n$var2";
Garbage code (garbage_code.pl)
#!perl
#---
my $var2='__NEW';
my $var3="This is garbage";
#---
sub g {
$var3+="changed";
print "This shall be removed";
return $var3;
}
#---
sub f {
return g(shift).$var2;
}
#---
f("function name shall be changed");
Obfuscator
#!perl
use strict;
use Data::Dumper;
#-Get Original 'good' code
open CODE,"<original.pl" or die $!;
my @code=<CODE>;
close CODE;
#-Get Garbage code
open GARBAGE,"<garbage_code.pl" or die $!;
my @garbage=split(/^#---/m, join("",<GARBAGE>));
shift @garbage; #Remove header
map { s/^.*?(\w.*?)\s*$/$1/s } @garbage; #Trim spaces and EOL at beginning and end
map { s/print .*?;//g } @garbage; #Remove print calls
close GARBAGE;
#-List variables and functions in good code
my %list_var;
my %list_func;
for my $line (@code) {
if ($line=~/my \s*[\$\@%](\w+)/) { $list_var{$1}=undef; }
elsif ($line=~/sub \s*(\w+)/) { $list_func{$1}=undef; }
else { }
}
#-List variables and functions in garbage code
my @list_var_garbage;
my @list_func_garbage;
for my $line (@garbage) {
while ($line=~/my \s*[\$\@%](\w+)/g) { push(@list_var_garbage,$1); }
while ($line=~/sub \s*(\w+)/g) { push(@list_func_garbage,$1); }
}
#-Replace names of variables and functions in garbage code if it exists in good code
#Get equivalent name
for my $type ('var', 'func') {
my $rh_list = ($type eq 'var' ? \%list_var : \%list_func);
my @list_names=(keys %$rh_list, ($type eq 'var' ? @list_var_garbage : @list_func_garbage));
for my $name (@list_names) {
#Get new name
my $new_name=$name;
#For names of good code OR new names in garbage code
if (!defined $rh_list->{$new_name}) {
while (exists $rh_list->{$new_name}) { $new_name.="1"; }
#Store in hash table
$rh_list->{$new_name}=undef; #to ensure uniqueness of replacements
$rh_list->{$name}=$new_name; #Replacement name in garbage code
} else {}
}
}
#Replace
map { s/(?:sub \s*|&)\K(\w+)/$list_func{$1}/g } @garbage;
map { s/(\w+)\(/$list_func{$1}(/g } @garbage;
map { s/[\$\@%]\K(\w+)/$list_var{$1}/g } @garbage;
#-Function to get garbage content
my $i_garbage=0;
sub get_garbage {
return $garbage[ ($i_garbage++) % scalar(@garbage) ]."\n";
}
#-Copy garbage in good code
my @new_code;
for my $line (@code) {
#-Add the line
push(@new_code, $line);
#-Add garbage
#Blocks: add garbage at the end
if ($line=~/\{/ .. $line=~/\}/) {
if ($line=~/\}/) { push(@new_code, get_garbage()); }
#Other: randomly add garbage
} else {
if (int(rand(2))) { push(@new_code, get_garbage()); }
}
}
#Print file with new code
open NEW_CODE, ">new.pl" or die $!;
print NEW_CODE @new_code;
close NEW_CODE;
Result
#!perl
use strict;
my $var21='__NEW';
my $var3="This is garbage";
sub g {
$var3+="changed";
return $var3;
}
my $var1='hello';
sub f1 {
return g(shift).$var21;
}
my $var2='there';
f1("function name shall be changed");
sub f {
print "This is a function";
++$var1
}
my $var21='__NEW';
my $var3="This is garbage";
sub g {
$var3+="changed";
return $var3;
}
print $var1;
print "\n$var2";
sub f1 {
return g(shift).$var21;
}
f1("function name shall be changed");
It probably does not take into account all cases but this is definitely a good working prototype.
Cheers
I'm not sure if you're receiving negative votes from lack of information or what, but this is a relatively easy fix. Assuming you want to do it in Perl:
Just create another Perl file which reads in your file one line at a time, prints it out to a new file, and appends a line of random nonsense afterwards (in this case, taken from a large file, "obfuscatedlines.txt", which is assumed to contain a bunch of random lines of Perl code).
A small example:
use strict;
use warnings;
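my @randomlines;
open (INHANDLE, "<perlfile.pl") or die $!;
open (OBHANDLE, "<obfuscatedlines.txt") or die $!;
open (OUTHANDLE, ">newfile.pl") or die $!;
while(<OBHANDLE>) {push @randomlines, $_;}
die "obfuscatedlines.txt is empty\n" unless @randomlines;
my $i=0;
while(<INHANDLE>)
{
print OUTHANDLE $_;
print OUTHANDLE $randomlines[$i % @randomlines]; #wrap around if the junk file is shorter
$i++;
}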
I would like to read 100 KB from <>, do some testing on that and then put the 100 KB back, so they will be read by <> later.
In metacode:
$data100kb = read(<>,100000);
testing($data100kb);
unget(<>,$data100kb);
while(<>) {
do stuff;
}
I do not know in advance if <> will supply me an actual file, a pipe or a concatenation of actual files. So it should work with:
cat bigfile_a bigfile_b | perl my_program
Assume bigfiles are 1000*RAM size, so copying the input is prohibitively expensive.
It is acceptable if I can only read from STDIN.
Background
The first 100kb tells me how to parse the full input, but the parser needs this input as well.
This seems to work for STDIN. It would be great if it could be done faster.
read(STDIN, $first, 100000);
unget($first);
compute($first);
while($_=get_line()) {
# Similar to while(<>)
}
my @line_cache;
sub get_line {
if(@line_cache) {
my $line = shift @line_cache;
if(@line_cache) {
# not last line
return $line;
} else {
# last line - may be incomplete
if(substr($line, -1, 1) eq $/) {
# Line is complete
return $line;
} else {
return $line. scalar(<STDIN>);
}
}
} else {
return scalar(<STDIN>);
}
}
sub unget {
for(@_) {
# Split into lines
push @line_cache, split m:(?<=$/):;
}
}
For posterity... I wrote FileHandle::Unget to address this problem.
I don't know whether this satisfies your need. If you insist on using <>, then I guess you have to use tie.
#copy STDIN to another filehandle: $fh
my $fakefile = join '', <STDIN>;
open my $fh, '<', \$fakefile;
#read 100kb
read $fh, my $data100kb, 100_000;
#do something with the data
#$data100kb =~ y/a/b/;
#print $data100kb;
#reset $fh
seek $fh, 0, 0;
while(<$fh>){
print;# do some stuff
}
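The tie idea mentioned above can be sketched with core Perl only. This is a minimal illustration rather than a replacement for FileHandle::Unget: the package name UngetHandle is made up for the example, it assumes $/ is the default "\n", and it implements only READLINE and EOF. It never copies the whole input, only the part that was pushed back.

```perl
use strict;
use warnings;

# UngetHandle is a hypothetical name: a tied handle that first replays
# pushed-back data, then continues reading from the real handle.
package UngetHandle;

sub TIEHANDLE {
    my ($class, $fh, $pushed_back) = @_;
    my $buf = defined $pushed_back ? $pushed_back : '';
    return bless { fh => $fh, buf => $buf }, $class;
}

# Return the next line, serving buffered data before the real handle.
sub _next_line {
    my ($self) = @_;
    my $fh = $self->{fh};
    return scalar <$fh> unless length $self->{buf};
    # A complete line is available in the buffer.
    return $1 if $self->{buf} =~ s/\A(.*?\n)//s;
    # The buffer ends mid-line: complete it from the real handle.
    my $rest = <$fh>;
    my $line = $self->{buf} . (defined $rest ? $rest : '');
    $self->{buf} = '';
    return $line;
}

sub READLINE {
    my ($self) = @_;
    return $self->_next_line unless wantarray;
    my @lines;
    while (defined(my $line = $self->_next_line)) {
        push @lines, $line;
    }
    return @lines;
}

sub EOF {
    my ($self) = @_;
    my $fh = $self->{fh};
    return !length($self->{buf}) && eof($fh);
}

package main;

# Demo with an in-memory file standing in for STDIN.
open my $in, '<', \"first\nsecond\nthird\n" or die $!;
read($in, my $head, 8);                   # peek at the first 8 bytes
tie *REPLAY, 'UngetHandle', $in, $head;   # "put them back"
print while <REPLAY>;                     # sees the input from the start
```

With real input you would pass \*STDIN instead of the in-memory handle and read from REPLAY wherever the program previously read from STDIN.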
I'm maintaining old Perl code and need to enable strict pragma in all modules. I have a problem in passing a file handle as a reference between modules and subs. We have a common module responsible for opening the log file which is passed as typeglob reference. In other modules, the run function first calls open_log() from the common module, then it passes this file handle to other subs.
Here I've written a simple test to simulate the situation.
#!/usr/bin/perl -w
use strict;
$::STATUS_OK = 0;
$::STATUS_NOT_OK = 1;
sub print_header {
our $file_handle = @_;
print { $$file_handle } "#### HEADER ####"; # reference passing fails
}
sub print_text {
my ($file_handle, $text)= @_;
print_header(\$file_handle);
print { $$file_handle } $text;
}
sub open_file_handle {
my ($file_handle, $path, $name) = @_;
my $filename = $path."\\".$name;
unless ( open ($$file_handle, ">".$filename)) {
print STDERR "Failed to open file_handle $filename for writing.\n";
return $::STATUS_NOT_OK;
}
print STDERR "File $filename was opened for writing successfully.\n";
return $::STATUS_OK;
}
my $gpath = "C:\\Temp";
my $gname = "mylogfile.log";
my $gfile_handle;
if (open_file_handle(\$gfile_handle, $gpath, $gname) == $::STATUS_OK) {
my $text = "BIG SUCCESS!!!\n";
print_text(\$gfile_handle, $text);
print STDERR $text;
} else {
print STDERR "EPIC FAIL!!!!!!!!\n";
}
The Main function first calls open_file_handle and passes a file handle reference to the print_text function. If I comment out the row:
print_header(\$file_handle);
Everything works fine, but I need to pass the file handle reference to other functions from the print_text function, and this doesn't work.
I'm a Java developer and Perl's reference handling is not familiar to me. I don't want to change the open_log() sub to return a file handle (now it returns only status), since I have lots of modules and hundreds of code lines to go through to make this change in all places.
How can I fix my code to make it work?
There are two types of filehandles in Perl. Lexical and global bareword filehandles:
open my $fh, '>', '/path/to/file' or die $!;
open FILEHANDLE, '>', '/path/to/file' or die $!;
You are dealing with the first, which is good. The second one is global and should not be used.
The file handles you have are lexical, and they are stored in a scalar variable. The dollar sign $ marks it as a scalar. These can be passed as arguments to subs.
foo($fh);
They can also be referenced. In that case, you get a scalar reference.
my $ref = \$fh;
Usually you reference stuff if you hand it over to a function so Perl does not make a copy of the data. Think of a reference like a pointer in C. It's only the memory location of the data (structure). The piece of data itself remains where it is.
Now, in your code you have references to these scalars. You can tell because it is dereferenced in the print statement by saying $$fh.
sub print_text {
my ($file_handle, $text)= @_;
print_header(\$file_handle);
print { $$file_handle } $text;
}
So the $file_handle you get as a parameter (that's what the = @_ does) is actually a reference. You do not need to reference it again when you pass it to a function.
I guess you wrote the print_header yourself:
sub print_header {
our $file_handle = @_;
print { $$file_handle } "#### HEADER ####"; # reference passing fails
}
There are a few things here:
- our is for globals. Do not use that. Use my instead.
- Put parentheses around the parameter assignment: my ($fh) = @_
- Since you pass over a reference to a reference to a scalar, you need to dereference twice: ${ ${ $file_handle } }
Of course the double-deref is weird. Get rid of it by passing the variable $file_handle to print_header instead of a reference to it:
sub print_text {
my ($file_handle, $text)= @_;
print_header($file_handle); # <-- NO BACKSLASH HERE
print { $$file_handle } $text;
}
That is all you need to do to make it work.
In general, I would get rid of all the references to the $file_handle vars here. You don't need them. The lexical filehandle is already a reference (to a glob that you can treat like an IO::Handle object), but don't concern yourself with that right now, it is not important. Just remember:
- use filehandles that have a $ up front
- pass them without references and you do not need to worry about \ and ${} and stuff like that
For more info, see perlref and perlreftut.
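To see the extra level of indirection concretely, here is a small sketch. It writes to an in-memory buffer instead of a real log file, and the sub name describe_handles is made up; both are assumptions to keep the example self-contained. ref() reports GLOB for the lexical filehandle itself and REF for a reference to it.

```perl
use strict;
use warnings;

# describe_handles is a hypothetical demo sub, not part of the original code.
sub describe_handles {
    my $buffer = '';
    # An in-memory "file" so the example needs no real log file.
    open my $fh, '>', \$buffer or die $!;

    my $fh_ref = \$fh;                # one extra level of indirection

    print {$fh} "direct\n";           # use the handle as-is
    print { $$fh_ref } "via ref\n";   # one dereference gets the handle back

    close $fh or die $!;
    return (ref($fh), ref($fh_ref), $buffer);
}

my ($handle_type, $ref_type, $written) = describe_handles();
print "handle: $handle_type, reference to handle: $ref_type\n";
print $written;
```

Each backslash added on the way down has to be undone by one more $ on the way back, which is why passing the handle itself is the simpler convention.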
You are having difficulties because you added multiple extra levels of references. Objects like lexical filehandles already are references.
If you have difficulties keeping track of what is a reference, you might want to use some kind of Hungarian notation, like a _ref suffix.
In print_text, this would be:
sub print_text {
my ($file_handle_ref, $text)= @_;
print_header(\$file_handle_ref);
print { $$file_handle_ref } $text;
}
And in print_header:
sub print_header {
my ($file_handle_ref_ref) = @_; # don't use `our`, and assign to a lvalue list!
print { $$$file_handle_ref_ref } "#### HEADER ####"; # double dereference … urgh
}
A far superior solution is to pass the filehandle around directly, without references.
sub print_header {
my ($file_handle) = @_;
print {$file_handle} "#### HEADER ####"; # no reference, no cry
}
sub print_text {
my ($file_handle, $text)= @_;
print_header($file_handle);
print {$file_handle} $text;
}
And in the main part:
my $gpath = "C:/Temp"; # forward slashes work too, as long as you are consistent
my $gname = "mylogfile.log";
if (open_file_handle(\my $gfile_handle, $gpath, $gname) == $::STATUS_OK) {
my $text = "BIG SUCCESS!!!\n";
print_text($gfile_handle, $text);
...
} else {
...
}
The reference operator is "\" (backslash).
Anything, including arrays, hashes and even subroutines, can be referenced.
Look at the fifth line from the end of your script:
print_text(\$gfile_handle, $text);
Here you pass a reference to the variable, \$gfile_handle, to the subroutine print_text:
sub print_text {
my ($file_handle, $text)= @_;
print_header(\$file_handle);
print { $$file_handle } $text;
}
Inside this subroutine, $file_handle is already a reference.
Then you reference it again and pass it to the subroutine print_header.
So you can solve this problem by dropping the reference operator from that fifth line from the end, like this:
print_text($gfile_handle, $text);
and try again :-)
As a beginner I have what I think is a rather complicated problem I am hoping someone could help with.
I have the following text file (tab delimited)...
FILE1.txt
Dog Big
Dog Medium
Dog Small
Rabbit Huge
Rabbit Tiny
Rabbit Middle
Donkey Massive
Donkey Little
Donkey Gigantic
I need to read FILE1.txt into a hash reference to get something like the following... (using Data::Dumper)
$VAR1 = {
'Dog' => [
'Big',
'Medium',
'Small'
],
'Rabbit' => [
'Huge',
'Tiny',
'Middle'
],
'Donkey' => [
'Massive',
'Little',
'Gigantic'
]
};
The problem I am having:
I then need to loop through each branch of the hash reference one at a time. I will use the value from the hash reference to check whether it matches my keyword; if so, it should return its corresponding key. For example...
What I need it to do:
my $keyword = "Little";
Dog->Big
if 'Big' matches my keyword then return $found = Dog
else go to the next branch
Rabbit->Huge
if 'Huge' matches my keyword then return $found = Rabbit
else go to the next branch
Donkey->Massive
if 'Massive' matches my keyword then return $found = Donkey
else go to the next branch (which is Dog again, but the second element this time)
Dog->Medium
if 'Medium' matches my keyword then return $found = Dog
else go to the next branch
Rabbit->Tiny
if 'Tiny' matches my keyword then return $found = Rabbit
else go to the next branch
Donkey->Little
if 'Little' matches my keyword then return $found = Donkey
..... and so on until the keyword is found or we reach the end of the hash reference
This is the kind of thing I am trying to achieve but don't know how to go about doing this, or whether a hash reference is the best way to do this, or if it can even be done with a hash/hash reference?
Your help with this is much appreciated, thanks
Choosing the proper data structure is often the key step to a solution, but first of all you should define what you are trying to achieve. What is the overall goal? For example: "I have this data file, and in my application I frequently need to ask it for this information." It is crucial to ask the proper question, because if you don't need to look up keywords frequently, it doesn't make sense to build a hash at all.
perl -anE'say $F[0] if $F[1] eq "Little"' FILE1.txt
Yes, it is that simple. Look in the perlrun manpage for what the switches mean and how to do the same thing in a bigger application.
If you need to ask this question frequently, you should arrange your data in a way which helps you, not in a way you have to battle with.
use strict;
use warnings;
use feature qw(say);
use autodie;
open my $f, '<', 'FILE1.txt';
my %h;
while(<$f>) {
chomp;
my ($animal, $keyword) = split' ';
$h{$keyword} = $animal unless exists $h{$keyword};
}
close $f;
for my $keyword (qw(Little Awkward Small Tiny)) {
say $h{$keyword} ? "$keyword $h{$keyword}" : "keyword $keyword not found";
}
But if you still insist on traversing the hash, you can do it, but you have been warned.
open my $f, '<', 'FILE1.txt';
my %h;
while (<$f>) {
chomp;
my ( $animal, $keyword ) = split ' ';
push @{ $h{$animal} }, $keyword;
}
close $f;
KEYWORD:
for my $keyword (qw(Little Awkward Small Tiny)) {
for my $animal (keys %h) {
for my $k (@{$h{$animal}}) {
if($k eq $keyword) {
say "$keyword $animal";
next KEYWORD;
}
}
}
say "keyword $keyword not found";
}
To critique my own answer: the part that does the search could be structured better. And maybe it is pointless even using an ordered hash, as the search is through a linear list. Maybe it should be an array of arrays.
use strict;
use warnings;
use Tie::IxHash;
#open file
open(my $fh,"ani.txt") ||die $!;
#make an ordered hash
tie my %sizes, 'Tie::IxHash';
#read file into hash of arrays
while(<$fh>) {
(my $animal,my $size)=split(/\s+/);
if (!exists($sizes{$animal})) {
$sizes{$animal} = [$size];
} else {
push @{$sizes{$animal}},$size;
}
}
my $keyword="Little";
my $running=1;
my $depth=0;
while( $running ) {
$running = 0;
for my $search (keys %sizes) {
next if ($depth >= @{$sizes{$search}});
$running = 1;
if ($keyword eq $sizes{$search}[$depth]) {
print "FOUND!!!!!! $search $depth";
exit(0);
}
}
$depth++;
}
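The array-of-arrays idea raised at the start of this answer might look like the sketch below. The sub name find_animal and the inlined sample rows are assumptions for illustration; in practice the rows would come from splitting each line of the input file. Groups are kept in file order, so the round-robin search visits Dog, Rabbit, Donkey at depth 0, then again at depth 1, and so on.

```perl
use strict;
use warnings;

# find_animal is a hypothetical helper. @rows holds [animal, size] pairs
# in file order; the search goes level by level across all animals.
sub find_animal {
    my ($keyword, @rows) = @_;

    # Build an array of [animal, [sizes...]] groups, preserving file order.
    my (@groups, %index_of);
    for my $row (@rows) {
        my ($animal, $size) = @$row;
        if (!exists $index_of{$animal}) {
            $index_of{$animal} = @groups;    # position of the new group
            push @groups, [ $animal, [] ];
        }
        push @{ $groups[ $index_of{$animal} ][1] }, $size;
    }

    my $depth = 0;
    while (1) {
        my $any_left = 0;
        for my $group (@groups) {
            my ($animal, $sizes) = @$group;
            next if $depth >= @$sizes;       # this animal is exhausted
            $any_left = 1;
            return $animal if $sizes->[$depth] eq $keyword;
        }
        return unless $any_left;             # every list is exhausted
        $depth++;
    }
}

my @rows = map { [ split ' ' ] } (
    'Dog Big',        'Dog Medium',    'Dog Small',
    'Rabbit Huge',    'Rabbit Tiny',   'Rabbit Middle',
    'Donkey Massive', 'Donkey Little', 'Donkey Gigantic',
);
my $found = find_animal('Little', @rows);
print defined $found ? "Found: $found\n" : "Not found\n";
```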
Here is another version of a solution to the stated problem. To solve the actual problem given, there is no need to store anything except the first animal seen for each size in a hash.
This hash can then be trivially used to look up the animal.
use strict;
use warnings;
open(my $fh,"ani.txt") ||die $!;
my %animals;
#read file into hash
while(<$fh>) {
(my $animal,my $size)=split(/\s+/);
#only add the animal the first time the size is found
if (!exists($animals{$size})) {
$animals{$size} = $animal;
}
}
my $keyword="Little";
print "animal is ", $animals{$keyword};
So I have something like this:
fork.pl
for $str (@files)
{
my($command) = "perl command.pl ".$str;
exec( $command );
}
command.pl
$file=$ARGV[0].".csv";
#code that counts rows here
print $rowcount;
So as the end result I have 10 files launched which count how many rows are in each csv file.
I do not need help editing this code, it works (this is just a compressed version). I need help figuring out how to take the output ($rowcount) of ten files and combine it into one for further processing.
I keep some utility code around for just this purpose... this is tweaked slightly to your question and including a synchronized global counting method.
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
my @workers;
my $num_threads = 10;
my $queue = Thread::Queue->new;
my $total_lines :shared = 0; #shared so that every thread updates the same counter
for (0..$num_threads-1) {
$workers[$_] = threads->create(\&worker);
}
while (defined(my $file = shift @ARGV)) {
$queue->enqueue($file);
}
sub worker {
while (defined(my $file = $queue->dequeue)) {
my $lines_counted = 0;
#line counting code here
global_counter($lines_counted);
}
}
sub global_counter {
my ($count) = @_;
lock($total_lines); #serialize updates from all workers
#add to the number of lines counted
$total_lines += $count;
}
for (0..$num_threads-1) { $queue->enqueue(undef); }
for (0..$num_threads-1) { $workers[$_]->join; }
print $total_lines;
This kind of communication is solved using pipes (let me write a simple example):
# -- fork.pl -------------------------
for (1..3) {
open my $PIPE, "perl command.pl |";
print "catch: $_\n" while(<$PIPE>);
close $PIPE;
}
# -- command.pl ----------------------
print rand(1);
It prints (random numbers):
catch: 0.58929443359375
catch: 0.1290283203125
catch: 0.907012939453125
You need to look either at threads or Interprocess communication with e.g. sockets or shared memory when using fork.
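As a rough sketch of the fork route, each child can write its result to a pipe that the parent reads afterwards. The helper name sum_in_children and the length-based "work" are placeholders for the real row counting, and the example assumes a Unix-like system where fork and pipe behave natively.

```perl
use strict;
use warnings;
use POSIX ();

# sum_in_children is a hypothetical helper: it runs $work->($item) in one
# forked child per item and adds up the numbers the children report.
sub sum_in_children {
    my ($work, @items) = @_;
    my @readers;
    for my $item (@items) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                 # child process
            close $reader;
            print {$writer} $work->($item), "\n";
            close $writer;               # flushes the result to the parent
            POSIX::_exit(0);             # leave without running parent cleanup
        }
        close $writer;                   # parent keeps only the read end
        push @readers, $reader;
    }
    my $total = 0;
    for my $reader (@readers) {
        my $line = <$reader>;
        close $reader;
        next unless defined $line;
        chomp $line;
        $total += $line if length $line;
    }
    1 while wait() != -1;                # reap all children
    return $total;
}

# Demo: the "work" is just the length of each name.
print sum_in_children(sub { length $_[0] }, qw(one.csv two.csv)), "\n";
```

All children are started before any result is read, so they run in parallel; the parent then collects one line from each pipe in turn.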
Compressed but won't work. I'm assuming that in fork.pl, you fork before exec'ing? Backticks capture the output of the called process, namely your prints:
fork.pl
for $str (@files)
{
my($command) = "perl command.pl ".$str;
print `$command`;
}
But rather than forking and launching processes, wouldn't it be smarter to turn the second file into a module?
package MyCommand;
use Exporter 'import';
our @EXPORT = qw( command );
sub command {
my $file = $_[0] . '.csv';
...
return $rowcount;
}
1;
fork.pl:
use MyCommand;
...
my @rowcounts;
for my $str (@files) {
push @rowcounts, command($str);
}
A bit of self-promotion, but I just posted this in your other thread, which seems relevant enough: How to run in parallel two child command from a parent one?
Accumulate pipes from children:
#!/usr/bin/perl -w
use strict;
my @files = qw/one.csv two.csv three.csv/;
my $command = "perl command.pl";
my @pipes;
foreach (@files) {
my $fd;
open $fd, "-|", "$command $_" and push @pipes, $fd;
};
my $sum = 0;
foreach my $pp (#pipes) {
$sum += $_ if defined ($_=<$pp>);
};
print $sum;
Then you can just read them one by one (as in example), or use IO::Select to read data as it appears in each pipe.
A hash table in addition to array is also good if you want to know which data comes from which source.
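A possible shape for the hash variant is sketched below. The helper name collect_by_source is made up, and the child commands are one-liners standing in for "perl command.pl FILE"; it also assumes a platform where the list form of a piped open is supported.

```perl
use strict;
use warnings;

# collect_by_source is a hypothetical helper. %commands maps a label
# (for example a file name) to the command list whose first output line
# is a number.
sub collect_by_source {
    my (%commands) = @_;

    # Start every child first so they run in parallel.
    my %pipe_for;
    for my $label (sort keys %commands) {
        open(my $fh, '-|', @{ $commands{$label} })
            or die "Cannot start child for $label: $!";
        $pipe_for{$label} = $fh;
    }

    # Then read one line from each pipe, remembering where it came from.
    my %count_for;
    for my $label (sort keys %pipe_for) {
        my $fh   = $pipe_for{$label};
        my $line = <$fh>;
        close $fh;
        chomp $line if defined $line;
        $count_for{$label} = (defined $line && length $line) ? $line + 0 : 0;
    }
    return %count_for;
}

# Demo: $^X is the running perl binary; each child just prints a number.
my %counts = collect_by_source(
    'one.csv' => [ $^X, '-e', 'print 3' ],
    'two.csv' => [ $^X, '-e', 'print 4' ],
);
printf "%s: %d\n", $_, $counts{$_} for sort keys %counts;
```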
Is there any way to have a subroutine send data back while still processing? For instance (this example used simply to illustrate) - a subroutine reads a file. While it is reading through the file, if some condition is met, then "return" that line and keep processing. I know there are those that will answer - why would you want to do that? and why don't you just ...?, but I really would like to know if this is possible.
A common way to implement this type of functionality is with a callback function:
{
open my $log, '>', 'logfile' or die $!;
sub log_line {print $log @_}
}
sub process_file {
my ($filename, $callback) = @_;
open my $file, '<', $filename or die $!;
local $_;
while (<$file>) {
if (/some condition/) {
$callback->($_)
}
# whatever other processing you need ....
}
}
process_file 'myfile.txt', \&log_line;
or without even naming the callback:
process_file 'myfile.txt', sub {print STDERR @_};
Some languages offer this sort of feature using "generators" or "coroutines", but Perl does not. The generator page linked above has examples in Python, C#, and Ruby (among others).
The Coro module looks like it would be useful for this problem, though I have no idea how it works and no idea whether it does what it advertises.
The easiest way to do this in Perl is probably with an iterator-type solution. For example, here we have a subroutine which forms a closure over a filehandle:
open my $fh, '<', 'some_file.txt' or die $!;
my $iter = sub {
while( my $line = <$fh> ) {
return $line if $line =~ /foo/;
}
return;
};
The sub iterates over the lines until it finds one matching the pattern /foo/ and then returns it, or else returns nothing (undef in scalar context). Because the filehandle $fh is defined outside the scope of the sub, it remains resident in memory between calls. Most importantly, its state, including the current seek position in the file, is retained. So each call to the subroutine resumes reading the file where it last left off.
To use the iterator:
while( defined( my $next_line = $iter->() ) ) {
# do something with each line here
}
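The same idea can be wrapped in a small factory so that the handle and the pattern are parameters. This is only a sketch: make_line_iterator is a made-up name, and the in-memory file stands in for a real one.

```perl
use strict;
use warnings;

# make_line_iterator is a hypothetical helper: it returns a closure that
# yields the next matching line on each call, or nothing when the input
# is exhausted.
sub make_line_iterator {
    my ($fh, $pattern) = @_;
    return sub {
        while (defined(my $line = <$fh>)) {
            return $line if $line =~ $pattern;
        }
        return;
    };
}

# Demo with an in-memory file.
open my $demo_fh, '<', \"foo 1\nbar\nfoo 2\n" or die $!;
my $next_match = make_line_iterator($demo_fh, qr/foo/);
while (defined(my $line = $next_match->())) {
    print "matched: $line";
}
```

Each closure keeps its own handle, so several iterators over different files can be advanced independently.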
If you really want to do this, you can by using threading. One option would be to start a separate thread that reads the file and, when it finds a certain line, places it in an array that is shared between threads. The other thread can then take the lines as they are found and process them. Here is an example that reads a file, looks for an 'X' in each line, and does an action when it is found.
use strict;
use threads;
use threads::shared;
my @ary : shared;
my $thr = threads->create('file_reader');
while(1){
my ($value);
{
lock(@ary);
if ($#ary > -1){
$value = shift(@ary);
print "Found a line to process: $value\n";
}
else{
print "no more lines to process...\n";
}
}
sleep(1);
#process $value
}
sub file_reader{
#File input
open(INPUT, "<test.txt");
while(<INPUT>){
my($line) = $_;
chomp($line);
print "reading $line\n";
if ($line =~ /X/){
print "pushing $line\n";
lock(@ary);
push @ary, $line;
}
sleep(4)
}
close(INPUT);
}
Try this code as the test.txt file:
line 1
line 2X
line 3
line 4X
line 5
line 6
line 7X
line 8
line 9
line 10
line 11
line 12X
If your language supports closures, you may be able to do something like this:
By the way, the function would not keep processing the file, it would run just when you call it, so it may be not what you need.
(This is a javascript like pseudo-code)
function fileReader (filename) {
var file = open(filename);
return function () {
while (line = file.read()) {
if (condition) {
return line;
}
}
return null;
}
}
a = fileReader("myfile");
line1 = a();
line2 = a();
line3 = a();
What about a recursive sub? A filehandle keeps its read position between calls, so each call carries on from where the previous one left off.
Here is an example where the process_file subroutine prints out blank-line-separated "\n\n" paragraphs that contain foo.
sub process_file {
my ($fileHandle) = @_;
my $paragraph = '';
while ( defined(my $line = <$fileHandle>) ) {
last unless $line =~ /\S/; # a blank line ends the paragraph
$paragraph .= $line;
}
print $paragraph if $paragraph =~ /foo/;
goto &process_file unless eof($fileHandle);
# goto optimizes the tail recursion and prevents a stack overflow
# (wrapping the body in a bare block and using redo would also work)
}
open my $fileHandle, '<', 'file.txt' or die $!;
process_file($fileHandle);