Reference to a string as a class variable - perl

I'm trying to save a reference to a string in a class variable.
I wish to access this variable by dereferencing it.
For example in the routine getHeaders instead of using:
my $fileContentsRef = $this->getFileContent;
my $fileContentsRef1 = $$fileContentsRef;
$fileContentsRef1 =~ /Spaltenname.*?Datentyp.*?---\n(.*?)\n\n/gsmi;
I would like to use:
my $fileContentsRef = $this->getFileContent;
$$fileContentsRef =~ /Spaltenname.*?Datentyp.*?---\n(.*?)\n\n/gsmi;
For more details you should see the code at the end.
My problem is, that the program doesn't work when I don't work with the copy( i.e when I don't use $fileContentsRef1). What am I doing / getting wrong? Is it possible to reach the goal in the way I described? Could some give me clues how?
open FILE, "a1.bad";
$file_contents .= do { local $/; <FILE> };
close FILE;
my $log = auswerter->new(\$file_contents);
#-----------------------------------------------------------------
# Subs
#-----------------------------------------------------------------
# CONSTRUCTOR
sub new
{
my $fileRef = $_[1];
my $self = {};
bless $self;
$self->initialize();
if($fileRef) { $self->{fileRef} = $fileRef; }
return $self;
}
sub initialize
{
#-----------------------------------------------------------------
# Configuration
#-----------------------------------------------------------------
my $this = shift;
}
sub setFile {
my $this = shift;
$this->{file} = shift;
}
sub getFileContent
{
my $this = shift;
return $this->{fileRef};
}
sub getHeaders
{
print "HEADERS...\n";
my $this = shift;
my #headers = ();
my $fileContentsRef = $this->getFileContent;
my $fileContentsRef1 = $$fileContentsRef;
$fileContentsRef1 =~ /Spaltenname.*?Datentyp.*?---\n(.*?)\n\n/gsmi;
#headers = split ("\n", $1 );
foreach (#headers)
{
$_ =~ s/^(.*?)\s.*/$1/;
}
return \#headers;
}
sub getErrList
{
print "ERR LIST...\n";
my $this = shift;
my #errors = ();
my $fileContentsRef = $this->getFileContent;
my $fileContentsRef1 = $$fileContentsRef;
$fileContentsRef1 =~ /Spaltenname.*?(Satz.*)ORA.*?^Tabelle/gsmi;
return \#errors if !$1;
#errors = split ("\n\n", $1 );
foreach (#errors)
{
$_ =~ s/.*Spalte (.*?)\..*/$1/msgi;
}
return \#errors;
}
sub getEntries
{
my $this = shift;
my #entries = ();
my $fileContentsRef = $this->getFileContent;
my $fileContentsRef1 = $$fileContentsRef;
$fileContentsRef1 =~ /.*==\n(.*)/gsmi;
#entries = split ("\n", $1 );
return \#entries;
}
sub sqlldrAnalyze
{
my $this = shift;
my $token = shift;
my $errRef =$this->getErrList();
return "" if $#$errRef < 0 ;
my $headersRef = $this->getHeaders();
my $entriesRef = $this->getEntries();
my $i = 0;
my $str = "";
$str = "<html>";
$str .= "<table rules=\"all\">";
$str .= "<tr>";
foreach ( #$headersRef)
{
$str .= "<th>".$_."</th>";
}
$str .= "</tr>";
foreach ( #$entriesRef)
{
my #errOffset = grep { $headersRef->[$_] =~ $errRef->[$i] }0..$#$headersRef ;
my #entries = split($token, $_);
$str .= "<tr>";
foreach (my $j =0; $j <= $#entries;$j++)
{
$str .= "<td nowrap";
$str .= " style=\"background-color: red\"" if $j == $errOffset[0];;
$str .= ">";
$str .= "<b>" if $j == $errOffset[0];
$str .= $entries[$j];
$str .= "</b>" if $j == $errOffset[0];
$str .= "</td>";
}
$str .= "</tr>\n";
$i++;
}
$str .= "</table>";
$str .= "</html>";
return $str;
}
return 1;

When you call your class->new(...) constructor with a filename argument, the new subroutine gets the class name as the first argument, and the filename as the second argument.
In your constructor, you are simply copying the value of $_[1] (the filename) into $self->{FileRef}, but that value is not a reference.
So when you access it, there is no need to use a doubled sigil to dereference the value.
You should run all of your code with the following two lines at the top, which will catch many errors for you (including trying to use strings as references when they are not references):
use strict;
use warnings;
These two lines basically move Perl out of quick one-liner mode, and into a mode more suitable for large development (improved type safety, static variable name checking, and others).
Per the update: If the code you have is working properly when copying the string, but not when dereferencing it directly, it sounds like you may be running into an issue of the string reference preserving the last match position (the g flag).
Try running the following:
my $fileContentsRef = $this->getFileContent;
pos($$fileContentsRef) = 0; # reset the match position
$$fileContentsRef =~ /Spaltenname.*?Datentyp.*?---\n(.*?)\n\n/gsmi;

Related

Unable to retrieve multiple column values from file in Perl

I have a file with following contents:
TIME
DATE TIME DAY
191227 055526 FRI
RC DEV SERVER
RC1 SERVER1
RC2 SERVER2
RC3 SERVER3
END
I am fetching argument values from this file, say if I pass DATE as an argument to the script I am getting corresponding value of the DATE. i.e., 191227
When I pass multiple arguments say DATE, DAY I should get values:
DATE=191227
DAY=FRI
But what I am getting here is:
DATE=191227
DAY=NULL
And if I pass RC as an argument I should get:
RC=RC1,RC2,RC3
The script looks below:
#!/usr/bin/perl
use strict;
use Data::Dumper;
print Dumper(\#ARGV);
foreach my $name(#ARGV){
print "NAME:$name\n";
my ($result, $start, $stop, $width) = "";
while(my $head = <STDIN>)
{
if( $head =~ (m/\b$name\b/g))
{
$start = (pos $head) - length($name);
$stop = (pos $head);
my $line = <STDIN>;
pos $head = $stop+1;
$head =~ (m/\b/g);
$width = (pos $head) - $start;
$result = substr($line,$start,$width);
}
}
$result =~ s/^\s*(.*?)\s*$/$1/;
print "$name=";
$result = "NULL" if ( $result eq "" );
print "$result\n";
}
Can someone please help me to get values if I pass multiple arguments also if suppose argument value have data in multiple lines it should be printed comma separated values (ex: for RC, RC=RC1,RC2,RC3).
Here is an example, assuming the input file is named file.txt and the values are starting at the same horizontal position as the keys:
package Main;
use feature qw(say);
use strict;
use warnings;
use Data::Dumper qw(Dumper);
my $self = Main->new(fn => 'file.txt', params => [#ARGV]);
$self->read_file();
$self->print_values();
sub read_file {
my ( $self ) = #_;
my $fn = $self->{fn};
open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
local $/ = ""; #Paragraph mode
my #blocks = <$fh>;
close $fh;
$self->{values} = {};
for my $block (#blocks) {
$self->parse_block( $block );
}
}
sub parse_block {
my ( $self, $block ) = #_;
my #lines = split /\n/, $block;
my $header = shift #lines;
my ($keys, $startpos) = $self->get_block_keys( $header );
for my $line ( #lines ) {
for my $key (#$keys) {
my $startpos = $startpos->{$key};
my $str = substr $line, $startpos;
my ( $value ) = $str =~ /^(\S+)/;
if ( defined $value ) {
push #{$self->{values}{$key}}, $value;
}
}
}
}
sub get_block_keys {
my ( $self, $header ) = #_;
my $values = $self->{values};
my #keys;
my %spos;
while ($header =~ /(\S+)/g) {
my $key = $1;
my $startpos = $-[1];
$spos{$key} = $startpos;
push #keys, $key;
}
for my $key (#keys) {
if ( !(exists $values->{$key}) ) {
$values->{$key} = [];
}
}
return (\#keys, \%spos);
}
sub new {
my ( $class, %args ) = #_;
return bless \%args, $class;
}
sub print_values {
my ( $self ) = #_;
my $values = $self->{values};
for my $key (#{$self->{params}}) {
my $value = "<NO VALUE FOUND>";
if ( exists $values->{$key} ) {
$value = join ",", #{$values->{$key}};
}
say "$key=$value";
}
}
Edit
If you want to read the file from STDIN instead, change the following part of the code:
# [...]
my $self = Main->new(params => [#ARGV]);
$self->read_file();
$self->print_values();
sub read_file {
my ( $self ) = #_;
local $/ = ""; #Paragraph mode
my #blocks = <STDIN>;
$self->{values} = {};
for my $block (#blocks) {
$self->parse_block( $block );
}
}
# [...]

stockholm to fasta format - include accession id in every header

Hello I've multiple sequences in stockholm format, at the top of every alignment there is a accession ID, for ex: '#=GF AC PF00406' and '//' --> this is the end of the alignment. When I'm converting the stockholm format to fasta format I need PF00406 in the header of every sequence of the particular alignment. Some times there will be multiple stockholm alignments in one file. I tried to modify the following perl script, it gave me bizarre results, any help will be greatly appreciated.
my $columns = 60;
my $gapped = 0;
my $progname = $0;
$progname =~ s/^.*?([^\/]+)$/$1/;
my $usage = "Usage: $progname [<Stockholm file(s)>]\n";
$usage .= " [-h] print this help message\n";
$usage .= " [-g] write gapped FASTA output\n";
$usage .= " [-s] sort sequences by name\n";
$usage .= " [-c <cols>] number of columns for FASTA output (default is $columns)\n";
# parse cmd-line opts
my #argv;
while (#ARGV) {
my $arg = shift;
if ($arg eq "-h") {
die $usage;
} elsif ($arg eq "-g") {
$gapped = 1;
} elsif ($arg eq "-s"){
$sorted = 1;
} elsif ($arg eq "-c") {
defined ($columns = shift) or die $usage;
} else {
push #argv, $arg;
}
}
#ARGV = #argv;
my %seq;
while (<>) {
next unless /\S/;
next if /^\s*\#/;
if (/^\s*\/\//) { printseq() }
else {
chomp;
my ($name, $seq) = split;
#seq =~ s/[\.\-]//g unless $gapped;
$seq{$name} .= $seq;
}
}
printseq();
sub printseq {
if($sorted){
foreach $key (sort keys %seq){
print ">$key\n";
for (my $i = 0; $i < length $seq{$key}; $i += $columns){
print substr($seq{$key}, $i, $columns), "\n";
}
}
} else{
while (my ($name, $seq) = each %seq) {
print ">$name\n";
for (my $i = 0; $i < length $seq; $i += $columns) {
print substr ($seq, $i, $columns), "\n";
}
}
}
%seq = ();
}
Depending on the how much variation there is in the line with the accessionID, you might need to modify the regex, but this works for your example file
my %seq;
my $aln;
while (<>) {
if ($_ =~ /#=GF AC (\w+)/) {
$aln = $1;
}
elsif ($_ =~ /^\s*\/\/\s*$/){
$aln = '';
}
next unless /\S/;
next if /^\s*\#/;
if (/^\s*\/\//) { printseq() }
else {
chomp;
my ($name, $seq) = split;
$name = $name . ' ' . $aln;
$seq{$name} .= $seq;
}
}
printseq();

Uninitialized variable issue in Perl program

#!/usr/bin/perl
use warnings;
use Scalar::Util qw(looks_like_number);
sub term_value();
sub factor_value();
sub expression_value()
{
$num = #_;
#expression = $_[0];
print "expression[0]: " . $expression[0] . "\n";
$index = $_[$num-1];
print "index: $index\n";
$result = &term_value(#expression, $index);
$more = 1;
while($more)
{
$op = $expression[$index];
print "$op\n";
if ($op eq "+" || $op eq "-")
{
$index++;
$value = &term_value(#expression, $index);
if ($op eq '+')
{
$result = $result + $value;
} else {
$result = $result - $value;
}
}
else
{
$more = 0;
}
}
return $result;
}
sub term_value()
{
$num = #_;
#expression = $_[0];
print "expression[0]: " . $expression[0] . "\n";
$index = $_[$num-1];
print "index: $index\n";
$result = &factor_value(#expression, $index);
$more = 1;
while($more)
{
$op = $expression[$index];
if ($op eq "*" || $op eq "/")
{
$index++;
$value = &factor_value(#expression, $index);
if ($op eq '*')
{
$result = $result * $value;
} else {
$result = $result / $value;
}
} else {
$more = 0;
}
}
return $result;
}
sub factor_value()
{
$num = #_;
#expression = $_[0];
print "expression[0]: " . $expression[0] . "\n";
$index = $_[$num-1];
print "index: $index\n";
$result = 0;
$c = $expression[$index];
if ($c eq '(')
{
$index++;
$result = &expression_value(#expression, $index);
$index++;
} else {
while (looks_like_number($c))
{
$result = 10 * $result + $c - '0';
$index++;
$c = $expression[$index];
}
}
return $result;
}
#Collect argument and separate by character
#one_char = split(//, $ARGV[0]);
$index = 0;
$result = &expression_value(#one_char, $index);
print $result . "\n";
My console returns these warnings:
Use of uninitialized value $op in string eq at eval.pl line 58.
Use of uninitialized value $op in string eq at eval.pl line 58.
Use of uninitialized value $op in string eq at eval.pl line 25.
Use of uninitialized value $op in string eq at eval.pl line 25.
about the $op variable being uninitialized. I'm thinking this may be a scope problem...but I can't figure it out. I've tried everything I could think of (initializing the variable outside of the loop, etc.), but none of it seems to make a difference when running the program. Any suggestions would be greatly appreciated!
You're only using package (~global) variables, which is a huge problem given that you are using recursive functions! Start by adding
use strict;
Primarily, this will identify the variables you haven't declared. Use my to declare them in the appropriate scope.
You're trying to pass arrays to the subs, but you're failing. The only thing that can be passed to a sub is a list of scalars. If you want to pass an array to a sub, you'll need to pass a reference (~pointer) to the array.
sub foo {
my ($expressions, $index) = #_;
print($expressions->[$index], "\n");
}
foo(\#expressions, $index);
This is the reason you're getting the warnings. You are assigning one element to an array (#expression = $_[0]), then you try to index the second or later element.
By using prototype (), you're telling Perl the sub takes no arguments. Then you use & to tell Perl to ignore the prototype so you can pass arguments to your subs. Get rid of both the () after the sub names and & before sub calls.
my $more = 1;
while ($more) {
...
if (cond) {
...
} else {
$more = 0;
}
}
can be reduced to
while (1) {
...
last if !cond;
...
}
Higher Order Perl has a chapter on parsing. See section 8.1.2 for how you would build an expression parser and evaluator from scratch.
You can also take a look at the demo calculator script provided with Parse::RecDescent.
Just out of curiosity, I wanted to see what can be achieved without using parsers. The following script makes a lot of assumptions, but "works" for the simple cases.
#!/usr/bin/env perl
use strict;
use warnings;
use Regexp::Common qw(balanced number);
die "Need expression\n" unless #ARGV;
my ($expression) = #ARGV;
my $result = evaluate_expression($expression);
printf(
"'%s' evaluated to %g\n",
$expression, $result
);
my $expected = eval $expression;
unless ($result == $expected) {
die "Wrong result, should have been '$expected'\n";
}
sub evaluate_expression {
my ($expression) = #_;
my $n = qr!$RE{num}{real}!;
my $mul = qr![*/]!;
my $add = qr![+-]!;
my $subexpr = qr!$RE{balanced}{-parens=>'()'}{-keep}!;
1 while
$expression =~ s!
$subexpr
!
my $s = $1;
$s =~ s{(?:^\()|(?:\)\z)}{}g;
evaluate_expression($s)
!gex;
1 while
$expression =~ s!($n) \s* ($mul) \s* ($n)!"$1 $2 $3"!geex;
1 while
$expression =~ s!($n) \s* ($add) \s* ($n)!"$1 $2 $3"!geex;
return $expression;
}
Output:
C:\Temp> z "((1+1)*3 +2)*5"
'((1+1)*3 +2)*5' evaluated to 40
C:\Temp> z "(1+1)*3 + 2*5"
'(1+1)*3 + 2*5' evaluated to 16
But, of course, it's fragile:
C:\Temp> z "2*3+2*5"
'2*3+2*5' evaluated to 610
Wrong result, should have been '16'
As a bit of a corollary to Sinan's answer, here is a "parser" written from the other side of the camel.
use 5.010;
use strict;
use warnings;
my #ops;
use overload map {
my $op = $_;
$op => sub {
my ($x, $y) = #_[$_[2] ? (1, 0) : (0, 1)];
bless [$x, $op, $y]
}
} #ops = qw(+ - / *);
my %ops = map {$_ => eval "sub {\$_[0] $_ \$_[1]}"} #ops;
sub eval {
my $self = shift;
return $$self[0] if #$self == 1;
my ($x, $op, $y) = map {ref eq 'main' ? $_->eval : $_} #$self;
my $ret = $ops{$op}->($x, $y);
say "$ret = $x $op $y";
$ret;
}
BEGIN {overload::constant integer => sub {bless [$_[1]]}}
eval->eval for "#ARGV";
Which when run:
$ perl eval.pl 2*3+2*5
prints:
6 = 2 * 3
10 = 2 * 5
16 = 6 + 10

Why doesn't my decrypt function work?

This was not written by me; it was written by someone who passed it down to me. I lost contact with the author of the code. I have been using this code for a few years and just now realized this error. It seems that the letter sequence rkey1 messes up the output.
For example turkey1 outputs as decryption as tur79y1. This Perl code should output turkey1 and not tur79y1:
$String = "turkey1";
$e = &encode_escaped(&palace_encrypt($String));
$d = &palace_decrypt(&decode_escaped("'\"".$e."\"'"));
print $d."<br>\n";
KEY REMOVED BY OWNER
sub palace_decrypt
{
local $lastchar = 0;
local $rc = 0;
local #bs;
for($i=length($_[0])-1; $i>=0; $i--) {
local $tmp = ord(substr($_[0], $i, 1));
$bs[$i] = $tmp ^ $palace_key[$rc++] ^ $lastchar;
$lastchar = $tmp ^ $palace_key[$rc++];
}
return join("", map { chr($_) } #bs);
}
sub decode_escaped
{
$_[0] =~ m/\"(.*)\"/;
local $str = $1;
$str =~ s/\\\\/\0/g;
$str =~ s/\\"/"/g;
$str =~ s/\\(..)/pack("c",hex($1))/ge;
$str =~ s/\0/\\/g;
return $str;
}
sub palace_encrypt
{
local $lastchar = 0;
local $rc = 0;
local #bs;
for($i=length($_[0])-1; $i>=0; $i--) {
local $b = ord(substr($_[0], $i, 1));
$bs[$i] = $b ^ $palace_key[$rc++] ^ $lastchar;
$lastchar = $bs[$i] ^ $palace_key[$rc++];
}
return join("", map { chr($_) } #bs);
}
sub encode_escaped
{
local $str = $_[0];
$str =~ s/\\/\\\\/g;
$str =~ s/([^A-Za-z0-9\.\\])/sprintf("\\%2.2X", ord($1))/ge;
return $str;
}
Your problem is that your decode_escaped does not exactly undo what encode_escaped did. Replace it with the following and that should fix your problem.
sub decode_escaped
{
$_[0] =~ m/\"(.*)\"/;
local #str = split /(\\\\)/, $1;
foreach (#str) {
s/\\"/"/g;
s/\\(..)/chr(hex($1))/ge;
s/\\\\/\\/;
}
return join '', #str;
}

Why does perl "hash of lists" do this?

I have a hash of lists that is not getting populated.
I checked that the block at the end that adds to the hash is in fact being called on input. It should either add a singleton list if the key doesn't exist, or else push to the back of the list (referenced under the right key) if it does.
I understand that the GOTO is ugly, but I've commented it out and it has no effect.
The problem is that when printhits is called, nothing is printed, as if there are no values in the hash. I also tried each (%genomehits), no dice.
THANKS!
#!/usr/bin/perl
use strict;
use warnings;
my $len = 11; # resolution of the peaks
#$ARGV[0] is input file
#$ARGV[1] is call number
# optional -s = spread number from call
# optional -o specify output file name
my $usage = "see arguments";
my $input = shift #ARGV or die $usage;
my $call = shift #ARGV or die $usage;
my $therest = join(" ",#ARGV) . " ";
print "the rest".$therest."\n";
my $spread = 1;
my $output = $input . ".out";
if ($therest =~ /-s\s+(\d+)\s/) {$spread = $1;}
if ($therest =~ /-o\s+(.+)\s/) {$output = $1;}
# initialize master hash
my %genomehits = ();
foreach (split ';', $input) {
my $mygenename = "err_naming";
if ($_ =~ /^(.+)-/) {$mygenename = $1;}
open (INPUT, $_);
my #wiggle = <INPUT>;
&singlegene(\%genomehits, \#wiggle, $mygenename);
close (INPUT);
}
&printhits;
#print %genomehits;
sub printhits {
foreach my $key (%genomehits) {
print "key: $key , values: ";
foreach (#{$genomehits{$key}}) {
print $_ . ";";
}
print "\n";
}
}
sub singlegene {
# let %hash be the mapping hash
# let #mygene be the gene to currently process
# let $mygenename be the name of the gene to currently process
my (%hash) = %{$_[0]};
my (#mygene) = #{$_[1]};
my $mygenename = $_[2];
my $chromosome;
my $leftbound = -2;
my $rightbound = -2;
foreach (#mygene) {
#print "Doing line ". $_ . "\n";
if ($_ =~ "track" or $_ =~ "output" or $_ =~ "#") {next;}
if ($_ =~ "Step") {
if ($_ =~ /chrom=(.+)\s/) {$chromosome = $1;}
if ($_ =~ /span=(\d+)/) {$1 == 1 or die ("don't support span not equal to one, see wig spec")};
$leftbound = -2;
$rightbound = -2;
next;
}
my #line = split /\t/, $_;
my $pos = $line[0];
my $val = $line[-1];
# above threshold for a call
if ($val >= $call) {
# start of range
if ($rightbound != ($pos - 1)) {
$leftbound = $pos;
$rightbound = $pos;
}
# middle of range, increment rightbound
else {
$rightbound = $pos;
}
if (\$_ =~ $mygene[-1]) {goto FORTHELASTONE;}
}
# else reinitialize: not a call
else {
FORTHELASTONE:
# typical case, in an ocean of OFFs
if ($rightbound != ($pos-1)) {
$leftbound = $pos;
}
else {
# register the range
my $range = $rightbound - $leftbound;
for ($spread) {
$leftbound -= $len;
$rightbound += $len;
}
#print $range . "\n";
foreach ($leftbound .. $rightbound) {
my $key = "$chromosome:$_";
if (not defined $hash{$key}) {
$hash{$key} = [$mygenename];
}
else { push #{$hash{$key}}, $mygenename; }
}
}
}
}
}
You are passing a reference to %genomehits to the function singlegene, and then copying it into a new hash when you do my (%hash) = %{$_[0]};. You then add values to %hash which goes away at the end of the function.
To fix it, use the reference directly with arrow notation. E.g.
my $hash = $_[0];
...
$hash->{$key} = yadda yadda;
I think it's this line:
my (%hash) = %{$_[0]};
You're passing in a reference, but this statement is making a copy of your hash. All additions you make in singlegene are then lost when you return.
Leave it as a hash reference and it should work.
PS - Data::Dumper is your friend when large data structures are not behaving as expected. I'd sprinkle a few of these in your code...
use Data::Dumper; print Dumper \%genomehash;