Pattern Matching in perl - perl

I want to parse some information from the file.
Information in the file:
Rita_bike_house_Sha9
Rita_bike_house
I want to have output like dis
$a = Rita_bike_house and $b = Sha9,
$a = Rita_bike_house and $b = "original"
In order to get that I have used the below code:
$name = #_; # This #_ has all the information from the file that I have shown above.
#For matching pattern Rita_bike_house_Sha9
($a, $b) = $name =~ /\w\d+/;
if ($a ne "" and $b ne "" ) { return ($a,$b) }
# this statement doesnot work at all as its first condition
# before the end is not satisified.
Is there any way where I can store "Rita_bike_house" in $a and "Sha9" in $b? I think my regexp is missing with something. Can you suggest anything?

Please don't use the variables $a and $b in your code. There are used by sort and will confuse you.
Try:
while( my $line = <DATA> ){
chomp $line;
if( $line =~ m{ \A ( \w+ ) _ ( [^_]* \d [^_]* ) \z }msx ){
my $first = $1;
my $second = $2;
print "\$a = $first and \$b = $second\n";
}else{
print "\$a = $line and \$b = \"original\"\n";
}
}
__DATA__
Rita_bike_house_Sha9
Rita_bike_house

Not very nice, but the next:
use strict;
use warnings;
while(<DATA>) {
chomp;
next if /^\s*$/;
my #parts = split(/_/);
my $b = pop #parts if $parts[$#parts] =~ /\d/;
$b //= '"original"';
my $a = join('_', #parts);
print "\$a = $a and \$b = $b,\n";
}
__DATA__
Rita_bike_house_Sha9
Rita_bike_house
prints:
$a = Rita_bike_house and $b = Sha9,
$a = Rita_bike_house and $b = "original",

If you are sure that the pattern which is required will always be similar to 'Sha9' and also it will appear at the end then just do a greedy matching....
open FILE, "filename.txt" or die $!;
my #data = <FILE>;
close(<FILE>);
#my $line = "Rita_bike_house_Sha9";
foreach $line (#data)
{
chomp($line);
if ($line =~ m/(.*?)(_([a-zA-Z]+[0-9]+))?$/)
{
$a = $1;
$b = $3 ? $3 : "original";
}
}

Related

What is causing the error with declaring my $value in Perl for loop?

I am writing code to parse sales information for a auto generated excel report. I am able to pull the data from each cell for a row and print it to a text file. The issue begins when I attempt to loop reading rows. The spreadsheet is setup with data on every other cell. I attempted to create a for loop to increment the cells but get an error.
syntax error at C:\taylor\perl\test_parse.pl line 128, near ")
my "
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at C:\taylor\perl\test_parse.pl line 128.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at C:\taylor\perl\test_parse.pl line 129.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at C:\taylor\perl\test_parse.pl line 168.
Execution of C:\taylor\perl\test_parse.pl aborted due to compilation errors."
I have lookled into moving the variables out of the scope of the if block but am unsure if this is possible.
#!/usr/bin/perl
use warnings;
use strict;
use Spreadsheet::ParseXLSX;
use Spreadsheet::ParseExcel;
use Spreadsheet::XLSX;
use Date::Format;
my $filename = "c:/taylor/perl/DCS8.xlsx";
my $files = "C:\\Taylor\\perl\\imports.csv";
#Parse excel file
my $parser = Spreadsheet::ParseXLSX->new();
my $workbook = $parser->parse("$filename");
for (my $i=0; $i <= 200; $i+=2) {
my $worksheet1 = $workbook->worksheet('Sheet1');
my $a = $worksheet1->get_cell(15+$i,0);
my $d = $worksheet1->get_cell(15+$i,3);
my $f = $worksheet1->get_cell(15+$i,5);
my $h = $worksheet1->get_cell(15+$i,7);
my $j = $worksheet1->get_cell(15+$i,9);
my $l = $worksheet1->get_cell(15+$i,11);
my $n = $worksheet1->get_cell(15+$i,13);
my $p = $worksheet1->get_cell(15+$i,15);
my $r = $worksheet1->get_cell(15+$i,17);
my $t = $worksheet1->get_cell(15+$i,19);
my $v = $worksheet1->get_cell(15+$i,21);
my $x = $worksheet1->get_cell(15+$i,23);
my $z = $worksheet1->get_cell(15+$i,25);
my $ab = $worksheet1->get_cell(15+$i,27);
my $af = $worksheet1->get_cell(15+$i,29);
my $ah = $worksheet1->get_cell(15+$i,31);
my $aj = $worksheet1->get_cell(15+$i,33);
my $ao = $worksheet1->get_cell(15+$i,35);
if (( defined $a and $a->value() ne "")
or ( defined $d and $d->value() ne "")
or ( defined $f and $f->value() ne "")
or ( defined $h and $h->value() ne "")
or ( defined $j and $j->value() ne "")
or ( defined $l and $l->value() ne "")
or ( defined $n and $n->value() ne "")
or ( defined $p and $p->value() ne "")
or ( defined $r and $r->value() ne "")
or ( defined $t and $t->value() ne "")
or ( defined $v and $v->value() ne "")
or ( defined $x and $x->value() ne "")
or ( defined $z and $z->value() ne "")
or ( defined $ab and $ab->value() ne "")
or ( defined $af and $af->value() ne "")
or ( defined $ah and $ah->value() ne "")
or ( defined $aj and $aj->value() ne "")
or ( defined $ao and $ao->value() ne ""))
my $value = $a->value();
$value =~ s/[_, -]//g;
my $value2 = $d->value();
$value2 =~ s/[_, -]//g;
my $value3 = $f->value();
$value3 =~ s/[_, -]//g;
my $value4 = $h->value();
$value4 =~ s/[_, -]//g;
my $value5 = $j->value();
$value5 =~ s/[_, -]//g;
my $value6 = $l->value();
$value6 =~ s/[_, -]//g;
my $value7 = $n->value();
$value7 =~ s/[^a-zA-Z0-9,]//g;
my $value8 = $p->value();
$value8 =~ s/[_, -]//g;
my $value9 = $r->value();
$value9 =~ s/[_, -]//g;
my $value10 = $t->value();
$value10 =~ s/[_, -]//g;
my $value11 = $v->value();
$value11 =~ s/[_, -]//g;
my $value12 = $x->value();
$value12 =~ s/[_, -]//g;
my $value13 = $z->value();
$value13 =~ s/[_, -]//g;
my $value14 = $ab->value();
$value14 =~ s/[_, -]//g;
my $value15 = $af->value();
$value15 =~ s/[_, -]//g;
my $value16 = $ah->value();
$value16 =~ s/[^a-zA-Z0-9,]//g;
my $value17 = $aj->value();
$value17 =~ s/[^a-zA-Z0-9,]//g;
my $value18 = $ao->value();
$value18 =~ s/[^a-zA-Z0-9,]//g;
my $files = "C:\\Taylor\\perl\\imports.csv";
unlink ($files);
open (OUTFILE, ">>$files");
print OUTFILE "$value,$value2,$value3,$value4,$value5,$value6,$value7,$value8,$value9,$value10,$value11,$value12,$value13,$value14,$value15,$value16,$value17,$value18\n";
}
Ideally this would loop through the get cells adding 2 rows each time, instead it seems when the loop begins it breaks.
The solution is already given in the comments, but it might help to make it clearer for anyone else who comes across this problem in the future.
The code contains an if statement that was written like this (simplified massively):
if ($some_condition)
# do something
But in Perl, the code attached to an if statement needs to be in a code block. So it should be written like this:
if ($some_condition) {
# do something
}

perl variable not storing data outside block

I have written below mention code to read a file and and storing data to array #s_arr.
But when I am trying to print that #s_arr array outside the block it shows nothing.
use Data::Dumper;
my #s_arr;
my #err;
my %sort_h_1;
$fname = '/qv/Error.log';
open( IN, "<$fname" );
foreach $line ( <IN> ) {
if ( $line =~ /CODE\+(\w{3})(\d{5})/ ) {
$a = "$1$2";
push #err, $a;
}
}
close IN;
$prev = "";
$count = 0;
my %hash;
foreach ( sort #err ) {
if ( $prev ne $_ ) {
if ( $count ) {
$hash{$prev} = $count;
}
$prev = $_;
$count = 0;
}
$count++;
}
print Dumper \%hash;
printf( "%s:%d\n", $prev, $count ) if $count;
$hash{$prev} = $count;
my $c = 0;
print "Today Error Count\n";
foreach my $name ( sort { $hash{$b} <=> $hash{$a} } keys %hash ) {
#printf "%-8s %s\n", $name, $hash{$name};
#my %sort_h ;
push #s_arr, $name;
push #s_arr, $hash{$name};
#$sort_h{$name} = $hash{$name} ;
#print Dumper \%sort_h ;
#print Dumper \#s_arr ;
$c++;
if ( $c eq 30 ) {
exit;
}
}
print Dumper \#s_arr; # It's showing nothing
You are calling exit inside of your foreach loop. That makes the program stop, and the print Dumper #s_arr is never reached.
To break out of a loop you need to use last.
foreach my $name ( sort ... ) {
# ...
$c++;
last if $c == 30; # break out of the loop when $c reaches 30
}
I used the postfix variant of if here because that makes it way easier to read. Also note that as zdim pointed out above, you should use the numerical equality check == when checking for numbers. eq is for strings.

Running a nested while loop inside a foreach loop in Perl

I'm trying to use a foreach loop to loop through an array and then use a nested while loop to loop through each line of a text file to see if the array element matches a line of text; if so then I push data from that line into a new array to perform calculations.
The outer foreach loop appears to be working correctly (based on printed results with each array element) but the inner while loop is not looping (same data pushed into array each time).
Any advice?
The code is below
#! /usr/bin/perl -T
use CGI qw(:cgi-lib :standard);
print "Content-type: text/html\n\n";
my $input = param('sequence');
my $meanexpfile = "final_expression_complete.txt";
open(FILE, $meanexpfile) or print "unable to open file";
my #meanmatches;
#regex = (split /\s/, $input);
foreach $regex (#regex) {
while (my $line = <FILE>) {
if ( $line =~ m/$regex\s(.+\n)/i ) {
push(#meanmatches, $1);
}
}
my $average = average(#meanmatches);
my $std_dev = std_dev($average, #meanmatches);
my $average_round = sprintf("%0.4f", $average);
my $stdev_round = sprintf("%0.4f", $std_dev);
my $coefficient_of_variation = $stdev_round / $average_round;
my $cv_round = sprintf("%0.4f", $coefficient_of_variation);
print font(
{ color => "blue" }, "<br><B>$regex average: $average_round
&nbspStandard deviation: $stdev_round&nbspCoefficient of
variation(Cv): $cv_round</B>"
);
}
sub average {
my (#values) = #_;
my $count = scalar #values;
my $total = 0;
$total += $_ for #values;
return $count ? $total / $count : 0;
}
sub std_dev {
my ($average, #values) = #_;
my $count = scalar #values;
my $std_dev_sum = 0;
$std_dev_sum += ($_ - $average)**2 for #values;
return $count ? sqrt($std_dev_sum / $count) : 0;
}
Yes, my advice would be:
Turn on strict and warnings.
perltidy your code,
use 3 argument open: open ( my $inputfile, "<", 'final_expression.txt' );
die if it doesn't open - the rest of your program is irrelevant.
chomp $line
you are iterating your filehandle, but once you've done this you're at the end of file for the next iteration of the foreach loop so your while loops becomes a null operation. Simplistically, reading the file into an array my #lines = <FILE>; would fix this.
So with that in mind:
#!/usr/bin/perl -T
use strict;
use warnings;
use CGI qw(:cgi-lib :standard);
print "Content-type: text/html\n\n";
my $input = param('sequence');
my $meanexpfile = "final_expression_complete.txt";
open( my $input_file, "<", $meanexpfile ) or die "unable to open file";
my #meanmatches;
my #regex = ( split /\s/, $input );
my #lines = <$input_file>;
chomp (#lines);
close($input_file) or warn $!;
foreach my $regex (#regex) {
foreach my $line (#lines) {
if ( $line =~ m/$regex\s(.+\n)/i ) {
push( #meanmatches, $1 );
}
}
my $average = average(#meanmatches);
my $std_dev = std_dev( $average, #meanmatches );
my $average_round = sprintf( "%0.4f", $average );
my $stdev_round = sprintf( "%0.4f", $std_dev );
my $coefficient_of_variation = $stdev_round / $average_round;
my $cv_round = sprintf( "%0.4f", $coefficient_of_variation );
print font(
{ color => "blue" }, "<br><B>$regex average: $average_round
&nbspStandard deviation: $stdev_round&nbspCoefficient of
variation(Cv): $cv_round</B>"
);
}
sub average {
my (#values) = #_;
my $count = scalar #values;
my $total = 0;
$total += $_ for #values;
return $count ? $total / $count : 0;
}
sub std_dev {
my ( $average, #values ) = #_;
my $count = scalar #values;
my $std_dev_sum = 0;
$std_dev_sum += ( $_ - $average )**2 for #values;
return $count ? sqrt( $std_dev_sum / $count ) : 0;
}
The problem here is that starting from the second iteration of foreach you are trying to read from already read file handle. You need to rewind to the beginning to read it again:
foreach $regex (#regex) {
seek FILE, 0, 0;
while ( my $line = <FILE> ) {
However that does not look very performant. Why read file several times at all, when you can read it once before the foreach starts, and then iterate through the list:
my #lines;
while (<FILE>) {
push (#lines, $_);
}
foreach $regex (#regex) {
foreach $line (#lines) {
Having the latter, you might also what to consider using grep instead of the while loop.

Renaming files using hash table in perl

I have made a perl code which is shown below. Here what I am trying to do is first get input from a text file consisting of a HTTP URL with a Title.
thus the first regex is the title and the second regex fetches the id from inside the URL.
All these values are inserted into the hash table %myfilenames().
So this hash table has key as the URL id, and value as the Title. Everything till here works fine, now I have a set of files on my computer which have the ID in their name which we extracted from the URL.
What I want to do is that if the ID is there in the hash table, then the files name should change to the value assigned to the ID. Now the output at the print statement in the last function is correct but I am unable to rename the files. I tried many things, but nothing works. Can someone help please.
example stuff:
URL: https://abc.com/789012 <--- ID
Value (new Title) : ABC
file name on computer = file-789012 <---- ID
new file name = ABC
My code:
use File::Slurp;
use File::Copy qw(move);
open( F, '<hadoop.txt' );
$key = '';
$value = '';
%myfilenames = ();
foreach (<F>) {
if ( $_ =~ /Lecture/ ) {
$value = $_;
}
if ( $_ =~ /https/ ) {
if ( $_ =~ /\d{6}/ ) {
$key = $&;
}
}
if ( !( $value eq '' || $key eq '' ) ) {
#print "$key\t\t$value";
$myfilenames{$key} = $value;
$key = '';
$value = '';
}
}
#while ( my ( $k, $v ) = each %myfilenames ) { print "$k $v\n"; }
my #files = read_dir 'C:\\inputfolder';
for (#files) {
if ( $_ =~ /\d{6}/ ) {
$oldval = $&;
}
$newval = $myfilenames{$oldval};
chomp($newval);
print $_ , "\t\t$newval" . "\n";
$key = '';
}
You probably didn't prepend the path to the file names. The following works for me (on a Linux box):
#!/usr/bin/perl
use warnings;
use strict;
use File::Slurp qw{ read_dir };
my $dir = 0;
mkdir $dir;
open my $FH, '>', "$dir/$_" for 123456, 234567;
my $key = my $value = q();
my %myfilenames = ();
for (<DATA>) {
chomp;
$value = $_ if /Lecture/;
$key = $1 if /https/ and /(\d{6})/;
if ($value ne q() and $key ne q()) {
$myfilenames{$key} = $value;
$key = $value = q();
}
}
my #files = read_dir($dir);
for (#files) {
if (/(\d{6})/) {
my $oldval = $1;
my $newval = $myfilenames{$oldval};
rename "$dir/$oldval", "$dir/$newval";
}
}
__DATA__
Lecture A1
https://123456
# Comment
Lecture A2
https://234567

Why does perl "hash of lists" do this?

I have a hash of lists that is not getting populated.
I checked that the block at the end that adds to the hash is in fact being called on input. It should either add a singleton list if the key doesn't exist, or else push to the back of the list (referenced under the right key) if it does.
I understand that the GOTO is ugly, but I've commented it out and it has no effect.
The problem is that when printhits is called, nothing is printed, as if there are no values in the hash. I also tried each (%genomehits), no dice.
THANKS!
#!/usr/bin/perl
use strict;
use warnings;
my $len = 11; # resolution of the peaks
#$ARGV[0] is input file
#$ARGV[1] is call number
# optional -s = spread number from call
# optional -o specify output file name
my $usage = "see arguments";
my $input = shift #ARGV or die $usage;
my $call = shift #ARGV or die $usage;
my $therest = join(" ",#ARGV) . " ";
print "the rest".$therest."\n";
my $spread = 1;
my $output = $input . ".out";
if ($therest =~ /-s\s+(\d+)\s/) {$spread = $1;}
if ($therest =~ /-o\s+(.+)\s/) {$output = $1;}
# initialize master hash
my %genomehits = ();
foreach (split ';', $input) {
my $mygenename = "err_naming";
if ($_ =~ /^(.+)-/) {$mygenename = $1;}
open (INPUT, $_);
my #wiggle = <INPUT>;
&singlegene(\%genomehits, \#wiggle, $mygenename);
close (INPUT);
}
&printhits;
#print %genomehits;
sub printhits {
foreach my $key (%genomehits) {
print "key: $key , values: ";
foreach (#{$genomehits{$key}}) {
print $_ . ";";
}
print "\n";
}
}
sub singlegene {
# let %hash be the mapping hash
# let #mygene be the gene to currently process
# let $mygenename be the name of the gene to currently process
my (%hash) = %{$_[0]};
my (#mygene) = #{$_[1]};
my $mygenename = $_[2];
my $chromosome;
my $leftbound = -2;
my $rightbound = -2;
foreach (#mygene) {
#print "Doing line ". $_ . "\n";
if ($_ =~ "track" or $_ =~ "output" or $_ =~ "#") {next;}
if ($_ =~ "Step") {
if ($_ =~ /chrom=(.+)\s/) {$chromosome = $1;}
if ($_ =~ /span=(\d+)/) {$1 == 1 or die ("don't support span not equal to one, see wig spec")};
$leftbound = -2;
$rightbound = -2;
next;
}
my #line = split /\t/, $_;
my $pos = $line[0];
my $val = $line[-1];
# above threshold for a call
if ($val >= $call) {
# start of range
if ($rightbound != ($pos - 1)) {
$leftbound = $pos;
$rightbound = $pos;
}
# middle of range, increment rightbound
else {
$rightbound = $pos;
}
if (\$_ =~ $mygene[-1]) {goto FORTHELASTONE;}
}
# else reinitialize: not a call
else {
FORTHELASTONE:
# typical case, in an ocean of OFFs
if ($rightbound != ($pos-1)) {
$leftbound = $pos;
}
else {
# register the range
my $range = $rightbound - $leftbound;
for ($spread) {
$leftbound -= $len;
$rightbound += $len;
}
#print $range . "\n";
foreach ($leftbound .. $rightbound) {
my $key = "$chromosome:$_";
if (not defined $hash{$key}) {
$hash{$key} = [$mygenename];
}
else { push #{$hash{$key}}, $mygenename; }
}
}
}
}
}
You are passing a reference to %genomehits to the function singlegene, and then copying it into a new hash when you do my (%hash) = %{$_[0]};. You then add values to %hash which goes away at the end of the function.
To fix it, use the reference directly with arrow notation. E.g.
my $hash = $_[0];
...
$hash->{$key} = yadda yadda;
I think it's this line:
my (%hash) = %{$_[0]};
You're passing in a reference, but this statement is making a copy of your hash. All additions you make in singlegene are then lost when you return.
Leave it as a hash reference and it should work.
PS - Data::Dumper is your friend when large data structures are not behaving as expected. I'd sprinkle a few of these in your code...
use Data::Dumper; print Dumper \%genomehash;