help parsing xml string in perl - perl

I'm having trouble doing a match for this xml string in perl.
<?xml version="1.0" encoding="UTF-8"?><HttpRemoteException path="/proj/feed/abc" class="java.io.FileNotFoundException" message="/proj/feed/abc: No such file or directory."/>
I want to place a condition on FileNotFoundException like so:
code snippet:
my @lines = qx(@cmdargs);
foreach my $line (@lines) { print "$line"; }
if (my $line =~ m/(FileNotFoundException)/) {
print "We have an ERROR: $line\n";
}
Error:
Use of uninitialized value in pattern match (m//) at ./tst.pl

You never assign anything to the variable you match against: the my inside the if condition creates a brand-new, empty $line, so it doesn't contain what you think it does.
Always use strict; and use warnings;!
They flag mistakes like this one. Remove the my, and run the test where $line actually holds a value.

You should test $line inside the foreach loop:
my @lines = qx(@cmdargs);
foreach my $line (@lines) {
print "$line";
if ($line =~ m/(FileNotFoundException)/) {
print "We have an ERROR: $line\n";
}
}
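A minimal sketch of the same check wrapped in a helper, assuming the command output is already in a list. The sub name find_errors and the sample lines are illustrative only; they stand in for whatever qx(@cmdargs) returns.

```perl
use strict;
use warnings;

# Hypothetical helper: return every line that mentions the exception class.
sub find_errors {
    my @lines = @_;
    return grep { /FileNotFoundException/ } @lines;
}

# Sample data standing in for the output of qx(@cmdargs).
my @lines = (
    qq{<?xml version="1.0" encoding="UTF-8"?>\n},
    qq{<HttpRemoteException path="/proj/feed/abc" class="java.io.FileNotFoundException" message="/proj/feed/abc: No such file or directory."/>\n},
);

for my $line (find_errors(@lines)) {
    print "We have an ERROR: $line";
}
```

Because $line is declared by the for loop, it always holds the line being tested, which is what the original if statement was missing.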

Related

Perl script - Confusing error

When I run this code, I am purely trying to get all the lines containing the word "that" in them. Sounds easy enough. But when I run it, I get a list of matches that contain the word "that" but only at the end of the line. I don't know why it's coming out like this and I have been going crazy trying to solve it. I am currently getting an output of 268 total matches, and the output I need is only 13. Please advise!
#!/usr/bin/perl -w
#Usage: conc.shift.pl textfile word
open (FH, "$ARGV[0]") || die "cannot open";
@array = (1,2,3,4,5);
$count = 0;
while($line = <FH>) {
chomp $line;
shift @array;
push(@array, $line);
$count++;
if ($line =~ /that/)
{
$output = join(" ",@array);
print "$output \n";
}
}
print "Total matches: $count\n";
Don't you want to increment your $count variable only if the line contains "that", i.e.:
if ($line =~ /that/) {
$count++;
instead of incrementing the counter before checking if $line contains "that", as you have it:
$count++;
if ($line =~ /that/) {
Similarly, I suspect that your push() and join() calls, for stashing a matching line in @array, should also be within the if block, only executed if the line contains "that".
Hope this helps!
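A small sketch of counting only the matching lines, assuming the lines are already in memory. The helper name count_matches and the sample sentences are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count only the lines that match, not every line read.
sub count_matches {
    my ($pattern, @lines) = @_;
    my $count = 0;
    for my $line (@lines) {
        $count++ if $line =~ /$pattern/;
    }
    return $count;
}

my @sample = ("I know that", "nothing here", "that is all");
print "Total matches: ", count_matches(qr/that/, @sample), "\n";
```

The counter is incremented as part of the match test, so lines without "that" never contribute to the total.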

Add a line after every string match

I have a sample file here http://pastebin.com/m5m40nGF
What I want to do is add a line after every instance of protein_id.
protein_id always has the same pattern:
TAB-TAB-TAB-protein_id-TAB-gnl|CorradiLab|M715_#SOME_NUMBER
What I need to do is to add this after every line of protein_id:
TAB-TAB-TAB-transcript_id-TAB-gnl|CorradiLab|M715_mRNA_#SOME_NUMBER
The catch is that #SOME_NUMBER has to stay the same.
In the first case, it would look like this:
94 1476 CDS
protein_id gnl|CorradiLab|M715_ECU01_0190
transcript_id gnl|CorradiLab|M715_mRNA_ECU01_0190
product serine hydroxymethyltransferase
label serine hydroxymethyltransferase
Thanks! Adrian
I tried a perl solution, but I get an error.
open(IN, $in); while(<IN>){
print $_;
if ($_ ~= /gnl\|CorradiLab\|/) {
$_ =~ s/tprotein_id/transcript_id/;
print $_;
}
}
Error:
syntax error at test.pl line 3, near "$_ ~"
syntax error at test.pl line 7, near "}"
Execution of test.pl aborted due to compilation errors.
The following perl script worked
my $in=shift;
open(IN, $in); while(<IN>){
print $_;
if ($_ =~ /gnl\|CorradiLab\|/) {
my $tmp = $_;
$tmp =~ s/protein_id/transcript_id/;
print $tmp;
}
}
Offering an update on existing answer because I feel it can be improved further:
Generally - the precise problem in the OP is this line:
if ($_ ~= /gnl\|CorradiLab\|/) {
Because you've got ~= not =~. That's what syntax error at test.pl line 3, near "$_ ~" is trying to tell you.
I would suggest a few improvements to this script:
my $in=shift;
open(IN, $in); while(<IN>){
print $_;
if ($_ =~ /gnl\|CorradiLab\|/) {
my $tmp = $_;
$tmp =~ s/protein_id/transcript_id/;
print $tmp;
}
}
while ( my $tmp = <IN> ) { skips the need to assign $_.
3 argument open with lexical filehandle is preferable. E.g. open ( my $in, "<", "$input_filename" ) or die $!; (You should test whether the open worked too)
Explicit open may well be unnecessary if you're just reading a filename from command line. Using <> either reads filenames (opening and processing) or STDIN, which means your script becomes a bit more versatile.
Thus I would rewrite as:
#!/usr/bin/perl
use strict;
use warnings;
while ( my $line = <> ) {
print $line;
if ( $line =~ /gnl\|CorradiLab\|/ ) {
$line =~ s/protein_id/transcript_id/;
print $line;
}
}
Or alternatively:
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
print;
if (m/gnl\|CorradiLab\|/) {
s/protein_id/transcript_id/;
print;
}
}
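Note that the versions above only swap protein_id for transcript_id; they do not insert the mRNA_ prefix that the question asks for. A hedged sketch of that extra step follows, assuming every ID starts with gnl|CorradiLab|M715_ and the line begins with three tabs as described in the question. The helper name transcript_line and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given a protein_id line, build the matching
# transcript_id line; return undef (in scalar context) for any other line.
sub transcript_line {
    my ($line) = @_;
    return unless $line =~ /^\t\t\tprotein_id\tgnl\|CorradiLab\|M715_/;
    (my $new = $line) =~ s/protein_id/transcript_id/;
    $new =~ s/M715_/M715_mRNA_/;    # the rest of the ID stays unchanged
    return $new;
}

# Sample lines standing in for the input file.
my @sample = (
    "94\t1476\tCDS\n",
    "\t\t\tprotein_id\tgnl|CorradiLab|M715_ECU01_0190\n",
    "\t\t\tproduct\tserine hydroxymethyltransferase\n",
);

for my $line (@sample) {
    print $line;
    my $extra = transcript_line($line);
    print $extra if defined $extra;
}
```

To process real files, the for loop over @sample could be replaced with while ( my $line = <> ), as in the scripts above.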

perl script miscounting because of empty lines

The script below basically captures the second column and sums its values. The only minor issue is that the file has empty lines at the end (that's how the values are exported), and because of these empty lines the script miscounts. Any ideas, please? Thanks.
my $sum_column_b = 0;
open my $file, "<", "file_to_count.txt" or die($!);
while( my $line = <$file>) {
$line =~ m/\s+(\d+)/; #regexpr to catch second column values
$sum_column_b += $1;
}
print $sum_column_b, "\n";
The main issue has already been established: you use $1 without tying it to a successful regex match, so you add values when you should not. This is an alternative solution:
$sum_column_b += $1 if $line =~ m/\s+(\d+)/;
Typically, you should never use $1 unless you check that the regex you expect it to come from succeeded. Use either something like this:
if ($line =~ /(\d+)/) {
$sum += $1;
}
Or use direct assignment to a variable:
my ($num) = $line =~ /(\d+)/;
$sum += $num;
Note that you need to use list context by adding parentheses around the variable, or the regex will simply return 1 for success. Also note that, like Borodin says, this will give an undefined value when the match fails, and you must add code to check for that.
This can be handy when capturing several values:
my @nums = $line =~ /(\d+)/g;
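A sketch of the direct-assignment form with the undefined-value check included. The helper name sum_second_column and the sample input are assumptions for illustration; the pattern is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: sum the number in the second column of each line,
# ignoring lines where the pattern does not match (such as empty lines).
sub sum_second_column {
    my @lines = @_;
    my $sum = 0;
    for my $line (@lines) {
        my ($num) = $line =~ /\s+(\d+)/;    # list context: $num is the capture
        next unless defined $num;           # match failed, nothing to add
        $sum += $num;
    }
    return $sum;
}

print sum_second_column("a 10\n", "b 20\n", "\n", "\n"), "\n";
```

Since $num is a fresh variable on every iteration, a failed match cannot silently reuse the value from an earlier line the way $1 can.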
The main problem is that if the regex does not match, then $1 will hold the value it received in the previous successful match. So every empty line will cause the previous line to be counted again.
An improvement would be:
my $sum_column_b = 0;
open my $file, "<", "file_to_count.txt" or die($!);
while( my $line = <$file>) {
next if $line =~ /^\s*$/; # skip "empty" lines
# ... maybe skip other known invalid lines
if ($line =~ m/\s+(\d+)/) { #regexpr to catch second column values
$sum_column_b += $1;
} else {
warn "problematic line '$line'\n"; # report invalid lines
}
}
print $sum_column_b, "\n";
The else block is of course optional, but it can help you notice invalid data.
Try putting this line just after the while line:
next if ( $line =~ /^$/ );
Basically, loop around to the next line if the current line has no content.
#!/usr/bin/perl
use warnings;
use strict;
my $sum_column_b = 0;
open my $file, "<", "file_to_count.txt" or die($!);
while (my $line = <$file>) {
next if $line =~ m/^\s*$/; # skip lines with no significant content
if ($line =~ m/\s+(\d+)/) {
$sum_column_b += $1;
}
}
print "$sum_column_b\n";

Perl - passing an array to subroutine

I'm in the process of learning Perl and am trying to write a script that takes a pattern and list of files as command line arguments and passes them to a subroutine, the subroutine then opens each file and prints the lines that match the pattern. The code below works; however, it stops after printing the lines from the first file and doesn't even touch the second file. What am I missing here?
#!/usr/bin/perl
use strict;
use warnings;
sub grep_file
{
my $pattern = shift;
my @files = shift;
foreach my $doc (@files)
{
open FILE, $doc;
while (my $line = <FILE>)
{
if ($line =~ m/$pattern/)
{
print $line;
}
}
}
}
grep_file @ARGV;
shift removes and returns only the first element of the parameter list (see: http://perldoc.perl.org/functions/shift.html).
So @files can only contain one value.
Try
sub foo
{
my $one = shift @_;
my @files = @_;
print $one."\n";
print @files;
}
foo(@ARGV);
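Applied to grep_file from the question, the same idea might look like the sketch below. It is a rework under assumptions, not the asker's code: it returns the matching lines instead of printing them inside the sub, and it uses a lexical filehandle with three-argument open.

```perl
use strict;
use warnings;

# Hypothetical rework of grep_file: take the pattern first, then collect the
# remaining arguments into @files, and return the matching lines.
sub grep_file {
    my ($pattern, @files) = @_;
    my @matches;
    for my $doc (@files) {
        open my $fh, '<', $doc or die "Cannot open $doc: $!";
        while ( my $line = <$fh> ) {
            push @matches, $line if $line =~ /$pattern/;
        }
        close $fh;
    }
    return @matches;
}

# Only run when a pattern and at least one file were given.
print grep_file(@ARGV) if @ARGV >= 2;
```

The list assignment my ($pattern, @files) = @_; puts the first argument in $pattern and everything after it in @files, so every file on the command line is processed.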
There is little reason to use a subroutine here. You are just putting the whole program inside a function and then calling it.
The empty <> operator will read from all the files in @ARGV in sequence, without you having to open them explicitly.
I would code your program like this
use strict;
use warnings;
my $pattern = shift;
$pattern = qr/$pattern/; # Compile the regex
while (<>) {
print if $_ =~ $pattern;
}

Perl: Searching a file

I am creating a perl script that takes in a file (example ./prog file)
I need to parse through the file and search for a string. This is what I thought would work, but it does not seem to work. The file contains 50 lines with one word per line
@array = <>;
print "Enter the word you what to match\n";
chomp($match = <STDIN>);
foreach $line (@array){
if($match eq $line){
print "The word is a match";
exit
}
}
You're chomping your user input, but not the lines from the file.
They can't match; one ends with \n the other does not. Getting rid of your chomp should solve the problem. (Or, adding a chomp($line) to your loop).
$match = <STDIN>;
or
foreach $line (@array){
chomp($line);
if($match eq $line){
print "The word is a match";
exit;
}
}
Edit in the hope that the OP notices his mistake from the comments below:
Changing eq to == doesn't "fix" anything; it breaks it. You need to use eq for string comparison. You need to do one of the above to fix your code.
$a = "foo\n";
$b = "bar";
print "yup\n" if ($a == $b);
Output:
yup
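The reason this prints yup is that == compares numerically, and both strings convert to 0. A short sketch contrasting the two operators, with the numeric warning silenced only around the == demonstration; the helper name same_word is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two words as strings after removing any
# trailing newline from each.
sub same_word {
    my ($left, $right) = @_;
    chomp($left, $right);
    return $left eq $right;
}

print "eq after chomp: match\n"    if same_word("foo\n", "foo");
print "eq after chomp: no match\n" unless same_word("foo\n", "bar");

{
    no warnings 'numeric';    # == on non-numeric strings would warn here
    my ($x, $y) = ("foo\n", "bar");
    print "== treats both as 0\n" if $x == $y;
}
```

With warnings enabled and not silenced, the == comparison would report that the arguments are not numeric, which is a useful hint that eq was intended.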