Perl: Debug for uninitialized s///? - perl

I'm having some trouble finding the problem with my program. I'm getting the error:
Use of uninitialized value in substitution (s///)
I realize this has been asked before, but those answers didn't help me. I realize $1 might be uninitialized, but I was wondering if you could help me figure out why.
Here's the problem part of the code:
$one_match_ref->{'sentence'} = $1 if ($line =~ /^Parsing \[sent. \d+ len. \d+\]: \[(.+)\]/);
$one_match_ref->{'sentence'} =~ s/, / /g;
EDIT: I have declared the $one_match_ref->{'sentence'} like so:
my $sentence;
$one_match_ref = {
chapternumber => $chapternumber_value,
sentencenumber => $sentencenumber_value,
sentence => $sentence, ##Get from parsed text: remove commas
grammar_relation => $grammar_relation_value, ##Get from parsed text: split?
arg1 => $argument1, ##Get from parsed text: first_dependencyword
arg2 => $argument2 ##Get from parsed text: second_dependencyword
};
But none of these variables have anything assigned to them.
My attempts:
A. If I put: if( defined ($one_match_ref->{'sentence'})) after the s///, it works. But this is cumbersome, and seems to be avoiding the problem instead of fixing it.
The last time I used that fix, it was because my loop had an "off-by-one" error; I don't think that is the case this time.
B. If I declare: my $sentence = ''; It prints, but with a lot of blank lines in between. How can I eliminate these?
EDIT: For interest and efficiency purposes: Is it better to use split to get what I want?
Thanks in advance for any help or advice. Let me know if you need an example of the file format.

Your code boils down to
my $sentence;
$one_match_ref = { sentence => $sentence };
() if ($line =~ /^Parsing \[sent. \d+ len. \d+\]: \[(.+)\]/);
$one_match_ref->{'sentence'} =~ s/, / /g;
You assign undef to $one_match_ref->{'sentence'}, then you try to remove the commas from it. That doesn't make any sense, thus the warning.
Maybe you want
my $sentence;
$one_match_ref = { sentence => $sentence };
if ($line =~ /^Parsing \[sent. \d+ len. \d+\]: \[(.+)\]/) {
$one_match_ref->{'sentence'} = $1;
$one_match_ref->{'sentence'} =~ s/, / /g;
}

I don't think it's $1 that's uninitialised here, but rather $one_match_ref->{'sentence'}.
That value is set if and only if the line matches the regex. Otherwise it's not touched at all.
My reasoning is that it's complaining during the substitute rather than the assignment. You could possibly fix it by simply setting $one_match_ref->{'sentence'} to a known value before those two lines (such as the empty string).
But this depends on what you're actually using those values for.
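As for the blank lines from attempt B, one option is to skip non-matching lines entirely instead of defaulting to an empty string. A minimal sketch, assuming made-up input lines (the real file format isn't shown in the question) and a hypothetical @sentences array to collect the results:

```perl
use strict;
use warnings;

# Hypothetical sample input; the real file format is not shown in the question.
my @lines = (
    'Parsing [sent. 1 len. 4]: [Hello, world, again]',
    'some other line that does not match',
    'Parsing [sent. 2 len. 3]: [One, two]',
);

my @sentences;
for my $line (@lines) {
    # Skip lines that do not match, so s/// never sees an undefined value.
    # The dots are escaped here so they only match a literal ".".
    next unless $line =~ /^Parsing \[sent\. \d+ len\. \d+\]: \[(.+)\]/;

    # Copy the capture, then strip the commas from the copy.
    (my $sentence = $1) =~ s/, / /g;
    push @sentences, $sentence;
}

print "$_\n" for @sentences;
```

Because nothing is stored or printed for lines that fail the match, there are no empty entries to filter out afterwards.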

Related

Why is my last line always output twice?

I have a uniprot document with a protein sequence as well as some metadata. I need to use perl to match the sequence and print it out but for some reason the last line always comes out two times. The code I wrote is here
#!usr/bin/perl
open (IN,'P30988.txt');
while (<IN>) {
if($_=~m /^\s+(\D+)/) { #this is the pattern I used to match the sequence in the document
$seq=$1;
$seq=~s/\s//g;} #removing the spaces from the sequence
print $seq;
}
I instead tried $seq.=$1; but it printed out the sequence 4.5 times. I'm sure I have made a mistake here, but I'm not sure what. Here is the input file: https://www.uniprot.org/uniprot/P30988.txt
Here is your code reformatted and extra whitespace added between operators to make it clearer what scope the statements are running in.
#!usr/bin/perl
open (IN,'P30988.txt');
while (<IN>) {
if ($_ =~ m /^\s+(\D+)/) {
$seq = $1;
$seq =~ s/\s//g;
}
print $seq;
}
The placement of the print command means that $seq will be printed for every line from the input file -- even those that don't match the regex.
I suspect you want this
#!/usr/bin/perl
open (IN,'P30988.txt') or die "Cannot open P30988.txt: $!";
while (<IN>) {
if ($_ =~ m /^\s+(\D+)/) {
$seq = $1;
$seq =~ s/\s//g;
# only print $seq for lines that match with /^\s+(\D+)/
# Also - added a newline to make it easier to debug
print $seq . "\n";
}
}
When I run that I get this
MRFTFTSRCLALFLLLNHPTPILPAFSNQTYPTIEPKPFLYVVGRKKMMDAQYKCYDRMQ
QLPAYQGEGPYCNRTWDGWLCWDDTPAGVLSYQFCPDYFPDFDPSEKVTKYCDEKGVWFK
HPENNRTWSNYTMCNAFTPEKLKNAYVLYYLAIVGHSLSIFTLVISLGIFVFFRSLGCQR
VTLHKNMFLTYILNSMIIIIHLVEVVPNGELVRRDPVSCKILHFFHQYMMACNYFWMLCE
GIYLHTLIVVAVFTEKQRLRWYYLLGWGFPLVPTTIHAITRAVYFNDNCWLSVETHLLYI
IHGPVMAALVVNFFFLLNIVRVLVTKMRETHEAESHMYLKAVKATMILVPLLGIQFVVFP
WRPSNKMLGKIYDYVMHSLIHFQGFFVATIYCFCNNEVQTTVKRQWAQFKIQWNQRWGRR
PSNRSARAAAAAAEAGDIPIYICHQELRNEPANNQGEESAEIIPLNIIEQESSA
You can simplify this a bit:
while (<IN>) {
next unless m/^\s/;
s/\s+//g;
print;
}
You want the lines that begin with whitespace, so immediately skip those that don't. Said another way, quickly reject things you don't want, which is different than accepting things you do want. This means that everything after the next knows it's dealing with a good line. Now the if disappears.
You don't need to get a capture ($1) to get the interesting text because the only other text in the line is the leading whitespace. That leading whitespace disappears when you remove all the whitespace. This gets rid of the if and the extra variable.
Finally, print what's left. Without an argument, print uses the value in the topic variable $_.
Now that's much more manageable. You escape that scoping issue with if causing the extra output because there's no scope to worry about.
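If the goal is the whole sequence as a single string rather than line-by-line output, the same loop can accumulate into a variable. A sketch, assuming a tiny made-up stand-in for the UniProt file, read through an in-memory filehandle so that it runs without P30988.txt:

```perl
use strict;
use warnings;

# Hypothetical, heavily shortened stand-in for the UniProt flat file.
my $data = <<'END';
ID   EXAMPLE
SQ   SEQUENCE   30 AA;
     MRFTFTSRCL ALFLLLNHPT
     PILPAFSNQT
//
END

# Open the string as a file so the loop looks the same as for a real file.
open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my $seq = '';
while (my $line = <$in>) {
    next unless $line =~ /^\s/;   # keep only lines that start with whitespace
    $line =~ s/\s+//g;            # drop spaces and the trailing newline
    $seq .= $line;                # append once per matching line
}
close $in;

print "$seq\n";
```

The append happens inside the same guard as the match, so a non-matching line can never add a stale value again.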

How can I convert a string number to a number in Perl? [duplicate]

Is there any way to replace multiple strings in a string?
For example, I have the string hello world what a lovely day and I want to replace what and lovely with something else..
$sentence = "hello world what a lovely day";
@list = ("what", "lovely"); # strings to replace
@replist = ("its", "bad"); # strings to replace with
($val = $sentence) =~ "tr/@list/@replist/d";
print "$val\n"; # should print "hello world its a bad day"..
Any ideas why it's not working?
Thanks.
First of all, tr doesn't work that way; consult perldoc perlop for details, but tr does transliteration, and is very different from substitution.
For this purpose, a more correct way to replace would be
# $val
$val =~ s/what/its/g;
$val =~ s/lovely/bad/g;
Note that "simultaneous" change is rather more difficult, but we could do it, for example,
%replacements = ("what" => "its", "lovely" => "bad");
($val = $sentence) =~ s/(@{[join "|", keys %replacements]})/$replacements{$1}/g;
(Escaping may be necessary to replace strings with metacharacters, of course.)
This is still only simultaneous in a very loose sense of the term, but it does, for most purposes, act as if the substitutions are done in one pass.
Also, it is more correct to replace "what" with "it's", rather than "its".
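The difference between successive and single-pass replacement shows up when one replacement produces text that another pattern matches. A small sketch with a hypothetical %swap table that exchanges two words:

```perl
use strict;
use warnings;

# Hypothetical replacement table that swaps two words.
my %swap = (cat => 'dog', dog => 'cat');

# Successive substitutions: the second one also rewrites the output of the first.
my $sequential = 'cat chases dog';
$sequential =~ s/cat/dog/g;   # now 'dog chases dog'
$sequential =~ s/dog/cat/g;   # now 'cat chases cat'

# Single pass: each position in the string is matched and replaced only once.
my $pattern = join '|', map { quotemeta } keys %swap;
(my $one_pass = 'cat chases dog') =~ s/($pattern)/$swap{$1}/g;

print "$sequential\n";   # cat chases cat
print "$one_pass\n";     # dog chases cat
```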
Well, mainly it's not working as tr///d has nothing to do with your request (tr/abc/12/d replaces a with 1, b with 2, and removes c). Also, by default arrays don't interpolate into regular expressions in a way that's useful for your task. Also, without something like a hash lookup or a subroutine call or other logic, you can't make decisions in the right-hand side of a s/// operation.
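To make the tr///d behaviour concrete, here is a short sketch on a made-up string. It works character by character, which is why it cannot replace whole words:

```perl
use strict;
use warnings;

my $text = 'aabbcc cab';   # made-up sample string

# tr maps single characters: a -> 1, b -> 2. With /d, the characters in the
# search list that have no counterpart (here c) are deleted.
(my $copy = $text) =~ tr/abc/12/d;

print "$copy\n";   # 1122 12
```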
To answer the question in the title, you can perform multiple replaces simultaneously--er, in convenient succession--in this manner:
#! /usr/bin/env perl
use common::sense;
my $sentence = "hello world what a lovely day";
for ($sentence) {
s/what/it's/;
s/lovely/bad/
}
say $sentence;
To do something more like what you attempt here:
#! /usr/bin/env perl
use common::sense;
my $sentence = "hello world what a lovely day";
my %replace = (
what => "it's",
lovely => 'bad'
);
$sentence =~ s/(@{[join '|', map { quotemeta($_) } keys %replace]})/$replace{$1}/g;
say $sentence;
If you'll be doing a lot of such replacements, 'compile' the regex first:
my $matchkey = qr/@{[join '|', map { quotemeta($_) } keys %replace]}/;
...
$sentence =~ s/($matchkey)/$replace{$1}/g;
EDIT:
And to expand on my remark about array interpolation, you can change $":
local $" = '|';
$sentence =~ s/(@{[keys %replace]})/$replace{$1}/g;
# --> $sentence =~ s/(what|lovely)/$replace{$1}/g;
Which doesn't improve things here, really, although it may if you already had the keys in an array:
local $" = '|';
$sentence =~ s/(@keys)/$replace{$1}/g;

Perl: Why would eq work, when =~ doesn't?

Working code:
if ( $check1 eq $search_key ...
Previous 'buggy' code:
if ( $check1 =~ /$search_key/ ...
The words (in $check1 and $search_key) should be the same, but why doesn't the 2nd one return true all the time? What is different about these?
$check1 is acquired through a split. $search_key is either inputted before ("word") or at runtime: (<>), both are then passed to a subroutine.
A further question: can I convert the following without any hidden problems?
if ($category_id eq "subj") {
I want to be able to say: =~ /subj/ so that "subject" would still remain true.
Thanks in advance.
$check1 =~ /$search_key/ doesn't work because any special characters in $search_key will be interpreted as a part of the regular expression.
Moreover, this really tests whether $check1 contains the substring $search_key. You really wanted $check1 =~ /^$search_key$/, although it's still incorrect because of the reason mentioned above.
Better stick with eq for exact string comparisons.
As mentioned before, special characters in $search_key will be interpreted. To prevent this, use \Q: if ( $check1 =~ /\Q$search_key/), which treats the content of $search_key as a literal string. You can use \E to end the quoting, for example: if ( $check1 =~ /\b\Q$search_key\E\b/).
This information is in perlre
Regarding your second question, if you just want plain substring matching, you can use the index function. Then replace
if ($category_id eq "subj") {
with
if (0 <= index $category_id, "subj") {
This is a case-sensitive match.
Addition for clarification: it will match asubj, subj, and even subjugate
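The four kinds of test discussed above can be compared side by side. A minimal sketch, assuming a hypothetical search key that contains a regex metacharacter (the variable names are for illustration only):

```perl
use strict;
use warnings;

my $search_key = 'a.b';   # hypothetical key; "." is a regex metacharacter

# Exact string comparison: no pattern interpretation at all.
my $exact = ('a.b' eq $search_key) ? 1 : 0;

# Unquoted interpolation: "." matches any character, so 'axb' matches too.
my $unquoted = ('axb' =~ /\A$search_key\z/) ? 1 : 0;

# \Q...\E quotes the metacharacters, so only a literal 'a.b' matches.
my $quoted = ('axb' =~ /\A\Q$search_key\E\z/) ? 1 : 0;

# Plain substring test with index: -1 means "not found".
my $substring = (index('subject', 'subj') >= 0) ? 1 : 0;

print "exact=$exact unquoted=$unquoted quoted=$quoted substring=$substring\n";
```

This prints exact=1 unquoted=1 quoted=0 substring=1, which shows why an unquoted key can match strings that eq would reject.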

How can I detect symbols using regular expressions in Perl?

How can I use a regular expression to check whether a word starts or ends with a symbol character, and how can I process the text within the symbols?
Example:
(text) or te-xt, or tex't. or text?
change it to
(<t>text</t>) or <t>te-xt</t>, or <t>tex't</t>. or <t>text</t>?
help me out?
Thanks
I assume that "word" means alphanumeric characters from your example? If you have a list of permitted characters which constitute a valid word, then this is enough:
my $string = "x1 .text1; 'text2 \"text3;\"";
$string =~ s/([a-zA-Z0-9]+)/<t>$1<\/t>/g;
# Add more to character class [a-zA-Z0-9] if needed
print "$string\n";
# OUTPUT: <t>x1</t> .<t>text1</t>; '<t>text2</t> "<t>text3</t>;"
UPDATE
Based on your example you seem to want to DELETE dashes and apostrophes, if you want to delete them globally (e.g. whether they are inside the word or not), before the first regex, you do
$string =~ s/['-]//g;
I am using DVK's approach here, but with a slight modification. The difference is that her/his code would also put the tags around all words that don't contain/are next to a symbol, which (according to the example given in the question) is not desired.
#!/usr/bin/perl
use strict;
use warnings;
sub modify {
my $input = shift;
my $text_char = 'a-zA-Z0-9\-\''; # characters that are considered text
# if there is no symbol, don't change anything
if ($input =~ /^[a-zA-Z0-9]+$/) {
return $input;
}
else {
$input =~ s/([$text_char]+)/<t>$1<\/t>/g;
return $input;
}
}
my $initial_string = "(text) or te-xt, or tex't. or text?";
my $expected_string = "(<t>text</t>) or <t>te-xt</t>, or <t>tex't</t>. or <t>text</t>?";
# version BEFORE edit 1:
#my @aux;
# take the initial string apart and process it one word at a time
#my @string_list = split/\s+/, $initial_string;
#
#foreach my $string (@string_list) {
# $string = modify($string);
# push @aux, $string;
#}
#
# put the string together again
#my $final_string = join(' ', @aux);
# ************ EDIT 1 version ************
my $final_string = join ' ', map { modify($_) } split/\s+/, $initial_string;
if ($final_string eq $expected_string) {
print "it worked\n";
}
This strikes me as a somewhat long-winded way of doing it, but it seemed quicker than drawing up a more sophisticated regex...
EDIT 1: I have incorporated the changes suggested by DVK (using map instead of foreach). Now the syntax highlighting is looking even worse than before; I hope it doesn't obscure anything...
This takes standard input, processes it, and prints the result on standard output.
while (<>) {
s {
( [a-zA-Z]+ ) # word
(?= [,.)?] ) # a symbol
}
{<t>$1</t>}gx ;
print ;
}
You might need to change the bit to match the concept of word.
I have used the x modifier to allow the regex to be spread over more than one line.
If the input is in a Perl variable, try
$string =~ s{
( [a-zA-Z]+ ) # word
(?= [,.)?] ) # a symbol
}
{<t>$1</t>}gx ;
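As one possible way of adjusting the concept of a word, the character class can be widened to include digits, apostrophes and hyphens. This is only a sketch of that adjustment, checked against the example string from the question, not a general solution:

```perl
use strict;
use warnings;

my $string   = "(text) or te-xt, or tex't. or text?";
my $expected = "(<t>text</t>) or <t>te-xt</t>, or <t>tex't</t>. or <t>text</t>?";

$string =~ s{
    ( [a-zA-Z0-9'-]+ )   # word, now allowing digits, apostrophes and hyphens
    (?= [,.)?] )         # must be followed by one of these symbols
}
{<t>$1</t>}gx;

print "$string\n";
print "matches the expected output\n" if $string eq $expected;
```

Words such as "or" are left alone because they are followed by a space rather than one of the listed symbols.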

What is wrong with this Perl code?

$value = $list[1] ~ s/\D//g;
syntax error at try1.pl line 53, near "] ~"
Execution of try1.pl aborted due to compilation errors.
I am trying to extract the digits from the second element of @list, and store them in $value.
You mean =~, not ~. ~ is a unary bitwise negation operator.
A couple of ways to do this:
($value) = $list[1] =~ /(\d+)/;
Both sets of parens are important; only if there are capturing parentheses does the match operation return actual content instead of just an indication of success, and then only in list context (provided by the list-assign operator ()=).
Or the common idiom of copy and then modify:
($value = $list[1]) =~ s/\D//g;
Maybe you wanted the =~ operator?
P.S. note that $value will not get assigned the resulting string (the string itself is changed in place). $value will get assigned the number of substitutions that were made
You said in a comment that you are trying to get rid of non-digits. It looks like you are trying to preserve the old value and get the modified value in a new variable. The Perl idiom for that is:
( my $new = $old ) =~ s/\D//g;
And wanted \digits not non-\Digits. And have a superfluous s/ubstitute operator where a match makes more sense.
if ($list[1] =~ /(\d+)/) {
$value = $1;
}
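The approaches from the answers above give different results on the same input, so it helps to run them side by side. A sketch with a made-up @list (the real data isn't shown in the question); all variable names besides @list are for illustration only:

```perl
use strict;
use warnings;

my @list = ('abc', 'item 42b7');   # made-up data

# Match in list context: captures the first run of digits only.
my ($first_digits) = $list[1] =~ /(\d+)/;

# Copy, then modify the copy: removes every non-digit.
(my $all_digits = $list[1]) =~ s/\D//g;

# Assigning the result of s/// directly yields the number of substitutions,
# not the modified string.
my $count = (my $scratch = $list[1]) =~ s/\D//g;

print "$first_digits $all_digits $count\n";   # 42 427 6
```

Which one is wanted depends on whether the digits should come from the first run only or from the whole string.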