String matching in perl (if statement in for loop) [closed]


Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Closed 9 years ago.
I have the following code:
#!C:\Perl64\bin -w
#use strict; use warnings;
init_words();
print "What is your name Mr. \n";
$name = <STDIN>;
chomp ($name);
if ($name =~ /^randal\b/i){
print "Hello, Randal, How are you doing \n";
} else {
print "Hello, $name!\n";
print "Tell the secret word\n";
$guess = <STDIN>;
chomp ($guess);
while (!good_word ($name,$guess)) {
print "Wrong, please try again\n";
$guess = <STDIN>;
chomp ($guess);
}
}
sub init_words {
open (WORDSLIST, "wordslist.txt") || die "can't open wordslist: $!";
$k = 1;
$a = 0;
$b = 0;
while (defined ($name = <WORDSLIST>)) {
if ($k % 2 == 0) {
chomp ($name);
$words1[$a] = $name;
++$k;
++$a;
} else {
chomp ($name);
$words2[$b] = $name;
++$k;
++$b;
}
}
close (WORDSLIST) || die "couldn't close wordlist: $!";
}
sub good_word {
my ($somename, $someguess) = @_;
$somename =~ s/\W.*//;
$somename =~ tr/A-Z/a-z/;
if ($somename eq "randal") {
return 1;
} else {
#$n = 0;
#words1 has secret words.
#words2 has names.
$t = scalar @words1;
$u = scalar @words2;
print "the words1 array is @words1 \n";
print "the words2 array is @words2 \n";
for ($d = 0; $d < $u; $d++) {
#print "currently name in array is @words2[$d]\n";
print "The value of somename is $somename \n";
$delta = $words2[$d];
print "The value of delta is $delta";
#use strict; use warnings;
if ($delta eq '$somename') {
print "test";
return 1;
}
}
#print "The final value of d is $d";
#print " The final value of array is @words1[$d]";
#if ("groucho" eq $someguess) {
#return 1;}
#else{
#while ($n < $t){
#if (@words1[$n] eq $someguess) {
#return 1;}
#else { ++$n};
}
The goal of the code is to read a word list from wordslist.txt and split it into two sublists, @words1 and @words2. The user is asked for a name and then for a secret word. The code should look for the name in @words2 and, if a match is found, exit (after printing test).
For some reason it is not working as expected. I did some basic debugging and everything looks OK, but in the function good_word the if statement inside the for loop never evaluates to true, although my debug output shows that $somename and $delta hold the same value.
Any suggestions?

Change
if ($delta eq '$somename'){
to
if ($delta eq $somename){
Perl strings with double quotes (") will interpolate variables like $somename but strings with single quotes (') will not do that.
Reference to documentation about that: http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators
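To make the difference concrete, here is a minimal sketch. The helper name quote_demo and the sample value 'fred' are assumptions made for illustration only; they do not come from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the same text once with single quotes and
# once with double quotes so the two results can be compared.
sub quote_demo {
    my ($somename) = @_;
    my $single = '$somename';   # literal text: dollar sign followed by "somename"
    my $double = "$somename";   # interpolated: the current value of the variable
    return ($single, $double);
}

my ($single, $double) = quote_demo('fred');
print "single quotes give: $single\n";
print "double quotes give: $double\n";
print "comparison with the bare variable: ",
    ('fred' eq $double ? "match" : "no match"), "\n";
```

When comparing against a variable, no quotes are needed at all: $delta eq $somename compares the two values directly.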

Related

How can I calculate the geometric center of a protein in Perl?

I have a PDB file which contains information about a specific protein. Among other things, it holds the positions of the different atoms (xyz coordinates).
The file is https://files.rcsb.org/view/6U9D.pdb . I want to use it to calculate the geometric center of the atoms. In theory I know what I need to do, but the script I wrote does not seem to work.
The first part of the program, before the for loop, is another part of the assignment which requires me to read the sequence and convert it from the 3 letter nomenclature to the 1 letter one. The part that interests me is the for loop until the end. I tried to pattern match in order to isolate the XYZ coordinates. Then I used a counter that I had set up in the beginning which is the $k variable. When I check the output on cygwin the only output I get is the sequence 0 0 0 instead of the sum of each dimension divided by $k. Any clue what has gone wrong?
$k=0;
open (IN, '6U9D.pdb.txt');
%amino_acid_conversion = (
ALA=>'A',TYR=>'Y',MET=>'M',LEU=>'L',CYS=>'C',
GLY=>'G',ARG=>'R',ASN=>'N',ASP=>'D',GLN=>'Q',
GLU=>'E',HIS=>'H',TRP=>'W',LYS=>'K',PHE=>'F',
PRO=>'P',SER=>'S',THR=>'T',ILE=>'I',VAL=>'V'
);
while (<IN>) {
if ($_=~m/HEADER\s+(.*)/){
print ">$1\n"; }
if ($_=~m/^SEQRES\s+\d+\s+\w+\s+\d+\s+(.*)/){
$seq.=$1;
$seq=~s/ //g;
}
}
for ($i=0;$i<=length $seq; $i+=3) {
print "$amino_acid_conversion{substr($seq,$i,3)}";
if ($_=~m/^ATOM\s+\d+\s+\w+\s+\w+\s+\w+\s+\d+\s+(\S+)\s+(\S+)\s+(\S+)/) {
$x+=$1; $y+=$2; $z+=$3; $k++;
}
}
print "\n";
#print $k;
$xgk=($x/$k); $ygk=($y/$k); $zgk=($z/$k);
print "$xgk $ygk $zgk \n";

I do not know bioinformatics but it seems like you should do something like this:
use feature qw(say);
use strict;
use warnings;
my $fn = '6U9D.pdb';
open ( my $IN, '<', $fn ) or die "Could not open file '$fn': $!";
my $seq = '';
my $x = 0;
my $y = 0;
my $z = 0;
my $k = 0;
while (<$IN>) {
if ($_ =~ m/HEADER\s+(.*)/) {
say ">$1";
}
if ($_=~m/^SEQRES\s+\d+\s+\w+\s+\d+\s+(.*)/){
$seq .= $1;
}
if ($_ =~ m/^ATOM\s+\d+\s+\w+\s+\w+\s+\w+\s+\d+\s+(\S+)\s+(\S+)\s+(\S+)/) {
$x+=$1; $y+=$2; $z+=$3; $k++;
}
}
close $IN;
$seq =~ s/ //g;
my %amino_acid_conversion = (
ALA=>'A',TYR=>'Y',MET=>'M',LEU=>'L',CYS=>'C',
GLY=>'G',ARG=>'R',ASN=>'N',ASP=>'D',GLN=>'Q',
GLU=>'E',HIS=>'H',TRP=>'W',LYS=>'K',PHE=>'F',
PRO=>'P',SER=>'S',THR=>'T',ILE=>'I',VAL=>'V'
);
my %unknown_keys;
my $conversion = '';
say "Sequence length: ", length $seq;
for (my $i=0; $i < length $seq; $i += 3) {
my $key = substr $seq, $i, 3;
if (exists $amino_acid_conversion{$key}) {
$conversion.= $amino_acid_conversion{$key};
}
else {
$unknown_keys{$key}++;
}
}
say "Conversion: $conversion";
say "Unknown keys: ", join ",", keys %unknown_keys;
say "Number of atoms: ", $k;
my $xgk=($x/$k);
my $ygk=($y/$k);
my $zgk=($z/$k);
say "Geometric center: $xgk $ygk $zgk";
This gives me the following output:
[...]
Number of atoms: 76015
Geometric center: 290.744642162734 69.196842162731 136.395842938893
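The regex above relies on whitespace between the fields. PDB ATOM records are fixed-width, so an alternative is to slice the coordinate columns directly. The following is only a sketch, assuming the standard PDB column layout (x in columns 31-38, y in 39-46, z in 47-54); geometric_center is a hypothetical helper name and is not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: averages the x, y, z columns of all ATOM records.
# Assumes the standard fixed-width PDB layout. Returns an empty list when
# no ATOM record is found, so the caller never divides by zero.
sub geometric_center {
    my @lines = @_;
    my ($x, $y, $z, $k) = (0, 0, 0, 0);
    for my $line (@lines) {
        next unless substr($line, 0, 6) eq 'ATOM  ';   # record name, columns 1-6
        next if length($line) < 54;                    # truncated record
        $x += substr($line, 30, 8);                    # columns 31-38
        $y += substr($line, 38, 8);                    # columns 39-46
        $z += substr($line, 46, 8);                    # columns 47-54
        $k++;
    }
    return unless $k;
    return ($x / $k, $y / $k, $z / $k);
}
```

It could be called with the lines of the file, for example geometric_center(<$IN>) on a freshly opened handle. Slicing by column keeps working when two adjacent fields have no space between them, which a \s+ based pattern would miss.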

Perl: undefined value as a HASH reference [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 months ago.
I have inherited an older script that uses hash references that I don't understand. It results in:
Can't use an undefined value as a HASH reference at
./make_quar_dbfile.pl line 65.
63 my $bucket = sprintf('%02x', $i);
64 my $file = sprintf('%s/%02x.db', $qdir, $i);
65 %{$hashes{$bucket}} ? 1 : next;
66 tie (my %hash, 'DB_File', $file, O_RDWR, 0600) || die "Can't open db file: $! \n ";
67 %hash = %{$hashes{$bucket}};
68 untie %hash;
The script reads through a number of gzipped emails to identify the sender, recipient, subject, date and so on, then writes that info to a DB_File hash.
This script used to work with older versions of Perl, but it looks like it is no longer compatible with current ones.
I'd really like to understand how this works, but I don't fully understand reference/dereference, why it's even necessary here, and the %{$var} notation. All of the references I've studied talk about hash references in terms of $hash_ref = \%author; not %hash_ref = %{$author}, for example.
Ideas on how to get this to work with hash references would be greatly appreciated.
#!/usr/bin/perl -w
use DB_File;
use File::Basename qw(basename);
use vars qw($verbose);
use strict;
use warnings;
sub DBG($);
$verbose = shift || 1;
my $qdir = '/var/spool/amavisd/qdb';
my $source_dir = '/var/spool/amavisd/quarantine';
my $uid = getpwnam('amavis');
my $gid = getgrnam('amavis');
my %hashes = ( );
my $me = basename($0);
my $version = '1.9';
my $steps = 100;
my $cnt = 0;
DBG("- Creating initial database files...");
for (my $i = 0; $i < 256; $i++) {
my $file = sprintf('%s/%02x.db', $qdir, $i);
unlink $file || DBG("Could not unlink $file to empty db: $! \n");
tie (my %hash, "DB_File", $file, O_CREAT, 0600) || die "Can't open db file: $! \n";
untie %hash;
chown($uid, $gid, $file) || die "Unable to set attributes on file: $! \n";
}
DBG("done\n");
opendir SOURCEDIR, $source_dir || die "Cannot open $source_dir: $! \n";
DBG("- Building hashes... ");
foreach my $f (sort readdir SOURCEDIR) {
next if ($f eq "." || $f eq "..");
if ($f =~ m/^(spam|virus)\-([^\-]+)\-([^\-]+)(\.gz)?/) {
my $type = $1;
my $key = $3;
my $bucket = substr($key, 0, 2);
my $d = $2;
my $subj = '';
my $to = '';
my $from = '';
my $size = '';
my $score = '0.0';
if (($cnt % $steps) == 0) { DBG(sprintf("\e[8D%-8d", $cnt)); } $cnt++;
if ($f =~ /\.gz$/ && open IN, "zcat $source_dir/$f |") {
while(<IN>) {
last if ($_ eq "\n");
$subj = $1 if (/^Subject:\s*(.*)$/);
$to = $1 if (/^To:\s*(.*)$/);
$from = $1 if (/^From:\s*(.*)$/);
$score = $1 if (/score=(\d{1,3}\.\d)/);
}
close IN;
$to =~ s/^.*\<(.*)\>.*$/$1/;
$from =~ s/^.*\<(.*)\>.*$/$1/;
$size = (stat("$source_dir/$f"))[7];
$hashes{$bucket}->{$f} = "$type\t$d\t$size\t$from\t$to\t$subj\t$score";
}
}
}
closedir SOURCEDIR;
DBG("...done\n\n- Populating database files...");
for (my $i = 0; $i < 256; $i++) {
my $bucket = sprintf('%02x', $i);
my $file = sprintf('%s/%02x.db', $qdir, $i);
%{$hashes{$bucket}} ? 1 : next;
tie (my %hash, 'DB_File', $file, O_RDWR, 0600) || die "Can't open db file: $! \n ";
%hash = %{$hashes{$bucket}};
untie %hash;
}
exit(0);
sub DBG($) { my $msg = shift; print $msg if ($verbose); }

What is $hash{$key}? A value associated with the (value of) $key, which must be a scalar. So we get the $value out of my %hash = ( $key => $value ).
That's a string, or a number. Or a filehandle. Or, an array reference, or a hash reference. (Or an object perhaps, normally a blessed hash reference.) They are all scalars, single-valued things, and as such are a legitimate value in a hash.
The syntax %{ ... } de-references a hash reference† so judged by %{ $hashes{$bucket} } that code expects there to be a hash reference. So the error says that there is actually nothing in %hashes for that value of a would-be key ($bucket), so it cannot "de-reference" it. There is either no key that is the value of $bucket at that point in the loop, or there is such a key but it has never been assigned anything.
So go debug it. Add printing statements through the loops so you can see what values are there and what they are, and which ones aren't even as they are assumed to be. Hard to tell what fails without running that program.
Then, the line %{$hashes{$bucket}} ? 1 : next; is a little silly. The condition of the ternary operator evaluates to a boolean, "true" (not undefined, not 0, not empty string '') or false. So it tests whether $hashes{$bucket} has a hashref with at least some keys, and if it does then it returns 1; so, the for loop continues. Otherwise it skips to the next iteration.
Well, then skip to next if there is not a (non-empty) hashref there:
next if not defined $hashes{$bucket} or not %{ $hashes{$bucket} };
Note how we first test whether there is a defined value for that key, and only then attempt to dereference it.
† Whatever expression is inside the curlies must evaluate to a hash reference. (If it is something else, like a number or a string, the code would still exit with an error, but a different one.)
So, in this code, the hash %hashes must have a key that is the value of $bucket at that point, and the value for that key must be a hash reference. Then, the ternary operator tests whether the hash obtained from that hash reference has any keys.
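The check described above can be wrapped in a small function. This is a sketch only: has_entries is a hypothetical helper, and the sample %hashes below stands in for the data the real script builds from the quarantine directory.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when the argument is a hash reference
# that holds at least one key. It never dereferences an undefined value.
sub has_entries {
    my ($ref) = @_;
    return 0 unless defined $ref;           # nothing stored for this bucket
    return 0 unless ref($ref) eq 'HASH';    # stored value is not a hashref
    return keys(%$ref) ? 1 : 0;             # empty hashrefs count as "no entries"
}

# Made-up sample data: one filled bucket, one empty bucket, one missing bucket.
my %hashes = ( '0a' => { 'spam-x' => 'data' }, '0b' => {} );
for my $bucket ('0a', '0b', '0c') {
    next unless has_entries($hashes{$bucket});
    print "bucket $bucket would be written to its db file\n";
}
```

In the script, the line %{$hashes{$bucket}} ? 1 : next; could then become next unless has_entries($hashes{$bucket});.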

You need to understand references first. Here is a short how-to:
#!/usr/bin/perl
use strict; use warnings;
use feature qw/say/;
use Data::Dumper;
my $var = {}; # I create a HASH ref explicitly
say "I created a HASH ref explicitly:";
say ref($var);
say "Now, let's add any type of content:";
say "Adding a ARRAY:";
push @{ $var->{arr} }, (0..5);
say Dumper $var;
say "Now, I add a new HASH";
$var->{new_hash} = {
foo => "value",
bar => "other"
};
say Dumper $var;
say 'To access the data in $var without Data::Dumper, we need to dereference what we want to retrieve';
say "to retrieve a HASH ref, we need to dereference with %:";
while (my ($key, $value) = each %{ $var->{new_hash} }) {
say "key=$key value=$value";
}
say "To retrieve the ARRAY ref:";
say join "\n", @{ $var->{arr} };
Output
I created a HASH ref explicitly:
HASH
Now, let's add any type of content:
Adding a ARRAY:
$VAR1 = {
'arr' => [
0,
1,
2,
3,
4,
5
]
};
Now, I add a new HASH
$VAR1 = {
'new_hash' => {
'foo' => 'value',
'bar' => 'other'
},
'arr' => [
0,
1,
2,
3,
4,
5
]
};
To access the data in $var without Data::Dumper, we need to dereference what we want to retrieve
to retrieve a HASH ref, we need to dereference with %:
key=foo value=value
key=bar value=other
To retrieve the ARRAY ref:
0
1
2
3
4
5
Now with your code, instead of
%{$hashes{$bucket}} ? 1 : next;
You should test the HASH ref first, because Perl says it is undefined. Let's debug a bit:
use Data::Dumper;
print Dumper \%hashes;
print "bucket=$bucket\n";
if (defined $hashes{$bucket}) {
print "Defined hash ref\n";
}
else {
print "NOT defined hash ref\n";
}

How to match numbers that lie outside the range [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I am trying to print the values from the ranges in @arr3 that lie outside every range in @arr4, but I am not getting the desired output. Please suggest modifications to the following code so that the output is 1,2,8,13 (without repeated values).
File 1: result
1..5
5..10
10..15
File 2: annotation
3..7
9..12
14..17
Code:
#!/usr/bin/perl
open($inp1, "<result") or die "not found";
open($inp2, "<annotation") or die "not found";
my @arr3 = <$inp1>;
my @arr4 = <$inp2>;
foreach my $line1 (@arr4) {
foreach my $line2 (@arr3) {
my ($from1, $to1) = split(/\.\./, $line1);
my ($from2, $to2) = split(/\.\./, $line2);
for (my $i = $from1 ; $i <= $to1 ; $i++) {
for (my $j = $from2 ; $j <= $to2 ; $j++) {
$res = grep(/$i/, @result); #to avoid repetition
if ($i != $j && $res == 0) {
print "$i \n";
push(@result, $i);
}
}
}
}
}

Try this:
#!/usr/bin/perl
use strict;
open (my $inp1,"<result.txt") or die "not found";
open (my $inp2,"<annotation.txt") or die "not found";
my @result;
my @annotation;
foreach my $line2 (<$inp2>) {
    my ($from2,$to2)=split(/\.\./,$line2);
    @annotation = (@annotation, $from2..$to2);
}
print join(",",@annotation),"\n";
my %in_range = map {$_=> 1} @annotation;
foreach my $line1 (<$inp1>) {
    my ($from1,$to1)=split(/\.\./,$line1);
    @result = (@result, $from1..$to1);
}
print join(",",@result),"\n";
my %tmp_hash = map {$_=> 1} @result;
my @unique = sort {$a <=> $b} keys %tmp_hash;
print join(",",@unique),"\n";
my @out_of_range = grep {!$in_range{$_}} @unique;
print join(",",@out_of_range),"\n";
The print statements are temporary, of course, to help show what's going on when you run this. The basic idea is you use one hash to eliminate duplicate numbers in your "result", and another hash to indicate which ones are in the "annotations".
If you used pattern-matching rather than split then I think it would be a little easier to make this ignore extra lines of input that aren't ranges of numbers, in case you ever have input files with a few "extra" lines that you need to skip over.
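The pattern-matching idea could look roughly like this. It is a sketch, not a drop-in replacement: parse_range, expand_ranges and out_of_range are hypothetical helper names, and the input lines are passed in as arrays rather than read from the files.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($from, $to) for a line like "3..7",
# or an empty list for anything that is not a range of numbers.
sub parse_range {
    my ($line) = @_;
    return unless defined $line;
    return $line =~ /^\s*(\d+)\.\.(\d+)\s*$/ ? ($1, $2) : ();
}

# Expands every valid range line into its numbers; other lines are skipped.
sub expand_ranges {
    my @numbers;
    for my $line (@_) {
        my ($from, $to) = parse_range($line) or next;
        push @numbers, $from .. $to;
    }
    return @numbers;
}

# Numbers covered by the result ranges but by none of the annotation ranges,
# sorted and without duplicates.
sub out_of_range {
    my ($result_lines, $annotation_lines) = @_;
    my %excluded = map { ($_ => 1) } expand_ranges(@$annotation_lines);
    my %seen;
    return sort { $a <=> $b }
           grep { !$excluded{$_} && !$seen{$_}++ }
           expand_ranges(@$result_lines);
}

my @result     = ("1..5\n", "5..10\n", "# not a range\n", "10..15\n");
my @annotation = ("3..7\n", "9..12\n", "14..17\n");
print join(",", out_of_range(\@result, \@annotation)), "\n";   # prints 1,2,8,13
```

The anchored pattern accepts only lines of the form digits, two dots, digits, so blank lines or comments in the input are ignored instead of producing warnings.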

If the contents of the files is under your control, you can make use of eval for parsing them. On the other hand, if there might be something else than what you specified, the following is dangerous to use.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use Data::Dumper;
open my $inc, '<', 'result';
open my $exc, '<', 'annotation';
my (%include, %exclude, @result);
while (<$inc>) { $include{$_} = 1 for eval $_ }
while (<$exc>) { $exclude{$_} = 1 for eval $_ }
for (sort {$a <=> $b} keys %include) {
    push @result, $_ unless $exclude{$_}
}
print Dumper \@result;
Returns:
$VAR1 = [ 1, 2, 8, 13 ];

The only major tool you need is a %seen style hash as represented in perlfaq4 - How can I remove duplicate elements from a list or array?
The following opens filehandles to string references, but obviously these could be replaced with the appropriate file names:
use strict;
use warnings;
use autodie;
my %seen;
open my $fh_fltr, '<', \ "3..7\n9..12\n14..17\n";
while (<$fh_fltr>) {
my ($from, $to) = split /\.{2}/;
$seen{$_}++ for $from .. $to;
}
my @result;
open my $fh_src, '<', \ "1..5\n5..10\n10..15\n";
while (<$fh_src>) {
    my ($from, $to) = split /\.{2}/;
    push @result, $_ for grep {!$seen{$_}++} $from .. $to;
}
print "@result\n";
Outputs:
1 2 8 13

Degeneracy of characters when searching for a specific sub-string

I have the following script which searches for specified substrings within an input string (a DNA sequence). I was wondering if anybody could help out with being able to specify degeneracy of specific characters. For example, instead of searching for GATC (or anything consisting solely of G's, T's, A's and C's), I could instead search for GRTNA where R = A or G and where N = A, G, C or T. I would need to be able to specify quite a few of these in a long list within the script. Many thanks for any help or tips!
use warnings;
use strict;
#User Input
my $usage = "Usage (OSX Terminal): perl <$0> <FASTA File> <Results Directory + Filename>\n";
#Reading formatted FASTA/FA files
sub read_fasta {
my ($in) = @_;
my $sequence = "";
while(<$in>) {
my $line = $_;
chomp($line);
if($line =~ /^>/){ next }
else { $sequence .= $line }
}
return(\$sequence);
}
#Scanning for restriction sites and length-output
open(my $in, "<", shift);
open(my $out, ">", shift);
my $DNA = read_fasta($in);
print "DNA is: \n $$DNA \n";
my $len = length($$DNA);
print "\n DNA Length is: $len \n";
my @pats=qw( GTTAAC );
for (@pats) {
    my $m = () = $$DNA =~ /$_/gi;
    print "\n Total DNA matches to $_ are: $m \n";
}
my $pat=join("|",@pats);
my @cutarr = split(/$pat/, $$DNA);
for (@cutarr) {
my $len = length($_);
print $out "$len \n";
}
close($out);
close($in);

GRTNA would correspond to the pattern G[AG]T[AGCT]A.
It looks like you could do this by writing
for (@pats) {
    s/R/[AG]/g;
    s/N/[AGCT]/g;
}
before
my $pat = join '|', @pats;
my @cutarr = split /$pat/, $$DNA;
but I'm not sure I can help you with the requirement that "I would need to be able to specify quite a few of these in a long list within the script". I think it would be best to put your sequences in a separate text file rather than embed the list directly into the program.
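For a longer list of degenerate sites, one option is a lookup table of the IUPAC nucleotide ambiguity codes instead of one substitution per letter. The sketch below assumes the sites are written with those standard codes; %IUPAC and site_to_regex are hypothetical names, and GRTNA is the example from the question.

```perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
my %IUPAC = (
    A => 'A',     C => 'C',     G => 'G',     T => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',
    B => '[CGT]', D => '[AGT]', H => '[ACT]', V => '[ACG]',
    N => '[AGCT]',
);

# Hypothetical helper: turns a degenerate site such as "GRTNA" into a
# regex string, and dies on a letter that is not an IUPAC code.
sub site_to_regex {
    my ($site) = @_;
    my $regex = '';
    for my $code (split //, uc $site) {
        die "Unknown IUPAC code '$code' in '$site'\n" unless exists $IUPAC{$code};
        $regex .= $IUPAC{$code};
    }
    return $regex;
}

my @pats = map { site_to_regex($_) } qw( GTTAAC GRTNA );
my $pat  = join '|', @pats;
print "Combined pattern: $pat\n";
```

The resulting $pat can be used in the split and in the match-counting loop as before. Reading the site list from a separate text file would then only mean replacing the qw( ... ) list.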
By the way, wouldn't it be simpler just to
return $sequence
from your read_fasta subroutine? Returning a reference just means you have to dereference it everywhere with $$DNA. I suggest that it should look like this
sub read_fasta {
    my ($fh) = @_;
    my $sequence;
    while (<$fh>) {
        unless (/^>/) {
            chomp;
            $sequence .= $_;
        }
    }
    return $sequence;
}

exists statement not working perl

This is a homework assignment. I am not looking for the "code to make it work" more looking for a point in the right direction on where my logic is wrong.
use strict;
use warnings;
#rot13 sub for passwords
sub rot13{
my $result;
chomp(my $input = <STDIN>);
# all has to be lower case
my $lower = lc $input;
my $leng = length $lower;
for(my $i = 0; $i < $leng; $i++){
my $temp = substr ($lower,$i,1);
my $con = ord $temp;
if($con >= '55'){
if($con >= '110'){
$con -= 13;
}
else{
$con += 13;
}
}
$result = $result . chr $con;
}
return $result;
};
#opening a file specified by the user for input and reading it
#into an array then closing file.
open FILE, $ARGV[0] or die "cannot open input.txt";
my @input = <FILE>;
close FILE;
my (@username,@password,@name,@uid,@shell,@ssn,@dir,@group,@gid);
my $ui = 100;
foreach(@input){
my ($nam, $ss, $gro) = split ('/', $_);
chomp ($gro);
$nam= lc $nam;
I created a hash so that I can use the exists function: if the name already exists, the code should skip to the next round of the loop. I feel like I am missing something with this.
my %nacheck;
if( exists ($nacheck { '$nam' } )){
next;
}
$nacheck{ "$nam" } = 1;
while (my ($key, $value) = each %nacheck){
print "$key => $value\n";
}
All this works for now, but any tips on how to do it better would be appreciated.
my($unf, $unm, $unl) = split (/ /, $nam);
$unf = (substr $unf,0,1);
$unm = (substr $unm,0,1);
$unl = (substr $unl,0,1);
my $un = $unf . $unm . $unl;
if(($gro) eq "faculty"){
push @username, $un;
push @gid, "1010";
push @dir, "/home/faculty/$un";
push @shell, "/bin/tcsh";
}
else{
my $lssn = substr ($ss,7,4);
push @username, $un . $lssn;
push @gid, "505";
push @dir, "/home/student/$un";
push @shell, "/bin/bash";
}
#pushing results onto global arrays to print out later
push @ssn, $ss;
my $pass = rot13;
push @password, $pass;
push @name, $nam;
push @uid, $ui += 1;
}
#printing results
for(my $i = 0; $i < @username; $i++){
print
"$username[$i]:$password[$i]:$uid[$i]:$gid[$i]:$name[$i]:$dir[$i]:$shell[$i]\n";
}

The value of the expression '$nam' is those four characters themselves. The value of the expression "$nam" is whatever the value of the variable $nam is, expressed as a string.
Double quotes allow string interpolation. Single quotes do not; you get exactly what you type.

As you've written it:
my %nacheck;
if( exists ($nacheck { '$nam' } )){
next;
}
$nacheck{ "$nam" } = 1;
the %nacheck is newly created and must be empty. Therefore the exists test fails.
Or have you just shown the definition adjacent to the test for the purpose of the example?
If so, can you show us what your code actually looks like?
Edit: Also, as Charles Engelke noted, you've used single-quotes around a variable '$nam' which is wrong.
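Both points can be seen in a small sketch: the hash is declared once before the loop, and the variable is used as the key without single quotes. The helper name unique_names and the sample names are assumptions made for illustration; this is not the homework solution itself.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the given names with duplicates removed
# (compared case-insensitively), keeping the first occurrence of each.
sub unique_names {
    my %seen;                           # declared once, outside the loop
    my @unique;
    for my $name (@_) {
        my $key = lc $name;
        next if exists $seen{$key};     # the variable itself is the key, no quotes
        $seen{$key} = 1;
        push @unique, $name;
    }
    return @unique;
}

print join(", ", unique_names('Ann B Lee', 'ann b lee', 'Bob C Ray')), "\n";
```

If the hash were declared inside the loop body, it would be empty again on every iteration and exists could never find an earlier name.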
