I am getting the following error when I try to execute my CGI script from the terminal:
Use of uninitialized value $friends{"Bob=416-333-6363"} in print at ./new-cgi/data.cgi line 24
Here is my script:
#!/usr/bin/perl -w
use strict;
my %friends;
my $name;
my $phone;
open FILE, "new-cgi/data.dat" or die ("No File\n");
while (<FILE>) {
chomp;
($name, $phone) = split(" ", $_);
$friends{$name}=$phone;
}
foreach (keys %friends) {
print "Name:", $_, "\n";
print "Phone:", $friends{$_}, "\n"; <--This is line 24
}
Hard to tell without seeing your new-cgi/data.dat file, but I assume the data format is a bunch of lines like "Bob=416-333-6363", in which case you want to split on /=/, not " ".
What's happening now is that you're splitting on non-existent whitespace, so $name (the eventual key) gets the entire line and $phone (the eventual value) gets undef. So when you iterate over the hash later, you have a hash with lots of keys (albeit with odd data for the keys) and undef values.
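For what it's worth, a minimal sketch of that fix, assuming the data really is '='-separated (the lexical filehandle and three-argument open are just a tidier equivalent of your open):
#!/usr/bin/perl
use strict;
use warnings;

my %friends;

# Assumes each line of new-cgi/data.dat looks like "Bob=416-333-6363"
open my $fh, '<', 'new-cgi/data.dat' or die "No File: $!\n";
while (<$fh>) {
    chomp;
    my ($name, $phone) = split /=/, $_, 2;   # split on '=' rather than whitespace
    $friends{$name} = $phone;
}
close $fh;

foreach my $name (keys %friends) {
    print "Name:", $name, "\n";
    print "Phone:", $friends{$name}, "\n";
}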
I am trying to write a small program that takes file(s) from the command line and prints out the number of occurrences of each word across all files, and in which file each word occurs. The first part, counting the occurrences of a word, seems to work well.
However, I am struggling with the second part, namely finding in which file (i.e. file name) the word occurs. I am thinking of using an array that stores the words, but I don't know if this is the best way, or what the best way is.
This is the code I have so far; it seems to work well for the part that counts the number of times a word occurs in the given file(s):
use strict;
use warnings;
my %count;
while (<>) {
my $casefoldstr = lc $_;
foreach my $str ($casefoldstr =~ /\w+/g) {
$count{$str}++;
}
}
foreach my $str (sort keys %count) {
printf "$str $count{$str}:\n";
}
The filename is accessible through $ARGV.
You can use this to build a nested hash with the filename and word as keys:
use strict;
use warnings;
use List::Util 'sum';

my %count;
while (<>) {
    $count{$_}{$ARGV}++ for map +lc, /\w+/g;
}

foreach my $word ( keys %count ) {
    my @files = keys %{ $count{$word} };    # All files containing lc $word
    print "Total word count for '$word': ", sum( @{ $count{$word} }{@files} ), "\n";
    for my $file (@files) {
        print "$count{$word}{$file} counts of '$word' detected in '$file'\n";
    }
}
Using an array seems reasonable, if you don't visit any file more than once - then you can always just check the last value stored in the array. Otherwise, use a hash.
#!/usr/bin/perl
use warnings;
use strict;
my %count;
my %in_file;
while (<>) {
    my $casefoldstr = lc;
    for my $str ($casefoldstr =~ /\w+/g) {
        ++$count{$str};
        push @{ $in_file{$str} }, $ARGV
            unless ref $in_file{$str} && $in_file{$str}[-1] eq $ARGV;
    }
}
foreach my $str (sort keys %count) {
    printf "$str $count{$str}: @{ $in_file{$str} }\n";
}
I've searched around the site and surprisingly I can't seem to find something that will work for my particular problem, so I figured I'd post it and see how some of you more experienced programmers can address this problem.
I have a spreadsheet-like text file (many lines with tab-delimited columns) that I would like to search through for certain labels (e.g. scaffold1253.1_size81005.6.32799_7496) and replace them with simplified labels (e.g. scaffold1253.1a). These labels are only in the first column of the text file. I've already written the script such that I have a hash with the old labels as keys and the new labels as their respective values. This hash has about 26000 entries. So essentially I'd like to take the hash keys one by one, search for them in the text file, and replace them with their respective hash values.
I have a pretty good server available, so if it's too complicated to make the search first-column specific to speed up the process, that's OK.
This is what I have so far:
use warnings;
$gtf = './Hc_genome/Hc_rztk_1+2+8+9.augustus.gtf';
open(FASTAFILE2, $gtf);
@gtfarray = <FASTAFILE2>;
#print @gtfarray;

my %hash;
while (<>)
{
    chomp;
    my ($key, $val) = split /\t/;
    $hash{$key} .= exists $hash{$key} ? ",$val" : $val;
}
#print %hash;

while (my ($find, $replace) = each %hash) {
    foreach (@gtfarray){
        $_ =~ s/$find/$replace/g;
        push @newgtf, $_;
    }
}
print @newgtf;
This code doesn't seem to work as it doesn't complete. I'm pretty sure it's a problem with the foreach loop structure. Sorry I don't know of any other way to do this. Does anyone have a better way to run through this file and conduct the replacement?
Any input would be greatly appreciated!
Thanks,
Andrew
@DVK
Here is the full script with your mods; it runs into syntax errors on your while loop. Any idea why it's not accepting it? Thanks again!
use warnings;
$gtf = './Hc_genome/Hc_rztk_1+2+8+9.augustus.gtf';
open(FASTAFILE2, $gtf);

my %hash;
while (<>){
    chomp;
    my ($key, $val) = split /\t/;
    $hash{$key} .= exists $hash{$key} ? ",$val" : $val;
}

while $line (<FASTAFILE2>){
    my @fields = split(/\t/, $line);
    # If you only care about first column, don't need the foreach loop below;
    # just do the loop insides on $fields[0]
    foreach my $field (@fields) {
        $field = $hash{$field} if exists $hash{$field};
        print $outfile "$field\t"; # Small bug - will print trailing \t
    }
    print $outfile "\n"
}
__END__
Here is the syntax error:
perl gtf_mod2.pl <./Hc_genome/header_file.txt
syntax error at gtf_mod2.pl line 14, near "while $line "
syntax error at gtf_mod2.pl line 23, near "}"
Execution of gtf_mod2.pl aborted due to compilation errors.
You exhaust your file the first time through your loop using the initial $find and $replace key/value pair.
There are two potential solutions:
Open the file for reading during each iteration of your while loop (expensive)
Move the foreach loop to the outside of the while and iterate the hash each time (less expensive)
example:
REPLACE:
for my $line (@gtfarray) {
    while (my ($find, $replace) = each %hash) {
        if ($line =~ s/$find/$replace/g) {
            push @newgtf, $line;
            next REPLACE; # skip to next iteration
        }
    }
    # if there was no replacement, push the old line
    push @newgtf, $line;
}
How big is the file that you are replacing the first column in?
If it's >50,000 lines, you are better off doing the reverse:
Iterate through hash file once, and store that hash in memory
Iterate through main file once, and for every line, for every column, find that value in the memorized hash, replace with hash value if found, and write.
In other words, remove the first @gtfarray = <FASTAFILE2>; and replace your last while loop with:
while my $line (<FASTAFILE2>) {
    my @fields = split(/\t/, $line);
    # If you only care about first column, don't need the foreach loop below;
    # just do the loop insides on $fields[0]
    foreach my $field (@fields) {
        $field = $hash{$field} if exists $hash{$field};
        print $outfile "$field\t"; # Small bug - will print trailing \t
    }
    print $outfile "\n";
}
NOTE: I'm making an assumption that the fields contain FULL contents of your hash keys (e.g. your data file would contain a field with "scaffold1253.1_size81005.6.32799_7496" but NOT a field with "XYZscaffold1253.1_size81005.6.32799_7496___IOU").
If that assumption is wrong and you really DO need to run a regex because your scaffold strings may be contained in longer strings, there may still be a better solution aside from running O(N*M) regexes: if your scaffold strings are all of a certain well defined format (e.g. "scaffoldNNNNN.NNN_sizeNNNNN.NNN.NNNN_NNNN"), what you need to do then is:
For each line of data file, run a single regex finding that pattern, with the entire pattern inside a capture group parenthesis:
@matches = ($line =~ m/(scaffold\d+\.\d+_size\d+\.\d+\.\d+_\d+)/g);
Then, look up every value of the @matches array in the hash. If found, run ONLY the matches as a s/// regex.
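A rough sketch of that approach (untested; it assumes %hash maps old labels to new ones and that $outfile is an output filehandle you have already opened, as in the earlier sketch):
while (my $line = <FASTAFILE2>) {
    # find every scaffold-style label on the line
    my @matches = ($line =~ m/(scaffold\d+\.\d+_size\d+\.\d+\.\d+_\d+)/g);
    for my $old (@matches) {
        next unless exists $hash{$old};
        # \Q...\E quotes the dots so they match literally
        $line =~ s/\Q$old\E/$hash{$old}/g;
    }
    print $outfile $line;
}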
Looking at your previous post, wouldn't it be simpler to create the shortened 'id' while reading the file? Then you would have no need of the other file from which you build your hash.
Here is the (untested) code below. (You would need to redirect the print statements to an output file on the command line, or open a file for writing in your script.)
#!/usr/bin/perl
use strict;
use warnings;
my $gtf = './Hc_genome/Hc_rztk_1+2+8+9.augustus.gtf';
open my $FASTAFILE2, "<", $gtf or die "Unable to open '$gtf' for reading. $!";
my %seen;
while (<$FASTAFILE2>) {
chomp;
my ($id, $val) = split /\t/, $_, 2;
# copy $id to $prefix and
# remove everything after '.1' in $prefix
(my $prefix = $id) =~ s/\.1\K.*//;
if ($seen{$id}) {
++$seen{$id};
}
else {
$seen{$id} = 'a';
}
print "$prefix$seen{$id}\t$val\n";
}
close $FASTAFILE2 or die "Unable to close '$gtf' from reading. $!";
Could it be a job for Tie::File? Assuming, that is, the data file could be operated on as an array.
use Tie::File;
my $file = "./Hc_genome/Hc_rztk_1+2+8+9.augustus.gtf";
tie @lines, 'Tie::File', $file or die;
for (@lines) {
    s/OldLabel/NewLabel/g; # Change this to fit
}
untie @lines;
Tie::File does a bunch of tricks to keep the "in place" changes to the file memory efficient.
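If you went this route, a hedged sketch of combining Tie::File with your old-label => new-label hash, restricted to the first column (the %hash contents here are just a placeholder built from the example labels in your question; in your script they would come from the mapping file):
use strict;
use warnings;
use Tie::File;

# placeholder mapping; replace with the hash you build from your other file
my %hash = ('scaffold1253.1_size81005.6.32799_7496' => 'scaffold1253.1a');

my $file = './Hc_genome/Hc_rztk_1+2+8+9.augustus.gtf';
tie my @lines, 'Tie::File', $file or die "Cannot tie '$file': $!";

for my $line (@lines) {
    my ($first, $rest) = split /\t/, $line, 2;
    # rewrite the line only when the first column is a known old label
    $line = defined $rest ? "$hash{$first}\t$rest" : $hash{$first}
        if exists $hash{$first};
}

untie @lines;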
I have created a hash table from a text file like this:
use strict;
use warnings;
my %h;
open my $fh, '<', 'tst' or die "failed open 'tst' $!";
while ( <$fh> ) {
push @{$h{keys}}, (split /\t/)[0];
}
close $fh;
use Data::Dumper;
print Dumper \%h;
Now I want to look up a field from another text file in the hash table.
If it exists, the current line is written to a result file:
use strict;
use warnings;
my %h;
open my $fh, '<', 'tst' or die "failed open 'tst' $!";
while ( <$fh> ) {
push @{$h{keys}}, (split /\t/)[0];
}
close $fh;
use Data::Dumper;
print Dumper \%h;
open (my $fh1,"<", "exp") or die "Can't open the file: ";
while (my $line =<$fh1>){
chomp ($line);
my ($var)=split(">", $line);
if exists $h{$var};
print ($line);
}
I got these errors:
syntax error at codeperl.pl line 26, near "if exists"
Global symbol "$line" requires explicit package name at codeperl.pl line 27.
syntax error at codeperl.pl line 29, near "}"
Execution of codeperl.pl aborted due to compilation errors.
Any idea please?
What is there to say? The statement if exists $h{$var}; is a syntax error. You may want:
print $line, "\n" if exists $h{$var};
or
if (exists $h{$var}) {
print $line, "\n";
}
The other errors will go away once you fix that. If you get multiple errors, always look at the first one (with respect to line numbers); later errors are often the result of an earlier one. In this case, the syntax error messed up the parsing.
Edit
Your main problem isn't the syntax error, it is how you populate your hash. The
push @{$h{keys}}, (split /\t/)[0];
pushes the first field of each line onto the array ref stored in the keys entry. To me, it seems that you actually want to use this field as the key:
my ($key) = split /\t/;
$h{$key} = undef; # any value will do.
After that, your Dumper \%h will produce something like
$VAR1 = {
'# ries bibliothèques électroniques à travers' => undef,
'a a pour les ressortissants des' => undef,
'a a priori aucune hiérarchie des' => undef,
};
and your lookup via exists should work.
Just try your code like this.
First, build your hash
while(<$file1>){
# get your key from current line
$key = (split)[0];
# set the key into the hash
$hash{$key} = 1;
}
Second, do the lookup
while(<$file2>){
# get the field you want to check
$value = (split)[0];
# to see if $value exists
if( exists $hash{$value} ){
print "got $value";
}
}
I read in a txt file using a Perl script, but I'm wondering how to store each line from the txt file in a different variable in the Perl script using pattern matching. I can match a line using =~ /^>gi/, but it displays both lines from the txt file with >gi (i.e. lines 1 & 3); also, I want to read the two separate DNA sequences into different variables. Consider my example below.
file.txt
>gi102939
GATCTATC
>gi123453
CATCGACA
the perl script:
#!/usr/local/bin/perl
open (MYFILE, 'file.txt');
@array = <MYFILE>;
($first, $second, $third, $fourth, $fifth) = @array;
chomp $first, $second, $third, $fourth, $fifth;
print "Contents:\n @array";
if (@array =~ /^>gi/)
{
print "$first";
}
close (MYFILE);
Assuming that >gi.. are unique in the input, populate a hash where each key is associated with a sequence:
#!/usr/bin/perl
use warnings;
use strict;
my %hash;
my $last;
while (<DATA>) {
chomp;
if (/^>gi/) {
$last = $_;
} else {
$hash{$last} = $_;
}
}
foreach my $k (keys %hash) {
print "$k => $hash{$k}\n";
}
__DATA__
>gi102939
GATCTATC
>gi123453
CATCGACA
Please always use strict and use warnings at the top of your program, and declare your variables using my at their first point of use. This applies especially when you are asking for help, as doing so can frequently reveal simple problems that could otherwise be overlooked.
As it stands, your program will read the file into @array and print it out. The test if (@array =~ /^>gi/) { ... } will force scalar context on the array, and so compare the number of elements in the array, presumably 5, with the regex pattern, and fail.
What exactly are you trying to achieve? Reading a file into an array puts each line into a different scalar variable - the variables being the elements of the array.
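If the goal is just to act on the header lines, here is a small sketch of testing each line individually instead of testing the whole array (using file.txt from your example):
use strict;
use warnings;

open my $fh, '<', 'file.txt' or die "Cannot open file.txt: $!";
chomp(my @array = <$fh>);
close $fh;

foreach my $line (@array) {
    if ($line =~ /^>gi/) {
        print "Header: $line\n";      # >gi102939 and >gi123453
    }
    else {
        print "Sequence: $line\n";    # GATCTATC and CATCGACA
    }
}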
This one-liner reads the database and extracts one element:
perl < file.txt -e '@array=<>;chomp @array;%hash=@array;print $hash{">gi102939"}'
result:
GATCTATC
I have some code that looks like
my ($ids,$nIds);
while (<myFile>){
chomp;
$ids.= $_ . " ";
$nIds++;
}
This should concatenate every line of myFile into $ids, and $nIds should be the number of lines. How do I print out my $ids and $nIds?
I tried simply print $ids, but Perl complains.
my ($ids, $nIds)
is a list, right? With two elements?
print "Number of lines: $nids\n";
print "Content: $ids\n";
How did Perl complain? print $ids should work, though you probably want a newline at the end, either explicitly with print as above or implicitly by using say or -l/$\.
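For example, a tiny sketch (say needs perl 5.10 or later; the value of $ids here is just a placeholder):
use feature 'say';
my $ids = "some concatenated lines ";
say $ids;            # same as: print "$ids\n";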
If you want to interpolate a variable in a string and have something immediately after it that would looks like part of the variable but isn't, enclose the variable name in {}:
print "foo${ids}bar";
You should always include all relevant code when asking a question - in this case, the print statement that is at the center of your question. That statement is probably the most crucial piece of information. The second most crucial piece is the exact error message, which you also did not include. Next time, include both.
print $ids should be a fairly hard statement to mess up, but it is possible. Possible reasons:
1. $ids is undefined. Gives the warning "Use of uninitialized value in print".
2. $ids is out of scope. With use strict, this gives the fatal error "Global symbol "$ids" requires explicit package name", and otherwise the uninitialized-value warning from above.
3. You forgot a semi-colon at the end of the line.
4. You tried to do print $ids $nIds, in which case perl thinks that $ids is supposed to be a filehandle, and you get an error such as "print to unopened filehandle".
Explanations
1: Should not happen. It might happen if you do something like this (assuming you are not using strict):
my $var;
while (<>) {
$Var .= $_;
}
print $var;
Gives the warning for undefined value, because $Var and $var are two different variables.
2: Might happen, if you do something like this:
if ($something) {
my $var = "something happened!";
}
print $var;
my declares the variable inside the current block. Outside the block, it is out of scope.
3: Simple enough, common mistake, easily fixed. Easier to spot with use warnings.
4: Also a common mistake. There are a number of ways to correctly print two variables in the same print statement:
print "$var1 $var2"; # concatenation inside a double quoted string
print $var1 . $var2; # concatenation
print $var1, $var2; # supplying print with a list of args
Lastly, some perl magic tips for you:
use strict;
use warnings;
# open with explicit direction '<', check the return value
# to make sure open succeeded. Using a lexical filehandle.
open my $fh, '<', 'file.txt' or die $!;
# read the whole file into an array and
# chomp all the lines at once
chomp(my @file = <$fh>);
close $fh;
my $ids = join(' ', @file);
my $nIds = scalar @file;
print "Number of lines: $nIds\n";
print "Text:\n$ids\n";
Reading the whole file into an array is suitable for small files only, otherwise it uses a lot of memory. Usually, line-by-line is preferred.
Variations:
print "#file" is equivalent to
$ids = join(' ',#file); print $ids;
$#file will return the last index
in #file. Since arrays usually start at 0,
$#file + 1 is equivalent to scalar #file.
You can also do:
my $ids;
do {
local $/;
$ids = <$fh>;
}
By temporarily "turning off" $/, the input record separator, i.e. newline, you will make <$fh> return the entire file. What <$fh> really does is read until it finds $/, then return that string. Note that this will preserve the newlines in $ids.
Line-by-line solution:
open my $fh, '<', 'file.txt' or die $!; # btw, $! contains the most recent error
my $ids;
while (<$fh>) {
chomp;
$ids .= "$_ "; # concatenate with string
}
my $nIds = $.; # $. is Current line number for the last filehandle accessed.
How do I print out my $ids and $nIds?
print "$ids\n";
print "$nIds\n";
I tried simply print $ids, but Perl complains.
Complains about what? Uninitialised value? Perhaps your loop was never entered due to an error opening the file. Be sure to check if open returned an error, and make sure you are using use strict; use warnings;.
my ($ids, $nIds) is a list, right? With two elements?
It's a (very special) function call. $ids,$nIds is a list with two elements.