I am trying to use the following script to shuffle the order of sequences (lines) within a file. I'm not sure how to "initialize" values -- please help!
print "Please enter filename (without extension): ";
my $input = <>;
chomp $input;
use strict;
use warnings;
print "Please enter total no. of sequence in fasta file: ";
my $orig_size = <>*2-1;
chomp $orig_size;
open INFILE, "$input.fasta"
or die "Error opening input file for shuffling!";
open SHUFFLED, ">"."$input"."_shuffled.fasta"
or die "Error creating shuffled output file!";
my @array = (0); # Need to initialise 1st element in array1&2 for the shift function
my @array2 = (0);
my $i = 1;
my $index = 0;
my $index2 = 0;
while (my @line = <INFILE>){
while ($i <= $orig_size) {
$array[$i] = $line[$index];
$array[$i] =~ s/(.)\s/$1/seg;
$index++;
$array2[$i] = $line[$index];
$array2[$i] =~ s/(.)\s/$1/seg;
$i++;
$index++;
}
}
my $array = shift (@array);
my $array2 = shift (@array2);
for ($i = my $header_size; $i >= 0; $i--) {
my $j = int rand ($i+1);
next if $i == $j;
@array[$i,$j] = @array[$j,$i];
@array2[$i,$j] = @array2[$j,$i];
}
while ($index2 <= my $header_size) {
print SHUFFLED "$array[$index2]\n";
print SHUFFLED "$array2[$index2]\n";
$index2++;
}
close INFILE;
close SHUFFLED;
I'm getting these warnings:
Use of uninitialized value in substitution (s///) at fasta_corrector6.pl line 27, <INFILE> line 578914.
Use of uninitialized value in substitution (s///) at fasta_corrector6.pl line 31, <INFILE> line 578914.
Use of uninitialized value in numeric ge (>=) at fasta_corrector6.pl line 40, <INFILE> line 578914.
Use of uninitialized value in addition (+) at fasta_corrector6.pl line 41, <INFILE> line 578914.
Use of uninitialized value in numeric eq (==) at fasta_corrector6.pl line 42, <INFILE> line 578914.
Use of uninitialized value in numeric le (<=) at fasta_corrector6.pl line 47, <INFILE> line 578914.
Use of uninitialized value in numeric le (<=) at fasta_corrector6.pl line 50, <INFILE> line 578914.
First, you read the whole input file in:
use IO::File;
my @lines = IO::File->new($file_name)->getlines;
then you shuffle it:
use List::Util 'shuffle';
my @shuffled_lines = shuffle(@lines);
then you write them out:
IO::File->new($new_file_name, "w")->print(#shuffled_lines);
There's an entry in the Perl FAQ about how to shuffle an array. Another entry tells of the many ways to read a file in one go. Perl FAQs contain a lot of samples and trivia on how to do many common things -- it's a good place to continue learning more about Perl.
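The three steps above shuffle individual lines, so on a FASTA file they would separate each header line from its sequence line, which is what the question's two parallel arrays were trying to prevent. A minimal sketch that shuffles whole records instead, assuming every record is exactly two lines (a header plus one sequence line), as the question's script assumes; shuffle_pairs is a hypothetical helper name:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Group lines into two-line records (header + sequence) and shuffle the
# records, so each header stays attached to its sequence.
# Assumption: every record is exactly two lines, as in the question's script.
sub shuffle_pairs {
    my @lines = @_;
    die "odd number of lines; records are not all header+sequence pairs\n"
        if @lines % 2;
    my @records;
    for (my $i = 0; $i < @lines; $i += 2) {
        push @records, [ @lines[$i, $i + 1] ];   # one record = two lines
    }
    return map { @$_ } shuffle(@records);        # flatten back to lines
}

# Example with hypothetical data:
my @shuffled = shuffle_pairs(">seq1\n", "ACGT\n", ">seq2\n", "GGCC\n");
print @shuffled;
```

For a real file, pass in the lines returned by IO::File->new($file_name)->getlines and print the result to the output handle, as in the steps above.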
On your previous question I gave this answer, and noted that your code failed because you had not initialized a variable named $header_size used in a loop condition. Not only have you repeated that mistake, you have elaborated on it by starting to declare the variable with my each time you try to access it.
for ($i = my $header_size; $i >= 0; $i--) {
#         ^^--- wrong!
while ($index2 <= my $header_size) {
#                 ^^--- wrong!
A variable that is declared with my is empty (undef) by default. $header_size can never contain anything but undef here, and your while loop will run only once, because 0 <= undef evaluates to true (albeit with an uninitialized warning).
Please take my advice and set a value for $header_size. And only use my when declaring a variable, not every time you use it.
A better solution
Seeing your errors above, it seems that your input files are rather large. If you have over 500,000 lines in your files, it means your script will consume large amounts of memory to run. It may be worthwhile for you to use a module such as Tie::File and work only with array indexes. For example:
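use strict;
use warnings;
use Tie::File;
use List::Util qw(shuffle);
my $filename = shift @ARGV or die "usage: $0 FILE\n";
tie my @file, 'Tie::File', $filename or die $!;
for my $lineno (shuffle 0 .. $#file) {
print "$file[$lineno]\n"; # Tie::File removes the newline by default (autochomp)
}
untie @file; # all done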
I cannot pinpoint what exactly went wrong, but there are a few oddities with your code:
The Diamond Operator
Perl's diamond operator <FILEHANDLE> reads a line from the filehandle. If no filehandle is given, each command-line argument (@ARGV) is treated as a file name and read; if there are no arguments, STDIN is used. Better to say explicitly what you mean, e.g. <STDIN>. You should also chomp before you do arithmetic with the line, not afterwards. Note that strings that do not start with a number are treated as numeric 0, so you should check that the input really is numeric (with a regex, for example) and include error handling.
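A minimal sketch of reading a count safely: chomp first, validate with a regex, and only then do the arithmetic from the question (`*2-1`). The helper name read_count is hypothetical, and it assumes the count must be a non-negative whole number; an in-memory filehandle stands in for STDIN so the example runs without typing.

```perl
use strict;
use warnings;

# Read one line from the given filehandle, chomp it *before* using it as a
# number, and reject anything that is not a non-negative whole number.
sub read_count {
    my ($fh) = @_;
    my $answer = <$fh>;
    die "no input\n" unless defined $answer;
    chomp $answer;
    die "'$answer' is not a whole number\n" unless $answer =~ /^\d+\z/;
    return $answer;
}

# Example with an in-memory filehandle standing in for STDIN;
# in the real script you would call read_count(\*STDIN).
open my $fake_stdin, '<', \"42\n" or die $!;
my $orig_size = read_count($fake_stdin) * 2 - 1;   # 83
print "$orig_size\n";
```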
The diamond/readline operator is context sensitive. In scalar context (e.g. a conditional or a scalar assignment) it returns one line. In list context, e.g. as a function argument or an array assignment, it returns all remaining lines as a list. So
while (my @line = <INFILE>) { ...
will not give you one line but all lines and is thus equivalent to
my @line;
if (@line = <INFILE>) { ...
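To see the difference in practice, here is a small sketch that reads the same three lines twice from an in-memory filehandle (the sample text is made up): once with a scalar assignment in the loop condition and once with an array assignment.

```perl
use strict;
use warnings;

my $text = "line 1\nline 2\nline 3\n";

# Scalar context: each readline returns the next single line.
open my $fh, '<', \$text or die $!;
my $count = 0;
while (my $line = <$fh>) {
    $count++;                      # runs once per line
}
close $fh;

# List context: one readline returns every remaining line at once,
# so the body of "while (my @lines = <$fh>)" runs only once.
open $fh, '<', \$text or die $!;
my $iterations = 0;
while (my @lines = <$fh>) {
    $iterations++;
    print scalar(@lines), " lines in one go\n";   # 3 lines in one go
}
close $fh;

print "scalar loop: $count iterations, list loop: $iterations iteration\n";
```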
Array gymnastics
After you read in the lines, you try to do some manual chomping. Here I remove all trailing whitespace from every element of @line, in a single line:
s/\s+$// foreach @line;
And here I remove all whitespace except at the very start of each string (roughly what your regex is doing):
s/(?<!^)\s//g foreach @line;
To distribute the elements alternately into two arrays, this might work as well:
for my $i (0 .. $#line) {
if ($i % 2) {
push @array1, shift @line;
} else {
push @array2, shift @line;
}
}
or
my $i = 0;
while (@line) {
push @{ $i++ % 2 ? \@array1 : \@array2 }, shift @line;
}
Manual bookkeeping of array indices is messy and error-prone.
Your for-loop could be written more idiomatically as
for my $i (reverse 0 .. $header_size)
Do note that declaring $header_size inside the loop initialisation is possible if it was not declared before, but the new variable holds undef. You therefore assigned undef to $i, which leads to some of the warnings, because undef should not be used in arithmetic operations. Assignment always copies the value on the right into the variable on the left.
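The reverse loop above is the Fisher-Yates shuffle that the question's script is attempting on its two parallel arrays. A minimal sketch under the assumption that the headers and sequences are already in two arrays of equal length; shuffle_parallel is a hypothetical name, and it takes the upper bound from the array itself instead of from a separate size variable:

```perl
use strict;
use warnings;

# Fisher-Yates shuffle applied to two parallel arrays at once: every swap
# is made in both arrays, so $headers[$k] still belongs to $seqs[$k].
sub shuffle_parallel {
    my ($headers, $seqs) = @_;      # array references of equal length
    die "arrays differ in length\n" unless @$headers == @$seqs;
    for my $i (reverse 1 .. $#$headers) {
        my $j = int rand($i + 1);   # 0 <= $j <= $i
        next if $i == $j;
        @$headers[$i, $j] = @$headers[$j, $i];
        @$seqs[$i, $j]    = @$seqs[$j, $i];
    }
    return;
}

# Example with hypothetical data:
my @headers = ('>a', '>b', '>c', '>d');
my @seqs    = ('AAAA', 'CCCC', 'GGGG', 'TTTT');
shuffle_parallel(\@headers, \@seqs);
print "$headers[$_]\n$seqs[$_]\n" for 0 .. $#headers;
```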
Related
Running this Perl code gives me the error:
Modification of non-creatable array value attempted, subscript -1 at update.pl line 85, <R> line 1.
Line 85 is the one that has $line[$r] .= $_. Could anyone point me in the right direction?
my $loc = '../update/panden.txt';
my $r = -1;
my @line;
open (R, $loc) || die "$!";
while ( <R> ) {
$_ =~ s/NULL//g;
$r++ if ( $_ =~ /^"[0-9]{2,10}"\|"/ );
$line[$r] .= $_; # Line 85
my $ref = $_;
}
close R;
At a guess - your regex isn't matching, therefore $r is still -1 and you've an empty array.
#!/usr/bin/env perl
use strict;
use warnings;
my @list;
$list[-1] = 1;
Gives you the same error. That implies that:
$_ =~ /^"[0-9]{2,10}"\|"/
Doesn't match for the first line of your input.
The point of negative indices in an array is a special case - it means 'count from the end' - so $array[-1] is the last element. But that doesn't really make sense when you have an empty array.
It would work on an array that has already been populated, but based on your program logic it shouldn't be, so initialising $r to -1 on the assumption that it will be incremented before it is first used is asking for trouble in the first place.
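One way to avoid the negative index entirely is to push a new element when a header line matches and to append other lines to $records[-1] only when the array is non-empty. A sketch, assuming that lines appearing before the first header should be skipped with a warning (the question does not say what should happen to them); group_records and the sample data are hypothetical:

```perl
use strict;
use warnings;

# A line matching the header pattern starts a new record; any other line
# is appended to the current record. Lines before the first header are
# reported instead of being written to $records[-1] of an empty array.
sub group_records {
    my ($fh) = @_;
    my @records;
    while (my $text = <$fh>) {
        $text =~ s/NULL//g;
        if ($text =~ /^"[0-9]{2,10}"\|"/) {
            push @records, $text;            # start a new record
        }
        elsif (@records) {
            $records[-1] .= $text;           # safe: the array is not empty
        }
        else {
            warn "skipping line $. (no header seen yet)\n";
        }
    }
    return @records;
}

my $sample = qq{"12"|"first\nmore NULL text\n"345"|"second\n};
open my $in, '<', \$sample or die $!;
print "record: $_" for group_records($in);
```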
use strict;
use warnings;
$manifest=read_file("release.ms1");
print "$manifest\n";
my @new=split('\.',$manifest);
my %data=@new;
print "$data('vcs version')";
content of the release.ms1
vcs.version:12312321
vcs.path:CiscoMain/IT/GIS/trunk
Error:
vcs.version:12312321
vcs.path:CiscoMain/IT/GIS/trunk
vcsversion:12312321
vcspath:CiscoMain/IT/GIS/trunk
Odd number of elements in hash assignment at ./script.pl line 33.
Use of uninitialized value in print at ./script.pl line 35.
I need output like :
version=12312321
path=CiscoMain/IT/GIS/trunk
Your split function is assigning:
$new[0] = 'vcs'
$new[1] = 'version:12312321\nvcs'
$new[2] = 'path:CiscoMain/IT/GIS/trunk'
When you assign a list to a hash, it has to have an even number of elements, since they're required to be alternating keys and values.
It looks like what you actually want to do is split $manifest on newlines and colons, and replace the dots in the keys with space.
my @new = split(/[:\n]/, $manifest);
my %data;
for (my $i = 0; $i < @new; $i += 2) {
my $key = $new[$i];
$key =~ s/\./ /g;
$data{$key} = $new[$i+1];
}
Finally, your syntax for accessing an element of the hash is wrong. It should be:
print $data{'vcs version'};
The hash key is surrounded with curly braces, not parentheses.
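The loop above builds keys such as "vcs version". If the goal is exactly the output shown in the question (version=12312321 and path=...), a regex over the whole manifest can strip the "vcs." prefix directly. A minimal sketch, assuming every relevant line has the form vcs.<key>:<value>; the sub name manifest_to_pairs is hypothetical, and the string literal stands in for the contents of release.ms1:

```perl
use strict;
use warnings;

# Turn "vcs.version:12312321" lines into "version=12312321".
# Assumption: relevant lines look like "vcs.<key>:<value>"; others are ignored.
sub manifest_to_pairs {
    my ($manifest) = @_;
    my @out;
    # /m lets ^ and $ match at every line; (.*) stops at the newline.
    while ($manifest =~ /^vcs\.([^:\n]+):(.*)$/mg) {
        push @out, "$1=$2";
    }
    return @out;
}

my $manifest = "vcs.version:12312321\nvcs.path:CiscoMain/IT/GIS/trunk\n";
print "$_\n" for manifest_to_pairs($manifest);
```

This prints version=12312321 and path=CiscoMain/IT/GIS/trunk on separate lines.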
I'm learning PERL for the first time and I am attempting to replicate exactly the simple Perl script on page four of this document:
This is my code:
# example.pl, introductory example
# comments begin with the sharp sign
# open the file whose name is given in the first argument on the command
# line, assigning to a file handle INFILE (it is customary to choose
# all-caps names for file handles in Perl); file handles do not have any
# prefixing punctuation
open(INFILE,$ARGV[0]);
# names of scalar variables must begin with $
$line_count - 0;
$word_count - 0;
# <> construct means read one line; undefined response signals EOF
while ($line - <INFILE>) {
$line_count++;
# break $line into an array of tokens separated by " ", using split()
# (array names must begin with @)
@words_on_this_line - split(" ",$line);
# scalar() gives the length of an array
$word_count += scalar(@words_on_this_line);
}
print "the file contains ", $line_count, "lines and ", $word_count, " words\n";
and this is my text file:
This is a test file for the example code.
The code is written in Perl.
It counts the amount of lines
and the amount of words.
This is the end of the text file that will
be run
on the example
code.
I'm not getting the right output and I'm not sure why. My output is:
C:\Users\KP\Desktop\test>perl example.pl test.txt
the file contains lines and words
For some reason all your "=" operators appear to be "-"
$line_count - 0;
$word_count - 0;
...
while ($line - <INFILE>) {
...
@words_on_this_line - split(" ",$line);
I'd recommend using "my" to declare your variables and then "use strict" and "use warnings" to help you detect such typos:
Currently:
$i -1;
/tmp/test.pl -- no output
When you add strict and warnings:
use strict;
use warnings;
$i -1;
/tmp/test.pl
Global symbol "$i" requires explicit package name at /tmp/test.pl line 4.
Execution of /tmp/test.pl aborted due to compilation errors.
When you add "my" to declare it:
vim /tmp/test.pl
use strict;
use warnings;
my $i -1;
/tmp/test.pl
Useless use of subtraction (-) in void context at /tmp/test.pl line 4.
Use of uninitialized value in subtraction (-) at /tmp/test.pl line 4.
And finally, with "=" instead of the "-" typo, this is what the correct declaration and initialization look like:
use strict;
use warnings;
my $i = 1;
You have to change - to = in several statements in your code. I've also included some changes to make it more modern Perl code (use strict is a must):
use strict;
use warnings;
open my $INFILE, '<', $ARGV[0] or die $!;
# names of scalar variables must begin with $
my $line_count = 0;
my $word_count = 0;
# <> construct means read one line; undefined response signals EOF
while( my $line = <$INFILE> ) {
$line_count++;
# break $line into an array of tokens separated by " ", using split()
# (array names must begin with @)
my @words_on_this_line = split / /,$line;
# scalar() gives the length of an array
$word_count += scalar(@words_on_this_line);
}
print "the file contains ", $line_count, " lines and ", $word_count, " words\n";
close $INFILE;
replace while ($line - <INFILE>) {
with
while ($line = <INFILE>) {
The word count part could be made a bit simpler (and more efficient): split returns the number of elements when called in scalar context.
replace
my @words_on_this_line = split / /,$line;
$word_count += scalar(#words_on_this_line);
with
$word_count += split / /,$line;
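Note that the original script used split(" ", $line), while the versions above use split / /, $line, and the two are not equivalent. A single-space string is Perl's awk-style special case: it ignores leading whitespace and treats any run of whitespace, including the trailing newline, as one separator, whereas the regex / / splits on every individual space. A small sketch with made-up sample text; count_lines_words is a hypothetical name:

```perl
use strict;
use warnings;

# Count lines and words from a filehandle using awk-style split ' '.
sub count_lines_words {
    my ($fh) = @_;
    my ($lines, $words) = (0, 0);
    while (my $line = <$fh>) {
        $lines++;
        my @fields = split ' ', $line;   # runs of whitespace = one separator
        $words += @fields;               # array in scalar context = count
    }
    return ($lines, $words);
}

my $text = "two  spaces here\n\n  leading blank\n";
open my $fh, '<', \$text or die $!;
my ($l, $w) = count_lines_words($fh);
print "the file contains $l lines and $w words\n";   # 3 lines and 5 words
```

With split / /, $line the same text would count 9 words, because the double space, the leading spaces and the blank line all produce extra fields.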
I need to detect if the first character in a file is an equals sign (=) and display the line number. How should I write the if statement?
$i=0;
while (<INPUT>) {
my($line) = $_;
chomp($line);
$findChar = substr $_, 0, 1;
if($findChar == "=")
$output = "$i\n";
print OUTPUT $output;
$i++;
}
Idiomatic perl would use a regular expression (^ meaning beginning of line) plus one of the dreaded builtin variables which happens to mean "line in file":
while (<INPUT>) {
print "$.\n" if /^=/;
}
See also perldoc -v '$.'
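A slightly fuller sketch of the same idea that collects the numbers and writes them to an output handle, as the question's OUTPUT handle does. It uses lexical filehandles; in-memory strings stand in for the real input and output files, and the sub name is hypothetical. Note that $. counts from 1, whereas the question's $i starts at 0.

```perl
use strict;
use warnings;

# Return the line numbers ($.) of every line whose first character is "=".
sub lines_starting_with_equals {
    my ($in) = @_;
    my @found;
    while (my $line = <$in>) {
        push @found, $. if $line =~ /^=/;
    }
    return @found;
}

# With real files you would use open my $in, '<', $input_path and
# open my $out, '>', $output_path ($input_path/$output_path are placeholders).
my $input = "=a\nb\n=c\n";
open my $in,  '<', \$input  or die $!;
my $report = '';
open my $out, '>', \$report or die $!;
print {$out} "$_\n" for lines_starting_with_equals($in);
close $out;
print $report;
```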
Use $findChar eq "=". In Perl:
== and != are numeric comparisons. They will convert both operands to a number.
eq and ne are string comparisons. They will convert both operands to a string.
Yes, this is confusing. Yes, I still write == when I mean eq ALL THE TIME. Yes, it takes me forever to spot my mistake too.
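A short demonstration of why == gives the wrong answer here: both "=" and "+" are non-numeric strings, so == converts each to 0 and reports them as equal, while eq compares the characters. The sub names are hypothetical.

```perl
use strict;
use warnings;

# == converts both operands to numbers; "=" and "+" both become 0, so they
# compare as equal (with warnings on, Perl also warns they aren't numeric).
sub same_number {
    my ($x, $y) = @_;
    no warnings 'numeric';          # silenced only for this demonstration
    return $x == $y ? 1 : 0;
}

# eq compares the strings character by character.
sub same_string {
    my ($x, $y) = @_;
    return $x eq $y ? 1 : 0;
}

print "== says '=' and '+' are equal: ", same_number('=', '+'), "\n";  # 1
print "eq says '=' and '+' are equal: ", same_string('=', '+'), "\n";  # 0
```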
It looks like you are not using strict and warnings. Use them. Especially since you are new to Perl, you might also want to add diagnostics to your list of must-use pragmas.
You are keeping track of the input line number in a separate variable $i. Perl has various builtin variables, documented in perlvar, and some of them, such as $., are very useful: use them.
You are using my($line) = $_; in the body of the while loop. Instead, avoid $_ and assign to $line directly as in while ( my $line = <$input> ).
Note that bareword filehandles such as INPUT are package global. With the exception of the DATA filehandle, you are better off using lexical filehandles to properly limit the scope of your filehandles.
In your posts, include sample data in the __DATA__ section so others can copy, paste and run your code without further work.
With these comments in mind, you can print all lines that do not start with = using:
#!/usr/bin/perl
use strict; use warnings;
while (my $line = <DATA> ) {
my $first_char = substr $line, 0, 1;
if ( $first_char ne '=' ) {
print "$.:$first_char\n";
}
}
__DATA__
=
=
a
=
+
However, I would be inclined to write:
while (my $line = <DATA> ) {
# this will skip blank lines
if ( my ($first_char) = $line =~ /^(.)/ ) {
print "$.:$first_char\n" unless $first_char eq '=';
}
}
Given a start and end line number, what's the fastest way to read a range of lines from a file into a variable?
Use the range operator .. (also known as the flip-flop operator), which offers the following syntactic sugar:
If either operand of scalar .. is a constant expression, that operand is considered true if it is equal (==) to the current input line number (the $. variable).
If you plan to do this for multiple files via <>, be sure to close the implicit ARGV filehandle as described in the perlfunc documentation for the eof operator. (This resets the line count in $..)
The program below collects in the variable $lines lines 3 through 5 of all files named on the command line and prints them at the end.
#! /usr/bin/perl
use warnings;
use strict;
my $lines;
while (<>) {
$lines .= $_ if 3 .. 5;
}
continue {
close ARGV if eof;
}
print $lines;
Sample run:
$ ./prog.pl prog.pl prog.c main.hs
use warnings;
use strict;
int main(void)
{
import Data.Function (on)
import Data.List (sortBy)
--import Data.Ord (comparing)
You can use flip-flop operators
my @result;
while (<>) {
if (($. == 3) .. ($. == 7)) {
push @result, $_;
}
}
The following will load all desired lines of a file into an array variable. It stops reading the input file as soon as it has read past the end line number:
use strict;
use warnings;
my $start = 3;
my $end = 6;
my @lines;
while (<>) {
last if $. > $end;
push @lines, $_ if $. >= $start;
}
Reading line by line isn't going to be optimal. Fortunately, someone has done the hard work already :)
Use Tie::File; it presents the file as an array.
http://perldoc.perl.org/Tie/File.html
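A minimal Tie::File sketch for a line range. Assumptions: $start and $end are 1-based and inclusive, and the file only needs to be read, so it is tied read-only. Tie::File array indexes are 0-based and records come back without their newline by default (autochomp); fetching past the end of the file yields undef, which is filtered out. Tie::File still scans the file up to the last requested line, but it does not hold the whole file in memory. The sub name line_range is hypothetical.

```perl
use strict;
use warnings;
use Tie::File;
use Fcntl 'O_RDONLY';

# Return lines $start..$end (1-based, inclusive) of $path, without newlines.
sub line_range {
    my ($path, $start, $end) = @_;
    tie my @file, 'Tie::File', $path, mode => O_RDONLY
        or die "Cannot tie $path: $!";
    # Array slice, 0-based; records past end of file come back undef.
    my @wanted = grep { defined } @file[$start - 1 .. $end - 1];
    untie @file;
    return @wanted;
}

# Usage (input.txt is a placeholder file name):
# print "$_\n" for line_range('input.txt', 3, 5);
```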
# cat x.pl
#!/usr/bin/perl
my @lines;
my $start = 2;
my $end = 4;
my $i = 0;
for( $i=0; $i<$start; $i++ )
{
scalar(<STDIN>);
}
for( ; $i<=$end; $i++ )
{
push @lines, scalar(<STDIN>);
}
print @lines;
# cat xxx
1
2
3
4
5
# cat xxx | ./x.pl
3
4
5
#
Otherwise, you're reading a lot of extra lines at the end that you don't need. As it is, print @lines may be copying memory, so printing each line inside the second for-loop might be a better idea. But if you need to "store it" in a variable in Perl, you may not be able to get around that.
Update:
You could do it in one loop with a "next if $. < $start", but you need to make sure to reset "$." manually on eof() if you're iterating over or <>.