I currently have my Perl script to read fstab files, split them up by column and search for which word in each column is the longest to display it. All that works peachy (I think), the problem I'm having is that it keeps printing out the same length for every line which is not true. Example $dev_parts prints 24, and $labe_parts prints 24 and so on...
below is my code.
#!/usr/bin/perl
use strict;
print "Enter file name: \n";
my $file_name = <STDIN>;
open(IN, "$file_name");
my #parts = split( /\s+/, $file_name);
foreach my $usr_file (<IN>) {
chomp($usr_file);
#parts = split( /\s+/, $usr_file);
push(#dev, $parts[0]);
push(#label, $parts[1]);
push(#tmpfs, $parts[2]);
push(#devpts, $parts[3]);
push(#sysfs, $parts[4]);
push(#proc, $parts[5]);
}
foreach $dev_parts (#dev) {
$dev_length1 = length ($parts[$dev_parts]);
if ( $dev_length1 > $dev_length2) {
$dev_length2 = $dev_length1;
}
}
print "The longest word in the first line is: $dev_length2 \n";
foreach $label_parts (#label) {
$label_length1 = length($parts[$label_parts]);
if ($label_length1 > $label_length2) {
$label_length2 = $label_length1;
}
}
print "The longest word in the first line is: $label_length2 \n";
This is how your code should be
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
print "Enter file name: \n";
my $file_name = <STDIN>;
chomp($file_name);
open(FILE, "$file_name") or die $!;
my %colhash;
while (<FILE>) {
my $col=0;
my #parts = split /\s+/;
map { my $len = length($_);
$col++;
if($colhash{$col} < $len ){
$colhash{$col} = $len; # store the longest word length for each column
}
} #parts;
}
print Dumper(\%colhash);
You have a mistake here:
foreach $dev_parts (#dev) {
$dev_length1 = length ($parts[$dev_parts]);
As I understand it, you are looking for the longest element in #dev. However, you take the length of an element from the #parts array. This array is always set to whatever the last line of the file is. So you are looking at each element in the last line of the file, rather than each element of the appropriate column.
You just need to take length($dev_parts) instead.
Incidentally, here is a simpler way to find the longest length in an array:
use List::Util qw/max/; #Core module, always available.
my $longest_dev = max map {length} #dev;
A few other comments on your code:
use strict; is good. You should also use warnings;. It will help
you catch silly mistakes in your code.
You ought to check for errors whenever you open a file:
open(IN, $file_name) or die "Failed to open $file_name: $!";
Better yet, use the preferred open syntax with a lexical filehandle:
open(my $in_file, '<', $file_name) or die "Failed to open $file_name: $!";
...
while (<$in_file>) {
I'm not sure what you are trying to do here:
my #parts = split( /\s+/, $file_name);
You are splitting the file name by white space, but you don't use that for anything. And then you re-use the same array to hold the lines later.
A while loop is preferred to foreach when you go through lines of a file. It saves memory because it doesn't read the whole file into memory first (and it is otherwise exactly the same).
while (my $usr_file = <IN>) {
Related
I am trying to both learn perl and use it in my research. I need to do a simple task which is counting the number of sequences and their lengths in a file such as follow:
>sequence1
ATCGATCGATCG
>sequence2
AAAATTTT
>sequence3
CCCCGGGG
The output should look like this:
sequence1 12
sequence2 8
sequence3 8
Total number of sequences = 3
This is the code I have written which is very crude and simple:
#!/usr/bin/perl
use strict;
use warnings;
my ($input, $output) = #ARGV;
open(INFILE, '<', $input) or die "Can't open $input, $!\n"; # Open a file for reading.
open(OUTFILE, '>', $output) or die "Can't open $output, $!"; # Open a file for writing.
while (<INFILE>) {
chomp;
if (/^>/)
{
my $number_of_sequences++;
}else{
my length = length ($input);
}
}
print length, number_of_sequences;
close (INFILE);
I'd be grateful if you could give me some hints, for example, in the else block, when I use the length function, I am not sure what argument I should pass into it.
Thanks in advance
You're printing out just the last length, not each sequence length, and you want to catch the sequence names as you go:
#!/usr/bin/perl
use strict;
use warnings;
my ($input, $output) = #ARGV;
my ($lastSeq, $number_of_sequences) = ('', 0);
open(INFILE, '<', $input) or die "Can't open $input, $!\n"; # Open a file for reading.
# You never use OUTFILE
# open(OUTFILE, '>', $output) or die "Can't open $output, $!"; # Open a file for writing.
while (<INFILE>) {
chomp;
if (/^>(.+)/)
{
$lastSeq = $1;
$number_of_sequences++;
}
else
{
my $length = length($_);
print "$lastSeq $length\n";
}
}
print "Total number of sequences = $number_of_sequences\n";
close (INFILE);
Since you have indicated that you want feedback on your program, here goes:
my ($input, $output) = #ARGV;
open(INFILE, '<', $input) or die "Can't open $input, $!\n"; # Open a file for reading.
open(OUTFILE, '>', $output) or die "Can't open $output, $!"; # Open a file for writing.
Personally, I think when dealing with a simple input/output file relation, it is best to just use the diamond operator and standard output. That means that you read from the special file handle <>, commonly referred to as "the diamond operator", and you print to STDOUT, which is the default output. If you want to save the output in a file, just use shell redirection:
perl program.pl input.txt > output.txt
In this part:
my $number_of_sequences++;
you are creating a new variable. This variable will go out of scope as soon as you leave the block { .... }, in this case: the if-block.
In this part:
my length = length ($input);
you forgot the $ sigil. You are also using length on the file name, not the line you read. If you want to read a line from your input, you must use the file handle:
my $length = length(<INFILE>);
Although this will also include the newline in the length.
Here you have forgotten the sigils again:
print length, number_of_sequences;
And of course, this will not create the expected output. It will print something like sequence112.
Recommendations:
Use a while (<>) loop to read your input. This is the idiomatic method to use.
You do not need to keep a count of your input lines, there is a line count variable: $.. Though keep in mind that it will also count "bad" lines, like blank lines or headers. Using your own variable will allow you to account for such things.
Remember to chomp the line before finding out its length. Or use an alternative method that only counts the characters you want: my $length = ( <> =~ tr/ATCG// ) This will read a line, count the letters ATGC, return the count and discard the read line.
Summary:
use strict;
use warnings; # always use these two pragmas
my $count;
while (<>) {
next unless /^>/; # ignore non-header lines
$count++; # increment counter
chomp;
my $length = (<> =~ tr/ATCG//); # get length of next line
s/^>(\S+)/$1 $length\n/; # remove > and insert length
} continue {
print; # print to STDOUT
}
print "Total number is sequences = $count\n";
Note the use of continue here, which will allow us to skip a line that we do not want to process, but that will still get printed.
And as I said above, you can redirect this to a file if you want.
For starters, you need to change your inner loop to this:
...
chomp;
if (/^>/)
{
$number_of_sequences++;
$sequence_name = $_;
}else{
print "$sequence_name ", length($input), "\n";
}
...
Note the following:
The my declaration has been removed from $number_of_sequences
The sequence name is captured in the variable $sequence_name. It is used later when the next line is read.
To make the script run under strict mode, you can add my declarations for $number_of_sequences and $sequence_name outside of the loop:
my $sequence_name;
my $number_of_sequences = 0;
while (<INFILE>) {
...(as above)...
}
print "Total number of sequences: $number_of_sequences\n";
The my keyword declares a new lexically scoped variable - i.e. a variable which only exists within a certain block of code, and every time that block of code is entered, a new version of that variable is created. Since you want to have the value of $sequence_name carry over from one loop iteration to the next you need to place the my outside of the loop.
#!/usr/bin/perl
use strict;
use warnings;
my ($file, $line, $length, $tag, $count);
$file = $ARGV[0];
open (FILE, "$file") or print"can't open file $file\n";
while (<FILE>){
$line=$_;
chomp $line;
if ($line=~/^>/){
$tag = $line;
}
else{
$length = length ($line);
$count=1;
}
if ($count==1){
print "$tag\t$length\n";
$count=0
}
}
close FILE;
I am terribly sorry for bothering you with my problem in several questions, but I need to solve it...
I want to extract several substrings from a file whick contains string by using another file with the begin and the end of each substring that I want to extract.
The first file is like:
>scaffold30 24194
CTTAGCAGCAGCAGCAGCAGTGACTGAAGGAACTGAGAAAAAGAGCGAGCTGAAAGGAAGCATAGCCATTTGGGAGTGCCAGAGAGTTGGGAGG GAGGGAGGGCAGAGATGGAAGAAGAAAGGCAGAAATACAGGGAGATTGAGGATCACCAGGGAG.........
.................
(the string must be everything in the file except the first line), and the coordinates file is like:
44801988 44802104
44846151 44846312
45620133 45620274
45640443 45640543
45688249 45688358
45729531 45729658
45843362 45843490
46066894 46066996
46176337 46176464
.....................
my script is this:
my $chrom = $ARGV[0];
my $coords_file = $ARGV[1];
#finds subsequences: fasta files
open INFILE1, $chrom or die "Could not open $chrom: $!";
my $count = 0;
while(<INFILE1>) {
if ($_ !~ m/^>/) {
local $/ = undef;
my $var = <INFILE1>;
open INFILE, $coords_file or die "Could not open $coords_file: $!";
my #cline = <INFILE>;
foreach my $cline (#cline) {
print "$cline\n";
my#data = split('\t', $cline);
my $start = $data[0];
my $end = $data[1];
my $offset = $end - $start;
$count++;
my $sub = substr ($var, $start, $offset);
print ">conserved $count\n";
print "$sub\n";
}
close INFILE;
}
}
when I run it, it looks like it does only one iteration and it prints me the start of the first file.
It seems like the foreach loop doesn't work.
also substr seems that doesn't work.
when I put an exit to print the cline to check the loop, it prints all the lines of the file with the coordinates.
I am sorry if I become annoying, but I must finish it and I am a little bit desperate...
Thank you again.
This line
local $/ = undef;
changes $/ for the entire enclosing block, which includes the section where you read in your second file. $/ is the input record separator, which essentially defines what a "line" is (it is a newline by default, see perldoc perlvar for details). When you read from a filehandle using <>, $/ is used to determine where to stop reading. For example, the following program relies on the default line-splitting behavior, and so only reads until the first newline:
my $foo = <DATA>;
say $foo;
# Output:
# 1
__DATA__
1
2
3
Whereas this program reads all the way to EOF:
local $/;
my $foo = <DATA>;
say $foo;
# Output:
# 1
# 2
# 3
__DATA__
1
2
3
This means your #cline array gets only one element, which is a string containing the text of your entire coordinates file. You can see this using Data::Dumper:
use Data::Dumper;
print Dumper(\#cline);
Which in your case will output something like:
$VAR1 = [
'44801988 44802104
44846151 44846312
45620133 45620274
45640443 45640543
45688249 45688358
45729531 45729658
45843362 45843490
46066894 46066996
46176337 46176464
'
];
Notice how your array (technically an arrayref in this case), delineated by [ and ], contains only a single element, which is a string (delineated by single quotes) that contains newlines.
Let's walk through the relevant sections of your code:
while(<INFILE1>) {
if ($_ !~ m/^>/) {
# Enable localized slurp mode. Stays in effect until we leave the 'if'
local $/ = undef;
# Read the rest of INFILE1 into $var (from current line to EOF)
my $var = <INFILE1>;
open INFILE, $coords_file or die "Could not open $coords_file: $!";
# In list context, return each block until the $/ character as a
# separate list element. Since $/ is still undef, this will read
# everything until EOF into our first list element, resulting in
# a one-element array
my #cline = <INFILE>;
# Since #cline only has one element, the loop only has one iteration
foreach my $cline (#cline) {
As a side note, your code could be cleaned up a bit. The names you chose for your filehandles leave something to be desired, and you should probably use lexical filehandles anyway (and the three-argument form of open):
open my $chromosome_fh, "<", $ARGV[0] or die $!;
open my $coordinates_fh, "<", $ARGV[1] or die $!;
Also, you do not need to nest your loops in this case, it just makes your code more convoluted. First read the relevant parts of your chromosome file into a variable (named something more meaningful than var):
# Get rid of the `local $/` statement, we don't need it
my $chromosome;
while (<$chromosome_fh>) {
next if /^>/;
$chromosome .= $_;
}
Then read in your coordinates file:
my #cline = <$coordinates_fh>;
Or if you only need to use the contents of the coordinates file once, process each line as you go using a while loop:
while (<$coordinates_fh>) {
# Do something for each line here
}
As 'ThisSuitIsBlackNot' suggested, your code could be cleaned up a little. Here is a possible solution that may be what you want.
#!/usr/bin/perl
use strict;
use warnings;
my $chrom = $ARGV[0];
my $coords_file = $ARGV[1];
#finds subsequences: fasta files
open INFILE1, $chrom or die "Could not open $chrom: $!";
my $fasta;
<INFILE1>; # get rid of the first line - '>scaffold30 24194'
while(<INFILE1>) {
chomp;
$fasta .= $_;
}
close INFILE1 or die "Could not close '$chrom'. $!";
open INFILE, $coords_file or die "Could not open $coords_file: $!";
my $count = 0;
while(<INFILE>) {
my ($start, $end) = split;
# Or, should this be: my $offset = $end - ($start - 1);
# That would include the start fasta
my $offset = $end - $start;
$count++;
my $sub = substr ($fasta, $start, $offset);
print ">conserved $count\n";
print "$sub\n";
}
close INFILE or die "Could not close '$coords_file'. $!";
I am facing issues with perl chomp function.
I have a test.csv as below:
col1,col2
vm1,fd1
vm2,fd2
vm3,fd3
vm4,fd4
I want to print the 2nd field of this csv. This is my code:
#!/usr/bin/perl -w
use strict;
my $file = "test.csv";
open (my $FH, '<', $file);
my #array = (<$FH>);
close $FH;
foreach (#array)
{
my #row = split (/,/,$_);
my $var = chomp ($row[1]); ### <<< this is the problem
print $var;
}
The output of aboe code is :
11111
I really don't know where the "1" is comming from. Actually, the last filed can be printed as below:
foreach (#array)
{
my #row = split (/,/,$_);
print $row[1]; ### << Note that I am not printing "\n"
}
the output is:
vm_cluster
fd1
fd2
fd3
fd4
Now, i am using these field values as an input to the DB and the DB INSERT statement is failing due this invisible newline. So I thought chomp would help me here. instead of chomping, it gives me "11111".
Could you help me understand what am i doing wrong here.
Thanks.
Adding more information after reading loldop's responce:
If I write as below, then it will not print anything (not even the "11111" output mentioned above)
foreach (#array)
{
my #row = split (/,/,$_);
chomp ($row[1]);
my $var = $row[1];
print $var;
}
Meaning, chomp is removing the last string and the trailing new line.
The reason you see only a string of 1s is that you are printing the value of $val which is the value returned from chomp. chomp doesn't return the trimmed string, it modifies its parameter in-place and returns the number of characters removed from the end. Since it always removes exactly one "\n" character you get a 1 output for each element of the array.
You really should use warnings instead of the -w command-line option, and there is no reason here to read the entire file into an array. But well done on using a lexical filehandle with the three-parameter form of open.
Here is a quick refactoring of your program that will do what you want.
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'test.csv';
open my $FH, '<', $file or die qq(Unable to open "$file": $!);
while (<$FH>) {
chomp;
my #row = split /,/;
print $row[1], "\n";
}
although, it is my fault at the beginning.
chomp function return 1 <- result of usage this function.
also, you can find this bad example below. but it will works, if you use numbers.
sometimes i use this cheat (don't do that! it is my bad-hack code!)
map{/filter/ && $_;}#all_to_filter;
instead of this, use
grep{/filter/}#all_to_filter;
foreach (#array)
{
my #row = split (/,/,$_);
my $var = chomp ($row[1]) * $row[1]; ### this is bad code!
print $var;
}
foreach (#array)
{
my #row = split (/,/,$_);
chomp ($row[1]);
my $var = $row[1];
print $var;
}
If you simply want to get rid of new lines you can use a regex:
my $var = $row[1];
$var=~s/\n//g;
So, I was quite frustrated with this easy looking task bugging me for the whole day long. I really appreciate everyone who responded.
Finaly I ended up using Text::CSV perl module and then calling each of the CSV field as array reference. There was no need left to run the chomp after using Text::CSV.
Here is the code:
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
my $csv = Text::CSV->new ( { binary => 1 } ) # should set binary attribute.
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<:encoding(utf8)", "vm.csv" or die "vm.csv: $!";
<$fh>; ## this is to remove the column headers.
while ( my $row = $csv->getline ($fh) )
{
print $row->[1];
}
and here is hte output:
fd1fd2fd3fd4
Later i was pulled these individual values and inserted into the DB.
Thanks everyone.
I've a code as below to parse a text file. Display all words after "Enter:" keyword on all lines of the text file. I'm getting displayed all words after "Enter:" keyword, but i wan't duplicated should not be repeated but its repeating. Please guide me as to wht is wrong in my code.
#! /usr/bin/perl
use strict;
use warnings;
$infile "xyz.txt";
open (FILE, $infile) or die ("can't open file:$!");
if(FILE =~ /ENTER/ ){
#functions = substr($infile, index($infile, 'Enter:'));
#functions =~/#functions//;
%seen=();
#unique = grep { ! $seen{$_} ++ } #array;
while (#unique != ''){
print '#unique\n';
}
}
close (FILE);
Here is a way to do the job, it prints unique words found on each line that begins with the keyword Enter:
#!/usr/bin/perl
use strict;
use warnings;
my $infile = "xyz.txt";
# use 3 arg open with lexical file handler
open my $fh, '<', $infile or die "unable to open '$infile' for reading: $!";
# loop thru all lines
while(my $line = <$fh) {
# remove linefeed;
chomp($line);
# if the line begins with "Enter:"
# remove the keyword "Enter:"
if ($line =~ s/^Enter:\s+//) {
# split the line on whitespaces
# and populate the array with all words found
my #words = split(/\s+/, $line);
# create a hash where the keys are the words found
my %seen = map { $_ => 1 }#words;
# display unique words
print "$_\t" for(keys %seen);
print "\n";
}
}
If I understand you correctly, one problem is that your 'grep' only counts the occurrences of each word. I think you want to use 'map' so that '#unique' only contains the unique words from '#array'. Something like this:
#unique = map {
if (exists($seen{$_})) {
();
} else {
$seen{$_}++; $_;
}
} #array;
Ive been trying to compare lines between two files and matching lines that are the same.
For some reason the code below only ever goes through the first line of 'text1.txt' and prints the 'if' statement regardless of if the two variables match or not.
Thanks
use strict;
open( <FILE1>, "<text1.txt" );
open( <FILE2>, "<text2.txt" );
foreach my $first_file (<FILE1>) {
foreach my $second_file (<FILE2>) {
if ( $second_file == $first_file ) {
print "Got a match - $second_file + $first_file";
}
}
}
close(FILE1);
close(FILE2);
If you compare strings, use the eq operator. "==" compares arguments numerically.
Here is a way to do the job if your files aren't too large.
#!/usr/bin/perl
use Modern::Perl;
use File::Slurp qw(slurp);
use Array::Utils qw(:all);
use Data::Dumper;
# read entire files into arrays
my #file1 = slurp('file1');
my #file2 = slurp('file2');
# get the common lines from the 2 files
my #intersect = intersect(#file1, #file2);
say Dumper \#intersect;
A better and faster (but less memory efficient) approach would be to read one file into a hash, and then search for lines in the hash table. This way you go over each file only once.
# This will find matching lines in two files,
# print the matching line and it's line number in each file.
use strict;
open (FILE1, "<text1.txt") or die "can't open file text1.txt\n";
my %file_1_hash;
my $line;
my $line_counter = 0;
#read the 1st file into a hash
while ($line=<FILE1>){
chomp ($line); #-only if you want to get rid of 'endl' sign
$line_counter++;
if (!($line =~ m/^\s*$/)){
$file_1_hash{$line}=$line_counter;
}
}
close (FILE1);
#read and compare the second file
open (FILE2,"<text2.txt") or die "can't open file text2.txt\n";
$line_counter = 0;
while ($line=<FILE2>){
$line_counter++;
chomp ($line);
if (defined $file_1_hash{$line}){
print "Got a match: \"$line\"
in line #$line_counter in text2.txt and line #$file_1_hash{$line} at text1.txt\n";
}
}
close (FILE2);
You must re-open or reset the pointer of file 2. Move the open and close commands to within the loop.
A more efficient way of doing this, depending on file and line sizes, would be to only loop through the files once and save each line that occurs in file 1 in a hash. Then check if the line was there for each line in file 2.
If you want the number of lines,
my $count=`grep -f [FILE1PATH] -c [FILE2PATH]`;
If you want the matching lines,
my #lines=`grep -f [FILE1PATH] [FILE2PATH]`;
If you want the lines which do not match,
my #lines = `grep -f [FILE1PATH] -v [FILE2PATH]`;
This is a script I wrote that tries to see if two file are identical, although it could easily by modified by playing with the code and switching it to eq. As Tim suggested, using a hash would probably be more effective, although you couldn't ensure the files were being compared in the order they were inserted without using a CPAN module (and as you can see, this method should really use two loops, but it was sufficient for my purposes). This isn't exactly the greatest script ever, but it may give you somewhere to start.
use warnings;
open (FILE, "orig.txt") or die "Unable to open first file.\n";
#data1 = ;
close(FILE);
open (FILE, "2.txt") or die "Unable to open second file.\n";
#data2 = ;
close(FILE);
for($i = 0; $i < #data1; $i++){
$data1[$i] =~ s/\s+$//;
$data2[$i] =~ s/\s+$//;
if ($data1[$i] ne $data2[$i]){
print "Failure to match at line ". ($i + 1) . "\n";
print $data1[$i];
print "Doesn't match:\n";
print $data2[$i];
print "\nProgram Aborted!\n";
exit;
}
}
print "\nThe files are identical. \n";
Taking the code you posted, and transforming it into actual Perl code, this is what I came up with.
use strict;
use warnings;
use autodie;
open my $fh1, '<', 'text1.txt';
open my $fh2, '<', 'text2.txt';
while(
defined( my $line1 = <$fh1> )
and
defined( my $line2 = <$fh2> )
){
chomp $line1;
chomp $line2;
if( $line1 eq $line2 ){
print "Got a match - $line1\n";
}else{
print "Lines don't match $line1 $line2"
}
}
close $fh1;
close $fh2;
Now what you may really want is a diff of the two files, which is best left to Text::Diff.
use strict;
use warnings;
use Text::Diff;
print diff 'text1.txt', 'text2.txt';