Why this Perl program is not giving expected output? [closed] - perl

open ( F1, "file1.txt" );
open ( F2, "+>file2.txt" );
$/ = "\n\n" ;
while (<F1>) {
    print F2;
    $/ = "\n" ;
    @arr = <F2> ;
    @found = grep(/^: /, @arr);
    if( $#found == -1) {
        truncate(F2, $length);
    }
    $/ = "\n\n" ;
}
Please help me to find the error in this code.
file1.txt:
:a
:b
: x
:y
Note: :a and :b, and likewise : x and :y, are separated by "\n"; :b and : x are separated by "\n\n".
Expected contents in file2.txt after execution of program:
: x
:y

There are things wrong with your program, but it's difficult to tell what you might be referring to, since you do not actually specify a problem or a question.
open ( F1, "file1.txt" );
open ( F2, "+>file2.txt" );
You should use three argument open, with explicit mode and lexical file handle. Also, you should check the return value of the open to see that it did not fail and why. Also, using better names for file handles does not hurt.
open my $in, "<", "file1.txt" or die "Cannot open file1.txt for reading: $!";
open my $out, "+>", "file2.txt" or die "Cannot open file2.txt for overwrite: $!";
Do note that the file open mode +> will truncate your file when you open it, but allow you to both read and write to/from the file handle. Most of the time, you do not want this.
$/ = "\n\n" ;
while (<F1>) {
Setting the input record separator $/ to "\n\n" will read paragraphs from your input file. In your case (assuming I got the formatting right), the first one would be
$_ = ":a
:b
";
You then print this value to your output file
print F2; # this means "print F2 $_"
You then change the input record separator again, and read all the lines in your output file:
$/ = "\n" ;
@arr = <F2> ;
But unfortunately, this is wrong, because the position of the file handle will be at the end of file (eof), because this is a file handle you are printing to. So @arr will be empty.
@found = grep(/^: /, @arr);
if( $#found == -1) {
    truncate(F2, $length);
}
So @found is always empty, and the truncate will always happen. Also, of course, $length is an undefined variable, so it will give you a warning such as Use of uninitialized value $length in truncate at ... unless you have been so foolish as to not use:
use warnings;
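To see the end-of-file problem in isolation, here is a minimal sketch. It is not part of your program: the helper name read_back_counts and the use of File::Temp are my own choices for illustration. It prints to a read/write handle, tries to read, then rewinds with seek and reads again:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns how many lines can be read back from a
# read/write handle before and after rewinding it.
sub read_back_counts {
    my ($fh) = tempfile(UNLINK => 1);   # temp file opened for reading and writing
    print {$fh} ":a\n:b\n\n";
    my @before = <$fh>;                 # position is at end of file here
    seek $fh, 0, 0 or die "Cannot seek: $!";
    my @after = <$fh>;                  # after rewinding, the lines come back
    return (scalar @before, scalar @after);
}

my ($before, $after) = read_back_counts();
print "before seek: $before, after seek: $after\n";
```

As far as I can tell this reports 0 lines before the seek and 3 after it, which is why the array stays empty in your loop.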
I assume that what you are trying to do here is to check the input received before printing it to the output, but you should know that trying to print and truncate afterwards is a horrible idea. Why not check it before printing it instead? That's not only how it's done 99.99% of the time, it's also the simplest and most logical way to do it.
if (/^: /) {
print $out $_;
}
/^: / is short for $_ =~ /^: /, in case you are uncertain -- it applies a regex match operator to the default input variable $_, which is what you are reading into in the while (<F1>) loop condition, which is short for while (defined($_ = <F1>))
In your programs you should always use
use strict;
use warnings;
And learn to use them, because they will save you lots of time when debugging, and give you vital information about what your program is doing.
So you get:
use strict;
use warnings;
open my $in, "<", "file1.txt" or die "Cannot open file1.txt for reading: $!";
open my $out, "+>", "file2.txt" or die "Cannot open file2.txt for overwrite: $!";
$/ = "\n\n";
while (<$in>) {
    if (/^: /) {
        print $out $_;
    }
}
Be advised that you can solve this with a simple one-liner program
perl -00 -nle 'print if /^: /' file1.txt > file2.txt
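If you would rather keep this inside a script, the same paragraph-mode filter can be sketched as a small function. This is only an illustration: matching_paragraphs is a made-up name, and it reads from an in-memory string instead of file1.txt so that it is self-contained. Setting $/ to the empty string is the in-script counterpart of -00.

```perl
use strict;
use warnings;

# Hypothetical helper: return the paragraphs of $text that start with ": ".
sub matching_paragraphs {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = "";                      # paragraph mode, like -00
    return grep { /^: / } <$in>;
}

# Sample data modelled on file1.txt from the question.
print matching_paragraphs(":a\n:b\n\n: x\n:y\n");
```

With the sample data from the question, only the paragraph that starts with ": x" should be printed.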

Related

Read specific text patterns in perl [closed]

I want to read from a text file only specific text, for example:
FileExample:
1111111/first/second/third/fourth.c11111111...etc...
1111111/afirst/asecond/athird/afourth.c11111111...etc...etc
I would like to read the whole file except the part of the file from the 3rd "1" from the first "/" until the ".c" after the 4th "/" to make myself more clear I will bold the text I want my program to read and leave unbolded the part of the text I don't want my program to read.
1111111/first/second/third/fourth.c11111111...etc...etc
1111111/afirst/asecond/athird/afourth.c11111111...etc...etc
After I do all the operations I want on the bolded text, I want to write everything to another file: the unbolded text unmodified and the bolded text with the modifications applied, both in the original file order.
open my $fh1, '<', 'hex.txt';
open my $fh2, '<', 'hex2.txt';
until ( eof $fh1 or eof $fh2 ) {
    my @l1 = map hex,unpack '(a2)*', <$fh1>;
    my @l2 = map hex,unpack '(a2)*', <$fh2>;
    my $n = @l2 > @l1 ? @l2 : @l1;
    my @sum = map {
        $l1[$_] + $l2[$_];
    } 0 .. $n-1;
    @sum = map { sprintf '%X', $_ } @sum;
    open my $out, '>', 'sum.txt';
    print { $out } @sum, "\n";
}
I want to add the hex values from the file hex to the values from the file hex2. Both files have the same structure: both have text and hex values in the same locations and have exactly the same length. I just need to understand how to tell the program to read from location1 to location2.
Convert file to hex:
{
my $input = do {
open my $in, '<', $ARGV[0];
local $/;
<$in>
};
open my $out, '>', 'hex.txt';
print $out unpack 'H*', $input;
}
Your precise criteria aren't clear. Are those digits always ones? It's a mistake to show such a very simple example when you're hoping for help. But I suggest you use split
Something like this perhaps?
use strict;
use warnings;
use feature 'say';
my $data = do {
local $/;
<DATA>;
};
$data =~ tr/\n//d;
say for split qr{\d\d\d(?:/\w+)+/\w+\.c}, $data;
__DATA__
1111111/first/second/third/fourth.c11111111...etc...
1111111/afirst/asecond/athird/afourth.c11111111...etc...etc
output
1111
11111111...etc...1111
11111111...etc...etc
I changed the input to be able to recognize what 1's it matches:
abcd111/first/second/third/fourth.cX1111111...etc...
abcd111/afirst/asecond/athird/afourth.cX1111111...etc...etc
This seems to produce the output you want
perl -pe 's=([^/]+).../.*\.c=$1='
[^/] is a character class, it matches anything that's not a slash;
+ means it must be present one or more times
putting it into parentheses makes it a "capture group", i.e. Perl will remember what matched that part.
.../ matches any three characters followed by a slash.
.* matches anything.
\.c matches a dot followed by a c.
the whole matching part (from abcd in the sample input up to the c before X) is substituted (hence the starting s) with $1, i.e. the contents of the first capture group, which is abcd in the sample input.
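To make the walk-through above concrete, here is the same substitution wrapped in a small function and applied to the modified sample lines. The name strip_path is hypothetical, and I use s{...}{...} delimiters instead of = only for readability; the pattern itself is unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the one-liner.
sub strip_path {
    my ($line) = @_;
    # Keep the leading non-slash run minus its last three characters,
    # then drop everything up to and including the last ".c".
    $line =~ s{([^/]+).../.*\.c}{$1};
    return $line;
}

print strip_path("abcd111/first/second/third/fourth.cX1111111...etc..."), "\n";
print strip_path("abcd111/afirst/asecond/athird/afourth.cX1111111...etc...etc"), "\n";
```

Note that [^/]+ is greedy, so it first grabs abcd111 and then gives back three characters so that ... and the slash can match.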

Adding sequence from FASTA file using Perl

I'm still learning Perl and I have a program which is able to take a FASTA file sequence header and print only the species name within square brackets. I want to add to this code to have it also print the entire sequence associated with the species.
Here is my code:
#!/usr/bin/perl
use warnings;
my $file = 'seqs.fasta';
my $tmp = 'newseqs.fasta';
open(OUT, '>', $tmp) or die "Can't open $tmp: $!";
open(IN, '<', $file) or die "Can't open $file: $!";
while(<IN>) {
chomp;
if ( $_ =~ /\[([^]]+)\]/ ) {
print OUT "$1\n";
}
}
close(IN);
close(OUT);
Here is a sample of the original FASTA file I had:
>gi|334187971|ref|NP_001190408.1| Cam-binding protein 60-like G [Arabidopsis thaliana] >gi|332006244|gb|AED93627.1| Cam-binding protein 60-like G [Arabidopsis thaliana]
MKIRNSPSFHGGSGYSVFRARNLTFKKVVKKVMRDQSNNQFMIQMENMIRRIVREEIQRSLQPFLSSSCVSMERSRSETP
SSRSRLKLCFINSPPSSIFTGSKIEAEDGSPLVIELVDATTNTLVSTGPFSSSRVELVPLNADFTEESWTVEGFNRNILT
QREGKRPLLTGDLTVMLKNGVGVITGDIAFSDNSSWTRSRKFRLGAKLTGDGAVEARSEAFGCRDQRGESYKKHHPPCPS
DEVWRLEKIAKDGVSATRLAERKILTVKDFRRLYTIIGAGVSKKTWNTIVSHAMDCVLDETECYIYNANTPGVTLLFNSV
YELIRVSFNGNDIQNLDQPILDQLKAEAYQNLNRITAVNDRTFVGHPQRSLQCPQDPGFVVTCSGSQHIDFQGSLDPSSS
SMALCHKASSSTVHPDVLMSFDNSSTARFHIDKKFLPTFGNSFKVSELDQVHGKSQTVVTKGCIENNEEDENAFSYHHHD
DMTSSWSPGTHQAVETMFLTVSETEEAGMFDVHFANVNLGSPRARWCKVKAAFKVRAAFKEVRRHTTARNPREGL
Currently, the output only pulls the species name Arabidopsis thaliana
However, I want it to print properly in a fasta file as such:
>Arabidopsis thaliana
MKIRNSPSFHGGSGYSVFRARNLTFKKVVKKVMRDQSNNQFMIQMENMIRRIVREEIQRSLQPFLSSSCVSMERSRSETP
SSRSRLKLCFINSPPSSIFTGSKIEAEDGSPLVIELVDATTNTLVSTGPFSSSRVELVPLNADFTEESWTVEGFNRNILT
QREGKRPLLTGDLTVMLKNGVGVITGDIAFSDNSSWTRSRKFRLGAKLTGDGAVEARSEAFGCRDQRGESYKKHHPPCPS
DEVWRLEKIAKDGVSATRLAERKILTVKDFRRLYTIIGAGVSKKTWNTIVSHAMDCVLDETECYIYNANTPGVTLLFNSV
YELIRVSFNGNDIQNLDQPILDQLKAEAYQNLNRITAVNDRTFVGHPQRSLQCPQDPGFVVTCSGSQHIDFQGSLDPSSS
SMALCHKASSSTVHPDVLMSFDNSSTARFHIDKKFLPTFGNSFKVSELDQVHGKSQTVVTKGCIENNEEDENAFSYHHHD
DMTSSWSPGTHQAVETMFLTVSETEEAGMFDVHFANVNLGSPRARWCKVKAAFKVRAAFKEVRRHTTARNPREGL
Could you suggest ways to modify the code to achieve this?
That's because what this does:
if ( $_ =~ /\[([^]]+)\]/ ) {
print OUT "$1\n";
}
Is find and capture any text in []. But if that pattern doesn't match, you don't do anything else with the line - like print it.
Adding:
else {
print OUT $_;
}
Will mean if a line doesn't contain [] it'll get printed by default.
I will also suggest:
turn on use strict;.
lexical filehandles are good practice: open ( my $input, '<', $file ) or die $!;
a pattern match implicitly applies to $_ by default. So you can write that 'if' as if ( /\[([^]]+)\]/ )
A couple of general points about your program
You must always use strict as well as use warnings 'all' at the top of every Perl program you write. It will reveal many simple mistakes that you could otherwise easily overlook
You have done well to choose the three-parameter form of open, but you should also use lexical file handles. So this line
open(OUT, '>', $tmp) or die "Can't open $tmp: $!";
should be written as
open my $out_fh, '>', $tmp or die "Can't open $tmp: $!";
It's probably best to supply the input and output file names on the command line, so you don't have to edit your program to run it against different files
I would solve your problem like this. It checks to see if each line is a header that contains a string enclosed in square brackets. The first test is that the line starts with a close angle bracket >, and the second test is the same as you wrote in your own program that captures the bracketed string — the species name
If these checks are passed then the species name is printed with a leading closing angle bracket and a trailing newline; otherwise the line is printed as it is
This program should be run like this
$ fasta_species.pl seqs.fasta > newseqs.fasta
The dollar is just the Linux prompt character, and it assumes you have put the program in a file named fasta_species.pl. You can omit the > newseqs.fasta to display the output directly on the screen, so that you can see what is being produced without creating an output file and opening it
use strict;
use warnings 'all';
while ( <> ) {
    if ( /^>/ and / \[ ( [^\[\]]+ ) \] /x ) {
        print ">$1\n";
    }
    else {
        print;
    }
}
output
>Arabidopsis thaliana
MKIRNSPSFHGGSGYSVFRARNLTFKKVVKKVMRDQSNNQFMIQMENMIRRIVREEIQRSLQPFLSSSCVSMERSRSETP
SSRSRLKLCFINSPPSSIFTGSKIEAEDGSPLVIELVDATTNTLVSTGPFSSSRVELVPLNADFTEESWTVEGFNRNILT
QREGKRPLLTGDLTVMLKNGVGVITGDIAFSDNSSWTRSRKFRLGAKLTGDGAVEARSEAFGCRDQRGESYKKHHPPCPS
DEVWRLEKIAKDGVSATRLAERKILTVKDFRRLYTIIGAGVSKKTWNTIVSHAMDCVLDETECYIYNANTPGVTLLFNSV
YELIRVSFNGNDIQNLDQPILDQLKAEAYQNLNRITAVNDRTFVGHPQRSLQCPQDPGFVVTCSGSQHIDFQGSLDPSSS
SMALCHKASSSTVHPDVLMSFDNSSTARFHIDKKFLPTFGNSFKVSELDQVHGKSQTVVTKGCIENNEEDENAFSYHHHD
DMTSSWSPGTHQAVETMFLTVSETEEAGMFDVHFANVNLGSPRARWCKVKAAFKVRAAFKEVRRHTTARNPREGL

Perl : How to remove white spaces in beginning of each line of a file?

I have written this code but it is not working. The white spaces still exist.
open(FILE2, "<WordNetTest2.txt");
my @lin=<FILE2>;
while (<FILE2>) {
    s/\^\s+//g;
}
print FILE2 "@lin";
close(FILE2);
There are several problems with your code:
3 argument open is much better.
you should really check return codes of open.
Assigning a file to an array reads the whole file. This is redundant in your case, and wastes memory (which is a consideration for larger files).
but when you do this, the while loop right after has nothing to read.
You're opening a file to read, which means it isn't open for writing.
your regex is broken. Escaping \^ makes it literal. Try s/\A\s+// or s/^\s+// which will match at the start of each line you read.
Something like this:
#!/usr/bin/perl
use strict;
use warnings;
open ( my $input, "<", "WordNetTest2.txt" ) or die $!;
open ( my $output, ">", "WordNetTest2NEW.txt" ) or die $!;
while ( my $line = <$input> ) {
    $line =~ s/^\s+//;
    print {$output} $line;
}
close ( $input );
close ( $output );
#copy one over the other if so inclined
However, your problem may be solved even more simply with sed:
sed -i.bak -e 's/^[[:space:]]*//' WordNetTest2.txt
There are several issues with your code.
You open the filehandle for reading (better would be the three-argument version of open with a lexical filehandle and error checking: open my $fh , '<' , 'file.txt' or die "Cannot read file.txt: $!\n";). Then you read the complete file into the array @lin, leaving the filehandle at EOF. Therefore the while loop has nothing to do.
Now you try to print the unmodified array to a readonly filehandle and then close the filehandle.
Better would be:
use strict ;
use warnings ;
my $filename = 'WordNetTest2.txt' ;
open my $fh , '<' , $filename or die "Cannot read '$filename': $!\n" ;
my @lines = <$fh> ;
close $fh ;
for ( @lines ) {
    s/^\s+// ; # No need for global substitution
}
open $fh , '>' , $filename or die "Cannot write '$filename': $!\n" ;
print $fh @lines ;
close $fh ;
Try this :
perl -i -pe 's/^\s+//' WordNetTest2.txt
You've escaped your caret:
s/\^\s+//g;
^--
which means it's no longer the "start of string" regex metachar. It's become a literal caret, meaning:
foo^ bar <--matches
foo bar <--no match, because no caret
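A quick side-by-side check of the two patterns, as a sketch. The helper names strip_escaped and strip_anchored are mine and exist only for this comparison:

```perl
use strict;
use warnings;

# Both helpers are hypothetical names used only for this comparison.
sub strip_escaped  { my ($s) = @_; $s =~ s/\^\s+//g; return $s }   # \^ is a literal caret
sub strip_anchored { my ($s) = @_; $s =~ s/^\s+//;   return $s }   # ^ anchors at the start

for my $line ("   foo bar", "foo^ bar") {
    printf "%-14s escaped=[%s] anchored=[%s]\n",
        "[$line]", strip_escaped($line), strip_anchored($line);
}
```

The escaped version leaves the leading blanks of the first string alone and only eats the caret plus the space after it in the second one, while the anchored version does what the question asks for.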
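It is very easy:
open(FILE2, "<", "WordNetTest2.txt") or die $!;
my @lin = <FILE2>;
close(FILE2);
s/^\s+// for @lin; # strip leading whitespace from every line
open(FILE2, ">", "WordNetTest2.txt") or die $!;
print FILE2 join("", @lin);
close(FILE2);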
It worked for me.

What can be wrong with word count program?

I've got a question in my test:
What is wrong with program that counts number of lines and words in file?
open F, $ARGV[0] || die $!;
my @lines = <F>;
my @words = map {split /\s/} @lines;
printf "%8d %8d\n", scalar(@lines), scalar(@words);
close(F);
My conjectures are:
If file does not exist, program won't tell us about that.
If there are punctuation signs in file, program will count them, for example, in
abc cba
, , ,dce
there will be five words, but on the other hand wc outputs the same result, so it might be considered correct behavior.
If F is a large file, it might be better to iterate over lines and not to dump it into lines array.
Do you have any less trivial ideas?
On the first line, you have a precedence problem:
open F, $ARGV[0] || die $!;
is the same as
open F, ($ARGV[0] || die $!);
which means the die is executed if the filename is false, not if the open fails. You wanted to say
open(F, $ARGV[0]) || die $!;
or
open F, $ARGV[0] or die $!;
Also, you should be using the 3 argument form of open, in case $ARGV[0] contains characters that mean something to open.
open F, '<', $ARGV[0] or die $!;
On a different note, splitting on /\s/ means that you get an empty "word" between consecutive whitespace characters. You probably meant /\s+/, or as amphetamachine suggested, /\W+/, depending on how you want to define a "word".
That still leaves the problem of the empty "word" you get if the line begins with whitespace. You could split on ' ' to suppress that (it's a special case), or you could trim leading whitespace first, or insert a grep { length $_ } to weed out empty "words", or abandon split and use a different method for counting words.
Processing line by line instead of reading the whole file at once would also be a good improvement, but it's not as important as those first two items.
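To illustrate the split variants discussed above, here is a small sketch on a made-up line with leading and repeated whitespace (the sample string and variable names are mine):

```perl
use strict;
use warnings;

my $line = "  abc   cba\n";

my @single = split /\s/,  $line;   # one split per whitespace character: empty "words" appear
my @runs   = split /\s+/, $line;   # runs collapse, but one leading empty field remains
my @awk    = split ' ',   $line;   # special case: leading whitespace is skipped

printf "%d %d %d\n", scalar @single, scalar @runs, scalar @awk;
```

For this line the three counts come out as 6, 3 and 2, so only the ' ' form agrees with what wc would call two words.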
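Your conjecture #1 is correct, though perhaps not for the reason you expect: because || binds more tightly than the open list operator, the die is attached to the filename rather than to the open, so a failed open goes unreported. (See cjm's answer re order of operations.)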
you're using a global filehandle, rather than a lexical variable.
you're not using the three-argument form of open.
you could just read from stdin, which gives more flexibility as to input - the user can provide a file, or pipe the input into stdin.
lastly, I wouldn't write my own code to parse words; I'd reach for CPAN, say something like Lingua::EN::Splitter.
use strict; use warnings;
use Lingua::EN::Splitter qw(words);
my ($wordcount, $lines);
while (<>)
{
my $line = $_;
$lines++;
$wordcount += scalar(words $line);
}
printf "%8d %8d\n", $lines, $wordcount;
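When you open F, $ARGV[0] || die $! the die will not fire if the file doesn't exist, because || applies to $ARGV[0] rather than to the result of open; write open(F, $ARGV[0]) || die $! or use or instead.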
There are some improvements to be made here:
my $lines = do { local $/; <F> }; # read the whole file at once
my @words = split /\W+/, $lines;

perl + append text between two lines in file

I need to edit file , the main issue is to append text between two known lines in the file
for example I need to append the following text
a b c d e f
1 2 3 4 5 6
bla bla
Between the first_line and the second_line
first_line=")"
second_line="NIC Hr_Nic ("
remark: first_line and second_line argument can get any line or string
How can I do this in Perl? (I am writing a bash script and need to embed the Perl code in it.)
lidia
You could read the file in as a single string and then use a regular expression to do the search and replace:
use strict;
use warnings;
# Slurp file myfile.txt into a single string
open(FILE,"myfile.txt") || die "Can't open file: $!";
undef $/;
my $file = <FILE>;
# Set strings to find and insert
my $first_line = ")";
my $second_line = "NIC Hr_Nic (";
my $insert = "hello world";
# Insert our text
$file =~ s/\Q$first_line\E\n\Q$second_line\E/$first_line\n$insert\n$second_line/;
# Write output to output.txt
open(OUTPUT,">output.txt") || die "Can't open file: $!";
print OUTPUT $file;
close(OUTPUT);
By unsetting $/ we put Perl into "slurp mode" and so can easily read the whole file into $file.
We use the s/// operator to do a search and replace using our two search lines as a pattern.
The \Q and \E tell Perl to escape the strings between them, i.e. to treat any regex metacharacters that happen to be in $first_line or $second_line as literal characters.
You could always write the output over the original file if desired.
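One caveat with the pattern above is that it is not anchored, so $first_line would also match the tail of a longer line. A slightly stricter variant can be sketched as a function; insert_between is a hypothetical name, and the sample text is made up to resemble the lines from the question:

```perl
use strict;
use warnings;

# Hypothetical helper: insert $insert between two adjacent whole lines of $text.
sub insert_between {
    my ($text, $first_line, $second_line, $insert) = @_;
    my $first  = quotemeta $first_line;    # same effect as \Q...\E
    my $second = quotemeta $second_line;
    # /m lets ^ match at the start of every line, so only a complete
    # first line directly followed by a complete second line qualifies.
    $text =~ s/^($first\n)(?=$second(?:\n|\z))/$1$insert\n/m;
    return $text;
}

my $sample = "NIC Lo_Nic (\n)\nNIC Hr_Nic (\n)\n";
print insert_between($sample, ")", "NIC Hr_Nic (", "a b c d e f\n1 2 3 4 5 6\nbla bla");
```

The lookahead keeps the second line out of the match, so only the first line and the inserted text appear in the replacement.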
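The problem as you state it is not solvable using the -i command line option in its usual line-at-a-time mode (-p or -n with the default record separator), since each line is then processed on its own; to insert text between two specific lines you'll need to know about two lines at once, for example by slurping the whole file with -0777.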
Well, to concatenate strings you do
my $text = $first_line . $second_line;
or
my $text = $first_line;
$text .= $second_line;
I'm not sure if I understand your question correctly. A "before and after" example of the file content would, I think, be easier. Anyhow, Here's my take on it, using splice instead of a regular expression. We must of course know the line numbers for this to work.
Load the file into an array:
my @lines;
open F, '<', 'filename' or die $!;
push @lines, $_ for <F>;
close F;
Insert the stuff (see perldoc -f splice):
splice @lines, 1, 0, ("stuff\n");
..and you're done. All you need to do now is save the array again:
open F, '>', 'filename' or die $!;
print F @lines;
close F;
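If the line numbers are not known in advance, they can be looked up first. The following is only a sketch under that assumption: splice_between is a made-up helper, and the sample array stands in for the lines read from the file.

```perl
use strict;
use warnings;

# Hypothetical helper: find two adjacent lines in @$lines and splice
# the new lines in between. Returns 1 on success, 0 if the pair is not found.
sub splice_between {
    my ($lines, $first_line, $second_line, @new) = @_;
    for my $i (0 .. $#$lines - 1) {
        (my $cur  = $lines->[$i])     =~ s/\n\z//;   # compare without the newline
        (my $next = $lines->[$i + 1]) =~ s/\n\z//;
        if ($cur eq $first_line and $next eq $second_line) {
            splice @$lines, $i + 1, 0, @new;
            return 1;
        }
    }
    return 0;
}

my @lines = ("NIC Lo_Nic (\n", ")\n", "NIC Hr_Nic (\n", ")\n");
splice_between(\@lines, ")", "NIC Hr_Nic (", "a b c d e f\n", "bla bla\n")
    or warn "line pair not found\n";
print @lines;
```

The inserted elements carry their own newlines, because the array is later printed without any separator.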