Perl while loops and reading lines

Each record has 4 lines, like the following:

@NCYC361-11a03.q1k bases 1 to 1576
GCGTGCCCGAAAAAATGCTTTTGGAGCCGCGCGTGAAAT
+
!)))))****(((***%%((((*(((+,**(((+**+,-

There are two files, and each record in one file corresponds to a record in the other. There is also an array of sequences, A1.
So I read one record at a time from file 1 and the corresponding record from file 2. If the sequence in the record from file 1 (line 2) matches a sequence in the array A1, I print the record from file 2 to an output file, and so on. The point is that I need to read one record at a time. How would I break out of the inner loop so that I can read the next record from file 1 and then compare it to the next record in file 2?

If you are asking about controlling nested loops, you can do that with labels.
Example:

OUTER:
while (<>) {
    for (@something) {
        last OUTER;
    }
}

See the documentation for last for details.
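Applied to your record-matching problem, a rough sketch might look like this (the handles $fh1, $fh2, $out and the array @A1 are assumed to be set up elsewhere, and @A1 is assumed to hold chomped sequences):

RECORD:
while ( my $header1 = <$fh1> ) {
    chomp( my $seq1 = <$fh1> );                      # line 2 of the record: the sequence
    my @rest1   = ( scalar <$fh1>, scalar <$fh1> );  # lines 3 and 4
    my @record2 = map { scalar <$fh2> } 1 .. 4;      # the corresponding record in file 2
    for my $known (@A1) {
        if ( $seq1 eq $known ) {
            print $out @record2;
            next RECORD;             # done with this record, move on to the next one
        }
    }
}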

If only lines with the same number could ever match, you don't really need more than one loop. You can call a reading operation (<>, read, sysread) wherever you want; it is usually placed directly in the loop condition only because it conveniently returns undef and ends the loop when the work is done.
while (defined(my $first_line = <FIRST>)) {
    my $second_line = <SECOND>;
    if ($first_line eq $second_line) {
        print "match\n";
    } else {
        print "no match\n";
    }
}

From your sentence "I need to check if the sequence matches any with the sequence from the second" I gather that you want to check whether any lines in the two files match?
If you need to read a file several times then you can use seek to rewind to the start of it without reopening it.
This program shows the idea.
use strict;
use warnings;

open my $fh1, '<', 'file1' or die $!;
open my $fh2, '<', 'file2' or die $!;
open my $out, '>', 'matches' or die $!;

while (my $line1 = <$fh1>) {
    seek $fh2, 0, 0;
    while (my $line2 = <$fh2>) {
        if ($line1 eq $line2) {
            print $out $line1;
            last;
        }
    }
}
Edit
Your comment has changed the problem. Both files have four-line records and you want to compare the second line in corresponding records across the two files.
use strict;
use warnings;

open my $fh1, '<', 'file1' or die $!;
open my $fh2, '<', 'file2' or die $!;

open my $match,   '>', 'matches' or die $!;
open my $nomatch, '>', 'nomatch' or die $!;

while (1) {
    my (@data1, @data2);
    for (1 .. 4) {
        my $line;
        $line = <$fh1>;
        push @data1, $line if defined $line;
        $line = <$fh2>;
        push @data2, $line if defined $line;
    }
    last unless @data1 == 4 and @data2 == 4;
    if ($data1[1] eq $data2[1]) {
        print $match @data2;
    }
    else {
        print $nomatch @data2;
    }
}

A full example:

#!/usr/bin/env perl
use strict;
use warnings;

open F1, "<", "/path/1" or die $!;
open F2, "<", "/path/2" or die $!;

my @a1 = <F1>;
my @a2 = <F2>;

for (0 .. $#a1) {
    if ($a1[$_] eq $a2[$_]) {
        print "MATCH line [$_]\n";
    } else {
        print "DOESN'T MATCH line [$_]\n";
    }
}


Recursive search in Perl?

I'm incredibly new to Perl, and have never been a phenomenal programmer. I have some successful BVA routines for controlling microprocessor functions, but never anything embedded or multi-faceted. Anyway, my question today is about a problem I cannot get past when trying to figure out how to remove duplicate lines of text from a text file I created.
The file could have several of the same lines of text in it, not sequentially placed, which is problematic as I'm practically comparing the file to itself, line by line. So, if the first and third lines are the same, I'll write the first line to a new file, not the third. But when I compare the third line, I'll write it again, since the first line is "forgotten" by my current code. I'm sure there's a simple way to do this, but I have trouble making things simple in code. Here's the code:
my $searchString = pseudo variable "ideally an iterative search through the source file";
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
my $count = "0";
open (FILE, $file2) || die "Can't open cutdown.txt \n";
open (FILE2, ">$file3") || die "Can't open output.txt \n";
while (<FILE>) {
    print "$_";
    print "$searchString\n";
    if (($_ =~ /$searchString/) and ($count == "0")) {
        ++ $count;
        print FILE2 $_;
    } else {
        print "This isn't working\n";
    }
}
close (FILE);
close (FILE2);
Excuse the way filehandles and scalars do not match. It is a work in progress... :)
The secret of checking for uniqueness is to store the lines you have seen in a hash and only print lines that don't exist in the hash.
Updating your code slightly to use more modern practices (three-arg open(), lexical filehandles) we get this:
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";

open my $in_fh,  '<', $file2 or die "Can't open cutdown.txt: $!\n";
open my $out_fh, '>', $file3 or die "Can't open output.txt: $!\n";

my %seen;
while (<$in_fh>) {
    print $out_fh $_ unless $seen{$_}++;
}
But I would write this as a Unix filter. Read from STDIN and write to STDOUT. That way, your program is more flexible. The whole code becomes:
#!/usr/bin/perl

use strict;
use warnings;

my %seen;

while (<>) {
    print unless $seen{$_}++;
}
Assuming this is in a file called my_filter, you would call it as:
$ ./my_filter < /tmp/cutdown.txt > /tmp/output.txt
Update: But this doesn't use your $searchString variable. It's not clear to me what that's for.
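If $searchString is meant to restrict which lines are kept, one way to combine it with the filter above (treating $searchString as a regular expression, which is an assumption on my part) would be:

my $searchString = shift;           # e.g. taken from the command line
my %seen;
while (<>) {
    next unless /$searchString/;    # keep only lines matching the search pattern
    print unless $seen{$_}++;
}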
If your file is not very large, you can store each line read from the input file as a key in a hash variable, and then print the hash keys (ordered, if needed). Something like this:
my %lines = ();
my $order = 1;

open my $fhi, "<", $file2 or die "Cannot open file: $!";
while ( my $line = <$fhi> ) {
    $lines{$line} = $order++;
}
close $fhi;

open my $fho, ">", $file3 or die "Cannot open file: $!";

# Sort the keys, only if needed
my @ordered_lines = sort { $lines{$a} <=> $lines{$b} } keys(%lines);

for my $key ( @ordered_lines ) {
    print $fho $key;
}
close $fho;
You need two things to do that:
a hash to keep track of all the lines you have seen
a loop reading the input file
This is a simple implementation, called with an input filename and an output filename.
use strict;
use warnings;

open my $fh_in,  '<', $ARGV[0] or die "Could not open file '$ARGV[0]': $!";
open my $fh_out, '>', $ARGV[1] or die "Could not open file '$ARGV[1]': $!";

my %seen;

while (my $line = <$fh_in>) {
    # check if we have already seen this line
    if (not $seen{$line}) {
        print $fh_out $line;
    }
    # remember this line
    $seen{$line}++;
}
To test it, I've included it with the DATA handle as well.
use strict;
use warnings;

my %seen;

while (my $line = <DATA>) {
    # check if we have already seen this line
    if (not $seen{$line}) {
        print $line;
    }
    # remember this line
    $seen{$line}++;
}
__DATA__
foo
bar
asdf
foo
foo
asdfg
hello world
This will print
foo
bar
asdf
asdfg
hello world
Keep in mind that the memory consumption will grow with the file size. It should be fine as long as the text file is smaller than your RAM. Perl's hash memory consumption grows faster than linearly, but your data structure is very flat.

Can't call method "print" on an undefined value

I get the error Can't call method "print" on an undefined value at line 40, line 2.
Here is the code. I use FileHandle to manage the files:
#!/usr/bin/perl
use strict;
use warnings;
use FileHandle;

die unless (@ARGV == 4 || @ARGV == 5);

my @input = ();
$input[0] = $ARGV[3];
$input[1] = $ARGV[4] if ($#ARGV == 4);
chomp @input;

$input[0] =~ /([^\/]+)$/;
my $out  = "$1.insert";
my $lane = "$1";

my %fh = ();

open (Info, ">$ARGV[1]") || die "$!";
open (AA, "<$ARGV[0]") || die "$!";
while (<AA>) {
    chomp;
    my @inf = split;
    my $iden = $inf[0];
    my $outputfile = "$ARGV[2]/$iden";
    $fh{$iden} = FileHandle->new(">$outputfile");
}
close AA;

foreach my $input (@input) {
    open (IN, "<$input") or die "$!";
    my @path = split (/\//, $input);
    print Info "#$path[-1]\n";
    while (<IN>) {
        my $line1 = $_;
        my ($id1, $iden1) = (split "\t", $line1)[6,7];
        my $line2 = <IN>;
        my ($id2, $iden2) = (split "\t", $line2)[6,7];
        if ($id1 eq '+' && $id2 eq '-') {
            my @inf = split(/\t/, $line1);
            $fh{$iden1}->print($line1);
            $fh{$iden2}->print($line2);
        }
    }
    close IN;
}
I’ve tried multiple variations of this, but none of them seem to work. Any ideas?
Please remember that the primary worth of a Stack Overflow post is not to fix your particular problem, but to help the thousands of others who may be stuck in the same way. With that in mind, "I fixed it, thanks, bye" is more than a little selfish.
As I said in my comment, using open directly on a hash element is much preferable to involving FileHandle. Perl will autovivify the hash element and create a file handle for you, and most people at all familiar with Perl will thank you for not making them read up again on the FileHandle documentation.
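A minimal sketch of what that looks like (the $id and $outputfile names here are just placeholders):

open $fh{$id}, '>', $outputfile
    or die qq{Unable to open "$outputfile" for output: $!};
print { $fh{$id} } "some line\n";    # note the block syntax when the handle is a hash element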
I rewrote your code like this, which is much more Perlish and relies less on "magic numbers" to access @ARGV. You should really assign @ARGV to a list of named scalars, or, better still, use Getopt::Long so that they are named anyway.
You should open your file handles as late as possible, and close the output handles early. This is effected most easily by using lexical file handles and limiting their scope to a block; Perl will implicitly close lexical handles for you when they go out of scope.
There is no need to chomp the contents of @ARGV unless you could be called under strange and errant circumstances, in which case you need to do a hell of a lot more to verify the input.
You never use the result of $input[0] =~ /([^\/]+)$/ or the variables $out and $lane, so I removed them.
#!/usr/bin/perl

use strict;
use warnings 'all';

# $ARGV[0] -- input file
# $ARGV[1] -- output log file
# $ARGV[2] -- directory for outputs per ident
# $ARGV[3] -- 1, $input[0]
# $ARGV[4] -- 2, $input[1] or undef

die "Fix the parameters" unless @ARGV == 4 or @ARGV == 5;

my @input = @ARGV[3,4];

my %fh;

{
    open my $fh, '<', $ARGV[0] or die $!;

    while ( <$fh> ) {
        my $id = ( split )[0];
        my $outputfile = "$ARGV[2]/$id";
        open $fh{$id}, '>', $outputfile or die qq{Unable to open "$outputfile" for output: $!};
    }
}

open my $log_fh, '>', $ARGV[1] or die qq{Unable to open "$ARGV[1]" for output: $!};

for my $input ( @input ) {

    next unless $input;    # skip unspecified parameters

    my @path = split qr|/|, $input;    # Really should be done by File::Spec
    print $log_fh "#$path[-1]\n";      # Or File::Basename

    open my $fh, '<', $input or die qq{Unable to open "$input" for input: $!};

    while ( my $line0 = <$fh> ) {
        chomp $line0;
        my $line1 = <$fh>;
        chomp $line1;

        my ($id0, $iden0) = (split /\t/, $line0)[6,7];
        my ($id1, $iden1) = (split /\t/, $line1)[6,7];

        if ( $id0 eq '+' and $id1 eq '-' ) {
            $fh{$_} or die qq{No output file for "$_"} for $iden0, $iden1;
            print { $fh{$iden0} } $line0, "\n";
            print { $fh{$iden1} } $line1, "\n";
        }
    }
}

while ( my ($iden, $fh) = each %fh ) {
    close $fh or die qq{Unable to close file handle for "$iden": $!};
}
You don't have any error handling on this line:
$fh{$iden}=FileHandle->new(">$outputfile");
It's possible that opening a filehandle is silently failing, and only producing an error when you try to print to it. For example, if you have specified an invalid filename.
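One way to surface such a failure early, while keeping the FileHandle style, is to check the constructor's return value (FileHandle->new returns undef on failure):

$fh{$iden} = FileHandle->new(">$outputfile")
    or die "Cannot open '$outputfile' for writing: $!";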
Also, you never check if $iden1 and $iden2 are names of open filehandles that actually exist. It's possible one of them does not exist.
In particular, you aren't removing a newline from $line1, so if $iden1 and $iden2 happen to be the last values on the line, this will be included in the name you are trying to use, and it will fail.
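A minimal fix for that part, assuming tab-separated fields, is to chomp each line before splitting it:

chomp( my $line1 = $_ );
chomp( my $line2 = <IN> );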
In your first while loop, you set up a hash of filehandles that you will write to later. The keys in this hash are the "iden" strings from the first file passed to the program.
Later, you parse another file and use the "iden" values in that file to choose which filehandle to write data to. But one (or more) of the "iden" values in the second file is missing from the first file, so that filehandle can't be found in the %fh hash. Because you don't check for that, you get undef back from the hash, and you can't call print on an undefined filehandle.
To fix it, put a check before trying to use one of the filehandles from the %fh hash.
die "Unknown fh identifier '$iden1'" unless exists $fh{$iden1};
die "Unknown fh identifier '$iden2'" unless exists $fh{$iden2};
$fh{$iden1}->print($line1);
$fh{$iden2}->print($line2);

How to compare two text files, remove the matching contents, and pass the rest to output in Perl?

I have two text files, text1.txt and text2.txt, like below:
text1
ac
abc
abcd
abcde
text2
ab
abc
acd
abcd
output
ac
abcde
I need to compare the two files and remove the content from text1 when there is a match in the second file.
I want to do this in Perl. Currently I am trying the code below.
#!/usr/bin/perl
use strict;
use warnings;

open (GEN, "text1.txt") || die ("cannot open general.txt");
open (SEA, "text2.txt") || die ("cannot open search.txt");
open (OUT, ">> output.txt") || die ("cannot open intflist.txt");
open (LOG, ">> logfile.txt");

undef $/;

foreach (<GEN>) {
    my $gen = $_;
    chomp ($gen);
    print LOG $gen;
    foreach (<SEA>) {
        my $sea = $_;
        chomp($sea);
        print LOG $sea;
        if ($gen ne $sea) {
            print OUT $gen;
        }
    }
}
With this I am getting all of the content from text1, not just the unmatched content. Please help me out.
I think you should read text2 into an array and then, in the second foreach, loop over that array:
@b = <SEA>;
Otherwise, in the second pass the file pointer would already be at the end of the file.
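A sketch of that idea, keeping the original filehandle names (and using grep so that each line of text1 is printed at most once):

my @sea = <SEA>;                    # slurp text2 once
chomp @sea;
while ( my $gen = <GEN> ) {
    chomp $gen;
    print OUT "$gen\n" unless grep { $_ eq $gen } @sea;
}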
One way:
#!/usr/bin/perl
use strict;
use warnings;

$\ = "\n";
open my $fh1, '<', 'file1' or die $!;
open my $fh2, '<', 'file2' or die $!;
open my $out, '>', 'file3' or die $!;

chomp(my @arr1 = <$fh1>);
chomp(my @arr2 = <$fh2>);

foreach my $x (@arr1) {
    print $out $x if (!grep (/^\Q$x\E$/, @arr2));
}

close $fh1;
close $fh2;
close $out;
After executing the above, the file 'file3' contains:
$ cat file3
ac
abcde
This is my plan:
Read the contents of the first file into a hash, with a counter of occurrences. For example, working with your data you get:
%lines = ( 'ac' => 1,
'abc' => 1,
'abcd' => 1,
'abcde' => 1);
Read the second file, deleting the key from %lines if it exists.
Print the keys of %lines to the desired file.
Example:
use strict;

open my $fh1, '<', 'text1' or die $!;
open my $fh2, '<', 'text2' or die $!;
open my $out, '>', 'output' or die $!;

my %lines = ();
while ( my $key = <$fh1> ) {
    chomp $key;
    $lines{$key} = 1;
}
while ( my $key = <$fh2> ) {
    chomp $key;
    delete $lines{$key};
}
foreach my $key (keys %lines) {
    print $out $key, "\n";
}

close $fh1;
close $fh2;
close $out;
Your main problem is that you have undefined the input record separator $/. That means the whole file will be read as a single string, and all you can do is say that the two files are different.
Remove undef $/ and things will work a whole lot better. However the inner for loop will read and print all the lines in file2 that don't match the first line of file1. The second time this loop is encountered all the data has been read from the file so the body of the loop won't be executed at all. You must either open file2 inside the outer loop or read the file into an array and loop over that instead.
Then again, do you really want to print all lines from file2 that aren't equal to each line in file1?
Update
As I wrote in my comment, it sounds like you want to output the lines in text1 that don't appear anywhere in text2. That is easily achieved using a hash:
use strict;
use warnings;

my %exclude;

open my $fh, '<', 'text2.txt' or die $!;
while (<$fh>) {
    chomp;
    $exclude{$_}++;
}

open $fh, '<', 'text1.txt' or die $!;
while (<$fh>) {
    chomp;
    print "$_\n" unless $exclude{$_};
}
With the data you show in your question, that produces this output
ac
abcde
I would like to view your problem like this:
You have a set S of strings in file.txt.
You have a set F of forbidden strings in forbidden.txt.
You want the strings that are allowed, so S \ F (setminus).
There is a data structure in Perl that implements a set of strings: The hash. (It can also map to scalars, but that is secondary here).
So first we create the set of the lines we have. We let all the strings in that file map to undef, as we don't need that value:
open my $FILE, "<", "file.txt" or die "Can't open file.txt: $!";
my %Set = map {$_ => undef} <$FILE>;
We create the forbidden set the same way:
open my $FORBIDDEN, "<", "forbidden.txt" or die "Can't open forbidden.txt: $!";
my %Forbidden = map {$_ => undef} <$FORBIDDEN>;
The set minus can be computed in either of these ways:
For each element x in S, x is in the result set R iff x isn't in F.
my %Result = map {$_ => $Set{$_}} grep {not exists $Forbidden{$_}} keys %Set;
The result set R initially is S. For each element in F, we delete that item from R:
my %Result = %Set; # make a copy
delete $Result{$_} for keys %Forbidden;
(the keys function accesses the elements in the set of strings)
We can then print out all the keys: print keys %Result.
But what if we want to preserve the order? Entries in a hash can also carry an associated value, so why not the line number? We create the set S like this:
open my $FILE, "<", "file.txt" or die "Can't open file.txt: $!";
my $line_no = 1;
my %Set = map {$_ => $line_no++} <$FILE>;
Now, this value is carried around with the string, and we can access it at the end. Specifically, we sort the keys in the hash after their line number:
my @sorted_keys = sort { $Result{$a} <=> $Result{$b} } keys %Result;
print @sorted_keys;
Note: All of this assumes that the files are terminated by newline. Else, you would have to chomp.
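Putting the pieces together, a minimal order-preserving version (with the same assumed file names as above) might look like this:

use strict;
use warnings;

open my $FILE, '<', 'file.txt' or die "Can't open file.txt: $!";
my $line_no = 1;
my %Set = map { $_ => $line_no++ } <$FILE>;

open my $FORBIDDEN, '<', 'forbidden.txt' or die "Can't open forbidden.txt: $!";
my %Forbidden = map { $_ => undef } <$FORBIDDEN>;

delete $Set{$_} for keys %Forbidden;    # S \ F

print sort { $Set{$a} <=> $Set{$b} } keys %Set;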

Perl - binary unpack using pointer and index

I have a binary file that contains 3 files: a PNG, a PHP and a TGA file.
Here is the file to give you the idea: container.bin
The file is built this way:
the first 6 bytes are a pointer to the index, in this case 211794
then you have all 3 files stacked one after the other
and at offset 211794 you have the index, which tells you where each file starts and ends
In this example you have:
[offset start] [offset end] [random data] [offset start] [name]
6 15149 asdf 6 Capture.PNG
15149 15168 4584 15149 index.php
15168 211794 12 15168 untilted.tga
meaning that Capture.PNG starts at offset 6 and finishes at offset 15149, then asdf is random data, and the start offset is repeated again.
Now what I want to do is write a Perl script to split this container into its separate files.
The script needs to read the first 6 bytes of the file (the header), jump to the index location, and use the index entries to extract the files.
A mix of seek and read can be used to achieve the task:
#!/usr/bin/env perl

use strict;
use warnings;

use Fcntl 'SEEK_SET';

sub get_files_info {
    my ( $fh, $offset ) = @_;
    my %file;

    while (<$fh>) {
        chomp;
        my $split_count = my ( $offset_start, $offset_end, $random_data, $offset_start_copy,
            $file_name ) = split /\s/;
        next if $split_count != 5;

        if ( $offset_start != $offset_start_copy ) {
            warn "Start of offset mismatch: $file_name\n";
            next;
        }

        $file{$file_name} = {
            'offset_start' => $offset_start,
            'offset_end'   => $offset_end,
            'random_data'  => $random_data,
        };
    }

    return %file;
}

sub write_file {
    my ( $fh, $file_name, $file_info ) = @_;

    seek $fh, $file_info->{'offset_start'}, SEEK_SET;
    read $fh, my $contents,
        $file_info->{'offset_end'} - $file_info->{'offset_start'};

    open my $fh_out, '>', $file_name or die "Error opening file: $!";
    binmode $fh_out;
    print $fh_out $contents;

    print "Wrote file: $file_name\n";
}

open my $fh, '<', 'container.bin' or die "Error opening file: $!";
binmode $fh;

read $fh, my $offset, 6;
seek $fh, $offset, SEEK_SET;

my %file = get_files_info $fh, $offset;

for my $file_name ( keys %file ) {
    write_file $fh, $file_name, $file{$file_name};
}
The only real difficulty here is to make sure that both input and output files are read in binary mode. This can be achieved by using the :raw PerlIO layer when the files are opened.
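For example, opening with the :raw layer has the same effect as opening normally and then calling binmode, as the first program does (a sketch using the same container file name):

# reading in binary mode via the :raw layer
open my $in, '<:raw', 'container.bin' or die "Error opening file: $!";

# which has the same effect as:
open my $in_alt, '<', 'container.bin' or die "Error opening file: $!";
binmode $in_alt;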
This program seems to do what you want. It first locates and reads the index block into a string, and then opens that string for input and reads the start and end position and name of each of the constituent files. Thereafter processing each file is simple.
Be aware that unless the formatting of the index block is more strict than you say, you can rely only on the first, second, and last whitespace-separated fields on each line since random text could contain spaces. There is also no way to specify a file name containing spaces.
The output, using Data::Dump, is there to demonstrate correct functionality and is not necessary for the functioning of the program.
use v5.10;
use warnings;

use Fcntl ':seek';
use autodie qw/ open read seek close /;

open my $fh, '<:raw', 'container.bin';
read $fh, my $index_loc, 6;
seek $fh, $index_loc, SEEK_SET;
read $fh, my ($index), 1024;

my %contents;

open my $idx, '<', \$index;
while (<$idx>) {
    my @fields = split;
    next unless @fields;
    $contents{$fields[-1]} = [ $fields[0], $fields[1] ];
}

use Data::Dump;
dd \%contents;

for my $file (keys %contents) {

    my ($start, $end) = @{ $contents{$file} };
    my $size = $end - $start;

    seek $fh, $start, SEEK_SET;
    my $nbytes = read $fh, my ($data), $size;
    die "Premature EOF" unless $nbytes == $size;

    open my $out, '>:raw', $file;
    print { $out } $data;
    close $out;
}
output
{
    "Capture.PNG"  => [6, 15149],
    "index.php"    => [15149, 15168],
    "untilted.tga" => [15168, 211794],
}

perl + create loop to print values from INI file

My name is Abbi.
This is my first Perl script; it runs on a Linux machine.
The script reads an INI file called input and prints the values of val, param, name, and so on.
How can I create a loop that prints the values of val1-valn, or a loop that prints the values of param1-paramn, etc. (in place of the print commands in the script)?
The loop must have an option to match the parameter,
for example printing only the param1 to paramn values.
n is the last number of each param.
#!/usr/bin/perl

open(IN, "input") or die "Couldn't open input: $!\n";

while (<IN>) {
    chomp;
    /^([^=]+)=(.*)$/;
    $config{$1} = $2;
}
close(IN);

print $config{val1};
print $config{val2};
print $config{val3};
print $config{param1};
print $config{param2};
print $config{param3};
print $config{name1};
.
.
.
.
An example of the INI file on the Linux machine:
cat input
val1=1
val2=2
val3=3
param1=a
param2=b
param3=c
name1=abbi
name2=diana
name3=elena
You can use Config::Tiny to read your .ini file.
Then you can use the returned hash to filter what you want.
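A rough sketch of that (untested, and assuming the file has no [section] headers, in which case Config::Tiny puts the keys in the root '_' section):

use Config::Tiny;

my $ini = Config::Tiny->read('input')
    or die "Cannot read 'input': " . Config::Tiny->errstr;

# keys defined before any [section] header live under '_'
for my $key ( sort keys %{ $ini->{_} } ) {
    print "$key=$ini->{_}{$key}\n" if $key =~ /^param\d+$/;
}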
According to your last comment, this will do what you want:
use strict;
use warnings;

my %config;
my $max_n = 0;
my $input = 'input';

open my $in, '<', $input
    or die "unable to open '$input' for reading: $!";

while (<$in>) {
    chomp;
    if (/^(.*?(\d+))\s*=(.*)$/) {
        $config{$1} = $3;
        $max_n = $2 if $2 > $max_n;
    }
}
close $in or die "unable to close '$input': $!";

for my $n (1 .. $max_n) {
    for my $param (qw/val param/) {
        print "$param.$n = $config{$param.$n}\n" if exists $config{$param.$n};
    }
}
How about this:
use warnings;
use strict;

my %config;

open my $input, "<", "input"
    or die "Couldn't open input: $!\n";

while (<$input>) {
    chomp;
    if ( /^([^=]+)=(.*)$/ ) {
        $config{$1} = $2;
    }
}
close($input) or die $!;

for (sort keys %config) {
    if (/param\d+/) {
        print "$config{$_}\n";
    }
}