Using goto LABEL for comparing two files - perl

I am unable to get desired output.
Please help to correct my errors.
file1
A
B
C
D
E
F
file2
A
D
C
Desired Output (if found print '1' at relative position in larger file and if not print '0')
1
0
1
1
0
0
code
#!/usr/bin/perl -w
open(FH,$file);
@q=<FH>;
open(FH1,$file2);
@d=<FH1>;
open(OUT,">out.txt");
foreach $i(@q) {
foreach $j(@d) {
if ($i eq $j) {
$id=1 ;
goto LABEL;
} elsif ($i ne $j) {
$id=1;
goto LABEL;
}
}
}
print OUT "1\t";
LABEL:
print OUT "0\t";
}
close FH;
close FH1;
close OUT;
note: actual files are much much larger and contain uneven number of elements.

You were looking for
for $q (@q) {
my $found = 0;
for $d (@d) {
if ($q eq $d) {
$found = 1;
goto LABEL;
}
}
LABEL: print "$found\n";
}
The above is better written as follows:
for $q (@q) {
my $found = 0;
for $d (@d) {
if ($q eq $d) {
$found = 1;
last;
}
}
print "$found\n";
}
But both versions compare every line of one file against every line of the other, which is slow for large files. You can avoid iterating over @d repeatedly by using a hash.
my %d = map { $_ => 1 } @d;
for $q (@q) {
print $d{$q} ? "1" : "0", "\n";
}
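For completeness, here is a minimal sketch of the hash approach as a whole script. The helper names mark_present and read_lines are my own, and the file names in the usage comment are assumptions. It chomps the lines so that a last line without a trailing newline still compares equal.

```perl
use strict;
use warnings;

# Hypothetical helper: for each line of the larger list, return 1 if the
# same line occurs anywhere in the smaller list, else 0.
sub mark_present {
    my ($larger, $smaller) = @_;
    my %seen = map { $_ => 1 } @$smaller;       # one pass to build the lookup
    return map { $seen{$_} ? 1 : 0 } @$larger;  # one hash lookup per line
}

# Hypothetical helper: read a file into an array ref of chomped lines.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

# Usage, assuming file1 and file2 exist in the current directory:
# print "$_\n" for mark_present(read_lines('file1'), read_lines('file2'));
print join(' ', mark_present([qw(A B C D E F)], [qw(A D C)])), "\n";   # 1 0 1 1 0 0
```

Building the hash costs one pass over the smaller file, and each lookup afterwards takes roughly constant time, so the total work grows with the sum of the file sizes rather than their product.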

Consider the following approach:
use strict;
use warnings;
use autodie;
use feature 'say';
open my $fh1, '<', 'file1';
open my $fh2, '<', 'file2';
say <$fh1> eq <$fh2> ? '1' : '0'
until eof $fh1 or eof $fh2;
Notes:
use strict; use warnings; to maintain sanity
autodie to take care of failed file opens
Lexical filehandles are preferred to bareword filehandles
say for syntactic sugar to automatically append a newline at the end of every 1 or 0
The readline operator (<$fh1>, <$fh2>) to read each filehandle line by line
eq to string-compare the two lines
Ternary operator (COND ? TRUE : FALSE) to decide whether to print 1 or 0
until is a negated while
eof to tell the loop when either of the two filehandles has been exhausted

As was already said, don't use LABEL. And to be honest, you don't need Perl for this, because join and sed do the job (you may need to sort the files first):
join -a1 -a2 -e "0" -o 2.1 file1.txt file2.txt | sed "s/[^0]/1/g"
If you need to sort your files first, have a look at this post: comparing to unsorted files.

To be honest LABEL is not your friend - don't do that. For me it sounds more like a job for the join. But if you want to solve it using Perl I would try the following:
If the input files are sorted (otherwise you can use sort to achieve that) compare them line by line and print the result:
while ($line_from_f1 = <F1>)
{
$line_from_f2=<F2>;
if ($line_from_f1 eq $line_from_f2)
{
print "1\n";
}
else
{
print "0\n";
}
}
Shorter version (untested):
while (<F1>)
{
print( ($_ eq <F2>) ? "1\n" : "0\n" );
}
Note: These versions compare the files line by line - if a line is missing in the middle it does not work properly.

Related

perl program to validate user input by comparing existing array

I'm trying to write a Perl script to validate user input.
If the user provides a wrong value, the script should show which of the provided values are wrong.
Example:
I have an array @arr=qw/cat rat mat sat/;
If the user runs perl validate_user_input.pl cat sot bat then my script should report that sot and bat are wrong inputs.
Below is the script I'm trying.
#!/usr/bin/perl
use strict;
use warnings;
my (@not,@arr,$flag);
@arr=qw/cat rat mat sat/;
my $count=@arr;
foreach (@ARGV)
{
my $i=1;
foreach my $existing (@arr)
{
until ( "$existing" eq "$_" )
{
$flag = 1;
$i++;
last;
}
}
print "$i\n";
if ( ($count==$i) && ($flag == 1))
{
push(@not,"$_");
}
}
print "@not\n";
Logic I tried to implement:
Each element in @ARGV is checked against each element of @arr. If $flag=1 and the iteration count $i equals the number of elements in @arr (i.e. it was checked against all the elements in @arr), then the input was not found in @arr.
Can anyone tell me what needs to be done for this script to work?
Too complicated. Turn your 'valid' args into a hash, then use grep:
#!/usr/bin/env perl
use strict;
use warnings;
my @allowed = qw/cat rat mat sat/;
my %is_allowed = map { $_ => 1 } @allowed;
foreach my $arg ( grep { not $is_allowed{$_} } @ARGV ) {
print "$arg is not valid\n";
}
You can set a flag in that foreach loop if you want to act on the result afterwards. Or just use grep in a scalar context, where it returns the number of matches:
if ((grep { not $is_allowed{$_} } @ARGV) > 0 ) {
die "invalid args found, exiting\n"
}
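To make the list versus scalar context point concrete, here is a small sketch. The helper name invalid_args is my own invention. Because the sub returns the result of grep directly, the caller gets the offending values in list context and their count in scalar context.

```perl
use strict;
use warnings;

# Hypothetical helper: return the arguments that are not in the allowed list.
# In scalar context the grep (and therefore the sub) yields the count instead.
sub invalid_args {
    my ($allowed, @args) = @_;
    my %is_allowed = map { $_ => 1 } @$allowed;
    return grep { !$is_allowed{$_} } @args;
}

my @bad   = invalid_args([qw(cat rat mat sat)], qw(cat sot bat));  # list context
my $count = invalid_args([qw(cat rat mat sat)], qw(cat sot bat));  # scalar context

print "@bad\n";     # sot bat
print "$count\n";   # 2
```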
Well, a short search of this site would lead me to believe you are over-complicating things:
#!/usr/bin/env perl
use strict;
use warnings;
my (@not,@arr);
@arr=qw/cat rat mat sat/;
foreach (@ARGV)
{
push(@not,"$_") unless ( "$_" ~~ @arr );
}
print "@not\n";
The smart match operator (~~) is still reported as 'experimental' by my Perl 5.24, although it has been around since 5.10.
You may also want to add some kind of test at the end to check whether @not has anything in it, as all words may have been found. You could re-use your flag :)
Put the valid arguments into a hash %valid (instead of an array @arr). Then you can do:
for (@ARGV) {
push @not, $_ if !exists $valid{$_};
}
TIMTOWTDI
use strict;
use warnings;
my @arr=qw(cat rat mat sat);
my $ok = { map{ $_ => 0 } @arr};
my $wrong;
exists $ok->{$_} ? $ok->{$_}++ : $wrong->{$_}++ for @ARGV;
print "OK $_ : $ok->{$_}\n" for @arr;
print "NO $_ : $wrong->{$_}\n" for (sort keys %$wrong);
using
perl arg.pl bat cat ded rat mat foo bar foo cat
output
OK cat : 2
OK rat : 1
OK mat : 1
OK sat : 0
NO bar : 1
NO bat : 1
NO ded : 1
NO foo : 2

Extract and filter a range of lines from the input using Perl

I'm quite new to Perl and I have some problems in skipping lines using a foreach loop. I want to copy some lines of a text file to a new one.
When the first words of a line are FIRST ITERATION, skip two more lines and print everything following until the end of the file or an empty line is encountered.
I've tried to find out a similar post but nobody talks about working with text files.
This is the form I thought of
use 5.010;
use strict;
use warnings;
open( INPUT, "xxx.txt" ) or die("Could not open log file.");
open( OUT, ">>yyy.txt" );
foreach my $line (<INPUT>) {
if ( $line =~ m/^FIRST ITERATION/ ) {
# print OUT
}
}
close(OUT);
close(INFO);
I tried using next and $line++ but my program prints only the line that begins with FIRST ITERATION.
I may try to use a for loop but I don't know how many lines my file may have, nor do I know how many lines there are between "First Iteration" and the next empty line.
The simplest way is to process the file a line at a time and keep a state flag. It is set to 1 if the current line begins with FIRST ITERATION and to 0 if the line is blank; otherwise, if it is already positive, it is incremented, so that it holds the line number within the current block.
This solution expects the path to the input file as a parameter on the command line and prints its output to STDOUT, so you will need to redirect the output to a file on the command line as necessary.
Note that the regex pattern /\S/ checks whether there is a non-blank character anywhere in the current line, so not /\S/ is true if the line is empty or consists only of blank characters.
use strict;
use warnings;
my $lines = 0;
while ( <> ) {
if ( /^FIRST ITERATION/ ) {
$lines = 1;
}
elsif ( not /\S/ ) {
$lines = 0;
}
elsif ( $lines > 0 ) {
++$lines;
}
print if $lines > 3;
}
This can be simplified substantially by using Perl's built-in range operator, which keeps its own internal state and returns the sequence number of the current line within the range. So the above may be written as:
use strict;
use warnings;
while ( <> ) {
my $s = /^FIRST ITERATION/ ... not /\S/;
print if $s and $s > 3;
}
And the last can be rewritten as a one-line command line program like this
$ perl -ne '$s = /^FIRST ITERATION/ ... not /\S/; print if $s and $s > 3' myfile.txt
Use an additional counter that tells you when to print a line. Something like this:
$skipCounter = 3;
And in the foreach loop:
if ($skipCounter == 2) {
# print OUT
}
if ( $line =~ m/^FIRST ITERATION/) {
$skipCounter = 0;
}
$skipCounter++;
Advice: Use STDIN and STDOUT instead of files; this allows you to change them without modifying the script.
Code:
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
open(INPUT, "xxx.txt" ) or die "Could not open log file: $!.";
open(OUT, ">yyy.txt") or die "Could not open output file: $!";
while( my $line = <INPUT> )
{
if ( $line =~ m/^FIRST ITERATION/) {
<INPUT>; # skip line
<INPUT>; # skip line
while( $line = <INPUT>) # print till empty line
{
last if $line eq "\n";
print OUT $line;
}
};
};
close (OUT);
close (INPUT);
You're on the right track. What you need is the flip-flop operator (which is basically the range operator), written as two dots (..). It toggles between two matches for you, so you get everything in between. After that, it's a matter of keeping track of the lines you want to skip.
So basically we are checking for FIRST ITERATION and for an empty line, and grab everything in between those. $skip is used to remember how many lines were skipped. It starts at 0 and gets incremented for the first two lines after we start being in the flip-flop if block. In the else case, where we are after the flip-flop, it gets reset to 0 so we can start over with the next block.
Since you know how to open and write files, I'll skip that.
use strict;
use warnings;
my $skip = 0;
while (<DATA>) {
if (/^FIRST ITERATION$/ .. /^$/) {
next if $skip++ <= 2;
print $_;
} else {
$skip = 0;
}
}
__DATA__
FIRST ITERATION
skip1
skip2
foo
bar
baz

don't print this
The output of this is:
foo
bar
baz
To stick with your own code, here's a very verbose solution that uses a foreach and no flip-flop. It does the same thing, just with a lot more words.
my $skip = 0; # skip lines
my $match = 0; # keep track of if we're in between the borders
foreach my $line (<DATA>) {
if ( $line =~ m/^FIRST ITERATION/ ) {
$match = 1; # we are inside the match
$skip = 0; # start counting skipped lines afresh for this block
next;
}
if ($line =~ m/^$/) {
$match = 0; # we are done matching
next;
}
if ($match) {
$skip++; # count skip-lines
if ($skip <= 2) {
next; # ... and skip the first two
}
print $line; # this is the content we want
}
}
Using paragraph mode (which returns blocks separated by blank lines rather than lines):
local $/ = ""; # Paragraph mode.
while (<>) {
s/\n\K\n+//; # Get rid of trailing empty lines.
print /^FIRST ITERATION\n[^\n]*\n[^\n]*\n(.*)/ms;
}
Using the flip-flop operator:
while (<>) {
if (my $line_num = /^FIRST ITERATION$/ .. /^$/) {
print if $line_num > 3 && $line_num !~ /E0/;
}
}
$line_num =~ /E0/ is true when the flip-flop is flopping (i.e. for the first empty line after FIRST ITERATION), because the final sequence number has E0 appended. The $line_num !~ /E0/ check therefore avoids printing the blank line.
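Here is the flip-flop idea as a small sketch that works on an in-memory list instead of a file. The helper name extract_blocks and the sample lines are made up for illustration. One caveat: the flip-flop keeps its state between calls, so the sketch assumes every block is closed by a blank line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that follow the two skipped lines
# in every FIRST ITERATION block, stopping at the blank line.
sub extract_blocks {
    my @out;
    for (@_) {
        # In scalar context .. is the flip-flop: it returns 1, 2, 3, ...
        # while inside the range and appends "E0" on the line that ends it.
        my $n = /^FIRST ITERATION/ .. /^\s*$/;
        next unless $n;            # outside a block
        next if $n =~ /E0$/;       # the terminating blank line
        push @out, $_ if $n > 3;   # skip the header and two more lines
    }
    return @out;
}

my @lines = (
    'junk', 'FIRST ITERATION', 'skip1', 'skip2', 'foo', 'bar', '',
    'ignored', 'FIRST ITERATION', 'skip1', 'skip2', 'baz', '',
);
print "$_\n" for extract_blocks(@lines);   # foo, bar, baz (one per line)
```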

How to identify nth lines of n files in while<>

I have a code which adds all vectors in all files.
There can be any number of input files. For example first input file is:
0.55 0 0.3335 1.2
0.212 0 2.2025 1
and the second one is:
0.25 0 0.3333 1.0
0.1235 0 0.2454 1
What I get is the sum of all vectors, so the result is a single vector:
1.13550 0 3.1147 4.2
But I'm trying to sum the first vector of the first file with the first vector of the second file, and so on. For this example I should get 2 vectors.
For now I have this:
use strict;
use warnings;
if ($ARGV[0] ne "vector1.dat"){
die ("vector1.dat is necessary as first argument");
}
my @sum = 0;
my $dim = 0;
while (<>) {
#Ignore blank lines, hashtags
#and lines starting with $
if ($_ =~ /#/ || $_ =~ /^$/ || $_ =~ /^\s$/){
next;
}
my @vectors = split(" ", $_);
my $vector_length = @vectors;
if ($dim eq 0) {
$dim = $vector_length;
}
else {
if ($dim ne $vector_length) {
die ("Vector dimensions do not match. : $!");
}
}
for (my $i = 0; $i <= $#vectors; $i++) {
$sum[$i] += $vectors[$i];
}
}
$" = "\t\t";
print "\n --- \n @sum \n";
What I need is just to find out how to identify each file's nth line
and to sum the column values of those lines while keeping in mind, that there can be n number of files.
I saw filehandling question over here with similar issue, however
I didn't find my answer there.
Just looking for some suggestions and guidance. Got stuck on this.
Open each file yourself and use the $. variable to know which line you are on (or count the files yourself). Here's the basic structure:
foreach my $file ( @files ) {
open my $fh, '<', $file or die ...;
while( <$fh> ) {
chomp;
$sum[ $. ] = ...; # $. is the line number
}
}
If you don't like $., you can use its longer name. You have to turn on English (which comes with Perl):
use English;
## use English qw( -no_match_vars ); # for v5.16 and earlier
foreach my $file ( @files ) {
open my $fh, '<', $file or die ...;
while( <$fh> ) {
chomp;
$sum[ $INPUT_LINE_NUMBER ] = ...;
}
}
Or, you can count yourself, which might be handy if the vectors in the files don't line up by strict line number (perhaps because of comments or some other formatting oddity):
foreach my $file ( @files ) {
open my $fh, '<', $file or die ...;
my $line = -1;
while( <$fh> ) {
$line++;
chomp;
$sum[ $line ] = ...;
}
}
The harder way is the answer bart gives which inspects eof at the end of every line to see if the magical ARGV handle is looking at a new file, and resetting $. if it is. It's an interesting trick but hardly anyone is going to understand what it's doing (or even notice it).
For the other part of the problem, I think you're doing the vector sum wrong, or using confusing variable names. A line is a vector, and the numbers in the line are its components. A two-dimensional array will work. The first index is the line number and the second is the component index:
while( <$fh> ) {
chomp;
... skip unwanted lines
my @components = split;
... various dimension checks
foreach my $i ( 0 .. $#components ) {
$sum[ $. ][ $i ] += $components[ $i ];
}
}
The Data::Dumper module is handy for complex data structures. You can also see the perldsc (Perl Data Structures Cookbook) documentation. The $. variable is found in perlvar.
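Putting the pieces together, a minimal sketch could look like the following. The helper name sum_by_line is hypothetical, and in-memory filehandles stand in for real files so the example is self-contained. It counts data lines itself rather than relying on $., so comments and blank lines do not shift the row index.

```perl
use strict;
use warnings;

# Hypothetical helper: takes one string per "file" and returns a list of
# summed vectors (array refs), one per data-line position.
sub sum_by_line {
    my @texts = @_;
    my @sums;
    for my $text (@texts) {
        # An in-memory filehandle stands in for open my $fh, '<', $file.
        open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
        my $row = -1;                      # counts data lines only
        while (my $line = <$fh>) {
            $line =~ s/#.*//;              # drop comments
            next unless $line =~ /\S/;     # skip blank lines
            $row++;
            my @components = split ' ', $line;
            $sums[$row][$_] += $components[$_] for 0 .. $#components;
        }
        close $fh;
    }
    return @sums;
}

my @sums = sum_by_line(
    "0.55 0 0.3335 1.2\n0.212 0 2.2025 1\n",
    "0.25 0 0.3333 1.0\n0.1235 0 0.2454 1\n",
);
print join("\t", @$_), "\n" for @sums;     # one summed vector per line
```

The dimension checks from the question are left out to keep the sketch short; they would go right after the split.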
$. is the line number of the most recently read file handle. close(ARGV) if eof; can be used to reset the line number between files (as documented in eof). (Note: eof() is different from eof.) So you now have line numbers.
The second problem you have is that you are adding vector components ($vectors[$i]) to vectors ($sum[$i]). You need to add vector components to vector components. Start by using more appropriate variable names.
This is what we get:
my @sum_vectors;
while (<>) {
s/#.*//; # Remove comments.
next if /^\s*$/; # Ignore blank lines.
my @vector = split;
if ($sum_vectors[$.] && @{ $sum_vectors[$.] } != @vector) {
die("$ARGV:$.: Vector dimensions do not match\n");
}
for my $i (0..$#vector) {
$sum_vectors[$.][$i] += $vector[$i];
}
} continue {
close(ARGV) if eof; # Reset line numbers for each file.
}
Two other errors fixed:
$! did not contain anything meaningful when you used it.
You ignored lines that contain comments, even if they contained valid data too.
Try this:
#!/usr/bin/perl
use strict;
use warnings;
if ($ARGV[0] ne "vector1.dat"){
die ("vector1.dat is necessary as first argument");
}
my %sum;
my $dim = 0;
my $vector_length;
my $line_number;
while (<>) {
#Ignore blank lines, hashtags
#and lines starting with $
if ($_ =~ /#/ || $_ =~ /^$/ || $_ =~ /^\s$/){
next;
}
my @vectors = split(" ", $_);
$vector_length = @vectors;
if ($dim eq 0) {
$dim = $vector_length;
}
else {
if ($dim ne $vector_length) {
die ("Vector dimensions do not match. : $!");
}
}
for (my $i = 0; $i <= $#vectors; $i++) {
$sum{$.}{$i} += $vectors[$i];
}
$line_number = $.;
$. = 0 if eof;
}
$" = "\t\t";
for (my $line=1; $line<=$line_number; $line++)
{
print $line;
for (my $vector=0; $vector<$vector_length; $vector++)
{
print " " . $sum{$line}{$vector};
}
print "\n";
}

perl equal strings returns 0 even if they are equal

Perl is continuing to surprise me. I have a code which takes an input from the command line and checks if it is in a file. I have a file like this:
ls
date
pwd
touch
rm
First i read this file as
open(MYDATA,"filename") or die "Can not open file\n";
@commandlist = <MYDATA>;
chomp @commandlist;
close MYDATA;
the argument is in $commandname variable. To check if it is correct i printed to screen.
print $commandname."\n";
it works well. then i write the code.
$count = @commandlist;
for($i=0;$i < $count;$i++)
{
print $commandname;
print $commandlist[$i];
print "\n";
if($commandname eq $commandlist[$i])
{
print "equal\n";
}
}
and it does not print 'equal', but it should, because the $commandname variable has the value 'ls', which is in the file. I also print the values of $commandname and $commandlist[$i] to see if they are "visibly" equal, and I get the output:
ls
lsls
lsdate
lspwd
lstouch
lsrm
Here I see that they have the same value, so why does the eq operator never evaluate to true?
Additionally, to get this task done I have tried various other methods, such as making a hash from the array and using exists, all of which turned out to be useless.
I have been struggling with this seemingly easy problem for a day, but I just don't get it.
Thanks in advance
EDIT:
when i change the above loop as below
$count = @commandlist;
for($i=0;$i < $count;$i++)
{
print $commandlist[$i];
print $commandname;
print "\n";
if($commandname eq $commandlist[$i])
{
print "equal\n";
}
}
I got an output like.
ls
ls
lste
lsd
lsuch
ls
it seems like for some reason it overwrites some characters.
EDIT:
my whole script is like:
#reading file code, i posted above
while(<>)
chomp($_);
$commandname = $_;
if($commandname eq "start"){
##something here
} elsif ($commandname eq "machines"){
##something here
} else {
$count = @commandlist;
for($i=0;$i < $count;$i++)
{
print $commandlist[$i];
print $commandname;
print "\n";
if($commandname eq $commandlist[$i])
{
print "equal\n";
}
}
}
A small change in the code gives you what you are looking for: chomp the string from the array before you compare it. Here it is:
chomp $commandlist[$i];
if($commandname eq $commandlist[$i])
{
print "equal\n";
}
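EDIT: perldoc chomp notes that parentheses are needed when you chomp anything that is not a simple variable. Both forms behave the same for a plain array, but the parentheses do no harm. So, in your case, instead of simply saying
chomp @commandlist
write it as
chomp(@commandlist)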
FINAL EDIT: I tried this and worked fine. Give it a try
$commandname = $ARGV[0];
open(MYDATA,"chk.txt") or die "Can not open file\n";
@commandlist = <MYDATA>;
chomp(@commandlist);
close MYDATA;
print $commandname."\n";
$count = @commandlist;
print $commandname;
for($i=0;$i < $count;$i++)
{
print $commandlist[$i];
print "\n";
if($commandname eq $commandlist[$i])
{
print "equal\n";
}
}
The overwriting indicates the presence of a CR. The lines end with CR LF, but chomp removes only the LF. Change
while (<>) {
chomp($_)
to
while (<>) {
s/\s+\z//;
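The same substitution can be applied to the list read from the file. Here is a minimal sketch, assuming the file has CRLF line endings as the overwritten output suggests. The helper name strip_line_endings and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove all trailing whitespace (CR, LF, spaces, tabs)
# from every line and return the cleaned copies.
sub strip_line_endings {
    my @lines = @_;
    s/\s+\z// for @lines;
    return @lines;
}

# Lines as they would be read from a file with CRLF line endings.
my @raw = ("ls\r\n", "date\r\n", "pwd\r\n");

chomp(my @chomped = @raw);                 # removes only the "\n"
my @clean = strip_line_endings(@raw);      # removes "\r\n"

print "chomp only: ", ($chomped[0] eq 'ls' ? "equal" : "not equal"), "\n";   # not equal
print "stripped:   ", ($clean[0]   eq 'ls' ? "equal" : "not equal"), "\n";   # equal
```

After chomp the first entry is still "ls\r", which prints like ls on a terminal but does not compare equal to 'ls'.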
You might consider restructuring your code as:
my $path='filename';
my $match='ls';
part 1 - read the file
open(my $fh, '<', $path) or die "failed to open $path: $!";
my @commandlist=<$fh>;
chomp @commandlist;
# or you can combine these lines as:
# chomp(my @commandlist=<$fh>);
# because chomp operates on the array itself rather than making a copy.
close($fh);
or
use File::Slurp qw/ read_file /;
# see http://search.cpan.org/dist/File-Slurp/lib/File/Slurp.pm
my @commandlist=read_file($path, chomp => 1); # chomp => 1 removes the line endings
part 2 - check for a match
foreach my $command (@commandlist) {
print "$match equals $command\n" if $match eq $command;
}
One important consideration is that each line in your file must contain only the command name and cannot begin or end with any spaces or tabs. To compensate for possible leading or trailing whitespace, try:
foreach my $command (@commandlist) {
$command=~s/^\s+|\s+$//g; # strip leading or trailing whitespace
print "$match equals $command\n" if $match eq $command;
}
And finally, always start your Perl script with a Perl developer's bestest friends:
use strict;
use warnings;
which will catch most (if not all) errors caused by sloppy programming practice. (We all suffer from this!)

extract every nth number

I want to extract every 3rd number (42.034, 41.630, 40.158 and so on) from the file.
See the example:
42.034 13.749 28.463 41.630 12.627 28.412 40.158 12.173 30.831 26.823
12.596 32.191 26.366 13.332 32.938 25.289 12.810 32.419 23.949 13.329
Any suggestions for doing this with a Perl script?
Thanks,
dac
You can split file's contents to separate numbers and use the modulo operator to extract every 3rd number:
my $contents = do { local $/; open my $fh, "file" or die $!; <$fh> };
my @numbers = split /\s+/, $contents;
for (0..$#numbers) {
$_ % 3 == 0 and print "$numbers[$_]\n";
}
use strict;
use warnings;
use 5.010; ## for say
use List::MoreUtils qw/natatime/;
my @vals = qw/42.034 13.749 28.463 41.630 12.627 28.412 40.158 12.173 30.831
26.823 12.596 32.191 26.366 13.332 32.938 25.289 12.810 32.419 23.949 13.329/;
my $it = natatime 3, @vals;
say while (($_) = $it->());
This is probably the shortest way to specify that. If @list is your list of numbers:
@list[ grep { $_ % 3 == 0 } 0..$#list ]
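As a runnable sketch of that slice, wrapped in a helper (the name every_nth is my own) and fed with the first ten numbers from the question:

```perl
use strict;
use warnings;

# Hypothetical helper: return every $n-th element of the list,
# starting with the first one (indices 0, $n, 2*$n, ...).
sub every_nth {
    my ($n, @list) = @_;
    return @list[ grep { $_ % $n == 0 } 0 .. $#list ];
}

my @numbers = split ' ',
    "42.034 13.749 28.463 41.630 12.627 28.412 40.158 12.173 30.831 26.823";
print "$_\n" for every_nth(3, @numbers);
# prints 42.034, 41.630, 40.158 and 26.823, one per line
```

The grep selects the indices and the array slice picks the elements, so the numbers stay untouched strings and keep their trailing zeros.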
It's a one-liner!
$ perl -lane 'print for grep {++$i % 3 == 1} @F' /path/to/your/input
-n gives you line-by-line processing, -a autosplitting for field processing, and $i (effectively initialized to zero for our purposes) keeps count of the number of fields processed...
This method avoids reading the entire file into memory at once:
use strict;
my @queue;
while (<>) {
push @queue, / ( \d+ (?: \. \d* ) ? ) /gx;
while (@queue >= 3) {
my $third = (splice @queue, 0, 3)[2];
print $third, "\n"; # Or do whatever with it.
}
}
If the file has 10 numbers in every line you can use this:
perl -pe 's/([\d.]+) [\d.]+ [\d.]+/$1/g;' file
It's not a clean solution but it should "do the job".
Looks like this post lacked a solution that didn't read the whole file and used grep.
#!/usr/bin/perl -w
use strict;
my $re = qr/-?\d+(?:\.\d*)/; # Insert a more precise regexp here
my $n = 3;
my $count = 0;
while (<>) {
my @res = grep { not $count++ % $n } m/($re)/go;
print "@res\n";
};
I believe you’ll find that this works per spec, behaves politely, and never reads in more than it needs to.
#!/usr/bin/env perl
use 5.010_001;
use strict;
use autodie;
use warnings qw[ FATAL all ];
use open qw[ :std IO :utf8 ];
END { close STDOUT }
use Regexp::Common;
my $real_num_rx = $RE{num}{real};
my $left_edge_rx = qr{
(?: (?<= \A ) # or use \b
| (?<= \p{White_Space} ) # or use \D
)
}x;
my $right_edge_rx = qr{
(?= \z # or use \b
| \p{White_Space} # or use \D
)
}x;
my $a_number_rx = $left_edge_rx
. $real_num_rx
. $right_edge_rx
;
if (-t STDIN && @ARGV == 0) {
warn "$0: reading numbers from stdin,"
. " type ^D to end, ^C to kill\n";
}
$/ = " ";
my $count = 0;
while (<>) {
while (/($a_number_rx)/g) {
say $1 if $count++ % 3 == 0;
}
}