How to find lines containing a match between two files in perl?

I'm a novice at using perl. What I want to do is compare two files. One is my index file that I am calling "temp." I am attempting to use this to search through a main file that I am calling "array." The index file has only numbers in it. There are lines in my array that have those numbers. I've been trying to find the intersection between those two files, but my code is not working. Here's what I've been trying to do.
#!/usr/bin/perl
print "Enter the input file:";
my $filename=<STDIN>;
open (FILE, "$filename") || die "Cannot open file: $!";
my @array=<FILE>;
close(FILE);
print "Enter the index file:";
my $temp=<STDIN>;
open (TEMP, "$temp") || die "Cannot open file: $!";
my @temp=<TEMP>;
close(TEMP);
my %seen= ();
foreach (@array) {
$seen{$_}=1;
}
my @intersection=grep($seen{$_}, @temp);
foreach (@intersection) {
print "$_\n";
}
If I can't use intersection, then what else can I do to move each line that has a match between the two files?
For those of you asking for the main file and the index file:
Main file:
1 CP TRT
...
14 C1 MPE
15 C2 MPE
...
20 CA1 MPE
Index file
20
24
22
17
18
...
I want to put those lines that contain one of the numbers in my index file into a new array. So using this example, only
20 CA1 MPE would be placed into a new array.
My main file and index file are both longer than what I've shown, but that hopefully gives you an idea on what I'm trying to do.

I am assuming something like this?
use strict;
use warnings;
use Data::Dumper;
# creating arrays instead of reading from file just for demo
# based on the assumption that your files are 1 number per line
# and no need for any particular parsing
my @array = qw/1 2 3 20 60 50 4 5 6 7/;
my @index = qw/10 12 5 3 2/;
my @intersection = ();
my %hash1 = map{$_ => 1} @array;
foreach (@index)
{
if (defined $hash1{$_})
{
push @intersection, $_;
}
}
print Dumper(\@intersection);
==== Out ====
$VAR1 = [
'5',
'3',
'2'
];

A few things:
Always have use strict; and use warnings; in your program. This will catch a lot of possible errors.
Always chomp after reading input. Lines read from a file or from STDIN keep their trailing \n; chomp removes it.
Learn a more modern form of Perl.
Use mnemonic variable names. $temp doesn't cut it.
Use spaces to help make your code more readable.
You never stated the errors you were getting. I assume it has to do with the fact that the input from your main file doesn't match your index file.
I use a hash to create an index that the index file can use via my ($index) = split /\s+/, $line;:
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;
use feature qw(say);
print "Input file name: ";
my $input_file = <STDIN>;
chomp $input_file; # Chomp Input!
print "Index file name: ";
my $index_file = <STDIN>;
chomp $index_file; # Chomp Input!
open my $input_fh, "<", $input_file;
my %hash;
while ( my $line = <$input_fh> ) {
chomp $line;
#
# Using split to find the item to index on
#
my ($index) = split /\s+/, $line;
$hash{$index} = $line;
}
close $input_fh;
open my $index_fh, "<", $index_file;
while ( my $index = <$index_fh> ) {
chomp $index;
#
# Now index can look up lines
#
if( exists $hash{$index} ) {
say qq(Index: $index Line: "$hash{$index}");
}
else {
say qq(Index "$index" doesn't exist in file.);
}
}
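If the goal is to collect the matching lines in a new array rather than print them, the same lookup can be sketched as below. This is only an illustration: the sub name matching_lines and the in-memory sample data are assumptions made for the demo, not part of the answer above, and it assumes the number is the first whitespace-separated field of each main-file line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the main-file lines whose first field
# appears in the index list. Both arguments are array references.
sub matching_lines {
    my ($main_lines, $index_values) = @_;

    # Build a lookup of the wanted index numbers.
    my %wanted = map { $_ => 1 } @$index_values;

    my @matches;
    for my $line (@$main_lines) {
        # split ' ' skips leading whitespace, unlike split /\s+/
        my ($key) = split ' ', $line;
        push @matches, $line if defined $key && $wanted{$key};
    }
    return @matches;
}

# Sample data modeled on the question (already chomped).
my @main  = ('1 CP TRT', '14 C1 MPE', '15 C2 MPE', '20 CA1 MPE');
my @index = (20, 24, 22, 17, 18);

print "$_\n" for matching_lines(\@main, \@index);
```

Because the lookup is a hash on the whole first field, index 2 does not accidentally match a line that starts with 20.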

#!/usr/bin/perl
use strict;
use warnings;
use autodie;
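@ARGV = 'main_file';
open(my $fh_idx, '<', 'index_file');
chomp(my @idx = <$fh_idx>);
close($fh_idx);
while (defined(my $r = <>)) {
# \b stops index 2 from also matching lines that start with 20
print $r if grep { $r =~ /^[ \t]*\Q$_\E\b/ } @idx;
}
You may wish to replace those hardcoded file names with names read from <STDIN>.
FYI: The defined call inside the while condition is optional here; Perl adds that test implicitly when the condition is a plain assignment from <>.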

Related

Perl: Read columns and convert to array

I am new to perl, trying to read a file with columns and creating an array.
I am having a file with following columns.
file.txt
A 15
A 20
A 33
B 20
B 45
C 32
C 78
I wanted to create an array for each unique item present in A with its values assigned from second column.
eg:
@A = (15,20,33)
@B = (20,45)
@C = (32,78)
Tried following code, only for printing 2 columns
use strict;
use warnings;
my $filename = $ARGV[0];
open(FILE, $filename) or die "Could not open file '$filename' $!";
my %seen;
while (<FILE>)
{
chomp;
my $line = $_;
my @elements = split (" ", $line);
my $row_name = join "\t", @elements[0,1];
print $row_name . "\n" if ! $seen{$row_name}++;
}
close FILE;
Thanks

Firstly some general Perl advice. These days, we like to use lexical variables as filehandles and pass three arguments to open().
open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
And then...
while (<$fh>) { ... }
But, given that you have your filename in $ARGV[0], another tip is to use an empty file input operator (<>) which will return data from the files named in @ARGV without you having to open them. So you can remove your open() line completely and replace the while with:
while (<>) { ... }
Second piece of advice - don't store this data in individual arrays. Far better to store it in a more complex data structure. I'd suggest a hash where the key is the letter and the value is an array containing all of the numbers matching that letter. This is surprisingly easy to build:
use strict;
use warnings;
use feature 'say';
my %data; # I'd give this a better name if I knew what your data was
while (<>) {
chomp;
my ($letter, $number) = split; # splits $_ on whitespace by default
push @{ $data{$letter} }, $number;
}
# Walk the hash to see what we've got
for (sort keys %data) {
say "$_ : @{ $data{$_} }";
}

Change the loop to be something like:
while (my $line = <FILE>)
{
chomp($line);
my @elements = split (" ", $line);
push(@{$seen{$elements[0]}}, $elements[1]);
}
This will create/append a list of each item as it is found, and result in a hash where the keys are the left items, and the values are lists of the right items. You can then process or reassign the values as you wish.
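Once such a hash of arrays exists, the values for one key can be pulled out by dereferencing the array stored under it. The sketch below shows one way to do that; the sub name group_by_first_column and the inline sample lines are assumptions for the demo, standing in for the lines of file.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: groups second-column values by first-column key
# and returns a reference to the resulting hash of arrays.
sub group_by_first_column {
    my @lines = @_;
    my %groups;
    for my $line (@lines) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;    # skip blank or one-column lines
        push @{ $groups{$key} }, $value;
    }
    return \%groups;
}

my $groups = group_by_first_column(
    'A 15', 'A 20', 'A 33', 'B 20', 'B 45', 'C 32', 'C 78',
);

# The values for one key, copied into an ordinary array
my @a_values = @{ $groups->{A} };
print "A has ", scalar(@a_values), " values: @a_values\n";

# Every key in sorted order, printed in the layout the question asked for
print "$_ = (", join(',', @{ $groups->{$_} }), ")\n" for sort keys %$groups;
```

Keeping the data in one hash avoids having to invent a separate array variable for every letter that might turn up in the file.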

Best way to keep track of previous and following line in perl

What is the best/right way, in perl, of keeping the information from the previous and/or following line. For example, with this code:
while (<IN>) {
print;
}
how can it be changed to not print the line only if the previous or the next line in the file match foo, but printing otherwise?
Could you give code examples. Thanks.

Updated: Simplified exposition.
Basically, you need to keep track of two extra lines if you want to print the current lines based on information contained in two other lines. Here is a simple script with everything hard-coded:
#!/usr/bin/env perl
use strict;
use warnings;
my $prev = undef;
my $candidate = scalar <DATA>;
while (defined $candidate) {
my $next = <DATA>;
unless (
(defined($prev) && ($prev =~ /foo/)) ||
(defined($next) && ($next =~ /foo/))
) {
print $candidate;
}
($prev, $candidate) = ($candidate, $next);
}
__DATA__
1
2
foo
3
4
5
foo
6
foo
7
8
9
foo
We can generalize this to a function that takes a filehandle and a test (as a subroutine reference):
#!/usr/bin/env perl
use strict; use warnings;
print_mid_if(\*DATA, sub{ return !(
(defined($_[0]) && ($_[0] =~ /foo/)) ||
(defined($_[1]) && ($_[1] =~ /foo/))
)} );
sub print_mid_if {
my $fh = shift;
my $test = shift;
my $prev = undef;
my $candidate = scalar <$fh>;
while (defined $candidate) {
my $next = <$fh>;
print $candidate if $test->($prev, $next);
($prev, $candidate) = ($candidate, $next);
}
}
__DATA__
1
2
foo
3
4
5
foo
6
foo
7
8
9
foo

You could read your line into an array, and then if you get something that signals you in some way, pop out the last few elements of the array. Once you've finished reading everything in, you could print it:
use strict;
use warnings;
use feature qw(say);
use autodie; #Won't catch attempt to read from an empty file
use constant FILE_NAME => "some_name.txt";
open my $fh, "<", FILE_NAME; #autodie reports a failed open
my @output;
LINE:
while ( my $line = <$fh> ) {
chomp $line;
if ( $line eq "foo" ) {
pop @output; #The line before foo
<$fh>; #The line after foo
next LINE; #Skip line foo. Don't push it into the array
}
push @output, $line;
}
From there, you can print out the array with the values you don't want printed already taken care of.
for my $line ( @output ) {
say $line;
}
The only problem is that this takes memory. If your file is extremely large, you could run out of memory.
One way to get around this is to use a buffer. You store your values in an array, and shift out the oldest value when you push another into the array. If the value read in is foo, you can reset the array. In this case, the buffer will contain at most one line:
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw(say);
my @buffer;
LINE:
while ( my $line = <DATA> ) {
chomp $line;
if ( $line eq "foo" ) {
@buffer = (); #Empty buffer of previous line
<DATA>; #Get rid of the next line
next LINE; #Foo doesn't get pushed into the buffer
}
push @buffer, $line;
if ( @buffer > 1 ) { #Buffer is "full"
say shift @buffer; #Print out previous line
}
}
#
# Empty out buffer
#
for my $line ( @buffer ) {
say $line;
}
__DATA__
2
3
4
5
6
7
8
9
10
11
12
13
1
2
foo
3
4
5
foo
6
7
8
9
foo
Note that it is very possible that I might attempt to read from an empty file when I skip the next line. This is okay. The <$fh> will return either an empty string or undef, but I can ignore that. I'll catch the error when I go back to the top of my loop.

I didn't see that you had any specific criteria for "best", so I'll give you a solution that may be "best" along a different axis than those presented so far. You could use Tie::File and treat the entire file as an array, then iterate the array using an index. The previous and next lines are just $index-1 and $index+1 respectively. You just have to worry a little about your indices going beyond the bounds of your array. Here's an example:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010; # just for "say"
use Tie::File;
tie my @array, 'Tie::File', "filename" or die;
for my $i (0..$#array) {
if ($i > 0 && $i < $#array) { # ensure $i-1 and $i+1 make sense
next if $array[$i-1] =~ /BEFORE/ &&
$array[$i+1] =~ /AFTER/;
}
say $array[$i];
}
If it's more convenient, you can specify a filehandle instead of a filename and Tie::File also has some parameters to control memory usage or change what it means to be a "line" if you want that. Check the docs for more info.
Anyway, that's another way to do what you want that might be conceptually simpler if you are familiar with arrays and like to think in terms of arrays.

I would read the file into an array, with each line being an array element, then you can do the comparisons. The only real design consideration is the size of the file being read into memory.
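A minimal sketch of that array-based approach follows. It assumes the whole input fits in memory and that the rule is the one from the first answer: a line is dropped when its previous or next line matches the pattern, and a matching line itself is kept unless one of its own neighbors matches. The sub name lines_without_matching_neighbor and the sample data are made up for the demo.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines whose previous and next
# neighbors both fail to match the pattern.
sub lines_without_matching_neighbor {
    my ($pattern, @lines) = @_;
    my @kept;
    for my $i (0 .. $#lines) {
        # The bounds checks keep us from looking before the first
        # element or past the last one.
        my $prev_hit = $i > 0       && $lines[$i - 1] =~ $pattern;
        my $next_hit = $i < $#lines && $lines[$i + 1] =~ $pattern;
        push @kept, $lines[$i] unless $prev_hit || $next_hit;
    }
    return @kept;
}

my @lines = qw(1 2 foo 3 4 5 foo 6 foo 7 8 9 foo);
print "$_\n" for lines_without_matching_neighbor(qr/foo/, @lines);
```

With a real file, the array could be filled with something like chomp(my @lines = <$fh>) before the call.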

Printing array in Perl

I currently have a Perl script that reads fstab files, splits them up by column, and searches for the longest word in each column to display its length. All that works peachy (I think); the problem I'm having is that it keeps printing out the same length for every column, which is not right. For example, $dev_parts prints 24, and $label_parts prints 24, and so on...
Below is my code.
#!/usr/bin/perl
use strict;
print "Enter file name: \n";
my $file_name = <STDIN>;
open(IN, "$file_name");
my @parts = split( /\s+/, $file_name);
foreach my $usr_file (<IN>) {
chomp($usr_file);
@parts = split( /\s+/, $usr_file);
push(@dev, $parts[0]);
push(@label, $parts[1]);
push(@tmpfs, $parts[2]);
push(@devpts, $parts[3]);
push(@sysfs, $parts[4]);
push(@proc, $parts[5]);
}
foreach $dev_parts (@dev) {
$dev_length1 = length ($parts[$dev_parts]);
if ( $dev_length1 > $dev_length2) {
$dev_length2 = $dev_length1;
}
}
print "The longest word in the first line is: $dev_length2 \n";
foreach $label_parts (@label) {
$label_length1 = length($parts[$label_parts]);
if ($label_length1 > $label_length2) {
$label_length2 = $label_length1;
}
}
print "The longest word in the first line is: $label_length2 \n";

This is how your code should be
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
print "Enter file name: \n";
my $file_name = <STDIN>;
chomp($file_name);
open(FILE, "$file_name") or die $!;
my %colhash;
while (<FILE>) {
my $col=0;
my @parts = split /\s+/;
map { my $len = length($_);
$col++;
if(($colhash{$col} // 0) < $len ){ # // 0 avoids an undef warning on the first row
$colhash{$col} = $len; # store the longest word length for each column
}
} @parts;
}
print Dumper(\%colhash);

You have a mistake here:
foreach $dev_parts (@dev) {
$dev_length1 = length ($parts[$dev_parts]);
As I understand it, you are looking for the longest element in @dev. However, you take the length of an element from the @parts array. This array is always set to whatever the last line of the file is. So you are looking at each element in the last line of the file, rather than each element of the appropriate column.
You just need to take length($dev_parts) instead.
Incidentally, here is a simpler way to find the longest length in an array:
use List::Util qw/max/; #Core module, always available.
my $longest_dev = max map {length} @dev;
A few other comments on your code:
use strict; is good. You should also use warnings;. It will help
you catch silly mistakes in your code.
You ought to check for errors whenever you open a file:
open(IN, $file_name) or die "Failed to open $file_name: $!";
Better yet, use the preferred open syntax with a lexical filehandle:
open(my $in_file, '<', $file_name) or die "Failed to open $file_name: $!";
...
while (<$in_file>) {
I'm not sure what you are trying to do here:
my @parts = split( /\s+/, $file_name);
You are splitting the file name by white space, but you don't use that for anything. And then you re-use the same array to hold the lines later.
A while loop is preferred to foreach when you go through lines of a file. It saves memory because it doesn't read the whole file into memory first (and it is otherwise exactly the same).
while (my $usr_file = <IN>) {
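Putting these suggestions together, the longest field length for every column can be found in one pass instead of keeping a separate array per column. The following is a sketch under assumptions: the sub name longest_per_column and the sample rows are invented for the demo, and the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: returns a list holding the longest field length
# seen in each whitespace-separated column.
sub longest_per_column {
    my @lines = @_;
    my @longest;
    for my $line (@lines) {
        my @fields = split ' ', $line;
        for my $col (0 .. $#fields) {
            # // 0 treats a column not seen before as length 0
            $longest[$col] = max($longest[$col] // 0, length $fields[$col]);
        }
    }
    return @longest;
}

# Made-up sample rows in an fstab-like layout
my @rows = (
    '/dev/sda1 / ext4 defaults 1 1',
    'tmpfs /dev/shm tmpfs defaults 0 0',
);
my @widths = longest_per_column(@rows);
print "Column ", $_ + 1, ": $widths[$_]\n" for 0 .. $#widths;
```

When reading from a file, the same loop body can sit inside while (my $line = <$in_file>) so the file is never held in memory as a whole.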

Displaying duplicate records

I have the code below to parse a text file. It should display all words after the "Enter:" keyword on every line of the text file. All the words after "Enter:" are displayed, but duplicates should not be repeated, and they are. Please guide me as to what is wrong in my code.
#! /usr/bin/perl
use strict;
use warnings;
$infile "xyz.txt";
open (FILE, $infile) or die ("can't open file:$!");
if(FILE =~ /ENTER/ ){
@functions = substr($infile, index($infile, 'Enter:'));
@functions =~/@functions//;
%seen=();
@unique = grep { ! $seen{$_} ++ } @array;
while (@unique != ''){
print '@unique\n';
}
}
close (FILE);

Here is a way to do the job, it prints unique words found on each line that begins with the keyword Enter:
#!/usr/bin/perl
use strict;
use warnings;
my $infile = "xyz.txt";
# use 3 arg open with lexical file handler
open my $fh, '<', $infile or die "unable to open '$infile' for reading: $!";
# loop thru all lines
while(my $line = <$fh>) {
# remove linefeed;
chomp($line);
# if the line begins with "Enter:"
# remove the keyword "Enter:"
if ($line =~ s/^Enter:\s+//) {
# split the line on whitespaces
# and populate the array with all words found
my @words = split(/\s+/, $line);
# create a hash where the keys are the words found
my %seen = map { $_ => 1 } @words;
# display unique words
print "$_\t" for(keys %seen);
print "\n";
}
}

If I understand you correctly, one problem is that your 'grep' only counts the occurrences of each word. I think you want to use 'map' so that '@unique' only contains the unique words from '@array'. Something like this:
@unique = map {
if (exists($seen{$_})) {
();
} else {
$seen{$_}++; $_;
}
} @array;
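For comparison, the grep form with a %seen hash is another common way to write this filter, and it also returns each item once, in first-seen order, provided the input list actually holds the words. A small sketch, with the sub name unique_in_order and the sample words chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the distinct items in first-seen order.
sub unique_in_order {
    my %seen;
    # $seen{$_}++ is false (0) only the first time an item is met,
    # so the negation lets each item through exactly once.
    return grep { !$seen{$_}++ } @_;
}

my @words  = qw(open read open close read open);
my @unique = unique_in_order(@words);
print "@unique\n";
```

Unlike taking keys %seen, this keeps the words in the order they appeared on the line. Recent versions of List::Util also export a uniq function that does the same job.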

How can I grep and sort text files using Perl?

I have a simple log file which is very messy and I need it to be neat. The file contains log headers, but they are all jumbled up together. Therefore I need to sort the log files according to the log headers. There are no static number of lines - that means that there is no fixed number of lines for the each header of the text file. And I am using perl grep to sort out the headers.
The Log files goes something like this:
Car LogFile Header
<text>
<text>
<text>
Car LogFile Header
<text>
Car LogFile Header
<and so forth>
I have done up/searched a simple algorithm but it does not seem to be working. Can someone please guide me? Thanks!
#!/usr/bin/perl
#use 5.010; # must be present to import the new 5.10 functions, notice
#that it is 5.010 not 5.10
my $srce = "./root/Desktop/logs/Default.log";
my $string1 = "Car LogFile Header";
open(FH, $srce);
my @buf = <FH>;
close(FH);
my @lines = grep (/$string1/, @buffer);
After executing the code, there is no result shown at the terminal. Any ideas?

I think you want something like:
my $srce = "./root/Desktop/logs/Default.log";
my $string1 = "Car LogFile Header";
open my $fh, '<', $srce or die "Could not open $srce: $!";
my @lines = sort grep /\Q$string1/, <$fh>;
print @lines;
Make sure you have the right file path and that the file has lines that match your test pattern.
It seems like you are missing a lot of very basic concepts and maybe cutting and pasting code you see elsewhere. If you're just starting out, pick up a Perl tutorial such as Learning Perl. There are other books and references listed in perlfaq2.

Always use:
use strict;
use warnings;
This would have told you that @buffer is not defined.
#!/usr/bin/perl
use strict;
use warnings;
my $srce = "./root/Desktop/logs/Default.log";
my $string1 = "Car LogFile Header";
open(my $FH, $srce) or die "Failed to open file $srce ($!)";
my @buf = <$FH>;
close($FH);
my @lines = grep (/$string1/, @buf);
print @lines;
Perl is tricky for experts, so experts use the warnings it provides to protect them from making mistakes. Beginners need to use the warnings so they don't make mistakes they don't even know they can make.
(Because you didn't get a chance to chomp the input lines, you still have newlines at the end so the print prints the headings one per line.)

I don't think grep is what you want really.
As you pointed out in brian's answer, the grep will only give you the headers and not the subsequent lines.
I think you need an array where each element is the header and the subsequent lines up to the next header.
Something like: -
#!/usr/bin/perl
use strict;
use warnings;
my $srce = "./default.log";
my $string1 = "Car LogFile Header";
my @logs;
my $log_entry;
open(my $FH, $srce) or die "Failed to open file $srce ($!)";
my $found = 0;
while(my $buf = <$FH>)
{
if($buf =~ /$string1/)
{
if($found)
{
push @logs, $log_entry;
}
$found = 1;
$log_entry = $buf;
}
else
{
$log_entry = $log_entry . $buf;
}
}
if($found)
{
push @logs, $log_entry;
}
close($FH);
print sort @logs;
I think it's what is being asked for.

Perl's grep is not the same as the Unix grep command, in that it does not print anything on the screen.
The general syntax is: grep EXPR, LIST
It evaluates EXPR for each element of LIST and returns a list consisting of those elements for which the expression evaluated to true.
In your case, all the @buffer elements that match the value of $string1 will be returned.
You can then print the returned list (@lines in your code) to actually see them.
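To make the return value concrete, here is a small sketch of grep in list and in scalar context. The array contents are made-up lines standing in for the log file; only the header text comes from the question.

```perl
use strict;
use warnings;

my $header = "Car LogFile Header";

# Made-up lines standing in for the contents of the log file
my @buf = (
    "Car LogFile Header\n",
    "engine started\n",
    "Car LogFile Header\n",
    "engine stopped\n",
);

# List context: grep returns the matching elements
my @lines = grep { /\Q$header\E/ } @buf;

# Scalar context: grep returns the number of matches
my $count = grep { /\Q$header\E/ } @buf;

# Nothing appears on screen until the result is printed explicitly
print "Found $count header lines:\n", @lines;
```

The \Q...\E pair quotes the header text so that any regex metacharacters in it are matched literally.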

You just stored everything in an array instead of printing it out. It's also not necessary to keep the whole file in memory. You can read and print the match results line by line, like this:
my $srce = "./root/Desktop/logs/Default.log";
my $string1 = "Car LogFile Header";
open(FH, $srce);
while(my $line = <FH>) {
if($line =~ m/$string1/) {
print $line;
}
}
close FH;

Hello I found a way to extract links from html file
#!/usr/bin/perl -w

# Links graber 1.0
# Author : peacengell
# 28.02.13

####

my $file_links = "links.txt";
my @line;
my $line;

open( FILE, $file_links ) or die "Can't find File";

while (<FILE>) {
chomp;
$line = $_ ;

@word = split (/\s+/, $line);
@word = grep(/href/, @word);
foreach $x (@word) {

if ( $x =~ m /ul.to/ ){
$x=~ s/href="//g;
$x=~s/"//g;
print "$x \n";
}

}

}
you can use it and modify it please let me know if you modify it.
