Perl code for comparing file contents

I'm a newbie in Perl scripting. I have two files. I want to compare their contents line by line and delete the matching lines. If I use a wildcard in one file to match multiple lines in the second file, it should delete all of those matches and write the rest to another file. I got the following code from another mail, but it does not take care of wildcards:
use strict;
use warnings;
$\="\n";
open my $FILE, "<", "file.txt" or die "Can't open file.txt: $!";
my %Set = map {$_ => undef} <$FILE>;
open my $FORBIDDEN, "<", "forbidden.txt" or die "Can't open forbidden.txt: $!";
my %Forbidden = map {$_ => undef} <$FORBIDDEN>;
open my $OUT, '>', 'output' or die $!;
my %Result = %Set; # make a copy
delete $Result{$_} for keys %Forbidden;
print $OUT keys %Result

I'm not sure what you mean by "wild card".
Nevertheless, there are many ways to do what you want. Since it's cleaner to use an existing module, you can use the List::Compare module, available on CPAN.
With the following code, you use this module to store all the lines contained in one file (file.txt) but not in the other file (forbidden.txt). So you implicitly match the lines that are equal. This code doesn't delete them from the file; it finds them.
Your code would look like:
use strict;
use warnings;
use File::Slurp qw(read_file); #cpan-module
use List::Compare; #cpan-module
chomp( my @a_file = read_file 'file.txt' );
chomp( my @b_file = read_file 'forbidden.txt' );
#here it stores all the lines contained in the 'file.txt'
#but not in the 'forbidden.txt' in an array
my @a_file_only = List::Compare->new( \@a_file, \@b_file )->get_Lonly;
print "$_\n" for @a_file_only;
#here you could write these lines in a new file to store them.
#At this point I just print them out.
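If you would rather not install File::Slurp and List::Compare, you can compute the same "lines only in file.txt" result in core Perl with a hash lookup. This is a sketch, not how List::Compare works internally. The helper name lines_only_in_left and the sample data are made up for illustration, and the lines are assumed to be chomped already.

```perl
use strict;
use warnings;

# Return the elements of @$left that do not occur in @$right, keeping
# @$left's original order. The hash gives a constant-time membership test.
sub lines_only_in_left {
    my ($left, $right) = @_;
    my %in_right = map { $_ => 1 } @$right;
    return grep { !$in_right{$_} } @$left;
}

# Hypothetical sample data standing in for file.txt and forbidden.txt.
my @a_file = ('apple', 'banana', 'cherry', 'banana');
my @b_file = ('banana');
print "$_\n" for lines_only_in_left(\@a_file, \@b_file);    # prints apple, then cherry
```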
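The new approach:
use Data::Dumper;
# each line of forbidden.txt is used as a regex; drop the lines of file.txt that match it
foreach my $filter (@b_file) {
@a_file = grep { !/$filter/ } @a_file;
}
print Dumper(\@a_file);
It will reduce the lines in @a_file step by step, removing the ones that match each filter. Because each filter is used as a Perl regular expression, wildcard characters follow regex rules. For example, * means "zero or more of the previous character", not "anything" as in a shell pattern, so write foo.* rather than foo*.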
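This assumes "wild card" means shell-style patterns in forbidden.txt; the question doesn't say. In that case, one option is to translate each pattern into an anchored regex before filtering. The translation makes * match any run of characters, ? match exactly one character, and everything else match literally. The helper names wildcard_to_regex and remove_matching are hypothetical, as is the sample data.

```perl
use strict;
use warnings;

# Turn a shell-style wildcard into an anchored regex:
# "*" -> ".*", "?" -> ".", and every other character is escaped with quotemeta.
sub wildcard_to_regex {
    my ($pattern) = @_;
    my $re = join '', map {
          $_ eq '*' ? '.*'
        : $_ eq '?' ? '.'
        :             quotemeta $_
    } split /([*?])/, $pattern;    # the capture keeps "*" and "?" as separate pieces
    return qr/\A$re\z/;
}

# Keep only the lines that match none of the forbidden patterns.
sub remove_matching {
    my ($lines, $patterns) = @_;
    my @regexes = map { wildcard_to_regex($_) } @$patterns;
    return grep { my $line = $_; !grep { $line =~ $_ } @regexes } @$lines;
}

my @a_file    = ('error: disk full', 'error: no memory', 'info: started');
my @forbidden = ('error:*');
print "$_\n" for remove_matching(\@a_file, \@forbidden);    # prints "info: started"
```

Anchoring with \A and \z makes a pattern match the whole line, as a shell glob would. If partial matches are wanted, drop the anchors.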

Related

print specific INFILE area using perl

I have a file with the format below
locale,English,en_AU,6251
locale,French,fr_BE,25477
charmap,English,EN,5423
I would like to use Perl so that, when given the option "-a" followed by the file name, it outputs something like:
Available locales:
en_AU
fr_BE
EN
To do that, I have the Perl script below:
$o = $ARGV[0];
$f = $ARGV[1];
open (INFILE, "<$f") or die "error";
my $line = <INFILE>;
my @fields = split(',', $line);
if($o eq "-a"){
if(!$fields[2]){print "No locales available\n";}
else{print "Available locales: \n";
while($fields[2]){print "$fields[2]\n";}
}
}
close(INFILE);
I have three questions here.
1. My script prints only the first locale, "en_AU", forever.
2. It should be able to test whether a file is empty. If the file is completely empty, it outputs nothing, but if I type two empty lines into the file, it prints "No locales available" twice instead.
3. In the (!$fields[2]) part I should verify whether the file is empty or no locales are available. If so, do I also need a regular expression there to verify that the field is a locale?
I hope someone can help me figure these out. Many thanks!
The biggest missing thing is a loop over lines from the file, in which you then process one line at a time. Comments follow the code.
use warnings;
use strict;
use feature 'say';
use Getopt::Long;
#my ($opt, $file) = @ARGV; # better use a module
my ($opt, $file);
GetOptions( 'a' => \$opt, 'file=s' => \$file ) or usage();
usage() if not $file; # mandatory argument
open my $fh, '<', $file or die "Can't open $file: $!";
while (my $line = <$fh>) {
chomp $line;
my @fields = split /,/, $line;
next if not $fields[2];
if ($opt) {
say $fields[2];
}
}
close $fh;
sub usage {
say STDERR "Usage: $0 [-a] --file filename";
exit 1;
}
This prints the desired output. (Is that simple condition on $fields[2] really all you need?)
Comments
Always have use warnings; and use strict; at the beginning.
I do not recommend single-letter variable names. One forgets what they mean, they make the code harder to follow, and it's way too easy to make silly mistakes.
The @ARGV array can be assigned to variables in a list. Much better: use the Getopt::Long module, which checks the invocation and makes interface changes far easier. I set the -a option to act as a "flag", so it just sets a variable ($opt) if it's given. If it should take a value instead, use 'a=s' => \$opt and check the value.
Use lexical filehandles and the three-argument open: open my $fh, '<', $file ...
When die-ing, print the error using the $! variable: die "... $!";
The "diamond" (angle) operator, <$fh>, reads one line from a file opened with $fh when used in scalar context, as in $line = <$fh>. It advances a pointer in the file as it reads a line so the next time it's used it returns the next line. If you use it in list context then it returns all lines, but when you process a file you normally want to go line by line.
Some of the described logic and requirements aren't clear to me, but hopefully the code above is going to be easier to adjust as needed.
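Questions 2 and 3 both come down to two things: decide "no locales" only after the whole file has been read, and skip blank lines instead of reporting them. The sketch below does that, assuming a blank line should simply be ignored. The sub name report_locales is made up, and the demo reads from an in-memory filehandle only so that it is self-contained. If you also want to check that a field looks like a locale, add a pattern test on $fields[2]; the right pattern depends on your data.

```perl
use strict;
use warnings;

# Gather the third field of every non-blank line, then decide what to
# report only after the whole file has been read.
sub report_locales {
    my ($fh) = @_;
    my @locales;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip empty or whitespace-only lines
        my @fields = split /,/, $line;
        push @locales, $fields[2] if defined $fields[2] && length $fields[2];
    }
    return "No locales available\n" unless @locales;
    return join '', "Available locales:\n", map { $_ . "\n" } @locales;
}

# Demo on an in-memory copy of the sample data, with two blank lines added.
my $data = "locale,English,en_AU,6251\n\n\nlocale,French,fr_BE,25477\ncharmap,English,EN,5423\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
print report_locales($fh);
```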

File not getting copied in perl

The file "/root/actual" is not getting overwritten with the content of "/root/temp" by my Perl script. If I edit "/root/actual" manually, it does get modified.
copy("/root/actual","/root/temp") or die "Copy failed: $!";
open(FILE, "</root/temp") || die "File not found";
my @lines = <FILE>;
close(FILE);
my @newlines;
foreach(@lines) {
$_ =~ s/$aref1[0]/$profile_name/;
push(@newlines,$_);
}
open(FILE, ">/root/actual") || die "File not found";
print FILE @newlines;
close(FILE);
Do you mean that /root/temp isn't being replaced by /root/actual? Or is /root/temp being modified as you wish, but it's not being copied over /root/actual at the end of your program?
I suggest that you read up on modern Perl programming practices. You need to have use warnings; and use strict; in your program. In fact, many people on this forum won't bother answering Perl questions unless use strict; and use warnings; are used.
Where is $aref1[0] coming from? I don't see @aref1 declared anywhere in your program. Or, for that matter, $profile_name.
If you're reading the entire file into memory and running the substitution on each line, there's no reason to copy it to a temporary file first.
I rewrote what you had in a more modern syntax:
use strict;
use warnings;
use autodie;
use constant {
FILE_NAME => 'test.txt',
};
my $profile_name = "bar"; #Taking a guess
my @aref1 = qw(foo ??? ??? ???); #Taking a guess
open my $input_fh, "<", FILE_NAME;
my @lines = <$input_fh>;
close $input_fh;
for my $line ( @lines ) {
$line =~ s/$aref1[0]/$profile_name/;
}
open my $output_fh, ">", FILE_NAME;
print {$output_fh} @lines;
close $output_fh;
This works.
Notes:
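use autodie; means you don't have to check whether each open succeeded; a failed open dies with an error message automatically.
When I use a for loop, I can do in-place replacement in the array: the loop variable $line is an alias to each element, so changing $line changes the array entry itself.
No need for copy or a temporary file since you're replacing the original file anyway.
I didn't use it here since you didn't, but map { s/$aref1[0]/$profile_name/ } @lines; can replace that for loop (see map). Because map would be used only for its side effect there, the statement-modifier form s/$aref1[0]/$profile_name/ for @lines; is a clearer alternative.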
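One caveat applies to both forms: $aref1[0] is interpolated as a regular expression, so characters such as . or + in it act as regex metacharacters. If it is meant as a plain string (an assumption; the question doesn't say), \Q...\E escapes it. Here is a sketch with a hypothetical helper, replace_literal.

```perl
use strict;
use warnings;

# Replace the first occurrence of $old with $new on each line, treating $old
# as a literal string: \Q...\E escapes regex metacharacters such as "." or "+".
sub replace_literal {
    my ($lines, $old, $new) = @_;
    s/\Q$old\E/$new/ for @$lines;    # $_ aliases each element, so @$lines is edited in place
    return;
}

my @lines = ("profile=a.b\n", "other=axb\n");
replace_literal(\@lines, 'a.b', 'bar');
print @lines;    # prints "profile=bar" and "other=axb" on separate lines
```

Without \Q, the pattern a.b would also have matched axb on the second line.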

Process files by extension instead of individually

I have multiple files that have the extension .tdx.
Currently my program works on individual files using $ARGV[0]; however, the number of files is growing and I would like to use a wildcard based on the file extension.
After much research I am at a loss.
I would like to read each file individually so the extract from the file is identified by the user.
#!C:\Perl\bin\perl.exe
use warnings;
use FileHandle;
open my $F_IN, '<', $ARGV[0] or die "Unable to open file: $!\n";
open my $F_OUT, '>', 'output.txt' or die "Unable to open file: $!\n";
while (my $line = $F_IN->getline) {
if ($line =~ /^User/) {
$F_OUT->print($line);
}
if ($line =~ /--FTP/) {
$F_OUT->print($line);
}
if ($line =~ /^ftp:/) {
$F_OUT->print($line);
}
}
close $F_IN;
close $F_OUT;
All the files are in one directory, so I assume I will need to open the directory.
I am just not sure whether I need to build an array of files or build a list and chomp it.
You have many options:
1. Loop over @ARGV, allowing the user to pass in a list of files.
2. Use glob on a pattern the user passes in, which Perl will expand into a list of files (and then loop over that list, as in #1). This can be messy, as users have to quote the pattern so the shell doesn't expand it first.
3. Write a wrapper that calls your existing script over and over again.
There's also a variant of the first one, which is to read from <>. This reads from STDIN, or automatically opens the files named in @ARGV. See eof for an example of how to use it.
As a variant of #2, you can pass in a directory name. Then either use opendir and readdir to loop over its entries (taking only files with your extension, or at the very least ignoring . and ..), or append /* or /*.tdx to it and use glob again.
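Here is a sketch of the <> variant, assuming the three patterns from the question's script. The helper name matching_lines is hypothetical. It localizes @ARGV so it can be called with any list of names. It returns early on an empty list, because an empty @ARGV makes <> read STDIN. On Windows, a *.tdx argument may arrive unexpanded, in which case it would need to go through glob first.

```perl
use strict;
use warnings;

# Return "file: line" for every matching line in the named files, reading them
# one after another through <>. $ARGV holds the name of the file being read,
# and eof (without parentheses) is true at the end of each individual file.
sub matching_lines {
    my @files = @_;
    return unless @files;    # an empty @ARGV would make <> read STDIN instead
    local @ARGV = @files;
    my @found;
    while (my $line = <>) {
        push @found, "$ARGV: $line"
            if $line =~ /^User/ || $line =~ /--FTP/ || $line =~ /^ftp:/;
        close ARGV if eof;    # resets the line counter $. for the next file
    }
    return @found;
}

print matching_lines(@ARGV);
```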
The glob function can help you. Just try
my #files = glob '*.tdx';
for my $file (#files) {
# Process $file...
}
In list context, glob expands its argument to the list of file names that match the pattern. For details, see glob in perlfunc.
I never got glob to work. What I ended up doing was building an array of the files with the .tdx extension and then reading each file in that list. What I ended up with is:
#!C:\Perl\bin\perl.exe
use warnings;
use FileHandle;
open my $F_OUT, '>', 'output.txt' or die "Unable to open file: $!\n";
open(FILELIST, "dir /b /s \"%USERPROFILE%\\Documents\\holding\\*.tdx\" |");
@filelist=<FILELIST>;
close(FILELIST);
foreach $file (@filelist)
{
chomp($file);
open my $F_IN, '<', $file or die "Unable to open file: $!\n";
while (my $line = $F_IN->getline)
{
# Doing something
}
}
close $F_IN;
}
close $F_OUT;
Thank you for your answers; they helped with the learning experience.
If you're on a Windows machine, typing *.tdx on the command line might not work, because the Windows command shell does not expand wildcards itself. Nor might glob, which historically relied on the shell's globbing. (The built-in glob function now uses File::Glob, so that may no longer be an issue.)
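There is one pitfall worth knowing about when the directory comes from %USERPROFILE%, although whether it affected the asker is unknown. The core glob operator splits its argument on whitespace, so a path containing a space is treated as several patterns. File::Glob's bsd_glob takes the whole string as one pattern. In the sketch below, the helper tdx_files_in and the holding directory are assumptions modeled on the asker's dir command. Unlike dir /s, it does not recurse into subdirectories.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# List the .tdx files directly inside $dir. bsd_glob does not split the
# pattern on whitespace, so a directory name with spaces is handled as-is.
sub tdx_files_in {
    my ($dir) = @_;
    return grep { -f } bsd_glob("$dir/*.tdx");
}

# Hypothetical default location, mirroring the folder in the dir command above.
my $holding = defined $ENV{USERPROFILE} ? "$ENV{USERPROFILE}/Documents/holding" : '.';
print "$_\n" for tdx_files_in($holding);
```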
One thing you can do is not use globs, but allow the user to input the directories and suffixes they want. Then use opendir and readdir to go through the directories yourself.
use strict;
use warnings;
use feature qw(say);
use autodie;
use Getopt::Long; # Why not do it right?
use Pod::Usage; # It's about time to learn about POD documentation
my @suffixes; # Hey, why not let people put in more than one suffix?
my @directories; # Let people put in the directories they want to check
my $help;
GetOptions (
"suffix=s" => \@suffixes,
"directory=s" => \@directories,
"help" => \$help,
) or pod2usage ( -message => "Invalid usage" );
if ( not @suffixes ) {
@suffixes = qw(tdx);
}
if ( not @directories ) {
@directories = qw(.);
}
if ( $help ) {
pod2usage;
}
my $suffix_alternation = join "|", map { quotemeta($_) } @suffixes;
my $regex = qr/\.(?:$suffix_alternation)$/; # Matches like /\.(?:foo|bar|txt)$/ if the suffixes are foo, bar, txt
for my $directory ( @directories ) {
opendir my $dir_fh, $directory; # autodie will take care of errors here
while ( my $file = readdir $dir_fh ) {
my $path = "$directory/$file"; # readdir returns bare names, so build the full path
next unless -f $path;
next unless $file =~ $regex;
# ... Here be dragons: process $path ...
}
closedir $dir_fh;
}
This will go through all of the directories the user entered and then examine each entry. It uses the suffixes the user entered (with .tdx being the default) to create a regular expression to check against the file name. If the file name matches the regular expression, do whatever you wanted to do with that file.

Read newline delimited file in Perl

I am trying to read a newline-delimited file into an array in Perl. I do NOT want the newlines to be part of the array, because the elements are filenames to read later. That is, each element should be "foo" and not "foo\n". I have done this successfully in the past using the methods advocated in Stack Overflow question Read a file into an array using Perl and Newline Delimited Input.
My code is:
open(IN, "< test") or die ("Couldn't open");
@arr = <IN>;
print("$arr[0] $arr[1]")
And my file 'test' is:
a
b
c
d
e
My expected output would be:
a b
My actual output is:
a
b
I really don't see what I'm doing wrong. How do I read these files into arrays?
Here is how I generically read from files.
open (my $in, "<", "test") or die $!;
my @arr;
while (my $line = <$in>) {
chomp $line;
push #arr, $line;
}
close ($in);
chomp will remove newlines from the line read. You should also use the three-argument version of open.
Put the file path in its own variable so that it can be easily changed.
Use the 3-argument open.
Test all opens, prints, and closes for success, and if not, print the error and the file name.
Try:
#!/usr/bin/env perl
use strict;
use warnings;
# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars ); # Avoids regex performance penalty
# conditional compile DEBUGging statements
# See http://lookatperl.blogspot.ca/2013/07/a-look-at-conditional-compiling-of.html
use constant DEBUG => $ENV{DEBUG};
# --------------------------------------
# put file path in a variable so it can be easily changed
my $file = 'test';
open my $in_fh, '<', $file or die "could not open $file: $OS_ERROR\n";
chomp( my @arr = <$in_fh> );
close $in_fh or die "could not close $file: $OS_ERROR\n";
print "@arr[ 0 .. 1 ]\n";
A less verbose option is to use read_file from File::Slurp:
use File::Slurp qw(read_file);
my $array_ref = read_file 'test', chomp => 1, array_ref => 1;
if, and only if, you need to save the list of file names anyway.
Otherwise,
my $filename = 'test';
open (my $fh, "<", $filename) or die "Cannot open '$filename': $!";
while (my $next_file = <$fh>) {
chomp $next_file;
do_something($next_file);
}
close ($fh);
would save memory by not having to keep the list of files around.
Also, you might be better off using $next_file =~ s/\s+\z// rather than chomp unless your use case really requires allowing trailing whitespace in file names.
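Here is a small sketch of the trailing-whitespace approach, assuming one file name per line and that blank lines should be skipped. The helper name read_names is hypothetical. It also handles a file with Windows (CRLF) line endings read on Unix: there chomp removes only the "\n" and leaves a "\r" on each name.

```perl
use strict;
use warnings;

# Read one name per line, stripping the newline and any other trailing
# whitespace (including a stray "\r"), and skipping lines that end up empty.
sub read_names {
    my ($fh) = @_;
    my @names;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;
        push @names, $line if length $line;
    }
    return @names;
}

my $data = "a\r\nb\r\nc\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my @arr = read_names($fh);
print "$arr[0] $arr[1]\n";    # prints "a b"
```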

Getting unique random line (at each script run) from an text file with perl

I have a text file like the following one, called "input.txt":
some field1a | field1b | field1c
...another approx 1000 lines....
fielaNa | field Nb | field Nc
I can choose any field delimiter.
I need a script that, on each separate run, gets one unique (never repeated) random line from this file, until all lines have been used.
My solution: I added one column to the file, so I have
0|some field1a | field1b | field1c
...another approx 1000 lines....
0|fielaNa | field Nb | field Nc
and I process it with the following code:
use 5.014;
use warnings;
use utf8;
use List::Util;
use open qw(:std :utf8);
my $file = "./input.txt";
#read all lines into array and shuffle them
open(my $fh, "<:utf8", $file);
my #lines = List::Util::shuffle map { chomp $_; $_ } <$fh>;
close $fh;
#search for the first line that starts with 0,
#change the 0 to 1
#and rewrite the whole file
my $random_line;
for(my $i=0; $i<=$#lines; $i++) {
if( $lines[$i] =~ /^0/ ) {
$random_line = $lines[$i];
$lines[$i] =~ s/^0/1/;
open($fh, ">:utf8", $file);
print $fh join("\n", @lines);
close $fh;
last;
}
}
$random_line = "1|NO|more|lines" unless( $random_line =~ /\w/ );
do_something_with_the_fields(split /\|/, $random_line);
exit;
It is a working solution, but not a very nice one, because:
the line order changes on every run
it is not safe when several copies of the script run concurrently.
How can I write it more efficiently and more elegantly?
What about keeping a shuffled list of the line numbers in a different file, removing the first one each time you use it? Some locking might be needed to ensure that concurrent runs are safe.
From perlfaq5.
How do I select a random line from a file?
Short of loading the file into a database or pre-indexing the lines in
the file, there are a couple of things that you can do.
Here's a reservoir-sampling algorithm from the Camel Book:
srand;
rand($.) < 1 && ($line = $_) while <>;
This has a significant advantage in space over reading the whole file
in. You can find a proof of this method in The Art of Computer
Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
You can use the File::Random module which provides a function for that
algorithm:
use File::Random qw/random_line/;
my $line = random_line($filename);
Another way is to use the Tie::File module, which treats the entire
file as an array. Simply access a random array element.
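For the Tie::File route, here is a minimal sketch: tie the file read-only and index the array with rand. The sub name random_record is hypothetical. Tie::File strips the record separator by default, so the returned line has no trailing newline.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use Tie::File;

# Return one random record from $file, or undef if the file has no records.
sub random_record {
    my ($file) = @_;
    tie my @lines, 'Tie::File', $file, mode => O_RDONLY
        or die qq(Unable to open "$file": $!);
    my $line = @lines ? $lines[rand @lines] : undef;    # rand returns a fraction; the index is truncated
    untie @lines;
    return $line;
}
```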
All Perl programmers should take the time to read the FAQ.
Update: To get a unique random line each time you're going to have to store state. The easiest way to store the state is to remove the lines that you've used from the file.
This program uses the Tie::File module to open your input.txt file as well as an indices.txt file.
If indices.txt is empty then it is initialised with the indices of all the records in input.txt in a shuffled order.
Each run, the index at the end of the list is removed and the corresponding input record displayed.
use strict;
use warnings;
use Tie::File;
use List::Util 'shuffle';
tie my @input, 'Tie::File', 'input.txt'
or die qq(Unable to open "input.txt": $!);
tie my @indices, 'Tie::File', 'indices.txt'
or die qq(Unable to open "indices.txt": $!);
@indices = shuffle(0..$#input) unless @indices;
my $index = pop @indices;
print $input[$index];
Update
I have modified this solution so that it populates a new indices.txt file only if it doesn't already exist and not, as before, simply when it is empty. That means a new sequence of records can be printed simply by deleting the indices.txt file.
use strict;
use warnings;
use Tie::File;
use List::Util 'shuffle';
my ($input_file, $indices_file) = qw( input.txt indices.txt );
tie my @input, 'Tie::File', $input_file
or die qq(Unable to open "$input_file": $!);
my $first_run = not -f $indices_file;
tie my @indices, 'Tie::File', $indices_file
or die qq(Unable to open "$indices_file": $!);
@indices = shuffle(0..$#input) if $first_run;
@indices or die "All records have been displayed";
my $index = pop @indices;
print $input[$index];
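The answer above doesn't address the question's concurrency concern. One way to handle it is to use a separate lock file (input.lock in the test, a made-up name) that every run locks with flock before touching indices.txt. That way, two simultaneous runs using the same lock cannot pop the same index. This is a sketch under that assumption. The sub name next_unique_line is hypothetical, and it returns undef once every record has been used instead of dying.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use List::Util qw(shuffle);
use Tie::File;

# Return the next unused random record from $input_file, or undef once all
# records have been used. Delete $indices_file to start a new sequence.
sub next_unique_line {
    my ($input_file, $indices_file, $lock_file) = @_;

    # Hold an exclusive lock for the whole read-modify-write cycle.
    # flock is advisory: it only guards against runs that take the same lock.
    open my $lock_fh, '>>', $lock_file
        or die qq(Unable to open "$lock_file": $!);
    flock($lock_fh, LOCK_EX) or die qq(Unable to lock "$lock_file": $!);

    tie my @input, 'Tie::File', $input_file
        or die qq(Unable to open "$input_file": $!);
    my $first_run = !-f $indices_file;
    tie my @indices, 'Tie::File', $indices_file
        or die qq(Unable to open "$indices_file": $!);

    @indices = shuffle(0 .. $#input) if $first_run;
    my $index = pop @indices;
    my $line  = defined $index ? $input[$index] : undef;

    # Flush both files before the lock is released by closing the lock handle.
    untie @indices;
    untie @input;
    close $lock_fh;
    return $line;
}
```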