print specific INFILE area using perl - perl

I have a file with the format below
locale,English,en_AU,6251
locale,French,fr_BE,25477
charmap,English,EN,5423
And I would like to use perl, with the option "-a" followed by the file, to print output like
Available locales:
en_AU
fr_BE
EN
To do that, I have the perl script below
$o = $ARGV[0];
$f = $ARGV[1];
open (INFILE, "<$f") or die "error";
my $line = <INFILE>;
my @fields = split(',', $line);
if($o eq "-a"){
    if(!$fields[2]){print "No locales available\n";}
    else{
        print "Available locales: \n";
        while($fields[2]){print "$fields[2]\n";}
    }
}
close(INFILE);
And I have three questions here.
1. My script only prints the first locale "en_AU", forever (it never stops).
2. It should be able to detect an empty file, but a truly empty file produces no output at all, while a file containing two empty lines prints "No locales available" twice instead.
3. In the (!$fields[2]) part I should really be verifying that the file is empty or that no locales are available; do I also need a regular expression there to verify that the field is actually a locale?
Hope someone could help me figure these out! Many thanks!!!

The biggest missing thing is a loop over lines from the file, in which you then process one line at a time. Comments follow the code.
use warnings;
use strict;
use feature 'say';
use Getopt::Long;

# my ($opt, $file) = @ARGV;  # better to use a module
my ($opt, $file);
GetOptions( 'a' => \$opt, 'file=s' => \$file ) or usage();
usage() if not $file;  # mandatory argument

open my $fh, '<', $file or die "Can't open $file: $!";

while (my $line = <$fh>) {
    chomp $line;
    my @fields = split /,/, $line;
    next if not $fields[2];
    if ($opt) {
        say $fields[2];
    }
}
close $fh;

sub usage {
    say STDERR "Usage: $0 [-a] --file filename";
    exit 1;
}
This prints the desired output. (Is that simple condition on $fields[2] really all you need?)
Comments
Always have use warnings; and use strict; at the beginning
I do not recommend single-letter variable names. One forgets what they mean, it makes the code harder to follow, and it's way too easy to make silly mistakes
The @ARGV can be assigned to variables in a list. Much better, use the Getopt::Long module, which checks invocation and allows for far easier interface changes. I set the -a option to act as a "flag," so it just sets a variable ($opt) if it's given. If it should take a value instead, use 'a=s' => \$opt and check for a value (see the short sketch at the end of this answer).
Use lexical filehandles and the three-argument open, open my $fh, '<', $file ...
When die-ing print the error, die "... $!";, using $! variable
The "diamond" (angle) operator, <$fh>, reads one line from a file opened with $fh when used in scalar context, as in $line = <$fh>. It advances a pointer in the file as it reads a line so the next time it's used it returns the next line. If you use it in list context then it returns all lines, but when you process a file you normally want to go line by line.
Some of the described logic and requirements aren't clear to me, but hopefully the code above is going to be easier to adjust as needed.
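To illustrate the 'a=s' variant mentioned in the comments, here is a minimal sketch (the option names and the usage message are just examples, not part of the code above):
use strict;
use warnings;
use Getopt::Long;

# The 'a=s' spec makes -a take a string value instead of acting as a flag.
my ($opt, $file);
GetOptions( 'a=s' => \$opt, 'file=s' => \$file )
    or die "Usage: $0 [-a value] --file filename\n";

# $opt is undef unless -a was given; otherwise it holds the supplied value
print "option -a was given the value '$opt'\n" if defined $opt;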

Related

Perl print to separate files

I have a text file which lists a service, device and a filter, here I list 3 examples only:
service1 device04 filter9
service2 device01 filter2
service2 device10 filter11
I have written a perl script that iterates through the file and should then print device=device filter=filter to a file named according to the service it belongs to; if a line belongs to a service that already has a file, it should add the device to that same file, separated by semicolons. Looking at the above example, I then need a result of:
service1.txt
device=device04 filter=filter9
service2.txt
device=device01 filter=filter2 ; device=device10 filter=filter11
Here is my code:
use strict;
use warnings qw(all);

open INPUT, "<", "file.txt" or die $!;
my @Input = <INPUT>;

foreach my $item(@Input) {
    my ($serv, $device, $filter) = split(/ /, $item);
    chomp ($serv, $device, $filter);
    push my @arr, "device==$device & filter==$filter";
    open OUTPUT, ">>", "$serv.txt" or die $!;
    print OUTPUT join(" ; ", @arr);
    close OUTPUT;
}
The problem I am having is that both service1.txt and service2.txt are created, but my results are all wrong, see my current result:
service1.txt
device==device04 filter==filter9
service2.txt
device==device04 filter==filter9 ; device==device01 filter==filter2device==device04 filter==filter9 ; device==device01 filter==filter2 ; device==device10 filter==filter11
I apologise, I know this is something stupid, but it has been a really long night and my brain cannot function properly I believe.
For each service to have its own file where data for it accumulates you need to distinguish for each line what file to print it to.
Then open a new service-file when a service without one is encountered, feasible since there aren't so many as clarified in a comment. This can be organized by a hash service => filehandle.
use warnings;
use strict;
use feature 'say';

my $file = shift @ARGV || 'data.txt';

my %handle;

open my $fh, '<', $file or die "Can't open $file: $!";

while (<$fh>) {
    my ($serv, $device, $filter) = split;

    if (exists $handle{$serv}) {
        print { $handle{$serv} } " ; device==$device & filter==$filter";
    }
    else {
        open my $fh_out, '>', "$serv.txt" or do {
            warn "Can't open $serv.txt: $!";
            next;
        };
        print $fh_out "device==$device & filter==$filter";
        $handle{$serv} = $fh_out;
    }
}

say {$_} '' for values %handle;  # terminate the line in each file
close $_  for values %handle;
For clarity the code prints almost the same thing in both branches, which surely can be made cleaner. This was tested only with the provided sample data and produces the desired output.
Note that when the filehandle is held in an expression that needs to be evaluated (such as a hash element), we need { } around it. See this post, for example.
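A tiny standalone sketch of that rule (the hash key and file name are made up):
use strict;
use warnings;

my %handle;
open $handle{example}, '>', 'example.txt' or die $!;

print { $handle{example} } "a line\n";   # OK: the { } block yields the filehandle
# print $handle{example} "a line\n";     # syntax error: only a bare simple scalar
                                         # (like $fh) or a block may fill that slot

close $handle{example};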
Comments on the original code (addressed in the code above)
Use lexical filehandles (my $fh) instead of typeglobs (FH)
Don't read the whole file at once unless there is a specific reason for that
split has nice defaults, split ' ', $_, where ' ' splits on whitespace and discards leading and trailing space as well. (And then there is no need to chomp in this case.)
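A tiny standalone illustration of those defaults (the sample line is made up):
use strict;
use warnings;

my $item = "  service2 device10 filter9\n";   # note leading spaces and trailing newline

my @with_default = split ' ', $item;   # ('service2', 'device10', 'filter9')
                                       #   leading space dropped, newline gone, no chomp needed
my @with_regex   = split / /, $item;   # ('', '', 'service2', 'device10', "filter9\n")
                                       #   empty leading fields kept, newline still attached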
Another option is to first collect data for each service, just as the OP attempts, but again use a hash (service => arrayref/string with data) and print at the end. But I don't see a reason not to print as you go, since you'd need the same logic to decide when the ; needs to be added.
Your code looks pretty perl4-ish, but that's not a problem. As MrTux has pointed out, you are confusing collection and fanning out of your data. I have refactored this to use a hash as an intermediate container with the service name as keys. Please note that this will not accumulate results across multiple calls (as it uses ">" and not ">>").
use strict;
use warnings qw(all);
use File::Slurp qw/read_file/;

my @Input = read_file('file.txt', chomp => 1);
my %store = (); # Global container

# Capture
foreach my $item(@Input) {
    my ($serv, $device, $filter) = split(/ /, $item);
    push @{$store{$serv}}, "device==$device & filter==$filter";
}

# Write out for each service file
foreach my $k(keys %store) {
    open(my $OUTPUT, ">", "$k.txt") or die $!;
    print $OUTPUT join(" ; ", @{$store{$k}});
    close( $OUTPUT );
}

File not getting copied in perl

File "/root/actual" is not getting over written with content of "/root/temp" via perl script. If manually edited "/root/actual" is getting modified.
copy("/root/actual","/root/temp") or die "Copy failed: $!";
open(FILE, "</root/temp") || die "File not found";
my #lines = <FILE>;
close(FILE);
my #newlines;
foreach(#lines) {
$_ =~ s/$aref1[0]/$profile_name/;
push(#newlines,$_);
}
open(FILE, ">/root/actual") || die "File not found";
print FILE #newlines;
close(FILE);
File "/root/actual" is not getting over written with content of "/root/temp" via perl script. If manually edited "/root/actual" is getting modified.
Do you mean that /root/temp isn't being replaced by /root/actual? Or is /root/temp being modified as you wish, but it's not being copied over /root/actual at the end of your program?
I suggest that you read up on modern Perl programming practices. You need to have use warnings; and use strict; in your program. In fact, many people on this forum won't bother answering Perl questions unless use strict; and use warnings; are used.
Where is $aref1[0] coming from? I don't see @aref1 declared anywhere in your program. Or, for that matter, $profile_name.
If you're reading the entire file into memory anyway, there's no reason to copy it over to a temporary file first.
I rewrote what you had in a more modern syntax:
use strict;
use warnings;
use autodie;

use constant {
    FILE_NAME => 'test.txt',
};

my $profile_name = "bar";              # Taking a guess
my @aref1 = qw(foo ??? ??? ???);       # Taking a guess

open my $input_fh, "<", FILE_NAME;
my @lines = <$input_fh>;
close $input_fh;

for my $line ( @lines ) {
    $line =~ s/$aref1[0]/$profile_name/;
}

open my $output_fh, ">", FILE_NAME;
print ${output_fh} @lines;
close $output_fh;
This works.
Notes:
use autodie; means you don't have to check whether each file open succeeded.
When I use a for loop, I can do in-place replacing in an array: the loop variable is an alias to the current entry in the array.
No need for copy or a temporary file since you're replacing the original file anyway.
I didn't use it here since you didn't, but map { s/$aref1[0]/$profile_name/ } @lines; can replace that for loop. See map.
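A small standalone sketch of the in-place replacement both notes describe (the data and the pattern are made up):
use strict;
use warnings;

my @lines = ("foo one", "foo two", "keep me");

# for loop: $line is an alias into @lines, so the substitution edits the array in place
for my $line (@lines) {
    $line =~ s/foo/bar/;
}

# equivalent map form (also edits @lines, via $_ aliasing):
# map { s/foo/bar/ } @lines;

print "$_\n" for @lines;   # bar one, bar two, keep me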

Read newline delimited file in Perl

I am trying to read a newline-delimited file into an array in Perl. I do NOT want the newlines to be part of the array, because the elements are filenames to read later. That is, each element should be "foo" and not "foo\n". I have done this successfully in the past using the methods advocated in Stack Overflow question Read a file into an array using Perl and Newline Delimited Input.
My code is:
open(IN, "< test") or die ("Couldn't open");
#arr = <IN>;
print("$arr[0] $arr[1]")
And my file 'test' is:
a
b
c
d
e
My expected output would be:
a b
My actual output is:
a
b
I really don't see what I'm doing wrong. How do I read these files into arrays?
Here is how I generically read from files.
open (my $in, "<", "test") or die $!;
my @arr;
while (my $line = <$in>) {
    chomp $line;
    push @arr, $line;
}
close ($in);
chomp will remove newlines from the line read. You should also use the three-argument version of open.
Put the file path in its own variable so that it can be easily changed.
Use the 3-argument open.
Test all opens, prints, and closes for success, and if not, print the error and the file name.
Try:
#!/usr/bin/env perl
use strict;
use warnings;
# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars ); # Avoids regex performance penalty
# conditional compile DEBUGging statements
# See http://lookatperl.blogspot.ca/2013/07/a-look-at-conditional-compiling-of.html
use constant DEBUG => $ENV{DEBUG};
# --------------------------------------
# put file path in a variable so it can be easily changed
my $file = 'test';
open my $in_fh, '<', $file or die "could not open $file: $OS_ERROR\n";
chomp( my @arr = <$in_fh> );
close $in_fh or die "could not close $file: $OS_ERROR\n";
print "#arr[ 0 .. 1 ]\n";
A less verbose option is to use File::Slurp::read_file
my $array_ref = read_file 'test', chomp => 1, array_ref => 1;
if, and only if, you need to save the list of file names anyway.
Otherwise,
my $filename = 'test';
open (my $fh, "<", $filename) or die "Cannot open '$filename': $!";
while (my $next_file = <$fh>) {
    chomp $next_file;
    do_something($next_file);
}
close ($fh);
would save memory by not having to keep the list of files around.
Also, you might be better off using $next_file =~ s/\s+\z// rather than chomp unless your use case really requires allowing trailing whitespace in file names.
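A tiny sketch of the difference (the filename value is made up): chomp removes only the newline, while the substitution also drops trailing spaces and tabs.
use strict;
use warnings;

my $name = "some_file.txt   \n";   # trailing spaces before the newline

(my $trimmed = $name) =~ s/\s+\z//;   # "some_file.txt"       -- all trailing whitespace gone
chomp(my $chomped = $name);           # "some_file.txt   "    -- only the newline removed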

Parsing large files in Perl

I need to compare a big file (2GB, about 22 million lines) with another file. It was taking too long to process with Tie::File, so I switched to a plain 'while' loop, but the problem remains. See my code below...
use strict;
use Tie::File;
# use warnings;

my @arr;

# tie @arr, 'Tie::File', 'title_Nov19.txt';

# open(IT,"<title_Nov19.txt");
# my @arr=<IT>;
# close(IT);

open(RE,">>res.txt");
open(IN,"<input.txt");

while(my $data=<IN>){
    chomp($data);
    print "$data\n";
    my $occ=0;
    open(IT,"<title_Nov19.txt");
    while(my $line2=<IT>){
        my $line=$line2;
        chomp($line);
        if($line=~m/\b$data\b/is){
            $occ++;
        }
    }
    print RE "$data\t$occ\n";
}
close(IT);
close(IN);
close(RE);
So please help me reduce the run time...
Lots of things wrong with this.
Aside from the usual issues (lack of use strict and use warnings, use of 2-argument open(), not checking the open() result, use of global filehandles), the specific problem in your case is that you are opening/reading/closing the second file once for every single line of the first. This is going to be very slow.
I suggest you open the file title_Nov19.txt once, read all the lines into an array or hash or something, then close it; and then you can open the first file, input.txt and walk along that once, comparing to things in the array so you don't have to reopen that second file all the time.
Further, I suggest you read some basic articles on style, etc., as your question is likely to gain more attention if it's actually written to vaguely modern standards.
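If every search term in input.txt is a single word and a whole-word, case-insensitive match is all that's needed (assumptions on my part; the original uses /\b$data\b/is), the inner loop can be avoided entirely with a hash lookup. A rough sketch, not the code from the answers below:
use strict;
use warnings;

# read the (small) list of search terms once
open my $terms_fh, '<', 'input.txt' or die "input.txt: $!";
chomp( my @terms = <$terms_fh> );
close $terms_fh;

my %count;
$count{ lc $_ } = 0 for @terms;

# stream the big file once, looking words up in the hash
open my $big_fh, '<', 'title_Nov19.txt' or die "title_Nov19.txt: $!";
while (my $line = <$big_fh>) {
    my %seen;   # count each term at most once per line, as the original does
    for my $word ( split /\W+/, lc $line ) {
        $count{$word}++ if exists $count{$word} and !$seen{$word}++;
    }
}
close $big_fh;

open my $out_fh, '>', 'res.txt' or die "res.txt: $!";
for my $term (@terms) {
    print $out_fh "$term\t", $count{ lc $term }, "\n";
}
close $out_fh;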
I tried to build a small example script with a better structure but I have to say, man, your problem description is really very unclear. It's important not to read the whole comparison file each time, as @LeoNerd explained in his answer. Then I use a hash to keep track of the match count:
#!/usr/bin/env perl
use strict;
use warnings;

# cache all lines of the comparison file
open my $comp_file, '<', 'input.txt' or die "input.txt: $!\n";
chomp (my @comparison = <$comp_file>);
close $comp_file;

# prepare comparison
open my $input, '<', 'title_Nov19.txt' or die "title_Nov19.txt: $!\n";
my %count = ();

# compare each line
while (my $title = <$input>) {
    chomp $title;

    # iterate comparison strings
    foreach my $comp (@comparison) {
        $count{$comp}++ if $title =~ /\b$comp\b/i;
    }
}

# done
close $input;

# output (in the order of the comparison file)
open my $output, '>>', 'res.txt' or die "res.txt: $!\n";
foreach my $comp (@comparison) {
    print $output "$comp\t$count{$comp}\n";
}
close $output;
Just to get you started... If someone wants to further work on this: these were my test files:
title_Nov19.txt
This is the foo title
Wow, we have bar too
Nothing special here but foo
OMG, the last title! And Foo again!
input.txt
foo
bar
And the result of the program was written to res.txt:
foo 3
bar 1
Here's another option using memowe's (thank you) data:
use strict;
use warnings;
use File::Slurp qw/read_file write_file/;
my %count;
my $regex = join '|', map { chomp; $_ = "\Q$_\E" } read_file 'input.txt';
for ( read_file 'title_Nov19.txt' ) {
    my %seen;
    !$seen{ lc $1 }++ and $count{ lc $1 }++ while /\b($regex)\b/ig;
}
write_file 'res.txt', map "$_\t$count{$_}\n",
    sort { $count{$b} <=> $count{$a} } keys %count;
Numerically-sorted output to res.txt:
foo 3
bar 1
An alternation regex which quotes metacharacters (\Q$_\E) is built and used, so only one pass against the large file's lines is needed. The hash %seen is used to ensure that the input words are only counted once per line.
Hope this helps!
Try this:
grep -i -c -w -f input.txt title_Nov19.txt > res.txt
Here -f input.txt supplies the patterns, -w matches whole words only, -i ignores case, and -c counts matching lines. Note that -c prints one total count of matching lines per input file, not a separate count per pattern.

Perl IPC::Run appending output and parsing stderr while keeping it in a batch of files

I'm trying to wrap my head around IPC::Run to be able to do the following. For a list of files:
my @list = ('/my/file1.gz','/my/file2.gz','/my/file3.gz');
I want to execute a program that has built-in decompression, does some editing and filtering to them, and prints to stdout, giving some stats to stderr:
~/myprogram options $file
I want to append the stdout of the execution for all the files in the list to one single $out file, and be able to parse and store a couple of lines in each stderr as variables, while letting the rest be written out into separate fileN.log files for each input file.
I want stdout to all go into a ">>$all_into_one_single_out_file", it's the err that I want to keep in different logs.
After reading the manual, I've gotten as far as the code below; the commented part is what I don't know how to do:
for $file in @list {
    my @cmd;
    push @cmd, "~/myprogram options $file";
    IPC::Run::run \@cmd, \undef, ">>$out",
        sub {
            my $foo .= $_[0];
            # check if I want to keep my line, save value to $mylog1 or $mylog2
            # let $foo and all the other lines be written into $file.log
        };
}
Any ideas?
First things first. my $foo .= $_[0] is not necessary. $foo is a new (empty) value, so appending to it via .= doesn't buy you anything. What you really want is a simple my ($foo) = @_;.
Next, you want to have output go to one specific file for each command while also (depending on some conditional) putting that same output to a common file.
Perl (among other languages) has a great facility to help in problems like this, and it is called closure. Whichever variables are in scope at the time of a subroutine definition, those variables are available for you to use.
use strict;
use warnings;

use IPC::Run qw(run new_chunker);

my @list = qw( /my/file1 /my/file2 /my/file3 );

open my $shared_fh, '>', '/my/all-stdout-goes-here' or die;
open my $log1_fh, '>', '/my/log1' or die "Cannot open /my/log1: $!\n";
open my $log2_fh, '>', '/my/log2' or die "Cannot open /my/log2: $!\n";

foreach my $file ( @list ) {
    my @cmd = ( "~/myprogram", option1, option2, ..., $file );

    open my $log_fh, '>', "$file.log"
        or die "Cannot open $file.log: $!\n";

    run \@cmd, '>', $shared_fh,
        '2>', new_chunker, sub {
            # $out contains each line of stderr from the command
            my ($out) = @_;
            if ( $out =~ /something interesting/ ) {
                print $log1_fh $out;
            }
            if ( $out =~ /something else interesting/ ) {
                print $log2_fh $out;
            }
            print $log_fh $out;
            return 1;
        };
}
Each of the output file handles will get closed when they're no longer referenced by anything -- in this case at the end of this snippet.
I fixed your @cmd, though I don't know what your option1, option2, ... will be.
I also changed the way you are calling run. You can call it with a simple > to tell it the next thing is for output, and the new_chunker (from IPC::Run) will break your output into one-line-at-a-time instead of getting all the output all-at-once.
I also skipped over the fact that you're outputting to .gz files. If you want to write to compressed files, instead of opening as:
open my $fh, '>', $file or die "Cannot open $file: $!\n";
Just open up:
open my $fh, '|-', "gzip -c > $file" or die "Cannot startup gzip: $!\n";
Be careful here, as this is a good place for command injection (e.g. let $file be /dev/null; /sbin/reboot). How to handle this is given in many, many other places and is beyond the scope of what you're actually asking.
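One way to sidestep the shell entirely (not part of the original answer; the path is just an example) is the core IO::Compress::Gzip module:
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError);

my $file = '/my/output.gz';    # example path only

# No shell is involved, so a hostile $file cannot inject commands.
my $z = IO::Compress::Gzip->new($file)
    or die "Cannot open $file for gzip output: $GzipError\n";
$z->print("some output line\n");
$z->close();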
EDIT: re-read problem a bit more, and changed answer to more closely reflect the actual problem.
EDIT 2: Updated per your comment. All stdout goes to one file, and the stderr from each command is fed to the inline subroutine. Also fixed a stupid typo (the for syntax was pseudo-code, not Perl).