Perl change column delimiter of a file - perl

I stored the output of uniq -c into two files, $lfile1 and $lfile2. I tried to make the column separator a single space with the tr command, but it doesn't seem to work: after splitting $line, nothing gets stored in $count and $e_code.
How to split the $line in two parts?
`egrep -o [A-Z]{3}[0-9]{5} $e_file |sort|uniq -c |sort -nrk1 |head -15 >$lfile1`;
`egrep -o [A-Z]{3}[0-9]{5} $y_file |sort|uniq -c |sort -nrk1 |head -150 >$lfile2`;
open (IN, "<$lfile1");
foreach $line (<IN>)
{
    my $f_line = `echo $line | tr -s ' '`;
    print "$f_line \n";
    my ($count, $e_code) = split / /, $f_line;
}

uniq -c produces output similar to this:
      2 ABC12345
      1 ZXC09876
Notice the leading spaces. Apparently you intended to strip the leading spaces, but the single space in between must be kept for split / /, $f_line; to succeed.
To remove only the leading spaces, use the ^\s+ pattern (^ is the start-of-line anchor) with the s/// substitution operator:
$line =~ s/^\s+//;
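Putting it together, the echo ... | tr subshell isn't needed at all; a minimal runnable sketch, with the uniq -c output inlined here for illustration:

```perl
use strict;
use warnings;

# simulated `uniq -c` output (leading spaces exactly as uniq produces them)
my @lines = ("      2 ABC12345\n", "      1 ZXC09876\n");

for my $line (@lines) {
    chomp $line;
    $line =~ s/^\s+//;                       # strip leading spaces only
    my ($count, $e_code) = split / /, $line; # split on the single inner space
    print "$count $e_code\n";
}
```

Note that split ' ', $line (a literal one-space string, which is special-cased, not the / / pattern) strips leading whitespace and splits on whitespace runs, so with that form the substitution isn't even needed.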
Please note you may accomplish this task in pure Perl:
my %counts = ();
open(my $fh, $e_file) or die "Failed to open $e_file: $!";
while (<$fh>) {
    # collect counts of each [A-Z]{3}[0-9]{5} match in the %counts
    # hash, with the match being a key in this hash and the number
    # of occurrences of this match being the value
    $counts{$1}++ foreach /([A-Z]{3}[0-9]{5})/g;
}
# iterate through the first 15 most frequently encountered matches
foreach my $key (
    (
        sort { $counts{$b} <=> $counts{$a} } keys %counts  # sort the hash keys
                                                           # in descending order of value
    )[0..14]  # take the first 15 items of the ordered list
)
{
    print "$key $counts{$key}\n";
}
Demo: https://ideone.com/eN1AyJ

Related

Perl search for a content in file and take out value using regex

I have a set of log files in which I want to search for the word Sum and take out the value that appears next to the Sum keyword in each file.
Instead of doing a file read operation, I am using Tie::File to get the contents of the file into an array, planning to take out whatever value I need.
Here is my code:
...
my $logpath = "C:/Users/Vinod/Perl/LOG/";
opendir(DIR, $logpath);
while (my $file = readdir(DIR)) {
    next unless (-f "$logpath/$file");
    next unless ($file =~ m/\.log$/);
    my @lines;
    print "$file\n";
    tie @lines, 'Tie::File', $file, mode => O_RDWR;
    for (@lines) {
        print $_ if ($_ =~ m/Sum/);
    }
    untie @lines;
}
closedir(DIR);
Here is what I am trying to extract from my log file:
test_log_file.log
....
....
=> Sum: 10 PC's, 5 UPS's
End...
From the above test_log_file.log I want to take out value 10.
But the line print $_ if($_ =~ m/Sum/); is printing the entire file content. I have no idea how to take out the line which contains the Sum and PC keywords, so that I can get the value 10 using a regex.
I am able to take out the Sum value using the below command:
$sum = qx/more $file | grep -i 'Sum' | grep 'PC' | awk -F " " '{print \$3}'/;
But wanted to resolve this using Perl script itself.
Read line by line. Capture the number and output only the captured part:
while (<>) { say $1 if /Sum: ([0-9]+)/ }
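Spelled out as a complete, runnable sketch (the log content is inlined via __DATA__ here for illustration; with a real file, read from a filehandle instead):

```perl
use strict;
use warnings;
use feature 'say';

# scan each line; print only the number captured after "Sum: "
while (my $line = <DATA>) {
    say $1 if $line =~ /Sum: ([0-9]+)/;
}

__DATA__
....
....
=> Sum: 10 PC's, 5 UPS's
End...
```

This prints just `10`, with no shelling out to more, grep, or awk.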

how to count a repeating string in a line using perl

I have the below file
file1:
abc def host 123 host 869 host
I wrote the script below to count the occurrences of the "host" keyword in each line.
I tried all the ways (see the commented-out lines) and it still does not seem to work. The sed command worked on the command line but not inside the Perl script.
#!/usr/bin/perl
open(SOURCE, "</home/amp/surevy01/file1");
open(DESTINATION, "</home/amp/surevy01/file2");
while (my $line = <SOURCE>)
{
    while (my $line1 = <DESTINATION>)
    {
        #chomp($line);
        #chomp($line1);
        if ($line =~ "host")
        {
            #my $count = grep {host} $line;
            #my $count = `sed -i {s/host/host\n/g} $line1 | grep -c {host}`;
            #my $count = `perl -pi -e 's/host/host\n/g' $line1 | grep -c host`;
            #my $count = grep ("host", $line);
            print "$count";
            print "match found \n";
            next;
        }
        else
        {
            print "match not found \n";
            exit;
        }
    }
}
I'm a beginner at Perl, so I'm looking for your valuable suggestions.
Your own solution will match instances like hostages and Shostakovich
grep is the canonical way to count elements of a list, and split will turn your line into a list of words, giving
my $count = grep { $_ eq 'host' } split ' ', $line
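A quick self-contained check of that approach; the sample line is extended here with a "hostages" red herring to show the exact-word comparison at work:

```perl
use strict;
use warnings;

my $line  = "abc def host 123 host 869 host hostages";
# split the line into words, then count only exact matches of 'host'
my $count = grep { $_ eq 'host' } split ' ', $line;
print "$count\n";  # 3 ("hostages" is not counted)
```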
I don't know why you're looping through two files in your example, but you can use the /g (global) flag:
my $line = "abc def host 123 host 869 host";
my $x = 0;
while ($line =~ /host/g) {
    $x++;
}
print "$x\n"; # 3
When you run a regex with /g in scalar context (as in the conditional of the while statement), it keeps track of the location of the last match and restarts from there. Therefore /host/g in a loop as above will find each occurrence of host. You can also use /g in list context:
my $line = "abc def host 123 host 869 host";
my @matches = $line =~ /host/g;
print scalar @matches; # 3 again
In this case, @matches will contain all matches of the regexp against the string, which will be ('host', 'host', 'host') since the pattern is a plain string. Then scalar(@matches) yields the length of the list.
This produces the number of instances of host in $line:
my $count = () = $line =~ /host/g;
But that also matches hosting. To avoid that, the following will probably do the trick:
my $count = () = $line =~ /\bhost\b/g;
=()= is sometimes called the Perl secret "goatse" operator: the empty-list assignment forces list context, and the outer scalar assignment captures the number of elements.
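A minimal runnable sketch of that counting idiom, combined with the \b word boundaries from above:

```perl
use strict;
use warnings;

my $line  = "abc def host 123 host 869 host";
# list assignment in scalar context yields the number of matches
my $count = () = $line =~ /\bhost\b/g;
print "$count\n";  # 3
```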

zcat working in command line but not in perl script

Here is a part of my script:
foreach $i ( @contact_list ) {
    print "$i\n";
    $e = "zcat $file_list2 | grep $i";
    print "$e\n";
    $f = qx($e);
    print "$f";
}
$e prints properly but $f gives a blank line even when $file_list2 has a match for $i.
Can anyone tell me why?
It is always better to use Perl's grep instead of piping through the shell:
@lines = `zcat $file_list2`;  # move the output of zcat into an array
die('zcat error') if ($?);    # exit the script with an error if zcat failed
# chomp(@lines);              # this will remove "\n" from each line
foreach $i ( @contact_list ) {
    print "$i\n";
    @ar = grep (/$i/, @lines);
    print @ar;
    # print join("\n", @ar) . "\n"; # in case of using chomp
}
The best solution is not calling zcat at all, but using the zlib library:
http://perldoc.perl.org/IO/Zlib.html
use IO::Zlib;
# ....
# place your definition of $file_list2 and @contact_list here
# ...
$fh = new IO::Zlib;
$fh->open($file_list2, "rb")
    or die("Cannot open $file_list2");
@lines = <$fh>;
$fh->close;
# chomp(@lines);  # remove "\n" symbols from the lines
foreach $i ( @contact_list ) {
    print "$i\n";
    @ar = grep (/$i/, @lines);
    print (@ar);
    # print join("\n", @ar) . "\n";  # in case of using chomp
}
Your question leaves us guessing about many things, but a better overall approach would seem to be opening the file just once, and processing each line in Perl itself.
open(F, "zcat $file_list |") or die "$0: could not zcat: $!\n";
LINE:
while (<F>) {
    ######## FIXME: this could be optimized a great deal still
    foreach my $i (@contact_list) {
        if (m/$i/) {
            print $_;
            next LINE;
        }
    }
}
close (F);
If you want to squeeze more out of the inner loop, compile the regexes from @contact_list into a separate array before the loop, or perhaps combine them into a single regex if all you care about is whether one of them matched. If, on the other hand, you want to print all matches for one pattern only at the end when you know what they are, collect the matches into one array per search expression, then loop over them and print once you have grepped the whole set of input files.
Your problem is not reproducible without information about what's in $i, but I can guess that it contains some shell metacharacter which causes it to be processed by the shell before the grep runs.
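One defensive fix, whichever variant you use, is to stop $i from being interpreted as a pattern (or shell code) at all. A sketch using \Q...\E (quotemeta) so the contact is matched literally; the contact entry here is a hypothetical example:

```perl
use strict;
use warnings;

# hypothetical entry containing regex metacharacters (. and @)
my @contact_list = ('john.doe@example.com');
my @lines = ("mail john.doe\@example.com ok\n",
             "mail johnxdoe\@example_com no\n");

for my $i (@contact_list) {
    # \Q...\E disables metacharacter meaning, so '.' no longer matches any char
    my @hits = grep { /\Q$i\E/ } @lines;
    print @hits;  # only the first line matches
}
```

Without \Q...\E, the dots in the address act as wildcards and the second line would match too; it also avoids passing untrusted text to the shell via grep.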

comparing hash element to table element perl

I have a program that compares each line of two files, where each line contains one word. It simply reads the two files, stores the data into arrays, and compares the elements of the two arrays.
the first file contain:
straight
work
week
belief time
saturday
wagon
australia
sunday
french
...
and the second file contain
firepower
malaise
bryson
wagon
dalglish
french
...
this takes a long time to compare the files, so I proposed another solution, but it doesn't work:
#!/usr/bin/perl
use strict;
use warnings;
open( FIC, $ARGV[0] );
open( FICC, $ARGV[1] );
print "choose the name of the file\n";
chomp( my $fic2 = <STDIN> );
open( FIC2, ">$fic2" );
my $i = 0;
my $j = 0;
my @b = ();
my %stops;
while (<FIC>)  # read each line into $_
{
    # remove the newline from $_
    chomp;
    $_ =~ s/\s+$//;
    $stops{$_} = $i;  # add the line to the hash
    $i++;
}
close FIC;
while (<FICC>) {
    my $ligne = $_;
    $ligne =~ s/\s+$//;
    $b[$i] = lc($ligne);
    # @b contains the data
    $i++;
}
foreach my $che (@b) {
    chomp($che);
    print FIC2 $che;
    print FIC2 " ";
    print FIC2 $stops{"$che"};
    print FIC2 "\n";
    # this prints nothing
}
The problem is in $stops{"$che"}: when the element doesn't exist in the %stops hash, it returns undef and triggers a warning:
Use of uninitialized value in print at c:/ats2/hash.pl line 44, <FICC> line 185.
Is this what you want?
join <(sort file1) <(sort file2) >result
This works in bash.
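If you'd rather stay in Perl, the lookup mostly fails because the second file's lines are lowercased while the hash keys are not; a minimal sketch that normalizes both sides the same way and guards the lookup with exists (data inlined for illustration):

```perl
use strict;
use warnings;

my @file1 = ("straight\n", "wagon\n", "french\n");
my @file2 = ("Wagon\n",    "french\n", "malaise\n");

my %stops;
my $i = 0;
for (@file1) {
    chomp;
    s/\s+$//;
    $stops{ lc $_ } = $i++;   # normalize with lc on *both* sides
}

for my $ligne (@file2) {
    chomp $ligne;
    $ligne =~ s/\s+$//;
    my $che = lc $ligne;
    # exists avoids the "Use of uninitialized value" warning for misses
    print "$che ", (exists $stops{$che} ? $stops{$che} : '-'), "\n";
}
```

This prints `wagon 1`, `french 2`, and `malaise -`; hash lookup makes the comparison O(n) instead of comparing every pair of lines.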

Perl out of order diff between text files

I basically want to do an out-of-order diff between two text files (in CSV style) where I compare the fields in the first two columns (I don't care about the 3rd columns value). I then print out the values that file1.txt has but aren't present in file2.txt and vice-versa for file2.txt compared to file1.txt.
file1.txt:
cat,val 1,43432
cat,val 2,4342
dog,value,23
cat2,value,2222
hedgehog,input,233
file2.txt:
cat2,value,312
cat,val 2,11
cat,val 3,22
dog,value,23
hedgehog,input,2145
bird,output,9999
Output would be something like this:
file1.txt:
cat,val 1,43432
file2.txt:
cat,val 3,22
bird,output,9999
I'm new to Perl so some of the better, less ugly methods to achieve this are outside of my knowledge currently. Thanks for any help.
current code:
#!/usr/bin/perl -w
use Cwd;
use strict;
use Data::Dumper;
use Getopt::Long;
my $myName = 'MyDiff.pl';
my $usage = "$myName is blah blah blah";
# retrieve the command line options, set up the environment
use vars qw($file1 $file2);
# grab the specified values or exit the program
GetOptions("file1=s" => \$file1,
           "file2=s" => \$file2)
    or die $usage;
($file1 and $file2) or die $usage;
open (FH, "< $file1") or die "Can't open $file1 for read: $!";
my @array1 = <FH>;
close FH or die "Cannot close $file1: $!";
open (FH, "< $file2") or die "Can't open $file2 for read: $!";
my @array2 = <FH>;
close FH or die "Cannot close $file2: $!";
#...do a sort and match
Use a hash for this, with the first two columns as the key.
Once you have these two hashes, you can iterate and delete the common entries;
what remains in the respective hashes will be what you are looking for.
Initialize,
my %hash1 = ();
my %hash2 = ();
Read in the first file, join the first two columns to form the key, and save it in a hash. This assumes the fields are comma-separated; you could also use a CSV module for this.
open( my $fh1, "<", $file1 ) || die "Can't open $file1: $!";
while (my $line = <$fh1>) {
    chomp $line;
    # join the first two columns to form the key
    my $key = join ",", (split ",", $line)[0,1];
    # create the hash entry for file1
    $hash1{$key} = $line;
}
Do the same for file2 and create %hash2
open( my $fh2, "<", $file2 ) || die "Can't open $file2: $!";
while (my $line = <$fh2>) {
    chomp $line;
    # join the first two columns to form the key
    my $key = join ",", (split ",", $line)[0,1];
    # create the hash entry for file2
    $hash2{$key} = $line;
}
Now go over the entries and delete the common ones,
foreach my $key (keys %hash1) {
    if (exists $hash2{$key}) {
        # common entry, delete from both hashes
        delete $hash1{$key};
        delete $hash2{$key};
    }
}
%hash1 will now have lines which are only in file1.
You could print them as,
foreach my $key (keys %hash1) {
    print "$hash1{$key}\n";
}
foreach my $key (keys %hash2) {
    print "$hash2{$key}\n";
}
Perhaps the following will be helpful:
use strict;
use warnings;
my @files = @ARGV;
pop;
my %file1 = map { chomp; /(.+),/; $1 => $_ } <>;
push @ARGV, $files[1];
my %file2 = map { chomp; /(.+),/; $1 => $_ } <>;
print "$files[0]:\n";
print $file1{$_}, "\n" for grep !exists $file2{$_}, keys %file1;
print "\n$files[1]:\n";
print $file2{$_}, "\n" for grep !exists $file1{$_}, keys %file2;
Usage: perl script.pl file1.txt file2.txt
Output on your datasets:
file1.txt:
cat,val 1,43432
file2.txt:
cat,val 3,22
bird,output,9999
This builds a hash for each file. The keys are the first two columns and the associated values are the full lines. grep is used to filter the shared keys.
Edit: On relatively small files, using map as above to process the file's lines works fine. However, a list of all of the file's lines is first created and then passed to map. On larger files it may be better to use a while (<>) { ... } construct to read one line at a time. The code below does this, generating the same output as above, and uses a hash of hashes (HoH). Because it uses a HoH, you'll note some dereferencing:
use strict;
use warnings;
my %hash;
my @files = @ARGV;
while (<>) {
    chomp;
    $hash{$ARGV}{$1} = $_ if /(.+),/;
}
print "$files[0]:\n";
print $hash{ $files[0] }{$_}, "\n"
for grep !exists $hash{ $files[1] }{$_}, keys %{ $hash{ $files[0] } };
print "\n$files[1]:\n";
print $hash{ $files[1] }{$_}, "\n"
for grep !exists $hash{ $files[0] }{$_}, keys %{ $hash{ $files[1] } };
I think the above problem can be solved by either of the following algorithms:
a) Use a hash, as mentioned above.
b)
1. Sort both files on Key1 and Key2 (use the sort function).
2. Iterate through FILE1:
   - Match the Key1 and Key2 entry of FILE1 against FILE2.
   - If they match:
     print the common line to the desired file as required,
     then move to the next row in FILE1 (continue with the loop).
   - If not:
     iterate through FILE2, starting from POS-FILE2, until a match is found:
     - Match the Key1 and Key2 entry of FILE1 against FILE2.
     - If they match:
       print the common line to the desired file as required,
       set FILE2-END to true,
       and exit from the loop, noting the position of FILE2.
     - If not:
       print the unmatched line to the desired file as required
       and move to the next row in FILE2.
3. If FILE2-END is true, the rest of the lines in FILE1 do not exist in FILE2.
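Approach (b) can be sketched as a classic sorted-merge comparison; a minimal illustration with the rows inlined and pre-sorted on the (Key1, Key2) pair (in practice they would come from the sorted files):

```perl
use strict;
use warnings;

# rows as [key1, key2, value], pre-sorted on the (key1, key2) pair
my @f1 = ( ['cat','val 1',43432], ['cat','val 2',4342], ['dog','value',23] );
my @f2 = ( ['cat','val 2',11],    ['cat','val 3',22],   ['dog','value',23] );

my ($i, $j) = (0, 0);
while ($i < @f1 && $j < @f2) {
    my $k1 = "$f1[$i][0],$f1[$i][1]";
    my $k2 = "$f2[$j][0],$f2[$j][1]";
    if    ($k1 eq $k2)        { $i++; $j++ }                          # common line
    elsif (($k1 cmp $k2) < 0) { print "only in file1: $k1\n"; $i++ }
    else                      { print "only in file2: $k2\n"; $j++ }
}
# whichever file still has rows left holds lines absent from the other
print "only in file1: $f1[$_][0],$f1[$_][1]\n" for $i .. $#f1;
print "only in file2: $f2[$_][0],$f2[$_][1]\n" for $j .. $#f2;
```

On this sample data it prints `only in file1: cat,val 1` and `only in file2: cat,val 3`, matching the hash-based answer, but using O(1) extra memory once the files are sorted.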