File comparison with multiple columns - perl

I am doing a directory cleanup to check for files that are not being used in our testing environment. I have a list of all the file names which are sorted alphabetically in a text file and another file I want to compare against.
Here is how the first file is set up:
test1.pl
test2.pl
test3.pl
It is a simple text file, one script name per line, listing all the scripts in the directory I want to clean up based on the other file below.
The file I want to compare against is a tab-separated file that lists the scripts each server runs as tests, and there are obviously many duplicates. I want to strip the testing script names out of this file, write them to another file, and run sort and uniq on it so that I can diff the result against the list above to see which testing scripts are not being used.
The file is set up like this:
server: : test1.pl test2.pl test3.pl test4.sh test5.sh
Some lines have fewer entries and some have more. My first impulse was to write a Perl script that splits each line and pushes the values onto a list if they are not already there, but that seems wholly inefficient. I am not too experienced in awk, but I figured there is more than one way to do it. Any other ideas for comparing these files?

A Perl solution that makes a %needed hash of the files being used by the servers and then checks against the file containing all the file names.
#!/usr/bin/perl
use strict;
use warnings;
use Inline::Files;
my %needed;
while (<SERVTEST>) {
chomp;
my (undef, @files) = split /\t/;
@needed{ @files } = (1) x @files;
}
while (<TESTFILES>) {
chomp;
if (not $needed{$_}) {
print "Not needed: $_\n";
}
}
__TESTFILES__
test1.pl
test2.pl
test3.pl
test4.pl
test5.pl
__SERVTEST__
server1:: test1.pl test3.pl
server2:: test2.pl test3.pl
__END__
*** prints
C:\Old_Data\perlp>perl t7.pl
Not needed: test4.pl
Not needed: test5.pl

This uses awk to rearrange the filenames in the second file to one per line, then diffs the sorted, de-duplicated output against the first file.
diff file1 <(awk '{ for (i=3; i<=NF; i++) print $i }' file2 | sort -u)

Quick and dirty script to do the job. If it sounds good, use open to read the files with proper error checking.
use strict;
use warnings;
my @server_lines = `cat server_file`;chomp(@server_lines);
my @test_file_lines = `cat test_file_lines`;chomp(@test_file_lines);
foreach my $server_line (@server_lines){
$server_line =~ s!server: : !!is;
my @files_to_check = split(/\s+/is, $server_line);
foreach my $file_to_check (@files_to_check){
my @found = grep { /$file_to_check/ } @test_file_lines;
if (scalar(@found)==0){
print "$file_to_check is not found in $server_line\n";
}
}
}

If I understand your need correctly you have a file with a list of tests (testfiles.txt):
test1.pl
test2.pl
test3.pl
test4.pl
test5.pl
And a file with a list of servers, with files they all test (serverlist.txt):
server1: : test1.pl test3.pl
server2: : test2.pl test3.pl
(I have assumed that all the spaces are actually tabs.)
If you convert the second file into a list of tested files, you can then compare this using diff to your original file.
cut -d: -f3 serverlist.txt | sed -e 's/^\t//g' | tr '\t' '\n' | sort -u > tested_files.txt
The cut removes the server name and ':', the sed removes the leading tab left behind, tr then converts the remaining tabs into newlines, then we do a unique sort to sort and remove duplicates. This is output to tested_files.txt.
Then all you do is diff testfiles.txt tested_files.txt.
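For comparison, the same idea can be sketched in Perl without the intermediate file. This is only an illustration: the sub name unused_scripts is made up here, and it assumes that fields are separated by whitespace and that the first two fields of each server line are the server name and the lone colon, as in the sample above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the names in $all that no server line mentions.
# $all          - array ref, one script name per element
# $server_lines - array ref, lines like "server1: : test1.pl test3.pl"
sub unused_scripts {
    my ($all, $server_lines) = @_;
    my %used;
    for my $line (@$server_lines) {
        # Assumption: whitespace-separated fields; skip "server1:" and ":".
        my (undef, undef, @scripts) = split ' ', $line;
        $used{$_} = 1 for @scripts;
    }
    return grep { !$used{$_} } @$all;
}

my @all     = qw(test1.pl test2.pl test3.pl test4.pl test5.pl);
my @servers = ("server1: :\ttest1.pl\ttest3.pl", "server2: :\ttest2.pl\ttest3.pl");
print "Not used: $_\n" for unused_scripts(\@all, \@servers);
```

With the sample data this lists test4.pl and test5.pl. The hash lookup keeps the work proportional to the size of the two files, so duplicates in the server file cost nothing extra.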

It's hard to tell since you didn't post the expected output but is this what you're looking for?
$ cat file1
test1.pl
test2.pl
test3.pl
$
$ cat file2
server: : test1.pl test2.pl test3.pl test4.sh test5.sh
$
$ gawk -v RS='[[:space:]]+' 'NR==FNR{f[$0]++;next} FNR>2 && !f[$0]' file1 file2
test4.sh
test5.sh

Related

how to use system command 'grep' in perl script

I am trying to count matches using the grep command in a Perl script. The script below counts across the whole directory; my desired output should contain only the count for the input file, not the whole directory. Can someone help me do this?
#!/usr/bin/perl
use strict;
print"Enter file name for Unzip\n";
print"File name: ";
chomp(my $Filename=<>);
system("gunzip -r ./$Filename/*\n");
system('grep -c "@SRR" ./$Filename/*');
This gives the count for the whole directory.
#!/usr/bin/perl
use strict;
print"Enter file name for Unzip\n";
print"File name: ";
chomp(my $Filename=<>);
system("gunzip -r ./$Filename\*");
system("grep -c '\@SRR' ./$Filename\*");
Please let me know if I have misunderstood the question. The code above gives the number of lines matching @SRR for the zipped file name provided.
Also, you don't need to unzip in order to count; you can do this directly:
system("zgrep -c '\@SRR' $Filename")
instead of
system("gunzip -r ./$Filename\*");
system("grep -c '\@SRR' ./$Filename\*");
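If you would rather have the count in a Perl variable than printed by the shell, the counting itself is a short loop. The sketch below is only an illustration: count_matching is a made-up name, and the demo reads from an in-memory string rather than a real FASTQ file.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines on an open filehandle that contain
# $needle as a literal substring (roughly what grep -c -F reports).
sub count_matching {
    my ($fh, $needle) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if index($line, $needle) >= 0;
    }
    return $count;
}

# Demo on an in-memory file. For a real .gz file one could instead open a
# pipe, e.g. open(my $fh, '-|', 'gunzip', '-c', $path), assuming gunzip is
# installed; the list form avoids shell quoting problems with the file name.
my $data = "\@SRR001.1 read\nACGT\n+\nIIII\n\@SRR001.2 read\nTTGA\n+\nIIII\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
print count_matching($fh, '@SRR'), "\n";
close $fh;
```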
my $var=`cat filename | grep "your word"`;
Thanks,
nilesh.

Perl deleting "blank" lines from a csv file

I'm looking to delete blank lines in a CSV file, using Perl.
I'm not too sure how to do this, as these lines aren't exactly "blank" (they're just a bunch of commas).
I'd also like to save the output as a file of the same name, overwriting the original.
How could I go about doing this?
edit: I can't use modules or any source code due to network restrictions...
You can do this using a simple Perl one-liner:
perl -i -ne 'print unless /^[,\s]*$/' <filename>
The -n flag assumes this loop around your program:
while(<>) {
print unless /^[,\s]*$/;
}
and the -i flag means inplace and modifies your input file.
Note: If you are worried about losing your data with -i, you can specify -i.bak and perl will automatically write the original file to your <filename>.bak
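If you prefer a script file to a one-liner, the same thing can be written out using only core Perl (no modules). This is a sketch under the assumption that the files to clean are passed on the command line; is_blank_csv_line is a name made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true for lines holding only commas and whitespace.
sub is_blank_csv_line {
    my ($line) = @_;
    return $line =~ /^[,\s]*$/ ? 1 : 0;
}

# Edit the files named on the command line in place, keeping a .bak copy.
# With no file arguments, nothing is edited.
if (@ARGV) {
    local $^I = '.bak';    # same effect as -i.bak on the command line
    while (my $line = <>) {
        print $line unless is_blank_csv_line($line);
    }
}
```

Setting $^I is what turns on in-place editing for the <> loop; while it is set, print writes to the file being rewritten instead of the terminal.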
More of a command line hack,
perl -i -ne 'print if /[^,\r\n]/' file.csv
If you want to put it inside a shell script you can do this ...
#!/bin/sh
perl -i -n -e 'print unless /^,+$/' "$@"

Dynamic Perl find and replace using grep inside backticks

I am trying to do a dynamic search and replace with Perl on the command line with part of the replacement text being the output of a grep command within backticks. Is this possible to do on the command line, or will I need to write a script to do this?
Here is the command that I thought would do the trick. I thought that Perl would treat the backticks as a command substitution, but instead it just treats the backticks and the content within them as a string:
perl -p -i -e 's/example.xml/http:\/\/exampleURL.net\/`grep -ril "example_needle" *`\/example\/path/g' `grep -ril "example_needle" *`
UPDATE:
Thanks for the helpful answers. Yes, there was a typo in my original one-liner: the target file of grep is supposed to be *.
I wrote a small script based on Schwern's example, but am getting confusing results. Here is the script I wrote:
#!/usr/bin/env perl -p -i
my $URL_First = "http://examplesite.net/some/path/";
my $URL_Last = "/example/example.xml";
my @files = `grep -ril $URL_Last .`;
chomp @files;
foreach my $val (@files) {
@dir_names = split('/',$val);
if(@dir_names[1] ne $0) {
my $url = $URL_First . @dir_names[1] . $URL_Last;
open INPUT, "+<$val" or die $!;
seek INPUT,0,0;
while(<INPUT>) {
$_ =~ s{\Q$URL_Last}{$url}g;
print INPUT $_;
}
close INPUT;
}
}
Basically what I am trying to do is:
Find files that contain $URL_Last.
Replace $URL_Last with $URL_First plus the name of the directory that the matched file is in, plus $URL_Last.
Write the above change to the input file without modifying anything else in the input file.
After running my script, it completely garbled the HTML code in the input file and it cut off the first few characters of each line in the file. This is strange, because I know for sure that $URL_Last only occurs once in each file, so it should only be matched once and replaced once. Is this being caused by a misuse of the seek function?
You should use another delimiter for s/// so that you don't need to escape slashes in the URL:
perl -p -i -e '
s#example.xml#http://exampleURL.net/`grep -ril "example_needle"`/example/path#g' \
`grep -ril "example_needle" *`
Your grep command inside the regex will not be executed, as it is just a string, and backticks are not meta characters. Text inside a substitution will act as though it was inside a double quoted string. You'd need the /e flag to execute the shell command:
perl -p -i -e '
s#example.xml#
qq(http://exampleURL.net/) . `grep -ril "example_needle"` . qq(/example/path)
#ge' \
`grep -ril "example_needle" *`
However, what exactly are you expecting that grep command to do? It lacks a target file. -l will print file names for matching files, and grep without a target file will use stdin, which I suspect will not work.
If it is a typo, and you meant to use the same grep as for your argument list, why not use @ARGV?
perl -p -i -e '
s#example.xml#http://exampleURL.net/@ARGV/example/path#g' \
`grep -ril "example_needle" *`
This may or may not do what you expect, depending on whether you expect to have newlines in the string. I am not sure whether the argument list will be treated as a list or as a string.
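On the list-or-string doubt: the replacement part of a substitution is interpolated like a double-quoted string, so an array placed there is joined into one string, with its elements separated by the value of $" (a space by default). Under -p or -n, the name of the file currently being read is available in $ARGV. A small sketch, using made-up file names:

```perl
use strict;
use warnings;

my @names = ('a/index.html', 'b/index.html');   # hypothetical file names
my $text  = 'see example.xml';

# The array is interpolated as one string, elements joined by $" (a space).
(my $joined = $text) =~ s#example\.xml#http://exampleURL.net/@names/example/path#;
print "$joined\n";
# prints: see http://exampleURL.net/a/index.html b/index.html/example/path
```

So with more than one file in the list, every name ends up in the URL, which is unlikely to be what is wanted.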
It seems like what you're trying to do is...
Find a file in a tree which contains a given string.
Use that file to build a URL.
Replace something in a string with that URL.
You have three parts, and you could jam them together into one regex, but it's much easier to do it in three steps. You won't hate yourself in a week when you need to add to it.
The first step is to get the filenames.
# grep -r needs a directory to search, even if it's just the current one
my @files = `grep -ril $search .`;
# strip the newlines off the filenames
chomp @files;
Then you need to decide what to do if you get more than one file from grep. I'll leave that choice up to you, I'm just going to take the first one.
my $file = $files[0];
Then build the URL. Easy enough...
# Put it in a variable so it can be configured
my $Site_URL = "http://www.example.com/";
my $url = $Site_URL . $file;
To do anything more complicated, you'd use URI.
Now the search and replace is trivial.
# The \Q means meta-characters like . are ignored. Better than
# remembering to escape them all.
$whatever =~ s{\Qexample.xml}{$url}g;
You want to edit files using -p and -i. Fortunately we can emulate that functionality.
#!/usr/bin/env perl
use strict;
use warnings; # never do without these
my $Site_URL = "http://www.example.com/";
my $Search = "example-search";
my $To_Replace = "example.xml";
# Set $^I to edit files. With no argument, just show the output
# script.pl .bak # saves backup with ".bak" extension
$^I = shift;
my @files = `grep -ril $Search .`;
chomp @files;
my $file = $files[0];
my $url = $Site_URL . $file;
@ARGV = ($files[0]); # set the file up for editing
while (<>) {
s{\Q$To_Replace}{$url}g;
print; # write the line back out, as -p would
}
Everyone's answers were very helpful to my writing a script that wound up working for me. I actually found a bash script solution yesterday, but wanted to post a Perl answer in case anyone else finds this question through Google.
The script that @TLP posted at http://codepad.org/BFpIwVtz is an alternative way of doing this.
Here is what I ended up writing:
#!/usr/bin/perl
use Tie::File;
my $URL_First = 'http://example.com/foo/bar/';
my $Search = 'path/example.xml';
my $URL_Last = '/path/example.xml';
# This grep returns a list of files containing "path/example.xml"
my @files = `grep -ril $Search .`;
chomp @files;
foreach my $File_To_Edit (@files) {
# The output of $File_To_Edit looks like this: "./some_path/index.html"
# I only need the "some_path" part, so I'm going to split up the output and only use @output[1] ("some_path")
@output = split('/',$File_To_Edit);
# "some_path" is the parent directory of "index.html", so I'll call this "$Parent_Dir"
my $Parent_Dir = @output[1];
# Make sure that we don't edit the contents of this script by checking that $Parent_Dir doesn't equal our script's file name.
if($Parent_Dir ne $0) {
# The $File_To_Edit is "./some_path/index.html"
tie @lines, 'Tie::File', $File_To_Edit or die "Can't read file: $!\n";
foreach(@lines) {
# Finally replace "path/example.xml" with "http://example.com/foo/bar/some_path/path/example.xml" in the $File_To_Edit
s{$Search}{$URL_First$Parent_Dir$URL_Last}g;
}
untie @lines;
}
}

Only print matching lines in perl from the command line

I'm trying to extract all ip addresses from a file. So far, I'm just using
cat foo.txt | perl -pe 's/.*?((\d{1,3}\.){3}\d{1,3}).*/\1/'
but this also prints lines that don't contain a match. I can fix this by piping through grep, but this seems like it ought to be unnecessary, and could lead to errors if the regexes don't match up perfectly.
Is there a simpler way to accomplish this?
Try this:
cat foo.txt | perl -ne 'print if s/.*?((\d{1,3}\.){3}\d{1,3}).*/\1/'
or:
<foo.txt perl -ne 'print if s/.*?((\d{1,3}\.){3}\d{1,3}).*/\1/'
It's the shortest alternative I can think of while still using Perl.
However this way might be more correct:
<foo.txt perl -ne 'if (/((\d{1,3}\.){3}\d{1,3})/) { print $1 . "\n" }'
If you've got grep, then just call grep directly:
grep -Po "(\d{1,3}\.){3}\d{1,3}" foo.txt
You've already got a suitable answer of using grep to extract the IP addresses, but just to explain why you were seeing non-matches being printed:
perldoc perlrun will tell you about all the options you can pass Perl on the command line.
Quoting from it:
-p causes Perl to assume the following loop around your program, which makes it
iterate over filename arguments somewhat like sed:
LINE:
while (<>) {
... # your program goes here
} continue {
print or die "-p destination: $!\n";
}
You could have used the -n switch instead, which does something similar but does not print automatically, for example:
cat foo.txt | perl -ne '/((?:\d{1,3}\.){3}\d{1,3})/ and print "$1\n"'
Also, there's no need to use cat; Perl will open and read the filenames you give it, so you could say e.g.:
perl -ne '/((?:\d{1,3}\.){3}\d{1,3})/ and print "$1\n"' foo.txt
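The one-liners above print at most one address per line. If a line can hold several, a global match in list context returns all of them. The sketch below is only an illustration: find_ips is a made-up name, and the pattern only checks the dotted-quad shape, not that each number is in the range 0-255.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every dotted-quad-looking token in a string.
sub find_ips {
    my ($text) = @_;
    # With /g in list context, the match returns every capture it finds.
    return $text =~ /\b((?:\d{1,3}\.){3}\d{1,3})\b/g;
}

print "$_\n" for find_ips("from 10.0.0.1 to 192.168.1.20, then nothing");
```

For the sample string this prints 10.0.0.1 and 192.168.1.20 on separate lines.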
ruby -0777 -ne 'puts $_.scan(/((?:\d{1,3}\.){3}\d{1,3})/)' file

How can I delete a line in file if the line matched the required PATH, in Perl?

My goal is to delete a line from a file only if it contains a given PATH.
For example, I need to delete all lines that contain the PATH /etc/sysconfig from the file /tmp/file:
more /tmp/file
/etc/sysconfig/network-scripts/ifcfg-lo file1
/etc/sysconfig/network-scripts/ifcfg-lo file2
/etc/sysconfig/network-scripts/ifcfg-lo file3
I wrote the following Perl code (it is integrated into my bash script) in order to delete the lines that contain "/etc/sysconfig":
export FILE=/etc/sysconfig
perl -i -pe 's/\Q$ENV{FILE}\E// ' /tmp/file
But I get the following after I run the Perl code (instead of the lines being removed):
/network-scripts/ifcfg-lo file1
/network-scripts/ifcfg-lo file2
/network-scripts/ifcfg-lo file3
first question:
How can I change the Perl command perl -i -pe 's/\Q$ENV{FILE}\E//' so that it deletes every line that matches the required PATH (/etc/sysconfig)?
second question:
The same as the first question, but the line should be deleted only if the PATH matches the first field of the line.
Example:
/tmp/file before perl edit:
file1 /etc/sysconfig/network-scripts/ifcfg-lo
/etc/sysconfig/network-scripts/ifcfg-lo file2
/etc/sysconfig/network-scripts/ifcfg-lo file3
/tmp/file after perl edit:
file1 /etc/sysconfig/network-scripts/ifcfg-lo
Perl is a fine way to do it. Use the -n switch, not -p.
perl -i -l -n -e'print unless /\Q$ENV{FILE}/' filename
s/pattern/otherpattern/ won't delete entire lines; it will only alter substrings. You need to entirely change your program to delete entire lines. In pseudocode, it would be:
while (read in a line)
{
if (doesn't match)
{
write the line back out unaltered.
}
}
It can still be rewritten as a one-liner though, with knowledge of how continue and redo work in loops: perl -pe '$_ = <> and redo if /\Q$ENV{FILE}\E/'
mef@iwlappy:~$ cat /tmp/file
aaaa
/etc/sysconfig/network-scripts/ifcfg-lofile1
/etc/sysconfig/network-scripts/ifcfg-lofile2
/etc/sysconfig/network-scripts/ifcfg-lofile3
aaa
mef@iwlappy:~$ perl -i -pe 's/\Q$ENV{FILE}\E.*//' /tmp/file
mef@iwlappy:~$ cat /tmp/file
aaaa
aaa
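Note that each matching line is emptied rather than removed, because .* does not consume the newline. A further s/^$// will not remove those empty lines; to delete them completely, include the newline in the match, for example s/\Q$ENV{FILE}\E.*\n//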
If I were doing this from the command line, I probably wouldn't even use Perl. I'd just use a negated grep:
$ mv old.txt old.bak; grep -v "$FILE" old.bak > old.txt
Renaming the original file and writing to a new file with the old name is the same thing that perl's -i switch does for you.
If you want to match just the first column, then I might punt to perl so I don't have to use awk or cut. perl's -a switch splits the line on whitespace and puts the results in @F:
$ perl -ai.bak -ne 'print if $F[0] !~ /^\Q$ENV{FILE}/' old.txt
When you think you have it right, you can remove the .bak training wheels that save a copy of your original file. Or not. I tend to like the safety net.
See perlrun for the details of command-line switches.
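The first-column test can also be written out as a small script, which makes it easier to try on sample lines before touching the real file. This is a sketch, not a drop-in replacement: keep_line is a made-up name, and the path is hard-coded here rather than read from $ENV{FILE}.

```perl
use strict;
use warnings;

# Hypothetical helper: keep a line unless its first whitespace-separated
# field starts with $path (compared literally, thanks to \Q...\E).
sub keep_line {
    my ($line, $path) = @_;
    my ($first) = split ' ', $line;
    return 1 unless defined $first;    # blank line: nothing to compare
    return $first =~ /^\Q$path\E/ ? 0 : 1;
}

my $path  = '/etc/sysconfig';
my @lines = (
    "file1 /etc/sysconfig/network-scripts/ifcfg-lo\n",
    "/etc/sysconfig/network-scripts/ifcfg-lo file2\n",
    "/etc/sysconfig/network-scripts/ifcfg-lo file3\n",
);
print grep { keep_line($_, $path) } @lines;
```

Only the file1 line survives, which matches the "after" listing in the question.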