Using command line to remove lines from text file - perl

I have a text file and need to remove all lines that DO NOT contain http in them. Alternatively, it could just output all the lines that DO contain http in them to a new file.
The name of my original file is list.txt and I need to generate a new file with a name like new.txt.
I know that there are several ways to do this via the command line, but what I'm really looking for is the quickest way, since I need to do this with several files and each of them is a few gigabytes in size.

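The quickest, shortest solution is a fixed-string grep, which prints only the lines that contain http:
fgrep "http" list.txt > new.txt
(Adding -v would invert the match and print the lines that do NOT contain http.) Of course, grep, egrep, awk, perl, etc. make this more flexible.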
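Here is a short shell script. Create a file "delhttp.sh" containing:
#!/bin/bash
# Keep only the lines that contain "http".
if [ $# -eq 0 ] ; then
    # No arguments: filter stdin to stdout
    fgrep "http"
elif [ $# -eq 1 ] ; then
    # One argument: filter the named file to stdout
    f1=${1:-"null"}
    if [ ! -f "$f1" ]; then echo "file $f1 does not exist"; exit 1; fi
    fgrep "http" "$f1"
elif [ $# -eq 2 ]; then
    # Two arguments: filter the first file into the second
    f1=${1:-"null"}
    if [ ! -f "$f1" ]; then echo "file $f1 does not exist"; exit 1; fi
    f2=${2:-"null"}
    fgrep "http" "$f1" > "$f2"
fi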
Then make this file executable using,
chmod +x delhttp.sh
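Here is a Perl script (if you prefer). Create a file "delhttp.pl" containing:
#!/usr/bin/env perl
use strict;
use warnings;
# Default to "-" so that stdin/stdout are used when an argument is missing.
my $f1=$ARGV[0]||"-";
my $f2=$ARGV[1]||"-";
my ($fh, $ofh);
# Two-argument open is deliberate here: "<-" opens STDIN and ">-" opens STDOUT.
open($fh,"<$f1") or die "file $f1 failed: $!";
open($ofh,">$f2") or die "file $f2 failed: $!";
# Print only the lines that contain "http".
while(<$fh>) { print $ofh $_ if /http/; }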
Again, make this file executable using,
chmod +x delhttp.pl

perl -i -lne 'print if(/http/)' your_file
The command above edits the file in place, deleting every line that does not contain http.
If you want to keep a backup of the original file, give -i a suffix such as ".bak":
perl -i.bak -lne 'print if(/http/)' your_file
This creates your_file.bak, a copy of the original file, and modifies your_file in place.
Also you can use awk:
awk '/http/' your_file
This prints to the console. Use '>' to store the output in a new file.

You could use grep, which prints the matching lines by default. (Using -v inverts the sense of matching, to select non-matching lines.)
grep 'http' list.txt > new.txt
Using Perl one-liner:
perl -ne '/http/ and print' list.txt > new.txt
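For several multi-gigabyte files, a loop like the one below might be a reasonable starting point. It is only a sketch: the sample file names and the .new suffix are made up for illustration, and any speed-up from LC_ALL=C depends on your grep build and locale, so measure on your own data.

```shell
# Hypothetical sample data standing in for the large list files.
workdir=$(mktemp -d)
printf 'http://a.example/\nplain text\nsee https://b.example\n' > "$workdir/list1.txt"
printf 'nothing here\nhttp://c.example/\n' > "$workdir/list2.txt"

# Keep only the lines containing the fixed string "http".
# -F treats the pattern as a plain string rather than a regex;
# LC_ALL=C avoids multibyte locale handling.
for f in "$workdir"/list*.txt; do
    LC_ALL=C grep -F 'http' "$f" > "${f%.txt}.new"
done

kept1=$(cat "$workdir/list1.new")
kept2=$(cat "$workdir/list2.new")
printf '%s\n---\n%s\n' "$kept1" "$kept2"
```

Each input is read once and written once, so the run time should be dominated by disk throughput rather than by the tool doing the matching.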

Related

reset state variables on next file in `perl -n` one-liner

I'm processing multiple files with find ... | xargs perl -ne and when I proceed to next file I need to reset some variables like gawk 'BEGINFILE {}' does.
As a workaround, I check that the current filename changed. Is there a cleaner way?
if ($oldARGV ne $ARGV) { $oldARGV = $ARGV; $var1=""; ... } ...
Use eof without parentheses (or eof ARGV); it is true at the end of each input file:
$ perl -nE 'say "Done with file $ARGV" if eof' *.txt
Done with file a.txt
Done with file b.txt

What is the difference between "perl -n" and "perl -p"?

What is the difference between the perl -n and perl -p options?
What is a simple example to demonstrate the difference?
How do you decide which one to use?
How do you decide which one to use?
You use -p if you want to automatically print the contents of $_ at the end of each iteration of the implied while loop. You use -n if you don't want to print $_ automatically.
An example of -p. Adding line numbers to a file:
$ perl -pe '$_ = "$.: $_"' your_file.txt
An example of -n. A basic grep replacement.
$ perl -ne 'print if /some search text/' your_file.txt
-p is short for -np, and it causes $_ to be printed for each pass of the loop created by -n.
perl -ne'...'
executes the following program:
LINE: while (<>) {
...
}
while
perl -pe'...'
executes the following program:
LINE: while (<>) {
...
}
continue {
die "-p destination: $!\n" unless print $_;
}
See perlrun for documentation about perl's command-line options.
What is the difference between the perl -n and perl -p options?
-p causes each line to be printed; equivalent to:
while (<>) { ... } continue { print }
-n does not automatically print each line; equivalent to:
while(<>) {...}
What is a simple example to demonstrate the difference?
e.g., replace foo with FOO:
$ echo 'foo bar' | perl -pe 's/foo/FOO/'
FOO bar
$ echo 'foo bar' | perl -ne 's/foo/FOO/'
$
How do you decide which one to use?
One example where -n is useful is when you don't want every line printed, and there is a conditional print in the code, e.g., only show lines containing foo:
$ echo -e 'foo\nbar\nanother foo' | perl -ne 'print if /foo/;'
foo
another foo
$
The command-line options are documented in perlrun documentation
perl -n is equivalent to while(<>){...}
perl -p is equivalent to while(<>){...;print;}

Executing grep via Perl

I am new to Perl. I am trying to execute grep command with perl.
I have to read input from a file and based on the input, the grep has to be executed.
My code is as follows:
#!/usr/bin/perl
use warnings;
use strict;
#Reading input files line by line
open FILE, "input.txt" or die $!;
my $lineno = 1;
while (<FILE>) {
print " $_";
#This is what expected.
#our $result=`grep -r Unable Satheesh > out.txt`;
our $result=`grep -r $_ Satheesh > out.txt`;
print $result
}
print "************************************************************\n";
But when I run the script, it looks like an infinite loop: the script keeps waiting and nothing is written to out.txt.
The reason it's hanging is because you forgot to use chomp after reading from FILE. So there's a newline at the end of $_, and it's executing two shell commands:
grep -r $_
Satheesh > out.txt
Since there's no filename argument to grep, it's reading from standard input, i.e. the terminal. If you type Ctrl-D when it hangs, you'll then get an error message telling you that there's no Satheesh command.
Also, since you're redirecting the output of grep to out.txt, nothing gets put in $result. If you want to capture the output in a variable and also put it into the file, you can use the tee command.
Here's the fix:
while (<FILE>) {
print " $_";
chomp;
#This is what expected.
#our $result=`grep -r Unable Satheesh > out.txt`;
our $result=`grep -r $_ Satheesh | tee out.txt`;
print $result
}
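If the goal is only to search the Satheesh directory for every line of input.txt, the same thing can be sketched directly in the shell. This is an assumption about the intent, not a replacement for the Perl fix above; the log files and their contents are invented for the demonstration.

```shell
# Hypothetical directory and pattern file, created in a scratch area.
workdir=$(mktemp -d)
mkdir "$workdir/Satheesh"
printf 'Unable to connect\nall good\n' > "$workdir/Satheesh/app.log"
printf 'Failed to start\n' > "$workdir/Satheesh/boot.log"
printf 'Unable\nFailed\n' > "$workdir/input.txt"

# One grep run per pattern, mirroring the Perl loop.
# read strips the trailing newline, so no chomp step is needed.
: > "$workdir/out.txt"
while IFS= read -r pattern; do
    grep -r -- "$pattern" "$workdir/Satheesh" >> "$workdir/out.txt"
done < "$workdir/input.txt"

# Single pass: -f reads one pattern per line from the file.
single_pass=$(grep -r -f "$workdir/input.txt" "$workdir/Satheesh" | sort)
echo "$single_pass"
```

The loop appends with >> so that later patterns do not overwrite earlier results. The -f form scans the directory once instead of once per pattern.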

System command in perl

I need to run a system command which would go to a directory and delete sub directories excluding files if present. I wrote the below command to perform this operation:
system("cd /home/faizan/test/cache ; for i in *\; do if [ -d \"$i\" ]\; then echo \$i fi done");
The command above keeps throwing syntax error. I have tried multiple combinations but still not clear how this should go. Please suggest.
Well, your command line does contain syntax errors. Try this:
system("cd /home/faizan/test/cache ; for i in *; do if [ -d \"\$i\" ]; then echo \$i; fi; done");
Or better yet, only loop over directories in the first place;
system("for i in /home/faizan/test/cache/*/.; do echo \$i; done");
Or better yet, do it without a loop:
system("echo /home/faizan/test/cache/*/.");
(I suppose you will want to rmdir instead of echo once it is properly debugged.)
Or better yet, do it all in Perl. There is nothing here which requires system().
You're still best off trying this as a bash command first. Formatting that properly makes it much clearer that you're missing statement terminators:
for i in *; do
if [ -d "$i" ]; then
echo $i
fi
done
And condensing that by replacing new lines with semicolons (apart from after do/then):
for i in *; do if [ -d "$i" ]; then echo $i; fi; done
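Once the echo version prints what you expect, the deletion itself could look like the sketch below. It runs against a scratch directory with invented contents rather than the real cache path. Note two assumptions: the */ glob skips hidden directories, and it also matches symlinks that point to directories.

```shell
# Hypothetical cache directory with two subdirectories and one plain file.
cache=$(mktemp -d)
mkdir "$cache/sub1" "$cache/sub2"
touch "$cache/keep.txt" "$cache/sub1/inner.txt"

# The trailing slash makes the glob match directories only.
for d in "$cache"/*/; do
    [ -d "$d" ] || continue   # the glob matched nothing
    rm -r -- "$d"
done

remaining=$(ls "$cache")
echo "$remaining"
```

Plain files at the top level are left alone because they never match the */ pattern.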
Or as has been mentioned, just do it in Perl (I haven't tested this to the point of actually uncommenting remove_tree - be careful!):
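use strict;
use warnings;
use File::Path 'remove_tree';
use feature 'say';
chdir '/tmp' or die "Cannot chdir: $!";
opendir my $cache, '.' or die "Cannot open directory: $!";
while (my $item = readdir($cache)) {
    # Skip the . and .. entries; act on directories only
    if ($item !~ /^\.\.?$/ && -d $item) {
        say "Deleting '$item'...";
        # remove_tree($item);
    }
}
closedir $cache;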
Using system
my @args = ("cd /home/faizan/test/cache ; for i in *; do if [ -d \"\$i\" ]; then echo \$i; fi; done");
system(@args);
Using Subroutine
sub do_stuff {
my @args = ( "bash", "-c", shift );
system(@args);
}
do_stuff("cd /home/faizan/test/cache ; for i in *; do if [ -d \"\$i\" ]; then echo \$i; fi; done");
As the question title is about the system command, this answers it directly, but the sample bash command only does things that are simpler in pure Perl (take a look at the other answer using opendir and -d).
If you want to use system (instead of open $cmdHandle,"bash -c ... |"), the preferred syntax for execution commands like system or exec is the list form, which passes each argument to the program directly instead of having a shell parse one command string.
Try this (as you've already done):
perl -e 'system("bash -c \"echo hello world\"")'
hello world
perl -e 'system "bash -c \"echo hello world\"";'
hello world
Better: the same command, but passing the arguments as a list so that no extra shell quoting is needed:
perl -e 'system "bash","-c","echo hello world";'
hello world
The system call clearly takes 3 arguments:
bash
-c
the script
or a little more:
perl -e 'system "bash","-c","echo hello world;date +\"Now it is %T\";";'
hello world
Now it is 11:43:44
As you can see in the last example, no extra pair of double quotes needs to enclose the bash script part of the command line.
Note: on the command line, using perl -e '...' or perl -e "...", juggling single and double quotes gets a little heavy. In a script, you can mix them:
system 'bash','-c','for ((i=10;i--;));do printf "Number: %2d\n" $i;done';
or even:
system 'bash','-c','for ((i=10;i--;));do'."\n".
'printf "Number: %2d\n" $i'."\n".
'done';
When the parts of the script string are concatenated with dots (.), system still receives exactly 3 arguments.

Dynamic Perl find and replace using grep inside backticks

I am trying to do a dynamic search and replace with Perl on the command line with part of the replacement text being the output of a grep command within backticks. Is this possible to do on the command line, or will I need to write a script to do this?
Here is the command that I thought would do the trick. I thought that Perl would treat the backticks as a command substitution, but instead it just treats the backticks and the content within them as a string:
perl -p -i -e 's/example.xml/http:\/\/exampleURL.net\/`grep -ril "example_needle" *`\/example\/path/g' `grep -ril "example_needle" *`
UPDATE:
Thanks for the helpful answers. Yes, there was a typo in my original one-liner: the target file of grep is supposed to be *.
I wrote a small script based on Schwern's example, but am getting confusing results. Here is the script I wrote:
#!/usr/bin/env perl -p -i
my $URL_First = "http://examplesite.net/some/path/";
my $URL_Last = "/example/example.xml";
my @files = `grep -ril $URL_Last .`;
chomp @files;
foreach my $val (@files) {
@dir_names = split('/',$val);
if(@dir_names[1] ne $0) {
my $url = $URL_First . @dir_names[1] . $URL_Last;
open INPUT, "+<$val" or die $!;
seek INPUT,0,0;
while(<INPUT>) {
$_ =~ s{\Q$URL_Last}{$url}g;
print INPUT $_;
}
close INPUT;
}
}
Basically what I am trying to do is:
Find files that contain $URL_Last.
Replace $URL_Last with $URL_First plus the name of the directory that the matched file is in, plus $URL_Last.
Write the above change to the input file without modifying anything else in the input file.
After running my script, it completely garbled the HTML code in the input file and it cut off the first few characters of each line in the file. This is strange, because I know for sure that $URL_Last only occurs once in each file, so it should only be matched once and replaced once. Is this being caused by a misuse of the seek function?
You should use another delimiter for s/// so that you don't need to escape slashes in the URL:
perl -p -i -e '
s#example.xml#http://exampleURL.net/`grep -ril "example_needle"`/example/path#g'
`grep -ril "example_needle" *`
Your grep command inside the regex will not be executed, as it is just a string, and backticks are not meta characters. Text inside a substitution will act as though it was inside a double quoted string. You'd need the /e flag to execute the shell command:
perl -p -i -e '
s#example.xml#
qq(http://exampleURL.net/) . `grep -ril "example_needle"` . qq(/example/path)
#ge'
`grep -ril "example_needle" *`
However, what exactly are you expecting that grep command to do? It lacks a target file. -l will print file names for matching files, and grep without a target file will use stdin, which I suspect will not work.
If it is a typo, and you meant to use the same grep as for your argument list, why not use @ARGV?
perl -p -i -e '
s#example.xml#http://exampleURL.net/@ARGV/example/path#g'
`grep -ril "example_needle" *`
This may or may not do what you expect, depending on whether you expect to have newlines in the string. I am not sure that argument list will be considered a list or a string.
It seems like what you're trying to do is...
Find a file in a tree which contains a given string.
Use that file to build a URL.
Replace something in a string with that URL.
You have three parts, and you could jam them together into one regex, but it's much easier to do it in three steps. You won't hate yourself in a week when you need to add to it.
The first step is to get the filenames.
# grep -r needs a directory to search, even if it's just the current one
my @files = `grep -ril $search .`;
# strip the newlines off the filenames
chomp @files;
Then you need to decide what to do if you get more than one file from grep. I'll leave that choice up to you, I'm just going to take the first one.
my $file = $files[0];
Then build the URL. Easy enough...
# Put it in a variable so it can be configured
my $Site_URL = "http://www.example.com/";
my $url = $Site_URL . $file;
To do anything more complicated, you'd use URI.
Now the search and replace is trivial.
# The \Q means meta-characters like . are ignored. Better than
# remembering to escape them all.
$whatever =~ s{\Qexample.xml}{$url}g;
You want to edit files using -p and -i. Fortunately we can emulate that functionality.
#!/usr/bin/env perl
use strict;
use warnings; # never do without these
my $Site_URL = "http://www.example.com/";
my $Search = "example-search";
my $To_Replace = "example.xml";
# Set $^I to edit files. With no argument, just show the output
# script.pl .bak # saves backup with ".bak" extension
$^I = shift;
my @files = `grep -ril $Search .`;
chomp @files;
my $file = $files[0];
my $url = $Site_URL . $file;
@ARGV = ($files[0]); # set the file up for editing
while (<>) {
    s{\Q$To_Replace}{$url}g;
    print; # -p prints each line; with $^I set, this writes to the edited file
}
Everyone's answers were very helpful to my writing a script that wound up working for me. I actually found a bash script solution yesterday, but wanted to post a Perl answer in case anyone else finds this question through Google.
The script that @TLP posted at http://codepad.org/BFpIwVtz is an alternative way of doing this.
Here is what I ended up writing:
#!/usr/bin/perl
use Tie::File;
my $URL_First = 'http://example.com/foo/bar/';
my $Search = 'path/example.xml';
my $URL_Last = '/path/example.xml';
# This grep returns a list of files containing "path/example.xml"
my @files = `grep -ril $Search .`;
chomp @files;
foreach my $File_To_Edit (@files) {
    # The output of $File_To_Edit looks like this: "./some_path/index.html"
    # I only need the "some_path" part, so I'm going to split up the output and only use @output[1] ("some_path")
    @output = split('/',$File_To_Edit);
    # "some_path" is the parent directory of "index.html", so I'll call this "$Parent_Dir"
    my $Parent_Dir = @output[1];
    # Make sure that we don't edit the contents of this script by checking that $Parent_Dir doesn't equal our script's file name.
    if($Parent_Dir ne $0) {
        # The $File_To_Edit is "./some_path/index.html"
        tie @lines, 'Tie::File', $File_To_Edit or die "Can't read file: $!\n";
        foreach(@lines) {
            # Finally replace "path/example.xml" with "http://example.com/foo/bar/some_path/path/example.xml" in the $File_To_Edit
            s{$Search}{$URL_First$Parent_Dir$URL_Last}g;
        }
        untie @lines;
    }
}