Perl: Delete multiple lines from text file having a specific string - perl

I have a text file with data in the format shown below:
#rectype='ABC' #recname='123' #rec_id='1K2j' etc...
#rectype='DEF' #recname='matin' #rec_id='458i' etc...
#rectype='ABC' #recname='John' #rec_id='lom0' etc...
#rectype='GHI' #recname='Kalme, #rec_id='pl90' etc...
#rectype='KLM' #recname='Kitty' #rec_id='987k' etc...
#rectype='ABC' #recname='OMR' #rec_id='lo09' etc...
Now I have to delete all the lines containing #rectype='ABC'. There are multiple lines of this kind in the input file. It is fairly urgent, and as I am not a Perl coder I am finding it difficult to figure out how to do this.
Please suggest a solution.
NOTE: I need to make the changes in the input file itself. I don't want to create a separate output file.

You don't need to do it in Perl. You can use the grep tool.
grep -v "#rectype='ABC'" input_file > output_file
grep -v means "Print every line that does not match this expression."

perl -i -ne 'print if !/\#rectype = \047ABC\047/x' text_file

#!/usr/bin/perl
use warnings;
use strict;
use File::Slurp;
my $output = 'output.txt';
open my $outfile, '>', $output or die "Can't write to $output: $!";
my @array = read_file('input.txt');
for (@array){
next if ($_ =~ /^\#rectype='ABC'/);
print $outfile $_ ;
}
Output (saved to 'output.txt'):
#rectype='DEF' #recname='matin' #rec_id='458i' etc...
#rectype='GHI' #recname='Kalme, #rec_id='pl90' etc...
#rectype='KLM' #recname='Kitty' #rec_id='987k' etc...
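The script above writes to output.txt, while the question asks for the input file itself to be changed. A minimal sketch of doing the same filtering in place from a script, using only core Perl: the helper name delete_matching_lines and the ".tmp" scratch-file suffix are assumptions made for illustration, not part of any answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every line of $file that matches $pattern,
# rewriting the file itself by way of a temporary copy.
sub delete_matching_lines {
    my ($file, $pattern) = @_;
    my $tmp = "$file.tmp";    # assumed scratch name next to the original

    open my $in,  '<', $file or die "Can't read $file: $!";
    open my $out, '>', $tmp  or die "Can't write $tmp: $!";
    while (my $line = <$in>) {
        next if $line =~ $pattern;    # skip the unwanted records
        print {$out} $line;
    }
    close $in;
    close $out or die "Can't close $tmp: $!";

    # Replace the original only after the copy is complete.
    rename $tmp, $file or die "Can't rename $tmp to $file: $!";
    return;
}
```

It could be called as delete_matching_lines('input.txt', qr/^#rectype='ABC'/); the original is only replaced once the filtered copy has been written completely.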

How to delete a pattern matching and the rest of the line in a file using Perl

I'm new to Perl programming and I would like to find a pattern in a file and delete it along with the rest of the line. For example,
"input file"
>hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p
UGAGGUAGUAGGUUGUAUAGUU
>hsa-let-7a-3p MIMAT0004481 Homo sapiens let-7a-3p
CUAUACAAUCUACUGUCUUUC
>hsa-let-7a-2-3p MIMAT0010195 Homo sapiens let-7a-2-3p
CUGUACAGCCUCCUAGCUUUCC
>hsa-let-7b-5p MIMAT0000063 Homo sapiens let-7b-5p
UGAGGUAGUAGGUUGUGUGGUU
"desired output file"
>hsa-let-7a-5p MIMAT0000062
UGAGGUAGUAGGUUGUAUAGUU
>hsa-let-7a-3p MIMAT0004481
CUAUACAAUCUACUGUCUUUC
>hsa-let-7a-2-3p MIMAT0010195
CUGUACAGCCUCCUAGCUUUCC
>hsa-let-7b-5p MIMAT0000063
UGAGGUAGUAGGUUGUGUGGUU
I want to find the string "Homo sapiens" and delete it as well as the rest of the line.
I wrote the following code, but it does not work:
#!/usr/bin/perl
use strict;
use warnings;
my $find = "Homo sapiens"; #string for searching
open (FILE1, "input.fasta") || die "Cannot open the file!"; #open for reading
open (FILE2, ">>output.fasta") || die "Cannot open the file!"; #open for writing
while (my $line = <FILE1>){
if ($line =~ /$find/){
print FILE2 $line;
print FILE2 scalar <FILE1>;
}
}
close(FILE1);
close(FILE2);
exit;
Thanks
The majority of the Linux world has a fascination with one-line programs, so here is a one-line solution that does as you ask
perl -pe's/\s*Homo Sapiens.*//i' input.txt
It will make the changes that you describe and send the result to STDOUT.
If you want to write the altered text to a new file then simply redirect the output, with something like
perl -pe's/\s*Homo Sapiens.*//i' input.txt > fixed.txt
output
>hsa-let-7a-5p MIMAT0000062
UGAGGUAGUAGGUUGUAUAGUU
>hsa-let-7a-3p MIMAT0004481
CUAUACAAUCUACUGUCUUUC
>hsa-let-7a-2-3p MIMAT0010195
CUGUACAGCCUCCUAGCUUUCC
>hsa-let-7b-5p MIMAT0000063
UGAGGUAGUAGGUUGUGUGGUU
If you are not one of those people and you need help to write the equivalent Perl program then please ask.
Update
An equivalent program would look like this. I've called it sapiens.pl. You would run it from the command line with the input file as a parameter, such as
sapiens.pl input.txt > fixed.txt
#!/usr/bin/perl
use strict;
use warnings;
my $remove = 'Homo sapiens';
while (<>) {
s/\s*$remove.*//i;
print;
}
I would replace your while loop with the following.
while (<FILE1>){
s/$find.*//;
print FILE2 $_;
}
I loaded the line into the default variable $_ by not assigning it to any other variable, and then applied the substitution operator to it. The substitution replaces the contents of your variable $find, and any characters that come after it on the line, with the empty string. We don't need to check whether the substitution worked. If it did, then we removed the unwanted characters; if not, then we want the entire line anyway.
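Because $find is interpolated into the pattern, any regex metacharacters in it would be treated as regex syntax. A small sketch of the same substitution with the search string quoted by \Q...\E; the helper name strip_from is hypothetical, and the leading \s* (borrowed from the one-liner above) also removes the space in front of the match.

```perl
use strict;
use warnings;

# Hypothetical helper: cut $find and everything after it from one line.
# "." does not match a newline, so the line ending is kept.
sub strip_from {
    my ($line, $find) = @_;
    $line =~ s/\s*\Q$find\E.*//;    # \Q...\E makes $find match literally
    return $line;
}
```

A line that does not contain the search string is returned unchanged, so no separate check is needed before printing.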

How to print result STDOUT to a temporary blank new file in the same directory in Perl?

I'm new to Perl, so this may be a very basic case that I still can't understand.
Case:
The program asks the user to type a file name.
The user types the file name (one or more files).
The program reads the content of the input file(s).
If a single file is given, it just prints the entire content of that file.
If multiple files are given, it combines the contents of the files in sequence.
It then prints the result to a temporary new file located in the same directory as program.pl.
file1.txt:
head
a
b
end
file2.txt:
head
c
d
e
f
end
SINGLE INPUT program ioSingle.pl:
#!/usr/bin/perl
print "File name: ";
$userinput = <STDIN>; chomp ($userinput);
#read content from input file
open ("FILEINPUT", $userinput) or die ("can't open file");
# print the content for as long as there is any left in the file
while (<FILEINPUT>) {
print ; }
close FILEINPUT;
SINGLE RESULT in cmd:
>perl ioSingle.pl
File name: file1.txt
head
a
b
end
I found tutorial code that combines the content of multiple input files, but I cannot adapt its while condition to the code above:
while ($userinput = <>) {
print ($userinput);
}
I am stuck at making it work for multiple input files.
How should I restructure the code so that my program gives a result like this?
EXPECTED MULTIFILES RESULT in cmd:
>perl ioMulti.pl
File name: file1.txt file2.txt
head
a
b
end
head
c
d
e
f
end
I appreciate your response :)
A good way to start working on a problem like this, is to break it down into smaller sections.
Your problem seems to break down to this:
get a list of filenames
for each file in the list
display the file contents
So think about writing subroutines that do each of these tasks. You already have something like a subroutine to display the contents of the file.
sub display_file_contents {
# filename is the first (and only) argument to the sub
my $filename = shift;
# Use lexical filehandle and three-arg open
open my $filehandle, '<', $filename or die $!;
# Shorter version of your code
print while <$filehandle>;
}
The next task is to get our list of files. You already have some of that too.
sub get_list_of_files {
print 'File name(s): ';
my $files = <STDIN>;
chomp $files;
# We might have more than one filename. Need to split input.
# Assume filenames are separated by whitespace
# (Might need to revisit that assumption - filenames can contain spaces!)
my @filenames = split /\s+/, $files;
return @filenames;
}
We can then put all of that together in the main program.
#!/usr/bin/perl
use strict;
use warnings;
my @list_of_files = get_list_of_files();
foreach my $file (@list_of_files) {
display_file_contents($file);
}
By breaking the task down into smaller tasks, each one becomes easier to deal with. And you don't need to carry the complexity of the whole program in your head at one time.
p.s. But like JRFerguson says, taking the list of files as command line parameters would make this far simpler.
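The question also asks for the combined result to go into a new file rather than to the screen. A minimal sketch of that last step, in the same subroutine style: the name combine_files and its argument order are assumptions, and the caller decides where the output file lives.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the contents of each named file, in order,
# into the file called $out_name.
sub combine_files {
    my ($out_name, @filenames) = @_;
    open my $out, '>', $out_name or die "Can't write $out_name: $!";
    for my $filename (@filenames) {
        open my $in, '<', $filename or die "Can't read $filename: $!";
        while (my $line = <$in>) {
            print {$out} $line;
        }
        close $in;
    }
    close $out or die "Can't close $out_name: $!";
    return;
}
```

With the subroutines above, the main program could call combine_files('combined.txt', get_list_of_files()), where combined.txt is a made-up name. To place the file next to the script rather than in the current directory, the core FindBin module's $FindBin::Bin variable is one option.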
The easy way is to use the diamond operator <> to open and read the files specified on the command line. This would achieve your objective:
while (<>) {
chomp;
print "$_\n";
}
Thus: ioSingle.pl file1.txt file2.txt
If this is the sole objective, you can reduce this to a command line script using the -p or -n switch like:
perl -pe '1' file1.txt file2.txt
perl -ne 'print' file1.txt file2.txt
These switches create implicit loops around the -e commands. The -p switch prints $_ after every loop as if you had written:
LINE:
while (<>) {
# your code...
} continue {
print;
}
Using -n creates:
LINE:
while (<>) {
# your code...
}
Thus, -p adds an implicit print statement.

Search and replace in Perl for particular word

I have a huge file consisting of lines similar to those below, with different clocks:
cmd -quiet [get_ports p1] ref_clocks "cudtclk_sp cudtclk"
cmd -quiet [get_ports p2] clock "cu2xdtclk_sp cu2xdtclk"
And I need to replace cudtclk with some other name like cdtclk whenever I have ref_clocks in my file, globally.
I have written the following code, but it doesn't seem to work.
#!/usr/bin/perl
use strict;
use warnings;
sub clock_change
{       # Get the subroutine's argument.
my $arg = shift;
# Hash of stuff we want to replace.
my %replace = (
"cudtclk" => "cdtclk",
);
# See if there's a replacement for the given text.
my $text = $replace{$arg};
if(defined($text)) {
return $text;
}
return $arg;
}
open PAR, "<file name>";
while(<PAR>) {
$_ =~ s/\S+\s\S+\s\S+\s\S+\sref_clocks\s+(\S+\s+\S+)/clock_change($1)/eig;
print $_;   ##print it to some file later.
}
"And I need to replace cudtclk with some other name like cdtclk"
perl -pe 's/\bcudtclk\b/cdtclk/' thefile > newfile
"whenever I have ref_clocks"
perl -pe 's/\bcudtclk\b/cdtclk/ if /\bref_clocks\b/' thefile > newfile
Alternatively:
# saves original file as file.bak
perl -i.bak -pe 's/\bcudtclk\b/cdtclk/ if /\bref_clocks\b/' file
Tighten to suit your data, as necessary.
Although the substitution seems unnecessarily complex, you can fix it with something like:
$_ =~ s/(ref_clocks\s+")([^_]+)_sp(\s+)\2/
$1.clock_change($2)."_sp$3".clock_change($2)/eig;
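One detail worth checking against your data: the underscore counts as a word character, so \bcudtclk\b matches the bare name but not the cudtclk part of cudtclk_sp. The sketch below assumes that both forms should be renamed, as the substitution just above does; the helper name rename_ref_clock is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: on lines that mention ref_clocks, rename $old to $new
# both as a whole word and when it is followed by "_sp".
sub rename_ref_clock {
    my ($line, $old, $new) = @_;
    if ($line =~ /\bref_clocks\b/) {
        # The lookahead accepts either "_sp" or an ordinary word boundary.
        $line =~ s/\b\Q$old\E(?=_sp\b|\b)/$new/g;
    }
    return $line;
}
```

Lines without ref_clocks are returned untouched, and names such as cu2xdtclk are not affected because they do not contain the old name.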

How to perform a series of string replacements and be able to easily undo them?

I have a series of strings and their replacements separated by spaces:
a123 b312
c345 d453
I'd like to replace those strings in the left column with those in the right column, and undo the replacements later on. For the first part I could construct a sed command s/.../...;s/.../... but that doesn't consider reversing, and it requires me to significantly alter the input, which takes time. Is there a convenient way to do this?
Please list some example programs; anything free for Windows or Linux will do.
Text editors provide "undo" functionality, but command-line utilities don't. You can write a script to do the replacement, then reverse the replacements file to do the same thing in reverse.
Here's a script that takes a series of replacements in 'replacements.txt' and runs them against the script's input:
#!/usr/bin/perl -w
use strict;
open REPL, "<replacements.txt" or die "Can't open replacements.txt: $!";
my @replacements;
while (<REPL>) {
chomp;
push @replacements, [ split ];
}
close REPL;
while (<>) {
for my $r (@replacements) { s/$r->[0]/$r->[1]/g }
print;
}
If you save this file as 'repl.pl', and you save your file above as 'replacements.txt', you can use it like this:
perl repl.pl input.txt >output.txt
To convert your replacements file into a 'reverse-replacements.txt' file, you can use a simple awk command:
awk '{ print $2, $1 }' replacements.txt >reverse-replacements.txt
Then just modify the Perl script to use the reverse replacements file instead of the forward one.
use strict;
use warnings;
unless (@ARGV == 3) {
print "Usage: script.pl <reverse_changes?> <rfile> <input>\n";
exit;
}
my $reverse_changes = shift;
my $rfile = shift;
open my $fh, "<", $rfile or die $!;
my %reps = map split, <$fh>;
if ($reverse_changes) {
%reps = reverse %reps;
}
my $rx = join "|", keys %reps;
while (<>) {
s/\b($rx)\b/$reps{$1}/g;
print;
}
The word boundary checks \b surrounding the replacements will prevent partial matches, e.g. replacing a12345 with b31245. In the $rx you may wish to escape meta characters, if such can be present in your replacements.
Usage:
To perform the replacements:
script.pl 0 replace.txt input.txt > output.txt
To reverse changes:
script.pl 1 replace.txt output.txt > output2.txt
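The remark about escaping metacharacters can be sketched as follows. This is an illustration only: the helper name apply_replacements is hypothetical, and it swaps \b for lookarounds so that keys which begin or end with a non-word character still match as whole tokens.

```perl
use strict;
use warnings;

# Hypothetical helper: apply every replacement in %$reps to $text in one pass.
sub apply_replacements {
    my ($text, $reps) = @_;
    return $text unless %$reps;    # nothing to replace

    # quotemeta keeps characters such as "." or "+" literal; trying longer
    # keys first stops a short key from matching where a longer one applies.
    my $rx = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } keys %$reps;

    # (?<!\w) and (?!\w) reject matches inside a longer word.
    $text =~ s/(?<!\w)($rx)(?!\w)/$reps->{$1}/g;
    return $text;
}
```

Undoing works as in the script above: build the inverse mapping with my %back = reverse %$reps; and call apply_replacements($text, \%back), provided the replacement values are unique.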

Use a perl script to parse a file then update /etc/hosts

I'm working on one last Perl script to update my /etc/hosts file, but I am stuck and wondered if someone could help.
I have a text file with an IP address in it and need my Perl script to read it, which I've done, but now I'm stuck on updating the /etc/hosts file.
Here is my script so far:
#!/usr/bin/perl
use strict;
my $ip_to_update;
$ip_to_update = `cat /web_root/ip_update/ip_update.txt | awk {'print \$5'}` ;
print "ip = $ip_to_update";
I then need to find an entry in /etc/hosts like
remote.host.tld 192.168.0.20
So I know I need to parse it for remote.host.tld and then replace the second part, but because the IP won't always be the same I can't just do a straight replace.
Can anyone help with the last part, please? I'm stuck :(
Thank you!
Your substitution will look like this:
s#^.*\s(remote\.host\.tld)\s*$#$ip_to_update\t$1#
Replacement can be done in one line:
perl -i -wpe 'BEGIN { $ip = qx(awk "{print \\\$5}" /web_root/ip_update/ip_update.txt); chomp $ip } s#^.*\s(remote\.host\.tld)\s*$#$ip\t$1#' /etc/hosts
Ok, I updated my script to include the file edit etc all in one. Might not be the best way to do it, but it works :)
#!/usr/bin/perl
use strict;
use File::Copy;
my $ip_to_update; # IP from file
my $fh_r; # Filehandle for reading
my $fh_w; # Filehandle for writing
my $file_read = "/etc/hosts"; # File to read in
my $file_write = "/etc/hosts.new"; # File to write out
my $file_backup = "/etc/hosts.bak"; # File to copy original to
# Awks the IP from text file
$ip_to_update = `/bin/awk < /web_root/ip_update/ip_update.txt {'print \$5'}` ;
# Open filehandles
open( $fh_r, '<', $file_read ) or die "Can't open $file_read: $!";
open( $fh_w, '>', $file_write ) or die "Can't open $file_write: $!";
while ( my $line = <$fh_r> )
{
if ( $line =~ /remote.host.tld/ )
{
#print $fh_w "# $line";
}
else
{
print $fh_w "$line";
}
}
chomp($ip_to_update); # Remove newlines
print $fh_w "$ip_to_update remote.host.tld\n";
# Prints out new line with new ip and hostname
# Close filehandles
close $fh_r;
close $fh_w;
move("$file_read","$file_backup"); # Moves original file to .bak
move("$file_write","$file_read"); # Moves new file to original file location
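In the script above, the dots in /remote.host.tld/ match any character, so an unrelated line could be dropped by accident. A minimal sketch of the filtering step as a separate subroutine that quotes the host name and requires it to stand alone; the name update_hosts_lines is hypothetical, and reading and writing the files stays as in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the hosts lines with any existing entry for
# $host dropped and a fresh "$ip $host" entry appended at the end.
sub update_hosts_lines {
    my ($ip, $host, @lines) = @_;
    # \Q...\E makes the dots literal; the lookarounds stop the name from
    # matching inside a longer host name such as other.remote.host.tld.
    my @kept = grep { !/(?<![\w.-])\Q$host\E(?![\w.-])/ } @lines;
    return (@kept, "$ip $host\n");
}
```

Keeping the text manipulation separate from the file handling also makes it easy to try the logic on a few sample lines before touching /etc/hosts.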