perl + append text between two lines in file - perl

I need to edit file , the main issue is to append text between two known lines in the file
for example I need to append the following text
a b c d e f
1 2 3 4 5 6
bla bla
Between the first_line and the second_line
first_line=")"
second_line="NIC Hr_Nic ("
remark: first_line and second_line argument can get any line or string
How to do this by perl ? ( i write bash script and I need to insert the perl syntax in my script)
lidia

You could read the file in as a single string and then use a regular expression to do the search and replace:
use strict;
use warnings;
# Slurp file myfile.txt into a single string
open(FILE,"myfile.txt") || die "Can't open file: $!";
undef $/;
my $file = <FILE>;
# Set strings to find and insert
my $first_line = ")";
my $second_line = "NIC Hr_Nic (";
my $insert = "hello world";
# Insert our text
$file =~ s/\Q$first_line\E\n\Q$second_line\E/$first_line\n$insert\n$second_line/;
# Write output to output.txt
open(OUTPUT,">output.txt") || die "Can't open file: $!";
print OUTPUT $file;
close(OUTPUT);
By unsetting $/ we put Perl into "slurp mode" and so can easily read the whole file into $file.
We use the s/// operator to do a search and replace using our two search lines as a pattern.
The \Q and \E tell Perl to escape the strings between them, i.e. to ignore any special characters that happen to be in $first_line or $second_line.
You could always write the output over the original file if desired.
The problem as you state it is not solvable using the -i command line option since this option processes a file one line at a time; to insert text between two specific lines you'll need to know about two lines at once.

Well to concenate strings you do
my $text = $first_line . $second_line;
or
my $text = $first_line;
$text .= $second_line;

I'm not sure if I understand your question correctly. A "before and after" example of the file content would, I think, be easier. Anyhow, Here's my take on it, using splice instead of a regular expression. We must of course know the line numbers for this to work.
Load the file into an array:
my #lines;
open F, '<', 'filename' or die $!;
push #lines, $_ for <F>;
close F;
Insert the stuff (see perldoc -f splice):
splice #lines, 1, 0, ('stuff');
..and you're done. All you need to do now is save the array again:
open F, '>', 'filename' or die $!;
print F #lines;
close F;

Related

Search for multiple terms in perl

I have a file with more than hundred single column entries. I need to search for each of these entries into a file of multiple column and more than thousand entries and need a output file. I tried these codes:
#!/usr/bin/perl -w
use strict;
use warnings;
print "Enter the input file name:";
my $inputfile = <STDIN>;
chomp($inputfile);
print "\nEnter the search file name:";
my $searchfile=<STDIN>;
chomp($searchfile);
open (INPUTFILE, $inputfile) || die;
open (SEARCHFILE, $searchfile) || die;
open (OUT, ">write.txt") || die;
while (my $line=<SEARCHFILE>){
while (<INPUTFILE>) {
if (/$line/){
print OUT $_;
}
}
}
close (INPUTFILE) || die;
close (SEARCHFILE) || die;
close (OUT) || die;
The output file has only one line. It has searched the term from the search file into input file, but only for the first term, not for all. Please help!
When you read INPUTFILE in the inner loop, it's read to the end during the first round of SEARCHFILE. Because it's not reset, the filehandle is used up and will always return eof.
If there are hundreds of lines, but not several 100,000 you can easily read it into an array first and then use that for the lookup. The fact that it's single-column makes that very easy. Note that this is less efficient then the alternative solution below.
chomp( my #needles = <SEARCHFILE> );
while (<INPUTFILE>) {
foreach my $needle (#needles) {
print OUT $_ if m/\Q$needle\E/; # \Q end \E quote regex meta chars
}
}
Alternatively you can also build one large lookup regex that matches all the strings in one go. That is probably faster than iterating the array for each line.
# open ...
chomp( my #needles = <SEARCHFILE> );
my $lookup = join '|', map quotemeta, #needles;
my $lookup_regex = qr/$lookup/; # possibly with /i?
while (my $line = <INPUTFILE>) {
print OUT $line if $line =~ $lookup_regex;
}
The quotemeta takes care of strings that contain regex meta characters like / or | or even .. It's the same as using \Q and \E as above.
Please also use three-argument-open and named filehandles.
open my $fh_searchfile, '<', $searchfile or die $!;
open my $fh_inputfile, '<', $inputfile or die $!;
open my $fh_out, '>', 'write.txt' or die $!;
chomp( my #needles = <$fh_searchfile> );
# ...
The three-argument-open is important because you are taking user input and using it as the filename directly. A malicious user could enter something like | rm -rf *, which would open a pipe to a delete all my files without asking program. Oops. But if you specify the '<' read open method explicitly in its own parameter, the method characters are ignored in the third param.
The lexical filehandle $fh is, as the name says, lexical, while INPUTFILE is a GLOB, which makes it global. That's not so bad if you only have this one script and no modules, but as soon as you deal with different packages it becomes problematic because those are super-global and every part of the program sees them. That can lead to name collisions and weird stuff happening.

Need to replace value from one file to another file using perl

I am writing a program using perl which read a value from one file and replace this value in other file. Program runs successfully, but value didn't get replaced. Please suggest me where is the error.
use strict;
use warnings;
open(file1,"address0.txt") or die "Cannot open file.\n";
my $value;
$value=<file1>;
system("perl -p -i.bak -e 's/add/$value/ig' rough.sp");
Here the value which I want to replace exists in address0.txt file. It is a single value 1. I want to place this value in place of add in other file rough.sp.
My rough.sp looks like
Vdd 1 0 add
My address0.txt looks like
1
So output should be like
Vdd 1 0 1
Please help me out. Thanks in advance
Assuming that there is a 1:1 relationship between lines in adress0.txt and rough.sp, you can proceed like this:
use strict;
use warnings;
my ($curline_1,$curline_2);
open(file1, "address0.txt") or die "Cannot open file.\n";
open(file2, "rough.sp") or die "Cannot open file.\n";
open(file3, ">out.sp") or die "Cannot open file.\n";
while (<file1>) {
$curline_1 = $_;
chomp($curline_1);
$curline_2 = <file2>;
$curline_2 =~ s/ add/ $curline_1/;
print file3 $curline_2;
}
close(file1);
close(file2);
close(file3);
exit(0);
Explanation:
The code iterates through the lines of your input files in parallel. Note that the lines read include the line terminator. Line contents from the 'address' file are taken as replacement values fpr the add literal in your .sp file. Line terminators from the 'address' file are eliminated to avoid introducing additional newlines.
Addendum:
An extension for multi-replacements might look like this:
$curline_1 = $_;
chomp($curline_1);
my #parts = split(/ +/, $curline_1); # splits the line from address0.txt into an array of strings made up of contiguous non-whitespace chars
$curline_2 = <file2>;
$curline_2 =~ s/ add/ $parts[0]/;
$curline_2 =~ s/ sub/ $parts[1]/;
# ...

Reading Input File and Putting It Into An Array with Space Delimiter in Perl

I am trying to take an input file with #ARGV array and write it's all elements to an Array with delimiter space in perl.
sample input parameter is a txt file for example:
0145 2145
4578 47896
45 78841
1249 24873
(there are multiple of lines but here it is not shown)
the problem is:
I do not know how to take for example ARG[0] to an array
I want to take every string of that inputfile as single strings in other words the line1 0145 2145 will not be a string it will be two distinct string by delimiting with space.
I think this is what you want. The #resultarray in this code will end up holding a list of digits.
If you're giving your program a file name to use as input from the command line, take it off the ARGV array and open it as a filehandle:
my $filename = $ARGV[0];
open(my $filehandle, '<', $filename) or die "Could not open $filename\n";
Once you have the filehandle you can loop over each line of the file using a while loop. chomp takes that new line character off the end of each line. Use split to split each line up based on whitespace. This returns an array (#linearray in my code) containing a list of the numbers within that line. I then push my line array on to the end of my #resultarray.
my #resultarray;
while(my $line = <$filehandle>){
chomp $line;
my #linearray = split(" ", $line);
push(#resultarray, #linearray);
}
And remember to add
use warnings;
use strict;
at the top of your perl file to help you if you get any problems in your code.
Just to clarify how you can deal with different inputs to your program. The following command:
perlfile.pl < inputfile.txt
Will take the contents of inputfile.txt and pipe it to the STDIN filehandle. You can then use this filehandle to access the contents of inputfile.txt
while(my $line = <STDIN>){
# do something with this $line
}
But you can also give your program a number of file names to read by placing them after the execution command:
perlfile.pl inputfile1.txt inputfile2.txt
These file names will be read as strings and placed into the #ARGV array so that the array will look like this:
#ARGV:
[0] => "inputfile1.txt"
[1] => "inputfile2.txt"
Since these are just the names of files, you need to open the file in perl before you access the file's contents. So for inputfile1.txt:
my $filename1 = shift(#ARGV);
open(my $filehandle1, '<', $filename1) or die "Can't open $filename1";
while(my $line = <$filehandle1>){
# do something with this line
}
Note how I've used shift this time to get the next element within the array #ARGV. See the perldoc for more details on shift. And also more information on open.

What can be wrong with word count program?

I've got a question in my test:
What is wrong with program that counts number of lines and words in file?
open F, $ARGV[0] || die $!;
my #lines = <F>;
my #words = map {split /\s/} #lines;
printf "%8d %8d\n", scalar(#lines), scalar(#words);
close(F);
My conjectures are:
If file does not exist, program won't tell us about that.
If there are punctuation signs in file, program will count them, for example, in
abc cba
, , ,dce
will be five word, but on the other hand wc outputs the same result, so it might be considered as correct behavior.
If F is a large file, it might be better to iterate over lines and not to dump it into lines array.
Do you have any less trivial ideas?
On the first line, you have a precedence problem:
open F, $ARGV[0] || die $!;
is the same as
open F, ($ARGV[0] || die $!);
which means the die is executed if the filename is false, not if the open fails. You wanted to say
open(F, $ARGV[0]) || die $!;
or
open F, $ARGV[0] or die $!;
Also, you should be using the 3 argument form of open, in case $ARGV[0] contains characters that mean something to open.
open F, '<', $ARGV[0] or die $!;
On a different note, splitting on /\s/ means that you get a "word" between consecutive whitespace characters. You probably meant /\s+/, or as amphetamachine suggested, /\W+/, depending on how you want to define a "word".
That still leaves the problem of the empty "word" you get if the line begins with whitespace. You could split on ' ' to suppress that (it's a special case), or you could trim leading whitespace first, or insert a grep { length $_ } to weed out empty "words", or abandon split and use a different method for counting words.
Processing line by line instead of reading the whole file at once would also be a good improvement, but it's not as important as those first two items.
Your conjecture #1 is incorrect: your program will die if the open fails. (see cjm's answer re order of operations.)
you're using a global filehandle, rather than a lexical variable.
you're not using the three-argument form of open.
you could just read from stdin, which gives more flexibility as to input - the user can provide a file, or pipe the input into stdin.
lastly, I wouldn't write my own code to parse words; I'd reach for CPAN, say something like Lingua::EN::Splitter.
use strict; use warnings;
use Lingua::EN::Splitter qw(words);
my ($wordcount, $lines);
while (<>)
{
my $line = $_;
$lines++;
$wordcount += scalar(words $line);
}
printf "%8d %8d\n", $lines, $wordcount;
When you open F, $ARGV[0] || die $! that will effectively exit if the file doesn't exist.
There are some improvements to be made here:
{local $/; $lines = <F>;} # read all lines at once
my #words = split /\W+/, $lines;

How do I use variables to do substitution in Perl?

I have several text files, that were once tables in a database, which is now disassembled. I'm trying to reassemble them, which will be easy, once I get them into a usable form. The first file, "keys.text" is just a list of labels, inconsistently formatted. Like:
Sa 1 #
Sa 2
U 328 #*
It's always letter(s), [space], number(s), [space], and sometime symbol(s). The text files that match these keys are the same, then followed by a line of text, also separated, or delimited, by a SPACE.
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
What I'm trying to do in the code below, is match the key from "keys.text", with the same key in the .txt files, and put a tab between the key, and the text. I'm sure I'm overlooking something very basic, but the result I'm getting, looks identical to the source .txt file.
Thanks in advance for any leads or assistance!
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
open(IN1, "keys.text");
my $key;
# Read each line one at a time
while ($key = <IN1>) {
# For each txt file in the current directory
foreach my $file (<*.txt>) {
open(IN, $file) or die("Cannot open TXT file for reading: $!");
open(OUT, ">temp.txt") or die("Cannot open output file: $!");
# Add temp modified file into directory
my $newFilename = "modified\/keyed_" . $file;
my $line;
# Read each line one at a time
while ($line = <IN>) {
$line =~ s/"\$key"/"\$key" . "\/t"/;
print(OUT "$line");
}
rename("temp.txt", "$newFilename");
}
}
EDIT: Just to clarify, the results should retain the symbols from the keys as well, if there are any. So they'd look like:
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
The regex seems quoted rather oddly to me. Wouldn't
$line =~ s/$key/$key\t/;
work better?
Also, IIRC, <IN1> will leave the newline on the end of your $key. chomp $key to get rid of that.
And don't put parentheses around your print args, esp when you're writing to a file handle. It looks wrong, whether it is or not, and distracts people from the real problems.
if Perl is not a must, you can use this awk one liner
$ cat keys.txt
Sa 1 #
Sa 2
U 328 #*
$ cat mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
$ awk 'FNR==NR{ k[$1 SEP $2];next }($1 SEP $2 in k) {$2=$2"\t"}1 ' keys.txt mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
Using split rather than s/// makes the problem straightforward. In the code below, read_keys extracts the keys from keys.text and records them in a hash.
Then for all files named on the command line, available in the special Perl array #ARGV, we inspect each line to see whether it begins with a key. If not, we leave it alone, but otherwise insert a TAB between the key and the text.
Note that we edit the files in-place thanks to Perl's handy -i option:
-i[extension]
specifies that files processed by the <> construct are to be edited in-place. It does this by renaming the input file, opening the output file by the original name, and selecting that output file as the default for print statements. The extension, if supplied, is used to modify the name of the old file to make a backup copy …
The line split " ", $_, 3 separates the current line into exactly three fields. This is necessary to protect whitespace that's likely to be present in the text portion of the line.
#! /usr/bin/perl -i.bak
use warnings;
use strict;
sub usage { "Usage: $0 text-file\n" }
sub read_keys {
my $path = "keys.text";
open my $fh, "<", $path
or die "$0: open $path: $!";
my %key;
while (<$fh>) {
my($text,$num) = split;
++$key{$text}{$num} if defined $text && defined $num;
}
wantarray ? %key : \%key;
}
die usage unless #ARGV;
my %key = read_keys;
while (<>) {
my($text,$num,$line) = split " ", $_, 3;
$_ = "$text $num\t$line" if defined $text &&
defined $num &&
$key{$text}{$num};
print;
}
Sample run:
$ ./add-tab input
$ diff -u input.bak input
--- input.bak 2010-07-20 20:47:38.688916978 -0500
+++ input 2010-07-20 21:00:21.119531937 -0500
## -1,3 +1,3 ##
-Sa 1 # Random line of text follows.
-Sa 2 This text is just as random.
-U 328 #* Continuing text...
+Sa 1 # Random line of text follows.
+Sa 2 This text is just as random.
+U 328 #* Continuing text...
Fun answers:
$line =~ s/(?<=$key)/\t/;
Where (?<=XXXX) is a zero-width positive lookbehind for XXXX. That means it matches just after XXXX without being part of the match that gets substituted.
And:
$line =~ s/$key/$key . "\t"/e;
Where the /e flag at the end means to do one eval of what's in the second half of the s/// before filling it in.
Important note: I'm not recommending either of these, they obfuscate the program. But they're interesting. :-)
How about doing two separate slurps of each file. For the first file you open the keys and create a preliminary hash. For the second file then all you need to do is add the text to the hash.
use strict;
use warnings;
my $keys_file = "path to keys.txt";
my $content_file = "path to content.txt";
my $output_file = "path to output.txt";
my %hash = ();
my $keys_regex = '^([a-zA-Z]+)\s*\(d+)\s*([^\da-zA-Z\s]+)';
open my $fh, '<', $keys_file or die "could not open $key_file";
while(<$fh>){
my $line = $_;
if ($line =~ /$keys_regex/){
my $key = $1;
my $number = $2;
my $symbol = $3;
$hash{$key}{'number'} = $number;
$hash{$key}{'symbol'} = $symbol;
}
}
close $fh;
open my $fh, '<', $content_file or die "could not open $content_file";
while(<$fh>){
my $line = $_;
if ($line =~ /^([a-zA-Z]+)/){
my $key = $1;
// strip content_file line from keys/number/symbols to leave text
line =~ s/^$key//;
line =~ s/\s*$hash{$key}{'number'}//;
line =~ s/\s*$hash{$key}{'symbol'}//;
$line =~ s/^\s+//g;
$hash{$key}{'text'} = $line;
}
}
close $fh;
open my $fh, '>', $output_file or die "could not open $output_file";
for my $key (keys %hash){
print $fh $key . " " . $hash{$key}{'number'} . " " . $hash{$key}{'symbol'} . "\t" . $hash{$key}{'text'} . "\n";
}
close $fh;
I haven't had a chance to test it yet and the solution seems a little hacky with all the regex but might give you an idea of something else you can try.
This looks like the perfect place for the map function in Perl! Read in the entire text file into an array, then apply the map function across the entire array. The only other thing you might want to do is use the quotemeta function to escape out any possible regular expressions in your keys.
Using map is very efficient. I also read the keys into an array in order to not have to keep opening and closing the keys file in my loop. It's an O^2 algorithm, but if your keys aren't that big, it shouldn't be too bad.
#! /usr/bin/env perl
use strict;
use vars;
use warnings;
open (KEYS, "keys.text")
or die "Cannot open 'keys.text' for reading\n";
my #keys = <KEYS>;
close (KEYS);
foreach my $file (glob("*.txt")) {
open (TEXT, "$file")
or die "Cannot open '$file' for reading\n";
my #textArray = <TEXT>;
close (TEXT);
foreach my $line (#keys) {
chomp $line;
map($_ =~ s/^$line/$line\t/, #textArray);
}
open (NEW_TEXT, ">$file.new") or
die qq(Can't open file "$file" for writing\n);
print TEXT join("\n", #textArray) . "\n";
close (TEXT);
}