Perl - reading through command output file line by line

I have a log file which contains an output of several commands run on each server. The format is like
APRHY01> lt all
131119-15:41:39 10.105.219.68 10.0b stopfile=/tmp/27599
Checking MOM version...RNC_NODE_MODEL_M_1_200
Parsing MOM (cached): /home/ekisjay/moshell//jarxml/RNC_NODE_MODEL_M_1_200.xml.cache.gz Done.
.............
.
.
.
APRHY01> alt
131119-15:41:55 10.105.219.68 10.0b RNC_NODE_MODEL_M_1_200 stopfile=/tmp/27599
Connecting to 10.105.219.68:56834 (CorbaSecurity=OFF, corba_class=2, java=1.6.0_26, jacoms=R73D19, jacorb=R73D01)
Starting to retrieve active alarms
Nr of active alarms are: 3
APRHY01> strt
131119-15:41:58 10.105.219.68 10.0b RNC_NODE_MODEL_M_1_200 stopfile=/tmp/27599
Following 326 sites are up:
---------------------------------------------------------------------------------------------------------------------
MOD IUBLINK CELLNAMES CFRPHEM1 CFRPHEM2 CFRPHEM3 CFRPHEM4 CFRPHEM5 CFRPHEM6 ICDS TN ATMPORTS
---------------------------------------------------------------------------------------------------------------------
21 Iub_00023 UHYD494-X 111111 1 1 I
21 Iub_00032 UHY4100-X 111111 1 1 I
Then, for the next server or node, this repeats...
APRHY02> lt all
131119-15:44:51 10.105.219.4 10.0b stopfile=/tmp/2874
Checking MOM version...RNC_NODE_MODEL_M_1_200
Parsing MOM (cached): /home/ekisjay/moshell//jarxml/RNC_NODE_MODEL_M_1_200.xml.cache.gz Done.
Using paramfile /home/ekisjay/moshell//commonjars/pm/PARAM_RNC_M_1_50.txt
Parsing file /home/ekisjay/moshell//commonjars/pm/PARAM_RNC_M_1_50.txt ...
I have to extract a few lines (according to the conditions stated in the requirement) between every command for each node. I wrote a Perl program that reads through the file line by line, stops at every line that matches a command prompt like /[A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9]\>/, and then retrieves the required lines between it and the next command line, writing them to another file. In the loop, my program actually skips one command in between and goes for the next command (1st, 3rd, 5th kind of...). Can anyone help me?

Perhaps the following will be helpful:
use strict;
use warnings;

my ( $fileName, $fh, $i );

while (<>) {
    if ( !$fileName or $fileName ne $ARGV ) {
        $fileName = $ARGV;
        $i = 0;
    }
    if ( my ($cmd) = /^([A-Z]{5}\d{2}>.+)/ ) {
        $cmd =~ s/\W+/_/g;
        open $fh, '>', $cmd . '_' . ( sprintf '%05d', ++$i ) . '.txt' or die $!;
    }
    print $fh $_;
}
Command-line usage: >perl script.pl logFile1 [logFile2 .. logFileN]
The [ ] notation indicates optional, multiple files.
The script uses a regex to capture the command/server line, then substitutes 'non-word' characters with an underscore, and this plus a count plus .txt becomes the file name to which that block of command text is written. Thus, using your dataset, the following text files were created containing command content:
APRHY01_lt_all_00001.txt
APRHY01_alt_00002.txt
APRHY01_strt_00003.txt
APRHY02_lt_all_00001.txt
APRHY02_alt_00002.txt
APRHY02_strt_00003.txt
The count was inserted just in case the same command was issued more than once to the same server; this number ensures a separate file for each.

You haven't shown us the code you're using to parse the file, so it's hard to say what might be wrong with it :-)
For breaking down multi-line log output like this, a good method is to loop through the file, appending lines to a block of text, until you find the first line of the next block -- then flush the block you've been appending and create a new one, starting with the current line.
my $block = "";
while (<>) {
    if (/[A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9]\>/) {
        write_block($block) if $block;
        $block = "";
    }
    $block .= $_;
}
write_block($block);
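The write_block function is left undefined above; here is a minimal sketch of one possibility, assuming each completed block should simply be appended to a single output file (the file name here is a placeholder):
sub write_block {
    my ($block) = @_;
    # append the completed block to an output file
    open my $out, '>>', 'blocks.out' or die "Cannot open blocks.out: $!";
    print $out $block;
    close $out;
}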

Your code:
my $srcFile  = "new.log";
my $destFile = "deviations.log";
my @grabbed;
my $line = "";
open (my $src, "$srcFile") or die "Could not open the log file $srcFile: $!";
open (my $dest, ">>$destFile") or die "Could not open the destination file $destFile: $!";
while ($line = <$src>) {
    if ($line =~ /[A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9]\>/) {
        push @grabbed, "Deviations of the output of command: $line\n";
        while ($line = <$src>) {
            if ($line !~ /[A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9]\>/) {
                push @grabbed, $line;
            }
            else {
                last;
            }
        }
    }
}
print $dest "\n@grabbed";
close $dest;
close $src;
when executing last on finding a new command line, returns control to the outer while ($line = <$src>), which immediately reads the next line (the first output line of that command) and therefore fails to recognize the start of the command. A simple fix is to avoid reading a new line at that point, by labeling the outer loop and using redo instead of last:
LINE:
while ($line = <$src>) {
    if ($line =~ /[A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9]\>/) {
        push @grabbed, "Deviations of the output of command: $line\n";
        while ($line = <$src>) {
            if ($line !~ /[A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9]\>/) {
                push @grabbed, $line;
            }
            else {
                redo LINE;
            }
        }
    }
}

How to print lines from a log file which occur after some particular time

I want to print all the lines which occur, let's say, after a time whose value is returned by the localtime function inside a Perl script. I tried something like below:
my $timestamp = localtime();
open(CMD, '-|', 'cat xyz.log | grep -A1000 \$timestamp') || die('Could not open');
while (defined(my $line = <CMD>)) {
    print $line;
}
If I replace the $timestamp in the cat command with an actual time component from xyz.log then it prints lines, but it's not printing with the $timestamp variable.
Is there any alternative way I can print lines that occur after the current time in log files, or how can I improve the above command?
Your $timestamp is never evaluated in Perl as it appears only in single quotes. But why go out to shell in order to match a string and process a file? Perl is far better for that.
Here is a direct way first, then a basic approach. A full script is shown in the second example.
Read the file until you get to the line with the pattern, and exit the loop at that point. The next time you access that filehandle you'll be on the next line and can start printing, in another loop.
while (<$fh>) { last if /$timestamp/ }
print while <$fh>;
This prints out the part of the file starting with the line following the one which has the $timestamp anywhere in it. Adjust how exactly to match the timestamp if it is more specific.
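Assembled into a runnable sketch (the file name xyz.log comes from the question; otherwise this is just the two loops above wrapped with an open):
use warnings;
use strict;

my $timestamp = localtime();
open my $fh, '<', 'xyz.log' or die "Can't open xyz.log: $!";

# loop 1: consume everything up to and including the matching line
while (<$fh>) { last if /$timestamp/ }

# loop 2: print the rest of the file
print while <$fh>;

close $fh;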
Or -- set a flag when a line matches the timestamp, print if flag is set.
use warnings 'all';
use strict;

my $timestamp = localtime();
my $logfile = 'xyz.log';

open my $fh, '<', $logfile or die "Can't open $logfile: $!";

my $mark = 0;
while (<$fh>) {
    if (not $mark) {
        $mark = 1 if /$timestamp/;
    }
    else { print }
}
close $fh;
If you're doing the grepping in Shell anyway you might as well do it the other way round and call perl only to give you the result of localtime:
sed <xyz.log -ne"/^$(perl -E'say scalar localtime')/,\$p"
This uses sed's range addressing: first we keep it from printing lines unless explicitly told to, using -n; then we select everything between the first occurrence of the timestamp (I added a ^ for good measure, just in case log lines could contain timestamps in plain text) and the end of the file ($), and print it (p).
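For example, if localtime returned 'Tue Nov 19 15:41:39 2013' (a made-up value), the command substitution would expand the whole thing to:
sed <xyz.log -ne'/^Tue Nov 19 15:41:39 2013/,$p'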
A pure Perl solution could look like this:
my $timestamp = localtime();
my $found;
open(my $fh, '<', 'xyz.log') or die "Could not open xyz.log: $!";
while (<$fh>) {
    if ($found) {
        print;
    } else {
        $found = 1 if /^$timestamp/;
    }
}
I would suggest a Perlish approach like the one below:
open (my $cmd, "<", "xyz.log") or die $!;

# get all lines in an array, one line per index
my @log_lines = <$cmd>;

my $index = 0;
foreach my $line (@log_lines) {
    # write a regex to capture the time from the line
    my $rex = qr/regex_for_time_as_per_logline/is;
    if ($line =~ /$rex/) {
        # found the line with the expected time
        last;
    }
    $index++;
}

# At this point we have the index of the array element where our expected time starts,
# so the indexes from there on hold the desired lines, which you can print as below:
foreach ($index .. $#log_lines) {
    print $log_lines[$_];
}
If you share one of your log lines, I could help with the regex.
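For illustration only - no log line was shared, so this is an assumption: if the lines began with a syslog-style prefix such as 'Sep 9 08:17:01' (the format used in the next answer), the regex might look like:
my $rex = qr/^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}/;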
You may also try this approach:
In this case, I open /var/log/messages, convert each line's timestamp to an epoch value, and find all the lines which have occurred after time():
use Date::Parse;

my $epoch_now = time(); # epoch value of the current time
open (my $fh, "</var/log/messages") || die "error: $!\n";
while (<$fh>) {
    chomp;
    # one log line looks like this:
    # Sep  9 08:17:01 localhost rsyslogd: rsyslogd was HUPed
    my ($mon, $day, $hour, $min, $sec) = ($_ =~ /(\S+)\s*(\d+)\s*(\d+):(\d+):(\d+)/);
    # the date string part shouldn't be empty
    if (defined($mon) && defined($day)
        && defined($hour) && defined($min)
        && defined($sec)) {
        my $epoch_log = str2time("$mon $day $hour:$min:$sec");
        if ($epoch_log > $epoch_now) {
            print "$_\n";
        }
    }
}

Perl script to print out all the lines containing a keyword and the line below it

I need to write a perl script to search for a keyword in a large file and then print all the lines containing the keyword plus the line below each keyword to a new file.
In the original file, there are multiple lines (the exact number varies) below each keyword-containing line. I already have a script that makes the variable number of lines to equal 1. I need this functionality to remain in the script and build upon it.
I found out that I could use grep to extract the lines, but this requires running the script I already have first and then using the grep command. I'd really need to have these functions to be combined into one.
Any help is much appreciated!
Here is the script I have so far:
use strict;
open (FILE, $ARGV[0]) or die ("Cannot open file");
my $name;
my $sequence;
while (my $line = <FILE>) {
    chomp ($line);
    if (substr ($line, 0, 1) eq ">") {
        if ($sequence ne "") {
            printf ("%s\n%s\n", $name, $sequence);
        }
        $name = $line;
        $sequence = "";
    } else {
        $sequence .= $line;
    }
}
if ($sequence ne "") {
    printf ("%s\n%s\n", $name, $sequence);
}
And an example of the original file:
>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPSEKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLDAKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHLEKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDDSFRKIYTDLGW
In this example, the keyword would be "FRG3G". The keyword is always in the same place, the characters before it vary, but the structure is the same.
If you have only 1 line to print after the keyword line, you can just remember if you found the keyword and then print the line like this:
my $matched = 0;
while (<FILE>) {
    print if $matched;
    if (m/$keyword/) {
        print;
        $matched = 1;
    }
    else {
        $matched = 0;
    }
}
If you can detect the end of the lines you want to print somehow, you can adjust the code above instead of just hard-coding it to print 1 line.
Redirect to a new file as needed.
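If instead a fixed number of lines should follow each keyword line, a hedged variation on the same idea replaces the flag with a countdown (the keyword and count below are example values, and the file is taken from the command line):
use strict;
use warnings;

my $keyword = 'FRG3G';   # example keyword from the question
my $lines_after = 1;     # how many lines to print after each match; adjust as needed

open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
my $remaining = 0;
while (my $line = <$fh>) {
    if ($line =~ /\Q$keyword\E/) {
        print $line;
        $remaining = $lines_after;   # reset the countdown on every match
    }
    elsif ($remaining > 0) {
        print $line;
        $remaining--;
    }
}
close $fh;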

Append a new column to a file in Perl

I've got the following function inside a Perl script:
sub fileSize {
    my $file = shift;
    my $opt  = shift;
    open (FILE, $file) or die "Could not open file $file: $!";
    $/ = ">";
    my $junk = <FILE>;
    my $g_size = 0;
    while ( my $rec = <FILE> ) {
        chomp $rec;
        my ($name, @seqLines) = split /\n/, $rec;
        my $sec = join('', @seqLines);
        $g_size += length($sec);
        if ( $opt == 1 ) {
            open TMP, ">>", "tmp" or die "Could not open chr_sizes.log: $!\n";
            print TMP "$name\t", length($sec), "\n";
        }
    }
    if ( $opt == 0 ) {
        PrintLog( "file_size: $g_size", 0 );
    }
    else {
        print TMP "file_size: $g_size\n";
        close TMP;
    }
    $/ = "\n";
    close FILE;
}
Input file format:
>one
AAAAA
>two
BBB
>three
C
I have several input files with that format. The line beginning with ">" is the same but the other lines can be of different length. The output of the function with only one file is:
one 5
two 3
three 1
I want to execute the function in a loop with this for each file:
foreach my $file ( @refs ) {
    fileSize( $file, 1 );
}
When running the next iteration, let's say with this file:
>one
AAAAABB
>two
BBBVFVF
>three
CS
I'd like to obtain this output:
one 5 7
two 3 7
three 1 2
How can I modify the function, or the script, to get this? As can be seen, my function appends the text to the file.
Thanks!
I've left out your options and the file IO operations and have concentrated on showing a way to do this with an array of arrays from the command line. I hope it helps. I'll leave wiring it up to your own script and subroutines mostly up to you :-)
Running this one liner against your first data file:
perl -lne '$name = s/>//r if /^>/;
           push @strings, [$name, length $_] if !/^>/;
           END { print "@{$_}" for @strings }' datafile1.txt
gives this output:
one 5
two 3
three 1
Substituting the second version or instance of the data file (i.e where record one contains AAAAABB) gives the expected results as well.
one 7
two 7
three 2
In your script above, you save to an output file in this format. So, to append columns to each row in your output file, we can just munge each of your data files in the same way (with any luck this might mean things can be converted into a function that will work in a foreach loop). If we save the transformed data to be output into an array of arrays (AoA), then we can just push the length values we get for each data file string onto the corresponding anonymous array element and then print out the array. Voilà! Now let's hope it works ;-)
You might want to install Data::Printer which can be used from the command line as -MDDP to visualize data structures.
First - run the above script and redirect the output to a file with > /tmp/output.txt
Next - try this longish one-liner that uses DDP and p to show the structure of the array we create:
perl -MDDP -lne 'BEGIN{ local @ARGV = shift;
                 @tmp = map { [split] } <>; p @tmp }
                 $name = s/>//r if /^>/;
                 push @out, [ $name, length $_ ] if !/^>/;
                 END{ p @out; }' /tmp/output.txt datafile2.txt
In the BEGIN block we local-ize @ARGV; shift off the first file (our version of your TMP file) - {local @ARGV = shift} is almost a Perl idiom for handling multiple input files; we then split it inside an anonymous array constructor ([]) and map { } that into the @tmp array, which we display with DDP's p() function. Once we are out of the BEGIN block, the implicit while (<>){ ... } that we get with perl's -n command line switch takes over and reads in the remaining file from @ARGV; we process lines starting with > - stripping the leading character and assigning the string that follows to the $name variable; the while continues and we push $name and the length of any line that does not start with > (if !/^>/), wrapped as elements of an anonymous array [], into the @out array, which we display with p() as well (in the END{} block so it doesn't print inside our implicit while() loop). Phew!!
See the AoA that results as a gist on GitHub.
Finally - building on that, and now we have munged things nicely - we can change a few things in our END{...} block (add a nested for loop to push things around) and put this all together to produce the output we want.
This one liner:
perl -MDDP -lne 'BEGIN{ local @ARGV = shift; @tmp = map { [split] } <>; }
     $name = s/>//r if /^>/; push @out, [ $name, length $_ ] if !/^>/;
     END{ foreach $row (0 .. $#tmp) { push @{ $tmp[$row] }, $out[$row][-1] }
          print "@$_" for @tmp }' output.txt datafile2.txt
produces:
one 5 7
two 3 7
three 1 2
We'll have to convert that into a script :-)
The script consists of three rather wordy subroutines that read the log file, parse the data file, and merge them. We run them in order. The first one checks whether there is an existing log, creates one if not, and then exits to skip any further parsing/merging steps.
You should be able to wrap them in a loop of some kind that feeds files to the subroutines from an array instead of fetching them from STDIN. One caution - I'm using IO::All because it's fun and easy!
use 5.14.0;
use IO::All;

my @file = io(shift)->slurp;
my $log = "output.txt";

&readlog;
&parsedatafile;
&mergetolog;

####### subs #######

sub readlog {
    if (! -R $log) {
        print "creating first log entry\n";
        my @newlog = &parsedatafile;
        open(my $fh, '>', $log) or die "I CAN HAZ WHA????";
        print $fh "@$_ \n" for @newlog;
        exit;
    }
    else {
        map { [split] } io($log)->slurp;
    }
}

sub parsedatafile {
    my (@out, $name);
    for (@file) {
        chomp;
        $name = s/>//r if /^>/;
        push @out, [$name, length $_] if !/^>/;
    }
    @out;
}

sub mergetolog {
    my @tmp  = readlog;
    my @data = parsedatafile;
    foreach my $row (0 .. $#tmp) {
        push @{ $tmp[$row] }, $data[$row][-1];
    }
    open(my $fh, '>', $log) or die "Foobar!!!";
    print $fh "@$_ \n" for @tmp;
}
The subroutines do all the work here - you can likely find ways to shorten, combine, and improve them. Is this a useful approach for you?
I hope this explanation is clear and useful to someone - corrections and comments welcome. Probably the same thing could be done with in-place editing (i.e. with perl -p -i -e '...'), which is left as an exercise to those that follow ...
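For reference, a plausible way to drive the script above (the script name merge_columns.pl is an assumption): run it once per data file, so the first run creates output.txt and each later run appends a column:
perl merge_columns.pl datafile1.txt
perl merge_columns.pl datafile2.txt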
You need to open the output file itself. First in read mode, then in write mode.
I have written a script that does what you are asking. What really matters is the part that appends new data to old data. Adapt that to your fileSize function.
So you have the output file, output.txt
Of the form,
one 5
two 3
three 1
And an array of input files, input1.txt, input2.txt, etc., saved in the @inputfiles variable.
Of the form,
>one
AAAAA
>two
BBB
>three
C
>four
DAS
and
>one
AAAAABB
>two
BBBVFVF
>three
CS
Respectively.
After running the following perl script,
# First read the previous output file.
open OUT, '<', "output.txt" or die $!;
my @outlines;
while (my $line = <OUT>) {
    chomp $line;
    push @outlines, $line;
}
close OUT;
my $outsize = scalar @outlines;

# Suppose you have your array of input file names already prepared
my @inputfiles = ("input1.txt", "input2.txt");
foreach my $file (@inputfiles) {
    open IN, '<', $file or die $!;
    my $counter = 1;    # used to compare against the output size
    while (my $line = <IN>) {
        chomp $line;
        $line =~ m/^>(.*)$/;
        my $name = $1;
        my $sequence = <IN>;
        chomp $sequence;
        my $seqsize = length($sequence);
        # Here is where I append a column to the output data.
        if ($counter <= $outsize) {
            $outlines[$counter - 1] .= " $seqsize";
        } else {
            $outlines[$counter - 1] = "$name $seqsize";
        }
        $counter++;
    }
    close IN;
}

# Now rewrite the results to output.txt
open OUT, '>', "output.txt" or die $!;
foreach (@outlines) {
    print OUT "$_\n";
}
close OUT;
You generate the output,
one 5 5 7
two 3 3 7
three 1 1 2
four 3

Read the last line of a file with data in Perl

I have a text file to parse in Perl. I parse it from the start of the file and get the data that is needed.
After all that is done I want to read the last line in the file with data. The problem is that the last two lines are blank. So how do I get the last line that holds any data?
If the file is relatively short, just read on from where you finished getting the data, keeping the last non-blank line:
use autodie ':io';
open(my $fh, '<', 'file_to_read.txt');
# get the data that is needed, then:
my $last_non_blank_line;
while (my $line = readline $fh) {
    # choose one of the following two lines, depending on what you meant
    if ( $line =~ /\S/ )   { $last_non_blank_line = $line }  # line isn't all whitespace
    # if ( $line !~ /^$/ ) { $last_non_blank_line = $line }  # line has at least one character before the newline
}
If the file is longer, or you may have passed the last non-blank line in your initial data gathering step, reopen it and read from the end:
use File::ReadBackwards;
my $backwards = File::ReadBackwards->new( 'file_to_read.txt' );
my $last_non_blank_line;
do {
    $last_non_blank_line = $backwards->readline;
} until ! defined $last_non_blank_line || $last_non_blank_line =~ /\S/;
perl -e 'while (<>) { if (/\S/) { $last = $_ } } print $last;' < my_file.txt
You can use the module File::ReadBackwards in the following way:
use File::ReadBackwards;

my $bw = File::ReadBackwards->new('filepath')
    or die "can't read file";

while (defined(my $log_line = $bw->readline)) {
    print $log_line;
    exit 0;
}
If the trailing lines are blank, just check each $log_line for being nothing but a newline and skip those.
If the file is small, I would store it in an array and read from the end. If its large, use File::ReadBackwards module.
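A minimal sketch of that small-file idea (the file name is a placeholder):
open my $fh, '<', 'file_to_read.txt' or die $!;
my @lines = <$fh>;
close $fh;
# take the last line containing a non-whitespace character
my ($last_non_blank) = grep { /\S/ } reverse @lines;
print $last_non_blank if defined $last_non_blank;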
Here's my variant of command line perl solution:
perl -ne 'END { print $last } $last = $_ if /\S/' file.txt
No one mentioned Path::Tiny. If the file size is relatively small you can do this:
use Path::Tiny;
my $file = path($file_name);
my ($last_line) = $file->lines({count => -1});
CPAN page.
Just remember that for a large file, as @ysth said, it's better to use File::ReadBackwards. The difference can be substantial.
Sometimes it is more comfortable for me to run shell commands from Perl code, so I'd prefer the following code for this case:
my $result = `tail -n 1 /path/file`;
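Note that tail -n 1 returns the literal last line, which in this question is blank; one possible adjustment (a sketch, assuming the trailing lines are empty rather than whitespace-only) filters them out first:
my $result = `grep . /path/file | tail -n 1`;
chomp $result;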

How do I use variables to do substitution in Perl?

I have several text files that were once tables in a database, which is now disassembled. I'm trying to reassemble them, which will be easy once I get them into a usable form. The first file, "keys.text", is just a list of labels, inconsistently formatted. Like:
Sa 1 #
Sa 2
U 328 #*
It's always letter(s), [space], number(s), [space], and sometimes symbol(s). The text files that match these keys begin the same way, followed by a line of text, also separated, or delimited, by a SPACE.
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
What I'm trying to do in the code below is match each key from "keys.text" with the same key in the .txt files, and put a tab between the key and the text. I'm sure I'm overlooking something very basic, but the result I'm getting looks identical to the source .txt file.
Thanks in advance for any leads or assistance!
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

open(IN1, "keys.text");
my $key;
# Read each line one at a time
while ($key = <IN1>) {
    # For each txt file in the current directory
    foreach my $file (<*.txt>) {
        open(IN, $file) or die("Cannot open TXT file for reading: $!");
        open(OUT, ">temp.txt") or die("Cannot open output file: $!");
        # Add temp modified file into directory
        my $newFilename = "modified\/keyed_" . $file;
        my $line;
        # Read each line one at a time
        while ($line = <IN>) {
            $line =~ s/"\$key"/"\$key" . "\/t"/;
            print(OUT "$line");
        }
        rename("temp.txt", "$newFilename");
    }
}
EDIT: Just to clarify, the results should retain the symbols from the keys as well, if there are any. So they'd look like:
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
The regex seems quoted rather oddly to me. Wouldn't
$line =~ s/$key/$key\t/;
work better?
Also, IIRC, <IN1> will leave the newline on the end of your $key. chomp $key to get rid of that.
And don't put parentheses around your print args, especially when you're writing to a file handle. It looks wrong, whether it is or not, and distracts people from the real problems.
If Perl is not a must, you can use this awk one-liner:
$ cat keys.txt
Sa 1 #
Sa 2
U 328 #*
$ cat mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
$ awk 'FNR==NR{ k[$1 SEP $2];next }($1 SEP $2 in k) {$2=$2"\t"}1 ' keys.txt mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
Using split rather than s/// makes the problem straightforward. In the code below, read_keys extracts the keys from keys.text and records them in a hash.
Then for all files named on the command line, available in the special Perl array @ARGV, we inspect each line to see whether it begins with a key. If not, we leave it alone, but otherwise insert a TAB between the key and the text.
Note that we edit the files in-place thanks to Perl's handy -i option:
-i[extension]
specifies that files processed by the <> construct are to be edited in-place. It does this by renaming the input file, opening the output file by the original name, and selecting that output file as the default for print statements. The extension, if supplied, is used to modify the name of the old file to make a backup copy …
The line split " ", $_, 3 separates the current line into exactly three fields. This is necessary to protect whitespace that's likely to be present in the text portion of the line.
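For example, given the sample line from the question, the split in the script below behaves like this:
my ($text, $num, $line) = split " ", "U 328 #* Continuing text...\n", 3;
# $text = 'U', $num = '328', $line = "#* Continuing text...\n"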
#! /usr/bin/perl -i.bak

use warnings;
use strict;

sub usage { "Usage: $0 text-file\n" }

sub read_keys {
    my $path = "keys.text";
    open my $fh, "<", $path
        or die "$0: open $path: $!";
    my %key;
    while (<$fh>) {
        my ($text, $num) = split;
        ++$key{$text}{$num} if defined $text && defined $num;
    }
    wantarray ? %key : \%key;
}

die usage unless @ARGV;

my %key = read_keys;

while (<>) {
    my ($text, $num, $line) = split " ", $_, 3;
    $_ = "$text $num\t$line" if defined $text &&
                                defined $num  &&
                                $key{$text}{$num};
    print;
}
Sample run:
$ ./add-tab input
$ diff -u input.bak input
--- input.bak 2010-07-20 20:47:38.688916978 -0500
+++ input 2010-07-20 21:00:21.119531937 -0500
@@ -1,3 +1,3 @@
-Sa 1 # Random line of text follows.
-Sa 2 This text is just as random.
-U 328 #* Continuing text...
+Sa 1	# Random line of text follows.
+Sa 2	This text is just as random.
+U 328	#* Continuing text...
Fun answers:
$line =~ s/(?<=$key)/\t/;
Where (?<=XXXX) is a zero-width positive lookbehind for XXXX. That means it matches just after XXXX without being part of the match that gets substituted.
And:
$line =~ s/$key/$key . "\t"/e;
Where the /e flag at the end means to do one eval of what's in the second half of the s/// before filling it in.
Important note: I'm not recommending either of these, they obfuscate the program. But they're interesting. :-)
How about doing two separate slurps of each file? For the first file you open the keys and create a preliminary hash. For the second file, all you need to do is add the text to the hash.
use strict;
use warnings;

my $keys_file = "path to keys.txt";
my $content_file = "path to content.txt";
my $output_file = "path to output.txt";

my %hash = ();
my $keys_regex = '^([a-zA-Z]+)\s*(\d+)\s*([^\da-zA-Z\s]+)';

open my $fh, '<', $keys_file or die "could not open $keys_file";
while (<$fh>) {
    my $line = $_;
    if ($line =~ /$keys_regex/) {
        my $key = $1;
        my $number = $2;
        my $symbol = $3;
        $hash{$key}{'number'} = $number;
        $hash{$key}{'symbol'} = $symbol;
    }
}
close $fh;

open $fh, '<', $content_file or die "could not open $content_file";
while (<$fh>) {
    my $line = $_;
    if ($line =~ /^([a-zA-Z]+)/) {
        my $key = $1;
        # strip the content_file line of key/number/symbol to leave the text
        $line =~ s/^$key//;
        $line =~ s/\s*$hash{$key}{'number'}//;
        $line =~ s/\s*$hash{$key}{'symbol'}//;
        $line =~ s/^\s+//g;
        $hash{$key}{'text'} = $line;
    }
}
close $fh;

open $fh, '>', $output_file or die "could not open $output_file";
for my $key (keys %hash) {
    print $fh $key . " " . $hash{$key}{'number'} . " " . $hash{$key}{'symbol'} . "\t" . $hash{$key}{'text'} . "\n";
}
close $fh;
I haven't had a chance to test it yet, and the solution seems a little hacky with all the regexes, but it might give you an idea of something else you can try.
This looks like the perfect place for the map function in Perl! Read the entire text file into an array, then apply the map function across it. The only other thing you might want to do is use the quotemeta function to escape any possible regular-expression metacharacters in your keys.
Using map is very efficient. I also read the keys into an array in order to not have to keep opening and closing the keys file in my loop. It's an O(n²) algorithm, but if your keys aren't that big, it shouldn't be too bad.
#! /usr/bin/env perl
use strict;
use vars;
use warnings;

open (KEYS, "keys.text")
    or die "Cannot open 'keys.text' for reading\n";
my @keys = <KEYS>;
close (KEYS);

foreach my $file (glob("*.txt")) {
    open (TEXT, "$file")
        or die "Cannot open '$file' for reading\n";
    my @textArray = <TEXT>;
    close (TEXT);

    foreach my $line (@keys) {
        chomp $line;
        map($_ =~ s/^$line/$line\t/, @textArray);
    }

    open (NEW_TEXT, ">$file.new")
        or die qq(Can't open file "$file.new" for writing\n);
    print NEW_TEXT @textArray;
    close (NEW_TEXT);
}