The bit of code below is used in a script that queries a database and produces an output file of records. One problem that I am having is that a newline character seems to be inserted at the very end of the file and it is causing grief with another script I am working on to import the data into another database. I thought this would be as easy as a chomp on the filehandle but that is not allowed. I have read many ways to do this on the net but not sure the path to take. Do you guys see a way to do this?
while ($queryResults->MoveNext() == $CQPerlExt::CQ_SUCCESS) {
$swCR = $session->GetEntityByDbId("SWCR", $queryResults->GetColumnValue(1));
# Gather data
$swID = $swCR->GetFieldValue("RecordID")->GetValue();
$swData = "<RecordID>" . $swID . "</RecordID>";
foreach $fieldName (#fieldNames)
{
$swData = $swData . "<" . $fieldName . ">" . $swCR->GetFieldStringValue($fieldName) . "</" . $fieldName . ">";
}
# Build file with records seperated by custom line delimiter
print OUTFILE $swData . "~~lineDelimiter~~\n";
}
close(OUTFILE);
You could output the newline at the beginning of each line instead of the end, and skip the first one.
my $count = 0;
my #stuff = qw(a b c d);
while (my $letter = shift #stuff) {
print "\n" if $count++;
print $letter;
}
print "__test";
This will output:
a
b
c
d__test
Since you have complete control over the output, I would attack it from that angle by never emitting the final newline in the first place. However, since this is Perl we're talking about, there's more than one way to do it. ;-)
I'm using truncate here to truncate the file to 1 byte less than whatever it was before (-s returns the file size in bytes).
my $file = 'test.txt';
open(my $fh, '>', $file) or die $!;
print $fh "$_\n" for 1..10;
close($fh);
truncate($file, (-s $file) - 1);
Output:
$ od -c test.txt
0000000 1 \n 2 \n 3 \n 4 \n 5 \n 6 \n 7 \n 8 \n
0000020 9 \n 1 0
0000024
Related
dummy.pepmasses
YCL049C 1 511.2465 0 0 MFSK
YCL049C 2 4422.3098 0 0 YLVTASSLFVA
YCL049C 3 1131.5600 0 0 DFYQVSFVK
YCL049C 4 1911.0213 0 0 SIAPAIVNSSVIFHDVSR
YCL049C 5 774.4059 0 0 GVAMGNVK
YCL049C 6 261.1437 0 0 SR
my $dummyfile = "dummy.pepmasses"; #filename defined here
my #mzco = ();
open (IFILE, $dummyfile) or die "unable to open file $dummyfile\n ";
while (my $line = $dummyfile){
#read each line in file
chomp $line;
my $mz_value = (split/\s+/,$line)[3]; #pick column 3rd at every line
$mz_value = join "\n"; # add "\n" for data
push (#mzco,$mz_value); #add them all in one array #mzco
}
print "#mzco";
close IFILE;
There should be better way to express this one. How can it be ?
I want to pick up the third column and push it into an array. Are there better methods?
I'll just go through your code and comment
open (IFILE, $dummyfile) or die "unable to open file $dummyfile\n ";
You should use 3-argument open with explicit mode, and a lexical file handle. Also, you should not include newline in the die message unless you want to suppress line number. You should also include the error, $!.
open my $fh, "<", $dummyfile or die "Unable to open $dummyfile: $!";
while (my $line = $dummyfile){
#read each line in file
No, this just copies the file name. To read from the file handle, do this:
while (my $line = <IFILE>) {
Or <$fh> if you use a lexical file handle.
chomp $line;
my $mz_value = (split/\s+/,$line)[3]; #pick column 3rd at every line
This is actually the 4th column, since indexes start at zero 0.
$mz_value = join "\n"; # add "\n" for data
join does not work that way. It is join EXPR, LIST to join a list of values into a string. You want the concatenation operator .:
$mz_value = $mz_value . "\n";
Or more appropriately:
$mz_value .= "\n";
But why do it that way? It is simpler to just add the newline when you print.
print "#mzco";
You can do this:
print "$_\n" for #mzco;
Or if you are feeling daring:
use feature 'say';
say for #mzco;
And just to show you the power of Perl, this program can be reduced to a one-liner, using a lot of built-in features:
perl -lane ' print $F[3] ' dummy.pepmasses
-l chomp lines, add newline (by default) to print
-n put while (<>) loop around code: read input file or stdin
-a autosplit each line into #F.
The program as a file would look like this:
$\ = $/; # set output record separator to input record separator
while (<>) {
chomp;
my #F = split;
print $F[3];
}
I've got the follow function inside a perl script:
sub fileSize {
my $file = shift;
my $opt = shift;
open (FILE, $file) or die "Could not open file $file: $!";
$/ = ">";
my $junk = <FILE>;
my $g_size = 0;
while ( my $rec = <FILE> ) {
chomp $rec;
my ($name, #seqLines) = split /\n/, $rec;
my $sec = join('',#seqLines);
$g_size+=length($sec);
if ( $opt == 1 ) {
open TMP, ">>", "tmp" or die "Could not open chr_sizes.log: $!\n";
print TMP "$name\t", length($sec), "\n";
}
}
if ( $opt == 0 ) {
PrintLog( "file_size: $g_size", 0 );
}
else {
print TMP "file_size: $g_size\n";
close TMP;
}
$/ = "\n";
close FILE;
}
Input file format:
>one
AAAAA
>two
BBB
>three
C
I have several input files with that format. The line beginning with ">" is the same but the other lines can be of different length. The output of the function with only one file is:
one 5
two 3
three 1
I want to execute the function in a loop with this for each file:
foreach my $file ( #refs ) {
fileSize( $file, 1 );
}
When running the next iteration, let's say with this file:
>one
AAAAABB
>two
BBBVFVF
>three
CS
I'd like to obtain this output:
one 5 7
two 3 7
three 1 2
How can I modify the function or modify the script to get this? As can be seen, my function append the text to the file
Thanks!
I've left out your options and the file IO operations and have concentrated on showing a way to do this with an array of arrays from the command line. I hope it helps. I'll leave wiring it up to your own script and subroutines mostly up to to you :-)
Running this one liner against your first data file:
perl -lne ' $name = s/>//r if /^>/ ;
push #strings , [$name, length $_] if !/^>/ ;
END { print "#{$_ } " for #strings }' datafile1.txt
gives this output:
one 5
two 3
three 1
Substituting the second version or instance of the data file (i.e where record one contains AAAAABB) gives the expected results as well.
one 7
two 7
three 2
In your script above, you save to an output file in this format. So, to append columns to each row in your output file, we can just munge each of your data files in the same way (with any luck this might mean things can be converted into a function that will work in a foreach loop). If we save the transformed data to be output into an array of arrays (AoA), then we can just push the length values we get for each data file string onto the corresponding anonymous array element and then print out the array. Voilà! Now let's hope it works ;-)
You might want to install Data::Printer which can be used from the command line as -MDDP to visualize data structures.
First - run the above script and redirect the output to a file with > /tmp/output.txt
Next - try this longish one-liner that uses DDP and p to show the structure of the array we create:
perl -MDDP -lne 'BEGIN{ local #ARGV=shift;
#tmp = map { [split] } <>; p #tmp }
$name = s/>//r if /^>/ ;
push #out , [ $name, length $_ ] if !/^>/ ;
END{ p #out ; }' /tmp/output.txt datafile2.txt `
In the BEGIN block we local-ize #ARGV ; shift off the first file (our version of your TMP file) - {local #ARGV=shift} is almost a perl idiom for handling multiple input files; we then split it inside an anonymous array constructor ([]) and map { } that into the #tmp array which we display with DDP's p() function. Once we are out of the BEGIN block, the implicit while (<>){ ... } that we get with perl's -n command line switch takes over and reads in the remaining file from #ARGV ; we process lines starting with > - stripping the leading character and assigning the string that follows to the $name variable; the while continues and we push $name and the length of any line that does not start with > (if !/^>/) wrapped as elements of an anonymous array [] into the #out array which we display with p() as well (in the END{} block so it doesn't print inside our implicit while() loop). Phew!!
See the AoA that results as a gist #Github.
Finally - building on that, and now we have munged things nicely - we can change a few things in our END{...} block (add a nested for loop to push things around) and put this all together to produce the output we want.
This one liner:
perl -MDDP -lne 'BEGIN{ local #ARGV=shift; #tmp = map {[split]} <>; }
$name = s/>//r if /^>/ ; push #out, [ $name, length $_ ] if !/^>/ ;
END{ foreach $row (0..$#tmp) { push $tmp[$row] , $out[$row][-1]} ;
print "#$_" for #tmp }' output.txt datafile2.txt
produces:
one 5 7
two 3 7
three 1 2
We'll have to convert that into a script :-)
The script consists of three rather wordy subroutines that reads the log file; parses the datafile ; merges them. We run them in order. The first one checks to see if there is an existing log and creates one and then does an exit to skip any further parsing/merging steps.
You should be able to wrap them in a loop of some kind that feeds files to the subroutines from an array instead of fetching them from STDIN. One caution - I'm using IO::All because it's fun and easy!
use 5.14.0 ;
use IO::All;
my #file = io(shift)->slurp ;
my $log = "output.txt" ;
&readlog;
&parsedatafile;
&mergetolog;
####### subs #######
sub readlog {
if (! -R $log) {
print "creating first log entry\n";
my #newlog = &parsedatafile ;
open(my $fh, '>', $log) or die "I CAN HAZ WHA????" ;
print $fh "#$_ \n" for #newlog ;
exit;
}
else {
map { [split] } io($log)->slurp ;
}
}
sub parsedatafile {
my (#out, $name) ;
while (<#file>) {
chomp ;
$name = s/>//r if /^>/;
push #out, [$name, length $_] if !/^>/ ;
}
#out;
}
sub mergetolog {
my #tmp = readlog ;
my #data = parsedatafile ;
foreach my $row (0 .. $#tmp) {
push $tmp[$row], $data[$row][-1]
}
open(my $fh, '>', $log) or die "Foobar!!!" ;
print $fh "#$_ \n" for #tmp ;
}
The subroutines do all the work here - you can likely find ways to shorten; combine; improve them. Is this a useful approach for you?
I hope this explanation is clear and useful to someone - corrections and comments welcome. Probably the same thing could be done with place editing (i.e with perl -pie '...') which is left as an exercise to those that follow ...
You need to open the output file itself. First in read mode, then in write mode.
I have written a script that does what you are asking. What really matters is the part that appends new data to old data. Adapt that to your fileSize function.
So you have the output file, output.txt
Of the form,
one 5
two 3
three 1
And an array of input files, input1.txt, input2.txt, etc, saved in the #inputfiles variable.
Of the form,
>one
AAAAA
>two
BBB
>three
C
>four
DAS
and
>one
AAAAABB
>two
BBBVFVF
>three
CS
Respectively.
After running the following perl script,
# First read previous output file.
open OUT, '<', "output.txt" or die $!;
my #outlines;
while (my $line = <OUT> ) {
chomp $line;
push #outlines, $line;
}
close OUT;
my $outsize = scalar #outlines;
# Suppose you have your array of input file names already prepared
my #inputfiles = ("input1.txt", "input2.txt");
foreach my $file (#inputfiles) {
open IN, '<', $file or die $!;
my $counter = 1; # Used to compare against output size
while (my $line = <IN>) {
chomp $line;
$line =~ m/^>(.*)$/;
my $name = $1;
my $sequence = <IN>;
chomp $sequence;
my $seqsize = length($sequence);
# Here is where I append a column to output data.
if($counter <= $outsize) {
$outlines[$counter - 1] .= " $seqsize";
} else {
$outlines[$counter - 1] = "$name $seqsize";
}
$counter++;
}
close IN;
}
# Now rewrite the results to output.txt
open OUT, '>', "output.txt" or die $!;
foreach (#outlines) {
print OUT "$_\n";
}
close OUT;
You generate the output,
one 5 5 7
two 3 3 7
three 1 1 2
four 3
I was stuck when combining the BLAST command into perl script. The problem is that the command line paused when the PART II begin.
PART I is used to crop the fasta sequence.
PART II is used to do BLAST with the file generated by PART I.
Both the two parts can run well individually, but met the "pause" problem when combining together.
I guess it was because the $ARGV[1] and $ARGV[3] generated by part I cannot be used in part II. I dont know how to fix, though I tried a lot.
Thanks!
#! /usr/bin/perl -w
use strict;
#### PART I
die "usage:4files fasta1 out1 fasta2 out2\n" unless #ARGV==4;
open (S, "$ARGV[0]") || die "cannot open FASTA file to read: $!";
open OUT,">$ARGV[1]" || die "no out\n";
open (S2, "$ARGV[2]") || die "cannot open FASTA file to read: $!";
open OUT2,">$ARGV[3]" || die "no out2\n";
my %s;# a hash of arrays, to hold each line of sequence
my %seq; #a hash to hold the AA sequences.
my $key;
print "how long is the N-terminal(give number,e.g. 30. whole length input \"0\") \n";
chomp(my $nl=<STDIN>);
##delete "\n" for seq.
local $/ = ">";
<S>;
while (<S>){ #Read the FASTA file.
chomp;
my #line=split/\n/;
print OUT ">",$line[0],"\n";
splice #line,0,1;
#print OUT join ("",#line),"\n";
##line = join("",#line);
#print #line,"\n";
if ($nl == 0){ #whole length
my $seq=join("",#line);
my #amac = split(//,$seq);
splice #amac,0,1; # delete first "MM"
#push #{$s{$key}},#amac;
print OUT #amac,"\n";
}
else { # extract inital aa by number ##Guanhua
my $seq=join("",#line);
#print $seq,"\n";
my #amac = split(//,$seq);
splice #amac,0,1; # delete first "MM"
splice #amac,$nl; ##delete from the N to end
#print #amac,"\n";
#push (#{$s{$key}}, #amac);
print OUT #amac,"\n";
}
}
<S2>;
while (<S2>){ #Read the FASTA file.
chomp;
my #line=split/\n/;
print OUT2 ">",$line[0],"\n";
splice #line,0,1;
#print OUT join ("",#line),"\n";
##line = join("",#line);
#print #line,"\n";
if ($nl == 0){ #whole length
my $seq=join("",#line);
my #amac = split(//,$seq);
splice #amac,0,1; # delete first "MM"
#push #{$s{$key}},#amac;
print OUT2 #amac,"\n";
}
else { # extract inital aa by number ##Guanhua
my $seq=join("",#line);
#print $seq,"\n";
my #amac = split(//,$seq);
splice #amac,0,1; # delete first "MM"
splice #amac,$nl; ##delete from the N to end
#print #amac,"\n";
#push (#{$s{$key}}, #amac);
print OUT2 #amac,"\n";
}
}
##### PART II
print "nucl or prot?\n";
chomp(my $tp = <STDIN>);
system ("makeblastdb -in $ARGV[1] -dbtype prot");
system ("makeblastdb -in $ARGV[3] -dbtype $tp");
print "blast type? (blastp,blastn)\n";
chomp(my $cmd = <STDIN>);
system ("blastp -query $ARGV[1] -db $ARGV[3] -outfmt 6 -evalue 1e-3 -out 12.out ");
system ("$cmd -db $ARGV[1] -query $ARGV[3] -outfmt 6 -evalue 1e-3 -out 21.out ");
You changed the way perl reads from 'STDIN' when you set '$/' in this line:
local $/ = ">";
The easiest way to fix this is to add a left bracket right before that line and a right bracket just before the '##### PART II' comment:
{
local $/ = ">";
...
...
}
##### PART II
(I think theoretically, you could put a ">" at the end of the text you input, but that seems strange, so I wouldn't do it)
That will fix your problem. But something that should be addressed is some of the style choices you made. The two big chunks of code in the middle are both identical as far as I can tell and should probably be put into a subroutine and then called twice. This will eliminate duplication and is less error prone.
You should also use the three argument open call to open files.
I have a Perl question. I have a file each line of this file contains a different number of As Ts Gs and Cs
The file looks like below
ATCGCTGASTGATGCTG
GCCTAGCCCTTAGC
GTTCCATGCCCATAGCCAAATAAA
I would like to add line number for each line
Then insert a \n every 6 characters and then on each of the new rows created put an
Empty space every 3 characters
Example of the output should be
Line NO 1
ATC GCT
GAS TGA
TGC TG
Line NO 2
GCC TAG
CCC TTA
GC
I have come up with the code below:
my $count = 0;
my $line;
my $row;
my $split;
open(F, "Data.txt") or die "Can't read file: $!";
open (FH, " > UpDatedData.txt") or die "Can't write new file: $!";
while (my $line = <F>) {
$count ++ ;
$row = join ("\n", ( $line =~ /.{1,6}/gs));
$split = join ("\t", ( $row =~ /.{3}/gs ));
print FH "Line NO\t$count\n$split\n";
}
close F;
close FH;
However
It gives the following out put
Line NO 1
ATC GCT
GA STG A
T GCT G
Line NO 2
GCC TAG
CC CTT A
G C
This must have something with the \n being counted as a character in this line of code
$split = join ("\t", ( $row =~ /.{3}/gs ));
Any one got any idea how to get around this problem?
Any help would be greatly appreciated.
Thanks in advance
Sinead
This should solve your problem:
use strict;
use warnings;
while (<DATA>) {
s/(.{3})(.{0,3})?/$1 $2 /g;
s/(.{7}) /$1\n/g;
printf "Line NO %d\n%s\n", $., $_;
}
__DATA__
ATCGCTGASTGATGCTG
GCCTAGCCCTTAGC
GTTCCATGCCCATAGCCAAATAAA
This is a one-liner:
perl -plwe 's/(.{3})(.{0,3})/$1 $2\n/g' data.txt
The regex looks for 3 characters (does not match newline), followed by 0-3 characters and captures both of those, then inserts a space between them and newline after.
To keep track of the line numbers, you can add
s/^/Line NO $.\n/;
Which will enumerate based on input line number. If you prefer, you can keep a simple counter, such as ++$i.
-l option will handle newlines for you.
You can also do it in two stages, like so:
perl -plwe's/.{6}\K/\n/g; s/^.{3}\K/ /gm;'
Using the \K (keep) escape sequence here to keep the matched part of the string, and then simply inserting a newline after 6 characters, and then a space 3 characters after "line beginnings", which with the /m modifier also includes newlines.
So, in short:
perl -plwe 's/.{6}\K/\n/g; s/^.{3}\K/ /gm; s/^/Line NO $.\n/;' data.txt
perl -plwe 's/(.{3})(.{0,3})/$1 $2\n/g; s/^/Line NO $.\n/;' data.txt
Another solution. Note that it uses lexical filehandles and three argument form of open.
#!/usr/bin/perl
use warnings;
use strict;
open my $IN, '<', 'Data.txt' or die "Can't read file: $!";
open my $OUT, '>', 'UpDatedData.txt' or die "Can't write new file: $!";
my $count = 0;
while (my $line = <$IN>) {
chomp $line;
$line =~ s/(...)(...)/$1 $2\n/g; # Create pairs of triples
$line =~ s/(\S\S\S)(\S{1,2})$/$1 $2\n/; # A triple plus something at the end.
$line .= "\n" if $line !~ /\n$/; # A triple or less at the end.
$count++;
print $OUT "Line NO\t$count\n$line\n";
}
close $OUT;
I have several text files, that were once tables in a database, which is now disassembled. I'm trying to reassemble them, which will be easy, once I get them into a usable form. The first file, "keys.text" is just a list of labels, inconsistently formatted. Like:
Sa 1 #
Sa 2
U 328 #*
It's always letter(s), [space], number(s), [space], and sometime symbol(s). The text files that match these keys are the same, then followed by a line of text, also separated, or delimited, by a SPACE.
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
What I'm trying to do in the code below, is match the key from "keys.text", with the same key in the .txt files, and put a tab between the key, and the text. I'm sure I'm overlooking something very basic, but the result I'm getting, looks identical to the source .txt file.
Thanks in advance for any leads or assistance!
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
open(IN1, "keys.text");
my $key;
# Read each line one at a time
while ($key = <IN1>) {
# For each txt file in the current directory
foreach my $file (<*.txt>) {
open(IN, $file) or die("Cannot open TXT file for reading: $!");
open(OUT, ">temp.txt") or die("Cannot open output file: $!");
# Add temp modified file into directory
my $newFilename = "modified\/keyed_" . $file;
my $line;
# Read each line one at a time
while ($line = <IN>) {
$line =~ s/"\$key"/"\$key" . "\/t"/;
print(OUT "$line");
}
rename("temp.txt", "$newFilename");
}
}
EDIT: Just to clarify, the results should retain the symbols from the keys as well, if there are any. So they'd look like:
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
The regex seems quoted rather oddly to me. Wouldn't
$line =~ s/$key/$key\t/;
work better?
Also, IIRC, <IN1> will leave the newline on the end of your $key. chomp $key to get rid of that.
And don't put parentheses around your print args, esp when you're writing to a file handle. It looks wrong, whether it is or not, and distracts people from the real problems.
if Perl is not a must, you can use this awk one liner
$ cat keys.txt
Sa 1 #
Sa 2
U 328 #*
$ cat mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
$ awk 'FNR==NR{ k[$1 SEP $2];next }($1 SEP $2 in k) {$2=$2"\t"}1 ' keys.txt mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
Using split rather than s/// makes the problem straightforward. In the code below, read_keys extracts the keys from keys.text and records them in a hash.
Then for all files named on the command line, available in the special Perl array #ARGV, we inspect each line to see whether it begins with a key. If not, we leave it alone, but otherwise insert a TAB between the key and the text.
Note that we edit the files in-place thanks to Perl's handy -i option:
-i[extension]
specifies that files processed by the <> construct are to be edited in-place. It does this by renaming the input file, opening the output file by the original name, and selecting that output file as the default for print statements. The extension, if supplied, is used to modify the name of the old file to make a backup copy …
The line split " ", $_, 3 separates the current line into exactly three fields. This is necessary to protect whitespace that's likely to be present in the text portion of the line.
#! /usr/bin/perl -i.bak
use warnings;
use strict;
sub usage { "Usage: $0 text-file\n" }
sub read_keys {
my $path = "keys.text";
open my $fh, "<", $path
or die "$0: open $path: $!";
my %key;
while (<$fh>) {
my($text,$num) = split;
++$key{$text}{$num} if defined $text && defined $num;
}
wantarray ? %key : \%key;
}
die usage unless #ARGV;
my %key = read_keys;
while (<>) {
my($text,$num,$line) = split " ", $_, 3;
$_ = "$text $num\t$line" if defined $text &&
defined $num &&
$key{$text}{$num};
print;
}
Sample run:
$ ./add-tab input
$ diff -u input.bak input
--- input.bak 2010-07-20 20:47:38.688916978 -0500
+++ input 2010-07-20 21:00:21.119531937 -0500
## -1,3 +1,3 ##
-Sa 1 # Random line of text follows.
-Sa 2 This text is just as random.
-U 328 #* Continuing text...
+Sa 1 # Random line of text follows.
+Sa 2 This text is just as random.
+U 328 #* Continuing text...
Fun answers:
$line =~ s/(?<=$key)/\t/;
Where (?<=XXXX) is a zero-width positive lookbehind for XXXX. That means it matches just after XXXX without being part of the match that gets substituted.
And:
$line =~ s/$key/$key . "\t"/e;
Where the /e flag at the end means to do one eval of what's in the second half of the s/// before filling it in.
Important note: I'm not recommending either of these, they obfuscate the program. But they're interesting. :-)
How about doing two separate slurps of each file. For the first file you open the keys and create a preliminary hash. For the second file then all you need to do is add the text to the hash.
use strict;
use warnings;
my $keys_file = "path to keys.txt";
my $content_file = "path to content.txt";
my $output_file = "path to output.txt";
my %hash = ();
my $keys_regex = '^([a-zA-Z]+)\s*\(d+)\s*([^\da-zA-Z\s]+)';
open my $fh, '<', $keys_file or die "could not open $key_file";
while(<$fh>){
my $line = $_;
if ($line =~ /$keys_regex/){
my $key = $1;
my $number = $2;
my $symbol = $3;
$hash{$key}{'number'} = $number;
$hash{$key}{'symbol'} = $symbol;
}
}
close $fh;
open my $fh, '<', $content_file or die "could not open $content_file";
while(<$fh>){
my $line = $_;
if ($line =~ /^([a-zA-Z]+)/){
my $key = $1;
// strip content_file line from keys/number/symbols to leave text
line =~ s/^$key//;
line =~ s/\s*$hash{$key}{'number'}//;
line =~ s/\s*$hash{$key}{'symbol'}//;
$line =~ s/^\s+//g;
$hash{$key}{'text'} = $line;
}
}
close $fh;
open my $fh, '>', $output_file or die "could not open $output_file";
for my $key (keys %hash){
print $fh $key . " " . $hash{$key}{'number'} . " " . $hash{$key}{'symbol'} . "\t" . $hash{$key}{'text'} . "\n";
}
close $fh;
I haven't had a chance to test it yet and the solution seems a little hacky with all the regex but might give you an idea of something else you can try.
This looks like the perfect place for the map function in Perl! Read in the entire text file into an array, then apply the map function across the entire array. The only other thing you might want to do is use the quotemeta function to escape out any possible regular expressions in your keys.
Using map is very efficient. I also read the keys into an array in order to not have to keep opening and closing the keys file in my loop. It's an O^2 algorithm, but if your keys aren't that big, it shouldn't be too bad.
#! /usr/bin/env perl
use strict;
use vars;
use warnings;
open (KEYS, "keys.text")
or die "Cannot open 'keys.text' for reading\n";
my #keys = <KEYS>;
close (KEYS);
foreach my $file (glob("*.txt")) {
open (TEXT, "$file")
or die "Cannot open '$file' for reading\n";
my #textArray = <TEXT>;
close (TEXT);
foreach my $line (#keys) {
chomp $line;
map($_ =~ s/^$line/$line\t/, #textArray);
}
open (NEW_TEXT, ">$file.new") or
die qq(Can't open file "$file" for writing\n);
print TEXT join("\n", #textArray) . "\n";
close (TEXT);
}