Aligning file output with "\t" - perl

I have an assignment that requires me to print out some sorted lists and delimit the fields by '\t'. I've finished the assignment but I cannot seem to get all the fields to line up with just the tab character. Some of the output is below, names that are over a certain length break the fields. How can I still use '\t' and get everything aligned by only that much space?
open(DOB, ">dob.txt") || die "cannot open $!";
# Output name and DOB, sorted by month
foreach my $key (sort {$month{$a} <=> $month{$b}} keys %month)
{
my #fullName = split(/ /, $namelist{$key});
print DOB "$fullName[1], $fullName[0]\t$doblist{$key}\n";
}
close(DOB);
Current output:
Santiago, Jose 1/5/58
Pinhead, Zippy 1/1/67
Neal, Jesse 2/3/36
Gutierrez, Paco 2/28/53
Sailor, Popeye 3/19/35
Corder, Norma 3/28/45
Kirstin, Lesley 4/22/62
Fardbarkle, Fred 4/12/23

You need to know how many spaces are equivalent to a tab. Then you can work out how many tabs are covered by each entry.
If tabs take 4 spaces then the following code works:
$TAB_SPACE = 4;
$NUM_TABS = 4;
foreach my $key (sort {$month{$a} <=> $month{$b}} keys %month) {
my #fullName = split(/ /, $namelist{$key});
my $name = "$fullName[1], $fullName[0]";
# This rounds down, but that just means you need a partial tab
my $covered_tabs = int(length($name) / $TAB_SPACE);
print $name . ("\t" x ($NUM_TABS - $covered_tabs)) . $doblist{$key}\n";
}
You need to know how many tabs to pad out to, but you could work that out in a very similar way to actually printing the lines.

Related

Perl: Find a match, remove the same lines, and to get the last field

Being a Perl newbie, please pardon me for asking this basic question.
I have a text file #server1 that shows a bunch of sentences (white space is the field separator) on many lines in the file.
I needed to match lines with my keyword, remove the same lines, and extract only the last field, so I have tried with:
my #allmatchedlines;
open(output1, "ssh user1#server1 cat /tmp/myfile.txt |");
while(<output1>) {
chomp;
#allmatchedlines = $_ if /mysearch/;
}
close(output1);
my #uniqmatchedline = split(/ /, #allmatchedlines);
my $lastfield = $uniqmatchedline[-1]\n";
print "$lastfield\n";
and it gives me the output showing:
1
I don't know why it's giving me just "1".
Could someone please explain why I'm getting "1" and how I can get the last field of the matched line correctly?
Thank you!
my #uniqmatchedline = split(/ /, #allmatchedlines);
You're getting "1" because split takes a scalar, not an array. An array in scalar context returns the number of elements.
You need to split on each individual line. Something like this:
my #uniqmatchedline = map { split(/ /, $_) } #allmatchedlines;
There are two issues with your code:
split is expecting a scalar value (string) to split on; if you are passing an array, it will convert the array to scalar (which is just the array length)
You did not have a way to remove same lines
To address these, the following code should work (not tested as no data):
my #allmatchedlines;
open(output1, "ssh user1#server1 cat /tmp/myfile.txt |");
while(<output1>) {
chomp;
#allmatchedlines = $_ if /mysearch/;
}
close(output1);
my %existing;
my #uniqmatchedline = grep !$existing{$_}++, #allmatchedlines; #this will return the unique lines
my #lastfields = map { ((split / /, $_)[-1]) . "\n" } #uniqmatchedline ; #this maps the last field in each line into an array
print for #lastfields;
Apart from two errors in the code, I find the statement "remove the same lines and extract only the last field" unclear. Once duplicate matching lines are removed, there may still be multiple distinct sentences with the pattern.
Until a clarification comes, here is code that picks the last field from the last such sentence.
use warnings 'all';
use strict;
use List::MoreUtils qw(uniq)
my $file = '/tmp/myfile.txt';
my $cmd = "ssh user1\#server1 cat $file";
open my $fh, '-|', $cmd // die "Error opening $cmd: $!"; # /
while (<$fh>) {
chomp;
push #allmatchedlines, $_ if /mysearch/;
}
close(output1);
my #unique_matched_lines = uniq #allmatchedlines;
my $lastfield = ( split ' ', $unique_matched_lines[-1] )[-1];
print $lastfield, "\n";
I changed to the three-argument open, with error checking. Recall that open for a process involves a fork and returns pid, so an "error" doesn't at all relate to what happened with the command itself. See open. (The # / merely turns off wrong syntax highlighting.) Also note that # under "..." indicates an array and thus need be escaped.
The (default) pattern ' ' used in split splits on any amount of whitespace. The regex / / turns off this behavior and splits on a single space. You most likely want to use ' '.
For more comments please see the original post below.
The statement #allmatchedlines = $_ if /mysearch/; on every iteration assigns to the array, overwriting whatever has been in it. So you end up with only the last line that matched mysearch. You want push #allmatchedlines, $_ ... to get all those lines.
Also, as shown in the answer by Justin Schell, split needs a scalar so it is taking the length of #allmatchedlines – which is 1 as explained above. You should have
my #words_in_matched_lines = map { split } #allmatchedlines;
When all this is straightened out, you'll have words in the array #uniqmatchedline and if that is the intention then its name is misleading.
To get unique elements of the array you can use the module List::MoreUtils
use List::MoreUtils qw(uniq);
my #unique_elems = uniq #whole_array;

Regular expression to print a string from a command outpout

I have written a function that uses regex and prints the required string from a command output.
The script works as expected. But it's does not support a dynamic output. currently, I use regex for "icmp" and "ok" and print the values. Now, type , destination and return code could change. There is a high chance that command doesn't return an output at all. How do I handle such scenarios ?
sub check_summary{
my ($self) = #_;
my $type = 0;
my $return_type = 0;
my $ipsla = $self->{'ssh_obj'}->exec('show ip sla');
foreach my $line( $ipsla) {
if ( $line =~ m/(icmp)/ ) {
$type = $1;
}
if ( $line =~ m/(OK)/ ) {
$return_type = $1;
}
}
INFO ($type,$return_type);
}
command Ouptut :
PSLAs Latest Operation Summary
Codes: * active, ^ inactive, ~ pending
ID Type Destination Stats Return Last
(ms) Code Run
-----------------------------------------------------------------------
*1 icmp 192.168.25.14 RTT=1 OK 1 second ago
Updated to some clarifications -- we need only the last line
As if often the case, you don't need a regex to parse the output as shown. You have space-separated fields and can just split the line and pick the elements you need.
We are told that the line of interest is the last line of the command output. Then we don't need the loop but can take the last element of the array with lines. It is still unclear how $ipsla contains the output -- as a multi-line string or perhaps as an arrayref. Since it is output of a command I'll treat it as a multi-line string, akin to what qx returns. Then, instead of the foreach loop
my #lines = split '\n', $ipsla; # if $ipsla is a multi-line string
# my #lines = #$ipsla; # if $ipsla is an arrayref
pop #lines while $line[-1] !~ /\S/; # remove possible empty lines at end
my ($type, $return_type) = (split ' ', $lines[-1])[1,4];
Here are some comments on the code. Let me know if more is needed.
We can see in the shown output that the fields up to what we need have no spaces. So we can split the last line on white space, by split ' ', $lines[-1], and take the 2nd and 5th element (indices 1 and 4), by ( ... )[1,4]. These are our two needed values and we assign them.
Just in case the output ends with empty lines we first remove them, by doing pop #lines as long as the last line has no non-space characters, while $lines[-1] !~ /\S/. That is the same as
while ( $lines[-1] !~ /\S/ ) { pop #lines }
Original version, edited for clarifications. It is also a valid way to do what is needed.
I assume that data starts after the line with only dashes. Set a flag once that line is reached, process the line(s) if the flag is set. Given the rest of your code, the loop
my $data_start;
foreach (#lines)
{
if (not $data_start) {
$data_start = 1 if /^\s* -+ \s*$/x; # only dashes and optional spaces
}
else {
my ($type, $return_type) = (split)[1,4];
print "type: $type, return code: $return_type\n";
}
}
This is a sketch until clarifications come. It also assumes that there are more lines than one.
I'm not sure of all possibilities of output from that command so my regular expression may need tweaking.
I assume the goal is to get the values of all columns in variables. I opted to store values in a hash using the column names as the hash keys. I printed the results for debugging / demonstration purposes.
use strict;
use warnings;
sub check_summary {
my ($self) = #_;
my %results = map { ($_,undef) } qw(Code ID Type Destination Stats Return_Code Last_Run); # Put results in hash, use column names for keys, set values to undef.
my $ipsla = $self->{ssh_obj}->exec('show ip sla');
foreach my $line (#$ipsla) {
chomp $line; # Remove newlines from last field
if($line =~ /^([*^~])([0-9]+)\s+([a-z]+)\s+([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s+([[:alnum:]=]+)\s+([A-Z]+)\s+([^\s].*)$/) {
$results{Code} = $1; # Code prefixing ID
$results{ID} = $2;
$results{Type} = $3;
$results{Destination} = $4;
$results{Stats} = $5;
$results{Return_Code} = $6;
$results{Last_Run} = $7;
}
}
# Testing
use Data::Dumper;
print Dumper(\%results);
}
# Demonstrate
check_summary();
# Commented for testing
#INFO ($type,$return_type);
Worked on the submitted test line.
EDIT:
Regular expressions allow you to specify patterns instead of the exact text you are attempting to match. This is powerful but complicated at times. You need to read the Perl Regular Expression documentation to really learn them.
Perl regular expressions also allow you to capture the matched text. This can be done multiple times in a single pattern which is how we were able to capture all the columns with one expression. The matches go into numbered variables...
$1
$2

i want to merge multiple csv files by specific condition using perl

i have multiple csv files, i want to merge all those files.....
i am showing some of my sample csv files below...
M1DL1_Interpro_sum.csv
IPR017690,Outer membrane, omp85 target,821
IPR014729,Rossmann,327
IPR013785,Aldolase,304
IPR015421,Pyridoxal,224
IPR003594,ATPase,179
IPR000531,TonB receptor,150
IPR018248,EF-hand,10
M1DL2_Interpro_sum.csv
IPR017690,Outer membrane, omp85 target,728
IPR013785,Aldolase,300
IPR014729,Rossmann,261
IPR015421,Pyridoxal,189
IPR011991,Winged,113
IPR000873,AMP-dependent synthetase/ligase,111
M1DL3_Interpro_sum.csv
IPR017690,Outer membrane,905
IPR013785,Aldolase,367
IPR014729,Rossmann,338
IPR015421,Pyridoxal,271
IPR003594,ATPase,158
IPR018248,EF-hand,3
now to merge these files i have tried the following code
#ARGV = <merge_csvfiles/*.csv>;
print #ARGV[0],"\n";
open(PAGE,">outfile.csv") || die"Can't open outfile.csv\n";
while($i<scalar(#ARGV))
{
open(FILE,#ARGV[$i]) || die"Can't open ...#ARGV[$i]...\n";
$data.=join("",<FILE>);
close FILE;
print"file completed...",$i+1,"\n";
$i++;
}
#data=split("\n",$data);
#data2=#data;
print scalar(#data);
for($i=0;$i<scalar(#data);$i++)
{
#id1=split(",",#data[$i]);
$id_1=#id1[0];
#data[$j]=~s/\n//;
if(#data[$i] ne "")
{
print PAGE "\n#data[$i],";
for($j=$i+1;$j<scalar(#data2);$j++)
{
#id2=split(",",#data2[$j]);
$id_2=#id2[0];
if($id_1 eq $id_2)
{
#data[$j]=~s/\n//;
print PAGE "#data2[$j],";
#data2[$j]="";
#data[$j]="";
print "match found at ",$i+1," and ",$j+1,"\n";
}
}
}
print $i+1,"\n";
}
merge_csvfiles is a folder which contains all the files
output of above code is
IPR017690,Outer membrane,821,IPR017690,Outer membrane ,728,IPR017690,Outer membrane,905
IPR014729,Rossmann,327,IPR014729,Rossmann,261,IPR014729,Rossmann,338
IPR013785,Aldolase,304,IPR013785,Aldolase,300,IPR013785,Aldolase,367
IPR015421,Pyridoxal,224,IPR015421,Pyridoxal,189,IPR015421,Pyridoxal,271
IPR003594,ATPase,179,IPR003594,ATPase,158
IPR000531,TonB receptor,150
IPR018248,EF-hand,10,IPR018248,EF-hand,3
IPR011991,Winged,113
IPR000873,AMP-dependent synthetase/ligase
but i want the output in following format....
IPR017690,Outer membrane,821,IPR017690,Outer membrane ,728,IPR017690,Outer membrane,905
IPR014729,Rossmann,327,IPR014729,Rossmann,261,IPR014729,Rossmann,338
IPR013785,Aldolase,304,IPR013785,Aldolase,300,IPR013785,Aldolase,367
IPR015421,Pyridoxal,224,IPR015421,Pyridoxal,189,IPR015421,Pyridoxal,271
IPR003594,ATPase,179,0,0,0,IPR003594,ATPase,158
IPR000531,TonB receptor,150,0,0,0,0,0,0
IPR018248,EF-hand,10,0,0,0,IPR018248,EF-hand,3
0,0,0,IPR011991,Winged,113,0,0,0
0,0,0,IPR000873,AMP-dependent synthetase/ligase,111,0,0,0
Has anybody got any idea how can i do this?
Thank you for the help
As mentioned in Miguel Prz's comment, you haven't explained how you want the merge to be performed, but, judging by the "desired output" sample, it appears that what you want is to concatenate lines with matching IDs from all three input files into a single line in the output file, with "0,0,0" taking the place of any lines which don't appear in a given file.
So, then:
#!/usr/bin/env perl
use strict;
use warnings;
my #input_files = glob 'merge_csvfiles/*.csv';
my %data;
for my $i (0 .. $#input_files) {
open my $infh, '<', $input_files[$i]
or die "Failed to open $input_files[$i]: $!";
while (<$infh>) {
chomp;
my $id = (split ',', $_, 2)[0];
$data{$id}[$i] = $_;
}
print "Input file read: $input_files[$i]\n";
}
open my $outfh, '>', 'outfile.csv' or die "Failed to open outfile.csv: $!";
for my $id (sort keys %data) {
my #merge_data;
for my $i (0 .. $#input_files) {
push #merge_data, $data{$id}[$i] || '0,0,0';
}
print $outfh join(',', #merge_data) . "\n";
}
The first loop collects all the lines from each file into a hash of arrays. The hash keys are the IDs, so the lines for that ID from all files are kept together, and the value for each key is (a reference to) an array of the line associated with that ID in each file; using an array for this allows us to keep track of values which are missing as well as those which are present.
The second loop then takes the keys of that hash (in alphabetical order) and, for each one, creates a temporary array of the values associated with that ID, substituting "0,0,0" for missing values, joins them into a single string, and prints that to the output file.
The results, in outfile.csv, are:
IPR000531,TonB receptor,150,0,0,0,0,0,0
0,0,0,IPR000873,AMP-dependent synthetase/ligase,111,0,0,0
IPR003594,ATPase,179,0,0,0,IPR003594,ATPase,158
0,0,0,IPR011991,Winged,113,0,0,0
IPR013785,Aldolase,304,IPR013785,Aldolase,300,IPR013785,Aldolase,367
IPR014729,Rossmann,327,IPR014729,Rossmann,261,IPR014729,Rossmann,338
IPR015421,Pyridoxal,224,IPR015421,Pyridoxal,189,IPR015421,Pyridoxal,271
IPR017690,Outer membrane, omp85 target,821,IPR017690,Outer membrane, omp85 target,728,IPR017690,Outer membrane,905
IPR018248,EF-hand,10,0,0,0,IPR018248,EF-hand,3
Edit: Added explanations requested by OP in comments
can u expalain me the working of my $id = (split ',', $_, 2)[0]; and $# in this program
my $id = (split ',', $_, 2)[0]; gets the text prior to the first comma in the last line of text that was read:
Because I didn't specify what variable to put the data in, while (<$infh>) reads it into the default variable $_.
split ',', $_, 2 splits up the value of $_ into a list of comma-separated fields. The 2 at the end tells it to only produce at most 2 fields; the code will work fine without the 2, but, since I only need the first field, splitting into more parts isn't necessary.
Putting (...)[0] around the split command turns the returned list of fields into an (anonymous) array and returns the first element of that array. It's the same as if I'd written my #fields = split ',', $_, 2; my $id = $fields[0];, but shorter and without the extra variable.
$#array returns the highest-numbered index in the array #array, so for my $i (0 .. $#array) just means "loop over the indexes for all elements in #array". (Note that, if I hadn't needed the value of the index counter, I would have instead looped over the array's data directly, by using for my $filename (#input_files), but it would have been less convenient to keep track of the missing values if I'd done it that way.)

PERL -- Regex incl all hash keys (sorted) + deleting empty fields from $_ in file read

I'm working on a program and I have a couple of questions, hope you can help:
First I need to access a file and retrieve specific information according to an index that is obtained from a previous step, in which the indexes to retrieve are found and store in a hash.
I've been looking for a way to include all array elements in a regex that I can use in the file search, but I haven´t been able to make it work. Eventually i've found a way that works:
my #atoms = ();
my $natoms=0;
foreach my $atomi (keys %{$atome}){
push (#atoms,$atomi);
$natoms++;
}
#atoms = sort {$b cmp $a} #atoms;
and then I use it as a regex this way:
while (<IN_LIG>){
if (!$natoms) {last;}
......
if ($_ =~ m/^\s*$atoms[$natoms-1]\s+/){
$natoms--;
.....
}
Is there any way to create a regex expression that would include all hash keys? They are numeric and must be sorted. The keys refer to the line index in IN_LIG, whose content is something like this:
8 C5 9.9153 2.3814 -8.6988 C.ar 1 MLK -0.1500
The key is to be found in column 0 (8). I have added ^ and \s+ to make sure it refers only to the first column.
My second problem is that sometimes input files are not always identical and they make contain white spaces before the index, so when I create an array from $_ I get column0 = " " instead of column0=8
I don't understand why this "empty column" is not eliminated on the split command and I'm having some trouble to remove it. This is what I have done:
#info = split (/[\s]+/,$_);
if ($info[0] eq " ") {splice (#info, 0,1);} # also tried $info[0] =~ m/\s+/
and when I print the array #info I get this:
Array:
Array: 8
Array: C5
Array: 9.9153
Array: 2.3814
.....
How can I get rid of the empty column?
Many thanks for your help
Merche
There is a special form of split where it will remove both leading and trailing spaces. It looks like this, try it:
my $line = ' begins with spaces and ends with spaces ';
my #tokens = split ' ', $line;
# This prints |begins:with:spaces:and:ends:with:spaces|
print "|", join(':', #tokens), "|\n";
See the documentation for split at http://p3rl.org/split (or with perldoc split)
Also, the first part of your program might be simpler as:
my #atoms = sort {$b cmp $a} keys %$atome;
my $natoms = #atoms;
But, what is your ultimate goal with the atoms? If you simply want to verify that the atoms you're given are indeed in the file, then you don't need to sort them, nor to count them:
my #atoms = keys %$atome;
while (<IN_LIG>){
# The atom ID on this line
my ($atom_id) = split ' ';
# Is this atom ID in the array of atom IDs that we are looking for
if (grep { /$atom_id/ } #atoms) {
# This line of the file has an atom that was in the array: $atom_id
}
}
Lets warm up by refining and correcting some of your code:
# If these are all numbers, do a numerical sort: <=> not cmp
my #atoms = ( sort { $b <=> $a } keys %{$atome} );
my $natoms = scalar #atoms;
No need to loop through the keys, you can insert them into the array right away. You can also sort them right away, and if they are numbers, the sort must be numerical, otherwise you will get a sort like: 1, 11, 111, 2, 22, 222, ...
$natoms can be assigned directly by the count of values in #atoms.
while(<IN_LIG>) {
last unless $natoms;
my $key = (split)[0]; # split splits on whitespace and $_ by default
$natoms-- if ($key == $atoms[$natoms - 1]);
}
I'm not quite sure what you are doing here, and if it is the best way, but this code should work, whereas your regex would not. Inside a regex, [] are meta characters. Split by default splits $_ on whitespace, so you need not be explicit about that. This split will also definitely remove all whitespace. Your empty field is most likely an empty string, '', and not a space ' '.
The best way to compare two numbers is not by a regex, but with the equality operator ==.
Your empty field should be gone by splitting on whitespace. The default for split is split ' '.
Also, if you are not already doing it, you should use:
use strict;
use warnings;
It will save you a lot of headaches.
for your second question you could use this line:
#info = $_ =~ m{^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)}xms;
in order to capture 9 items from each line (assuming they do not contain whitespace).
The first question I do not understand.
Update: I would read alle the lines of the file and use them in a hash with $info[0] as the key and [#info[1..8]] as the value. Then you can lookup the entries by your index.
my %details;
while (<IN_LIG>) {
#info = $_ =~ m{^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)}xms;
$details{ $info[0] } = [ #info[1..$#info] ];
}
Later you can lookup details for the indices you are interested in and process as needed. This assumes the index is unique (has the property of keys).
thanks for all your replies. I tried the split form with ' ' and it saved me several lines of code. thanks!
As for the regex, I found something that could make all keys as part of the string expression with join and quotemeta, but I couldn't make it work. Nevertheless I found an alternative that works, but I liked the join/quotemeta solution better
The atom indexes are obtained from a text file according to some energy threshold. Later, in the IN_LIG loop, I need to access the molecule file to obtain more information about the atoms selected, thus I use the atom "index" in the molecule to identify which lines of the file I have to read and process. This is a subroutine to which I send a hash with the atom index and some other information.
I tried this for the regex:
my $strings = join "|" map quotemeta,
sort { $hash->{$b} <=> $hash->{$a}} keys %($hash);
but I did something wrong cos it wouldn't take all keys

count number of times string repeated in file perl

I am new to Perl, by the way. I have a Perl script that needs to count the number of times a string appears in the file. The script gets the word from the file itself.
I need it to grab the first word in the file and then search the rest of the file to see if it is repeated anywhere else. If it is repeated I need it to return the amount of times it was repeated. If it was not repeated, it can return 0. I need it to then get the next word in the file and check this again.
I will grab the first word from the file, search the file for repeats of that word, grab the second word from
the file, search the file for repeats of that word, grab the third word from the file, search the file for repeats of that word.
So far I have a while loop that is grabbing each word I need, but I do not know how to get it to search for repeats without resetting the position of my current line. So how do I do this? Any ideas or suggestions are greatly appreciated! Thanks in advance!
while (<theFile>) {
my $line1 = $_;
my $startHere = rindex($line1, ",");
my $theName = substr($line1, $startHere + 1, length($line1) - $startHere);
#print "the name: ".$theName."\n";
}
Use a hashtable;
my %wordcount = ();
while(my $line = <theFile>)
{
chomp($line);
my #words = split(' ', $line);
foreach my $word(#words)
{
$wordCount{$word} += 1;
}
}
# output
foreach my $key(keys %wordCount)
{
print "Word: $key Repeat_Count: " . ($wordCount{$key} - 1) . "\n";
}
The $wordCount{$key} - 1 in the output accounts for the first time a word was seen; Words that only apprear once in the file will have a count of 0
Unless this is actually homework and/or you have to achieve the results in the specific manor you describe, this is going to be FAR more efficient.
Edit: From your comment below:
Each word i am searching for is not "the first word" it is a certain word on the line. Basically i have a csv file and i am skipping to the third value and searching for repeats of it.
I would still use this approach. What you would want to do is:
split on , since this is a CSV file
Pull out the 3rd word in the array on each line and store the words you are interested in in their own hash table
At the end, iterate through the "search word" hash table, and pull out the counts from the wordcount table
So:
my #words = split(',', $line);
$searchTable{#words[2]} = 1;
...
foreach my $key(keys %searchTable)
{
print "Word: $key Repeat_Count: " . ($wordCount{$key} - 1) . "\n";
}
you'll have to adjust according to what rules you have around counting words that repeat in the third column. You could just remove them from #words before the loop that inserts into your wordCount hash.
my $word = <theFile>
chomp($word); #`assuming word is by itself.
my $wordcount = 0;
foreach my $line (<theFile>) {
$line =~ s/$word/$wordcount++/eg;
}
print $wordcount."\n";
Look up the regex flag 'e' for more on what this does. I didn't test the code, but something like it should work. For clarification, the 'e' flag evaluates the second part of the regex (the substitution) as code before replacing, but it's more than that, so with that flag you should be able to make this work.
Now that I understand what you are asking for, the above solution won't work. What you can do, is use sysread to read the entire file into a buffer, and run the same substition after that, but you will have to get the first word off manually, or you can just decrement after the fact. This is because the sysread filehandle and the regular filehandle are handled differently, so try this:
my $word = <theFile>
chomp($word); #`assuming word is by itself.
my $wordcount = 0;
my $srline = '';
#some arbitrary very long length, longer than file
#Looping also possible.
sysread(theFile,$srline,10000000)
$srline =~ s/$word/$wordcount++/eg;
$wordcount--; # I think that the first word will still be in here, causing issues, you should test.
print $wordcount."\n";
Now, given that I read your comment responding to your question, I don't think that your current algorithm is optimal, and you probably want a hash storing up all of the counts for words in a file. This would probably be best done using something like the following:
my %counts = ();
foreach my $line (<theFile>) {
$line =~ s/(\w+)/$counts{$1}++/eg;
}
# now %counts contains key-value pair words for everything in the file.
To find count of all words present in the file you can do something like:
#!/usr/bin/perl
use strict;
use warnings;
my %count_of;
while (my $line = <>) { #read from file or STDIN
foreach my $word (split /\s+/, $line) {
$count_of{$word}++;
}
}
print "All words and their counts: \n";
for my $word (sort keys %count_of) {
print "'$word': $count_of{$word}\n";
}
__END__