I need some help with following perl code.
#!perl -w
use strict;
use warnings;
open my $file, '<', 'ubb' or die $1;
my $spool = 0;
my #matchingLines;
while (<$file>) {
if (/GROUPS/i) {
$spool = 1;
next;
}
elsif (/SERVERS/i) {
$spool = 0;
print map { "$_" } #matchingLines;
#matchingLines = ();
}
if ($spool) {
push (#matchingLines, $_);
}
}
close ($file);
Output from that is shown below.
ADM LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=1
ADM_TMS LMID=GW_S4_1_PM,GW_S4_2_BM
GRPNO=2
TMSNAME=TMS
ADM_1 LMID=GW_S4_1_PM
GRPNO=11
ADM_2 LMID=GW_S4_2_BM
GRPNO=12
DMWSG_Gateway_1 LMID=GW_S4_1_PM
GRPNO=101
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_Gateway_2 LMID=GW_S4_2_BM
GRPNO=201
ENVFILE="../GW_S4.Gateway.envfile"
DMWSG_1 LMID=GW_S4_1_PM
GRPNO=106
DMWSG_2 LMID=GW_S4_2_BM
GRPNO=206
But I only would like to get the first word of each line (e.g. ADM, ADM_TMS, ADM_1).
Note that the file has a lot of other lines above and below what's printed here. I only want to do this for lines that is in between GROUPS and SERVERS.
I would suggest 2 changes in your code
Note: Tested these with your sample data (plus other stuff) in your question.
I: Extract first word before push
Change this
push (#matchingLines, $_);
to
push (#matchingLines, /^(\S+)/);
This would push the first word of each line into the array, instead of the entire line.
Note that /^(\S+)/ is shorthand for $_ =~ /^(\S+)/. If you're using an explicit loop variable like in 7stud's answer, you can't use this shorthand, use the explicit syntax instead, say $line =~ /^(\S+)/ or whatever your loop variable is.
Of course, you can also use split function as suggested in 7stud's answer.
II: Change how you print
Change this
print map { "$_" } #matchingLines;
into
local $" = "\n";
print "#matchingLines \n";
$" specifies the delimiter used for list elements when the array is printed with print or say inside double quotes.
Alternatively, as per TLP's suggestion,
$\ = $/;
print for #lines;
or
print join("\n", #lines), "\n"
Note that $/ is the input record separator (newline by default), $\ is the output record separator (undefined by default). $\ is appended after each print command.
For more information on $/, $\, and $":
See perldoc perlvar (just use CTRL+F to find them in that page)
Or you can simply use perldoc -v '$/' etc on your console to get those information.
Note on readability
I don't think implicit regex matching i.e. /pattern/ is bad per se.
But matching against a variable, i.e. $variable =~ /pattern/ is more readable (as in you can immediately see there's a regex matching going on) and more beginner-friendly, at the cost of conciseness.
use strict;
use warnings;
use 5.014; #say()
my $fname = 'data.txt';
open my $INFILE, '<', $fname
or die "Couldn't open $fname: $!"; #-->Not $1"
my $recording_on = 0;
my #matching_lines;
for my $line (<$INFILE>) {
if ($line =~ /groups/i) {
$recording_on = 1;
next;
}
elsif ($line =~ /servers/i) {
say for #matching_lines; #say() is the same as print(), but it adds a newline at the end
#matching_lines = ();
$recording_on = 0;
}
if ($recording_on) {
my ($first_word, $trash) = split " ", $line, 2;
push #matching_lines, $first_word;
}
}
close $INFILE;
You can use the flip-flop operator (range) to select a part of your input. The idea of this operator is that it returns false until its LHS (left hand side) returns true, and after that it returns true until its RHS returns false, after which it is reset. It is somewhat like preserving a state.
Note that the edge lines are also included in the match, so we need to remove those. After that, use doubleDown's idea and push /^(\S+)/ onto an array. The nice thing about using this with push is that the capture regex returns an empty list if it fails, and this gives us a warning-free failure when the regex does not match.
use strict;
use warnings;
my #matches;
while (<>) {
if (/GROUPS/i .. /SERVERS/i) { # flip-flop remembers the matches
next if (/GROUPS/i or /SERVERS/i);
push #matches, /^(\S+)/;
}
}
# #matches should now contain the first words of those lines
Related
I'm having some problem with a subroutine that locates certain files and extracts some data out of them.
This subroutine is called inside a foreach loop, but whenever the call is made the loop skips to its next iteration. So I am wondering whether any of the next;'s are somehow escaping from the subroutine to the foreach loop where it is called?
To my knowledge the sub looks solid though so I'm hoping if anyone can see something I'm missing?
sub FindKit{
opendir(DH, "$FindBin::Bin\\data");
my #kitfiles = readdir(DH);
closedir(DH);
my $nametosearch = $_[0];
my $numr = 1;
foreach my $kitfile (#kitfiles)
{
# skip . and .. and Thumbs.db and non-K-files
if($kitfile =~ /^\.$/) {shift #kitfiles; next;}
if($kitfile =~ /^\.\.$/) {shift #kitfiles; next;}
if($kitfile =~ /Thumbs\.db/) {shift #kitfiles; next;}
if($kitfile =~ /^[^K]/) {shift #kitfiles; next;}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<","data\\$kitfile") or die "$!";
while (<$fhkits>) {}
if ($. <= 1) {
print " Empty File!";
next;
}
seek($fhkits,0,0);
while (my $kitrow = <$fhkits>) {
if ($. == 0 && $kitrow =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/g) {
close $fhkits;
return $1;
}
}
$numr++;
close $fhkits;
}
return 0;
}
To summarize comments, the refactored code:
use File::Glob ':bsd_glob';
sub FindKit {
my $nametosearch = $_[0];
my #kitfiles = glob "$FindBin::Bin/data/K*"; # files that start with K
foreach my $kitfile (#kitfiles)
{
open my $fhkits, '<', $kitfile or die "$!";
my $kitrow_first_line = <$fhkits>; # read first line
return if eof; # next read is end-of-file so it was just header
my ($result) = $kitrow_first_line =~
/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
return $result if $result;
}
return 0;
}
I use core File::Glob and enable :bsd_glob option, which can handle spaces in filenames. I follow the docs note to use "real slash" on Win32 systems.
I check whether there is only a header line using eof.†
I do not see how this can affect the calling code, other than by its return value. Also, I don't see how the posted code can make the caller skip the beat, either. That problem is unlikely to be in this sub.
Please let me know if I missed some point with the above rewrite.
† Previous version used to check whether there is just one (header) line by
1 while <$fhkits>; # check number of lines ...
return if $. == 1; # there was only one line, the header
Also correct but eof is way better
The thing that is almost certainly screwing you here, is that you are shifting the list that you are iterating.
That's bad news, as you're deleting elements ... but in places you aren't necessarily thinking.
For example:
#!/usr/bin/env perl
use strict;
use warnings;
my #list = qw ( one two three );
my $count;
foreach my $value ( #list ) {
print "Iteration ", ++$count," value is $value\n";
if ( $value eq 'two' ) { shift #list; next };
}
print "#list";
How many times do you think that should iterate, and which values should end up in the array?
Because you shift you never process element 'three' and you delete element 'one'. That's almost certainly what's causing you problems.
You also:
open using a relative path, when your opendir used an absolute one.
skip a bunch of files, and then skip anything that doesn't start with K. Why not just search for things that do start with K?
read the file twice, and one is to just check if it's empty. The perl file test -z will do this just fine.
you set $kitrow for each line in the file, but don't really use it for anything other than pattern matching. It'd probably work better using implicit variables.
You only actually do anything on the first line - so you don't ever need to iterate the whole file. ($numr seems to be discarded).
you use a global match, but only use one result. The g flag seems redundant here.
I'd suggest a big rewrite, and do something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin;
sub FindKit{
my ($nametosearch) = #_;
my $numr = 1;
foreach my $kitfile (glob "$FindBin::Bin\\data\\K*" )
{
if ( -z $kitfile ) {
print "$kitfile is empty\n";
next;
}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<", $kitfile) or die "$!";
<$kitfile> =~ m/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/
and return $1;
return 0;
}
}
As a big fan of the Path::Tiny module (me have it always installed and using it in every project) my solution would be:
use strict;
use warnings;
use Path::Tiny;
my $found = FindKit('mykit');
print "$found\n";
sub FindKit {
my($nametosearch) = #_;
my $datadir = path($0)->realpath->parent->child('data');
die "$datadir doesn't exists" unless -d $datadir;
for my $file ($datadir->children( qr /^K/ )) {
next if -z $file; #skip empty
my #lines = $file->lines;
return $1 if $lines[0] =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
}
return;
}
Some comments and still opened issues:
Using the Path::Tiny you could always use forward slashes in the path-names, regardless of the OS (UNIX/Windows), e.g. the data/file will work on windows too.
AFAIK the FindBin is considered broken - so the above uses the $0 and realpath ...
what if the Kit is in multiple files? The above always returns on the 1st found one
the my #lines = $file->lines; reads all lines - unnecessary - but on small files doesn't big deal.
the the reality this function returns the arg for the Maakartikel, so probably better name would be find_articel_by_kit or find_articel :)
easy to switch to utf8 - just change the $file->lines to $file->lines_utf8;
I am working on the perl script and need some help with it. The requirement is, I have to find a lable and once the label is found, I have to replace the word in a line immediately following the label. for Example, if the label is ABC:
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
I want to write a script to match the label (ABC) and once the label is found, replace a word in the next line immediately following the label.
Here is my attempt:
open(my $fh, "<", "file1.txt") or die "cannot open file:$!";
while (my $line = <$fh>))
{
next if ($line =~ /ABC/) {
$line =~ s/original_string/replaced_string/;
}
else {
$msg = "pattern not found \n ";
print "$msg";
}
}
Is this correct..? Any help will be greatly appreciated.
The following one-liner will do what you need:
perl -pe '++$x and next if /ABC:/; $x-- and s/old/new/ if $x' inFile > outFile
The code sets a flag and gets the next line if the label is found. If the flag is set, it's unset and the substitution is executed.
Hope this helps!
You're doing this in your loop:
next if ($line =~ /ABC/);
So, you're reading the file, if a line contains ABC anywhere in that line, you skip the line. However, for every other line, you do the replacement. In the end, you're replacing the string on all other lines and printing that out, and your not printing out your labels.
Here's what you said:
I have to read the file until I find a line with the label:
Once the label is found
I have to read the next line and replace the word in a line immediately following the label.
So:
You want to read through a file line-by-line.
If a line matches the label
read the next line
replace the text on the line
Print out the line
Following these directions:
use strict;
use warnings; # Hope you're using strict and warnings
use autodie; # Program automatically dies on failed opens. No need to check
use feature qw(say); # Allows you to use say instead of print
open my $fh, "<", "file1.txt"; # Removed parentheses. It's the latest style
while (my $line = <$fh>) {
chomp $line; # Always do a chomp after a read.
if ( $line eq "ABC:" ) { # Use 'eq' to ensure an exact match for your label
say "$line"; # Print out the current line
$line = <$fh> # Read the next line
$line =~ s/old/new/; # Replace that word
}
say "$line"; # Print the line
}
close $fh; # Might as well do it right
Note that when I use say, I don't have to put the \n on the end of the line. Also, by doing my chomp after my read, I can easily match the label without worrying about the \n on the end.
This is done exactly as you said it should be done, but there are a couple of issues. The first is that when we do $line = <$fh>, there's no guarantee we are really reading a line. What if the file ends right there?
Also, it's bad practice to read a file in multiple places. It makes it harder to maintain the program. To get around this issue, we'll use a flag variable. This allows us to know if the line before was a tag or not:
use strict;
use warnings; # Hope you're using strict and warnings
use autodie; # Program automatically dies on failed opens. No need to check
use feature qw(say); # Allows you to use say instead of print
open my $fh, "<", "file1.txt"; # Removed parentheses. It's the latest style
my $tag_found = 0; # Flag isn't set
while (my $line = <$fh>) {
chomp $line; # Always do a chomp after a read.
if ( $line eq "ABC:" ) { # Use 'eq' to ensure an exact match for your label
$tag_found = 1 # We found the tag!
}
if ( $tag_found ) {
$line =~ s/old/new/; # Replace that word
$tag_found = 0; # Reset our flag variable
}
say "$line"; # Print the line
}
close $fh; # Might as well do it right
Of course, I would prefer to eliminate mysterious values. For example, the tag should be a variable or constant. Same with the string you're searching for and the string you're replacing.
You mentioned this was a word, so your regular expression replacement should probably look like this:
$line =~ s/\b$old_word\b/$new_word/;
The \b mark word boundaries. This way, if you're suppose to replace the word cat with dog, you don't get tripped up on a line that says:
The Jeopardy category is "Say what".
You don't want to change category to dogegory.
Your problem is that reading in a file does not work like that. You're doing it line by line, so when your regex tests true, the line you want to change isn't there yet. You can try adding a boolean variable to check if the last line was a label.
#!/usr/bin/perl;
use strict;
use warnings;
my $found;
my $replacement = "Hello";
while(my $line = <>){
if($line =~ /ABC/){
$found = 1;
next;
}
if($found){
$line =~ s/^.*?$/$replacement/;
$found = 0;
print $line, "\n";
}
}
Or you could use File::Slurp and read the whole file into one string:
use File::Slurp;
$x = read_file( "file.txt" );
$x =~ s/^(ABC:\s*$ [\n\r]{1,2}^.*?)to\sbe/$1to was/mgx;
print $x;
using /m to make the ^ and $ match embedded begin/end of lines
x is to allow the space after the $ - there is probably a better way
Yields:
ABC:
string to was replaced
some other lines
ABC:
string to was replaced
some other lines
ABC:
string to was replaced
Also, relying on perl's in-place editing:
use File::Slurp qw(read_file write_file);
use strict;
use warnings;
my $file = 'fakefile1.txt';
# Initialize Fake data
write_file($file, <DATA>);
# Enclosed is the actual code that you're looking for.
# Everything else is just for testing:
{
local #ARGV = $file;
local $^I = '.bac';
while (<>) {
print;
if (/ABC/ && !eof) {
$_ = <>;
s/.*/replaced string/;
print;
}
}
unlink "$file$^I";
}
# Compare new file.
print read_file($file);
1;
__DATA__
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
ABC:
outputs
ABC:
replaced string
some other lines
ABC:
replaced string
some other lines
ABC:
replaced string
ABC:
I'm trying to edit a text using Perl. I need to make a substitution but the substitution cannot be applied once an specific word is found in the text. So, imagine I want to substitute all the "hello" forms by "goodbye", but the substitution cannot be applied once the word "foo" is found.
I tried to do this:
use warnings;
use strict;
$/ = undef;
my $filename = shift;
open F, $filename or die "Usa: $0 FILENAME\n";
while(<F>) {
do {s/hello/goodbay/} until (m{foo});
print;
}
close F;
But, as a result, only the first "hello" of my text is changed.
Any suggestion?
Trying to think what would be the most efficient. It should be one of the following:
s{^(.*?)(foo|\z)}{
my $s = $1;
$s =~ s{hello}{goodbay}g;
$s.$2
}se;
print;
or (same as above, but requires 5.14+)
s{^(.*?)(foo|\z)}{ s{hello}{goodbay}gr . $2 }se;
print;
or
my $pos = /foo/ ? $-[0] : length;
my $s = substr($_, 0, $pos, '');
$s =~ s{hello}{goodbay}g;
print($s);
print;
Both work even if foo isn't present.
This solution uses less memory:
# Assumes foo will always be present
# (though it could be expanded to handle that
# Assumes foo isn't a regex pattern.
local $/ = "foo";
$_ = <$fh>;
chomp;
s{hello}{goodbay}g;
print;
print $/;
local $/;
print <$fh>;
If the substrings you work on (the hello and foo of your example) are single words, a easy way would probably be to replace $/ = undef; with $/ = " ";. Currently you slurp in the whole file at once, meaning the while loop gets executed at most once.
That is because there is only one "line" in the whole input after you told perl that there are no line separators.
If you use a space as input separator, it will loop over the input word by word and hopefully work as you intend.
Use a flag variable:
use warnings;
use strict;
my $filename = shift;
open F, $filename or die "Usa: $0 FILENAME\n";
my $replace=1;
while(<F>) {
$replace = 0 if m{foo};
s/hello/goodbye/g if $replace;
print;
}
close F;
This stops at the line containing the end pattern. It will be slightly more complicated if you want to substitute up to just before the match.
This answer uses the ${^PREMATCH] and related variables introduced in Perl 5.10.
#!/usr/bin/env perl
use v5.10.0;
use strict;
use warnings;
my $foo_found;
while (my $line = <>) {
if (!$foo_found) {
if ($line =~ m/foo/ip) {
# only replace hellos in the part before foo
${^PREMATCH} =~ s/hello/goodbye/g;
$line = "${^PREMATCH}${^MATCH}${^POSTMATCH}";
$foo_found ++;
} else {
$line =~ s/hello/goodbye/ig;
}
}
print $line;
}
Given the following input:
hello cruel world
hello baseball
hello mudda, hello fadda
foo
The rest of the hellos should stay
Last hello
I get the following output
goodbye cruel world
goodbye baseball
goodbye mudda, goodbye fadda
foo
The rest of the hellos should stay
Last hello
If you don't have 5.10 you can use $` and related variables but they come with a performance hit. See perldoc perlvar for details.
I"m writing to perl script where basically want to open a file having many strings(one string in one line) and compare each of these strings is present in another file(search file) and print each occurrence of it. I have written the below code for one particular string finding. How can i improve it for list of strings from a file.
open(DATA, "<filetosearch.txt") or die "Couldn't open file filetosearch.txt for reading: $!";
my $find = "word or string to find";
#open FILE, "<signatures.txt";
my #lines = <DATA>;
print "Lined that matched $find\n";
for (#lines) {
if ($_ =~ /$find/) {
print "$_\n";
}
}
I'd try something like this:
use strict;
use warnings;
use Tie::File;
tie my #lines, 'Tie::File', 'filetosearch.txt';
my #matched;
my #result;
tie my #patterns, 'Tie::File', 'patterns.txt';
foreach my $pattern (#patterns)
{
$pattern = quotemeta $pattern;
#matched = grep { /$pattern/ } #lines;
push #result, #matched;
}
I use Tie::File, because it is convenient (not especially in this case, but others), others (perhaps a lot of others?) would disagree, but it is of no importance here
grep is a core function, that is very good at what it does (In my experience)
Ok, something like this will be faster.
sub testmatch
{
my ($find, $linesref)= #_ ;
for ( #$linesref ) { if ( $_ =~ /$find/ ) { return 1 ; } }
return 0 ;
}
{
open(DATA, "<filetosearch.txt") or die "die" ;
my #lines = <DATA> ;
open(SRC, "tests.txt") ;
while (<SRC>)
{
if ( testmatch( $_, \#lines )) { print "a match\n" }
}
}
If its matching full line to full line, you can pack the one line in as keys to a hash and just test existance:
{
open(DATA, "<filetosearch.txt") or die "die" ;
my %lines ;
#lines{<DATA>}= undef ;
open(SRC, "tests.txt") ;
while (<SRC>)
{
if ($_ ~~ %lines) { print "a match\n" }
}
}
maybe something like this will do the job:
open FILE1, "filetosearch.txt";
my #arrFileToSearch = <FILE1>;
close FILE1;
open FILE2, "signatures.txt";
my #arrSignatures = <FILE2>;
close FILE2;
for(my $i = 0; defined($arrFileToSearch[$i]);$i++){
foreach my $signature(#arrSignatures){
chomp($signature);
$signature = quotemeta($signature);#to be sure you are escaping special characters
if($arrFileToSearch[$i] =~ /$signature/){
print $arrFileToSearch[$i-3];#or any other index that you want
}
}
}
Here's another option:
use strict;
use warnings;
my $searchFile = pop;
my #strings = map { chomp; "\Q$_\E" } <>;
my $regex = '(?:' . ( join '|', #strings ) . ')';
push #ARGV, $searchFile;
while (<>) {
print if /$regex/;
}
Usage: perl script.pl strings.txt searchFile.txt [>outFile.txt]
The last, optional parameter directs output to a file.
First, the search file's name is (implicitly) popped off #ARGV and saved for later. Then the strings' file is read (<>) and map is used to chomp each line, escape meta-characters (the \Q and \E, in case there may be regex chars, e.g., a '.' or '*' etc., in the string) then these lines are passed to an array. The array's elements are joined with the regex alternation character (|) to effectively form an OR statement of all the strings that will be matched against each of the search file's lines. Next, the search file's name is pushed onto #ARGV so its lines can be searched. Again, each line is chomped and printed if one of the strings are found on the line.
Hope this helps!
I am need to push all the matched groups into an array.
#!/usr/bin/perl
use strict;
open (FILE, "/home/user/name") || die $!;
my #lines = <FILE>;
close (FILE);
open (FH, ">>/home/user/new") || die $!;
foreach $_(#lines){
if ($_ =~ /AB_(.+)_(.+)_(.+)_(.+)_(.+)_(.+)_(.+)_W.+txt/){
print FH "$1 $2 $3 $4 $5 $6 $7\n"; #needs to be first element of array
}
elsif ($_ =~ /CD_(.+)_(.+)_(.+)_(.+)_(.+)_(.+)_W.+txt/){
print FH "$1 $2 $3 $4 $5 $6\n"; #needs to be second element of array
}
}close (FH);
_ INPUT _
AB_ first--2-45_ Name_ is34_ correct_ OR_ not_W3478.txt
CD_ second_ input_ 89-is_ diffErnt_ 76-from_Wfirst6.txt
Instead of writing matched groups to FILE, I want to push them into array. I can't think of any other command other than push but this function does not accept more than one argument. What is the best way to do the same? The output should look like following after pushing matched groups into array.
_ OUTPUT _
$array[0] = first--2-45 Name is34 correct OR not
$array[1] = second input 89-is diffErnt 76-from
Use the same argument for push that you use for print: A string in double quotes.
push #array, "$1 $2 $3 $4 $5 $6 $7";
Take a look at perldoc -f grep, which returns a list of all elements of a list that match some criterion.
And incidentally, push does take more than one argument: see perldoc -f push.
push #matches, grep { /your regex here/ } #lines;
You didn't include the code leading up to this though.. some of it is a little odd, such as the use of $_ as a function call. Are you sure you want to do that?
If you are using Perl 5.10.1 or later, this is how I would write it.
#!/usr/bin/perl
use strict;
use warnings;
use 5.10.1; # or use 5.010;
use autodie;
my #lines = do{
# don't need to check for errors, because of autodie
open( my $file, '<', '/home/user/name' );
grep {chomp} <$file>;
# $file is automatically closed
};
# use 3 arg form of open
open( my $file, '>>', '/home/user/new' );
my #matches;
for( #lines ){
if( /(?:AB|CD)( (?:_[^_]+)+ )_W .+ txt/x ){
my #match = "$1" =~ /_([^_]+)/g;
say {$file} "#match";
push #matches, \#match;
# or
# push #matches, [ "$1" =~ /_([^_]+)/g ];
# if you don't need to print it in this loop.
}
}
close $file;
This is a little bit more permissive of inputs, but the regex should be a little bit more "correct", than the original.
Remember that a capturing match in list context returns the captured fields, if any:
#!/usr/bin/perl
use strict; use warnings;
my $file = '/home/user/name';
open my $in, '<', $file
or die "Cannot open '$file': $!";
my #matched;
while ( <$in> ) {
my #fields;
if (#fields = /AB_(.+)_(.+)_(.+)_(.+)_(.+)_(.+)_(.+)_W.+txt/
or #fields = /CD_(.+)_(.+)_(.+)_(.+)_(.+)_(.+)_W.+txt/)
{
push #matched, "#fields";
}
}
use Data::Dumper;
print Dumper \#matched;
Of course, you could also do
push #matched, \#fields;
depending on what you intend to do with the matches.
I wonder if using push and giant regexes is really the right way to go.
The OP says he wants lines starting with AB at index 0, and those with CD at index 1.
Also, those regexes look like an inside-out split to me.
In the code below I have added some didactic comments that point out why I am doing things differently than the OP and the other solutions offered here.
#!/usr/bin/perl
use strict;
use warnings; # best use warnings too. strict doesn't catch everything
my $filename = "/home/user/name";
# Using 3 argument open addresses some security issues with 2 arg open.
# Lexical filehandles are better than global filehandles, they prevent
# most accidental filehandle name colisions, among other advantages.
# Low precedence or operator helps prevent incorrect binding of die
# with open's args
# Expanded error message is more helpful
open( my $inh, '<', $filename )
or die "Error opening input file '$filename': $!";
my #file_data;
# Process file with a while loop.
# This is VERY important when dealing with large files.
# for will read the whole file into RAM.
# for/foreach is fine for small files.
while( my $line = <$inh> ) {
chmop $line;
# Simple regex captures the data type indicator and the data.
if( $line =~ /(AB|CD)_(.*)_W.+txt/ ) {
# Based on the type indicator we set variables
# used for validation and data access.
my( $index, $required_fields ) = $1 eq 'AB' ? ( 0, 7 )
: $1 eq 'CD' ? ( 1, 6 )
: ();
next unless defined $index;
# Why use a complex regex when a simple split will do the same job?
my #matches = split /_/, $2;
# Here we validate the field count, since split won't check that for us.
unless( #matches == $required_fields ) {
warn "Incorrect field count found in line '$line'\n";
next;
}
# Warn if we have already seen a line with the same data type.
if( defined $file_data[$index] ) {
warn "Overwriting data at index $index: '#{$file[$index]}'\n";
}
# Store the data at the appropriate index.
$file_data[$index] = \#matches;
}
else {
warn "Found non-conformant line: $line\n";
}
}
Be forewarned, I just typed this into the browser window. So, while the code should be correct, there may be typos or missed semicolons lurking--it's untested, use it at your own peril.