Data value of array not printing properly - perl

I have written a script which collects the marks of students and prints those who scored above 50.
Script is below:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @array = (
'STUDENT1,90
STUDENT2,40
STUDENT3,30
STUDENT4,30
');
print Dumper(\@array);
my $class = "3";
foreach my $each_value (@array) {
print "EACH: $each_value\n";
my ($name, $score ) = split (/,/, $each_value);
if ($score lt 50) {
next;
} else {
print "$name, \"GOOD SCORE\", $score, $class";
}
}
Here I wanted to print data of STUDENT1, since his score is greater than 50.
So output should be:
STUDENT1, "GOOD SCORE", 90, 3
But it's printing output like this:
STUDENT1, "GOOD SCORE", 90
STUDENT2, 3
Here 90 and STUDENT2 are not being separated, so both end up in $score.
I know this is because I am not splitting the data on the newline character, and there is only a single element in the array @array.
How can I split that element on newlines, so that inside the for loop I can split again on the comma (,) to get the values into $name and $score?
The @array actually comes in as an argument to this script, so I have to modify the script to parse the values correctly.

As you already know, your "array" has only one "element": a string containing the actual records. So it is essentially more a scalar than an array.
And as you suspect, you can split this scalar just as you already did with the newline as a separator instead of a comma. You can then put a foreach around the result of split() to iterate over the records.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $records = 'STUDENT1,90
STUDENT2,40
STUDENT3,30
STUDENT4,30
';
my $class = "3";
foreach my $record (split("\n", $records)) {
my ($name, $score) = split(',', $record);
if ($score >= 50) {
print("$name, \"GOOD SCORE\", $score, $class\n");
}
}

As a small note, lt is a string comparison operator. The numeric comparisons use symbols, such as <.
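To see why the operator matters, here is a minimal sketch (the values 100 and 50 are just illustrative) comparing the same pair of numbers both ways:

```perl
use strict;
use warnings;

# String comparison goes character by character, so "100" sorts before "50".
my $as_string = ('100' lt '50') ? 'below' : 'not below';

# Numeric comparison treats both operands as numbers.
my $as_number = (100 < 50) ? 'below' : 'not below';

print "lt: 100 is $as_string 50\n";
print "<:  100 is $as_number 50\n";
```

With lt, a score of 100 would be skipped as if it were below 50, which is why the numeric operators are the safer choice for scores.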
Although you have an array, you only have a single string value in it:
my @array = (
'STUDENT1,90
STUDENT2,40
STUDENT3,30
STUDENT4,30
');
That's not a big deal. Dave Cross has already shown you how you can break that up into multiple values, but there's another way I like to handle multi-line strings. You can open a filehandle on a reference to the string, then read lines from the string as you would a file:
my $string = 'STUDENT1,90
STUDENT2,40
STUDENT3,30
STUDENT4,30
';
open my $string_fh, '<', \$string;
while( <$string_fh> ) {
chomp;
...
}
One of the things to consider while programming is how many times you are duplicating the data. If you have it in a big string and then split it into an array, you've now stored the data twice. That might be fine, and it's usually expedient. You can't always avoid it, but you should have some tools in your toolbox that let you avoid it.
And, here's a chance to use indented here docs:
use v5.26;
my $string = <<~"HERE";
STUDENT1,90
STUDENT2,40
STUDENT3,30
STUDENT4,30
HERE
open my $string_fh, '<', \$string;
while( <$string_fh> ) {
chomp;
...
}
For your particular problem, I think you have a single string where the lines are separated by the '|' character. You don't show how you call this program or get the data, though.
You can choose any line ending you like by setting the value for the input record separator, $/. Set it to a pipe and this works:
use v5.10;
my $string = 'STUDENT1,90|STUDENT2,40|STUDENT3,30|STUDENT4,30';
{
local $/ = '|'; # input record separator
open my $string_fh, '<', \$string;
while( <$string_fh> ) {
chomp;
say "Got $_";
}
}
Now the structure of your program isn't too far away from taking the data from standard input or a file. That gives you a lot of flexibility.

The @array contains only one element. The for loop itself works correctly; you can fix the script without changing anything inside the for block, just by defining the array with one record per element:
my @array = (
'STUDENT1,90',
'STUDENT2,40',
'STUDENT3,30',
'STUDENT4,30');
Otherwise you can iterate on them by splitting lines using new line \n .
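If the array really has to arrive as a single multi-line element, a rough sketch of the splitting route could look like this. The input literal is a stand-in for whatever the script actually receives, and @records and @good are names made up for the example:

```perl
use strict;
use warnings;

# Hypothetical input: one element holding several newline-separated records,
# as in the question.
my @array = ("STUDENT1,90\nSTUDENT2,40\nSTUDENT3,30\nSTUDENT4,30\n");

# split /\n/ on each element and flatten the results into one list.
my @records = map { split /\n/ } @array;

my $class = 3;
my @good;
for my $record (@records) {
    my ($name, $score) = split /,/, $record;
    next if $score < 50;            # numeric comparison
    push @good, qq($name, "GOOD SCORE", $score, $class);
}
print "$_\n" for @good;
```

Because the map runs over every element, this also works if the caller later passes several multi-line strings instead of one.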

Related

Sorting 5th column in descending order error message

The text file I am trying to sort:
MYNETAPP01-NY
700000123456
Filesystem total used avail capacity Mounted on
/vol/vfiler_PROD1_SF_NFS15K01/ 1638GB 735GB 903GB 45% /vol/vfiler_PROD1_SF_NFS15K01/
/vol/vfiler_PROD1_SF_NFS15K01/.snapshot 409GB 105GB 303GB 26% /vol/vfiler_PROD1_SF_NFS15K01/.snapshot
/vol/vfiler_PROD1_SF_isci_15K01/ 2048GB 1653GB 394GB 81% /vol/vfiler_PROD1_SF_isci_15K01/
snap reserve 0TB 0TB 0TB ---% /vol/vfiler_PROD1_SF_isci_15K01/..
I am trying to sort this text file by its 5th column (the capacity field) in descending order.
When I first started this there was a percentage symbol mixed in with the numbers. I solved this by substituting the value like so: s/%/ %/g for @data;. This made it easier to sort the numbers alone. Afterwards I will change it back to the way it was with s/ %/%/g.
After running the script, I received this error:
#ACI-CM-L-53:~$ ./netapp.pl
Can't use string ("/vol/vfiler_PROD1_SF_isci_15K01/"...) as an ARRAY ref while "strict refs" in use at ./netapp.pl line 20, <$DATA> line 24 (#1)
(F) You've told Perl to dereference a string, something which
use strict blocks to prevent it happening accidentally. See
"Symbolic references" in perlref. This can be triggered by an @ or $
in a double-quoted string immediately before interpolating a variable,
for example in "user @$twitter_id", which says to treat the contents
of $twitter_id as an array reference; use a \ to have a literal @
symbol followed by the contents of $twitter_id: "user \@$twitter_id".
Uncaught exception from user code:
Can't use string ("/vol/vfiler_PROD1_SF_isci_15K01/"...) as an ARRAY ref while "strict refs" in use at ./netapp.pl line 20, <$DATA> line 24.
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
open (my $DATA, "<raw_info.txt") or die "$!";
my $systemName = <$DATA>;
my $systemSN = <$DATA>;
my $header = <$DATA>;
my @data;
while ( <$DATA> ) {
@data = (<$DATA>);
}
s/%/ %/g for @data;
s/---/000/ for @data;
print @data;
my @sorted = sort { $b->[5] <=> $a->[5] } @data;
print @sorted;
close($DATA);
Here is an approach using Text::Table which will nicely align your output into neat columns.
#!/usr/bin/perl
use strict;
use warnings;
use Text::Table;
open my $DATA, '<', 'file1' or die $!;
<$DATA> for 1 .. 2; # throw away first two lines
chomp(my $hdr = <$DATA>); # header
my $tbl = Text::Table->new( split ' ', $hdr, 6 );
$tbl->load( map [split /\s{2,}/], sort by_percent <$DATA> );
print $tbl;
sub by_percent {
my $keya = $a =~ /(\d+)%/ ? $1 : '0';
my $keyb = $b =~ /(\d+)%/ ? $1 : '0';
$keyb <=> $keya
}
The output generated is:
Filesystem total used avail capacity Mounted on
/vol/vfiler_PROD1_SF_isci_15K01/ 2048GB 1653GB 394GB 81% /vol/vfiler_PROD1_SF_isci_15K01/
/vol/vfiler_PROD1_SF_NFS15K01/ 1638GB 735GB 903GB 45% /vol/vfiler_PROD1_SF_NFS15K01/
/vol/vfiler_PROD1_SF_NFS15K01/.snapshot 409GB 105GB 303GB 26% /vol/vfiler_PROD1_SF_NFS15K01/.snapshot
snap reserve 0TB 0TB 0TB ---% /vol/vfiler_PROD1_SF_isci_15K01/..
Update
To explain some of the advanced parts of the program.
my $tbl = Text::Table->new( split ' ', $hdr, 6 );
This creates the Text::Table object with the header split into 6 columns. Without the limit of 6, it would have created 7 columns, because the last header, 'Mounted on', contains a space and would have been incorrectly split into 2 columns.
$tbl->load( map [split /\s{2,}/], sort by_percent <$DATA> );
The statement above 'loads' the data into the table. The map applies a transformation to each line from <$DATA>. Each line is split into an anonymous array (created by [...]). The split is on 2 or more whitespace characters, \s{2,}. If that wasn't specified, the field 'snap reserve', which contains a single space, would have been incorrectly split in two.
I hope this makes what's going on clearer.
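The difference between the two split patterns can be checked on a single row. This is only a sketch: the exact column spacing in the real file is an assumption (two spaces between columns here).

```perl
use strict;
use warnings;

# A row shaped like the "snap reserve" line, with two spaces between columns
# (the exact spacing is an assumption).
my $line = 'snap reserve  0TB  0TB  0TB  ---%  /vol/vfiler_PROD1_SF_isci_15K01/..';

my @by_any_space  = split ' ', $line;        # splits "snap reserve" in two
my @by_wide_space = split /\s{2,}/, $line;   # keeps "snap reserve" together

printf "%d fields vs %d fields\n", scalar @by_any_space, scalar @by_wide_space;
```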
And a simpler example that doesn't align the columns like Text::Table, but leaves them in the form they originally were read might be:
open my $DATA, '<', 'file1' or die $!;
<$DATA> for 1 .. 2; # throw away first two lines
my $hdr = <$DATA>; # header
print $hdr;
print sort by_percent <$DATA>;
sub by_percent {
my $keya = $a =~ /(\d+)%/ ? $1 : '0';
my $keyb = $b =~ /(\d+)%/ ? $1 : '0';
$keyb <=> $keya
}
In addition to skipping the fourth line of the file, this line is wrong
my @sorted = sort { $b->[5] <=> $a->[5] } @data
But presumably you knew that as the error message says
at ./netapp.pl line 20
$a and $b are lines of text from the array @data, but you're treating them as array references. It looks like you need to extract the fifth "field" from both variables before you compare them, but no one can tell you how to do that
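One possible way to do that extraction, assuming the capacity is the only number directly followed by a percent sign on each line, is a small helper called from the sort block. The rows and the capacity() helper below are made up for illustration:

```perl
use strict;
use warnings;

# Sample rows; the paths and column spacing are assumptions.
my @data = (
    "/vol/a/  1638GB  735GB  903GB  45%  /vol/a/\n",
    "/vol/b/  2048GB  1653GB  394GB  81%  /vol/b/\n",
    "snap reserve  0TB  0TB  0TB  ---%  /vol/b/..\n",
);

# Pull the digits in front of the percent sign; rows showing "---%" count as 0.
sub capacity {
    my ($line) = @_;
    return $line =~ /(\d+)%/ ? $1 : 0;
}

# $a and $b stay plain strings; only the extracted numbers are compared.
my @sorted = sort { capacity($b) <=> capacity($a) } @data;
print @sorted;
```

The lines themselves are never modified, so there is no need to insert a space before the % and remove it again afterwards.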
Your code is quite far from what you want. Trying to change it as little as possible, this works:
#!/usr/bin/perl
use strict;
use warnings;
open (my $fh, "<", "raw_info.txt") or die "$!";
my $systemName = <$fh>;
my $systemSN = <$fh>;
my $header = <$fh>;
my @data;
while( my $d = <$fh> ) {
chomp $d;
my @fields = split '\s{2,}', $d;
if( scalar @fields > 4 ) {
$fields[4] = $fields[4] =~ /(\d+)/ ? $1 : 0;
push @data, [ @fields ];
}
}
foreach my $i ( @data ) {
print join("\t", @$i), "\n";
}
my @sorted = sort { $b->[4] <=> $a->[4] } @data;
foreach my $i ( @sorted ) {
$i->[4] .= '%';
print join("\t", @$i), "\n";
}
close($fh);
Let's make a few things clear:
When using a lexical ($) filehandle, it is customary to name it in lower case; $fh is the typical name.
You define but do not use the first three variables. If you don't apply chomp to them, the trailing newline stays attached. I have not done so, as they are not used.
You are defining a list with a line in each element. But then you need a list ref inside to separate the fields.
The separation is done using split.
Empty lines are skipped by counting the number of fields.
I use something more compact to get rid of the % and transform the --- into a 0.
Lines are added to list @data using push and turning the list to add into a list ref with [ @list ].
A list of list refs needs two loops to get printed. One traverses the list (foreach), another (implicit in join) the columns.
Now you can sort the list and print it out in the same way. By the way, Perl lists (or arrays) start at index 0, so the 5th column is 4.
This is not the way I would have coded it, but I hope it is clear to you as it is close to your original code.

Interpolating a non-interpolated passed string inside a subroutine in Perl

I am looking to parse a tab delimited text file into a nested hash with a subroutine. Each file row will be keyed by a unique id from a uid column(s), with the header row as nested keys. Which column(s) is(are) to become the uid changes (as sometimes there isn't a unique column, so the uid has to be a combination of columns). My issue is with the $uid variable, which I pass as a non-interpolated string. When I try to use it inside the subroutine in an interpolated way, it will only give me the non-interpolated value:
use strict;
use warnings;
my $lofrow = tablehash($lof_file, '$row{gene}', "transcript", "ENST");
##sub to generate table hash from file w/ headers
##input values are file, uid, header starter, row starter, max column number
##returns hash reference (deref it)
sub tablehash {
my ($file, $uid, $headstart, $rowstart, $colnum) = @_;
if (!$colnum){ # takes care of a unknown number of columns
$colnum = 0;
}
open(INA, $file) or die "failed to open $file, $!\n";
my %table; # permanent hash table
my %row; # hash of column values for each row
my @names = (); # column headers
my @values = (); # line/row values
while (chomp(my $line = <INA>)){ # reading lines for lof info
if ($line =~ /^$headstart/){
@names = split(/\t/, $line, $colnum);
} elsif ($line =~ /^$rowstart/){ # splitting lof info columns into variables
@values = split(/\t/, $line, $colnum);
@row{@names} = @values;
print qq($uid\t$row{gene}\n); # problem: prints "$row{gene} ACB1"
$table{"$uid"} = { %row }; # puts row hash into permanent hash, but with $row{gene} key)
}
}
close INA;
return \%table;
}
I am out of ideas. I could put $table{$row{$uid}} and simply pass "gene", but in a couple of instances I want to have a $uid of "$row{gene}|$row{rsid}" producing $table{ACB1|123456}
Interpolation is a feature of the Perl parser. When you write something like
"foo $bar baz"
, Perl compiles it into something like
'foo ' . $bar . ' baz'
It does not interpret data at runtime.
What you have is a string where one of the characters happens to be $ but that has no special effect.
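A small sketch of that difference, using a made-up %row:

```perl
use strict;
use warnings;

my %row = (gene => 'ACB1');

my $interpolated = "$row{gene}";    # double quotes: compiled as a lookup of $row{gene}
my $literal      = '$row{gene}';    # single quotes: just ten characters, starting with "$"

# Putting $literal inside another double-quoted string does not re-scan its contents.
my $again = "$literal";

print "$interpolated\n";
print "$again\n";
```

The second print shows the text $row{gene} itself, which is exactly what the subroutine in the question sees.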
There are at least two possible ways to do something like what you want. One of them is to use a function, not a string. (Which makes sense because interpolation really means concatenation at runtime, and the way to pass code around is to wrap it in a function.)
my $lofrow = tablehash($lof_file, sub { my ($row) = @_; $row->{gene} }, "transcript", "ENST");
sub tablehash {
my ($file, $mkuid, $headstart, $rowstart, $colnum) = @_;
...
my $uid = $mkuid->(\%row);
$table{$uid} = { %row };
Here $mkuid isn't a string but a reference to a function that (given a hash reference) returns a uid string. tablehash calls it, passing a reference to %row to it. You can then later change it to e.g.
my $lofrow = tablehash($lof_file, sub { my ($row) = @_; "$row->{gene}|$row->{rsid}" }, "transcript", "ENST");
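Here is a self-contained sketch of the callback idea. index_rows is a hypothetical, simplified stand-in for tablehash that takes rows directly instead of reading a file, and the second row is invented sample data:

```perl
use strict;
use warnings;

# Simplified stand-in for tablehash: takes rows directly instead of reading a file.
sub index_rows {
    my ($mkuid, @rows) = @_;
    my %table;
    for my $row (@rows) {
        my $uid = $mkuid->($row);      # call the code reference to build the key
        $table{$uid} = { %$row };      # store a copy of the row under that key
    }
    return \%table;
}

my @rows = (
    { gene => 'ACB1', rsid => '123456' },
    { gene => 'XYZ2', rsid => '654321' },   # made-up second row
);

my $by_gene = index_rows(sub { $_[0]{gene} }, @rows);
my $by_both = index_rows(sub { "$_[0]{gene}|$_[0]{rsid}" }, @rows);

print join(', ', sort keys %$by_gene), "\n";
print join(', ', sort keys %$by_both), "\n";
```

The caller decides how the key is built, and the subroutine never needs to know which columns are involved.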
Another solution is to use what amounts to a template string:
my $lofrow = tablehash($lof_file, "gene|rsid", "transcript", "ENST");
sub tablehash {
my ($file, $uid_template, $headstart, $rowstart, $colnum) = @_;
...
(my $uid = $uid_template) =~ s/(\w+)/$row{$1}/g;
$table{$uid} = { %row };
The s/// code goes through the template string and manually replaces every word by the corresponding value from %row.
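Run on its own with a sample row (the values are taken from the question's ACB1|123456 example), the substitution behaves like this:

```perl
use strict;
use warnings;

my %row = (gene => 'ACB1', rsid => '123456');   # sample row values
my $uid_template = 'gene|rsid';

# Copy the template, then replace each word with the matching column value.
(my $uid = $uid_template) =~ s/(\w+)/$row{$1}/g;

print "$uid\n";
```

A word in the template that is not a column name would substitute undef and trigger a warning, so the template should only contain header names.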
Random notes:
Bonus points for using strict and warnings.
if (!$colnum) { $colnum = 0; } can be simplified to $colnum ||= 0;.
Use lexical variables instead of bareword filehandles. Barewords are effectively global variables (and syntactically awkward because they're not first-class citizens of the language).
Always use the 3-argument form of open to avoid unexpected interpretation of the second argument.
Include the name of your program in error messages (either explicitly with $0 or implicitly by omitting \n from die).
my @foo = (); my %bar = (); is redundant and can be simplified to my @foo; my %bar;. Arrays and hashes start out empty; overwriting them with an empty list is pointless.
chomp(my $line = <INA>) will throw a warning when you reach EOF (because you're trying to chomp a variable containing undef).
my %row; should probably be declared inside the loop. It looks like it's supposed to only contain values from the current line.
Suggestion:
open my $fh, '<', $file or die "$0: can't open $file: $!\n";
while (my $line = readline $fh) {
chomp $line;
...
}

how to read a line and save multiple parameters into variables separated by ;?

So let's say I have a file.txt, and this document's syntax is like this:
"1;22;333;'4444';55555",
I now want my code to do the following:
open the file (already done)
read a line and save each parameter separated by ; into a variable, like ( $one = 1, $two = 22, $three = 333, $four = '4444', $five = 55555; )
write the variables into a DB (that's done already)
loop until all lines of the file are done
So I actually need help with step 2; I think I am able to do the loop and the DB code. Do you have any ideas or tips on how I could do this? A beginner-friendly answer would be nice so I can learn from it.
foreach $file (@file){
$currentfile = "$currentdir\\$file";
open(my $reader, "<", $currentfile) or die "Failed to open file: $!\n";
?????
close $reader;
}
If you're just doing 'numbered fields' then you should be thinking 'array':
use Data::Dumper;
while ( <$reader> ) {
chomp;
my @row = split /;/;
print Dumper \@row;
}
This will give you an array that you can access - e.g. $row[0] for the first element.
$VAR1 = [
'1',
'22',
'333',
'\'4444\'',
'55555'
];
If you know what the headers are 'named' and prefer to work on names you can do something similar with a hash:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my #cols = qw ( id value fish name sprout );
while ( <DATA> ) {
my %row;
chomp;
@row{@cols} = split /;/;
print Dumper \%row;
}
__DATA__
1;22;333;'4444';55555
This gives instead:
$VAR1 = {
'fish' => '333',
'name' => '\'4444\'',
'id' => '1',
'value' => '22',
'sprout' => '55555'
};
Note - hashes are unordered, but their whole point is that you don't need to care about the 'order' - just print $row{name},"\n";
You need to read from the filehandle $reader, line by line. See the tutorial perlopentut and the full reference open.
Then you split each line by the separator ;, which returns a list that you assign to an array.
open my $reader, "<", $currentfile or die "Failed to open file: $!\n";
while (my $line = <$reader>) {
chomp($line);
my @params = split ';', $line;
# do something with @params, it will be overwritten on next iteration
}
close $reader;
The diamond operator <> reads from a filehandle, <$fh>, returning a line at a time. See about it in perlop. When there are no more lines it returns undef and looping stops. You may assign the string that it returns to a variable which you declare (my $line), which then exists only within the body of the while loop. If you don't, but do while (<$fh>) instead, the line is assigned to the special variable $_, which is default for many things in Perl.
The chomp removes the linefeed (new line) from the end of the line.
Note that '4444' from your example is not a number and cannot be used as such.
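If the quoted field should be usable as a number, one option is to strip the surrounding quotes after splitting. This is only a sketch and assumes the quotes are always single quotes at the very start and end of a field:

```perl
use strict;
use warnings;

my $line   = "1;22;333;'4444';55555";
my @params = split /;/, $line;

# Remove one leading and one trailing single quote from each field, if present.
s/^'|'$//g for @params;

my $sum = $params[3] + 1;
print "$params[3] + 1 = $sum\n";
```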
Alternatively, you can take a reference to the array with parameters on each line, and put it in another array which thus will in the end contain all lines.
my @all_params;
while (my $line = <$reader>) {
chomp $line;
my @params = split ';', $line;
push @all_params, \@params;
}
Now @all_params has elements which are references, each to an array with parameters for one line. For how to work with references see the tutorial perlreftut and the Cookbook on complex data structures, perldsc.
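As a rough illustration of getting the data back out, here is a sketch with a hard-coded stand-in for the structure the loop builds (the second row is invented):

```perl
use strict;
use warnings;

# Stand-in for what the loop builds: one array reference per line.
my @all_params = (
    [ 1, 22, 333, "'4444'", 55555 ],
    [ 2, 33, 444, "'5555'", 66666 ],   # made-up second line
);

# One element: row 0, column 2.
my $cell = $all_params[0][2];

# Every row: dereference with @$row (or @{ $row }) to get the fields back.
my @joined;
for my $row (@all_params) {
    push @joined, join ';', @$row;
}
print "$cell\n";
print "$_\n" for @joined;
```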
The following is more complex but let me mention it since it's a bit of an idiom. You can do the above in one statement
my @all_params = map { [ split ';', $_ ] } <$reader>;
This uses map, which applies the code in { ... } to each element of the list that is submitted to it, returning a list. So it takes a list and returns the processed list. The [...] inside makes an anonymous array, equivalent to the reference we took of an array previously. The filehandle read <$reader> returns all lines of the file in one list when invoked in list context, which in this case is imposed by map (since it must receive a list).
An important one: always start your programs with
use warnings 'all';
use strict;
The order of these doesn't really matter. Mostly you'll see use strict; first.
Then your loop over filenames needs to be foreach my $file (@file) { ... } and you must declare all variables, so my $currentfile = ....

Perl - Using an Array on a Config txt file

So I have a text file with four sets of data on a line, such as aa bb username password. So far I have been able to parse through the first line of the file using substrings and indices, and assigning each of the four to variables.
My goal is to use an array, go through each line and assign its fields to the four variables, then match a user-supplied argument against the first variable and use the four variables from the matching line.
For example, this would be the text file:
"aa bb cc dd"
"ee ff gg hh"
And depending on whether the user inputs "aa" or "ee" as the argument, it would use that line's set of arguments in the code.
Essentially, I am trying to set up a basic array and go through it based on a condition on the first variable.
Here is my code for the four variables for the first line, but like I said, this only works for the first line in the text file:
local $/;
open(FILE, $configfile) or die "Can't read config file 'filename' [$!]\n";
my $document = <FILE>;
close (FILE);
my $string = $document;
my $substring = " ";
my $Index = index($string, $substring);
my $aa = substr($string, 0, $Index);
my $newstring = substr($string, $Index+1);
my $Index2 = index($newstring, $substring);
my $bb = substr($newstring, 0, $Index2);
my $newstring2 = substr($newstring, $Index2+1);
my $Index3 = index($newstring2, $substring);
my $cc = substr($newstring2, 0, $Index3);
my $newstring3 = substr($newstring2, $Index3+1);
my $Index4 = index($newstring3, $substring);
my $dd = substr($newstring3, 0, $Index4);
First of all, you can parse your whole line using split instead of running index and substring on them:
my ( $aa, $bb, $cc, $dd ) = split /\s+/, $line;
Even better, use an array:
my @array = split /\s+/, $line;
I think you're saying that you need to store each array of command parts into another array of lines. Is that correct? Take a look at this tutorial on references available in the Perl Documentation.
Perl has three different types of variables: scalars, arrays, and hashes. The problem is that each slot in any of them stores only a single piece of data. Arrays and hashes may store lots of data, but only one piece of data can be stored in each element of a hash or array.
References allow you to get around this limitation. A reference is simply a pointer to another piece of data. For example, if $line is "aa bb cc dd", doing this:
my @command_list = split /\s+/, $line;
Will give you the following:
$command_list[0] = "aa";
$command_list[1] = "bb";
$command_list[2] = "cc";
$command_list[3] = "dd";
You want to store @command_list into another structure. What you need is a reference to @command_list. To get a reference to it, you merely put a backslash in front of it:
my $reference = \@command_list;
This could be put into an array:
my @array;
$array[0] = $reference;
Now, I'm storing an entire array into a single element of an array.
To get back to the original structure from the reference, you put the correct sigil in front. Since this is an array, you put @ in front of it:
my @new_array = @{ $reference };
If you want the first item in your reference without having to copy it into another array, you could simply treat @{ $reference } as an array itself:
${ $reference }[0] = "aa";
Or, use the magic -> which makes the syntax a bit cleaner:
$reference->[0] = "aa";
Go through the tutorial. This will help you understand the full power of references, and how they can be used. Your program would look something like this:
use strict;
use warnings;
use feature qw(say); #Better print than print
use autodie; #Kills your program if the file can't be open
my $file = [...] #Somehow get the file you're reading in...
open my $file_fh, "<", $file;
my @command_list;
while ( my $line = <$file_fh> ) {
chomp $line;
my @line_list = split /\s+/, $line;
push @command_list, \@line_list;
}
Note that push @command_list, \@line_list; is pushing a reference to one array into another. How do you get it back out? Simple:
for my $cmd_line_ref ( @command_list ) {
my $command = $cmd_line_ref->[0]; #This is the first element in your command
next unless $command eq $user_desires; # However you figure out what the user wants
my $line = join " ", @{ $cmd_line_ref }; #Rejoins your command line once again
??? #Profit
}
Read the tutorial on references, and learn about join and split.
You are reading the whole file in the my $document = <FILE> line.
Try something like:
my @lines;
open my $file, '<', $configfile or die 'xxx';
while( <$file> ) {
chomp;
push @lines, [ split ]
}
And now @lines has an array of arrays with the information you need.
(EDIT) don't forget to lose the local $/; -- it's what is making you read the whole file at once.
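Once the lines are parsed, picking the row that matches the user's argument could be done with a hash keyed on the first field. This is a sketch under assumptions: the rows are hard-coded stand-ins for the parsed file, and $wanted and %by_key are names invented for the example.

```perl
use strict;
use warnings;

# Stand-in for the parsed config: one array reference per line.
my @lines = (
    [qw(aa bb cc dd)],
    [qw(ee ff gg hh)],
);

# Index the rows by their first field for direct lookup.
my %by_key = map { $_->[0] => $_ } @lines;

my $wanted = 'ee';   # hypothetical; would normally come from @ARGV
my ($key, $second, $user, $pass) = @{ $by_key{$wanted} || [] };
print "$key $second $user $pass\n" if defined $key;
```

If two lines share the same first field, the later one wins, so this only fits files where that field is unique.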
my $document = <FILE> is reading in only the first line. Try using a while loop.
If you want to read all lines of the file at once - assuming it's a small file - you may want to use File::Slurp module:
use File::Slurp;
my @lines = File::Slurp::read_file($configfile);
foreach my $line (@lines) {
# do whatever
}
Also, you can use CPAN modules to split the strings into fields.
If they are single-space separated, simply read the whole file using a standard CSV parser (you can configure Text::CSV_XS to use any character as separator). Example here: How can I parse downloaded CSV data with Perl?
If they are separated by a random amount of whitespace, follow @massa's advice and use the split function.

How to parse multiple line, fixed-width file in perl?

I have a file that I need to parse in the following format. (All delimiters are spaces):
field name 1:            Multiple word value.
field name 2:            Multiple word value along
                         with multiple lines.
field name 3:            Another multiple word
                         and multiple line value.
I am familiar with how to parse a single line fixed-width file, but am stumped with how to handle multiple lines.
#!/usr/bin/env perl
use strict; use warnings;
my (%fields, $current_field);
while (my $line = <DATA>) {
next unless $line =~ /\S/;
if ($line =~ /^ \s+ ( \S .+ )/x) {
if (defined $current_field) {
$fields{ $current_field} .= $1;
}
}
elsif ($line =~ /^(.+?) : \s+ (.+) \s+/x ) {
$current_field = $1;
$fields{ $current_field } = $2;
}
}
use Data::Dumper;
print Dumper \%fields;
__DATA__
field name 1:            Multiple word value.
field name 2:            Multiple word value along
                         with multiple lines.
field name 3:            Another multiple word
                         and multiple line value.
Fixed-width says unpack to me. It is possible to parse with regexes and split, but unpack should be a safer choice, as it is the Right Tool for fixed width data.
I put the width of the first field to 12 and the empty space between to 13, which works for this data. You may need to change that. The template "A12A13A*" means "find 12 then 13 ascii characters, followed by any length of ascii characters". unpack will return a list of these matches. Also, unpack will use $_ if a string is not supplied, which is what we do here.
Note that if the first field is not fixed width up to the colon, as it appears to be in your sample data, you'll need to merge the fields in the template, e.g. "A25A*", and then strip the colon.
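To make the two templates concrete, here is a sketch on a single line. The layout (a 13-character label including the colon, then 12 spaces) is an assumption about the original spacing:

```perl
use strict;
use warnings;

# 13-character label (including the colon), 12 spaces, then the value;
# this spacing is an assumption about the original layout.
my $line = 'field name 1:' . (' ' x 12) . 'Multiple word value.';

# Three fields: 12-char name, 13-char gap (colon plus padding), the rest.
my ($name, $gap, $text) = unpack 'A12 A13 A*', $line;

# Two fields: everything up to column 25, then the rest; strip the colon by hand.
my ($label, $value) = unpack 'A25 A*', $line;
$label =~ s/:$//;

print "[$name] [$gap] [$text]\n";
print "[$label] [$value]\n";
```

The A format drops trailing whitespace from each field, which is why the padding never shows up in the results.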
I chose array as the storage device, as I do not know if your field names are unique. A hash would overwrite fields with the same name. Another benefit of an array is that it preserves the order of the data as it appears in the file. If these things are irrelevant and quick lookup is more of a priority, use a hash instead.
Code:
use strict;
use warnings;
use Data::Dumper;
my $last_text;
my @array;
while (<DATA>) {
# unpack the fields and strip spaces
my ($field, undef, $text) = unpack "A12A13A*";
if ($field) { # If $field is empty, that means we have a multi-line value
$field =~ s/:$//; # strip the colon
$last_text = [ $field, $text ]; # store data in anonymous array
push @array, $last_text; # and store that array in @array
} else { # multi-line values get added to the previous lines data
$last_text->[1] .= " $text";
}
}
print Dumper \@array;
__DATA__
field name 1:            Multiple word value.
field name 2:            Multiple word value along
                         with multiple lines.
field name 3:            Another multiple word
                         and multiple line value
                         with a third line
Output:
$VAR1 = [
[
'field name 1',
'Multiple word value.'
],
[
'field name 2',
'Multiple word value along with multiple lines.'
],
[
'field name 3',
'Another multiple word and multiple line value with a third line'
]
];
You could do this:
#!/usr/bin/perl
use strict;
use warnings;
my @fields;
open(my $fh, "<", "multi.txt") or die "Unable to open file: $!\n";
for (<$fh>) {
if (/^\s/) {
$fields[$#fields] .= $_;
} else {
push @fields, $_;
}
}
close $fh;
If the line starts with white space, append it to the last element in @fields, otherwise push it onto the end of the array.
Alternatively, slurp the entire file and split with look-around:
#!/usr/bin/perl
use strict;
use warnings;
$/=undef;
open(my $fh, "<", "multi.txt") or die "Unable to open file: $!\n";
my @fields = split /(?<=\n)(?!\s)/, <$fh>;
close $fh;
It's not a recommended approach though.
You can change delimiter:
$/ = "\nfield name";
while (my $line = <FILE>) {
if ($line =~ /(\d+)\s+(.+)/) {
print "Record $1 is $2";
}
}