Try to remove specific columns using splice in Perl - perl

I am a brand new Perl novice, looking for help with my first ever Perl script
I have some huge files 30-50GB files and they are constructed like this - millions of columns and thousands of rows:
A B C D E 1 2 3 4 5 6 7 8 9 10
A B C D E 1 2 3 4 5 6 7 8 9 10
A B C D E 1 2 3 4 5 6 7 8 9 10
A B C D E 1 2 3 4 5 6 7 8 9 10
A B C D E 1 2 3 4 5 6 7 8 9 10
A B C D E 1 2 3 4 5 6 7 8 9 10
A B C D E 1 2 3 4 5 6 7 8 9 10
I would like to delete column "A", and column "C", then ever third of the number columns, so the "3" column and the "6" column, then "9" column until the end of the file. Space delimited.
My attempt is like this:
#!/usr/local/bin/perl
use strict;
use warnings;
my #dataColumns;
my $dataColumnCount;
if(scalar(#ARGV) != 2){
print "\nNo files supplied, please supply file name\n";
exit;
}
my $Infile = $ARGV[0];
my $Outfile = $ARGV[1];
open(INFO,$Infile) || die "Could not open $Infile for reading";
open(OUT,">$Outfile") || die "Could not open $Outfile for writing";
while (<INFO>) {
chop;
#dataColumns = split(" ");
$dataColumnCount = #dataColumns + 1;
#Now remove the first element of the list
shift(#dataColumns);
#Now remove the third element (Note that it is now the second - after removal of the first)
splice(#dataColumns,1,1); # remove the third element (now the second)
#Now remove the 6th (originally the 8th) and every third one thereafter
#NB There are now $dataColumnCount-1 columns
for (my $i = 5; $i < $dataColumnCount-1; $i = $i + 3 ) {
splice($dataColumns; $i; 1);
}
#Now join the remaining elements of the list back into a single string
my $AmendedLine = join(" ",#dataColumns);
#Finally print out the line into your new file
print OUT "$AmendedLine/n";
}
But I am getting a few weird errors:
It is saying it doesn't like my $1 in the for loop, I have added a 'my' which seems to make the error go away but nobody else's for code seems to contain a 'my' here so I am not sure what is going on.
Global symbol "$i" requires explicit package name at Convertversion2.pl line 36.
Global symbol "$i" requires explicit package name at Convertversion2.pl line 36.
Global symbol "$i" requires explicit package name at Convertversion2.pl line 36.
Global symbol "$i" requires explicit package name at Convertversion2.pl line 36.
The other error is this:
syntax error at Convertversion2.pl line 37, near "#dataColumns;"
syntax error at Convertversion2.pl line 37, near "1)"
I am not sure how to correct this error, I think I am almost there, but not sure what exactly what the syntax error is, is am unsure how to fix it.
Thank you in advance.

After I blogged about this question, a commenter pointed out that it is possible to reduce run time by 45% for my test case. I paraphrased his code a little bit:
my #keep;
while (<>) {
my #data = split;
unless (#keep) {
#keep = (0, 1, 0, 1, 1);
for (my $i = 5; $i < #data; $i += 3) {
push #keep, 1, 1, 0;
}
}
my $i = 0;
print join(' ', grep $keep[$i++], #data), "\n";
}
This runs in almost half the time my original solution took:
$ time ./zz.pl input.data > /dev/null
real 0m21.861s
user 0m21.310s
sys 0m0.280s
Now, it is possible to gain another 45% performance by using Inline::C in a rather dirty way:
#!/usr/bin/env perl
use strict;
use warnings;
use Inline C => <<'END_C'
/*
This code 'works' only in a limited set of circumstances!
Don't expect anything good if you feed it anything other
than plain ASCII
*/
#include <ctype.h>
SV *
extract_fields(char *line, AV *wanted_fields)
{
int ch;
IV current_field = 0;
IV wanted_field = -1;
unsigned char *cursor = line;
unsigned char *field_begin = line;
unsigned char *save_field_begin;
STRLEN field_len = 0;
IV i_wanted = 0;
IV n_wanted = av_len(wanted_fields);
AV *ret = newAV();
while (i_wanted <= n_wanted) {
SV **p_wanted = av_fetch(wanted_fields, i_wanted, 0);
if (!(*p_wanted)) {
croak("av_fetch returned NULL pointer");
}
wanted_field = SvIV(*p_wanted);
while ((ch = *(cursor++))) {
if (!isspace(ch)) {
continue;
}
field_len = cursor - field_begin - 1;
save_field_begin = field_begin;
field_begin = cursor;
current_field += 1;
if (current_field != wanted_field) {
continue;
}
av_push(ret, newSVpvn(save_field_begin, field_len));
break;
}
i_wanted += 1;
}
return newRV_noinc((SV *) ret);
}
END_C
;
And, here is the Perl part. Note that we split only once to figure out the indices of fields to keep. Once we know those, we pass the line and the (1-based) indices to the C routine to slice and dice.
my #keep;
while (my $line = <>) {
unless (#keep) {
#keep = (2, 4, 5);
my #data = split ' ', $line;
push #keep, grep +(($_ - 5) % 3), 6 .. scalar(#data);
}
my $fields = extract_fields($line, \#keep);
print join(' ', #$fields), "\n";
}
$ time ./ww.pl input.data > /dev/null
real 0m11.539s
user 0m11.083s
sys 0m0.300s
input.data was generated using:
$ perl -E 'say join(" ", "A" .. "ZZZZ") for 1 .. 100' > input.data
and it is about 225MB in size.

The code you show doesn't produce those errors. You have no $1 in there at all, and if you meant $i then your use of that variable is fine. The only syntax error is in the line splice($dataColumns; $i; 1) which has semicolons instead of commas, and uses $dataColumns instead of #dataColumns.
Apart from that
It is good practice to declare variables as close as possible to their point of use, not at the top of the program.
Capital letters are generally used for constants like package names. You should use lower case, digits and underscore for variables.
Are you aware you are setting $dataColumnCount to one more than the number of elements in #dataColumns?
It is frowned on more recently to use global file handles - you should use lexical variables instead.
I suggest this refactoring of your program. It uses autodie to avoid having to check the success of the open calls. It builds a list of array indices that need deleting as soon as it can: once the number of fields in each line is known after the first record is read. Then it deletes them from the end backwards to avoid having to do arithmetic on the indices as preceding elements are removed.
#!/usr/local/bin/perl
use strict;
use warnings;
use autodie;
if (#ARGV != 2) {
die "\nNo files supplied, please supply file names\n";
}
my ($infile, $outfile) = #ARGV;
open my $info, '<', $infile;
open my $out, '>', $outfile;
my #remove;
while (<$info>) {
my #data = split;
unless (#remove) {
#remove = (0, 2);
for (my $i = 7; $i < #data; $i += 3) {
push #remove, $i;
}
}
splice #data, $_, 1 for reverse #remove;
print $out join(' ', #data), "\n";
}

While the other answers above work perfectly, and mine probably doesn't present any advantage, this is a different way of achieving the same while avoiding split:
#!/usr/local/bin/perl
use strict;
use warnings;
use feature 'say';
my $dir='D:\\';
open my $fh,"<", "$dir\\test.txt" or die;
while (<$fh>) {
chomp;
my #fields = split ' ';
print "$fields[0] $fields[2] ";
for (my $i=7; $i <= $#fields; $i += 3){
print "$fields[$i] ";
}
print "\n";
}
close $fh;
Please let me know if this is useless.

Related

Issue with nested loop

I got file called numbers.txt which is basically line with 5 numbers:
they look like this:
1 2 3 4 5
What I'm trying to achieve is I want to read those numbers from the line (which already works), then in each iteration I want to add +1 to every number which was read from that file and print them on screen with print, so the final result should look like:
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
.
#!/usr/bin/perl
use strict;
use warnings;
open("handle", 'numbers.txt') or die('unable to open numbers file\n');
$/ = ' ';
OUT: for my $line (<handle>) {
for (my $a = 0; $a < 5; $a++) {
chomp $line;
$line += 1;
print "$line ";
next OUT;
}
}
close("handle");
Haven't done looping in perl for a while now and would be great if someone could provide working example.
Also, it would be great if you could provide more than one working example, just to be future proof ;)
Thanks
You can try this on for size.
#!/usr/bin/perl
use strict;
use warnings;
open("handle", 'numbers.txt') or die('unable to open numbers file\n');
for my $line (<handle>) {
chomp $line;
for my $number (split /\s+/, $line) {
for (my $a = $number; $a < $number+5; $a++) {
print "$a ";
}
print "\n";
}
}
close("handle");
You can dispense with $/=' ' and instead let the outer loop iterate on lines of the file.
For each line you want to iterate for each number which is separated by white space, thus the split /\s+/, $line which gives you a list of numbers for the inner loop.
For your output $a starts at the number read from the file.
This will do what you're after:
use strict;
use warnings;
while(<DATA>) {
chomp;
print "$_\n";
my #split = split;
my $count = 0;
for (1..4){
$count++;
foreach (#split){
my $num = $_ + $count;
print "$num ";
}
print "\n";
}
}
__DATA__
1 2 3 4 5
Here no need to use nested loop it's always program make slower.
#!/usr/bin/perl
use strict;
use warnings;
my #num = split(" ",(<DATA>)[0]);
foreach my $inc (0..$#num)
{
print map{$inc+$_," "}#num; # Add one by one in array element
print "\n";
}
__DATA__
1 2 3 4 5
Update Added another method, this one in line with the posted approach.
Increment each number in the string, changing the string in place. Repeat that. Below are two ways to do that. Yet another method reads individual numbers and prints following integer sequences.
(1) With regular expressions. It also fits in one-liner
echo "1 2 3 4 5" | perl -e '$v = <>; for (1..5) { print $v; $v =~ s/(\d+)/$1+1/eg; }'
This prints the desired output. But better put it in a script
use warnings;
use strict;
my $file = 'numbers.txt';
open my $fh, '<', $file or die "can't open $file: $!";
while (my $line = <$fh>) {
# Add chomp($line) if needed for some other processing.
for (1..5) {
print $line;
$line =~ s/(\d+)/$1+1/eg;
}
}
The /e modifier is crucial for this. It makes the replacement side of the regex be evaluated as code instead of as a double-quoted string. So you can actually execute code there and here we add to the captured number, $1+1, for each matched number as /g moves down the string. This changes the string so the next iteration of the for (1..5) increments those, etc. I match multiple digits, \d+, which isn't necessary in your example but makes far more sense in general.
(2) Via split + map + join, also repeatedly changing the line in place
while (my $line = <$fh>) {
for (1..5) {
print $line;
$line = join ' ', map { $_+1 } split '\s+', $line;
}
}
The split gets the list of numbers from $line and feeds it to map, which increments each, feeding its output list to join. The joined string is assigned back to $line, and this is repeated. I split by \s+ to allow multiple white space but this makes it very 'relaxed' in what input format it accepts, see perlrecharclass. If you know it's one space please change that to ' '.
(3) Take a number at a time and print the integer sequence starting from it.
open my $fh, '<', $file or die "can't open $file: $!";
local $/ = ' ';
while (my $num = <$fh>) {
print "$_ " for $num..$num+4;
print "\n";
}
The magical 4 can be coded by pre-processing the whole line to find the sequence length, say by
my $len = () = $line =~ /(\d+)/g;
or by split-ing into an array and taking its scalar, then using $len-1.
Additional comments.
I recommend the three-argument open, open my $fh, '<', $file
When you check a call print the error, die "Your message: $!", to see the reason for failure. If you decide to quit, if ($bad) { die "Got $bad" }, then you may not need $!. But when an external call fails you don't know the reason so you need the suitable error variable, most often $!.
Your program has a number of problems. Here is what's stopping it working
You are setting the record separator to a single space. Your input file contains "1 2 3 4 5\n", so the while loop will iterate five times setting $line to "1 ", "2 ", "3 ", "4 ", "5\n"
Your for loop is set up to iterate five times. It does chomp $line which removes the space after the number, then increments $line and prints it. Then you jump out of the for loop, having executed it only once, with next OUT. This results in each value in the file being incremented by one and printed, so you get 2 3 4 5 6
Removing the unnecessary next OUT, produces something closer
2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9 6 7 8 9 10
There are now five numbers being printed for each number in the input file
Adding print "\n" after the for loop help separate the lines
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
6 7 8 9 10
Now we need to print the number before it is incremented instead of afterwards. If we swap $line += 1 and print "$line " we get this
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5
6 7 8 9
What is happening here is that the 5 is still followed be a newline, which now appears in the output. The chomp won't remove this because it removes the value of $/ from the end of a string. You've set that to a space, so it will remove only spaces. The fix is to replace chomp with a substitution s/\s+//g which removes *all whitespace from the string. You also need to do that only once so I've put it outside the for loop at the top
Now we get this
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
And this is your code as it ended up
use strict;
use warnings;
open( "handle", 'numbers.txt' ) or die('unable to open numbers file\n');
$/ = ' ';
for my $line (<handle>) {
$line =~ s/\s+//g;
for ( my $a = 0; $a < 5; $a++ ) {
print "$line ";
$line += 1;
}
print "\n";
}
close("handle");
There are a few other best practices that could improve your program
Use use warnings 'all'
Use lexical file handles, and the three-parameter form of open
Use local if you are changing Perl's built-in variables
Put $! into your die string so that you know why the open failed
Avoid the C-style for loop, and iterate over a list instead
Making these fixes as well looks like this. The output is identical to the above
use strict;
use warnings 'all';
open my $fh, '<', 'numbers.txt'
or die qq{Unable to open "numbers.txt" for input: $!};
local $/ = ' ';
for my $line ( <$fh> ) {
$line =~ s/\s+//g;
for my $a ( 0 .. 4 ) {
print "$line ";
++$line;
}
print "\n";
}

Iterate a simple program

I have the following file:
a b 5
c d 6
e f 7
g h 4
i j 3
k l 10
and I want to find which line present the minimum value in the third column and erase it from the initial file. After this, I want to iterate again the program and find again which line present the minimum and make the same thing for 2 more times.
The output file should be
c d 6
e f 7
k l 10
I tried to write the following code:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file1 = "try.dat";
open my $fg, "<", $file1 or die "Error during $file1 opening: $!";
my #vector;
while (<$fg>) {
push #vector, [ split ];
}
my $minimum = $vector[0][2];
my #blacklist;
for my $i (0 .. $#vector) {
if ($minimum > $vector[$i][2]){
$minimum = $vector[$i][2];
push #blacklist, #vector[$i+1];
}
}
#say "my minimum is $minimum";
#say "the blacklist is composed by #blacklist";
I don't know how to erase the elements contained by the #blacklist (that in the first case should be i j 3) and how to iterate everything.
Any help for the iteration?
This sort of thing is what Tie::File was made for. It allows you to modify the file in-place by modfying a tied array.
This program does what you want. The helper function minidx returns the first index of the element of the passed array that holds the smallest value.
The program works by copying the third field of the file records into array #field3, and finding the index of the smallest value in there. The element at that index is then deleted from both the file and #field3 using splice.
use strict;
use warnings;
use Tie::File;
tie my #file, 'Tie::File', 'file.txt' or die $!;
my #field3 = map { (split)[2] } #file;
for (1 .. 3) {
my $i = minidx(\#field3);
splice #file, $i, 1;
splice #field3, $i, 1;
}
sub minidx {
my ($arr) = #_;
my ($i, $v);
for (0 .. $#$arr) {
($i, $v) = ($_, $arr->[$_]) unless defined $v and $arr->[$_] >= $v;
}
return $i;
}
output
c d 6
e f 7
k l 10
When you say #blacklist = #vector you are adding the entire #vector array to the black list. You probably want to do a push #blacklist, $vector[$i]. That will push the array reference into blacklist.
Now, blacklist has an array ref in it, so you have to deference it to print it.
say "the blacklist is composed by #{$blacklist[0]}";
Edit: For iterating and writing:
I would skip the #blacklist array (unless you need it for something else) and remove the min values from #vector. Then you can write #vector to some file.
my $num_elts_to_remove = 3;
for (my $j = 0; $j < $num_elts_to_remove; $j++) {
my $minimum = $vector[0][2];
my $min_idx = 0;
for my $i (0 .. $#vector) {
if ($minimum > $vector[$i][2]){
$minimum = $vector[$i][2];
$min_idx = $i;
}
}
push #blacklist, $vector[$min_index];
splice #vector, $min_idx, 1; #remove array with smallest value
}
Now write the array to a file
open OUT, ">", $outfile or die "Error: $!";
foreach(#vector) {
print OUT join " ", #$_;
print OUT "\n";
}
close(OUT);
Prints:
c d 6
e f 7
k l 10
Taking Borodin's Tie::File suggestion even further. I have written a cute module called Tie::Array::CSV which allow you to treat a delimited file as an array (and because it uses Tie::File underneath, it is both read and write). Because of this I can use Perlish operations like map and sort (and Schwartzian transform!) to perform this task:
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::Array::CSV;
tie my #data, 'Tie::Array::CSV', 'data', sep_char => ' ';
# get a list of row ids sorted by last value (inc)
my $i = 0;
my #sorted =
map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [$i++, $_->[-1]] }
#data;
#splice the rows by index of the lowest three (from bottom to top)
splice #data, $_, 1 for reverse sort #sorted[0..2];
Note that in the end you want to remove rows from the bottom so that you don't have to reindex every time.

$array can't print anything

This is my program , I want to let user type a matrix line by line and print the while matrix , but I can't see the matrix
The user will type
1 2 3
4 5 6
7 8 9
like this
and I want to let it show
1 2 3
4 5 6
7 8 9
Perl program
$Num = 3;
while($Num > 0 )
{
$Row = <STDIN>;
$Row = chomp($Row);
#Row_array = split(" ",$Row);
push #P_matrix , #Row_array;
#Row_array = ();
$Num = $Num - 1;
}
for($i=0;$i<scalar(#P_matrix);$i++)
{
for($j=0;$j<scalar(#P_matrix[$i]);$j++)
{
printf "$d ",$P_matrix[$i][$j];
}
print "\n";
}
I change the expression => printf "$d ",$P_matrix[$i][$j]; to print $P_matrix[$i][$j]
but still don't work.
To create a multi-dimensional array, you have to use references. Use
push #P_matrix, [ #Row_array ];
to create the desired structure.
Also, chomp does not return the modified string. Simply use
chomp $Row;
to remove a newline from $Row. Moreover, chomp is not needed at all if you split on ' '.
printf uses % as the formatting character, not $.
You can use Data::Dumper to inspect complex data structures. Use strict and warnings to help you avoid common problems. Here is how I would write your program:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my #p_matrix;
push #p_matrix , [ split ' ' ] while <>;
warn Dumper \#p_matrix;
for my $i (0 .. $#p_matrix)
{
for my $j (0 .. $#{ $p_matrix[$i] })
{
printf '%d ', $p_matrix[$i][$j];
}
print "\n";
}
First and foremost please use use strict; use warnings;
Issues in your code:
You have a single dimensional array, but your are trying to access
it like two dimensional array. In order to make 2 dimensional array push the array reference of Row_array in #P_matrix as [#Row_array].
Where is $d defined? declare $d as my $d or our $d if you mean $d as scalar variable.
OR
For using %d, use need sprintf. Please read this.

Perl script to extract 2 lines before and after the pattern matching

my file is like
line 1
line 2
line 3
target
line 5
line 6
line 7
I can write a regex that matches the target. What all I need is I need to grab lines 2,3,5,6.
Is there any way to do it?
If you're not determined to use perl you can easily extract the context you want with grep and Context Line Control options
grep -A 2 -B 2 target filename | grep -v target
Of course target will need to be replaced by a suitable regex.
Robert is on the right path. You have to multiline your regex and match the 2 previous and next lines:
#!/usr/bin/perl -w
my $lines = <<EOF
line 1
line 2
line 3
target
line 5
line 6
line 7
EOF
;
# Match a new line, then 2 lines, then target, then 2 lines.
# { $1 } { $3 }
my $re = qr/^.*\n((.*?\n){2})target\n((.*?\n){2}).*$/m;
(my $res = $lines) =~ s/$re/$1$3/;
print $res;
#lines = ('line 1', 'line 2', 'line 3', 'target', 'line 5', 'line 6', 'line 7');
my %answer;
$regex = 'target';
for my $idx (0..$#lines) {
if ($lines[$idx] =~ /$regex/) {
for $ii (($idx - 2)..($idx + 2)){
unless ($lines[$ii] =~ /^$regex$/) {$answer{$ii} = $lines[$ii];}
}
}
}
foreach $key (sort keys %answer) { print "$answer{$key}\n" }
Which yields...
[mpenning#Bucksnort ~]$ perl search.pl
line 2
line 3
line 5
line 6
[mpenning#Bucksnort ~]$
EDIT
Fixed for #leonbloy's comment about multiple target strings in the file
slurp the file to a list / array, find the index of the matching line, and use this index to get the desired values (using offsets)
Although this was asked 8 months ago, I had to rethink this question, since none of the findable solution met with my aims. My goal was to make a script which examines many of huge log files, and makes extracts from them, containing only the wanted lines, putting optional number of lines before and after the line which contains the searched pattern(s) WITHOUT any redundancies. I tried to reuse some of the codes found here, but none of them was good enough for me. So finally I create a unique one, which is probably not the most beautiful, but looks useful, so I'd like to share it with you:
use strict;
my #findwhat = ('x');
my $extraLines = 3;
my #cache = ('') x ($extraLines);
my #stack;
my $lncntr = 0;
my $hit = 0;
my $nextHitWatch = 0;
my $shift = 1;
open (IN, "<test1.log");
while (my $line=<IN>) {
$lncntr++;
chomp $line;
foreach my $what (#findwhat) {if ($line =~ m/$what/i) {$hit = 1; last}}
if ($hit && !$nextHitWatch) {
#stack = #cache;
$hit = 0;
$nextHitWatch++;
}
if (!$hit && $nextHitWatch && $nextHitWatch < ($extraLines * 2) + 2) {
#stack = (#stack, $line);
$nextHitWatch++;
}
if (!$hit && $nextHitWatch && $nextHitWatch == ($extraLines * 2) + 2) {
#stack = (#stack, $line);
for (my $i = 0; $i <= ($#stack - ($extraLines + $shift)); $i++) {
print $stack[$i]. "\n" if $stack[$i];
}
$nextHitWatch = 0;
$shift = 1;
#stack = ();
}
if ($nextHitWatch >= 1 && eof) {
foreach(#stack) {print "$_\n"}
}
if ($nextHitWatch >= 1 && eof) {
if (!$hit) {
my $upValue = 3 + $#stack - ($nextHitWatch - $extraLines + $shift);
$upValue = ($upValue > $#stack) ? $#stack : $upValue;
for (my $i = 0; $i <= $upValue; $i++) {
print $stack[$i] . "\n";
}
} else {
foreach (#stack) {print "$_\n"}
}
}
shift(#cache);
push(#cache, $line);
}
close (IN);
Probably, you will have to change only the values of the list #findwhat and the scalar $extraLines. I hope my code will be useable. (Sorry for my poor English)
multiline the regex, eg: /\n{3}(foo)\n{3}/m;
edit
/\n*(foo)\n*/m works in the general case
One liner version (where -l = chomp and -n = while(<>){}. See perldocperlrun for more options):
perl -lnE '$h{$.}=$_; END {
for ( grep { $h{$_} eq "target" } sort{ $a <=> $b } keys %h ) {
say for #h{$_-2..$_-1 , $_+1..$_+2} } }' data.txt
Script with explanation:
#!perl
use feature 'say';
while (<DATA>) {
chomp;
$hash{$.} = $_ ; # hash entry with line number as key; line contents as value
}
# find the target in the hash and sort keys or line numbers into an array
#matches = sort {$a <=> $b} grep { $hash{$_} eq 'target' } keys %hash;
for (#matches) {
say "before\n" ;
say for #hash{$_-2..$_-1} ; # print the context lines as a hash slice
say ">>>>\" $hash{$.} \"<<<< " ;
say "after\n" ;
say for #hash{$_+1..$_+2} ;
say "";
}
__DATA__
line 1
line 2
line 3
target
line 5
line 6
line 7
target
line of context1
line of context2
target
Output:
before
line 2
line 3
>>>>" target "<<<<
after
line 5
line 6
before
line 6
line 7
>>>>" target "<<<<
after
line of context1
line of context2
before
line of context1
line of context2
>>>>" target "<<<<
after
A simpler version using only arrays and with output that excludes the target as the OP question requested:
#!perl -l
chomp( my #lines = <DATA> ) ;
my $n = 2 ; # context range before/after
my #indexes = grep { $lines[$_] =~ m/target/ } 0..$#lines ;
foreach my $i (#indexes) {
print for #lines[$i-$n..$i-1], #lines[$i+1..$i+$n],"";
}
__DATA__
line 1
line 2
line 3
target
line 5
line 6
line 7
target
line of context1
line of context2
target
This avoids constructing the hash but may be slower on very large files/arrays.
On CPAN List::MoreUtils has indexes() and there is always splice(), but I'm not sure these would make things simpler.

Perl multidimensional array question

I have a program that prints out the location of commas in a paragraph of text in the form
For example if the paragraph is
one,two,three
three and a half
four,five
six
seven,eight
The program will print
0:4
0:8
2:5
4:6
I would like to use this output to create an array where the numbers after the colon are listed across columns in the row specified by the index before the colon. The array formed by the coordinates above would be
4 8
<blank or character '.'>
5
<blank or character '.'>
6
so array[0,0] = 4, array[0,1] = 8
array[1,0] = empty
array[2,0] = 5
etc...
I bet this is simple but I need help to write it.
$data_file="file.out";
open(DAT, $data_file) || die("Could not open file!");
#raw_data=<DAT>;
close(DAT);
my %array;
my $line = 0;
foreach $newline(#raw_data) {
chomp;
while ( $newline=~m/(,|;|:|and)/) {
push #{ $array{$line} }, pos($newline); # autovivification
}
$line++; }
Program
#!/usr/bin/env perl
use strict;
use warnings;
my %array;
my $line = 0;
while (<DATA>) {
chomp;
while ( /(,|;|:|(?:and))/g ) {
my $position = pos() - length($1) + 1;
push #{ $array{$line} }, $position; # autovivification
}
$line++;
}
for ( sort { $a <=> $b } keys %array ) {
my $flattened_value = join ', ', #{ $array{$_} };
print "$_ -> $flattened_value\n";
}
__DATA__
one,two,three
three and a half
four,five
six
seven,eight
Output
0 -> 4, 8
1 -> 7
2 -> 5
4 -> 6
Refer: chomp, join, keys, sort, split.
Refer the following documents to get an understanding of Perl's data structures and especially autovivification which has been used in this example.
perldoc perlref
perldoc perlreftut