Matching a value in 2 D array - perl

#!/usr/bin/perl
my $file = $ARGV[0];
my $value = $ARGV[1];
my @grabbed;
open (FILE, $file);
while (<FILE>) {
    if (/alignment# residue#/) {
        push @grabbed, $_;
        while (<FILE>) {
            last if /^$/;
            push @grabbed, $_;
        }
    }
}
close (FILE);
my $line= `awk ' {if(\$2==$value)} ' @grabbed`;
print $line;
Problem:
1. First, I don't know whether it is possible to run awk on an array.
2. I am trying to match a value in the second column of the 2-D array (@grabbed). @grabbed will look like this:
7 1 M 1.000 6 .VPMLG 66.63
8 2 S 1.000 10 .QINTSARKG 66.63
9 3 V 1.000 13 .KTAVFPRGQMSL 66.63
10 4 L 1.000 7 .SLAKFT 66.63
11 5 L 1.000 14 .ALSVQWIKMRYPF 66.63
12 6 R 1.000 16 .DERSAVGTNQLYMIP 66.63
13 7 S 1.000 18 .GDTHPKRSALFCIQVYN 66.63
14 8 G 1.000 17 .DRFLENGAQPSTYCHM 66.63
15 9 L 1.000 19 .NDHPELASVKRCWFGTQI 66.63
16 10 G 1.000 18 .RLDPEGFTYAVCIKNMH 66.63
I am trying to match and grab the line in which column 2 is of value "9".

No need to switch to awk when the job can be done in Perl too.
my @line;
for ( @grabbed ) {
    my @f = split;
    if ( $f[1] == $value ) {
        push @line, $_;
    }
}

It appears that by "2D Array" you mean an array of strings, each string being a whitespace-delimited list of values.
Perl is made for this sort of thing. You could use the other answer's suggestion of splitting each line and looking at each value; however, a simple regular expression would be faster. Replace your awk line with something like this:
foreach (@grabbed)
{
    #Match the beginning of the line, possibly some whitespace,
    #then some digits, then more whitespace, then the contents of $value
    #followed by whitespace (so that a $value of 1 does not also match 10)
    if (/^\s*\d+\s+\Q$value\E\s/)
    {
        #The line matched: do stuff
    }
}
Also, will you ever need to look at the lines that don't match? If not, it would be much more efficient not to put the whole file into an array; instead, just do all of your processing in the while loop.
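A minimal sketch of that last suggestion, assuming the table header and the blank-line terminator shown in the question. The sub name matching_rows and the in-memory sample text are illustrative only and do not come from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a filehandle once and return the rows of the
# "alignment# residue#" table whose second column equals $value.
sub matching_rows {
    my ($fh, $value) = @_;
    my @hits;
    my $in_table = 0;
    while (my $line = <$fh>) {
        if (!$in_table) {
            # The table starts right after the header line.
            $in_table = 1 if $line =~ /alignment# residue#/;
            next;
        }
        last if $line =~ /^$/;          # a blank line ends the table
        my @f = split ' ', $line;
        push @hits, $line if @f > 1 && $f[1] == $value;
    }
    return @hits;
}

# Sample input held in a string so the sketch runs on its own.
my $sample = <<'END';
some preamble
alignment# residue# ...
   7   1 M 1.000  6 .VPMLG 66.63
  15   9 L 1.000 19 .NDHPELASVKRCWFGTQI 66.63
  16  10 G 1.000 18 .RLDPEGFTYAVCIKNMH 66.63

trailing text
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print matching_rows($fh, 9);
close $fh;
```

Nothing is stored except the matching rows, so memory use stays flat however large the input file is.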

Related

I want to add 2nd and 3rd column if 1st column within range of 1 to 10000

This is a sample file, tab-separated.
2000 46 26
3000 52 25
5149 4 3
10000 104 32
10500 20 12
13397 0 3
20000 20 12
24489 8 0
I tried this with my Perl code. It works fine with one condition, but I am unable to do the same when the range moves on to 10001 to 20000 and 30001 to 40000 and so on, until the end of the file.
I want output like this:
1 10000 102 54
10001 20000 124 47
20001 30000 28 12 so on.....
#! /usr/bin/perl
my $file = "$ARGV[0]";
open (f, $file);
@f = <f>;
foreach $F1 (@f) {
($a, $b, $c) = split(/\t/, $F1);
$x = "1";
$y = "10000" ;
if ( ( $a > $x ) && ( $a <= $y ) ) {
$total += $b ;
$total_1 += $c;
}
#$x = $y;
#$y = $y*2;
}
print "$x\t$y\t$total\t$total_1\n" ;
By starting with the simple case and then trying to build on that, you're actually making things harder than they need to be. This is one example where seeing the bigger picture helps to simplify the code.
You're splitting your data into "buckets" - using the first column to determine which bucket the record should go into and then summing the second and third columns within a bucket.
I would write it something like this.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
# Store bucket totals here
my @totals;
# Read from the files named on the command line (or STDIN if there are none)
while (<>) {
    # Skip blank lines
    next unless /\S/;
    # Split the data on white space
    my @cols = split;
    # Calculate the bucket.
    # 0 - 9,999 is bucket 0
    # 10,000 - 19,999 is bucket 1
    # etc...
    # (This puts the 10000 row in the second bucket, which is what the
    # totals in the question's expected output require.)
    my $bucket = int($cols[0] / 10_000);
    # Each element in @totals is a two-element array.
    # The first element is the sum of column two.
    # The second element is the sum of column three
    $totals[$bucket][0] += $cols[1];
    $totals[$bucket][1] += $cols[2];
}
# Walk the @totals array and display the results.
for (0 .. $#totals) {
    my $start = ($_ * 10_000) + 1;
    my $end = ($_ + 1) * 10_000;
    say "$start $end $totals[$_][0] $totals[$_][1]";
}
As we read from <>, there is no need to bother with opening filehandles.
I put this in a file called sum and called it like this:
$ ./sum in.txt
And the result I got was:
1 10000 102 54
10001 20000 124 47
20001 30000 28 12
Which looks correct to me. Let me know if you have any questions.
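The boundary behaviour of the bucket calculation is easy to get wrong, so here is a small sketch comparing two formulas. The sub names are made up for illustration; which formula is right depends on whether a value of exactly 10,000 belongs to the first or the second range.

```perl
use strict;
use warnings;

my $SIZE = 10_000;

# Formula used in the answer above: buckets are 0-9999, 10000-19999, ...
sub bucket_from_zero { my ($n) = @_; return int($n / $SIZE); }

# Alternative: buckets are 1-10000, 10001-20000, ...
sub bucket_from_one  { my ($n) = @_; return int(($n - 1) / $SIZE); }

# Print both bucket numbers for a few values from the sample file.
for my $n (5149, 10_000, 10_500, 20_000) {
    printf "%6d  from_zero=%d  from_one=%d\n",
        $n, bucket_from_zero($n), bucket_from_one($n);
}
```

The two formulas only disagree on exact multiples of 10,000, which is where the sample data has rows (10000 and 20000).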

Issue with nested loop

I have a file called numbers.txt which is basically a line with 5 numbers.
They look like this:
1 2 3 4 5
What I'm trying to achieve is I want to read those numbers from the line (which already works), then in each iteration I want to add +1 to every number which was read from that file and print them on screen with print, so the final result should look like:
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
.
#!/usr/bin/perl
use strict;
use warnings;
open("handle", 'numbers.txt') or die('unable to open numbers file\n');
$/ = ' ';
OUT: for my $line (<handle>) {
for (my $a = 0; $a < 5; $a++) {
chomp $line;
$line += 1;
print "$line ";
next OUT;
}
}
close("handle");
Haven't done looping in perl for a while now and would be great if someone could provide working example.
Also, it would be great if you could provide more than one working example, just to be future proof ;)
Thanks
You can try this on for size.
#!/usr/bin/perl
use strict;
use warnings;
open("handle", 'numbers.txt') or die('unable to open numbers file\n');
for my $line (<handle>) {
chomp $line;
for my $number (split /\s+/, $line) {
for (my $a = $number; $a < $number+5; $a++) {
print "$a ";
}
print "\n";
}
}
close("handle");
You can dispense with $/=' ' and instead let the outer loop iterate on lines of the file.
For each line you want to iterate for each number which is separated by white space, thus the split /\s+/, $line which gives you a list of numbers for the inner loop.
For your output $a starts at the number read from the file.
This will do what you're after:
use strict;
use warnings;
while(<DATA>) {
    chomp;
    print "$_\n";
    my @split = split;
    my $count = 0;
    for (1..4){
        $count++;
        foreach (@split){
            my $num = $_ + $count;
            print "$num ";
        }
        print "\n";
    }
}
__DATA__
1 2 3 4 5
There is no need to write an explicit nested loop here; map can apply the increment to the whole list in one statement.
#!/usr/bin/perl
use strict;
use warnings;
my @num = split(" ",(<DATA>)[0]);
foreach my $inc (0..$#num)
{
    print map{$inc+$_," "}@num; # Add $inc to every element of the array
    print "\n";
}
__DATA__
1 2 3 4 5
Update Added another method, this one in line with the posted approach.
Increment each number in the string, changing the string in place. Repeat that. Below are two ways to do that. Yet another method reads individual numbers and prints following integer sequences.
(1) With regular expressions. It also fits in one-liner
echo "1 2 3 4 5" | perl -e '$v = <>; for (1..5) { print $v; $v =~ s/(\d+)/$1+1/eg; }'
This prints the desired output. But better put it in a script
use warnings;
use strict;
my $file = 'numbers.txt';
open my $fh, '<', $file or die "can't open $file: $!";
while (my $line = <$fh>) {
# Add chomp($line) if needed for some other processing.
for (1..5) {
print $line;
$line =~ s/(\d+)/$1+1/eg;
}
}
The /e modifier is crucial for this. It makes the replacement side of the regex be evaluated as code instead of as a double-quoted string. So you can actually execute code there and here we add to the captured number, $1+1, for each matched number as /g moves down the string. This changes the string so the next iteration of the for (1..5) increments those, etc. I match multiple digits, \d+, which isn't necessary in your example but makes far more sense in general.
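As a small illustration of /e in isolation, here is a sketch that wraps the substitution in a sub. The name bump_numbers and its $step parameter are hypothetical, introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $line with every integer increased by $step.
sub bump_numbers {
    my ($line, $step) = @_;
    # /e evaluates "$1 + $step" as Perl code; /g repeats it for every match.
    (my $copy = $line) =~ s/(\d+)/$1 + $step/eg;
    return $copy;
}

print bump_numbers("1 2 3 4 5\n", 1);    # prints "2 3 4 5 6"
print bump_numbers("9 10 99\n", 1);      # prints "10 11 100"
```

Because only the digits are replaced, the spacing and the trailing newline of the original line are preserved.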
(2) Via split + map + join, also repeatedly changing the line in place
while (my $line = <$fh>) {
    for (1..5) {
        print $line;
        # split drops the trailing newline, so put it back after join
        $line = join(' ', map { $_+1 } split /\s+/, $line) . "\n";
    }
}
The split gets the list of numbers from $line and feeds it to map, which increments each, feeding its output list to join. The joined string is assigned back to $line, and this is repeated. I split by \s+ to allow multiple white space but this makes it very 'relaxed' in what input format it accepts, see perlrecharclass. If you know the separator is exactly one space, split on / / instead; note that the string ' ' is a special case for split that behaves like /\s+/ but also skips leading whitespace.
(3) Take a number at a time and print the integer sequence starting from it.
open my $fh, '<', $file or die "can't open $file: $!";
local $/ = ' ';
while (my $num = <$fh>) {
print "$_ " for $num..$num+4;
print "\n";
}
The magical 4 can be coded by pre-processing the whole line to find the sequence length, say by
my $len = () = $line =~ /(\d+)/g;
or by split-ing into an array and taking its scalar, then using $len-1.
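Both ways of finding the sequence length can be tried side by side. This is only a sketch; the sub name count_numbers is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: count the integers in a line with the
# "assign to an empty list" idiom, which forces list context on the match.
sub count_numbers {
    my ($line) = @_;
    my $len = () = $line =~ /(\d+)/g;
    return $len;
}

my $line = "1 2 3 4 5\n";
my $len  = count_numbers($line);

# The same count obtained by splitting into an array and taking its size.
my @nums = split ' ', $line;
print "regex count: $len, split count: ", scalar(@nums), "\n";
```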
Additional comments.
I recommend the three-argument open, open my $fh, '<', $file
When you check a call print the error, die "Your message: $!", to see the reason for failure. If you decide to quit, if ($bad) { die "Got $bad" }, then you may not need $!. But when an external call fails you don't know the reason so you need the suitable error variable, most often $!.
Your program has a number of problems. Here is what's stopping it working
You are setting the record separator to a single space. Your input file contains "1 2 3 4 5\n", so the outer for loop will iterate five times setting $line to "1 ", "2 ", "3 ", "4 ", "5\n"
Your for loop is set up to iterate five times. It does chomp $line which removes the space after the number, then increments $line and prints it. Then you jump out of the for loop, having executed it only once, with next OUT. This results in each value in the file being incremented by one and printed, so you get 2 3 4 5 6
Removing the unnecessary next OUT, produces something closer
2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9 6 7 8 9 10
There are now five numbers being printed for each number in the input file
Adding print "\n" after the for loop helps separate the lines
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
6 7 8 9 10
Now we need to print the number before it is incremented instead of afterwards. If we swap $line += 1 and print "$line " we get this
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5
6 7 8 9
What is happening here is that the 5 is still followed by a newline, which now appears in the output. The chomp won't remove this because it removes the value of $/ from the end of a string. You've set that to a space, so it will remove only spaces. The fix is to replace chomp with a substitution s/\s+//g which removes all whitespace from the string. You also need to do that only once so I've put it outside the for loop at the top
Now we get this
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
And this is your code as it ended up
use strict;
use warnings;
open( "handle", 'numbers.txt' ) or die('unable to open numbers file\n');
$/ = ' ';
for my $line (<handle>) {
$line =~ s/\s+//g;
for ( my $a = 0; $a < 5; $a++ ) {
print "$line ";
$line += 1;
}
print "\n";
}
close("handle");
There are a few other best practices that could improve your program
Use use warnings 'all'
Use lexical file handles, and the three-parameter form of open
Use local if you are changing Perl's built-in variables
Put $! into your die string so that you know why the open failed
Avoid the C-style for loop, and iterate over a list instead
Making these fixes as well looks like this. The output is identical to the above
use strict;
use warnings 'all';
open my $fh, '<', 'numbers.txt'
or die qq{Unable to open "numbers.txt" for input: $!};
local $/ = ' ';
for my $line ( <$fh> ) {
$line =~ s/\s+//g;
for my $a ( 0 .. 4 ) {
print "$line ";
++$line;
}
print "\n";
}
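To see why chomp left the newline in place, the behaviour can be reproduced in isolation. This is only an illustrative sketch; the variable names are made up and the string stands in for the last record read from the file.

```perl
use strict;
use warnings;

my $last_field = "5\n";

{
    local $/ = ' ';              # record separator is a single space
    my $copy = $last_field;
    chomp $copy;                 # removes a trailing space only
    print "newline survived chomp\n" if $copy eq "5\n";
}

# Stripping all whitespace works regardless of the value of $/.
(my $clean = $last_field) =~ s/\s+//g;
print "clean value: [$clean]\n";     # prints "clean value: [5]"
```

The bare block with local keeps the change to $/ from leaking into the rest of the program.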

How to compare the second line with the first line in one single file using perl? [closed]

I am new to Perl. Currently, I am using Perl to do some text processing. There are four columns in the input file, separated by tabs. I want to find the minimum of column 3 and the maximum of column 4 and put them on one line for the same ID. Below is what the input file looks like:
A A1 1 5
A A1 9 18
A A1 23 40
A A2 20 30
A A2 35 43
B A1 2 10
B A1 12 30
B A1 35 100
C A9 2 40
C A9 45 70
My desired output:
A A1 1 40
A A2 20 43
B A1 2 100
C A9 2 70
Perl from command line,
perl -anE'
$k = join "\t", @F[0,1];
$h{$k} or push @r, $k;
(!defined or $_ >$F[2]) and $_ = $F[2] for $h{$k}{m};
($_ <$F[3]) and $_ = $F[3] for $h{$k}{M};
}{
say join "\t", $_, @{$h{$_}}{qw(m M)} for @r
' file
output
A A1 1 40
A A2 20 43
B A1 2 100
C A9 2 70
Read the data file line by line, use the combination of the first two columns as the key of a record hash, and remember in that hash the minimum of column three and the maximum of column four. If you want to keep the order of those keys, also push them to an array.
#!/usr/bin/perl
use strict;
use warnings;
my (%record, @key);
while (<>) {
    chomp;
    my @field = split /\s+/;
    my $key = join "\t", @field[0,1];
    push @key, $key unless $record{$key};
    # Test with defined() so that a legitimate value of 0 is not treated as unset.
    if (!defined $record{$key}{min} || $record{$key}{min} > $field[2]) {
        $record{$key}{min} = $field[2];
    }
    if (!defined $record{$key}{max} || $record{$key}{max} < $field[3]) {
        $record{$key}{max} = $field[3];
    }
}
for my $key (@key) {
    print join("\t", $key, $record{$key}{min}, $record{$key}{max}), "\n";
}
Something like this?
use strict;
use warnings;
open my $fh, '<', 'input-data.txt' or die "Cannot open input-data.txt: $!";
# Keep track of the current minimum and maximum
# values while we read the file.
#
my (%val1_min, %val2_max);
while (<$fh>) ## loop through lines of file
{
chomp; ## remove trailing "\n" character
# Split on sequences of whitespace
#
my ($key1, $key2, $val1, $val2) = split /\s+/;
# Record a new minimum if there is no old
# minimum, or if the old minimum is higher
# than the current value.
#
$val1_min{$key1}{$key2} = $val1
if !defined($val1_min{$key1}{$key2})
or $val1_min{$key1}{$key2} > $val1;
# Record a new maximum if there is no old
# maximum, or if the old maximum is lower
# than the current value.
#
$val2_max{$key1}{$key2} = $val2
if !defined($val2_max{$key1}{$key2})
or $val2_max{$key1}{$key2} < $val2;
}
# Now we need to produce some output.
#
# Loop through the first level of keys.
#
for my $key1 (sort keys %val1_min)
{
# Loop through the second level of keys.
#
for my $key2 (sort keys %{$val1_min{$key1}})
{
# Print a line of output to STDOUT.
#
printf(
"%-4s %-4s %3d %3d\n", ## formatting string
$key1, ## first key
$key2, ## second key
$val1_min{$key1}{$key2}, ## minimum first value
$val2_max{$key1}{$key2}, ## maximum second value
);
}
}
Using command line perl:
perl -MList::Util=max,min -lane '
$k = join "\t", splice @F, 0, 2;
push @k, $k if !$v{$k};
push @{$v{$k}[$_]}, $F[$_] for (0..$#F);
}{
print join "\t", $_, min(@{$v{$_}[0]}), max(@{$v{$_}[1]}) for @k;
' file.txt
Outputs:
A A1 1 40
A A2 20 43
B A1 2 100
C A9 2 70

How to grab multiple lines after matching a line in Perl?

My file looks like this:
1 15
2 16
3 18
4 19
5 25
6 30
7 55
8 45
9 34
10 52
If the matched pattern is 30 in line 6, I would like to grab N lines before and M lines after the line 6, for example if N=3 and M=4 so the result is expected to be like this:
3 18
4 19
5 25
6 30
7 55
8 45
9 34
10 52
I am a very new beginner in Perl and any advice would be appreciated.
UPDATE
Many thanks for these helpful advice below and I really appreciate them.
Here is my updated code for this and any suggestions are welcome!
my $num;
while(<>)
{
if ( /pattern/)
{$num = $. ;}
}
open (,"") || die ("Can't open the file");
while(<>)
{
if ( $. >= $num-N and $. <=$num+M)
{
print OUT "$_ \r";
}
}
Maintain an array (I'll call it @preceding) of the last N lines read. When the pattern is matched, stop updating this array and start inserting lines into another array (@following). Do this until @following has M lines in it.
It should look something like this (fixed now thanks to ikegami):
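my ($N, $M) = (3, 4);   # lines to keep before and after the match
my $matched = 0;
my @preceding;
my @following;
while(<>){
    if ($matched){
        push ( @following, $_);
        last if @following == $M;
        next;
    }
    else {
        push ( @preceding, $_);
        # @preceding ends with the current line, so allow N + 1 entries.
        shift(@preceding) if @preceding > $N + 1;
    }
    $matched = 1 if /pattern/;
}
print @preceding, @following if $matched;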
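my ($B, $A) = (3, 4);   # lines of context before and after the match
my @lines = <>;
foreach my $idx (grep { $lines[$_] =~ /pattern/ } 0..$#lines) {
    # Each line already ends in a newline, so print the slice directly.
    print map { $lines[$_] } grep { $_ >= $idx - $B && $_ <= $idx + $A } 0..$#lines;
}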
You can also use the GNU grep command, with -A,-B flags for that exact purpose.
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing -- between contiguous groups of
matches.
-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines.
Places a line containing -- between contiguous groups of
matches.
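For completeness, here is a rough pure-Perl counterpart of grep -B N -A M, assuming the whole input fits in memory. The sub name context_lines is hypothetical, and unlike grep this sketch does not print a -- separator between groups.

```perl
use strict;
use warnings;

# Hypothetical helper: return every line that lies within $before lines
# ahead of, or $after lines behind, a line matching $re. Overlapping
# windows are merged, so no line is returned twice.
sub context_lines {
    my ($lines, $re, $before, $after) = @_;
    my %keep;
    for my $i (0 .. $#$lines) {
        next unless $lines->[$i] =~ $re;
        my $from = $i - $before < 0        ? 0        : $i - $before;
        my $to   = $i + $after  > $#$lines ? $#$lines : $i + $after;
        $keep{$_} = 1 for $from .. $to;
    }
    return map { $lines->[$_] } sort { $a <=> $b } keys %keep;
}

# The ten-line sample from the question: N = 3 before, M = 4 after.
my @sample = map { "$_\n" }
    '1 15', '2 16', '3 18', '4 19', '5 25',
    '6 30', '7 55', '8 45', '9 34', '10 52';
print context_lines(\@sample, qr/\b30$/, 3, 4);
```

With the sample data this prints the lines from "3 18" through "10 52", which is the result the question asks for.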

How to print/extract information listed under a column from two dimensional array in Perl?

I have an output file which is a two dimensional array (this file was generated by a script written to produce a 2D array) and I have to read the information under a particular column, say column 1. In other words, how do I read and print out the information listed under column 1 for all the rows?
Any suggestions?
__DATA__
1 2 3 4 5 6 7 8 9
A B C D E F G H I
93 48 57 66 52 74 33 22 91
From the above data I want to extract information column-wise. For example, if I want the information from column 1, I should be able to list only the following output.
OUTPUT:
1
A
93
Final version after all corrections:
#!/usr/bin/perl
use strict;
use warnings;
my $column_to_show = 0;
while ( <DATA> ) {
last unless /\S/;
print +(split)[$column_to_show], "\n";
}
__DATA__
1 2 3 4 5 6 7 8 9
A B C D E F G H I
93 48 57 66 52 74 33 22 91
Output:
C:\Temp> u
1
A
93
Explanation of print +(split)[$column_to_show], "\n";:
perldoc -f split:
Splits the string EXPR into a list of strings and returns that list.
...
If EXPR is omitted, splits the $_ string. If PATTERN is also omitted,
splits on whitespace (after skipping any leading whitespace).
So: (split)[3] selects the fourth element of the list returned by split. The + in front of (split) is necessary to help perl parse the expression correctly. See perldoc -f print:
Also be careful not to follow the
print keyword with a left parenthesis
unless you want the corresponding
right parenthesis to terminate the
arguments to the print — interpose a +
or put parentheses around all the
arguments.
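The list slice and the leading + can be tried on a single row. This is a sketch only; the sub name field is invented for the example and the row is taken from the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: return field number $index (0-based) of a
# whitespace-separated line, using a list slice on the result of split.
sub field {
    my ($line, $index) = @_;
    return (split ' ', $line)[$index];
}

my $row = "93 48 57 66 52 74 33 22 91";

# With the leading +, print receives two arguments: the field and "\n".
print +(split ' ', $row)[3], "\n";      # prints 66
print field($row, 0), "\n";             # prints 93
```

Wrapping the slice in a sub sidesteps the parsing question entirely, because the parenthesis no longer follows print directly.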
I thoroughly recommend every Perl programmer to occasionally skim through all of the documentation perldoc perltoc. It is on your computer.
foreach my $line (<DATA>)
{
    my @DATA1 = split( / +/, $line );
    print "$DATA1[0]\n";
}
__DATA__
1 2 3 4 5 6 7 8 9
A B C D E F G H I
93 48 57 66 52 74 33 22 91
OUTPUT:-
1
A
93
Try playing with this code. Basically I load the data into an array of arrays
Each line is a reference to a row.
#!/usr/bin/perl
use strict;
use warnings;
my $TwoDimArray;
while (my $line=<DATA>) {
push @$TwoDimArray, [split(/,/,$line)];
};
for my $column (0..2) {
print "[$column,0] : " . $TwoDimArray->[0]->[$column] ."\n";
print "[$column,1] : " . $TwoDimArray->[1]->[$column] ."\n";
print "\n";
}
__DATA__
1,2,3,04,05,06
7,8,9,10,11,12
The map function is your friend:
open FILE, '<', "data.txt" or die "Cannot open data.txt: $!";
while ($line = <FILE>) {
    chomp($line);
    push @data, [split /[, ]+/, $line];
}
close FILE;
@column1 = map {$$_[0]} @data;
print "@column1\n";
And in data.txt something like:
1, 2, 3, 4
5, 6, 7, 8
9, 10, 11, 12
13, 14, 15, 16
perl -lne '@F = split /\s+/ and print $F[1]'
This might be what you want:
use English qw<$OS_ERROR>; # Or just use $!
use IO::Handle;
my @columns;
open my $fh, '<', 'columns.dat' or die "I'm dead. $OS_ERROR";
while ( my $line = <$fh> ) {
    my @cols = split /\s+/, $line;
    $columns[$_][$fh->input_line_number()-1] = $cols[$_] foreach 0..$#cols;
}
$fh->close();
You can access them directly by element.
$arrays[0][0] = 1;
$arrays[0][1] = 2;
$arrays[1][0] = 3;
$arrays[1][1] = 4;
for (my $i = 0; $i <= $#{$arrays[1]}; $i++) {
print "row for $i\n";
print "\tfrom first array: " . $arrays[0][$i] . "\n";
print "\tfrom second array: " . $arrays[1][$i] . "\n";
}
prints
row for 0
from first array: 1
from second array: 3
row for 1
from first array: 2
from second array: 4
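Pulling the approaches above together, here is a short sketch that extracts one column from an array of arrays. The sub name column is hypothetical, and the rows are the sample data from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: given a reference to an array of row references,
# return the values found at position $index in every row.
sub column {
    my ($rows, $index) = @_;
    return map { $_->[$index] } @$rows;
}

# The sample data from the question, one array reference per row.
my @table = map { [ split ' ', $_ ] } (
    '1 2 3 4 5 6 7 8 9',
    'A B C D E F G H I',
    '93 48 57 66 52 74 33 22 91',
);

# Print the first column, one value per line.
print "$_\n" for column(\@table, 0);
```

Keeping the rows as array references means any column can be requested later without splitting the lines again.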