Comparing two files, where one piece of information can be flexible - perl

Comparing two files is easy, but comparing two files where one piece of information can be flexible is proving very challenging for me.
fileA
4 "dup" 37036335 37044984
3 "dup" 100146708 100147504
7 "del" 100 203
2 "dup" 34 89
fileB
4 "dup" 37036335 37036735
3 "dup" 100146708 100147504
4 "dup" 68 109
Anticipated output:
output_file1 (matching hits)
fileA: 4 "dup" 37036335 37044984
fileB: 4 "dup" 37036335 37036735
fileA: 3 "dup" 100146708 100147504
fileB: 3 "dup" 100146708 100147504
output_file2 (found in fileA, but not in FileB including non-overlap)
7 "del" 100 203
2 "dup" 34 89
output_file3 (found in fileB, but not in FileA including non-overlap)
4 "dup" 68 109
The criteria are:
I need fields 1 and 2 in the first file to exactly match the second file, and the coordinates in fields 3 and 4 to be identical or to overlap.
This would mean these are the same.
fileA :4 "dup" 37036335 37044984
fileB :4 "dup" 37036335 37036735
I also need to find the differences between the two files (no overlap, a row that is present in one file but not in the other, etc.).
Here's the gist of what I've tried. I've written this code probably 4 different ways, alas, still no success. I've put both files into arrays (I've tried a hash too...idk)
## if no hits in original, but hits in calculated
if ((! @ori) && (@calc)) {}
## if CNV calls in original, but none in calculated
if ((@ori) && (! @calc)) {}
## if CNV calls in both
if ((@ori) && (@calc)) {
    ## compare calls with double 'for' loop
    foreach my $l (@ori) {
        my @l = split(/\s/, $l);
        my $Ochromosome = $l[0];
        my $Ostart = $l[2];
        my $Oend = $l[3];
        my $Otype = $l[1];
        foreach my $l (@calc) {
            my @l = split(/\s/, $l);
            my $Cchromosome = $l[0];
            my $Cstart = $l[2];
            my $Cend = $l[3];
            my $Ctype = $l[1];
            ## check chromosome and type here
            if (($Ochromosome eq $Cchromosome) && ($Otype eq $Ctype)) { ## what if there are two duplications on the same chromosome?
                ## check coordinates
                if (($Ostart <= $Cend) && ($Cstart <= $Oend)) {
                    ## overlap
                } else {
                    ## noOverlap
                }
            } else {
                ## what if there is something found in one, but not in the other and they both have calls?
                ## ahhhh
            }
        }
    }
}

Here is a simple solution which is also fairly efficient.
Iterate over lines of one file, checking each against all lines of the other (until a match is found). This is the very least we must do complexity wise, given all information that needs to be gathered.
If a line from A is not found in B, it is added to @not_in_B. To determine which lines in B are not in A, we prepare a hash where each line of B is a key with the value 0. Once a line of B is matched, the value of its key in the hash is set to 1. Those that are not 1 at the end were never matched by any line of A, and so are the extra ones. They go in @not_in_A.
Both files are first read into arrays for simplicity (this is also needed for the inner loop).
use warnings;
use strict;
use feature 'say';

my $f1 = 'f1.txt';
my $f2 = 'f2.txt';

open my $fh, '<', $f1 or die "Can't open $f1: $!";
my @a1 = <$fh>;
chomp(@a1);
open $fh, '<', $f2 or die "Can't open $f2: $!";
my @a2 = <$fh>;
chomp(@a2);
close $fh;

my (@not_in_A, @not_in_B);
# Every line of B starts out as "not found in A" (0)
my %Bs_in_A = map { $_ => 0 } @a2;

foreach my $e1 (@a1)
{
    my $match = 0;
    foreach my $e2 (@a2)
    {
        if ( lines_match($e1, $e2) ) {
            $match = 1;
            say "Match:\n\tf1: $e1\n\tf2: $e2";
            $Bs_in_A{$e2} = 1;
            last;
        }
    }
    push @not_in_B, $e1 if not $match;
}
@not_in_A = grep { $Bs_in_A{$_} == 0 } keys %Bs_in_A;

say '---';
say "Elements of A that are not in B:";
say "\t$_" for @not_in_B;
say "Elements of B that are not in A:";
say "\t$_" for @not_in_A;

sub lines_match
{
    my ($l1, $l2) = @_;
    my @t1 = split ' ', $l1;
    my @t2 = split ' ', $l2;
    # First two fields must be the same
    return if $t1[0] ne $t2[0] or $t1[1] ne $t2[1];
    # Third-to-fourth-field ranges must overlap
    return
        if ($t1[2] < $t2[2] and $t1[3] < $t2[2])
        or ($t1[2] > $t2[3] and $t1[3] > $t2[3]);
    return 1; # match
}
Output
Match:
f1: 4 "dup" 37036335 37044984
f2: 4 "dup" 37036335 37036735
Match:
f1: 3 "dup" 100146708 100147504
f2: 3 "dup" 100146708 100147504
---
Elements of A that are not in B:
7 "del" 100 203
2 "dup" 34 89
Elements of B that are not in A:
4 "dup" 68 109
Note that I've used 1 in place of A and 2 in place of B.
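The question also asks what happens when the same chromosome and type occur more than once. As a minimal sketch of one way to handle that, the lines of B can be grouped by their first two fields, so that each line of A is only checked against candidates that can match at all. The names ranges_overlap and compare_calls are hypothetical, the data is inlined instead of read from files, and ranges are assumed to be given as start <= end. Unlike the code above, this sketch records every overlapping line of B for a line of A rather than stopping at the first one.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if closed ranges [s1,e1] and [s2,e2] overlap, else 0.
sub ranges_overlap {
    my ($s1, $e1, $s2, $e2) = @_;
    return ($s1 <= $e2 && $s2 <= $e1) ? 1 : 0;
}

# Hypothetical helper: returns (\@matches, \@only_in_a, \@only_in_b).
sub compare_calls {
    my ($lines_a, $lines_b) = @_;

    # Group B records by "chromosome type"; also keep them in file order.
    my (%b_by_key, @b_records);
    for my $line (@$lines_b) {
        my ($chr, $type, $start, $end) = split ' ', $line;
        my $rec = { line => $line, start => $start, end => $end, seen => 0 };
        push @b_records, $rec;
        push @{ $b_by_key{"$chr $type"} }, $rec;
    }

    my (@matches, @only_in_a);
    for my $line (@$lines_a) {
        my ($chr, $type, $start, $end) = split ' ', $line;
        my $found = 0;
        for my $rec (@{ $b_by_key{"$chr $type"} || [] }) {
            next unless ranges_overlap($start, $end, $rec->{start}, $rec->{end});
            push @matches, [ $line, $rec->{line} ];
            $rec->{seen} = 1;
            $found = 1;
        }
        push @only_in_a, $line unless $found;
    }

    # Lines of B that no line of A overlapped, in their original order.
    my @only_in_b = map { $_->{line} } grep { !$_->{seen} } @b_records;
    return (\@matches, \@only_in_a, \@only_in_b);
}

my @file_a = ('4 "dup" 37036335 37044984', '3 "dup" 100146708 100147504',
              '7 "del" 100 203', '2 "dup" 34 89');
my @file_b = ('4 "dup" 37036335 37036735', '3 "dup" 100146708 100147504',
              '4 "dup" 68 109');

my ($matches, $only_a, $only_b) = compare_calls(\@file_a, \@file_b);
print "fileA: $_->[0]\nfileB: $_->[1]\n" for @$matches;
print "only in A: $_\n" for @$only_a;
print "only in B: $_\n" for @$only_b;
```

Because unmatched lines of B are collected from an array rather than from hash keys, they come out in file order on every run.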

Related

horizontal absolute values of every line

I am trying to calculate the absolute values of line 2 - values of line 1
and then the horizontal absolute values of every line in my input file. Here's a part of that input.
43 402 51 360
63 60 69 63
65 53 89 55
103 138 135 135
109 36 123 38
To be more precise about what I'm trying to do, I made the following example:
initial data
0 2 0 0
0 1 1 1
next stage (absolute value after subtraction: the second line minus the first line)
2 2 0
1 0 0
final stage (horizontal application of abs values until one column remained)
0
1
The code below was a failed attempt to obtain the final stage with a single column. My problem is that I don't know how to obtain the final (desired) stage using a subroutine, which I believe is a better way to solve my problem. Of course, every idea or better approach is welcome.
#!/usr/bin/perl
use feature qw(say);
use strict;
use warnings;
use Data::Dumper;
my @rows = 'table_only_numbers';
open(my $fh, '<:encoding(UTF-8)', $rows)
sub ori {
for ($num_cols=@{ $rows[$r-1]}; $num_cols=1; $num_cols-- ){
my @diff_diffs = map { abs($diffs[$_-1] - $diffs[$_]) } 1..$num_cols-1;
@final=@diff_diffs;
say join ' ',@final;
return (final) }
my $num_cols = @{ $rows[0] };
for my $r (1..$#rows) {
die "Bad format!" if @{ $rows[$r] } != $num_cols;
my @diffs = map { abs($rows[$r-1][$_] - $rows[$r][$_]) } 0..$num_cols-1;
while ($num_cols>1)
{
$final_output = ori(@{ $rows[0] })
say "final_output";
}
}
close $fh;
Finally, I figured it out by myself, without subroutines! I'm posting it in case someone faces the same issue in the future. I know it is probably not the best way to do it, but as a newbie in Perl it was the easiest way for me.
To obtain the absolute value of row 2 minus row 1, I used:
my @data = map { abs($current[$_]-$previous[$_]) } 0..$#current;
push @final, \@data;
After that, I ran the following loop 3 times, as I had 3 columns left (in my case), each time substituting @XXX with the array produced by the previous step and @YYY with a new variable. That gave me the desired output of one column.
foreach my $row (@XXX) {
    my @data = map { abs($row->[$_] - $row->[$_+1]) } 0..$#{$row}-1;
    say join ' ', @data;
    push @YYY, \@data;
}
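Since the question asked for a subroutine, here is a minimal sketch of how the repeated horizontal step could be wrapped in one, so the loop does not have to be copied once per column. It covers only the horizontal reduction of a single row, not the row-minus-row step, and the names adjacent_abs_diffs and reduce_row are hypothetical.

```perl
use strict;
use warnings;

# One reduction step: absolute differences of neighbouring values.
sub adjacent_abs_diffs {
    my @v = @_;
    return map { abs($v[$_] - $v[$_ + 1]) } 0 .. $#v - 1;
}

# Repeat the step until a single value is left, then return it.
sub reduce_row {
    my @v = @_;
    @v = adjacent_abs_diffs(@v) while @v > 1;
    return $v[0];
}

print join(' ', adjacent_abs_diffs(0, 1, 1, 1)), "\n";   # prints "1 0 0"
print reduce_row(0, 1, 1, 1), "\n";                       # prints "1"
```

Calling reduce_row on each row of the table (for example inside a loop over the array of array references) would give the single-column final stage regardless of how many columns the input has.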

Is it normal that two different versions of perl produce different results?

I am trying to perform some stack analysis on an MCU following the steps described here. The site then links to a Perl script that I launch as a post-build operation by means of a simple batch file.
The IDE, which is based on Eclipse, uses the Perl executable at the path:
C:\..\S32DS_ARM_v2018.R1\utils\msys32\usr\bin\perl.exe
perl.exe -v gives:
This is perl 5, version 22, subversion 1 (v5.22.1) built for i686-msys-thread-multi-64int
The OS (windows) has a perl installation at
C:\Perl64\bin\perl.exe
perl.exe -v gives:
This is perl 5, version 24, subversion 3 (v5.24.3) built for MSWin32-x64-multi-thread
(with 1 registered patch, see perl -V for more detail)
I can confirm that avstack.pl (the Perl script mentioned above) produces different results with the former and the latter.
Why this happens is outside my area of expertise at the moment.
What I would like to do is:
Understand why this is happening;
Understand which perl provides the right output (I suppose 5.24.3 is the correct one);
Learn how to prevent this issue if I use Perl in the future.
Thanks and best regards,
L.
Edit: the outcome of the script with the two different perl versions (reduced output for readability):
This one is result_5.22.1
Func Cost Frame Height
------------------------------------------------------------------------
> I2C_MasterGetTransferStatus 292 292 1
> FLEXIO_I2C_DRV_MasterStartTransfer 236 236 1
> CLOCK_DRV_Init 172 172 1
> CLOCK_SYS_SetConfiguration 172 172 1
> EDMA_DRV_ConfigScatterGatherTransfer 132 132 1
> CLOCK_SYS_SetSystemClockConfig 76 76 1
> FLEXIO_I2C_DRV_MasterInit 60 60 1
> EDMA_DRV_ConfigSingleBlockTransfer 60 60 1
> main 52 52 1
> LPI2C_DRV_MasterSetBaudRate 52 52 1
> LPI2C_DRV_MasterStartDmaTransfer 52 52 1
> FLEXIO_DRV_InitDriver 52 52 1
> I2C_MasterInit 44 44 1
> LPI2C_DRV_SlaveStartDmaTransfer 44 44 1
> CLOCK_SYS_UpdateConfiguration 44 44 1
> CLOCK_DRV_SetClockSource 44 44 1
> LPI2C_DRV_SlaveInit 44 44 1
> EDMA_DRV_Init 44 44 1
> EDMA_DRV_Deinit 36 36 1
> CLOCK_SYS_ConfigureSOSC 36 36 1
> CLOCK_SYS_ConfigureFIRC 36 36 1
vs
result_5.24.3
Func Cost Frame Height
------------------------------------------------------------------------
> main 536 52 9
I2C_MasterSendDataBlocking 484 28 8
> I2C_MasterReceiveDataBlocking 484 28 8
> I2C_MasterReceiveData 468 20 7
> I2C_MasterSendData 468 20 7
FLEXIO_I2C_DRV_MasterReceiveDataBlocking 456 28 7
FLEXIO_I2C_DRV_MasterSendDataBlocking 456 28 7
FLEXIO_I2C_DRV_MasterSendData 448 20 5
FLEXIO_I2C_DRV_MasterReceiveData 448 20 5
FLEXIO_I2C_DRV_MasterStartTransfer 428 236 4
> I2C_MasterGetTransferStatus 408 292 6
CLOCK_SYS_UpdateConfiguration 336 44 6
CLOCK_SYS_SetConfiguration 292 172 5
> CLOCK_DRV_Init 292 172 5
LPI2C_DRV_MasterReceiveDataBlocking 256 20 7
> I2C_SlaveReceiveDataBlocking 252 12 8
> I2C_SlaveSendDataBlocking 252 12 8
As you can see, the height number in the first version doesn't increase (and it should).
Cost and frame suffer from the same issue, I suppose.
The script is here:
#!/usr/bin/perl -w
# avstack.pl: AVR stack checker
# Copyright (C) 2013 Daniel Beer <dlbeer@gmail.com>
#
# Permission to use, copy, modify, and/or distribute this software for
# any purpose with or without fee is hereby granted, provided that the
# above copyright notice and this permission notice appear in all
# copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
# WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE
# AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL
# DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
# PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
# TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
# PERFORMANCE OF THIS SOFTWARE.
#
# Usage
# -----
#
# This script requires that you compile your code with -fstack-usage.
# This results in GCC generating a .su file for each .o file. Once you
# have these, do:
#
# ./avstack.pl <object files>
#
# This will disassemble .o files to construct a call graph, and read
# frame size information from .su. The call graph is traced to find, for
# each function:
#
# - Call height: the maximum call height of any callee, plus 1
# (defined to be 1 for any function which has no callees).
#
# - Inherited frame: the maximum *inherited* frame of any callee, plus
# the GCC-calculated frame size of the function in question.
#
# Using these two pieces of information, we calculate a cost (estimated
# peak stack usage) for calling the function. Functions are then listed
# on stdout in decreasing order of cost.
#
# Functions which are recursive are marked with an 'R' to the left of
# them. Their cost is calculated for a single level of recursion.
#
# The peak stack usage of your entire program can usually be estimated
# as the stack cost of "main", plus the maximum stack cost of any
# interrupt handler which might execute.
use strict;
# Configuration: set these as appropriate for your architecture/project.
my $objdump = "arm-none-eabi-objdump";
my $call_cost = 4;
# First, we need to read all object and corresponding .su files. We're
# gathering a mapping of functions to callees and functions to frame
# sizes. We're just parsing at this stage -- callee name resolution
# comes later.
my %frame_size; # "func@file" -> size
my %call_graph; # "func@file" -> {callees}
my %addresses; # "addr@file" -> "func@file"
my %global_name; # "func" -> "func@file"
my %ambiguous; # "func" -> 1
foreach (@ARGV) {
# Disassemble this object file to obtain a callees. Sources in the
# call graph are named "func@file". Targets in the call graph are
# named either "offset@file" or "funcname". We also keep a list of
# the addresses and names of each function we encounter.
my $objfile = $_;
my $source;
open(DISASSEMBLY, "$objdump -dr $objfile|") ||
die "Can't disassemble $objfile";
while (<DISASSEMBLY>) {
chomp;
if (/^([0-9a-fA-F]+) <(.*)>:/) {
my $a = $1;
my $name = $2;
$source = "$name\@$objfile";
$call_graph{$source} = {};
$ambiguous{$name} = 1 if defined($global_name{$name});
$global_name{$name} = "$name\@$objfile";
$a =~ s/^0*//;
$addresses{"$a\@$objfile"} = "$name\@$objfile";
}
if (/: R_[A-Za-z0-9_]+_CALL[ \t]+(.*)/) {
my $t = $1;
if ($t eq ".text") {
$t = "\@$objfile";
} elsif ($t =~ /^\.text\+0x(.*)$/) {
$t = "$1\@$objfile";
}
$call_graph{$source}->{$t} = 1;
}
}
close(DISASSEMBLY);
# Extract frame sizes from the corresponding .su file.
if ($objfile =~ /^(.*).o$/) {
my $sufile = "$1.su";
open(SUFILE, "<$sufile") || die "Can't open $sufile";
while (<SUFILE>) {
$frame_size{"$1\@$objfile"} = $2 + $call_cost
if /^.*:([^\t ]+)[ \t]+([0-9]+)/;
}
close(SUFILE);
}
}
# In this step, we enumerate each list of callees in the call graph and
# try to resolve the symbols. We omit ones we can't resolve, but keep a
# set of them anyway.
my %unresolved;
foreach (keys %call_graph) {
my $from = $_;
my $callees = $call_graph{$from};
my %resolved;
foreach (keys %$callees) {
my $t = $_;
if (defined($addresses{$t})) {
$resolved{$addresses{$t}} = 1;
} elsif (defined($global_name{$t})) {
$resolved{$global_name{$t}} = 1;
warn "Ambiguous resolution: $t" if defined ($ambiguous{$t});
} elsif (defined($call_graph{$t})) {
$resolved{$t} = 1;
} else {
$unresolved{$t} = 1;
}
}
$call_graph{$from} = \%resolved;
}
# Create fake edges and nodes to account for dynamic behaviour.
$call_graph{"INTERRUPT"} = {};
foreach (keys %call_graph) {
$call_graph{"INTERRUPT"}->{$_} = 1 if /^__vector_/;
}
# Trace the call graph and calculate, for each function:
#
# - inherited frames: maximum inherited frame of callees, plus own
# frame size.
# - height: maximum height of callees, plus one.
# - recursion: is the function called recursively (including indirect
# recursion)?
my %has_caller;
my %visited;
my %total_cost;
my %call_depth;
sub trace {
my $f = shift;
if ($visited{$f}) {
$visited{$f} = "R" if $visited{$f} eq "?";
return;
}
$visited{$f} = "?";
my $max_depth = 0;
my $max_frame = 0;
my $targets = $call_graph{$f} || die "Unknown function: $f";
if (defined($targets)) {
foreach (keys %$targets) {
my $t = $_;
$has_caller{$t} = 1;
trace($t);
my $is = $total_cost{$t};
my $d = $call_depth{$t};
$max_frame = $is if $is > $max_frame;
$max_depth = $d if $d > $max_depth;
}
}
$call_depth{$f} = $max_depth + 1;
$total_cost{$f} = $max_frame + ($frame_size{$f} || 0);
$visited{$f} = " " if $visited{$f} eq "?";
}
foreach (keys %call_graph) { trace $_; }
# Now, print results in a nice table.
printf " %-30s %8s %8s %8s\n",
"Func", "Cost", "Frame", "Height";
print "------------------------------------";
print "------------------------------------\n";
my $max_iv = 0;
my $main = 0;
foreach (sort { $total_cost{$b} <=> $total_cost{$a} } keys %visited) {
my $name = $_;
if (/^(.*)@(.*)$/) {
$name = $1 unless $ambiguous{$name};
}
my $tag = $visited{$_};
my $cost = $total_cost{$_};
$name = $_ if $ambiguous{$name};
$tag = ">" unless $has_caller{$_};
if (/^__vector_/) {
$max_iv = $cost if $cost > $max_iv;
} elsif (/^main@/) {
$main = $cost;
}
if ($ambiguous{$name}) { $name = $_; }
printf "%s %-30s %8d %8d %8d\n", $tag, $name, $cost,
$frame_size{$_} || 0, $call_depth{$_};
}
print "\n";
print "Peak execution estimate (main + worst-case IV):\n";
printf " main = %d, worst IV = %d, total = %d\n",
$total_cost{$global_name{"main"}},
$total_cost{"INTERRUPT"},
$total_cost{$global_name{"main"}} + $total_cost{"INTERRUPT"};
print "\n";
print "The following functions were not resolved:\n";
foreach (keys %unresolved) { print " $_\n"; }
Edit2:
As Amon suggested, I checked: subsequent runs of the script on the same dataset don't produce the same output. The values (cost/frame/height) are always the same, but the order in which the functions are reported is different.
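On the changing order only: Perl does not guarantee any particular order for keys, and since 5.18 the order also varies from run to run. The script sorts by cost alone, so functions with equal cost are printed in whatever order keys happened to return them. The sketch below is an illustration of that effect under those assumptions, not a diagnosis of the different heights; the sub name by_cost_then_name is hypothetical and the costs are copied from the first table above.

```perl
use strict;
use warnings;

# Costs copied from the result_5.22.1 table; several entries tie.
my %total_cost = (
    CLOCK_DRV_Init                     => 172,
    CLOCK_SYS_SetConfiguration         => 172,
    FLEXIO_I2C_DRV_MasterInit          => 60,
    EDMA_DRV_ConfigSingleBlockTransfer => 60,
    main                               => 52,
);

# Sort by descending cost, then by name, so ties always land in the
# same order no matter how keys() happens to order the hash.
sub by_cost_then_name {
    my ($cost) = @_;
    return sort { $cost->{$b} <=> $cost->{$a} or $a cmp $b } keys %$cost;
}

printf "%-36s %4d\n", $_, $total_cost{$_} for by_cost_then_name(\%total_cost);
```

Applying the same tie-breaker in the script's final sort would make repeated runs comparable line by line.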

Perl: perl regex for extracting values from complex lines

Input log file:
Nservdrx_cycle 4 servdrx4_cycle
HCS_cellinfo_st[10] (type = (LTE { 2}),cell_param_id = (28)
freq_info = (10560),band_ind = (rsrp_rsrq{ -1}),Qoffset1 = (0)
Pcompensation = (0),Qrxlevmin = (-20),cell_id = (7),
agcreserved{3} = ({ 0, 0, 0 }))
channelisation_code1 16/5 { 4} channelisation_code1
sync_ul_info_st_ (availiable_sync_ul_code = (15),uppch_desired_power =
(20),power_ramping_step = (3),max_sync_ul_trans = (8),uppch_position_info =
(0))
trch_type PCH { 7} trch_type8
last_report 0 zeroth bit
I am trying to extract only the integers from the input above, but I am having trouble when a string contains an integer at the beginning or at the end (e.g. agcreserved{3}, HCS_cellinfo_st[10], Qoffset1).
Here I want {3}, [10] and 1 to be left alone as part of the name, but my code extracts them too, since it extracts every integer.
Here is the simple regex I have written for extracting integers.
My simple code:
use strict;
use warnings;
my $Ipfile = 'data.txt';
open my $FILE, "<", $Ipfile or die "Couldn't open input file: $!";
my @array;
while (<$FILE>)
{
    while ($_ =~ m/( [+-]?\d+ )/xg)
    {
        push @array, ($1);
    }
}
print "@array \n";
output what I am getting for above inputs:
4 4 10 2 28 10560 -1 1 0 0 -20 7 3 0 0 0 1 16 5 4 1 15 20 3 8 0 7 8 0
expected output:
4 2 28 10560 -1 0 0 -20 7 0 0 0 4 15 20 3 8 0 7 0
Can somebody help me, with an explanation?
You are catching every integer because your regex has no restrictions on which characters can (or can not) come before/after the integer. Remember that the /x modifier only serves to allow whitespace/comments inside your pattern for readability.
Without knowing a bit more about the possible structure of your output data, this modification achieves the desired output:
while ( $_ =~ m! [^[{/\w] ( [+-]?\d+ ) [^/\w]!xg ) {
push @array, ($1);
}
I have added rules before and after the integer to exclude certain characters. So now, we will only capture if:
There is no [, {, /, or word character immediately before the number
There is no / or word character immediately after the number
If your data could have 2-digit numbers in the { N} blocks (e.g. PCH {12}) then this will not capture those and the pattern will need to become much more complex. This solution is therefore quite brittle, without knowing more of the rules about your target data.
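As a variation on the same rules, the excluded neighbours can be written as lookaround assertions instead of consumed characters. This is a sketch under the assumption that the rule set above is what is wanted; the names $standalone_int and extract_ints are hypothetical. Because lookarounds do not consume the neighbouring characters, a number at the very start or end of a line is also captured, which the pattern above would miss.

```perl
use strict;
use warnings;

# Same exclusions as the answer above, expressed as lookarounds:
# no '[', '{', '/' or word character directly before the number,
# and no '/' or word character directly after it.
my $standalone_int = qr/ (?<![\[{\/\w]) ([+-]?\d+) (?![\/\w]) /x;

# Return every standalone integer found in one line.
sub extract_ints {
    my ($line) = @_;
    return $line =~ /$standalone_int/g;
}

my @lines = (
    'HCS_cellinfo_st[10] (type = (LTE { 2}),cell_param_id = (28)',
    'freq_info = (10560),band_ind = (rsrp_rsrq{ -1}),Qoffset1 = (0)',
    'channelisation_code1 16/5 { 4} channelisation_code1',
);
print join(' ', map { extract_ints($_) } @lines), "\n";   # prints "2 28 10560 -1 0 4"
```

The same brittleness applies: which neighbours disqualify a number is still a guess about the log format.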

Perl script to check another array values depending on current array index

I'm working on a Perl assignment that has three arrays, @array_A, @array_B and @array_C, with some values in them. I grep for the string "CAT" in @array_A and fetch its indices too:
my @index = grep { $#array_A[$_] =~ 'CAT' } 0..$#array_A;
print "Index : @index\n";
Output: Index : 2 5
I have to take this as input, check the values of the other two arrays at indices 2 and 5, and print them to a file.
The catch is that the position of the string "CAT" varies (the indices might be 5, 7 and 9).
I'm not quite getting the logic here and am looking for some help with it.
Here's an overly verbose example of how to extract the values you want, to show what's happening, while hopefully leaving some room for you to investigate further. Note that it's idiomatic Perl to use regex delimiters when using =~, e.g. $name =~ /steve/.
use warnings;
use strict;
my @a1 = qw(AT SAT CAT BAT MAT CAT SLAT);
my @a2 = qw(a b c d e f g);
my @a3 = qw(1 2 3 4 5 6 7);
# note the difference in the next line... no # symbol...
my @indexes = grep { $a1[$_] =~ /CAT/ } 0..$#a1;
for my $index (@indexes){
    my $a2_value = $a2[$index];
    my $a3_value = $a3[$index];
    print "a1 index: $index\n" .
          "a2 value: $a2_value\n" .
          "a3 value: $a3_value\n" .
          "\n";
}
Output:
a1 index: 2
a2 value: c
a3 value: 3
a1 index: 5
a2 value: f
a3 value: 6
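Once the matching indices are known, an array slice is a more compact alternative to the loop: it fetches the values at all of those indices in one step. A minimal sketch, reusing the sample arrays from the example above (the names @a2_hits and @a3_hits are hypothetical):

```perl
use strict;
use warnings;

my @a1 = qw(AT SAT CAT BAT MAT CAT SLAT);
my @a2 = qw(a b c d e f g);
my @a3 = qw(1 2 3 4 5 6 7);

# Indices of every element of @a1 that contains "CAT".
my @indexes = grep { $a1[$_] =~ /CAT/ } 0 .. $#a1;

# Array slices: the elements of @a2 and @a3 at those same indices.
my @a2_hits = @a2[@indexes];
my @a3_hits = @a3[@indexes];

print "a2: @a2_hits\n";   # prints "a2: c f"
print "a3: @a3_hits\n";   # prints "a3: 3 6"
```

Since the indices are computed from the data each time, it does not matter where "CAT" appears.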

How to code in perl using subroutines

I need to score my string patches following certain criteria:
Column 1: B for buried or E for Exposed - Threshold: 25%
Column 2: Amino acid
Column 3: Sequence name
Column 4: Amino acid number
Column 5: Relative Surface Accessibility - RSA
Column 6: Absolute Surface Accessibility
Column 7: Z-fit score for RSA prediction
Column 8: Probability for Alpha-Helix
Column 9: Probability for Beta-strand
Column 10: Probability for Coil
E K 132L_A_PDBID_CHAIN_SEQUENCE 1 0.716 147.261 1.150 0.016 0.005 0.979
E V 132L_A_PDBID_CHAIN_SEQUENCE 2 0.514 79.033 1.252 0.191 0.086 0.723
B F 132L_A_PDBID_CHAIN_SEQUENCE 3 0.134 26.793 -0.325 0.191 0.086 0.723
E G 132L_A_PDBID_CHAIN_SEQUENCE 4 0.570 44.835 1.012 0.354 0.048 0.598
Remember, the last three columns are the probabilities for Helix/Sheet/Coil.
So first we need to classify each residue as Helix/Sheet/Coil, using the maximum probability within the last 3 columns as the criterion.
Then, once we have the structural preferences, we need to score the sequences, breaking them into patches of 10.
My scoring criteria are:
EXPOSED = 1; # +1 for letters that exposed
BURIED = 0; # 0 for letters that are buried
COIL = 3; # +3 for any coil
HELIX = 2; # +2 for any helix
SHEET = 1; # +1 for any sheet
The link below is for breaking a string into patches of 10~11
http://pastebin.com/GeW5AKF3
The problem I am facing is that I have split the string into horizontal patches, as in the above link, but the file is vertically aligned.
Thanks for the help.
This should get you going:
open my $fh, "<", "input.txt" or die "Can't open input.txt: $!";
my @data;
while (my $line = <$fh>) # While we get a line from the file
{
    chomp $line; # remove the trailing newline
    my @parts = split /\s+/, $line; # split on values separated by one or more whitespace characters
    push @data, [@parts]; # Add an array of the split parts to the data array
}
That puts everything into @data. You access it like this:
# now access whatever you want...
# example: line 3 column 6 (perl arrays start from 0 not 1):
print $data[2][5] . "\n"; #prints 26.793
# line 4 column 2:
print $data[3][1] . "\n"; #prints G
Then you can sort like this (example: sort by column 1, then by column 5, the RSA):
@data = sort { $a->[0] cmp $b->[0] or $a->[4] <=> $b->[4] } @data;
Then output the data like this:
foreach my $line (@data)
{
    foreach my $field (@$line)
    {
        print $field."\t";
    }
    print "\n";
}
Output is:
B F 132L_A_PDBID_CHAIN_SEQUENCE 3 0.134 26.793 -0.325 0.191 0.086 0.723
E V 132L_A_PDBID_CHAIN_SEQUENCE 2 0.514 79.033 1.252 0.191 0.086 0.723
E G 132L_A_PDBID_CHAIN_SEQUENCE 4 0.570 44.835 1.012 0.354 0.048 0.598
E K 132L_A_PDBID_CHAIN_SEQUENCE 1 0.716 147.261 1.150 0.016 0.005 0.979
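With the rows parsed into arrays like this, the scoring described in the question could be sketched roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the sub names classify, score_row and patch_scores are hypothetical, patches are taken to be consecutive, non-overlapping groups of rows, and ties between probabilities are broken alphabetically by structure name, which is an arbitrary choice.

```perl
use strict;
use warnings;

# Scores taken from the question.
my %EXPOSURE_SCORE  = (E => 1, B => 0);
my %STRUCTURE_SCORE = (helix => 2, sheet => 1, coil => 3);

# Pick the structure with the highest probability (columns 8 to 10).
sub classify {
    my ($p_helix, $p_sheet, $p_coil) = @_;
    my %p = (helix => $p_helix, sheet => $p_sheet, coil => $p_coil);
    my ($best) = sort { $p{$b} <=> $p{$a} or $a cmp $b } keys %p;
    return $best;
}

# Score one row, given as a reference to its 10 columns.
sub score_row {
    my ($row) = @_;
    return $EXPOSURE_SCORE{ $row->[0] }
         + $STRUCTURE_SCORE{ classify(@{$row}[7 .. 9]) };
}

# Sum the row scores over consecutive patches of $size rows.
sub patch_scores {
    my ($rows, $size) = @_;
    my @totals;
    for my $i (0 .. $#$rows) {
        $totals[ int($i / $size) ] += score_row($rows->[$i]);
    }
    return @totals;
}

# The four sample rows from the question, in file order.
my @rows = (
    [qw(E K 132L_A_PDBID_CHAIN_SEQUENCE 1 0.716 147.261 1.150 0.016 0.005 0.979)],
    [qw(E V 132L_A_PDBID_CHAIN_SEQUENCE 2 0.514 79.033 1.252 0.191 0.086 0.723)],
    [qw(B F 132L_A_PDBID_CHAIN_SEQUENCE 3 0.134 26.793 -0.325 0.191 0.086 0.723)],
    [qw(E G 132L_A_PDBID_CHAIN_SEQUENCE 4 0.570 44.835 1.012 0.354 0.048 0.598)],
);

print join(' ', patch_scores(\@rows, 10)), "\n";   # prints "15"
```

All four sample residues classify as coil, so the rows score 4, 4, 3 and 4. Note that the rows should be scored in file order (before any sorting) if the patches are meant to follow the sequence.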