Manipulating files according to indexes by perl

Manipulating files according to indexes by perl - perl

I am working on some genome data and I have 2 files ->
File1
A1 1 10
A1 15 20
A2 2 11
A2 13 16
File2
>A1
CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA
AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT
>A2
GTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAAAATTTCCACCA
AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGC
CAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAAT
In file 1, 2nd and 3rd column represents the indexes in File2. So I want that, if character in column1 of file1 matches with character followed by symbol (>) in file2 , then from next line of that file2 give back the substring according to indexes in col2 and col3 of file1. (sorry, I know its complicated) Here is the desire output ->
Output
>A1#1:10
CTATTATTTA
>A1#15:20
ACCTA
>A2#2:11
TCTGCACAGC
>A2#13:16
GCTT
I know if I have only 1 string I can take out sub-string very easily ->
#ARGV or die "No input file specified";
open $first, '<',$ARGV[0] or die "Unable to open input file: $!";
$string="GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT";
while (<$first>)
{
#cols = split /\s+/;
$co=$cols[1]-1;
$length=$cols[2]-$co;
$fragment = substr $string, $co, $length;
print ">",$cols[0],"#",$cols[1],":",$cols[2],"\n",$fragment,"\n";
}
but here my problem is when should I input my second file and how should I match the character in col1 (of file1) with character in file2 (followed by > symbol) and then how to get substring?

I wasnt sure if they were all one continuous line or separate lines.
I set it up as continuous for now.
Basically, read the 2nd file as master.
Then you can process as many index files as you need.
You can use hash of arrays to help with the indexing.
push #{$index{$key}}, [$start,$stop];
use strict;
my $master_file = "dna_master.txt";
if ($#ARGV) {
print "Usage: $0 [filename(s)]\n";
exit 1;
}
my %Data = read_master($master_file);
foreach my $index_file (#ARGV) {
my %Index = read_index($index_file);
foreach my $key (sort keys %Index) {
foreach my $i (#{$Index{$key}}) {
my ($start,$stop) = #$i;
print ">$key#$start:$stop\n";
my $pos = $start - 1;
my $count = $stop - $start + 1;
print substr($Data{$key},$pos,$count)."\n";
}
}
}
sub read_file {
my $file = shift;
my #lines;
open(FILE, $file) or die "Error: cannot open $file\n$!";
while(<FILE>){
chomp; #remove newline
s/(^\s+|\s+$)//g; # strip lead/trail whitespace
next if /^$/; # skip blanks
push #lines, $_;
}
close FILE;
return #lines;
}
sub read_index {
my $file = shift;
my #lines = read_file($file);
my %index;
foreach (#lines) {
my ($key,$start,$stop) = split /\s+/;
push #{$index{$key}}, [$start,$stop];
}
return %index;
}
sub read_master {
my $file = shift;
my %master;
my $key;
my #lines = read_file($file);
foreach (#lines) {
if ( m{^>(\w+)} ) { $key = $1 }
else { $master{$key} .= $_ }
}
return %master;
}

Load File2 in a Hash, with A1, A2... as keys, and the DNA sequence as value. This way you can get the DNA sequence easily.

This 2nd update turns the master file into a hash of arrays as well.
This treats each row in the 2nd file as individual sequences.
use strict;
my $master_file = "dna_master.txt";
if ($#ARGV) {
print "Usage: $0 [filename(s)]\n";
exit 1;
}
my %Data = read_master($master_file);
foreach my $index_file (#ARGV) {
my %Index = read_index($index_file);
foreach my $key (sort keys %Index) {
foreach my $i (#{$Index{$key}}) {
my ($start,$stop) = #$i;
print ">$key#$start:$stop\n";
my $pos = $start - 1;
my $count = $stop - $start + 1;
foreach my $seq (#{$Data{$key}}) {
print substr($seq,$pos,$count)."\n";
}
}
}
}
sub read_file {
my $file = shift;
my #lines;
open(FILE, $file) or die "Error: cannot open $file\n$!";
while(<FILE>){
chomp; #remove newline
s/(^\s+|\s+$)//g; # strip lead/trail whitespace
next if /^$/; # skip blanks
push #lines, $_;
}
close FILE;
return #lines;
}
sub read_index {
my $file = shift;
my #lines = read_file($file);
my %index;
foreach (#lines) {
my ($key,$start,$stop) = split /\s+/;
push #{$index{$key}}, [$start,$stop];
}
return %index;
}
sub read_master {
my $file = shift;
my %master;
my $key;
my #lines = read_file($file);
foreach (#lines) {
if ( m{^>(\w+)} ) { $key = $1 }
else { push #{ $master{$key} }, $_ }
}
return %master;
}
Output:
>A1#1:10
CTATTATTTA
AAGTGTGTTA
>A1#15:20
ACCTAC
ATTAAT
>A2#2:11
TCTGCACAGC
ACCCCCCCCT
AAACCCCAAA
>A2#13:16
GCTT
CCCC
ACAA

Related

how to replace specific line by write appending to file

i have two files . one is user's input file and another file is original config file. After comparing two files , do add/delete functions in my original config file.
user's input file: (showing line by line)
add:L28A:Z:W #add--> DID ID --> Bin ID
del:L28C:B:Q:X:
rpl:L38A:B:M:D:
original input file
L28A:B:Q:M:X:
L28C:B:Q:M:X:
L38A:B:Q:M:X:
based on user's input file , first is doing add function second is delete function and third is replace function.
so output for original input txt file should show:
L28A:B:Q:M:X:Z:W
L28C:M:
L38A:B:M:D:
but my code is showing :
L28A:B:Q:M:X:
L28C:B:Q:M:X:
L38A:B:Q:M:X:
L28A:B:Q:M:X:Z:W
L28C:M:
L38A:B:M:D:
how can i replace above three lines with new modify lines?
use strict;
use warnings;
use File::Copy;
use vars qw($requestfile $requestcnt $configfile $config2cnt $my3file $myfile3cnt $new_file $new_filecnt #output);
my $requestfile = "DID1.txt"; #user's input file
my $configfile = "DID.txt"; #original config file
my $new_file = "newDID.txt";
readFileinString($requestfile, \$requestcnt);
readFileinString($configfile, \$config2cnt);
copy($configfile, $new_file) or die "The copy operation failed: $!";
while ($requestcnt =~ m/^((\w){3})\:([^\n]+)$/mig) #Each line from user request
{
my $action = $1;
my $requestFullLine = $3;
while ($requestFullLine =~ m/^((\w){4})\:([^\n]+)$/mig) #Each line from user request
{
my $DID = $1; #DID
my $requestBinList = $3; #Bin List in user request
#my #First_values = split /\:/, $requestBinList;
if ($config2cnt =~ m/^$DID\:([^\n]+)$/m) #configfile
{
my $ConfigFullLine = $1; #Bin list in config
my $testfile = $1;
my #First_values = split /\:/, $ConfigFullLine;
my #second_values = split /\:/, $requestBinList;
foreach my $sngletter(#second_values) # Each line from user request
{
if( grep {$_ eq "$sngletter"} #First_values)
{
print " $DID - $sngletter - Existing bin..\n\n";
}
else
{
print "$DID - $sngletter - Not existing bin..\n\n";
}
}
print "Choose option 1.Yes 2.No\n";
my $option = <STDIN>;
if ($option == 1) {
open(DES,'>>',$configfile) or die $!;
if($action eq 'add')
{
$ConfigFullLine =~ s/$/$requestBinList/g;
my $add = "$DID:$ConfigFullLine";
print DES "$add\n" ;
print"New Added Bin Valu $add\n\n";
}
if ( $action eq 'del')
{
foreach my $sngletter(#second_values){
$ConfigFullLine =~ s/$sngletter://g;
}
print DES "$DID:$ConfigFullLine\n";
print "New Deleted Bin Value $DID:$ConfigFullLine\n\n";
}
if ( $action eq 'rpl')
{
my $ConfigFullLine = $requestBinList;
my $replace = "$DID:$ConfigFullLine";
print DES "$replace\n";
print"Replace Bin Value $replace\n\n";
}
}
elsif ($option == 2)
{
print"Start from begining\n";
}
else
{
print "user chose invalid process or input is wrong\n";
}
}
else
{
print "New DID $DID detected\n";}
}
}
sub readFileinString
{
my $File = shift;
my $string = shift;
use File::Basename;
my $filenames = basename($File);
open(FILE1, "<$File") or die "\nFailed Reading File: [$File]\n\tReason: $!";
read(FILE1, $$string, -s $File, 0);
close(FILE1);
}

The problem is here:
open(DES,'>>',$configfile) or die $!;
You open your file for appending. So you get the original data, followed by your edited data.
Update: It appears that you have a working solution now, but I thought it might be interesting to show you how I would write this.
This program is a Unix filter. That is, it reads from STDIN and writes to STDOUT. I find that far more flexible than hard-coded filenames. You also don't have to explicitly open files - which saves time :-)
It also takes a command-line option, -c, telling it which file contains the edit definitions. So it is called like this (assuming we've called the program edit_files:
$ edit_files -c edit_definitions.txt < your_input_file > your_output_file
And here's the code.
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;
my %opts;
getopts('e:', \%opts);
my %edits = read_edits($opts{e});
while (<>) {
chomp;
my ($key, $val) = split /:/, $_, 2; #/ stop faulty syntax highlight
if (!exists $edits{$key}) {
print "$_\n";
next;
}
my $edit = $edits{$key};
if ($edit->[0] eq 'add') {
print "$_$edit->[1]\n";
} elsif ($edit->[0] eq 'del') {
$val =~ s/$_:// for split /:/, $edit->[1]; #/
print "$key:$val\n";
} elsif ($edit->[0] eq 'rpl') {
print "$key:$edit->[1]\n";
} else {
warn "$edit->[0] is an invalid edit type\n";
next;
}
}
sub read_edits {
my $file = shift;
open my $edit_fh, '<', $file or die $!;
my %edits;
while (<$edit_fh>) {
chomp;
# Remove comments
s/\s*#.*//; #/
my ($type, $key, $val) = split /:/, $_, 3; #/
$edits{$key} = [ $type, $val ];
}
}

how to display the hash value from my sample data

I'm learning perl at the moment, i wanted to ask help to answer this exercise.
My objective is to display the hash value of PartID 1,2,3
the sample output is displaying lot, wafer, program, version, testnames, testnumbers, hilimit, lolimit and partid values only.
sample data
lot=lot123
wafer=1
program=prgtest
version=1
Testnames,T1,T2,T3
Testnumbers,1,2,3
Hilimit,5,6,7
Lolimit,1,2,3
PartID,,,,
1,3,0,5
2,4,3,2
3,5,6,3
This is my code:
#!/usr/bin/perl
use strict;
use Getopt::Long;
my $file = "";
GetOptions ("infile=s" => \$file ) or die("Error in command line arguments\n");
my $lotid = "";
open(DATA, $file) or die "Couldn't open file $file";
while(my $line = <DATA>) {
#print "$line";
if ( $line =~ /^lot=/ ) {
#print "$line \n";
my ($dump, $lotid) = split /=/, $line;
print "$lotid\n";
}
elsif ($line =~ /^program=/ ) {
my ($dump, $progid) = split /=/, $line;
print "$progid \n";
}
elsif ($line =~ /^wafer=/ ) {
my ($dump, $waferid) = split /=/, $line;
print "$waferid \n";
}
elsif ($line =~ /^version=/ ) {
my ($dump, $verid) = split /=/, $line;
print "$verid \n";
}
elsif ($line =~ /^testnames/i) {
my ($dump, #arr) = split /\,/, $line;
foreach my $e (#arr) {
print $e, "\n";
}
}
elsif ($line =~ /^testnumbers/i) {
my ($dump, #arr1) = split /\,/, $line;
foreach my $e1 (#arr1) {
print $e1, "\n";
}
}
elsif ($line =~ /^hilimit/i) {
my ($dump, #arr2) = split /\,/, $line;
foreach my $e2 (#arr2) {
print $e2, "\n";
}
}
elsif ($line =~ /^lolimit/i) {
my ($dump, #arr3) = split /\,/, $line;
foreach my $e3 (#arr3) {
print $e3, "\n";
}
}
}
Kindly help add to my code to display Partid 1,2,3 hash.

So I've rewritten your code a little to use a few more modern Perl idioms (along with some comments to explain what I've done). The bit I've added is near the bottom.
#!/usr/bin/perl
use strict;
# Added 'warnings' which you should always use
use warnings;
# Use say() instead of print()
use feature 'say';
use Getopt::Long;
my $file = "";
GetOptions ("infile=s" => \$file)
or die ("Error in command line arguments\n");
# Use a lexical variable for a filehandle.
# Use the (safer) 3-argument version of open().
# Add $! to the error message.
open(my $fh, '<', $file) or die "Couldn't open file $file: $!";
# Read each record into $_ - which makes the following code simpler
while (<$fh>) {
# Match on $_
if ( /^lot=/ ) {
# Use "undef" instead of a $dump variable.
# split() works on $_ by default.
my (undef, $lotid) = split /=/;
# Use say() instead of print() - less punctuation :-)
say $lotid;
}
elsif ( /^program=/ ) {
my (undef, $progid) = split /=/;
say $progid;
}
elsif ( /^wafer=/ ) {
my (undef, $waferid) = split /=/;
say $waferid;
}
elsif ( /^version=/ ) {
my (undef, $verid) = split /=/;
say $verid;
}
elsif ( /^testnames/i) {
my (undef, #arr) = split /\,/;
# Changed all of these similar pieces of code
# to use the same variable names. As they are
# defined in different code blocks, they are
# completely separate variables.
foreach my $e (#arr) {
say $e;
}
}
elsif ( /^testnumbers/i) {
my (undef, #arr) = split /\,/;
foreach my $e (#arr) {
say $e;
}
}
elsif ( /^hilimit/i) {
my (undef, #arr) = split /\,/;
foreach my $e (#arr) {
say $e;
}
}
elsif ( /^lolimit/i) {
my (undef, #arr) = split /\,/;
foreach my $e (#arr) {
say $e;
}
}
# And here's the new bit.
# If we're on the "partid" line, then read the next
# three lines, split each one and print the first
# element from the list returned by split().
elsif ( /^partid/i) {
say +(split /,/, <$fh>)[0] for 1 .. 3;
}
}
Update: By the way, there are no hashes anywhere in this code :-)
Update 2: I've just realised that you only have three different ways to process the data. So you can simplify your code drastically by using slightly more complex regexes.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Getopt::Long;
my $file = "";
GetOptions ("infile=s" => \$file)
or die ("Error in command line arguments\n");
open(my $fh, '<', $file) or die "Couldn't open file $file: $!";
while (<$fh>) {
# Single value - just print it.
if ( /^(?:lot|program|wafer|version)=/ ) {
my (undef, $value) = split /=/;
say $value;
}
# List of values - split and print.
elsif ( /^(?:testnames|testnumbers|hilimit|lolimit)/i) {
my (undef, #arr) = split /\,/;
foreach my $e (#arr) {
say $e;
}
}
# Extract values from following lines.
elsif ( /^partid/i) {
say +(split /,/, <$fh>)[0] for 1 .. 3;
}
}

Running a nested while loop inside a foreach loop in Perl

I'm trying to use a foreach loop to loop through an array and then use a nested while loop to loop through each line of a text file to see if the array element matches a line of text; if so then I push data from that line into a new array to perform calculations.
The outer foreach loop appears to be working correctly (based on printed results with each array element) but the inner while loop is not looping (same data pushed into array each time).
Any advice?
The code is below
#! /usr/bin/perl -T
use CGI qw(:cgi-lib :standard);
print "Content-type: text/html\n\n";
my $input = param('sequence');
my $meanexpfile = "final_expression_complete.txt";
open(FILE, $meanexpfile) or print "unable to open file";
my #meanmatches;
#regex = (split /\s/, $input);
foreach $regex (#regex) {
while (my $line = <FILE>) {
if ( $line =~ m/$regex\s(.+\n)/i ) {
push(#meanmatches, $1);
}
}
my $average = average(#meanmatches);
my $std_dev = std_dev($average, #meanmatches);
my $average_round = sprintf("%0.4f", $average);
my $stdev_round = sprintf("%0.4f", $std_dev);
my $coefficient_of_variation = $stdev_round / $average_round;
my $cv_round = sprintf("%0.4f", $coefficient_of_variation);
print font(
{ color => "blue" }, "<br><B>$regex average: $average_round
&nbspStandard deviation: $stdev_round&nbspCoefficient of
variation(Cv): $cv_round</B>"
);
}
sub average {
my (#values) = #_;
my $count = scalar #values;
my $total = 0;
$total += $_ for #values;
return $count ? $total / $count : 0;
}
sub std_dev {
my ($average, #values) = #_;
my $count = scalar #values;
my $std_dev_sum = 0;
$std_dev_sum += ($_ - $average)**2 for #values;
return $count ? sqrt($std_dev_sum / $count) : 0;
}

Yes, my advice would be:
Turn on strict and warnings.
perltidy your code,
use 3 argument open: open ( my $inputfile, "<", 'final_expression.txt' );
die if it doesn't open - the rest of your program is irrelevant.
chomp $line
you are iterating your filehandle, but once you've done this you're at the end of file for the next iteration of the foreach loop so your while loops becomes a null operation. Simplistically, reading the file into an array my #lines = <FILE>; would fix this.
So with that in mind:
#!/usr/bin/perl -T
use strict;
use warnings;
use CGI qw(:cgi-lib :standard);
print "Content-type: text/html\n\n";
my $input = param('sequence');
my $meanexpfile = "final_expression_complete.txt";
open( my $input_file, "<", $meanexpfile ) or die "unable to open file";
my #meanmatches;
my #regex = ( split /\s/, $input );
my #lines = <$input_file>;
chomp (#lines);
close($input_file) or warn $!;
foreach my $regex (#regex) {
foreach my $line (#lines) {
if ( $line =~ m/$regex\s(.+\n)/i ) {
push( #meanmatches, $1 );
}
}
my $average = average(#meanmatches);
my $std_dev = std_dev( $average, #meanmatches );
my $average_round = sprintf( "%0.4f", $average );
my $stdev_round = sprintf( "%0.4f", $std_dev );
my $coefficient_of_variation = $stdev_round / $average_round;
my $cv_round = sprintf( "%0.4f", $coefficient_of_variation );
print font(
{ color => "blue" }, "<br><B>$regex average: $average_round
&nbspStandard deviation: $stdev_round&nbspCoefficient of
variation(Cv): $cv_round</B>"
);
}
sub average {
my (#values) = #_;
my $count = scalar #values;
my $total = 0;
$total += $_ for #values;
return $count ? $total / $count : 0;
}
sub std_dev {
my ( $average, #values ) = #_;
my $count = scalar #values;
my $std_dev_sum = 0;
$std_dev_sum += ( $_ - $average )**2 for #values;
return $count ? sqrt( $std_dev_sum / $count ) : 0;
}

The problem here is that starting from the second iteration of foreach you are trying to read from already read file handle. You need to rewind to the beginning to read it again:
foreach $regex (#regex) {
seek FILE, 0, 0;
while ( my $line = <FILE> ) {
However that does not look very performant. Why read file several times at all, when you can read it once before the foreach starts, and then iterate through the list:
my #lines;
while (<FILE>) {
push (#lines, $_);
}
foreach $regex (#regex) {
foreach $line (#lines) {
Having the latter, you might also what to consider using grep instead of the while loop.

stockholm to fasta format - include accession id in every header

Hello I've multiple sequences in stockholm format, at the top of every alignment there is a accession ID, for ex: '#=GF AC PF00406' and '//' --> this is the end of the alignment. When I'm converting the stockholm format to fasta format I need PF00406 in the header of every sequence of the particular alignment. Some times there will be multiple stockholm alignments in one file. I tried to modify the following perl script, it gave me bizarre results, any help will be greatly appreciated.
my $columns = 60;
my $gapped = 0;
my $progname = $0;
$progname =~ s/^.*?([^\/]+)$/$1/;
my $usage = "Usage: $progname [<Stockholm file(s)>]\n";
$usage .= " [-h] print this help message\n";
$usage .= " [-g] write gapped FASTA output\n";
$usage .= " [-s] sort sequences by name\n";
$usage .= " [-c <cols>] number of columns for FASTA output (default is $columns)\n";
# parse cmd-line opts
my #argv;
while (#ARGV) {
my $arg = shift;
if ($arg eq "-h") {
die $usage;
} elsif ($arg eq "-g") {
$gapped = 1;
} elsif ($arg eq "-s"){
$sorted = 1;
} elsif ($arg eq "-c") {
defined ($columns = shift) or die $usage;
} else {
push #argv, $arg;
}
}
#ARGV = #argv;
my %seq;
while (<>) {
next unless /\S/;
next if /^\s*\#/;
if (/^\s*\/\//) { printseq() }
else {
chomp;
my ($name, $seq) = split;
#seq =~ s/[\.\-]//g unless $gapped;
$seq{$name} .= $seq;
}
}
printseq();
sub printseq {
if($sorted){
foreach $key (sort keys %seq){
print ">$key\n";
for (my $i = 0; $i < length $seq{$key}; $i += $columns){
print substr($seq{$key}, $i, $columns), "\n";
}
}
} else{
while (my ($name, $seq) = each %seq) {
print ">$name\n";
for (my $i = 0; $i < length $seq; $i += $columns) {
print substr ($seq, $i, $columns), "\n";
}
}
}
%seq = ();
}

Depending on the how much variation there is in the line with the accessionID, you might need to modify the regex, but this works for your example file
my %seq;
my $aln;
while (<>) {
if ($_ =~ /#=GF AC (\w+)/) {
$aln = $1;
}
elsif ($_ =~ /^\s*\/\/\s*$/){
$aln = '';
}
next unless /\S/;
next if /^\s*\#/;
if (/^\s*\/\//) { printseq() }
else {
chomp;
my ($name, $seq) = split;
$name = $name . ' ' . $aln;
$seq{$name} .= $seq;
}
}
printseq();

Why does perl "hash of lists" do this?

I have a hash of lists that is not getting populated.
I checked that the block at the end that adds to the hash is in fact being called on input. It should either add a singleton list if the key doesn't exist, or else push to the back of the list (referenced under the right key) if it does.
I understand that the GOTO is ugly, but I've commented it out and it has no effect.
The problem is that when printhits is called, nothing is printed, as if there are no values in the hash. I also tried each (%genomehits), no dice.
THANKS!
#!/usr/bin/perl
use strict;
use warnings;
my $len = 11; # resolution of the peaks
#$ARGV[0] is input file
#$ARGV[1] is call number
# optional -s = spread number from call
# optional -o specify output file name
my $usage = "see arguments";
my $input = shift #ARGV or die $usage;
my $call = shift #ARGV or die $usage;
my $therest = join(" ",#ARGV) . " ";
print "the rest".$therest."\n";
my $spread = 1;
my $output = $input . ".out";
if ($therest =~ /-s\s+(\d+)\s/) {$spread = $1;}
if ($therest =~ /-o\s+(.+)\s/) {$output = $1;}
# initialize master hash
my %genomehits = ();
foreach (split ';', $input) {
my $mygenename = "err_naming";
if ($_ =~ /^(.+)-/) {$mygenename = $1;}
open (INPUT, $_);
my #wiggle = <INPUT>;
&singlegene(\%genomehits, \#wiggle, $mygenename);
close (INPUT);
}
&printhits;
#print %genomehits;
sub printhits {
foreach my $key (%genomehits) {
print "key: $key , values: ";
foreach (#{$genomehits{$key}}) {
print $_ . ";";
}
print "\n";
}
}
sub singlegene {
# let %hash be the mapping hash
# let #mygene be the gene to currently process
# let $mygenename be the name of the gene to currently process
my (%hash) = %{$_[0]};
my (#mygene) = #{$_[1]};
my $mygenename = $_[2];
my $chromosome;
my $leftbound = -2;
my $rightbound = -2;
foreach (#mygene) {
#print "Doing line ". $_ . "\n";
if ($_ =~ "track" or $_ =~ "output" or $_ =~ "#") {next;}
if ($_ =~ "Step") {
if ($_ =~ /chrom=(.+)\s/) {$chromosome = $1;}
if ($_ =~ /span=(\d+)/) {$1 == 1 or die ("don't support span not equal to one, see wig spec")};
$leftbound = -2;
$rightbound = -2;
next;
}
my #line = split /\t/, $_;
my $pos = $line[0];
my $val = $line[-1];
# above threshold for a call
if ($val >= $call) {
# start of range
if ($rightbound != ($pos - 1)) {
$leftbound = $pos;
$rightbound = $pos;
}
# middle of range, increment rightbound
else {
$rightbound = $pos;
}
if (\$_ =~ $mygene[-1]) {goto FORTHELASTONE;}
}
# else reinitialize: not a call
else {
FORTHELASTONE:
# typical case, in an ocean of OFFs
if ($rightbound != ($pos-1)) {
$leftbound = $pos;
}
else {
# register the range
my $range = $rightbound - $leftbound;
for ($spread) {
$leftbound -= $len;
$rightbound += $len;
}
#print $range . "\n";
foreach ($leftbound .. $rightbound) {
my $key = "$chromosome:$_";
if (not defined $hash{$key}) {
$hash{$key} = [$mygenename];
}
else { push #{$hash{$key}}, $mygenename; }
}
}
}
}
}

You are passing a reference to %genomehits to the function singlegene, and then copying it into a new hash when you do my (%hash) = %{$_[0]};. You then add values to %hash which goes away at the end of the function.
To fix it, use the reference directly with arrow notation. E.g.
my $hash = $_[0];
...
$hash->{$key} = yadda yadda;

I think it's this line:
my (%hash) = %{$_[0]};
You're passing in a reference, but this statement is making a copy of your hash. All additions you make in singlegene are then lost when you return.
Leave it as a hash reference and it should work.
PS - Data::Dumper is your friend when large data structures are not behaving as expected. I'd sprinkle a few of these in your code...
use Data::Dumper; print Dumper \%genomehash;

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Manipulating files according to indexes by perl - perl

Load File2 in a Hash, with A1, A2... as keys, and the DNA sequence as value. This way you can get the DNA sequence easily.

Related

how to replace specific line by write appending to file

how to display the hash value from my sample data

Running a nested while loop inside a foreach loop in Perl

stockholm to fasta format - include accession id in every header

Why does perl "hash of lists" do this?

Categories

Resources