Hi guys I have an issue to solve,
I have 2 files.
File A
col1,col2, value_total_to_put
201843,12345,30
File B
col1,col2,col3, value_inputted, missing_value, value_max
201843,12345,447,4,0,4
201843,12345,448,0,0,4
201843,12345,449,0,0,2
201843,12345,450,4,0,4
201843,12345,451,2,0,2
201843,12345,455,4,0,4
201843,12345,457,0,0,4
201843,12345,899,10,0,10
201843,12345,334,0,1,1
201843,12345,364,0,1,1
201843,12345,71,0,2,2
201843,12345,260,0,2,2
201843,12345,321,0,2,2
201843,12345,328,0,2,2
201843,12345,371,0,2,2
201843,12345,385,0,2,2
201843,12345,426,0,2,2
201843,12345,444,0,2,2
201843,12345,31,4,6,10
201843,12345,360,2,87,99
201843,12345,373,4,95,99
201843,12345,472,4,95,99
201843,12345,475,4,95,99
201843,12345,430,0,99,99
201843,12345,453,0,99,99
201843,12345,463,0,99,99
201843,12345,482,0,99,99
201843,12345,484,0,99,99
My keys are col1 and col2 in both files. My attempt is below, but my loop is wrong: it stops as soon as I reach the end of File B.
What I want is to match File A and File B on col1 and col2 and, while value_total_to_put is > 0, take 1 from it on each pass and add it to value_inputted in File B whenever value_inputted is less than value_max. To take a unit from File A, missing_value must be > 0.
For the result I will print each row once value_inputted equals value_max, in other words the last value, continuing until value_max is reached or value_total_to_put is 0.
while ( <FA> ){
    chomp;
    my($col1,$col2, $value_total_to_put) = split ",";
    push @A, [$col1,$col2, $value_total_to_put];
}
my @B;
while ( <FB> ){
    chomp;
    my($col1,$col2,$col3, $value_inputted, $missing_value, $value_max) = split ",";
    push @B, [$col1,$col2,$col3, $value_inputted, $missing_value, $value_max];
}
foreach my $line (@A){
    my $idxl = @$line[0].",".@$line[1];
    my $value_total_to_put = @$line[2];
    while ($value_total_to_put > 0){
        foreach my $row ( @B ){
            if ( $idxr eq $idxl ){
                my $idxr = @$row[0].",".@$row[1];
                my $value_inputted = @$row[3];
                my $value_max = @$row[5];
                my $missing_value = @$row[4];
                if ( ($value_inputted eq 0) and ($missing_value eq 0)){
                    #do_nothing
                } elsif($value_inputted == $value_max){
                    #do_nothing
                    print join(",", $idxr, @$row[2],"Value_inputted: ".$value_inputted, "Missing_value: ".$missing_value, "Value_max:".$value_max, "Total: ".$value_total_to_put)."\n";
                }else{
                    $value_inputted++;
                    $missing_value--;
                    $value_total_to_put--;
                }
            }
        }
        last if $value_total_to_put > 0;
    }
}
The third file will be this way:
201843,12345,447,4,0,4
201843,12345,450,4,0,4
201843,12345,451,2,0,2
201843,12345,455,4,0,4
201843,12345,899,10,0,10
201843,12345,334,1,0,1
201843,12345,364,1,0,1
201843,12345,71,2,0,2
201843,12345,260,2,0,2
201843,12345,321,2,0,2
201843,12345,328,2,0,2
201843,12345,371,2,0,2
201843,12345,385,2,0,2
201843,12345,426,2,0,2
201843,12345,444,2,0,2
201843,12345,31,10,0,10
201843,12345,360,3,86,99
201843,12345,373,5,94,99
201843,12345,472,5,94,99
201843,12345,475,5,94,99
201843,12345,430,1,98,99
201843,12345,453,1,98,99
As explained by @Dave Cross, your code is quite hard to read (and is missing use strict and use warnings), and your explanation of what you are trying to achieve is quite unclear...
One thing that caught my eye however, is that you start a loop with this statement...
while ($value_total_to_put > 0){
... but at the very end of that same block you do :
last if $value_total_to_put > 0;
}
This will effectively cause Perl to exit the loop after the first iteration, no matter the value of the $value_total_to_put variable. This is probably not what you want. Hence, as far as I understand, you should start your investigation by removing that last statement.
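To see the effect in isolation, here is a minimal sketch. The sub count_passes and its $with_last flag are hypothetical names used only for this illustration; the decrement stands in for whatever work one pass of the real loop does.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many passes a countdown loop makes,
# with or without the trailing "last if $remaining > 0" from the question.
sub count_passes {
    my ($remaining, $with_last) = @_;
    my $passes = 0;
    while ($remaining > 0) {
        $passes++;
        $remaining--;    # stand-in for the real work done in one pass
        last if $with_last && $remaining > 0;
    }
    return $passes;
}

print count_passes(30, 1), "\n";    # 1
print count_passes(30, 0), "\n";    # 30
```

With the last statement in place the loop body runs once; without it, the while condition alone decides when to stop.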
I've been browsing over the already answered questions regarding this error message.
I am trying to solve a problem from the Rosalind web site that looks for some indexes using a binary search.
When my subroutine finds the number it seems to ignore it, and if I try to print the $found variable, it gives me the error
Can't use string "1" as a symbol ref while strict refs is in use
The code is this
sub binarysearch
{
    my $numbertolook = shift;
    my @intarray=@_;
    my $lengthint = scalar @intarray;
    my @sorted = sort {$a <=> $b} @intarray;
    #print $numbertolook, " " , @sorted, "\n";
    my $low=0;
    my $high=$lengthint-1;
    my $found =undef;
    my $midpoint;
    while ($low<$high)
    {
        $midpoint=int(($low+$high)/2);
        #print $midpoint, " ",$low," ", $high, " ", @sorted, "\n";
        if ($numbertolook<$sorted[$midpoint])
        {
            $high=$midpoint;
        }
        elsif ($numbertolook>$sorted[$midpoint])
        {
            $low=$midpoint;
        }
        elsif ($numbertolook==$sorted[$midpoint])
        {
            $found=1;
            print $found "\n";
            last;
        }
        if ($low==$high-1 and $low==$midpoint)
        {
            if ($numbertolook==$sorted[$high])
            {
                $found=1;
                print $found "\n";
                last;
            }
            $low=$high;
        }
    }
    return $found;
}
You want
print $found, "\n";
Or
print $found . "\n";
With no operator between $found and the newline, it thinks $found is the filehandle to print a newline to, and is getting an error because it isn't a filehandle.
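As a rough illustration of the two forms of print, the sketch below writes through a real filehandle. The helper print_to_string and the in-memory filehandle are assumptions made for the example, not part of the question's code.

```perl
use strict;
use warnings;

# Hypothetical helper: print the arguments to an in-memory filehandle
# and return whatever was written.
sub print_to_string {
    my @items = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "Cannot open in-memory handle: $!";
    print {$fh} @items;    # print FILEHANDLE LIST: no comma after the handle
    close $fh;
    return $buffer;
}

my $found = 1;
print $found, "\n";                     # print LIST: $found is data, goes to STDOUT
print print_to_string($found, "\n");    # same text, routed through an explicit handle
```

Wrapping the handle in braces, as in print {$fh} ..., makes it obvious to the reader which argument is the filehandle.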
I'll try to help
First of all, as simple as it may seem, a binary search is quite difficult to code correctly. The main reason is that it's a hotbed of off-by-one errors, which are so prevalent that they have their own Wikipedia page
The issue is that an array containing, say, the values A to Z will have 26 elements with indices 0 to 25. I think FORTRAN bucks the trend, and Lua, but pretty much every other language has the first element of an array at index zero
A zero base works pretty well for everything until you start using divide and conquer algorithms. Merge Sort as well as Binary Search are such algorithms. Binary search goes
Is it in the first half?
If so then search the first half further
Else search the second half further
The hard part is when you have to decide when you've found the object, or when you need to give up looking. Splitting data in two nearly-halves is easy. Knowing when to stop is hard
It's highly efficient for sorted data, but the problem when implementing it is that, to do it properly, we have to deal with all sorts of weird index bases beyond zero or one.
Suppose I have an array
my @alpha = 'A' .. 'Q'
If I print scalar @alpha I will see 17, meaning the array has seventeen elements, indexed from 0 to 16
Now I'm looking for E in that array, so I do a binary search, so I want the "first half" and the "second half" of @alpha. If I add 0 to 16 and divide by 2 I get a neat "8", so the middle element is at index 8, which is H
But wait. There are 17 elements, which is an odd number, so if we say the first eight (A .. H) are left of the middle and the last eight (I .. Q) are right of the middle then surely the "middle" is I?
In truth this is all a deception, because a binary search will work however we partition the data. In this case binary means two parts, and although the search would be more efficient if those parts could be equal in size it's not necessary for the algorithm to work. So it can be the first third and the last two-thirds, or just the first element and the rest
That's why using int(($low+$high)/2) is fine. It rounds down to the nearest integer so that with our 17-element array $mid is a usable 8 instead of 8.5
But your code still has to account for some unexpected things. In the case of our 17-element array we have calculated the middle index to be 8. So indexes 0 .. 7 are the "first half" while 8 .. 16 are the "second half", and the middle index is where the second half starts
But didn't we round the division down? So in the case of an odd number of elements, shouldn't our mid point be at the end of the first half, and not the start of the second? This is an arcane off-by-one error, but let's see if it still works with a simple even number of elements
@alpha = 'A' .. 'D'
The start and end indices are 0 and 3; the middle index is int((0+3)/2) == 1. So the first half is 0..1 and the second half is 2 .. 3. That works fine
But there's still a lot more. Say I have to search an array with two elements X and Y. That has two clear halves, and I'm looking for A, which is before the middle. So I now search the one-element list X for A. The minimum and maximum elements of the target array are both zero. The mid-point is int((0+0)/2) == 0. So what happens next?
It is similar but rather worse when we're searching for Z in the same list. The code has to be exactly right, otherwise we will be either searching off the end of the array or checking the last element again and again
Saving the worst for last, suppose
my @alpha = ( 'A', 'B', 'Y', 'Z' )
and I'm looking for M. That lets loose all sorts of optimisations that involve checks that may make the ordinary case much slower
Because of all of this it's by far the best solution to use a library or a language's built-in function to do all of this. In particular, Perl's hashes are usually all you need to check for specific strings and any associated data. The algorithm used is vastly better than a binary search for any non-trivial data sets
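For instance, a hash-based lookup might look like the sketch below. The name %index_of and the sample letters are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Build a lookup hash once; each key is a value we may want to find.
my @alpha = ('A', 'B', 'Y', 'Z');
my %index_of;
@index_of{@alpha} = 0 .. $#alpha;    # hash slice: 'A' => 0, 'B' => 1, ...

# exists() answers "is it there?" without any midpoint arithmetic.
for my $letter ('M', 'Y') {
    if (exists $index_of{$letter}) {
        print "$letter is at index $index_of{$letter}\n";
    }
    else {
        print "$letter not found\n";
    }
}
```

This prints "M not found" and then "Y is at index 2", and the data does not even need to be sorted.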
Wikipedia shows this algorithm for an iterative binary search
The binary search algorithm can also be expressed iteratively with two index limits that progressively narrow the search range.
int binary_search(int A[], int key, int imin, int imax)
{
// continue searching while [imin,imax] is not empty
while (imin <= imax)
{
// calculate the midpoint for roughly equal partition
int imid = midpoint(imin, imax);
if (A[imid] == key)
// key found at index imid
return imid;
// determine which subarray to search
else if (A[imid] < key)
// change min index to search upper subarray
imin = imid + 1;
else
// change max index to search lower subarray
imax = imid - 1;
}
// key was not found
return KEY_NOT_FOUND;
}
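A possible Perl rendering of that iterative algorithm is sketched below. It assumes the input is already sorted numerically and is passed as an array reference; the midpoint formula and the use of undef for "not found" are my own choices, standing in for midpoint() and KEY_NOT_FOUND above.

```perl
use strict;
use warnings;

# Return the index of $key in the sorted array referenced by $sorted,
# or undef (an empty list in list context) when the key is absent.
sub binary_search {
    my ($sorted, $key) = @_;
    my ($imin, $imax) = (0, $#{$sorted});

    # Continue while the closed range [imin, imax] is not empty.
    while ($imin <= $imax) {
        my $imid = $imin + int(($imax - $imin) / 2);
        if    ($sorted->[$imid] == $key) { return $imid }          # found
        elsif ($sorted->[$imid] <  $key) { $imin = $imid + 1 }     # search upper part
        else                             { $imax = $imid - 1 }     # search lower part
    }
    return;    # key was not found
}

my @numbers = (10 .. 99);
print binary_search(\@numbers, 76), "\n";    # 66
```

Because both limits move past the midpoint on every pass, the range always shrinks and the loop cannot spin on the same element.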
And here is a version of your code that is far from bug-free but does what you intended. You weren't so far off
use strict;
use warnings 'all';
print binarysearch( 76, 10 .. 99 ), "\n";
sub binarysearch {
    my $numbertolook = shift;
    my @intarray = @_;
    my $lengthint = scalar @intarray;
    my @sorted = sort { $a <=> $b } @intarray;
    my $low = 0;
    my $high = $lengthint - 1;
    my $found = undef;
    my $midpoint;
    while ( $low < $high ) {
        $midpoint = int( ( $low + $high ) / 2 );
        #print $midpoint, " ",$low," ", $high, " ", @sorted, "\n";
        if ( $numbertolook < $sorted[$midpoint] ) {
            $high = $midpoint;
        }
        elsif ( $numbertolook > $sorted[$midpoint] ) {
            $low = $midpoint;
        }
        elsif ( $numbertolook == $sorted[$midpoint] ) {
            $found = 1;
            print "FOUND\n";
            return $midpoint;
        }
        if ( $low == $high - 1 and $low == $midpoint ) {
            if ( $numbertolook == $sorted[$high] ) {
                $found = 1;
                print "FOUND\n";
                return $midpoint;
            }
            return;
        }
    }
    return $midpoint;
}
output
FOUND
66
If you call print with several parameters separated by a space, print expects the first one to be a filehandle. This is interpreted as print FILEHANDLE LIST from the documentation.
print $found "\n";
What you want to do is either to separate them with a comma, to call it as print LIST.
print $found, "\n";
or to concatenate them as strings, which will also call it as print LIST, but with only one element in LIST.
print $found . "\n";
I am trying to convert the following set of characters into their corresponding values for a quality score that accompanies a fasta file:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
They should have the values 0-93. So when I input a fastq file that uses these symbols I want to output the numerical values for each in a quality score file.
I have tried putting them into an array using split // and then making into a hash where each key is the symbol and the value is its position in the array:
for (my $i = 0; $i<length(@qual); $i++) {
    print "i is $i, elem is $qual[$i]\n";
    $hash{$qual[$i]} = $i;
I have tried hard coding the hash:
my %hash = {"!"=>"0", "\""=>"1", "#"=>"2", "\$"=>"3"...
With and without escapes for the special characters that require them but cannot seem to get this to work.
This merely outputs:
.
.
.
i is 0, elem is !
i is 1, elem is "
i is 0, elem is !
i is 1, elem is "
i is 0, elem is !
i is 1, elem is "
" 1
Use of uninitialized value $hash{"HASH(0x100804ed0)"} in concatenation (.) or string at convert_fastq.pl line 24, <> line 40.
HASH(0x100804ed0)
! 0
Does anyone have any ideas? I appreciate the help.
Perhaps subtracting 33 from the character's ord to yield the value you want would be helpful:
use strict;
use warnings;
my $string = q{!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~};
for ( split //, $string ) {
print "$_ = ", ord($_) - 33, "\n";
}
Partial output:
! = 0
" = 1
# = 2
$ = 3
% = 4
& = 5
' = 6
( = 7
) = 8
* = 9
+ = 10
...
This way, you don't need to build a hash with character/value pairs, but just use $val = ord ($char) - 33; to get the value.
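Applied to a whole quality line, that idea could be sketched as follows. The sub name decode_quality and the sample string are hypothetical; the offset of 33 is the one used above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one quality line into a list of scores,
# assuming the offset-33 encoding described above.
sub decode_quality {
    my ($quality_line) = @_;
    return map { ord($_) - 33 } split //, $quality_line;
}

my @scores = decode_quality('!"#I~');
print join(' ', @scores), "\n";    # 0 1 2 40 93
```

Each character is converted independently, so the same sub works for quality lines of any length.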
{ ... }
is similar to
do { my %anon; %anon = ( ... ); \%anon }
So when you did
my %hash = { ... };
you assigned a single item to the hash (a reference to a hash) rather than a list of key-values as you should. Perl warned you about that with the following:
Reference found where even-sized list expected
(Why didn't you mention this?!)
You should be using
my %decode_map = ( ... );
For example,
my %decode_map;
{
    my $encoded = q{!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~};
    my @encoded = split //, $encoded;
    $decode_map{$encoded[$_]} = $_ for 0..$#encoded;
}
Given that those are basically the non-whitespace printable ASCII characters, you could simply use
my %decode_map = map { chr($_) => $_ - 0x21 } 0x21..0x7E;
Which means you could avoid building the hash at all, replacing
my %decode_map = map { chr($_) => $_ - 0x21 } 0x21..0x7E;
die if !exists($decode_map{$c});
my $num = $decode_map{$c};
with just
die if ord($c) < 0x21 || ord($c) > 0x7E;
my $num = ord($c) - 0x21;
From a language-agnostic point of view: use an array with 256 entries, one for each possible byte value (which covers every ASCII character). You can then store 0 at ['!'], 1 at ['"'] and so on. When parsing the input, you can look up a char in that array directly. For careful error handling, you could store -1 at all invalid chars and check for that while parsing the file.
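In Perl, such a table might be sketched like this. The names @score_for and score_of are hypothetical, and dying on an invalid character is just one possible error-handling choice.

```perl
use strict;
use warnings;

# 256-entry table indexed by byte value; -1 marks an invalid character.
my @score_for = (-1) x 256;
$score_for[$_] = $_ - 0x21 for 0x21 .. 0x7E;

# Hypothetical helper: look up one character, dying on anything invalid.
sub score_of {
    my ($char) = @_;
    my $code  = ord $char;
    my $score = $code < 256 ? $score_for[$code] : -1;
    die sprintf("Invalid quality character 0x%02X\n", $code) if $score < 0;
    return $score;
}

print score_of('!'), " ", score_of('~'), "\n";    # 0 93
```

The table costs a little memory up front, but each lookup is a single array access with the validity check built in.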