trying to run the following code:
$combs = combinations(\#set,$k);
while (my $c = $combs->next)
{
$nrc=1;
}
Gives me "out of memory!" when I hit Ctrl+C (because its taking too long and it should not) if I pass a set from, for example, (0..450) and numbers to combine ($k) of 6. This issue does not occur with, lets say, a set of 0..45 and $k=6.
Note that the while loop seems to do nothing, in the original script it printed out the combination and incremented a counter that will hold the total number of combinations. But since I was not sure what the problem was, I decided to eliminate that.
I've read the Algorithm:Combinatorics on CPAN and it states that memory usage is minimal, so I don't know what's happening.
I am using Strawberry Perl 32bit on a Windows 10 machine.
Thanks.
--------------------- COMPLETE CODE
#!/usr/bin/perl
use List::MoreUtils "uniq";
use Algorithm::Combinatorics "combinations";
my $argc = $#ARGV+1;
my #set;
if ($argc == 0)
{
print STDERR "Valor minimo de rango: "; # range min
my $minrange = int <STDIN>;
print STDERR "Valor maximo de rango: "; #range max
my $maxrange = int <STDIN>;
#set = uniq sort { $a <=> $b }($minrange...$maxrange);
}
elsif ($argc == 1)
{
open(SETFROMFILE,"<$ARGV[0]") or die "No se puedo abrir el fichero, $!";
chomp(#set = <SETFROMFILE>);
close(SETFROMFILE);
#set = uniq sort { $a <=> $b } #set;
}
else
{
print STDERR "Uso: $0 [file]\n";
exit;
}
my $nrc = 0;
print STDERR "\n";
print STDERR "Numeros a combinar: "; # get subset
my $k = <STDIN>;
if ($k == 0) { exit; }
$combs = combinations(\#set,$k);
print STDERR "\n";
while (my $c = $combs->next)
{
print join(";",#$c) . "\n";
$nrc++;
}
print STDERR "\n";
print STDERR "numero total de combinaciones: $nrc\n";
It works for me.
use strict;
use warnings;
use Algorithm::Combinatorics qw( combinations );
sub show_mem { system('ps', '--no-heading', '-o', 'rss', $$); }
my #set = (0..450);
my $k = 6;
my $count = 0;
#show_mem();
my $combs = combinations(\#set, $k);
#show_mem();
while (my $c = $combs->next) {
++$count;
if (($count % 100_000) == 0) {
print("$count\n");
#show_mem();
}
}
Output:
784
784
100000
776
200000
784
300000
788
400000
776
500000
780
600000
784
700000
768
800000
784
900000
784
1000000
776
...
Of course, it will take forever to go through all C(451, 6) = 11,303,769,578,640 combinations! (We're talking about 251 days on my machine[1].)
(Note that 11,303,769,578,640 is too large for a 32-bit integer. Fortunately, Perl will switching to using a double-precision floating-point number, and those are large enough to hold that all numbers up to and including that one.)
By the way, if you just need the number of combinations, you can use
my $count = 1; $count *= ( #set - $_ + 1 ) / $_ for 1..$k;
How I timed it:
use Algorithm::Combinatorics qw( combinations );
use Time::HiRes qw( time );
my #set = (0..450);
my $k = 6;
my $count = 0;
my $combs = combinations(\#set, $k);
my $s = time;
while (my $c = $combs->next) {
++$count;
last if $count == 1_000_000;
}
my $e = time;
print($e-$s, "\n");
There are 11.1 trillion combinations of six items out of 450. I'm not surprised it ran out of memory!
Related
I have a text file of the following format:
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
I want to print rows of this file only if the second column of data meets the requirement >= 2000 - how can I do this?
Currently I am reading the file and printing it like so:
sub read_file{
my $data_failed = 1;
my $file = 'task_file';
if(open (my $file, "task_file" || die "$!\n")){
my #COLUMNS = qw( memory cpu program );
my %sort_strings = ( program => sub { $a cmp $b } );
my (%process_details, %sort);
while (<$file>) {
$data_failed = 0;
my ($process_id, $memory_size, $cpu_time, $program_name) = split;
$process_details{$process_id} = { memory => $memory_size,
cpu => $cpu_time,
program => $program_name };
undef $sort{memory}{$memory_size}{$process_id};
undef $sort{cpu}{$cpu_time}{$process_id};
undef $sort{program}{$program_name}{$process_id};
}
if($option_a == 1){
if (-z $file){print "No tasks found\n";}
for my $column ($COLUMNS[2]) {
my $cmp = $sort_strings{$column} || sub { $a <=> $b };
for my $value (sort $cmp keys %{ $sort{$column} }
) {
my #pids = keys %{ $sort{$column}{$value} };
say join ' ', $_, #{ $process_details{$_} }{#COLUMNS}
for #pids;
}
}
}
} else { print "No tasks found\n"}
}
The if($option_a == 1) bit is just reading values from another function that parses command line options.
my ($process_id, $memory_size, $cpu_time, $program_name) = split;
At this point, you can complete the loop, or you can continue to the next line. Just add the line:
next if $memory_size < 2000;
right after the split, and you'll eliminate all the records in memory that fail to meet your requirements.
Filtering a list is easily done with grep:
#!/usr/bin/perl
use strict;
use feature qw{ say };
use warnings;
my #COLUMNS = qw( memory cpu program );
my (%process_details, %sort);
while (<DATA>) {
my ($process_id, $memory_size, $cpu_time, $program_name) = split;
$process_details{$process_id} = { memory => $memory_size,
cpu => $cpu_time,
program => $program_name };
undef $sort{memory}{$memory_size}{$process_id};
undef $sort{cpu}{$cpu_time}{$process_id};
undef $sort{program}{$program_name}{$process_id};
}
for my $value (sort { $a cmp $b } keys %{ $sort{program} }) {
my #pids = grep $process_details{$_}{memory} > 2000,
keys %{ $sort{program}{$value} };
say join ' ', $_, #{ $process_details{$_} }{#COLUMNS}
for #pids;
}
__DATA__
...
Something like this perhaps:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
print if (split)[1] > 2000;
}
__DATA__
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
With no arguments, split() splits $_ on whitespace (which is what we want). We can then use a list slice to look at the second element of that and print the line if that value is greater than 2000.
I'm having a problem coding my first Perl program.
What I'm trying to do here is getting the maximum, minimum,total and average of a list of numbers using a subroutine for each value and another subroutine to print the final values. I'm using a "private" for all my variables, but I still couldn't print my values.
Here is my code:
&max(<>);
&print_stat(<>);
sub max {
my ($mymax) = shift #_;
foreach (#_) {
if ( $_ > $mymax ) {
$mymax = $_;
}
}
return $mymax;
}
sub print_stat {
print max($mymax);
}
Please try this one:
use strict;
use warnings;
my #list_nums = qw(10 21 30 42 50 63 70);
ma_xi(#list_nums);
sub ma_xi
{
my #list_ele = #_;
my $set_val_max = '0'; my $set_val_min = '0';
my $add_all_vals = '0';
foreach my $each_ele(#list_ele)
{
$set_val_max = $each_ele if($set_val_max < $each_ele);
$set_val_min = $each_ele if($set_val_min eq '0');
$set_val_min = $each_ele if($set_val_min > $each_ele);
$add_all_vals += $each_ele;
}
my $set_val_avg = $add_all_vals / scalar(#list_ele) + 1;
print "MAX: $set_val_max\n";
print "MIN: $set_val_min\n";
print "TOT: $add_all_vals\n";
print "AVG: $set_val_avg\n";
#Return these values into array and get into the new sub routine's
}
Some notes
Use plenty of whitespace to lay out your code. I have tidied the Perl code in your question so that I could read it more easily, without changing its semantics
You must always use strict and use warnings 'all' at the top of every Perl program you write
Never use an ampersand & in a subroutine call. That hasn't been necessary or desirable since Perl 4 over twenty-five years ago. Any tutorial that tells you otherwise is wrong
Using <> in a list context (such as the parameters to a subroutine call) will read all of the file and exhaust the file handle. Thereafter, any calls to <> will return undef
You should use chomp to remove the newline from each line of input
You declare $mymax within the scope of the max subroutine, but then try to print it in print_stat where it doesn't exists. use strict and use warnings 'all' would have caught that error for you
Your max subroutine returns the maximum value that it calculated, but you never use that return value
Below is a fixed version of your code.
Note that I've read the whole file into array #values and then chomped them all at once. In general it's best to read and process input one line at a time, which would be quite possible here but I wanted to say as close to your original code as possible
I've also saved the return value from max in variable $max, and then passed that to print_stat. It doesn't make sense to try to read the file again and pass all of those values to print_stat, as your code does
I hope this helps
use strict;
use warnings 'all';
my #values = <>;
chomp #values;
my $max = max(#values);
print_stat( $max );
sub max {
my $mymax = shift;
for ( #_ ) {
if ( $_ > $mymax ) {
$mymax = $_;
}
}
return $mymax;
}
sub print_stat {
my ($val) = #_;
print $val, "\n";
}
Update
Here's a version that calculates all of the statistics that you mentioned. I don't think subroutines are a help in this case as the solution is short and no code is reusable
Note that I've added the data at the end of the program file, after __DATA__, which lets me read it from the DATA file handle. This is often handy for testing
use strict;
use warnings 'all';
my ($n, $max, $min, $tot);
while ( <DATA> ) {
next unless /\S/; # Skip blank lines
chomp;
if ( not defined $n ) {
$max = $min = $tot = $_;
}
else {
$max = $_ if $max < $_;
$min = $_ if $min > $_;
$tot += $_;
}
++$n;
}
my $avg = $tot / $n;
printf "\$n = %d\n", $n;
printf "\$max = %d\n", $max;
printf "\$min = %d\n", $min;
printf "\$tot = %d\n", $tot;
printf "\$avg = %.2f\n", $avg;
__DATA__
7
6
1
5
1
3
8
7
output
$n = 8
$max = 8
$min = 1
$tot = 38
$avg = 4.75
I have a problem tha bothers me a lot...
I have a file with two columns (thanks to your help in a previous question) like:
14430001 0.040
14430002 0.000
14430003 0.990
14430004 1.000
14430005 0.050
14430006 0.490
....................
the first column is coordinates the second probabilities.
I am trying to find the blocks with probability >=0.990 and to be more than 100 in size.
As output I want to be like this:
14430001 14430250
14431100 14431328
18750003 18750345
.......................
where the first column has the coordinate of the start of each block and the second the end of it.
I wrote this script:
use strict;
#use warnings;
use POSIX;
my $scores_file = $ARGV[0];
#finds the highly conserved subsequences
open my $scores_info, $scores_file or die "Could not open $scores_file: $!";
#open(my $fh, '>', $coords_file) or die;
my $count = 0;
my $cons = "";
my $newcons = "";
while( my $sline = <$scores_info>) {
my #data = split('\t', $sline);
my $coord = $data[0];
my $prob = $data[1];
if ($data[1] >= 0.990) {
#$cons = "$cons + '\n' + $sline + '\n'";
$cons = join("\n", $cons, $sline);
# print $cons;
$count++;
if($count >= 100) {
$newcons = join("\n", $newcons, $cons);
my #array = split /'\n'/, $newcons;
print #array;
}
}
else {
$cons = "";
$count = 0;
}
}
It gives me the lines with probability >=0.990 (the first if works) but the coordinates are wrong. When Im trying to print it in a file it stacks, so I have only one sample to check it.
Im terrible sorry if my explanations aren't helpful, but I am new in programming.
Please, I need your help...
Thank you very much in advance!!!
You seem to be using too much variables. Also, after splitting the array and assigning its parts to variables, use the new variables rather than the original array.
sub output {
my ($from, $to) = #_;
print "$from\t$to\n";
}
my $threshold = 0.980; # Or is it 0.990?
my $count = 0;
my ($start, $last);
while (my $sline = <$scores_info>) {
my ($coord, $prob) = split /\t/, $sline;
if ($prob >= $threshold) {
$count++;
defined $start or $start = $coord;
$last = $coord;
} else {
output($start, $last) if $count > 100;
undef $start;
$count = 0;
}
}
output($start, $last) if $count > 100;
(untested)
I have this output:
10dvex2_miRNA_ce.out.data|6361
10dvex2_miRNA_ce.out.data|6361
10dvex2_misc_RNA_ce.out.data|0
10dvex2_rRNA_ce.out.data|239
with this script in Perl:
#!/usr/bin/perl
use warnings;
use strict;
open(MYINPUTFILE, $ARGV[0]); # open for input
my #lines = <MYINPUTFILE>; # read file into list
my $count = 0;
print "Frag"."\t"."ncRNA"."\t"."Amount"."\n";
foreach my $lines (#lines){
my $pattern = $lines;
$pattern =~ s/(.*)dvex\d_(.*)_(.*).(out.data)\|(.*)/$1 $2 $3 $5/g;
$count += $5;
print $1."\t".$2.$3."\t".$5."\n";
}
close(MYINPUTFILE);
exit;
I extract this information:
Frag ncRNA Amount
10 miRNAce 6361
10 misc_RNAce 0
10 rRNAce 239
but in the Amount column I want to report those numbers divided by the total result (6600). In this case I want this output:
Frag ncRNA Amount
10 miRNAce 0.964
10 misc_RNAce 0
10 rRNAce 0.036
My problem is extract the TOTAL result in the loop...to normalize this data. Some ideas?
Perhaps the following will be helpful:
use strict;
use warnings;
my ( %hash, $total, %seen, #array );
while (<>) {
next if $seen{$_}++;
/(\d+).+?_([^.]+).+\|(\d+)$/;
$hash{$1}{$2} = $3;
$total += $3;
}
print "Frag\tncRNA\tAmount\n";
while ( my ( $key1, $val1 ) = each %hash ) {
while ( my ( $key2, $val2 ) = each %$val1 ) {
my $frac = $val2 / $total == 0 ? 0 : sprintf( '%.3f', $val2 / $total );
push #array, "$key1\t$key2\t$frac\n";
}
}
print map { $_->[0] }
sort { $b->[1] <=> $a->[1] }
map { [ $_, (split)[2] ] }
#array;
Output from your data set:
Frag ncRNA Amount
10 miRNA_ce 0.964
10 rRNA_ce 0.036
10 misc_RNA_ce 0
Identical lines are skipped, and then the required elements are captured from each line. A running total is kept for the subsequent calculation. Your desired output showed sorting from high to low, which is why each record is pushed onto #array. However, if sorting isn't necessary, you can just print that line and omit the Schwartzian transform on #array.
Hope this helps!
To do this you will need two passes over the data.
#! /usr/bin/env perl
use warnings;
use strict;
print join("\t",qw'Frag ncRNA Amount'),"\n";
my #data;
my $total = 0;
# parse the lines
while( <> ){
my #elem = /(.+?)(?>dvex)\d_(.+)_([^._]+)[.]out[.]data[|](d+)/;
next unless #elem;
# running total
$total += $elem[-1];
# combine $2 and $3
splice #elem, 1, 2, $2.$3; # $elem[1].$elem[2];
push #data, \#elem;
}
# print them
for( #data ){
my #copy = #$_;
$copy[-1] = $copy[-1] / $total;
$copy[-1] = sprintf('%.3f', $copy[-1]) if $copy[-1];
print join("\t",#copy),"\n";
}
Having heard about Perl for year I decided to give it a few hours of my time to see how much I could pick up. I got through the basics fine and then got to loops. As a test I wanted to see if I could build a script to recurse through all alphanumerical values of up to 4 characters. I had written a PHP code that did the same thing some time ago so I took the same concept and used it. However when I run the script it puts "a" as the first 3 values and then only loops through the last digit. Anyone see what I am doing wrong?
#!/usr/local/bin/perl
$chars = "abcdefghijklmnopqrstuvwxyz";
$chars .= "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$chars .= "0123456789";
#charset = split(//, $chars);
$charset_length = scalar(#charset);
sub recurse
{
($width, $position, $base_string) = #_;
for ($i = 0; $i < $charset_length; ++$i) {
$base = $base_string . $charset[$i];
if ($position < $width - 1) {
$pos = $position + 1;
recurse($width, $pos, $base);
}
print $base;
print "\n";
}
}
recurse(4, 0, '');
This is what I get when I run it:
aaaa
aaab
aaac
aaad
aaae
aaaf
aaag
aaah
aaai
aaaj
aaak
aaal
aaam
aaan
aaao
aaap
aaaq
aaar
aaas
aaat
aaau
aaav
aaaw
aaax
aaay
aaaz
aaaA
aaaB
aaaC
aaaD
aaaE
aaaF
aaaG
aaaH
aaaI
aaaJ
aaaK
aaaL
aaaM
aaaN
aaaO
aaaP
aaaQ
aaaR
aaaS
aaaT
aaaU
aaaV
aaaW
aaaX
aaaY
aaaZ
aaa0
aaa1
aaa2
aaa3
aaa4
aaa5
aaa6
aaa7
aaa8
aaa9
aaa9
aaa9
aaa9
You've been bitten by non strict scoping, this code does what it should (note the use strict at the top and the subsequent use of my to guarantee variable scoping).
#!/usr/bin/env perl
use strict;
use warnings;
my $chars = "abcdefghijklmnopqrstuvwxyz";
$chars .= "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$chars .= "0123456789";
my #charset = split(//, $chars);
my $charset_length = scalar(#charset);
sub recurse {
my ($width, $position, $base_string) = #_;
for (my $i = 0; $i < $charset_length; ++$i) {
my $base = $base_string . $charset[$i];
if ($position < $width - 1) {
my $pos = $position + 1;
recurse($width, $pos, $base);
}
print $base;
print "\n";
}
}
recurse(4, 0, '');
Already well answered, but a more idiomatic approach would be:
use strict;
use warnings;
sub recurse {
my ($width, $base_string, $charset) = #_;
if (length $base_string) {
print "$base_string\n";
}
if (length($base_string) < $width) {
$recurser->($base_string . $_) for #$charset;
}
}
my #charset = ('a'..'z', 'A'..'Z', '0'..'9');
recurse(4, '', \#charset);
There's no need to pass position; it's implicit in the width of the base string passed in. The charset, on the other hand, should be passed in rather than having the subroutine use an external variable.
Alternatively, since the width and character set stay constant, generate a closure that references them:
use strict;
use warnings;
sub make_recurser {
my ($width, $charset) = #_;
my $recurser;
$recurser = sub {
my ($base_string) = #_;
if (length $base_string) {
print "$base_string\n";
}
if (length($base_string) < $width) {
$recurser->($base_string . $_) for #$charset;
}
}
}
my #charset = ('a'..'z', 'A'..'Z', '0'..'9');
my $recurser = make_recurser(4, \#charset);
$recurser->('');
Alternatively, just:
print "$_\n" for glob(('{' . join(',', 'a'..'z', 'A'..'Z', '0'..'9') . '}') x 4);
It has to do with the scope of the variables, you're still changing the same vars when you're calling the recursion. The keyword 'my' declares the variables local to the subroutine.
(http://perl.plover.com/FAQs/Namespaces.html)
I always use perl with 'use strict;' declared, forcing me to decide on the scope of the variables.
sub recurse {
my ($width, $position, $base_string) = #_;
for (my $i = 0; $i < $charset_length; ++$i) {
my $base = $base_string . $charset[$i];
if ($position < $width - 1) {
my $pos = $position + 1;
recurse($width, $pos, $base);
}
print $base;
print " ";
}
}
You seem to be running into some scoping issues. Perl is very flexible, so it is taking a guess at what you want because you haven't told it what you want. One of the first things you'll learn is to add use strict; as for your first statement after the shebang. It will point out the variables that are not being explicitly defined, as well as any variables that are accessed before being created (helps with misspelled variables, etc).
If you make your code look like this, you'll see why you are getting your errors:
sub recurse {
($width, $position, $base_string) = #_;
for ($i = 0; $i < $charset_length; ++$i) {
$base = $base_string . $charset[$i];
if ($position < $width - 1) {
$pos = $position + 1;
recurse($width, $pos, $base);
}
# print "$base\n";
}
print "$position\n";
}
This should output:
3
3
3
3
Because you are not scoping $position correctly with my, you aren't getting a new variable each recurse, you are re-using the same one. Toss a use strict; in there, and fix the errors you get, and the code should be good.
I realize that you're just tinkering with recursion. But as long as you're having fun comparing implementations between two languages you may as well also see how the CPAN can extend your tool set.
If you don't care about the order, you can generate all 13,388,280 permutations of ( 'a'..'z', 'A..'Z', '0'..'9' ) taken four at a time with the CPAN module, Algorithm::Permute
Here is an example of how that code may look.
use strict;
use warnings;
use Algorithm::Permute;
my $p = Algorithm::Permute->new(
[ 'a' .. 'z', 'A' .. 'Z', '0' .. '9' ], # Set of...
4 # <---- at a time.
);
while ( my #res = $p->next ) {
print #res, "\n";
}
The new() method accepts an array ref that enumerates the character set or list of what to permute. Its second argument is how many at a time to include in the permutation. So you're essentially taking 62 items 4 at a time. Then use the next() method to iterate through the permutations. The rest is just window dressing.
The same thing could be reduced to the following Perl one-liner:
perl -MAlgorithm::Permute -e '$p=Algorithm::Permute->new(["a".."z","A".."Z",0..9],4);print #r, "\n" while #r=$p->next;'
There is also a section on permutation, along with additional examples in perlfaq4. It includes several examples and lists some additional modules that handle the details for you. One of Perl's strengths is the size and completeness of the Comprehensive Perl Archive Network (the CPAN).