Speed problem with SPOJ occurence counting in perl - perl

I'm having a problem with a task similar to this one:
click (translated) (the one I was assigned with has way bigger tests and a lower time limit). A quick translation of the task:
Write a program that checks how many times the given number occurred in a given sequence.
Input: Given number, how many numbers are in the sequence, the sequence of numbers
Output: The number of occurrences
My solutions so far:
1:
#!/usr/bin/env perl
while (<>) {
$in = $_;
#nums = split / /, $in, 3;
$what = shift #nums;
shift #nums;
$rest = shift #nums;
$rest = " ".$rest." ";
$sum = () = $rest =~ /(?<=\s)$what(?=\s)/g;
print $sum;
print "\n";
}
2:
#!/usr/bin/env perl
while (<>) {
$in = $_;
#nums = split / /, $in, 3;
$what = shift #nums;
shift #nums;
$rest = shift #nums;
$rest = " ".$rest." ";
if(!$reg{$what}){
$reg{$what} = qr/(?<=\s)$what(?=\s)/;
}
$sum = () = $rest =~ /$reg{$what}/g;
print $sum;
print "\n";
}
I also tried the brute force approach, hash tables, grep... All exceed the given time limit, and I've got no idea how to write anything that will work faster than the above two. Any ideas?
edit: After getting rid of copying lists (turns out the numbers can also be negative):
#!/usr/bin/env perl
while ($line = <>) {
$line =~ s/^(-?\d+) \d+//;
$what = $1;
$sum = () = $line =~ / $what\b/g;
print $sum;
print "\n";
}
edit2: via http://www.chengfu.net/2005/10/count-occurrences-perl/:
print $sum = (($line =~ s/ $1\b//g)+0);
resulted in 2x faster code than:
print $sum = () = $line =~ / $1\b/g;
Works now, thanks :)

For one thing, you're doing an awful lot of copying. I've marked each time you copy a large string in your first example:
while (<>) {
$in = $_; # COPY
#nums = split / /, $in, 3; # COPY
$what = shift #nums;
shift #nums;
$rest = shift #nums; # COPY
$rest = " ".$rest." "; # COPY
$sum = () = $rest =~ /(?<=\s)$what(?=\s)/g;
print $sum;
print "\n";
}
To speed things up, avoid the copies. For example, use while ($in = <>) (or just skip $in and use $_).
For extracting $what and the count, I think I'd try this instead of split:
$in =~ s/^(\d+) \d+//;
$what = $1;
Instead of adding a space fore and aft, just use \b instead of lookarounds with \s.
$sum = () = $in =~ /\b$what\b/g;

Related

How to separate an array in Perl based on pattern

I am trying to write a big script but I am stuck on a part. I want to sprit an array based on ".."
From the script I got this:
print #coordinates;
gene complement(872..1288)
my desired output:
complement 872 1288
I tried:
1) my #answer = split(.., #coordinates)
print("#answer\n");
2) my #answer = split /../, #coordinates;
3) print +(split /\../)[-1],[-2],[-3] while <#coordinates>
4) foreach my $anwser ( #coordinates )
{$anwser =~ s/../"\t"/;
print $anwser;}
5) my #answer = split(/../, "complement(872..1288)"); #to see if the printed array is problematic.
which prints:
) ) ) ) ) ) ) ) )
6) my #answer = split /"gene "/, #coordinates; # I tried to "catch" the entire output's spaces and tabs
which prints
0000000000000000000000000000000001000000000100000000
But none of them works. Does anyone has any idea how to step over this issue?
Ps, unfortunately, I can't run my script right now on Linux so I used this website to run my script. I hope this is not the reason why I didn't get my desired output.
my $RE_COMPLEMENT = qr{(complement)\((\d+)\.\.(\d+)\)}msx;
for my $item (#coordinates) {
my ($head, $i, $j) = $item =~ $RE_COMPLEMENT;
if (defined($head) && defined($i) && defined($j)) {
print("$head\t$i\t$j\n");
}
}
split operates on a scalar, not on an array.
my $string = 'gene complement(872..1288)';
my #parts = split /\.\./, $string;
print $parts[0]; # gene complement(872
print $parts[1]; # 1288)
To get the desired output, you can use a substitution:
my $string = 'gene complement(872..1288)';
$string =~ s/gene +|\)//g;
$string =~ s/\.\./ /;
$string =~ s/\(/ /;
Desired effect can be achieved with
use of tr operator to replace '(.)' => ' '
then splitting data string into element on space
storing only required part of array
output elements of array joined with tabulation
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ tr/(.)/ /;
my #elements = (split ' ', $data)[1..3];
say join "\t", #elements;
__DATA__
gene complement(872..1288)
Or as an alternative solution with only substitutions (without splitting data string into array)
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ s/gene\s+//;
$data =~ s/\)//;
$data =~ s/[(.]+/\t/g;
say $data;
__DATA__
gene complement(872..1288)
Output
complement 872 1288

Perl - sort filenames by order based on the filemask YYY-MM-DD that is in the filename

Need some help, not grasping a solution here on what method I should use.
I need to scan a directory and obtain the filenames by order of
1.YYYY-MM-DD, YYYY-MM-DD is part of the filename.
2. Machinename which is at the start of the filename to the left of the first "."
For example
Machine1.output.log.2014-02-26
Machine2.output.log.2014-02-27
Machine2.output.log.2014-02-26
Machine2.output.log.2014-02-27
Machine3.output.log.2014-02-26
So that it outputs in an array as follows
Machine1.output.log.2014-02-26
Machine2.output.log.2014-02-26
Machine3.output.log.2014-02-26
Machine1.output.log.2014-02-27
Machine2.output.log.2014-02-27
Thanks,
Often, temporarily turning your strings into a hash or array for sorting purposes, and then turning them back into the original strings is the most maintainable way.
my #filenames = qw/
Machine1.output.log.2014-02-26
Machine2.output.log.2014-02-27
Machine2.output.log.2014-02-26
Machine2.output.log.2014-02-27
Machine3.output.log.2014-02-26
/;
#filenames =
map $_->{'orig_string'},
sort {
$a->{'date'} cmp $b->{'date'} || $a->{'machine_name'} cmp $b->{'machine_name'}
}
map {
my %attributes;
#attributes{ qw/orig_string machine_name date/ } = /\A(([^.]+)\..*\.([^.]+))\z/;
%attributes ? \%attributes : ()
} #filenames;
You can define your own sort like so ...
my #files = (
"Abc1.xxx.log.2014-02-26"
, "Abc1.xxx.log.2014-02-27"
, "Abc2.xxx.log.2014-02-26"
, "Abc2.xxx.log.2014-02-27"
, "Abc3.xxx.log.2014-02-26"
);
foreach my $i ( #files ) { print "$i\n"; }
sub bydate {
(split /\./, $a)[3] cmp (split /\./, $b)[3];
}
print "sort it\n";
foreach my $i ( sort bydate #files ) { print "$i\n"; }
You can take your pattern 'YYYY-MM-DD' and match it to what you need.
#!/usr/bin/perl
use strict;
opendir (DIRFILES, ".") || die "can not open data file \n";
my #maplist = readdir(DIRFILES);
closedir(MAPS);
my %somehash;
foreach my $tmp (#maplist) {
next if $tmp =~ /^.{1,2}$/;
next if $tmp =~ /test/;
$tmp =~ /(\d{4})-(\d{2})-(\d{2})/;
$somehash{$tmp} = $1 . $2 . $3; # keep the original file name
# allows for duplicate dates
}
foreach my $tmp (keys %somehash) {
print "-->", $tmp, " --> " , $somehash{$tmp},"\n";
}
my #list= sort { $somehash{$a} <=> $somehash{$b} } keys(%somehash);
foreach my $tmp (#list) {
print $tmp, "\n";
}
Works, tested it with touch files.

cant retrieve values from hash reversal (Perl)

I've initialized a hash with Names and their class ranking as follows
a=>5,b=>2,c=>1,d=>3,e=>5
I've this code so far
my %Ranks = reverse %Class; #As I need to find out who's ranked first
print "\nFirst place goes to.... ", $Ranks{1};
The code only prints out
"First place goes to...."
I want it to print out
First place goes to....c
Could you tell me where' I'm going wrong here?
The class hash prints correctly
but If I try to print the reversed hash using
foreach $t (keys %Ranks) {
print "\n $t $Ranks{$t}"; }
It prints
5
abc23
cab2
ord
If this helps in any way
FULL CODE
#Script to read from the data file and initialize it into a hash
my %Code;
my %Ranks;
#Check whether the file exists
open(fh, "Task1.txt") or die "The File Does Not Exist!\n", $!;
while (my $line = <fh>) {
chomp $line;
my #fields = split /,/, $line;
$Code{$fields[0]} = $fields[1];
$Class{$fields[0]} = $fields[2];
}
close(fh);
#Prints the dataset
print "Code \t Name\n";
foreach $code ( keys %Code) {
print "$code \t $Code{$code}\n";
}
#Find out who comes first
my %Ranks = reverse %Class;
foreach $t (keys %Ranks)
{
print "\n $t $Ranks{$t}";
}
print "\nFirst place goes to.... ", $Ranks{1}, "\n";
When you want to check what your data structures actually contain, use Data::Dumper. use Data::Dumper; local $Data::Dumper::Useqq = 1; print(Dumper(\%Class));. You'll find un-chomped newlines.
You need to use chomp. At present your $fields[2] value has a trailing newline.
Change your file read loop to this
while (my $line = <fh>) {
chomp $line;
my #fields = split /,/, $line;
$Code{$fields[0]} = $fields[1];
$Class{$fields[0]} = $fields[2];
}

Perl - How to change every $variable occurrence of ";" in a string

Very new here so be gentle. :)
Here is the jist of what I want to do:
I want to take a string that is made up of numbers separated by semi-colons (ex. 6;7;8;9;1;17;4;5;90) and replace every "X" number of semicolons with a "\n" instead. The "X" number will be defined by the user.
So if:
$string = "6;7;8;9;1;17;4;5;90";
$Nth_number_of_semicolons_to_replace = 3;
The output should be:
6;7;8\n9;1;17\n4;5;90
I've found lots on changing the Nth occurrence of something but I haven't been able to find anything on changing every Nth occurrence of something like I am trying to describe above.
Thanks for all your help!
use List::MoreUtils qw(natatime);
my $input_string = "6;7;8;9;1;17;4;5;90";
my $it = natatime 3, split(";", $input_string);
my $output_string;
while (my #vals = $it->()) {
$output_string .= join(";", #vals)."\n";
}
Here is a quick and dirty answer.
my $input_string = "6;7;8;9;1;17;4;5;90";
my $count = 0;
$input_string =~ s/;/++$count % 3 ? ";" : "\n"/eg;
Don't have time for a full answer now, but this should get you started.
$string = "6;7;8;9;1;17;4;5;90";
$Nth_number_of_semicolons_to_replace = 3;
my $regexp = '(' . ('\d+;' x ($Nth_number_of_semicolons_to_replace - 1)) . '\d+);';
$string =~ s{ $regexp ) ; }{$1\n}xsmg
sub split_x{
my($str,$num,$sep) = #_;
return unless defined $str;
$num ||= 1;
$sep = ';' unless defined $sep;
my #return;
my #tmp = split $sep, $str;
while( #tmp >= $num ){
push #return, join $sep, splice #tmp, 0, $num;
}
push #return, join $sep, #tmp if #tmp;
return #return;
}
print "$_\n" for split_x '6;7;8;9;1;17;4;5;90', 3
print join( ',', split_x( '6;7;8;9;1;17;4;5;90', 3 ) ), "\n";
my $string = "6;7;8;9;1;17;4;5;90";
my $Nth_number_of_semicolons_to_replace = 3;
my $num = $Nth_number_of_semicolons_to_replace - 1;
$string =~ s{ ( (?:[^;]+;){$num} [^;]+ ) ; }{$1\n}gx;
print $string;
prints:
6;7;8
9;1;17
4;5;90
The regex explained:
s{
( # start of capture group 1
(?:[^;]+;){$num} # any number of non ';' characters followed by a ';'
# repeated $num times
[^;]+ # any non ';' characters
) # end of capture group
; # the ';' to replace
}{$1\n}gx; # replace with capture group 1 followed by a new line
If you've got 5.10 or higher, this could do the trick:
#!/usr/bin/perl
use strict;
use warnings;
my $string = '1;2;3;4;5;6;7;8;9;0';
my $n = 3;
my $search = ';.*?' x ($n -1);
print "string before: [$string]\n";
$string =~ s/$search\K;/\n/g;
print "print string after: [$string]\n";
HTH,
Paul

How can I iterate through nested arrays?

I have created an array as follows
while (defined ($line = `<STDIN>`))
{
chomp ($line);
push #stack,($line);
}
each line has two numbers.
15 6
2 8
how do iterate over each item in each line?
i.e. I want to print
15
6
2
8
I understand it's something like
foreach (#{stack}) (#stack){
print "?????
}
This is where I am stuck.
See the perldsc documentation. That's the Perl Data Structures Cookbook, which has examples for dealing with arrays of arrays. From what you're doing though, it doesn't look like you need an array of arrays.
For your problem of taking two numbers per line and outputting one number per line, just turn the whitespace into newlines:
while( <> ) {
s/\s+/\n/; # turn all whitespace runs into newlines
print; # it's ready to print
}
With Perl 5.10, you can use the new \h character class that matches only horizontal whitespace:
while( <> ) {
s/\h+/\n/; # turn all horizontal whitespace runs into newlines
print; # it's ready to print
}
As a Perl one-liner, that's just:
% perl -pe 's/\h+/\n/' file.txt
#!/usr/bin/perl
use strict;
use warnings;
while ( my $data = <DATA> ) {
my #values = split ' ', $data;
print $_, "\n" for #values;
}
__DATA__
15 6
2 8
Output:
C:\Temp> h
15
6
2
8
Alternatively, if you want to store each line in #stack and print out later:
my #stack = map { [ split ] } grep { chomp; length } <DATA>;
The line above slurps everything coming from the DATA filehandle into a list of lines (because <DATA> happens in list context). The grep chomps each line and filters by length after chomping (to avoid getting any trailing empty lines in the data file -- you can avoid it if there are none). The map then splits each line along spaces, and then creates an anonymous array reference for each line. Finally, such array references are stored in each element of #stack. You might want to use Data::Dumper to look at #stack to understand what's going on.
print join("\n", #$_), "\n" for #stack;
Now, we look over each entry in stack, dereferencing each array in turn, then joining the elements of each array with newlines to print one element per line.
Output:
C:\Temp> h
15
6
2
8
The long way of writing essentially the same thing (with less memory consumption) would be:
my #stack;
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
my #values = split ' ', $line;
push #stack, \#values;
}
for my $ref ( #stack ) {
print join("\n", #$ref), "\n";
}
Finally, if you wanted do something other than printing all values, say, sum all the numbers, you should store one value per element of #stack:
use List::Util qw( sum );
my #stack;
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
my #values = split ' ', $line;
push #stack, #values;
}
printf "The sum is %d\n", sum #stack;
#!/usr/bin/perl
while ($line = <STDIN>) {
chomp ($line);
push #stack, $line;
}
# prints each line
foreach $line (#stack) {
print "$line\n";
}
# splits each line into items using ' ' as separator
# and prints the items
foreach $line (#stack) {
#items = split / /, $line;
foreach $item (#items) {
print $item . "\n";
}
}
I use 'for' for "C" style loops, and 'foreach' for iterating over lists.
#!/usr/bin/perl
use strict;
use warnings;
open IN, "< read.txt" or
die "Can't read in 'read.txt'!";
my $content = join '', <IN>;
while ($content =~ m`(\d+)`g) {
print "$1\n";
}