Perl: Best way to use counterbalancing across conditions

I have a script with two conditions (C1 and C2). Every time a user logs in, I want this user to be assigned to one of those two conditions using strict counterbalancing (not random assignment), such that User1 (U1) is assigned to C1, U2 to C2, U3 to C1, U4 to C2, etc.
What is the simplest way to do this?
Right now, I was thinking of doing this:
my $cond;
my $out_cfile = "cfile.txt"; # initial value printed in the file is 1
open(CFILE, "+<", $out_cfile) or die "cannot open $out_cfile";
flock(CFILE, 2);
my $cdata = <CFILE>;
my $last = (substr $cdata,-1,1); #get the latest printed value
if ($last == 1) {$cond = 1; print CFILE "2";}
if ($last == 2) {$cond = 2; print CFILE "1";}
close(CFILE);
print "my condition is: $cond";
Is there a way to do this without having to open and print to an output file?

I don't really see the pain in creating a file in the /tmp directory (or wherever) if it doesn't already exist, but you could use memcached instead.
use Cache::Memcached qw( );
my $memd = Cache::Memcached->new({
servers => [ '127.0.0.1:11211' ],
namespace => 'app_name:',
});
my $val = $memd->incr('counter');
if ($val % 2) {
...
} else {
...
}
incr increments and fetches atomically to avoid race conditions (so it's safe like your code).
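One caveat worth adding, which is not part of the snippet above but is a known memcached behaviour: incr only works on a key that already exists, so on a fresh cache you would seed the counter first. A minimal sketch of that initialization, reusing the 'counter' key from above:
$memd->add('counter', 0); # atomic; only succeeds if 'counter' is absent, so safe to run at every startup
my $val = $memd->incr('counter');
defined $val or die "memcached incr failed";
my $cond = ($val % 2) ? 1 : 2; # odd logins get C1, even logins get C2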

The simplest way would be to use a lock (Warning: Untested, but something like this should work):
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
my $num_users_logged_in :shared = 1;
my $num_watcher_threads = 1; #it may make sense to make this larger, maybe not.
my @users = (); #populate array of users that are logging in.
my $queue = Thread::Queue->new;
$queue->enqueue(@users);
my @threads = ();
foreach(1..$num_watcher_threads)
{
push @threads, threads->create(\&log_in);
}
# NOTE: If you ever want this program to stop,
# then you'll have to enqueue some undefs (one for each thread).
# Waiting for each of the threads to finish.
# If you don't want to do this, then detach() instead
$_->join() foreach(@threads);
#Do anything else that you want after all of the threads are done.
sub log_in
{
#Each thread blocks on dequeue, and leaves this loop when it pulls an undef from the queue
while(my $user = $queue->dequeue)
{
#Do whatever you need to do before locking the variable.
{
lock $num_users_logged_in;
#Other threads will block until we're done with this block of code.
#Assign conditions based upon the parity of $num_users_logged_in
#And do whatever else you need for that particular user
$num_users_logged_in++;
}
}
}

I'd use a tied hash, but if you want thread safe you'll need to implement some form of mutex, maybe a semaphore.
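For illustration, a minimal sketch of the mutex part of that idea under threads, using Thread::Semaphore around a shared counter (the tied hash is left out, and the variable and sub names are made up, not from the question):
use threads;
use threads::shared;
use Thread::Semaphore;
my $counter :shared = 0;
my $sem = Thread::Semaphore->new(1); # counting semaphore initialized to 1, i.e. a mutex
sub assign_condition {
    $sem->down;                              # other threads wait here
    my $cond = (++$counter % 2) ? 1 : 2;     # odd logins get C1, even logins get C2
    $sem->up;
    return $cond;
}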


How to Hash in Perl

I am finding unique URLs in a log file, along with the response time, which is available in $line[7]. I am using a hash to get the unique URLs.
How can I get the count of unique URLs?
How can I get the average response time along with the count for each unique URL?
With the code below I am getting
url1
url2
url3
but I want it along with the average response time and count of each URL
URL Av.RT Count
url1 10.5 125
url2 9.3 356
url3 7.8 98
Code:
#!/usr/bin/perl
open(IN, "web1.txt") or die "can not open file";
# Hash to store final list of unique IPs
my %uniqueURLs = ();
my $z;
# Read log file line by line
while (<IN>) {
@line = split(" ",$_);
$uniqueURLs{$line[9]}=1;
}
# Go through the hash table and print the keys
# which are the unique IPs
for $url (keys %uniqueURLs) {
print $url . "\n";
}
Store a listref in your hash instead:
$uniqueURLs{$line[9]} = [ <avg response time>, <count> ];
Adjust the elements accordingly, e.g. the count:
if (defined($uniqueURLs{$line[9]})) {
# url known, increment count,
# update average response time with data from current log entry
$uniqueURLs{$line[9]}->[0] =
(($uniqueURLs{$line[9]}->[0] * $uniqueURLs{$line[9]}->[1]) + ($line[7] + 0.0))
/ ($uniqueURLs{$line[9]}->[1] + 1)
;
$uniqueURLs{$line[9]}->[1] += 1;
}
else {
# url not yet known,
# init count with 1 and average response time with actual response time from log entry
$uniqueURLs{$line[9]} = [ $line[7] + 0.0, 1 ];
}
to print results:
# Go through the hash table and print the keys
# which are the unique IPs
for $url (keys %uniqueURLs) {
printf ( "%s %f %d\n", $url, $uniqueURLs{$url}->[0], $uniqueURLs{$url}->[1]);
}
Adding 0.0 forces the string value to be treated as a floating point number, as a safeguard.
Read up on References. Also, read up on modern Perl practices which will help improve your programming skills.
Instead of just using the keys of your hash of unique URLs, you could store information in those hashes. Let's start with just a count of the unique URLs:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw(say);
use constant {
WEB_FILE => "web1.txt",
};
open my $web_fh, "<", WEB_FILE; #Autodie will catch this for you
my %unique_urls;
while ( my $line = <$web_fh> ) {
my $url = (split /\s+/, $line)[9];
if ( not exists $unique_urls{$url} ) { #Not really needed
$unique_urls{$url} = 0;
}
$unique_urls{$url} += 1;
}
close $web_fh;
Now, each key in your %unique_urls hash is a unique URL, and its value is the number of times that URL was seen.
This, by the way, is your code written in a bit more modern style. The use strict; and use warnings; pragmas will catch about 90% of the standard programming errors. The use autodie; pragma turns failures you forget to check into exceptions. In this case, the program will automatically die if the file doesn't exist.
The three parameter version of the open command is preferred, and so is using scalar variables for file handles. Using scalar variables for the file handle makes them easier to pass in subroutines, and the file will automatically close if the file handle falls out of scope.
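For example, a small sketch of that point (the sub name is made up, and it reuses web1.txt and assumes the autodie pragma from the script above is in effect): a lexical handle is an ordinary scalar, so it can be passed to a sub, and the file closes on its own once the handle goes out of scope.
sub first_line {
    my ($fh) = @_;        # the handle arrives like any other scalar
    return scalar <$fh>;
}
{
    open my $log_fh, "<", "web1.txt";  # autodie raises an exception on failure
    print first_line($log_fh);
} # $log_fh falls out of scope here and the file is closed automatically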
However, we want to store in two items per hash. We want to store the unique count, and we want to store something that will help us find the average response time. This is where references come in.
In Perl, variables deal with single data items. A scalar variable (like $foo) deals with an individual data item. Arrays and hashes (like @foo and %foo) deal with lists of individual data items. References help you get around this limitation.
Let's look at an array of people:
$person[0] = "Bob";
$person[1] = "Ted";
$person[2] = "Carol";
$person[3] = "Alice";
However, people are more than just first names. They have last names, phone numbers, addresses, etc. Let's take a look at a hash for Bob:
my %bob_hash;
$bob_hash{FIRST_NAME} = "Bob";
$bob_hash{LAST_NAME} = "Jones";
$bob_hash{PHONE} = "555-1234";
We can take a reference to this hash by putting a backslash in front of it. A reference is merely the memory address where this hash is stored:
$bob_reference = \%bob_hash;
print "$bob_reference\n": # Prints out something like HASH(0x7fbf79004140)
However, that memory address is a single item, and could be stored in our array of people!
$person[0] = $bob_reference;
If we want to get to the items in our reference, we dereference it by putting the right data type symbol in front. Since this is a hash, we will use %:
%bob_hash = %{ $person[0] };
Perl provides an easy way to dereference hashes with the -> syntax:
$person[0]->{FIRST_NAME} = "Bob";
$person[0]->{LAST_NAME} = "Jones";
$person[0]->{PHONE} = "555-1212";
We'll use the same technique in %unique_urls to store the number of times, and the total amount of response time. (Average will be total time / number of times).
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw(say);
use constant {
WEB_FILE => "web1.txt",
};
open my $web_fh, "<", WEB_FILE; #Autodie will catch this for you
my %unique_urls;
while ( my $line = <$web_fh> ) {
my $url = (split /\s+/, $line)[9];
my $response_time = (split /\s+/, $line)[10]; #Taking a guess
if ( not exists $unique_urls{$url} ) { #Not really needed
$unique_urls{$url}->{INSTANCES} = 0;
$unique_urls{$url}->{TOTAL_RESP_TIME} = 0;
}
$unique_urls{$url}->{INSTANCES} += 1;
$unique_urls{$url}->{TOTAL_RESP_TIME} += $response_time;
}
close $web_fh;
Now we can print them out:
print "%20.20s %6s %8s\n", "URL", "INST", "AVE";
for my $url ( sort keys %unique_urls ) {
my $total_resp_time = $unique_urls{$url}->{TOTAL_RESP_TIME};
my $instances = $unique_urls{$url}->{INSTANCES};
my $average = $total_resp_time / $instances;
printf "%-20.20s %-6d %-8.5f\n", $url, $instances, $average;
}
I like using printf for tables.
Instead of setting the value to 1 here:
$uniqueURLs{$line[9]}=1;
Store a data structure indicating the response time and the number of times this URL has been seen (so you can properly calculate the average). You can use an array ref, or hashref if you want. If the key doesn't exist yet, that means it hasn't been seen yet, and you can set some initial values.
# Initialize 3-element arrayref: [count, total, average]
$uniqueURLs{$line[9]} = [0, 0, 0] if not exists $uniqueURLs{$line[9]};
$uniqueURLs{$line[9]}->[0]++; # Count
$uniqueURLs{$line[9]}->[1] += $line[7]; # Total time
# Calculate average
$uniqueURLs{$line[9]}->[2] = $uniqueURLs{$line[9]}->[1] / $uniqueURLs{$line[9]}->[0];
One way you can get the count of unique URLs is by counting the keys:
print scalar(keys %uniqueURLs); # Print number of unique URLs
In your loop, you can print out the url and average time like this:
for $url (keys %uniqueURLs) {
print $url, ' - ', $uniqueURLs{$url}->[2], " seconds\n";
}

Trying to use fork to do a seemingly simple task, but failing miserably

So, basically I have a very large array that I need to read data from. I want to be able to do this in parallel; however, when I tried, I failed miserably. For the sake of simplicity, let's say I have an array with 100 elements in it. My idea was to partition the array into 10 equals parts and try to read them in parallel (10 is arbitrary, but I don't know how many processes I could run at once and 10 seemed low enough). I need to return a computation (new data structure) based off of my readings from each partition, but I am NOT modifying anything in the original array.
Instead of trying the above exactly, I tried something simpler, but I did it incorrectly, because it didn't work in any capacity. So then I tried to simply use child processes to push to an array. The code below uses Time::HiRes to see how much faster I can get this to run with forking as opposed to without, but I'm not at that point yet (I'm going to test that when I have closer to a few million entries in my array):
use strict;
use warnings;
use Time::HiRes;
print "Starting main program\n";
my %child;
my @array=();
my $counter=0;
my $start = Time::HiRes::time();
for (my $count = 1; $count <= 10; $count++)
{
my $pid = fork();
if ($pid)
{
$child{$pid}++;
}
elsif ($pid == 0)
{
addToArray(\$counter,\@array);
exit 0;
}
else
{
die "couldnt fork: $!\n";
}
}
while (keys %child)
{
my $pid = waitpid(-1,0);
delete $child{$pid};
}
my $stop = Time::HiRes::time();
my $duration = $stop-$start;
print "Time spent: $duration\n";
print "Size of array: ".scalar(#array)."\n";
print "End of main program\n";
sub addToArray
{
my $start=shift;
my $count=${$start};
${$start}+=10;
my $array=shift;
for (my $i=$count; $i<$count +10; $i++)
{
push @{$array}, $i;
}
print scalar(@{$array})."\n";
}
NB: I used push in lieu of ${$array}[$i]=$i, because I realized that my $counter wasn't actually updating, so that would never work with this code.
I assume that this doesn't work because the children are all copies of the original program and I'm never actually adding anything to the array in my "original program". On that note, I'm very stuck. Again, the actual problem that I'm actually trying to solve is how to partition my array (with data in it) and try to read them in parallel and return a computation based off of my readings (NOTE: I'm not going to modify the original array), but I'm never going to be able to do that if I can't figure out how to actually get my $counter to update. I'd also like to know how to get the code above to do what I want it to do, but that's a secondary goal.
Once I can get my counter to update correctly, is there any chance that another process would start before it updates and I wouldn't actually be reading in the entire array? If so, how do I account for this?
Please, any help would be much appreciated. I'm very frustrated/stuck. I hope there is an easy fix. Thanks in advance.
EDIT: I attempted to use Parallel::ForkManager, but to no avail:
#!/usr/local/roadm/bin/perl
use strict;
use warnings;
use Time::HiRes;
use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new(10);
for (my $count = 1; $count <= 10; $count++)
{
my $pid = $pm->start and next;
sub1(\$counter,\@array);
$pm->finish; # Terminates the child process
}
$pm->wait_all_children;
I didn't include the other extraneous stuff, see above for missing code/sub... Again, help would be much appreciated. I'm very new to this and kind of need someone to hold my hand. I also tried to do something with run_on_start and run_on_finish, but they didn't work either.
Your code has two issues: your child processes share no data, and you would have a race condition if the forked processes did share data. The solution is to use threads. Any possibility of race conditions can be eliminated by partitioning the data in the parent thread and, of course, by not using shared data.
Threads
Threads in Perl behave similarly to forking: by default, there is no shared memory. This makes using threads quite easy. However, each thread runs its own perl interpreter, which makes threads quite costly. Use sparingly.
First, we have to activate threading support via use threads. To start a thread, we do threads->create(\&code, @args), which returns a thread object. The code will then run in a separate thread, and will be invoked with the given arguments. After the thread has finished execution, we can collect the return value by calling $thread->join. Note: The context of the threaded code is determined by the create method, not by join.
We could mark variables with the :shared attribute. Your $counter and #array would be examples for this, but it is generally better to pass explicit copies of data around than to use shared state (disclaimer: from a theoretical standpoint, that is). To avoid race conditions with the shared data, you'd actually have to protect your $counter with a semaphore, but again, there is no need for shared state.
Here is a toy program showing how you could use threads to parallelize a calculation:
use strict;
use warnings;
use threads;
use 5.010; # for `say`, and sane threads
use Test::More;
# This program calculates differences between elements of an array
my @threads;
my @array = (1, 4, 3, 5, 5, 10, 7, 8);
my @delta = ( 3, -1, 2, 0, 5, -3, 1 );
my $number_of_threads = 3;
my @partitions = partition( $#array, $number_of_threads );
say "partitions: @partitions";
for (my $lower_bound = 0; @partitions; $lower_bound += shift @partitions) {
my $upper_bound = $lower_bound + $partitions[0];
say "spawning thread with [@array[$lower_bound .. $upper_bound]]";
# pass copies of the values in the array slice to new thread:
push @threads, threads->create(\&differences, @array[$lower_bound .. $upper_bound]);
# note that threads->create was called in list context
}
my @received;
push @received, $_->join for @threads; # will block until all are finished
is_deeply \@received, \@delta;
done_testing;
# calculates the differences. This doesn't need shared memory.
# note that @array could have been safely accessed, as it is never written to
# If I had written to a (unshared) variable, these changes would have been thread-local
sub differences {
say "Hi from a worker thread, I have ", 0+#_, " elements to work on";
return map $_[$_] - $_[$_-1], 1 .. $#_;
# or more readable:
# my #d;
# for my $i (1 .. $#_) {
# push #d, $_[$i] - $_[$i-1];
# }
# return #d;
}
# divide workload into somewhat fair parts, giving earlier threads more work
sub partition {
my ($total, $parts) = @_;
my $base_size = int($total / $parts);
my @partitions = ($base_size) x $parts;
$partitions[$_-1]++ for 1 .. $total - $base_size*$parts;
return @partitions;
}
A note on the number of threads: This should depend on the number of processors of your system. If you have four cores, more than four threads don't make much sense.
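If you want to pick that number automatically, one possible sketch (Linux-specific, and purely an illustration rather than part of the answer above) is to count processor entries in /proc/cpuinfo:
my $number_of_threads = 4; # fallback default if /proc/cpuinfo is unavailable
if (open my $cpuinfo, '<', '/proc/cpuinfo') {
    my $cores = grep { /^processor\s*:/ } <$cpuinfo>;
    close $cpuinfo;
    $number_of_threads = $cores if $cores > 0;
}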
If you're going to use child processes after forking, each child process is autonomous and has its own copy of the data in the program as of the time it was forked from the main program. The changes made by the child in its own memory have no effect on the parent's memory. If you need that, either you need a threading Perl and to use threads, or you need to think again — maybe using shared memory, but locating Perl data into the shared memory might be tricky.
So, one option is to read all the data into memory before forking off and having the children work on their own copies of the data.
Depending on the structure of the problem, another possibility might be to have each child read and work on a portion of the data. This won't work if each child must have access to all the data.
It isn't clear how much speed up you'll get through threading or forking if the threads or processes are all tied up reading the same file. Getting the data into memory may be best treated as a single-threaded (single-tasking) operation; the parallelism can spring into effect — and yield benefits — once the data is in memory.
There are some CPAN modules that make your life easier. One of them is Parallel::ForkManager, which is a simple parallel processing fork manager.
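Because a forked child cannot write into the parent's memory, the usual Parallel::ForkManager way to get results back is to pass a reference to finish() in the child and collect it with a run_on_finish callback in the parent (this needs a reasonably recent Parallel::ForkManager; the slice size and the doubling computation below are placeholders, not taken from the question):
use strict;
use warnings;
use Parallel::ForkManager;

my @array     = (0 .. 99);   # stand-in for the real data
my $partition = 10;
my @results;

my $pm = Parallel::ForkManager->new(10);

# Runs in the parent once per finished child; the last argument is
# whatever reference that child handed to finish().
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $signal, $core_dump, $data_ref) = @_;
    push @results, @$data_ref if defined $data_ref;
});

for (my $offset = 0; $offset < @array; $offset += $partition) {
    $pm->start and next;                   # parent moves on to the next slice
    my @slice    = @array[$offset .. $offset + $partition - 1];
    my @computed = map { $_ * 2 } @slice;  # placeholder for the real computation
    $pm->finish(0, \@computed);            # results travel back via run_on_finish
}
$pm->wait_all_children;

print scalar(@results), " results collected\n";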
So, after my struggle, here's the fix:
EDIT: THIS DOES NOT ACCOMPLISH WHAT I WANTED TO DO
#!/usr/local/roadm/bin/perl
use strict;
use warnings;
use Time::HiRes;
use Parallel::ForkManager;
print "Starting main program\n";
my @array=();
my $counter=0;
my $start = Time::HiRes::time();
my $max_processes=20;
my $partition=10;
my $max_elements=100;
my $pm = Parallel::ForkManager->new($max_processes);
$pm->run_on_start( sub {
my ($pid, $exit_code, $ident) = @_;
sub1(\$counter,\@array);
});
while ($counter < $max_elements)
{
my $pid = $pm->start and next;
$pm->finish; # Terminates the child process
}
$pm->wait_all_children;
my $stop = Time::HiRes::time();
my $duration = $stop-$start;
print "Time spent: $duration\n";
print "Size of array: ".scalar(#array)."\n";
print "\nEnd of main program\n";
sub sub1 {
my $start=shift;
my $count=${$start};
${$start}+=$partition;
my $array=shift;
for (my $i=$count; $i<$count + $partition; $i++)
{
push @{$array}, $i;
}
return @{$array};
}

Fork multiple children in Perl and use pipes for bi-directional communication

I am trying to create a small Perl program which has multi processing capabilities. Since there are some small changes here and there in my requirements, I am not able to find any similar sample scripts anywhere.
I need to read a big logfile from STDIN and give the first N lines (a big number again) to the first child process, then the next N lines to the second child process, etc. I also have a constant defined which is the maximum number of child processes allowed to run concurrently. Once the maximum number of children is reached, the parent will wait for a child to finish its job and then give it another N lines.
The parent process also collects a multi-line (5-10 lines) output returned by each child process when they finish and stores it in an array. Parent then continues to process this array contents and display the results finally.
Is there a good sample script which I can modify and use, or could someone help me by sharing one here? I prefer using only pipes for interprocess communication and keeping things as simple as possible.
Edit:
Can someone show an example of how this can be accomplished using only pipes from the IO::Handle module?
Use Forks::Super, which makes it easy to throttle the number of simultaneous processes and handle the interprocess communication. For example,
use Forks::Super MAX_PROC => 10, # allow 10 simultaneous processes
ON_BUSY => 'queue'; # don't block when >=10 jobs are active
@loglines = <>;
# set up all the background jobs
while (@loglines > 0) {
$pid = fork {
args => [ splice @loglines, 0, $N ], # to pass to sub, below
child_fh => "out", # make child STDOUT readable by parent
sub => sub {
my @loglines = @_;
my @result = ... do something with loglines ...
print @result; # the parent reads this output via $pid->read_stdout()
}
};
}
# get the results
while ($pid = waitpid -1, 0) {
last if $pid == -1;
my @results_from_job = $pid->read_stdout();
push @results, @results_from_job;
}
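If you would rather stay with plain pipe() and fork, as the question's edit asks, here is a rough, untested sketch of one common shape: the parent captures each batch of lines before forking (so the child gets its copy of the batch for free), and a single pipe per child carries the multi-line result back to the parent. The sub names, $N, and the throttle value are illustrative. A fully bidirectional setup would need a second pipe per child (parent to child), but since each child already receives its batch at fork time, one pipe in the child-to-parent direction is enough here.
use strict;
use warnings;
use IO::Handle;

my $N            = 1000;  # lines handed to each child (illustrative)
my $max_children = 4;     # concurrency limit (illustrative)

my @results;
my %reader_for;           # pid => read end of that child's pipe
my $running = 0;

while (my @batch = read_batch(\*STDIN, $N)) {
    # Throttle: reap one finished child before starting another.
    reap_one() if $running >= $max_children;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    $writer->autoflush(1);

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                          # child
        close $reader;
        my @summary = process_batch(@batch);  # stand-in for the real work
        # A 5-10 line report easily fits in the pipe buffer,
        # so writing it all before exit cannot deadlock here.
        print {$writer} "$_\n" for @summary;
        close $writer;
        exit 0;
    }

    close $writer;                            # parent keeps only the read end
    $reader_for{$pid} = $reader;
    $running++;
}
reap_one() while $running;

print @results;

sub read_batch {
    my ($fh, $n) = @_;
    my @lines;
    while (@lines < $n and defined(my $line = <$fh>)) { push @lines, $line }
    return @lines;
}

sub reap_one {
    my $pid = wait();
    return if $pid < 0;
    if (my $fh = delete $reader_for{$pid}) {
        push @results, <$fh>;                 # collect the child's multi-line output
        close $fh;
    }
    $running--;
}

sub process_batch {
    my @lines = @_;
    return ("processed " . scalar(@lines) . " lines"); # placeholder result
}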
I have found threads to be much simpler for this sort of process. You need the threads and Thread::Queue modules. The approach is to set up one queue to feed the worker threads and one for them to return their results. The worker thread is just a function that reads a record, processes it and sends a result back. I just put this code together and have not tested it, so it may be buggy, but I think it shows the general idea:
use threads; # the default import gives us async(), used below
use Thread::Queue;
#
#
# Set limit on number of workers
#
my $MAX_THREADS = 5;
my $EOD = "\n\n";
#
#
# Need a queue to feed the workers
# and one for them to return results
#
my $Qresult = Thread::Queue->new();
my $rec;
my $n;
#
#
# load STDIN into the input queue
#
my $Qin = Thread::Queue->new(<>);
#
#
# start worker threads
#
for($n = 0; $n < $MAX_THREADS; ++$n)
{
async{ProcessRecord($n);};
$Qin->enqueue($EOD); # need terminator for each
}
#
#
#
# Wait for the results to come in
#
$n = 0;
while($n < $MAX_THREADS)
{
$rec = $Qresult->dequeue();
if($rec eq $EOD)
{
++$n;
next;
}
:
:
:
#-- process result --#
:
:
:
threads->yield(); # let other threads get a chance
sleep 1;
}
exit;
######################################
#
#
# Worker threads draw from the queue
# when a "terminator" is read, quit;
#
sub ProcessRecord
{
my $rec;
my $result;
while(1)
{
$rec = $Qin->dequeue();
last if $rec eq $EOD;
:
:
:
#-- process record --#
:
:
:
$Qresult->enqueue($result);
threads->yield(); # let other threads get a chance
}
threads->exit();
}

Are there any good timer implementations in Perl?

I'm looking for a good timer implementation in Perl. The situation I have is this: I need to keep track of I/O activity on many files, and for files that stay untouched long enough, a remove action will be taken on them, so an efficient timer implementation is really vital for the app I'm working on right now. To avoid reinventing the wheel, I'm asking you for help first.
Time::HiRes comes with perl.
Furthermore, your application sounds like it could benefit from Linux::Inotify (note the Linux:: in front). When setting the timer for a file that you want to remove after a certain time of inactivity, remember the last access. In an inotify event hook, update this time to the current time. Then, you can periodically check whether the file's lifetime expired without doing a stat on all of the files you track. On expiration, you could add a final check just to make sure nothing went wrong, of course.
If you have huge numbers of files in flight, you may want to keep the list of files sorted by expiration time. That makes the periodic check for expiration trivial.
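As a small illustration of that point (a sketch only, with made-up names): if the list is kept sorted by expiry time, the periodic sweep only ever has to look at the front of it.
# @by_expiry holds [ $expires_at, $filename ] pairs, kept sorted by $expires_at.
sub sweep_expired {
    my ($by_expiry, $now) = @_;
    my @expired;
    # Only the head of a sorted list can be expired, so this loop does
    # O(1) work per sweep when nothing has timed out.
    while (@$by_expiry and $by_expiry->[0][0] <= $now) {
        push @expired, (shift @$by_expiry)->[1];
    }
    return @expired; # filenames whose timers ran out
}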
Update: I just did a little experimentation with Linux::Inotify. Things aren't as easy with that approach as I thought. First, here's the partially working code that I didn't have time to finish.
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw/min max/;
use Time::HiRes qw/time sleep/;
use Data::Dumper;
use Linux::Inotify;
# [s], but handles subsecond granularity, too
use constant CLEANUP_INTERVAL => 1.;
use constant FILE_ACCESS_TIMEOUT => 5.;
# for fast and readable struct access
use constant FILENAME => 0;
use constant ACCESSTIME => 1;
use constant WATCHER => 2;
my $notifier = Linux::Inotify->new;
my @tracked_files = populate_tracked_files(\@ARGV, $notifier);
warn Dumper \@tracked_files;
while (1) {
# update the tracked files according to inotify events
my @events = $notifier->read;
my %files_seen_this_round;
foreach my $event (@events) {
$event->print();
my $ev_filename = $event->{name}; # part of the API, apparently
# we may have multiple events per file.
next if $files_seen_this_round{$ev_filename}++;
# find and update the right tracked file
# TODO: this could be optimized to O(1) with a hash at
# the cost of more bookkeeping
foreach my $tfile (@tracked_files) {
if ($tfile->[FILENAME] eq $ev_filename) {
my $atime = $^T - 60*60*24 * -A $ev_filename; # update access time
$tfile->[ACCESSTIME] = $atime;
# a partial bubble sort would be hugely more efficient here!
# => O(n) from O(n*log(n))
@tracked_files = sort {$a->[ACCESSTIME] <=> $b->[ACCESSTIME]}
@tracked_files;
last;
}
} # end foreach tracked file
} # end foreach event
cleanup_files(\@tracked_files);
sleep(CLEANUP_INTERVAL);
last if not @tracked_files;
} # end while(1)
$notifier->close;
sub cleanup_files {
my $files = shift;
my $now = time();
for (my $fileno = 0; $fileno <= $#{$files}; ++$fileno) {
my $file = $files->[$fileno];
if ($now - $file->[ACCESSTIME] > FILE_ACCESS_TIMEOUT) {
warn "File '" . $file->[FILENAME] . "' timed out";
# remove this file from the watch list
# (and delete in your scenario)
$file->[WATCHER]->remove;
splice @$files, $fileno, 1;
$fileno--;
}
}
}
sub populate_tracked_files {
my $files = shift;
my $notifier = shift;
my @tracked_files;
foreach my $file (@$files) {
die "Not a file: '$file'" if not -f $file;
my $watch = $notifier->add_watch($file, Linux::Inotify::ALL_EVENTS);
push @tracked_files, [$file, $^T - 60*60*24 * -A $file, $watch];
}
@tracked_files = sort {$a->[ACCESSTIME] <=> $b->[ACCESSTIME]}
@tracked_files;
return @tracked_files;
}
There's still some bug in the time-checking logic. But the main problem is that $notifier->read() will block until a new event. Whereas we really just want to see whether there's a new event and then proceed to cleanup. This would have to be added to Linux::Inotify as a non-blocking read of the file descriptor. Anybody can take over maintenance of the module since the author is no longer interested.
As your program seems clearly event-driven, you would benefit from implementing it using an event-driven framework such as POE or AnyEvent. Those have all the pieces to handle I/O events and timer events.
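For example, a bare-bones AnyEvent sketch (an illustration only: it polls atime with stat once a minute rather than using inotify, and the timeout value is made up):
use strict;
use warnings;
use AnyEvent;

my $timeout = 5 * 60;   # seconds of inactivity before removal (illustrative)
my @files   = @ARGV;

# Keep the watcher in a variable; the timer is cancelled if it goes out of scope.
my $sweep = AnyEvent->timer(
    after    => 1,
    interval => 60,
    cb       => sub {
        my $now = time;
        for my $file (grep { -e $_ } @files) {
            my $atime = (stat $file)[8];   # last access time, epoch seconds
            if ($now - $atime > $timeout) {
                warn "$file idle, removing\n";
                unlink $file or warn "unlink $file: $!\n";
            }
        }
        @files = grep { -e $_ } @files;    # forget files that are gone
    },
);

AnyEvent->condvar->recv;   # enter the event loop; runs until interrupted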

How can I prevent perl from reading past the end of a tied array that shrinks when accessed?

Is there any way to force Perl to call FETCHSIZE on a tied array before each call to FETCH? My tied array knows its maximum size, but could shrink from this size depending on the results of earlier FETCH calls. Here is a contrived example that filters a list down to only the even elements, with lazy evaluation:
use warnings;
use strict;
package VarSize;
sub TIEARRAY { bless $_[1] => $_[0] }
sub FETCH {
my ($self, $index) = @_;
splice @$self, $index, 1 while $$self[$index] % 2;
$$self[$index]
}
sub FETCHSIZE {scalar @{$_[0]}}
my @source = 1 .. 10;
tie my @output => 'VarSize', [@source];
print "#output\n"; # array changes size as it is read, perl only checks size
# at the start, so it runs off the end with warnings
print "#output\n"; # knows correct size from start, no warnings
For brevity I have omitted a bunch of error checking code (such as how to deal with accesses starting from an index other than 0).
EDIT: rather than the above two print statements, if ONE of the following two lines is used, the first will work fine, the second will throw warnings.
print "$_ " for #output; # for loop "iterator context" is fine,
# checks FETCHSIZE before each FETCH, ends properly
print join " " => #output; # however a list context expansion
# calls FETCHSIZE at the start, and runs off the end
Update:
The actual module that implements a variable sized tied array is called List::Gen which is up on CPAN. The function is filter which behaves like grep, but works with List::Gen's lazy generators. Does anyone have any ideas that could make the implementation of filter better?
(the test function is similar, but returns undef in failed slots, keeping the array size constant, but that of course has different usage semantics than grep)
sub FETCH {
my ($self, $index) = @_;
my $size = $self->FETCHSIZE;
...
}
Ta da!
I suspect what you're missing is they're just methods. Methods called by tie magic, but still just methods you can call yourself.
Listing out the contents of a tied array basically boils down to this:
my @array;
my $tied_obj = tied @array;
for my $idx (0..$tied_obj->FETCHSIZE-1) {
push @array, $tied_obj->FETCH($idx);
}
return @array;
So you don't get any opportunity to control the number of iterations. Nor can FETCH reliably tell if it's being called from @array or $array[$idx] or @array[@idxs]. This sucks. Ties kinda suck, and they're really slow: about 3 times slower than a normal method call and 10 times slower than a regular array access.
Your example already breaks expectations about arrays (10 elements go in, 5 elements come out). What happens when a user asks for $array[3]? Do they get undef? Alternatives include just using the object API; if your thing doesn't behave exactly like an array, pretending it does will only add confusion. Or you can use an object with array dereference overloaded.
So, what you're doing can be done, but it's difficult to get it to work well. What are you really trying to accomplish?
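A rough sketch of that last suggestion (the class name and even-number filter are made up to echo the question's example): with overloaded array dereference, @{$obj} is rebuilt on demand, so its size is always consistent with its contents and nothing can read past the end.
package LazyEvens;
use strict;
use warnings;
use overload
    '@{}'    => sub { my $self = shift; [ grep { $_ % 2 == 0 } @{ $self->{source} } ] },
    fallback => 1;

sub new { my ($class, @source) = @_; bless { source => \@source }, $class }

package main;
my $evens = LazyEvens->new(1 .. 10);
print "@$evens\n";   # 2 4 6 8 10 -- sized correctly every time it is dereferenced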
I think the order in which Perl calls the FETCH/FETCHSIZE methods can't be changed; it's internal to Perl.
Why not just explicitly remove warnings:
sub FETCH {
my ($self, $index) = @_;
splice @$self, $index, 1 while ($$self[$index] || 0) % 2;
exists $$self[$index] ? $$self[$index] : '' ## replace '' with default value
}