Using ForkManager and Perl properly? - perl

My developer recently disappeared and I need to make a small change to my website.
Here's the code I'll be referring to:
my $process = scalar(@rows);
$process = 500 if $process > 500;
my $pm = Parallel::ForkManager->new($process);
This is code from a Perl script that scrapes data from an API system through a cron job. Every time the cron job is run, it's opening a ton of processes for that file. For example, cron-job.pl will be running 100+ times.
The number of instances it opens is based on the amount of data that needs to be checked so it's different every time, but never seems to exceed 500. Is the code above what would be causing this to happen?
I'm not familiar with using ForkManager, but from the research I've done it looks like it runs the same file multiple times so that it can extract multiple streams of data from the API system at the same time.
The problem is that the number of instances being run is significantly slowing down the server. To lower the amount of instances, is it really as simple as changing 500 to a lower number or am I missing something?

To lower the number of instances created, yes, just lower 500 (in both cases) to something else.
Parallel::ForkManager is a way of using fork (spawning new processes) to handle parallel processing. The parameter passed to new() specifies the maximum number of concurrent processes to create.

Your code simplifies to
my $pm = Parallel::ForkManager->new(500);
It means: limit the number of children to 500 at any given time.
If you have fewer than 500 jobs, only that many workers will be created.
If you have more than 500 jobs, the manager will start 500 jobs, wait for one to finish, then start the next job.
If you want fewer children executing at any given time, lower that number.
my $pm = Parallel::ForkManager->new(50);

Related

Wait for all LSF jobs with given name, overriding JOB_DEP_LAST_SUB = 1

I've got a large computational task, consisting of several steps, that I run on a PC cluster, managed by LSF.
Part of this task includes launching several parallel jobs with identical names. Jobs are somewhat different, therefore it is hard to transform them to a job array.
The next step of this computation, following these jobs, summarizes their results, therefore it must wait until all of them are finished.
I'm trying to use the -w ended(job-name) command-line switch for bsub, as usual, to specify job dependencies.
However, admins of the cluster have set JOB_DEP_LAST_SUB = 1 in lsb.params.
According to the LSF manual, this makes LSF wait only for the most recent job with the supplied name to complete, instead of all such jobs.
Is it possible to override this behavior for my task only, without asking the admins to reconfigure the whole cluster? (This cluster is used by many people, so it is very unlikely that they would agree.)
I cannot find any clues in the manual.
Looks like it cannot be overridden.
I've changed the job names to make them unique by appending a random value, then I've changed the condition to -w ended(job-name-*)

How to interpret LocustIO's output / simulate short user visits

I like Locust, but I'm having a problem interpreting the results.
e.g. my use case is that I have a petition site. I expect 10,000 people to sign the petition over a 12 hour period.
I've written a locust file that simulates user behaviour:
Some users load but don't sign petition
Some users load and submit invalid data
Some users (hopefully) successfully submit.
In real life the user now goes away (because the petition is an API not a main website).
Locust shows me things like:
with 50 concurrent users the median time is 11s
with 100 concurrent users the median time is 20s
But as one "Locust" just repeats the tasks over and over, it's not really like one user. If I set it up with a swarm of 1 user, then that still represents many real world users, over a period of time; e.g. in 1 minute it might do the task 5 times: that would be 5 users.
Is there a way I can interpret the data ("this means we can handle N people/hour"), or some way I can see how many "tasks" get run per second or minute etc.? (I.e. Locust gives me requests per second but not tasks.)
Tasks don't really exist at the logging level in Locust.
If you want, you could log your own fake samples, and use that as your task counter. This has an unfortunate side effect of inflating your request rate, but it should not impact things like average response times.
Like this:
from locust.events import request_success
...
@task(1)
def mytask(self):
    # do your normal requests
    request_success.fire(request_type="task", name="completed", response_time=None, response_length=0)
Here's the hacky way that I've got somewhere. I'm not happy with it and would love to hear some other answers.
Create class variables on my HttpLocust (WebsiteUser) class:
WebsiteUser.successfulTasks = 0
Then on the UserBehaviour taskset:
import time

@task(1)
def theTaskThatIsConsideredSuccessful(self):
    WebsiteUser.successfulTasks += 1
    # ...do the work...

# This runs once regardless of how many 'locusts'/users hatch
def setup(self):
    WebsiteUser.start_time = time.time()
    WebsiteUser.successfulTasks = 0

# This runs for every user when the test is stopped.
# I could not find another method that did this (tried various combos).
# It doesn't matter much, you just get N copies of the result!
def on_stop(self):
    took = time.time() - WebsiteUser.start_time
    total = WebsiteUser.successfulTasks
    avg = took / total
    hr = 60 * 60 / avg
    print("{} successful\nAverage: {}s/success\n{} successful signatures per hour".format(total, avg, hr))
And then set a zero wait_time and run till it settles (or failures emerge) and then stop the test with the stop button in the web UI.
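For reference, a minimal sketch of that zero-wait setup (my own addition, assuming the pre-1.0 Locust API used in the snippets above, where UserBehaviour is the taskset defined there; newer Locust versions would use wait_time = constant(0) instead):

from locust import HttpLocust

class WebsiteUser(HttpLocust):
    task_set = UserBehaviour  # the TaskSet with the tasks shown above
    successfulTasks = 0       # class-level counter incremented by the successful task
    # Zero wait between tasks, so throughput is limited only by the server
    min_wait = 0
    max_wait = 0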
Output is like
188 successful
0.2738157498075607s/success
13147.527132862522 successful signatures per hour
I think this therefore gives me the max conceivable throughput that the server can cope with (determined by changing the No. users hatched until failures emerge, or until the average response time becomes unbearable).
Obviously real users would have pauses, but that makes it harder to test the maximums.
Drawbacks
Can't use distributed Locust instances
Messy; also can't 'reset' - have to quit the process and restart for another test.

Partial batch sizes

I'm trying to simulate pallet behavior by using batch and move to. This works fine except towards the end where the number of elements left is smaller than the batch size, and these never get picked up. Any way out of this situation?
Have tried messing with custom queues, pickup/dropoff pairs.
To elaborate: the batch object has a queue size of 15. However, once the entire set has been processed, a number of elements less than 15 remain, and these don't get picked up by the subsequent moveTo block. I need to send the agents to the subsequent block once the queue size falls below 15.
You can dynamically change the batch size of your Batch object towards "the end" (whatever you mean by that :-) ). You need to figure out when to change the batch size (as this depends on your model). But once it is time to adjust, you can call myBatchItem.set_batchSize(1) and it will now batch things together individually.
However, a better model design might be to have a cool-down period before the model end, i.e. stop taking model measurements before your batch objects run out of agents to batch.
You need to know somehow which element is the last one, for example by using a boolean variable called isLast in your agent that is true for the last agent.
Then, in the batch, you have to change the batch size programmatically, maybe like this in the On enter action of your batch:
if (agent.isLast)
    self.set_batchSize(self.size());
To determine if the "end" or any lack of supply is reached, I suggest a timeout. I would save a timestamp in a variable lastBatchDate in the OnExit code of the batch block:
lastBatchDate = date();
A cyclically activated event checkForLeftovers will check every once in a while whether there are objects waiting to be batched and the timeout (here: 10 minutes) has been reached. In that case, the batch size will be reduced to exactly the number of waiting objects, so that they continue in a smaller batch:
if( lastBatchDate != null // prevent a NullPointerException when the date is not yet filled
    && ((date().getTime() - lastBatchDate.getTime()) / 1000) > 600 // more than 600 seconds since the last batch
    && batch.size() > 0 // something is waiting
    && batch.size() < BATCH_SIZE // not more than a normal batch is waiting
){
    batch.set_batchSize(batch.size()); // set the batch size to exactly the amount waiting
} else {
    batch.set_batchSize(BATCH_SIZE); // reset the batch size to the default value BATCH_SIZE
}
However, as Benjamin already noted, you should be careful about whether this is what you really need to model. Take care, for example, with these aspects:
Is the timeout long enough to not accidentally push smaller batches during normal operations?
Is the timeout short enough to have any effect?
Is it ok to have a batch of a smaller size downstream in your process?
etc.
You might just want to make sure upstream that the objects reaching the batching station always fill complete batches, or you might just want to stop your simulation before the line "runs dry".
You can see the model and download the source code here.

Does the Overpass API installation's Area Creation process resume if stopped/started?

The Area Creation process can take up to 24 hours. If something happens during that time which causes the process to stop, will it resume when I run it again or does it start back over from the beginning?
We can assume for this question that the files in $DB_DIR remain in place throughout the running/stopping/starting process.
It will start over from the beginning, assuming you're using areas.osm3s to define the area creation rules. This file contains a number of queries which are being executed to generate the areas. If you restart the process, it will execute those very same queries again from the beginning.
For performance reasons, we use areas_delta.osm3s and the accompanying rules_delta_loop.sh script on the production servers. This way, we can limit the workload to those areas which have been changed since the last area creation run.

Spark shuffle read takes significant time for small data

We are running the following stage DAG and experiencing long shuffle read times for relatively small shuffle data sizes (about 19 MB per task).
One interesting aspect is that waiting tasks within each executor/server have equivalent shuffle read times. Here is an example of what that means: for one server, one group of tasks waits about 7.7 minutes and another waits about 26 s.
Here is another example from the same stage run. The figure shows 3 executors/servers, each having uniform groups of tasks with equal shuffle read times. The blue group represents tasks killed due to speculative execution:
Not all executors are like that. There are some that finish all their tasks within seconds pretty much uniformly, and the size of remote read data for these tasks is the same as for the ones that wait long time on other servers.
Besides, this type of stage runs 2 times within our application runtime. The servers/executors that produce these groups of tasks with large shuffle read time are different in each stage run.
Here is an example of the task stats table for one of the servers/hosts:
It looks like the code responsible for this DAG is the following:
output.write.parquet("output.parquet")
comparison.write.parquet("comparison.parquet")
output.union(comparison).write.parquet("output_comparison.parquet")
val comparison = data.union(output).except(data.intersect(output)).cache()
comparison.filter(_.abc != "M").count()
We would highly appreciate your thoughts on this.
Apparently the problem was JVM garbage collection (GC). The tasks had to wait until GC was done on the remote executors. The equivalent shuffle read times resulted from the fact that several tasks were waiting on a single remote host performing GC. We followed the advice posted here and the problem decreased by an order of magnitude. There is still a small correlation between GC time on remote hosts and local shuffle read time. In the future we plan to try the external shuffle service.
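For what it's worth, a minimal sketch of the kind of configuration that plan implies (my own illustration, not from the original answer; the config keys are standard Spark settings, but the application name is made up, and the external shuffle service also has to be running on the worker nodes, e.g. as the YARN auxiliary service):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("shuffle-read-investigation")  # hypothetical app name
    # Serve shuffle files from the external shuffle service, so reads do not
    # block while the executor that wrote them is busy with GC.
    .config("spark.shuffle.service.enabled", "true")
    # Surface GC pauses in the executor logs to confirm the diagnosis (Java 8 flags).
    .config("spark.executor.extraJavaOptions",
            "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
    .getOrCreate()
)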
Since Google brought me here with the same problem, but I needed another solution...
Another possible reason for a small shuffle taking a long time to read could be that the data is split over many partitions. For example (apologies, this is PySpark as it is all I have used):
(my_df_with_many_partitions      # say it has 1000 partitions
    .filter(very_specific_filter)  # only very few rows pass
    .groupby('blah')
    .count())
The shuffle write from the filter above will be very small, so the following stage will have very little to read. But to read it, you still need to check a lot of empty partitions. One way to address this would be:
(my_df_with_many_partitions
    .filter(very_specific_filter)
    .repartition(1)
    .groupby('blah')
    .count())
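A possible variation (my own note, not part of the original answer): coalesce(1) avoids the extra shuffle that repartition introduces, but because it narrows the dependency it can also collapse the upstream filter into a single task, so repartition(1) as shown above is often the safer choice here:

(my_df_with_many_partitions
    .filter(very_specific_filter)
    .coalesce(1)   # no shuffle, but the filter may now also run in one task
    .groupby('blah')
    .count())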