Can I use ScalaMeter with no input? - scala

I want to benchmark the runtime of several methods in my Scala application, and I am looking into using ScalaMeter. Let's say I want to measure the time of a method called doSomething().
I only want to call doSomething and measure the time it takes to run once. However all of the documentation I see for ScalaMeter requires providing some kind of input, whether it is a series of integers, a string, or something.
Is it possible to use ScalaMeter to do what I am asking? Is it an appropriate use case?

It is possible, but it would be a waste of time.
As you're probably aware, ScalaMeter is designed to remove the effects of variation in function execution times, so that it's possible to accurately benchmark those execution times. For example, you might want to verify that a function completes within a required time, or to determine whether its performance is maintained over time as changes are made to the code base.
Why is that so challenging? Well, there are a number of obstacles to overcome:
The JVM has a number of different options for executing the resulting Java bytecode in a program. Some (such as the Zero VM) just interpret code; others utilize just-in-time (JIT) compilation to optimize translation into the host CPU's machine code; the HotSpot Server VM aggressively improves performance over time, so that code performance incrementally improves the longer it runs. For benchmarking purposes, the HotSpot Client VM performs very good optimization and reaches a steady state quickly, which therefore allows us to start measuring performance rapidly. However, we still need to allow the JIT compiler to warm up, and so we must disregard the first few, slower executions (runs) that would otherwise bias our results. ScalaMeter does a pretty good job of undertaking this warmup by itself, but the number of runs to be discarded is configurable.
The JVM performs a number of garbage collection (GC) cycles, seemingly at random, which can similarly slow down performance when they occur. ScalaMeter can be configured to ignore executions in which GC cycles occurred.
The host machine's load can vary as it executes threads from other processes running on the same machine. These also potentially slow down execution times. ScalaMeter deals with this by considering only the fastest observed time in a fixed number of runs, rather than by taking an average.
If you're running from SBT, a forked JVM execution session will perform better, and with less variation, than one that shares the same JVM instance as SBT (because more of the SBT JVM's resources will be in use).
Virtual memory page faults (in which the memory making up the application's working set is switched to/from a paging file) will also randomly impact performance.
The performance of many functions will depend upon their arguments (and, if you're not into functional programming, shared mutable state). Tying performance to argument values is also something ScalaMeter is good at, through its use of generators. (For example, consider a size operation on a List: it will clearly take longer to execute as the number of elements in the List increases.)
Etc. You can find more on these issues in the ScalaMeter Getting Started Introduction.
Clearly, benchmarks should be performed on the same host machine so that the results are comparable, since CPU, OS, Memory, BIOS Config, etc. all affect performance too.
So, having explained all that, you will understand why ScalaMeter needs to execute the same function a lot! ;-)
In your case, doSomething() takes no arguments, so you can use a Gen[T].single generator that identifies the class or object to which doSomething() belongs, which will look something like the following:
Note: This is written as a ScalaMeter test, and so the source should be under src/test/scala:
import org.scalameter.api._
import org.scalameter.picklers.Implicits._
object MyBenchmark
extends Bench.ForkedTime {
  // We have no arguments. Instead, create a single "generator" that identifies the class or
  // object that doSomething belongs to. This assumes doSomething() belongs to object
  // MyObject.
  val owner = Gen.single("owner")(MyObject)
  // Measure MyObject.doSomething()'s performance.
  performance of "MyObject" in {
    measure method "doSomething()" in {
      using(owner) in {
        _.doSomething()
      }
    }
  }
}
(BTW: I would have thought that benchmarking functions with no arguments would be more straightforward than this, but this is the best I've been able to come up with so far. If anyone has a better idea, please add a comment and let me know!)
So, if all of that is overkill, you might want to try something like this:
// Measure nanoseconds taken to execute by name argument.
def measureTime(x: => Unit): Long = {
  val start = System.nanoTime()
  x
  // Calculate how long that took and return the value.
  System.nanoTime() - start
}
measureTime {
  doSomething()
}
You'll only execute the function once, and the time taken will be wildly different each time.
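A middle ground between the two approaches is to discard a few warm-up runs and then keep the fastest of several timed runs, which mirrors the warm-up and fastest-observed-time ideas described above. The following is only a minimal sketch of that idea, not ScalaMeter's implementation; QuickTimer, bestOf, and the warm-up and run counts are hypothetical names chosen for illustration.

```scala
object QuickTimer {
  /** Runs `body` once and returns the elapsed wall-clock time in nanoseconds. */
  def measureTime(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    System.nanoTime() - start
  }

  /** Executes `body` `warmups` times untimed, then returns the fastest of `runs` timed executions. */
  def bestOf(warmups: Int, runs: Int)(body: => Unit): Long = {
    require(runs > 0, "runs must be positive")
    // Warm-up executions give the JIT compiler a chance to optimize; their timings are ignored.
    (1 to warmups).foreach(_ => body)
    // Taking the minimum filters out runs slowed down by GC pauses or other processes.
    (1 to runs).map(_ => measureTime(body)).min
  }
}
```

It could be called as QuickTimer.bestOf(warmups = 20, runs = 10) { doSomething() }. The result is still far less trustworthy than a ScalaMeter benchmark, since nothing here detects GC cycles or runs in a forked JVM.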

Related

How to delete memory usage during an Experiment?

I am constructing an experiment in AnyLogic, which saves data in the Parameter variation tab under a custom-class list. The model needs to perform a lot of simulations and repetitions to optimize for setting variables in the model itself. After x iterations, I use a Python connector to run some code to find new possible parameters for the underlying model.
The problem I am having right now is that around simulation run number 200, memory usage reaches its maximum (4 GB), and the experiment proceeds to run very slowly. I have found some interesting ways to cut down on memory usage, but I believe there is only one thing that could help me right now: let the system free the memory that was used for past iterations. After each iteration, the data of a simulation is stored, so I am fine with AnyLogic deleting the logs of that specific simulation afterwards.
Is such a thing possible? If so, how can I implement that?
Java uses a garbage collector to manage memory, and you have no direct control over it. Every now and then, based on some internal logic, it finds all objects in memory that are no longer reachable through any active reference and removes them.
Thus to reduce memory you must ensure that any instances that are no longer needed are not referenced by any of the objects currently active in your model.
To identify these you must use a Java profiler like JProfiler, or some of the free alternatives - see here for more.
This will show you exactly what classes are using up all your memory and with some deep diving you should be able to identify who is keeping reference to them.
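One way to keep finished runs from being referenced is to store only a small summary per run and let the raw data go out of scope. The sketch below illustrates that pattern in plain Scala; RunSummary and RunLog are hypothetical names, not AnyLogic classes, and the summary fields are assumptions.

```scala
import scala.collection.mutable.ArrayBuffer

/** The only data retained per simulation run (hypothetical fields). */
final case class RunSummary(runId: Int, sampleCount: Int, mean: Double)

final class RunLog {
  private val buffer = ArrayBuffer.empty[RunSummary]

  /** Reduces the raw samples to a summary. No reference to `samples` is kept,
    * so the garbage collector may reclaim them once the caller drops its own reference. */
  def record(runId: Int, samples: Seq[Double]): Unit = {
    val mean = if (samples.isEmpty) 0.0 else samples.sum / samples.size
    buffer += RunSummary(runId, samples.size, mean)
  }

  /** Immutable snapshot of everything recorded so far. */
  def summaries: List[RunSummary] = buffer.toList
}
```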

Measure the function-level memory usage in Scala in my Application Code during runtime

First of all, this is not an "off-line" profiling task!
I am working on a Scala codebase, and what I am trying to do is this: if a function foo consumes too much memory (let's say over 10G), kill the function and return a default value.
So it should look like:
monitor{
foo() <--- if foo has used over 10G memory, just cut it off
}
catch {
case MemoryUsageError => default_value
}
Note that currently foo is running in the same process with my main function.
Is it possible to do so? I quickly googled for such material and only found a way to show the current memory usage of a Scala application; it is not as fine-grained as what I am looking for.
Am I clear on this? Could anyone shed some light here? Thanks a lot!
========================================================================
Note that what I am looking for is an "online" method! It is not like off-line profiling. My application itself should determine the memory usage of the foo function, and if it goes too high, just cut it off.
Is it possible?
In general, the JVM doesn't track which code created an object allocated on the heap, or where it was created. Tracking this would be very costly, and it doesn't matter for GC.
How to live with it
Termination
Self-controlled program. If you want to terminate a continuous computation, then the computation shouldn't be continuous. What you need are check points where a condition can be validated: for example, at the start of every iteration of a loop, or at the beginning of every recursive call. Obviously a computation could consist of several different stages instead of a simple loop, but the approach is the same.
Separation of computation and control. For example, execute the function as a Future on a predetermined Thread and interrupt it if needed, or use a ForkJoinTask and its cancel() method.
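As a rough sketch of the second approach, the computation below runs on its own thread via an ExecutorService and is stopped through Future.cancel(true), which interrupts that thread. It assumes the computation cooperates by checking the interrupt flag; InterruptDemo and runAndCancel are hypothetical names, and the busy loop merely stands in for real work.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

object InterruptDemo {
  /** Starts an interruptible loop on a dedicated thread, cancels it from the
    * controlling thread, and reports whether the loop observed the interrupt. */
  def runAndCancel(): Boolean = {
    val executor = Executors.newSingleThreadExecutor()
    val started = new CountDownLatch(1)
    val finished = new CountDownLatch(1)
    val sawInterrupt = new AtomicBoolean(false)
    try {
      val task = executor.submit(new Runnable {
        override def run(): Unit = {
          started.countDown()
          var steps = 0L
          // The interrupt flag is the check point: the loop exits once it is set.
          while (!Thread.currentThread().isInterrupted) steps += 1
          sawInterrupt.set(true)
          finished.countDown()
        }
      })
      started.await()                      // wait until the computation is really running
      task.cancel(true)                    // true = interrupt the thread running the task
      finished.await(5, TimeUnit.SECONDS)  // give the loop time to notice and exit
      sawInterrupt.get()
    } finally {
      executor.shutdownNow()
    }
  }
}
```

Code that never checks the interrupt flag and never blocks in an interruptible call will not stop this way, which is why the check points described above are still needed.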
Measurement
Usually only one or a couple of classes account for most of the memory. If the instances are about the same size, then memory control can be implemented with a counter of objects. The classes of 'heavy' objects can be found by inspecting the algorithm or by using jvisualvm. Increment the counter during instance creation. Decrementing is harder: update the counter when references are released (this counts instances that couldn't be removed by GC), or use PhantomReference (this counts all instances that exist in the VM). But don't use finalize()!
The second method is the Java instrumentation package. It allows you to measure object sizes (there are probably methods for determining the consumption of all objects of a certain class). You could also try measuring available memory. The flaw is that you measure not the objects of a certain function but all of them.
For time control, write down a timestamp at the beginning of the computation and measure the duration at every check point.
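The check-point idea combined with measuring available memory could look roughly like the sketch below. It reads the heap usage of the whole JVM through Runtime, so it has exactly the flaw mentioned above: it cannot attribute memory to foo alone. MemoryGuard, MemoryBudgetExceeded, checkpoint and withDefault are hypothetical names introduced for illustration.

```scala
object MemoryGuard {
  /** Signals that a check point found the heap budget exceeded (hypothetical type). */
  final class MemoryBudgetExceeded(val usedBytes: Long, val limitBytes: Long)
      extends RuntimeException(s"heap use $usedBytes bytes exceeds limit $limitBytes bytes")

  /** Approximate heap in use by the entire JVM, not by a single function. */
  def usedHeapBytes(): Long = {
    val rt = Runtime.getRuntime
    rt.totalMemory() - rt.freeMemory()
  }

  /** Call at the start of each loop iteration or recursive call. */
  def checkpoint(limitBytes: Long): Unit = {
    val used = usedHeapBytes()
    if (used > limitBytes) throw new MemoryBudgetExceeded(used, limitBytes)
  }

  /** Runs `body`; returns `default` if a check point aborted it. */
  def withDefault[A](default: A)(body: => A): A =
    try body
    catch { case _: MemoryBudgetExceeded => default }
}
```

A caller would wrap the computation as MemoryGuard.withDefault(defaultValue) { foo() }, with foo calling MemoryGuard.checkpoint(limit) at its check points. Because the reading includes garbage that has not been collected yet, the limit is only a coarse trigger.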

Scala immutable collections cannot be shared without synchronization?

From the «Learning concurrent programming in Scala» book:
In current versions of Scala (2.11.1), however, certain collections that are
deemed immutable, such as List and Vector, cannot be shared without
synchronization. Although their external API does not allow you to
modify them, they contain non-final fields.
Could anyone demonstrate this with a small example? And does this still apply to 2.11.7?
The behavior of changes made in one thread when viewed from another is governed by the Java Memory Model. In particular, these rules are extremely weak when it comes to something like building a collection and then passing the built-and-now-immutable collection to another thread. The JMM does not guarantee that the other thread won't see an earlier view where the collection was not fully built!
Since synchronized blocks enforce an ordering, they can be used to get a consistent view if they're used on every single operation.
In practice, though, this is rarely actually necessary. On the CPU side, there is typically a memory barrier operation that can be used to enforce memory consistency (i.e. if you write the tail of your list and then pass a memory barrier, no other thread can see the tail un-set). And in practice, JVMs usually have to implement synchronized by using memory barriers. So one could hope that you could just pass the created list within a synchronized block, trusting that a memory barrier would be issued, and everything thereafter would be fine.
Unfortunately, the JMM doesn't require that it be implemented in this way (and you can't assume that the memory-barrier-like behavior of object creation will actually be a full memory barrier that applies to everything in that thread as opposed to simply the final fields of that object), which is both why the recommendation is what it is, and why it's not fixed (yet, anyway) in the library.
For what it's worth, on x86 architectures, I've never observed a problem if you hand off the immutable object within a synchronized block. I have observed problems if you try to do it with CAS (e.g. by using the java.util.concurrent.atomic classes).
As an addition to the excellent answer from Rex Kerr:
it should be noted that most common use cases of immutable collections in a multithreading context are not affected by this problem. The only situation where this might affect you is when you do something that you probably should not do in the first place.
E.g. you have a variable var x: Vector[Int], which you write from one thread A and read from another thread B.
If you mark x with @volatile, there will be no problem, since the volatile write introduces a memory barrier. So you will never be able to observe the Vector in an inconsistent state. The same is true when using a synchronized { } block when writing and reading, or when using java.util.concurrent.atomic.AtomicReference.
If you don't mark x with @volatile, you might observe the vector in an inconsistent state (not just wrong elements, but internally inconsistent!). But in that case your code is arguably broken to begin with. It is completely undefined when you will see the changes from A in B.
You might see them
immediately
after there is a memory barrier somewhere else in your program
not at all
depending on the architecture you're running on, the phase of the moon, whatever. So as Viktor Klang put it: "Unsafe publication is unsafe..."
Note that if you use a higher-level concurrency framework such as Akka actors, it is also guaranteed that receivers of messages cannot see immutable collections in an inconsistent state.
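The two safe hand-off options mentioned above, a @volatile field and an AtomicReference, can be sketched as follows. SafePublication and its method names are hypothetical; the point is only that the collection is published through a field with defined visibility semantics.

```scala
import java.util.concurrent.atomic.AtomicReference

object SafePublication {
  // Option 1: a volatile field. A reader that sees the new reference
  // also sees the fully built Vector behind it.
  @volatile private var latest: Vector[Int] = Vector.empty

  // Option 2: an AtomicReference, whose set/get have volatile semantics.
  private val shared = new AtomicReference[Vector[Int]](Vector.empty)

  /** Called by the writing thread (thread A in the text). */
  def publish(v: Vector[Int]): Unit = {
    latest = v
    shared.set(v)
  }

  /** Called by the reading thread (thread B in the text). */
  def readVolatile: Vector[Int] = latest
  def readAtomic: Vector[Int] = shared.get()
}
```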

Importance of knowing if a standard library function is executing a system call

Is it actually important for a programmer to know if the standard library function he/she is using is actually executing a system call? If so, why?
Intuitively I'm guessing the only importance is in knowing whether the general standard function is a library function or a system call itself. In other cases, I'm guessing there isn't much of a need to know whether a library function internally uses a system call?
It is not always possible to know (for sure) if a library function wraps a system call. But in one way or another, this knowledge can help improve the portability and (or) efficiency of your program. At least in the following two cases, knowing the syscall-level behaviours of your program is helpful.
When your program is time critical. Some system calls are expensive, and the library functions that wrap them are even more expensive. Thus time-critical tasks may need to switch to equivalent functions that do not enter kernel space at all.
It is also worth noting the vsyscall (or vDSO) mechanism of Linux, which accelerates some system calls (e.g. gettimeofday) by mapping their implementations into user-space memory. See this for more details.
When your program needs to be deployed to some restricted environments with system call auditing. In order for your programs to survive such environments, it could be necessary to profile your program for any potential policy violations; this is less painful if you were already aware of the restrictions when you wrote the program.
Sometimes it might be important, and sometimes it isn't. I don't think there's any universal answer to this question. Reasons I can think of that might be important in some contexts are: if the system call requires user permissions that the user might not have; in performance critical code a system call might be too heavyweight; if you're writing a signal-handler where most system calls are forbidden; if it might use some system resource (e.g. reading from /dev/random for every random number could use up the whole entropy pool - you'd want to know if that's going to happen every time you call rand()).

What is the smallest unit of work that is sensible to parallelize with actors?

While Scala actors are described as light-weight, Akka actors even more so, there is obviously some overhead to using them.
So my question is, what is the smallest unit of work that is worth parallelising with Actors (assuming it can be parallelized)? Is it only worth it if there is some potential latency or there are a lot of heavy calculations?
I'm looking for a general rule of thumb that I can easily apply in my everyday work.
EDIT: The answers so far have made me realise that what I'm interested in is perhaps actually the inverse of the question that I originally asked. So:
Assuming that structuring my program with actors is a very good fit, and therefore incurs no extra development overhead (or even incurs less development overhead than a non-actor implementation would), but the units of work it performs are quite small - is there a point at which using actors would be damaging in terms of performance and should be avoided?
Whether to use actors is not primarily a question of the unit of work; their main benefit is to make concurrent programs easier to get right. In exchange for this, you need to model your solution according to a different paradigm.
So, you need to decide first whether to use concurrency at all (which may be due to performance or correctness) and then whether to use actors. The latter is very much a matter of taste, although with Akka 2.0 I would need good reasons not to, since you get distributability (up & out) essentially for free with very little overhead.
If you still want to decide the other way around, a rule of thumb from our performance tests might be that the target message processing rate should not be higher than a few million per second.
My rule of thumb--for everyday work--is that if it takes milliseconds then it's potentially worth parallelizing. Although the transaction rates are higher than that (usually no more than a few 10s of microseconds of overhead), I like to stay well away from overhead-dominated cases. Of course, it may need to take much longer than a few milliseconds to actually be worth parallelizing. You always have to balance the time taken by writing more code against the time saved running it.
If no side effects are expected in the work units, then it is better to make the work-splitting decision at run time:
protected T compute() {
    if (r - l <= T1 || getSurplusQueuedTaskCount() >= T2)
        return problem.solve(l, r);
    // decompose
}
Where:
T1 = N / (L * Runtime.getRuntime().availableProcessors())
N - Size of work in units
L = 8..16 - Load factor, configured manually
T2 = 1..3 - Max length of work queue after all stealings
Here is a presentation with many more details and figures:
http://shipilev.net/pub/talks/jeeconf-May2012-forkjoin.pdf
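To make the threshold rule concrete, here is a minimal Scala sketch that applies it to summing an array with a RecursiveTask. The summing problem, the names ThresholdSum and SumTask, and the default values for L and T2 are assumptions picked from the ranges given above; only the splitting condition itself comes from the snippet.

```scala
import java.util.concurrent.{ForkJoinPool, ForkJoinTask, RecursiveTask}

object ThresholdSum {
  /** Sums data(l until r); t1 and t2 play the roles of T1 and T2 above. */
  final class SumTask(data: Array[Long], l: Int, r: Int, t1: Int, t2: Int)
      extends RecursiveTask[Long] {
    override def compute(): Long =
      if (r - l <= t1 || ForkJoinTask.getSurplusQueuedTaskCount() >= t2) {
        // Small enough, or enough work already queued: solve sequentially.
        var sum = 0L
        var i = l
        while (i < r) { sum += data(i); i += 1 }
        sum
      } else {
        // Decompose: fork the left half, compute the right half in this thread.
        val mid = l + (r - l) / 2
        val left = new SumTask(data, l, mid, t1, t2)
        val right = new SumTask(data, mid, r, t1, t2)
        left.fork()
        val rightSum = right.compute()
        left.join() + rightSum
      }
  }

  def sum(data: Array[Long], loadFactor: Int = 8, maxSurplus: Int = 3): Long = {
    val procs = Runtime.getRuntime.availableProcessors()
    // T1 = N / (L * processors), kept at 1 or more so that splitting always terminates.
    val t1 = math.max(1, data.length / (loadFactor * procs))
    ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length, t1, maxSurplus))
  }
}
```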