Proper way to measure time in Swift

I am fooling around with a few languages and I want to compare the time it takes to perform some computation. I am having trouble with proper time measurement in Swift. I am trying the solution from this answer, but I get results that don't match reality: the execution takes much longer when I run swift sort.swift than when I run the compiled binary, yet the printed measurements tell me the opposite:
$ swiftc sort.swift -o csort
gonczor ~ Projects Learning Swift timers
$ ./csort
Swift: 27858 ns
gonczor ~ Projects Learning Swift timers
$ swift sort.swift
Swift: 22467 ns
This is the code:
import Dispatch
import CoreFoundation
var data = [some random integers]
func sort(data: inout [Int]) {
    for i in 0..<data.count {
        for j in i..<data.count {
            if data[i] > data[j] {
                let tmp = data[i]
                data[i] = data[j]
                data[j] = tmp
            }
        }
    }
}
// let start = DispatchTime.now()
// sort(data: &data)
// let stop = DispatchTime.now()
// let nanoTime = stop.uptimeNanoseconds - start.uptimeNanoseconds
// let nanoTimeDouble = Double(nanoTime) / 1_000_000_000
let startTime = clock()
sort(data: &data)
let endTime = clock()
print("Swift:\t \(endTime - startTime) ns")
The same happens when I switch the timer to a clock() call or use CFAbsoluteTimeGetCurrent(), and whether I compare a 1000- or a 5000-element array.
EDIT:
To be clearer: I know that pasting a single run does not produce statistically meaningful results, but the problem is that I can see one approach taking significantly longer than the other, while the measurements tell me something different.
EDIT2:
It seems I am still not expressing my problem clearly enough, so I have created a bash script to show it.
I am using the time utility to check how long it takes to execute each command. Once again: I am only fooling around, I do not need statistically meaningful results. I am just wondering why the Swift timing code tells me something different from what I am experiencing.
Script:
#!/bin/bash
echo "swift sort.swift"
time swift sort.swift
echo "./cswift"
time ./csort
Result:
$ ./test.sh
swift sort.swift
Swift: 22651 ns
real 0m0.954s
user 0m0.845s
sys 0m0.098s
./cswift
Swift: 25388 ns
real 0m0.046s
user 0m0.033s
sys 0m0.008s
As you can see, the results from time show that one command takes more or less 10 times longer to execute than the other, while the Swift code reports that the two are more or less the same.

A couple of observations:
In terms of the best way to measure speed you can use Date or CFAbsoluteTimeGetCurrent, but you'll see that the documentation for those will warn you that
Repeated calls to this function do not guarantee monotonically increasing results.
This is effectively warning you that, in the unlikely event that there is an adjustment to the system's clock in the intervening period, the calculated elapsed time may not be entirely accurate.
It is advised to use mach_time if you need a great deal of accuracy when measuring elapsed time. That involves some annoying CPU-specific adjustments (see Technical Q&A 1398), but CACurrentMediaTime offers a simple alternative: it uses mach_time under the hood (so it does not suffer from this problem) and converts the result to seconds to make it easy to use.
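For example, a minimal sketch of timing the sort this way (reusing the sort(data:) function and data array from the question; CACurrentMediaTime comes from QuartzCore):
import QuartzCore   // CACurrentMediaTime() lives here

let start = CACurrentMediaTime()          // seconds, backed by mach time
sort(data: &data)
let elapsed = CACurrentMediaTime() - start
print("Swift:\t \(elapsed) s")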
The aforementioned notwithstanding, it seems that there is a more fundamental issue at play here: it looks like you're trying to reconcile a difference between two very different ways of running Swift code, namely:
swiftc hello.swift -o hello
time ./hello
and
time swift hello.swift
The former compiles hello.swift into a standalone executable. The latter loads the Swift REPL, which then effectively interprets the Swift code.
This has nothing to do with the "proper" way to measure time. The time to execute the pre-compiled version should always be faster than invoking swift and passing it a source file. Not only is there more overhead in invoking the latter, but the execution of the pre-compiled version is likely to be faster once execution starts, as well.
If you're really benchmarking the performance of these routines, you should not rely on a single sort of 5000 items. I'd suggest sorting millions of items, repeating this multiple times, and averaging the statistics. A single iteration of the sort is insufficient to draw any meaningful conclusions.
Bottom line, you need to decide whether you want to benchmark just the execution of the code or also include the overhead of starting the REPL.
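As a rough sketch of the repeat-and-average idea (again reusing the sort(data:) function and data array from the question):
import QuartzCore

let iterations = 10
var total = 0.0
for _ in 0..<iterations {
    var copy = data                         // sort a fresh copy each run
    let begin = CACurrentMediaTime()
    sort(data: &copy)
    total += CACurrentMediaTime() - begin
}
print("average over \(iterations) runs: \(total / Double(iterations)) s")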

Related

swiftc compile time is slower when using -O than when not using it

I have been studying the Swift compiler (swiftc).
I made a single Swift file containing sorting algorithms (radix, merge, quick, heap, ...), then compiled it with and without optimization flags (-O, -wmo) and checked the time using the -driver-time-compilation flag.
⬇️ result1 - without the optimization flag
⬇️ result2 - with the optimization flag
result1 took 0.3544 wall time (I assume wall time is the real elapsed time), while result2 took 0.9037 wall time.
I thought using the optimization flag should be faster than not using it.
Can you help me understand why this is?
I want to reduce compile time using only swiftc.
The times you are showing are compilation times, not execution times.
Optimizations take time and the compiler has to work harder to perform them, so it is completely normal that compilation takes longer when optimizing the code.
This is in general the intended behaviour. One small disadvantage can be a larger executable size, but that's generally not an issue.

Recursive Algorithm Is Much Slower in Playgrounds (1 minute) Than Xcode (0.1 seconds)

I have code to solve a sudoku board using a recursive algorithm.
The problem is that when this code is run in Xcode it solves the puzzle in 0.1 seconds, but when it is run in Playgrounds, where I need it, it takes almost a minute.
When run on an iPad it takes about 30 seconds, but that is still obviously nowhere near the time it takes in Xcode.
Any help or ideas would be appreciated, thank you.
A playground tries to capture the result of each of your operations and print it out (REPL style).
It is just slow and laggy by itself.
In Xcode you can compile your code with additional optimizations that speed it up a lot (e.g. Swift Beta performance: sorting arrays).
Source files in a playground's Sources folder compile as a separate module, so don't forget about public/open access modifiers.
To create source files, add them to the playground's Sources folder in Xcode.
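For instance, a helper placed in the Sources folder might look like this (the Solver type here is just an illustration, not the asker's actual code):
// Sources/Solver.swift — compiled as a separate, optimized module
public struct Solver {
    public init() {}

    // public (or open) is required for the playground page to see it
    public func solve(_ board: [[Int]]) -> [[Int]]? {
        // ... recursive solving code would go here ...
        return board
    }
}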

Can I force Swift code to utilize all cores?

Building on a previous question, I need a program that can take a percent between 0% and 100% and then utilize roughly that much of a machine's CPU, in order to test a service that triggers when a certain amount of CPU has been used. I have written some Swift code that can do this for a single core:
// workInterval is the fraction of CPU to use, between 0 (none) and 1 (all).
let workInterval: TimeInterval = <utilization>
let sleepInterval: UInt32 = UInt32((1 - workInterval) * 1_000_000)
let startDate = Date()
var sleepDate = Date()
while startDate.timeIntervalSinceNow > -<time> {
    if sleepDate.timeIntervalSinceNow < (workInterval * -1) {
        print("Sleep")
        usleep(sleepInterval)
        sleepDate = Date()
    }
}
For 60% utilization, it basically checks our if condition for 0.6 seconds, and then sleeps for 0.4 seconds, repeating. This works great for whatever individual core the code runs on, but I need to make this work on all cores on a machine. Is there any way to do this in Swift? Am I better off writing this code in another language and executing that script through Swift?
(Yes, this is a very ridiculous task I have been given.)
Most likely you can achieve what you want with a concurrent queue. Add one instance of your above code to the queue for each available core. Then each of those instances should run in parallel - one on each core.
Though you might need to run one on the main queue and then run "cores - 1" instances on the concurrent queue.
But in the end you don't have any control over how the cores are utilized. The above relies on the runtime making good use of the available cores for you.
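A minimal sketch of that suggestion, assuming the busy loop from the question is wrapped in a hypothetical burnCPU(utilization:duration:) function:
import Foundation
import Dispatch

func burnCPU(utilization: Double, duration: TimeInterval) {
    let sleepInterval = UInt32((1 - utilization) * 1_000_000)
    let startDate = Date()
    var sleepDate = Date()
    while startDate.timeIntervalSinceNow > -duration {
        if sleepDate.timeIntervalSinceNow < -utilization {
            usleep(sleepInterval)
            sleepDate = Date()
        }
    }
}

let coreCount = ProcessInfo.processInfo.activeProcessorCount
let queue = DispatchQueue(label: "cpu-load", attributes: .concurrent)
let group = DispatchGroup()

for _ in 0..<coreCount {
    queue.async(group: group) {
        burnCPU(utilization: 0.6, duration: 10)   // one busy loop per core
    }
}
group.wait()   // block until every worker has finished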

Performance of immutable set implementations in Scala

I have recently been diving into Scala and (perhaps predictably) have spent quite a bit of time studying the immutable collections API in the Scala standard library.
I am writing an application that necessarily does many +/- operations on large sets. For this reason, I want to ensure that the implementation I choose is a so-called "persistent" data structure so that I avoid doing copy-on-write. I saw this answer by Martin Odersky, but it didn't really clear up the issue for me.
I wrote the following test code to compare the performance of ListSet and HashSet for add operations:
import scala.collection.immutable._

object TestListSet extends App {
  var set = new ListSet[Int]
  for (i <- 0 to 100000) {
    set += i
  }
}

object TestHashSet extends App {
  var set = new HashSet[Int]
  for (i <- 0 to 100000) {
    set += i
  }
}
Here is a rough runtime measurement of the HashSet:
$ time scala TestHashSet
real 0m0.955s
user 0m1.192s
sys 0m0.147s
And ListSet:
$ time scala TestListSet
real 0m30.516s
user 0m30.612s
sys 0m0.168s
Cons on a singly linked list is a constant-time operation, but this performance looks linear or worse. Is this performance hit related to the need to check each element of the set for object equality to conform to the no-duplicates invariant of Set? If this is the case, I realize it's not related to "persistence".
As for official documentation, all I could find was the following page, but it seems incomplete: Scala 2.8 Collections API -- Performance Characteristics. Since ListSet seems initially to be a good choice for its memory footprint, maybe there should be some information about its performance in the API docs.
An old question but also a good example of conclusions being drawn on the wrong foundation.
Connor, basically you're trying to do a microbenchmark. That is generally not recommended and damn hard to do properly.
Why? Because the JVM is doing many other things than executing the code in your examples. It's loading classes, doing garbage collection, compiling bytecode to native code, etc. All dynamically and based on different metrics sampled at runtime.
So you cannot conclude anything about the performance of the two collections with the above test code. For example, what you could actually be measuring could be the compilation time of the += method of HashSet and garbage collection times of ListSet. So it's a comparison between apples and pears.
To do a micro benchmark properly, you should:
Warm up the JVM: Load all classes, ensure all code paths in the benchmark are run and hot spots in the code are compiled (e.g. the += method).
Run the benchmark and ensure neither the GC nor the compiler runs during the test (use the JVM flags -XX:+PrintCompilation and -XX:+PrintGC to see when they do). If either runs during the test, discard the result.
Repeat step 2 and sample 10-15 good measurements. Calculate variance and standard deviation.
Evaluate: If the mean of each benchmark +/- 3 std do not overlap, then you can draw a conclusion about which is faster. Otherwise, it's a blurry result (depending on the amount of overlap).
I can recommend reading Oracle's recommendations for doing micro benchmarks and a great article about benchmark pitfalls by Brian Goetz.
Also, if you want to use a good tool, which does all the above for you, try Caliper by Google.
The key line from the ListSet source is (within subclass Node):
override def +(e: A): ListSet[A] = if (contains(e)) this else new Node(e)
where you can see that an item is added only if it is not already contained. So adding to the set is O(n). You can generally assume that XMap has similar performance characteristics to XSet, and ListMap is listed as linear time all the way along. So that's why it is slow, and it's how a set is supposed to behave.
P.S. In the TestHashSet case you're measuring startup time. It's way more than 30x faster.
Since a set can contain no duplicates, before adding an element a Set must check whether it already contains that element. This search in a list, which gives no guarantee of an element's position, takes O(N) linear time. The same general idea applies to its remove operation.
With a HashSet, the class defines a function that picks a location for any element in O(1), which makes the contains(element) method much quicker, at the expense of taking up more space to decrease the chance of element location collisions.
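The same linear-scan vs. hash-lookup trade-off shows up in any language; as a rough illustration in Swift (not the Scala collections discussed above):
import Foundation

let values = Array(0..<100_000)
let asArray = values            // contains(_:) is a linear scan, O(n)
let asSet = Set(values)         // contains(_:) is a hash lookup, O(1) on average

let t0 = Date()
_ = asArray.contains(99_999)    // worst case: scans every element
print("array contains: \(Date().timeIntervalSince(t0)) s")

let t1 = Date()
_ = asSet.contains(99_999)      // a single hash lookup
print("set contains:   \(Date().timeIntervalSince(t1)) s")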

Genetic Algorithm optimization - using the -O3 flag

Working on a problem that requires a GA. I have all of that working and spent quite a bit of time trimming the fat and optimizing the code before resorting to compiler optimization. Because the GA runs as a result of user input it has to find a solution within a reasonable period of time, otherwise the UI stalls and it just won't play well at all. I got this binary GA solving a 27 variable problem in about 0.1s on an iPhone 3GS.
In order to achieve this level of performance the entire GA was coded in C, not Objective-C.
In a search for a further reduction of run time I was considering the idea of using the "-O3" optimization switch only for the solver module. I tried it and it cut run time nearly in half.
Should I be concerned about any gotchas from setting optimization to -O3? Keep in mind that I am doing this at the file level and not for the entire project.
The -O3 flag will make the code work the same way as before (only faster), as long as you don't do any tricky stuff that is unsafe or otherwise dependent on what the compiler does to it.
Also, as suggested in the comments, it might be a good idea to let the computation run in a separate thread to prevent the UI from locking up. That also gives you the flexibility to make the computation more expensive, or to display a progress bar, or whatever.
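As a rough illustration of that suggestion (shown in Swift rather than the question's Objective-C/C, with a hypothetical runGA() standing in for the solver):
import Dispatch

// Stand-in for the real C solver from the question.
func runGA() -> [Int] { return [] }

DispatchQueue.global(qos: .userInitiated).async {
    let solution = runGA()                 // heavy work off the main thread
    DispatchQueue.main.async {
        // Back on the main thread: safe to update the UI here.
        print("Found a solution with \(solution.count) genes")
    }
}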
Tricky stuff
Optimisation will produce unexpected results if you try to access stuff on the stack directly, or move the stack pointer somewhere else, or if you do something inherently illegal, like forgetting to initialise a variable (some compilers (MinGW) will set them to 0).
E.g.
int main() {
    int array[10];
    array[-2] = 13; // some negative value might get the return address
}
Some other tricky stuff involves the optimiser stuffing up by itself. Here's an example of when -O3 completely breaks the code.