Why is while loop much slower than for loop in Swift? - swift

I'm trying to evaluate the performance of these two loop constructs. I iterated over the numbers from 0 to 99999 using a for-in loop and a while loop.
for i in 0..<s.count - 9 {
    print("\(i)")
}
var j = 0
while j < s.count - 9 {
    print("\(j)")
    j = j+1
}
Both loops print the current number and increment it by 1 until it reaches 99999.
It turns out that the for-in loop takes 0.91 to go through every number, while the while loop takes much longer (around 80.8).
I searched the Internet and the documentation, but cannot figure out why.
What causes this huge performance difference?
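One difference between the two loops can be checked directly: the upper bound of a for-in range is evaluated once, when the range is built, while the condition of a while loop is re-evaluated before every iteration. The sketch below makes that visible. It assumes a hypothetical bound() function standing in for s.count - 9 and simply counts how often it is called. If s is a String (an assumption, the question does not say), count is not a constant-time operation, so re-evaluating it on every pass could plausibly account for part of the gap.

```swift
// Counts how often the loop bound is evaluated by each loop form.
func countBoundEvaluations(limit: Int) -> (forIn: Int, whileLoop: Int) {
    var evaluations = 0

    // Hypothetical stand-in for `s.count - 9`.
    func bound() -> Int {
        evaluations += 1
        return limit
    }

    // for-in: the range 0..<bound() is built once, before the first pass.
    for _ in 0..<bound() { }
    let inFor = evaluations

    // while: the condition, including bound(), runs before every pass.
    evaluations = 0
    var j = 0
    while j < bound() {
        j += 1
    }
    return (inFor, evaluations)
}

let result = countBoundEvaluations(limit: 1_000)
print(result.forIn, result.whileLoop) // prints "1 1001"
```

Hoisting the bound into a constant before the while loop (let end = s.count - 9) makes both loops evaluate it the same number of times, which is a fairer basis for a comparison.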

Related

"Appending" to an ArraySlice?

Say ...
you have about 20 Things
very often, you do a complex calculation running through a loop of say 1000 items. The end result is a varying number around 20 each time
you don't know how many there will be until you run through the whole loop
you then want to quickly (and of course elegantly!) access the result set in many places
for performance reasons you don't want to just make a new array each time. note that unfortunately there's a differing amount so you can't just reuse the same array trivially.
What about ...
var thingsBacking = [Thing](repeating: Thing(), count: 100) // hard limit!
var things: ArraySlice<Thing> = []
func fatCalculation() {
    var pin: Int = 0
    // happily, no need to clean-out thingsBacking
    for c in .. some huge loop {
        ... only some of the items (roughly 20 say) become the result
        x = .. one of the result items
        thingsBacking[pin] = Thing(... x, y, z )
        pin += 1
    }
    // and then, magic of slices ...
    things = thingsBacking[0..<pin]
}
(Then, you can do this anywhere... for t in things { .. } )
What I am wondering, is there a way you can call to an ArraySlice<Thing> to do that in one step - to "append to" an ArraySlice and avoid having to bother setting the length at the end?
So, something like this ..
things = ... set it to zero length
things.quasiAppend(x)
things.quasiAppend(x2)
things.quasiAppend(x3)
With no further effort, things now has a length of three and indeed the three items are already in the backing array.
I'm particularly interested in performance here (unusually!)
Another approach,
var thingsBacking = [Thing?](repeating: Thing(), count: 100) // hard limit!
and just set the first one after your data to nil as an end-marker. Again, you don't have to waste time zeroing. But the end marker is a nuisance.
Is there a better way to solve this particular type of array-performance problem?
Based on MartinR's comments, it would seem that for the problem
the data points are incoming and
you don't know how many there will be until the last one (always less than a limit) and
you're having to redo the whole thing at high Hz
It would seem to be best to just:
(1) set up the array
var ra = [Thing](repeating: Thing(), count: 100) // hard limit!
(2) at the start of each run,
.removeAll(keepingCapacity: true)
(3) just go ahead and .append each one.
(4) you don't have to especially mark the end or set a length once finished.
It seems it will indeed then use the same array backing. And it of course "increases the length" as it were each time you append - and you can iterate happily at any time.
Slices - get lost!
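The four steps above can be sketched roughly as follows. This is a minimal illustration rather than the poster's actual code: Thing here is a hypothetical one-field struct, ResultBuffer is an invented wrapper, and the x % 50 == 0 filter is a placeholder for the real calculation. It uses reserveCapacity(_:) to set up the storage once, which is one way to do step (1).

```swift
// Hypothetical element type standing in for the question's `Thing`.
struct Thing {
    var value: Int
}

// Hypothetical wrapper that reuses one array across runs.
struct ResultBuffer {
    private(set) var things: [Thing] = []

    init(limit: Int) {
        things.reserveCapacity(limit) // allocate once, up front
    }

    // One run: drop the old results but keep the storage, then append.
    mutating func recalculate(from input: [Int]) {
        things.removeAll(keepingCapacity: true)
        for x in input where x % 50 == 0 { // placeholder for the real filter
            things.append(Thing(value: x))
        }
    }
}

var buffer = ResultBuffer(limit: 100)
buffer.recalculate(from: Array(0..<1000))
let firstCount = buffer.things.count   // 20 results in this run
buffer.recalculate(from: Array(0..<500))
let secondCount = buffer.things.count  // 10 results in this run
print(firstCount, secondCount, buffer.things.capacity >= 100)
```

The count always reflects the current run, so there is no end marker and no length to set afterwards.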

Merge Sort algorithm efficiency

I am currently taking an online algorithms course in which the teacher doesn't give code to solve the algorithm, but rather rough pseudo code. So before taking to the internet for the answer, I decided to take a stab at it myself.
In this case, the algorithm that we were looking at is merge sort algorithm. After being given the pseudo code we also dove into analyzing the algorithm for run times against n number of items in an array. After a quick analysis, the teacher arrived at 6nlog(base2)(n) + 6n as an approximate run time for the algorithm.
The pseudo code given was for the merge portion of the algorithm only and was given as follows:
C = output [length = n]
A = 1st sorted array [n/2]
B = 2nd sorted array [n/2]
i = 1
j = 1
for k = 1 to n
    if A(i) < B(j)
        C(k) = A(i)
        i++
    else [B(j) < A(i)]
        C(k) = B(j)
        j++
    end
end
He basically did a breakdown of the above taking 4n+2 (2 for the declarations i and j, and 4 for the number of operations performed -- the for, if, array position assignment, and iteration). He simplified this, I believe for the sake of the class, to 6n.
This all makes sense to me. My question arises from the implementation that I am writing, how it affects the algorithm, and some of the tradeoffs/inefficiencies it may add.
Below is my code in swift using a playground:
func mergeSort<T: Comparable>(_ array: [T]) -> [T] {
    guard array.count > 1 else { return array }
    let lowerHalfArray = array[0..<array.count / 2]
    let upperHalfArray = array[array.count / 2..<array.count]
    // The parameter is unlabeled, so the recursive calls must not use `array:`
    let lowerSortedArray = mergeSort(Array(lowerHalfArray))
    let upperSortedArray = mergeSort(Array(upperHalfArray))
    return merge(lhs: lowerSortedArray, rhs: upperSortedArray)
}
func merge<T: Comparable>(lhs: [T], rhs: [T]) -> [T] {
    guard lhs.count > 0 else { return rhs }
    guard rhs.count > 0 else { return lhs }
    var i = 0
    var j = 0
    var mergedArray = [T]()
    let loopCount = (lhs.count + rhs.count)
    for _ in 0..<loopCount {
        // Take from lhs when rhs is exhausted, or when lhs still has the smaller element
        if j == rhs.count || (i < lhs.count && lhs[i] < rhs[j]) {
            mergedArray.append(lhs[i])
            i += 1
        } else {
            mergedArray.append(rhs[j])
            j += 1
        }
    }
    return mergedArray
}
let values = [5,4,8,7,6,3,1,2,9]
let sortedValues = mergeSort(values)
My questions for this are as follows:
Do the guard statements at the start of the merge<T:Comparable> function actually make it more inefficient? Considering we are always halving the array, the only time that it will hold true is for the base case and when there is an odd number of items in the array.
This to me seems like it would actually add more processing and give minimal return since the time that it happens is when we have halved the array to the point where one has no items.
Concerning my if statement in the merge: since it is checking more than one condition, does this affect the overall efficiency of the algorithm that I have written? If so, the effect seems to vary based on where it breaks out of the if statement (e.g. at the first condition or the second).
Is this something that is considered heavily when analyzing algorithms, and if so how do you account for the variance when it breaks out from the algorithm?
Any other analysis/tips you can give me on what I have written would be greatly appreciated.
You will very soon learn about Big-O and Big-Theta where you don't care about exact runtimes (believe me when I say very soon, like in a lecture or two). Until then, this is what you need to know:
Yes, the guards take some time, but it is the same amount of time in every call. So if each call takes X amount of time without the guards and you do n function calls, then it takes X*n amount of time in total. Now add in the guards, which take Y amount of time in each call. You now need (X+Y)*n time in total. This is a constant factor, and when n becomes very large the (X+Y) factor becomes negligible compared to the n factor. That is, if you can reduce a function X*n to (X+Y)*(log n), then it is worthwhile to add the Y amount of work because you do fewer iterations in total.
The same reasoning applies to your second question. Yes, checking "if X or Y" takes more time than checking "if X" but it is a constant factor. The extra time does not vary with the size of n.
In some languages you only check the second condition if the first fails. How do we account for that? The simplest solution is to realize that the upper bound of the number of comparisons will be 3, while the number of iterations can be potentially millions with a large n. But 3 is a constant number, so it adds at most a constant amount of work per iteration. You can go into nitty-gritty details and try to reason about the distribution of how often the first, second and third condition will be true or false, but often you don't really want to go down that road. Pretend that you always do all the comparisons.
So yes, adding the guards might be bad for your runtime if you do the same number of iterations as before. But sometimes adding extra work in each iteration can decrease the number of iterations needed.
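To make the "at most a constant amount of work per iteration" point concrete, here is a small sketch that counts every condition evaluated in the merge loop. It is specialised to Int for brevity, and countedMerge and its check helper are hypothetical names introduced only for this illustration. Because || and && short-circuit, each pass evaluates between one and three conditions, so the total never exceeds 3 times the number of merged elements.

```swift
// Merge two sorted arrays while counting every condition that is evaluated.
func countedMerge(_ lhs: [Int], _ rhs: [Int]) -> (merged: [Int], checks: Int) {
    var i = 0
    var j = 0
    var checks = 0
    var merged: [Int] = []
    merged.reserveCapacity(lhs.count + rhs.count)

    // Hypothetical helper: passes a condition through and counts it.
    func check(_ condition: Bool) -> Bool {
        checks += 1
        return condition
    }

    for _ in 0..<(lhs.count + rhs.count) {
        // || and && short-circuit, so between 1 and 3 checks run per pass.
        if check(j == rhs.count) || (check(i < lhs.count) && check(lhs[i] < rhs[j])) {
            merged.append(lhs[i])
            i += 1
        } else {
            merged.append(rhs[j])
            j += 1
        }
    }
    return (merged, checks)
}

let sample = countedMerge([1, 3, 5], [2, 4, 6])
print(sample.merged, sample.checks <= 3 * sample.merged.count)
```

How many of the three conditions run on a given pass depends on the data, but the upper bound does not, which is why the analysis can pretend all of them always run.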

What does for (i = 1 ; i <= power ; i += 1) { mean?

Sorry for pretty basic question.
Just starting out. Using flowgorithm to write code that gives out a calculation of exponential numbers.
The code to do it is:
function exponential(base, power) {
    var answer;
    answer = 1;
    var i;
    for (i = 1 ; i <= power ; i += 1) {
        answer = answer * base;
    }
    return answer;
}
Then it loops up to the value of power. I understand that in the flowgorithm chart, but I don't understand the code for it.
What does each section of the for statement mean?
i = 1 to power: I just need help understanding how it is written. What is the i += 1 bit?
Thanks.
The exponential function will take in 2 parameters, base and power.
You can create this function and call (fire) it when ever it is needed like so exponential(2,4).
The for (i = 1; i <= power; i += 1) is a somewhat ugly for loop.
for loops traditionally take three parts. The first part, in this case i = 1, is the initialization. The next one, i <= power, is the condition: if we call the function like so, exponential(2,4), it asks whether i is less than or equal to 4. The last part is the increment/decrement, but it isn't executed until after the code inside the for loop has run. Once the body has executed, i adds 1 to itself, so it is now 2. This is useful because once i is no longer less than or equal to power, the loop exits. So in the case of exponential(2,4), once the code inside the for loop has run 4 times the loop exits, because i is then 5 and 5 > 4.
So if we look at the variable answer, we can see that before the for loop started, answer was equal to 1. After the first iteration, answer = answer times base. In the case of exponential(2,4), answer equals 1 times 2, so now answer = 2. But we have only been through the for loop once, and as I said, a for loop goes: initialization, condition, code inside the for loop, then back up to increment/decrement. Since we go through this loop 4 times in the case of exponential(2,4), it looks like this:
exponential(2,4)
answer = 1 * 2
now answer = 2
answer = 2 * 2
now answer = 4
answer = 4 * 2
now answer = 8
answer = 8 * 2
now answer = 16
So if we wrote var ans = exponential(2,4)
then ans would equal 16, hence the return answer; on the last line of your code.
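Another way to see what the three parts of the for statement do is to write the same loop as a while loop, where each part sits on its own line. The sketch below does that in Swift (which has no C-style for loop). It is a translation for illustration, not the code flowgorithm generates.

```swift
func exponential(_ base: Int, _ power: Int) -> Int {
    var answer = 1
    var i = 1                   // initialization: runs once, before the loop
    while i <= power {          // condition: checked before every pass
        answer = answer * base  // body
        i += 1                  // increment: runs after the body, every pass
    }
    return answer
}

print(exponential(2, 4)) // prints "16"
```

For exponential(2, 4) the body runs for i = 1, 2, 3 and 4. When i becomes 5 the condition fails and the function returns 16.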

arc4random() and arc4random_uniform() not really random?

I have been using arc4random() and arc4random_uniform() and I always had the feeling that they weren't exactly random. For example, I was randomly choosing values from an Array, but often the values that came out were the same when I generated them multiple times in a row. So today I used an Xcode playground to see how these functions behave. I first tested arc4random_uniform to generate a number between 0 and 4, using this code:
import Cocoa
var number = 0
for i in 1...20 {
    number = Int(arc4random_uniform(5))
}
I ran it several times, and here is how the values evolve most of the time:
As you can see, the values increase and decrease repeatedly, and once the values are at the maximum/minimum, they often stay there for a while (see the first screenshot at the 5th step: the value stays at 3 for 6 steps). The problem is that this isn't at all unusual; the function actually behaves this way most of the time in my tests.
Now, if we look at arc4random(), it's basically the same :
So here are my questions :
Why is this function behaving in this way ?
How to make it more random ?
Thank you.
EDIT :
Finally, I ran two experiments that were surprising, the first one with a real die:
What surprised me is that I wouldn't have said it was random, since I was seeing the same sort of pattern that I described as non-random for arc4random() & arc4random_uniform(). So, as Jean-Baptiste Yunès pointed out, humans aren't good at seeing whether a sequence of numbers is really random.
I also wanted to do a more "scientific" experiment, so I wrote this code:
import Foundation
var appeared = [0,0,0,0,0,0,0,0,0,0,0]
let numberOfGenerations = 1000
for _ in 1...numberOfGenerations {
    let randomNumber = Int(arc4random_uniform(11))
    appeared[randomNumber] += 1
}
for (number, numberOfTimes) in appeared.enumerated() {
    // Share of all generations, as a percentage
    print("\(number) appeared \(numberOfTimes) times (\(Double(numberOfTimes) / Double(numberOfGenerations) * 100)%)")
}
This shows how many times each number appeared, and the numbers are indeed randomly generated. For example, here is one output from the console:
0 appeared 99 times.
1 appeared 97 times.
2 appeared 78 times.
3 appeared 80 times.
4 appeared 87 times.
5 appeared 107 times.
6 appeared 86 times.
7 appeared 97 times.
8 appeared 100 times.
9 appeared 91 times.
10 appeared 78 times.
So it's definitely OK 😊
EDIT #2 : I repeated the dice experiment with more rolls, and it's still as surprising to me :
A truly random sequence of numbers cannot be generated by an algorithm. An algorithm can only produce a pseudo-random sequence of numbers (something that looks like a random sequence). So depending on the algorithm chosen, the quality of the "randomness" may vary. The sequences produced by arc4random() are generally considered to have good randomness.
You cannot analyze the randomness of a sequence visually... Humans are very bad at detecting randomness! They tend to find structure where there is none. Nothing really stands out in your diagrams (except the rare subsequence of six threes in a row, but that is randomness: sometimes unusual things happen). You would be surprised if you used a die to generate a sequence and drew its graph. Beware that a sample of only 20 numbers cannot be seriously analyzed for randomness; you need much bigger samples.
If you need some other kind of randomness, you can try the /dev/random pseudo-file, which generates random data each time you read from it. The sequence is generated by a mix of algorithms and external physical events that may happen in your computer.
It depends on what you mean when you say random.
As stated in the comments, true randomness is clumpy. Long strings of repeats or close values are expected.
If this doesn't fit your requirement, then you need to better define your requirement.
Other options could include using a shuffle algorithm to disorder the items in an array, or using a low-discrepancy sequence algorithm to give an even distribution of values.
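The "clumpy" point can be checked rather than eyeballed. The sketch below measures the longest run of identical consecutive values in a batch of simulated die rolls. longestRun(in:) is a hypothetical helper written for this illustration, and Int.random(in:) from the Swift standard library is used in place of arc4random_uniform so the example is self-contained.

```swift
// Length of the longest run of equal consecutive values.
func longestRun(in values: [Int]) -> Int {
    var best = 0
    var current = 0
    var previous: Int? = nil
    for value in values {
        // Extend the current run, or start a new run of length 1.
        current = (value == previous) ? current + 1 : 1
        best = max(best, current)
        previous = value
    }
    return best
}

print(longestRun(in: [3, 3, 3, 1, 2, 2])) // prints "3"

// 1000 simulated die rolls; the result varies from run to run.
let rolls = (0..<1000).map { _ in Int.random(in: 1...6) }
print("longest run in 1000 rolls:", longestRun(in: rolls))
```

Running this a few times gives a feel for how long repeats get in a uniform sequence before deciding that a generator is misbehaving.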
I don’t really agree with the idea that humans are very bad at detecting randomness.
Would you be satisfied if you obtained 1-1-2-2-3-3-4-4-5-5-6-6 after throwing 6 pairs of dice? Yet the dice frequencies are perfect…
This is exactly the problem I’m encountering with the arc4random and arc4random_uniform functions.
For many years I have been developing a backgammon application based on a neural network trained by world champion players. I DO know that it plays much better than anyone, but many users think it is cheating. I also have doubts sometimes, so I decided to throw all the dice myself…
I’m not satisfied at all with arc4random, even if the frequencies are OK.
I always throw a pair of dice, and the results lead to unacceptable situations, for example: getting five consecutive doubles for the same player, or waiting 12 turns (24 dice) until the first 6 occurs.
It is easy to test (Objective-C code) :
void randomDices ( int * dice1, int * dice2, int player )
{
    ( * dice1 ) = arc4random_uniform ( 6 ) ;
    ( * dice2 ) = arc4random_uniform ( 6 ) ;
    // Add to your statistics
    [self didRandomDice1:( * dice1 ) dice2:( * dice2 ) forPlayer:player] ;
}
Maybe arc4random doesn’t like to be called twice during a short time…
So I tried several solutions and finally chose this code, which runs a second level of randomization after arc4random_uniform :
int CFRandomDice ()
{
    int __result = -1 ;
    BOOL __found = NO ;
    while ( ! __found )
    {
        // random int big enough but not too big
        int __bigint = arc4random_uniform ( 10000 ) ;
        // Searching for the first character between '1' and '6'
        // in the string version of bigint :
        NSString * __bigString = @( __bigint ).stringValue ;
        NSInteger __nbcar = __bigString.length ;
        NSInteger __i = 0 ;
        while ( ( __i < __nbcar ) && ( ! __found ) )
        {
            unichar __ch = [__bigString characterAtIndex:__i] ;
            if ( ( __ch >= '1' ) && ( __ch <= '6' ) )
            {
                __found = YES ;
                __result = __ch - '1' + 1 ;
            }
            else
            {
                __i++ ;
            }
        }
    }
    return ( __result ) ;
}
This code creates a random number with arc4random_uniform ( 10000 ), converts it to a string and then searches for the first digit between ‘1’ and ‘6’ in the string.
This seemed to me a very good way to randomize the dice because :
1/ Frequencies are OK (see the statistics below) ;
2/ Exceptional dice sequences occur at exceptional times.
10000 dices test:
----------
Game Stats
----------
HIM :
Total 1 = 3297
Total 2 = 3378
Total 3 = 3303
Total 4 = 3365
Total 5 = 3386
Total 6 = 3271
----------
ME :
Total 1 = 3316
Total 2 = 3289
Total 3 = 3282
Total 4 = 3467
Total 5 = 3236
Total 6 = 3410
----------
HIM doubles = 1623
ME doubles = 1648
Now I’m sure that players won’t complain…

Problem looking at data between 0 and -1

I'm trying to write a program that cleans data, using Matlab. This program takes in the max and min that the data can be, and throws out data that is less than the min or greater than the max. There seems to be a small issue with the cleaning part. It ONLY happens when the minimum of the range for the variable being checked is 0. In that case, for one reason or another, the program won't throw away data points that are between 0 and -1. I've been trying to fix this for some time now, and noticed that this is the only case where it happens. Also, if you run a SQL query selecting data that is < 0, it leaves out data between 0 and -1, so effectively the same error as what's happening to me. I'm wondering if anyone might recognize this and know what it could be.
I would write such a function as:
function data = cleanseData(data, limits)
    limits = sort(limits);
    data = data( limits(1) <= data & data <= limits(2) );
end
an example usage:
a = rand(100,1)*10;
b = cleanseData(a, [-2 5]);
c = cleanseData(a, [0 -1]);
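For readers more at home in Swift than Matlab, the same idea can be sketched as below: order the two limits first, then keep only the values inside the closed range. This is a rough port for comparison, with the tuple-based limits parameter chosen for this illustration.

```swift
// Keep only the values that lie between the two limits (inclusive),
// whichever order the limits are given in.
func cleanseData(_ data: [Double], limits: (Double, Double)) -> [Double] {
    let lower = min(limits.0, limits.1)
    let upper = max(limits.0, limits.1)
    return data.filter { (lower...upper).contains($0) }
}

let cleaned = cleanseData([-3, -2, -0.5, 0, 4.9, 5, 7], limits: (0, -1))
print(cleaned) // prints "[-0.5, 0.0]"
```

As in the Matlab version, passing the limits as (0, -1) keeps the points between -1 and 0, because the limits are sorted before the comparison.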
-1 is less than 0, so 0 should be the max value. And if this is the case it will keep points between -1 and 0 by your definition of the cleaning operation:
and throws out data that is less than the min or greater than the max.
If you want to throw away (using the above definition)
data points that are between 0 and -1
then you need to set 0 as the min value and -1 as the max value --- which does not make sense.
Also, I think you mean
and throws out data that is less than the min AND greater than the max.
It may be that the floats are being cast to ints before the comparison. I don't know Matlab, but in Python int(-0.5)==0, which could explain the extra data points getting in. You can test this by setting the min to -1: if you then also get values between -1 and -2, you'll need to make sure no casting is being done.
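The truncation effect described here is easy to reproduce. The sketch below uses Swift, where Int(_:) on a Double also truncates toward zero, to show how a minimum of 0 lets values between -1 and 0 slip through once they are converted to integers. Whether the original Matlab program actually does such a conversion is an assumption, and the sample values are made up.

```swift
let samples: [Double] = [-1.5, -0.8, -0.2, 0.0, 0.7]
let minimum = 0

// Buggy: converting to Int truncates toward zero, so -0.8 and -0.2 become 0
// and pass the check.
let keptAfterTruncation = samples.filter { Int($0) >= minimum }

// Intended: compare as floating point, without converting the data.
let keptAsDouble = samples.filter { $0 >= Double(minimum) }

print(keptAfterTruncation) // prints "[-0.8, -0.2, 0.0, 0.7]"
print(keptAsDouble)        // prints "[0.0, 0.7]"
```

With a minimum of -1 the same bug would let values between -2 and -1 through, which is the test suggested above.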
If I try to mimic your situation with SQL, and run the following query against a table that has 1.00, 0.00, -0.20, -0.80, -1.00, -1.20 and -2.00 in the column SomeVal, it correctly returns -0.20 and -0.80, as expected.
SELECT SomeVal
FROM SomeTable
WHERE (SomeVal < 0) AND (SomeVal > - 1)
The same is true for Matlab. Perhaps there's an error in your code. Check the above statement against your own SELECT statement to see if something's amiss.
I can imagine such a bug if you do something like
minimum = 0
if minimum and value < minimum