Assignment vs "if not equal, then assign" (Swift being the language I care most about)

I have some code that will get run every so often. As far as performance goes, is there any difference between the following two statements, and if so, which one is faster?
num = 4
vs
if num != 4 {
num = 4
}
I understand the difference is probably minimal, but I have had this question run through my mind on occasion. I would also be interested in the closely related question of how this plays out with a Bool or String instead of an Int.

The first one is faster for sure, because it compiles to a single store instruction. The second one needs at least a comparison and a conditional jump, and possibly the store as well.
Assuming we have this code:
var num = 0
num = 4
Here are the important lines of the assembly (swiftc -emit-assembly):
movq $0, __Tv4test3numSi(%rip) // Assigns 0 to the variable
movq $4, __Tv4test3numSi(%rip) // Assigns 4 to the variable
As you can see, a single instruction is needed.
And with this code:
var num = 0
if num != 4 {
num = 4
}
Assembly:
movq $0, __Tv4test3numSi(%rip) // Assign 0 to the variable
cmpq $4, __Tv4test3numSi(%rip) // Compare variable with 4
je LBB0_4 // If comparison was equal, jump to LBB0_4
movq $4, __Tv4test3numSi(%rip) // Otherwise set variable to 4
LBB0_4:
xorl %eax, %eax // Zeroes the eax register (the return value of main)
As you can see, the second one executes either three instructions (when the variable already equals 4) or four (when it doesn't).
It was clear to me from the beginning, but the assembly nicely demonstrates that the second one can't be faster.

It's faster for me to read "num = 4". About three times faster. That's what I care about. You are attempting a microoptimisation of very dubious value, so I would be really worried if I saw someone writing that kind of code.

I believe this check
if num != 4 { ... }
is faster than an assignment
num = 4
Scenario #1
So if, most of the time, num is already equal to 4, you should skip the useless assignment and use this code:
if num != 4 {
num = 4
}
Scenario #2
On the other hand, if most of the time num is different from 4, you can remove the check and just go with the assignment:
num = 4
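Worth noting: the guarded form can genuinely pay off when the assignment itself has side effects, e.g. a Swift didSet observer or a KVO notification that fires on every write. A minimal sketch of that situation in Python (the Observed class here is hypothetical, just standing in for an observed property):

```python
class Observed:
    """Value wrapper whose setter fires an 'observer' on every write."""
    def __init__(self, value):
        self._value = value
        self.notifications = 0  # counts how often the observer fired

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        self._value = new
        self.notifications += 1  # stands in for an expensive side effect

obj = Observed(4)
obj.value = 4            # unconditional write: observer fires anyway
if obj.value != 4:       # guarded write: store (and observer) are skipped
    obj.value = 4
print(obj.notifications)  # only the unconditional write fired it
```

For a plain integer store with no observers, though, the guard buys nothing, as the assembly above shows.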

Related

Dot product in Scala without heap allocation

I have a Scala project with some intensive arithmetics, and it sometimes allocates Floats faster than the GC can clean them up. (This is not about memory leaks caused by retained references, just fast memory consumption for temporary values.) I try to use Arrays with primitive types, and reuse them when I can, but still some new allocations sneak in.
One piece that puzzles me, for instance:
import org.specs2.mutable.Specification
class CalcTest extends Specification {
def dot(a: Array[Float], b: Array[Float]): Float = {
require(a.length == b.length, "array size mismatch")
val n = a.length
var sum: Float = 0f
var i = 0
while (i < n) {
sum += a(i) * b(i)
i += 1
}
sum
}
val vector = Array.tabulate(1000)(_.toFloat)
"calculation" should {
"use memory sparingly" >> {
val before = Runtime.getRuntime().freeMemory()
for (i <- 0 to 1000000)
dot(vector, vector)
val after = Runtime.getRuntime().freeMemory()
(before - after) must be_<(1000L) // actual result above 4M
}
}
}
I would expect it to compute the dot products using only stack memory, but apparently it allocates about 4 bytes per call on the heap. This may not sound like much, but it adds up quickly in my code.
I was suspecting the sum, but from the bytecode output, looks like it is on the stack:
aload 1
arraylength
istore 3
fconst_0
fstore 4
iconst_0
istore 5
l2
iload 5
iload 3
if_icmpge l3
fload 4
aload 1
iload 5
faload
aload 2
iload 5
faload
fmul
fadd
fstore 4
iload 5
iconst_1
iadd
istore 5
goto l2
l3
fload 4
freturn
Is it the return value that goes on the heap? Is there any way to avoid this overhead entirely? Is there a better way to investigate and solve such memory problems?
From the VisualVM output for my project, I can only see that I have an awful lot of Floats allocated. It is hard to track small, rapidly allocated objects like that there; VisualVM is more useful for large objects and memory snapshots taken at long intervals.
Update:
I was so focused on the function code, I missed the problem in the test. If I rewrite it with a while loop, it succeeds:
var i = 0
while (i < 1000000) {
dot(vector, vector)
i += 1
}
I would still appreciate more ideas for other ways to debug this sort of issue, in addition to tests like this and VisualVM memory snapshots.
The Range implementation in
for (i <- 0 to 1000000)
dot(vector, vector)
might allocate some memory, or simply be slow enough to let the JVM allocate something else in the background and break the fragile measurement method used in the test.
Try to modify these lines into a while loop, for example.
(The original version of this post said that for() was equivalent to map(), which was wrong. It is equivalent to foreach() here because it does not have a yield clause.)
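As for other ways to debug such issues: an allocation profiler that attributes allocations to source lines beats before/after free-memory diffing for small, short-lived objects. On the JVM that would be something like Java Flight Recorder's allocation events; the same idea in Python, using the standard library's tracemalloc (illustrative analogue, not JVM tooling):

```python
import tracemalloc

def dot(a, b):
    """Plain dot product over two equal-length lists of floats."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

v = [float(i) for i in range(1000)]

tracemalloc.start()
for _ in range(100):
    dot(v, v)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Top allocation sites, grouped by source line: the hot loop shows up
# directly, instead of being inferred from a fragile memory-delta test.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```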

How the jump instruction is executed based on the value of out, the ALU output

Figure from The Elements of Computing Systems (Nand2Tetris)
Have a look at the scenario where
j1 = 1 (out < 0 )
j2 = 0 (out = 0 )
j3 = 1 (out > 0 )
How is this scenario possible, when out < 0 and out > 0 are both selected but out = 0 is not? How can out be both positive and negative at the same time?
In other words, when is the JNE instruction going to execute? It seems theoretically possible to me, but practically it is not.
If out < 0, the jump is executed if j1 = 1.
If out = 0, the jump is executed if j2 = 1.
If out > 0, the jump is executed if j3 = 1.
Hopefully now you can understand the table better. In particular, JNE is executed if out is non-zero, and is skipped if out is zero.
The mnemonic makes sense if those are match-any conditions, not match-all. i.e. jump if the difference is greater or less than zero, but not if it is zero.
Specifically, sub x, y / jne target works the usual way: it jumps if x and y were not equal before the subtraction (so the subtraction result is non-zero). This is what the if (out != 0) jump in the Effect column is talking about.
IDK the syntax for Nand2Tetris, but hopefully the idea is clear.
BTW, on x86 JNZ is a synonym for JNE, so you can use whichever one is semantically relevant. JNE only really makes sense after something that works as a compare, even though most operations set ZF based on whether the result is zero or not.
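The match-any decoding described above can be sketched in a few lines of Python (a model of the table, not actual Nand2Tetris code):

```python
def should_jump(out, j1, j2, j3):
    """Hack-style jump decode: each j bit enables one sign condition,
    and the jump fires if ANY enabled condition matches (match-any)."""
    return bool((j1 and out < 0) or (j2 and out == 0) or (j3 and out > 0))

# JNE is j1=1, j2=0, j3=1: jump exactly when out is non-zero.
print(should_jump(-5, 1, 0, 1))  # True
print(should_jump(0, 1, 0, 1))   # False
print(should_jump(7, 1, 0, 1))   # True
```

So out never needs to be negative and positive at once; either condition alone is enough to take the jump.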

How to do bitwise operation decently?

I'm doing analysis on binary data. Suppose I have two uint8 data values:
a = uint8(0xAB);
b = uint8(0xCD);
I want to take the lower two bits from a, and whole content from b, to make a 10 bit value. In C-style, it should be like:
(a[2:1] << 8) | b
I tried bitget:
bitget(a,2:-1:1)
But this just gave me separate [1, 1] logical type values, which is not a scalar, and cannot be used in the bitshift operation later.
My current solution is:
Combine a and b into a single 16-bit value:
temp1 = bitor(bitshift(uint16(a), 8), uint16(b));
Left shift six bits to get rid of the higher six bits from a:
temp2 = bitshift(temp1, 6);
Right shift six bits to get rid of lower zeros from the previous result:
temp3 = bitshift(temp2, -6);
Putting all these on one line:
result = bitshift(bitshift(bitor(bitshift(uint16(a), 8), uint16(b)), 6), -6);
This doesn't seem efficient, right? I only want to get (a[2:1] << 8) | b, and it takes a long expression to compute.
Please let me know if there's a well-known solution for this problem.
Since you are using Octave, you can make use of bitpack and bitunpack:
octave> a = bitunpack (uint8 (0xAB))
a =
1 1 0 1 0 1 0 1
octave> B = bitunpack (uint8 (0xCD))
B =
1 0 1 1 0 0 1 1
Once you have them in this form, it's dead easy to do what you want:
octave> [B A(1:2)]
ans =
1 0 1 1 0 0 1 1 1 1
Then simply pad with zeros accordingly and pack it back into an integer:
octave> postpad ([B A(1:2)], 16, false)
ans =
1 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0
octave> bitpack (ans, "uint16")
ans = 973
That or is equivalent to an addition here, because the two operands have no overlapping bits:
result = bitshift(bi2de(bitget(a,1:2)),8) + b;
e.g.
a = 01010111
b = 10010010
result = 0000001110010010
= a[2]*2^9 + a[1]*2^8 + b
an alternative method could be
result = mod(a,2^x)*2^y + b;
where x is the number of bits you want to extract from a and y is the number of bits in b; in your case:
result = mod(a,4)*256 + b;
an extra alternative solution close to the C solution:
result = bitor(bitshift(bitand(a,3), 8), b);
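All of these formulations compute the same 10-bit value; a quick cross-check in Python with the question's inputs:

```python
a, b = 0xAB, 0xCD

# C-style: take a's lower two bits and shift them above b's eight bits.
c_style = ((a & 0x03) << 8) | b

# "or as addition": the two operands share no bits, so | and + agree.
add_style = (a % 4) * 256 + b

print(c_style, add_style)  # both are 973 (0x3CD)
```

973 is also exactly what the bitpack answer above produced, so the approaches agree.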
I think it is important to explain exactly what "(a[2:1] << 8) | b" is doing.
In assembly, there is no single operation for referencing individual bits. Assume all operations take the exact same time, and the "efficient" a[2:1] starts looking rather inefficient.
The convenience statement actually does (a & 0x03).
If your compiler actually converts a uint8 to a uint16 based on how much it was shifted, this is not a 'free' operation, per se. Effectively, what your compiler will do is first clear the "memory" to the size of uint16 and then copy "a" into the location. This requires an extra step (clearing the "memory" (register)) that wouldn't normally be needed.
This means your statement actually is (uint16(a & 0x03) << 8) | uint16(b)
Now yes, because you're doing a power-of-two shift, you could just move a into AH, move b into AL, AND AH with 0x03, and move it all out, but that's a compiler optimization and not what your C code said to do.
The point is that directly translating that statement into matlab yields
bitor(bitshift(uint16(bitand(a,3)),8),uint16(b))
But, it should be noted that while it is not as TERSE as (a[2:1] << 8) | b, the number of "high level operations" is the same.
Note that all scripting languages are going to be very slow upon initiating each instruction, but will complete said instruction rapidly. The terse nature of Python isn't because "terse is better" but to create simple structures that the language can recognize so it can easily go into vectorized operations mode and start executing code very quickly.
The point here is that you have an "overhead" cost for calling bitand; but when operating on an array it will use SSE and that "overhead" is only paid once. The JIT (just in time) compiler, which optimizes script languages by reducing overhead calls and creating temporary machine code for currently executing sections of code MAY be able to recognize that the type checks for a chain of bitwise operations need only occur on the initial inputs, hence further reducing runtime.
Very high level languages are quite different (and frustrating) from high level languages such as C. You are giving up a large amount of control over code execution for ease of code production; whether matlab actually has implemented uint8 or if it is actually using a double and truncating it, you do not know. A bitwise operation on a native uint8 is extremely fast, but to convert from float to uint8, perform bitwise operation, and convert back is slow. (Historically, Matlab used doubles for everything and only rounded according to what 'type' you specified)
Even now, Octave 4.0.3 has a compiled bitshift function where bitshift(ones('uint32'), -32) wraps back to 1. BRILLIANT! VHLLs place you at the mercy of the language; it isn't about how terse or verbose you write the code, it's how the blasted language decides to interpret it and execute machine-level code. So instead of shifting, uint32(floor(ones / (2^32))) is actually FASTER and more accurate.

arc4random() and arc4random_uniform() not really random?

I have been using arc4random() and arc4random_uniform(), and I always had the feeling that they weren't exactly random. For example, when randomly choosing values from an array, the same values often came out several times in a row. So today I used an Xcode playground to see how these functions behave. I first tested arc4random_uniform to generate a number between 0 and 4, using this algorithm:
import Cocoa
var number = 0
for i in 1...20 {
number = Int(arc4random_uniform(5))
}
And I ran it several times. Here is how the values evolved most of the time:
So as you can see, the values increase and decrease repeatedly, and once a value hits the maximum or minimum, it often stays there for a while (see the first screenshot: at the 5th step, the value stays at 3 for 6 steps). The problem is that this isn't at all unusual; the function behaves this way most of the time in my tests.
Now, if we look at arc4random(), it's basically the same:
So here are my questions:
Why is this function behaving in this way?
How can I make it more random?
Thank you.
EDIT :
Finally, I made two experiments that were surprising, the first one with a real die:
What surprised me is that I wouldn't have said it was random, since I was seeing the same sort of pattern that was described as non-random for arc4random() & arc4random_uniform(). So, as Jean-Baptiste Yunès pointed out, humans aren't good at judging whether a sequence of numbers is really random.
I also wanted to do a more "scientific" experiment, so I made this algorithm:
import Foundation
var appeared = [0,0,0,0,0,0,0,0,0,0,0]
let numberOfGenerations = 1000
for _ in 1...numberOfGenerations {
let randomNumber = Int(arc4random_uniform(11))
appeared[randomNumber] += 1
}
for (number, numberOfTimes) in appeared.enumerated() {
print("\(number) appeared \(numberOfTimes) times (\(100.0 * Double(numberOfTimes) / Double(numberOfGenerations))%)")
}
To see how many times each number appeared. And indeed the numbers look uniformly distributed; for example, here is one output from the console:
0 appeared 99 times.
1 appeared 97 times.
2 appeared 78 times.
3 appeared 80 times.
4 appeared 87 times.
5 appeared 107 times.
6 appeared 86 times.
7 appeared 97 times.
8 appeared 100 times.
9 appeared 91 times.
10 appeared 78 times.
So it's definitely OK 😊
EDIT #2: I did the dice experiment again with more rolls, and it's still as surprising to me:
A true random sequence of numbers cannot be generated by an algorithm; algorithms can only produce pseudo-random sequences (something that looks like a random sequence). So depending on the algorithm chosen, the quality of the "randomness" may vary. The quality of arc4random() sequences is generally considered good.
You cannot analyze the randomness of a sequence visually... Humans are very bad at detecting randomness! They tend to find structure where there is none. Nothing is really wrong in your diagrams (except the rare subsequence of six threes in a row, but that is randomness; sometimes unusual things happen). You would be surprised if you had used a die to generate a sequence and drawn its graph. Beware that a sample of only 20 numbers cannot be seriously analyzed for randomness; you need much bigger samples.
If you need some other kind of randomness, you can try the /dev/random pseudo-file, which generates random bytes each time you read from it. The sequence is produced by a mix of algorithms and external physical events that happen in your computer.
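A more formal version of the frequency tally, on a much bigger sample, is a chi-squared check. This sketch uses Python's standard random module purely for illustration (not arc4random itself):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
k, n = 11, 100_000
counts = [0] * k
for _ in range(n):
    counts[random.randrange(k)] += 1

expected = n / k
chi2 = sum((c - expected) ** 2 / expected for c in counts)
# For 10 degrees of freedom, chi2 below ~18.3 is unremarkable at the 5% level.
print(counts)
print(round(chi2, 2))
```

The point is that a generator can pass a test like this with flying colors and still show the "clumpy" runs the question complains about; uniform frequencies and absence of streaks are different properties.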
It depends on what you mean when you say random.
As stated in the comments, true randomness is clumpy. Long strings of repeats or close values are expected.
If this doesn't fit your requirement, then you need to better define your requirement.
Other options include using a shuffle algorithm to dis-order things in an array, or using a low-discrepancy sequence algorithm to give an even distribution of values.
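The shuffle option, sketched concretely: draw from a reshuffled "bag" so every value appears exactly once per cycle, which rules out long runs while keeping the order unpredictable (game developers often call this a shuffle bag; Python used here for illustration):

```python
import random

def bag_draws(values, rounds, rng):
    """Draw from a reshuffled 'bag': every value appears exactly once
    per round, so long runs of a single value are impossible."""
    out = []
    for _ in range(rounds):
        batch = list(values)
        rng.shuffle(batch)  # Fisher-Yates shuffle under the hood
        out.extend(batch)
    return out

draws = bag_draws(range(5), 4, random.Random(42))
print(draws)  # 20 draws; each of 0..4 appears exactly 4 times
```

Note this is deliberately *less* random than arc4random_uniform; it trades independence between draws for the "feels fair" property users expect.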
I don't really agree with the idea that humans are very bad at detecting randomness.
Would you be satisfied if you obtained 1-1-2-2-3-3-4-4-5-5-6-6 after throwing six pairs of dice? Yet the frequencies would be perfect…
This is exactly the problem I'm encountering with the arc4random and arc4random_uniform functions.
I've been developing a backgammon application for many years, based on a neural network trained by world champion players. I DO know that it plays much better than anyone, but many users think it is cheating. I also have doubts sometimes, so I've decided to throw all the dice myself…
I'm not satisfied at all with arc4random, even if the frequencies are OK.
I always throw a pair of dice, and the results lead to unacceptable situations, for example: getting five consecutive doubles for the same player, or waiting 12 turns (24 dice) until the first 6 occurs.
It is easy to test (Objective-C code):
void randomDices ( int * dice1, int * dice2, int player )
{
// arc4random_uniform(6) yields 0...5, so add 1 to get a die face 1...6
( * dice1 ) = arc4random_uniform ( 6 ) + 1 ;
( * dice2 ) = arc4random_uniform ( 6 ) + 1 ;
// Add to your statistics
[self didRandomDice1:( * dice1 ) dice2:( * dice2 ) forPlayer:player] ;
}
Maybe arc4random doesn't like being called twice within a short time…
So I've tried several solutions and finally chose this code, which runs a second level of randomization after arc4random_uniform:
int CFRandomDice ()
{
int __result = -1 ;
BOOL __found = NO ;
while ( ! __found )
{
// random int big enough but not too big
int __bigint = arc4random_uniform ( 10000 ) ;
// Searching for the first character between '1' and '6'
// in the string version of bigint :
NSString * __bigString = @( __bigint ).stringValue ;
NSInteger __nbcar = __bigString.length ;
NSInteger __i = 0 ;
while ( ( __i < __nbcar ) && ( ! __found ) )
{
unichar __ch = [__bigString characterAtIndex:__i] ;
if ( ( __ch >= '1' ) && ( __ch <= '6' ) )
{
__found = YES ;
__result = __ch - '1' + 1 ;
}
else
{
__i++ ;
}
}
}
return ( __result ) ;
}
This code creates a random number with arc4random_uniform ( 10000 ), converts it to a string, and then searches for the first digit between '1' and '6' in the string.
This appeared to me as a very good way to randomize the dice because:
1/ frequencies are OK (see the statistics hereunder) ;
2/ Exceptional dice sequences occur at exceptional times.
10000 dices test:
----------
Game Stats
----------
HIM :
Total 1 = 3297
Total 2 = 3378
Total 3 = 3303
Total 4 = 3365
Total 5 = 3386
Total 6 = 3271
----------
ME :
Total 1 = 3316
Total 2 = 3289
Total 3 = 3282
Total 4 = 3467
Total 5 = 3236
Total 6 = 3410
----------
HIM doubles = 1623
ME doubles = 1648
Now I’m sure that players won’t complain…

While loop in CoffeeScript

I'm new to CoffeeScript and have been reading the book, The Little Book on CoffeeScript. Here are a few lines from the book's Chapter 2 which confused me while reading :
The only low-level loop that CoffeeScript exposes is the while loop. This has similar behavior to the while loop in pure JavaScript, but has the added advantage that it returns an array of results, i.e. like the Array.prototype.map() function.
num = 6
minstrel = while num -= 1
  num + " Brave Sir Robin ran away"
Though it may look good to a CoffeeScript programmer, being a newbie, I'm unable to understand what the code does. Moreover, the words returns an array of results don't seem to go together with the fact that while is a loop construct, not a function, so the notion of it returning something seems confusing. Furthermore, concatenating the variable num with the string "Brave Sir Robin ran away" in every iteration seems awkward, as num is being used as the loop counter.
I would be thankful if you could explain the behavior of the code and perhaps illustrate what the author is trying to convey with simpler examples.
Wow! I didn't know that, but it absolutely makes sense if you remember that CoffeeScript always returns the last expression of a "block".
So in your case it returns (not via the return statement, if that is what confuses you) the expression
num + " Brave Sir Robin ran away"
from the block associated with the while condition, and since multiple such expressions are produced, it pushes them onto an array.
Have a look at the generated JavaScript and it might be clearer, as the generated code is pretty much procedural:
var minstrel, num;
num = 6;
minstrel = (function() {
var _results;
_results = [];
while (num -= 1) {
_results.push(num + " Brave Sir Robin ran away");
}
return _results;
})();
I hope that makes sense to you.
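The same idea can be mimicked in Python, which (like JavaScript) has no while-as-expression, so the result collection is explicit, exactly like the _results array in the generated code:

```python
num = 6
minstrel = []
# The walrus operator decrements num first; the loop body runs for
# num = 5, 4, 3, 2, 1 and stops when num reaches 0 (falsy).
while num := num - 1:
    minstrel.append(f"{num} Brave Sir Robin ran away")
print(minstrel[0], len(minstrel))
```

This makes it clear why the CoffeeScript version yields five strings, counting down from 5 to 1.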
Beware, that function call can be very inefficient!
Below is a prime factors generator
'use strict'
exports.generate = (number) ->
  return [] if number < 2
  primes = []
  candidate = 1
  while number > 1
    candidate++
    while number % candidate is 0
      primes.push candidate
      number /= candidate
    candidate = number - 1 if Math.sqrt(number) < candidate
  primes
This is the version using while as expression
'use strict'
exports.generate = (number) ->
  return [] if number < 2
  candidate = 1
  while number > 1
    candidate++
    primes = while number % candidate is 0
      number /= candidate
      candidate
    candidate = number - 1 if Math.sqrt(number) < candidate
  primes
The first version ran my tests in 4 milliseconds; the one using while as an expression took 18. I believe the reason is the generated closure that collects and returns the primes.