Is there any performance impact or best practice conflict regarding readablity if I do this in Swift:
var (a, b, c, d) = (0, 0, 0, 0)
Instead of this:
var a = 0, b = 0, c = 0, d = 0
The best practice is to be as clear (to other programmers) as possible with your code.
To be concerned about optimization at the level is just crazy. Donald Knuth: "Premature optimization is the root of all evil (or at least most of it) in programming.".
Related
Is there a more efficient way to write this code considering 12 variables all have the same value.
var b1 = 0, b2 = 0 , b3 = 0, b4 = 0, b5=0,b6=0,b7=0,b8=0,b9=0,b10=0,b11=0,b12=0
If you are okay with using an array instead, you might want to use this shortcut:
var b = [Int](repeating: 0, count: 12)
From struct Array documentation:
Creates a new array containing the specified number of a single, repeated value.
The range primitives that are dedicated to the built-in arrays consume their sources but one could easily design a range system that would rather be based on the .ptr of the source (at first look that's more flexible).
struct ArrayRange(T)
{
private T* _front;
private T* _back;
this(ref T[] stuff) {
_front = stuff.ptr;
_back = _front + stuff.length;
}
#property bool empty(){return _front == _back;}
#property T front(){ return *_front;}
#property T back(){return *_back;}
void popFront(){ ++_front;}
void popBack(){--_back;}
T[] array(){return _front[0 .. _back - _front];}
typeof(this) save() {
auto r = this.array.dup;
return typeof(this)(r);
}
}
void main(string[] args)
{
auto s = "bla".dup;
// here the source is 'fire-proofed'
auto rng = ArrayRange!char(s);
rng.popFront;
assert (s == "bla");
assert (rng.array == "la");
// default primitives: now the source is consumed
import std.array;
s.popFront;
assert (s == "la");
}
Why is the default system not based on pointer arithmetic since poping the front imply reallocations/less efficiency?
Any rationale for this design?
I agree with you, there is no reason to reallocate at each popFront. Good thing that's not what happens then!
The popFront mechanism is quite alike what you presented, and the source isn't consumed, only the array on which you call popFront (because, it is a pop after all). What you implemented is what happens when you slice an array: you get a reference range to the original array:
auto a = [1, 2, 3];
auto s = a[];
s.popFront;
assert(s == [2, 3]);
assert(a == [1, 2, 3]);
.dup is there to provide a copy of an array so that you can safely modify the copy without changing the original, so it copies the original array then provides an input range to this copy. Of course you can modify the copy (that's the point), and popFront will change it but still uses pointer arithmetic and without changing the source.
auto a = [1, 2, 3];
auto s = a.dup;
s.popFront;
assert(s = [2, 3]);
assert(a = [1, 2, 3]);
.dup may not seem very useful as such because we are using it with arrays, but it is really important when dealing with "pure" ranges as a range is often lazy not to consume it. As the copy of the range is consumed and not the initial one, we can safely pass this copy to a function and still have our initial lazy range intact.
Let's say I have the following:
scoringObject =
a : -1
b : 0
c : 1
d : 2
resultsArray = ['a','c','b','b','c','c','d']
Using Coffescript, how can I calculate aggregateScore (+4 in the example) ?
Since your example doesn't make much sense as is, I'm going to assume that what you have is:
resultsArray = ['a','c','b','b','c','c','d']
with the scoringObject from your post. Then you could calculate like this:
aggregateScore = 0
aggregateScore += scoringObject[k] for k in resultsArray
# => 4
Let me know if I assumed wrongly.
If you don't mind using features from ECMAScript 5, Array::reduce lets you express this kind of thing quite succinctly:
aggregateScore = resultsArray.reduce ((sum, x) -> sum + scoringObject[x]), 0
(I feel that the parameter order of reduce is quite unfortunate; the initial value should be the first one, and the reducing function the last one)
Underscore.js provides a cross-browser reduce implementation :)
What's the common practice to deal with Integer overflows like 999999*999999 (result > Integer.MAX_VALUE) from an Application Development Team point of view?
One could just make BigInt mandatory and prohibit the use of Integer, but is that a good/bad idea?
If it is extremely important that the integer not overflow, you can define your own overflow-catching operations, e.g.:
def +?+(i: Int, j: Int) = {
val ans = i.toLong + j.toLong
if (ans < Int.MinValue || ans > Int.MaxValue) {
throw new ArithmeticException("Int out of bounds")
}
ans.toInt
}
You may be able to use the enrich-your-library pattern to turn this into operators; if the JVM manages to do escape analysis properly, you won't get too much of a penalty for it:
class SafePlusInt(i: Int) {
def +?+(j: Int) = { /* as before, except without i param */ }
}
implicit def int_can_be_safe(i: Int) = new SafePlusInt(i)
For example:
scala> 1000000000 +?+ 1000000000
res0: Int = 2000000000
scala> 2000000000 +?+ 2000000000
java.lang.ArithmeticException: Int out of bounds
at SafePlusInt.$plus$qmark$plus(<console>:12)
...
If it is not extremely important, then standard unit testing and code reviews and such should catch the problem in the large majority of cases. Using BigInt is possible, but will slow your arithmetic down by a factor of 100 or so, and won't help you when you have to use an existing method that takes an Int.
By far the most common practice regarding integer overflows is that programmers are expected to know that the issue exists, to watch for cases where they might happen, and to make the appropriate checks or rearrange the math so that overflows won't happen, things like doing a * (b / c) rather than (a * b) / c . If the project uses unit test, they will include cases to try and force overflows to happen.
I have never worked on or seen code from a team that required more than that, so I'm going to say that's good enough for almost all software.
The one embedded application I've seen that actually, honest-to-spaghetti-monster NEEDED to prevent overflows, they did it by proving that overflows weren't possible in each line where it looked like they might happen.
If you're using Scala (and based on the tag I'm assuming you are), one very generic solution is to write your library code against the scala.math.Integral type class:
def naturals[A](implicit f: Integral[A]) =
Stream.iterate(f.one)(f.plus(_, f.one))
You can also use context bounds and Integral.Implicits for nicer syntax:
import scala.math.Integral.Implicits._
def squares[A: Integral] = naturals.map(n => n * n)
Now you can use these methods with either Int or Long or BigInt as needed, since instances of Integral exist for all of them:
scala> squares[Int].take(10).toList
res0: List[Int] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
scala> squares[Long].take(10).toList
res0: List[Long] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
scala> squares[BigInt].take(10).toList
res1: List[BigInt] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
No need to change the library code: just use Long or BigInt where overflow is a concern and Int otherwise.
You will pay some penalty in terms of performance, but the genericity and the ability to defer the Int-or-BigInt decision may be worth it.
In addition to simple mindfulness, as noted by #mjfgates, there are a couple of practices that I always use when dealing with scaled-decimal (non-floating-point) real-world quantities. This may not be on point for your particular application - apologies in advance if not.
First, if there are multiple units of measure in use, values must always clearly identify what they are. This can be by naming convention, or by using a separate class for each unit of measure. I've always just used names - a suffix on every variable name. In addition to eliminating errors from confusion over the units, it encourages thinking about overflow because the measures are less likely to be thought of as just numbers.
Second, my most frequent source of overflow concern is usually rescaling - converting from one measure to another - when it requires a lot of significant digits. For example, the conversion factor from cm to inches is 0.393700787402. In order to avoid both overflow and loss of significant digits, you need to be careful to multiply and divide in the right order. I haven't done this in a long time, but I believe what you want is something like:
Add to Rational.scala, from The Book:
def rescale(i:Int) : Int = {
(i * (numer/denom)) + (i/denom * (numer % denom))
Then you get as results (shortened from a specs2 test):
val InchesToCm = new Rational(1000000000,393700787)
InchesToCm.rescale(393700787) must_== 1000000000
InchesToCm.rescale(1) must_== 2
This doesn't round, or deal with negative scaling factors.
A production implementation may want to factor out numer/denom and numer % denom.
I'm attempting to hash the values
10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0
I need a function that will map them to an array that has a size of 13 without causing any collisions.
I've spent several hours thinking this over and googling and can't figure this out. I haven't come close to a viable solution.
How would I go about finding a hash function of this sort? I've played with gperf, but I don't really understand it and I couldn't get the results I was looking for.
if you know the exact keys then it is trivial to produce a perfect hash function -
int hash (int n) {
switch (n) {
case 10: return 0;
case 100: return 1;
case 32: return 2;
// ...
default: return -1;
}
}
Found One
I tried a few things and found one semi-manually:
(n ^ 28) % 13
The semi-manual part was the following ruby script that I used to test candidate functions with a range of parameters:
t = [10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0]
(1..200).each do |i|
t2 = t.map { |e| (e ^ i) % 13 }
puts i if t2.uniq.length == t.length
end
On some platforms (e.g. embedded), modulo operation is expensive, so % 13 is better avoided. But AND operation of low-order bits is cheap, and equivalent to modulo of a power-of-2.
I tried writing a simple program (in Python) to search for a perfect hash of your 11 data points, using simple forms such as ((x << a) ^ (x << b)) & 0xF (where & 0xF is equivalent to % 16, giving a result in the range 0..15, for example). I was able to find the following collision-free hash which gives an index in the range 0..15 (expressed as a C macro):
#define HASH(x) ((((x) << 2) ^ ((x) >> 2)) & 0xF)
Here is the Python program I used:
data = [ 10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0 ]
def shift_right(value, shift_value):
"""Shift right that allows for negative values, which shift left
(Python shift operator doesn't allow negative shift values)"""
if shift_value == None:
return 0
if shift_value < 0:
return value << (-shift_value)
else:
return value >> shift_value
def find_hash():
def hashf(val, i, j = None, k = None):
return (shift_right(val, i) ^ shift_right(val, j) ^ shift_right(val, k)) & 0xF
for i in xrange(-7, 8):
for j in xrange(i, 8):
#for k in xrange(j, 8):
#j = None
k = None
outputs = set()
for val in data:
hash_val = hashf(val, i, j, k)
if hash_val >= 13:
pass
#break
if hash_val in outputs:
break
else:
outputs.add(hash_val)
else:
print i, j, k, outputs
if __name__ == '__main__':
find_hash()
Bob Jenkins has a program for this too: http://burtleburtle.net/bob/hash/perfect.html
Unless you're very lucky, there's no "nice" perfect hash function for a given dataset. Perfect hashing algorithms usually use a simple hashing function on the keys (using enough bits so it's collision-free) then use a table to finish it off.
Just some quasi-analytical ramblings:
In your set of numbers, eleven in all, three are odd and eight are even.
Looking at the simplest forms of hashing - %13 - will give you the following hash values:
10 - 3,
100 - 9,
32 - 6,
45 - 6,
58 - 6,
126 - 9,
3 - 3,
29 - 3,
200 - 5,
400 - 10,
0 - 0
Which, of course, is unusable due to the number of collisions. Something more elaborate is needed.
Why state the obvious?
Considering that the numbers are so few any elaborate - or rather, "less simple" - algorithm will likely be slower than either the switch statement or (which I prefer) simply searching through an unsigned short/long vector of size eleven positions and using the index of the match.
Why use a vector search?
You can fine-tune it by placing the most often occuring values towards the beginning of the vector.
I assume the purpose is to plug in the hash index into a switch with nice, sequential numbering. In that light it seems wasteful to first use a switch to find the index and then plug it into another switch. Maybe you should consider not using hashing at all and go directly to the final switch?
The switch version of hashing cannot be fine-tuned and, due to the widely differing values, will cause the compiler to generate a binary search tree which will result in a lot of comparisons and conditional/other jumps (especially costly) which take time (I've assumed you've turned to hashing for its speed) and require space.
If you want to speed up the vector search additionally and are using an x86-system you can implement a vector search based on the assembler instructions repne scasw (short)/repne scasd (long) which will be much faster. After a setup time of a few instructions you will find the first entry in one instruction and the last in eleven followed by a few instructions cleanup. This means 5-10 instructions best case and 15-20 worst. This should beat the switch-based hashing in all but maybe one or two cases.
I did a quick check and using the SHA256 hash function and then doing modular division by 13 worked when I tried it in Mathematica. For c++ this function should be in the openssl library. See this post.
If you were doing a lot of hashing and lookup though, modular division is a pretty expensive operation to do repeatedly. There is another way of mapping an n-bit hash function into a i-bit indices. See this post by Michael Mitzenmacher about how to do it with a bit shift operation in C. Hope that helps.
Try the following which maps your n values to unique indices between 0 and 12
(1369%(n+1))%13