GCD test - to test dependency between loop statements - compiler-optimization

I understand how the GCD test works on a trivial example such as the following:
for(i=1; i<=100; i++)
{
X[2*i+3] = X[2*i] + 50;
}
We first transform the subscripts into the following form:
X[a*i + b] and X[c*i + d]
a=2, b=3, c=2, d=0 and GCD(a,c)=2 and (d-b) is -3. Since 2 does not divide -3, no dependence is possible.
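Here is a minimal Python sketch of that check. It assumes, as above, that the write subscript is a*i + b and the read subscript is c*i + d; the function name gcd_test_1d is only illustrative. The sketch also brute-forces the actual loop range to confirm that no element is both written and read:

```python
from math import gcd


def gcd_test_1d(a, b, c, d):
    """GCD test for a write to X[a*i + b] and a read of X[c*i + d].

    A dependence needs integers i, i2 with a*i + b == c*i2 + d, i.e.
    a*i - c*i2 == d - b. That equation has an integer solution only if
    gcd(a, c) divides d - b. Returns False when a dependence is impossible
    and True when one *may* exist (loop bounds are ignored).
    Assumes a and c are not both zero.
    """
    return (d - b) % gcd(a, c) == 0


# X[2*i+3] = X[2*i] + 50  ->  a=2, b=3, c=2, d=0
print(gcd_test_1d(2, 3, 2, 0))  # False: 2 does not divide -3

# Brute-force cross-check over the real loop bounds, i = 1..100
written = {2 * i + 3 for i in range(1, 101)}
read = {2 * i for i in range(1, 101)}
print(written & read)  # set(): written indices are odd, read indices even
```

A True result is only a "maybe". The test ignores the loop bounds, so a solution it allows may still fall outside the iteration space.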
But how can we do this GCD test on a doubly nested loop?
For example:
for (i=0; i<10; i++){
for (j=0; j<10; j++){
A[1+2*i + 20*j] = A[2+20*i + 2*j];
}
}

While the subscripts can be delinearized, the GCD test is simple to apply directly. In your example, the subscript pair is [1+2*i + 20*j] and [2+20*i + 2*j], so we're looking for an integer solution to the equation
1 + 2*i + 20*j = 2 + 20*i' + 2*j'
Rearranging, we get
2*i - 20*i' + 20*j - 2*j' = 1
Compute the GCD of all the coefficients, 2, -20, 20, and -2, and see if it divides the constant. In this case, the GCD is 2. Since 2 doesn't divide 1, there's no dependence.
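The same check generalizes to any number of index variables. Here is a small Python sketch (the helper name gcd_test is hypothetical), with a brute-force cross-check over the 10x10 iteration space of the example:

```python
from functools import reduce
from math import gcd


def gcd_test(coeffs, const):
    """Return False if sum(coeffs[k] * v_k) == const has no integer solution.

    True only means a dependence *may* exist; loop bounds are not checked.
    """
    g = reduce(gcd, coeffs, 0)  # math.gcd ignores signs
    if g == 0:                  # all coefficients are zero
        return const == 0
    return const % g == 0


# 2*i - 20*i' + 20*j - 2*j' = 1
print(gcd_test([2, -20, 20, -2], 1))  # False -> no dependence

# Cross-check against the nested loop (0 <= i, j < 10)
writes = {1 + 2 * i + 20 * j for i in range(10) for j in range(10)}
reads = {2 + 20 * i + 2 * j for i in range(10) for j in range(10)}
print(writes & reads)  # set()
```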

The "easy" way to apply GCD in the nested loop case is to apply it only in cases where the arrays themselves are multidimensional; i.e., the original source code uses multiple subscripts rather than already linearized expressions. Of course, if you can "back-transform" these linearized subscripts, then you'll have the equivalent.
Once you've cast the problem as a multidimensional problem, you may simply apply the GCD test "dimension by dimension". If any dimension shows no dependence, you can stop and declare that there is no dependence for the entire multidimensional subscripting sequence.
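Here is a sketch of the dimension-by-dimension test. It assumes each subscript has already been put in affine form: for each dimension, a tuple of coefficients for the loop indices (i, j) plus a constant. The subscripts below (A[2*i][j] written, A[2*i+1][j] read) are a made-up example, and the helper names are hypothetical:

```python
from functools import reduce
from math import gcd


def dim_may_depend(w_coeffs, w_const, r_coeffs, r_const):
    """GCD test for one dimension: sum(w*i_k) + w_const == sum(r*i'_k) + r_const."""
    coeffs = list(w_coeffs) + [-c for c in r_coeffs]
    g = reduce(gcd, coeffs, 0)
    diff = r_const - w_const
    return diff == 0 if g == 0 else diff % g == 0


def may_depend(write_subs, read_subs):
    """Dependence is ruled out as soon as any single dimension rules it out."""
    return all(
        dim_may_depend(wc, wb, rc, rb)
        for (wc, wb), (rc, rb) in zip(write_subs, read_subs)
    )


# A[2*i][j] = A[2*i + 1][j]; coefficients are over the loop indices (i, j)
write = [((2, 0), 0), ((0, 1), 0)]
read = [((2, 0), 1), ((0, 1), 0)]
print(may_depend(write, read))  # False: dimension 0 needs 2*i - 2*i' = 1

# A[i][j] = A[i][j + 1]: both dimensions pass, so a dependence may exist
print(may_depend([((1, 0), 0), ((0, 1), 0)], [((1, 0), 0), ((0, 1), 1)]))  # True
```

The `all(...)` call short-circuits on the first dimension that proves independence. That is the early exit described above.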
The key of course is that casting as a multidimensional indexing problem gives you the nice property that there's a one-to-one mapping between individual index values and the corresponding index expression tuples. Without this the problem is harder.
This is the approach I took in the ASC Fortran vectorizing compiler back in the 1970s, and it worked pretty well, particularly when used in conjunction with directional subscript analysis for the non-disjoint case. The GCD test by itself is really not sufficient. It does, however, give you a relatively inexpensive way to make an early decision in your analysis, in those cases where you can then avoid the more expensive dependence analysis.


Random pivot selection for quicksort not working

I am trying to choose a random pivot index for quicksort, but for some reason the array is not sorted. In fact, the algorithm sometimes returns a different array (e.g., input [2,1,4] produces [1,1,4]). Any help would be much appreciated. The algorithm works if, instead of choosing a random index, I always choose the first element of the array as the pivot.
from random import randint

def quicksort(array):
    if len(array) < 2:
        return array
    else:
        random_pivot_index = randint(0, len(array) - 1)
        pivot = array[random_pivot_index]
        less = [i for i in array[1:] if i <= pivot]
        greater = [i for i in array[1:] if i > pivot]
        return quicksort(less) + [pivot] + quicksort(greater)
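less = [i for i in array[1:] if i <= pivot]
You're slicing off array[0] here, which is only correct when the pivot is array[0]. With a random index, array[0] never goes into less or greater, so it disappears. Meanwhile the pivot itself is still in array[1:], and because of <= it lands in less.
But here, you also include the pivot value explicitly in the result:
return quicksort(less) + [pivot] + quicksort(greater)
So the pivot appears twice. That is exactly how [2,1,4] becomes [1,1,4] when the pivot is the 1 at index 1. Instead, partition every element except the one at the pivot index:
rest = array[:random_pivot_index] + array[random_pivot_index + 1:]
Then build less and greater from rest rather than from array[1:].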
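Here is a minimal corrected sketch of the question's function. The pivot is removed by its index instead of by slicing off array[1:], so no element is dropped and none is counted twice:

```python
from random import randint


def quicksort(array):
    if len(array) < 2:
        return array
    random_pivot_index = randint(0, len(array) - 1)
    pivot = array[random_pivot_index]
    # Every element except the chosen pivot itself (removed by index, not by value)
    rest = array[:random_pivot_index] + array[random_pivot_index + 1:]
    less = [x for x in rest if x <= pivot]  # duplicates of the pivot go here
    greater = [x for x in rest if x > pivot]
    return quicksort(less) + [pivot] + quicksort(greater)


print(quicksort([2, 1, 4]))  # [1, 2, 4]
```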
Incidentally, although this does divide and conquer in the same way as quicksort, it's not really an implementation of that algorithm. Actual quicksort sorts the elements in place, whereas your version pays the run-time overhead of allocating and concatenating the intermediate lists.
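For comparison, here is a sketch of an in-place version with a random pivot. It uses Lomuto partitioning, which is one of several standard partition schemes; the choice is ours, not the answer's:

```python
import random


def quicksort_in_place(a, lo=0, hi=None):
    """Sort a[lo..hi] (inclusive) in place with a random pivot and Lomuto partitioning."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        # Swap a randomly chosen pivot to the end of the range
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        # Move everything smaller than the pivot to the front of the range
        store = lo
        for k in range(lo, hi):
            if a[k] < pivot:
                a[k], a[store] = a[store], a[k]
                store += 1
        # Put the pivot into its final position
        a[store], a[hi] = a[hi], a[store]
        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth O(log n)
        if store - lo < hi - store:
            quicksort_in_place(a, lo, store - 1)
            lo = store + 1
        else:
            quicksort_in_place(a, store + 1, hi)
            hi = store - 1


data = [2, 1, 4]
quicksort_in_place(data)
print(data)  # [1, 2, 4]
```

No intermediate lists are built; elements are only swapped within the original list.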

z3py: How to implement a counter in z3?

I want to design logic similar to a counter in Z3py.
When writing a Python script, we usually define a variable "counter" and keep incrementing it when necessary. However, Z3 has no mutable variables. Therefore, instead of defining a variable, I define a trace of its values.
Here is some sample code. Suppose there is an array "myArray" of size 5 whose elements are 1 or 2. I want to assert a constraint that there must be exactly two '2's in "myArray".
from z3 import *

s = Solver()
myArray = IntVector('myArray', 5)
for i in range(5):
    s.add(Or(myArray[i] == 1, myArray[i] == 2))
counterTrace = IntVector('counterTrace', 6)
s.add(counterTrace[0] == 0)
for i in range(5):
    s.add(If(myArray[i] == 2,
             counterTrace[i + 1] == counterTrace[i] + 1,
             counterTrace[i + 1] == counterTrace[i]))
s.add(counterTrace[5] == 2)
print(s.check())
print(s.model())
My question is: is this an efficient way to implement the concept of a counter in Z3? In my real problem, which is more complicated, this is really inefficient.
You can do this, but it is much easier to create the sum of If(myArray[i] == 2, 1, 0) over all i. That way you don't need to assert anything about intermediate counter values, and you are dealing with normal expressions.

For each element A[i] of array A, find the closest j such that A[j] > A[i]

Given : An array A[1..n] of real numbers.
Goal : An array D[1..n] such that
D[i] = min{ distance(i,j) : A[j] > A[i] }
or some default value (like 0) when there is no higher-valued element. I would really like to use Euclidean distance here.
Example :
A = [-1.35, 3.03, 0.73, -0.06, 0.71, -0.21, -0.12, 1.49, 1.41, 1.42]
D = [1, 0, 1, 1, 2, 1, 1, 6, 1, 2]
Is there any way to beat the obvious O(n^2) solution? The only progress I've made so far is that D[i] = 1 whenever A[i] is not a local maximum. I've been thinking a lot and have come up with NOTHING. I hope to eventually extend this to 2D (so that A and D are matrices).
So I've puzzled over this a bit, but I haven't come up with anything better that works. A few ideas:
Augment the array with extra information that can be gained in O(n) time or better, e.g., add indices, differences between neighbors, etc.
Would sorting (O(n log n)) help in any way?
Seems like dynamic programming could be helpful here, if you can figure out a way to solve for each element based on the solution for its neighbors (augmenting the answers with information like the j for each A[i] instead of just the distance maybe).
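One way to make the last idea concrete for the 1D case is sketched below; it is a sketch, not a settled answer. For each i, store the index of the nearest strictly greater element on each side, and compute it from earlier answers. When A[j] <= A[i], nothing between left[j] and j can exceed A[i], so the search can jump straight to left[j]. This is equivalent to the usual stack-based "next greater element" technique and runs in amortized O(n) time. It does not address the 2D extension:

```python
def nearest_greater_distance(A, default=0):
    """D[i] = distance to the closest j with A[j] > A[i], or default if none."""
    n = len(A)
    left = [-1] * n  # left[i]: nearest j < i with A[j] > A[i], or -1
    right = [n] * n  # right[i]: nearest j > i with A[j] > A[i], or n
    for i in range(n):
        j = i - 1
        # A[j] <= A[i] means nothing between left[j] and j beats A[i]: jump
        while j >= 0 and A[j] <= A[i]:
            j = left[j]
        left[i] = j
    for i in range(n - 1, -1, -1):
        j = i + 1
        while j < n and A[j] <= A[i]:
            j = right[j]
        right[i] = j
    D = []
    for i in range(n):
        candidates = []
        if left[i] >= 0:
            candidates.append(i - left[i])
        if right[i] < n:
            candidates.append(right[i] - i)
        D.append(min(candidates) if candidates else default)
    return D


A = [-1.35, 3.03, 0.73, -0.06, 0.71, -0.21, -0.12, 1.49, 1.41, 1.42]
print(nearest_greater_distance(A))  # [1, 0, 1, 1, 2, 1, 1, 6, 1, 2]
```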
Sort the array from highest to lowest element. If I understand your problem correctly, this gives you the answer immediately, since the closest bigger element to any element in the original list is the one before it. This way you don't even need to create the D[] array, since its contents can be computed from the array A[] alone. The first element in the sorted A[] array does not have a bigger friend, so the answer for it would be your default value (0 perhaps?). Extending the algorithm to matrices might be easy, depending on how you "look" at the matrix: just use a mapping function that transforms the matrix into a 1D array.

In what circumstances can a compiler change the execution order of programme statements?

If this is not a real question then feel free to close ;)
Not only can the compiler reorder execution (mostly for optimization); most modern processors do so, too. Read more about execution reordering and memory barriers.
The compiler can change the execution order of statements when it sees fit for optimization purposes, and when such changes wouldn't alter the observable behavior of the code.
A very simple example:
int func (int value)
{
int result = value*2;
if (value > 10)
{
return result;
}
else
{
return 0;
}
}
A naive compiler can generate code for this in exactly the sequence shown. First calculate "result" and return it only if the original value is larger than 10 (if it isn't, "result" would be ignored - calculated needlessly).
A sane compiler, though, would see that the calculation of "result" is only needed when "value" is larger than 10. It may therefore move the calculation "value*2" inside the first braces and only do it when "value" is actually larger than 10. (Needless to say, the compiler doesn't really look at the C code when optimizing; it works at lower levels.)
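The same transformation can be written out at source level in Python, purely as an illustration; a real compiler performs it on its intermediate representation, as noted above. Both versions return the same result for every input. The second one just skips the multiplication when it is not needed:

```python
def func_naive(value):
    result = value * 2  # computed even when it is thrown away
    if value > 10:
        return result
    return 0


def func_sunk(value):
    if value > 10:
        return value * 2  # multiplication moved into the only branch that uses it
    return 0


print(all(func_naive(v) == func_sunk(v) for v in range(-1000, 1000)))  # True
```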
This is only a simple example. Much more complicated examples can be created. It is very possible that a C function would end up looking almost nothing like its C representation in compiled form, with aggressive enough optimizations.
Many compilers also perform "loop-invariant code motion". For example, if you had the following code:
for(int i=0; i<100; i++) {
x += y * i * 15;
}
The compiler would notice that y * 15 is loop-invariant (its value doesn't change from one iteration to the next). So it would compute y * 15 once before the loop, keep the result in a register, and change the loop statement to "x += r0 * i". This is a somewhat contrived example, but you often see expressions like this when working with array indexes or any other base + offset situation.
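Here is a source-level Python illustration of that hoisting; the function names are hypothetical. Python integers do not overflow, so regrouping y * i * 15 as (y * 15) * i is exact here. A C compiler makes the equivalent change only when its arithmetic rules allow the regrouping:

```python
def loop_original(x, y):
    for i in range(100):
        x += y * i * 15
    return x


def loop_hoisted(x, y):
    r0 = y * 15  # loop-invariant: computed once, before the loop
    for i in range(100):
        x += r0 * i
    return x


print(loop_original(0, 3) == loop_hoisted(0, 3))  # True
```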

hash function providing unique uint from an integer coordinate pair

The problem in general:
I have a big 2D point space, sparsely populated with dots. Think of it as a big white canvas sprinkled with black dots. I have to iterate over and search through these dots a lot. The canvas (point space) can be huge, bordering on the limits of int, and its size is unknown before setting points in it.
That brought me to the idea of hashing:
Ideal:
I need a hash function that takes a 2D point and returns a unique uint32, so that no collisions can occur. You can assume that the number of dots on the canvas is easily countable by uint32.
IMPORTANT: It is impossible to know the size of the canvas beforehand (it may even change), so things like
canvaswidth * y + x
are sadly out of the question.
I also tried a very naive
abs(x) + abs(y)
but that produces too many collisions.
Compromise:
A hash function that provides keys with a very low probability of collision.
Cantor's enumeration of pairs
n = ((x + y)*(x + y + 1)/2) + y
This might be interesting, as it's closest to your original canvaswidth * y + x but will work for any non-negative x and y. But for a real-world int32 hash, rather than a mapping of pairs of integers to integers, you're probably better off with bit manipulation such as Bob Jenkins's mix, called with x, y and a salt.
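Here is a Python sketch of the Cantor pairing formula above, together with its inverse; cantor_unpair is added here for illustration. It assumes non-negative x and y:

```python
from math import isqrt


def cantor_pair(x, y):
    """Cantor pairing: one-to-one from pairs of non-negative ints to non-negative ints."""
    return (x + y) * (x + y + 1) // 2 + y


def cantor_unpair(n):
    """Inverse of cantor_pair."""
    w = (isqrt(8 * n + 1) - 1) // 2  # which diagonal x + y == w holds n
    t = w * (w + 1) // 2             # first value on that diagonal
    y = n - t
    return w - y, y


print(cantor_pair(3, 4), cantor_unpair(cantor_pair(3, 4)))  # 32 (3, 4)
```

The result grows roughly like (x + y)^2 / 2, so it stays below 2^32 only while x + y is under about 92,000. That is one reason to prefer a bit-mixing hash for the 32-bit case.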
A hash function that is GUARANTEED collision-free is not a hash function :)
Instead of using a hash function, you could consider using binary space partition trees (BSPs) or XY-trees (closely related).
If you want to hash two uint32's into one uint32, do not use things like Y & 0xFFFF because that discards half of the bits. Do something like
(x * 0x1f1f1f1f) ^ y
(you need to transform one of the variables first to make sure the hash function is not commutative)
Like Emil's answer, but this handles 16-bit overflows in x in a way that produces fewer collisions, and it takes fewer instructions to compute:
hash = ( y << 16 ) ^ x;
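Both suggestions are sketched below in Python. The results are masked to 32 bits to mimic C uint32 arithmetic; the masking is our assumption about how the snippets would be used:

```python
MASK32 = 0xFFFFFFFF


def hash_mul_xor(x, y):
    """(x * 0x1f1f1f1f) ^ y; transforming x first keeps h(x, y) != h(y, x) in general."""
    return ((x * 0x1F1F1F1F) & MASK32) ^ (y & MASK32)


def hash_shift_xor(x, y):
    """(y << 16) ^ x; once truncated to 32 bits, y's top 16 bits are dropped."""
    return ((y << 16) & MASK32) ^ (x & MASK32)


print(hex(hash_shift_xor(1, 2)))  # 0x20001
```

With both coordinates in 0..65535, hash_shift_xor is collision-free, because x and y then occupy disjoint bit ranges. Outside that range, collisions become possible.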
You can recursively divide your XY plane into cells, then divide these cells into sub-cells, etc.
Gustavo Niemeyer invented the Geohash geocoding system in 2008.
Amazon's open-source Geo Library computes the hash for any longitude-latitude coordinate. The resulting Geohash value is a 63-bit number. The probability of collision depends on the hash's resolution: if two objects are closer than the intrinsic resolution, the calculated hash will be identical.
Read more:
https://en.wikipedia.org/wiki/Geohash
https://aws.amazon.com/fr/blogs/mobile/geo-library-for-amazon-dynamodb-part-1-table-structure/
https://github.com/awslabs/dynamodb-geo
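Geohash is built on bit interleaving. Each pair of bits picks one of four sub-cells, which is exactly the recursive subdivision described above (a Z-order or Morton curve). Below is a sketch of that idea for 16-bit integer coordinates. It is not the actual Geohash encoding, which works on latitude/longitude and emits base-32 strings:

```python
def morton16(x, y):
    """Interleave the bits of two 16-bit unsigned ints into one 32-bit Z-order key."""
    if not (0 <= x <= 0xFFFF and 0 <= y <= 0xFFFF):
        raise ValueError("x and y must fit in 16 bits")
    key = 0
    for bit in range(16):
        key |= ((x >> bit) & 1) << (2 * bit)      # x bits go to even positions
        key |= ((y >> bit) & 1) << (2 * bit + 1)  # y bits go to odd positions
    return key


print(morton16(3, 5))  # 39
```

Points that are close on the plane often (though not always) get numerically close keys. This is what makes such keys useful for range and proximity queries.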
Your "ideal" is impossible.
You want a mapping (x, y) -> i where x, y, and i are all 32-bit quantities, which is guaranteed not to generate duplicate values of i.
Here's why: suppose there is a function hash() such that hash(x, y) gives a different integer value for every pair. There are 2^32 (about 4 billion) values for x and 2^32 values for y, so there are 2^64 (about 18 million trillion) pairs, each needing its own result. But there are only 2^32 possible values in a 32-bit int, so the results of hash() can't all fit in a 32-bit int.
See also http://en.wikipedia.org/wiki/Counting_argument
Generally, you should always design your data structures to deal with collisions. (Unless your hashes are very long (at least 128 bit), very good (use cryptographic hash functions), and you're feeling lucky).
Perhaps?
hash = ((y & 0xFFFF) << 16) | (x & 0xFFFF);
This works as long as x and y can be stored as 16-bit integers. I have no idea how many collisions it causes for larger integers, though. One idea might be to keep this scheme but combine it with a compression scheme, such as taking the modulus of 2^16.
If you can do a = ((y & 0xffff) << 16) | (x & 0xffff) then you could afterward apply a reversible 32-bit mix to a, such as Thomas Wang's
#include <stdint.h>

uint32_t hash(uint32_t a)
{
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}
That way you get a random-looking result rather than high bits from one dimension and low bits from the other.
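Here is a Python port of that combination; the names pack16 and wang_hash are chosen here. Each step of Wang's function is invertible on 32-bit values: an xor with a right shift, or a multiply by an odd number (a + (a << 3) is 9*a). So distinct packed inputs can never collide:

```python
MASK32 = 0xFFFFFFFF


def pack16(x, y):
    """Low 16 bits of y in the high half, low 16 bits of x in the low half."""
    return ((y & 0xFFFF) << 16) | (x & 0xFFFF)


def wang_hash(a):
    """Port of the C function above, with explicit 32-bit wraparound."""
    a &= MASK32
    a = (a ^ 61) ^ (a >> 16)
    a = (a + (a << 3)) & MASK32
    a = a ^ (a >> 4)
    a = (a * 0x27D4EB2D) & MASK32
    a = a ^ (a >> 15)
    return a
```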
You can do
a >= b ? a * a + a + b : a + b * b
taken from here.
That works for points with non-negative coordinates. If your coordinates can also be negative, you will have to do:
A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
A >= B ? A * A + A + B : A + B * B;
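The formula above is Szudzik's "elegant" pairing function. Here is a Python sketch of it with the sign-folding step; the name zigzag is chosen here:

```python
def zigzag(v):
    """Fold signed ints onto non-negative ints: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * v if v >= 0 else -2 * v - 1


def szudzik_pair(a, b):
    """Szudzik's pairing for non-negative a, b (one-to-one)."""
    return a * a + a + b if a >= b else a + b * b


def pair_signed(a, b):
    return szudzik_pair(zigzag(a), zigzag(b))


print(pair_signed(-32768, -32768))  # 4294967295 == 2**32 - 1
```

One consequence is worth noting. For inputs that fit in a signed 16-bit integer, zigzag yields values up to 65535, and the largest possible result is 65535^2 + 2*65535 = 2^32 - 1. So the mapping fills a uint32 exactly, with no collisions.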
But to restrict the output to uint, you will have to keep an upper bound on your inputs, and if you do, it turns out that you know the bounds. In other words, in programming it's impractical to write a function without knowing which integer types your inputs and output can be, and every integer type has a lower bound and an upper bound.
public uint GetHashCode(int a, int b)
{
    if (a > ushort.MaxValue || b > ushort.MaxValue ||
        a < ushort.MinValue || b < ushort.MinValue)
    {
        throw new ArgumentOutOfRangeException();
    }
    // Both values fit in 16 bits, so packing them side by side cannot collide
    return ((uint)a << 16) | (uint)b; //very good space/speed efficiency
    //or whatever your function is.
}
If you want the output to be strictly uint for an unknown range of inputs, then there will be a reasonable number of collisions, depending on that range. What I would suggest is a function that is allowed to overflow, evaluated unchecked. Emil's solution is great; in C#:
return unchecked((uint)((a & 0xffff) << 16 | (b & 0xffff)));
See "Mapping two integers to one, in a unique and deterministic way" for a plethora of options.
Depending on your use case, it might be possible to use a quadtree and replace points with the string of branch names. This is actually a sparse representation of the points. It needs a custom quadtree structure that extends the canvas by adding branches when you add points outside it. However, it avoids collisions, and you get benefits like quick nearest-neighbor searches.
You may be using a language or platform in which all objects (even primitives like integers) have built-in hash functions: Java platform languages like Java, .NET platform languages like C#, and others like Python and Ruby. In that case, you can use the built-in hash values as a building block and add your own "hashing flavor" to the mix. For example:
// C# code snippet
public class SomeVerySimplePoint {
public int X;
public int Y;
public override int GetHashCode() {
return ( Y.GetHashCode() << 16 ) ^ X.GetHashCode();
}
}
It may also be handy to have test cases, such as a predefined set of a million points, that compare the candidate hash algorithms on computation time, memory required, key collision count, and edge cases (very large or very small values).
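Here is a minimal sketch of such a comparison harness. Everything in it is hypothetical: the point set, the candidate list, and the seed. It only counts collisions; timing could be added with time.perf_counter:

```python
import random


def count_collisions(hash_fn, points):
    """Count points whose hash value was already produced by an earlier point."""
    seen = set()
    collisions = 0
    for x, y in points:
        h = hash_fn(x, y)
        if h in seen:
            collisions += 1
        else:
            seen.add(h)
    return collisions


# Hypothetical candidates to compare; add your own here
candidates = {
    "abs(x) + abs(y)": lambda x, y: abs(x) + abs(y),
    "(y << 16) ^ x": lambda x, y: ((y << 16) ^ x) & 0xFFFFFFFF,
}

rng = random.Random(42)  # fixed seed so runs are repeatable
points = list({(rng.randint(-10**6, 10**6), rng.randint(-10**6, 10**6))
               for _ in range(10_000)})

for name, fn in candidates.items():
    print(name, count_collisions(fn, points))
```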
The Fibonacci hash works very well for integer pairs:
a1 + a2*multiplier
Use the multiplier 0x9E3779B9 for 32-bit words. For other word sizes w, use 2^w/phi = (sqrt(5)-1)/2 * 2^w, rounded to an odd number.
This gives very different values for pairs that are close together. I do not know how it behaves over the set of all pairs.
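Here is a sketch of that scheme; the names are ours. The multiplier is derived with exact integer arithmetic as 2^w / phi rounded to the nearest odd number, which reproduces 0x9E3779B9 for w = 32:

```python
from math import isqrt


def fib_multiplier(w):
    """2**w / phi rounded to the nearest odd integer, computed exactly."""
    # 2**w / phi == 2**w * (sqrt(5) - 1) / 2; isqrt gives floor(sqrt(5) * 2**w)
    floor_value = (isqrt(5 << (2 * w)) - (1 << w)) // 2
    return floor_value | 1  # nearest odd number, since 2**w / phi is irrational


def fib_hash(a1, a2, w=32):
    """a1 + a2 * multiplier, truncated to a w-bit word."""
    return (a1 + a2 * fib_multiplier(w)) & ((1 << w) - 1)


print(hex(fib_multiplier(32)))  # 0x9e3779b9
```

In practice you would compute the multiplier once and store it as a constant rather than recomputing it on every call.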