Does the Swift compiler use fusion to optimise code?
Let's say we want to write code to calculate the sum of the square roots of all positive numbers in a list. In Haskell you can write
sumOfSquareRoots xs = sum (allSquareRoots (filterPositives xs))
  where
    allSquareRoots []     = []
    allSquareRoots (x:xs) = sqrt x : allSquareRoots xs
    filterPositives []    = []
    filterPositives (x:xs)
      | x > 0     = x : filterPositives xs
      | otherwise = filterPositives xs
This code is quite easy to read (the first line is very neat, almost English; the parts after the where are local). This style also makes use of powerful built-in functions such as sum, and we could make the other functions public and reuse them. So, good style.
However, we might be concerned that it is less efficient than a one-pass function. (It passes over the list once for filterPositives, again to get allSquareRoots of the result, and finally to sum that up.) Due to Haskell's so-called lazy evaluation strategy, however, the overhead is significantly less than in most other languages. Moreover, a good Haskell compiler can usually derive the one-traversal version from the more elegant multiple-traversal version using a process called fusion.
My question - does the Swift compiler deploy such optimisation strategies when compiling recursive functions?
I am building a program based on a rather complex mathematical algorithm. In it I want to account for vectors that have missing values, i.e. NaN. Until now I implemented those by having two vectors, both breeze DenseVector[Double]s: a vector location which contains the actual values, and a vector evidence where a 1.0 denotes that a value is there and a 0.0 that it isn't. With that I can do things like this:
val ones = DenseVector.ones[Double](one.evidence.length)
val derivedLocation = one.evidence :* one.location :+ ((ones :- one.evidence) :* two.evidence :* two.location)
Another example would be:
val firstnewvector = myothervector(evidence :== 1.0)
val secondnewvector = myothervector(evidence :== 0.0)
but I also have some other example where I do need 0 as a result not NaN:
def gradientAt: DenseVector[Double] =
(one.location - two.location) :* evidence :* othervalue
For the sake of argument this example has been simplified. I am thinking about dropping evidence and using NaN where there is no concrete value present, but I am not sure whether that is a good idea. I think it might already be more difficult to implement the above lines, wouldn't it? Also, I am not sure about performance. DenseVector is backed by an Array containing Java primitives, preventing slow auto-boxing, if I am not mistaken. Using Double.NaN might require classes instead of primitives, and might slow the whole program down a little and cost more memory; is that right? (Speed and memory are an issue in general.)
So: is it a good idea in my case to use Double.NaN or not, considering 1) nice code and 2) performance (memory and speed)?
As far as I understand aggregate is a generalisation of fold which in turn is a generalisation of reduce.
Similarly, combineByKey is a generalisation of aggregateByKey, which in turn is a generalisation of foldByKey, which in turn is a generalisation of reduceByKey.
However, I have trouble finding simple examples for each of those seven methods that can only be expressed with that method and not with its less general versions. For example, I found http://blog.madhukaraphatak.com/spark-rdd-fold/ giving an example for fold, but I have been able to use reduce in the same situation as well.
What I found out so far:
I read that the more generalised methods can be more efficient, but that would be a non-functional requirement and I would like to get examples which can not be implemented with the more specific method.
I also read that e.g. the function passed to fold only has to be associative, while the one for reduce has to be commutative additionally: https://stackoverflow.com/a/25158790/4533188 (However, I still don't know any good simple example.) whereas in https://stackoverflow.com/a/26635928/4533188 I read that fold needs both properties to hold...
We could think of the zero value as a feature (e.g. for fold over reduce) as in "add all elements and add 3" and using 3 as the zero value, but that would be misleading, because 3 would be added for each partition, not just once. Also this is simply not the purpose of fold as far as I understood - it wasn't meant as a feature, but as a necessity to implement it to be able to take non-commutative functions.
What would simple examples for those seven methods be?
Let's work through what is actually needed logically.
First, note that if your collection is unordered, any set of (binary) operations on it need to be both commutative and associative, or you'll get different answers depending on which (arbitrary) order you choose each time. Since reduce, fold, and aggregate all use binary operations, if you use these things on a collection that is unordered (or is viewed as unordered), everything must be commutative and associative.
reduce is an implementation of the idea that if you can take two things and turn them into one thing, you can collapse an arbitrarily long collection into a single element. Associativity is exactly the property that it doesn't matter how you pair things up as long as you eventually pair them all and keep the left-to-right order unchanged, so that's exactly what you need.
a    b   c   d          a    b   c   d          a   b    c   d
a # b    c   d          a # b    c   d          a   b # c    d
(a#b)    c # d          (a#b) # c    d          a  (b#c)     d
(a#b) # (c#d)           ((a#b)#c) # d           a # ((b#c)#d)
All of the above are the same as long as the operation (here called #) is associative. There is no reason to swap around which things go on the left and which go on the right, so the operation does not need to be commutative (addition is: a+b == b+a; concat is not: ab != ba).
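For instance, in Scala on an ordered collection (just a sketch), concatenation works with reduce precisely because the left-to-right order is never disturbed:

val parts = List("ab", "cd", "ef")
// String concatenation is associative but not commutative; reduce only needs
// associativity here because the elements stay in their original order.
val joined = parts.reduce(_ + _)   // "abcdef", no matter how the pairs are grouped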
reduce is mathematically simple and requires only an associative operation
Reduce is limited, though, in that it doesn't work on empty collections, and in that you can't change the type. If you're working sequentially, you can use a function that takes a new type and the old type, and produces something with the new type. This is a sequential fold (left-fold if the new type goes on the left, right-fold if it goes on the right). There is no choice about the order of operations here, so commutativity and associativity and everything are irrelevant. There's exactly one way to work through your list sequentially. (If you want your left-fold and right-fold to always be the same, then the operation must be associative and commutative, but since left- and right-folds don't generally get accidentally swapped, this isn't very important to ensure.)
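A tiny Scala sketch of a sequential left-fold that changes the type:

val chars = List('H', 'i', '!')
// foldLeft: the accumulator (a String, the "new type") always sits on the left,
// and elements are consumed strictly left to right, so ordering questions never arise.
val word: String = chars.foldLeft("")((acc, c) => acc + c)   // "Hi!"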
The problem comes when you want to work in parallel. You can't sequentially go through your collection; that's not parallel by definition! So you have to insert the new type at multiple places! Let's call our fold operation #, and we'll say that the new type goes on the left. Furthermore, we'll say that we always start with the same element, Z. Now we could do any of the following (and more):
a    b   c   d           a    b   c    d            a    b    c    d
Z#a  b   c   d           Z#a  b   Z#c  d            Z#a  Z#b  Z#c  Z#d
(Z#a) # b  c  d          (Z#a) # b  (Z#c) # d
((Z#a)#b) # c  d
(((Z#a)#b)#c) # d
Now we have a collection of one or more things of the new type. (If the original collection was empty, we just take Z.) We know what to do with that! Reduce! So we make a reduce operation for our new type (let's call it $, and remember it has to be associative), and then we have aggregate:
a    b   c   d           a    b   c    d                 a    b    c    d
Z#a  b   c   d           Z#a  b   Z#c  d                 Z#a  Z#b  Z#c  Z#d
(Z#a) # b  c  d          (Z#a) # b  (Z#c) # d            Z#a $ Z#b   Z#c $ Z#d
((Z#a)#b) # c  d         ((Z#a)#b) $ ((Z#c)#d)           ((Z#a)$(Z#b)) $ ((Z#c)$(Z#d))
(((Z#a)#b)#c) # d
Now, these things all look really different. How can we make sure that they end up to be the same? There is no single concept that describes this, but the Z# operation has to be zero-like and $ and # have to be homomorphic, in that we need (Z#a)#b == (Z#a)$(Z#b). That's the actual relationship that you need (and it is technically very similar to a semigroup homomorphism). There are all sorts of ways to pick badly even if everything is associative and commutative. For example, if Z is the double value 0.0 and # is actually +, then Z is zero-like and # is associative and commutative. But if $ is actually *, which is also associative and commutative, everything goes wrong:
(0.0+2) * (0.0+3) == 2.0 * 3.0 == 6.0
((0.0+2) + 3) == 2.0 + 3 == 5.0
One example of a non-trivial aggregate is building a collection, where # is the "append an element" operator and $ is the "concat two collections" operation.
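In Spark that looks roughly like this (a sketch, assuming a SparkContext named sc is in scope):

val rdd = sc.parallelize(Seq(1, 2, 3, 4))
// seqOp plays the role of # (append one element to a partition's accumulator),
// combOp plays the role of $ (concatenate two partial collections).
val collected: Vector[Int] = rdd.aggregate(Vector.empty[Int])(
  (acc, x) => acc :+ x,
  (xs, ys) => xs ++ ys
)
// The homomorphism condition holds: appending b to Vector(a) gives the same
// result as concatenating Vector(a) and Vector(b).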
aggregate is tricky and requires an associative reduce operation, plus a zero-like value and a fold-like operation that is homomorphic to the reduce
The bottom line is that aggregate is not simply a generalization of reduce.
But there is a simplification (less general form) if you're not actually changing the type. If Z is actually z and is an actual zero, we can just stick it in wherever we want and use reduce. Again, we don't need commutativity conceptually; we just stick in one or more z's and reduce, and our # and $ operations can be the same thing, namely the original # we used on the reduce:
a   b   c   d              ()  <- empty
z#a   z#b                  z
z#a   (z#b)#c
z#a   ((z#b)#c)#d
(z#a)#((z#b)#c)#d
If we just delete the z's from here, it works perfectly well, and in fact is equivalent to if (empty) z else reduce. But there's another way it could work too. If the operation # is also commutative, and z is not actually a zero but just occupies a fixed point of # (meaning z#z == z but z#a is not necessarily just a), then you can run the same thing, and since commutativity lets you switch the order around, you conceptually can reorder all the z's together at the beginning and then merge them all together.
And this is a parallel fold, which is really a rather different beast than a sequential fold.
(Note that neither fold nor aggregate are strictly generalizations of reduce even for unordered collections where operations have to be associative and commutative, as some operations do not have a sensible zero! For instance, reducing strings by shortest length has as its "zero" the longest possible string, which conceptually doesn't exist, and practically is an absurd waste of memory.)
fold requires an associative reduce operation plus either a zero value or a reduce operation that's commutative plus a fixed-point value
Now, when would you ever use a parallel fold that wasn't just a reduceOrElse(zero)? Probably never, actually, though they can exist. For example, if you have a ring, you often have fixed points of the type we need. For instance, 10 % 45 == (10*10) % 45, and * is both associative and commutative in integers mod 45. Thus, if our collection is numbers mod 45, we can fold with a "zero" of 10 and an operation of *, and parallelize however we please while still getting the same result. Pretty weird.
However, note that you can just plug the zero and operation of fold into aggregate and get exactly the same result, so aggregate is a proper generalization of fold.
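Sketched with Spark's fold and aggregate (assuming sc is a SparkContext), using 10 as the fixed point of multiplication mod 45:

val nums = sc.parallelize(Seq(2L, 3L, 7L, 11L))
val op = (a: Long, b: Long) => (a * b) % 45
// 10 is not a true zero, but (10 * 10) % 45 == 10, so however many copies of it
// the partitioning introduces, they all collapse into one.
val viaFold      = nums.fold(10L)(op)
val viaAggregate = nums.aggregate(10L)(op, op)   // same zero and op, same result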
So, bottom line:
Reduce requires only an associative merge operation, but doesn't change the type, and doesn't work on empty collections.
Parallel fold tries to extend reduce but requires either a true zero, or a fixed point together with a commutative merge operation.
Aggregate changes the type by (conceptually) running sequential folds followed by a (parallel) reduce, but there are complex relationships between the reduce operation and the fold operation--basically they have to be doing "the same thing".
An unordered collection (e.g. a set) always requires an associative and commutative operation for any of the above.
With regard to the byKey stuff: it's just the same as this, except it only applies it to the collection of values associated with a (potentially repeated) key.
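For instance, the collection-building aggregate from above becomes a per-key version (again a sketch, assuming sc is a SparkContext):

val pairs = sc.parallelize(Seq("a" -> 1, "b" -> 2, "a" -> 3))
// Same zero / seqOp / combOp as before, but applied separately to the values of each key.
val perKey = pairs.aggregateByKey(Vector.empty[Int])(
  (acc, v) => acc :+ v,
  (xs, ys) => xs ++ ys
)
// perKey.collect() gives something like Array(("a", Vector(1, 3)), ("b", Vector(2)))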
If Spark actually requires commutativity where the above analysis does not suggest it's needed, one could reasonably consider that a bug (or at least an unnecessary limitation of the implementation, given that operations like map and filter preserve order on ordered RDDs).
the function passed to fold only has to be associative, while the one for reduce has to be commutative additionally.
This is not correct. fold on RDDs requires the function to be commutative as well. It is not the same operation as fold on Iterable, which is described pretty well in the official documentation:
This behaves somewhat differently from fold operations implemented for non-distributed
collections in functional languages like Scala.
This fold operation may be applied to
partitions individually, and then fold those results into the final result, rather than
apply the fold to each element sequentially in some defined ordering. For functions
that are not commutative, the result may differ from that of a fold applied to a
non-distributed collection.
As you can see, the order in which partial values are merged is not part of the contract, hence the function used for fold has to be commutative.
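A small sketch of why that matters (assuming sc is a SparkContext): string concatenation is associative but not commutative, so an RDD fold with it is not guaranteed to give a stable result:

val letters = sc.parallelize(Seq("a", "b", "c", "d"), 4)
// Each partition folds its own elements, and the per-partition results are then
// merged in whatever order the tasks happen to finish, so this may or may not be "abcd".
val folded = letters.fold("")(_ + _)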
I read that the more generalised methods can be more efficient
Technically speaking there should be no significant difference. For fold vs reduce you can check my answers to reduce() vs. fold() in Apache Spark and Why is the fold action necessary in Spark?
Regarding the *byKey methods: all of them are implemented using the same basic construct, combineByKeyWithClassTag, and can be reduced to three simple operations (a sketch follows the list):
createCombiner - create "zero" value for a given partition
mergeValue - merge values into accumulator
mergeCombiners - merge accumulators created for each partition.
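A classic sketch is computing per-key averages, where the accumulator type (sum, count) differs from the value type (assuming sc is a SparkContext):

val scores = sc.parallelize(Seq("a" -> 1.0, "a" -> 3.0, "b" -> 2.0))
val sumCount = scores.combineByKey(
  (v: Double) => (v, 1L),                                              // createCombiner
  (acc: (Double, Long), v: Double) => (acc._1 + v, acc._2 + 1L),       // mergeValue
  (x: (Double, Long), y: (Double, Long)) => (x._1 + y._1, x._2 + y._2) // mergeCombiners
)
val averages = sumCount.mapValues { case (sum, n) => sum / n }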
Is it safe to say that the following dichotomy holds:
Each given function is
either pure
or has side effects
If so, side effects (of a function) are anything that can't be found in a pure function.
This very much depends on the definitions that you choose. It is definitely fair to say that a function is pure or impure. A pure function always returns the same result and does not modify the environment. An impure function can return different results when it is executed repeatedly (which can be caused by doing something to the environment).
Are all impurities side-effects? I would not say so; a function can depend on something in the environment in which it executes. This could be reading some configuration, the GPS location, or data from the internet. These are not really "side-effects", because the function does not do anything to the world.
I think there are two different kinds of impurities:
Output impurity is when a function does something to the world. In Haskell, this is modelled using monads - an impure function a -> b is actually a function a -> M b where M captures the other things that it does to the world.
Input impurity is when a function requires something from the environment. An impure function a -> b can be modelled as a function C a -> b where the type C captures other things from the environment that the function may access.
Monads and output impurities are certainly better known, but I think input impurities are equally important. I wrote my PhD thesis about input impurities, which I call coeffects, so this might be a biased answer though.
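To make the two shapes concrete, here is a small Scala sketch (Option standing in for an output effect M, and an explicit environment parameter standing in for the context C; purely illustrative):

// Output impurity, a -> M b: the effect (possible failure) shows up in the return type.
def parseIntChecked(s: String): Option[Int] = s.toIntOption

// Input impurity, C a -> b: the dependence on the environment shows up in the input type.
final case class Env(language: String)
def greet(env: Env, name: String): String =
  if (env.language == "fr") s"Bonjour, $name" else s"Hello, $name"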
For a function to be pure it must:
not be affected by side-effects (i.e. always return same value for same arguments)
not cause side-effects
But, you see, this defines functional purity via the presence or absence of side effects. You are trying to apply the logic backwards to deduce the definition of side effects from pure functions, which logically should work, but practically the definition of a side effect has nothing to do with functional purity.
I don't see problems with the definition of a pure function: a pure function is a function. I.e. it has a domain, a codomain and maps the elements of the former to the elements of the latter. It's defined on all inputs. It doesn't do anything to the environment, because "the environment" at this point doesn't exist: there are no machines that can execute (for some definition of "execute") a given function. There is just a total mapping from something to something.
Then some capitalist decides to invade the world of well-defined functions and enslave such pure creatures, but their fair essence can't survive in our cruel reality, functions become dirty and start to make the CPU warmer.
So it's the environment that is responsible for making the CPU warmer, and it makes perfect sense to talk about purity before its owner was abused and executed. In the same way referential transparency is a property of a language — it doesn't hold in the environment in general, because there can be a bug in the compiler, or a meteorite can fall upon your head and your program will stop producing the same result.
But there are other creatures: the dark inhabitants of the underworld. They look like functions, but they are aware of the environment and can interact with it: read variables, send messages and launch missiles. We call these fallen relatives of functions "impure" or "effectful" and avoid them as much as possible, because their nature is so dark that it's impossible to reason about them.
So there is clearly a big difference between those functions which can interact with the outside and those which can't. However, the definition of "outside" can vary too. The State monad is modelled using only pure tools, but we think about f : Int -> State Int Int as an effectful computation. Moreover, non-termination and exceptions (error "...") are effects, but Haskellers usually don't consider them so.
Summarizing, a pure function is a well-defined mathematical concept, but we usually consider functions in programming languages and what is pure there depends on your point of view, so it doesn't make much sense to talk about dichotomies when involved concepts are not well-defined.
A way to define purity of a function f is ∀x∀y x = y ⇒ f x = f y, i.e. given the same argument the function returns the same result, or it preserves equality.
This isn't what people usually mean when they talk about "pure functions"; they usually mean "pure" as "does not have side effects". I haven't figured out how to qualify a "side effect" (comments welcome!) so I don't have anything to say about it.
Nonetheless, I'll explore this concept of purity because it might offer some related insight. I'm no expert here; this is mostly me just rambling. I do however hope it sparks some insightful (and corrective!) comments.
To understand purity we have to know what equality we are talking about. What does x = y mean, and what does f x = f y mean?
One choice is the Haskell semantic equality. That is, equality of the semantics Haskell assigns to its terms. As far as I know there are no official denotational semantics for Haskell, but Wikibooks Haskell Denotational Semantics offers a reasonable standard that I think the community more or less agrees to ad-hoc. When Haskell says its functions are pure this is the equality it refers to.
Another choice is a user-defined equality (i.e. (==)) by deriving the Eq class. This is relevant when using denotational design — that is, we are assigning our own semantics to terms. With this choice we can accidentally write functions which are impure; Haskell is not concerned with our semantics.
I will refer to the Haskell semantic equality as = and the user-defined equality as ==. Also I assume that == is an equality relation — this does not hold for some instances of == such as for Float.
When I use x == y as a proposition what I really mean is x == y = True ∨ x == y = ⊥, because x == y :: Bool and ⊥ :: Bool. In other words, when I say x == y is true, I mean that if it computes to something other than ⊥ then it computes to True.
If x and y are equal according to Haskell's semantics then they are equal according to any other semantic we may choose.
Proof: if x = y then x == y ≡ x == x and x == x is true because == is pure (according to =) and reflexive.
Similarly we can prove ∀f∀x∀y x = y ⇒ f x == f y. If x = y then f x = f y (because f is pure), therefore f x == f y ≡ f x == f x and f x == f x is true because == is pure and reflexive.
Here is a silly example of how we can break purity for a user-defined equality.
data Pair a = Pair a a

instance (Eq a) => Eq (Pair a) where
  Pair x _ == Pair y _ = x == y

swap :: Pair a -> Pair a
swap (Pair x y) = Pair y x
Now we have:
Pair 0 1 == Pair 0 2
But:
swap (Pair 0 1) /= swap (Pair 0 2)
Note: ¬(Pair 0 1 = Pair 0 2) so we were not guaranteed that our definition of (==) would be okay.
A more compelling example is to consider Data.Set. If x, y, z :: Set A then you would hope this holds, for example:
x == y ⇒ (Set.union z) x == (Set.union z) y
Especially when Set.fromList [1,2,3] and Set.fromList [3,2,1] denote the same set but probably have different (hidden) representations (not equivalent by Haskell's semantics). That is to say we want to be sure that ∀z Set.union z is pure according to (==) for Set.
Here is a type I have played with:
import Data.List (group)

newtype Spry a = Spry [a]

instance (Eq a) => Eq (Spry a) where
  Spry xs == Spry ys = fmap head (group xs) == fmap head (group ys)
A Spry is a list which has non-equal adjacent elements. Examples:
Spry [] == Spry []
Spry [1,1] == Spry [1]
Spry [1,2,2,2,1,1,2] == Spry [1,2,1,2]
Given this, what is a pure implementation (according to == for Spry) for flatten :: Spry (Spry a) -> Spry a such that if x is an element of a sub-spry it is also an element of the flattened spry (i.e. something like ∀x∀xs∀i x ∈ xs[i] ⇒ x ∈ flatten xs)? Exercise for the reader.
It is also worth noting that the functions we've been talking about are across the same domain, so they have type A → A. That is except for when we proved ∀f∀x∀y x = y ⇒ f x == f y which crosses from Haskell's semantic domain to our own. This might be a homomorphism of some sorts… maybe a category theorist could weigh in here (and please do!).
Side effects are part of the definition of the language. In the expression
f e
the side effects of e are all the parts of e's behavior that are 'moved out' and become part of the behavior of the application expression, rather than being passed into f as part of the value of e.
For a concrete example, consider this program:
f x = x; x
f (print 3)
where conceptually the syntax x; x means 'run x, then run it again and return the result'.
In a language where print writes to stdout as a side effect, this writes
3
because the output is part of the semantics of the application expression.
In a language where the output of print is not a side effect, this writes
3
3
because the output is part of the semantics of the x variable inside the definition of f.
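A rough Scala analogue (not exactly the distinction above, since Scala fixes its evaluation rules up front, but it shows the two observable behaviours): with a by-value parameter the printing happens once, while evaluating the argument at the call site; with a by-name parameter it happens at each use of x inside f.

// By-value: println(3) runs once, before the body of f executes; prints "3" once.
def fByValue(x: Unit): Unit = { x; x }
// By-name: the unevaluated expression is passed in and runs at each use of x; prints "3" twice.
def fByName(x: => Unit): Unit = { x; x }

fByValue(println(3))
fByName(println(3))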
The following Scala code completes in 1.5 minutes, while the equivalent code in Go finishes in 2.5 minutes.
Up to fib(40) both take 2 sec. The gap appears at fib(50).
I got the impression that Go, being native, should be faster than Scala.
Scala
def fib(n: Int): Long = {
  n match {
    case 0 => 0
    case 1 => 1
    case _ => fib(n - 1) + fib(n - 2)
  }
}
Go
func fib(n int) (ret int) {
    if n > 1 {
        return fib(n-1) + fib(n-2)
    }
    return n
}
Scala optimization?
Golang limitation?
As "My other car is a cadr" said the question is "how come Scala is faster than GO in this particular microbenchmark?"
Forget the Fibonacci lets say I do have a function that require recursion.
Is Scala superior in recursion situations?
Its probably an internal compiler implementation or even Scala specific optimization.
Please answer just if you know.
Go in loop run 15000000000 in 12 sec
func fib(n int) (two int) {
    one := 0
    two = 1
    for i := 1; i != n; i++ {
        one, two = two, (one + two)
    }
    return
}
For Go, use iteration, not recursion. Recursion can be replaced by iteration with an explicit stack; iteration avoids the overhead of function calls and call stack management. For example, using iteration and increasing n from 50 to 1000 takes almost no time:
package main

import "fmt"

func fib(n int) (f int64) {
    if n < 0 {
        n = 0
    }
    a, b := int64(0), int64(1)
    for i := 0; i < n; i++ {
        f = a
        a, b = b, a+b
    }
    return
}

func main() {
    n := 1000
    fmt.Println(n, fib(n))
}
Output:
$ time ./fib
1000 8261794739546030242
real 0m0.001s
user 0m0.000s
sys 0m0.000s
Use appropriate algorithms. Avoid exponential time complexity. Don't use recursion for Fibonacci numbers when performance is important.
Reference:
Recursive Algorithms in Computer Science Courses: Fibonacci Numbers and Binomial Coefficients
We observe that the computational inefficiency of branched recursive
functions was not appropriately covered in almost all textbooks for
computer science courses in the first three years of the curriculum.
Fibonacci numbers and binomial coefficients were frequently used as
examples of branched recursive functions. However, their exponential
time complexity was rarely claimed and never completely proved in the
textbooks. Alternative linear time iterative solutions were rarely
mentioned. We give very simple proofs that these recursive functions
have exponential time complexity.
Recursion is an efficient technique for definitions and algorithms
that make only one recursive call, but can be extremely inefficient if
it makes two or more recursive calls. Thus the recursive approach is
frequently more useful as a conceptual tool rather than as an
efficient computational tool. The proofs presented in this paper were
successfully taught (over a five-year period) to first year students
at the University of Ottawa. It is suggested that recursion as a
problem solving and defining tool be covered in the second part of the
first computer science course. However, recursive programming should
be postponed for the end of the course (or perhaps better at the
beginning of the second computer science course), after iterative
programs are well mastered and stack operation well understood.
The Scala solution will consume stack, since it's not tail recursive (the addition happens after the recursive call), but it shouldn't be creating any garbage at all.
Most likely whichever Hotspot compiler you're using (probably server) is just a better compiler, for this code pattern, than the Go compiler.
If you're really curious, you can download a debug build of the JVM, and have it print out the assembly code.
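For what it's worth, if recursion itself is the point rather than the benchmark, Scala can also express the linear-time version as a tail-recursive function; scalac rewrites a self-recursive tail call into a loop, so no stack is consumed (a sketch):

import scala.annotation.tailrec

def fib(n: Int): Long = {
  @tailrec
  def go(i: Int, a: Long, b: Long): Long =
    if (i == 0) a else go(i - 1, b, a + b)
  go(n, 0L, 1L)
}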
I'm interested in multi-level data integrity checking and correcting, where multiple error-correcting codes are being used (they can be two of the same type of code). I'm under the impression that a system using two codes would achieve maximum effectiveness if the two hash codes being used were orthogonal to each other.
Is there a list of which codes are orthogonal to what? Or do you need to use the same hashing function but with different parameters or usage?
I expect that the first-level ECC will be a Reed-Solomon code, though I do not actually have control over this first function, hence I cannot use a single code with improved capabilities.
Note that I'm not concerned with encryption security.
Edit: This is not a duplicate of
When are hash functions orthogonal to each other?, because that question essentially asks what the definition of orthogonal hash functions is. I want examples of hash functions that are orthogonal.
I'm not certain it is even possible to enumerate all orthogonal hash functions. However, you only asked for some examples, so I will endeavour to provide some as well as some intuition as to what properties seem to lead to orthogonal hash functions.
From a related question, these two functions are orthogonal to each other:
Domain: Reals --> Codomain: Reals
f(x) = x + 1
g(x) = x + 2
This is a pretty obvious case. It is easier to determine orthogonality if the hash functions are (both) perfect hash functions such as these are. Please note that the term "perfect" is meant in the mathematical sense, not in the sense that these should ever be used as hash functions.
It is a more or less trivial case for perfect hash functions to satisfy orthogonality requirements. Whenever the functions are injective they are perfect hash functions and are thus orthogonal. Similar examples:
Domain: Integers --> Codomain: Integers
f(x) = 2x
g(x) = 3x
In the previous case, each function is injective but not bijective: every element in the domain maps to a unique element in the codomain, but there are many elements in the codomain that are not mapped to at all. These are still adequate for both perfect hashing and orthogonality. (Note that if the Domain/Codomain were Reals, these would be bijections.)
Functions that are not injective are more tricky to analyze. However, it is always the case that if one function is injective and the other is not, they are not orthogonal:
Domain: Reals --> Codomain: Reals
f(x) = e^x // Injective -- every x produces a unique value
g(x) = x^2 // Not injective -- every number other than 0 can be produced by two different x's
So one trick is thus to know that one function is injective and the other is not. But what if neither is injective? I do not presently know of an algorithm for the general case that will determine this other than brute force.
Domain: Naturals --> Codomain: Naturals
j(x) = ceil(sqrt(x))
k(x) = ceil(x / 2)
Neither function is injective, in this case because of the presence of two obvious non-injective functions: ceil and abs combined with a restricted domain. (In practice most hash functions will not have a domain more permissive than integers.) Testing out values will show that j will have non-unique results when k will not and vice versa:
j(1) = ceil(sqrt(1)) = ceil(1) = 1
j(2) = ceil(sqrt(2)) = ceil(~1.41) = 2
k(1) = ceil(1 / 2) = ceil(0.5) = 1
k(2) = ceil(2 / 2) = ceil(1) = 1
But what about these functions?
Domain: Integers --> Codomain: Reals
m(x) = cos(x^3) % 117
n(x) = ceil(e^x)
In these cases, neither of the functions are injective (due to the modulus and the ceil) but when do they have a collision? More importantly, for what tuples of values of x do they both have a collision? These questions are hard to answer. I would suspect they are not orthogonal, but without a specific counterexample, I'm not sure I could prove that.
These are not the only hash functions you could encounter, of course. So the trick to determining orthogonality is first to see if they are both injective. If so, they are orthogonal. Second, see if exactly one is injective. If so, they are not orthogonal. Third, see if you can see the pieces of the function that are causing them to not be injective, see if you can determine its period or special cases (such as x=0) and try to come up with counter-examples. Fourth, visit math-stack-exchange and hope someone can tell you where they break orthogonality, or prove that they don't.