Why doesn't "((left union right) union other)" behave associatively? - scala

The code in the following gist is lifted almost verbatim out of a lecture in Martin Odersky's Functional Programming Principles in Scala course on Coursera:
The issue occurs in line 38, within the definition of union in class NonEmpty:
def union(other: IntSet): IntSet =
  // The following expression doesn't behave associatively
  ((left union right) union other) incl elem
With the expression as given, ((left union right) union other), largeSet.union(Empty) takes an inordinate amount of time to complete for sets of 100 elements or more.
When that expression is changed to (left union (right union other)), the union operation finishes almost instantly.
ADDED: Here's an updated worksheet that shows that, even with larger sets/trees of random elements, the expression ((left ∪ right) ∪ other) can take forever while (left ∪ (right ∪ other)) finishes instantly.
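To make the question concrete, here is a minimal self-contained sketch modeled on the lecture's IntSet binary search tree. The method names unionSlow and unionFast and the unionCalls counter are hypothetical additions (the lecture code has a single union method); they exist only to count how often union is invoked under each grouping. One difference is visible directly in the code: in unionSlow, the tree returned by (left unionSlow right) becomes the receiver of the outer call and is taken apart again, while in unionFast only the original subtrees are ever receivers.

```scala
// Minimal sketch modeled on the lecture's IntSet binary search tree.
// unionSlow/unionFast and the unionCalls counter are hypothetical additions.
object UnionDemo {
  var unionCalls = 0L // counts every union invocation (instrumentation only)

  abstract class IntSet {
    def incl(x: Int): IntSet
    def unionSlow(other: IntSet): IntSet // ((left union right) union other) incl elem
    def unionFast(other: IntSet): IntSet // (left union (right union other)) incl elem
    def toList: List[Int]                // in-order traversal, so sorted
  }

  object Empty extends IntSet {
    def incl(x: Int): IntSet = new NonEmpty(x, Empty, Empty)
    def unionSlow(other: IntSet): IntSet = { unionCalls += 1; other }
    def unionFast(other: IntSet): IntSet = { unionCalls += 1; other }
    def toList: List[Int] = Nil
  }

  class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
    def incl(x: Int): IntSet =
      if (x < elem) new NonEmpty(elem, left incl x, right)
      else if (x > elem) new NonEmpty(elem, left, right incl x)
      else this

    // The tree built by (left unionSlow right) is taken apart again by the outer call.
    def unionSlow(other: IntSet): IntSet = {
      unionCalls += 1
      ((left unionSlow right) unionSlow other) incl elem
    }

    // Only the original subtrees are ever decomposed; `other` is just passed along.
    def unionFast(other: IntSet): IntSet = {
      unionCalls += 1
      (left unionFast (right unionFast other)) incl elem
    }

    def toList: List[Int] = left.toList ++ (elem :: right.toList)
  }

  // Runs one union and returns (resulting elements, number of union calls).
  def measure(run: => IntSet): (List[Int], Long) = {
    unionCalls = 0L
    val result = run
    (result.toList, unionCalls)
  }

  def main(args: Array[String]): Unit = {
    // Insertion order chosen to build a balanced 15-element tree.
    val xs = List(8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15)
    val set = xs.foldLeft(Empty: IntSet)(_ incl _)
    val (slowElems, slowCalls) = measure(set unionSlow Empty)
    val (fastElems, fastCalls) = measure(set unionFast Empty)
    println(s"same elements: ${slowElems == fastElems}")
    println(s"unionSlow calls: $slowCalls, unionFast calls: $fastCalls")
  }
}
```

Both versions return the same sorted elements. With this 15-element tree, unionFast makes 31 calls (one per node plus one per empty leaf), while unionSlow makes more; in this sketch the gap grows quickly as the element count increases.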

The answer to your question is closely connected to relational databases and the smart choices they make. When a database "unions" tables, a smart query optimizer will make decisions such as: "How large is Table A? Would it make more sense to join A & B first, or A & C?" when the user writes:
A Join B Join C
Anyhow, you can't expect the same behavior when you write the code by hand, because you have specified exactly the order you want using parentheses. None of those smart decisions can happen automatically (though in theory they could, and that's why Oracle, Teradata, and MySQL exist).
Consider a ridiculously large example:
Set A - 1 Billion Records
Set B - 500 Million Records
Set C - 10 Records
For argument's sake, assume that combining two sets takes O(N) comparisons, where N is the size of the SMALLER of the two sets, and that (as with a join) the result is no larger than that smaller set. This is reasonable, since each key of the smaller set can be looked up in the larger one as a hashed retrieval:
A & B runtime = O(N) = 500 million
(let's assume the class is just smart enough to use the smaller of the two for lookups)
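The hashed-retrieval idea above, probing each key of the smaller set against the larger one, can be sketched in Scala as follows. It assumes join/intersection semantics (keep the keys present in both sets) and counts one probe per key of the smaller set. The names intersectCounting and HashedLookup are hypothetical; Scala's default immutable Set is a hash set once it holds more than four elements, so `contains` on the large set is a hashed lookup.

```scala
// Sketch of the assumed cost model: probe each key of the SMALLER set against
// the larger one. Join/intersection semantics are assumed; names are hypothetical.
object HashedLookup {
  // Returns the keys present in both sets and the number of hash probes performed.
  def intersectCounting(a: Set[Int], b: Set[Int]): (Set[Int], Int) = {
    val (small, large) = if (a.size <= b.size) (a, b) else (b, a)
    var probes = 0
    val common = small.filter { k =>
      probes += 1 // one hashed retrieval per key of the smaller set
      large.contains(k)
    }
    (common, probes)
  }

  def main(args: Array[String]): Unit = {
    val big = (1 to 100000).toSet
    val tiny = Set(5, 50, 500000)
    val (common, probes) = intersectCounting(big, tiny)
    println(s"common = $common, probes = $probes")
  }
}
```

However large `big` is, the number of probes equals the size of the smaller input (3 here).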
(A & B) & C
Results in:
O(N) 500 million + O(N) 10 = 500,000,010 comparisons
This is because the inner parentheses force it to compare the 1 billion records against the 500 million records FIRST, and only then pull in the 10 more.
But consider this:
A & (B & C)
Well now something amazing happens:
(B & C) runtime O(N) = 10 record comparisons (each of the 10 C records is checked against B for existence)
A & (result) = O(N) = 10
Total = 20 comparisons
Notice that once (B & C) was completed, we only had to bump 10 records against 1 billion!
Both expressions produce exactly the same result, one in 20 comparisons, the other in 500,000,010!
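The arithmetic of the two scenarios can be reproduced with a tiny cost model. This Scala sketch encodes only the answer's own assumptions: combining x and y costs min(|x|, |y|) comparisons and, as with a join or intersection, yields a result no larger than min(|x|, |y|). The names GroupingCost, Cost, combine, and input are hypothetical, and only set sizes are modeled, not actual records.

```scala
// Back-of-the-envelope model of the two groupings under the stated assumptions.
object GroupingCost {
  final case class Cost(resultSize: Long, comparisons: Long)

  // A raw input set: known size, no work done yet.
  def input(size: Long): Cost = Cost(size, 0L)

  // Combine two already-computed operands: pay min(|x|, |y|) comparisons,
  // and the result is assumed to be no larger than the smaller operand.
  def combine(x: Cost, y: Cost): Cost = {
    val n = math.min(x.resultSize, y.resultSize)
    Cost(resultSize = n, comparisons = x.comparisons + y.comparisons + n)
  }

  def main(args: Array[String]): Unit = {
    val a = input(1000000000L) // Set A: 1 billion records
    val b = input(500000000L)  // Set B: 500 million records
    val c = input(10L)         // Set C: 10 records
    val leftFirst = combine(combine(a, b), c)  // (A & B) & C
    val rightFirst = combine(a, combine(b, c)) // A & (B & C)
    println(s"(A & B) & C: ${leftFirst.comparisons} comparisons")  // 500000010
    println(s"A & (B & C): ${rightFirst.comparisons} comparisons") // 20
  }
}
```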
To summarize, this problem illustrates in a small way some of the complex thinking that goes into database design and the smart optimization that happens in that software. These things do not always happen automatically in programming languages unless you've coded them that way or use a library of some sort. You could, for example, write a function that takes several sets and intelligently decides the union order. But the issue becomes unbelievably complex if other set operations have to be mixed in. Hope this helps.
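One possible shape for the function suggested above, sketched under simple assumptions: the inputs are Scala immutable Sets, and the hypothetical helper unionAll starts from the largest set and adds the others into it, so the largest set is the one that receives elements rather than being re-inserted. How much work `++` actually does depends on the Set implementation. This only chooses an order for a single kind of operation (union); it does not attempt the harder problem of mixing in other set operations.

```scala
// Hypothetical helper: union several immutable sets, starting from the largest
// one and folding the smaller sets into it.
object SmartUnion {
  def unionAll[A](sets: Seq[Set[A]]): Set[A] =
    if (sets.isEmpty) Set.empty[A]
    else {
      val bySizeDesc = sets.sortBy(s => -s.size)          // largest set first
      bySizeDesc.tail.foldLeft(bySizeDesc.head)(_ ++ _)   // add smaller sets into it
    }

  def main(args: Array[String]): Unit = {
    val a = (1 to 100000).toSet
    val b = (50001 to 150000).toSet
    val c = Set(0, 1, 200000)
    val u = unionAll(Seq(c, b, a)) // order of the arguments does not matter
    println(u.size)                // 150002
  }
}
```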

Associativity is not about performance. Two expressions may be equivalent by associativity but one may be vastly harder than the other to actually compute:
(23 * (14/2)) * (1/7)
Is the same as
23 * ((14/2) * (1/7))
If I were evaluating the two by hand, I'd reach the answer (23) in a jiffy with the second one, but it would take longer if I forced myself to work through the first one.
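The point can be checked mechanically: both groupings evaluate to exactly 23. The sketch below uses a minimal hypothetical Rational type instead of Int, because Scala's integer division would turn 1/7 into 0 (and Double could introduce rounding). Regrouping, Rational, and r are illustrative names, not library types.

```scala
// Exact-fraction check that regrouping the product does not change its value.
object Regrouping {
  // Minimal exact fraction; values built via Rational.of are in lowest terms
  // with a positive denominator, so equal values compare equal.
  final case class Rational(num: BigInt, den: BigInt) {
    require(den.signum != 0, "denominator must be non-zero")
    def *(that: Rational): Rational = Rational.of(num * that.num, den * that.den)
  }

  object Rational {
    def of(n: BigInt, d: BigInt): Rational = {
      require(d.signum != 0, "denominator must be non-zero")
      val g = n.gcd(d) // positive because d != 0
      val sign = if (d.signum < 0) BigInt(-1) else BigInt(1)
      Rational(sign * n / g, sign * d / g)
    }
  }

  def r(n: Int, d: Int = 1): Rational = Rational.of(n, d)

  def main(args: Array[String]): Unit = {
    val first = (r(23) * r(14, 2)) * r(1, 7)  // (23 * (14/2)) * (1/7)
    val second = r(23) * (r(14, 2) * r(1, 7)) // 23 * ((14/2) * (1/7))
    println(first == second) // true
    println(first)           // Rational(23,1)
  }
}
```

The first grouping passes through the intermediate value 161/7 before reducing, while the second collapses (14/2) * (1/7) to 1 right away: same value, different amount of work.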


