Binary Tree Sets - Recursive Union - scala

I am taking the Scala course on Coursera, and I ran into something I cannot understand in a solution to the week 3 assignment on binary tree sets.
I was asked to write a recursive union function, and the following is a working solution:
class Empty extends TweetSet:
  override def union(that: TweetSet): TweetSet = that
  def incl(tweet: Tweet): TweetSet = NonEmpty(tweet, new Empty, new Empty)
class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet:
  override def union(that: TweetSet): TweetSet =
    left.union(right.union(that)).incl(elem)
  def incl(x: Tweet): TweetSet =
    if x.text < elem.text then
      NonEmpty(elem, left.incl(x), right)
    else if elem.text < x.text then
      NonEmpty(elem, left, right.incl(x))
    else
      this
Initially I tried the following code for union function and it didn't terminate:
override def union(that: TweetSet): TweetSet =
  that.union(left.union(right)).incl(elem)
My question is: why?
I have seen a few examples and questions about this in previous threads, and I did manage to understand the first solution by stepping through cases. But why does swapping left or right with 'that' make the program not terminate? (To clarify: swapping left and right with each other changes nothing; only swapping with 'that' does.)
Thanks!
EDIT: The inputs are of reasonable size, of course.
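One way to investigate the difference is to count union calls for both variants. The sketch below is only an illustration under assumptions: it uses Int elements instead of Tweet, and the names UnionTrace, unionAlt, build, and countCalls are hypothetical. In the working variant every receiver of union is a subtree of the original set, so each node is visited exactly once. In the other variant, 'that' and freshly built unions become receivers, so sets that were just constructed are taken apart again; on bushy trees this may produce far more calls, which could look like non-termination.

```scala
object UnionTrace {
  var calls = 0 // number of union invocations; reset by countCalls

  abstract class IntSet {
    def incl(x: Int): IntSet
    def union(that: IntSet): IntSet    // the working variant
    def unionAlt(that: IntSet): IntSet // the variant from the question
    def toList: List[Int]
  }

  class Empty extends IntSet {
    def incl(x: Int): IntSet = new NonEmpty(x, new Empty, new Empty)
    def union(that: IntSet): IntSet = { calls += 1; that }
    def unionAlt(that: IntSet): IntSet = { calls += 1; that }
    def toList: List[Int] = Nil
  }

  class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
    def incl(x: Int): IntSet =
      if (x < elem) new NonEmpty(elem, left.incl(x), right)
      else if (elem < x) new NonEmpty(elem, left, right.incl(x))
      else this

    // Receivers are always subtrees of `this`.
    def union(that: IntSet): IntSet = {
      calls += 1
      left.union(right.union(that)).incl(elem)
    }

    // Here `that` becomes the receiver of the outer call.
    def unionAlt(that: IntSet): IntSet = {
      calls += 1
      that.unionAlt(left.unionAlt(right)).incl(elem)
    }

    def toList: List[Int] = left.toList ::: (elem :: right.toList)
  }

  // Build a set by inserting the elements in the given order.
  def build(xs: Seq[Int]): IntSet =
    xs.foldLeft[IntSet](new Empty)((s, x) => s.incl(x))

  // Evaluate f and report how many union calls it triggered.
  def countCalls(f: => IntSet): (IntSet, Int) = {
    calls = 0
    val result = f
    (result, calls)
  }
}
```

Calling UnionTrace.countCalls(a.union(b)) and UnionTrace.countCalls(a.unionAlt(b)) on sets of growing size shows how the two call counts develop for a given insertion order.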

Related

Why is the return type of a method, a class name in the below code of scala?

abstract class IntSet {
  def contains(x: Int): Boolean
  def incl(x: Int): IntSet
}
I'm learning Scala. Why is the return type of method incl a class name which is IntSet? What does this mean?
Logic and functional programming languages are usually declarative (this is not strictly enforced, but in most cases a declarative style is advised). This means you do not alter a data structure: you construct a new one that is a modified version of the original. So in a declarative language, when you have an IntSet and you add an element x, you do not add it to that IntSet; you construct a new IntSet that has all the elements of the original IntSet plus x.
So the reason incl returns an IntSet is that the original IntSet is not modified; incl constructs a new one.
Of course, constructing a new IntSet usually does not mean constructing a completely new object. Since the original IntSet is not supposed to change, the new set can reuse substructures of the original. For this reason declarative languages usually have their own dedicated data structures, for instance the finger tree.
Take for instance this implementation of an IntSet. What we see is:
object Empty extends IntSet {
  override def toString = "."
  def contains(x: Int): Boolean = false
  def include(x: Int): IntSet = new NonEmpty(x, Empty, Empty)
  def union(other: IntSet): IntSet = other
}
So when we use the Empty IntSet object, and we include an element x, instead of altering the object, we construct a new IntSet: NonEmpty(x,Empty,Empty). This IntSet contains x.
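The reuse of substructures can be checked directly. The following is a minimal sketch, assuming a simplified IntSet with public fields; SharingDemo and rightOf are hypothetical names introduced only for this illustration. Inserting a smaller element rebuilds the path to the left, while the right subtree of the new set is the very same object as in the old set.

```scala
object SharingDemo {
  abstract class IntSet {
    def contains(x: Int): Boolean
    def incl(x: Int): IntSet
  }

  object Empty extends IntSet {
    def contains(x: Int): Boolean = false
    def incl(x: Int): IntSet = new NonEmpty(x, Empty, Empty)
  }

  class NonEmpty(val elem: Int, val left: IntSet, val right: IntSet) extends IntSet {
    def contains(x: Int): Boolean =
      if (x < elem) left.contains(x)
      else if (x > elem) right.contains(x)
      else true

    // Only the nodes on the path to the insertion point are rebuilt;
    // the untouched subtree is passed along by reference.
    def incl(x: Int): IntSet =
      if (x < elem) new NonEmpty(elem, left.incl(x), right)
      else if (x > elem) new NonEmpty(elem, left, right.incl(x))
      else this
  }

  // Hypothetical helper: the right subtree of a non-empty set, if any.
  def rightOf(s: IntSet): Option[IntSet] = s match {
    case n: NonEmpty => Some(n.right)
    case _           => None
  }
}
```

With s1 = Empty.incl(5).incl(8) and s2 = s1.incl(3), s1 still does not contain 3, and rightOf(s1).get eq rightOf(s2).get holds.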

Tracing recursive calls in OOP

I am somewhat comfortable with tracing and understanding recursive calls in functional languages like JavaScript and Haskell. I recently started a course in Scala, and at present the course relies heavily on recursion.
Here is a simple example:
abstract class IntSet {
  def incl(x: Int): IntSet
  def contains(x: Int): Boolean
  def union(other: IntSet): IntSet
}
class Empty extends IntSet {
  def contains(x: Int): Boolean = false
  def incl(x: Int): IntSet = new NonEmpty(x, new Empty, new Empty)
  def union(other: IntSet): IntSet = other
}
class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
  def contains(x: Int): Boolean =
    if (x < elem) left contains x
    else if (x > elem) right contains x
    else true
  def incl(x: Int): IntSet =
    if (x < elem) new NonEmpty(elem, left incl x, right)
    else if (x > elem) new NonEmpty(elem, left, right incl x)
    else this
  def union(other: IntSet): IntSet =
    ((left union right) union other) incl (elem)
}
Although the recursion seems intuitively understandable, I am having a hard time expanding the general case, whereas the base case feels perfectly fine:
((left union right) union other) incl (elem)
This is mainly because references such as left depend on the object context, which does not exist in ordinary functional languages. How can I become comfortable working with and understanding these recursive calls?
Update
Based on the answers I think below would be the sequence of calls to expand the recursion tree.
incl(union(union(left, right), other), elem)
incl(union(incl(union(union(left, right), other), elem), other), elem)
But I think this would become too hairy very soon. Is there a pictorial alternative or model for understanding this?
Union is a tricky method to understand here, it's true. When I first saw it, it seemed peculiar that it worked at all, since it appears to be doing very little. It does help to draw out two small trees and trace through it to see how the whole thing works, but essentially it's no different from any other recursion. Most of the work is done by incl, which actually adds every element to the int set, one at a time. All union does is recursively break the problem down, over and over, until you get to an empty int set. Then, one by one, the incl method adds every element back into the set, building it up into a set that includes all the elements of the original int set and the other int set.
Odersky's course is great by the way. Finished it myself a few months ago.
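As a complement to drawing the trees by hand, the calls can also be logged with indentation. This is only a sketch: UnionTraceLog, traced, log, and the toString format are hypothetical additions for illustration, and the union body is the one from the question. Each line of the log is one union call, indented by its recursion depth.

```scala
object UnionTraceLog {
  import scala.collection.mutable.ArrayBuffer

  val log: ArrayBuffer[String] = ArrayBuffer.empty[String]
  private var depth = 0

  // Record one line per union call, indented by the current recursion depth.
  private def traced(label: String)(body: => IntSet): IntSet = {
    log += ("  " * depth) + label
    depth += 1
    val result = body
    depth -= 1
    result
  }

  abstract class IntSet {
    def incl(x: Int): IntSet
    def union(other: IntSet): IntSet
    def toList: List[Int]
  }

  class Empty extends IntSet {
    def incl(x: Int): IntSet = new NonEmpty(x, new Empty, new Empty)
    def union(other: IntSet): IntSet = traced(s"union($this, $other)")(other)
    def toList: List[Int] = Nil
    override def toString: String = "."
  }

  class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
    def incl(x: Int): IntSet =
      if (x < elem) new NonEmpty(elem, left.incl(x), right)
      else if (x > elem) new NonEmpty(elem, left, right.incl(x))
      else this

    // Same shape as the union in the question, wrapped in the tracer.
    def union(other: IntSet): IntSet =
      traced(s"union($this, $other)") {
        left.union(right).union(other).incl(elem)
      }

    def toList: List[Int] = left.toList ::: (elem :: right.toList)
    override def toString: String = s"{$left$elem$right}"
  }
}
```

After running a union on two small sets, printing UnionTraceLog.log.mkString("\n") gives a textual picture of the recursion tree.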

Scala Higher order functions in details

I am learning about higher-order functions in Scala. I am studying an example class with a method that receives a function and a value parameter and returns a value. The function is p: Tweet => Boolean, and the method implementation is shown below. I want to know where the implementation of the p function is.
class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet {
  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = {
    // Thread the accumulator through the left subtree and then the right one,
    // so that matches found on the left are not discarded.
    if (p(elem)) {
      right.filterAcc(p, left.filterAcc(p, acc.incl(elem)))
    } else {
      right.filterAcc(p, left.filterAcc(p, acc))
    }
  }
"I want to know where the implementation of the p function is."
If you go further down in the class definition, you'll see one implementation of p, in union:
def union(that: TweetSet): TweetSet = {
  this.filterAcc(elem => true, that)
}
With higher-order functions, the caller of the method is in charge of providing the implementation of the function they wish to run. Common use cases include map, flatMap, filter, etc. in Scala's collection library.
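To make this concrete, here is a small sketch in which the same method is called with three different implementations of p. Everything in it is assumed for illustration: the Tweet case class is a hypothetical stand-in for the course's class, and PredicateDemo, filterTweets, isPopular, and sample are made-up names.

```scala
object PredicateDemo {
  // Hypothetical stand-in for the course's Tweet class.
  case class Tweet(user: String, text: String, retweets: Int)

  // The method only declares the shape of p; it never implements it.
  def filterTweets(tweets: List[Tweet], p: Tweet => Boolean): List[Tweet] =
    tweets match {
      case Nil => Nil
      case head :: tail =>
        if (p(head)) head :: filterTweets(tail, p)
        else filterTweets(tail, p)
    }

  // A named method that a caller can pass as p.
  def isPopular(t: Tweet): Boolean = t.retweets > 100

  val sample: List[Tweet] =
    List(Tweet("a", "hello", 5), Tweet("b", "scala", 250))
}
```

A caller can pass the named method, as in filterTweets(sample, isPopular), or an anonymous function written at the call site, as in filterTweets(sample, t => t.text.contains("hello")) or filterTweets(sample, _ => true).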

Scala generics; Why do I get 'Type mismatch, expected: T, actual T'

I'm working through the Coursera course Functional Programming Principles in Scala. I'm taking the IntSet example from week 3 and attempting to make it generic. Generics have only been covered very briefly at this point, so I'm probably doing something obviously wrong, but it is not clear to me what. I started by making T <: Comparable, but then I found Ordered, so I'm attempting to require the values in the set to be ordered. The issue is that I get a 'Type mismatch, expected: T, actual T' error in a couple of places; I've commented those lines in the source below. That is a strange error: it found the expected type, yet that is an error? Note: this is NOT an assignment, so I am not asking anyone to take a test for me. I just wanted to go back and make the Set type more useful, and I ran into this unexpected behavior. Thank you for your help.
package week3
trait Set[T <: Ordered]
{
  def isEmpty: Boolean
  def contains(i: T): Boolean
  def include(i: T): Set[T]
  def union(that: Set[T]): Set[T]
}
/**
 * empty set
 */
class EmptySet[T <: Ordered] extends Set[T]
{
  def isEmpty = true;
  def contains(i: T): Boolean = false
  def include(i: T): Set[T] =
    new TreeSet(i, new EmptySet[T], new EmptySet[T])
  def union(that: Set[T]): Set[T] = that
  override def toString() = "{}"
}
/**
 * Immutable set
 *
 * @param value
 * @param left
 * @param right
 */
class TreeSet[T <: Ordered] (value: T, left: Set[T], right: Set[T]) extends Set[T]
{
  def isEmpty = false;
  def this(v: T) = this(v, new EmptySet[T], new EmptySet[T])
  def contains(v: T): Boolean =
  {
    if(v < value) left.contains(v) // Type mismatch, expected: T, actual T
    else if(v > value) right.contains(v) // Type mismatch, expected: T, actual T
    else true
  }
  def include(v: T): Set[T] =
  {
    if(v < value) new TreeSet(value, left.include(v), right) // Type mismatch, expected: T, actual T
    else if(v > value) new TreeSet(value, left, right.include(v)) // Type mismatch, expected: T, actual T
    else this
  }
  def union(that: Set[T]): Set[T] =
  {
    if(that.isEmpty) this
    else if(that == this) this
    else ((left union right) union that) include value
  }
  override def toString() = "{ " + left.toString + ' ' + value + ' ' + right.toString + " }"
}
Ordered takes a type parameter too. You should use it as Ordered[T] to fix this, so the bound on Set should be T <: Ordered[T]. This is not a Scala-specific problem; it is the same way Java's Comparable interface is correctly used.
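A trimmed-down sketch of the set with the corrected bound might look as follows. The names OrderedSetSketch, OrdSet, and Version are hypothetical, and union and isEmpty are left out for brevity. Note that with this bound the element type itself must extend Ordered, so a plain Int would not qualify; the example therefore uses a small wrapper class.

```scala
object OrderedSetSketch {
  trait OrdSet[T <: Ordered[T]] {
    def contains(v: T): Boolean
    def include(v: T): OrdSet[T]
  }

  class EmptySet[T <: Ordered[T]] extends OrdSet[T] {
    def contains(v: T): Boolean = false
    def include(v: T): OrdSet[T] =
      new TreeSet[T](v, new EmptySet[T], new EmptySet[T])
  }

  class TreeSet[T <: Ordered[T]](value: T, left: OrdSet[T], right: OrdSet[T])
      extends OrdSet[T] {
    // < and > now take a T, because T is known to be an Ordered[T].
    def contains(v: T): Boolean =
      if (v < value) left.contains(v)
      else if (v > value) right.contains(v)
      else true

    def include(v: T): OrdSet[T] =
      if (v < value) new TreeSet[T](value, left.include(v), right)
      else if (v > value) new TreeSet[T](value, left, right.include(v))
      else this
  }

  // Hypothetical element type that satisfies the bound.
  case class Version(n: Int) extends Ordered[Version] {
    def compare(that: Version): Int = Integer.compare(n, that.n)
  }
}
```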

Apache Spark: distinct doesnt work?

Here is my code example:
case class Person(name:String,tel:String){
  def equals(that:Person):Boolean = that.name == this.name && this.tel == that.tel
}
val persons = Array(Person("peter","139"),Person("peter","139"),Person("john","111"))
sc.parallelize(persons).distinct.collect
It returns
res34: Array[Person] = Array(Person(john,111), Person(peter,139), Person(peter,139))
Why doesn't distinct work? I want the result to be Person("john","111"), Person("peter","139").
Extending further from the observation of @aaronman, there is a workaround for this issue.
On the RDD, there are two definitions of distinct:
/**
 * Return a new RDD containing the distinct elements in this RDD.
 */
def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] =
  map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)
/**
 * Return a new RDD containing the distinct elements in this RDD.
 */
def distinct(): RDD[T] = distinct(partitions.size)
It's apparent from the signature of the first distinct that there must be an implicit ordering of the elements and it's assumed null if absent, which is what the short version .distinct() does.
There's no default implicit ordering for case classes, but it's easy to implement one:
case class Person(name:String,tel:String) extends Ordered[Person] {
  def compare(that: Person): Int = this.name compare that.name
}
Now, trying the same example delivers the expected results (note that I'm comparing names):
val ps5 = Array(Person("peter","138"),Person("peter","55"),Person("john","138"))
sc.parallelize(ps5).distinct.collect
res: Array[P5] = Array(P5(john,111), P5(peter,139))
Note that case classes already implement equals and hashCode, so the impl on the provided example is unnecessary and also incorrect. The correct signature for equals is: equals(arg0: Any): Boolean -- BTW, I first thought that the issue had to do with the incorrect equals signature, which sent me looking in the wrong path.
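The point about the equals signature can be reproduced without Spark, using the standard collections. This is a sketch with hypothetical names (EqualityDemo, LoosePerson): a plain class that only defines equals(that: LoosePerson) adds an overload and leaves equals(Any) untouched, so hash-based operations such as distinct still see two different objects, whereas a case class gets structural equals and hashCode generated.

```scala
object EqualityDemo {
  // Plain class with an overloaded equals, mirroring the signature
  // used in the question. This does NOT override equals(Any).
  class LoosePerson(val name: String, val tel: String) {
    def equals(that: LoosePerson): Boolean =
      name == that.name && tel == that.tel
  }

  // Case class: equals(Any) and hashCode are generated from the fields.
  case class Person(name: String, tel: String)

  val loose: Seq[LoosePerson] =
    Seq(new LoosePerson("peter", "139"), new LoosePerson("peter", "139"))

  val persons: Seq[Person] =
    Seq(Person("peter", "139"), Person("peter", "139"), Person("john", "111"))
}
```

Here EqualityDemo.loose.distinct keeps both elements, while EqualityDemo.persons.distinct keeps one peter and one john.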
For me the problem was related to object equality, as mentioned by Martin Odersky in Programming in Scala (chapter 30), although I have a normal class (not a case class). For a correct equality test, you must re-define (override) hashCode() if you have a custom equals(). Also you need to have a canEqual() method for 100% correctness. I haven't looked at the implementation details of an RDD, but since it is a collection, probably it uses some complex/parallel variation of a HashSet or other hash-based data structure for comparing objects and ensuring distinctness.
Declaring hashCode(), equals(), canEqual(), and compare() methods solved my problem:
override def hashCode(): Int = {
  41 * (41 + name.hashCode) + tel.hashCode
}
override def equals(other: Any) = other match {
  case other: Person =>
    (other canEqual this) &&
      (this.name == other.name) && (this.tel == other.tel)
  case _ =>
    false
}
def canEqual(other: Any) = other.isInstanceOf[Person]
def compare(that: Person): Int = {
  this.name compare that.name
}
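For context, the equality part of these methods could sit in a complete class roughly like this. It is a sketch only: the wrapper object PersonEquality is a hypothetical name, compare is omitted, and the check uses plain Scala collections rather than an RDD.

```scala
object PersonEquality {
  class Person(val name: String, val tel: String) {
    // Must agree with equals: equal persons produce equal hash codes.
    override def hashCode(): Int =
      41 * (41 + name.hashCode) + tel.hashCode

    // Overrides equals(Any), which is what hash-based collections call.
    override def equals(other: Any): Boolean = other match {
      case that: Person =>
        that.canEqual(this) && name == that.name && tel == that.tel
      case _ =>
        false
    }

    // Lets subclasses opt out of being equal to a plain Person.
    def canEqual(other: Any): Boolean = other.isInstanceOf[Person]
  }
}
```

With this definition, two separately constructed persons with the same name and tel compare equal and collapse to one element in a Set.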
As others have pointed out, this is a bug in Spark 1.0.0. My theory as to where it comes from: if you look at the diff between 0.9.0 and 1.0.0, you see
- def repartition(numPartitions: Int): RDD[T] = {
+ def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = {
And if you run
case class A(i:Int)
implicitly[Ordering[A]]
You get an error
<console>:13: error: No implicit Ordering defined for A.
implicitly[Ordering[A]]
So I think the workaround is to define an implicit ordering for the case class. Unfortunately I'm not a Scala expert, but this answer seems to do it correctly.
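One possible way to define such an ordering is to put an implicit Ordering in the companion object, where implicit search finds it automatically. This is a sketch with hypothetical names (OrderingDemo, byNameThenTel); it uses only the standard library, and whether it is enough to work around the Spark behaviour described above is not verified here.

```scala
object OrderingDemo {
  case class Person(name: String, tel: String)

  object Person {
    // Order by name first, then by tel; found via the companion's implicit scope.
    implicit val byNameThenTel: Ordering[Person] =
      Ordering.by[Person, (String, String)](p => (p.name, p.tel))
  }

  // Compiles only because an Ordering[Person] is now in implicit scope.
  val ord: Ordering[Person] = implicitly[Ordering[Person]]
}
```

With the ordering in place, Seq(Person("peter","139"), Person("john","111")).sorted compiles and puts john first.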