Spark Scala compiler not complaining about double vs. triple equals - scala

I get a compiler error if I try this
df.filter($"foo" == lit(0))
forgetting that I need a triple equals in Spark.
However, if I do this, I get the wrong answer but no error:
df.filter($"foo".between(baz, quux) || $"foo" == lit(0))
Can someone explain why compile-time checks help me in the first case, but not the second?

Because $"foo" == lit(0) is ordinary Scala equality between two objects, so it always evaluates to a Boolean (here false), not to a Column.
So in the first case, you are trying to call filter with a Boolean, whereas it expects a String expression or a Column expression. Thus you get an error.
Now in the second, case:
$"foo".between(baz, quux) || $"foo" == lit(0) is evaluated as:
(((foo >= baz) AND (foo <= quux)) OR false)
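which is accepted because you are doing an OR (||) between a column expression ($"foo".between(baz, quux)) and a literal Boolean false. Since == binds tighter than ||, the comparison is evaluated first, and Column's || method accepts any value on its right-hand side, so the compiler has nothing to complain about.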
In other words, it is interpreted as $"foo".between(baz, quux) || lit(false)
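The parse can be reproduced without Spark. The sketch below is only an illustration: Col, FilterParseDemo and the rendered SQL strings are hypothetical stand-ins, not Spark's Column class. The one thing it borrows from Spark is that || takes an argument of type Any.

```scala
// Stand-in for illustration only; this is NOT Spark's Column class.
final class Col(val sql: String) {
  // Custom equality operator: builds a new expression instead of comparing objects.
  def ===(other: Any): Col = new Col(s"($sql = ${Col.render(other)})")
  // Like Spark's Column.||, the right-hand side may be any value, including a Boolean.
  def ||(other: Any): Col = new Col(s"($sql OR ${Col.render(other)})")
}

object Col {
  // Renders either a nested Col or a plain Scala value as text.
  def render(value: Any): String = value match {
    case c: Col => c.sql
    case other  => other.toString
  }
}

object FilterParseDemo {
  val foo = new Col("foo")
  val zero = new Col("0")
  val inRange = new Col("(foo BETWEEN baz AND quux)")

  // == is Scala's universal equality: two distinct objects, so this is the Boolean false.
  val plainEquals: Boolean = foo == zero

  // == binds tighter than ||, so this is inRange || (foo == zero), i.e. inRange || false.
  val wrong: Col = inRange || foo == zero

  // === also binds tighter than ||, but it returns a Col, so the comparison survives.
  val right: Col = inRange || foo === zero
}
```

Printing FilterParseDemo.wrong.sql and FilterParseDemo.right.sql side by side shows the literal false taking the place of the comparison in the first case.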

Related

If Statement Scala behaving Strangely

The if statement in Scala is behaving strangely:
scala> val a = 10
a: Int = 10
scala> if (a > 10) 1
res10: AnyVal = ()
scala> if (a <= 10) 1
res12: AnyVal = 1
Why didn't we get Int as the result type? Why did we get AnyVal?
"The if statement in Scala is behaving strangely"
Scala doesn't have an if statement. In fact, Scala has no statements at all. Scala has an if expression, or more precisely, a conditional expression.
In fact, if Scala had an if statement, then your question wouldn't make sense, because statements have no value and thus no type.
"Why didn't we get Int as the result type? Why did we get AnyVal?"
As the documentation says:
The conditional expression if (e1) e2 else e3 chooses one of the values of e2 and e3, depending on the value of e1. […] The type of the conditional expression is the weak least upper bound of the types of e2 and e3.
This makes sense: the conditional can be either true or false, so the value of the conditional expression is either the "then" part or the else part. Since the value can be either the "then" part or the else part, the type of the expression obviously must be compatible with both.
In your case, the value of the "then" part is 1 whose type is Int and the value of the else part is () whose type is Unit:
A short form of the conditional expression eliminates the else-part. The conditional expression if (e1) e2 is evaluated as if it was if (e1) e2 else ().
Therefore, whatever the type of the whole expression is, it must be compatible with both Int and Unit. And the most-precise possible type that is compatible with both Int and Unit is AnyVal.
[Figure: Scala class hierarchy. Source: Scala 2.13 Language Specification, Section 12, The Scala Standard Library]
In the left half of that class hierarchy tree, Int and Unit both sit directly below AnyVal, so the weak LUB of Int and Unit is AnyVal.
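A small sketch of the same point, assuming Scala 2.13 as in the quoted specification. The object and value names are made up for the example.

```scala
object IfTypeDemo {
  val a = 10

  // No else branch: the missing branch is (), so Int and Unit meet at AnyVal.
  val withoutElse: AnyVal = if (a > 10) 1

  // Both branches are Int, so the whole expression is an Int.
  val withElse: Int = if (a > 10) 1 else 0
}
```

Adding an else branch of the same type is the usual way to get Int back.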

Why has !== lower precedence than === in Scala?

The following code does not compile:
implicit class TripleEq(val x: Int) {
def === (y: Int) = x == y
def !== (y: Int) = x != y
}
val a = 0
val b = 1
if (a == a && b === b) {
println("Equal")
}
if (a != b && a !== b) {
println("Not equal")
}
The error is:
type mismatch;
found : Int
required: Boolean
The error goes away when I enclose the a !== b in parentheses.
I thought that operator precedence is defined by its first letter (see Tour of Scala) and therefore the precedence of !== should be the same as of ===, != or ==.
Why are parentheses required in the code above?
The answer is in the language specification of Assignment Operators:
There's one exception to this rule, which concerns assignment operators. The precedence of an assignment operator is the same as the one of simple assignment (=). That is, it is lower than the precedence of any other operator.
6.12.4 Assignment Operators
An assignment operator is an operator symbol (syntax category op in Identifiers) that ends in an equals character “=”, with the exception of operators for which one of the following conditions holds:
the operator also starts with an equals character, or
the operator is one of (<=), (>=), (!=).
Based on these rules !== is considered to be an assignment operator, while === is not.
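As a sketch of the two ways around this: either keep !== and add parentheses, or pick an operator name that starts with an equals character so that the exception in the rules above applies. The name =!= is only a suggestion for the example, as are the object and value names.

```scala
object AssignmentOperatorDemo {
  implicit class TripleEq(val x: Int) extends AnyVal {
    def ===(y: Int): Boolean = x == y
    def !==(y: Int): Boolean = x != y
    // Hypothetical alternative name: starts with '=', so it is not an assignment operator.
    def =!=(y: Int): Boolean = x != y
  }

  val a = 0
  val b = 1

  // Parentheses stop the parser from reading this as (a != b && a) !== b.
  val withParens: Boolean = a != b && (a !== b)

  // No parentheses needed: =!= has the same precedence as == and !=.
  val renamed: Boolean = a != b && a =!= b
}
```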

Why does Slick require using three equal signs (===) for comparison?

I was reading through coming from SQL to Slick and it states to use === instead of == for comparison.
For example,
people.filter(p => p.age >= 18 && p.name === "C. Vogt").run
What is the difference between == and ===, and why is the latter used here?
== is defined on Any in Scala. Slick can't overload it for Column[...] types like it can for other operators. That's why Slick needs a custom operator for equality. We chose === just like several other libraries, such as ScalaTest, Scalaz, etc.
a == b will lead to true or false. It's a client-side comparison. a === b will lead to an object of type Column[Boolean], with an instance of Library.Equals(a,b) behind it, which Slick will compile to a server-side comparison using the SQL "a = b" (where a and b are replaced by the expressions a and b stand for).
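To make the client-side versus server-side distinction concrete, here is a minimal sketch with a made-up expression tree. Expr, ColumnRef, Literal, EqualTo and toSql are hypothetical names for illustration and are not Slick's actual types or API.

```scala
// Hypothetical mini expression tree for illustration; these are not Slick's real types.
sealed trait Expr {
  // Builds an EqualTo node instead of comparing the two objects.
  def ===(other: Expr): Expr = EqualTo(this, other)
}
final case class ColumnRef(name: String) extends Expr
final case class Literal(value: Int) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

object ServerSideEqualsDemo {
  // Renders the tree as SQL text, roughly what a query compiler does later on.
  def toSql(e: Expr): String = e match {
    case ColumnRef(name)      => name
    case Literal(value)       => value.toString
    case EqualTo(left, right) => s"${toSql(left)} = ${toSql(right)}"
  }

  val age: Expr = ColumnRef("age")
  val eighteen: Expr = Literal(18)

  // Client-side: compares two Scala objects right now and yields a Boolean.
  val clientSide: Boolean = age == eighteen

  // Server-side: yields a tree that can be turned into SQL later.
  val serverSide: Expr = age === eighteen
}
```

Here clientSide is simply false, because a column reference and a literal are different objects, while toSql(serverSide) gives the text of the comparison.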
== calls equals; === is a custom-defined method in Slick that is used for column comparison:
def === [P2, R](e: Column[P2])(implicit om: o#arg[B1, P2]#to[Boolean, R]) =
om.column(Library.==, n, e.toNode)
The problem of using == for objects is this (from this question):
Default implementation of equals() class provided by java.lang.Object compares memory location and only return true if two reference variable are pointing to same memory location i.e. essentially they are same object.
What this means is that two variables must point to the same object to be equal, example:
scala> class A
defined class A
scala> new A
res0: A = A@4e931efa
scala> new A
res1: A = A@465670b4
scala> res0 == res1
res2: Boolean = false
scala> val res2 = res0
res2: A = A@4e931efa
scala> res2 == res0
res4: Boolean = true
In the first case == returns false because res0 and res1 point to two different objects, in the second case res2 is equal to res0 because they point to the same object.
In Slick, columns are abstracted as objects, so column1 == column2 is not what you are looking for: you want to check equality of the values the columns hold, not whether they point to the same object. Slick then probably translates that === into a value equality in the AST (Library.== is a SqlOperator("="), n is the left-hand side column and e the right-hand side), but Christopher can explain that better than me.
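== compares the two objects on the client and results in a Boolean (true or false).
=== does not compare anything immediately; it results in a column expression (Column[Boolean]) that is evaluated by the database.
For example:
p.name == "C. Vogt" is a Boolean
p.name === "C. Vogt" is a Column[Boolean]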

"true && E" returns "E" in Scala?

In the Scala course on Coursera (lecture 1.4, around 3 minutes in), Martin Odersky says that the expression true && e always returns e (e is any object), and that the expression false || e also returns e. He explains that the second operand is not always evaluated.
But when I run these expressions I get error: type mismatch.
For true && 5 I get found: Int(5); required: Boolean
Has Scala evolved in recent times or what am I doing wrong?
e stands for boolean expression.
Predicate: a boolean expression to be evaluated e.g. (x >= 4), (x != 0), etc
see https://sites.google.com/a/stonybrook.edu/functional-programming-scala/lecture-1-4
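With e restricted to Boolean expressions, both rules and the skipped evaluation can be checked directly. This is only a sketch: the counter and the helper e are invented for the example.

```scala
object ShortCircuitDemo {
  // Counts how often the right-hand operand is actually evaluated.
  var evaluations = 0

  def e(value: Boolean): Boolean = {
    evaluations += 1
    value
  }

  // true && e is e, and false || e is e, for either Boolean value of e.
  val andIsE: Boolean = List(true, false).forall(v => (true && e(v)) == v)
  val orIsE: Boolean = List(true, false).forall(v => (false || e(v)) == v)
  val countAfterNeeded: Int = evaluations // e had to be evaluated each time

  // Here the left operand already decides the result, so e is skipped.
  val skippedAnd: Boolean = false && e(true)
  val skippedOr: Boolean = true || e(false)
  val countAfterSkipped: Int = evaluations // unchanged by the two lines above
}
```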
From the Scala Reference book, paragraph 6.16 Conditional expressions, given:
if (e1) e2 else e3
The condition e1 is expected to conform to type Boolean. The then-part
e2 and the else-part e3 are both expected to conform to the expected
type of the conditional expression. The type of the conditional
expression is the weak least upper bound (§3.5.3) of the types of e2
and e3.

&& and || in Scala

Since normal operators like +, :: and -> are all methods that can be overloaded, I was wondering whether || and && are methods as well. This could theoretically work if they were methods of the Boolean object.
But if they are, why is something like
if (foo == bar && buz == fol)
possible? If the compiler reads from right to left this would invoke && on bar instead of (foo == bar)
6.12.3 Infix Operations An infix operator can be an arbitrary
identifier. Infix operators have
precedence and associativity defined
as follows:
The precedence of an infix
operator is determined by the
operator’s first character. Characters
are listed below in increasing order
of precedence, with characters on the
same line having the same precedence.
(all letters)
|
^
&
< >
= !
:
+ -
* / %
(all other special characters)
That is, operators starting with a
letter have lowest precedence,
followed by operators starting with
‘|’, etc.
There’s one exception to
this rule, which concerns assignment
operators(§6.12.4). The precedence of
an assignment operator is the same as
the one of simple assignment (=). That
is, it is lower than the precedence of
any other operator.
It follows with an explanation of associativity, and how it all combines in an expression with multiple operators. The Scala Reference makes good reading.
Because a method that starts with = has higher precedence than a method that starts with &.
So (foo == bar && buz == fol) will become something like the following:
val tmp1: Boolean = (foo == bar)
val tmp2: Boolean = (buz == fol)
tmp1 && tmp2
Both && and || are definitely methods in Scala.
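One way to see this is to write the operators as explicit method calls. The values below are arbitrary and only there to make the sketch self-contained.

```scala
object BooleanMethodsDemo {
  val foo = 1
  val bar = 1
  val buz = 2
  val fol = 3

  // Operator notation, parsed as (foo == bar) && (buz == fol).
  val infix: Boolean = foo == bar && buz == fol

  // The same expression with every operator written as an explicit method call.
  val explicit: Boolean = foo.==(bar).&&(buz.==(fol))
}
```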