Override equality for floating point values in Scala

Note: Bear with me, I'm not asking how to override equals or how to create a custom method to compare floating point values.
Scala is very nice in allowing comparison of objects by value and in providing a series of tools to do so with little code: in particular case classes, tuples, and comparison of entire collections.
I often call methods that do intensive computations and return a non-trivial data structure, and I can then write a unit test that, given a certain input, calls the method and compares the result against a hardcoded value. For instance:
def compute() = {
  // do a lot of computations here to produce the set below...
  Set(('a', 1), ('b', 3))
}

val A = compute()
val equal = A == Set(('a', 1), ('b', 3))
// equal = true
This is a bare-bones example and I'm omitting here any code from specific test libraries, etc.
Given that floating-point values are not reliably compared with equals, the following, rather equivalent, example fails:
def compute() = {
  // do a lot of computations here to produce the set below...
  Set(('a', 1.0/3.0), ('b', 3.1))
}

val A = compute()
val equal2 = A == Set(('a', 0.33333), ('b', 3.1)) // Use some arbitrary precision here
// equal2 = false
What I would want is to have a way to make all floating-point comparisons in that call to use an arbitrary level of precision. But note that I don't control (or want to alter in any way) either Set or Double.
I tried defining an implicit conversion from Double to a new class and then overriding equals on that class to do an approximate comparison. I could then use instances of that class in my hardcoded validations.
implicit class DoubleAprox(d: Double) {
  override def hashCode = d.hashCode()
  override def equals(other: Any): Boolean = other match {
    case that: Double => (d - that).abs < 1e-5
    case _ => false
  }
}

val equals3 = DoubleAprox(1.0/3.0) == 0.33333 // true
val equals4 = 0.33333 == DoubleAprox(1.0/3.0) // false
But as you can see, it breaks symmetry. Given that I'm then comparing more complex data structures (sets, tuples, case classes), I have no way to tell a priori whether equals() will be called on the left or on the right. It seems I'm bound to traverse all the structures and then do single floating-point comparisons on the branches... So, the question is: is there any way to do this at all?
As a side note: I gave a good read to an entire chapter on object equality and several blogs, but they only provide solutions for inheritance problems and require you to basically own all the classes involved and change all of them. And all of it seems rather convoluted given what it is trying to solve.
It seems to me that equality is one of those things that is fundamentally broken in Java, because the method has to be added to each class and overridden time and again. What seems more intuitive to me would be comparison methods that the compiler can find: say, you would provide equals(DoubleAprox, Double) and it would be used every time you want to compare two objects of those classes.

I think that changing the meaning of equality to mean anything fuzzy is a bad idea. See my comments in Equals for case class with floating point fields for why.
However, it can make sense to do this in a very limited scope, e.g. for testing. I think for numerical problems you should consider using the spire library as a dependency. It contains a large number of useful things, among them a type class for equality and mechanisms to derive type class instances for composite types (collections, tuples, etc.) based on the type class instances for the individual scalar types.
Since, as you observe, equality in the Java world is fundamentally broken, Spire uses other operators (=== for type-safe equality).
Here is an example how you would redefine equality for a limited scope to get fuzzy equality for comparing test results:
// import the machinery for operators like === (when an Eq type class instance is in scope)
import spire.syntax.all._

object Test extends App {
  // redefine the equality for Double, just in this scope, to mean fuzzy equality
  implicit object FuzzyDoubleEq extends spire.algebra.Eq[Double] {
    def eqv(a: Double, b: Double) = (a - b).abs < 1e-5
  }

  // this passes. === looks up the Eq instance for Double in the implicit scope. And
  // since we have not imported the default instance but defined our own, this will
  // find the Eq instance defined above and use its eqv method
  require(0.0 === 0.000001)

  // import automatic generation of type class instances for tuples based on type class instances of the scalars
  // if there is an Eq available for each scalar type of the tuple, this will also make an Eq instance available for the tuple
  import spire.std.tuples._
  require((0.0, 0.0) === (0.000001, 0.0)) // works also for tuples containing doubles

  // import automatic generation of type class instances for arrays based on type class instances of the scalars
  // if there is an Eq instance for the element type of the array, there will also be one for the entire array
  import spire.std.array._
  require(Array(0.0, 1.0) === Array(0.000001, 1.0)) // and for arrays of doubles

  import spire.std.seq._
  require(Seq(1.0, 0.0) === Seq(1.000000001, 0.0))
}

Java equals is indeed not as principled as it should be - people who are very bothered about this use something like Scalaz' Equal and ===. But even that assumes a symmetry of the types involved; I think you would have to write a custom typeclass to allow comparing heterogeneous types.
It's quite easy to write a new typeclass and have instances recursively derived for case classes, using Shapeless' automatic type class instance derivation. I'm not sure that extends to a two-parameter typeclass though. You might find it best to create distinct EqualityLHS and EqualityRHS typeclasses, and then your own equality method for comparing A: EqualityLHS and B: EqualityRHS, which could be pimped onto A as an operator if desired. (Of course it should be possible to extend the technique generically to support two-parameter typeclasses in full generality rather than needing such workarounds, and I'm sure shapeless would greatly appreciate such a contribution).
Best of luck - hopefully this gives you enough to find the rest of the answer yourself. What you want to do is by no means trivial, but with the help of modern Scala techniques it should be very much within the realms of possibility.
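To make the typeclass route concrete without pulling in Scalaz or Shapeless, here is a hedged, dependency-free sketch of a two-parameter approximate-equality typeclass; ApproxEq, =~= and every other name here are made up for illustration. Because the comparison lives in the typeclass rather than in equals, there is no symmetry problem, and instances for scalars compose into instances for tuples:
object ApproxEqDemo extends App {

  // approximate equality between two (possibly different) types, so no wrapper
  // is needed on either side and the asymmetry from the question goes away
  trait ApproxEq[A, B] {
    def approxEq(a: A, b: B): Boolean
  }

  // scalar instance: two doubles are "equal" within a fixed tolerance
  implicit val doubleVsDouble: ApproxEq[Double, Double] =
    new ApproxEq[Double, Double] {
      def approxEq(a: Double, b: Double): Boolean = (a - b).abs < 1e-5
    }

  // derived instance: pairs compare component-wise if their components can
  implicit def tuple2[A1, B1, A2, B2](implicit
      first: ApproxEq[A1, B1],
      second: ApproxEq[A2, B2]): ApproxEq[(A1, A2), (B1, B2)] =
    new ApproxEq[(A1, A2), (B1, B2)] {
      def approxEq(a: (A1, A2), b: (B1, B2)): Boolean =
        first.approxEq(a._1, b._1) && second.approxEq(a._2, b._2)
    }

  // the operator, pimped onto the left-hand side
  implicit class ApproxEqOps[A](val a: A) {
    def =~=[B](b: B)(implicit ev: ApproxEq[A, B]): Boolean = ev.approxEq(a, b)
  }

  require((1.0 / 3.0) =~= 0.33333)             // scalar comparison
  require((1.0 / 3.0, 3.1) =~= (0.33333, 3.1)) // derived tuple comparison
  println("all approximate comparisons passed")
}
Spire's single-parameter Eq plus Shapeless derivation would give you essentially this with far less boilerplate; the sketch just shows that nothing magical is involved.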

Related

How to design abstract classes if methods don't have the exact same signature?

This is a "real life" OO design question. I am working with Scala, and interested in specific Scala solutions, but I'm definitely open to hear generic thoughts.
I am implementing a branch-and-bound combinatorial optimization program. The algorithm itself is pretty easy to implement. For each different problem we just need to implement a class that contains information about the allowed neighbor states for the search, how to calculate the cost, and potentially the lower bound, etc.
I also want to be able to experiment with different data structures. For instance, one way to store a logic formula is using a simple list of lists of integers. This represents a set of clauses, each integer a literal. We can have a much better performance though if we do something like a "two-literal watch list", and store some extra information about the formula in general.
That would all mean something like this:
object BnBSolver {
  def solve[S <: BnBState[_]](states: Seq[S], best_state: Option[S]): Option[S] =
    if (states.isEmpty) best_state
    else {
      val next_state = states.head
      /* compare to best state, etc... */
      val new_states = new_branches ++ states.tail
      solve(new_states, new_best_state)
    }
}

class BnBState[F <: Formula[F]](clauses: F, assigned_variables: List[Int]) {
  def cost: Int
  def branches: Seq[BnBState[F]] = {
    val ll = clauses.pick_variable
    List(
      BnBState(clauses.assign(ll), ll :: assigned_variables),
      BnBState(clauses.assign(-ll), -ll :: assigned_variables)
    )
  }
}

case class Formula[F <: Formula[F]](clauses: List[List[Int]]) {
  def assign(ll: Int): F =
    Formula(clauses.filterNot(_ contains ll)
                   .map(_.filterNot(_ == -ll)))
}
Hopefully this is not too crazy, wrong or confusing. The whole issue here is that this assign method from a formula would usually take just the current literal that is going to be assigned. In the case of two-literal watch lists, though, you are doing some lazy thing that requires you to know later what literals have been previously assigned.
One way to fix this is to just keep the list of previously assigned literals in the data structure, maybe as a private field, making it a self-standing lazy data structure. But this list of previous assignments is actually something that may be naturally available to whoever is using the Formula class. So it makes sense to allow whoever is using it to just provide the list on every assign, if necessary.
The problem here is that we cannot now have an abstract Formula class that just declares an assign(ll: Int): Formula. In the normal case this is OK, but if this is a two-literal watch list Formula, it is actually an assign(literal: Int, previous_assignments: Seq[Int]).
From the point of view of the classes using it, it is kind of OK. But then how do we write generic code that can take all these different versions of Formula? Because of the drastic signature change, it cannot simply be an abstract method. We could maybe force the user to always provide the full assigned variables, but then this is kind of a lie too. What to do?
The idea is that the watch-list class just becomes a regular assign(Int) class if I write some kind of adapter method that knows where to take the previous assignments from... I am thinking maybe with implicits we can cook something up.
I'll try to make my answer a bit general, since I'm not convinced I'm completely following what you are trying to do. Anyway...
Generally, the first thought should be to accept a common super-class as a parameter. Obviously that won't work with Int and Seq[Int].
You could just have two methods; have one call the other. For instance just wrap an Int into a Seq[Int] with one element and pass that to the other method.
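For instance, a minimal sketch of the overload approach, assuming a Formula simplified down to a bare list of clauses (no watch lists, all details made up):
final case class Formula(clauses: List[List[Int]]) {
  // the Int overload just delegates to the Seq[Int] one
  def assign(ll: Int): Formula = assign(Seq(ll))
  def assign(lls: Seq[Int]): Formula =
    Formula(
      clauses.filterNot(c => lls.exists(l => c.contains(l))) // drop satisfied clauses
             .map(_.filterNot(l => lls.contains(-l)))        // drop falsified literals
    )
}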
You can also wrap the parameter in some custom class, e.g.
class Assignment {
  ...
}

def int2Assignment(n: Int): Assignment = ...
def seq2Assignment(s: Seq[Int]): Assignment = ...

case class Formula[F <: Formula[F]](clauses: List[List[Int]]) {
  def assign(ll: Assignment): F = ...
}
And of course you would have the option to make those conversion methods implicit so that callers just have to import them, not call them explicitly.
Lastly, you could do this with a typeclass:
trait Assigner[A] {
  ...
}

implicit val intAssigner = new Assigner[Int] {
  ...
}

implicit val seqAssigner = new Assigner[Seq[Int]] {
  ...
}

case class Formula[F <: Formula[F]](clauses: List[List[Int]]) {
  def assign[A: Assigner](ll: A): F = ...
}
You could also make that type parameter at the class level:
case class Formula[A: Assigner, F <: Formula[A, F]](clauses: List[List[Int]]) {
  def assign(ll: A): F = ...
}
Which one of these paths is best is up to preference and how it might fit in with the rest of the code.
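If it helps, here is a hedged, runnable sketch of the typeclass option, heavily simplified: no F-bounded Formula, no real watch list, and Assigner's methods (literal, previous) are made-up names, not part of any library:
object AssignerDemo extends App {

  // what the formula needs to know in order to perform an assignment
  trait Assigner[A] {
    def literal(a: A): Int       // the literal being assigned now
    def previous(a: A): Seq[Int] // previously assigned literals, if any
  }

  // a plain Int carries no history
  implicit val intAssigner: Assigner[Int] = new Assigner[Int] {
    def literal(a: Int): Int = a
    def previous(a: Int): Seq[Int] = Seq.empty
  }

  // a (literal, history) pair carries both
  implicit val pairAssigner: Assigner[(Int, Seq[Int])] = new Assigner[(Int, Seq[Int])] {
    def literal(a: (Int, Seq[Int])): Int = a._1
    def previous(a: (Int, Seq[Int])): Seq[Int] = a._2
  }

  final case class Formula(clauses: List[List[Int]]) {
    // one generic assign; a concrete formula decides what it needs from the Assigner
    def assign[A](a: A)(implicit ev: Assigner[A]): Formula = {
      val ll = ev.literal(a)
      // a two-literal-watch-list implementation would also consult ev.previous(a)
      Formula(clauses.filterNot(_ contains ll).map(_.filterNot(_ == -ll)))
    }
  }

  val f = Formula(List(List(1, -2), List(2, 3)))
  println(f.assign(2))           // assign via a bare literal
  println(f.assign((2, Seq(1)))) // assign via literal plus history
}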

Why do you need Arbitraries in scalacheck?

I wonder why Arbitrary is needed because automated property testing requires property definition, like
val prop = forAll(v: T => check that property holds for v)
and a generator for the value v. The user guide says that you can create custom generators for custom types (a generator for trees is given as an example). Yet, it does not explain why you need arbitraries on top of that.
Here is a piece of the manual:
implicit lazy val arbBool: Arbitrary[Boolean] = Arbitrary(oneOf(true, false))
To get support for your own type T you need to define an implicit def
or val of type Arbitrary[T]. Use the factory method Arbitrary(...) to
create the Arbitrary instance. This method takes one parameter of type
Gen[T] and returns an instance of Arbitrary[T].
It clearly says that we need Arbitrary on top of Gen. The justification for Arbitrary is not satisfactory, though:
The arbitrary generator is the generator used by ScalaCheck when it
generates values for property parameters.
IMO, to use the generators, you need to import them rather than wrapping them into arbitraries! Otherwise, one can argue that we need to wrap arbitraries also into something else to make them usable (and so on ad infinitum wrapping the wrappers endlessly).
Could you also explain how arbitrary[Int] converts the argument type into a generator? It is very curious and I feel that these are related questions.
forAll { v: T => ... } is implemented with the help of Scala implicits. That means that the generator for the type T is found implicitly instead of being explicitly specified by the caller.
Scala implicits are convenient, but they can also be troublesome if you're not sure what implicit values or conversions currently are in scope. By using a specific type (Arbitrary) for doing implicit lookups, ScalaCheck tries to constrain the negative impacts of using implicits (this use also makes it similar to Haskell typeclasses that are familiar for some users).
So, you are entirely correct that Arbitrary is not really needed. The same effect could have been achieved through implicit Gen[T] values, arguably with a bit more implicit scoping confusion.
As an end-user, you should think of Arbitrary[T] as the default generator for the type T. You can (through scoping) define and use multiple Arbitrary[T] instances, but I wouldn't recommend it. Instead, just skip Arbitrary and specify your generators explicitly:
val myGen1: Gen[T] = ...
val myGen2: Gen[T] = ...
val prop1 = forAll(myGen1) { t => ... }
val prop2 = forAll(myGen2) { t => ... }
arbitrary[Int] works just like forAll { n: Int => ... }: it just looks up the implicit Arbitrary[Int] instance and uses its generator. The implementation is simple:
def arbitrary[T](implicit a: Arbitrary[T]): Gen[T] = a.arbitrary
The implementation of Arbitrary might also be helpful here:
sealed abstract class Arbitrary[T] {
val arbitrary: Gen[T]
}
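For completeness, here is a small self-contained sketch (the Temperature type and property names are made up) showing a Gen, the Arbitrary wrapper that flags it as the default generator, and both the implicit and the explicit forAll styles described above:
import org.scalacheck.{Arbitrary, Gen, Properties}
import org.scalacheck.Prop.forAll

object TemperatureSpec extends Properties("Temperature") {

  // hypothetical domain type, just for illustration
  case class Temperature(celsius: Double)

  // the Gen describes how to produce values
  val tempGen: Gen[Temperature] =
    Gen.choose(-80.0, 60.0).map(Temperature(_))

  // wrapping it in Arbitrary marks it as the default generator for the type
  implicit val arbTemp: Arbitrary[Temperature] = Arbitrary(tempGen)

  // no generator given: forAll resolves Arbitrary[Temperature] implicitly
  property("stays in range (implicit)") = forAll { (t: Temperature) =>
    t.celsius >= -80.0 && t.celsius <= 60.0
  }

  // same property, but the Gen is passed explicitly and Arbitrary is bypassed
  property("stays in range (explicit)") = forAll(tempGen) { (t: Temperature) =>
    t.celsius >= -80.0 && t.celsius <= 60.0
  }
}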
ScalaCheck has been ported from the Haskell QuickCheck library. In Haskell type-classes only allow one instance for a given type, forcing you into this sort of separation.
In Scala, though, there isn't such a constraint and it would be possible to simplify the library. My guess is that ScalaCheck being (initially written as) a 1-1 mapping of QuickCheck makes it easier for Haskellers to jump into Scala :)
Here is the Haskell definition of Arbitrary
class Arbitrary a where
-- | A generator for values of the given type.
arbitrary :: Gen a
And Gen
newtype Gen a
As you can see they have very different semantics: Arbitrary is a type class, and Gen is a wrapper with a bunch of combinators to build generators.
I agree that the argument of "limiting the scope through semantics" is a bit vague and does not seem to be taken seriously when it comes to organizing the code: the Arbitrary class sometimes simply delegates to Gen instances, as in
/** Arbitrary instance of Calendar */
implicit lazy val arbCalendar: Arbitrary[java.util.Calendar] =
Arbitrary(Gen.calendar)
and sometimes defines its own generator
/** Arbitrary BigInt */
implicit lazy val arbBigInt: Arbitrary[BigInt] = {
val long: Gen[Long] =
Gen.choose(Long.MinValue, Long.MaxValue).map(x => if (x == 0) 1L else x)
val gen1: Gen[BigInt] = for { x <- long } yield BigInt(x)
/* ... */
Arbitrary(frequency((5, gen0), (5, gen1), (4, gen2), (3, gen3), (2, gen4)))
}
So in effect this leads to code duplication (each default Gen being mirrored by an Arbitrary) and some confusion (why isn't Arbitrary[BigInt] wrapping a default Gen[BigInt]?).
My reading of that is that you might need to have multiple instances of Gen, so Arbitrary is used to "flag" the one that you want ScalaCheck to use?

Prevent Mixin overriding equals from breaking case class equality

Squeryl defines a trait KeyedEntity which overrides equals, checking for several conditions in an if and calling super.equals in the end. Since super is Object, it will always fail.
Consider:
trait T { override def equals(z: Any): Boolean = super.equals(z) }
case class A(a: Int) extends T
val a = A(1); val b = A(1)
a==b // false
Thus, if you declare
case class Record(id: Long, name: String ...) extends KeyedEntity[Long] { ... }
-- and you create several Record instances but do not persist them, their comparison will break. I found this by implementing both Salat and Squeryl back ends for the same class, and then all Salat tests fail since isPersisted from KeyedEntity is false.
Is there a design whereby KeyedEntity will preserve case class equality if mixed into a case class? I tried self-typing and parameterizing BetterKeyedEntity[K,P] { self: P => ... } for the case class type as P but it causes infinite recursion in equals.
As things stand right now, super is Object so the final branch of the overridden equals in KeyedEntity will always return false.
The structural equality check usually generated for case classes seems not to be generated if there is an equals override. However, some subtleties have to be noted.
Mixing the concept of id-based equality falling back to structural equality might not be a good idea, since I can imagine that it may lead to subtle bugs. For example:
x: A(1) and y: A(1) are not yet persisted, so they are equal
then they get persisted, and since they are separate objects, the persistence framework may persist them as separate entities (I don't know Squeryl, maybe not an issue there, but this is a thin line to walk)
after persisting, they are suddenly not equal since the id differs.
Even worse, if x and y get persisted to the same id, the hashCode will differ before and after persisting (the source shows that if persisted it is the hashCode of the id). This breaks immutability, and will lead to very bad behavior (when put in maps for example). See this gist in which I demonstrate the assert failing.
So don't mix structural and id-based equality implicitly. Also see this explained in the context of Hibernate.
Typeclasses
It has to be noted that others have pointed out (ref needed) that the concept of method-based equality is flawed for such reasons (there is not only one way two things can be equal). Therefore you can define a typeclass which describes equality:
trait Eq[A] {
  def equal(x: A, y: A): Boolean
}
and define (possibly multiple) instances of that typeclass for your classes:
// structural equality
implicit object MyClassEqual extends Eq[MyClass] { ... }
// id based equality
def idEq[K, A <: KeyedEntity[K]]: Eq[A] = new Eq[A] {
  def equal(x: A, y: A) = x.id == y.id
}
then you can request that things are members of the Eq typeclass:
def useSomeObjects[A](a: A, b: A)(implicit aEq: Eq[A]) = {
  ... aEq.equal(a, b) ...
}
So you can decide which notion of equality to use by importing the appropriate typeclass instance into scope, or by passing the instance directly, as in useSomeObjects(x, y)(idEq[Int, SomeClass]).
Note that you might also need a Hashable typeclass, similarly.
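To make that concrete, here is a small runnable sketch under assumed names (Entity, structural and byId are made up; this is not Squeryl's API), showing both ways of selecting the notion of equality:
object EqDemo extends App {

  trait Eq[A] {
    def equal(x: A, y: A): Boolean
  }

  final case class Entity(id: Long, name: String)

  // structural equality: falls back to the case-class equals
  val structural: Eq[Entity] = new Eq[Entity] {
    def equal(x: Entity, y: Entity): Boolean = x == y
  }

  // id-based equality: only the key matters
  val byId: Eq[Entity] = new Eq[Entity] {
    def equal(x: Entity, y: Entity): Boolean = x.id == y.id
  }

  def useSomeObjects[A](a: A, b: A)(implicit aEq: Eq[A]): Boolean =
    aEq.equal(a, b)

  val a = Entity(1L, "foo")
  val b = Entity(1L, "bar")

  // pick the notion of equality explicitly at the call site...
  println(useSomeObjects(a, b)(structural)) // false: names differ
  println(useSomeObjects(a, b)(byId))       // true: same id

  // ...or bring one instance into implicit scope and let it be resolved
  implicit val default: Eq[Entity] = structural
  println(useSomeObjects(a, b)) // false
}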
Autogenerating Eq instances
This situation is pretty similar to the Scala stdlib's scala.math.Ordering typeclass. Here is an example for auto-deriving structural Ordering instances for case classes using the excellent shapeless library.
The same could easily be done for Eq and Hashable.
Scalaz
Note that Scalaz has an Equal typeclass, with nice pimp patterns with which you can write x === y instead of eqInstance.equal(x, y). I'm not aware of it having a Hashable typeclass yet.

Scala: checking if an object is Numeric

Is it possible for a pattern match to detect if something is a Numeric? I want to do the following:
class DoubleWrapper(value: Double) {
  override def equals(o: Any): Boolean = o match {
    case o: Numeric => value == o.toDouble
    case _ => false
  }
  override def hashCode(): Int = value.##
}
But of course this doesn't really work because Numeric isn't the supertype of things like Int and Double, it's a typeclass. I also can't do something like def equals[N: Numeric](o: N) because o has to be Any to fit the contract for equals.
So how do I do it without listing out every known Numeric class (including, I guess, user-defined classes I may not even know about)?
The original problem is not solvable, and here is my reasoning why:
To find out whether a type is an instance of a typeclass (such as Numeric), we need implicit resolution. Implicit resolution is done at compile time, but we would need it to be done at runtime. That is currently not possible, because as far as I can tell, the Scala compiler does not leave all the necessary information in the compiled class file. To see that, one can write a test class with a method that contains a local variable that has the implicit modifier. The compilation output will not change when the modifier is removed.
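For illustration, here is a tiny sketch of that contrast (approxEqual is a made-up helper): the context-bound version resolves Numeric at compile time and works fine, whereas inside equals(o: Any) the static type is just Any, so there is no instance the compiler could have supplied:
object NumericResolutionDemo extends App {

  // resolved at compile time: the caller's static type picks the Numeric instance
  def approxEqual[N: Numeric](value: Double, o: N): Boolean =
    value == implicitly[Numeric[N]].toDouble(o)

  println(approxEqual(6.0, 6))  // true: Numeric[Int] found at the call site
  println(approxEqual(6.0, 6L)) // true: Numeric[Long] found at the call site

  // inside equals(o: Any) the static type has been erased to Any, so there is
  // no Numeric[Any] instance the compiler could resolve
}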
Are you using DoubleWrapper to add methods to Double? Then it should be a transparent type, i.e. you shouldn't be keeping instances, but rather define the pimped methods to return Double instead. That way you can keep using == as defined for primitives, which already does what you want (6.0 == 6 yields true).
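For instance, a hedged sketch of that transparent-wrapper idea (RicherDouble, squared and halved are made-up names): the wrapper only adds methods, every result is a plain Double, and primitive == keeps working:
object TransparentWrapperDemo extends App {

  implicit class RicherDouble(val value: Double) {
    def squared: Double = value * value
    def halved: Double = value / 2
  }

  val x = 3.0.squared // 9.0, an ordinary Double; no wrapper instance is kept around
  println(x == 9)     // true: primitive comparison, just like 6.0 == 6
  println(x.halved)   // 4.5
}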
Ok, so if not, how about
override def equals(o: Any): Boolean = o == value
If you construct equals methods of other wrappers accordingly, you should end up comparing the primitive values again.
Another question is whether you should have such an equals method for a stateful wrapper. I don't think mutable objects should be equal according to one of the values they hold—you will most likely run into trouble with that.

Should I use implicit conversions to enforce preconditions?

It occurs to me that I could use implicit conversions to both announce and enforce preconditions. Consider this:
object NonNegativeDouble {
  implicit def int2nnd(d: Double): NonNegativeDouble = new NonNegativeDouble(d)
  implicit def nnd2int(d: NonNegativeDouble): Double = d.v
  def sqrt(n: NonNegativeDouble): NonNegativeDouble = scala.math.sqrt(n)
}

class NonNegativeDouble(val v: Double) {
  if (v < 0) {
    throw new IllegalArgumentException("negative value")
  }
}

object Test {
  def t1 = {
    val d: Double = NonNegativeDouble.sqrt(3.0)
    printf("%f\n", d)
    val n: Double = NonNegativeDouble.sqrt(-3.0)
  }
}
Ignore for the moment the actual vacuity of the example: my point is that the wrapper class NonNegativeDouble expresses the notion that a function only takes a subset of the entire range of Double values.
First: is this
a good idea,
a bad idea, or
an obvious idea everybody else already knows about?
Second, this would be most useful with basic types, like Int and String. Those classes are final, of course, so is there a good way to not only use the restricted type in functions (that's what the second implicit is for) but also delegate to all methods on the underlying value (short of hand-implementing every delegation)?
This is an extremely cool idea, but unfortunately its true potential can't be realized in Scala's type system. What you really want here is dependent types, which allow you to impose a proof obligation on the caller of your method to verify that the argument is in range, such that the method can't even be invoked with an invalid argument.
But without dependent types and the ability to verify specifications at compile-time, I think this has questionable value, even leaving aside performance considerations. Consider, how is it any better than using the require function to state the initial conditions required by your method, like so:
def foo(i: Int) = {
  require(i >= 0)
  i * 9 + 4
}
In both cases, a negative value will cause an exception to be thrown at runtime, either in the require function or when constructing your NonNegativeDouble. Both techniques state the contract of the method clearly, but I would argue that there is a large overhead in building all these specialized types whose only purpose is to encapsulate a particular expression to be asserted at runtime. For instance, what if you wanted to enforce a slightly different precondition; say, that i > 45? Will you build an IntGreaterThan45 type just for that method?
The only argument I can see for building e.g. a NonNegativeFoo type is if you have many methods which consume and return positive numbers only. Even then, I think the payoff is dubious.
Incidentally, this is similar to the question How far to go with a strongly typed language?, to which I gave a similar answer.
Quite a neat idea actually, though I wouldn't use it in any performance-sensitive loops.
The @specialized annotation could also help out by a fair amount here to make the code more efficient...
This would usually be called "unsigned int" in C. I don't think it's very useful, because you wouldn't be able to define operators properly. Consider this:
val a = UnsignedInt(5)
val b = a - 3 // now, b should be an UnsignedInt(2)
val c = b - 3 // now, c must be an Int, because it's negative!
Therefore, how would you define the minus operator? Like this maybe:
def -(i:Int):Either[UnsignedInt,Int]
That would make arithmetics with UnsignedInt practically unusable.
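To see why, here is a hedged sketch of that Either-returning subtraction (UnsignedInt here is a made-up case class, not a real library type); every call site ends up pattern matching just to keep computing:
object UnsignedDemo extends App {

  final case class UnsignedInt(value: Int) {
    require(value >= 0, "negative value")
    def -(i: Int): Either[UnsignedInt, Int] = {
      val r = value - i
      if (r >= 0) Left(UnsignedInt(r)) else Right(r)
    }
  }

  val a = UnsignedInt(5)
  val b = a - 3 // Left(UnsignedInt(2)): still in range
  val c = a - 7 // Right(-2): fell out of the unsigned range

  // every further arithmetic step forces the caller to branch on the result
  val next: Either[UnsignedInt, Int] = b match {
    case Left(u)  => u - 1
    case Right(n) => Right(n - 1)
  }
  println((b, c, next))
}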
Or you define a superclass, MaybeSignedInt, that has two subclasses, SignedInt and UnsignedInt. Then you could define subtraction in UnsignedInt like this:
def -(i:Int):MaybeSignedInt
Seems totally awful, doesn't it? Actually, the sign of the number should not conceptually be a property of the number's type, but of its value.