Input validation with the Scala type system

Having played a bit with Scala now, I wonder how you should do input validation in Scala.
This is what I have seen many times:
def doSomethingWithPositiveIntegers(i: Int) = {
require(i>0)
//do something
}
To exaggerate a bit, it feels like doing this in Java:
void doSomething(Object o) {
    // the parentheses are required: !o instanceof Integer does not compile
    if (!(o instanceof Integer))
        throw new IllegalArgumentException();
}
There, you first accept more than you are willing to handle, and then introduce a "guard" that only lets the "good ones" in. To be exact, you'd need these guards in every function that does something with positive integers, and if you later wanted to include zero, for example, you'd need to change every function. Of course you can move the check to another function, but you'd still always need to remember to call the correct function, and it might not survive type refactorings etc. That does not sound like something I'd like to have. I was thinking about pushing this validation code into the data type itself, like this:
import scala.util.Try
object MyStuff {
implicit class PositiveInt(val value: Int) {
require(value>0)
}
implicit def positiveInt2Int(positiveInt: PositiveInt): Int = positiveInt.value
}
import MyStuff._
val i: MyStuff.PositiveInt = 5
val j: Int = i+5
println(i) //Main$$anon$1$MyStuff$PositiveInt@3a16cef5
println(j) //10
val sum = i + i
println(sum) //10
def addOne(i: MyStuff.PositiveInt) = i + 1
println(Try(addOne(-5))) //Failure(java.lang.IllegalArgumentException: requirement failed)
println(Try(addOne(5))) //Success(6)
Then I have a type PositiveInt that can only contain positive integers, and I can use it (almost) everywhere like an Int. Now, my API defines what I am willing to take - this is what I'd like to have! The function itself has nothing to validate, because it knows it can only get valid positive integers - they cannot be constructed without validation. You'd have to run your validation only once - upon creation of the type! Think of other cases, where validation might be more expensive (validate an email address or URL, or that a number is a prime).
Advantages:
Your API tells you directly what kind of objects you accept (no more do(String, String, String) where it could be do(User, Email, Password))
Your objects get validated "automatically"
The compiler can help you reduce the risk of bugs. Some things that you would previously only see at run time can now be caught at compile time. Example:
def makeNegative(i: PositiveInt): NegativeInt = -i
addOne(makeNegative(1)) //will create a compile-time error!
However, there are some drawbacks:
Unfortunately, you break many functions that work due to implicit conversions. E.g., this will not work:
val i: PositiveInt = 5
val range = i to 10 //error: value to is not a member of this.MyStuff.PositiveInt
val range = i.value to 10 //will work
It could be solved if you could extend Int and just add the require, because then all PositiveInts would be Ints (which really is the case!), but Int is final :). You could add implicit conversions for all the cases you need, but that would be pretty verbose.
More objects are created. Maybe one can lower that burden with value classes (can anybody show me how?).
These are my questions:
Am I missing something? I have not seen anybody do this before, and I wonder why. Maybe there are good reasons for not doing this.
Is there a better way to integrate validation into my types?
How can I avoid the problems with the need of duplicate implicits (drawback #1)? Maybe some kind of macro that looks at other implicits in scope and adds implicits at compile time for me (Example: implicit conversion from PositiveInt to RichInt)?

You can create a class with a private constructor that is visible only to its companion object, and give the companion a factory method, e.g.
class PositiveInt private[PositiveInt](val i: Int)
object PositiveInt {
def apply(i: Int): Option[PositiveInt] = if(i > 0) Some(new PositiveInt(i)) else None
}
Clients cannot create instances of PositiveInt directly, so they have to go through the apply method, which performs the validation and returns an instance only if the input value is valid.
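A minimal sketch of how this factory could be combined with a value class, which the question asks about. The `Validated` wrapper object, the field name `value` and the `addOne` helper are assumptions made for illustration. `extends AnyVal` avoids allocating the wrapper in many, though not all, contexts (it is still boxed inside an `Option`, for example), and a value class has to live at the top level or inside a statically accessible object.

```scala
object Validated {
  // Private constructor: only the companion object can create instances.
  final class PositiveInt private (val value: Int) extends AnyVal

  object PositiveInt {
    // Validation runs exactly once, at construction time.
    def apply(i: Int): Option[PositiveInt] =
      if (i > 0) Some(new PositiveInt(i)) else None
  }

  // No require needed here: every PositiveInt has already been validated.
  def addOne(p: PositiveInt): Int = p.value + 1
}

import Validated._

val ok: Option[Int] = PositiveInt(5).map(addOne)        // Some(6)
val rejected: Option[Int] = PositiveInt(-5).map(addOne) // None
```

Returning an `Option` pushes the "what if it is invalid" decision to the caller instead of throwing, at the cost of losing the `val i: PositiveInt = 5` convenience from the question.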

Related

How to design abstract classes if methods don't have the exact same signature?

This is a "real life" OO design question. I am working with Scala, and interested in specific Scala solutions, but I'm definitely open to hear generic thoughts.
I am implementing a branch-and-bound combinatorial optimization program. The algorithm itself is pretty easy to implement. For each different problem we just need to implement a class that contains information about what are the allowed neighbor states for the search, how to calculate the cost, and then potentially what is the lower bound, etc...
I also want to be able to experiment with different data structures. For instance, one way to store a logic formula is using a simple list of lists of integers. This represents a set of clauses, each integer a literal. We can have a much better performance though if we do something like a "two-literal watch list", and store some extra information about the formula in general.
That all would mean something like this
object BnBSolver[S<:BnBState]{
  def solve(states: Seq[S], best_state: Option[S]): Option[S] =
    if (states.isEmpty) best_state
    else {
      val next_state = states.head
      /* compare to best state, etc... */
      val new_states = new_branches ++ states.tail
      solve(new_states, new_best_state)
    }
}
class BnBState[F<:Formula](clauses:F, assigned_variables) {
def cost: Int
def branches: Seq[BnBState] = {
val ll = clauses.pick_variable
List(
BnBState(clauses.assign(ll), ll :: assigned_variables),
BnBState(clauses.assign(-ll), -ll :: assigned_variables)
)
}
}
case class Formula[F<:Formula[F]](clauses:List[List[Int]]) {
  def assign(ll: Int): F =
    Formula(clauses.filterNot(_ contains ll)
      .map(_.filterNot(_ == -ll)))
}
Hopefully this is not too crazy, wrong or confusing. The whole issue here is that this assign method from a formula would usually take just the current literal that is going to be assigned. In the case of two-literal watch lists, though, you are doing some lazy thing that requires you to know later what literals have been previously assigned.
One way to fix this is you just keep this list of previously assigned literals in the data structure, maybe as a private thing. Make it a self-standing lazy data structure. But this list of the previous assignments is actually something that may be naturally available by whoever is using the Formula class. So it makes sense to allow whoever is using it to just provide the list every time you assign, if necessary.
The problem here is that we can no longer have an abstract Formula class that just declares an assign(ll:Int):Formula. In the normal case this is OK, but if this is a two-literal watch list Formula, it is actually an assign(literal: Int, previous_assignments: Seq[Int]).
From the point of view of the classes using it, it is kind of OK. But then how do we write generic code that can take all these different versions of Formula? Because of the drastic signature change, it cannot simply be an abstract method. We could maybe force the user to always provide the full assigned variables, but then this is a kind of a lie too. What to do?
The idea is the watch list class just becomes a kind of regular assign(Int) class if I write down some kind of adapter method that knows where to take the previous assignments from... I am thinking maybe with implicit we can cook something up.
I'll try to make my answer a bit general, since I'm not convinced I'm completely following what you are trying to do. Anyway...
Generally, the first thought should be to accept a common super-class as a parameter. Obviously that won't work with Int and Seq[Int].
You could just have two methods; have one call the other. For instance just wrap an Int into a Seq[Int] with one element and pass that to the other method.
You can also wrap the parameter in some custom class, e.g.
class Assignment {
...
}
def int2Assignment(n: Int): Assignment = ...
def seq2Assignment(s: Seq[Int]): Assignment = ...
case class Formula[F<:Formula[F]](clauses:List[List[Int]]) {
def assign(ll: Assignment) :F = ...
}
And of course you would have the option to make those conversion methods implicit so that callers just have to import them, not call them explicitly.
Lastly, you could do this with a typeclass:
trait Assigner[A] {
...
}
implicit val intAssigner = new Assigner[Int] {
...
}
implicit val seqAssigner = new Assigner[Seq[Int]] {
...
}
case class Formula[F<:Formula[F]](clauses:List[List[Int]]) {
def assign[A : Assigner](ll: A) :F = ...
}
You could also make that type parameter at the class level:
case class Formula[A:Assigner,F<:Formula[A,F]](clauses:List[List[Int]]) {
def assign(ll: A) :F = ...
}
Which one of these paths is best is up to preference and how it might fit in with the rest of the code.
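The typeclass variant can be sketched as follows. The answer leaves the bodies elided, so everything here is an assumption for illustration: the two methods on `Assigner`, the convention that a `Seq[Int]` carries the new literal first and the earlier assignments after it, and the simplified, non-generic `SimpleFormula`.

```scala
// Hypothetical typeclass: extracts the literal to assign and the earlier
// assignments from whatever the caller passes in.
trait Assigner[A] {
  def literal(a: A): Int
  def previous(a: A): Seq[Int]
}

object Assigner {
  // A bare Int carries no history.
  implicit val intAssigner: Assigner[Int] = new Assigner[Int] {
    def literal(a: Int): Int = a
    def previous(a: Int): Seq[Int] = Seq.empty
  }
  // Assumed convention: head is the new literal, tail the earlier assignments.
  implicit val seqAssigner: Assigner[Seq[Int]] = new Assigner[Seq[Int]] {
    def literal(a: Seq[Int]): Int = a.head
    def previous(a: Seq[Int]): Seq[Int] = a.tail
  }
}

// Simple clause-list formula. It only needs the literal; a watch-list
// implementation could additionally consult ev.previous(a).
final case class SimpleFormula(clauses: List[List[Int]]) {
  def assign[A](a: A)(implicit ev: Assigner[A]): SimpleFormula = {
    val ll = ev.literal(a)
    SimpleFormula(clauses.filterNot(_.contains(ll)).map(_.filterNot(_ == -ll)))
  }
}

val f = SimpleFormula(List(List(1, 2), List(-1, 3)))
val viaInt = f.assign(1)         // SimpleFormula(List(List(3)))
val viaSeq = f.assign(Seq(1, 5)) // same result; 5 is an earlier assignment
```

Generic solver code can then call `assign` with whichever argument type it has at hand, as long as an `Assigner` instance exists for it.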

How do I write a shorthand for a datatype in Scala

How do I write shorthand for a datatype?
For example.
Let's say that instead of List[Integer], I would rather type Integers,
instead of this
def processNumbers(input:List[Integer]):List[Integer] = ...
to
def processNumbers(input:Integers):Integers = ...
Is this possible?
Thanks
Yes, you can do this with a type alias.
type Integers = List[Int] // scala.Int is preferred over java.lang.Integer
That being said, this isn't really a good use for them. List[Int] is very clear to other Scala developers, whereas your type Integers provides no extra information and so will detract from the readability of your code over time.
A use of type aliases that would improve your code's readability though would be something like
type UserId = Int
def processUsers(ids: List[UserId]): Foo
In this case it provides extra information to the reader vs
def processUsers(ids: List[Int]): Foo
Using that kind of type alias also will allow you to gradually make your code more type-safe over time by changing the definition from a type alias to a value class.
case class UserId(value: Int) extends AnyVal
You won't need to change the method signatures of anything already having "UserId", but this will let the compiler assist you in making sure you don't do something like
val ids: List[Int] = getBlogPostIds()
val foo = processUsers(ids) // Oops, those Ints are for blog posts, not users
Using the value class approach, a mistake like that becomes a compiler error. Used pervasively it adds quite a lot of guidance in writing correct code.
val ids: List[BlogPostId] = getBlogPostIds
val foo = processUsers(ids) // Compile error; BlogPostId != UserId

When does it make sense to use implicit parameters in Scala, and what may be alternative scala idioms to consider?

Having used a Scala library that liberally exposes its reliance on implicits to the caller, I experienced friction around this mechanism: Scala makes implicit arguments quite hard to debug at times, and there are quite a few places from which it can fill in their values. (At one point I could almost relate to it as "implicits hell".)
At one point in my coding, Scala "complained" that an implicit value could not be matched, whereas in fact there was a "collision" of implicit values, each coming from a different import.
Regardless of that perceived brittleness, it may at times feel like borderline abuse of the context design pattern.
Why does it make sense to have implicit parameters in Scala?
In what scenarios would you use them and how would you avoid trouble?
As I'm not sure the experimentation-curve and potential for other team members getting totally confused are worth it, could you possibly suggest other scala idioms for sharing context between a multitude of Scala functions?
This question is not about a specific implementation at hand; hopefully it's still a good fit for this site.
Generally, using a common type as an implicit parameter is a bad idea.
def badIdea(n: Int)(implicit s: String) = s * n
It doesn't take much to imagine why: you'll get conflicting implicits for the same thing if anyone else adopts this policy. Better to avoid it.
But who really wants to stuff in a scala.concurrent.ExecutionContext manually every time it's needed (which is practically everywhere)?
So the key is: when you have something with a specialized type, especially if it's bookkeeping that might need to be overridden manually but mostly should just do the right thing, then use implicit parameters. (This usually covers type classes as well.)
Then what do you do if you really need a string? Well, wrap it (at least formally--here it's a value class so in some contexts it will just pass the string around):
class MyWrappedString(val underlying: String) extends AnyVal {}
implicit val myString = new MyWrappedString("bird")
def decentIdea(n: Int)(implicit mws: MyWrappedString) = mws.underlying * n
scala> decentIdea(2) // In the bush?
res14: String = birdbird
Or if you think some additional logic is helpful, write a wrapper that takes an extra type parameter:
class ImplicitWithValue[K,V](val value: V) {
// Any extra generic logic goes here
}
object ImplicitWithValue {
class ValuePart[K] {
def apply[V](v: V) = new ImplicitWithValue[K,V](v)
}
private val genericValuePart = new ValuePart[Any]
private def typedValuePart[K] = genericValuePart.asInstanceOf[ValuePart[K]]
def apply[K] = typedValuePart[K]
}
Then you can
trait Marker1
implicit val implicit1 = ImplicitWithValue[Marker1]("fish")
def goodIdea(n: Int)(implicit ms: ImplicitWithValue[Marker1, String]) = ms.value * n
scala> goodIdea(3)
res17: String = fishfishfish

Scala: Why use implicit on function argument?

I have the following function:
def getIntValue(x: Int)(implicit y: Int ) : Int = {x + y}
I see declarations like the one above everywhere. I understand what the function does. It is a curried function that takes two arguments. If you omit the second argument, the compiler fills it in with an implicit Int value that is in scope. So I think it is something very similar to defining a default value for the argument.
implicit val temp = 3
scala> getIntValue(3)
res8: Int = 6
What are the benefits of the above declaration?
Here's my "pragmatic" answer: you typically use currying as more of a "convention" than anything else meaningful. It comes in really handy when your last parameter happens to be a "call by name" parameter (for example: : => Boolean):
def transaction(conn: Connection)(codeToExecuteInTransaction : => Boolean) = {
conn.startTransaction // start transaction
val booleanResult = codeToExecuteInTransaction //invoke the code block they passed in
//deal with errors and rollback if necessary, or commit
//return connection to connection pool
}
What this is saying is "I have a function called transaction, its first parameter is a Connection and its second parameter will be a code-block".
This allows us to use this method like so (using the "I can use curly brace instead of parenthesis rule"):
transaction(myConn) {
//code to execute in a transaction
//the code block's last executable statement must be a Boolean as per the second
//parameter of the transaction method
}
If you didn't curry that transaction method, it would look pretty unnatural doing this:
transaction(myConn, {
//code block
})
How about implicit? Yes, it can seem like a very ambiguous construct, but you get used to it after a while, and the nice thing about implicit functions is that they have scoping rules. So for production you might define an implicit function for getting that database connection from the PROD database, but in your integration test you'll define an implicit function that supersedes the PROD version and gets a connection from a DEV database instead.
As an example, how about we add an implicit parameter to the transaction method? The implicit parameter list has to come last, so the code block moves to the first parameter list:
def transaction(codeToExecuteInTransaction: => Boolean)(implicit conn: Connection) = {
}
Now, assuming I have an implicit function somewhere in my code base that returns a Connection, like so:
implicit def getConnectionFromPool: Connection = { ... }
I can execute the transaction method like so:
transaction {
//code to execute in transaction
}
and Scala will translate that to:
transaction {
//code to execute in transaction
}(getConnectionFromPool)
In summary, Implicits are a pretty nice way to not have to make the developer provide a value for a required parameter when that parameter is 99% of the time going to be the same everywhere you use the function. In that 1% of the time you need a different Connection, you can provide your own connection by passing in a value instead of letting Scala figure out which implicit function provides the value.
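A small self-contained sketch of that "same value 99% of the time, override in the remaining 1%" idea. The `Connection` case class is a hypothetical stand-in, not a real database API, and `transaction` simply returns the connection name alongside the block's result so the effect is visible.

```scala
// Hypothetical stand-in for a database connection.
final case class Connection(name: String)

// By-name block first, implicit parameter list last.
def transaction[A](body: => A)(implicit conn: Connection): (String, A) =
  (conn.name, body)

// The default that the compiler picks up when nothing is passed explicitly.
implicit val defaultConn: Connection = Connection("PROD")

val usual = transaction { 1 + 1 }                      // uses defaultConn
val special = transaction { 1 + 1 }(Connection("DEV")) // explicit override
```

In a test you would typically bring a different implicit `Connection` into scope instead of passing it at every call site.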
In your specific example there are no practical benefits. In fact using implicits for this task will only obfuscate your code.
The standard use case of implicits is the Type Class Pattern. I'd say that it is the only use case that is practically useful. In all other cases it's better to have things explicit.
Here is an example of a typeclass:
// A typeclass
trait Show[a] {
def show(a: a): String
}
// Some data type
case class Artist(name: String)
// An instance of the `Show` typeclass for that data type
implicit val artistShowInstance =
new Show[Artist] {
def show(a: Artist) = a.name
}
// A function that works for any type `a`, which has an instance of a class `Show`
def showAListOfShowables[a](list: List[a])(implicit showInstance: Show[a]): String =
list.view.map(showInstance.show).mkString(", ")
// The following code outputs `Beatles, Michael Jackson, Rolling Stones`
val list = List(Artist("Beatles"), Artist("Michael Jackson"), Artist("Rolling Stones"))
println(showAListOfShowables(list))
This pattern originates from a functional programming language named Haskell and turned out to be more practical than the standard OO practices for writing modular, decoupled software. Its main benefit is that it allows you to extend already existing types with new functionality without changing them.
There are plenty of details left unmentioned, like syntactic sugar, def instances and so on. It is a huge subject and fortunately it has great coverage throughout the web. Just google for "scala type class".
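Two of the unmentioned details can be sketched briefly: the context-bound sugar (`A: Show`) and a `def` instance that derives an instance for `List[A]` from the one for `A`. The `render` function and the bracketed list format are assumptions for illustration; the `Show` trait mirrors the one above.

```scala
trait Show[A] {
  def show(a: A): String
}

object Show {
  implicit val intShow: Show[Int] = new Show[Int] {
    def show(a: Int): String = a.toString
  }

  // A def instance: builds Show[List[A]] whenever a Show[A] is available.
  implicit def listShow[A](implicit ev: Show[A]): Show[List[A]] =
    new Show[List[A]] {
      def show(as: List[A]): String = as.map(ev.show).mkString("[", ", ", "]")
    }
}

// [A: Show] is sugar for an extra (implicit ev: Show[A]) parameter list.
def render[A: Show](a: A): String = implicitly[Show[A]].show(a)

val rendered = render(List(1, 2, 3)) // "[1, 2, 3]"
```

Placing the instances in the companion object of `Show` puts them in the implicit scope, so callers do not need an import.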
There are many benefits, outside of your example.
I'll give just one; at the same time, this is also a trick that you can use on certain occasions.
Imagine you create a trait that is a generic container for other values, like a list, a set, a tree or something like that.
trait MyContainer[A] {
def containedValue:A
}
Now, at some point, you find it useful to iterate over all elements of the contained value.
Of course, this only makes sense if the contained value is of an iterable type.
But because you want your class to be useful for all types, you don't want to restrict A to be of a Seq type, or Traversable, or anything like that.
Basically, you want a method that says: "I can only be called if A is of a Seq type."
And if someone calls it on, say, MyContainer[Int], that should result in a compile error.
That's possible.
What you need is some evidence that A is of a sequence type.
And you can do that with Scala and implicit arguments:
trait MyContainer[A] {
  def containedValue: A
  // reduce needs a binary function, hence (B, B) => B
  def aggregate[B](f: (B, B) => B)(implicit ev: A => Seq[B]): B =
    ev(containedValue) reduce f
}
So, if you call this method on a MyContainer[Seq[Int]], the compiler will look for an implicit Seq[Int]=>Seq[B].
That's really simple to resolve for the compiler.
Because Predef always provides a suitable implicit (an implicit identity function in old Scala versions, evidence of type <:< in newer ones).
Its type signature is something like: A=>A
It simply returns whatever argument is passed to it.
I don't know what this pattern is called. (Can anyone help out?)
But I think it's a neat trick that comes in handy sometimes.
You can see a good example of that in the Scala library if you look at the method signature of Seq.sum.
In the case of sum, another implicit parameter type is used; in that case, the implicit parameter is evidence that the contained type is numeric, and therefore, a sum can be built out of all contained values.
That's not the only use of implicits, and certainly not the most prominent, but I'd say it's an honorable mention. :-)
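A runnable sketch of the same trick using the standard library's `<:<` evidence type, a technique often referred to as generalized type constraints. The `Box` class is a hypothetical concrete container standing in for `MyContainer`; `B` is given explicitly at the call site so the lambda's parameter types are known.

```scala
final case class Box[A](containedValue: A) {
  // Callable only when the compiler can prove that A is a Seq[B].
  // A <:< Seq[B] is itself a function A => Seq[B], so ev(...) converts.
  def aggregate[B](f: (B, B) => B)(implicit ev: A <:< Seq[B]): B =
    ev(containedValue).reduce(f)
}

val total = Box(Seq(1, 2, 3)).aggregate[Int](_ + _) // 6

// Box(42).aggregate[Int](_ + _) is rejected at compile time,
// because there is no evidence that Int <:< Seq[Int].
```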

Scala Implicit Conversion Gotchas

EDIT
OK, @Drexin brings up a good point re: loss of type safety/surprising results when using implicit converters.
How about a less common conversion, where conflicts with Predef implicits would not occur? For example, I'm working with JodaTime (great project!) in Scala. In the same controller package object where my implicits are defined, I have a type alias:
type JodaTime = org.joda.time.DateTime
and an implicit that converts JodaTime to Long (for a DAL built on top of ScalaQuery where dates are stored as Long)
implicit def joda2Long(d: JodaTime) = d.getMillis
Here no ambiguity could exist between Predef and my controller package implicits, and the controller implicits will not filter into the DAL, as that is in a different package scope. So when I do
dao.getHeadlines(articleType, Some(jodaDate))
the implicit conversion to Long is done for me, IMO, safely, and given that date-based queries are used heavily, I save some boilerplate.
Similarly, for str2Int conversions, the controller layer receives servlet URI params as String -> String. There are many cases where the URI then contains numeric strings, so when I filter a route to determine if the String is an Int, I do not want to call stringVal.toInt every time; instead, if the regex passes, let the implicit convert the string value to Int for me. All together it would look like:
implicit def str2Int(s: String) = s.toInt
get( """/([0-9]+)""".r ) {
show(captures(0)) // captures(0) is String
}
def show(id: Int) = {...}
In the above contexts, are these valid use cases for implicit conversions, or is it more, always be explicit? If the latter, then what are valid implicit conversion use cases?
ORIGINAL
In a package object I have some implicit conversions defined, one of them a simple String to Int:
implicit def str2Int(s: String) = s.toInt
Generally this works fine, methods that take an Int param, but receive a String, make the conversion to Int, as do methods where the return type is set to Int, but the actual returned value is a String.
Great, now in some cases the compiler errors with the dreaded ambiguous implicit:
both method augmentString in object Predef of type (x: String)
scala.collection.immutable.StringOps and method str2Int(s: String) Int
are possible conversion functions from java.lang.String to ?{val
toInt: ?}
The case where I know this is happening is when attempting to do manual inline String-to-Int conversions. For example, val i = "10".toInt
My workaround/hack has been to create an asInt helper along with the implicits in the package object: def asInt(i: Int) = i and used as, asInt("10")
So, is implicit best practice implicit (i.e. learn by getting burned), or are there some guidelines to follow so as to not get caught in a trap of one's own making? In other words, should one avoid simple, common implicit conversions and only utilize where the type to convert is unique? (i.e. will never hit ambiguity trap)
Thanks for the feedback, implicits are awesome...when they work as intended ;-)
I think you're mixing two different use cases here.
In the first case, you're using implicit conversions to hide the arbitrary distinction (or arbitrary-to-you, anyway) between different classes in cases where the functionality is identical. The JodaTime to Long implicit conversion fits in that category; it's probably safe, and very likely a good idea. I would probably use the enrich-my-library pattern instead, and write
class JodaGivesMS(jt: JodaTime) { def ms = jt.getMillis }
implicit def joda_can_give_ms(jt: JodaTime) = new JodaGivesMS(jt)
and use .ms on every call, just to be explicit. The reason is that units matter here (milliseconds are not microseconds are not seconds are not millimeters, but all can be represented as ints), and I'd rather leave some record of what the units are at the interface, in most cases. getMillis is rather a mouthful to type every time, but ms is not too bad. Still, the conversion is reasonable (if well-documented for people who may modify the code in years to come (including you)).
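The same enrichment can be written more compactly as an implicit class. This sketch uses `java.time.Instant` as a stand-in for JodaTime so that it runs without extra dependencies; the `TimeSyntax` object and the `storeTimestamp` function are hypothetical names.

```scala
import java.time.Instant

object TimeSyntax {
  // Adds an explicit, unit-revealing .ms accessor to Instant.
  implicit class InstantMs(private val t: Instant) {
    def ms: Long = t.toEpochMilli
  }
}

import TimeSyntax._

// Hypothetical DAL-style function that expects milliseconds as a Long.
def storeTimestamp(millis: Long): Long = millis

val stored = storeTimestamp(Instant.ofEpochMilli(1500L).ms) // 1500
```

The call site stays short, but the unit is still visible where the conversion happens.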
In the second case, however, you're performing an unreliable transformation between one very common type and another. True, you're doing it in only a limited context, but that transformation is still liable to escape and cause problems (either exceptions or types that aren't what you meant). Instead, you should write those handy routines that you need that correctly handle the conversion, and use those everywhere. For example, suppose you have a field that you expect to be "yes", "no", or an integer. You might have something like
val Rint = """(\d+)""".r
s match {
case "yes" => println("Joy!")
case "no" => println("Woe!")
case Rint(i) => println("The magic number is "+i.toInt)
case _ => println("I cannot begin to describe how calamitous this is")
}
But this code is wrong, because "12414321431243".toInt throws an exception, when what you really want is to say that the situation is calamitous. Instead, you should write code that matches properly:
case object Rint {
  // optional minus sign followed by digits
  val Reg = """([-]?\d+)""".r
  def unapply(s: String): Option[Int] = s match {
    case Reg(si) =>
      try { Some(si.toInt) }
      catch { case nfe: NumberFormatException => None }
    case _ => None
  }
}
and use this instead. Now instead of performing a risky and implicit conversion from String to Int, when you perform a match it will all be handled properly, both the regex match (to avoid throwing and catching piles of exceptions on bad parses) and the exception handling even if the regex passes.
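A self-contained variant of that extractor together with the match that uses it, as a sketch. The names `SafeInt` and `describe` are made up for illustration, and `Try(...).toOption` replaces the explicit try/catch.

```scala
import scala.util.Try

object SafeInt {
  private val Digits = """(-?\d+)""".r

  // Succeeds only if the string looks like an integer AND fits into an Int.
  def unapply(s: String): Option[Int] = s match {
    case Digits(d) => Try(d.toInt).toOption
    case _         => None
  }
}

def describe(s: String): String = s match {
  case "yes"      => "Joy!"
  case "no"       => "Woe!"
  case SafeInt(i) => "The magic number is " + i
  case _          => "I cannot begin to describe how calamitous this is"
}

val small = describe("42")             // matched by SafeInt
val huge = describe("12414321431243")  // overflows Int, falls to the last case
```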
If you have something that has both a string and an int representation, create a new class and then have implicit conversions to each if you don't want your use of the object (which you know can safely be either) to keep repeating a method call that doesn't really provide any illumination.
I try not to convert anything implicitly just to convert it from one type to another, but only for the pimp my library pattern. It can be a bit confusing, when you pass a String to a function that takes an Int. Also there is a huge loss of type safety. If you would pass a string to a function that takes an Int by mistake the compiler could not detect it, as it assumes you want to do it. So always do type conversion explicitly and only use implicit conversions to extend classes.
edit:
To answer your updated question: For the sake of readability, please use the explicit getMillis. In my eyes valid use cases for implicits are "pimp my library", view/context bounds, type classes, manifests, builders... but not being too lazy to write an explicit call to a method.