Pattern for generating negative ScalaCheck scenarios: using property-based testing to test validation logic in Scala

We are looking for a viable design pattern for building ScalaCheck Gen instances (generators) that can produce both positive and negative test scenarios. This would allow us to run forAll tests to validate functionality (positive cases), and also to verify that our case class validation works correctly by failing on every invalid combination of data.
Making a simple, parameterized Gen that does this on a one-off basis is pretty easy. For example:
def idGen(valid: Boolean = true): Gen[String] =
  Gen.oneOf(ID.values.toList).map { s =>
    if (valid) s else Gen.oneOf(simpleRandomCode(4), "").sample.get
  }
With the above, I can get a valid or invalid ID for testing purposes. I use the valid one to make sure business logic succeeds, and the invalid one to make sure our validation logic rejects the case class.
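As a hedged sketch of how this might be wired into properties (inside a ScalaCheck Properties subclass; validate here is an assumed predicate, not part of the original code):
import org.scalacheck.Prop.forAll

// Positive cases: every generated valid ID must pass validation.
property("valid IDs pass validation") = forAll(idGen(valid = true)) { id => validate(id) }

// Negative cases: every generated invalid ID must be rejected.
property("invalid IDs fail validation") = forAll(idGen(valid = false)) { id => !validate(id) }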
Ok, so -- the problem is that, at large scale, this becomes very unwieldy. Let's say I have a data container with, oh, 100 different elements. Generating a "good" one is easy. But now I want to generate a "bad" one, and furthermore:
I want to generate a bad one for each data element, where a single data element is bad (so at minimum, at least 100 bad instances, testing that each invalid parameter is caught by validation logic).
I want to be able to override specific elements, for instance feeding in a bad ID or a bad "foobar", whatever that is.
One pattern we can look to for inspiration is apply and copy, which allows us to easily compose new objects while specifying overridden values. For example:
case class Foo(a: String, b: String)
val f = Foo("a", "b")  // f: Foo = Foo(a,b)
val t = Foo.unapply(f) // t: Option[(String, String)] = Some((a,b))
Foo(t.get._1, "c")     // res0: Foo = Foo(a,c)
Above we see the basic idea: creating a mutated copy of an object from the template of another. This is expressed more concisely in Scala as:
val f = someFoo copy(b = "c")
Using this as inspiration we can think about our objectives. A few things to think about:
First, we could define a map or a container of key/values for the data element and generated value. This could be used in place of a tuple to support named value mutation.
Given a container of key/value pairs, we could easily select one (or more) pairs at random and change a value. This supports the objective of generating a data set where one value is altered to create failure.
Given such a container, we can easily create a new object from the invalid collection of values (using either apply() or some other technique).
Alternatively, perhaps we can develop a pattern that uses a tuple and then just apply() it, kind of like the copy method, as long as we can still randomly alter one or more values.
We can probably explore developing a reusable pattern that does something like this:
def thingGen: Gen[Thing] = ???
def someTest = forAll(thingGen) { v => val invalidV = v.invalidate(1); validate(invalidV) must beFalse }
In the above code, we have a generator thingGen that returns (valid) Things. Then, for each instance returned, we invoke a generic method invalidate(count: Int), which randomly invalidates count values and returns an invalid object. We can then use that to ascertain whether our validation logic works correctly.
This would require defining an invalidate() function that, given a parameter (either by name or by position), can replace the identified parameter with a value that is known to be bad. This implies having an "anti-generator" for specific values: for instance, if an ID must be 3 characters, it knows to create a string that is anything but 3 characters long.
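For the 3-character ID rule just mentioned, a minimal hedged sketch of such an anti-generator might look like this (badIdGen is an illustrative name):
import org.scalacheck.Gen

// Produces strings of any length except 3, i.e. values guaranteed to fail
// an "ID must be exactly 3 characters" validation rule.
val badIdGen: Gen[String] = Gen.alphaNumStr.suchThat(_.length != 3)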
Of course to invalidate a known, single parameter (to inject bad data into a test condition) we can simply use the copy method:
def thingGen: Gen[Thing] = ???
def someTest = forAll(thingGen) { v => val v2 = v.copy(id = "xxx"); validate(v2) must beFalse }
That is the sum of my thinking to date. Am I barking up the wrong tree? Are there good patterns out there that handle this kind of testing? Any commentary or suggestions on how best to approach this problem of testing our validation logic?

We can combine a valid instance with a set of invalid fields (chosen so that every field, if copied in, would cause a validation failure) to get invalid objects, using the shapeless library.
Shapeless lets you represent your class as a list of key-value pairs that are still strongly typed, supports some high-level operations on that representation, and can convert it back to your original class.
In the example below I'll provide an invalid instance for each single field provided.
import shapeless._, record._
import shapeless.labelled.FieldType
import shapeless.ops.record.Updater
A detailed intro
Let's pretend we have a data class, and a valid instance of it (we only need one, so it can be hardcoded)
case class User(id: String, name: String, about: String, age: Int) {
  def isValid = id.length == 3 && name.nonEmpty && age >= 0
}
val someValidUser = User("oo7", "Frank", "A good guy", 42)
assert(someValidUser.isValid)
We can then define a class to be used for invalid values:
case class BogusUserFields(name: String, id: String, age: Int)
val bogusData = BogusUserFields("", "1234", -5)
Instances of such classes can be provided using ScalaCheck; it's much easier to write a generator where every field would cause a failure. The order of the fields doesn't matter, but their names and types do. Here we excluded about from User's set of fields, so we can do what you asked for (feeding in only the subset of fields you want to test).
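For instance, a hedged sketch of such a generator, where every field violates the corresponding rule in isValid above by construction (bogusGen is an illustrative name):
import org.scalacheck.Gen

val bogusGen: Gen[BogusUserFields] = for {
  name <- Gen.const("")                           // violates name.nonEmpty
  id   <- Gen.alphaNumStr.suchThat(_.length != 3) // violates id.length == 3
  age  <- Gen.negNum[Int]                         // violates age >= 0
} yield BogusUserFields(name, id, age)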
We then use LabelledGeneric[T] to convert User and BogusUserFields to their corresponding record values (and later we will convert back to User):
val userLG = LabelledGeneric[User]
val bogusLG = LabelledGeneric[BogusUserFields]
val validUserRecord = userLG.to(someValidUser)
val bogusRecord = bogusLG.to(bogusData)
Records are lists of key-value pairs, so we can use head to get a single mapping, and the + operator supports adding a field to, or replacing a field in, another record. Let's pick each invalid field into our valid user one at a time. Here is also the conversion back in action:
val invalidUser1 = userLG.from(validUserRecord + bogusRecord.head)           // invalid name
val invalidUser2 = userLG.from(validUserRecord + bogusRecord.tail.head)      // invalid ID
val invalidUser3 = userLG.from(validUserRecord + bogusRecord.tail.tail.head) // invalid age
assert(List(invalidUser1, invalidUser2, invalidUser3).forall(!_.isValid))
Since we are basically applying the same function (validUserRecord + _) to every key-value pair in bogusRecord, we can also use the map operator, except that we use it with an unusual, polymorphic, function. We can then easily convert the result to a List, because every element is now of the same type.
object polymerge extends Poly1 {
  implicit def caseField[K, V](implicit upd: Updater[userLG.Repr, FieldType[K, V]]) =
    at[FieldType[K, V]](upd(validUserRecord, _))
}
val allInvalidUsers = bogusRecord.map(polymerge).toList.map(userLG.from)
assert(allInvalidUsers == List(invalidUser1, invalidUser2, invalidUser3))
Generalizing and removing all the boilerplate
Now, the whole point of this is that we can generalize it to work for any two arbitrary classes. The encoding of all the relationships and operations is a bit cumbersome, and it took me a while to get it right through all the "implicit not found" errors, so I'll skip the details.
class Picks[A, AR <: HList](defaults: A)(implicit lgA: LabelledGeneric.Aux[A, AR]) {
  private val defaultsRec = lgA.to(defaults)

  object mergeIntoTemplate extends Poly1 {
    implicit def caseField[K, V](implicit upd: Updater[AR, FieldType[K, V]]) =
      at[FieldType[K, V]](upd(defaultsRec, _))
  }

  def from[B, BR <: HList, MR <: HList, F <: Poly](options: B)
    (implicit
      optionsLG: LabelledGeneric.Aux[B, BR],
      mapper: ops.hlist.Mapper.Aux[mergeIntoTemplate.type, BR, MR],
      toList: ops.hlist.ToTraversable.Aux[MR, List, AR]
    ) = {
    optionsLG.to(options).map(mergeIntoTemplate).toList.map(lgA.from)
  }
}
So, here it is in action:
val cp = new Picks(someValidUser)
assert(cp.from(bogusData) == allInvalidUsers)
Unfortunately, you cannot write new Picks(someValidUser).from(bogusData), because the implicit for mapper requires a stable identifier. On the other hand, the cp instance can be reused with other types:
case class BogusName(name: String)
assert(cp.from(BogusName("")).head == someValidUser.copy(name = ""))
And now it works for all types! The bogus data is only required to be some subset of the class's fields, so it will even work for the class itself:
case class Address(country: String, city: String, line_1: String, line_2: String) {
  def isValid = Seq(country, city, line_1, line_2).forall(_.nonEmpty)
}
val acp = new Picks(Address("Test country", "Test city", "Test line 1", "Test line 2"))
val invalidAddresses = acp.from(Address("", "", "", ""))
assert(invalidAddresses.forall(!_.isValid))
You can see the code running at ScalaFiddle

Related

Prevent empty values in an array being inserted into Mongo collection

I am trying to prevent empty values from being inserted into my MongoDB collection. The field in question looks like this:
MongoDB Field
"stadiumArr" : [
"Old Trafford",
"El Calderon",
...
]
Sample of (mapped) case class
case class FormData(_id: Option[BSONObjectID], stadiumArr: Option[List[String]], ..)
Sample of Scala form
object MyForm {
  val form = Form(
    mapping(
      "_id" -> ignored(Option.empty[BSONObjectID]),
      "stadiumArr" -> optional(list(text)),
      ...
    )(FormData.apply)(FormData.unapply)
  )
}
I am also using the Repeated Values functionality in the Play Framework, like so:
Play Template
@import helper._
@(myForm: Form[models.db.FormData])(implicit request: RequestHeader, messagesProvider: MessagesProvider)
@repeatWithIndex(myForm("stadiumArr"), min = 5) { (stadium, idx) =>
  @inputText(stadium, '_label -> ("stadium #" + (idx + 1)))
}
This ensures that, whether or not there are at least 5 values in the array, at least 5 input boxes will be created. However, if one (or more) of the input boxes is empty when the form is submitted, an empty string is still added as a value in the array, e.g.:
"stadiumArr" : [
"Old Trafford",
"El Calderon",
"",
"",
""
]
Based on some other approaches for converting types to/from the database, I've tried playing around with a few solutions, such as:
implicit val arrayWrite: Writes[List[String]] = new Writes[List[String]] {
  def writes(list: List[String]): JsValue = Json.arr(list.filterNot(_.isEmpty))
}
.. but this isn't working. Any ideas on how to prevent empty values from being inserted into the database collection?
Without knowing the specific versions or libraries you're using it's hard to give you an answer, but since you linked to the Play 2.6 documentation I'll assume that's what you're using. The other assumption I'm going to make is that you're using the ReactiveMongo library. Whether or not you're also using the Play plugin for that library is the reason I'm giving you two different answers here:
With the plain library (no plugin), you'll have defined a BSONDocumentReader and a BSONDocumentWriter for your case class. These might be auto-generated for you with macros or not, but regardless of how you get them, they have useful methods you can use to transform the reads/writes you have into others. So, let's say I defined a reader and writer for you like this:
import reactivemongo.bson._
case class FormData(_id: Option[BSONObjectID], stadiumArr: Option[List[String]])
implicit val formDataReaderWriter = new BSONDocumentReader[FormData] with BSONDocumentWriter[FormData] {
  def read(bson: BSONDocument): FormData = {
    FormData(
      _id = bson.getAs[BSONObjectID]("_id"),
      stadiumArr = bson.getAs[List[String]]("stadiumArr").map(_.filterNot(_.isEmpty))
    )
  }

  def write(formData: FormData) = {
    BSONDocument(
      "_id" -> formData._id,
      "stadiumArr" -> formData.stadiumArr
    )
  }
}
Great, you say, that works! You can see that in the read I went ahead and filtered out any empty strings, so even if they're in the data, they get cleaned up. That's nice and all, but notice I didn't do the same in the write. I did that so I can show you a useful method called afterWrite. So pretend the reader and writer weren't the same class but separate; then I can do this:
val initialWriter = new BSONDocumentWriter[FormData] {
  def write(formData: FormData) = {
    BSONDocument(
      "_id" -> formData._id,
      "stadiumArr" -> formData.stadiumArr
    )
  }
}

implicit val cleanWriter = initialWriter.afterWrite { bsonDocument =>
  val fixedField = bsonDocument.getAs[List[String]]("stadiumArr").map(_.filterNot(_.isEmpty))
  bsonDocument.remove("stadiumArr") ++ BSONDocument("stadiumArr" -> fixedField)
}
Note that cleanWriter is the implicit one; that means that when the insert call on the collection happens, it's the one that will be chosen.
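For example, a hedged usage sketch (collection is assumed to be a ReactiveMongo BSONCollection already in scope, using the pre-0.16 insert signature):
// The implicit cleanWriter is resolved here, so empty strings are stripped
// from stadiumArr before the document reaches MongoDB.
collection.insert(FormData(None, Some(List("Old Trafford", "", "")))) // Future[WriteResult]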
Now, that's all a bunch of work. If you're using the plugin/module for Play that lets you use JSONCollections, then you can get by with just defining Play JSON Reads and Writes. If you look at the documentation you'll see that the Reads trait has a useful map function you can use to transform one Reads into another.
So, you'd have:
val jsonReads = Json.reads[FormData]
implicit val cleanReads = jsonReads.map(formData => formData.copy(stadiumArr = formData.stadiumArr.map(_.filterNot(_.isEmpty))))
And again, because only the clean Reads is implicit, the Mongo collection methods will use it.
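A hedged sketch of that in action (assuming a JSONCollection named collection in scope, plus the reactivemongo.play.json._ import that provides the BSONObjectID formats):
// cleanReads is picked up implicitly when deserializing the result document.
def findOne(id: BSONObjectID): Future[Option[FormData]] =
  collection.find(Json.obj("_id" -> id)).one[FormData]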
NOW, all of that said, doing this at the database level is one thing, but really, I personally think you should be dealing with this at your Form level.
val form = Form(
  mapping(
    "_id" -> ignored(Option.empty[BSONObjectID]),
    "stadiumArr" -> optional(list(text)),
    ...
  )(FormData.apply)(FormData.unapply)
)
Mainly because, surprise surprise, Form has a way to deal with this: specifically, the Mapping class itself. If you look there you'll find a transform method you can use to filter out empty values easily. Just call it on the mapping you need to modify, for example:
"stadiumArr" -> optional(
list(text).transform(l => l.filter(_.nonEmpty), l => l.filter(_.nonEmpty))
)
To explain a little more about this method, in case you're not used to reading the signatures in the scaladoc:
def transform[B](f1: (T) ⇒ B, f2: (B) ⇒ T): Mapping[B]
This says that by calling transform on some mapping of type Mapping[T], you can create a new mapping of type Mapping[B]. In order to do this you must provide functions that convert from one to the other. So the code above turns the list mapping (a Mapping[List[String]]) into another Mapping[List[String]] (the type does not change here), but in doing so it removes any empty elements. If I break this code down a little it might be clearer:
def convertFromTtoB(list: List[String]): List[String] = list.filter(_.nonEmpty)
def convertFromBtoT(list: List[String]): List[String] = list.filter(_.nonEmpty)
...
list(text).transform(convertFromTtoB, convertFromBtoT)
You might be wondering why you need to provide both. The reason is that when you call Form.fill and the form is populated with values, the second function is called so that the data goes back into the format the Play form expects. This is more obvious when the type actually changes. For example, if you had a text area where people could enter CSV, but you wanted to map it to a form model with a proper List[String], you might do something like:
def convertFromTtoB(raw: String): List[String] = raw.split(",").filter(_.nonEmpty)
def convertFromBtoT(list: List[String]): String = list.mkString(",")
...
text.transform(convertFromTtoB, convertFromBtoT)
Note that when I've done this in the past, sometimes I've had to write a separate method and pass it in when I didn't want to fully specify all the types, but you should be able to work from here given the documentation and the type signature of the transform method on Mapping.
The reason I suggest doing this in the form binding is that the form/controller should be the one concerned with dealing with your user data and cleaning things up, I think. But you can always have multiple layers of cleaning; it's not bad to be safe!
I've gone for this (which always seems obvious once it's written and tested):
implicit val arrayWrite: Writes[List[String]] = new Writes[List[String]] {
  def writes(list: List[String]): JsValue = Json.toJson(list.filterNot(_.isEmpty).toIndexedSeq)
}
But I would be interested to know how to .map the existing Reads rather than redefining it from scratch, as @cchantep suggests.
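For what it's worth, here is a hedged sketch of the analogous trick on the Writes side: derive the cleaned instance from the default one via contramap (play-json 2.6-era names; the val name is illustrative, and you'd mark the result implicit where needed):
import play.api.libs.json._
import play.api.libs.functional.syntax._

// Start from the default Writes and strip empties before delegating to it.
val cleanListWrites: Writes[List[String]] =
  Writes.traversableWrites[String].contramap[List[String]](_.filterNot(_.isEmpty))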

HList/KList from class values

I want to be able to create a class/trait that behaves somewhat like an enumeration (HEnum in the first snippet below). I can't use a plain enumeration because each enum value could have a different type (though the container class will be the same): Key[A]. I'd like to be able to construct the enum roughly like this:
class Key[A](val name: String)

object A extends HEnum {
  val a = new Key[String]("a")
  val b = new Key[Int]("b")
  val c = new Key[Float]("c")
}
And then I'd like to be able to perform more or less basic HList operations like:
A.find[String] // returns the first element with type Key[String]
A.find("b") // returns the first element with name "b", type should (hopefully) be Key[Int]
So far I've been playing with an HList as the underlying data structure, but constructing one with the proper type has proven difficult. My most successful attempt looks like this:
class Key[A](val name: String)

object Key {
  def apply[A, L <: HList](name: String, l: L): (Key[A], Key[A] :: L) = {
    val key = new Key[A](name)
    (key, key :: l)
  }
}

object A {
  val (a, akeys) = Key[String, HNil]("a", HNil)
  val (b, bkeys) = Key[Int, Key[String] :: HNil]("b", akeys)
  val (c, ckeys) = Key[Float, Key[Int] :: Key[String] :: HNil]("c", bkeys)

  val values = ckeys // use this for lookups, etc.

  def find[A]: Key[A] = values.select[Key[A]]
  def find[A](name: String): Key[A] = ...
}
The problem here is that the interface is clunky. Adding a new value anywhere besides the end of the list of values is error prone, and no matter what, you have to manually update values any time a new value is introduced. My solution without HList involved a List[Key[_]] and error-prone/unsafe casting to the proper type when needed.
EDIT
I should also mention that the enum example found here is not particularly helpful to me (although if it can be adapted, then great). The added compiler checks for exhaustive pattern matches are nice (and I would ultimately want that), but this enum still only allows a homogeneous collection of enum values.

Input validation with the scala type system

Having played a bit with Scala now, I've been asking myself how you should do input validation in Scala.
This is what I have seen many times:
def doSomethingWithPositiveIntegers(i: Int) = {
  require(i > 0)
  // do something
}
To bring matters to a head, it feels like doing this in Java:
void doSomething(Object o) {
  if (!(o instanceof Integer))
    throw new IllegalArgumentException();
}
There, you first accept more than you are willing to accept, and then introduce a "guard" that only lets the "good ones" in. To be exact, you'd need these guards in every function that does something with positive integers, and if, for example, you later wanted to include zero, you'd need to change every function. Of course you can shift the check into another function, but you'd still always need to remember to call the correct function, and it might not survive type refactorings, etc. That does not sound like something I'd want. I was thinking about pushing this validation code into the data type itself, like this:
import scala.util.Try

object MyStuff {
  implicit class PositiveInt(val value: Int) {
    require(value > 0)
  }

  implicit def positiveInt2Int(positiveInt: PositiveInt): Int = positiveInt.value
}
import MyStuff._
val i: MyStuff.PositiveInt = 5
val j: Int = i+5
println(i) //Main$$anon$1$MyStuff$PositiveInt#3a16cef5
println(j) //10
val sum = i + i
println(sum) //10
def addOne(i: MyStuff.PositiveInt) = i + 1
println(Try(addOne(-5))) //Failure(java.lang.IllegalArgumentException: requirement failed)
println(Try(addOne(5))) //Success(6)
Then I have a type PositiveInt that can only contain positive integers, and I can use it (almost) everywhere like an Int. Now my API defines what I am willing to take - this is what I'd like to have! The function itself has nothing to validate, because it knows it can only get valid positive integers - they cannot be constructed without validation. You have to run your validation only once - upon creation of the type! Think of other cases where validation might be more expensive (validating an email address or URL, or checking that a number is prime).
Advantages:
Your API tells you directly what kind of objects you accept (no more do(String, String, String) when it could be do(User, Email, Password))
Your objects get validated "automatically"
The compiler can help you reduce the risk of bugs. Some things that you'd previously only see at run time can now be seen at compile time. Example:
def makeNegative(i: PositiveInt): NegativeInt = -i
addOne(makeNegative(1)) //will create a compile-time error!
However, there are some drawbacks:
Unfortunately, you break many functions that would otherwise work via implicit conversions. E.g., this will not work:
val i: PositiveInt = 5
val range = i to 10 //error: value to is not a member of this.MyStuff.PositiveInt
val range = i.value to 10 //will work
This could be solved if you could extend Int and just add the require, because then every PositiveInt would be an Int (which really is the case!), but Int is final :). You could add implicit conversions for all the cases you need, but that would be pretty verbose.
More objects are created. Maybe one can lower that burden with value classes (can anybody show me how?).
These are my questions:
Am I missing something? I have not seen anybody do this before, and I wonder why. Maybe there are good reasons for not doing this.
Is there a better way to integrate validation into my types?
How can I avoid the problems with the need of duplicate implicits (drawback #1)? Maybe some kind of macro that looks at other implicits in scope and adds implicits at compile time for me (Example: implicit conversion from PositiveInt to RichInt)?
You can create a class with a private constructor, visible only to its companion object, which exposes a validating factory method, e.g.:
class PositiveInt private[PositiveInt](val i: Int)

object PositiveInt {
  def apply(i: Int): Option[PositiveInt] = if (i > 0) Some(new PositiveInt(i)) else None
}
Clients cannot create instances of PositiveInt directly, so they have to go through the apply method, which performs the validation and only returns an instance when the input value is valid.
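A quick usage sketch:
PositiveInt(5).map(_.i) // Some(5)
PositiveInt(-1)         // None: validation happens exactly once, at creation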

Migrate from MurmurHash to MurmurHash3

In Scala 2.10, MurmurHash is deprecated for some reason, saying I should use MurmurHash3 now. But the API is different, and there are no useful scaladocs for MurmurHash3 -> fail.
For instance, current code:
trait Foo {
  type Bar
  def id: Int
  def path: Bar

  override def hashCode = {
    import util.MurmurHash._
    var h = startHash(2)
    val c = startMagicA
    val k = startMagicB
    h = extendHash(h, id, c, k)
    h = extendHash(h, path.##, nextMagicA(c), nextMagicB(k))
    finalizeHash(h)
  }
}
How would I do this using MurmurHash3 instead? This needs to be a fast operation, preferably without allocations, so I do not want to construct a Product, Seq, Array[Byte] or whatever MurmurHash3 seems to be offering me.
The MurmurHash3 algorithm was changed, confusingly, from an algorithm that mixed in its own salt, essentially (c and k), to one that just does more bit-mixing. The basic operation is now mix, which you fold over all your values, after which you call finalizeHash (its Int length argument is also there for convenience, to help distinguish collections of different lengths). If you replace your last mix with mixLast, it's a little faster and removes redundancy with finalizeHash; if it takes you too long to work out which mix is last, just use mix.
Typically for a collection you'll want to mix in an extra value to indicate what type of collection it is.
So minimally you'd have
override def hashCode = finalizeHash(mixLast(id, path.##), 0)
and "typically" you'd
// Pick any string or number that suits you, put in companion object
val fooSeed = MurmurHash3.stringHash("classOf[Foo]")
// I guess "id" plus "path" is two things?
override def hashCode = finalizeHash(mixLast(mix(fooSeed, id), path.##), 2)
Note that the length field is NOT there to give a high-quality hash that mixes in that number. All mixing of important hash values should be done with mix.
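For comparison, and hedged since it allocates the tuple the question explicitly wanted to avoid, the standard library also exposes the same mix/finalize scheme as a one-liner:
// Allocates a Tuple2, trading the no-allocation goal for brevity.
override def hashCode = scala.util.hashing.MurmurHash3.productHash((id, path))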
Looking at the source code of MurmurHash3 suggests something like this:
override def hashCode = {
  import util.hashing.MurmurHash3._
  val h = symmetricSeed // I'm not sure which seed to use here
  val h1 = mix(h, id)
  val h2 = mixLast(h1, path.##)
  finalizeHash(h2, 2)
}
or, in (almost) one line:
import util.hashing.MurmurHash3._
override def hashCode = finalizeHash(mix(mix(symmetricSeed, id), path.##), 2)

covariant type T occurs in invariant position

I'm taking my first steps in Scala, and I would like to make the following code work:
trait Gene[+T] {
  val gene: Array[T]
}
The error that the compiler gives is: covariant type T occurs in invariant position in type => Array[T] of value gene
I know I could do something like:
trait Gene[+T] {
  def gene[U >: T]: Array[U]
}
but this doesn't solve the problem, because I need a value. Practically, what I'm trying to say is: "I don't care about the inner type; I know that genes will have a gene field that returns its content." (The +T here is because I want to do something like type Genome = Array[Gene[Any]] and then use it as a wrapper over the single gene classes, so I can have a heterogeneous array type.)
Is it possible to do this in Scala, or am I simply taking the wrong approach? Would it be better to use a different structure, like a Scala native covariant class?
Thanks in advance!
P.S.: I've also tried with class and abstract class instead of trait, but always with the same results!
EDIT: with a kind suggestion by Didier Dupont I came to this code:
package object ga {
  class Gene[+T](val gene: Vector[T]) {
    def apply(idx: Int) = gene(idx)
    override def toString() = gene.toString
  }

  implicit def toGene[T](a: Vector[T]) = new Gene(a)

  type Genome = Array[Gene[Any]]
}
package test
import ga._
object Test {
  def main(args: Array[String]) {
    val g = Vector(1, 3, 4)
    val g2 = Vector("a", "b")
    val genome1: Genome = Array(g, g2)
    println("Genome")
    for (gene <- genome1) println(gene.gene)
  }
}
So I now think I can put in and retrieve data of different types, and use them with all the type-checking goodies!
Array is invariant because you can write to it.
Suppose you do
val typed: Gene[String] = new Gene[String] { val gene = Array("foo") }
val untyped: Gene[Any] = typed         // covariance would allow this
untyped.gene(0) = new java.util.Date() // stores a Date in an Array[String]!
This would crash (the array in your instance is an Array[String] and will not accept a Date), which is why the compiler prevents it.
From there, it depends very much on what you intend to do with Gene. You could use a covariant type instead of Array (you might consider Vector), but that will prevent users from mutating the content, if that was what you intended. You may also keep an Array inside the class, provided it is declared private[this] (which will make it quite hard to mutate the content too). If you want clients to be allowed to mutate the content of a Gene, it will probably not be possible to make Gene covariant.
The type of gene needs to be covariant in its type parameter. For that to be possible, you have to choose an immutable, covariant data structure, for example List; most of the sequence types in the scala.collection.immutable package fit.
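A minimal sketch of that fix, matching the Vector-based EDIT above:
// Vector is covariant and immutable, so Gene[+T] now compiles,
// and an Array[Gene[Any]] can hold genes with different element types.
trait Gene[+T] {
  val gene: Vector[T]
}

val genome: Array[Gene[Any]] = Array(
  new Gene[Int]    { val gene = Vector(1, 2, 3) },
  new Gene[String] { val gene = Vector("a", "b") }
)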