Generate arbitrary Function1 in ScalaCheck

For my testing, I would like to generate arbitrary random functions of type String => Boolean.
Is it possible to do that using ScalaCheck?

Yes, just like you'd generate arbitrary values of other types:
import org.scalacheck._
// Int instead of Boolean to better see that it is a function
val arb = implicitly[Arbitrary[String => Int]].arbitrary.sample.get
println(arb("a"))
println(arb("a")) // same output: the generated function is deterministic
println(arb("b")) // may differ for a different input
This works because there is an implicit Cogen[String] and an Arbitrary[Int] (or Arbitrary[Boolean]) in scope. Cogen is not documented in the User Guide, but it is equivalent to QuickCheck's CoArbitrary, which is explained in https://kseo.github.io/posts/2016-12-14-how-quick-check-generate-random-functions.html and in https://begriffs.com/posts/2017-01-14-design-use-quickcheck.html (under "CoArbitrary and Gen (a -> b)").
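As a usage sketch (the property name and the determinism check are illustrative), a generated String => Boolean can be consumed directly by forAll:
import org.scalacheck.Prop.forAll
import org.scalacheck.Properties

object FunctionProps extends Properties("Function1") {
  // Arbitrary[String => Boolean] is derived from Cogen[String] and Arbitrary[Boolean]
  property("generated functions are deterministic") =
    forAll { (f: String => Boolean, s: String) =>
      f(s) == f(s) // the same input always yields the same output
    }
}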
Is it possible, then, to generate an arbitrary function from a random case class? For example:
case class File(name: String, size: Long)
It should be enough to define a Cogen[File]. Either manually:
import org.scalacheck.rng.Seed

implicit val cogenFile: Cogen[File] = Cogen {
  (seed: Seed, file: File) => Cogen.perturbPair(seed, (file.name, file.size))
}
Slightly more code, but generalizes to more than 2 fields:
implicit val cogenFile: Cogen[File] = Cogen { (seed: Seed, file: File) =>
  val seed1 = Cogen.perturb(seed, file.name)
  Cogen.perturb(seed1, file.size)
}
Or automatically using scalacheck-shapeless:
import org.scalacheck.derive.MkCogen // from scalacheck-shapeless

implicit val cogenFile: Cogen[File] = MkCogen[File].cogen
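With any of these Cogen[File] instances in scope, generating a File => Boolean works just like the String => Int example above (a minimal sketch):
// Arbitrary[File => Boolean] is derived from Cogen[File] and Arbitrary[Boolean]
val fileFn = implicitly[Arbitrary[File => Boolean]].arbitrary.sample.get
println(fileFn(File("a.txt", 10L)))
println(fileFn(File("a.txt", 10L))) // same: the generated function is deterministic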

I don't think you need to generate anything. If you want a random function, just create a random function:
import scala.util.Random

val randomFunction: String => Boolean = _ => Random.nextBoolean()
Or if you want the output to be stable (the same result for multiple calls of the same function with the same argument):
import scala.collection.mutable

def newRandomFunction: String => Boolean =
  mutable.Map.empty[String, Boolean].getOrElseUpdate(_, Random.nextBoolean())
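A quick sketch of the difference (the printed values are whatever Random happens to produce):
val f = newRandomFunction
println(f("a")) // e.g. true
println(f("a")) // same value: memoized in this function's private map
println(newRandomFunction("a")) // may differ: a fresh function gets a fresh map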

Related

Passing empty-parameter functions as arguments

Here are two functions to be used as UDFs:
def nextString(): String = Random.nextString(10)
def plusOne(a: Int): Int = a + 1
def udfString = udf(nextString)
def udfInt = udf(plusOne)
If I try to use withColumn, it works perfectly fine with udfInt, but for udfString it throws: can't use Char in Schema.
Probably because plusOne is treated as a function Int => Int, which is what udf expects,
but nextString is evaluated to a plain String, which leads to the assumption that I am trying to extract Chars when I apply the function.
It will work if I do something like:
def udfString = udf(() => nextString())
Which seems ugly for something that simple. Is there any way to pass nextString as a function, not as a String?
When you write the following code:
def udfString = udf(nextString)
it's the same as writing:
val s = nextString
def udfString = udf(s)
This compiles because a String is also, via implicit conversion, a function Int => Char, so Spark infers a UDF from Int to Char.
You can tell the compiler that you are passing a function to udf by eta-expanding the method:
def udfString = udf(nextString _)
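A quick usage sketch (the DataFrame df and its integer column "n" are assumptions):
import org.apache.spark.sql.functions.udf

val udfString = udf(nextString _) // eta-expanded to () => String
val udfInt = udf(plusOne _)

val out = df
  .withColumn("rand", udfString())
  .withColumn("nPlusOne", udfInt(df("n")))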

Scala - evaluate function calls sequentially until one returns

I have a few 'legacy' endpoints that can return the Data I'm looking for.
def mainCall(id): Data {
  maybeMyDataInEndpoint1(id: UUID): DataA
  maybeMyDataInEndpoint2(id: UUID): DataB
  maybeMyDataInEndpoint3(id: UUID): DataC
}
null can be returned if no DataX is found.
The return types of the methods differ, but there is a convert method that converts each DataX to a unified Data.
The endpoints are not Scala-ish.
What is the best Scala approach to evaluate those method calls sequentially until I have the value I need?
In pseudocode I would do something like:
val myData = maybeMyDataInEndpoint1 getOrElse maybeMyDataInEndpoint2 getOrElse maybeMyDataInEndpoint3
I'd use an easier approach, though the other answers use more elaborate language features.
Just use Option() to catch the null and chain with orElse. I'm assuming methods convertX(d: DataX): Data for explicit conversion. Since the data might not be found at all, we return an Option:
def mainCall(id: UUID): Option[Data] =
  Option(maybeMyDataInEndpoint1(id)).map(convertA)
    .orElse(Option(maybeMyDataInEndpoint2(id)).map(convertB))
    .orElse(Option(maybeMyDataInEndpoint3(id)).map(convertC))
Maybe you can lift these methods into a List of functions and use collectFirst, like:
val fs = List(maybeMyDataInEndpoint1 _, maybeMyDataInEndpoint2 _, maybeMyDataInEndpoint3 _)
val f = (a: UUID) => fs.collectFirst {
  case u if u(a) != null => u(a)
}
f(myUUID)
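Note that the guard above calls each endpoint twice: once in the test and once to produce the value. A small variation over a lazy view evaluates each function only once (a sketch; the convert step is left out, as in the snippet above):
val g = (a: UUID) => fs.view.map(u => u(a)).collectFirst {
  case d if d != null => d
}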
The best Scala approach IMHO is to do things in the most straightforward way.
To handle optional values (or nulls from Java land), use Option.
To sequentially evaluate a list of methods, fold over a Seq of functions.
To convert from one data type to another, use either (1.) implicit conversions or (2.) regular functions depending on the situation and your preference.
(Edit) Assuming implicit conversions:
def legacyEndpoint[A](endpoint: UUID => A)(implicit convert: A => Data) =
  (id: UUID) => Option(endpoint(id)).map(convert)

val legacyEndpoints = Seq(
  legacyEndpoint(maybeMyDataInEndpoint1),
  legacyEndpoint(maybeMyDataInEndpoint2),
  legacyEndpoint(maybeMyDataInEndpoint3)
)

def mainCall(id: UUID): Option[Data] =
  legacyEndpoints.foldLeft(Option.empty[Data])(_ orElse _(id))
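For the implicit variant to compile, conversions along these lines must be in scope (the fromDataX names match the explicit variant below; the bodies are your existing conversion logic):
implicit def fromDataA(a: DataA): Data = ??? // your DataA-to-Data logic
implicit def fromDataB(b: DataB): Data = ??? // your DataB-to-Data logic
implicit def fromDataC(c: DataC): Data = ??? // your DataC-to-Data logic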
(Edit) Using explicit conversions:
def legacyEndpoint[A](endpoint: UUID => A)(convert: A => Data) =
  (id: UUID) => Option(endpoint(id)).map(convert)

val legacyEndpoints = Seq(
  legacyEndpoint(maybeMyDataInEndpoint1)(fromDataA),
  legacyEndpoint(maybeMyDataInEndpoint2)(fromDataB),
  legacyEndpoint(maybeMyDataInEndpoint3)(fromDataC)
)
... // same as before
Here is one way to do it.
(1) You can make your convert methods implicit (or wrap them into implicit wrappers) for convenience.
(2) Then use Stream to build a chain of method calls. You should give type inference a hint that you want your stream to contain Data elements (not the DataX returned by the legacy methods), so that the appropriate implicit convert is applied to each result of a legacy method call.
(3) Since Stream is lazy and evaluates its tail "by name", only the first method gets called so far. At this point you can apply a lazy filter to skip null results.
(4) Now you can actually evaluate the chain, getting the first non-null result with headOption.
(HACK) Unfortunately, Scala type inference (as of v2.12.4) is not powerful enough to let you use the #:: stream methods unless you guide it every step of the way. Using cons makes inference happy but is cumbersome, and building the stream with the vararg apply method of the companion object is not an option either, since Scala does not support by-name varargs yet. The example below therefore combines two helpers: stream, a generic method that builds streams from 0-arg functions, and toLazyData, an implicit by-name conversion designed to interplay with the implicit convert functions from DataX to Data.
Here is a demo that shows the idea in more detail:
object Demo {
  case class Data(value: String)
  class DataA
  class DataB
  class DataC

  def maybeMyDataInEndpoint1(id: String): DataA = {
    println("maybeMyDataInEndpoint1")
    null
  }
  def maybeMyDataInEndpoint2(id: String): DataB = {
    println("maybeMyDataInEndpoint2")
    new DataB
  }
  def maybeMyDataInEndpoint3(id: String): DataC = {
    println("maybeMyDataInEndpoint3")
    new DataC
  }

  implicit def convert(data: DataA): Data = if (data == null) null else Data(data.toString)
  implicit def convert(data: DataB): Data = if (data == null) null else Data(data.toString)
  implicit def convert(data: DataC): Data = if (data == null) null else Data(data.toString)

  implicit def toLazyData[T](value: => T)(implicit convert: T => Data): () => Data = () => convert(value)

  def stream[T](xs: (() => T)*): Stream[T] =
    xs.toStream.map(_())

  def main(args: Array[String]): Unit = {
    val chain = stream(
      maybeMyDataInEndpoint1("1"),
      maybeMyDataInEndpoint2("2"),
      maybeMyDataInEndpoint3("3")
    )
    val result = chain.filter(_ != null).headOption.getOrElse(Data("default"))
    println(result)
  }
}
This prints:
maybeMyDataInEndpoint1
maybeMyDataInEndpoint2
Data(Demo$DataB@16022d9d)
Here maybeMyDataInEndpoint1 returns null, so maybeMyDataInEndpoint2 is invoked and delivers a DataB; maybeMyDataInEndpoint3 is never invoked since we already have the result.
I think @g.krastev's answer is perfectly good for your use case and you should accept it. I'm just expanding on it a bit to show how you can make the last step slightly nicer with cats.
First, the boilerplate:
import java.util.UUID
final case class DataA(i: Int)
final case class DataB(i: Int)
final case class DataC(i: Int)
type Data = Int
def convertA(a: DataA): Data = a.i
def convertB(b: DataB): Data = b.i
def convertC(c: DataC): Data = c.i
def maybeMyDataInEndpoint1(id: UUID): DataA = DataA(1)
def maybeMyDataInEndpoint2(id: UUID): DataB = DataB(2)
def maybeMyDataInEndpoint3(id: UUID): DataC = DataC(3)
This is basically what you have, in a form that you can copy/paste into the REPL and have it compile.
Now, let's first declare a way to turn each of your endpoints into something safe and unified:
def makeSafe[A, B](evaluate: UUID ⇒ A, f: A ⇒ B): UUID ⇒ Option[B] =
  id ⇒ Option(evaluate(id)).map(f)
With this in place, you can, for example, call the following to turn maybeMyDataInEndpoint1 into a UUID => Option[Data]:
makeSafe(maybeMyDataInEndpoint1, convertA)
The idea is now to turn your endpoints into a list of UUID => Option[Data] values and fold over that list. Here's your list:
val endpoints = List(
  makeSafe(maybeMyDataInEndpoint1, convertA),
  makeSafe(maybeMyDataInEndpoint2, convertB),
  makeSafe(maybeMyDataInEndpoint3, convertC)
)
You can now fold on it manually, which is what @g.krastev did:
def mainCall(id: UUID): Option[Data] =
  endpoints.foldLeft(None: Option[Data])(_ orElse _(id))
If you're fine with a cats dependency, folding over a list of options is a concrete use case of a common pattern (the interaction of Foldable and Monoid):
import cats._
import cats.implicits._
def mainCall(id: UUID): Option[Data] = endpoints.foldMap(_(id))
One caveat: cats' default Option monoid combines the inner values of two Somes (here, since Data is Int, it would add them) rather than keeping the first, so this matches the orElse chain only when at most one endpoint returns a result; for strict first-wins semantics, see the variant below.
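A sketch of the first-result-wins version, using collectFirstSome (available on Foldable in recent cats versions), which short-circuits on the first Some:
import cats.implicits._

def mainCall(id: UUID): Option[Data] =
  endpoints.collectFirstSome(_(id))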
There are other ways to make this nicer still, but they might be overkill in this context - I'd probably declare a type class to turn any type into a Data, say, to give makeSafe a cleaner type signature.

How can I add a class to Scala Set

I would like to pass a parameter as a String and construct a Set of class objects, like this:
def getTypes(value: String): Set[Class[Base]] = {
  var set = Set[Class[Base]]()
  var input = value.split(",")
  if (input.contains("XXX"))
    set ++ Class[xxx]
  if (input.contains("YYY"))
    set ++ Class[yyy]
  if (input.contains("ZZZ"))
    set ++ Class[zzz]
  set
}
Then I loop over the set and use class.newInstance() to create the actual objects to do something later. The above code fails to compile, complaining:
Error:(32, 16) object java.lang.Class is not a value
set ++ Class[xxx]
Any clue about that?
There are two problems in your snippet. One, as aravindKrishna pointed out, is that you're trying to refer to class literals improperly: in Scala they are written classOf[xxx], not Class[xxx].
The other one is that you're treating your immutable Set like you would a mutable one. Remember you can't mutate the object itself (every operation returns a new one), so you should either reassign the variable every time (and using vars is discouraged in functional code), use recursion, or construct the entire set in one go.
Here's an example of how to construct the set in one go:
def getTypes(value: String): Set[Class[_ <: Base]] = {
  val mapping = Map(
    "XXX" -> classOf[xxx],
    "YYY" -> classOf[yyy],
    "ZZZ" -> classOf[zzz])
  val input = value.split(",").toSet
  mapping.collect {
    case (k, v) if input contains k => v
  }.toSet
}
Or, translating your original code snippet more literally,
def getTypes(value: String): Set[Class[_ <: Base]] = {
  val input = value.split(",").toSet
  Set[Class[_ <: Base]]() ++
    input.find("XXX" ==).map(_ => classOf[xxx]) ++
    input.find("YYY" ==).map(_ => classOf[yyy]) ++
    input.find("ZZZ" ==).map(_ => classOf[zzz])
}
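A usage sketch of looping over the result (it assumes Base and its subclasses have no-arg constructors, which newInstance requires):
for (cls <- getTypes("XXX,ZZZ")) {
  val instance: Base = cls.newInstance() // instantiate each selected class
  println(instance.getClass.getSimpleName)
}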

Define return value in Spark Scala UDF

Imagine the following code:
def myUdf(arg: Int) = udf((vector: MyData) => {
  // complex logic that returns a Double
})
How can I define the return type for myUdf so that people looking at the code will know immediately that it returns a Double?
I see two ways to do it: either define a method first and then lift it to a function,
def myMethod(vector: MyData): Double = {
  // complex logic that returns a Double
}
val myUdf = udf(myMethod _)
or define a function first with an explicit type:
val myFunction: MyData => Double = (vector: MyData) => {
  // complex logic that returns a Double
}
val myUdf = udf(myFunction)
I normally use the first approach for my UDFs.
Spark's functions object defines several udf methods with signatures of the form static <RT, A1, ..., A10> UserDefinedFunction.
You can specify the output and input data types in square brackets as follows:
def myUdf(arg: Int) = udf[Double, MyData]((vector: MyData) => {
  // complex logic that returns a Double
})
You can pass type parameters to udf, but you need to (seemingly counter-intuitively) pass the return type first, followed by the input types, i.e. [ReturnType, ArgTypes...], at least as of Spark 2.3.x. Using the original example (which seems to be a function curried on arg):
def myUdf(arg: Int) = udf[Double, Seq[Int]]((vector: Seq[Int]) => {
  13.37 // whatever
})
There is nothing special about UDFs built from lambda functions; they behave just like Scala lambdas (see Specifying the lambda return type in Scala), so you could do:
def myUdf(arg: Int) = udf(((vector: MyData) => {
  // complex logic that returns a Double
}): (MyData => Double))
or instead explicitly define your function:
def myFuncWithArg(arg: Int): MyData => Double = {
  def myFunc(vector: MyData): Double = {
    // complex logic that returns a Double. Use arg here
  }
  myFunc
}
def myUdf(arg: Int) = udf(myFuncWithArg(arg))
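Either way, the typed UDF is applied as usual (the DataFrame df and the column names are assumptions):
import org.apache.spark.sql.functions.col

val withScore = df.withColumn("score", myUdf(42)(col("vector")))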

Creating a modified `filter` function

Consider the filter function.
I am interested in the following modifications of the filter function, if possible:
We know for a collection we can do:
case class People(age: Int)
val a: List[People] = ...
a.filter(i => i.age == 10)
Or more simply:
a.filter(_.age == 10)
Is there any simple way I can define a modified filter that works just like the following (with no underscore)?
a.myfilter1(age == 10)
The filter function does not work when its argument is not a Boolean. Suppose I want to create a modified filter that, when given a non-Boolean, translates it to an equality check automatically. Here is an example:
val anotherPerson: People = ...
a.myFilter2(anotherPerson)
I want the above myFilter2 to be translated as follows:
a.filter(_.equals(anotherPerson))
Using an implicit conversion (an implicit def):
case class MyFilterable[T](seq: Seq[T]) {
  def suchAFilter(v: Any): Seq[T] =
    seq.filter(v.equals)
}

implicit def strongFilter[T](seq: Seq[T]): MyFilterable[T] =
  MyFilterable(seq)

println(List(1, 2, 3).suchAFilter(2))
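Applied to the question's example (a hedged sketch; suchAFilter plays the role of the asked-for myFilter2):
case class People(age: Int)

val a = List(People(10), People(20))
val anotherPerson = People(10)
println(a.suchAFilter(anotherPerson)) // List(People(10)): equality-based filtering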