Here are two functions to be used as UDFs:
def nextString(): String = Random.nextString(10)
def plusOne(a: Int): Int = a + 1
def udfString = udf(nextString)
def udfInt = udf(plusOne)
If I use them with withColumn, udfInt works perfectly fine, but udfString throws: can't use Char in Schema
Probably because plusOne is treated as (Int) => (Int), which is what udf expects.
But nextString is treated as a String, which apparently leads to the assumption that I am trying to extract Chars when I apply the function.
It will work if I do something like:
def myUDF = udf(() => nextString)
Which seems ugly for something that simple. Is there any way to pass nextString to udf as a function, not as a String?
When you write the following code:
def udfString = udf(nextString)
it's the same as writing:
val s = nextString
def udfString = udf(s)
This compiles because a String can also be used as a function Int => Char, through the implicit conversion to a Seq[Char].
You can tell the compiler that you are passing a function to udf with:
def udfString = udf(nextString _)
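Both points can be checked without Spark. The following is a minimal sketch for Scala 2; the object name EtaExpansionSketch and the value names are made up for illustration.

```scala
import scala.util.Random

object EtaExpansionSketch {
  def nextString(): String = Random.nextString(10)

  // A String is accepted where Int => Char is expected, because Predef
  // implicitly wraps it in a Seq[Char], and a Seq is a function from index to element.
  val asCharLookup: Int => Char = "hello"

  // `nextString _` turns the method into a function value instead of calling it.
  val asFunction: () => String = nextString _
}
```

Here asCharLookup(1) looks up the character at index 1, while asFunction() produces a fresh random string on every call.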
For my testing, I would like to generate arbitrary random functions of type String => Boolean.
Is it possible to do that using ScalaCheck?
Yes, just like you'd generate arbitrary values of other types:
import org.scalacheck._
// Int instead of Boolean to better see that it is a function
val arb = implicitly[Arbitrary[String => Int]].arbitrary.sample.get
println(arb("a"))
println(arb("a")) // same
println(arb("b"))
This works because there is an implicit Cogen[String] and an Arbitrary[Boolean]. Cogen is not documented in the User Guide, but it's equivalent to QuickCheck's CoArbitrary, which is explained in https://kseo.github.io/posts/2016-12-14-how-quick-check-generate-random-functions.html and https://begriffs.com/posts/2017-01-14-design-use-quickcheck.html (under "CoArbitrary and Gen (a -> b)").
Is it then possible to generate an arbitrary function that takes a case class as its argument? For example:
case class File(name: String, size: Long)
It should be enough to define a Cogen[File]. Either manually:
import org.scalacheck.rng.Seed
implicit val cogenFile: Cogen[File] = Cogen {
  (seed: Seed, file: File) => Cogen.perturbPair(seed, (file.name, file.size))
}
Slightly more code, but generalizes to more than 2 fields:
implicit val cogenFile: Cogen[File] = Cogen { (seed: Seed, file: File) =>
val seed1 = Cogen.perturb(seed, file.name)
Cogen.perturb(seed1, file.size)
}
Or automatically using scalacheck-shapeless:
implicit val cogenFile: Cogen[File] = MkCogen[File].cogen
I don't think you need to generate anything. If you want a random function, just create a random function:
val randomFunction: String => Boolean = _ => Random.nextBoolean
Or if you want the output to be stable (same result for multiple calls of the same function with the same parameter):
def newRandomFunction: String => Boolean = {
  // create the cache once per function, not once per call
  val cache = mutable.Map.empty[String, Boolean]
  s => cache.getOrElseUpdate(s, Random.nextBoolean)
}
Imagine the following code:
def myUdf(arg: Int) = udf((vector: MyData) => {
// complex logic that returns a Double
})
How can I define the return type for myUdf so that people looking at the code will know immediately that it returns a Double?
I see two ways to do it, either define a method first and then lift it to a function
def myMethod(vector:MyData) : Double = {
// complex logic that returns a Double
}
val myUdf = udf(myMethod _)
or define a function first with explicit type:
val myFunction: Function1[MyData,Double] = (vector:MyData) => {
// complex logic that returns a Double
}
val myUdf = udf(myFunction)
I normally use the first approach for my UDFs.
Spark functions define several udf methods that have the following modifier/type: static <RT,A1, ..., A10> UserDefinedFunction
You can specify the input/output data types in square brackets as follows:
def myUdf(arg: Int) = udf[Double, MyData]((vector: MyData) => {
// complex logic that returns a Double
})
You can pass type parameters to udf but, somewhat counter-intuitively, the return type comes first, followed by the input types, as in [ReturnType, ArgTypes...], at least as of Spark 2.3.x. Using the original example (which seems to be a curried function based on arg):
def myUdf(arg: Int) = udf[Double, Seq[Int]]((vector: Seq[Int]) => {
13.37 // whatever
})
There is nothing special about UDFs with lambda functions; they behave just like Scala lambda functions (see Specifying the lambda return type in Scala), so you could do:
def myUdf(arg: Int) = udf(((vector: MyData) => {
// complex logic that returns a Double
}): (MyData => Double))
or instead explicitly define your function:
def myFuncWithArg(arg: Int): MyData => Double = {
  def myFunc(vector: MyData): Double = {
    // complex logic that returns a Double. Use arg here
  }
  myFunc _
}
def myUdf(arg: Int) = udf(myFuncWithArg(arg))
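The same idea can be tried without Spark. The following is a minimal sketch in which MyData is a hypothetical stand-in case class and scaledSum is an invented name; the placeholder logic just sums the values and multiplies by arg.

```scala
object ReturnTypeSketch {
  // Hypothetical stand-in for the MyData type from the question.
  final case class MyData(values: Seq[Double])

  // The declared result type tells the reader that the returned function yields a Double.
  def scaledSum(arg: Int): MyData => Double = {
    def compute(vector: MyData): Double = vector.values.sum * arg
    compute _
  }
}
```

Because the result type is written on the outer method, a reader does not have to scan the body to learn what the function returns.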
I have the code below, and in my findOrCreate() function I'm getting an error saying Type mismatch found Unit, required Future[Customer]. The customerByPhone() function that is called inside findOrCreate() also contains calls that expect Futures, which is why I'm using a flatMap. I don't know why the flatMap is resulting in Unit. What am I doing wrong?
override def findOrCreate(phoneNumber: String, creationReason: String): Future[AvroCustomer] = {
//query for customer in db
val avroCustomer: Future[AvroCustomer] = customerByPhone(phoneNumber).flatMap(_ => createUserAndEvent(phoneNumber, creationReason, 1.0))
}
override def customerByPhone(phoneNumber: String): Future[AvroCustomer] = {
val query = Schema.Customers.byPhoneNumber(phoneNumber)
val dbAction: DBIO[Option[Schema.Customer]] = query.result.headOption
db.run(dbAction)
.map(_.map(AvroConverters.toAvroCustomer).orNull)
}
private def createUserAndEvent(phoneNumber: String, creationReason: String, version: Double): Future[AvroCustomer] = {
val query = Schema.Customers.byPhoneNumber(phoneNumber)
val dbAction: DBIO[Option[Schema.Customer]] = query.result.headOption
val data: JsValue = Json.obj(
"phone_number" -> phoneNumber,
"agent_number" -> "placeholder for agent number",
"creation_reason" -> creationReason
)
//empty for now
val metadata: JsValue = Json.obj()
//creates user
val avroCustomer: Future[AvroCustomer] = db.run(dbAction).map(_.map(AvroConverters.toAvroCustomer).orNull)
avroCustomer.onComplete({
case Success(null) => {
}
//creates event
case Success(customer) => {
val uuid: UUID = UUID.fromString(customer.id)
//create event
val event: Future[CustomerEvent] = db.run(Schema.CustomerEvents.create(
uuid,
"customer_creation",
version,
data,
metadata)
).map(AvroConverters.toAvroEvent)
}
case Failure(exception) => {
}
})
Future.successful(new AvroCustomer)
}
While Reactormonk basically answered this in the comments, I'm going to actually write an answer with some details. His comment that a val statement produces Unit is fundamentally correct, but I'm hoping some elaboration will make things more clear.
The key element that I see is that val is a declaration. Declarations in Scala are statements that don't produce useful values. Because of the functional nature of Scala, they do produce a value, but it is Unit and as there is only one instance of Unit, it doesn't carry any meaning.
The reason programmers new to Scala are often tempted to do something like this is that they don't think of blocks of code as expressions and are often used to using return in other languages. So let's consider a simplified function here.
def foo(i: Int): Int = {
42 * i
}
I include a code block as I think that is key to this error, though it really isn't needed here. The value of a code block is simply the value of the last expression in the code block. This is why we don't have to specify return, but most programmers who are used to return are a bit uncomfortable with that naked expression at the end of a block. That is why it is tempting to throw in the val declaration.
def foo(i: Int): Int = {
val result = 42 * i // Error: type mismatch.
}
Of course, as was mentioned, the val results in Unit, making this incorrect. You could add an extra line with just result, and that will compile, but it is overly verbose and non-idiomatic.
Scala supports the use of return to leave a method/function and give back a particular value, though its use is generally frowned upon. As such, the following code works.
def foo(i: Int): Int = {
return 42 * i
}
While you shouldn't use return in Scala code, I feel that imagining it being there can help with understanding what is wrong here. If you stick a return in front of the val you get code like the following.
def foo(i: Int): Int = {
return val result = 42 * i // Error: this does not even parse.
}
At least to me, this code is clearly incorrect. The val is a declaration and as such it just doesn't work with a return. It takes some time to get used to the functional style of blocks as expressions. Until you get to that point, it might help just to act like there is a return at the end of methods without actually putting one there.
It is worth noting that, in the comments, jwvh claims that a declaration like this in C would return a value. That is false. Assignments in most C-family languages give back the value that was assigned, so a = 5 returns the value 5, but declarations don't, so int a = 5; does not give back anything and can't be used as an expression.
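Applied to the findOrCreate method from the question, the fix is to let the flatMap call itself be the last expression of the method instead of binding it to a val. The following is a minimal sketch, assuming simplified hypothetical lookup and create helpers in place of the database calls.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object LastExpressionSketch {
  // Hypothetical stand-ins for customerByPhone and createUserAndEvent.
  def lookup(phone: String): Future[Option[String]] = Future.successful(None)
  def create(phone: String): Future[String] = Future.successful(s"customer-$phone")

  // No val here: the flatMap expression is the value of the method.
  def findOrCreate(phone: String): Future[String] =
    lookup(phone).flatMap {
      case Some(existing) => Future.successful(existing)
      case None           => create(phone)
    }
}
```

With the stub lookup returning None, findOrCreate falls through to create, and the resulting Future is what the method returns.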
I have a use-case where I need to define a new enum type LongShort but I need it in a way to also carry the sign so it can be directly used in mathematical expressions e.g.
object LongShortType extends Enumeration {
type Type = Value
val Long = Value(+1)
val Short = Value(-1)
}
I'd then like to use it like this:
val longShort = LongShortType.Short
val numberOfContracts: Int = 10
val vanillaOptionNotional: Double = longShort*numberOfContracts
but this leads to compiler error cannot resolve symbol * ... is there a way to extract the value of the enum? Or am I not understanding how enum types work?
The type of LongShortType.Short isn't Int, it's Value. You can either extract the underlying id of the value:
val longShort = LongShortType.Short.id
Which is a little ugly. Or you could not use an enum type at all:
object LongShortType {
val Long = 1
val Short = -1
}
And then your equation would work as is.
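For reference, a minimal sketch of the .id approach with the original Enumeration left unchanged; the wrapper object IdSketch exists only to make the snippet self-contained.

```scala
object IdSketch {
  object LongShortType extends Enumeration {
    val Long = Value(+1)
    val Short = Value(-1)
  }

  val numberOfContracts: Int = 10
  // .id gives back the Int that was passed to Value(...)
  val vanillaOptionNotional: Double = LongShortType.Short.id * numberOfContracts
}
```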
OK, I worked out a solution that accomplishes what I wanted without any compromise. By that I mean that this solution has all the advantages of using a Scala enum, e.g. withName, and still allows me to define extra features on it:
object LongShortType extends Enumeration {
type Type = LongShortVal
val Long = Value("Long", +1)
val Short = Value("Short", -1)
case class LongShortVal(name: String, sign: Int) extends Val(nextId, name)
protected final def Value(name: String, sign: Int) = new LongShortVal(name, sign)
}
and now can do:
val longShort = LongShortType.Short
val numberOfContracts: Int = 10
val vanillaOptionNotional: Double = longShort.sign*numberOfContracts
and can also do:
val longShort = LongShortType.withName("Long") // returns LongShortType.Long
I have to define a second order function that takes as a parameter a function.
In my application, the input function may have any input type, but the output type has to be a specified one (suppose Int, it does not matter).
I define the second order function as:
def sof(f : (Any => Int) ) = {...}
Now, if I have a function f : Int => Int, and I call:
sof(f)
I get:
found : Int => Int
required: Any => Int
I guess I am misunderstanding the meaning of the Any type.
How can I make it work?
The parameters of functions in Scala are contravariant. That means that Int => Int is not a subtype of Any => Int; it is the other way around. Imagine the following: you pass a String to the Any => Int function (that is actually implemented by an Int => Int function). How would the Int => Int handle the String argument?
You shouldn't use Any there, but a type parameter such as:
def sof[A](f: A => Int) = {...}
However I don't think you can do much with that method. Probably you would want something like this:
def sof[A](a: A)(f: A => Int) = f(a)
// usage example
val x = sof("hello")(_.size * 2) // the result is 10
// you can also partially apply it to obtain other functions
val sizeDoubler: String => Int = sof(_)(_.size * 2)
val helloDoubleSize = sizeDoubler("hello") // the result is 10
This way you can pass any type to sof plus you'll have the compiler by your side to signal any strange behaviour. Using Any you lose this ability.
Final Note: In case the syntax I used to pass two parameters (the A value and the A => Int function) to a method looks strange to you that's called currying. If you Google it you'll find many good articles about it.
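The contravariance point made earlier in this answer can be checked directly. The following is a minimal sketch; the object name VarianceSketch and the value names are made up for illustration.

```scala
object VarianceSketch {
  val anyToInt: Any => Int = _.toString.length

  // Function1 is contravariant in its parameter type,
  // so an Any => Int can be used wherever an Int => Int is expected.
  val intToInt: Int => Int = anyToInt

  // The generic version from above accepts a function over any input type.
  def sof[A](a: A)(f: A => Int): Int = f(a)
}
```

The reverse assignment, an Int => Int used as an Any => Int, is the one the compiler rejects, which is the error from the question.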