Partially applied isBefore function in Scala gives error - scala

I am trying to merge two sequences of dates in Scala such that the merged sequence has sorted elements. I am using a partial implementation of isBefore as follows:
val seq1 = Seq(LocalDate.of(2014, 4, 5), LocalDate.of(2013, 6 ,7), LocalDate.of(2014, 3, 1))
val seq2 = Seq(LocalDate.of(2012, 2, 2), LocalDate.of(2015, 2, 1))
var arr = (seq1 ++ seq2).sortWith(_.isBefore(_) = 1)
println(arr)
But it shows compilation error for the isBefore function:
Multiple markers at this line
- missing arguments for method isBefore in class LocalDate; follow this method with `_' if you want to
treat it as a partially applied function
- missing arguments for method isBefore in class LocalDate; follow this method with `_' if you want to
treat it as a partially applied function
I am relatively new to Scala. What seems to be the problem?

First of all, there is no such term as "partial implementation", at least I haven't heard of one; I guess you meant partial application. But there is no partial application in this case either. Partial application is about curried functions, which is what the compiler is trying to tell you in the error message. An example:
def test(a: String)(f: String => String) = f(a)
val onString = test("hello world") _
onString(_.capitalize)
test: (a: String)(f: String => String)String
onString: (String => String) => String = <function1>
res8: String = Hello world
This is partial application: you take a curried function, which returns another function, pass it one argument (partially apply it), and later pass the remaining argument.
As for your sorting problem, that should work. I don't know which library you are using, but with Joda-Time it's similar. I think the problem is the assignment (_.isBefore(_) = 1), which is illegal. It should be like this:
val seq1 = Seq(LocalDate.parse("2014-04-05"), LocalDate.parse("2013-06-07"), LocalDate.parse("2014-03-01"))
val seq2 = Seq(LocalDate.parse("2012-02-02"), LocalDate.parse("2015-02-01"))
var arr = (seq1 ++ seq2).sortWith(_.isBefore(_))
arr: Seq[org.joda.time.LocalDate] = List(2012-02-02, 2013-06-07, 2014-03-01, 2014-04-05, 2015-02-01)
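For reference, the same fix appears to work with java.time.LocalDate, which the original snippet uses; a minimal sketch:
import java.time.LocalDate

val seq1 = Seq(LocalDate.of(2014, 4, 5), LocalDate.of(2013, 6, 7), LocalDate.of(2014, 3, 1))
val seq2 = Seq(LocalDate.of(2012, 2, 2), LocalDate.of(2015, 2, 1))

// sortWith expects a (LocalDate, LocalDate) => Boolean predicate, which isBefore supplies
val merged = (seq1 ++ seq2).sortWith(_.isBefore(_))
// List(2012-02-02, 2013-06-07, 2014-03-01, 2014-04-05, 2015-02-01)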

Related

Why I can't apply just an underscore to first parameter in Scala?

I don't know why pattern d in the list below is rejected.
Why does it need an explicit type declaration?
def adder1(m:Int,n:Int) = m + n
val a = adder1(2,_) //OK
val b = adder1(_,2) //OK
def adder2(m:Int)(n:Int) = m + n
val c = adder2(2)(_) //OK
val d = adder2(_)(2) //NG:missing parameter type
val e = adder2(_:Int)(2) //OK
I just want to know the reason pattern d needs a parameter type.
Citations from the language spec are very welcome.
So I believe this comes from the concept of Partial Application.
Intuitively, partial function application says "if you fix the first arguments of the function, you get a function of the remaining arguments"
...
Scala implements optional partial application with placeholders, e.g. def add(x: Int, y: Int) = {x+y}; add(1, _: Int) returns an incrementing function. Scala also supports multiple parameter lists as currying, e.g. def add(x: Int)(y: Int) = {x+y}; add(1) _.
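A quick sketch of both forms (the names here are illustrative, not from the question):
def add(x: Int, y: Int) = x + y
val increment = add(1, _: Int)   // Int => Int, via a typed placeholder
increment(41)                    // 42

def addCurried(x: Int)(y: Int) = x + y
val addOne = addCurried(1) _     // Int => Int, partial application of a curried method
addOne(41)                       // 42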
Let's take a look at adder2.
From the REPL:
scala> def adder2(m:Int)(n:Int) = m + n
def adder2(m: Int)(n: Int): Int
Let's get a value to represent this:
scala> val adder2Value = adder2
^
error: missing argument list for method adder2
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `adder2 _` or `adder2(_)(_)` instead of `adder2`.
Ok, let's try:
val adder2Value = adder2 _
val adder2Value: Int => (Int => Int) = $Lambda$1382/0x0000000840703840#4b66a923
Ahha!
In English: "A function that takes an Int and returns a function that takes an Int and returns an Int"
How can we bind the second argument using this signature? How can we access the inner function unless we first have gone through the outer one?
As far as I know, this is not possible to do using this signature, unless you explicitly define the type of your first argument.
(But what about adder2(_)(_)?)
scala> adder2(_)(_)
^
error: missing parameter type for expanded function ((<x$1: error>, x$2) => adder2(x$1)(x$2))
^
error: missing parameter type for expanded function ((<x$1: error>, <x$2: error>) => adder2(x$1)(x$2))
(Maybe this hints at our solution?)
Notice what happens if we explicitly define both arguments:
val adder2Value2 = adder2Value(_: Int)(_: Int)
val adder2Value2: (Int, Int) => Int = $Lambda$1394/0x000000084070d840#32f7d983
This is much more manageable, we can now fix either argument, and get a simplified partial function:
scala> val adder2FirstArg = adder2Value (_:Int) (10)
val adder2FirstArg: Int => Int = $Lambda$1395/0x000000084070d040#47f5ddf4
scala> val adder2SecondArg = adder2Value (5) (_:Int)
val adder2SecondArg: Int => Int = $Lambda$1396/0x000000084070c840#21ed7ce
So what's really going on here?
When you bind an argument to a value, you have explicitly expressed the type (maybe it's inferred, but it's definitely that type, in this case, Ints). It's sugar so we don't need to write it. But under the hood, these are composed functions, and how they are composed is very important. To be able to match and simplify the function signature, the compiler requires us to provide this information in an outside-in manner. Otherwise, we need to give it some help to get there.
EDIT:
I think this question serves more as a Scala language-spec puzzle exercise, however. I can't think of any good reason, from a design perspective, why you would need to implement a curried function in such a way that you cannot order the parameters so that the last ones are the ones being inferred.
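If you do control the definition, here is a sketch of that reordering (adder2Swapped is a hypothetical rename, not from the question): put the argument you want to fix into the first parameter list, and the remaining parameter needs no explicit type.
def adder2Swapped(n: Int)(m: Int) = m + n  // parameter lists reordered relative to adder2
val d2 = adder2Swapped(2)(_)               // Int => Int, no explicit parameter type needed
d2(3)                                      // 5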

Automatic type recognition in Scala

I am learning Scala right now. I see that specifying type while assigning to new val is not necessary. But then consider the following code:
object MyObject {
  def firstResponse(r: Array[String]): String = r(0)

  def mostFrequent(r: Array[String]): String = {
    (r groupBy identity mapValues (_.length) maxBy(_._2))._1
  }

  def mostFrequent(r: Array[String], among: Int): String = { mostFrequent(r take among) }

  // throws compile error
  val heuristics = Array(
    firstResponse(_), mostFrequent(_, 3), mostFrequent(_, 4), mostFrequent(_, 5)
  )
}
If I change the last line and specify the type explicitly, then the error is gone
val heuristics: Array[Array[String] => String] = Array(
  firstResponse, mostFrequent(_, 3), mostFrequent(_, 4), mostFrequent(_, 5)
)
What's wrong here?
Edit: As @mdm correctly pointed out,
//This works
val heuristics = Array(firstResponse(_), firstResponse(_))
//This does not work
val heuristics = Array(mostFrequent(_,1), mostFrequent(_,2))
The open question is why Scala can determine the type of firstResponse(_) correctly while it has difficulty doing the same for mostFrequent(_,1).
The compiler complains with something similar to this:
Error:(28, 29) missing parameter type for expanded function ((x$3: <error>) => mostFrequent(x$3, 3))
As you probably already figured out, that happens because the compiler cannot figure out automatically (infer) the type of the input parameter of those functions, when you use _. More precisely, it can't infer the type of mostFrequent(_, 3).
So, if you give the compiler a nudge, either by val heuristics: Array[Array[String] => String] = or by the following:
val heuristics = Array(
  (a: Array[String]) => firstResponse(a),
  (a: Array[String]) => mostFrequent(a, 3),
  (a: Array[String]) => mostFrequent(a, 4),
  (a: Array[String]) => mostFrequent(a, 5)
)
Things will work as expected.
Looking at posts about _ uses like this or this, you will see that it can mean very many things, depending on the context. In this case I suspect the confusion comes from the fact that you are using _ to transform a call to a method with more than one parameter to an anonymous function.
Notice that both of the following will work fine:
val heuristics = Array(
  firstResponse(_),
  firstResponse(_),
  firstResponse(_)
)
val heuristics2 = Array(
  firstResponse(_),
  mostFrequent(_: Array[String], 3)
)
As to the specific reason why a method with more than one argument cannot be transformed into an anonymous function, while one with one argument can, I will delegate to someone with more in-depth knowledge of the compiler's inference mechanics.
Sometimes when you use underscores as placeholders for parameters, the compiler does not have enough information to infer the missing parameter types, so you need to provide the type information explicitly. Placeholder syntax acts as a "blank" in the expression that needs to be "filled in", and any value could fill it, so the compiler has no information about the type of the placeholder.
val foo = _ + _
//will fail - error: missing parameter type for expanded function ((x$1: <error>, x$2) => x$1.$plus(x$2))
The above expression fails because the compiler is unable to determine the type of the values that fill the placeholders. There needs to be some way for the compiler to know the type. One way is to provide the type information on the variable or method explicitly.
val foo: (String, String) => String = _ + _
The above expression compiles successfully, because the compiler resolves the type of the parameters from the type of the variable foo (the 1st and 2nd placeholders are both String).
In certain cases, the compiler can resolve the type from a value:
List(1,2,3).foreach(println(_))
In the above case, List(1,2,3) is a List of Int, so the compiler knows that the type of the placeholder in println(_) is Int, resolved from the values of the List.
In addition, you can provide the type of the value explicitly in order to let the compiler know the type:
val foo = (_:String) + (_:String) //will return function (String, String) => String
In certain cases, if your method has only one parameter, you don't need to provide an explicit parameter type; otherwise you need to provide a type for the placeholder syntax, as below:
scala> def firstResponse(r: Array[String]): String = r(0)
firstResponse: (r: Array[String])String
scala> val foo = firstResponse(_) //no need to provide type information
foo: Array[String] => String = <function1>
scala> def firstResponse2(r: Array[String], index:Int): String = r(index)
firstResponse2: (r: Array[String], index: Int)String
scala> val foo = firstResponse2(_, 3) //will fail, need to provide type information.
<console>:12: error: missing parameter type for expanded function ((x$1) => firstResponse2(x$1, 3))
val foo = firstResponse2(_, 3)
^
scala> val foo = firstResponse2((_:Array[String]), 3)
foo: Array[String] => String = <function1>
Now coming to your case:
val heuristics = Array(
  firstResponse(_), mostFrequent(_, 3), mostFrequent(_, 4), mostFrequent(_, 5)
)
Here, the compiler has no idea what the type is because:
val heuristics has no declared type
the type for the placeholder syntax is not explicitly provided.
You have solved the issue by providing the type Array[Array[String] => String] to the heuristics val, as in case 1, and hence the compiler compiles it fine.
For case 2, you can modify your code as below:
val heuristics = Array(
  firstResponse(_), mostFrequent(_: Array[String], 3), mostFrequent(_: Array[String], 4), mostFrequent(_: Array[String], 5)
)
The weird thing is that val foo = firstResponse(_) works, because the specification directly forbids it:
If there is no expected type for the function literal, all formal parameter types Ti must be specified explicitly, and the expected type of e is undefined.
I thought that it could be treated as equivalent to eta-expansion firstResponse _ which worked without expected type because firstResponse isn't overloaded, but it's defined to be the other way around: firstResponse _ means the same as x => firstResponse(x), which is not supposed to work according to the above quote.
So strictly speaking, it appears to be a bug and you should write firstResponse(_: Array[String]) as well.
Though in this case, to avoid repetition I'd provide the expected type as
val heuristics = Array[Array[String] => String](
  firstResponse(_), mostFrequent(_, 3), mostFrequent(_, 4), mostFrequent(_, 5)
)

Scala: a template for function to accept only a certain arity and a certain output?

I have a class, where all of its functions have the same arity and same type of output. (Why? Each function is a separate processor that is applied to a Spark DataFrame and yields another DataFrame).
So, the class looks like this:
class Processors {
  def p1(df: DataFrame): DataFrame = {...}
  def p2(df: DataFrame): DataFrame = {...}
  def p3(df: DataFrame): DataFrame = {...}
  ...
}
I then apply all the methods to a given DataFrame by mapping over Processors.getClass.getMethods, which allows me to add more processors without changing anything else in the code.
What I'd like to do is define a template to the methods under Processors which will restrict all of them to accept only one DataFrame and return a DataFrame. Is there a way to do this?
Implementing a restriction on what kind of functions can be added to a "list" is possible by using an appropriate container class instead of a generic class to hold the methods that are restricted. The container of restricted methods can then be part of some new class or object or part of the main program.
What you lose below by using containers (e.g. a Map with string keys and restricted values) to hold specific kinds of functions is compile-time checking of the names of the methods. e.g. calling triple vs trilpe
The restriction of a function to take a type T and return that same type T can be defined as a type F[T] using Function1 from the scala standard library. Function1[A,B] allows any single-parameter function with input type A and output type B, but we want these input/output types to be the same, so:
type F[T] = Function1[T,T]
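Equivalently, the same restriction could be written with arrow syntax, which some may find more readable:
type F[T] = T => T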
For a container, I will demonstrate scala.collection.mutable.ListMap[String,F[T]] assuming the following requirements:
string names reference the functions (doThis, doThat, instead of 1, 2, 3...)
functions can be added to the list later (mutable)
though you could choose some other mutable or immutable collection class (e.g. Vector[F[T]] if you only want to number the methods) and still benefit from the restriction of what kind of functions future developers can include into the container.
An abstract type can be defined as:
type TaskMap[T] = ListMap[String, F[T]]
For your specific application you would then instantiate this as:
val Processors: TaskMap[DataFrame] = ListMap(
  "p1" -> ((df: DataFrame) => {...code for p1 goes here...}),
  "p2" -> ((df: DataFrame) => {...code for p2 goes here...}),
  "p3" -> ((df: DataFrame) => {...code for p3 goes here...})
)
and then to call one of these functions you use
Processors("p2")(someDF)
For simplicity of demonstration, let's forget about Dataframes for a moment and consider whether this scheme works with integers.
Consider the short program below. The collection "myTasks" can only contain functions from Int to Int. All of the lines below have been tested in the scala interpreter, v2.11.6, so you can follow along line by line.
import scala.collection.mutable.ListMap
type F[T] = Function1[T,T]
type TaskMap[T] = ListMap[String, F[T]]
val myTasks: TaskMap[Int] = ListMap(
  "negate" -> ((x:Int)=>(-x)),
  "triple" -> ((x:Int)=>(3*x))
)
we can add a new function to the container that adds 7 and name it "add7"
myTasks += ( "add7" -> ((x:Int)=>(x+7)) )
and the scala interpreter responds with:
res0: myTasks.type = Map(add7 -> <function1>, negate -> <function1>, triple -> <function1>)
but we can't add a function named "half" because it would return a Double, and a Double is not an Int, so it should trigger a type error
myTasks += ( "half" -> ((x:Int)=>(0.5*x)) )
Here we get this error message:
scala> myTasks += ( "half" -> ((x:Int)=>(0.5*x)) )
<console>:12: error: type mismatch;
found : Double
required: Int
myTasks += ( "half" -> ((x:Int)=>(0.5*x)) )
^
In a compiled application, this would be found at compile time.
Calling the functions stored this way is a bit more verbose for single calls, but can be very convenient.
Suppose we want to call "triple" on 10.
We can't write
triple(10)
<console>:9: error: not found: value triple
Instead it is
myTasks("triple")(10)
res4: Int = 30
Where this notation becomes more useful is if you have a list of tasks to perform but only want to allow tasks listed in myTasks.
Suppose we want to run all the tasks on the input data "10"
myTasks mapValues { _ apply 10 }
res9: scala.collection.Map[String,Int] =
Map(add7 -> 17, negate -> -10, triple -> 30)
Suppose we want to triple, then add7, then negate
If each result is desired separately, as above, that becomes:
List("triple","add7","negate") map myTasks.apply map { _ apply 10 }
res11: List[Int] = List(30, 17, -10)
But "triple, then add 7, then negate" could also be describing a series of steps to do 10, i.e. we want -((3*10)+7)" and scala can do that too
val myProgram = List("triple","add7","negate")
myProgram map myTasks.apply reduceLeft { _ andThen _ } apply 10
res12: Int = -37
opening the door to writing an interpreter for your own customizable set of tasks because we can also write
val magic = myProgram map myTasks.apply reduceLeft { _ andThen _ }
and magic is then a function from Int to Int that can take arbitrary Ints or otherwise do work as a function should.
scala> magic(1)
res14: Int = -10
scala> magic(2)
res15: Int = -13
scala> magic(3)
res16: Int = -16
scala> List(10,20,30) map magic
res17: List[Int] = List(-37, -67, -97)
Is this what you mean?
class Processors {
  type Template = DataFrame => DataFrame

  val p1: Template = ...
  val p2: Template = ...
  val p3: Template = ...

  def applyAll(df: DataFrame): DataFrame =
    p1(p2(p3(df)))
}
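If the list of processors grows, applyAll could also fold over them rather than nesting the calls by hand; a minimal sketch of an alternative body inside the same class:
// applies p3 first, then p2, then p1, matching p1(p2(p3(df)))
def applyAll(df: DataFrame): DataFrame =
  List(p3, p2, p1).foldLeft(df)((acc, p) => p(acc))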

Call a function with arguments from a list

Is there a way to call a function with arguments from a list? The equivalent in Python is sum(*args).
// Scala
def sum(x: Int, y: Int) = x + y
val args = List(1, 4)
sum.???(args) // equivalent to sum(1, 4)
sum(args: _*) wouldn't work here.
Please don't suggest changing the function's declaration in any way. I'm acquainted with functions with repeated parameters, e.g. def sum(args: Int*).
Well, you can write
sum(args(0), args(1))
But I assume you want this to work for any list length? Then you would go for fold or reduce:
args.reduce(sum) // args must be non empty!
(0 /: args)(sum) // aka args.foldLeft(0)(sum)
These methods assume a pair-wise reduction of the list. For example, foldLeft[B](init: B)(fun: (B, A) => B): B reduces a list of elements of type A to a single element of type B. In this example, A = B = Int. It starts with the initial value init. Since you want to sum, the sum of an empty list would be zero. It then calls the function with the current accumulator (the running sum) and each successive element of the list.
So it's like
var result = 0
result = sum(result, 1)
result = sum(result, 4)
...
The reduce method assumes that the list is non-empty and requires that the element type doesn't change (the function must map from two Ints to an Int).
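Putting both options together with the sum from the question, a minimal sketch:
def sum(x: Int, y: Int) = x + y
val args = List(1, 4)

args.reduce(sum)      // 5, but throws on an empty list
args.foldLeft(0)(sum) // 5, returns the initial value 0 for an empty list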
I wouldn't recommend it for most uses since it's a bit complicated and hard to read, bypasses compile-time checks, etc., but if you know what you're doing and need to do this, you can use reflection. This should work with any arbitary parameter types. For example, here's how you might call a constructor with arguments from a list:
import scala.reflect.runtime.universe
class MyClass(
  val field1: String,
  val field2: Int,
  val field3: Double)
// Get our runtime mirror
val runtimeMirror = universe.runtimeMirror(getClass.getClassLoader)
// Get the MyClass class symbol
val classSymbol = universe.typeOf[MyClass].typeSymbol.asClass
// Get a class mirror for the MyClass class
val myClassMirror = runtimeMirror.reflectClass(classSymbol)
// Get a MyClass constructor representation
val myClassCtor = universe.typeOf[MyClass].decl(universe.termNames.CONSTRUCTOR).asMethod
// Get an invokable version of the constructor
val myClassInvokableCtor = myClassMirror.reflectConstructor(myClassCtor)
val myArgs: List[Any] = List("one", 2, 3.0)
val myInstance = myClassInvokableCtor(myArgs: _*).asInstanceOf[MyClass]
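The same approach seems to work for invoking an ordinary method rather than a constructor; a sketch under that assumption, using the sum from the question wrapped in an object so there is an instance to reflect on:
import scala.reflect.runtime.universe

object Adder {
  def sum(x: Int, y: Int): Int = x + y
}

val runtimeMirror = universe.runtimeMirror(getClass.getClassLoader)
// Reflect on the Adder singleton instance
val instanceMirror = runtimeMirror.reflect(Adder)
// Look up the sum method symbol and obtain an invokable mirror for it
val sumSymbol = universe.typeOf[Adder.type].decl(universe.TermName("sum")).asMethod
val sumInvokable = instanceMirror.reflectMethod(sumSymbol)

val args: List[Any] = List(1, 4)
val result = sumInvokable(args: _*).asInstanceOf[Int] // 5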

Scala's lazy arguments: How do they work?

In the file Parsers.scala (Scala 2.9.1) from the parser combinators library I seem to have come across a lesser known Scala feature called "lazy arguments". Here's an example:
def ~ [U](q: => Parser[U]): Parser[~[T, U]] = { lazy val p = q // lazy argument
(for(a <- this; b <- p) yield new ~(a,b)).named("~")
}
Apparently, there's something going on here with the assignment of the call-by-name argument q to the lazy val p.
So far I have not been able to work out what this does and why it's useful. Can anyone help?
Call-by-name arguments are called every time you ask for them. Lazy vals are called the first time and then the value is stored. If you ask for it again, you'll get the stored value.
Thus, a pattern like
def foo(x: => Expensive) = {
  lazy val cache = x
  /* do lots of stuff with cache */
}
is the ultimate put-off-work-as-long-as-possible-and-only-do-it-once pattern. If your code path never takes you to need x at all, then it will never get evaluated. If you need it multiple times, it'll only be evaluated once and stored for future use. So you do the expensive call either zero (if possible) or one (if not) times, guaranteed.
The Wikipedia article for Scala even answers what the lazy keyword does:
Using the keyword lazy defers the initialization of a value until this value is used.
Additionally, what you have in this code sample with q : => Parser[U] is a call-by-name parameter. A parameter declared this way remains unevaluated, until you explicitly evaluate it somewhere in your method.
Here is an example from the scala REPL on how the call-by-name parameters work:
scala> def f(p: => Int, eval : Boolean) = if (eval) println(p)
f: (p: => Int, eval: Boolean)Unit
scala> f(3, true)
3
scala> f(3/0, false)
scala> f(3/0, true)
java.lang.ArithmeticException: / by zero
at $anonfun$1.apply$mcI$sp(<console>:9)
...
As you can see, the 3/0 does not get evaluated at all in the second call. Combining a lazy value with a call-by-name parameter as above results in the following: the parameter q is not evaluated immediately when the method is called. Instead it is assigned to the lazy value p, which is also not evaluated immediately. Only later, when p is used, does this lead to the evaluation of q. But since p is a val, the parameter q will only be evaluated once, and the result is stored in p for later reuse in the for comprehension.
You can easily see in the REPL that multiple evaluation can happen otherwise:
scala> def g(p: => Int) = println(p + p)
g: (p: => Int)Unit
scala> def calc = { println("evaluating") ; 10 }
calc: Int
scala> g(calc)
evaluating
evaluating
20
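For contrast, caching the by-name parameter in a lazy val, as the ~ combinator does, makes the evaluation happen only once; a minimal sketch using the calc defined above (g2 is an illustrative name):
def g2(p: => Int) = { lazy val cached = p; println(cached + cached) }

g2(calc) // prints "evaluating" once, then 20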