I'm trying to dynamically create a chain of functions to perform on a numeric value. The chain is created at runtime from text instructions.
The trick is that the functions vary in what types they produce. Some functions produce a Double, some produce a Long.
Update
The core issue is that I have a massive amount of data to process, but different values require different processing. In addition to the data, I have specifications on how to extract and manipulate values into their final form, such as applying a polynomial, using a lookup table, changing the binary format (like two's complement), etc. These specs are in a file of some sort (I'm creating the file from a database, but that's not important to the conversation), and I can apply these specs to multiple data files.
So with functions (these are just examples; there are tons of them):
def Multiply(input: Long, factor:Double):Double = input*factor
def Poly(input:Double, co:Array[Double]):Double = // do some polynomial math
I can manually create a chain like this:
val poly = (x: Double) => EUSteps.Poly(x,Array[Double](1,2))
val mult = (x: Long) => EUSteps.Multiply(x, 1.5)
val chain = mult andThen poly
And if I call chain(1) I get 4
Now I want to be able to parse a string like "MULT(1.5);POLY(1,2)" and get that same chain. The idea is that I can define the chain however I want. Maybe it's "MULT(1.5);MULT(2);POLY(1,2,3)", for example. So I can make the functions generic, like this:
def Multiply[A](value: A, factor:Double)(implicit num: Numeric[A]) = num.toDouble(value)*factor
def Poly[A](value:A, co:Array[Double])(implicit num: Numeric[A]) = { // do some poly math
Parsing the string isn't hard as it's very simple.
How can I build the chain dynamically?
If it helps, the input is always going to be Long for the first step in the chain. The result could be Long or Double, and I'm OK with it if I have to do two versions based on the end result, so one that goes Long to Long, the other that goes Long to Double.
What I've tried
If I define my functions as having the same signature, like this:
def Multiply(value: Double, factor:Double) = value*factor
def Poly(value:Double, co:Array[Double]) = {
I can do it as part of a map operation:
def ParseList(instruction: String) = {
  val instructions = instruction.split(';')
  instructions.map(inst => {
    val instParts = inst.split(Array(',', '(', ')'))
    val instruction = instParts(0).toUpperCase()
    val instArgs = instParts.drop(1).map(arg => arg.toDouble)
    instruction match {
      case "POLY" => (x: Double) => EUSteps.Poly(x, instArgs)
      // keyword as written in the instruction string, e.g. "MULT(1.5)"
      case "MULT" => (x: Double) => Multiply(x, instArgs(0))
    }
  }).reduceLeft((a, b) => a andThen b)
}
However, that breaks as soon as I change one of the arguments or return types to Long:
def Multiply(value: Long, factor:Double) = value*factor
And change my case
    instruction match {
      case "POLY" => (x: Double) => EUSteps.Poly(x, instArgs)
      case "MULT" => (x: Long) => Multiply(x, instArgs(0))
    }
  }).reduceLeft((a, b) => a andThen b)
Now reduceLeft complains because it wants Double => Double instead of Long => Double.
Update 2
The way I solved it was to do what Levi suggested in the comments. I'm sure this is not very Scala-y, but when in doubt I go back to my OO roots. I suspect there is a more elegant way to do it though.
I declared an abstract class called ParamVal:
abstract class ParamVal {
def toDouble(): Double
def toLong(): Long
}
Then Long and Double types to go with it that implement the conversions:
case class DoubleVal(value: Double) extends ParamVal {
override def toDouble(): Double = value
override def toLong(): Long = value.toLong
}
case class LongVal(value: Long) extends ParamVal {
override def toDouble(): Double = value.toDouble
override def toLong(): Long = value
}
This lets me define all function inputs as ParamVal, and since each one expects a certain input type it's easy to just call toDouble or toLong as needed.
NOTE: The app that creates these instructions already makes sure the chain is correct.
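For illustration, here is a minimal sketch of how the ParamVal approach above could be wired to the parser so the chain is built at runtime. The names ChainSketch, Step, multiply, poly and parse are hypothetical, and the polynomial assumes coefficients in ascending order (constant term first), which matches chain(1) giving 4 earlier.

```scala
object ChainSketch {
  sealed trait ParamVal {
    def toDouble: Double
    def toLong: Long
  }
  final case class DoubleVal(value: Double) extends ParamVal {
    def toDouble: Double = value
    def toLong: Long = value.toLong
  }
  final case class LongVal(value: Long) extends ParamVal {
    def toDouble: Double = value.toDouble
    def toLong: Long = value
  }

  // Every step has the same shape, so steps always compose with andThen.
  type Step = ParamVal => ParamVal

  // Hypothetical step: multiply the incoming value by a factor.
  def multiply(factor: Double): Step = v => DoubleVal(v.toDouble * factor)

  // Hypothetical step: co(0) + co(1)*x + co(2)*x^2 + ..., evaluated with Horner's rule.
  def poly(co: Array[Double]): Step = v => {
    val x = v.toDouble
    DoubleVal(co.foldRight(0.0)((c, acc) => c + x * acc))
  }

  // Turns e.g. "MULT(1.5);POLY(1,2)" into one composed Step.
  // Assumes a non-empty, well-formed instruction string.
  def parse(spec: String): Step =
    spec.split(';').map(_.trim).filter(_.nonEmpty).map { inst =>
      val parts = inst.split(Array(',', '(', ')')).map(_.trim).filter(_.nonEmpty)
      val args = parts.drop(1).map(_.toDouble)
      parts(0).toUpperCase match {
        case "MULT" => multiply(args(0))
        case "POLY" => poly(args)
        case other  => throw new IllegalArgumentException(s"Unknown step: $other")
      }
    }.reduceLeft(_ andThen _)
}
```

The caller wraps the initial Long in LongVal and picks toLong or toDouble on the result, for example ChainSketch.parse("MULT(1.5);POLY(1,2)")(ChainSketch.LongVal(1)).toDouble.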
Some ideas:
Analyze the string chain upfront and figure out what will be the type of the final result and then use it for all steps all along. You will need a family of functions for each type.
Try to use Either[Long, Double] in the reduce part.
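A rough sketch of the Either idea, assuming Left carries a Long and Right carries a Double. The names EitherChain, Num, mult and round are made up for this example, and round is a hypothetical step that produces a Long.

```scala
object EitherChain {
  type Num = Either[Long, Double]
  type Step = Num => Num

  // Each step picks the representation it needs from the incoming value.
  def asDouble(n: Num): Double = n.fold(l => l.toDouble, d => d)

  // Hypothetical steps: one yields a Double, the other yields a Long.
  def mult(factor: Double): Step = n => Right(asDouble(n) * factor)
  val round: Step = n => Left(math.round(asDouble(n)))

  // All steps share one type, so the reduce type-checks.
  val chain: Step = List(mult(1.5), mult(2.0), round).reduceLeft(_ andThen _)
}
```

Because every step is Num => Num, reduceLeft no longer has to reconcile Long => Double with Double => Double.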
Related
I am trying to implement the Fibonacci function in Scala with memoization.
One example given here uses a case statement:
Is there a generic way to memoize in Scala?
import scalaz.Memo
lazy val fib: Int => BigInt = Memo.mutableHashMapMemo {
case 0 => 0
case 1 => 1
case n => fib(n-2) + fib(n-1)
}
It seems the variable n is implicitly defined as the first argument, but I get a compilation error if I replace n with _.
Also, what advantage does the lazy keyword have here? The function seems to work equally well with and without it.
However I wanted to generalize this to a more generic function definition with appropriate typing
import scalaz.Memo
def fibonachi(n: Int) : Int = Memo.mutableHashMapMemo[Int, Int] {
var value : Int = 0
if( n <= 1 ) { value = n; }
else { value = fibonachi(n-1) + fibonachi(n-2) }
return value
}
but I get the following compilation error
cmd10.sc:4: type mismatch;
found : Int => Int
required: Int
def fibonachi(n: Int) : Int = Memo.mutableHashMapMemo[Int, Int] {
^
Compilation Failed
So I am trying to understand the generic way of adding a memoization annotation to a Scala def function.
One way to achieve a Fibonacci sequence is via a recursive Stream.
val fib: Stream[BigInt] = 0 #:: fib.scan(1:BigInt)(_+_)
An interesting aspect of streams is that, if something holds on to the head of the stream, the calculation results are auto-memoized. So, in this case, because the identifier fib is a val and not a def, the value of fib(n) is calculated only once and simply retrieved thereafter.
However, indexing a Stream is still a linear operation. If you want to memoize that away you could create a simple memo-wrapper.
def memo[A,R](f: A=>R): A=>R =
new collection.mutable.WeakHashMap[A,R] {
override def apply(a: A) = getOrElseUpdate(a,f(a))
}
val fib: Stream[BigInt] = 0 #:: fib.scan(1:BigInt)(_+_)
val mfib = memo(fib)
mfib(99) //res0: BigInt = 218922995834555169026
The more general question I am trying to ask is how to take a pre-existing def function and add a mutable/immutable memoization annotation/wrapper to it inline.
Unfortunately there is no way to do this in Scala unless you are willing to use a macro annotation, which feels like overkill to me, or to use some very ugly design.
The contradicting requirements are "def" and "inline". The fundamental reason for this is that whatever you do inline with the def can't create any new place to store the memoized values (unless you use a macro that can rewrite code, introducing new vals/vars). You may try to work around this using some global cache, but this IMHO falls under the "ugly design" branch.
The design of ScalaZ Memo is used to create a val of the type Function[K,V] which is often written in Scala as just K => V instead of def. In this way the produced val can contain also the storage for the cached values. On the other hand syntactically the difference between usage of a def method and of a K => V function is minimal so this works pretty well. Since the Scala compiler knows how to convert def method into a function, you can wrap a def with Memo but you can't get a def out of it. If for some reason you need def anyway, you'll need another wrapper def.
import scalaz.Memo
object Fib {
def fib(n: Int): BigInt = n match {
case 0 => BigInt(0)
case 1 => BigInt(1)
case _ => fib(n - 2) + fib(n - 1)
}
// "fib _" converts a method into a function. Sometimes the "_" can be omitted
// and the compiler infers it, but sometimes the compiler needs this explicit hint
lazy val fib_mem_val: Int => BigInt = Memo.mutableHashMapMemo(fib _)
def fib_mem_def(n: Int): BigInt = fib_mem_val(n)
}
println(Fib.fib(5))
println(Fib.fib_mem_val(5))
println(Fib.fib_mem_def(5))
Note how there is no difference in syntax of calling fib, fib_mem_val and fib_mem_def although fib_mem_val is a value. You may also try this example online
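One caveat with wrapping fib _ this way: the recursive calls inside fib still go to the plain method, so only the results of top-level calls end up in the cache. A minimal sketch that routes the recursion through the cache, using only the standard library, could look like this. FibCached and its cache field are hypothetical names, the code assumes n >= 0, and it is not thread-safe.

```scala
import scala.collection.mutable

object FibCached {
  // The storage lives in the enclosing object, next to the def that uses it.
  private val cache = mutable.HashMap.empty[Int, BigInt]

  def fib(n: Int): BigInt =
    if (n <= 1) BigInt(n)
    else cache.get(n) match {
      case Some(known) => known
      case None =>
        // Recursive calls go through fib again, so they hit the cache too.
        val value = fib(n - 2) + fib(n - 1)
        cache.update(n, value)
        value
    }
}
```

This keeps the def syntax at the call site, at the cost of a cache that is visible to the whole object.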
Note: beware that some ScalaZ Memo implementations are not thread-safe.
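If thread safety matters, one option is a small wrapper over a concurrent map from the standard library. This is a sketch rather than a drop-in replacement for Memo. SafeMemo and memoize are hypothetical names, and under contention the wrapped function may still be evaluated more than once for the same key.

```scala
import scala.collection.concurrent.TrieMap

object SafeMemo {
  // Wraps f in a function that caches results in a concurrent map.
  def memoize[K, V](f: K => V): K => V = {
    val cache = TrieMap.empty[K, V]
    k => cache.getOrElseUpdate(k, f(k))
  }
}
```

Usage mirrors the Memo example: val fastSquare = SafeMemo.memoize((n: Int) => n * n).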
As for the lazy part, the benefit is the same as for any lazy val: the actual value, with its underlying storage, will not be created until first use. If the function will be used anyway, I see no benefit in declaring it lazy.
Consider the following generic function:
def compare[I:Ordering,T:Ordering](i:I,t:T):Int
It should compare a value of type I with a value of type T with both of them assumed to have Ordering defined. The comparison should work if there is either a way to implicitly convert I to T, or T to I. Obviously, if one uses types I and T that do not have any of the two conversions, the compiler should complain.
I am tempted to write something like this:
def compare[I:Ordering,T:Ordering](i:I,t:T)(implicit c1:I=>T, c2:T=>I):Int
But this actually asks for both conversions to exist, not at least one.
Any ideas?
EDIT: Given the comments I want to make the question complete. If both implicit conversions exist, I would like to assume a priority among the types. Then use the higher priority implicit conversion for the comparison.
Wrong Answer which I wrote initially:
Of course it will ask, because you are trying to compare two different orderings. T:Ordering means that there should be an Ordering[T] available in the scope. Ordering[T] is different from Ordering[I]. It is like comparing numbers and strings, where both can be ordered differently but ordering them together does not make sense.
PS: Both numbers and strings can be ordered together but that means numbers & strings will represent the same datatype here and there will be only one instance of Ordering for that data type.
Better answer:
Use a wrapper class to define the converters
object Main extends App {
def compare[I: Ordering, T: Ordering](i: I, t: T)(implicit wrapper: Wrapper[I, T]): Int = {
val converter: Either[(I) => T, (T) => I] = wrapper.getConverterBasedOnPriority
val convertedValue = if(converter.isLeft){
converter.left.map(c => c(i))
} else{
converter.right.map(c => c(t))
}
// do what ever you want
1
}
val iToT: (Int => String) = i => i.toString
val tToI: (String => Int) = s => s.toInt
// implicit def iToTWrapper = new Wrapper[Int , String ](iToT, null)
implicit def tToIWrapper = new Wrapper[Int , String ](null, tToI)
compare(1, "a")
}
class Wrapper[I, T](iToT: I => T, tToI : T => I) {
def getConverterBasedOnPriority:Either[I => T, T => I] = {
// return ordering based on priority check.
// returning iToT for example sake. Do the priority check and return accordingly
Left(iToT)
}
}
If you uncomment both implicits, the compiler reports an error. If you comment out both implicits, it also reports an error.
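An alternative that lets the compiler pick the conversion is the low-priority implicits pattern: put the preferred direction in the companion object and the fallback in a trait that the companion extends. The sketch below is written for Scala 2 and is only one way to set this up. CanCompare, viaT, viaI, Seconds and Millis are all hypothetical names, and it assumes the conversions are available as implicit function values.

```scala
object CompareSketch {
  // Evidence that an I and a T can be compared by converting one into the other.
  trait CanCompare[I, T] { def compare(i: I, t: T): Int }

  trait LowPriorityCanCompare {
    // Fallback: convert T to I, then compare using Ordering[I].
    implicit def viaI[I, T](implicit toI: T => I, ord: Ordering[I]): CanCompare[I, T] =
      new CanCompare[I, T] { def compare(i: I, t: T): Int = ord.compare(i, toI(t)) }
  }

  object CanCompare extends LowPriorityCanCompare {
    // Preferred when both conversions exist: convert I to T.
    implicit def viaT[I, T](implicit toT: I => T, ord: Ordering[T]): CanCompare[I, T] =
      new CanCompare[I, T] { def compare(i: I, t: T): Int = ord.compare(toT(i), t) }
  }

  def compare[I, T](i: I, t: T)(implicit ev: CanCompare[I, T]): Int = ev.compare(i, t)

  // Example types: only Seconds => Millis exists, and only Millis has an Ordering.
  final case class Seconds(value: Long)
  object Seconds {
    implicit val toMillis: Seconds => Millis = s => Millis(s.value * 1000)
  }
  final case class Millis(value: Long)
  object Millis {
    implicit val ordering: Ordering[Millis] = Ordering.by[Millis, Long](_.value)
  }
}
```

With these definitions, compare(Seconds(2), Millis(1500)) and compare(Millis(1500), Seconds(2)) should both compile, each using the one conversion that exists, while a pair of types with no conversion in either direction is rejected at compile time.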
I wrote an example that uses scalaz.Free to map Action to Future, and it looks pretty cool. However, I am trying to understand the benefits of it. I hope I can get an answer here. Here is my code snippet.
First, I create an Action, which is the AST.
trait Action[A]
case class GetNumberAction(x: Int) extends Action[Int]
case class GetStringAction(x: String) extends Action[String]
case class ConvertToIntAction(x: String) extends Action[Int]
case class AddAction(x: Int, y: Int) extends Action[Int]
Then I create a class to map Action to ASTMonad using Scalaz Free and Coyoneda.
type Functor[A] = Coyoneda[Action, A]
type ASTMonad[A]= Free[Functor, A]
def toMonad[A](action: Action[A]): ASTMonad[A] = Free.liftFC[Action, A](action)
object ADTMonad {
def getNumber(x: Int): ASTMonad[Int] = toMonad(GetNumberAction(x))
def getString(x: String): ASTMonad[String] = toMonad(GetStringAction(x))
def converToInt(x: String): ASTMonad[Int] = toMonad(ConvertToIntAction(x))
def add(x: Int, y: Int): ASTMonad[Int] = toMonad(AddAction(x, y))
}
At last, I create an Interpreter to interpret Action to Future
object Interpreter extends (Action ~> Future) {
def apply[A](action: Action[A]): Future[A] = {
action match {
case GetNumberAction(x) => Future(x)
case GetStringAction(x) => Future(x)
case ConvertToIntAction(x) => Future(x.toInt)
case AddAction(x, y) => Future(x + y)
}
}
}
When I run it, I can use
val chain = for {
  number <- ADTMonad.getNumber(x)
  str <- ADTMonad.getString(y)
  convertedNumber <- ADTMonad.converToInt(str)
  total <- ADTMonad.add(number, convertedNumber)
} yield total
chain.runWith(Interpreter)
It seems to work, and I think I understand the monad and interpreter parts. However, what are the benefits compared to using Future.flatMap and map directly?
for {
number <- Future(x)
str <- Future(y)
convertedNumber <- Future(str.toInt)
total <- Future(number + convertedNumber)
} yield total
The code using Future's flatMap and map looks simpler to me. So, back to my question: do we need a Free monad to interpret the business logic to Future, given that Future already provides flatMap and map? If so, can someone give me a more concrete example so I can see the benefits?
Thanks in advance
A good, well-motivated example of using a free applicative is a command-line parser; let's call the type CLI[A].
A value of type CLI[A] means you will get an A if you provide command-line arguments (Array[String]) and they can be parsed successfully. Now this functionality is isomorphic to Array[String] -> Either[String,A] when using Either for error handling.
Because you made CLI applicative, you can map and apply (combine) values. You can for example create a Int argument count, another Int argument count2, and combine them to a final sum: CLI[Int] that holds their sum.
Suppose you apply the computation directly, this yields something that is "only" equivalent to Array[String] -> Either[String,Int]. But if you want to create a help text you have to know both initial arguments, and this information is lost.
Free to the rescue. With Free you can retain the computation graph, which you can use to extract all initial CLI values that are directly parsed from the arguments. You can then later run the computation which yields the final value of sum by providing the parse results for all initial arguments.
Of course you could implement a special CLI that keeps track of all the initial values over computations, but Free lets you avoid this extra work.
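To make the idea concrete without pulling in scalaz, here is a hand-rolled sketch in the spirit of a free applicative. It is not scalaz's FreeAp, and CliSketch, IntArg, Map2 and argNames are names invented for this example. The point it illustrates is that the combined parser is kept as data, so it can be inspected for help text as well as run. It assumes Scala 2.12 or later, where Either is right-biased.

```scala
object CliSketch {
  sealed trait CLI[A] {
    // Interpretation 1: which arguments does this parser need? (for help text)
    def argNames: List[String]
    // Interpretation 2: actually parse the supplied arguments.
    def run(args: Map[String, String]): Either[String, A]
  }

  // A single named integer argument.
  final case class IntArg(name: String) extends CLI[Int] {
    def argNames: List[String] = List(name)
    def run(args: Map[String, String]): Either[String, Int] =
      args.get(name) match {
        case None => Left(s"missing argument: $name")
        case Some(raw) =>
          try Right(raw.toInt)
          catch { case _: NumberFormatException => Left(s"not an integer: $raw") }
      }
  }

  // Combines two parsers while keeping both sub-parsers around as data.
  final case class Map2[A, B, C](fa: CLI[A], fb: CLI[B], f: (A, B) => C) extends CLI[C] {
    def argNames: List[String] = fa.argNames ++ fb.argNames
    def run(args: Map[String, String]): Either[String, C] =
      for { a <- fa.run(args); b <- fb.run(args) } yield f(a, b)
  }

  val sum: CLI[Int] =
    Map2[Int, Int, Int](IntArg("count"), IntArg("count2"), (a, b) => a + b)
}
```

Had sum been built directly as a function from arguments to Either[String, Int], the names count and count2 would no longer be recoverable from it.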
Let me introduce this question by way of an example. This was taken from Lecture 2.3 of Martin Odersky's Functional Programming course.
I have a function to find fixed points iteratively like so
object fixed_points {
  import math.abs // needed for the abs call below

  println("Welcome to Fixed Points")
  val tolerance = 0.0001
  def isCloseEnough(x: Double, y: Double) =
    abs((x - y) / x) < tolerance
  def fixedPoint(f: Double => Double)(firstGuess: Double) = {
    def iterate(guess: Double): Double = {
      println(guess)
      val next = f(guess)
      if (isCloseEnough(guess, next)) next
      else iterate(next)
    }
    iterate(firstGuess)
  }
I can adapt this function to finding square roots like so
def sqrt(x: Double) =
fixedPoint(y => x/y)(1.0)
However, this does not converge for certain arguments (like 4 for example). So I apply an average damping to it, essentially converting it to Newton-Raphson like so
def sqrt(x: Double) =
fixedPoint(y => (x/y+y)/2)(1.0)
which converges.
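To see why the undamped version fails for 4, it helps to list the first few guesses: y => 4/y just bounces between 1 and 4, while the damped version closes in on 2. A small sketch (DampingDemo is a made-up name, and Iterator.iterate stands in for the recursive iterate above):

```scala
object DampingDemo {
  val x = 4.0

  // Undamped: each guess is x divided by the previous one, so it oscillates.
  val undamped: List[Double] =
    Iterator.iterate(1.0)(y => x / y).take(4).toList

  // Damped: average the guess with x / guess before moving on.
  val damped: List[Double] =
    Iterator.iterate(1.0)(y => (x / y + y) / 2).take(5).toList
}
```

Here undamped is List(1.0, 4.0, 1.0, 4.0), and the last element of damped is within 0.001 of 2.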
Now average damping is general enough to warrant its own function, so I refactor my code like so
def averageDamp(f: Double => Double)(x: Double) = (x+f(x))/2
and
def sqrtDamp(x: Double) =
fixedPoint(averageDamp(y=>x/y))(1.0) (*)
Whoa! What just happened?? I'm using averageDamp with only one parameter (when it was defined with two) and the compiler does not complain!
Now, I understand that I can use partial application like so
def a = averageDamp(x=>2*x)_
a(3) // returns 4.5
No problems there. But when I attempt to use averageDamp with less than the requisite number of parameters (as was done in sqrtDamp) like so
def a = averageDamp(x=>2*x) (**)
I get an error missing arguments for method averageDamp.
Questions:
How is what I have done in (**) different from (*) that the compiler complains in the former but not the latter?
So it looks like using less than the requisite parameters is allowed under certain circumstances. What are these circumstances and what is the name given to this mechanism? (I realize this would come under the topic of 'currying', but I'm after the specific name of this subset of currying, as it were)
This answer expands on the comment posted by @som-snytt.
The difference between (*) and (**) is that in the former, fixedPoint provides an expected function type for its argument, whereas in the latter, a does not. Essentially, whenever the context supplies an explicit function type, the compiler is happy to overlook the omission of the trailing underscore. This is a deliberate design decision; see Martin Odersky's explanation.
To illustrate this point, here is a small example.
object A {
def add(a: Int)(b:Int): Int = a + b
val x: Int => Int = add(5) // compiles fine
val y = add(5) // produces the following compiler error
}
/* missing arguments for method add in object A;
follow this method with `_' if you want to treat it as a partially applied function
val y = add(5)
^
*/
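Applied to the averageDamp method from the question, the same rule could be sketched like this (EtaDemo is a hypothetical wrapper object, and the behaviour described is that of Scala 2):

```scala
object EtaDemo {
  def averageDamp(f: Double => Double)(x: Double): Double = (x + f(x)) / 2

  // The declared type Double => Double is the expected type,
  // so the missing second argument list is accepted.
  val a: Double => Double = averageDamp(x => 2 * x)

  // No expected type here, so Scala 2 needs the explicit trailing underscore.
  val b = averageDamp(x => 2 * x) _
}
```

Both a and b are ordinary Double => Double functions, so EtaDemo.a(3) and EtaDemo.b(3) give the same result.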
In Scala one can write (curried?) functions like this
def curriedFunc(arg1: Int) (arg2: String) = { ... }
What is the difference between the above curriedFunc function definition with two parameters lists and functions with multiple parameters in a single parameter list:
def curriedFunc(arg1: Int, arg2: String) = { ... }
From a mathematical point of view, these are (curriedFunc(x))(y) and curriedFunc(x,y), but I can write def sum(x) (y) = x + y, and def sum2(x, y) = x + y will compute the same thing.
I know of only one difference: partially applied functions. Otherwise both ways look equivalent to me.
Are there any other differences?
Strictly speaking, this is not a curried function, but a method with multiple argument lists, although admittedly it looks like a function.
As you said, multiple argument lists allow the method to be used in place of a partially applied function. (Sorry for the generally silly examples I use.)
object NonCurr {
def tabulate[A](n: Int, fun: Int => A) = IndexedSeq.tabulate(n)(fun)
}
NonCurr.tabulate[Double](10, _) // not possible
val x = IndexedSeq.tabulate[Double](10) _ // possible. x is Function1 now
x(math.exp(_)) // complete the application
Another benefit is that you can use curly braces instead of parentheses, which looks nice if the second argument list consists of a single function or thunk. E.g.
NonCurr.tabulate(10, { i => val j = util.Random.nextInt(i + 1); i - i % 2 })
versus
IndexedSeq.tabulate(10) { i =>
val j = util.Random.nextInt(i + 1)
i - i % 2
}
Or for the thunk:
IndexedSeq.fill(10) {
println("debug: operating the random number generator")
util.Random.nextInt(99)
}
Another advantage is, you can refer to arguments of a previous argument list for defining default argument values (although you could also say it's a disadvantage that you cannot do that in single list :)
// again I'm not very creative with the example, so forgive me
def doSomething(f: java.io.File)(modDate: Long = f.lastModified) = ???
Finally, there are three other applications in an answer to the related post Why does Scala provide both multiple parameters lists and multiple parameters per list?. I will just copy them here, but the credit goes to Knut Arne Vedaa, Kevin Wright, and extempore.
First: you can have multiple var args:
def foo(as: Int*)(bs: Int*)(cs: Int*) = as.sum * bs.sum * cs.sum
...which would not be possible in a single argument list.
Second, it aids the type inference:
def foo[T](a: T, b: T)(op: (T,T) => T) = op(a, b)
foo(1, 2){_ + _} // compiler can infer the type of the op function
def foo2[T](a: T, b: T, op: (T,T) => T) = op(a, b)
foo2(1, 2, _ + _) // compiler too stupid, unfortunately
And last, this is the only way you can have implicit and non implicit args, as implicit is a modifier for a whole argument list:
def gaga [A](x: A)(implicit mf: Manifest[A]) = ??? // ok
def gaga2[A](x: A, implicit mf: Manifest[A]) = ??? // not possible
There's another difference that was not covered by 0__'s excellent answer: default parameters. A parameter from one parameter list can be used when computing the default in another parameter list, but not in the same one.
For example:
def f(x: Int, y: Int = x * 2) = x + y // not valid
def g(x: Int)(y: Int = x * 2) = x + y // valid
The whole point is that the curried and uncurried forms are equivalent! As others have pointed out, one or the other form can be syntactically more convenient to work with depending on the situation, and that is the only reason to prefer one over the other.
It's important to understand that even if Scala didn't have special syntax for declaring curried functions, you could still construct them; this is just a mathematical inevitability once you have the ability to create functions which return functions.
To demonstrate this, imagine that the def foo(a)(b)(c) = {...} syntax didn't exist. Then you could still achieve the exact same thing like so: def foo(a) = (b) => (c) => {...}.
Like many features in Scala, this is just a syntactic convenience for doing something that would be possible anyway, but with slightly more verbosity.
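A concrete version of that comparison, with types filled in (the body a + b * c is just a placeholder chosen for this sketch):

```scala
object CurryBySyntax {
  // Multiple parameter lists: the built-in syntax.
  def foo(a: Int)(b: Int)(c: Int): Int = a + b * c

  // The same thing written by hand as a method returning nested functions.
  def foo2(a: Int): Int => Int => Int = b => c => a + b * c
}
```

Both are called the same way: CurryBySyntax.foo(1)(2)(3) and CurryBySyntax.foo2(1)(2)(3).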
The two forms are isomorphic. The main difference is that curried functions are easier to apply partially, while non-curried functions have slightly nicer syntax, at least in Scala.
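The standard library makes the back-and-forth explicit for function values: Function2 has a curried method, and Function.uncurried goes the other way. A short sketch (IsoDemo and its member names are invented here):

```scala
object IsoDemo {
  val add: (Int, Int) => Int = (a, b) => a + b

  // From the two-argument form to the curried form ...
  val addCurried: Int => Int => Int = add.curried

  // ... and back again.
  val addUncurried: (Int, Int) => Int = Function.uncurried(addCurried)

  // Partial application is a plain function call on the curried form.
  val increment: Int => Int = addCurried(1)
}
```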