Error when writing Spark Structured Streaming example in Clojure

I am trying to rewrite the Spark Structured Streaming example in Clojure.
The original example is written in Scala here:
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
(ns flambo-example.streaming-example
  (:import [org.apache.spark.sql Encoders SparkSession Dataset Row]
           [org.apache.spark.sql.functions]))

(def spark
  (-> (SparkSession/builder)
      (.appName "sample")
      (.master "local[*]")
      .getOrCreate))

(def lines
  (-> spark
      .readStream
      (.format "socket")
      (.option "host" "localhost")
      (.option "port" 9999)
      .load))

(def words
  (-> lines
      (.as (Encoders/STRING))
      (.flatMap #(clojure.string/split % #" "))))
The above code causes the following exception.
;; Caused by java.lang.IllegalArgumentException
;; No matching method found: flatMap for class
;; org.apache.spark.sql.Dataset
How can I avoid this error?

You have to follow the signatures. The Java Dataset API provides two variants of Dataset.flatMap: one which takes a scala.Function1
def flatMap[U](func: (T) ⇒ TraversableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]
and a second one which takes Spark's own o.a.s.api.java.function.FlatMapFunction
def flatMap[U](f: FlatMapFunction[T, U], encoder: Encoder[U]): Dataset[U]
The former is rather useless for you, but you should be able to use the latter. For the RDD API, flambo uses macros to create Spark-friendly adapters which can be accessed with flambo.api/fn - I am not sure if these will work directly with Datasets, but you should be able to adjust them if you need to.
Since you cannot depend on implicit Encoders, you also have to provide an explicit encoder which matches the return type.
Overall you'll need something like:
(def words
  (-> lines
      (.as (Encoders/STRING))
      (.flatMap f e)))
where f implements FlatMapFunction and e is an Encoder. One example implementation:
;; requires (:import [org.apache.spark.api.java.function FlatMapFunction]) in the ns form
(def words
  (-> lines
      (.as (Encoders/STRING))
      (.flatMap
        (proxy [FlatMapFunction] []
          ;; FlatMapFunction.call must return a java.util.Iterator
          (call [s] (.iterator (clojure.string/split s #" "))))
        (Encoders/STRING))))
but I guess it is possible to find a better one.
In practice I'd avoid typed Datasets altogether and focus on DataFrame (Dataset[Row]).
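For reference, the untyped DataFrame route looks roughly like this in Scala (a minimal sketch based on the programming guide linked in the question, reusing the streaming lines DataFrame defined above):

import org.apache.spark.sql.functions.{explode, split}

// Split each input line on spaces and flatten it into one word per row.
val words = lines.select(
  explode(split(lines("value"), " ")).as("word"))

// A running word count over the stream.
val wordCounts = words.groupBy("word").count()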

Related

How to simulate inheritance in Clojure?

I recently started looking into Clojure, so admittedly I do not have much
experience nor have I read every book about it. Still, I am having a hard
time figuring out how to expand the behavior of systems coded in Clojure.
To be more specific, for educational purposes, some time ago I implemented
parsers in Scala for a family of small languages -- the NAND-CIRC language
family, similar to how it is defined in “Introduction to Theoretical Computer
Science”, by Boaz Barak. There is a
pure version of the language, and then there are successive syntax sugars that
can be added (inline functions, user-defined functions, if/else, and for).
In Scala -- using the parser combinators library -- I could define the basic
grammar as a class, as shown below (the semantic actions are omitted for
brevity).
import scala.util.parsing.combinator._

class NandCircParsers extends RegexParsers {
  def program: Parser[Any] = block
  def block: Parser[Any] = rep(command)
  def command: Parser[Any] = opt(statement) ~ rep(eol)
  def statement: Parser[Any] = assign
  def assign: Parser[Any] = reference ~ "=" ~ funcall
  def reference: Parser[Any] = outputPort | variable
  def funcall: Parser[Any] = identifier ~ "(" ~ actualArgs ~ ")"
  def actualArgs: Parser[Any] = repsep(expression, ",")
  def expression: Parser[Any] = inputPort | reference
  def variable: Parser[Any] = identifier
  def inputPort: Parser[Any] = "X" ~ "[" ~ index ~ "]"
  def outputPort: Parser[Any] = "Y" ~ "[" ~ index ~ "]"
  def identifier: Parser[Any] = """[_a-zA-Z$][_a-zA-Z0-9'$]*""".r
  def index: Parser[Any] = number
  def number: Parser[Any] = """[-+]?[0-9]+""".r
}
Then I could create an instance of the class NandCircParsers and use it to
process the vanilla version of the language.
But another dialect of the NAND-CIRC language allows the use of inline
functions. For that, the production rule for expression must change. In Scala
that is a matter of creating a subclass and overriding the method in question.
class NandCircInlineParsers extends NandCircParsers {
  override def expression: Parser[Any] = inputPort | reference | funcall
}
And then I can instantiate the class NandCircInlineParsers and use the new
dialect without having to rewrite all the grammar, i.e., the parser hierarchy.
For the dialect with if/else syntax sugar (only) the situation is similar.
class NandCircIfParsers extends NandCircParsers {
  override def statement: Parser[Any] = assign | ifSttmt
  def ifSttmt: Parser[Any] =
    "if" ~ expression ~ ":" ~ block ~ opt("else" ~ ":" ~ block) ~ "end"
}
I just have to override the production rules (methods) of the grammar that changed,
add the new ones, and I have a parser for the new dialect.
But this is in Scala. Now I am trying to achieve an equivalent result with Clojure.
I have implemented a parser combinator library along the lines of what
was done for the parsec.el Emacs Lisp parser combinator
library, and for my needs it is
working.
The parser for the vanilla version of the NAND-CIRC language becomes
something like this.
(def EQU (token (literal "=")))
(def COMMA (token (literal ",")))
(def LPAR (token (literal "(")))
(def RPAR (token (literal ")")))
(def LBKT (token (literal "[")))
(def RBKT (token (literal "]")))
(def IN (token (literal "X")))
(def OUT (token (literal "Y")))
(def ident (token (regex #"[_a-zA-Z$][_a-zA-Z0-9'$]*")))
(def number (token (regex #"[-+]?[0-9]+")))
(def variable ident)
(def index number)
(def input (do* IN LBKT index RBKT))
(def output (do* OUT LBKT index RBKT))
(def expr (choice input output variable))
(def actual-args (sep-by expr COMMA))
(def funcall (do* ident LPAR actual-args RPAR))
(def formal-args (sep-by variable COMMA))
(def reference (either output variable))
(def assign (do* reference EQU funcall))
(def command (do* (optional assign) (many eol)))
(def statement command)
(def program (many command))
Readability issues aside, my question is: how can I achieve in Clojure the same level
of code reuse that I have in Scala? How can I make this design modular
enough that I only have to change or add the necessary rules to get a new dialect
of the language? Right now, the only thing I can think of -- that does not involve
implementing an object system with inheritance -- is to duplicate the entire
grammar.
Does Clojure have any resources that can facilitate this?
Thanks.
This sounds very much like something that could be implemented using maps. The basic grammar NandCircParsers could be
(def NandCircParsers
  {:program   program
   :block     block
   :command   command
   :statement statement
   ... ...})
From this grammar, we can create an extended grammar NandCircIfParsers that uses merge to inherit from NandCircParsers. Something like this, maybe:
(def NandCircIfParsers
  (let [ifSttmt ...]
    (merge NandCircParsers
           {:statement (choice (:assign NandCircParsers) ifSttmt)
            :ifSttmt ifSttmt})))

Spark Dataset equivalent for scala's "collect" taking a partial function

Regular scala collections have a nifty collect method which lets me do a filter-map operation in one pass using a partial function. Is there an equivalent operation on spark Datasets?
I'd like it for two reasons:
syntactic simplicity
it reduces filter-map style operations to a single pass (although in spark I am guessing there are optimizations which spot these things for you)
Here is an example to show what I mean. Suppose I have a sequence of options and I want to extract and double just the defined integers (those in a Some):
val input = Seq(Some(3), None, Some(-1), None, Some(4), Some(5))
Method 1 - collect
input.collect {
  case Some(value) => value * 2
}
// List(6, -2, 8, 10)
The collect makes this quite neat syntactically and does one pass.
Method 2 - filter-map
input.filter(_.isDefined).map(_.get * 2)
I can carry this kind of pattern over to spark because datasets and data frames have analogous methods.
But I don't like this so much because isDefined and get seem like code smells to me. There's an implicit assumption that map is receiving only Somes. The compiler can't verify this. In a bigger example, that assumption would be harder for a developer to spot and the developer might swap the filter and map around for example without getting a syntax error.
Method 3 - fold* operations
input.foldRight[List[Int]](Nil) {
  case (nextOpt, acc) => nextOpt match {
    case Some(next) => next * 2 :: acc
    case None => acc
  }
}
I haven't used spark enough to know if fold has an equivalent so this might be a bit tangential.
Anyway, the pattern match, the fold boilerplate and the rebuilding of the list all get jumbled together and it's hard to read.
So overall I find the collect syntax the nicest and I'm hoping spark has something like this.
The answers here are incorrect, at least with the current version of Spark.
RDDs do in fact have a collect method that takes a partial function and applies a filter & map to the data. This is completely different from the parameterless .collect() method. See the Spark source code RDD.scala # line 955:
/**
 * Return an RDD that contains all matching values by applying `f`.
 */
def collect[U: ClassTag](f: PartialFunction[T, U]): RDD[U] = withScope {
  val cleanF = sc.clean(f)
  filter(cleanF.isDefinedAt).map(cleanF)
}
This does not materialize the data from the RDD, as opposed to the parameterless .collect() method in RDD.scala # line 923:
/**
 * Return an array that contains all of the elements in this RDD.
 */
def collect(): Array[T] = withScope {
  val results = sc.runJob(this, (iter: Iterator[T]) => iter.toArray)
  Array.concat(results: _*)
}
In the documentation, notice how the
def collect[U](f: PartialFunction[T, U]): RDD[U]
method does not have a warning associated with it about the data being loaded into the driver's memory:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD#collect[U](f:PartialFunction[T,U])(implicitevidence$29:scala.reflect.ClassTag[U]):org.apache.spark.rdd.RDD[U]
It's very confusing for Spark to have these overloaded methods doing completely different things.
Edit: my mistake! I misread the question; we're talking about Datasets, not RDDs. Still, the accepted answer quotes the Spark documentation as saying that
"this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory."
Which is incorrect! The data is not loaded into the driver's memory when calling the partial function version of .collect() - only when calling the parameterless version. Calling .collect(partial_function) should have about the same performance as calling .filter() and .map() sequentially, as shown in the source code above.
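To make the difference concrete, here is a small sketch (assuming a SparkContext named sc is in scope, as in the spark-shell):

// The partial-function overload is a transformation: it stays distributed
// and returns a new RDD.
val rdd = sc.parallelize(Seq(Some(3), None, Some(-1), None, Some(4), Some(5)))
val doubled = rdd.collect { case Some(v) => v * 2 }   // RDD[Int]

// The parameterless overload is an action: it materializes the data on the
// driver, here as Array(6, -2, 8, 10).
val result = doubled.collect()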
Just for the sake of completeness:
The RDD API does have such a method, so it's always an option to convert a given Dataset / DataFrame to RDD, perform the collect operation and convert back, e.g.:
val dataset = Seq(Some(1), None, Some(2)).toDS()
val dsResult = dataset.rdd.collect { case Some(i) => i * 2 }.toDS()
However, this will probably perform worse than using a map and filter on the Dataset (for the reason explained in #stefanobaghino's answer).
As for DataFrames, this particular example (using Option) is somewhat misleading, as the conversion into a DataFrame actually does the "flattening" of Options into their values (or null for None), so the equivalent expression would be:
val dataframe = Seq(Some(1), None, Some(2)).toDF("opt")
dataframe.withColumn("opt", $"opt".multiply(2)).filter(not(isnull($"opt")))
Which, I think, suffers less from your concerns of having the map operation "assume" anything about its input.
The collect method defined over RDDs and Datasets is used to materialize the data in the driver program.
Despite not having something akin to the Collections API collect method, your intuition is right: since both operations are evaluated lazily, the engine has the opportunity to optimize the operations and chain them so that they are performed with maximum locality.
For the use case you mentioned in particular I would suggest you take flatMap in consideration, which works on both RDDs and Datasets:
// Assumes the usual spark-shell environment
// sc: SparkContext, spark: SparkSession
val collection = Seq(Some(1), None, Some(2), None, Some(3))
val rdd = sc.parallelize(collection)
val dataset = spark.createDataset(rdd)
// Both operations will yield `Array(2, 4, 6)`
rdd.flatMap(_.map(_ * 2)).collect
dataset.flatMap(_.map(_ * 2)).collect
// You can also express the operation in terms of a for-comprehension
(for (option <- rdd; n <- option) yield n * 2).collect
(for (option <- dataset; n <- option) yield n * 2).collect
// The same approach is valid for traditional collections as well
collection.flatMap(_.map(_ * 2))
for (option <- collection; n <- option) yield n * 2
EDIT
As correctly pointed out in another question, RDDs actually have the collect method that transforms an RDD by applying a partial function just like it happens in normal collections. As the Spark documentation points out, however, "this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory."
I just wanted to extend stefanobaghino's answer by including an example of a for comprehension with a case class, as many use cases for this will probably involve case classes.
Also, Options are monads, which makes the accepted answer very simple in this case, since the for comprehension neatly drops the None values; but that approach wouldn't extend to non-monads like case classes:
case class A(b: Boolean, i: Int, d: Double = 0.0) // default for d so the sample values below compile
val collection = Seq(A(true, 3), A(false, 10), A(true, -1))
val rdd = ...
val dataset = ...
// Select out and double all the 'i' values where 'b' is true:
for {
  A(b, i, _) <- dataset
  if b
} yield i * 2
You can always create your own extension method:
implicit class DatasetOps[T](ds: Dataset[T]) {
  def collectt[U](pf: PartialFunction[T, U])(implicit enc: Encoder[U]): Dataset[U] = {
    ds.flatMap(pf.lift(_))
  }
}
such that:
// val ds = Dataset(1, 2, 3)
ds.collectt { case x if x % 2 == 1 => x * 3 }
// Dataset(3, 9)
Note that I've unfortunately not been able to name it collect (thus the awful suffix t) as the signature would otherwise (I think) clash with the existing Dataset#collect method that transforms a Dataset into an Array.

Spark closure argument binding

I am working with Apache Spark in Scala.
I have a problem when trying to manipulate one RDD with data from a second RDD. I am trying to pass the 2nd RDD as an argument to a function being 'mapped' against the first RDD, but seemingly the closure created on that function binds an uninitialized version of that value.
Following is a simpler piece of code that shows the type of problem I'm seeing. (My real example where I first had trouble is larger and less understandable).
I don't really understand the argument binding rules for Spark closures.
What I'm really looking for is a basic approach or pattern for how to manipulate one RDD using the content of another (which was previously constructed elsewhere).
In the following code, calling Test1.process(sc) will fail with a null pointer access in findSquare (as the 2nd arg bound in the closure is not initialized)
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object Test1 {
  def process(sc: SparkContext) {
    val squaresMap = (1 to 10).map(n => (n, n * n))
    val squaresRDD = sc.parallelize(squaresMap)
    val primes = sc.parallelize(List(2, 3, 5, 7))
    for (p <- primes) {
      println("%d: %d".format(p, findSquare(p, squaresRDD)))
    }
  }

  def findSquare(n: Int, squaresRDD: RDD[(Int, Int)]): Int = {
    squaresRDD.filter(kv => kv._1 == n).first._1
  }
}
The problem you experience has nothing to do with closures or RDDs, which, contrary to popular belief, are serializable.
It simply breaks a fundamental Spark rule which states that you cannot trigger an action or a transformation from another action or transformation*; different variants of this question have been asked on SO multiple times.
To understand why that's the case you have to think about the architecture:
SparkContext is managed on the driver
everything that happens inside transformations is executed on the workers. Each worker has access only to its own part of the data and doesn't communicate with other workers**.
If you want to use content of multiple RDDs you have to use one of the transformations which combine RDDs, like join, cartesian, zip or union.
Here you most likely (I am not sure why you pass a tuple and use only the first element of that tuple) want to use either a broadcast variable:
val squaresMapBD = sc.broadcast(squaresMap)

def findSquare(n: Int): Seq[(Int, Int)] = {
  squaresMapBD.value
    .filter { case (k, v) => k == n }
    .map { case (k, v) => (n, k) }
    .take(1)
}
primes.flatMap(findSquare)
or Cartesian:
primes
  .cartesian(squaresRDD)
  .filter { case (n, (k, _)) => n == k }
  .map { case (n, (k, _)) => (n, k) }
Converting primes to dummy pairs (Int, null) and using join would be more efficient:
primes.map((_, null)).join(squaresRDD).map(...)
but based on your comments I assume you're interested in a scenario where there is no natural join condition.
Depending on the context, you can also consider using a database or files to store common data.
On a side note, RDDs are not iterable, so you cannot simply use a for loop. To be able to do something like this you have to collect the data or convert it with toLocalIterator first. You can also use the foreach method (see the short sketch after the footnotes below).
* To be precise you cannot access SparkContext.
** Torrent broadcast and tree aggregates involve communication between executors so it is technically possible.
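For example, a minimal sketch of driver-side iteration, reusing the primes RDD from the question:

// collect() is an action: it pulls all elements to the driver first.
primes.collect().foreach(println)

// toLocalIterator streams the RDD to the driver one partition at a time.
primes.toLocalIterator.foreach(println)

// foreach runs on the executors, so its println output ends up in the
// executor logs rather than on the driver console.
primes.foreach(p => println(p))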
You can't use an RDD inside another RDD transformation.
Also, I've never seen an RDD enumerated with a for statement; usually I use the foreach method that is part of the RDD API.
In order to combine data from two RDDs, you can leverage join, union or broadcast (in case your RDD is small).

Convert datatype to string (SML)

I have a function that returns a (char * int) list list, like [[(#"D", 3)], [(#"F", 7)]], and now I'm wondering if it's possible to convert this to a string, so that I can use I/O and write it to another file?
First of all, I assume you meant a value like [[(#"D", 3)], [(#"F", 7)]] (note the parens), since SML requires parentheses around tuple construction. OCaml uses a slightly different syntax and allows just commas, like a, b, to construct tuples. I mention this because what follows is totally specific to Standard ML and doesn't apply to OCaml: I believe that in OCaml your best bet is an entirely different approach, which I don't know much about (macros, i.e. camlp4/5). So I assume that was just a typo and that you're interested in Standard ML.
Now, unfortunately there is no general toString function in Standard ML. Something like that would have to have some kind of special support in the language and implementation, since it's not possible to write a function with the type 'a -> string. You basically have to write your own toString : t -> string for each type t.
As you can imagine, this gets tedious fast. I've spent a little time researching the options (for this and other boilerplate functions like compare : 't * 't -> order) and there is one very interesting technique outlined in the paper "Generics for the working ML'er" (http://dl.acm.org/citation.cfm?id=1292547), but it's pretty advanced and I could never actually get the code to compile (that said, the paper is very interesting). The full generics library described in that paper is in the MLton lib repo (https://github.com/MLton/mltonlib/tree/master/com/ssh/generic/unstable). Maybe you'll have better luck?
Here's a slightly lighter-weight approach that is less powerful but easier to understand, IMHO. I wrote this after reading that paper and struggling to get it to work. The idea is to write building blocks for toString functions (called show in this case) and compose them with other functions for your own types.
structure Show =
struct
  (* Show.t is the type of toString functions *)
  type 'a t = 'a -> string

  val int: int t = Int.toString
  val char: char t = Char.toString

  val list: 'a t -> 'a list t =
    fn show => fn xs => "[" ^ concat (ExtList.interleave (map show xs) ",") ^ "]"

  val pair: 'a t * 'b t -> ('a * 'b) t =
    fn (showa, showb) => fn (a, b) => "(" ^ showa a ^ "," ^ showb b ^ ")"

  (* ... *)
end
Since your type doesn't actually have any user defined datatypes, it's very easy to write the toString function using this structure:
local
  open Show
in
  val show : (char * int) list list -> string =
    list (list (pair (char, int)))
end
- show [[(#"D", 3)], [(#"F", 7)]] ;
val it = "[[(D,3)],[(F,7)]]" : string
What I like about this is that the composed functions read like the type turned inside out. It's quite an elegant style, which I cannot take credit for, as I took it from the generics paper linked above.
The rest of the code for Show (and a related module Eq for equality comparison) is here: https://github.com/spacemanaki/lib.sml
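For comparison, the same composition pattern translates almost verbatim to Scala; here is a minimal sketch of the idea (the names are illustrative, not taken from the library above):

object Show {
  // A "show" function is just A => String.
  type Show[A] = A => String

  val int: Show[Int] = _.toString
  val char: Show[Char] = _.toString

  def list[A](show: Show[A]): Show[List[A]] =
    xs => xs.map(show).mkString("[", ",", "]")

  def pair[A, B](sa: Show[A], sb: Show[B]): Show[(A, B)] = {
    case (a, b) => "(" + sa(a) + "," + sb(b) + ")"
  }
}

import Show._
// Reads like the type turned inside out:
val show: Show[List[List[(Char, Int)]]] = list(list(pair(char, int)))
show(List(List(('D', 3)), List(('F', 7))))  // "[[(D,3)],[(F,7)]]"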

Cannot create apply function with static language?

I have read that with a statically typed language like Scala or Haskell there is no way to create or provide a Lisp apply function:
(apply #'+ (list 1 2 3)) => 6
or maybe
(apply #'list (list :foo 1 2 "bar")) => (:FOO 1 2 "bar")
(apply #'nth (list 1 '(1 2 3))) => 2
Is this true?
It is perfectly possible in a statically typed language. The whole java.lang.reflect thingy is about doing that. Of course, using reflection gives you as much type safety as you have with Lisp. On the other hand, while I do not know whether there are statically typed languages supporting such a feature, it seems to me it could be done.
Let me show how I figure Scala could be extended to support it. First, let's see a simpler example:
def apply[T, R](f: (T*) => R)(args: T*) = f(args: _*)
This is real Scala code, and it works, but it won't work for any function which receives arbitrary types. For one thing, the notation T* will return a Seq[T], which is a homogeneously typed sequence. However, there are heterogeneously typed sequences, such as the HList.
So, first, let's try to use HList here:
def apply[T <: HList, R](f: (T) => R)(args: T) = f(args)
That's still working Scala, but we put a big restriction on f by saying it must receive an HList, instead of an arbitrary number of parameters. Let's say we use # to make the conversion from heterogeneous parameters to HList, the same way * converts from homogeneous parameters to Seq:
def apply[T, R](f: (T#) => R)(args: T#) = f(args: _#)
We aren't talking about real-life Scala anymore, but a hypothetical improvement to it. This looks reasonable to me, except that T is supposed to be a single type according to the type parameter notation. We could, perhaps, just extend it the same way:
def apply[T#, R](f: (T#) => R)(args: T#) = f(args: _#)
To me, it looks like that could work, though that may be naivety on my part.
Let's consider an alternate solution, one depending on the unification of parameter lists and tuples. Let's say Scala had finally unified parameter lists and tuples, and that all tuples were subclasses of an abstract class Tuple. Then we could write this:
def apply[T <: Tuple, R](f: (T) => R)(args: T) = f(args)
There. Making an abstract class Tuple would be trivial, and the tuple/parameter list unification is not a far-fetched idea.
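As a side note, for any fixed arity this already works in today's Scala, because every FunctionN exposes a tupled view of itself; a minimal sketch:

// apply restricted to functions whose whole argument list is one tuple value
def applyTupled[T, R](f: T => R)(args: T): R = f(args)

val add3 = (a: Int, b: Int, c: Int) => a + b + c
applyTupled(add3.tupled)((1, 2, 3))   // 6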
A full APPLY is difficult in a static language.
In Lisp APPLY applies a function to a list of arguments. Both the function and the list of arguments are arguments to APPLY.
APPLY can use any function. That means that this could be any result type and any argument types.
APPLY takes arbitrarily many arguments (in Common Lisp the number is restricted by an implementation-specific constant value) of arbitrary and possibly different types.
APPLY returns any type of value that is returned by the function it got as an argument.
How would one type check that without subverting a static type system?
Examples:
(apply #'+ '(1 1.4)) ; the result is a float.
(apply #'open (list "/tmp/foo" :direction :input))
; the result is an I/O stream
(apply #'open (list name :direction direction))
; the result is also an I/O stream
(apply some-function some-arguments)
; the result is whatever the function bound to some-function returns
(apply (read) (read))
; neither the actual function nor the arguments are known before runtime.
; READ can return anything
Interaction example:
CL-USER 49 > (apply (READ) (READ)) ; call APPLY
open ; enter the symbol OPEN
("/tmp/foo" :direction :input :if-does-not-exist :create) ; enter a list
#<STREAM::LATIN-1-FILE-STREAM /tmp/foo> ; the result
Now an example with the function REMOVE. We are going to remove the character a from a list of different things.
CL-USER 50 > (apply (READ) (READ))
remove
(#\a (1 "a" #\a 12.3 :foo))
(1 "a" 12.3 :FOO)
Note that you also can apply apply itself, since apply is a function.
CL-USER 56 > (apply #'apply '(+ (1 2 3)))
6
There is also a slight complication because the function APPLY takes an arbitrary number of arguments, where only the last argument needs to be a list:
CL-USER 57 > (apply #'open
"/tmp/foo1"
:direction
:input
'(:if-does-not-exist :create))
#<STREAM::LATIN-1-FILE-STREAM /tmp/foo1>
How to deal with that?
relax static type checking rules
restrict APPLY
One or both of the above will have to be done in a typical statically type-checked programming language. Neither will give you a fully statically checked and fully flexible APPLY.
The reason you can't do that in most statically typed languages is that they almost all choose to have a list type that is restricted to uniform lists. Typed Racket is an example of a language that can talk about lists that are not uniformly typed (e.g., it has Listof for uniform lists, and List for a list with a statically known length that can be non-uniform) -- but still it assigns a limited type (with uniform lists) to Racket's apply, since the real type is extremely difficult to encode.
It's trivial in Scala:
Welcome to Scala version 2.8.0.final ...
scala> val li1 = List(1, 2, 3)
li1: List[Int] = List(1, 2, 3)
scala> li1.reduceLeft(_ + _)
res1: Int = 6
OK, typeless:
scala> def m1(args: Any*): Any = args.length
m1: (args: Any*)Any
scala> val f1 = m1 _
f1: (Any*) => Any = <function1>
scala> def apply(f: (Any*) => Any, args: Any*) = f(args: _*)
apply: (f: (Any*) => Any,args: Any*)Any
scala> apply(f1, "we", "don't", "need", "no", "stinkin'", "types")
res0: Any = 6
Perhaps I mixed up funcall and apply, so:
scala> def funcall(f: (Any*) => Any, args: Any*) = f(args: _*)
funcall: (f: (Any*) => Any,args: Any*)Any
scala> def apply(f: (Any*) => Any, args: List[Any]) = f(args: _*)
apply: (f: (Any*) => Any,args: List[Any])Any
scala> apply(f1, List("we", "don't", "need", "no", "stinkin'", "types"))
res0: Any = 6
scala> funcall(f1, "we", "don't", "need", "no", "stinkin'", "types")
res1: Any = 6
It is possible to write apply in a statically-typed language, as long as functions are typed a particular way. In most languages, functions have individual parameters terminated either by a rejection (i.e. no variadic invocation), or a typed accept (i.e. variadic invocation possible, but only when all further parameters are of type T). Here's how you might model this in Scala:
trait TypeList[T]
case object Reject extends TypeList[Reject]
case class Accept[T](xs: List[T]) extends TypeList[Accept[T]]
case class Cons[T, U](head: T, tail: U) extends TypeList[Cons[T, U]]
Note that this doesn't enforce well-formedness (though type bounds do exist for that, I believe), but you get the idea. Then you have apply defined like this:
apply[T, U]: (TypeList[T], (T => U)) => U
Your functions, then, are defined in terms of type list things:
def f (x: Int, y: Int): Int = x + y
becomes:
def f (t: TypeList[Cons[Int, Cons[Int, Reject]]]): Int = t.head + t.tail.head
And variadic functions like this:
def sum (xs: Int*): Int = xs.foldLeft(0)(_ + _)
become this:
def sum (t: TypeList[Accept[Int]]): Int = t.xs.foldLeft(0)(_ + _)
The only problem with all of this is that in Scala (and in most other static languages), types aren't first-class enough to define the isomorphisms between any cons-style structure and a fixed-length tuple. Because most static languages don't represent functions in terms of recursive types, you don't have the flexibility to do things like this transparently. (Macros would change this, of course, as well as encouraging a reasonable representation of function types in the first place. However, using apply negatively impacts performance for obvious reasons.)
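To illustrate the boilerplate this implies, here is a hedged sketch of what you end up writing by hand, one adapter per arity, with nested pairs standing in for the cons-style structure (the names are illustrative):

// Nothing here generalizes over arity; each shape needs its own adapter.
def apply2[A, B, R](f: (A, B) => R)(args: (A, B)): R =
  f(args._1, args._2)

def apply3[A, B, C, R](f: (A, B, C) => R)(args: (A, (B, C))): R =
  f(args._1, args._2._1, args._2._2)

apply2((x: Int, y: Int) => x + y)((1, 2))                    // 3
apply3((x: Int, y: Int, z: Int) => x + y + z)((1, (2, 3)))   // 6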
In Haskell, there is no datatype for multi-type lists, although I believe you can hack something like this together with the mysterious Typeable typeclass. As I see it, you're looking for a function which takes a function and a list containing exactly as many values as the function needs, and returns the result.
To me, this looks very similar to Haskell's uncurry function, except that it takes a tuple instead of a list. The difference is that a tuple always has the same number of elements (so (1,2) and (1,2,3) are of different types (!)) and their contents can be arbitrarily typed.
The uncurry function has this definition:
uncurry :: (a -> b -> c) -> (a,b) -> c
uncurry f (a,b) = f a b
What you need is some kind of uncurry which is overloaded in a way that supports an arbitrary number of parameters. I'm thinking of something like this:
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}

class MyApply f t r where
  myApply :: f -> t -> r

instance MyApply (a -> b -> c) (a,b) c where
  myApply f (a,b) = f a b

instance MyApply (a -> b -> c -> d) (a,b,c) d where
  myApply f (a,b,c) = f a b c

-- and so on
But this only works if ALL types involved are known to the compiler. Sadly, adding a fundep causes the compiler to refuse compilation. As I'm not a Haskell guru, maybe someone else knows how to fix this. Sadly, I don't know how to achieve this more easily.
Résumé: apply is not very easy in Haskell, although it is possible. I guess you'll never need it.
Edit: I have a better idea now; give me ten minutes and I'll present something without these problems.
Try folds. They're probably similar to what you want; just write a special case of one.
Haskell: foldr1 (+) [0..3] => 6
Incidentally, foldr1 is functionally equivalent to foldr with the accumulator initialized to the last element of the list.
There are all sorts of folds. They all technically do the same thing, though in different ways, and might process their arguments in different orders. foldr is just one of the simpler ones.
On this page, I read that "Apply is just like funcall, except that its final argument should be a list; the elements of that list are treated as if they were additional arguments to a funcall."
In Scala, functions can have varargs (variadic arguments), like the newer versions of Java. You can convert a list (or any Seq) into vararg parameters using the notation :_*. Example:
//The asterisk after the type signifies variadic arguments
def someFunctionWithVarargs(varargs: Int*) = //blah blah blah...
val list = List(1, 2, 3, 4)
someFunctionWithVarargs(list:_*)
//equivalent to
someFunctionWithVarargs(1, 2, 3, 4)
In fact, even Java can do this. Java varargs can be passed either as a sequence of arguments or as an array. All you'd have to do is convert your Java List to an array to do the same thing.
The benefit of a static language is that it would prevent you from applying a function to arguments of incorrect types, so I think it's natural that it would be harder to do.
Given a list of arguments and a function, in Scala, a tuple would best capture the data since it can store values of different types. With that in mind, tupled bears some resemblance to apply:
scala> val args = (1, "a")
args: (Int, java.lang.String) = (1,a)
scala> val f = (i:Int, s:String) => s + i
f: (Int, String) => java.lang.String = <function2>
scala> f.tupled(args)
res0: java.lang.String = a1
For a function of one argument, there is actually apply:
scala> val g = (i:Int) => i + 1
g: (Int) => Int = <function1>
scala> g.apply(2)
res11: Int = 3
I think if you think of apply as the mechanism to apply a first-class function to its arguments, then the concept is there in Scala. But I suspect that apply in Lisp is more powerful.
For Haskell, to do it dynamically, see Data.Dynamic, and dynApp in particular: http://www.haskell.org/ghc/docs/6.12.1/html/libraries/base/Data-Dynamic.html
See the Data.Dynamic suggestion above for Haskell. In C, void function pointers can be cast to other types, but you'd have to specify the type to cast to. (I think; I haven't done function pointers in a while.)
A list in Haskell can only store values of one type, so you couldn't do funny stuff like (apply substring ["Foo",2,3]). Neither does Haskell have variadic functions, so (+) can only ever take two arguments.
There is a $ function in Haskell:
($) :: (a -> b) -> a -> b
f $ x = f x
But that's only really useful because it has very low precedence, or for passing it to higher-order functions.
I imagine you might be able to do something like this using tuple types and fundeps though?
class Apply f tt vt | f -> tt, f -> vt where
  apply :: f -> tt -> vt

instance Apply (a -> r) a r where
  apply f t = f t

instance Apply (a1 -> a2 -> r) (a1,a2) r where
  apply f (t1,t2) = f t1 t2

instance Apply (a1 -> a2 -> a3 -> r) (a1,a2,a3) r where
  apply f (t1,t2,t3) = f t1 t2 t3
I guess that's a sort of 'uncurryN', isn't it?
Edit: this doesn't actually compile; superseded by #FUZxxl's answer.