I have a mapper function defined as such:
def foo(x:Int) = if (x>2) x*2
the type signature of this method being Int => AnyVal. Now if I map this function over a list of integers:
scala> List(-1,3,-4,0,5).map(foo)
res0: List[AnyVal] = List((), 6, (), (), 10)
I need a way of filtering out the Units from the Ints like so:
scala> res0.filter(_.isInstanceOf[Int]).map(_.asInstanceOf[Int])
res1: List[Int] = List(6, 10)
Everything seems concisely expressed until I have to do the filter-map on res0 to extract the values I care about. I could use matchers or an if-else in foo to always ensure I return an Int but I'd still need to filter the unwanted values resulting from the map operation.
Can any of the well-seasoned Scala developers reading this provide some additional insight into what's good or bad about this approach especially as my collection grows large (e.g. maybe this collection is a distributed Spark RDD)? Are there more idiomatic ways of doing this functionally?
In this case I suggest using collect with a PartialFunction if you need to drop all ints that are not greater than 2:
val foo: PartialFunction[Int, Int] = {
case x if x > 2 => x*2
}
println(List(-1,3,-4,0,5).collect(foo))
Your original foo has type Int => AnyVal because scalac transforms it into something like
def foo(x: Int) = if (x > 2) x*2 else () // () is the only value of type Unit
and the common supertype of Int and Unit is AnyVal.
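To make the comparison with the filter-then-map version from the question concrete, here is a minimal sketch. The names CollectSketch, doubleIfBig and doubleOpt are made up for illustration; only collect, filter, map and flatMap are standard library calls.

```scala
object CollectSketch {
  // Partial function: defined only for inputs greater than 2.
  val doubleIfBig: PartialFunction[Int, Int] = {
    case x if x > 2 => x * 2
  }

  val xs = List(-1, 3, -4, 0, 5)

  // collect applies the partial function where it is defined and drops the rest.
  val viaCollect: List[Int] = xs.collect(doubleIfBig)

  // Same result with an explicit filter followed by map.
  val viaFilterMap: List[Int] = xs.filter(_ > 2).map(_ * 2)

  // Same result with a total function returning Option, flattened by flatMap.
  def doubleOpt(x: Int): Option[Int] = if (x > 2) Some(x * 2) else None
  val viaFlatMap: List[Int] = xs.flatMap(x => doubleOpt(x))
}
```

All three produce a List[Int] directly, so no Unit values ever enter the result and no casts are needed.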
With Scalaz, a function can be mapped over another function. When would I want to use map over andThen? Is there a clear advantage using map? Thanks
For example,
val f: Int => Int = (a) => a + 10
val g: Int => Int = (a) => a * 100
(f map g map {_*3})(10) == (f andThen g andThen {_*3})(10) // true
Setting aside implementation details for a moment, map is andThen for functions (under the functor instance for A => ?), and it doesn't really make a lot of sense to talk about preferring one over the other if we're talking about functions specifically and not some higher level of abstraction.
What methods like map (and type classes like Functor more generally) allow us to do is abstract over specific types or type constructors. Suppose we want to write an incrementResult method that works on both A => Int and Kleisli[Option, A, Int], for example. These types don't have anything in common in terms of inheritance (short of AnyRef, which is useless), but A => ? and Kleisli[Option, A, ?] are both functors, so we could write this:
import scalaz._, Scalaz._
def incrementResult[F[_]: Functor](f: F[Int]): F[Int] = f.map(_ + 1)
And then use it like this (note that I'm using kind-projector to simplify the type syntax a bit):
scala> val plainOldFuncTriple: Int => Int = _ * 3
plainOldFuncTriple: Int => Int = <function1>
scala> val optionKleisliTriple: Kleisli[Option, Int, Int] = Kleisli(i => Some(i * 3))
optionKleisliTriple: scalaz.Kleisli[Option,Int,Int] = Kleisli(<function1>)
scala> val f = incrementResult[Int => ?](plainOldFuncTriple)
f: Int => Int = <function1>
scala> val k = incrementResult[Kleisli[Option, Int, ?]](optionKleisliTriple)
k: scalaz.Kleisli[Option,Int,Int] = Kleisli(<function1>)
scala> f(10)
res0: Int = 31
scala> k(10)
res1: Option[Int] = Some(31)
In this case specifically there are better ways to implement this operation, but it shows the general idea—we couldn't write a single method that works for both ordinary functions and Kleisli arrows using andThen, but we can with the extra level of abstraction that map gives us.
So to answer your question—you'd use map if you want to abstract over all type constructors that have a functor instance, but if you're working specifically with functions, map is andThen, and—as long as we're still setting aside implementation details—it doesn't matter which you choose.
Footnote: the map that Scalaz's syntax package gives you for values of types that have functor instances is implemented as an extension method, so there's a tiny bit of overhead (both at compile time and runtime) involved in using map instead of andThen on a function. If you're only working with functions and don't need the extra abstraction, then, you might as well go with andThen.
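For readers without Scalaz on the classpath, the same idea can be sketched in plain Scala. This is not Scalaz's implementation: the Functor trait, the FromInt alias and both instances below are hand-rolled stand-ins, assumed only for illustration.

```scala
object FunctorSketch {
  // Hypothetical minimal Functor type class (a stand-in for scalaz.Functor).
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  // Functions from Int, viewed as a type constructor in their result type.
  type FromInt[B] = Int => B

  // For functions, map is just andThen.
  implicit val fromIntFunctor: Functor[FromInt] = new Functor[FromInt] {
    def map[A, B](fa: FromInt[A])(f: A => B): FromInt[B] = fa andThen f
  }

  implicit val optionFunctor: Functor[Option] = new Functor[Option] {
    def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
  }

  // One method that works for any type constructor with a Functor instance.
  def incrementResult[F[_]](fa: F[Int])(implicit F: Functor[F]): F[Int] =
    F.map(fa)(_ + 1)

  val triple: Int => Int = _ * 3
  val f: Int => Int = incrementResult[FromInt](triple)
  val o: Option[Int] = incrementResult[Option](Some(10))
}
```

Here f(10) goes through the function instance and gives the same answer as triple andThen (_ + 1), while the Option call reuses the very same incrementResult.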
I am a newbie to Scala and I am trying to understand collections. I have some sample Scala code in which a method is defined as follows:
override def write(records: Iterator[Product2[K, V]]): Unit = {...}
From what I understand, this function is passed an argument records, which is an Iterator of Product2[K, V]. What I don't understand is whether Product2 is a user-defined class or a built-in data structure. Moreover, how do I explore the key-value contents of a Product2, and how do I iterate over them?
Chances are Product2 is the built-in class. You can easily check this in a modern IDE (hover over it with Ctrl pressed) or by inspecting the file header: if there is no related import, like some.custom.package.Product2, it's built-in.
What is Product2 and where is it defined? You can easily find out such things by consulting Scala's ScalaDoc.
In the case of the built-in class, you can treat it like a tuple of 2 elements (in fact Tuple2 extends Product2, as the example below shows), which has ._1 and ._2 accessor methods.
scala> val x: Product2[String, Int] = ("foo", 1)
// x: Product2[String,Int] = (foo,1)
scala> x._1
// res0: String = foo
scala> x._2
// res1: Int = 1
See How should I think about Scala's Product classes? for more.
Iteration is also hassle free, for example here is the map operation:
scala> val xs: Iterator[Product2[String, Int]] = List("foo" -> 1, "bar" -> 2, "baz" -> 3).iterator
xs: Iterator[Product2[String,Int]] = non-empty iterator
scala> val keys = xs.map(kv => kv._1)
keys: Iterator[String] = non-empty iterator
scala> val keys = xs.map(kv => kv._1).toList
keys: List[String] = List(foo, bar, baz)
scala> xs
res2: Iterator[Product2[String,Int]] = empty iterator
Keep in mind, though, that once an iterator has been consumed, it is empty and can't be reused.
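Putting the two points together (the ._1/._2 accessors and the one-shot nature of iterators), here is a small sketch. The describe method is a made-up stand-in for a write-like method; it is not part of any library.

```scala
object Product2Sketch {
  val pairs: List[(String, Int)] = List("foo" -> 1, "bar" -> 2, "baz" -> 3)

  // Hypothetical stand-in for write: walks the records once with foreach.
  def describe(records: Iterator[Product2[String, Int]]): List[String] = {
    val out = scala.collection.mutable.ListBuffer.empty[String]
    records.foreach(kv => out += (kv._1 + "=" + kv._2))
    out.toList
  }

  // Iterator is covariant, so an Iterator of tuples is accepted here.
  val first: List[String] = describe(pairs.iterator)

  // Each pass needs a fresh iterator; this one is exhausted after toList.
  val it: Iterator[Product2[String, Int]] = pairs.iterator
  val keys: List[String] = it.map(_._1).toList
  val exhausted: Boolean = !it.hasNext
}
```

If the records have to be traversed more than once, materialising them first (for example with toList) avoids the empty-iterator surprise.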
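Product2 is just a pair of values of types K and V.
Use it like this (note the .iterator, since write expects an Iterator rather than a List):
write(List((1, "one"), (2, "two")).iterator)
If you control the signature (an override has to keep the parameter type of the method it overrides), the prototype can also be written with tuples: def write(records: Iterator[(K, V)]): Unit = {...}
That makes it easy to access the values k of type K and v of type V. Use foreach rather than map here, because map on an Iterator is lazy and would never run the body:
def write(records: Iterator[(K, V)]): Unit = {
records.foreach { case (k, v) => w(k, v) } // w is a placeholder for whatever handles one pair
}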
I'm currently learning Scala, and I have a question about fold-left.
Since fold-left is curried, you should be able to get a partially applied function (PAF) by supplying only the first parameter list, as below.
(0 /: List(1, 2, 3)) _
But actually, I've got an error.
<console>:8: error: missing arguments for method /: in trait TraversableOnce;
follow this method with `_' if you want to treat it as a partially applied function
Then I tried the same thing with fold-right, as below:
(List(1, 2, 3) :\ 0) _
This worked, and I got a PAF of type ((Int, Int) => Int) => Int.
I know I can get a PAF by using foldLeft method, but I wonder whether it is possible to express it with '/:' or not.
The underscore syntax does not work well with right-associative methods that take multiple parameter lists. Here are the options I see:
Declare a variable type:
val x: ((Int, Int) => Int) => Int = 0 /: List(1, 2, 3)
Similarly, use type ascription:
val x = (0 /: List(1,2,3)) : ((Int, Int) => Int) => Int
Use the ordinary method-call (dot) notation:
val x = List(1,2,3)./:(0) _
Use the foldLeft synonym:
val x = List(1,2,3).foldLeft(0) _
I played around with it, and couldn't find a configuration that works.
There's always the more explicit:
val f = List(1,2,3,4,5).foldLeft(0)_
Which is arguably neater. I'll keep poking around though.
Edit:
There's this:
val f2 = (0 /: List(1,2,3,4,5))(_: (Int,Int) => Int)
val x = f2(_+_)
But that's getting pretty ugly. Without the type annotation, it complains. That's the best I could do though.
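For completeness, here is a sketch of how the partially applied fold behaves once you have it. It sticks to foldLeft rather than /:, and the names FoldSketch, foldFromZero and foldFromZeroExplicit are made up for illustration.

```scala
object FoldSketch {
  val xs = List(1, 2, 3)

  // Partially applied foldLeft: the seed is fixed, the combining function is still open.
  val foldFromZero: ((Int, Int) => Int) => Int = xs.foldLeft(0) _

  // The same thing written as an explicit lambda, with no trailing underscore.
  val foldFromZeroExplicit: ((Int, Int) => Int) => Int =
    op => xs.foldLeft(0)(op)

  val sum: Int = foldFromZero(_ + _)
  val max: Int = foldFromZeroExplicit((a, b) => math.max(a, b))
}
```

The explicit lambda is a little longer, but it sidesteps the question of how the underscore interacts with right-associative operators altogether.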
I've got a class with a collection of Foos that we'll call Bar. Foo has a number of number-returning methods that we want to aggregate at the Bar level, like so:
def attribute1(inputs: Map[Int, Double]) =
foos.foldLeft(0d)((sum, foo) => sum + foo.attribute1(inputs(foo.id)))
To aggregate these various attributes, I can have n functions of the form
def attributeN(inputs: Map[Int, Double]) =
foos.foldLeft(0d)((sum, foo) => sum + foo.attributeN(inputs(foo.id)))
However, that's ugly - I hate the fact that the iteration and summation are repeated. I want to abstract that, so I can do something like:
def attribute1(inputs: Map[Int, Double]) = aggregate(Foo.attribute1, inputs)
private def aggregate(f: Double => Double, inputs: Map[Int, Double]) = foos.foldLeft(0d)((sum, foo) => sum + foo.f(inputs(foo.id)))
Of course, that does not work, as one cannot reference Foo.attribute1 as a function; it is not a function instance.
I've basically stumbled through various solutions, but every one results in code for each aggregation method that is at least as verbose or complex as what we have with no helper, and I'm left with the duplication of the iteration.
I may be just hoping for too much here, but I am virtually certain there is an elegant way to do this in Scala that is escaping me. So, to any of the Scala gurus here who answer: thanks in advance!
I'm not sure I get what you're trying to do, but in scala a number-returning method like this:
def attribute1 = 5
IS a function. Well, sort of... It can be seen as a function of type () => Int (takes no argument, returns an Int). You just need to use the omnipresent _ to tell Scala to turn attribute1 into a function.
See if this helps as a starting point:
scala> class Foo {
| def attribute1=5
| def attribute2=2
| }
defined class Foo
scala> val foo=new Foo
foo: Foo = Foo@4cbba0bd
// test takes a function of type () => Int and just applies it (note
// the f followed by () on the right-hand side to say we want to apply f)
scala> def test(f: () => Int) = f()
test: (f: () => Int)Int
// the _ after foo.attribute1 tells scala that we want to use
// foo.attribute1 as a function, not take its value
scala> test(foo.attribute1 _)
res0: Int = 5
So basically what you're asking for is a way to address a specific method on multiple instances, right? If so, it's easily solvable:
trait Foo {
def id : Int
def attribute1( x : Double ) : Double
}
def aggregate( f : (Foo, Double) => Double, inputs : Map[Int, Double] ) =
foos.foldLeft(0d)( (sum, foo) => sum + f(foo, inputs(foo.id)) )
def aggregateAttribute1(inputs: Map[Int, Double]) =
aggregate(_.attribute1(_), inputs)
The key to this solution is _.attribute1(_), which is a sugared way of writing
(foo, input) => foo.attribute1(input)
Building on @Nikita's answer, if you want to remove a bit more redundancy from your boring methods, you can curry the aggregate method:
def aggregate(f: (Foo, Double) => Double)(inputs: Map[Int, Double]): Double =
foos.foldLeft(0d)((sum, foo) => sum + f(foo, inputs(foo.id)))
def aggregateAttribute1: Map[Int, Double] => Double =
aggregate(_.attribute1(_))
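Here is how the curried version might look when wired into a complete Bar. This is only a sketch: SimpleFoo, attribute2 and the sample inputs are invented for the example, and Bar taking its foos as a constructor parameter is an assumption about the original class.

```scala
object AggregateSketch {
  trait Foo {
    def id: Int
    def attribute1(x: Double): Double
    def attribute2(x: Double): Double
  }

  // Hypothetical Foo whose attributes scale or shift the input.
  final case class SimpleFoo(id: Int) extends Foo {
    def attribute1(x: Double): Double = x * 2
    def attribute2(x: Double): Double = x + 1
  }

  class Bar(foos: Seq[Foo]) {
    // The iteration and summation live here, once.
    private def aggregate(f: (Foo, Double) => Double)(inputs: Map[Int, Double]): Double =
      foos.foldLeft(0d)((sum, foo) => sum + f(foo, inputs(foo.id)))

    def attribute1(inputs: Map[Int, Double]): Double = aggregate(_.attribute1(_))(inputs)
    def attribute2(inputs: Map[Int, Double]): Double = aggregate(_.attribute2(_))(inputs)
  }

  val bar = new Bar(Seq(SimpleFoo(1), SimpleFoo(2)))
  val inputs = Map(1 -> 1.0, 2 -> 2.5)
  val total1 = bar.attribute1(inputs) // 1.0 * 2 + 2.5 * 2
  val total2 = bar.attribute2(inputs) // (1.0 + 1) + (2.5 + 1)
}
```

Each aggregating method is now a one-liner that names only the attribute it cares about.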
Looking at
val sb = Seq.newBuilder[Int]
println(sb.getClass.getName)
sb += 1
sb += 2
val s = sb.result()
println(s.getClass.getName)
the output is
scala.collection.mutable.ListBuffer
scala.collection.immutable.$colon$colon
using Scala 2.10.1.
I would expect Seq.newBuilder to return a VectorBuilder for example. This is returned by CanBuildFrom, if the result is explicitly typed to a Seq:
import scala.collection.generic.CanBuildFrom
def build[T, C <: Iterable[T]](x: T, y: T)
(implicit cbf: CanBuildFrom[Nothing, T, C]): C = {
val b = cbf()
println(b.getClass.getName)
b += x
b += y
b.result()
}
val s: Seq[Int] = build(1, 2)
println(s.getClass.getName) // scala.collection.immutable.Vector
in this case the builder is a VectorBuilder, and the result's class is a Vector.
So I explicitly wanted to build a Seq, but the result is a List which needs more RAM, according to Scala collection memory footprint characteristics.
So why does Seq.newBuilder return a ListBuffer which gives a List in the end?
The Scala Collections API is very complex and its hierarchy is deep. Each level represents some new abstraction. The Seq trait splits into two subtraits, which give different performance guarantees (ref.):
An IndexedSeq provides fast random-access of elements and a fast length operation. One representative of this IndexedSeq is the Vector.
A LinearSeq provides fast access only to the first element via head, but also has a fast tail operation. One representative of this LinearSeq is the List.
As the current default implementation of a Seq is a List, Seq.newBuilder will return a ListBuffer. However, if you want to use a Vector you can either use Vector.newBuilder[T] or IndexedSeq.newBuilder[T]:
scala> scala.collection.immutable.IndexedSeq.newBuilder[Int]
res0: scala.collection.mutable.Builder[Int,scala.collection.immutable.IndexedSeq[Int]] = scala.collection.immutable.VectorBuilder@1fb10a9f
scala> scala.collection.immutable.Vector.newBuilder[Int]
res1: scala.collection.mutable.Builder[Int,scala.collection.immutable.Vector[Int]] = scala.collection.immutable.VectorBuilder@3efe9969
The default Seq implementation is List:
Seq(1, 2, 3) // -> List(1, 2, 3)
...thus ListBuffer is the correct builder. If you want Vector, use Vector.newBuilder or IndexedSeq.newBuilder.
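As a small sketch of the two suggested alternatives (the object and value names are made up; newBuilder, += and result are the standard builder calls):

```scala
object BuilderSketch {
  // Vector.newBuilder is explicit about the concrete collection being built.
  val vb = Vector.newBuilder[Int]
  vb += 1
  vb += 2
  val v: Vector[Int] = vb.result()

  // IndexedSeq.newBuilder promises only an IndexedSeq; the concrete class
  // behind it is an implementation detail of the library version in use.
  val ib = scala.collection.immutable.IndexedSeq.newBuilder[Int]
  ib += 1
  ib += 2
  val indexed: scala.collection.immutable.IndexedSeq[Int] = ib.result()
}
```

If the memory footprint or random-access cost matters, asking for Vector by name is the choice that does not depend on library defaults.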
OK, but you're not going to believe it. Turning on -Yinfer-debug for your CanBuildFrom counter-example,
[search] $line14.$read.$iw.$iw.build[scala.this.Int, Seq[scala.this.Int]](1, 2) with pt=generic.this.CanBuildFrom[scala.this.Nothing,scala.this.Int,Seq[scala.this.Int]] in module class $iw, eligible:
fallbackStringCanBuildFrom: [T]=> generic.this.CanBuildFrom[String,T,immutable.this.IndexedSeq[T]]
[solve types] solving for T in ?T
inferExprInstance {
tree scala.this.Predef.fallbackStringCanBuildFrom[T]
tree.tpe generic.this.CanBuildFrom[String,T,immutable.this.IndexedSeq[T]]
tparams type T
pt generic.this.CanBuildFrom[scala.this.Nothing,scala.this.Int,Seq[scala.this.Int]]
targs scala.this.Int
tvars =?scala.this.Int
}
[search] considering no tparams (pt contains no tvars) trying generic.this.CanBuildFrom[String,scala.this.Int,immutable.this.IndexedSeq[scala.this.Int]] against pt=generic.this.CanBuildFrom[scala.this.Nothing,scala.this.Int,Seq[scala.this.Int]]
[success] found SearchResult(scala.this.Predef.fallbackStringCanBuildFrom[scala.this.Int], ) for pt generic.this.CanBuildFrom[scala.this.Nothing,scala.this.Int,Seq[scala.this.Int]]
[infer implicit] inferred SearchResult(scala.this.Predef.fallbackStringCanBuildFrom[scala.this.Int], )
and indeed,
implicit def fallbackStringCanBuildFrom[T]: CanBuildFrom[String, T, immutable.IndexedSeq[T]] =
new CanBuildFrom[String, T, immutable.IndexedSeq[T]] {
def apply(from: String) = immutable.IndexedSeq.newBuilder[T]
def apply() = immutable.IndexedSeq.newBuilder[T]
}
What do you mean, your Iterable is not a String?
trait CanBuildFrom[-From, -Elem, +To]
Such is the evil of inferring either Nothing or Any.
Edit: Sorry, I misspoke, I see that you told it Nothing explicitly.
Update:
Since CBF is contravariant in From, a CBF from String serves as a CBF from Nothing.
scala> typeOf[CanBuildFrom[Nothing,Int,Seq[Int]]] <:< typeOf[CanBuildFrom[String,Int,Seq[Int]]]
res0: Boolean = false
scala> typeOf[CanBuildFrom[String,Int,Seq[Int]]] <:< typeOf[CanBuildFrom[Nothing,Int,Seq[Int]]]
res1: Boolean = true
For instance, if you need to build from an immutable.Map, you'd want a CBF from collection.Map to work.
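The variance argument can be reproduced without CanBuildFrom at all. In the sketch below, Factory is a made-up trait that is contravariant in its first parameter and covariant in its second, mirroring the shape of CanBuildFrom's From and To; nothing else about it matches the real trait.

```scala
object VarianceSketch {
  // Hypothetical stand-in for CanBuildFrom: contravariant in From, covariant in To.
  trait Factory[-From, +To] {
    def build(): To
  }

  val fromString: Factory[String, Vector[Int]] = new Factory[String, Vector[Int]] {
    def build(): Vector[Int] = Vector(1, 2)
  }

  // Nothing is a subtype of String, so contravariance in From (plus covariance
  // in To) lets the String factory stand in where a Nothing factory is wanted.
  val fromNothing: Factory[Nothing, Seq[Int]] = fromString

  val built: Seq[Int] = fromNothing.build()
}
```

This compiles for the same reason the implicit search above succeeds: a factory declared for String is an acceptable factory for Nothing, and it still hands back a Vector.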
As someone else commented, it's just weird for Nothing. But you get what you asked for. That is, you underspecified, which means you don't mind much what you get back, Vector or whatever.
I agree that this is weird. Why don't you just use Vector.newBuilder, if that's what you're looking for?
scala> val sb = Vector.newBuilder[Int]
sb: scala.collection.mutable.Builder[Int,scala.collection.immutable.Vector[Int]] = scala.collection.immutable.VectorBuilder@1fb7482a
scala> println(sb.getClass.getName)
scala.collection.immutable.VectorBuilder
scala> sb += 1
res1: sb.type = scala.collection.immutable.VectorBuilder@1fb7482a
scala> sb += 2
res2: sb.type = scala.collection.immutable.VectorBuilder@1fb7482a
scala> val s = sb.result()
s: scala.collection.immutable.Vector[Int] = Vector(1, 2)
scala> println(s.getClass.getName)
scala.collection.immutable.Vector