Need some help with Scala flatten.
I have a list of String and List[String].
Example: List("I", "can't", List("do", "this"))
Expecting result: List("I", "can't", "do", "this")
I've done a lot of experiments, and most compact solution is:
val flattenList = list.flatten {
case list: List[Any] => list
case x => List(x)
}
But it seems very tricky and hard to understand. Any suggestions for more naive code?
Thanks.
What's "tricky and hard to understand" is your mixing elements of different type in the same list. That's the root cause of your problem. Once you have that, there is no way around having to scan the list, and inspect the type of each element to correct it, and your solution to that is as good as any (certainly, better, than the one, suggested in the other answer :)).
I would really rethink the code path that leads to having a heterogeneous list like this in the first place though if I were you. This is not really a good approach, because you subvert the type safety this way, and end up with a List[AnyRef], that can contain ... well, anything.
I don't think you can avoid having to deal with 2 cases: single element vs list. One way or another you would have to tell your program what to do. Here is a more general implementation that deals with a list of any depth:
def flattenList(xs: List[Any]): List[Any] =
xs match {
case Nil => Nil
case (ys:List[_]) :: t => flattenList(ys) ::: flattenList(t)
case h :: t => h :: flattenList(t)
}
Example:
scala> flattenList(List("I", "can't", List("do", "this")))
res1: List[Any] = List(I, can't, do, this)
scala> flattenList(List("I", "can't", List("do", List("this", "and", "this"))))
res2: List[Any] = List(I, can't, do, this, and, this)
This does not look very type safe though. Try to use a Tree or something else.
Related
In my team, I often see teammates writing
list.filter(_.isInstanceOf[T]).map(_.asInstanceOf[T])
but this seems a bit redundant to me.
If we know that everything in the filtered list is an instance of T then why should we have to explicitly cast it as such?
I know of one alternative, which is to use match.
eg:
list.match {
case thing: T => Some(thing)
case _ => None
}
but this has the drawback that we must then explicitly state the generic case.
So, given all the above, I have 2 questions:
1) Is there another (better?) way to do the same thing?
2) If not, which of the two options above should be preferred?
You can use collect:
list collect {
case el: T => el
}
Real types just work (barring type erasure, of course):
scala> List(10, "foo", true) collect { case el: Int => el }
res5: List[Int] = List(10)
But, as #YuvalItzchakov has mentioned, if you want to match for an abstract type T, you must have an implicit ClassTag[T] in scope.
So a function implementing this may look as follows:
import scala.reflect.ClassTag
def filter[T: ClassTag](list: List[Any]): List[T] = list collect {
case el: T => el
}
And using it:
scala> filter[Int](List(1, "foo", true))
res6: List[Int] = List(1)
scala> filter[String](List(1, "foo", true))
res7: List[String] = List(foo)
collect takes a PartialFunction, so you shouldn't provide the generic case.
But if needed, you can convert a function A => Option[B] to a PartialFunction[A, B] with Function.unlift. Here is an example of that, also using shapeless.Typeable to work around type erasure:
import shapeless.Typeable
import shapeless.syntax.typeable._
def filter[T: Typeable](list: List[Any]): List[T] =
list collect Function.unlift(_.cast[T])
Using:
scala> filter[Option[Int]](List(Some(10), Some("foo"), true))
res9: List[Option[Int]] = List(Some(10))
but this seems a bit redundant to me.
Perhaps programmers in your team are trying to shield that piece of code from someone mistakenly inserting a type other then T, assuming this is some sort of collection with type Any. Otherwise, the first mistake you make, you'll blow up at run-time, which is never fun.
I know of one alternative, which is to use match.
Your sample code won't work because of type erasure. If you want to match on underlying types, you need to use ClassTag and TypeTag respectively for each case, and use =:= for type equality and <:< for subtyping relationships.
Is there another (better?) way to do the same thing?
Yes, work with the type system, not against it. Use typed collections when you can. You haven't elaborated on why you need to use run-time checks and casts on types, so I'm assuming there is a reasonable explanation to that.
If not, which of the two options above should be preferred?
That's a matter of taste, but using pattern matching on types can be more error-prone since one has to be aware of the fact that types are erased at run-time, and create a bit more boilerplate code for you to maintain.
Working with collections in Scala, it's common to need to use the empty instance of the collection for a base case. Because the default empty instances extend the collection type class with a type parameter of Nothing, it sometimes defeats type inference to use them directly. For example:
scala> List(1, 2, 3).foldLeft(Nil)((x, y) => x :+ y.toString())
<console>:8: error: type mismatch;
found : List[String]
required: scala.collection.immutable.Nil.type
List(1, 2, 3).foldLeft(Nil)((x, y) => x :+ y.toString())
^
fails, but the following two corrections succeed:
scala> List(1, 2, 3).foldLeft(Nil: List[String])((x, y) => x :+ y.toString())
res9: List[String] = List(1, 2, 3)
scala> List(1, 2, 3).foldLeft(List.empty[String])((x, y) => x :+ y.toString())
res10: List[String] = List(1, 2, 3)
Another place I've run into a similar dilemma is in defining default parameters. These are the only examples I could think of off the top of my head, but I know I've seen others. Is one method of providing the correct type hinting preferable to the other in general? Are there places where each would be advantageous?
I tend to use Nil (or None) in combination with telling the type parameterized method the type (like Kigyo) for the specific use case given. Though I think using explicit type annotation is equally OK for the use case given. But I think there are use cases where you want to stick to using .empty, for example, if you try to call a method on Nil: List[String] you first have to wrap it in braces, so that's 2 extra characters!!.
Now the other argument for using .empty is consistency with the entire collections hierarchy. For example you can't do Nil: Set[String] and you can't do Nil: Option[String] but you can do Set.empty[String] and Option.empty[String]. So unless your really sure your code will never be refactored into some other collection then you should use .empty as it will require less faff to refactor. Furthermore it's just generally nice to be consistent right? :)
To be fair I often use Nil and None as I'm often quite sure I'd never want to use a Set or something else, in fact I'd say it's better to use Nil when your really sure your only going to deal with lists because it tells the reader of the code "I'm really really dealing with lists here".
Finally, you can do some cool stuff with .empty and duck typing check this simple example:
def printEmptyThing[K[_], T <: {def empty[A] : K[A]}](c: T): Unit =
println("thing = " + c.empty[String])
printEmptyThing[List, List.type](List)
printEmptyThing[Option, Option.type](Option)
printEmptyThing[Set, Set.type](Set)
will print:
> thing = List()
> thing = None
> thing = Set()
I have an interesting problem which is proving difficult for someone new to Scala.
I need to combine 2 lists:
listA : List[List[Int]]
listB : List[Int]
In the following way:
val listA = List(List(1,1), List(2,2))
val listB = List(3,4)
val listC = ???
// listC: List[List[Int]] = List(List(1,1,3),List(1,1,4),List(2,2,3),List(2,2,4)
In Java, I would use a couple of nested loops:
for(List<Integer> list : listA) {
for(Integer i: listB) {
subList = new ArrayList<Integer>(list);
subList.add(i);
listC.add(subList);
}
}
I'm guessing this is a one liner in Scala, but so far it's eluding me.
You want to perform a flattened Cartesian product. For-comprehensions are the easiest way to do this and may look similar to your Java solution:
val listC = for (list <- listA; i <- listB) yield list :+ i
scand1sk's answer is almost certainly the approach you should be using here, but as a side note, there's another way of thinking about this problem. What you're doing is in effect lifting the append operation into the applicative functor for lists. This means that using Scalaz you can write the following:
import scalaz._, Scalaz._
val listC = (listA |#| listB)(_ :+ _)
We can think of (_ :+ _) as a function that takes a list of things, and a single thing of the same type, and returns a new list:
(_ :+ _): ((List[Thing], Thing) => List[Thing])
Scalaz provides an applicative functor instance for lists, so we can in effect create a new function that adds an extra list layer to each of the types above. The weird (x |#| y)(_ :+ _) syntax says: create just such a function and apply it to x and y. And the result is what you're looking for.
And as in the for-comprehension case, if you don't care about order, you can make the operation more efficient by using :: and flipping the order of the arguments.
For more information, see my similar answer here about the cartesian product in Haskell, this introduction to applicative functors in Scala, or this blog post about making the syntax for this kind of thing a little less ugly in Scala. And of course feel free to ignore all of the above if you don't care.
I have a model, which has some Option fields, which contain another Option fields. For example:
case class First(second: Option[Second], name: Option[String])
case class Second(third: Option[Third], title: Option[String])
case class Third(numberOfSmth: Option[Int])
I'm receiving this data from external JSON's and sometimes this data may contain null's, that was the reason of such model design.
So the question is: what is the best way to get a deepest field?
First.get.second.get.third.get.numberOfSmth.get
Above method looks really ugly and it may cause exception if one of the objects will be None. I was looking in to Scalaz lib, but didn't figure out a better way to do that.
Any ideas?
The solution is to use Option.map and Option.flatMap:
First.flatMap(_.second.flatMap(_.third.map(_.numberOfSmth)))
Or the equivalent (see the update at the end of this answer):
First flatMap(_.second) flatMap(_.third) map(_.numberOfSmth)
This returns an Option[Int] (provided that numberOfSmth returns an Int). If any of the options in the call chain is None, the result will be None, otherwise it will be Some(count) where count is the value returned by numberOfSmth.
Of course this can get ugly very fast. For this reason scala supports for comprehensions as a syntactic sugar. The above can be rewritten as:
for {
first <- First
second <- first .second
third <- second.third
} third.numberOfSmth
Which is arguably nicer (especially if you are not yet used to seeing map/flatMap everywhere, as will certainly be the case after a while using scala), and generates the exact same code under the hood.
For more background, you may check this other question: What is Scala's yield?
UPDATE:
Thanks to Ben James for pointing out that flatMap is associative. In other words x flatMap(y flatMap z))) is the same as x flatMap y flatMap z. While the latter is usually not shorter, it has the advantage of avoiding any nesting, which is easier to follow.
Here is some illustration in the REPL (the 4 styles are equivalent, with the first two using flatMap nesting, the other two using flat chains of flatMap):
scala> val l = Some(1,Some(2,Some(3,"aze")))
l: Some[(Int, Some[(Int, Some[(Int, String)])])] = Some((1,Some((2,Some((3,aze))))))
scala> l.flatMap(_._2.flatMap(_._2.map(_._2)))
res22: Option[String] = Some(aze)
scala> l flatMap(_._2 flatMap(_._2 map(_._2)))
res23: Option[String] = Some(aze)
scala> l flatMap(_._2) flatMap(_._2) map(_._2)
res24: Option[String] = Some(aze)
scala> l.flatMap(_._2).flatMap(_._2).map(_._2)
res25: Option[String] = Some(aze)
There is no need for scalaz:
for {
first <- yourFirst
second <- f.second
third <- second.third
number <- third.numberOfSmth
} yield number
Alternatively you can use nested flatMaps
This can be done by chaining calls to flatMap:
def getN(first: Option[First]): Option[Int] =
first flatMap (_.second) flatMap (_.third) flatMap (_.numberOfSmth)
You can also do this with a for-comprehension, but it's more verbose as it forces you to name each intermediate value:
def getN(first: Option[First]): Option[Int] =
for {
f <- first
s <- f.second
t <- s.third
n <- t.numberOfSmth
} yield n
I think it is an overkill for your problem but just as a general reference:
This nested access problem is addressed by a concept called Lenses. They provide a nice mechanism to access nested data types by simple composition. As introduction you might want to check for instance this SO answer or this tutorial. The question whether it makes sense to use Lenses in your case is whether you also have to perform a lot of updates in you nested option structure (note: update not in the mutable sense, but returning a new modified but immutable instance). Without Lenses this leads to lengthy nested case class copy code. If you do not have to update at all, I would stick to om-nom-nom's suggestion.
I have an Iterator[Option[T]] and I want to get an Iterator[T] for those Options where T isDefined. There must be a better way than this:
it filter { _ isDefined} map { _ get }
I would have thought that it was possible in one construct... Anybody any ideas?
In the case where it is an Iterable
val it:Iterable[Option[T]] = ...
it.flatMap( x => x ) //returns an Iterable[T]
In the case where it is an Iterator
val it:Iterator[Option[T]] = ...
it.flatMap( x => x elements ) //returns an Iterator[T]
it.flatMap( _ elements) //equivalent
In newer versions this is now possible:
val it: Iterator[Option[T]] = ...
val flatIt = it.flatten
This works for me (Scala 2.8):
it.collect {case Some(s) => s}
To me, this is a classic use case for the monadic UI.
for {
opt <- iterable
t <- opt
} yield t
It's just sugar for the flatMap solution described above, and it produces identical bytecode. However, syntax matters, and I think one of the best times to use Scala's monadic for syntax is when you're working with Option, especially in conjunction with collections.
I think this formulation is considerably more readable, especially for those not very familiar with functional programming. I often try both the monadic and functional expressions of a loop and see which seems more straightforward. I think flatMap is hard name for most people to grok (and actually, calling it >>= makes more intuitive sense to me).