How does Scala's groupBy identity work? - scala

I was browsing around and found a question about grouping a String by its characters, such as this:
The input:
"aaabbbccccdd"
Would produce the following output:
"aaa"
"bbb"
"cccc"
"dd"
and I found this suggestion:
val str = "aaabbbccccdd"
val list = str.groupBy(identity).toList.sortBy(_._1).map(_._2)
And this identity fellow got me curious. I found out it is defined in Predef like this:
identity[A](x: A): A
So basically it returns whatever it is given, right? But how does that apply in the call to groupBy?
I'm sorry if this is a basic question; it's just that functional programming is still tangling my brain a little. Please let me know if there's any information I can give to make this question clearer.

This is your expression:
val list = str.groupBy(identity).toList.sortBy(_._1).map(_._2)
Let's go function by function. The first one is groupBy, which partitions your String according to the keys returned by the discriminator function, which in your case is identity. The discriminator function is applied to each character in the string, and all characters that return the same result are grouped together. If we wanted to separate the letter a from the rest, we could use x => x == 'a' as our discriminator function. That would group the chars of your string by the return value of this function (true or false) in a map:
Map(false -> bbbccccdd, true -> aaa)
By using identity, which is a "nice" way to say x => x, we get a map in which each distinct character is a key mapped to the string of its occurrences; in your case:
Map(c -> cccc, a -> aaa, d -> dd, b -> bbb)
Then we convert the map to a list of tuples (Char, String) with toList.
Finally, we order it by char with sortBy and keep just the String with map, giving your final result.
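The steps above can be sketched one at a time. This is only an illustration of the expression from the question; the object and value names (GroupByIdentityDemo, grouped, sorted, result) are made up for the example.

```scala
object GroupByIdentityDemo {
  val str = "aaabbbccccdd"

  // Step 1: group the characters by themselves; each distinct Char becomes a key.
  val grouped: Map[Char, String] = str.groupBy(identity)

  // Step 2: a Map has no guaranteed order, so convert to a list and sort by key.
  val sorted: List[(Char, String)] = grouped.toList.sortBy(_._1)

  // Step 3: drop the keys and keep only the grouped strings.
  val result: List[String] = sorted.map(_._2)
}
```

Here result should hold the four groups in alphabetical order of their key.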

To understand this, just start the Scala REPL with the -Xprint:typer option:
val res2: immutable.Map[Char,String] = augmentString(str).groupBy[Char]({
((x: Char) => identity[Char](x))
});
Scalac converts a plain String into StringOps, which is a subclass of TraversableLike, which has a groupBy method:
def groupBy[K](f: A => K): immutable.Map[K, Repr] = {
val m = mutable.Map.empty[K, Builder[A, Repr]]
for (elem <- this) {
val key = f(elem)
val bldr = m.getOrElseUpdate(key, newBuilder)
bldr += elem
}
val b = immutable.Map.newBuilder[K, Repr]
for ((k, v) <- m)
b += ((k, v.result))
b.result
}
So groupBy contains a map into which it inserts the chars, keyed by the value returned by the identity function.

First, let's see what happens when you iterate over a String:
scala> "asdf".toList
res1: List[Char] = List(a, s, d, f)
Next, consider that sometimes we want to group elements on the basis of some specific attribute of an object.
For instance, we might group a list of strings by length as in...
List("aa", "bbb", "bb", "bbb").groupBy(_.length)
What if you just wanted to group each item by the item itself? You could pass in the identity function like this:
List("aa", "bbb", "bb", "bbb").groupBy(identity)
You could do something silly like this, but it would be silly:
List("aa", "bbb", "bb", "bbb").groupBy(_.toString)
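To make the difference between the two keys concrete, here is a small sketch using the same list. The names GroupByKeysDemo, byLength and byItself are only for illustration.

```scala
object GroupByKeysDemo {
  val words = List("aa", "bbb", "bb", "bbb")

  // Key is the length of each string: strings of equal length share a group.
  val byLength: Map[Int, List[String]] = words.groupBy(_.length)

  // Key is the string itself: only equal strings share a group.
  val byItself: Map[String, List[String]] = words.groupBy(identity)
}
```

With identity as the discriminator, duplicates such as "bbb" end up together under their own value as the key.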

Take a look at
str.groupBy(identity)
which returns
scala.collection.immutable.Map[Char,String] = Map(b -> bbb, d -> dd, a -> aaa, c -> cccc)
so the key by which the elements are grouped is the character.

Whenever you use a method such as groupBy on a String, it's important to note that the String is implicitly converted to StringOps, not to List[Char].
StringOps
The signature of groupBy is:
def groupBy[K](f: (Char) ⇒ K): Map[K, String]
Hence, the result has the form:
Map[Char,String]
List[Char]
The signature of groupBy is:
def groupBy[K](f: (Char) ⇒ K): Map[K, List[Char]]
If the String had been implicitly converted to List[Char], the result would have the form:
Map[Char,List[Char]]
This should answer your question of how Scala figured out to group by Char (see the signature) and yet give you a Map[Char, String].
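A quick way to see both result types side by side is to call groupBy once on the String and once on an explicit List[Char]. This is a sketch with made-up names (StringVsListDemo, onString, onList).

```scala
object StringVsListDemo {
  val str = "aabbb"

  // Called directly on the String: the groups are Strings.
  val onString: Map[Char, String] = str.groupBy(identity)

  // Called on an explicit List[Char]: the groups are List[Char].
  val onList: Map[Char, List[Char]] = str.toList.groupBy(identity)
}
```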

Basically list.groupBy(identity) is just a fancy way of saying list.groupBy(x => x), which in my opinion is clearer. It groups a list containing duplicate items by those items.

Related

Convert List[(Int,String)] into List[Int] in scala

My goal is to map every word in a text (Index, line) to a list containing the indices of every line the word occurs in. I managed to write a function that returns a list of all words assigned to an index.
The following function should do the rest (map a list of indices to every word):
def mapIndicesToWords(l:List[(Int,String)]):Map[String,List[Int]] = ???
If I do this:
l.groupBy(x => x._2)
it returns a Map[String, List[(Int,String)]]. Now I just want to change the value to type List[Int].
I thought of using .mapValues(...) and folding the list somehow, but I'm new to Scala and don't know the correct approach for this.
So how do I convert the list?
You can also use foldLeft: you just need to specify an accumulator (in your case a Map[String, List[Int]]), which will be returned as the result, and write some logic inside. Here is my implementation.
def mapIndicesToWords(l:List[(Int,String)]): Map[String,List[Int]] =
l.foldLeft(Map[String, List[Int]]())((map, entry) =>
map.get(entry._2) match {
case Some(list) => map + (entry._2 -> (entry._1 :: list))
case None => map + (entry._2 -> List(entry._1))
}
)
But with foldLeft, the elements of each list will be in reversed order, so you can use foldRight instead. Just change foldLeft to foldRight and swap the input parameters from (map, entry) to (entry, map).
Be careful, though: on a List, foldRight is slower (roughly 2 times), because it is implemented by reversing the list and then calling foldLeft.
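As a sketch of the foldRight variant described above (the object name FoldRightDemo and the sample input are assumptions made for the example), the indices come out in their original order:

```scala
object FoldRightDemo {
  def mapIndicesToWords(l: List[(Int, String)]): Map[String, List[Int]] =
    // foldRight visits the entries from right to left, so prepending each
    // index leaves the final lists in the original left-to-right order.
    l.foldRight(Map.empty[String, List[Int]]) { (entry, map) =>
      val (index, word) = entry
      map + (word -> (index :: map.getOrElse(word, Nil)))
    }

  // Hypothetical sample input: (line index, word) pairs.
  val input = List((0, "a"), (0, "b"), (1, "a"), (2, "b"))
  val result = mapIndicesToWords(input)
}
```

Using getOrElse with Nil replaces the Some/None match; both forms behave the same here.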
scala> val myMap: Map[String,List[(Int, String)]] = Map("a" -> List((1,"line1"), (2, "line")))
myMap: Map[String,List[(Int, String)]] = Map(a -> List((1,line1), (2,line)))
scala> myMap.mapValues(lst => lst.map(pair => pair._1))
res0: scala.collection.immutable.Map[String,List[Int]] = Map(a -> List(1, 2))
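Putting groupBy and the projection together, one possible shape for the function from the question is sketched below. It uses a plain map over the entries rather than mapValues, which is a choice made here so that the result is a strict Map; the object name and sample input are hypothetical.

```scala
object GroupThenProjectDemo {
  def mapIndicesToWords(l: List[(Int, String)]): Map[String, List[Int]] =
    l.groupBy(_._2)                       // Map[String, List[(Int, String)]]
      .map { case (word, pairs) =>
        word -> pairs.map(_._1)           // keep only the indices
      }

  // Hypothetical sample input: (line index, word) pairs.
  val input = List((0, "a"), (0, "b"), (1, "a"), (2, "b"))
  val result = mapIndicesToWords(input)
}
```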

FlatMap behavior in scala

I am trying to get the hang of the flatMap implementation in Scala. Based on the definition in Scala programming
Function returning a list of elements as its right argument. It applies the function to each list and returns the concatenation of all function results.
Now, to understand this, I have the following implementations:
val listwords = List(List("abc"),List("def"),List("ghi"))
val res2 = listwords flatMap (_+"1")
println(res2) //output- List(L, i, s, t, (, a, b, c, ), 1, L, i, s, t, (, d, e, f, ), 1, L, i, s, t, (, g, h, i, ), 1)
val res3 = listwords flatMap (_.apply(0).toCharArray())
println(res3) //output- List(a, b, c, d, e, f, g, h, i)
Looking at the first output, which drives me crazy: why is List[List[String]] treated like List[String]?
Once the above question is answered, could someone please help me perform an operation that picks the first character of the first string of each inner list and results in a List[Char]? So given listwords, I want the output to be List('a', 'd', 'g').
List("abc") + "1" is equivalent to List("abc").toString + "1", so it returns the string "List(abc)1". The type of List.flatMap is
flatMap[B](f: (A) ⇒ GenTraversableOnce[B]): List[B]
and your function has type (List[String] => String). String extends GenTraversableOnce[Char] so your result list has type List[Char].
The code listwords flatMap (_+"1") can be rewritten as listwords flatMap (list => list.toString + "1"). So you basically transformed all lists to strings using toString method.
To obtain first characters you can use the following expression:
listwords.flatMap(_.headOption).flatMap(_.headOption)
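The two behaviours can be checked side by side. The sketch below spells out the toString call explicitly instead of relying on the implicit + conversion; the names FlatMapDemo, asStrings, flattened and firstChars are made up for the example.

```scala
object FlatMapDemo {
  val listwords = List(List("abc"), List("def"), List("ghi"))

  // What the function passed to flatMap produces for each inner list.
  val asStrings: List[String] = listwords.map(list => list.toString + "1")

  // flatMap then treats each of those Strings as a sequence of Chars.
  val flattened: List[Char] = listwords.flatMap(list => list.toString + "1")

  // First string of each inner list, then first char of each string.
  val firstChars: List[Char] =
    listwords.flatMap(_.headOption).flatMap(_.headOption)
}
```

Because headOption is used at both levels, empty inner lists and empty strings are simply skipped instead of throwing.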
First of all, you need to understand the difference between the map and the flatMap methods. Both of them iterate over some container and apply a function literal to every element. The difference is that flatMap performs one additional operation: it flattens the structure of the container. There is also a method that does just the flattening, and it's called flatten (so flatMap is the equivalent of the map operation followed by the flatten operation). The second thing you have to remember is that you're modifying (mapping over) nested lists, so you need to nest your map/flatMap calls as well. These examples should clarify all of this:
scala> val wordLists = List(List("abc"),List("de"),List("f"), List())
wordLists: List[List[String]] = List(List(abc), List(de), List(f), List())
scala> val words = wordLists.flatten
words: List[String] = List(abc, de, f)
scala> val replacedWordLists = wordLists.map(_ => List("xyz"))
replacedWordLists: List[List[String]] = List(List(xyz), List(xyz), List(xyz), List(xyz))
scala> val replacedWords = wordLists.map(_ => List("xyz")).flatten // Equivalent: wordLists.flatMap(_ => List("xyz"))
replacedWords: List[String] = List(xyz, xyz, xyz, xyz)
scala> val upperCaseWordLists = wordLists.map(_.map(_.toUpperCase))
upperCaseWordLists: List[List[String]] = List(List(ABC), List(DE), List(F), List())
scala> val upperCaseWords = wordLists.map(_.map(_.toUpperCase)).flatten // Equivalent: wordLists.flatMap(_.map(_.toUpperCase))
upperCaseWords: List[String] = List(ABC, DE, F)
scala> val optionalFirstLetterLists = wordLists.map(_.map(_.headOption))
optionalFirstLetterLists: List[List[Option[Char]]] = List(List(Some(a)), List(Some(d)), List(Some(f)), List())
scala> val optionalFirstLetters = wordLists.map(_.map(_.headOption)).flatten // Equivalent: wordLists.flatMap(_.map(_.headOption))
optionalFirstLetters: List[Option[Char]] = List(Some(a), Some(d), Some(f))
scala> val firstLetterLists = wordLists.map(_.map(_.headOption).flatten) // Equivalent: wordLists.map(_.flatMap(_.headOption))
firstLetterLists: List[List[Char]] = List(List(a), List(d), List(f), List())
scala> val firstLetters = wordLists.map(_.flatMap(_.headOption)).flatten // Equivalent: wordLists.flatMap(_.flatMap(_.headOption))
firstLetters: List[Char] = List(a, d, f)
_+"1" isn't doing what you think it's doing.
It's interpreted as list: List[String] => list.+("1")
Since List[String] doesn't contain such a method, the compiler looks for an implicit conversion in scope. It finds any2stringadd. (See http://docs.scala-lang.org/tutorials/tour/implicit-conversions for more on implicit conversions)
implicit final class any2stringadd[A](private val self: A) extends AnyVal {
def +(other: String): String = String.valueOf(self) + other
}
list: List[String] => list.+("1")
now turns into
list: List[String] => new any2stringadd(list).+("1")
which returns String.valueOf(list) + "1"

Iterable with two elements?

We have Option which is an Iterable over 0 or 1 elements.
I would like to have such a thing with two elements. The best I have is
Array(foo, bar).map{...}, while what I would like is:
(foo, bar).map{...}
(such that Scala recognized there are two elements in the Iterable).
Does such a construction exist in the standard library?
EDIT: another solution is to create a map method:
def map(a:Foo) = {...}
val (mappedFoo, mappedBar) = (map(foo), map(bar))
If all you want to do is map on tuples of the same type, a simple version is:
implicit class DupleOps[T](t: (T,T)) {
def map[B](f : T => B) = (f(t._1), f(t._2))
}
Then you can do the following:
val t = (0,1)
val (x,y) = t.map( _ +1) // x = 1, y = 2
There's no specific type in the scala standard library for mapping over exactly 2 elements.
I can suggest the following (I assume foo and bar have the same type T):
(foo, bar) // -> Tuple2[T,T]
.productIterator // -> Iterator[Any]
.map(_.asInstanceOf[T]) // -> Iterator[T]
.map(x => x) // replace x => x with the actual work
No, it doesn't.
You could
Make one yourself.
Write an implicit conversion from 2-tuples to a Seq of the common supertype. But this won't yield 2-tuples from operations.
object TupleOps {
implicit def tupleToSeq[C, A <: C, B <: C](tuple: (A, B)): Seq[C] = Seq(tuple._1,tuple._2)
}
import TupleOps._
(0, 1).map(_ + 1)
Use HLists from shapeless. These provide operations on heterogeneous lists, whereas you (probably?) have a homogeneous list, but it should work.

Create Map from Option of List

I'm trying to create a map from an option of list. So, I have an option of list declared like this:
val authHeaders: Option[Set[String]] = Some(Set("a", "b", "c"))
and I want to get a map like this: (a -> a, b -> b, c -> c).
So I tried this way:
for {
headers <- authHeaders
header <- headers
} yield (header -> header)
But I get this error:
<console>:11: error: type mismatch;
found : scala.collection.immutable.Set[(String, String)]
required: Option[?]
header <- headers
^
What did I do wrong?
Additional note: this Option thing has been giving me quite a headache, but I need to understand how to deal with it in any case. Anyway, just for comparison, I tried removing the headache factor, by removing the Option.
scala> val bah = Set("a", "b", "c")
bah: scala.collection.immutable.Set[String] = Set(a, b, c)
scala> (
| for {
| x <- bah
| } yield (x -> x)).toMap
res36: scala.collection.immutable.Map[String,String] = Map(a -> a, b -> b, c -> c)
So, apparently it works. What is the difference here?
Additional note:
It looks like the rule of the game for the for comprehension here is: if it produces something, that something must be of the same type as the outer collection (in this case that of authHeaders, which is an Option[?]). How can I work around it?
Thanks!,
Raka
The Problem
Your for gets desugared into:
authHeaders.flatMap(headers => headers.map(header => header -> header))
The problem in this case is the usage of flatMap, because authHeaders is an Option.
Let's have a look at the signature. (http://www.scala-lang.org/api/2.11.1/index.html#scala.Option)
final def flatMap[B](f: (A) ⇒ Option[B]): Option[B]
So the function f is expected to return an Option. But headers.map(header => header -> header) is a Set, not an Option, and therefore you get an error.
A solution
Assuming that if authHeaders is None you want an empty Map, we can use fold.
authHeaders.fold(Map.empty[String, String])(_.map(s => s -> s).toMap)
The first parameter is the result if authHeaders is None. The second is expected to be a function Set[String] => Map[String, String] and gets evaluated if there is some Set.
In case you want to keep the result in an Option and just want to have a Map when there actually is some Set, you can simply use map.
authHeaders.map(_.map(s => s -> s).toMap)
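Both variants can be wrapped in small helper functions to see how they behave for Some and None. The helper names (toMapOrEmpty, toOptionalMap) and the enclosing object are hypothetical and exist only for this sketch.

```scala
object OptionToMapDemo {
  // fold: the first argument is the result for None, the second handles Some.
  def toMapOrEmpty(authHeaders: Option[Set[String]]): Map[String, String] =
    authHeaders.fold(Map.empty[String, String])(_.map(s => s -> s).toMap)

  // map: the Option is kept, only its content is converted.
  def toOptionalMap(authHeaders: Option[Set[String]]): Option[Map[String, String]] =
    authHeaders.map(_.map(s => s -> s).toMap)
}
```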
Regarding your additional Note
This is the signature of flatMap on TraversableOnce. (http://www.scala-lang.org/api/2.11.1/index.html#scala.collection.TraversableOnce)
def flatMap[B](f: (A) ⇒ GenTraversableOnce[B]): TraversableOnce[B]
Here f can return any collection that is an instance of GenTraversableOnce.
So things like this are possible: Set(1,2,3).flatMap(i => List(i)) (not really a creative example, I know..)
I see Option as a special case.
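If you would rather keep the for comprehension, one possible workaround (an addition here, not part of the solutions above) is to turn the Option into a List first, so that the outer flatMap is the collection one and accepts a Set as the inner result. The object name is made up for the sketch.

```scala
object ForComprehensionDemo {
  val authHeaders: Option[Set[String]] = Some(Set("a", "b", "c"))

  // authHeaders.toList is a List with zero or one Set in it, so the
  // comprehension desugars to List.flatMap, which accepts any collection.
  val result: Map[String, String] =
    (for {
      headers <- authHeaders.toList
      header  <- headers
    } yield header -> header).toMap
}
```

For None, toList is empty and the result is an empty Map.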

Multi-Assignment based on Collection

Edit
Originally the question was "Collection to Tuple", as I assumed I needed a tuple in order to do variable multi-assignment. It turns out that one can do variable multi-assignment directly on collections. Retitled the question accordingly.
Original
Have a simple Seq[String] derived from a regex that I would like to convert to a Tuple.
What's the most direct way to do so?
I currently have:
val(clazz, date) = captures match {
case x: Seq[String] => (x(0), x(1))
}
Which is ok, but my routing layer has a bunch of regex matched routes that I'll be doing val(a,b,c) multi-assignment on (the capture group is always known since the route is not processed if regex does not match). Would be nice to have a leaner solution than match { case.. => ..}
What's the shortest 1-liner to convert collections to tuples in Scala?
This is not an answer to the question but might solve the problem in a different way.
You can match an xs: List[String] like so:
val a :: b :: c :: _ = xs
This assigns the first three elements of the list to a, b, and c. You can match other things, such as a Seq, in the declaration of a val just as you would inside a case statement. Be sure to handle match errors:
Catching MatchError at val initialisation with pattern matching in Scala?
You can make it slightly nicer using |> operator from Scalaz.
scala> val captures = Vector("Hello", "World")
captures: scala.collection.immutable.Vector[java.lang.String] = Vector(Hello, World)
scala> val (a, b) = captures |> { x => (x(0), x(1)) }
a: java.lang.String = Hello
b: java.lang.String = World
If you don't want to use Scalaz, you can define |> yourself as shown below:
scala> class AW[A](a: A) {
| def |>[B](f: A => B): B = f(a)
| }
defined class AW
scala> implicit def aW[A](a: A): AW[A] = new AW(a)
aW: [A](a: A)AW[A]
EDIT:
Or, something like @ziggystar's suggestion:
scala> val Vector(a, b) = captures
a: java.lang.String = Hello
b: java.lang.String = World
You can make it more concise as shown below:
scala> val S = Seq
S: scala.collection.Seq.type = scala.collection.Seq$@157e63a
scala> val S(a, b) = captures
a: java.lang.String = Hello
b: java.lang.String = World
As proposed by @ziggystar in the comments, you can do something like:
val (clazz, date) = { val a::b::_ = captures; (a, b)}
or
val (clazz, date) = (captures(0), captures(1))
If you verified the type of the list beforehand, this is OK, but take care with the length of the Seq: the code will compile even if the list has size 0 or 1, and will then fail at runtime.
Your question was originally specifically about assigning the individual capturing groups in a regex, which already allows you to assign from them directly:
scala> val regex = """(\d*)-(\d*)-(\d*)""".r
regex: scala.util.matching.Regex = (\d*)-(\d*)-(\d*)
scala> val regex(d, m, y) = "29-1-2012"
d: String = 29
m: String = 1
y: String = 2012
Obviously, you can use these in a case as well:
scala> "29-1-2012" match { case regex(d, m, y) => (y, m, d) }
res16: (String, String, String) = (2012,1,29)
and then group these as required.
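When the input might not match, the same extractor can be used in a match with a fallback case instead of a val pattern. The sketch below is an assumption-laden illustration: the names RegexCaptureDemo, DateRegex and parse are invented, and the pattern uses \d+ rather than \d* so that empty groups are rejected.

```scala
object RegexCaptureDemo {
  // Three groups of digits separated by dashes.
  val DateRegex = """(\d+)-(\d+)-(\d+)""".r

  // Returns (year, month, day) when the whole string matches, None otherwise.
  def parse(s: String): Option[(String, String, String)] =
    s match {
      case DateRegex(d, m, y) => Some((y, m, d))
      case _                  => None
    }
}
```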
Seqs to tuple
To perform multi-assignment from a Seq, what about the following?
val Seq(clazz, date) = captures
As you see, no need to restrict to Lists; this code will throw a MatchError if the length does not match (in your case, that's good because it means that you made a mistake). You can then add
(clazz, date)
to recreate the tuple.
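If a MatchError is not acceptable, the same Seq pattern can be used inside a match with a fallback case. This is a sketch only; the names SeqToTupleDemo and toPair are hypothetical.

```scala
object SeqToTupleDemo {
  // Some((clazz, date)) only when the Seq has exactly two elements.
  def toPair(captures: Seq[String]): Option[(String, String)] =
    captures match {
      case Seq(clazz, date) => Some((clazz, date))
      case _                => None
    }
}
```

A pattern such as Seq(clazz, date, _*) would instead accept two or more elements.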
Tuples from matches
However, Jed Wesley-Smith posted a solution which avoids this problem and solves the original question better. In particular, in your solution you have a Seq whose length is not specified, so if you make a mistake the compiler won't tell you; with tuples instead the compiler can help you (even if it can't check against the regexp).