Using span function to bisect a Map in Scala - scala

I have a Map of the form Map[Int, Option[/* of some type that I am using */]], thus:
scala> val t1: Map[Int,Option[(String,List[Int])]] = Map(500->Some("A",List(1,2,3)))
t1: Map[Int,Option[(String, List[Int])]] = Map(500 -> Some((A,List(1, 2, 3))))
scala> t1 + (400 -> Some("B",List(9,8,7))) + (300 -> None) + (200 -> None)
res6: scala.collection.immutable.Map[Int,Option[(String, List[Int])]] = Map(500 -> Some((A,List(1, 2, 3))), 400 -> Some((B,List(9, 8, 7))), 300 -> None, 200 -> None)
Now, I am trying to split it into two maps, one holding all the Key/Value pairs whose Value is empty and the other holding the rest, thus:
res6.span(e => e._2.isEmpty)
res7: (scala.collection.immutable.Map[Int,Option[(String, List[Int])]], scala.collection.immutable.Map[Int,Option[(String, List[Int])]]) = (Map(),Map(500 -> Some((A,List(1, 2, 3))), 400 -> Some((B,List(9, 8, 7))), 300 -> None, 200 -> None))
I am failing to understand why I am getting an empty Map on the left, while the < K,None > pairs are sitting blissfully inside the Map on the right. They should have been in the Map on the left, or so I expect.
What is the obvious thing that I am missing?

You should use partition instead of span.
Note: c span p is equivalent to (but possibly more efficient than) (c takeWhile p, c dropWhile p), provided the evaluation of the predicate p does not cause any side-effects.
So, span would stop scanning if the condition is not met.
For example,
scala> val l = List(1, 9, 8, 0)
scala> l.span(e => e < 2)
res7: (List[Int], List[Int]) = (List(1),List(9, 8, 0))
scala> l.partition(e => e < 2)
res8: (List[Int], List[Int]) = (List(1, 0),List(9, 8))
Note that span depends on iteration order, so it might return different results unless the underlying collection type is ordered.
In your case, the first entry the map iterates over is not None (a Map is not ordered), so span stops right away and the left Map is empty.
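As a minimal sketch of the suggestion above, here is partition applied to a map shaped like the one in the question. The names m, empties and nonEmpties are only illustrative.

```scala
// Same shape as the map in the question; the names are illustrative.
val m: Map[Int, Option[(String, List[Int])]] = Map(
  500 -> Some(("A", List(1, 2, 3))),
  400 -> Some(("B", List(9, 8, 7))),
  300 -> None,
  200 -> None
)

// partition tests every entry, so the iteration order of the Map does not matter.
val (empties, nonEmpties) = m.partition { case (_, v) => v.isEmpty }
// empties    holds the keys 300 and 200
// nonEmpties holds the keys 500 and 400
```

Unlike span, this gives the same split whichever entry the map happens to visit first.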

By definition, span returns a Tuple2 of collections of the same type as the original collection: the first holds the longest prefix of elements that satisfy the predicate, and the second holds everything from the first failing element onward.
def span(p: A => Boolean): (Repr, Repr) =
In your case
res6.span(e => e._2.isEmpty)
So in your case the Tuple2 has an empty first component (the first entry visited does not satisfy the predicate) and a second component that holds all the entries.
If you wish to get that second component, simply use _2:
val nonEmptyValue = res6.span(e => e._2.isEmpty)._2

Related

Scala: How to "map" an Array[Int] to a Map[String, Int] using the "map" method?

I have the following Array[Int]: val array = Array(1, 2, 3), for which I have the following mapping relation between an Int and a String:
val a1 = array.map{
case 1 => "A"
case 2 => "B"
case 3 => "C"
}
To create a Map to contain the above mapping relation, I am aware that I can use a foldLeft method:
val a2 = array.foldLeft(Map[String, Int]()) { (m, e) =>
m + (e match {
case 1 => ("A", 1)
case 2 => "B" -> 2
case 3 => "C" -> 3
})
}
which outputs:
a2: scala.collection.immutable.Map[String,Int] = Map(A -> 1, B -> 2, C -> 3)
This is the result I want. But can I achieve the same result via the map method?
The following code does not work:
val a3 = array.map[(String, Int), Map[String, Int]] {
case 1 => ("A", 1)
case 2 => ("B", 2)
case 3 => ("C", 3)
}
The signature of map is
def map[B, That](f: A => B)
(implicit bf: CanBuildFrom[Repr, B, That]): That
What is this CanBuildFrom[Repr, B, That]? I tried to read Tribulations of CanBuildFrom but don't really understand it. That article mentioned Scala 2.12+ has provided two implementations for map. But how come I didn't find it when I use Scala 2.12.4?
I mostly use Scala 2.11.12.
Call toMap at the end of your expression:
val a3 = array.map {
case 1 => ("A", 1)
case 2 => ("B", 2)
case 3 => ("C", 3)
}.toMap
I'll first define your function here for the sake of brevity in later explanation:
// worth noting that this function is effectively partial
// i.e. will throw a `MatchError` if n is not in (1, 2, 3)
def toPairs(n: Int): (String, Int) =
n match {
case 1 => "a" -> 1
case 2 => "b" -> 2
case 3 => "c" -> 3
}
One possible way to go (as already highlighted in another answer) is to use toMap, which only works on collection of pairs:
val ns = Array(1, 2, 3)
ns.toMap // doesn't compile
ns.map(toPairs).toMap // does what you want
It is worth noting however that unless you are working with a lazy representation (like an Iterator or a Stream) this will result in two passes over the collection and the creation of unnecessary intermediate collections: the first time by mapping toPairs over the collection and then by turning the whole collection from a collection of pairs to a Map (with toMap).
You can see it clearly in the implementation of toMap.
As suggested in the article you already linked in the question (and in particular here), you can avoid this double pass in two ways:
you can leverage scala.collection.breakOut, an implementation of CanBuildFrom that you can give map (among others) to change the target collection, provided that you explicitly provide a type hint for the compiler:
val resultMap: Map[String, Int] = ns.map(toPairs)(collection.breakOut)
val resultSet: Set[(String, Int)] = ns.map(toPairs)(collection.breakOut)
otherwise, you can create a view over your collection, which puts it in the lazy wrapper that you need for the operation to not result in a double pass
ns.view.map(toPairs).toMap
You can read more about implicit builder providers and views in this Q&A.
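Since toPairs above is effectively partial, one option is collect, which takes a partial function and skips inputs it is not defined for. This is only a sketch: the extra element 4 and the name safeMap are assumptions for illustration, and like map followed by toMap it still builds an intermediate collection.

```scala
val ns = Array(1, 2, 3, 4) // 4 has no mapping in this example

// collect applies the partial function only where it is defined,
// so the unmapped 4 is skipped instead of raising a MatchError.
val safeMap: Map[String, Int] = ns.collect {
  case 1 => "a" -> 1
  case 2 => "b" -> 2
  case 3 => "c" -> 3
}.toMap
```

Whether silently dropping unmapped values is the right behaviour depends on your use case; throwing may be preferable if an unmapped value indicates a bug.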
Basically toMap (credits to Sergey Lagutin) is the right answer.
You could actually make the code a bit more compact though:
val a1 = array.map { i => ((i + 64).toChar, i) }.toMap
If you run this code:
val array = Array(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 0)
val a1 = array.map { i => ((i + 64).toChar, i) }.toMap
println(a1)
You will see this on the console:
Map(E -> 5, J -> 10, F -> 6, A -> 1, @ -> 0, G -> 7, L -> 12, B -> 2, C -> 3, H -> 8, K -> 11, D -> 4)

scala map function of map vs. list

Snippet 1:
val l = List(1,2,43,4)
l.map(i => i *2)
Snippet 2:
val s = "dsadadaqer12"
val g = s.groupBy(c=>c)
g.map ( {case (c,s) => (c,s.length)})
In snippet #2, the syntax is different from #1, i.e. curly braces are required -- why?
I thought the following would compile, but it does not:
g.map ( (c,s) => (c,s.length))
Can someone explain why?
Thanks
The difference between the two is - the latter uses Pattern Matching and the former doesn't.
The syntax g.map({case (c,s) => (c,s.length)}) is just syntax sugar for:
g.map(v => v match { case (c,s) => (c,s.length) })
Which means: we name the input argument of our anonymous function v, and then in the function body we match it to a tuple (c,s). Since this is so useful, Scala provides the shorthand version you used.
Of course - this doesn't really have anything to do with whether you use a Map or a List - consider all the following possibilities:
scala> val l = List(1,2,43,4)
l: List[Int] = List(1, 2, 43, 4)
scala> l.map({ case i => i*2 })
res0: List[Int] = List(2, 4, 86, 8)
scala> val l2 = List((1,2), (3,4))
l2: List[(Int, Int)] = List((1,2), (3,4))
scala> l2.map({ case (i, j) => i*j })
res1: List[Int] = List(2, 12)
scala> val g = Map(1 -> 2, 3 -> 4)
g: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 3 -> 4)
scala> g.map(t => t._1 * t._2)
res2: scala.collection.immutable.Iterable[Int] = List(2, 12)
Both Map and List can use both syntax options, depending mostly on what you actually want to do.
1- g.map{case (c,s) => (c,s.length)}
2- g.map((c,s) => (c,s.length))
The map method pulls a single argument, a 2-tuple, from the g collection. The 1st example compiles because the case statement uses pattern matching to extract the tuple's elements whereas the 2nd example doesn't and it won't compile. For that you'd have to do something like: g.map(t => (t._1, t._2.length))
As for the parenthesis vs. curly braces: braces have always been required for "partial functions," which is what that case statement is. You can use either braces or parens for anonymous functions (i.e. x => ...) although you are required to use braces if the function is more than a single line (i.e. has a carriage-return).
I read somewhere that this parens/braces distinction might be relaxed but I don't know if that's going to happen any time soon.
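To put the options side by side, here is a small sketch in Scala 2 syntax of three ways to pass a function over the (key, value) tuples of g. The names viaCase, viaAccessors and viaTupled are only illustrative. Function.tupled is the standard-library helper that turns a two-argument function into a function of one tuple.

```scala
val g: Map[Char, String] = "dsadadaqer12".groupBy(c => c)

// 1. Pattern-matching anonymous function: destructures the tuple.
val viaCase = g.map { case (c, s) => (c, s.length) }

// 2. Ordinary one-argument function using the tuple accessors.
val viaAccessors = g.map(t => (t._1, t._2.length))

// 3. A two-argument function adapted so that it accepts a single tuple.
val viaTupled = g.map(Function.tupled((c: Char, s: String) => (c, s.length)))
```

All three produce the same Map[Char, Int]. As far as I know, Scala 3 relaxes this with parameter untupling, so g.map((c, s) => (c, s.length)) is accepted there.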

fold left operation in Scala?

I am having difficulty understanding how fold left works in Scala.
The following code computes for each unique character in the list chars the number of
times it occurs. For example, the invocation
times(List('a', 'b', 'a'))
should return the following (the order of the resulting list is not important):
List(('a', 2), ('b', 1))
def times(chars: List[Char]): List[(Char, Int)] = {
def incr(acc: Map[Char,Int], c: Char) = {
val count = (acc get c).getOrElse(0) + 1
acc + ((c, count));
}
val map = Map[Char, Int]()
(map /: chars)(incr).iterator.toList
}
I am just confused as to what the last line of this function is actually doing?
Any help would be great.
Thanks.
foldLeft in scala works like this:
suppose you have a list of integers,
val nums = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
val res= nums.foldLeft(0)((m: Int, n: Int) => m + n)
you will get res=55.
Let's visualise it.
val res1 = nums.foldLeft(0) { (m: Int, n: Int) => println("m: " + m + " n: " + n);
m + n }
m: 0 n: 1
m: 1 n: 2
m: 3 n: 3
m: 6 n: 4
m: 10 n: 5
m: 15 n: 6
m: 21 n: 7
m: 28 n: 8
m: 36 n: 9
m: 45 n: 10
So we can see that we need to pass the initial accumulator value as the first argument of foldLeft. The accumulated value is held in 'm', and the next element of the list arrives in 'n'.
And finally we get the accumulator as result.
Let's start from the "last line" which you are asking about: as the Map trait extends Iterable which in turn extends Traversable where the operator /: is explained, the code (map /: chars)(incr) does fold-left over chars, with the initial value of the accumulator being the empty mapping from characters to integers, applying incr to each intermediate value of acc and each element c of chars.
For example, when chars is List('a', 'b', 'a', 'c'), the fold-left expression (map /: chars)(incr) equals incr(incr(incr(incr(Map[Char, Int](), 'a'), 'b'), 'a'), 'c').
Now, as for what incr does: it takes an intermediate mapping acc from characters to integers, along with a character c, and increments by 1 the integer corresponding to c in the mapping. (Strictly speaking, the mapping is immutable and therefore never mutated: instead, a new, updated mapping is created and returned. Also, getOrElse(0) says that, if c does not exist in acc, the integer to be incremented is considered 0.)
As a whole, given List('a', 'b', 'a', 'c') as chars for example, the final mapping would be List(('a', 2), ('b', 1), ('c', 1)) when converted to a list by toList.
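One way to see the nested incr calls described above is scanLeft, which returns every intermediate accumulator instead of only the last one. This is a sketch: incr is restated as a standalone method, and the names steps and result are only illustrative.

```scala
// Same logic as incr in the question, restated as a standalone method.
def incr(acc: Map[Char, Int], c: Char): Map[Char, Int] =
  acc + (c -> (acc.getOrElse(c, 0) + 1))

val chars = List('a', 'b', 'a')

// scanLeft keeps each intermediate accumulator, starting with the initial value:
// List(Map(), Map(a -> 1), Map(a -> 1, b -> 1), Map(a -> 2, b -> 1))
val steps = chars.scanLeft(Map.empty[Char, Int])(incr)

// foldLeft (the same operation as /:) keeps only the last of those values.
val result = chars.foldLeft(Map.empty[Char, Int])(incr)
```

The last element of steps is exactly what the fold returns before toList is applied.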
I rewrote your function in a more verbose way:
def times(chars: List[Char]): List[(Char, Int)] = {
chars
.foldLeft(Map[Char, Int]()){ (acc, c) =>
acc + ((c, acc.getOrElse(c, 0) + 1))
}
.toList
}
Let's see the first steps on times("aba".toList)
First invocation:
(Map(), 'a') => Map() ++ Map('a' -> 1)
Second invocation:
(Map('a' -> 1), 'b') => Map('a' -> 1) ++ Map('b' -> 1)
Third invocation:
(Map('a' -> 1, 'b' ->1), 'a') =>
Map('a' -> 1, 'b' ->1) ++ Map('a' -> 2) =>
Map('a' -> 2, 'b' ->1)
The actual implementation in the scala codebase is very concise:
def foldLeft[B](z: B)(f: (B, A) => B): B = {
var acc = z
var these = this
while (!these.isEmpty) {
acc = f(acc, these.head)
these = these.tail
}
acc
}
Let me rename stuff for clarity:
def foldLeft[B](initialValue: B)(f: (B, A) => B): B = {
//Notice that both accumulator and collectionCopy are `var`s! They are reassigned each time in the loop.
var accumulator = initialValue
//start from the whole collection (this is a reference, not an actual copy)
var collectionCopy = this //the function is inside a collection class, so **this** is the collection
while (!collectionCopy.isEmpty) {
accumulator = f(accumulator, collectionCopy.head)
collectionCopy = collectionCopy.tail
}
accumulator
}
Edit after comment:
Let us now revisit the OP's function and rewrite it in an imperative manner (i.e. non-functional, which apparently is the source of confusion):
(map /: chars)(incr) is exactly equivalent to chars.foldLeft(map)(incr), which can be imperatively rewritten as:
def foldLeft(initialValue: Map[Char,Int])(incrFunction: (Map[Char,Int], Char) => Map[Char,Int]): Map[Char,Int] = {
//Notice that both accumulator and charList are `var`s! They are reassigned each time in the loop.
var accumulator = initialValue
//start from the whole collection (this is a reference, not an actual copy)
var charList: List[Char] = this //the function is inside a collection class, so **this** is the collection
while (!charList.isEmpty) {
accumulator = incrFunction(accumulator, charList.head)
charList = charList.tail
}
accumulator
}
I hope this makes the concept of foldLeft clearer.
So it is essentially an abstraction over an imperative while loop, that accumulates some value by traversing the collection and updating the accumulator. The accumulator is updated using a user-provided function that takes the previous value of the accumulator and the current item of the collection.
Its very description hints that it is a great tool to compute all sorts of aggregates on a collection, like sum, max etc. Yeah, scala collections actually provide all these functions, but they serve as a good example use case.
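As a short sketch of those aggregates, here are a sum and a maximum written with foldLeft. The sample list and the names total and largest are only illustrative; in practice you would call the built-in sum and max.

```scala
val nums = List(3, 1, 4, 1, 5)

// Sum: start from 0 and add each element to the accumulator.
val total = nums.foldLeft(0)((acc, n) => acc + n)

// Max: start from the smallest Int and keep the larger of the two at each step.
val largest = nums.foldLeft(Int.MinValue)((acc, n) => math.max(acc, n))
```

Here total is 14 and largest is 5. Only the initial value and the combining function change between the two; the traversal is the same.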
On the specifics of your question, let me point out that this can be easily done using groupBy:
def times(l: List[Char]) = l.groupBy(c => c).mapValues(_.size).toList
times(List('a','b','a')) // outputs List[(Char, Int)] = List((b,1), (a,2))
.groupBy(c => c) gives you Map[Char,List[Char]] = Map(b -> List(b), a -> List(a, a))
Then we use .mapValues(_.size) to map the values of the map to the size of the grouped sub-collections: Map[Char,Int] = Map(b -> 1, a -> 2).
Finally, you convert the map to a list of key-value tuples with .toList to get the final result.
Lastly, if you don't care about the order of the output list as you said, then leaving the output as a Map[Char,Int] conveys better this decision (instead of converting it to a list).

What is the inverse of intercalate, and how to implement it?

This question discusses how to interleave two lists in an alternating fashion, i.e. intercalate them.
What is the inverse of "intercalate" called?
Is there an idiomatic way to implement this in Scala?
The topic is discussed on this Haskell IRC session.
Possibilities include "deintercalate", "extracalate", "ubercalate", "outercalate", and "chocolate" ;-)
Assuming we go for "extracalate", it can be implemented as a fold:
def extracalate[A](a: List[A]) =
a.foldRight((List[A](), List[A]())){ case (b, (a1,a2)) => (b :: a2, a1) }
For example:
val mary = List("Mary", "had", "a", "little", "lamb")
extracalate(mary)
//> (List(Mary, a, lamb),List(had, little))
Note that the original lists can only be reconstructed if either:
the input lists were the same length, or
the first list was 1 longer than the second list
The second case actually turns out to be useful for the geohashing algorithm, where the latitude bits and longitude bits are intercalated, but there may be an odd number of bits.
Note also that the definition of intercalate in the linked question is different from the definition in the Haskell libraries, which intersperses a list in between a list of lists!
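The reconstruction condition can be checked with a small round trip. The interleave function below is a hypothetical stand-in for the alternating "intercalate" of the linked question, not a library function, and the sample lists are made up for illustration.

```scala
def extracalate[A](a: List[A]): (List[A], List[A]) =
  a.foldRight((List.empty[A], List.empty[A])) { case (b, (a1, a2)) => (b :: a2, a1) }

// Hypothetical inverse: alternate elements, then append whatever is left over.
def interleave[A](xs: List[A], ys: List[A]): List[A] = (xs, ys) match {
  case (x :: xt, y :: yt) => x :: y :: interleave(xt, yt)
  case (_, Nil)           => xs
  case (Nil, _)           => ys
}

// First list one longer than the second: the round trip recovers both lists.
val ok = extracalate(interleave(List(1, 2, 3), List(10, 20)))

// Second list longer than the first: the original split is lost.
val lost = extracalate(interleave(List(1), List(10, 20, 30)))
```

Here ok is (List(1, 2, 3), List(10, 20)), while lost comes back as (List(1, 20), List(10, 30)) rather than the two input lists.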
Update: As for any fold, we supply a starting value and a function to apply to each value of the input list. This function modifies the starting value and passes it to the next step of the fold.
Here, we start with a pair of empty output lists: (List[A](), List[A]())
Then for each element in the input list, we add it onto the front of one of the output lists using cons ::. However, we also swap the order of the two output lists each time the function is invoked; (a1, a2) becomes (b :: a2, a1). This divides the input list between the two output lists in alternating fashion. Because it's a right fold, we start at the end of the input list, which is necessary to get each output list in the correct order. Proceeding from the starting value to the final value, we would get:
([], [])
([lamb], [])
([little],[lamb])
([a, lamb],[little])
([had, little],[a, lamb])
([Mary, a, lamb],[had, little])
Also, using standard methods
val mary = List("Mary", "had", "a", "little", "lamb")
//> mary : List[String] = List(Mary, had, a, little, lamb)
val (f, s) = mary.zipWithIndex.partition(_._2 % 2 == 0)
//> f : List[(String, Int)] = List((Mary,0), (a,2), (lamb,4))
//| s : List[(String, Int)] = List((had,1), (little,3))
(f.unzip._1, s.unzip._1)
//> res0: (List[String], List[String]) = (List(Mary, a, lamb),List(had, little))
Not really recommending it, though, the fold will beat it hands down on performance
Skinning the cat another way
val g = mary.zipWithIndex.groupBy(_._2 % 2)
//> g : scala.collection.immutable.Map[Int,List[(String, Int)]] = Map(1 -> List
//| ((had,1), (little,3)), 0 -> List((Mary,0), (a,2), (lamb,4)))
(g(0).unzip._1, g(1).unzip._1)
//> res1: (List[String], List[String]) = (List(Mary, a, lamb),List(had, little))
Also going to be slow
I think it's inferior to @DNA's answer as it's more code and it requires passing through the list twice.
scala> list
res27: List[Int] = List(1, 2, 3, 4, 5)
scala> val first = list.zipWithIndex.filter( x => x._2 % 2 == 0).map(x => x._1)
first: List[Int] = List(1, 3, 5)
scala> val second = list.zipWithIndex.filter( x => x._2 % 2 == 1).map(x => x._1)
second: List[Int] = List(2, 4)
scala> (first, second)
res28: (List[Int], List[Int]) = (List(1, 3, 5),List(2, 4))

Cannot prove that Unit <:< (T, U)

When trying to remove all Unit - () from a list, I tried to call toMap.
scala> List((), ()).filter(_ != ()).toMap
<console>:8: error: Cannot prove that Unit <:< (T, U).
List((), ()).filter(_ != ()).toMap
^
What does this error mean?
For a List, I'd like to create a map of all tuples (String, String) for non-Unit elements, but some of the values can be null.
scala> val x = List((), (), (3,4)).filter(_ != ()).toMap
<console>:7: error: Cannot prove that Any <:< (T, U).
val x = List((), (), (3,4)).filter(_ != ()).toMap
^
scala> val x = List((), (), (3,4)).filter(_ != ())
x: List[Any] = List((3,4))
scala> x.toMap
<console>:9: error: Cannot prove that Any <:< (T, U).
x.toMap
^
Ah! Now your other question makes a little more sense. Still not sure what you're doing to produce this mixed Unit/Tuple2 list though.
This should work:
List((), (), (3,4)).collect { case t@(_: Int, _: Int) => t }.toMap
Note that I'm using variable binding here (binding the match to t) to return the same Tuple2 instance we matched rather than creating a new one.
By using collect you convert the type of your list from List[Any] to List[(Int, Int)], which is what toMap wants since it's expecting some List[(A,B)].
Note: Although this answer should work for you, I still think your design is flawed. You'd be better off fixing the underlying design flaw rather than treating the symptoms like this.
It looks like this would be a good fit for using Scala's Option type. In this case, your sample list would become List(None, None, Some((3,4))), or you could write it as List(None, None, Some(3->4)) for readability (nested parenthesis like that can get confusing).
If you use Option then the type of your list becomes List[Option[(Int, Int)]], which should be much nicer to deal with than a List[Any]. To get rid of the None entries and get the desired List[(Int,Int)] you can just call flatten:
List(None, None, Some(3->4)).flatten
// res0: List[(Int, Int)] = List((3,4))
List(None, None, Some(3->4)).flatten.toMap
// res1: scala.collection.immutable.Map[Int,Int] = Map(3 -> 4)
However, it would be even better if you can avoid putting the None entries in your list in the first place. If you're producing this list using a Scala for comprehension, you could use a guard in your for expression to remove the invalid elements from the output.
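As a rough sketch of the guard idea: the input list and the condition below are made up for illustration, since the question does not show how the original list is produced.

```scala
val candidates = List(1, 2, 3, 4, 5, 6) // illustrative input

// The guard (if ...) drops unwanted elements before yield runs,
// so no placeholder values ever reach the resulting list.
val squaresOfEvens: Map[Int, Int] = (for {
  n <- candidates
  if n % 2 == 0
} yield n -> (n * n)).toMap
```

The for expression yields a List[(Int, Int)] directly, so toMap compiles without any filtering afterwards.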
It means that the type of an element in the list can't be viewed as a tuple which is required to build a Map. A Map in a sense is a collection of tuples (and more).
Illustration:
scala> List(1).toMap
<console>:8: error: Cannot prove that Int <:< (T, U).
List(1).toMap
^
scala> List(1 -> 2).toMap
res1: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2)
I can build a map from a list of tuples, but not from a list of single cardinality elements.
Maybe you mean to say .map instead of .toMap? ;)
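The wording of the error comes from the implicit evidence parameter that toMap asks for, of type A <:< (T, U), where A is the element type. The sketch below uses the same technique in a hypothetical helper called pairsToMap (not a library method) to show what that evidence does.

```scala
// Hypothetical helper mirroring the evidence parameter that toMap requires.
// The compiler can only supply ev when A is known to be a subtype of (K, V).
def pairsToMap[A, K, V](xs: List[A])(implicit ev: A <:< (K, V)): Map[K, V] =
  xs.foldLeft(Map.empty[K, V]) { (m, x) => m + ev(x) } // ev(x) views x as a (K, V)

val fromPairs = pairsToMap(List(1 -> "a", 2 -> "b"))

// pairsToMap(List(1, 2)) would not compile: Cannot prove that Int <:< (K, V).
```

With a List[Any] or List[Unit] the compiler has no way to produce that evidence, which is exactly what the message says.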
All in one go:
scala> val l2 = List(1 -> 3, (), 4 -> 4, (), 9 -> 4, (), 16 -> 7)
l2: List[Any] = List((1,3), (), (4,4), (), (9,4), (), (16,7))
scala> (l2 collect { case (a, b) => (a, b) }).toMap
res4: scala.collection.immutable.Map[Any,Any] = Map(1 -> 3, 4 -> 4, 9 -> 4, 16 -> 7)
Better typed:
scala> (l2 collect { case (i: Int, j: Int) => (i, j) }).toMap
res5: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3, 4 -> 4, 9 -> 4, 16 -> 7)