I'm just trying to understand the logic behind the implementation of Scala's List[A] map[B]
Scala 2.12 describes it as such:
final override def map[B](f: A => B): List[B] = {
if (this eq Nil) Nil else {
val h = new ::[B](f(head), Nil)
var t: ::[B] = h
var rest = tail
while (rest ne Nil) {
val nx = new ::(f(rest.head), Nil)
t.next = nx
t = nx
rest = rest.tail
}
releaseFence()
h
}
}
I am confused as to how h can be a new list with f applied to each of its elements. h is declared as a val, so how can it be affected by what happens inside the while loop?
Does reassigning t retroactively change the value of h? Wouldn't that make h mutable, though?
As a reference I'll use the version of List.scala from the latest available commit on the 2.12.x branch of the Scala repository on GitHub, which is slightly different but similar enough.
final override def map[B, That](f: A => B)(implicit bf: CanBuildFrom[List[A], B, That]): That = {
if (isLikeListReusableCBF(bf)) {
if (this eq Nil) Nil.asInstanceOf[That] else {
val h = new ::[B](f(head), Nil)
var t: ::[B] = h
var rest = tail
while (rest ne Nil) {
val nx = new ::(f(rest.head), Nil)
t.tl = nx
t = nx
rest = rest.tail
}
h.asInstanceOf[That]
}
}
else super.map(f)
}
First, notice that a val is an immutable reference, not an immutable value. You can mutate an object through an immutable reference if you have access to its mutable state. Notice how t starts out as a mutable reference to the same cell as h (the value that is ultimately returned), and then a new single-element cell is attached with t.tl = nx (which corresponds to t.next = nx in your snippet, I believe). Reassigning t afterwards only moves t to the new last cell; h still points at the first cell, whose tail has been mutated in place.
That tl field is the tail of the :: type, kept as mutable state for internal usage only, as you can see here:
/** A non empty list characterized by a head and a tail.
* @param head the first element of the list
* @param tl the list containing the remaining elements of this list after the first one.
* @tparam B the type of the list elements.
* @author Martin Odersky
* @since 2.8
*/
@SerialVersionUID(509929039250432923L) // value computed by serialver for 2.11.2, annotation added in 2.11.4
final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B] {
override def tail : List[B] = tl
override def isEmpty: Boolean = false
}
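To make the "immutable reference, mutable object" point concrete, here is a minimal sketch using a hypothetical linked structure (Chain, End, Link and ChainDemo are illustrative names, not the standard library's List). It assumes nothing beyond plain Scala and mirrors the loop in map: h is a val that always points at the first cell, while t is a var that walks forward.

```scala
// Hypothetical mini linked list, NOT scala.List; it only mirrors the idea.
sealed trait Chain[A] { def toList: List[A] }

final class End[A] extends Chain[A] {
  def toList: List[A] = Nil
}

// `next` is mutable state, like the internal `tl` field of `::`.
final class Link[A](val head: A, var next: Chain[A]) extends Chain[A] {
  def toList: List[A] = head :: next.toList
}

object ChainDemo {
  def mapChain[A, B](xs: List[A])(f: A => B): Chain[B] = xs match {
    case Nil => new End[B]
    case first :: others =>
      val h = new Link[B](f(first), new End[B]) // val: h never points elsewhere
      var t = h                                  // var: t tracks the last cell
      var rest = others
      while (rest.nonEmpty) {
        val nx = new Link[B](f(rest.head), new End[B])
        t.next = nx      // mutates the cell t points at (at first, the same cell as h)
        t = nx           // rebinds t only; h is untouched
        rest = rest.tail
      }
      h
  }
}
```

The reference h is never reassigned; what changes is the next field of the cells reachable from it.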
The migration guide to Scala 2.13 explains that Traversable has been removed and that Iterable should be used instead. This change is particularly annoying for one project, which is using a visitor to implement the foreach method in the Node class of a tree:
case class Node(val subnodes: Seq[Node]) extends Traversable[Node] {
override def foreach[A](f: Node => A) = Visitor.visit(this, f)
}
object Visitor {
def visit[A](n: Node, f: Node => A): Unit = {
f(n)
for (sub <- n.subnodes) {
visit(sub, f)
}
}
}
object Main extends App {
val a = Node(Seq())
val b = Node(Seq())
val c = Node(Seq(a, b))
for (Node(subnodes) <- c) {
Console.println("Visiting a node with " + subnodes.length + " subnodes")
}
}
Output:
Visiting a node with 2 subnodes
Visiting a node with 0 subnodes
Visiting a node with 0 subnodes
An easy fix to migrate to Scala 2.13 is to first store the visited elements in a remaining buffer, which is then used to return an iterator:
import scala.collection.mutable
case class Node(val subnodes: Seq[Node]) extends Iterable[Node] {
override def iterator: Iterator[Node] = {
val remaining = mutable.Queue.empty[Node]
// Enqueue directly into the local buffer while the visitor walks the tree.
Visitor.visit(this, (item: Node) => remaining.enqueue(item))
remaining.iterator
}
}
// Same Visitor object
// Same Main object
This solution has the disadvantage that it introduces new allocations that put pressure on the GC, because the number of visited elements is usually quite large.
Do you have suggestions on how to migrate from Traversable into Iterable, using the existing visitor but without introducing new allocations?
As you noticed, you need to extend Iterable instead of Traversable. You can do it like this:
case class Node(name: String, subnodes: Seq[Node]) extends Iterable[Node] {
override def iterator: Iterator[Node] = Iterator(this) ++ subnodes.flatMap(_.iterator)
}
val a = Node("a", Seq())
val b = Node("b", Seq())
val c = Node("c", Seq(a, b))
val d = Node("d", Seq(c))
for (node @ Node(name, _) <- d) {
Console.println("Visiting node " + name + " with " + node.subnodes.length + " subnodes")
}
outputs:
Visiting node d with 1 subnodes
Visiting node c with 2 subnodes
Visiting node a with 0 subnodes
Visiting node b with 0 subnodes
Then you can do more operations such as:
d.count(_.subnodes.length > 1)
Code run at Scastie.
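If buffering is the main concern, a pre-order iterator can also be written with an explicit stack of pending child iterators, so that only the path currently being walked is kept alive. The following is a sketch under assumptions: TreeNode and TreeWalk are hypothetical names used to keep the example self-contained, and it still allocates one small iterator per visited node.

```scala
// Hypothetical standalone tree type; not the Node class from the question.
final case class TreeNode(name: String, subnodes: Seq[TreeNode])

object TreeWalk {
  // Depth-first, pre-order traversal that never materialises the whole tree.
  def preOrder(root: TreeNode): Iterator[TreeNode] = new Iterator[TreeNode] {
    // Stack of child iterators that still may have nodes to yield.
    private var stack: List[Iterator[TreeNode]] = List(Iterator.single(root))

    def hasNext: Boolean = {
      // Drop exhausted iterators from the top of the stack.
      while (stack.nonEmpty && !stack.head.hasNext) stack = stack.tail
      stack.nonEmpty
    }

    def next(): TreeNode = {
      if (!hasNext) throw new NoSuchElementException("no more nodes")
      val node = stack.head.next()
      // Visit this node's children before its remaining siblings.
      stack = node.subnodes.iterator :: stack
      node
    }
  }
}
```

A Node class could delegate its iterator method to a function of this shape.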
Here is an example showing that your code can be implemented with LazyList, with no visitor needed:
case class Node(val subnodes: Seq[Node]) {
def recursiveMap[A](f: Node => A): LazyList[A] = {
def expand(node: Node): LazyList[Node] = node #:: LazyList.from(node.subnodes).flatMap(expand)
expand(this).map(f)
}
}
val a = Node(Seq())
val b = Node(Seq())
val c = Node(Seq(a, b))
val lazyList = c.recursiveMap { node =>
println("computing value")
"Visiting a node with " + node.subnodes.length + " subnodes"
}
println("started computing values")
lazyList.iterator.foreach(println)
output
started computing values
computing value
Visiting a node with 2 subnodes
computing value
Visiting a node with 0 subnodes
computing value
Visiting a node with 0 subnodes
If you keep only the iterator and do not hold a reference to lazyList yourself, the JVM can garbage-collect values as you go.
We ended up writing a minimal Traversable trait, implementing just the methods that are used in our codebase. This way there is no additional overhead and the visitor's logic doesn't need to be changed.
import scala.collection.mutable
/** A trait for traversable collections. */
trait Traversable[+A] {
self =>
/** Applies a function to all elements of the collection. */
def foreach[B](f: A => B): Unit
/** Creates a filter of this traversable collection. */
def withFilter(p: A => Boolean): Traversable[A] = new WithFilter(p)
class WithFilter(p: A => Boolean) extends Traversable[A] {
/** Applies a function to all filtered elements of the outer collection. */
def foreach[U](f: A => U): Unit = {
for (x <- self) {
if (p(x)) f(x)
}
}
/** Further refines the filter of this collection. */
override def withFilter(q: A => Boolean): WithFilter = {
new WithFilter(x => p(x) && q(x))
}
}
/** Finds the first element of this collection for which the given partial
* function is defined, and applies the partial function to it.
*/
def collectFirst[B](pf: PartialFunction[A, B]): Option[B] = {
for (x <- self) {
if (pf.isDefinedAt(x)) {
return Some(pf(x))
}
}
None
}
/** Builds a new collection by applying a partial function to all elements
* of this collection on which the function is defined.
*/
def collect[B](pf: PartialFunction[A, B]): Iterable[B] = {
val elements = mutable.Queue.empty[B]
for (x <- self) {
if (pf.isDefinedAt(x)) {
elements.append(pf(x))
}
}
elements
}
}
This is a sequel to Why can't I run such scala code? (reduceRight List class method)
I am following the accepted answer there to rename my abstract class and add list: List[T] as a parameter of the method foldRight.
abstract class ListT {
def foldRight[U](z : U)(list: List[T], op: (T, U) => U): U = list match {
case Nil => z
case x :: xs => op(x, foldRight(z)(xs, op))
}
}
But I still get the error for the 'def' line
multiple markers at this line, not found: type T
Let's start from scratch; that should help you grasp the concepts.
We will create our own simple functional List.
It will be called MyList, will have an empty list called MyNil and the cons class / operator as :!:.
// Here we create the type MyList.
// The sealed is used to signal that the only valid implementations
// will be part of this file. This is because, a List is an ADT.
// The A (which could be a T or whatever) is just a type parameter
// it means that our list can work with any arbitrary type (like Int, String or My Class)
// and we just give it a name, in order to be able to refer to it in the code.
// Finally, the plus (+) sign, tells the compiler that MyList is covariant in A.
// That means: If A <: B Then MyList[A] <: MyList[B]
// (<: means subtype of)
sealed trait MyList[+A] {
def head: A // Here we say that the head of a List of As is an A.
def tail: MyList[A] // As well, a tail of a List of As is another list of As.
// Here we define the cons operator, to prepend elements to the list.
// You can see that it will just create a new cons class with the new element as the head & this as the tail.
// Now, you may be wondering why we added a new type B and why it must be a super type of A
// You can check out this answer of mine:
// https://stackoverflow.com/questions/54163830/implementing-a-method-inside-a-scala-parameterized-class-with-a-covariant-type/54164135#54164135
final def :!:[B >: A](elem: B): MyList[B] =
new :!:(elem, this)
// Finally, foldRight!
// You can see that we added a new type parameter B.
// In this case, it does not have any restriction because of the way fold works.
final def foldRight[B](z: B)(op: (A, B) => B): B = this match {
case MyNil => z
case h :!: t => op(h, t.foldRight(z)(op))
}
}
object MyList {
// Factory.
def apply[A](elems: A*): MyList[A] =
if (elems.nonEmpty) {
elems.head :!: MyList(elems.tail : _*)
} else {
MyNil
}
}
// Implementations of the MyList trait.
final case class :!:[+A](head: A, tail: MyList[A]) extends MyList[A]
final case object MyNil extends MyList[Nothing] {
override def head = throw new NoSuchElementException("head of empty list")
override def tail = throw new NoSuchElementException("tail of empty list")
}
Now you can:
val l1 = MyList(2, 3, 4) // l1: MyList[Int] = 2 :!: 3 :!: 4 :!: MyNil
val l2 = 1 :!: l1 // l2: MyList[Int] = 1 :!: 2 :!: 3 :!: 4 :!: MyNil
val sum = l2.foldRight(0)(_ + _) // sum: Int = 10
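Coming back to the original error: "not found: type T" appears because T is never declared anywhere. A minimal sketch of one possible fix, assuming you want to keep the signature from the question, is to declare T as a type parameter of the method (declaring it on the class, as in abstract class ListT[T], would work as well). FoldRightFix is a hypothetical wrapper object used only to keep the example self-contained.

```scala
object FoldRightFix {
  // T is now declared as a type parameter, so the compiler can resolve it.
  def foldRight[T, U](z: U)(list: List[T], op: (T, U) => U): U = list match {
    case Nil     => z
    case x :: xs => op(x, foldRight(z)(xs, op))
  }
}
```

It can then be called as FoldRightFix.foldRight(0)(List(1, 2, 3), (x: Int, acc: Int) => x + acc).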
foldLeft needs only one element from the collection before operating. So why does it try to resolve two of them? Couldn't it be just a little bit lazier?
def stream(i: Int): Stream[Int] =
if (i < 100) {
println("taking")
i #:: stream(i + 1)
} else Stream.empty
scala> stream(97).foldLeft(0) { case (acc, i) =>
println("using")
acc + i
}
taking
taking
using
taking
using
using
res0: Int = 294
I ask this because I have built a stream around a mutable priority queue, wherein the iteration of the fold can inject new members into the stream. It starts off with one value and during the first iteration injects more values. But those other values are never seen because the stream has already been resolved to empty in position 2 before the first iteration.
I can only explain why it's happening. Here is the source of Stream's #:: (Cons):
final class Cons[+A](hd: A, tl: => Stream[A]) extends Stream[A] {
override def isEmpty = false
override def head = hd
@volatile private[this] var tlVal: Stream[A] = _
@volatile private[this] var tlGen = tl _
def tailDefined: Boolean = tlGen eq null
override def tail: Stream[A] = {
if (!tailDefined)
synchronized {
if (!tailDefined) {
tlVal = tlGen()
tlGen = null
}
}
tlVal
}
}
So you can see that head is always calculated (isn't lazy). Here is foldLeft:
override final def foldLeft[B](z: B)(op: (B, A) => B): B = {
if (this.isEmpty) z
else tail.foldLeft(op(z, head))(op)
}
You can see that tail is called here, which means that the "head of the tail" (the second element) is computed automatically, because your stream function has to be called again to generate the tail. So the better question isn't "why the second element?" but why Stream always computes its first element eagerly. I don't know the answer, but I believe scala-library's implementation could be improved simply by making head lazy inside Cons, so that you could pass someLazyCalculation #:: stream(i + 1).
Note that either way your stream function will be called twice, but the second approach would let you avoid the automatic computation of the second head by providing a lazy value as the head. Something like this could then work (currently it doesn't):
def stream(i: Int): Stream[Int] =
if (i < 100) {
lazy val ii = {
println("taking")
i
}
ii #:: stream(i + 1)
} else Stream.empty
P.S. It's probably not a good idea to build an (eventually) immutable collection around a mutable one.
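For readers on Scala 2.13 or later, LazyList is documented as deferring the head as well as the tail, which is roughly the behaviour wished for above. The sketch below assumes Scala 2.13+ and uses a hypothetical LazyFoldDemo object; it records the "taking" and "using" events in a buffer instead of printing them, so the interleaving can be inspected on whatever library version you run.

```scala
import scala.collection.mutable.ListBuffer

object LazyFoldDemo {
  // Returns the fold result together with the recorded event log.
  def run(): (Int, List[String]) = {
    val log = ListBuffer.empty[String]

    // Same shape as the stream function in the question, built on LazyList.
    def stream(i: Int): LazyList[Int] =
      if (i < 100) {
        log += "taking"
        i #:: stream(i + 1)
      } else LazyList.empty

    val sum = stream(97).foldLeft(0) { (acc, i) =>
      log += "using"
      acc + i
    }
    (sum, log.toList)
  }
}
```

Calling LazyFoldDemo.run() returns the sum together with the event log.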
I'd like to link 2 columns of unique identifiers and be able to get a first column value by a second column value as well as a second column value by a first column value. Something like
Map(1 <-> "one", 2 <-> "two", 3 <-> "three")
Is there such a facility in Scala?
Actually I need even more: 3 columns to select any in a triplet by another in a triplet (individual values will never be met more than once in the entire map). But a 2-column bidirectional map can help too.
Guava has a bimap that you can use along with
import scala.collection.JavaConversions._
My BiMap approach:
object BiMap {
private[BiMap] trait MethodDistinctor
implicit object MethodDistinctor extends MethodDistinctor
}
case class BiMap[X, Y](map: Map[X, Y]) {
def this(tuples: (X,Y)*) = this(tuples.toMap)
private val reverseMap = map map (_.swap)
require(map.size == reverseMap.size, "no 1 to 1 relation")
def apply(x: X): Y = map(x)
def apply(y: Y)(implicit d: BiMap.MethodDistinctor): X = reverseMap(y)
val domain = map.keys
val codomain = reverseMap.keys
}
val biMap = new BiMap(1 -> "A", 2 -> "B")
println(biMap(1)) // A
println(biMap("B")) // 2
Of course one can add syntax for <-> instead of ->.
Here's a quick Scala wrapper for Guava's BiMap.
import com.google.common.{collect => guava}
import scala.collection.JavaConversions._
import scala.collection.mutable
import scala.language.implicitConversions
class MutableBiMap[A, B] private (
// HashBiMap has no public constructor; use its static factory method.
private val g: guava.BiMap[A, B] = guava.HashBiMap.create[A, B]()) {
def inverse: MutableBiMap[B, A] = new MutableBiMap[B, A](g.inverse)
}
object MutableBiMap {
def empty[A, B]: MutableBiMap[A, B] = new MutableBiMap()
implicit def toMap[A, B] (x: MutableBiMap[A, B]): mutable.Map[A,B] = x.g
}
I have a really simple BiMap in Scala:
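import scala.reflect.ClassTag
case class BiMap[A, B](elems: (A, B)*) {
// Groups the pairs by their first component and collects the second components into a Set.
def groupBy[X, Y](pairs: Seq[(X, Y)]) = pairs.groupBy(_._1).mapValues(_.map(_._2).toSet)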
val (left, right) = (groupBy(elems), groupBy(elems map {_.swap}))
def apply(key: A) = left(key)
def apply[C: ClassTag](key: B) = right(key)
}
Usage:
val biMap = BiMap(1 -> "x", 2 -> "y", 3 -> "x", 1 -> "y")
assert(biMap(1) == Set("x", "y"))
assert(biMap("x") == Set(1, 3))
I don't think it exists out of the box, because the generic behavior is not easy to pin down: how should values matching several keys be handled in a clean API?
However, for specific cases, here is a good exercise that might help. It should be improved, because no hashing is used, so getting a key or a value is O(n).
The idea is to let you write something similar to what you propose, but using Seq instead of Map.
With the help of implicits and traits, plus find, you can emulate what you need with a reasonably clean API (fromKey, fromValue).
The caveat is that a value is not supposed to appear in several places, at least in this implementation.
trait BiMapEntry[K, V] {
def key:K
def value:V
}
trait Sem[K] {
def k:K
def <->[V](v:V):BiMapEntry[K, V] = new BiMapEntry[K, V]() { val key = k; val value = v}
}
trait BiMap[K, V] {
def fromKey(k:K):Option[V]
def fromValue(v:V):Option[K]
}
object BiMap {
implicit def fromInt(i:Int):Sem[Int] = new Sem[Int] {
def k = i
}
implicit def fromSeq[K, V](s:Seq[BiMapEntry[K, V]]) = new BiMap[K, V] {
def fromKey(k:K):Option[V] = s.find(_.key == k).map(_.value)
def fromValue(v:V):Option[K] = s.find(_.value == v).map(_.key)
}
}
object test extends App {
import BiMap._
val a = 1 <-> "a"
val s = Seq(1 <-> "a", 2 <-> "b")
println(s.fromKey(2))
println(s.fromValue("a"))
}
Scala collections are immutable by default, and values are stored by reference rather than copied, so a second map only costs the storage for the extra references. It is therefore better to keep two maps, one keyed by A and mapping to B, the other keyed by B and mapping to A, than to swap a single map at run time. The swapping implementation has its own memory footprint: the newly swapped hash map stays in memory until the enclosing call returns and the garbage collector runs. If the map has to be swapped frequently, you end up using as much memory as the naive two-map implementation, or more.
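A minimal sketch of the two-map idea follows. TwoWayMap and its method names are hypothetical, chosen only for this illustration; it assumes the pairs are one-to-one, as stated in the question, and rejects input that is not.

```scala
// Hypothetical helper: two immutable maps kept in sync, one per direction.
final class TwoWayMap[A, B] private (forward: Map[A, B], backward: Map[B, A]) {
  def byKey(a: A): Option[B] = forward.get(a)     // hashed lookup A -> B
  def byValue(b: B): Option[A] = backward.get(b)  // hashed lookup B -> A
  def size: Int = forward.size
}

object TwoWayMap {
  def apply[A, B](pairs: (A, B)*): TwoWayMap[A, B] = {
    val forward = pairs.toMap
    val backward = pairs.map(_.swap).toMap
    // If any key or value repeats, one of the maps ends up smaller than the input.
    require(
      forward.size == pairs.size && backward.size == pairs.size,
      "pairs must form a one-to-one relation"
    )
    new TwoWayMap(forward, backward)
  }
}
```

Both lookups are hashed, and the keys and values themselves are shared between the two maps rather than copied.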
One more approach you can try with a single map (it only works for getting the key from a mapped value, and it is a linear scan):
def getKeyByValue[A, B](map: Map[A, B], value: B): Option[A] = map.find { case (_, b) => b == value }.map(_._1)
Code for Scala implementation of find by key:
/** Find entry with given key in table, null if not found.
*/
@deprecatedOverriding("No sensible way to override findEntry as private findEntry0 is used in multiple places internally.", "2.11.0")
protected def findEntry(key: A): Entry =
findEntry0(key, index(elemHashCode(key)))
private[this] def findEntry0(key: A, h: Int): Entry = {
var e = table(h).asInstanceOf[Entry]
while (e != null && !elemEquals(e.key, key)) e = e.next
e
}
In Scala I would like to be able to write
val petMap = ImmutableMultiMap(Alice->Cat, Bob->Dog, Alice->Hamster)
The underlying Map[Owner,Set[Pet]] should have both Map and Set immutable. Here's a first draft for ImmutableMultiMap with companion object:
import collection.{mutable,immutable}
class ImmutableMultiMap[K,V] extends immutable.HashMap[K,immutable.Set[V]]
object ImmutableMultiMap {
def apply[K,V](pairs: Tuple2[K,V]*): ImmutableMultiMap[K,V] = {
var m = new mutable.HashMap[K,mutable.Set[V]] with mutable.MultiMap[K,V]
for ((k,v) <- pairs) m.addBinding(k,v)
// How do I return the ImmutableMultiMap[K,V] corresponding to m here?
}
}
Can you resolve the comment line elegantly? Both the map and the sets should become immutable.
Thanks!
I've rewritten this same method twice now, at successive jobs. :) Somebody Really Oughta make it more general. It's handy to have a total version around too.
/**
* Like {@link scala.collection.Traversable#groupBy} but lets you return both the key and the value for the resulting
* Map-of-Lists, rather than just the key.
*
* @param in the input list
* @param f the function that maps elements in the input list to a tuple for the output map.
* @tparam A the type of elements in the source list
* @tparam B the type of the first element of the tuple returned by the function; will be used as keys for the result
* @tparam C the type of the second element of the tuple returned by the function; will be used as values for the result
* @return a Map-of-Lists
*/
def groupBy2[A,B,C](in: List[A])(f: PartialFunction[A,(B,C)]): Map[B,List[C]] = {
def _groupBy2[A, B, C](in: List[A],
got: Map[B, List[C]],
f: PartialFunction[A, (B, C)]): Map[B, List[C]] =
in match {
case Nil =>
got.map {case (k, vs) => (k, vs.reverse)}
case x :: xs if f.isDefinedAt(x) =>
val (b, c) = f(x)
val appendTo = got.getOrElse(b, Nil)
_groupBy2(xs, got.updated(b, c :: appendTo), f)
case x :: xs =>
_groupBy2(xs, got, f)
}
_groupBy2(in, Map.empty, f)
}
And you use it like this:
val xs = (1 to 10).toList
groupBy2(xs) {
case i => (i%2 == 0, i.toDouble)
}
res3: Map[Boolean,List[Double]] = Map(false -> List(1.0, 3.0, 5.0, 7.0, 9.0),
true -> List(2.0, 4.0, 6.0, 8.0, 10.0))
You have a bigger problem than that, because there's no method in ImmutableMultiMap that will return an ImmutableMultiMap -- therefore it is impossible to add elements to it, and the constructor does not provide support for creating it with elements. Please see existing implementations and pay attention to the companion object's builder and related methods.
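If a plain immutable Map[K, Set[V]] is acceptable as the result type, the subclassing problem can be sidestepped entirely. The following is a minimal sketch under that assumption; MultiMapSketch and build are hypothetical names, and the fold produces immutable maps and sets at every step, so nothing needs to be converted at the end.

```scala
object MultiMapSketch {
  // Folds the pairs into an immutable Map whose values are immutable Sets.
  def build[K, V](pairs: (K, V)*): Map[K, Set[V]] =
    pairs.foldLeft(Map.empty[K, Set[V]]) { case (acc, (k, v)) =>
      acc.updated(k, acc.getOrElse(k, Set.empty[V]) + v)
    }
}
```

Usage would look like MultiMapSketch.build("Alice" -> "Cat", "Bob" -> "Dog", "Alice" -> "Hamster").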