Complexity of linked-list concatenation - Scala

I'm still pretty much new to functional programming and experimenting with algebraic data types. I implemented LinkedList as follows:
sealed abstract class LinkedList[+T] {
  def flatMap[T2](f: T => LinkedList[T2]) = LinkedList.flatMap(this)(f)
  def foreach(f: T => Unit): Unit = LinkedList.foreach(this)(f)
}

final case class Next[T](t: T, linkedList: LinkedList[T]) extends LinkedList[T]

case object Stop extends LinkedList[Nothing]

object LinkedList {
  private def connect[T](left: LinkedList[T], right: LinkedList[T]): LinkedList[T] = left match {
    case Stop => right
    case Next(t, l) => Next(t, connect(l, right))
  }

  private def flatMap[T, T2](list: LinkedList[T])(f: T => LinkedList[T2]): LinkedList[T2] = list match {
    case Stop => Stop
    case Next(t, l) => connect(f(t), flatMap(l)(f))
  }

  private def foreach[T](ll: LinkedList[T])(f: T => Unit): Unit = ll match {
    case Stop => ()
    case Next(t, l) =>
      f(t)
      foreach(l)(f)
  }
}
The thing is that I wanted to measure the complexity of the flatMap operation.
The complexity of flatMap is O(n), where n is the length of the flatMapped list (as far as I could prove by induction).
The question is: how can the list be designed so that concatenation has constant-time complexity? What kind of class extending LinkedList should we design to achieve that? Or is there some other way?

One approach would be to keep track of the end of your list, and make your list nodes mutable so you can just modify the end to point to the start of the next list.
I don't see a way to do it with immutable nodes, since modifying what follows a tail node requires modifying everything before it. To get constant-time concatenation with immutable data structures, you need something other than a simple linked list.
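For illustration, here is one minimal sketch of that last idea (the Concat constructor and the rewritten operations are my own, not part of the original answer): concatenation is deferred by allocating a single node, so it costs O(1), and the price is paid later, when the structure is traversed.
sealed abstract class LinkedList[+T]
final case class Next[T](t: T, rest: LinkedList[T]) extends LinkedList[T]
// Hypothetical extra constructor: represents the concatenation of two lists
final case class Concat[T](left: LinkedList[T], right: LinkedList[T]) extends LinkedList[T]
case object Stop extends LinkedList[Nothing]

object LinkedList {
  // O(1): one allocation, no copying of `left` as in the recursive connect above
  def connect[T](left: LinkedList[T], right: LinkedList[T]): LinkedList[T] =
    Concat(left, right)

  // Traversal now has to handle the extra case (not tail-recursive here; a real
  // implementation would keep an explicit stack of pending right branches)
  def foreach[T](ll: LinkedList[T])(f: T => Unit): Unit = ll match {
    case Stop => ()
    case Next(t, rest) => f(t); foreach(rest)(f)
    case Concat(l, r) => foreach(l)(f); foreach(r)(f)
  }
}
This is essentially a rope or "append tree"; functional queues and difference lists achieve a similar effect by other means.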

Related

Scala: Find and update one element in a list

I am trying to find an elegant way to do:
val l = List(1,2,3)
val (item, idx) = l.zipWithIndex.find(predicate)
val updatedItem = updating(item)
l.updated(idx, updatedItem)
Can I do it all in one operation? Find the item and, if it exists, replace it with the updated value, keeping it in place.
I could do:
l.map { i =>
  if (predicate(i)) {
    updating(i)
  } else {
    i
  }
}
but that's pretty ugly.
The other complication is that I want to update only the first element which matches the predicate.
Edit: Attempt:
implicit class UpdateList[A](l: List[A]) {
  def filterMap(p: A => Boolean)(update: A => A): List[A] = {
    l.map(a => if (p(a)) update(a) else a)
  }

  def updateFirst(p: A => Boolean)(update: A => A): List[A] = {
    val found = l.zipWithIndex.find { case (item, _) => p(item) }
    found match {
      case Some((item, idx)) => l.updated(idx, update(item))
      case None => l
    }
  }
}
I don't know any way to make this in one pass of the collection without using a mutable variable. With two passes you can do it using foldLeft as in:
def updateFirst[A](list: List[A])(predicate: A => Boolean, newValue: A): List[A] = {
  list.foldLeft((List.empty[A], predicate)) { (acc, it) =>
    acc match {
      case (nl, pr) => if (pr(it)) (newValue :: nl, _ => false) else (it :: nl, pr)
    }
  }._1.reverse
}
The idea is that foldLeft allows passing additional data through the iteration. In this particular implementation, once the first match has been handled I replace the predicate with a fixed one that always returns false. Unfortunately you can't efficiently build an immutable List in left-to-right order, so this requires another pass for the reverse.
I believe it is obvious how to do it using a combination of map and var
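For concreteness, a minimal sketch of that map-plus-var approach (the name updateFirstViaVar and the wiring are mine, not part of the original answer): a mutable flag ensures only the first matching element is replaced, in a single pass.
def updateFirstViaVar[A](list: List[A])(predicate: A => Boolean, newValue: A): List[A] = {
  var done = false // set once the first match has been replaced
  list.map { a =>
    if (!done && predicate(a)) { done = true; newValue } else a
  }
}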
Note: the performance of List.map is the same as a single pass over the list only because the standard library is internally mutable. In particular, the cons class :: is declared as
final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B] {
so tl is actually a var, and this is exploited by the map implementation to build the list front-to-back efficiently. The field is private[scala], so you can't use the same trick from outside the standard library. Unfortunately I don't see any other API call that lets you use this feature to reduce your problem to a single pass.
You can avoid .zipWithIndex() by using .indexWhere().
To improve complexity, use Vector so that l(idx) becomes effectively constant time.
val l = Vector(1,2,3)
val idx = l.indexWhere(predicate)
val updatedItem = updating(l(idx))
l.updated(idx, updatedItem)
Reason for using scala.collection.immutable.Vector rather than List:
Scala's List is a linked list, which means data is accessed in O(n) time. Scala's Vector is indexed, meaning data can be read from any point in effectively constant time.
You may also consider mutable collections if you're modifying just one element in a very large collection.
https://docs.scala-lang.org/overviews/collections/performance-characteristics.html
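One caveat worth adding to the snippet above (my note, not part of the original answer): indexWhere returns -1 when no element matches, so in practice you may want to guard that case before calling updated. A sketch, assuming the same predicate and updating functions as above:
val l = Vector(1, 2, 3)
val idx = l.indexWhere(predicate)
// Leave the collection unchanged when nothing matches
val result = if (idx >= 0) l.updated(idx, updating(l(idx))) else l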

Traverse and Modify a Heterogeneous Directed Acyclic Graph in Scala

Good morning everybody.
I have the following directed acyclic graph data structure, implemented in Scala as follows:
abstract class Node // Generic abstract node
/** Various kind of leaf nodes */
case class LeafNodeA(x: String) extends Node
case class LeafNodeB(x: Int) extends Node
/** Various kind of inner nodes */
case class InnerNode1(x: String, depRoleA: Node) extends Node
case class InnerNode2(x: String, y: Double, depRoleX: Node, depRoleY: Node) extends Node
case class InnerNode3(x: List[Int], depRoleA: Node, y: Int,
                      depRoleB: Node, depRoleG: Node) extends Node
In this structure a node can be a dependency of multiple nodes, therefore it is not a Tree but a Directed Acyclic Graph. In addition the structure is not even balanced (nodes have different numbers of dependencies).
The problem of traversal
Notice that I have given the various dependency fields of the case classes different names, since they represent different roles in the dependencies (for example, depRoleX has a different role than depRoleY in an InnerNode2 node). For this reason I don't think it is possible to store the dependencies of each node in a List[Node], as in any trivial tree/DAG implementation you can find out there, because the meaning of each dependency field is different.
Of course when I traverse this structure I have to do pattern matching in order to understand the type of node I am dealing with at the current recursion step:
// Example function which returns the list of all the String attributes of the nodes
def getAllStrings(dag: Node): List[String] = {
  dag match {
    case LeafNodeA(x) => List(x)
    case LeafNodeB(_) => List()
    case InnerNode1(x, dr) => List(x) ::: getAllStrings(dr)
    case InnerNode2(x, _, dX, dY) => List(x) ::: getAllStrings(dX) ::: getAllStrings(dY)
    case InnerNode3(_, dA, _, dB, dG) => getAllStrings(dA) ::: getAllStrings(dB) ::: getAllStrings(dG)
  }
}
Now suppose that instead of these 5 relatively simple node types I have around 20 types of node. The previous function would become extremely long and repetitive (a case statement for each node type). Even worse: every time I want to do a traversal I have to do the same thing.
Thinking about this problem I came up with two solutions.
External method for traversal
The first (obvious) way to deal with this is to modularize the previous method by defining a generic DAG traversal function:
object DAGManipulator {
def getDependencies(dag: Node): List[Node] = {
dag match {
case LeafNodeA => List()
case LeafNodeB => List()
case InnerNode1(_, dr) => List(dr)
case InnerNode2(_, _, dX, dY) => List(dX, dY)
case InnerNode3(_, dA, _, dB, dG) => List(dA, dB, dG)
}
}
}
In this way, every time I need the dependencies of a node I can rely on this static function.
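For example (my sketch, not part of the original question), a generic traversal can then be written once on top of that helper; the name collectNodes is hypothetical:
// Depth-first collection of all nodes reachable from a root.
// In a DAG, shared nodes are visited once per incoming edge; deduplicate if needed.
def collectNodes(root: Node): List[Node] =
  root :: DAGManipulator.getDependencies(root).flatMap(collectNodes)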
Abstract class method for getting the dependencies
The second solution I came up with is to give every node an additional method, in the following way:
abstract class Node {
  def getDependencies: List[Node]
}

case class LeafNodeA(x: String) extends Node {
  override def getDependencies: List[Node] = List()
}

case class LeafNodeB(x: Int) extends Node {
  override def getDependencies: List[Node] = List()
}

/** Various kind of inner nodes */
case class InnerNode1(x: String, depRoleA: Node) extends Node {
  override def getDependencies: List[Node] = List(depRoleA)
}

case class InnerNode2(x: String, y: Double, depRoleX: Node, depRoleY: Node) extends Node {
  override def getDependencies: List[Node] = List(depRoleX, depRoleY)
}

case class InnerNode3(x: List[Int], depRoleA: Node, y: Int,
                      depRoleB: Node, depRoleG: Node) extends Node {
  override def getDependencies: List[Node] = List(depRoleA, depRoleB, depRoleG)
}
I don't like any of the previous solutions:
The first one must be updated every time a new node type is added to the hierarchy. In addition to this, it delegates a fundamental feature of the DAG structure (traversal) to an external object, which I find very unpleasant from the software engineering point of view.
The second solution in my opinion is even worse, because every node type basically has to state its dependencies redundantly (once in its fields and once in the getDependencies method). I find this very ugly and prone to programming errors.
Do you have a better solution to this problem?
The problem of updating
The second problem I have to deal with is the updating/modification of the data structure.
Suppose that I have a DAG defined in the following way.
val l1 = LeafNodeB(1)
val dag =
  InnerNode3(List(1, 2, 3),
    InnerNode1("InnerNode1", LeafNodeA("leafA1")),
    1, l1, InnerNode2("InnerNode2", 2, l1, LeafNodeA("leafA2")))
corresponding to this structure.
Suppose that I want to change the LeafNodeA("leafA1") (which is a dependency of the InnerNode1) to, for example, l1, which is a LeafNodeB.
This is the kind of operation that I need to do:
def modify(dag: Node): Node = {
  dag match {
    case x: InnerNode1 => if (x.x == "InnerNode1") x.copy(depRoleA = l1) else x
    case x: LeafNodeB => x
    case x: LeafNodeA => x
    case x: InnerNode2 => x.copy(depRoleX = modify(x.depRoleX), depRoleY = modify(x.depRoleY))
    case x: InnerNode3 => x.copy(depRoleA = modify(x.depRoleA), depRoleB = modify(x.depRoleB), depRoleG = modify(x.depRoleG))
  }
}
Again, consider the possibility of having more than 20 node types: this update method becomes impractical, and the same goes for every other update method I can think of.
In addition, this time I did not come up with a strategy for factoring out this "recursive traversal update" of the nested structure. I have to check every possible node type in order to know how to use the copy method of the corresponding case class.
Do you have a better solution / design for this update strategy?
To address this issue:
The first one must be updated every time a new node type is added to
the hierarchy. In addition to this, it delegates a fundamental feature
of the DAG structure (traversal) to an external object, which I find
very unpleasant from the software engineering point of view.
I think this is not really a drawback; it's something that's pretty core to the OO vs FP clash inherent in Scala. I would say having your node classes be "dumb" data holders, with a separate code path for traversing them, is a good thing. And sure, you have to add a line there every time you add a node type, but if the Node hierarchy is sealed, the compiler can warn you when you forget.
Anyway, this might be overkill and it's a bit of an undertaking, but you may want to look into Matryoshka, which generalizes recursive data structures like this. It requires a bit of plumbing to translate your data types into the scheme that is expected, and define a functor for that:
abstract class NodeF[+A] // Generic abstract node
/** Various kind of leaf nodes */
case class LeafNodeA(x: String) extends NodeF[Nothing]
case class LeafNodeB(x: Int) extends NodeF[Nothing]
/** Various kind of inner nodes */
case class InnerNode1[A](x: String, depRoleA: A) extends NodeF[A]
case class InnerNode2[A](x: String, y: Double, depRoleX: A, depRoleY: A) extends NodeF[A]
case class InnerNode3[A](x: List[Int], depRoleA: A, y: Int,
                         depRoleB: A, depRoleG: A) extends NodeF[A]

implicit val nodeFunctor: Functor[NodeF] = new Functor[NodeF] {
  def map[A, B](fa: NodeF[A])(f: A => B): NodeF[B] = fa match {
    case LeafNodeA(x) => LeafNodeA(x)
    case LeafNodeB(x) => LeafNodeB(x)
    case InnerNode1(x, depA) => InnerNode1(x, f(depA))
    case InnerNode2(x, y, depX, depY) => InnerNode2(x, y, f(depX), f(depY))
    case InnerNode3(x, depA, y, depB, depG) => InnerNode3(x, f(depA), y, f(depB), f(depG))
  }
}
But then it essentially hides the recursion from you and you can more easily define these kinds of things:
type FixNode = Fix[NodeF]

def someExprGeneric[T](implicit T: Corecursive.Aux[T, NodeF]): T =
  InnerNode2("hello", 1.0, InnerNode1("world", LeafNodeA("!").embed).embed, LeafNodeB(1).embed).embed

val someExpr = someExprGeneric[FixNode]

def getStrings: Algebra[NodeF, List[String]] = {
  case LeafNodeA(x) => List(x)
  case LeafNodeB(_) => List()
  case InnerNode1(x, depA) => x :: depA
  case InnerNode2(x, _, depX, depY) => x :: depX ::: depY
  case InnerNode3(_, depA, _, depB, depG) => depA ::: depB ::: depG
}

someExpr.cata(getStrings) // List("hello", "world", "!")
Perhaps that's not that much cleaner than what you have, but it at least separates the recursive traversal logic from the "single step" evaluation logic. But I think where it shines a bit more is when updating:
def expandToUniverse: Algebra[NodeF, Node] = {
  case InnerNode1("world", dep) => InnerNode1("universe", dep).embed
  case x => x.embed
}
someExpr.cata(expandToUniverse).cata(getStrings) // List("hello", "universe", "!")
Because you've delegated out that recursion, you only have to implement the case(s) you actually care about.

What is meant by not generating the answer lazily in this code?

I came across the following code:
/*
Unlike `take`, `drop` is not incremental. That is, it doesn't generate the
answer lazily. It must traverse the first `n` elements of the stream eagerly.
*/
@annotation.tailrec
final def drop(n: Int): Stream[A] = this match {
  case Cons(_, t) if n > 0 => t().drop(n - 1)
  case _ => this
}

/*
`take` first checks if n==0. In that case we need not look at the stream at all.
*/
def take(n: Int): Stream[A] = this match {
  case Cons(h, t) if n > 1 => cons(h(), t().take(n - 1))
  case Cons(h, _) if n == 1 => cons(h(), empty)
  case _ => empty
}
Could someone explain what is meant by the comment:
Unlike take, drop is not incremental. That is, it doesn't generate the
answer lazily. It must traverse the first n elements of the stream eagerly.
To me, it looks like both the drop and take functions have to traverse the first n elements of the stream eagerly. What is it about the drop function that causes the first n elements to be eagerly traversed?
(Full code context here: https://github.com/fpinscala/fpinscala/blob/master/answers/src/main/scala/fpinscala/laziness/Stream.scala)
The definition for Cons is:
case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]
Notice that the second parameter, t, takes a function (from Unit to Stream[A]), not the evaluation of that function. This is not evaluated until required, and hence is lazy, as is the take method that calls it.
Compare this to drop which calls t() itself rather than passing it into the Cons, forcing the immediate evaluation.
The key point is that cons is lazy. That is if the recursion is inside of cons, the recursion won't happen until the tail of the generated list is actually accessed. Whereas if the recursion is outside, it happens right away.
So drop is eager because the recursion is not inside a cons (or any other lazy construct).
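To make this concrete, here is a small sketch (mine, not part of the original answer) against the Stream type from the book; it assumes the cons/empty smart constructors and the toList method from the same file:
// Heads are wrapped in thunks with println so we can see when they are forced.
val s = Stream.cons({ println("head 1"); 1 },
          Stream.cons({ println("head 2"); 2 },
            Stream.cons({ println("head 3"); 3 }, Stream.empty)))

val t = s.take(2)  // prints nothing: the recursion is suspended inside cons
val d = s.drop(2)  // walks two tails right away (heads still unevaluated)
t.toList           // forces the taken prefix: prints "head 1" then "head 2"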

Mapping many Eithers to one Either with many

Say I have a monadic function called processOne defined like this:
def processOne(input: Input): Either[ErrorType, Output] = ...
Given a list of "Inputs", I would like to return a corresponding list of "Outputs" wrapped in an Either:
def processMany(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] = ...
processMany will call processOne for each of its inputs; however, I would like it to terminate the first time (if ever) processOne returns a Left, and return that Left; otherwise return a Right with the list of outputs.
My question: what is the best way to implement processMany? Is it possible to accomplish this behavior using a for expression, or is it going to be necessary for me to iterate the list myself recursively?
With Scalaz 7:
def processMany(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] =
  inputs.toStream traverseU processOne
Converting inputs to a Stream[Input] takes advantage of the non-strict traverse implementation for Stream, i.e. gives you the short-circuiting behaviour you want.
By the way, you tagged this "monads", but traversal requires only an applicative functor (which, as it happens, is probably defined in terms of the monad for Either). For further reference, see the paper The Essence of the Iterator Pattern, or, for a Scala-based interpretation, Eric Torreborre's blog post on the subject.
The easiest with standard Scala, which doesn't evaluate more than is necessary, would probably be
def processMany(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] = {
  Right(inputs.map { x =>
    processOne(x) match {
      case Right(r) => r
      case Left(l) => return Left(l)
    }
  })
}
A fold would be more compact, but wouldn't short-circuit when it hit a left (it'd just keep carrying it along while you iterated through the entire input).
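For concreteness, a sketch of such a fold (my code, not the answerer's; it assumes a right-biased Either, i.e. Scala 2.12+): once a Left appears, flatMap simply threads it through, but the fold still visits every remaining element.
def processManyFold(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] =
  inputs.foldLeft[Either[ErrorType, Vector[Output]]](Right(Vector.empty)) { (acc, in) =>
    // On a Left accumulator, flatMap is a no-op, but the iteration itself continues
    acc.flatMap(outs => processOne(in).map(outs :+ _))
  }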
For now, I've decided to just solve this using recursion, as I am reluctant to add a dependency on a library (Scalaz).
(Types and names in my application have been changed here in order to appear more generic)
def processMany(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] = {
  import scala.annotation.tailrec

  @tailrec
  def traverse(acc: Vector[Output], inputs: List[Input]): Either[ErrorType, Seq[Output]] = {
    inputs match {
      case Nil => Right(acc)
      case input :: more =>
        processOne(input) match {
          case Right(output) => traverse(acc :+ output, more)
          case Left(e) => Left(e)
        }
    }
  }

  traverse(Vector[Output](), inputs.toList)
}

Implementing ifTrue, ifFalse, ifSome, ifNone, etc. in Scala to avoid if(...) and simple pattern matching

In Scala, I have progressively lost my Java/C habit of thinking in a control-flow-oriented way, and have got used to first getting hold of the object I'm interested in, and then usually applying something like a match, a map(), or a foreach() for collections. I like it a lot, since it now feels like a more natural and more to-the-point way of structuring my code.
Little by little, I've wished I could program the same way for conditions; i.e., obtain a Boolean value first, and then match it to do various things. A full-blown match, however, does seem a bit overkill for this task.
Compare:
obj.isSomethingValid match {
  case true => doX
  case false => doY
}
vs. what I would write with style closer to Java:
if (obj.isSomethingValid)
  doX
else
  doY
Then I remembered Smalltalk's ifTrue: and ifFalse: messages (and variants thereof). Would it be possible to write something like this in Scala?
obj.isSomethingValid ifTrue doX else doY
with variants:
val v = obj.isSomethingValid ifTrue someVal else someOtherVal
// with side effects
obj.isSomethingValid ifFalse {
  numInvalid += 1
  println("not valid")
}
Furthermore, could this style be made available to simple, two-state types like Option? I know the more idiomatic way to use Option is to treat it as a collection and call filter(), map(), exists() on it, but often, at the end, I find that I want to perform some doX if it is defined, and some doY if it isn't. Something like:
val ok = resultOpt ifSome { result =>
  println("Obtained: " + result)
  updateUIWith(result) // returns Boolean
} else {
  numInvalid += 1
  println("missing end result")
  false
}
To me, this (still?) looks better than a full-blown match.
I am providing a base implementation I came up with; general comments on this style/technique and/or better implementations are welcome!
First: we probably cannot reuse else, as it is a keyword, and using the backticks to force it to be seen as an identifier is rather ugly, so I'll use otherwise instead.
Here's an implementation attempt. First, use the pimp-my-library pattern to add ifTrue and ifFalse to Boolean. They are parametrized on the return type R and accept a single by-name parameter, which is evaluated only if the corresponding condition holds. But in doing so, we must allow for an otherwise call, so we return a new object called Otherwise0 (why the 0 is explained later), which stores a possible intermediate result as an Option[R]. It is defined if the current condition (ifTrue or ifFalse) is realized, and empty otherwise.
class BooleanWrapper(b: Boolean) {
  def ifTrue[R](f: => R) = new Otherwise0[R](if (b) Some(f) else None)
  def ifFalse[R](f: => R) = new Otherwise0[R](if (b) None else Some(f))
}
implicit def extendBoolean(b: Boolean): BooleanWrapper = new BooleanWrapper(b)
For now, this works and lets me write
someTest ifTrue {
  println("OK")
}
But, without the following otherwise clause, it cannot return a value of type R, of course. So here's the definition of Otherwise0:
class Otherwise0[R](intermediateResult: Option[R]) {
  def otherwise[S >: R](f: => S) = intermediateResult.getOrElse(f)
  def apply[S >: R](f: => S) = otherwise(f)
}
It evaluates its by-name argument if and only if the intermediate result it got from the preceding ifTrue or ifFalse is undefined, which is exactly what is wanted. The type parametrization [S >: R] has the effect that S is inferred to be the most specific common supertype of the actual types of the two by-name arguments, so that, for instance, r in this snippet has the inferred type Fruit:
class Fruit
class Apple extends Fruit
class Orange extends Fruit

val r = someTest ifTrue {
  new Apple
} otherwise {
  new Orange
}
The apply() alias even allows you to skip the otherwise method name altogether for short chunks of code:
someTest.ifTrue(10).otherwise(3)
// equivalently:
someTest.ifTrue(10)(3)
Finally, here's the corresponding pimp for Option:
class OptionExt[A](option: Option[A]) {
  def ifNone[R](f: => R) = new Otherwise1(option match {
    case None => Some(f)
    case Some(_) => None
  }, option.get)

  def ifSome[R](f: A => R) = new Otherwise0(option match {
    case Some(value) => Some(f(value))
    case None => None
  })
}

implicit def extendOption[A](opt: Option[A]): OptionExt[A] = new OptionExt[A](opt)

class Otherwise1[R, A1](intermediateResult: Option[R], arg1: => A1) {
  def otherwise[S >: R](f: A1 => S) = intermediateResult.getOrElse(f(arg1))
  def apply[S >: R](f: A1 => S) = otherwise(f)
}
Note that we now also need Otherwise1 so that we can conveniently pass the unwrapped value not only to the ifSome function argument, but also to the function argument of an otherwise following an ifNone.
You may be looking at the problem too specifically. You would probably be better off with the pipe operator:
class Piping[A](a: A) { def |>[B](f: A => B) = f(a) }
implicit def pipe_everything[A](a: A) = new Piping(a)
Now you can
("fish".length > 5) |> (if (_) println("Hi") else println("Ho"))
which, admittedly, is not quite as elegant as what you're trying to achieve, but it has the great advantage of being amazingly versatile--any time you want to put an argument first (not just with booleans), you can use it.
Also, you already can use options the way you want:
Option("fish").filter(_.length > 5).
map (_ => println("Hi")).
getOrElse(println("Ho"))
Just because these things could take a return value doesn't mean you have to avoid them. It does take a little getting used to the syntax; this may be a valid reason to create your own implicits. But the core functionality is there. (If you do create your own, consider fold[B](f: A => B)(g: => B) instead; once you're used to it the lack of the intervening keyword is actually rather nice.)
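For comparison, here is a sketch (my addition, not the answerer's code) using the standard library's Option.fold, which has the same shape but takes the empty case first rather than the suggested fold[B](f: A => B)(g: => B) order; it reuses resultOpt, numInvalid and updateUIWith from the question:
val ok: Boolean = resultOpt.fold {
  // None case: the "otherwise" branch
  numInvalid += 1
  println("missing end result")
  false
} { result =>
  // Some case: receives the unwrapped value
  println("Obtained: " + result)
  updateUIWith(result) // returns Boolean
}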
Edit: Although the |> notation for pipe is somewhat standard, I actually prefer use as the method name, because then def reuse[B,C](f: A => B)(g: (A,B) => C) = g(a,f(a)) seems more natural.
Why not just use it like this:
val idiomaticVariable = if (condition) {
  firstExpression
} else {
  secondExpression
}
?
IMO, it's very idiomatic! :)