Scala: dynamic programming recursion using iterators

I'm learning how to do dynamic programming in Scala, and I often find myself in a situation where I want to recursively proceed over an array (or some other iterable) of items. When I do this, I tend to write cumbersome functions like this:
def arraySum(array: Array[Int], index: Int, accumulator: Int): Int = {
  if (index == array.length) {
    accumulator
  } else {
    arraySum(array, index + 1, accumulator + array(index))
  }
}
arraySum(Array(1,2,3), 0, 0)
(Ignore for a moment that I could just call sum on the array or do a .reduce(_ + _); I'm trying to learn programming principles.)
But this seems like I'm passing a lot of variables, and what exactly is the point of passing the array to each function call? This seems unclean.
So instead I got the idea to do this with iterators and not worry about passing indexes:
def arraySum(iter: Iterator[Int])(implicit accumulator: Int = 0): Int = {
  try {
    val nextInt = iter.next()
    arraySum(iter)(accumulator + nextInt)
  } catch {
    case nee: NoSuchElementException => accumulator
  }
}
arraySum(Array(1,2,3).toIterator)
This seems like a much cleaner solution. However, this falls apart when you need to use dynamic programming to explore some outcome space and you don't need to call the iterator at every function call. E.g.
def explore(iter: Iterator[Int])(implicit accumulator: Int = 0): Int = {
  if (someCase) {
    explore(iter)(accumulator)
  } else if (someOtherCase) {
    val nextInt = iter.next()
    explore(iter)(accumulator + nextInt)
  } else {
    // Some kind of aggregation/selection of explore results
  }
}
My understanding is that the iter iterator here is effectively passed by reference, so when this function calls iter.next() it advances the single instance of iter shared by all the recursive calls of the function. So to get around that, now I'm cloning the iterator at every call of the explore function. E.g.:
def explore(iter: Iterator[Int])(implicit accumulator: Int = 0): Int = {
  if (someCase) {
    explore(iter)(accumulator)
  } else if (someOtherCase) {
    val iterClone = iter.toList.toIterator
    explore(iterClone)(accumulator + iterClone.next())
  } else {
    // Some kind of aggregation/selection of explore results
  }
}
But this seems pretty stupid, and the stupidity escalates when I have multiple iterators that may or may not need cloning in multiple else if cases. What is the right way to handle situations like this? How can I elegantly solve these kinds of problems?

Suppose that you want to write a back-tracking recursive function that needs some complex data structure as an argument, so that the recursive calls receive a slightly modified version of the data structure. You have several options for how you could do it:
Clone the entire data structure, modify it, pass it to recursive call. This is very simple, but usually very expensive.
Modify the mutable structure in-place, pass it to the recursive call, then revert the modification when backtracking (a sketch of this follows the list below). You have to ensure that every possible call of your recursive function always restores the original state of the data structure exactly. This is much more efficient, but also hard to implement, because it can be very error prone.
Subdivide the structure into a large immutable part and a small mutable part. For example, you could pass an index (or a pair of indices) that specifies some slice of an array explicitly, along with an array that is never mutated. You could then "clone" and save only the mutable part, and restore it when backtracking. If it works, it is both simple and fast, but it doesn't always work, because substructures can be hard to describe with just a few integer indices.
Rely on persistent immutable data structures whenever you can.
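To make the second strategy concrete, here is a minimal sketch of mutate-then-revert backtracking; counting permutations is an assumed toy task chosen purely for illustration:
// Sketch of strategy 2: mutate in place, recurse, then revert.
// Every exit path must restore `used` exactly; that requirement is
// what makes this approach error prone.
def countPermutations(n: Int): Long = {
  val used = Array.fill(n)(false)
  def go(depth: Int): Long =
    if (depth == n) 1L
    else (0 until n).map { i =>
      if (used(i)) 0L
      else {
        used(i) = true        // modify the shared mutable structure
        val r = go(depth + 1) // recurse against the mutated structure
        used(i) = false       // revert when backtracking
        r
      }
    }.sum
  go(0)
}
Calling countPermutations(3) yields 6, i.e. 3!, and used is back to all-false after every branch.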
I'd like to elaborate on the last point, because this is the preferred way to do it in Scala and in functional programming in general.
Here is your original code, which uses the third strategy:
def arraySum(array: Array[Int], index: Int, accumulator: Int): Int = {
  if (index == array.length) {
    accumulator
  } else {
    arraySum(array, index + 1, accumulator + array(index))
  }
}
If you used a List instead of an Array, you could rewrite it like this:
@annotation.tailrec
def listSum(list: List[Int], acc: Int): Int = list match {
  case Nil => acc
  case h :: t => listSum(t, acc + h)
}
Here, h :: t is a pattern that deconstructs the list into the head and the tail.
Note that you don't need an explicit index any more, because accessing the tail t of the list is a constant-time operation, so that only the relevant remaining sublist is passed to the recursive call of listSum.
There is no backtracking here, but if the recursive method did backtrack, using lists would bring another advantage: extracting a sublist is almost free (a constant-time operation), yet the sublist is still guaranteed to be immutable. You can just pass it into the recursive call without caring about whether the recursive call modifies it, and you don't have to undo any modifications afterwards. This is the advantage of persistent immutable data structures: related lists can share most of their structure while still appearing immutable from the outside, so it's impossible to break anything in the parent list just because you have access to the tail of that list. This would not be the case with a view over a mutable array.
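To connect this back to the question's explore: below is a minimal sketch of a backtracking search over a List, where both branches receive the same immutable tail, so no cloning is needed. The max aggregation and the two-branch structure are assumptions chosen purely for illustration:
def explore(remaining: List[Int], accumulator: Int = 0): Int =
  remaining match {
    case Nil => accumulator
    case head :: tail =>
      // One branch skips the element, the other consumes it; both get
      // the same immutable tail, so neither can disturb the other.
      val skipped  = explore(tail, accumulator)
      val consumed = explore(tail, accumulator + head)
      math.max(skipped, consumed)
  }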

Related

Filtering and modifying immutable sequence in loop and have changes effective in subsequent filter calls

I guess I kinda murdered the title but I could not express it another way.
I have a trait like this:
trait Flaggable[T <: Flaggable[T]] { self: T =>
  def setFlag(key: String, value: Boolean): T
  def getFlag(key: String): Boolean
}
This trait itself is not that important; the main thing here is that a class implementing it should be immutable, as setFlag returns a new instance. Here is an example class extending this trait:
class ExampleData(val flags: Map[String, Boolean] = Map())
    extends Flaggable[ExampleData] {
  def setFlag(key: String, value: Boolean): ExampleData =
    new ExampleData(flags + (key -> value))
  def getFlag(key: String): Boolean = flags(key)
}
While iterating over the collection I set flags on elements, and I want those flags to be effective in subsequent iterations. Something like:
val seq: Seq[ExampleData] = ???
seq.view.filter(el => !el.getFlag("visited")).foreach { el =>
  // do things that may set flag visited to true in elements
  // further in the seq; if that happens I do want those
  // elements to be filtered
}
Now, AFAIK, one option is to make seq mutable and assign the new instances returned from setFlag back into seq. Another option is to make the whole flaggable thing mutable and modify instances in place in the collection. Do I have any other option without making either of these (the class and the collection) mutable? I do not even know how I could modify and filter at the same time in that case.
I guess I should explain my situation more. Specifically, I am trying to implement the DBSCAN clustering algorithm. I have a function that can return the distance between two data points. For each data point, I need to get the data points that are closer than an epsilon to it and mark those visited. And I do not want to process data points that are marked visited again. For example, for the data point with index 0, the index list of data points closer than epsilon might be [2, 4, 5]. In this case I want to flag those data points as visited and skip over them without processing.
Just use map instead of foreach and reverse the order of the operations:
seq.view.map { el =>
  // do things that may set flag visited to true and return the new
  // instance, or el if no change is needed.
}.filter(el => !el.getFlag("visited"))
Update:
Since the filter and the update are related to each other, use a mutable collection. I prefer that to mutable data objects, since the mutability can be limited to just that scope (e.g. use seq.to[ListBuffer]). After you are done, all the mutation is gone; this keeps the mutable code local.
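A minimal sketch of that locally-scoped mutability, assuming the ExampleData from the question, a hypothetical neighbours(i) epsilon-neighbourhood query, and that every element starts with its "visited" flag preset to false:
import scala.collection.mutable.ListBuffer

def markVisited(seq: Seq[ExampleData], neighbours: Int => Seq[Int]): List[ExampleData] = {
  val buf = seq.to[ListBuffer] // local mutable copy; it never escapes this scope
  for (i <- buf.indices if !buf(i).getFlag("visited")) {
    // process buf(i), then flag its neighbours so later iterations skip them
    for (j <- neighbours(i)) buf(j) = buf(j).setFlag("visited", true)
  }
  buf.toList // back to an immutable result
}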
Nevertheless, depending on your algorithm, there may be a better collection for this, like a Zipper.
I think you could extract a function to handle what is done inside your foreach with a signature like:
def transform(in: ExampleData): ExampleData
with that you could use a for comprehension:
for {
  elem <- seq if !elem.getFlag("visited")
  result = transform(elem) if result.getFlag("Foo")
} yield result
If you have multiple operations you can just append:
for {
  elem <- seq if !elem.getFlag("visited")
  r1 = transform(elem) if r1.getFlag("Foo")
  r2 = transform2(r1) if r2.getFlag("Bar")
} yield r2
The result would be a new Seq of new ExampleData according to the transformations and filters applied.
In general, if you want to both filter and process elements you would usually use the collect function and possibly chain them:
seq.collect {
  case elem if !elem.getFlag("visited") => transform(elem)
}.collect {
  case elem if elem.getFlag("Foo") => transform2(elem)
}.filter(_.getFlag("Bar"))
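For the DBSCAN-style requirement described in the question, where processing one element decides whether later elements get processed, another fully immutable option is to thread a set of visited indices through a fold. A hedged sketch, in which neighbours is a hypothetical stand-in for the epsilon-neighbourhood query and n is the number of data points:
def visitAll(n: Int, neighbours: Int => Seq[Int]): Set[Int] =
  (0 until n).foldLeft(Set.empty[Int]) { (visited, i) =>
    if (visited(i)) visited           // already flagged: skip processing
    else visited + i ++ neighbours(i) // process i and flag its neighbours
  }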

Early return from a for loop in Scala

Now I have some Scala code similar to the following:
def foo(x: Int, fn: (Int, Int) => Boolean): Boolean = {
  for {
    i <- 0 until x
    j <- i + 1 until x
    if fn(i, j)
  } return true
  false
}
But I get the feeling that return true is not so functional (or maybe it is?). Is there a way to rewrite this piece of code in a more elegant way?
In general, what is the more functional (if any) way to write the return-early-from-a-loop kind of code?
There are several methods that can help, such as find, exists, etc. For your case, try this:
def foo2(x: Int, fn: (Int, Int) => Boolean): Boolean = {
  (0 until x).exists(i =>
    (i + 1 until x).exists(j => fn(i, j)))
}
Since all you are checking for is existence, you can just compose 2 uses of exists:
(0 until x).exists(i => (i + 1 until x).exists(fn(i, _)))
More generally, if you are concerned with more than just determining whether a certain element exists, you can convert your comprehension to a series of Streams, Iterators, or views; exists will then evaluate lazily, avoiding unnecessary executions of the loop:
def foo(x: Int, fn: (Int, Int) => Boolean): Boolean = {
  (for {
    i <- (0 until x).iterator
    j <- (i + 1 until x).iterator
  } yield (i, j)).exists(fn.tupled)
}
You can also use map and flatMap instead of a for, and toStream or view instead of iterator:
(0 until x).view.flatMap(i => (i + 1 until x).toStream.map(j => i -> j)).exists(fn.tupled)
You can also use view on any collection to get a collection where all the transformers are performed lazily. This is possibly the most idiomatic way to short-circuit a collection traversal. From the docs on views:
Scala collections are by default strict in all their transformers, except for Stream, which implements all its transformer methods lazily. However, there is a systematic way to turn every collection into a lazy one and vice versa, which is based on collection views. A view is a special kind of collection that represents some base collection, but implements all transformers lazily.
As far as overhead is concerned, it really depends on the specifics! Different collections have different implementations of view, toStream, and iterator that may vary in the amount of overhead. If fn is very expensive to compute, this overhead is probably worth it, and keeping a consistent, idiomatic, functional style makes your code more maintainable, debuggable, and readable. If you are in a situation that calls for extreme optimization, you may want to fall back to lower-level constructs like return (which isn't without its own overhead!).

Scala: Thread safe mutable lazy Iterator with append

For an immutable flavour, Iterator does the job.
val x = Iterator.fill(100000)(someFn)
Now I want to implement a mutable version of Iterator, with three guarantees:
thread-safe on all transformations (fold, foldLeft, ...) and append
lazily evaluated
traversable only once! Once used, an object from this Iterator should be destroyed.
Is there an existing implementation to give me these guarantees? Any library or framework example would be great.
Update
To illustrate the desired behaviour.
class SomeThing {}
class Test(val list: Iterator[SomeThing] = Iterator.empty) {
  def add(thing: SomeThing): Test = {
    new Test(list ++ Iterator(thing))
  }
}
(new Test()).add(new SomeThing).add(new SomeThing)
In this example, SomeThing is an expensive construct; it needs to be lazy.
Re-iterating over the list is never required, so Iterator is a good fit.
This is supposed to asynchronously and lazily sequence 10 million SomeThing instances without depleting the executor (a cached thread pool executor) or running out of memory.
You don't need a mutable Iterator for this, just daisy-chain the immutable form:
class SomeThing {}
case class Test(list: Iterator[SomeThing] = Iterator.empty) {
  def add(thing: => SomeThing) = Test(list ++ Iterator(thing))
}
Test().add(new SomeThing).add(new SomeThing)
Although you don't really need the extra boilerplate of Test here:
Iterator(new SomeThing) ++ Iterator(new SomeThing)
Note that Iterator.++ takes a by-name param, so the ++ operation is already lazy.
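A quick sketch to check that laziness (the println is there only for demonstration):
val it = Iterator(1) ++ { println("building the tail"); Iterator(2) }
it.next() // returns 1; nothing has been printed yet
it.next() // prints "building the tail", then returns 2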
You might also want to try this, to avoid building intermediate Iterators:
Iterator.continually(new SomeThing) take 2
UPDATE
If you don't know the size in advance, then I'll often use a tactic like this:
def mkSomething = if(cond) Some(new Something) else None
Iterator.continually(mkSomething) takeWhile (_.isDefined) map { _.get }
The trick is to have your generator function wrap its output in an Option, which then gives you a way to flag that the iteration is finished by returning None.
Of course... If you're really pushing out the boat, you can even use the dreaded null:
def mkSomething = if(cond) { new Something } else null
Iterator.continually(mkSomething) takeWhile (_ != null)
Seems like you need to hide the fact that the iterator is mutable but at the same time allow it to grow mutably. What I'm going to propose is the same sort of trick I've used to speed up ::: in the past:
abstract class AppendableIterator[A] extends Iterator[A] {
  protected var inner: Iterator[A]
  def hasNext = inner.hasNext
  def next() = inner.next()
  def append(that: Iterator[A]) = synchronized {
    inner = new JoinedIterator(inner, that)
  }
}
// You might need to add some more things, this is a skeleton
class JoinedIterator[A](first: Iterator[A], second: Iterator[A]) extends Iterator[A] {
  def hasNext = first.hasNext || second.hasNext
  // if both are exhausted, second.next() throws NoSuchElementException as required
  def next() = if (first.hasNext) first.next() else second.next()
}
So what you're really doing is leaving the Iterator at whatever place in its iteration you might have it while still preserving the thread safety of the append by "joining" another Iterator in non-destructively. You avoid the need to recompute the two together because you never actually force them through a CanBuildFrom.
This is also a generalization of just adding one item. You can always wrap some A in an Iterator[A] of one element if you so choose.
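A hedged usage sketch, in which an anonymous subclass supplies the abstract inner field:
val it = new AppendableIterator[Int] {
  protected var inner: Iterator[Int] = Iterator(1, 2, 3)
}
it.append(Iterator(4, 5)) // safe to call even mid-iteration
it.foreach(println)       // prints 1 2 3 4 5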
Have you looked at the mutable.ParIterable in the collection.parallel package?
To access an iterator over elements you can do something like
val x = ParIterable.fill(100000)(someFn).iterator
From the docs:
Parallel operations are implemented with divide and conquer style algorithms that parallelize well. The basic idea is to split the collection into smaller parts until they are small enough to be operated on sequentially.
...
The higher-order functions passed to certain operations may contain side-effects. Since implementations of bulk operations may not be sequential, this means that side-effects may not be predictable and may produce data-races, deadlocks or invalidation of state if care is not taken. It is up to the programmer to either avoid using side-effects or to use some form of synchronization when accessing mutable data.

Ending a for-comprehension loop when a check on one of the items returns false

I am a bit new to Scala, so apologies if this is something a bit trivial.
I have a list of items which I want to iterate through. I want to execute a check on each of the items, and if just one of them fails I want the whole function to return false. So you can see this as an AND condition. I want it to be evaluated lazily, i.e. the moment I encounter the first false, return false.
I am used to the for-yield syntax, which filters items generated through some generator (a list of items, a sequence, etc.). In my case however I just want to break out and return false without executing the rest of the loop. In normal Java one would just do a return false; within the loop.
In an inefficient way (i.e. not stopping when I encounter the first failing item), I could do it like this:
(for {
  item <- items
  if !satisfiesCondition(item)
} yield item).isEmpty
This is essentially saying that if no items make it through the filter, all of them satisfy the condition. But it seems a bit convoluted and inefficient (consider having 1 million items where already the first one does not satisfy the condition).
What is the best and most elegant way to do this in Scala?
Stopping early at the first false for a condition is done using forall in Scala.
Your solution rewritten:
items.forall(satisfiesCondition)
To demonstrate short-circuiting:
List(1,2,3,4,5,6).forall { x => println(x); x < 3 }
1
2
3
res1: Boolean = false
The opposite of forall is exists which stops as soon as a condition is met:
List(1,2,3,4,5,6).exists{ x => println(x); x > 3 }
1
2
3
4
res2: Boolean = true
Scala's for comprehensions are not general iterations. That means they cannot produce every possible result that one can produce out of an iteration, as, for example, the very thing you want to do.
There are three things that a Scala for comprehension can do, when you are returning a value (that is, using yield). In the most basic case, it can do this:
Given an object of type M[A], and a function A => B (that is, which returns an object of type B when given an object of type A), return an object of type M[B];
For example, given a sequence of characters, Seq[Char], get the UTF-16 integer for each character:
val codes = for (char <- "A String") yield char.toInt
The expression char.toInt converts a Char into an Int, so the String, which is implicitly converted into a Seq[Char] in Scala, becomes a Seq[Int] (actually an IndexedSeq[Int], through some Scala collection magic).
The second thing it can do is this:
Given objects of type M[A], M[B], M[C], etc, and a function of A, B, C, etc into D, return an object of type M[D];
You can think of this as a generalization of the previous transformation, though not everything that could support the previous transformation can necessarily support this transformation. For example, we could produce coordinates for all coordinates of a battleship game like this:
val coords = for {
  column <- 'A' to 'L'
  row <- 1 to 10
} yield s"$column$row"
In this case, we have objects of the types Seq[Char] and Seq[Int], and a function (Char, Int) => String, so we get back a Seq[String].
The third, and final, thing a for comprehension can do is this:
Given an object of type M[A], such that the type M[T] has a zero value for any type T, a function A => B, and a condition A => Boolean, return either the zero or an object of type M[B], depending on the condition;
This one is harder to understand, though it may look simple at first. Let's look at something that looks simple first, say, finding all vowels in a sequence of characters:
def vowels(s: String) = for {
  letter <- s
  if Set('a', 'e', 'i', 'o', 'u') contains letter.toLower
} yield letter.toLower
val aStringVowels = vowels("A String")
It looks simple: we have a condition, we have a function Char => Char, and we get a result, and there doesn't seem to be any need for a "zero" of any kind. In this case, the zero would be the empty sequence, but it hardly seems worth mentioning it.
To explain it better, I'll switch from Seq to Option. An Option[A] has two sub-types: Some[A] and None. The zero, evidently, is the None. It is used when you need to represent the possible absence of a value, or the value itself.
Now, let's say we have a web server where users who are logged in and are administrators get extra javascript on their web pages for administration tasks (like wordpress does). First, we need to get the user, if there's a user logged in, let's say this is done by this method:
def getUser(req: HttpRequest): Option[User]
If the user is not logged in, we get None, otherwise we get Some(user), where user is the data structure with information about the user that made the request. We can then model that operation like this:
def adminJs(req: HttpRequest): Option[String] = for {
  user <- getUser(req)
  if user.isAdmin
} yield adminScriptForUser(user)
Here it is easier to see the point of the zero. When the condition is false, adminScriptForUser(user) cannot be executed, so the for comprehension needs something to return instead, and that something is the "zero": None.
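For what it's worth, the comprehension above desugars into roughly the following chain; a guard in a for comprehension becomes a withFilter call, so a None simply flows through the map (a sketch, with the same assumed getUser and adminScriptForUser as above):
def adminJs(req: HttpRequest): Option[String] =
  getUser(req).withFilter(_.isAdmin).map(adminScriptForUser)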
In technical terms, Scala's for comprehensions provide syntactic sugar for operations on monads, with an extra operation for monads with zero (see list comprehensions in the same article).
What you actually want to accomplish is called a catamorphism, usually represented as a fold method, which can be thought of as a function M[A] => B. You can write it with fold, foldLeft or foldRight on a sequence, but none of them would actually short-circuit the iteration.
Short-circuiting arises naturally out of non-strict evaluation, which is the default in Haskell, in which most of these papers are written. Scala, as most other languages, is by default strict.
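To make that concrete, here is a minimal sketch of a hand-rolled non-strict foldRight (not stack safe, purely illustrative), under which exists falls out as a short-circuiting fold:
// The `rest` parameter is by-name: if `op` never forces it,
// the recursion stops early.
def foldRightLazy[A, B](xs: List[A])(z: B)(op: (A, => B) => B): B =
  xs match {
    case Nil    => z
    case h :: t => op(h, foldRightLazy(t)(z)(op))
  }

// Short-circuits at the first element satisfying p, never touching the tail:
def existsLazy[A](xs: List[A])(p: A => Boolean): Boolean =
  foldRightLazy(xs)(false)((a, rest) => p(a) || rest)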
There are three solutions to your problem:
Use the special methods forall or exists, which target your precise use case, though they don't solve the generic problem;
Use a non-strict collection; there's Scala's Stream, but it has problems that prevent its effective use. The Scalaz library can help you there;
Use an early return, which is how the Scala library solves this problem in the general case (in specific cases, it uses better optimizations).
As an example of the third option, you could write this:
def hasEven(xs: List[Int]): Boolean = {
  for (x <- xs) if (x % 2 == 0) return true
  false
}
Note as well that this is called a "for loop", not a "for comprehension", because it doesn't return a value (well, it returns Unit), since it doesn't have the yield keyword.
You can read more about real generic iteration in the article The Essence of The Iterator Pattern, which is a Scala experiment with the concepts described in the paper by the same name.
forall is definitely the best choice for the specific scenario but for illustration here's good old recursion:
@annotation.tailrec
def hasEven(xs: List[Int]): Boolean = xs match {
  case head :: _ if head % 2 == 0 => true
  case Nil => false
  case _ :: tail => hasEven(tail)
}
I tend to use recursion a lot for loops w/short circuit use cases that don't involve collections.
UPDATE:
DO NOT USE THE CODE IN MY ANSWER BELOW!
Shortly after I posted the answer below (after misinterpreting the original poster's question), I have discovered a way superior generic answer (to the listing of requirements below) here: https://stackoverflow.com/a/60177908/501113
It appears you have several requirements:
Iterate through a (possibly large) list of items doing some (possibly expensive) work
The work done to an item could return an error
At the first item that returns an error, short circuit the iteration, throw away the work already done, and return the item's error
A for comprehension isn't designed for this (as is detailed in the other answers).
And I was unable to find a pre-built Scala collections iterator that provided the requirements above.
While the code below is based on a contrived example (transforming a String of digits into a BigInt), it is the general pattern I prefer to use; i.e. process a collection and transform it into something else.
def getDigits(shouldOnlyBeDigits: String): Either[IllegalArgumentException, BigInt] = {
  @scala.annotation.tailrec
  def recursive(
      charactersRemaining: String = shouldOnlyBeDigits
    , accumulator: List[Int] = Nil
  ): Either[IllegalArgumentException, List[Int]] =
    if (charactersRemaining.isEmpty)
      Right(accumulator) // All work completed without error
    else {
      val item = charactersRemaining.head
      val isSuccess = item.isDigit // Work the item
      if (isSuccess)
        // This item's work completed without error, so keep iterating
        recursive(charactersRemaining.tail, (item - 48) :: accumulator)
      else
        // This item hit an error, so short circuit
        Left(new IllegalArgumentException(s"item [$item] is not a digit"))
    }
  recursive().map(digits => BigInt(digits.reverse.mkString))
}
When it is called as getDigits("1234") in a REPL (or Scala Worksheet), it returns:
val res0: Either[IllegalArgumentException,BigInt] = Right(1234)
And when called as getDigits("12A34") in a REPL (or Scala Worksheet), it returns:
val res1: Either[IllegalArgumentException,BigInt] = Left(java.lang.IllegalArgumentException: item [A] is not a digit)
You can play with this in Scastie here:
https://scastie.scala-lang.org/7ddVynRITIOqUflQybfXUA

A grasp of immutable datastructures

I am learning Scala, and as a good student I try to obey all the rules I have found.
One rule is: IMMUTABILITY!!!
So I have tried to code everything with immutable data structures and vals, and sometimes this is really hard.
But today I thought to myself: the only important thing is that the object/class should have no mutable state. I am not forced to code all methods in an immutable style, because these methods don't affect each other.
My question: Am I correct, or are there any problems/disadvantages I don't see?
EDIT:
Code example for aishwarya:
def logLikelihood(seq: Iterator[T]): Double = {
  val sequence = seq.toList
  val stateSequence = (0 to order).toList.padTo(sequence.length, order)
  val seqPos = sequence.zipWithIndex
  def probOfSymbAtPos(symb: T, pos: Int): Double = {
    val state = states(stateSequence(pos))
    M.log(state(seqPos.map(_._1).slice(0, pos).takeRight(order), symb))
  }
  val probs = seqPos.map(i => probOfSymbAtPos(i._1, i._2))
  probs.sum
}
Explanation: It is a method to calculate the log-likelihood of a homogeneous Markov model of variable order. The apply method of state takes all previous symbols and the coming symbol and returns the probability of doing so.
As you may see: the whole method is just multiplying some probabilities which would be much easier using vars.
The rule is not really immutability, but referential transparency. It's perfectly OK to use locally declared mutable variables and arrays, because none of the effects are observable to any other parts of the overall program.
The principle of referential transparency (RT) is this:
An expression e is referentially transparent if for all programs p every occurrence of e in p can be replaced with the result of evaluating e, without affecting the observable result of p.
Note that if e creates and mutates some local state, it doesn't violate RT since nobody can observe this happening.
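For example (a minimal sketch), the following function mutates local variables internally, yet it is referentially transparent, because no caller can ever observe the mutation:
def sumSquares(n: Int): Int = {
  var acc = 0 // local mutable state, invisible from the outside
  var i = 1
  while (i <= n) {
    acc += i * i
    i += 1
  }
  acc
}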
That said, I very much doubt that your implementation is any more straightforward with vars.
The case for functional programming is one of being concise in your code and bringing in a more mathematical approach. It can reduce the possibility of bugs and make your code smaller and more readable. As for being easier or not, it does require that you think about your problems differently. But once you get used to thinking with functional patterns, it's likely that functional will become easier than the more imperative style.
It is really hard to be perfectly functional and have zero mutable state, but very beneficial to have minimal mutable state. The thing to remember is that everything needs to be done in balance and not to the extreme. By reducing the amount of mutable state you make it much harder to write code with unintended consequences. A common pattern is to have a mutable variable whose value is immutable. This way identity (the named variable) and value (an immutable object the variable can be assigned) are separate.
var acc: List[Int] = Nil
// lots of complex stuff that adds values
acc ::= 1
acc ::= 2
acc ::= 3
// loop over the current list
acc foreach { i => /* do stuff that mutates acc */ acc ::= i * 10 }
println(acc) // List(10, 20, 30, 3, 2, 1)
The foreach is looping over the value of acc at the time we started the foreach. Any mutations to acc do not affect the loop. This is much safer than the typical iterators in Java, where the list can change mid-iteration.
There is also a concurrency concern. Immutable objects are useful because of the JSR-133 memory model specification which asserts that the initialization of an objects final members will occur before any thread can have visibility to those members, period! If they are not final then they are "mutable" and there is no guarantee of proper initialization.
Actors are the perfect place to put mutable state. Objects that represent data should be immutable. Take the following example.
object MyActor extends Actor {
  var acc: List[Int] = Nil
  def act() {
    loop {
      react {
        case i: Int => acc ::= i
        case "what is your current value" => reply(acc)
        case _ => // ignore all other messages
      }
    }
  }
}
In this case we can send the value of acc (which is a List) and not worry about synchronization, because List is immutable, i.e. all of the members of the List object are final. Also, because of the immutability, we know that no other actor can change the underlying data structure that was sent, and thus no other actor can change the mutable state of this actor.
Since Apocalisp has already mentioned the stuff I was going to quote him on, I'll discuss the code. You say it is just multiplying stuff, but I don't see that -- it makes reference to at least three important things defined outside: order, states and M.log. I can infer that order is an Int, and that states returns a function that takes a List[T] and a T and returns a Double.
There's also some weird stuff going on...
def logLikelihood(seq: Iterator[T]): Double = {
val sequence = seq.toList
sequence is never used except to define seqPos, so why do that?
val stateSequence = (0 to order).toList.padTo(sequence.length,order)
val seqPos = sequence.zipWithIndex
def probOfSymbAtPos(symb: T, pos: Int) : Double = {
val state = states(stateSequence(pos))
M.log(state( seqPos.map( _._1 ).slice(0, pos).takeRight(order), symb))
Actually, you could use sequence here instead of seqPos.map( _._1 ), since all that does is undo the zipWithIndex. Also, slice(0, pos) is just take(pos).
}
val probs = seqPos.map( i => probOfSymbAtPos(i._1,i._2) )
probs.sum
}
Now, given the missing methods, it is difficult to assert how this should really be written in functional style. Keeping the mystery methods would yield:
def logLikelihood(seq: Iterator[T]): Double = {
  import scala.collection.immutable.Queue
  case class State(index: Int, order: Int, slice: Queue[T], result: Double)
  seq.foldLeft(State(0, 0, Queue.empty, 0.0)) {
    case (State(index, ord, slice, result), symb) =>
      val state = states(ord)
      val partial = M.log(state(slice, symb))
      val newSlice = slice enqueue symb
      State(index + 1,
            if (ord == order) ord else ord + 1,
            if (newSlice.size > order) newSlice.dequeue._2 else newSlice,
            result + partial)
  }.result
}
Only I suspect the state/M.log stuff could be made part of State as well. I notice other optimizations now that I have written it like this. The sliding window you are using reminds me, of course, of sliding:
seq.sliding(order).zipWithIndex.map {
  case (slice, index) => M.log(states(index + order)(slice.init, slice.last))
}.sum
That will only start at the orderth element, so some adaptation would be in order. Not too difficult, though. So let's rewrite it again:
def logLikelihood(seq: Iterator[T]): Double = {
  val sequence = seq.toList
  val slices = (1 until order).map(sequence.take).toList ::: sequence.sliding(order).toList
  slices.zipWithIndex.map {
    case (slice, index) => M.log(states(index)(slice.init, slice.last))
  }.sum
}
I wish I could see M.log and states... I bet I could turn that map into a foldLeft and do away with these two methods. And I suspect the method returned by states could take the whole slice instead of two parameters.
Still... not bad, is it?