Here is an imperative solution:
def longestCommonSubstring(a: String, b: String) : String = {
def loop(m: Map[(Int, Int), Int], bestIndices: List[Int], i: Int, j: Int) : String = {
if (i > a.length) {
b.substring(bestIndices(1) - m((bestIndices(0),bestIndices(1))), bestIndices(1))
} else if (i == 0 || j == 0) {
loop(m + ((i,j) -> 0), bestIndices, if(j == b.length) i + 1 else i, if(j == b.length) 0 else j + 1)
} else if (a(i-1) == b(j-1) && math.max(m((bestIndices(0),bestIndices(1))), m((i-1,j-1)) + 1) == (m((i-1,j-1)) + 1)) {
loop(
m + ((i,j) -> (m((i-1,j-1)) + 1)),
List(i, j),
if(j == b.length) i + 1 else i,
if(j == b.length) 0 else j + 1
)
} else {
loop(m + ((i,j) -> 0), bestIndices, if(j == b.length) i + 1 else i, if(j == b.length) 0 else j + 1)
}
}
loop(Map[(Int, Int), Int](), List(0, 0), 0, 0)
}
I am looking for a more compact and functional way to find the Longest Common Substring.
def getAllSubstrings(str: String): Set[String] = {
str.inits.flatMap(_.tails).toSet
}
def longestCommonSubstring(str1: String, str2: String): String = {
val str1Substrings = getAllSubstrings(str1)
val str2Substrings = getAllSubstrings(str2)
str1Substrings.intersect(str2Substrings).maxBy(_.length)
}
First get all possible substrings (taken from here) in a set (to remove duplicates) for both strings and then intersect those sets and find the longest of the common substrings.
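To make the enumeration step concrete, here is a small sketch of what inits.flatMap(_.tails) yields for a short input. The helper name allSubstrings and the sample strings are only illustrative. A string of length n has on the order of n² substrings, so this approach trades efficiency for brevity.

```scala
// Illustrative helper, same idea as getAllSubstrings above.
def allSubstrings(str: String): Set[String] =
  str.inits.flatMap(_.tails).toSet

// "abc" has six non-empty substrings plus the empty string.
val subs = allSubstrings("abc")
// subs == Set("abc", "ab", "a", "bc", "b", "c", "")

// Intersecting two such sets and taking the longest element
// gives a longest common substring.
val common =
  allSubstrings("abcde").intersect(allSubstrings("xbcdy")).maxBy(_.length)
// common == "bcd"
```

Because the empty string is always in both sets, maxBy never sees an empty collection here.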
The code you have is already functional and not that complex. It also has asymptotically better time efficiency than the other currently posted solutions.
I'd just simplify it, clean up a bit and fix the bug:
def longestCommonSubstring(a: String, b: String) = {
def loop(bestLengths: Map[(Int, Int), Int], bestIndices: (Int, Int), i: Int, j: Int): String = {
if (i > a.length) {
val bestJ = bestIndices._2
b.substring(bestJ - bestLengths(bestIndices), bestJ)
} else {
val currentLength = if (a(i-1) == b(j-1)) bestLengths((i-1, j-1)) + 1 else 0
loop(
if (currentLength != 0) bestLengths + ((i, j) -> currentLength) else bestLengths,
if (currentLength > bestLengths(bestIndices)) (i, j) else bestIndices,
if (j == b.length) i + 1 else i,
if (j == b.length) 1 else j + 1)
}
}
if (b.isEmpty) ""
else loop(Map.empty[(Int, Int), Int].withDefaultValue(0), (0, 0), 1, 1)
}
Sidenote: This is my third answer to this question because StackOverflow policy does not allow fundamentally replacing the content of a prior answer. Thanks to feedback from @Kolmar, this new answer is much more performant than my prior answer.
Many hours have been invested in finding the optimal solution strategy for the LCS (Longest Common Substring) problem. For the more general computer science problem and the optimal strategy, please review this Wikipedia article. Further down in that article is some pseudocode describing an implementation strategy.
Based on the Wikipedia article pseudocode, I will present several different solutions. The purpose is to allow one to copy/paste the specific solution which is needed without having to do much refactoring:
LCSubstr: Translation into Scala of the Wikipedia article pseudocode which uses an imperative mutable style
LCSubstrFp: Refactoring of LCSubstr into the idiomatic Scala functional immutable style
longestCommonSubstrings: Refactoring of LCSubstrFp to use descriptive names (such as left & right instead of s and t) and to skip storing zero lengths in the Map
longestCommonSubstringsFast: Refactoring of longestCommonSubstrings to deeply optimize for both CPU and memory
longestCommonSubstringsWithIndexes: Refactoring of longestCommonSubstringsFast to enhance the return value by expanding each entry into a tuple of (String, (Int, Int)) which includes both the found substring and the index within each input String at which the substring was found (CAUTION: This creates a combinatorial expansion of the index pairs if the same String appears more than once)
firstLongestCommonSubstring: An efficiency-focused version of longestCommonSubstringsFast which can terminate early when only the first LCS is needed and the others of the same size can be ignored.
BONUS: longestCommonSubstringsUltimate: Refactoring of longestCommonSubstringsFast to add internal implementation mutability while externally retaining the function's referential transparency.
The more direct answer to the OP's request would fall somewhere between LCSubstrFp and longestCommonSubstringsFast. LCSubstrFp is the most direct, but fairly inefficient. longestCommonSubstringsFast is substantially more efficient, as it ends up using far less CPU and GC. And if internal mutability contained and constrained within a function's implementation is acceptable, then longestCommonSubstringsUltimate is by far the version with the smallest CPU burden and memory footprint.
LCSubstr
Translation into Scala of the Wikipedia article pseudocode which uses an imperative mutable style.
The intention is to come as close as Scala allows to a one-to-one implementation. For example, Scala uses zero-based indexing for String where the pseudocode clearly uses one-based indexing, which required several tweaks.
def LCSubstr(s: String, t: String): scala.collection.mutable.Set[String] =
if (s.nonEmpty && t.nonEmpty) {
val l: scala.collection.mutable.Map[(Int, Int), Int] = scala.collection.mutable.Map.empty
var z: Int = 0
var ret: scala.collection.mutable.Set[String] = scala.collection.mutable.Set.empty
(0 until s.length).foreach {
i =>
(0 until t.length).foreach {
j =>
if (s(i) == t(j)) {
if ((i == 0) || (j == 0))
l += ((i, j) -> 1)
else
l += ((i, j) -> (l((i - 1, j - 1)) + 1))
if (l((i, j)) > z) {
z = l((i, j))
ret = scala.collection.mutable.Set(s.substring(i - z + 1, i + 1))
}
else
if (l((i, j)) == z)
ret += s.substring(i - z + 1, i + 1)
} else
l += ((i, j) -> 0)
}
}
ret
}
else scala.collection.mutable.Set.empty
LCSubstrFp
Refactoring of LCSubstr into the idiomatic Scala functional immutable style.
All imperative and mutating code has been replaced with functional and immutable counterparts. The two for loops have been replaced with recursion.
def LCSubstrFp(s: String, t: String): Set[String] =
if (s.nonEmpty && t.nonEmpty) {
@scala.annotation.tailrec
def recursive(
i: Int = 0,
j: Int = 0,
z: Int = 0,
l: Map[(Int, Int), Int] = Map.empty,
ret: Set[String] = Set.empty
): Set[String] =
if (i < s.length) {
val (newI, newJ) =
if (j < t.length - 1) (i, j + 1)
else (i + 1, 0)
val lij =
if (s(i) != t(j)) 0
else
if ((i == 0) || (j == 0)) 1
else l((i - 1, j - 1)) + 1
recursive(
newI,
newJ,
if (lij > z) lij
else z,
l + ((i, j) -> lij),
if (lij > z) Set(s.substring(i - lij + 1, i + 1))
else
if ((lij == z) && (z > 0)) ret + s.substring(i - lij + 1, i + 1)
else ret
)
}
else ret
recursive()
}
else Set.empty
longestCommonSubstrings
Refactoring of LCSubstrFp to use descriptive names (such as left & right instead of s and t) and to skip storing zero lengths in the Map.
Besides enhancing readability, this refactoring dispenses with storing zero-length values in lengthByIndexLongerAndIndexShorter substantially reducing the amount of "memory churn". Again in a tweak towards the functional style, the return value has also been enhanced to never return an empty Set by wrapping the Set in an Option. If the value returned is a Some, the contained Set will always contain at least one item.
def longestCommonSubstrings(left: String, right: String): Option[Set[String]] =
if (left.nonEmpty && right.nonEmpty) {
val (shorter, longer) =
if (left.length < right.length) (left, right)
else (right, left)
@scala.annotation.tailrec
def recursive(
indexLonger: Int = 0,
indexShorter: Int = 0,
currentLongestLength: Int = 0,
lengthByIndexLongerAndIndexShorter: Map[(Int, Int), Int] = Map.empty,
accumulator: List[Int] = Nil
): (Int, List[Int]) =
if (indexLonger < longer.length) {
val length =
if (longer(indexLonger) != shorter(indexShorter)) 0
else
if ((indexShorter == 0) || (indexLonger == 0)) 1
else lengthByIndexLongerAndIndexShorter.getOrElse((indexLonger - 1, indexShorter - 1), 0) + 1
val newCurrentLongestLength =
if (length > currentLongestLength) length
else currentLongestLength
val newLengthByIndexLongerAndIndexShorter =
if (length > 0) lengthByIndexLongerAndIndexShorter + ((indexLonger, indexShorter) -> length)
else lengthByIndexLongerAndIndexShorter
val newAccumulator =
if ((length < currentLongestLength) || (length == 0)) accumulator
else {
val entry = indexShorter - length + 1
if (length > currentLongestLength) List(entry)
else entry :: accumulator
}
if (indexShorter < shorter.length - 1)
recursive(
indexLonger,
indexShorter + 1,
newCurrentLongestLength,
newLengthByIndexLongerAndIndexShorter,
newAccumulator
)
else
recursive(
indexLonger + 1,
0,
newCurrentLongestLength,
newLengthByIndexLongerAndIndexShorter,
newAccumulator
)
}
else (currentLongestLength, accumulator)
val (length, indexShorters) = recursive()
if (indexShorters.nonEmpty)
Some(
indexShorters
.map {
indexShorter =>
shorter.substring(indexShorter, indexShorter + length)
}
.toSet
)
else None
}
else None
longestCommonSubstringsFast
Refactoring of longestCommonSubstrings to deeply optimize for both CPU and memory.
Eliminating every ounce of inefficiency while staying functional and immutable, execution speed is enhanced by several times over longestCommonSubstrings. The bulk of the cost reductions were achieved by replacing the Map of the entire matrix with a pair of Lists tracking only the current and prior rows.
To easily see the differences from longestCommonSubstrings, please view this visual diff.
def longestCommonSubstringsFast(left: String, right: String): Option[Set[String]] =
if (left.nonEmpty && right.nonEmpty) {
val (shorter, longer) =
if (left.length < right.length) (left, right)
else (right, left)
@scala.annotation.tailrec
def recursive(
indexLonger: Int = 0,
indexShorter: Int = 0,
currentLongestLength: Int = 0,
lengthsPrior: List[Int] = List.fill(shorter.length)(0),
lengths: List[Int] = Nil,
accumulator: List[Int] = Nil
): (Int, List[Int]) =
if (indexLonger < longer.length) {
val length =
if (longer(indexLonger) != shorter(indexShorter)) 0
else lengthsPrior.head + 1
val newCurrentLongestLength =
if (length > currentLongestLength) length
else currentLongestLength
val newAccumulator =
if ((length < currentLongestLength) || (length == 0)) accumulator
else {
val entry = indexShorter - length + 1
if (length > currentLongestLength) List(entry)
else entry :: accumulator
}
if (indexShorter < shorter.length - 1)
recursive(
indexLonger,
indexShorter + 1,
newCurrentLongestLength,
lengthsPrior.tail,
length :: lengths,
newAccumulator
)
else
recursive(
indexLonger + 1,
0,
newCurrentLongestLength,
0 :: lengths.reverse,
Nil,
newAccumulator
)
}
else (currentLongestLength, accumulator)
val (length, indexShorters) = recursive()
if (indexShorters.nonEmpty)
Some(
indexShorters
.map {
indexShorter =>
shorter.substring(indexShorter, indexShorter + length)
}
.toSet
)
else None
}
else None
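The idea of tracking only the current and prior rows, as described for longestCommonSubstringsFast above, can be sketched in a much smaller form. This is not the author's implementation: the helper name lcsLength is hypothetical, it returns only the length rather than the substrings, and it uses Vector and foldLeft instead of a pair of Lists with explicit recursion.

```scala
// Hypothetical helper: length of the longest common substring only.
def lcsLength(left: String, right: String): Int = {
  // One row of the DP table. row(j + 1) holds the length of the common
  // suffix ending at the current char of `left` and at right(j);
  // row(0) is a padding zero so the diagonal lookup never goes out of range.
  val emptyRow = Vector.fill(right.length + 1)(0)
  val (_, best) = left.foldLeft((emptyRow, 0)) {
    case ((prior, bestSoFar), ch) =>
      val current = 0 +: right.indices.toVector.map { j =>
        if (right(j) == ch) prior(j) + 1 else 0 // prior(j) is the diagonal cell
      }
      (current, math.max(bestSoFar, current.max))
  }
  best
}

val len = lcsLength("abcde", "xbcdy")
// len == 3 (the common substring is "bcd")
```

Only two rows are alive at any time, so memory is proportional to the length of right rather than to the product of both lengths.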
longestCommonSubstringsWithIndexes
Refactoring of longestCommonSubstringsFast to enhance the return value by expanding each entry into a tuple of (String, (Int, Int)) which includes both the found substring and the index within each input String at which the substring was found.
CAUTION: This creates a combinatorial expansion of the index pairs if the same String appears more than once.
Again in a tweak towards the functional style, the return value has also been enhanced to never return an empty List by wrapping the List in an Option. If the value returned is a Some, the contained List will always contain at least one item.
def longestCommonSubstringsWithIndexes(left: String, right: String): Option[List[(String, (Int, Int))]] =
if (left.nonEmpty && right.nonEmpty) {
val isLeftShorter = left.length < right.length
val (shorter, longer) =
if (isLeftShorter) (left, right)
else (right, left)
@scala.annotation.tailrec
def recursive(
indexLonger: Int = 0,
indexShorter: Int = 0,
currentLongestLength: Int = 0,
lengthsPrior: List[Int] = List.fill(shorter.length)(0),
lengths: List[Int] = Nil,
accumulator: List[(Int, Int)] = Nil
): (Int, List[(Int, Int)]) =
if (indexLonger < longer.length) {
val length =
if (longer(indexLonger) != shorter(indexShorter)) 0
else lengthsPrior.head + 1
val newCurrentLongestLength =
if (length > currentLongestLength) length
else currentLongestLength
val newAccumulator =
if ((length < currentLongestLength) || (length == 0)) accumulator
else {
val entry = (indexLonger - length + 1, indexShorter - length + 1)
if (length > currentLongestLength) List(entry)
else entry :: accumulator
}
if (indexShorter < shorter.length - 1)
recursive(
indexLonger,
indexShorter + 1,
newCurrentLongestLength,
lengthsPrior.tail,
length :: lengths,
newAccumulator
)
else
recursive(
indexLonger + 1,
0,
newCurrentLongestLength,
0 :: lengths.reverse,
Nil,
newAccumulator
)
}
else (currentLongestLength, accumulator)
val (length, indexPairs) = recursive()
if (indexPairs.nonEmpty)
Some(
indexPairs
.reverse
.map {
indexPair =>
(
longer.substring(indexPair._1, indexPair._1 + length),
if (isLeftShorter) indexPair.swap else indexPair
)
}
)
else None
}
else None
firstLongestCommonSubstring
An efficiency-focused version of longestCommonSubstringsFast which can terminate early when only the first LCS is needed and the others of the same size can be ignored.
def firstLongestCommonSubstring(left: String, right: String): Option[(String, (Int, Int))] =
if (left.nonEmpty && right.nonEmpty) {
val isLeftShorter = left.length < right.length
val (shorter, longer) =
if (isLeftShorter) (left, right)
else (right, left)
@scala.annotation.tailrec
def recursive(
indexLonger: Int = 0,
indexShorter: Int = 0,
currentLongestLength: Int = 0,
lengthsPrior: List[Int] = List.fill(shorter.length)(0),
lengths: List[Int] = Nil,
accumulator: Option[(Int, Int)] = None
): Option[(Int, (Int, Int))] =
if (indexLonger < longer.length) {
val length =
if (longer(indexLonger) != shorter(indexShorter)) 0
else lengthsPrior.head + 1
val newAccumulator =
if (length > currentLongestLength) Some((indexLonger - length + 1, indexShorter - length + 1))
else accumulator
if (length < shorter.length) {
val newCurrentLongestLength =
if (length > currentLongestLength) length
else currentLongestLength
if (indexShorter < shorter.length - 1)
recursive(
indexLonger,
indexShorter + 1,
newCurrentLongestLength,
lengthsPrior.tail,
length :: lengths,
newAccumulator
)
else
recursive(
indexLonger + 1,
0,
newCurrentLongestLength,
0 :: lengths.reverse,
Nil,
newAccumulator
)
}
else
recursive(longer.length, 0, length, lengthsPrior, lengths, newAccumulator) //early terminate
}
else accumulator.map((currentLongestLength, _))
recursive().map {
case (length, indexPair) =>
(
longer.substring(indexPair._1, indexPair._1 + length),
if (isLeftShorter) indexPair.swap else indexPair
)
}
}
else None
BONUS:
longestCommonSubstringsUltimate
Refactoring of longestCommonSubstringsFast to add internal implementation mutability while externally retaining the function's referential transparency.
Further eliminating inefficiency while staying functional and referentially transparent (by using mutability within the implementation itself, which some consider not validly functional), execution speed is enhanced by almost three times over longestCommonSubstringsFast. The bulk of the cost reductions came from replacing the pair of Lists with a single Array.
To easily see the differences from longestCommonSubstringsFast, please view this visual diff.
def longestCommonSubstringsUltimate(left: String, right: String): Option[Set[String]] =
if (left.nonEmpty && right.nonEmpty) {
val (shorter, longer) =
if (left.length < right.length) (left, right)
else (right, left)
val lengths: Array[Int] = new Array(shorter.length) //mutable
@scala.annotation.tailrec
def recursive(
indexLonger: Int = 0,
indexShorter: Int = 0,
currentLongestLength: Int = 0,
lastIterationLength: Int = 0,
accumulator: List[Int] = Nil
): (Int, List[Int]) =
if (indexLonger < longer.length) {
val length =
if (longer(indexLonger) != shorter(indexShorter)) 0
else
if (indexShorter == 0) 1
else lastIterationLength + 1
val newLastIterationLength = lengths(indexShorter)
lengths(indexShorter) = length //mutation
val newCurrentLongestLength =
if (length > currentLongestLength) length
else currentLongestLength
val newAccumulator =
if ((length < currentLongestLength) || (length == 0)) accumulator
else {
val entry = indexShorter - length + 1
if (length > currentLongestLength) List(entry)
else entry :: accumulator
}
if (indexShorter < shorter.length - 1)
recursive(
indexLonger,
indexShorter + 1,
newCurrentLongestLength,
newLastIterationLength,
newAccumulator
)
else
recursive(
indexLonger + 1,
0,
newCurrentLongestLength,
newLastIterationLength,
newAccumulator
)
}
else (currentLongestLength, accumulator)
val (length, indexShorters) = recursive()
if (indexShorters.nonEmpty)
Some(
indexShorters
.map {
indexShorter =>
shorter.substring(indexShorter, indexShorter + length)
}
.toSet
)
else None
}
else None
UPDATE: DO NOT USE THE APPROACH DETAILED BELOW.
I should have paid more attention to the OP's expressly provided implementation. Unfortunately, I got distracted by all the other answers using inefficient String-oriented comparisons and went on a tear to provide my own optimized version of those, gleeful that I was able to use Stream and LazyList.
I've now added an additional answer (per StackOverflow's policy) which covers substantially faster Scala functional style solutions.
A Stream focused solution might be the following:
def substrings(a:String, len:Int): Stream[String] =
if(len==0)
Stream.empty
else
a.tails.toStream.takeWhile(_.size>=len).map(_.take(len)) #::: substrings(a, len-1)
def longestCommonSubstring(a:String, b:String) =
substrings(a, a.length).dropWhile(sub => !b.contains(sub)).headOption
Here the substrings method returns a Stream producing substrings of the original string in decreasing length; for example, "test" produces "test", "tes", "est", "te", "es", ...
The longestCommonSubstring method takes the first substring generated from a that is contained in string b.
UPDATE: After posting this answer, and thanks to @Kolmar's feedback, I discovered that a Char indexing strategy was substantially faster (at least an order of magnitude). I've now added an additional answer (per StackOverflow's policy) which covers substantially faster Scala functional style solutions.
I should have paid more attention to the OP's specifically provided implementation. Unfortunately, I got distracted by all the other answers using inefficient String-oriented comparisons and went on a tear to provide my own optimized version of those, gleeful that I was able to use Stream and LazyList.
Beyond the OP's request, I had several additional requirements for composing a solution to finding the longest common substring (LCS) between two String instances.
Solution Requirements:
Eagerly find the first LCS between two String instances
Minimize CPU effort by comparing fewer String instances
Minimize GC effort by producing fewer String instances
Maximize being Scala idiomatic, including use of the Scala Collections API
The first goal is to capture the general search strategy. The process starts with the left String instance, producing an ordered list of substrings from the longest (the original String instance itself) to the shortest (single characters). For example, if the left String instance contains "ABCDEF", the resulting list of String instances should be produced and in exactly this order:
[
ABCDEF,
ABCDE, BCDEF,
ABCD, BCDE, CDEF,
ABC, BCD, CDE, DEF,
AB, BC, CD, DE, EF,
A, B, C, D, E, F
]
Next, an iteration is started through this list of left substring instances, halting as soon as a particular left substring instance is found at any index within the right String instance. When a left substring instance is found, it is returned. Otherwise, an indication there were no matches found is returned.
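The ordering and the early halt described above can be sketched as follows. This is only an illustration: the helper name substringsLongestFirst and the sample strings are made up, and an Iterator is used here instead of the LazyList or Stream in the solutions, so that the sketch does not depend on the Scala version.

```scala
// Hypothetical helper: all non-empty substrings, longest first,
// left to right within each size.
def substringsLongestFirst(s: String): Iterator[String] =
  for {
    size   <- Iterator.range(s.length, 0, -1)
    offset <- Iterator.range(0, s.length - size + 1)
  } yield s.substring(offset, offset + size)

val ordered = substringsLongestFirst("ABCDEF").toList
// ordered.take(6) == List("ABCDEF", "ABCDE", "BCDEF", "ABCD", "BCDE", "CDEF")

// Halting at the first substring that occurs in the other string:
val firstMatch = substringsLongestFirst("ABCDEF").find("xxCDEyy".contains(_))
// firstMatch == Some("CDE")
```

Because the iterator is lazy, substrings after the first match are never created.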
There are two specific things to note about the "eager" approach to satisfying Solution Requirement #1:
The left substring instance found could appear at more than one index in the right String instance. This means searching for the left substring instance from the start of the right String instance using indexOf could result in a different index value than searching from the end of the right String instance using lastIndexOf.
There could be a second (different) left substring instance of the same length which also appears in the right String instance. This implementation ignores that possibility.
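Both caveats can be seen with a couple of small values. The strings below are made up for illustration.

```scala
// Caveat 1: the same substring can occur at several indexes.
val right = "abcXabc"
val found = "abc"
val fromStart = right.indexOf(found)     // 0
val fromEnd   = right.lastIndexOf(found) // 4

// Caveat 2: two different substrings of the same (maximal) length can both
// occur in the other string; an eager search reports only the first one.
val left2  = "abXcd"
val right2 = "cdYab"
val candidates = left2.sliding(2).filter(right2.contains(_)).toList
// candidates == List("ab", "cd")
```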
Solution for Scala 2.13/Dotty (a.k.a 3.0) - uses LazyList:
Stream was deprecated as of 2.13.
def longestCommonSubstring(left: String, right: String): Option[String] =
if (left.nonEmpty && right.nonEmpty) {
def substrings(string: String): LazyList[String] = {
def recursive(size: Int = string.length): LazyList[String] = {
if (size > 0) {
def ofSameLength: LazyList[String] =
(0 to (string.length - size))
.iterator.to(LazyList)
.map(offset => string.substring(offset, offset + size))
ofSameLength #::: recursive(size - 1)
}
else
LazyList.empty
}
recursive()
}
val (shorter, longer) =
if (left.length <= right.length)
(left, right)
else
(right, left)
substrings(shorter).find(longer.contains)
}
else
None
Solution for Scala 2.12 and prior - uses Stream:
def longestCommonSubstring(left: String, right: String): Option[String] =
if (left.nonEmpty && right.nonEmpty) {
def substrings(string: String): Stream[String] = {
def recursive(size: Int = string.length): Stream[String] = {
if (size > 0) {
def ofSameLength: Stream[String] =
(0 to (string.length - size))
.toStream
.map(offset => string.substring(offset, offset + size))
ofSameLength #::: recursive(size - 1)
}
else
Stream.empty
}
recursive()
}
val (shorter, longer) =
if (left.length <= right.length)
(left, right)
else
(right, left)
substrings(shorter).find(longer.contains)
}
else
None
Notes:
Providing a visual diff between the two versions to quickly see the delta
To cover the various edge cases (ex: providing an empty String as input), an Option[String] is used for the function's return type.
In satisfying Solution Requirements #2 and #3, the shorter of the two String instances is set as shorter to avoid instantiating and comparing substrings that are longer than the shorter String instance.
In satisfying Solution Requirements #2 and #3, the production of shorter substring instances is sub-batched by identical size (duplicates within a sub-batch could additionally be removed via distinct), and each sub-batch is appended lazily to a LazyList (or Stream). This ensures that only the shorter substring instances actually needed are instantiated and passed to the longer.contains function.
I think that code using a for comprehension looks very clear and functional.
def getSubstrings(s:String) =
for {
start <- 0 until s.size
end <- start to s.size
} yield s.substring(start, end)
def getLongest(one: String, two: String): Seq[String] =
getSubstrings(one).intersect(getSubstrings(two))
.groupBy(_.length).maxBy(_._1)._2
The final function returns a Seq[String] because the result might contain several substrings with the same maximum length.
How about this approach:
Get all substrings:
left.inits.flatMap(_.tails)
Sort them in reverse order based on length:
.toList.sortBy(_.length).reverse
Find the first match:
.find(right.contains(_)).get
Full function:
def lcs(left: String, right: String) = {
left.inits.flatMap(_.tails)
.toList.sortBy(_.length).reverse
.find(right.contains(_)).get
}
Note:
get will never throw here, since the generated substrings also include the empty string, which is contained in every string.
I have the following two code snippets in Scala:
/* Iterative */
for (i <- max to sum by min) {
if (sum % i == 0) validBusSize(i, L, 0)
}
/* Functional */
List.range(max, sum + 1, min)
.filter(sum % _ == 0)
.map(validBusSize(_, L, 0))
Both these code snippets are part of otherwise identical objects. However, when I run my code on Hackerrank, the object with the iterative snippet takes a maximum of 1.45 seconds, while the functional snippet causes the code to take > 7 seconds, which is a timeout.
I'd like to know if it's possible to rewrite the for loop functionally while retaining the speed. I took a look at the Stream container, but again I'll have to call filter before map, instead of computing each validBusSize sequentially.
Thanks!
Edit:
/* Full Code */
import scala.io.StdIn.readLine
object BusStation {
def main(args: Array[String]) {
readLine
val L = readLine.split(" ").map(_.toInt).toList
val min = L.min
val max = L.max
val sum = L.foldRight(0)(_ + _)
/* code under consideration */
for (i <- max to sum by min) {
if (sum % i == 0) validBusSize(i, L, 0)
}
}
def validBusSize(size: Int, L: List[Int], curr: Int) {
L match {
case Nil if (curr == size) => print(size + " ")
case head::tail if (curr < size) =>
validBusSize(size, tail, curr + head)
case head::tail if (curr == size) => validBusSize(size, tail, head)
case head::tail if (curr > size) => return
}
}
}
Right now, your best bet for fast functional code is tail-recursive functions:
@annotation.tailrec
def getBusSizes(i: Int, sum: Int, step: Int): Unit = {
if (i <= sum) {
if (sum % i == 0) validBusSize(i, L, 0)
getBusSizes(i + step, sum, step)
}
}
Various other things will be sort of fast-ish, but for something like this where there's mostly simple math, the overhead from the generic interface will be sizable. With a tail-recursive function you'll get a while loop underneath. (You don't need the annotation to make it tail-recursive; that just causes the compilation to fail if it can't. The optimization happens whether the annotation is there or not.)
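As a self-contained illustration of the same shape, here is a sketch that collects the matching sizes instead of calling validBusSize. The name divisorsFrom and the sample numbers are hypothetical; it assumes max and step are positive.

```scala
import scala.annotation.tailrec

// Hypothetical stand-in for the loop in the question: collects every i in
// max, max + step, ... (up to and including sum) that divides sum.
def divisorsFrom(max: Int, sum: Int, step: Int): List[Int] = {
  @tailrec
  def loop(i: Int, acc: List[Int]): List[Int] =
    if (i > sum) acc.reverse
    else loop(i + step, if (sum % i == 0) i :: acc else acc)
  loop(max, Nil)
}

val sizes = divisorsFrom(3, 12, 1)
// sizes == List(3, 4, 6, 12)
```

The recursive call is the last thing loop does, so the compiler can turn it into a jump; the annotation only makes compilation fail if that is not possible.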
So apparently the following worked:
Replacing the List.range(max, sum + 1, min) with a Range object, (max to sum by min). I'm going to ask another question about why this works, though.
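One plausible explanation, which is an assumption and not something measured here, is that List.range eagerly builds a list and that filter and map each build another one, whereas a Range only stores its start, end and step. A sketch that keeps the pipeline style but processes one element at a time uses an iterator. The function check is a hypothetical stand-in for validBusSize, and the numbers are made up.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for validBusSize: records the sizes it is called with.
val seen = ListBuffer.empty[Int]
def check(size: Int): Unit = { seen += size; () }

val (max, sum, min) = (3, 12, 1)

// The iterator filters and applies `check` element by element,
// so no intermediate collection is built between the steps.
(max to sum by min).iterator
  .filter(sum % _ == 0)
  .foreach(check)
// seen.toList == List(3, 4, 6, 12)
```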
Consider converting the range into a parallel version with the par method, for instance like this:
(max to sum by min).par
This may improve performance, especially for large ranges where each call to validBusSize is expensive.
Thus, in the proposed for comprehension:
for ( i <- (max to sum by min).par ) {
if (sum % i == 0) validBusSize(i, L, 0)
}
I was wondering if there is some general method to convert a "normal" recursion with foo(...) + foo(...) as the last call to a tail-recursion.
For example (scala):
def pascal(c: Int, r: Int): Int = {
if (c == 0 || c == r) 1
else pascal(c - 1, r - 1) + pascal(c, r - 1)
}
A general solution for functional languages to convert recursive function to a tail-call equivalent:
A simple way is to wrap the non tail-recursive function in the Trampoline monad.
def pascalM(c: Int, r: Int): Trampoline[Int] = {
if (c == 0 || c == r) Trampoline.done(1)
else for {
a <- Trampoline.suspend(pascalM(c - 1, r - 1))
b <- Trampoline.suspend(pascalM(c, r - 1))
} yield a + b
}
val pascal = pascalM(5, 10).run // column 5 of row 10; the column must not exceed the row
So pascalM is not a recursive function anymore. Instead, the Trampoline monad is a nested structure of the computation that needs to be done. Finally, run is a tail-recursive function that walks through the tree-like structure, interpreting it, and at the base case returns the value.
A paper from Rúnar Bjarnason on the subject of Trampolines: Stackless Scala With Free Monads
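The same trampolining idea can be tried without a third-party library. The sketch below swaps in scala.util.control.TailCalls from the Scala standard library, which is a different facility from the Trampoline type used above; the function name pascalT is made up for this example.

```scala
import scala.util.control.TailCalls._

// Each recursive call is wrapped in `tailcall`, so it is deferred and later
// interpreted by `result` on the heap instead of growing the call stack.
def pascalT(c: Int, r: Int): TailRec[Int] =
  if (c == 0 || c == r) done(1)
  else for {
    a <- tailcall(pascalT(c - 1, r - 1))
    b <- tailcall(pascalT(c, r - 1))
  } yield a + b

val value = pascalT(5, 10).result
// value == 252
```

This avoids stack overflows but does not reduce the number of calls, so it is no faster than the doubly-recursive original.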
In cases where there is a simple modification to the value of a recursive call, that operation can be moved to the front of the recursive function. The classic example of this is Tail recursion modulo cons, where a simple recursive function in this form:
def recur[A](...):List[A] = {
...
x :: recur(...)
}
which is not tail recursive, is transformed into
def recur[A](...): List[A] = {
def consRecur(..., consA: A): List[A] = {
consA :: ...
...
consRecur(..., ...)
}
...
consRecur(..., ...)
}
Alexlv's example is a variant of this.
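As a concrete instance of moving the cons in front of the recursive call, here is a minimal sketch. It uses the common accumulator-and-reverse variant of the idea rather than a literal translation of the schema above, and the function names are made up.

```scala
import scala.annotation.tailrec

// Not tail recursive: the cons happens after the recursive call returns.
def doubledNaive(xs: List[Int]): List[Int] = xs match {
  case Nil          => Nil
  case head :: tail => (head * 2) :: doubledNaive(tail)
}

// Tail recursive: the cons is done before the recursive call, onto an
// accumulator that is reversed once at the end.
def doubled(xs: List[Int]): List[Int] = {
  @tailrec
  def loop(rest: List[Int], acc: List[Int]): List[Int] = rest match {
    case Nil          => acc.reverse
    case head :: tail => loop(tail, (head * 2) :: acc)
  }
  loop(xs, Nil)
}

val small = doubled(List(1, 2, 3))
// small == List(2, 4, 6)
```

The second version handles long lists without deep recursion, at the cost of one extra pass for the reverse.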
This is such a well known situation that some compilers (I know of Prolog and Scheme examples but Scalac does not do this) can detect simple cases and perform this optimisation automatically.
Problems combining multiple calls to recursive functions have no such simple solution. The TRMC optimisation is useless, as you are simply moving the first recursive call to another non-tail position. The only way to reach a tail-recursive solution is to remove all but one of the recursive calls; how to do this is entirely context dependent, but it requires finding an entirely different approach to solving the problem.
As it happens, in some ways your example is similar to the classic Fibonacci sequence problem; in that case the naive but elegant doubly-recursive solution can be replaced by one which loops forward from the 0th number.
def fib (n: Long): Long = n match {
case 0 | 1 => n
case _ => fib( n - 2) + fib( n - 1 )
}
def fib (n: Long): Long = {
def loop(current: Long, next: Long, iteration: Long): Long = {
if (n == iteration)
current
else
loop(next, current + next, iteration + 1)
}
loop(0, 1, 0)
}
For the Fibonacci sequence, this is the most efficient approach (a streams based solution is just a different expression of this solution that can cache results for subsequent calls). Now, you can also solve your problem by looping forward from c0/r0 (well, c0/r2) and calculating each row in sequence, the difference being that you need to cache the entire previous row. So while this has a similarity to fib, it differs dramatically in the specifics and is also significantly less efficient than your original, doubly-recursive solution.
Here's an approach for your pascal triangle example which can calculate pascal(30,60) efficiently:
def pascal(column: Long, row: Long):Long = {
type Point = (Long, Long)
type Points = List[Point]
type Triangle = Map[Point,Long]
def above(p: Point) = (p._1, p._2 - 1)
def aboveLeft(p: Point) = (p._1 - 1, p._2 - 1)
def find(ps: Points, t: Triangle): Long = ps match {
// Found the ultimate goal
case (p :: Nil) if t contains p => t(p)
// Found an intermediate point: pop the stack and carry on
case (p :: rest) if t contains p => find(rest, t)
// Hit a triangle edge, add it to the triangle
case ((c, r) :: _) if (c == 0) || (c == r) => find(ps, t + ((c,r) -> 1))
// Triangle contains (c - 1, r - 1)...
case (p :: _) if t contains aboveLeft(p) => if (t contains above(p))
// And it contains (c, r - 1)! Add to the triangle
find(ps, t + (p -> (t(aboveLeft(p)) + t(above(p)))))
else
// Does not contain(c, r -1). So find that
find(above(p) :: ps, t)
// If we get here, we don't have (c - 1, r - 1). Find that.
case (p :: _) => find(aboveLeft(p) :: ps, t)
}
require(column >= 0 && row >= 0 && column <= row)
(column, row) match {
case (c, r) if (c == 0) || (c == r) => 1
case p => find(List(p), Map())
}
}
It's efficient, but I think it shows how ugly complex recursive solutions can become as you deform them to become tail recursive. At this point, it may be worth moving to a different model entirely. Continuations or monadic gymnastics might be better.
You want a generic way to transform your function. There isn't one. There are helpful approaches, that's all.
I don't know how theoretical this question is, but a recursive implementation won't be efficient even with tail-recursion. Try computing pascal(30, 60), for example. I don't think you'll get a stack overflow, but be prepared to take a long coffee break.
Instead, consider using a Stream or memoization:
val pascal: Stream[Stream[Long]] =
(Stream(1L)
#:: (Stream from 1 map { i =>
// compute row i
(1L
#:: (pascal(i-1) // take the previous row
sliding 2 // and add adjacent values pairwise
collect { case Stream(a,b) => a + b }).toStream
++ Stream(1L))
}))
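The memoization alternative mentioned above could look roughly like the following sketch. The name pascalMemo is hypothetical, and the cache is a local mutable map that callers never see. It is not tail recursive, but the recursion depth is bounded by the row number.

```scala
import scala.collection.mutable

def pascalMemo(c: Int, r: Int): Long = {
  require(c >= 0 && r >= 0 && c <= r)
  // Local cache keyed by (column, row); each cell is computed at most once.
  val cache = mutable.Map.empty[(Int, Int), Long]
  def go(c: Int, r: Int): Long =
    if (c == 0 || c == r) 1L
    else cache.get((c, r)) match {
      case Some(v) => v
      case None =>
        val v = go(c - 1, r - 1) + go(c, r - 1)
        cache.update((c, r), v)
        v
    }
  go(c, r)
}

val small = pascalMemo(5, 10)
// small == 252
val big = pascalMemo(30, 60) // finishes quickly; the value still fits in a Long
```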
The accumulator approach
def pascal(c: Int, r: Int): Int = {
def pascalAcc(acc:Int, leftover: List[(Int, Int)]):Int = {
if (leftover.isEmpty) acc
else {
val (c1, r1) = leftover.head
// Edge.
if (c1 == 0 || c1 == r1) pascalAcc(acc + 1, leftover.tail)
// Safe checks.
else if (c1 < 0 || r1 < 0 || c1 > r1) pascalAcc(acc, leftover.tail)
// Add 2 other points to accumulator.
else pascalAcc(acc, (c1 , r1 - 1) :: ((c1 - 1, r1 - 1) :: leftover.tail ))
}
}
pascalAcc(0, List ((c,r) ))
}
It does not overflow the stack, but as Aaron mentioned, it is not fast for a big row and column.
Yes, it's possible. Usually it's done with the accumulator pattern, through an internally defined function that takes one additional argument, the so-called accumulator. Here is an example that counts the length of a list.
For example, a normal recursive version would look like this:
def length[A](xs: List[A]): Int = if (xs.isEmpty) 0 else 1 + length(xs.tail)
That's not a tail-recursive version. In order to eliminate the last addition operation, we have to accumulate the values somehow, for example with the accumulator pattern:
def length[A](xs: List[A]) = {
def inner(ys: List[A], acc: Int): Int = {
if (ys.isEmpty) acc else inner(ys.tail, acc + 1)
}
inner(xs, 0)
}
It's a bit longer to code, but I think the idea is clear. Of course, you can do it without the inner function, but in that case you have to provide the initial acc value manually.
I'm pretty sure it's not possible in the simple way you're looking for in the general case, but it would depend on how elaborate you permit the changes to be.
A tail-recursive function must be rewritable as a while-loop, but try implementing, for example, a Fractal Tree using while-loops. It's possible, but you need to use an array or collection to store the state for each point, which substitutes for the data otherwise stored in the call stack.
It's also possible to use trampolining.
It is indeed possible. The way I'd do this is to begin with List(1) and keep recursing till you get to the row you want.
It is worth noticing that you can optimize it: if c==0 or c==r the value is one, and to calculate, let's say, column 3 of the 100th row, you still only need to calculate the first three elements of the previous rows.
A working tail recursive solution would be this:
import scala.annotation.tailrec
def pascal(c: Int, r: Int): Int = {
@tailrec
def pascalAcc(c: Int, r: Int, acc: List[Int]): List[Int] = {
if (r == 0) acc
else pascalAcc(c, r - 1,
// from let's say 1 3 3 1 builds 0 1 3 3 1 0 , takes only the
// subset that matters (if asking for col c, no cols after c are
// used) and uses sliding to build (0 1) (1 3) (3 3) etc.
(0 +: acc :+ 0).take(c + 2)
.sliding(2, 1).map { x => x.reduce(_ + _) }.toList)
}
if (c == 0 || c == r) 1
else pascalAcc(c, r, List(1))(c)
}
The @tailrec annotation makes the compiler check that the function is actually tail recursive.
It could probably be further optimized: since the rows are symmetric, if c > r/2 then pascal(c, r) == pascal(r - c, r). That is left to the reader ;)
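One way the symmetry could be applied is sketched below. This is not the code above with a tweak but a separate, hypothetical function (pascalSym) that builds rows with zip instead of sliding and returns a Long; it assumes 0 <= c <= r.

```scala
import scala.annotation.tailrec

def pascalSym(c: Int, r: Int): Long = {
  require(c >= 0 && c <= r)
  // By symmetry, column c and column r - c hold the same value,
  // so only the smaller of the two needs to be tracked.
  val k = math.min(c, r - c)
  @tailrec
  def loop(row: Int, acc: Vector[Long]): Vector[Long] =
    if (row == r) acc
    else {
      // Pairwise sums of the padded row give the next row;
      // columns past k are never needed, so they are dropped.
      val next = (0L +: acc).zip(acc :+ 0L).map { case (a, b) => a + b }
      loop(row + 1, next.take(k + 1))
    }
  loop(0, Vector(1L))(k)
}

val viaSymmetry = pascalSym(3, 4)
// viaSymmetry == 4, the same as column 1 of row 4
```

Each row is truncated to at most k + 1 entries, so the work per row is bounded by the smaller of c and r - c.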