How can I emit periodic results over an iteration? - scala

I might have something like this:
val found = source.toCharArray.foreach{ c =>
// Process char c
// Sometimes (e.g. on newline) I want to emit a result to be
// captured in 'found'. There may be 0 or more captured results.
}
This shows my intent. I want to iterate over some collection of things. Whenever the need arrises I want to "emit" a result to be captured in found. It's not a direct 1-for-1 like map. collect() is a "pull", applying a partial function over the collection. I want a "push" behavior, where I visit everything but push out something when needed.
Is there a pattern or collection method I'm missing that does this?

Apparently, you have a Collection[Thing], and you want to obtain a new Collection[Event] by emitting a Collection[Event] for each Thing. That is, you want a function
(Collection[Thing], Thing => Collection[Event]) => Collection[Event]
That's exactly what flatMap does.
You can write it down with nested fors where the second generator defines what "events" have to be "emitted" for each input from the source. For example:
val input = "a2ba4b"
val result = (for {
c <- input
emitted <- {
if (c == 'a') List('A')
else if (c.isDigit) List.fill(c.toString.toInt)('|')
else Nil
}
} yield emitted).mkString
println(result)
prints
A||A||||
because each 'a' emits an 'A', each digit emits the right amount of tally marks, and all other symbols are ignored.
There are several other ways to express the same thing, for example, the above expression could also be rewritten with an explicit flatMap and with a pattern match instead of if-else:
println(input.flatMap{
case 'a' => "A"
case d if d.isDigit => "|" * (d.toString.toInt)
case _ => ""
})

I think you are looking for a way to build a Stream for your condition. Streams are lazy and are computed only when required.
val sourceString = "sdfdsdsfssd\ndfgdfgd\nsdfsfsggdfg\ndsgsfgdfgdfg\nsdfsffdg\nersdff\n"
val sourceStream = sourceString.toCharArray.toStream
def foundStreamCreator( source: Stream[Char], emmitBoundaryFunction: Char => Boolean): Stream[String] = {
def loop(sourceStream: Stream[Char], collector: List[Char]): Stream[String] =
sourceStream.isEmpty match {
case true => collector.mkString.reverse #:: Stream.empty[String]
case false => {
val char = sourceStream.head
emmitBoundaryFunction(char) match {
case true =>
collector.mkString.reverse #:: loop(sourceStream.tail, List.empty[Char])
case false =>
loop(sourceStream.tail, char :: collector)
}
}
}
loop(source, List.empty[Char])
}
val foundStream = foundStreamCreator(sourceStream, c => c == '\n')
val foundIterator = foundStream.toIterator
foundIterator.next()
// res0: String = sdfdsdsfssd
foundIterator.next()
// res1: String = dfgdfgd
foundIterator.next()
// res2: String = sdfsfsggdfg

It looks like foldLeft to me:
val found = ((List.empty[String], "") /: source.toCharArray) {case ((agg, tmp), char) =>
if (char == '\n') (tmp :: agg, "") // <- emit
else (agg, tmp + char)
}._1
Where you keep collecting items in a temporary location and then emit it when you run into a character signifying something. Since I used List you'll have to reverse at the end if you want it in order.

Related

Recursive sorting in Scala including an exit condition

I am attempting to sort a List of names using Scala, and I am trying to learn how to do this recursively. The List is a List of Lists, with the "element" list containing two items (lastName, firstName). My goal is to understand how to use recursion to sort the names. For the purpose of this post my goal is just to sort by the length of lastName.
If I call my function several times on a small sample list, it will successfully sort lastName by length from shortest to longest, but I have not been able to construct a satisfactory exit condition using recursion. I have tried variations of foreach and other loops, but I have been unsuccessful. Without a satisfactory exit condition, the recursion just continues forever.
import scala.collection.mutable.ListBuffer
import scala.annotation.tailrec
val nameListBuffer = new ListBuffer[List[String]]
val source = Source.fromFile("shortnames.txt")
val lines = source.getLines()
for (line <- lines) {
nameListBuffer += line.split(" ").reverse.toList
}
#tailrec
def sorting(x: ListBuffer[List[String]]): Unit = {
for (i <- 0 until ((x.length)-1)) {
var temp = x(i)
if (x(i)(0).length > x(i+1)(0).length) {
x(i) = x(i+1)
x(i+1) = temp
}
}
var continue = false
while (continue == false) {
for (i <- 0 until ((x.length)-1)) {
if (x(i)(0).length <= x(i+1)(0).length) {
continue == false//want to check ALL i's before evaluating
}
else continue == true
}
}
sorting(x)
}
sorting(nameListBuffer)
Sorry about the runtime complexity it's basically an inefficient bubble sort at O(n^4) but the exit criteria - focus on that. For tail recursion, the key is that the recursive call is to a smaller element than the preceding recursive call. Also, keep two arguments, one is the original list, and one is the list that you are accumulating (or whatever you want to return, it doesn't have to be a list). The recursive call keeps getting smaller until eventually you can return what you have accumulated. Use pattern matching to catch when the recursion has ended, and then you return what you were accumulating. This is why Lists are so popular in Scala, because of the Nil and Cons subtypes and because of operators like the :: can be handled nicely with pattern matching. One more thing, to be tail recursive, the last case has to make a recursive or it won't run.
import scala.collection.mutable.ListBuffer
import scala.annotation.tailrec
// I did not ingest from file I just created the test list from some literals
val dummyNameList = List(
List("Johanson", "John"), List("Nim", "Bryan"), List("Mack", "Craig")
, List("Youngs", "Daniel"), List("Williamson", "Zion"), List("Rodgersdorf", "Aaron"))
// You can use this code to populate nameList though I didn't run this code
val source = Source.fromFile("shortnames.txt")
val lines = source.getLines()
val nameList = {
for (line <- lines) yield line.split(" ").reverse.toList
}.toList
println("\nsorted:")
sortedNameList.foreach(println(_))
//This take one element and it will return the lowest element from the list
//of the other argument.
#tailrec
private def swapElem(elem: List[String], listOfLists: List[List[String]]): List[String] = listOfLists match {
case Nil => elem
case h::t if (elem(0).length > h(0).length) => swapElem(h, t)
case h::t => swapElem(elem, t)
}
//If the head is not the smallest element, then swap out the element
//with the smallest element of the list. I probably could have returned
// a tuple it might have looked nicer. It just keeps iterating though until
// there is no elements
#tailrec
private def iterate(listOfLists: List[List[String]], acc: List[List[String]]): List[List[String]] = listOfLists match {
case h::Nil => acc :+ h
case h::t if (swapElem(h, t) != h) => iterate(h :: t.filterNot(_ == swapElem(h, t)), acc :+ swapElem(h, t))
case h::t => iterate(t, acc :+ swapElem(h, t))
}
val sortedNameList = iterate(nameList, List.empty[List[String]])
println("\nsorted:")
sortedNameList.foreach(println(_))
sorted:
List(Nim, Bryan)
List(Mack, Craig)
List(Youngs, Daniel)
List(Johanson, John)
List(Williamson, Zion)
List(Rodgersdorf, Aaron)

Scala - how to loop around list in functional way

I need to iterate over a list and compare the previous element in the list to current. Being from traditional programming background, I used for loop with a variable to hold the previous value but with Scala I believe there should be a better way to do it.
for (item <- dataList)
{
// Code to compare previous list element
prevElement = item
}
The code above works and give correct results but I wanted to explore the functional way of writing the same code.
There are a number of different possible solutions to this. Perhaps the most generic is to use a helper function:
// This example finds the minimum value (the hard way) in a list of integers. It's
// not looking at the previous element, as such, but you can use the same approach.
// I wanted to make the example something meaningful.
def findLowest(xs: List[Int]): Option[Int] = {
#tailrec // Guarantee tail-recursive implementation for safety & performance.
def helper(lowest: Int, rem: List[Int]): Int = {
// If there are no elements left, return lowest found.
if(rem.isEmpty) lowest
// Otherwise, determine new lowest value for next iteration. (If you need the
// previous value, you could put that here.)
else {
val newLow = Math.min(lowest, rem.head)
helper(newLow, rem.tail)
}
}
// Start off looking at the first member, guarding against an empty list.
xs match {
case x :: rem => Some(helper(x, rem))
case _ => None
}
}
In this particular case, this can be replaced by a fold.
def findLowest(xs: List[Int]): Option[Int] = xs match {
case h :: _ => Some(xs.fold(h)((b, m) => Math.min(b, m)))
case _ => None
}
which can be simplified further to just:
def findLowest(xs: List[Int]): Option[Int] = xs match {
case h :: _ => Some(xs.foldLeft(h)(Math.min))
case _ => None
}
In this case, we're using the zero argument of the fold operation to store the minimum value.
If this seems too abstract, can you post more details about what you would like to do in your loop, and I can address that in my answer?
Update: OK, so having seen your comment about the dates, here's a more specific version. I've used DateRange (which you will need to replace with your actual type) in this example. You will also have to determine what overlap and merge need to do. But here's your solution:
// Determine whether two date ranges overlap. Return true if so; false otherwise.
def overlap(a: DateRange, b: DateRange): Boolean = { ... }
// Merge overlapping a & b date ranges, return new range.
def merge(a: DateRange, b: DateRange): DateRange = { ... }
// Convert list of date ranges into another list, in which all consecutive,
// overlapping ranges are merged into a single range.
def mergeDateRanges(dr: List[DateRange]): List[DateRange] = {
// Helper function. Builds a list of merged date ranges.
def helper(prev: DateRange, ret: List[DateRange], rem: List[DateRange]):
List[DateRange] = {
// If there are no more date ranges to process, prepend the previous value
// to the list that we'll return.
if(rem.isEmpty) prev :: ret
// Otherwise, determine if the previous value overlaps with the current
// head value. If it does, create a new previous value that is the merger
// of the two and leave the returned list alone; if not, prepend the
// previous value to the returned list and make the previous value the
// head value.
else {
val (newPrev, newRet) = if(overlap(prev, rem.head)) {
(merge(prev, rem.head), ret)
}
else (rem.head, prev :: ret)
// Next iteration of the helper (I missed this off, originally, sorry!)
helper(newPrev, newRet, rem.tail)
}
}
// Start things off, trapping empty list case. Because the list is generated by pre-pending elements, we have to reverse it to preserve the order.
dr match {
case h :: rem => helper(h, Nil, rem).reverse // <- Fixed bug here too...
case _ => Nil
}
}
If you're looking to just compare the two sequential elements and get the boolean result:
val list = Seq("one", "two", "three", "one", "one")
val result = list.sliding(2).collect { case Seq(a,b) => a.equals(b)}.toList
The previous will give you this
List(false, false, false, true)
An alternative to the previous one is that you might want to get the value:
val result = list.sliding(2).collect {
case Seq(a,b) => if (a.equals(b)) Some(a) else None
case _ => None
}.toList.filter(_.isDefined).map(_.get)
This will give the following:
List(one)
But if you're just looking to compare items in list, an option would be to do something like this:
val list = Seq("one", "two", "three", "one")
for {
item1 <- list
item2 <- list
_ <- Option(println(item1 + ","+ item2))
} yield ()
This results in:
one,one
one,two
one,three
one,one
two,one
two,two
two,three
two,one
three,one
three,two
three,three
three,one
one,one
one,two
one,three
one,one
If you are not mutating the list, and are only trying to compare two elements side-by-side.
This can help you. It's functional.
def cmpPrevElement[T](l: List[T]): List[(T,T)] = {
(l.indices).sliding(2).toList.map(e => (l(e.head), l(e.tail.head)))
}

Any way of simplifying this Scala code (pattern matching)?

I have some values provided by the user as Option[String]. I want to validate them only if they are non-empty.
The validation just checks that the string can be converted to an int, and is not less that 0.
Is there any way of simplifying this code, or making it more readable?
val top: Option[String] = ...
val skip: Option[String] = ...
val validationErrors = new ListBuffer[Err]()
top match {
case Some(x) => if (x.toIntOpt.isEmpty || x.toIntOpt.get < 0) validationErrors += PositiveIntegerRequired("$top")
}
skip match {
case Some(x) => if (x.toIntOpt.isEmpty || x.toIntOpt.get < 0) validationErrors += PositiveIntegerRequired("$skip")
}
And here is the toIntOpt helper:
def toIntOpt: Option[Int] = Try(s.toInt).toOption
Yes it can be simplified a lot, by making use of flatMap and for-comprehensions and collect:
def checkInt[A](stringOpt: Option[String], a: A): Option[A] = for {
s <- stringOpt
i <- s.toIntOpt if (i < 0)
} yield a
val validationErrors = List(
checkInt(top, PositiveIntegerRequired("$top")),
checkInt(skip, PositiveIntegerRequired("$skip"))
).collect {
case Some(x) => x
}
The first function checkIntreturns a value a if the original Option was non-empty and contained an invalid negative integer.
Then we put both into a List and collect only the values that are non-empty, resulting in a List of Err with no need for creating an intermediate Buffer.
An even easier way to do something like this can be found with the Validated type found in the cats library: https://typelevel.org/cats/datatypes/validated.html
An alternative to using for-comprehensions is to use a map with a filter.
top.map(toIntOpt)
.filter(i => i.getOrElse(-1) < 0)
.foreach(_ => validationErrors += PositiveIntegerRequired("$top"))
The map call will result in a Option[Option[Int]] which is then filtered based on the value of the nested Option (defaulting to -1 if the option is None).

Simple functionnal way for grouping successive elements? [duplicate]

I'm trying to 'group' a string into segments, I guess this example would explain it more succintly
scala> val str: String = "aaaabbcddeeeeeeffg"
... (do something)
res0: List("aaaa","bb","c","dd","eeeee","ff","g")
I can thnk of a few ways to do this in an imperative style (with vars and stepping through the string to find groups) but I was wondering if any better functional solution could
be attained? I've been looking through the Scala API but there doesn't seem to be something that fits my needs.
Any help would be appreciated
You can split the string recursively with span:
def s(x : String) : List[String] = if(x.size == 0) Nil else {
val (l,r) = x.span(_ == x(0))
l :: s(r)
}
Tail recursive:
#annotation.tailrec def s(x : String, y : List[String] = Nil) : List[String] = {
if(x.size == 0) y.reverse
else {
val (l,r) = x.span(_ == x(0))
s(r, l :: y)
}
}
Seems that all other answers are very concentrated on collection operations. But pure string + regex solution is much simpler:
str split """(?<=(\w))(?!\1)""" toList
In this regex I use positive lookbehind and negative lookahead for the captured char
def group(s: String): List[String] = s match {
case "" => Nil
case s => s.takeWhile(_==s.head) :: group(s.dropWhile(_==s.head))
}
Edit: Tail recursive version:
def group(s: String, result: List[String] = Nil): List[String] = s match {
case "" => result reverse
case s => group(s.dropWhile(_==s.head), s.takeWhile(_==s.head) :: result)
}
can be used just like the other because the second parameter has a default value and thus doesnt have to be supplied.
Make it one-liner:
scala> val str = "aaaabbcddddeeeeefff"
str: java.lang.String = aaaabbcddddeeeeefff
scala> str.groupBy(identity).map(_._2)
res: scala.collection.immutable.Iterable[String] = List(eeeee, fff, aaaa, bb, c, dddd)
UPDATE:
As #Paul mentioned about the order here is updated version:
scala> str.groupBy(identity).toList.sortBy(_._1).map(_._2)
res: List[String] = List(aaaa, bb, c, dddd, eeeee, fff)
You could use some helper functions like this:
val str = "aaaabbcddddeeeeefff"
def zame(chars:List[Char]) = chars.partition(_==chars.head)
def q(chars:List[Char]):List[List[Char]] = chars match {
case Nil => Nil
case rest =>
val (thesame,others) = zame(rest)
thesame :: q(others)
}
q(str.toList) map (_.mkString)
This should do the trick, right? No doubt it can be cleaned up into one-liners even further
A functional* solution using fold:
def group(s : String) : Seq[String] = {
s.tail.foldLeft(Seq(s.head.toString)) { case (carry, elem) =>
if ( carry.last(0) == elem ) {
carry.init :+ (carry.last + elem)
}
else {
carry :+ elem.toString
}
}
}
There is a lot of cost hidden in all those sequence operations performed on strings (via implicit conversion). I guess the real complexity heavily depends on the kind of Seq strings are converted to.
(*) Afaik all/most operations in the collection library depend in iterators, an imho inherently unfunctional concept. But the code looks functional, at least.
Starting Scala 2.13, List is now provided with the unfold builder which can be combined with String::span:
List.unfold("aaaabbaaacdeeffg") {
case "" => None
case rest => Some(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")
or alternatively, coupled with Scala 2.13's Option#unless builder:
List.unfold("aaaabbaaacdeeffg") {
rest => Option.unless(rest.isEmpty)(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")
Details:
Unfold (def unfold[A, S](init: S)(f: (S) => Option[(A, S)]): List[A]) is based on an internal state (init) which is initialized in our case with "aaaabbaaacdeeffg".
For each iteration, we span (def span(p: (Char) => Boolean): (String, String)) this internal state in order to find the prefix containing the same symbol and produce a (String, String) tuple which contains the prefix and the rest of the string. span is very fortunate in this context as it produces exactly what unfold expects: a tuple containing the next element of the list and the new internal state.
The unfolding stops when the internal state is "" in which case we produce None as expected by unfold to exit.
Edit: Have to read more carefully. Below is no functional code.
Sometimes, a little mutable state helps:
def group(s : String) = {
var tmp = ""
val b = Seq.newBuilder[String]
s.foreach { c =>
if ( tmp != "" && tmp.head != c ) {
b += tmp
tmp = ""
}
tmp += c
}
b += tmp
b.result
}
Runtime O(n) (if segments have at most constant length) and tmp.+= probably creates the most overhead. Use a string builder instead for strict runtime in O(n).
group("aaaabbcddeeeeeeffg")
> Seq[String] = List(aaaa, bb, c, dd, eeeeee, ff, g)
If you want to use scala API you can use the built in function for that:
str.groupBy(c => c).values
Or if you mind it being sorted and in a list:
str.groupBy(c => c).values.toList.sorted

Tune Nested Loop in Scala

I was wondering if I can tune the following Scala code :
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] = {
var listNoDuplicates: List[(Class1, Class2)] = Nil
for (outerIndex <- 0 until listOfTuple.size) {
if (outerIndex != listOfTuple.size - 1)
for (innerIndex <- outerIndex + 1 until listOfTuple.size) {
if (listOfTuple(i)._1.flag.equals(listOfTuple(j)._1.flag))
listNoDuplicates = listOfTuple(i) :: listNoDuplicates
}
}
listNoDuplicates
}
Usually if you have someting looking like:
var accumulator: A = new A
for( b <- collection ) {
accumulator = update(accumulator, b)
}
val result = accumulator
can be converted in something like:
val result = collection.foldLeft( new A ){ (acc,b) => update( acc, b ) }
So here we can first use a map to force the unicity of flags. Supposing the flag has a type F:
val result = listOfTuples.foldLeft( Map[F,(ClassA,ClassB)] ){
( map, tuple ) => map + ( tuple._1.flag -> tuple )
}
Then the remaining tuples can be extracted from the map and converted to a list:
val uniqList = map.values.toList
It will keep the last tuple encoutered, if you want to keep the first one, replace foldLeft by foldRight, and invert the argument of the lambda.
Example:
case class ClassA( flag: Int )
case class ClassB( value: Int )
val listOfTuples =
List( (ClassA(1),ClassB(2)), (ClassA(3),ClassB(4)), (ClassA(1),ClassB(-1)) )
val result = listOfTuples.foldRight( Map[Int,(ClassA,ClassB)]() ) {
( tuple, map ) => map + ( tuple._1.flag -> tuple )
}
val uniqList = result.values.toList
//uniqList: List((ClassA(1),ClassB(2)), (ClassA(3),ClassB(4)))
Edit: If you need to retain the order of the initial list, use instead:
val uniqList = listOfTuples.filter( result.values.toSet )
This compiles, but as I can't test it it's hard to say if it does "The Right Thing" (tm):
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] =
(for {outerIndex <- 0 until listOfTuple.size
if outerIndex != listOfTuple.size - 1
innerIndex <- outerIndex + 1 until listOfTuple.size
if listOfTuple(i)._1.flag == listOfTuple(j)._1.flag
} yield listOfTuple(i)).reverse.toList
Note that you can use == instead of equals (use eq if you need reference equality).
BTW: https://codereview.stackexchange.com/ is better suited for this type of question.
Do not use index with lists (like listOfTuple(i)). Index on lists have very lousy performance. So, some ways...
The easiest:
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] =
SortedSet(listOfTuple: _*)(Ordering by (_._1.flag)).toList
This will preserve the last element of the list. If you want it to preserve the first element, pass listOfTuple.reverse instead. Because of the sorting, performance is, at best, O(nlogn). So, here's a faster way, using a mutable HashSet:
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] = {
// Produce a hash map to find the duplicates
import scala.collection.mutable.HashSet
val seen = HashSet[Flag]()
// now fold
listOfTuple.foldLeft(Nil: List[(Class1,Class2)]) {
case (acc, el) =>
val result = if (seen(el._1.flag)) acc else el :: acc
seen += el._1.flag
result
}.reverse
}
One can avoid using a mutable HashSet in two ways:
Make seen a var, so that it can be updated.
Pass the set along with the list being created in the fold. The case then becomes:
case ((seen, acc), el) =>