I need to iterate over a list and compare the previous element in the list to current. Being from traditional programming background, I used for loop with a variable to hold the previous value but with Scala I believe there should be a better way to do it.
for (item <- dataList)
{
// Code to compare previous list element
prevElement = item
}
The code above works and give correct results but I wanted to explore the functional way of writing the same code.
There are a number of different possible solutions to this. Perhaps the most generic is to use a helper function:
// This example finds the minimum value (the hard way) in a list of integers. It's
// not looking at the previous element, as such, but you can use the same approach.
// I wanted to make the example something meaningful.
def findLowest(xs: List[Int]): Option[Int] = {
#tailrec // Guarantee tail-recursive implementation for safety & performance.
def helper(lowest: Int, rem: List[Int]): Int = {
// If there are no elements left, return lowest found.
if(rem.isEmpty) lowest
// Otherwise, determine new lowest value for next iteration. (If you need the
// previous value, you could put that here.)
else {
val newLow = Math.min(lowest, rem.head)
helper(newLow, rem.tail)
}
}
// Start off looking at the first member, guarding against an empty list.
xs match {
case x :: rem => Some(helper(x, rem))
case _ => None
}
}
In this particular case, this can be replaced by a fold.
def findLowest(xs: List[Int]): Option[Int] = xs match {
case h :: _ => Some(xs.fold(h)((b, m) => Math.min(b, m)))
case _ => None
}
which can be simplified further to just:
def findLowest(xs: List[Int]): Option[Int] = xs match {
case h :: _ => Some(xs.foldLeft(h)(Math.min))
case _ => None
}
In this case, we're using the zero argument of the fold operation to store the minimum value.
If this seems too abstract, can you post more details about what you would like to do in your loop, and I can address that in my answer?
Update: OK, so having seen your comment about the dates, here's a more specific version. I've used DateRange (which you will need to replace with your actual type) in this example. You will also have to determine what overlap and merge need to do. But here's your solution:
// Determine whether two date ranges overlap. Return true if so; false otherwise.
def overlap(a: DateRange, b: DateRange): Boolean = { ... }
// Merge overlapping a & b date ranges, return new range.
def merge(a: DateRange, b: DateRange): DateRange = { ... }
// Convert list of date ranges into another list, in which all consecutive,
// overlapping ranges are merged into a single range.
def mergeDateRanges(dr: List[DateRange]): List[DateRange] = {
// Helper function. Builds a list of merged date ranges.
def helper(prev: DateRange, ret: List[DateRange], rem: List[DateRange]):
List[DateRange] = {
// If there are no more date ranges to process, prepend the previous value
// to the list that we'll return.
if(rem.isEmpty) prev :: ret
// Otherwise, determine if the previous value overlaps with the current
// head value. If it does, create a new previous value that is the merger
// of the two and leave the returned list alone; if not, prepend the
// previous value to the returned list and make the previous value the
// head value.
else {
val (newPrev, newRet) = if(overlap(prev, rem.head)) {
(merge(prev, rem.head), ret)
}
else (rem.head, prev :: ret)
// Next iteration of the helper (I missed this off, originally, sorry!)
helper(newPrev, newRet, rem.tail)
}
}
// Start things off, trapping empty list case. Because the list is generated by pre-pending elements, we have to reverse it to preserve the order.
dr match {
case h :: rem => helper(h, Nil, rem).reverse // <- Fixed bug here too...
case _ => Nil
}
}
If you're looking to just compare the two sequential elements and get the boolean result:
val list = Seq("one", "two", "three", "one", "one")
val result = list.sliding(2).collect { case Seq(a,b) => a.equals(b)}.toList
The previous will give you this
List(false, false, false, true)
An alternative to the previous one is that you might want to get the value:
val result = list.sliding(2).collect {
case Seq(a,b) => if (a.equals(b)) Some(a) else None
case _ => None
}.toList.filter(_.isDefined).map(_.get)
This will give the following:
List(one)
But if you're just looking to compare items in list, an option would be to do something like this:
val list = Seq("one", "two", "three", "one")
for {
item1 <- list
item2 <- list
_ <- Option(println(item1 + ","+ item2))
} yield ()
This results in:
one,one
one,two
one,three
one,one
two,one
two,two
two,three
two,one
three,one
three,two
three,three
three,one
one,one
one,two
one,three
one,one
If you are not mutating the list, and are only trying to compare two elements side-by-side.
This can help you. It's functional.
def cmpPrevElement[T](l: List[T]): List[(T,T)] = {
(l.indices).sliding(2).toList.map(e => (l(e.head), l(e.tail.head)))
}
Related
I am attempting to sort a List of names using Scala, and I am trying to learn how to do this recursively. The List is a List of Lists, with the "element" list containing two items (lastName, firstName). My goal is to understand how to use recursion to sort the names. For the purpose of this post my goal is just to sort by the length of lastName.
If I call my function several times on a small sample list, it will successfully sort lastName by length from shortest to longest, but I have not been able to construct a satisfactory exit condition using recursion. I have tried variations of foreach and other loops, but I have been unsuccessful. Without a satisfactory exit condition, the recursion just continues forever.
import scala.collection.mutable.ListBuffer
import scala.annotation.tailrec
val nameListBuffer = new ListBuffer[List[String]]
val source = Source.fromFile("shortnames.txt")
val lines = source.getLines()
for (line <- lines) {
nameListBuffer += line.split(" ").reverse.toList
}
#tailrec
def sorting(x: ListBuffer[List[String]]): Unit = {
for (i <- 0 until ((x.length)-1)) {
var temp = x(i)
if (x(i)(0).length > x(i+1)(0).length) {
x(i) = x(i+1)
x(i+1) = temp
}
}
var continue = false
while (continue == false) {
for (i <- 0 until ((x.length)-1)) {
if (x(i)(0).length <= x(i+1)(0).length) {
continue == false//want to check ALL i's before evaluating
}
else continue == true
}
}
sorting(x)
}
sorting(nameListBuffer)
Sorry about the runtime complexity it's basically an inefficient bubble sort at O(n^4) but the exit criteria - focus on that. For tail recursion, the key is that the recursive call is to a smaller element than the preceding recursive call. Also, keep two arguments, one is the original list, and one is the list that you are accumulating (or whatever you want to return, it doesn't have to be a list). The recursive call keeps getting smaller until eventually you can return what you have accumulated. Use pattern matching to catch when the recursion has ended, and then you return what you were accumulating. This is why Lists are so popular in Scala, because of the Nil and Cons subtypes and because of operators like the :: can be handled nicely with pattern matching. One more thing, to be tail recursive, the last case has to make a recursive or it won't run.
import scala.collection.mutable.ListBuffer
import scala.annotation.tailrec
// I did not ingest from file I just created the test list from some literals
val dummyNameList = List(
List("Johanson", "John"), List("Nim", "Bryan"), List("Mack", "Craig")
, List("Youngs", "Daniel"), List("Williamson", "Zion"), List("Rodgersdorf", "Aaron"))
// You can use this code to populate nameList though I didn't run this code
val source = Source.fromFile("shortnames.txt")
val lines = source.getLines()
val nameList = {
for (line <- lines) yield line.split(" ").reverse.toList
}.toList
println("\nsorted:")
sortedNameList.foreach(println(_))
//This take one element and it will return the lowest element from the list
//of the other argument.
#tailrec
private def swapElem(elem: List[String], listOfLists: List[List[String]]): List[String] = listOfLists match {
case Nil => elem
case h::t if (elem(0).length > h(0).length) => swapElem(h, t)
case h::t => swapElem(elem, t)
}
//If the head is not the smallest element, then swap out the element
//with the smallest element of the list. I probably could have returned
// a tuple it might have looked nicer. It just keeps iterating though until
// there is no elements
#tailrec
private def iterate(listOfLists: List[List[String]], acc: List[List[String]]): List[List[String]] = listOfLists match {
case h::Nil => acc :+ h
case h::t if (swapElem(h, t) != h) => iterate(h :: t.filterNot(_ == swapElem(h, t)), acc :+ swapElem(h, t))
case h::t => iterate(t, acc :+ swapElem(h, t))
}
val sortedNameList = iterate(nameList, List.empty[List[String]])
println("\nsorted:")
sortedNameList.foreach(println(_))
sorted:
List(Nim, Bryan)
List(Mack, Craig)
List(Youngs, Daniel)
List(Johanson, John)
List(Williamson, Zion)
List(Rodgersdorf, Aaron)
I might have something like this:
val found = source.toCharArray.foreach{ c =>
// Process char c
// Sometimes (e.g. on newline) I want to emit a result to be
// captured in 'found'. There may be 0 or more captured results.
}
This shows my intent. I want to iterate over some collection of things. Whenever the need arrises I want to "emit" a result to be captured in found. It's not a direct 1-for-1 like map. collect() is a "pull", applying a partial function over the collection. I want a "push" behavior, where I visit everything but push out something when needed.
Is there a pattern or collection method I'm missing that does this?
Apparently, you have a Collection[Thing], and you want to obtain a new Collection[Event] by emitting a Collection[Event] for each Thing. That is, you want a function
(Collection[Thing], Thing => Collection[Event]) => Collection[Event]
That's exactly what flatMap does.
You can write it down with nested fors where the second generator defines what "events" have to be "emitted" for each input from the source. For example:
val input = "a2ba4b"
val result = (for {
c <- input
emitted <- {
if (c == 'a') List('A')
else if (c.isDigit) List.fill(c.toString.toInt)('|')
else Nil
}
} yield emitted).mkString
println(result)
prints
A||A||||
because each 'a' emits an 'A', each digit emits the right amount of tally marks, and all other symbols are ignored.
There are several other ways to express the same thing, for example, the above expression could also be rewritten with an explicit flatMap and with a pattern match instead of if-else:
println(input.flatMap{
case 'a' => "A"
case d if d.isDigit => "|" * (d.toString.toInt)
case _ => ""
})
I think you are looking for a way to build a Stream for your condition. Streams are lazy and are computed only when required.
val sourceString = "sdfdsdsfssd\ndfgdfgd\nsdfsfsggdfg\ndsgsfgdfgdfg\nsdfsffdg\nersdff\n"
val sourceStream = sourceString.toCharArray.toStream
def foundStreamCreator( source: Stream[Char], emmitBoundaryFunction: Char => Boolean): Stream[String] = {
def loop(sourceStream: Stream[Char], collector: List[Char]): Stream[String] =
sourceStream.isEmpty match {
case true => collector.mkString.reverse #:: Stream.empty[String]
case false => {
val char = sourceStream.head
emmitBoundaryFunction(char) match {
case true =>
collector.mkString.reverse #:: loop(sourceStream.tail, List.empty[Char])
case false =>
loop(sourceStream.tail, char :: collector)
}
}
}
loop(source, List.empty[Char])
}
val foundStream = foundStreamCreator(sourceStream, c => c == '\n')
val foundIterator = foundStream.toIterator
foundIterator.next()
// res0: String = sdfdsdsfssd
foundIterator.next()
// res1: String = dfgdfgd
foundIterator.next()
// res2: String = sdfsfsggdfg
It looks like foldLeft to me:
val found = ((List.empty[String], "") /: source.toCharArray) {case ((agg, tmp), char) =>
if (char == '\n') (tmp :: agg, "") // <- emit
else (agg, tmp + char)
}._1
Where you keep collecting items in a temporary location and then emit it when you run into a character signifying something. Since I used List you'll have to reverse at the end if you want it in order.
I just started working with scala and am trying to get used to the language. I was wondering if the following is possible:
I have a list of Instruction objects that I am looping over with the foreach method. Am I able to add elements to my Instruction list while I am looping over it? Here is a code example to explain what I want:
instructions.zipWithIndex.foreach { case (value, index) =>
value match {
case WhileStmt() => {
---> Here I want to add elements to the instructions list.
}
case IfStmt() => {
...
}
_ => {
...
}
Idiomatic way would be something like this for rather complex iteration and replacement logic:
#tailrec
def idiomaticWay(list: List[Instruction],
acc: List[Instruction] = List.empty): List[Instruction] =
list match {
case WhileStmt() :: tail =>
// add element to head of acc
idiomaticWay(tail, CherryOnTop :: acc)
case IfStmt() :: tail =>
// add nothing here
idiomaticWay(tail, list.head :: acc)
case Nil => acc
}
val updatedList = idiomaticWay(List(WhileStmt(), IfStmt()))
println(updatedList) // List(IfStmt(), CherryOnTop)
This solution works with immutable list, returns immutable list which has different values in it according to your logic.
If you want to ultimately hack around (add, remove, etc) you could use Java ListIterator class that would allow you to do all operations mentioned above:
def hackWay(list: util.List[Instruction]): Unit = {
val iterator = list.listIterator()
while(iterator.hasNext) {
iterator.next() match {
case WhileStmt() =>
iterator.set(CherryOnTop)
case IfStmt() => // do nothing here
}
}
}
import collection.JavaConverters._
val instructions = new util.ArrayList[Instruction](List(WhileStmt(), IfStmt()).asJava)
hackWay(instructions)
println(instructions.asScala) // Buffer(CherryOnTop, IfStmt())
However in the second case you do not need scala :( So my advise would be to stick to immutable data structures in scala.
Trying to get a handle on pattern matching here-- coming from a C++/Java background it's very foreign to me.
The point of this branch is to check each member of a List d of tuples [format of (string,object). I want to define three cases.
1) If the counter in this function is larger than the size of the list (defined in another called acc), I want to return nothing (because there is no match)
2) If the key given in the input matches a tuple in the list, I want to return its value (or, whatever is stored in the tuple._2).
3) If there is no match, and there is still more list to iterate, increment and continue.
My code is below:
def get(key:String):Option[Any] = {
var counter: Int = 0
val flag: Boolean = false
x match {
case (counter > acc) => None
case ((d(counter)._1) == key) => d(counter)._2
case _ => counter += 1
}
My issue here is while the first case seems to compile correctly, the second throws an error:
:36: error: ')' expected but '.' found.
case ((d(counter)._1) == key) => d(counter)._2
The third as well:
scala> case _ => counter += 1
:1: error: illegal start of definition
But I assume it's because the second isn't correct. My first thought is that I'm not comparing tuples correctly, but I seem to be following the syntax for indexing into a tuple, so I'm stumped. Can anyone steer me in the right direction?
Hopefully a few things to clear up your confusion:
Matching in scala follows this general template:
x match {
case SomethingThatXIs if(SomeCondition) => SomeExpression
// rinse and repeat
// note that `if(SomeCondition)` is optional
}
It looks like you may have attempted to use the match/case expression as more of an if/else if/else kind of block, and as far as I can tell, the x doesn't really matter within said block. If that's the case, you might be fine with something like
case _ if (d(counter)._1 == key) => d(counter)._2
BUT
Some info on Lists in scala. You should always think of it like a LinkedList, where indexed lookup is an O(n) operation. Lists can be matched with a head :: tail format, and Nil is an empty list. For example:
val myList = List(1,2,3,4)
myList match {
case first :: theRest =>
// first is 1, theRest is List(2,3,4), which you can also express as
// 2 :: 3 :: 4 :: Nil
case Nil =>
// an empty list case
}
It looks like you're constructing a kind of ListMap, so I'll write up a more "functional"/"recursive" way of implementing your get method.
I'll assume that d is the backing list, of type List[(String, Any)]
def get(key: String): Option[Any] = {
def recurse(key: String, list: List[(String, Any)]): Option[Any] = list match {
case (k, value) :: _ if (key == k) => Some(value)
case _ :: theRest => recurse(key, theRest)
case Nil => None
}
recurse(key, d)
}
The three case statements can be explained as follows:
1) The first element in list is a tuple of (k, value). The rest of the list is matched to the _ because we don't care about it in this case. The condition asks if k is equal to the key we are looking for. In this case, we want to return the value from the tuple.
2) Since the first element didn't have the right key, we want to recurse. We don't care about the first element, but we want the rest of the list so that we can recurse with it.
3) case Nil means there's nothing in the list, which should mark "failure" and the end of the recursion. In this case we return None. Consider this the same as your counter > acc condition from your question.
Please don't hesitate to ask for further explanation; and if I've accidentally made a mistake (won't compile, etc), point it out and I will fix it.
I'm assuming that conditionally extracting part of a tuple from a list of tuples is the important part of your question, excuse me if I'm wrong.
First an initial point, in Scala we normally would use AnyRef instead of Object or, if worthwhile, we would use a type parameter which can increase reuse of the function or method and increase type safety.
The three cases you describe can be collapsed into two cases, the first case uses a guard (the if statement after the pattern match), the second case matches the entire non-empty list and searches for a match between each first tuple argument and the key, returning a Some[T] containing the second tuple argument of the matching tuple or None if no match occurred. The third case is not required as the find operation traverses (iterates over) the list.
The map operation after the find is used to extract the second tuple argument (map on an Option returns an Option), remove this operation and change the method's return type to Option[(String, T)] if you want the whole tuple returned.
def f[T](key: String, xs: List[(String, T)], initialCount: Int = 2): Option[T] = {
var counter = initialCount
xs match {
case l: List[(String, T)] if l.size < counter => None
case l: List[(String, T)] => l find {_._1 == key} map {_._2}
}
}
f("A", List(("A", 1), ("B", 2))) // Returns Some(1)
f("B", List(("A", 1), ("B", 2))) // Returns Some(2)
f("A", List(("A", 1))) // Returns None
f("C", List(("A", 1), ("B", 2))) // Returns None
f("C", Nil) // Returns None
First, why are you using a List for that reason? What you need is definitely a Map. Its get() returns None if key is not found and Some(value) if it is found in it.
Second, what is x in your example? Is it the list?
Third, you cannot write case (log) => .. where log is a logical condition, it is in the form of case _ if (log) => ... (as Rex Kerr already pinted out in his comment).
Fouth, you need a recursive function for this (simply increasing the counter will call this only on the second element).
So you'll need something like this (if still prefer sticking to List):
def get(l: List[Tuple2[String, String]], key: String): Option[String] = {
if (l.isEmpty) {
None
} else {
val act = l.head
act match {
case x if (act._1 == key) => Some(act._2)
case _ => get(l.tail, key)
}
}
}
One way is this
list.distinct.size != list.size
Is there any better way? It would have been nice to have a containsDuplicates method
Assuming "better" means "faster", see the alternative approaches benchmarked in this question, which seems to show some quicker methods (although note that distinct uses a HashSet and is already O(n)). YMMV of course, depending on specific test case, scala version etc. Probably any significant improvement over the "distinct.size" approach would come from an early-out as soon as a duplicate is found, but how much of a speed-up is actually obtained would depend strongly on how common duplicates actually are in your use-case.
If you mean "better" in that you want to write list.containsDuplicates instead of containsDuplicates(list), use an implicit:
implicit def enhanceWithContainsDuplicates[T](s:List[T]) = new {
def containsDuplicates = (s.distinct.size != s.size)
}
assert(List(1,2,2,3).containsDuplicates)
assert(!List("a","b","c").containsDuplicates)
You can also write:
list.toSet.size != list.size
But the result will be the same because distinct is already implemented with a Set. In both case the time complexity should be O(n): you must traverse the list and Set insertion is O(1).
I think this would stop as soon as a duplicate was found and is probably more efficient than doing distinct.size - since I assume distinct keeps a set as well:
#annotation.tailrec
def containsDups[A](list: List[A], seen: Set[A] = Set[A]()): Boolean =
list match {
case x :: xs => if (seen.contains(x)) true else containsDups(xs, seen + x)
case _ => false
}
containsDups(List(1,1,2,3))
// Boolean = true
containsDups(List(1,2,3))
// Boolean = false
I realize you asked for easy and I don't now that this version is, but finding a duplicate is also finding if there is an element that has been seen before:
def containsDups[A](list: List[A]): Boolean = {
list.iterator.scanLeft(Set[A]())((set, a) => set + a) // incremental sets
.zip(list.iterator)
.exists{ case (set, a) => set contains a }
}
#annotation.tailrec
def containsDuplicates [T] (s: Seq[T]) : Boolean =
if (s.size < 2) false else
s.tail.contains (s.head) || containsDuplicates (s.tail)
I didn't measure this, and think it is similar to huynhjl's solution, but a bit more simple to understand.
It returns early, if a duplicate is found, so I looked into the source of Seq.contains, whether this returns early - it does.
In SeqLike, 'contains (e)' is defined as 'exists (_ == e)', and exists is defined in TraversableLike:
def exists (p: A => Boolean): Boolean = {
var result = false
breakable {
for (x <- this)
if (p (x)) { result = true; break }
}
result
}
I'm curious how to speed things up with parallel collections on multi cores, but I guess it is a general problem with early-returning, while another thread will keep running, because it doesn't know, that the solution is already found.
Summary:
I've written a very efficient function which returns both List.distinct and a List consisting of each element which appeared more than once and the index at which the element duplicate appeared.
Note: This answer is a straight copy of the answer on a related question.
Details:
If you need a bit more information about the duplicates themselves, like I did, I have written a more general function which iterates across a List (as ordering was significant) exactly once and returns a Tuple2 consisting of the original List deduped (all duplicates after the first are removed; i.e. the same as invoking distinct) and a second List showing each duplicate and an Int index at which it occurred within the original List.
Here's the function:
def filterDupes[A](items: List[A]): (List[A], List[(A, Int)]) = {
def recursive(remaining: List[A], index: Int, accumulator: (List[A], List[(A, Int)])): (List[A], List[(A, Int)]) =
if (remaining.isEmpty)
accumulator
else
recursive(
remaining.tail
, index + 1
, if (accumulator._1.contains(remaining.head))
(accumulator._1, (remaining.head, index) :: accumulator._2)
else
(remaining.head :: accumulator._1, accumulator._2)
)
val (distinct, dupes) = recursive(items, 0, (Nil, Nil))
(distinct.reverse, dupes.reverse)
}
An below is an example which might make it a bit more intuitive. Given this List of String values:
val withDupes =
List("a.b", "a.c", "b.a", "b.b", "a.c", "c.a", "a.c", "d.b", "a.b")
...and then performing the following:
val (deduped, dupeAndIndexs) =
filterDupes(withDupes)
...the results are:
deduped: List[String] = List(a.b, a.c, b.a, b.b, c.a, d.b)
dupeAndIndexs: List[(String, Int)] = List((a.c,4), (a.c,6), (a.b,8))
And if you just want the duplicates, you simply map across dupeAndIndexes and invoke distinct:
val dupesOnly =
dupeAndIndexs.map(_._1).distinct
...or all in a single call:
val dupesOnly =
filterDupes(withDupes)._2.map(_._1).distinct
...or if a Set is preferred, skip distinct and invoke toSet...
val dupesOnly2 =
dupeAndIndexs.map(_._1).toSet
...or all in a single call:
val dupesOnly2 =
filterDupes(withDupes)._2.map(_._1).toSet
This is a straight copy of the filterDupes function out of my open source Scala library, ScalaOlio. It's located at org.scalaolio.collection.immutable.List_._.
If you're trying to check for duplicates in a test then ScalaTest can be helpful.
import org.scalatest.Inspectors._
import org.scalatest.Matchers._
forEvery(list.distinct) { item =>
withClue(s"value $item, the number of occurences") {
list.count(_ == item) shouldBe 1
}
}
// example:
scala> val list = List(1,2,3,4,3,2)
list: List[Int] = List(1, 2, 3, 4, 3, 2)
scala> forEvery(list) { item => withClue(s"value $item, the number of occurences") { list.count(_ == item) shouldBe 1 } }
org.scalatest.exceptions.TestFailedException: forEvery failed, because:
at index 1, value 2, the number of occurences 2 was not equal to 1 (<console>:19),
at index 2, value 3, the number of occurences 2 was not equal to 1 (<console>:19)
in List(1, 2, 3, 4)