Related
I need to iterate over a list and compare the previous element in the list to current. Being from traditional programming background, I used for loop with a variable to hold the previous value but with Scala I believe there should be a better way to do it.
for (item <- dataList)
{
// Code to compare previous list element
prevElement = item
}
The code above works and give correct results but I wanted to explore the functional way of writing the same code.
There are a number of different possible solutions to this. Perhaps the most generic is to use a helper function:
// This example finds the minimum value (the hard way) in a list of integers. It's
// not looking at the previous element, as such, but you can use the same approach.
// I wanted to make the example something meaningful.
def findLowest(xs: List[Int]): Option[Int] = {
#tailrec // Guarantee tail-recursive implementation for safety & performance.
def helper(lowest: Int, rem: List[Int]): Int = {
// If there are no elements left, return lowest found.
if(rem.isEmpty) lowest
// Otherwise, determine new lowest value for next iteration. (If you need the
// previous value, you could put that here.)
else {
val newLow = Math.min(lowest, rem.head)
helper(newLow, rem.tail)
}
}
// Start off looking at the first member, guarding against an empty list.
xs match {
case x :: rem => Some(helper(x, rem))
case _ => None
}
}
In this particular case, this can be replaced by a fold.
def findLowest(xs: List[Int]): Option[Int] = xs match {
case h :: _ => Some(xs.fold(h)((b, m) => Math.min(b, m)))
case _ => None
}
which can be simplified further to just:
def findLowest(xs: List[Int]): Option[Int] = xs match {
case h :: _ => Some(xs.foldLeft(h)(Math.min))
case _ => None
}
In this case, we're using the zero argument of the fold operation to store the minimum value.
If this seems too abstract, can you post more details about what you would like to do in your loop, and I can address that in my answer?
Update: OK, so having seen your comment about the dates, here's a more specific version. I've used DateRange (which you will need to replace with your actual type) in this example. You will also have to determine what overlap and merge need to do. But here's your solution:
// Determine whether two date ranges overlap. Return true if so; false otherwise.
def overlap(a: DateRange, b: DateRange): Boolean = { ... }
// Merge overlapping a & b date ranges, return new range.
def merge(a: DateRange, b: DateRange): DateRange = { ... }
// Convert list of date ranges into another list, in which all consecutive,
// overlapping ranges are merged into a single range.
def mergeDateRanges(dr: List[DateRange]): List[DateRange] = {
// Helper function. Builds a list of merged date ranges.
def helper(prev: DateRange, ret: List[DateRange], rem: List[DateRange]):
List[DateRange] = {
// If there are no more date ranges to process, prepend the previous value
// to the list that we'll return.
if(rem.isEmpty) prev :: ret
// Otherwise, determine if the previous value overlaps with the current
// head value. If it does, create a new previous value that is the merger
// of the two and leave the returned list alone; if not, prepend the
// previous value to the returned list and make the previous value the
// head value.
else {
val (newPrev, newRet) = if(overlap(prev, rem.head)) {
(merge(prev, rem.head), ret)
}
else (rem.head, prev :: ret)
// Next iteration of the helper (I missed this off, originally, sorry!)
helper(newPrev, newRet, rem.tail)
}
}
// Start things off, trapping empty list case. Because the list is generated by pre-pending elements, we have to reverse it to preserve the order.
dr match {
case h :: rem => helper(h, Nil, rem).reverse // <- Fixed bug here too...
case _ => Nil
}
}
If you're looking to just compare the two sequential elements and get the boolean result:
val list = Seq("one", "two", "three", "one", "one")
val result = list.sliding(2).collect { case Seq(a,b) => a.equals(b)}.toList
The previous will give you this
List(false, false, false, true)
An alternative to the previous one is that you might want to get the value:
val result = list.sliding(2).collect {
case Seq(a,b) => if (a.equals(b)) Some(a) else None
case _ => None
}.toList.filter(_.isDefined).map(_.get)
This will give the following:
List(one)
But if you're just looking to compare items in list, an option would be to do something like this:
val list = Seq("one", "two", "three", "one")
for {
item1 <- list
item2 <- list
_ <- Option(println(item1 + ","+ item2))
} yield ()
This results in:
one,one
one,two
one,three
one,one
two,one
two,two
two,three
two,one
three,one
three,two
three,three
three,one
one,one
one,two
one,three
one,one
If you are not mutating the list, and are only trying to compare two elements side-by-side.
This can help you. It's functional.
def cmpPrevElement[T](l: List[T]): List[(T,T)] = {
(l.indices).sliding(2).toList.map(e => (l(e.head), l(e.tail.head)))
}
Item is a custom type.
I have a Iterable of pairs (Item, Item). The first element in every pair is the same, so I want to reduce the list to a single pair of type (Item, Array[Item])
// list: Iterable[(Item, Item)]
// First attempt
val res = list.foldLeft((null, Array[Item]()))((p1,p2) => {
(p2._1, p1._2 :+ p2._2)
}
// Second attempt
val r = list.unzip
val res = (r._1.head, r._2.toArray))
1. I don't know how to correctly setup the zero value in the first ("foldLeft") solution. Is there any way to do something like this?
2. Other than the second solution, is there a better way to reduce a list of custom object tuples to single tuple ?
If you are sure the first element in every pair is the same, why don't you use that information to simplify?
(list.head._1, list.map(_._2))
should do the work
if there are other cases where the first element is different, you may want to try:
list.groupBy(_._1).map { case (common, lst) => (common, lst.map(_._2)) }
While trying to understand the differences between streams, iterators, and views of collections, I stumbled upon the following strange behavior.
Here the code (map and filter simply print their input and forward it unchanged):
object ArrayViewTest {
def main(args: Array[String]) {
val array = Array.range(1,10)
print("stream-map-head: ")
array.toStream.map(x => {print(x); x}).head
print("\nstream-filter-head: ")
array.toStream.filter(x => {print(x); true}).head
print("\niterator-map-head: ")
array.iterator.map(x => {print(x); x}).take(1).toArray
print("\niterator-filter-head: ")
array.iterator.filter(x => {print(x); true}).take(1).toArray
print("\nview-map-head: ")
array.view.map(x => {print(x); x}).head
print("\nview-filter-head: ")
array.view.filter(x => {print(x); true}).head
}
}
And its output:
stream-map-head: 1
stream-filter-head: 1
iterator-map-head: 1
iterator-filter-head: 1
view-map-head: 1
view-filter-head: 123456789 // <------ WHY ?
Why does filter called on a view process the whole array?
I expected that the evaluation of filter is driven only once by calling head, just as in all other cases, in particular just as in using map on view.
Which insight am I missing ?
(Mini-side-question for a comment, why is there no head on an iterator?)
edit:
The same strange behavior (as here for scala.Array.range(1,10)) is achieved by scala.collection.mutable.ArraySeq.range(1,10), scala.collection.mutable.ArrayBuffer.range(1,10), and scala.collection.mutable.StringBuilder.newBuilder.append("123456789").
However, for all other mutable collections, and all immutable collections, the filter on the view works as expected and outputs 1.
It seems the head uses isEmpty
trait IndexedSeqOptimized[+A, +Repr] extends Any with IndexedSeqLike[A, Repr] { self =>
...
override /*IterableLike*/
def head: A = if (isEmpty) super.head else this(0)
And isEmpty uses length
trait IndexedSeqOptimized[+A, +Repr] extends Any with IndexedSeqLike[A, Repr] { self =>
...
override /*IterableLike*/
def isEmpty: Boolean = { length == 0 }
The implementation of length is used from Filtered which loops through the whole array
trait Filtered extends super.Filtered with Transformed[A] {
protected[this] lazy val index = {
var len = 0
val arr = new Array[Int](self.length)
for (i <- 0 until self.length)
if (pred(self(i))) {
arr(len) = i
len += 1
}
arr take len
}
def length = index.length
def apply(idx: Int) = self(index(idx))
}
The Filtered trait is only used when calling filter
protected override def newFiltered(p: A => Boolean): Transformed[A] =
new { val pred = p } with AbstractTransformed[A] with Filtered
This is why is happens when using filter and not when using map
I think it has to do that Array is a mutable indexed sequence. And it's view is also a mutable collection :) So when it creates a view it creates an index that maps between original collection and filtered collection. And it doesn't really make sense to create this index lazily, because when someone will request the ith element than the whole source array may be traversed anyway. It is still lazy in a sense that this index is not created until you call head. Still this is not explicitly stated in scala documentation, and it looks like a bug at first sight.
For the mini side question, I think the problem with head on iterator is that people expect head to be pure function, namely you should be able to call it n times and it should return the same result each time. And iterator is inherently mutable data structure, which by contract is only traversable once. This may be overcomed by caching the first element of the iterator, but I find this to be very confusing.
I have the following recursive function in Scala that should return the maximum size integer in the List. Is anyone able to tell me why the largest value is not returned?
def max(xs: List[Int]): Int = {
var largest = xs.head
println("largest: " + largest)
if (!xs.tail.isEmpty) {
var next = xs.tail.head
println("next: " + next)
largest = if (largest > next) largest else next
var remaining = List[Int]()
remaining = largest :: xs.tail.tail
println("remaining: " + remaining)
max(remaining)
}
return largest
}
Print out statements show me that I've successfully managed to bring back the largest value in the List as the head (which was what I wanted) but the function still returns back the original head in the list. I'm guessing this is because the reference for xs is still referring to the original xs list, problem is I can't override that because it's a val.
Any ideas what I'm doing wrong?
You should use the return value of the inner call to max and compare that to the local largest value.
Something like the following (removed println just for readability):
def max(xs: List[Int]): Int = {
var largest = xs.head
if (!xs.tail.isEmpty) {
var remaining = List[Int]()
remaining = largest :: xs.tail
var next = max(remaining)
largest = if (largest > next) largest else next
}
return largest
}
Bye.
I have an answer to your question but first...
This is the most minimal recursive implementation of max I've ever been able to think up:
def max(xs: List[Int]): Option[Int] = xs match {
case Nil => None
case List(x: Int) => Some(x)
case x :: y :: rest => max( (if (x > y) x else y) :: rest )
}
(OK, my original version was ever so slightly more minimal but I wrote that in Scheme which doesn't have Option or type safety etc.) It doesn't need an accumulator or a local helper function because it compares the first two items on the list and discards the smaller, a process which - performed recursively - inevitably leaves you with a list of just one element which must be bigger than all the rest.
OK, why your original solution doesn't work... It's quite simple: you do nothing with the return value from the recursive call to max. All you had to do was change the line
max(remaining)
to
largest = max(remaining)
and your function would work. It wouldn't be the prettiest solution, but it would work. As it is, your code looks as if it assumes that changing the value of largest inside the recursive call will change it in the outside context from which it was called. But each new call to max creates a completely new version of largest which only exists inside that new iteration of the function. Your code then throws away the return value from max(remaining) and returns the original value of largest, which hasn't changed.
Another way to solve this would have been to use a local (inner) function after declaring var largest. That would have looked like this:
def max(xs: List[Int]): Int = {
var largest = xs.head
def loop(ys: List[Int]) {
if (!ys.isEmpty) {
var next = ys.head
largest = if (largest > next) largest else next
loop(ys.tail)
}
}
loop(xs.tail)
return largest
}
Generally, though, it is better to have recursive functions be entirely self-contained (that is, not to look at or change external variables but only at their input) and to return a meaningful value.
When writing a recursive solution of this kind, it often helps to think in reverse. Think first about what things are going to look like when you get to the end of the list. What is the exit condition? What will things look like and where will I find the value to return?
If you do this, then the case which you use to exit the recursive function (by returning a simple value rather than making another recursive call) is usually very simple. The other case matches just need to deal with a) invalid input and b) what to do if you are not yet at the end. a) is usually simple and b) can usually be broken down into just a few different situations, each with a simple thing to do before making another recursive call.
If you look at my solution, you'll see that the first case deals with invalid input, the second is my exit condition and the third is "what to do if we're not at the end".
In many other recursive solutions, Nil is the natural end of the recursion.
This is the point at which I (as always) recommend reading The Little Schemer. It teaches you recursion (and basic Scheme) at the same time (both of which are very good things to learn).
It has been pointed out that Scala has some powerful functions which can help you avoid recursion (or hide the messy details of it), but to use them well you really do need to understand how recursion works.
The following is a typical way to solve this sort of problem. It uses an inner tail-recursive function that includes an extra "accumulator" value, which in this case will hold the largest value found so far:
def max(xs: List[Int]): Int = {
def go(xs: List[Int], acc: Int): Int = xs match {
case Nil => acc // We've emptied the list, so just return the final result
case x :: rest => if (acc > x) go(rest, acc) else go(rest, x) // Keep going, with remaining list and updated largest-value-so-far
}
go(xs, Int.MinValue)
}
Nevermind I've resolved the issue...
I finally came up with:
def max(xs: List[Int]): Int = {
var largest = 0
var remaining = List[Int]()
if (!xs.isEmpty) {
largest = xs.head
if (!xs.tail.isEmpty) {
var next = xs.tail.head
largest = if (largest > next) largest else next
remaining = largest :: xs.tail.tail
}
}
if (!remaining.tail.isEmpty) max(remaining) else xs.head
}
Kinda glad we have loops - this is an excessively complicated solution and hard to get your head around in my opinion. I resolved the problem by making sure the recursive call was the last statement in the function either that or xs.head is returned as the result if there isn't a second member in the array.
The most concise but also clear version I have ever seen is this:
def max(xs: List[Int]): Int = {
def maxIter(a: Int, xs: List[Int]): Int = {
if (xs.isEmpty) a
else a max maxIter(xs.head, xs.tail)
}
maxIter(xs.head, xs.tail)
}
This has been adapted from the solutions to a homework on the Scala official Corusera course: https://github.com/purlin/Coursera-Scala/blob/master/src/example/Lists.scala
but here I use the rich operator max to return the largest of its two operands. This saves having to redefine this function within the def max block.
What about this?
def max(xs: List[Int]): Int = {
maxRecursive(xs, 0)
}
def maxRecursive(xs: List[Int], max: Int): Int = {
if(xs.head > max && ! xs.isEmpty)
maxRecursive(xs.tail, xs.head)
else
max
}
What about this one ?
def max(xs: List[Int]): Int = {
var largest = xs.head
if( !xs.tail.isEmpty ) {
if(xs.head < max(xs.tail)) largest = max(xs.tail)
}
largest
}
My answer is using recursion is,
def max(xs: List[Int]): Int =
xs match {
case Nil => throw new NoSuchElementException("empty list is not allowed")
case head :: Nil => head
case head :: tail =>
if (head >= tail.head)
if (tail.length > 1)
max(head :: tail.tail)
else
head
else
max(tail)
}
}
One way is this
list.distinct.size != list.size
Is there any better way? It would have been nice to have a containsDuplicates method
Assuming "better" means "faster", see the alternative approaches benchmarked in this question, which seems to show some quicker methods (although note that distinct uses a HashSet and is already O(n)). YMMV of course, depending on specific test case, scala version etc. Probably any significant improvement over the "distinct.size" approach would come from an early-out as soon as a duplicate is found, but how much of a speed-up is actually obtained would depend strongly on how common duplicates actually are in your use-case.
If you mean "better" in that you want to write list.containsDuplicates instead of containsDuplicates(list), use an implicit:
implicit def enhanceWithContainsDuplicates[T](s:List[T]) = new {
def containsDuplicates = (s.distinct.size != s.size)
}
assert(List(1,2,2,3).containsDuplicates)
assert(!List("a","b","c").containsDuplicates)
You can also write:
list.toSet.size != list.size
But the result will be the same because distinct is already implemented with a Set. In both case the time complexity should be O(n): you must traverse the list and Set insertion is O(1).
I think this would stop as soon as a duplicate was found and is probably more efficient than doing distinct.size - since I assume distinct keeps a set as well:
#annotation.tailrec
def containsDups[A](list: List[A], seen: Set[A] = Set[A]()): Boolean =
list match {
case x :: xs => if (seen.contains(x)) true else containsDups(xs, seen + x)
case _ => false
}
containsDups(List(1,1,2,3))
// Boolean = true
containsDups(List(1,2,3))
// Boolean = false
I realize you asked for easy and I don't now that this version is, but finding a duplicate is also finding if there is an element that has been seen before:
def containsDups[A](list: List[A]): Boolean = {
list.iterator.scanLeft(Set[A]())((set, a) => set + a) // incremental sets
.zip(list.iterator)
.exists{ case (set, a) => set contains a }
}
#annotation.tailrec
def containsDuplicates [T] (s: Seq[T]) : Boolean =
if (s.size < 2) false else
s.tail.contains (s.head) || containsDuplicates (s.tail)
I didn't measure this, and think it is similar to huynhjl's solution, but a bit more simple to understand.
It returns early, if a duplicate is found, so I looked into the source of Seq.contains, whether this returns early - it does.
In SeqLike, 'contains (e)' is defined as 'exists (_ == e)', and exists is defined in TraversableLike:
def exists (p: A => Boolean): Boolean = {
var result = false
breakable {
for (x <- this)
if (p (x)) { result = true; break }
}
result
}
I'm curious how to speed things up with parallel collections on multi cores, but I guess it is a general problem with early-returning, while another thread will keep running, because it doesn't know, that the solution is already found.
Summary:
I've written a very efficient function which returns both List.distinct and a List consisting of each element which appeared more than once and the index at which the element duplicate appeared.
Note: This answer is a straight copy of the answer on a related question.
Details:
If you need a bit more information about the duplicates themselves, like I did, I have written a more general function which iterates across a List (as ordering was significant) exactly once and returns a Tuple2 consisting of the original List deduped (all duplicates after the first are removed; i.e. the same as invoking distinct) and a second List showing each duplicate and an Int index at which it occurred within the original List.
Here's the function:
def filterDupes[A](items: List[A]): (List[A], List[(A, Int)]) = {
def recursive(remaining: List[A], index: Int, accumulator: (List[A], List[(A, Int)])): (List[A], List[(A, Int)]) =
if (remaining.isEmpty)
accumulator
else
recursive(
remaining.tail
, index + 1
, if (accumulator._1.contains(remaining.head))
(accumulator._1, (remaining.head, index) :: accumulator._2)
else
(remaining.head :: accumulator._1, accumulator._2)
)
val (distinct, dupes) = recursive(items, 0, (Nil, Nil))
(distinct.reverse, dupes.reverse)
}
An below is an example which might make it a bit more intuitive. Given this List of String values:
val withDupes =
List("a.b", "a.c", "b.a", "b.b", "a.c", "c.a", "a.c", "d.b", "a.b")
...and then performing the following:
val (deduped, dupeAndIndexs) =
filterDupes(withDupes)
...the results are:
deduped: List[String] = List(a.b, a.c, b.a, b.b, c.a, d.b)
dupeAndIndexs: List[(String, Int)] = List((a.c,4), (a.c,6), (a.b,8))
And if you just want the duplicates, you simply map across dupeAndIndexes and invoke distinct:
val dupesOnly =
dupeAndIndexs.map(_._1).distinct
...or all in a single call:
val dupesOnly =
filterDupes(withDupes)._2.map(_._1).distinct
...or if a Set is preferred, skip distinct and invoke toSet...
val dupesOnly2 =
dupeAndIndexs.map(_._1).toSet
...or all in a single call:
val dupesOnly2 =
filterDupes(withDupes)._2.map(_._1).toSet
This is a straight copy of the filterDupes function out of my open source Scala library, ScalaOlio. It's located at org.scalaolio.collection.immutable.List_._.
If you're trying to check for duplicates in a test then ScalaTest can be helpful.
import org.scalatest.Inspectors._
import org.scalatest.Matchers._
forEvery(list.distinct) { item =>
withClue(s"value $item, the number of occurences") {
list.count(_ == item) shouldBe 1
}
}
// example:
scala> val list = List(1,2,3,4,3,2)
list: List[Int] = List(1, 2, 3, 4, 3, 2)
scala> forEvery(list) { item => withClue(s"value $item, the number of occurences") { list.count(_ == item) shouldBe 1 } }
org.scalatest.exceptions.TestFailedException: forEvery failed, because:
at index 1, value 2, the number of occurences 2 was not equal to 1 (<console>:19),
at index 2, value 3, the number of occurences 2 was not equal to 1 (<console>:19)
in List(1, 2, 3, 4)