Related
I have a type of set and union function as follow
type Set = Int => Boolean
def union(s: Set, t: Set): Set = (e: Int) => s(e) || t(e)
val xs = Set(12001,12002, 12003, 12004)
val ys = Set(13001,13002, 13003, 13004)
When i use the union operation,
union(xs,ys)
It should return me another set which contains all the elements of both sets xs and ys
Edited Section:
I am sorry i was not clear on my question, i have my own implementation of the iterator for both Set xs and ys
var i = xs.iterator;
while(i.hasNext)
println(i.next())
But i was not satisfied with this implementation and found that you can implement the condition with the function (after some googling) but i was unable to get it to work in my eclipse worksheet.
val rs = union(xs,ys) //> rs : Learn2.Set = <function1>
I am guessing it returns a function.
so my questions,
1. is it possible to implement as described above in the edited section? if so, then what am i missing to get it working?
2. I don't understand how the element e in (e: Int) => s(e) || t(e) is iterating over the elements in both the sets
Look at your Set type: Int => Boolean. So it takes an Int and returns a Boolean. What that means is that it is not a collection that you can iterate over to retrieve all its values, because it actually contains no values.
If you want to know what Int values return true then you have to iterate over the entire range of possible inputs (or some subset thereof) and filter for the condition you're looking for.
scala> val res = union(xs,ys)
res: Set = $$Lambda$1091/332405156#2c30c81d
scala> (0 to 20000).filter(res).foreach(println)
12001
12002
12003
12004
13001
13002
13003
13004
scala>
update
Your confusion stems from the fact that you've named your function after an existing collection in the standard library. xs.itorator works because xs is not an example of your Set, it is a Set from the standard library with all the associated methods. Rename your type alias to something like Xet and you'll see what I mean.
type Xet = Int => Boolean
def union(s: Xet, t: Xet): Xet = (e: Int) => s(e) || t(e)
val xx: Xet = _ == 12001
val yx: Xet = _ == 13002
val zx: Xet = union(xx, yx)
xx.itrerator // Error, won't compile
(1 to 20000).filter(zx).foreach(println) // output: 12001 & 13002
I'm working on Advent of Code's coding challenges and I'm on day one. I've read from a file that contains nothing but ((()(())(( so I'm looking to turn each '(' to 1 and each ')' to a -1 so I can compute them. But I'm having issues when I map findFloor over source. I'm getting a type mismatch. Everything looks right to me and that's the weird part because it's not working.
import scala.io._
object Advent1 extends App {
// Read from file
val source = Source.fromFile("floor1-Input.txt").toList
// Replace each '(' with 1 and each ')' with -1, return List[Int]
def findFloor(input: List[Char]):Int = input match {
case _ if input.contains('(') => 1
case _ if input.contains(')') => -1
}
val floor = source.map(findFloor)
}
Error output
error: type mismatch;
found : List[Char] => Int
required: Char => ?
val floor = source.map(findFloor)
^ one error found
What I'm I doing wrong here ? / what I'm I missing ?
Scala map works over an elements rather than whole collection. Try this:
val floor = source.map {
case '(' => 1
case ')' => -1
}.sum
If you want to compute them in an sequential way, you could even do the computations directly by using foldLeft.
val computation = source.foldLeft(0)( (a, b) => {
b match {
case '(' => a + 1
case ')' => a - 1
}
})
It's simply adding up all values and returning the acummulated value. For ( it's adding and for ')' it's substracting.
The first argument is the starting value, a is the value from the previous step and b is the actual element therefore the char.
You're error is probably caused because the pattern match is not defined for all chars. You missing a match for all others, case _ => 0 for example.
An other option would be to use 'collect' since this accepts a PartialFunction and ignores all not matching elements.
The suggested 'fold' solution is the better approach here I think.
What's the best way to terminate a fold early? As a simplified example, imagine I want to sum up the numbers in an Iterable, but if I encounter something I'm not expecting (say an odd number) I might want to terminate. This is a first approximation
def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
nums.foldLeft (Some(0): Option[Int]) {
case (Some(s), n) if n % 2 == 0 => Some(s + n)
case _ => None
}
}
However, this solution is pretty ugly (as in, if I did a .foreach and a return -- it'd be much cleaner and clearer) and worst of all, it traverses the entire iterable even if it encounters a non-even number.
So what would be the best way to write a fold like this, that terminates early? Should I just go and write this recursively, or is there a more accepted way?
My first choice would usually be to use recursion. It is only moderately less compact, is potentially faster (certainly no slower), and in early termination can make the logic more clear. In this case you need nested defs which is a little awkward:
def sumEvenNumbers(nums: Iterable[Int]) = {
def sumEven(it: Iterator[Int], n: Int): Option[Int] = {
if (it.hasNext) {
val x = it.next
if ((x % 2) == 0) sumEven(it, n+x) else None
}
else Some(n)
}
sumEven(nums.iterator, 0)
}
My second choice would be to use return, as it keeps everything else intact and you only need to wrap the fold in a def so you have something to return from--in this case, you already have a method, so:
def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
Some(nums.foldLeft(0){ (n,x) =>
if ((n % 2) != 0) return None
n+x
})
}
which in this particular case is a lot more compact than recursion (though we got especially unlucky with recursion since we had to do an iterable/iterator transformation). The jumpy control flow is something to avoid when all else is equal, but here it's not. No harm in using it in cases where it's valuable.
If I was doing this often and wanted it within the middle of a method somewhere (so I couldn't just use return), I would probably use exception-handling to generate non-local control flow. That is, after all, what it is good at, and error handling is not the only time it's useful. The only trick is to avoid generating a stack trace (which is really slow), and that's easy because the trait NoStackTrace and its child trait ControlThrowable already do that for you. Scala already uses this internally (in fact, that's how it implements the return from inside the fold!). Let's make our own (can't be nested, though one could fix that):
import scala.util.control.ControlThrowable
case class Returned[A](value: A) extends ControlThrowable {}
def shortcut[A](a: => A) = try { a } catch { case Returned(v) => v }
def sumEvenNumbers(nums: Iterable[Int]) = shortcut{
Option(nums.foldLeft(0){ (n,x) =>
if ((x % 2) != 0) throw Returned(None)
n+x
})
}
Here of course using return is better, but note that you could put shortcut anywhere, not just wrapping an entire method.
Next in line for me would be to re-implement fold (either myself or to find a library that does it) so that it could signal early termination. The two natural ways of doing this are to not propagate the value but an Option containing the value, where None signifies termination; or to use a second indicator function that signals completion. The Scalaz lazy fold shown by Kim Stebel already covers the first case, so I'll show the second (with a mutable implementation):
def foldOrFail[A,B](it: Iterable[A])(zero: B)(fail: A => Boolean)(f: (B,A) => B): Option[B] = {
val ii = it.iterator
var b = zero
while (ii.hasNext) {
val x = ii.next
if (fail(x)) return None
b = f(b,x)
}
Some(b)
}
def sumEvenNumbers(nums: Iterable[Int]) = foldOrFail(nums)(0)(_ % 2 != 0)(_ + _)
(Whether you implement the termination by recursion, return, laziness, etc. is up to you.)
I think that covers the main reasonable variants; there are some other options also, but I'm not sure why one would use them in this case. (Iterator itself would work well if it had a findOrPrevious, but it doesn't, and the extra work it takes to do that by hand makes it a silly option to use here.)
The scenario you describe (exit upon some unwanted condition) seems like a good use case for the takeWhile method. It is essentially filter, but should end upon encountering an element that doesn't meet the condition.
For example:
val list = List(2,4,6,8,6,4,2,5,3,2)
list.takeWhile(_ % 2 == 0) //result is List(2,4,6,8,6,4,2)
This will work just fine for Iterators/Iterables too. The solution I suggest for your "sum of even numbers, but break on odd" is:
list.iterator.takeWhile(_ % 2 == 0).foldLeft(...)
And just to prove that it's not wasting your time once it hits an odd number...
scala> val list = List(2,4,5,6,8)
list: List[Int] = List(2, 4, 5, 6, 8)
scala> def condition(i: Int) = {
| println("processing " + i)
| i % 2 == 0
| }
condition: (i: Int)Boolean
scala> list.iterator.takeWhile(condition _).sum
processing 2
processing 4
processing 5
res4: Int = 6
You can do what you want in a functional style using the lazy version of foldRight in scalaz. For a more in depth explanation, see this blog post. While this solution uses a Stream, you can convert an Iterable into a Stream efficiently with iterable.toStream.
import scalaz._
import Scalaz._
val str = Stream(2,1,2,2,2,2,2,2,2)
var i = 0 //only here for testing
val r = str.foldr(Some(0):Option[Int])((n,s) => {
println(i)
i+=1
if (n % 2 == 0) s.map(n+) else None
})
This only prints
0
1
which clearly shows that the anonymous function is only called twice (i.e. until it encounters the odd number). That is due to the definition of foldr, whose signature (in case of Stream) is def foldr[B](b: B)(f: (Int, => B) => B)(implicit r: scalaz.Foldable[Stream]): B. Note that the anonymous function takes a by name parameter as its second argument, so it need no be evaluated.
Btw, you can still write this with the OP's pattern matching solution, but I find if/else and map more elegant.
Well, Scala does allow non local returns. There are differing opinions on whether or not this is a good style.
scala> def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
| nums.foldLeft (Some(0): Option[Int]) {
| case (None, _) => return None
| case (Some(s), n) if n % 2 == 0 => Some(s + n)
| case (Some(_), _) => None
| }
| }
sumEvenNumbers: (nums: Iterable[Int])Option[Int]
scala> sumEvenNumbers(2 to 10)
res8: Option[Int] = None
scala> sumEvenNumbers(2 to 10 by 2)
res9: Option[Int] = Some(30)
EDIT:
In this particular case, as #Arjan suggested, you can also do:
def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
nums.foldLeft (Some(0): Option[Int]) {
case (Some(s), n) if n % 2 == 0 => Some(s + n)
case _ => return None
}
}
You can use foldM from cats lib (as suggested by #Didac) but I suggest to use Either instead of Option if you want to get actual sum out.
bifoldMap is used to extract the result from Either.
import cats.implicits._
def sumEven(nums: Stream[Int]): Either[Int, Int] = {
nums.foldM(0) {
case (acc, n) if n % 2 == 0 => Either.right(acc + n)
case (acc, n) => {
println(s"Stopping on number: $n")
Either.left(acc)
}
}
}
examples:
println("Result: " + sumEven(Stream(2, 2, 3, 11)).bifoldMap(identity, identity))
> Stopping on number: 3
> Result: 4
println("Result: " + sumEven(Stream(2, 7, 2, 3)).bifoldMap(identity, identity))
> Stopping on number: 7
> Result: 2
Cats has a method called foldM which does short-circuiting (for Vector, List, Stream, ...).
It works as follows:
def sumEvenNumbers(nums: Stream[Int]): Option[Long] = {
import cats.implicits._
nums.foldM(0L) {
case (acc, c) if c % 2 == 0 => Some(acc + c)
case _ => None
}
}
If it finds a not even element it returns None without computing the rest, otherwise it returns the sum of the even entries.
If you want to keep count until an even entry is found, you should use an Either[Long, Long]
#Rex Kerr your answer helped me, but I needed to tweak it to use Either
def foldOrFail[A,B,C,D](map: B => Either[D, C])(merge: (A, C) => A)(initial: A)(it: Iterable[B]): Either[D, A] = {
val ii= it.iterator
var b= initial
while (ii.hasNext) {
val x= ii.next
map(x) match {
case Left(error) => return Left(error)
case Right(d) => b= merge(b, d)
}
}
Right(b)
}
You could try using a temporary var and using takeWhile. Here is a version.
var continue = true
// sample stream of 2's and then a stream of 3's.
val evenSum = (Stream.fill(10)(2) ++ Stream.fill(10)(3)).takeWhile(_ => continue)
.foldLeft(Option[Int](0)){
case (result,i) if i%2 != 0 =>
continue = false;
// return whatever is appropriate either the accumulated sum or None.
result
case (optionSum,i) => optionSum.map( _ + i)
}
The evenSum should be Some(20) in this case.
You can throw a well-chosen exception upon encountering your termination criterion, handling it in the calling code.
A more beutiful solution would be using span:
val (l, r) = numbers.span(_ % 2 == 0)
if(r.isEmpty) Some(l.sum)
else None
... but it traverses the list two times if all the numbers are even
Just for an "academic" reasons (:
var headers = Source.fromFile(file).getLines().next().split(",")
var closeHeaderIdx = headers.takeWhile { s => !"Close".equals(s) }.foldLeft(0)((i, S) => i+1)
Takes twice then it should but it is a nice one liner.
If "Close" not found it will return
headers.size
Another (better) is this one:
var headers = Source.fromFile(file).getLines().next().split(",").toList
var closeHeaderIdx = headers.indexOf("Close")
Trying to get a handle on pattern matching here-- coming from a C++/Java background it's very foreign to me.
The point of this branch is to check each member of a List d of tuples [format of (string,object). I want to define three cases.
1) If the counter in this function is larger than the size of the list (defined in another called acc), I want to return nothing (because there is no match)
2) If the key given in the input matches a tuple in the list, I want to return its value (or, whatever is stored in the tuple._2).
3) If there is no match, and there is still more list to iterate, increment and continue.
My code is below:
def get(key:String):Option[Any] = {
var counter: Int = 0
val flag: Boolean = false
x match {
case (counter > acc) => None
case ((d(counter)._1) == key) => d(counter)._2
case _ => counter += 1
}
My issue here is while the first case seems to compile correctly, the second throws an error:
:36: error: ')' expected but '.' found.
case ((d(counter)._1) == key) => d(counter)._2
The third as well:
scala> case _ => counter += 1
:1: error: illegal start of definition
But I assume it's because the second isn't correct. My first thought is that I'm not comparing tuples correctly, but I seem to be following the syntax for indexing into a tuple, so I'm stumped. Can anyone steer me in the right direction?
Hopefully a few things to clear up your confusion:
Matching in scala follows this general template:
x match {
case SomethingThatXIs if(SomeCondition) => SomeExpression
// rinse and repeat
// note that `if(SomeCondition)` is optional
}
It looks like you may have attempted to use the match/case expression as more of an if/else if/else kind of block, and as far as I can tell, the x doesn't really matter within said block. If that's the case, you might be fine with something like
case _ if (d(counter)._1 == key) => d(counter)._2
BUT
Some info on Lists in scala. You should always think of it like a LinkedList, where indexed lookup is an O(n) operation. Lists can be matched with a head :: tail format, and Nil is an empty list. For example:
val myList = List(1,2,3,4)
myList match {
case first :: theRest =>
// first is 1, theRest is List(2,3,4), which you can also express as
// 2 :: 3 :: 4 :: Nil
case Nil =>
// an empty list case
}
It looks like you're constructing a kind of ListMap, so I'll write up a more "functional"/"recursive" way of implementing your get method.
I'll assume that d is the backing list, of type List[(String, Any)]
def get(key: String): Option[Any] = {
def recurse(key: String, list: List[(String, Any)]): Option[Any] = list match {
case (k, value) :: _ if (key == k) => Some(value)
case _ :: theRest => recurse(key, theRest)
case Nil => None
}
recurse(key, d)
}
The three case statements can be explained as follows:
1) The first element in list is a tuple of (k, value). The rest of the list is matched to the _ because we don't care about it in this case. The condition asks if k is equal to the key we are looking for. In this case, we want to return the value from the tuple.
2) Since the first element didn't have the right key, we want to recurse. We don't care about the first element, but we want the rest of the list so that we can recurse with it.
3) case Nil means there's nothing in the list, which should mark "failure" and the end of the recursion. In this case we return None. Consider this the same as your counter > acc condition from your question.
Please don't hesitate to ask for further explanation; and if I've accidentally made a mistake (won't compile, etc), point it out and I will fix it.
I'm assuming that conditionally extracting part of a tuple from a list of tuples is the important part of your question, excuse me if I'm wrong.
First an initial point, in Scala we normally would use AnyRef instead of Object or, if worthwhile, we would use a type parameter which can increase reuse of the function or method and increase type safety.
The three cases you describe can be collapsed into two cases, the first case uses a guard (the if statement after the pattern match), the second case matches the entire non-empty list and searches for a match between each first tuple argument and the key, returning a Some[T] containing the second tuple argument of the matching tuple or None if no match occurred. The third case is not required as the find operation traverses (iterates over) the list.
The map operation after the find is used to extract the second tuple argument (map on an Option returns an Option), remove this operation and change the method's return type to Option[(String, T)] if you want the whole tuple returned.
def f[T](key: String, xs: List[(String, T)], initialCount: Int = 2): Option[T] = {
var counter = initialCount
xs match {
case l: List[(String, T)] if l.size < counter => None
case l: List[(String, T)] => l find {_._1 == key} map {_._2}
}
}
f("A", List(("A", 1), ("B", 2))) // Returns Some(1)
f("B", List(("A", 1), ("B", 2))) // Returns Some(2)
f("A", List(("A", 1))) // Returns None
f("C", List(("A", 1), ("B", 2))) // Returns None
f("C", Nil) // Returns None
First, why are you using a List for that reason? What you need is definitely a Map. Its get() returns None if key is not found and Some(value) if it is found in it.
Second, what is x in your example? Is it the list?
Third, you cannot write case (log) => .. where log is a logical condition, it is in the form of case _ if (log) => ... (as Rex Kerr already pinted out in his comment).
Fouth, you need a recursive function for this (simply increasing the counter will call this only on the second element).
So you'll need something like this (if still prefer sticking to List):
def get(l: List[Tuple2[String, String]], key: String): Option[String] = {
if (l.isEmpty) {
None
} else {
val act = l.head
act match {
case x if (act._1 == key) => Some(act._2)
case _ => get(l.tail, key)
}
}
}
One way is this
list.distinct.size != list.size
Is there any better way? It would have been nice to have a containsDuplicates method
Assuming "better" means "faster", see the alternative approaches benchmarked in this question, which seems to show some quicker methods (although note that distinct uses a HashSet and is already O(n)). YMMV of course, depending on specific test case, scala version etc. Probably any significant improvement over the "distinct.size" approach would come from an early-out as soon as a duplicate is found, but how much of a speed-up is actually obtained would depend strongly on how common duplicates actually are in your use-case.
If you mean "better" in that you want to write list.containsDuplicates instead of containsDuplicates(list), use an implicit:
implicit def enhanceWithContainsDuplicates[T](s:List[T]) = new {
def containsDuplicates = (s.distinct.size != s.size)
}
assert(List(1,2,2,3).containsDuplicates)
assert(!List("a","b","c").containsDuplicates)
You can also write:
list.toSet.size != list.size
But the result will be the same because distinct is already implemented with a Set. In both case the time complexity should be O(n): you must traverse the list and Set insertion is O(1).
I think this would stop as soon as a duplicate was found and is probably more efficient than doing distinct.size - since I assume distinct keeps a set as well:
#annotation.tailrec
def containsDups[A](list: List[A], seen: Set[A] = Set[A]()): Boolean =
list match {
case x :: xs => if (seen.contains(x)) true else containsDups(xs, seen + x)
case _ => false
}
containsDups(List(1,1,2,3))
// Boolean = true
containsDups(List(1,2,3))
// Boolean = false
I realize you asked for easy and I don't now that this version is, but finding a duplicate is also finding if there is an element that has been seen before:
def containsDups[A](list: List[A]): Boolean = {
list.iterator.scanLeft(Set[A]())((set, a) => set + a) // incremental sets
.zip(list.iterator)
.exists{ case (set, a) => set contains a }
}
#annotation.tailrec
def containsDuplicates [T] (s: Seq[T]) : Boolean =
if (s.size < 2) false else
s.tail.contains (s.head) || containsDuplicates (s.tail)
I didn't measure this, and think it is similar to huynhjl's solution, but a bit more simple to understand.
It returns early, if a duplicate is found, so I looked into the source of Seq.contains, whether this returns early - it does.
In SeqLike, 'contains (e)' is defined as 'exists (_ == e)', and exists is defined in TraversableLike:
def exists (p: A => Boolean): Boolean = {
var result = false
breakable {
for (x <- this)
if (p (x)) { result = true; break }
}
result
}
I'm curious how to speed things up with parallel collections on multi cores, but I guess it is a general problem with early-returning, while another thread will keep running, because it doesn't know, that the solution is already found.
Summary:
I've written a very efficient function which returns both List.distinct and a List consisting of each element which appeared more than once and the index at which the element duplicate appeared.
Note: This answer is a straight copy of the answer on a related question.
Details:
If you need a bit more information about the duplicates themselves, like I did, I have written a more general function which iterates across a List (as ordering was significant) exactly once and returns a Tuple2 consisting of the original List deduped (all duplicates after the first are removed; i.e. the same as invoking distinct) and a second List showing each duplicate and an Int index at which it occurred within the original List.
Here's the function:
def filterDupes[A](items: List[A]): (List[A], List[(A, Int)]) = {
def recursive(remaining: List[A], index: Int, accumulator: (List[A], List[(A, Int)])): (List[A], List[(A, Int)]) =
if (remaining.isEmpty)
accumulator
else
recursive(
remaining.tail
, index + 1
, if (accumulator._1.contains(remaining.head))
(accumulator._1, (remaining.head, index) :: accumulator._2)
else
(remaining.head :: accumulator._1, accumulator._2)
)
val (distinct, dupes) = recursive(items, 0, (Nil, Nil))
(distinct.reverse, dupes.reverse)
}
An below is an example which might make it a bit more intuitive. Given this List of String values:
val withDupes =
List("a.b", "a.c", "b.a", "b.b", "a.c", "c.a", "a.c", "d.b", "a.b")
...and then performing the following:
val (deduped, dupeAndIndexs) =
filterDupes(withDupes)
...the results are:
deduped: List[String] = List(a.b, a.c, b.a, b.b, c.a, d.b)
dupeAndIndexs: List[(String, Int)] = List((a.c,4), (a.c,6), (a.b,8))
And if you just want the duplicates, you simply map across dupeAndIndexes and invoke distinct:
val dupesOnly =
dupeAndIndexs.map(_._1).distinct
...or all in a single call:
val dupesOnly =
filterDupes(withDupes)._2.map(_._1).distinct
...or if a Set is preferred, skip distinct and invoke toSet...
val dupesOnly2 =
dupeAndIndexs.map(_._1).toSet
...or all in a single call:
val dupesOnly2 =
filterDupes(withDupes)._2.map(_._1).toSet
This is a straight copy of the filterDupes function out of my open source Scala library, ScalaOlio. It's located at org.scalaolio.collection.immutable.List_._.
If you're trying to check for duplicates in a test then ScalaTest can be helpful.
import org.scalatest.Inspectors._
import org.scalatest.Matchers._
forEvery(list.distinct) { item =>
withClue(s"value $item, the number of occurences") {
list.count(_ == item) shouldBe 1
}
}
// example:
scala> val list = List(1,2,3,4,3,2)
list: List[Int] = List(1, 2, 3, 4, 3, 2)
scala> forEvery(list) { item => withClue(s"value $item, the number of occurences") { list.count(_ == item) shouldBe 1 } }
org.scalatest.exceptions.TestFailedException: forEvery failed, because:
at index 1, value 2, the number of occurences 2 was not equal to 1 (<console>:19),
at index 2, value 3, the number of occurences 2 was not equal to 1 (<console>:19)
in List(1, 2, 3, 4)