How to get distinct items from a Scala Iterable, maintaining laziness - scala

I have a java.lang.Iterable which computes its values lazily. I am accessing it from Scala. Is there a core API way of returning only distinct values? For instance, imagine there was a filter method that also provided all results returned thus far:
val myLazyDistinctIterable = iterable.filter((previousReturnedItems, newItem) => !previousReturnedItems.contains(newItem))
I guess this is not a very general case because it involves storing previously returned items, and that might be why it isn't in the core API.
I know about List.distinct and Sets but I want something that will not compute its elements until asked.

You can use the distinct method on Stream. For example, if you have this Iterable:
val it = new java.lang.Iterable[Int] {
def iterator = new java.util.Iterator[Int] {
var i = 0
var first = true
def hasNext = true
def next =
if (first) { first = false; i } else { first = true; i += 1; i - 1 }
def remove() { throw new UnsupportedOperationException("Can't remove.") }
}
}
You can write:
scala> import scala.collection.JavaConverters._
import scala.collection.JavaConverters._
scala> val s = it.asScala.toStream
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> s.take(10).toList
res0: List[Int] = List(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)
scala> val s = it.asScala.toStream.distinct
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> s.take(10).toList
res1: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
We can tell that everything is appropriately lazy since the stream is infinite.

UPDATE: I should have read the question more carefully. There is no laziness in this solution. Sorry.
toSet will do exactly what you want:
Store iterated elements in a collection (not what you want but required)
Drop / Replace duplicates
Example
val it = Seq(1,2,3,4,2,4): Iterable[Int]
it.toSet
// Set(1,2,3,4)
If you feel fancy, you can convert that back to an iterable:
it.toSet.toIterable
Or enrich the Iterable with an implicit class:
implicit class UniquableIterable[T](t: Iterable[T]) {
def unique = t.toSet.toIterable
}
And then call
it.unique

Expanding on my comment above, but I can't test it right now:
import scala.collection.mutable
def unique[A](it: Iterator[A]): Iterator[A] = {
val seen = mutable.Set[A]()
it.filter { a =>
if (seen(a))
false
else {
seen += a
true
}
}
}
You get the idea, at least. You would then apply this to the iterator you get from your iterable, and not get the unnecessary storage behavior of Stream.
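As a rough usage sketch of the idea above: the function is repeated here in a slightly shorter form (relying on `mutable.Set#add` returning `true` only for new elements), and it is applied to an infinite iterator to show that nothing is computed beyond what is requested. The infinite source is an assumption standing in for the iterator of the lazy Java Iterable.

```scala
import scala.collection.mutable

// Lazy distinct over an Iterator: only remembers elements already emitted.
def unique[A](it: Iterator[A]): Iterator[A] = {
  val seen = mutable.Set[A]()
  // add returns true only when the element was not in the set yet
  it.filter(a => seen.add(a))
}

// Stand-in for the lazy source: 0, 0, 1, 1, 2, 2, ... (never terminates)
val source: Iterator[Int] = Iterator.from(0).map(_ / 2)

// Only as many source elements as needed are pulled.
val firstFive = unique(source).take(5).toList
println(firstFive)
```

For a java.lang.Iterable, the same function could be applied to `iterable.iterator().asScala` after importing the Java converters.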

Here is code that adds a .distinct method to Iterator.
implicit class IteratorWrapper[T](it: Iterator[T]) {
def distinct = new Iterator[T] {
var seen = Set.empty[T]
var ahead = Option.empty[T]
def searchAhead: Unit = {
while (ahead.isEmpty && it.hasNext) {
val v = it.next
if (!seen(v)) {
seen += v
ahead = Some(v)
}
}
}
def hasNext = {
searchAhead
ahead.nonEmpty
}
def next = {
searchAhead
val result = ahead.get
ahead = None
result
}
}
}
Be aware that, as is usual with Iterators, the original iterator is no longer valid after calling .distinct on it.

This should do the job (but I hate it):
class UniqueIterable[T](i: Iterable[T]) extends Iterable[T] {
import scala.collection.mutable.Set
def iterator = new Iterator[T] {
val it = i.iterator
var nextE: Option[T] = None
val seen: Set[T] = Set.empty
def hasNext = {
popNext()
nextE.isDefined
}
def next = {
popNext()
val res = nextE.get
nextE = None
res
}
@scala.annotation.tailrec
private def popNext(): Unit = {
if (nextE.isEmpty && it.hasNext) {
val n = it.next
if (seen contains n) popNext()
else {
seen += n
nextE = Some(n)
}
}
}
}
}
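A more compact variant of the same wrapper can be sketched with `filter` and a mutable set. The name `distinctView` is hypothetical; the point is that each call to `iterator` gets its own fresh "seen" set, so the resulting Iterable can be traversed more than once while still computing elements on demand.

```scala
import scala.collection.mutable

// Hypothetical helper: a lazy, re-traversable distinct view over an Iterable.
def distinctView[A](source: Iterable[A]): Iterable[A] = new Iterable[A] {
  def iterator: Iterator[A] = {
    // A fresh set per traversal, so repeated traversals are independent.
    val seen = mutable.Set.empty[A]
    source.iterator.filter(a => seen.add(a))
  }
}

val view = distinctView(List(1, 1, 2, 3, 2, 4))
println(view.iterator.take(2).toList) // pulls only as far as the second distinct value
println(view.toList)                  // a second, independent traversal
```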

Related

Sum of int elements in list and vector using single function in Scala

How to make this code work?
sealed abstract class Addable[A] {
def sum(el: Seq[A]): A
}
class MyAddable[A]() extends Addable[A] {
override def sum(el: Seq[A]): A = {
el.sum
}
}
val myvec = Vector(1, 2, 3)
val mylist = List(1, 2, 3)
val inst = new MyAddable
val res0 = inst.sum(mylist) // should return 6
val res1 = inst.sum(myvec) // should return 6
println(s"res0 = $res0")
println(s"res1 = $res1")
I want to pass a generic data type (Vector/List[Int]) and get the sum of its elements using the described signatures and code structure.
At the moment I am getting:
found : immutable.this.List[scala.this.Int]
required: Seq[scala.this.Nothing]
Scalafiddle
The specific error is here:
val inst = new MyAddable
which should be
val inst = new MyAddable[Int]()
MyAddable is generic but you are not specifying a type, so it is assuming Nothing, hence the error message.
sealed abstract class Addable[A] {
def sum(el: Seq[A]): A
}
class MyAddable[A: Numeric]() extends Addable[A] {
override def sum(el: Seq[A]): A = {
el.sum
}
}
val myvec = Vector(1, 2, 3)
val mylist = List(1, 2, 3)
val inst = new MyAddable[Int]()
val res0 = inst.sum(mylist)
val res1 = inst.sum(myvec)
println(s"res0 = $res0")
println(s"res1 = $res1")
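For readers unfamiliar with the `A: Numeric` context bound used above, here is a small sketch of roughly what it amounts to: an implicit `Numeric[A]` parameter that supplies zero and addition. The function name `sumAll` is made up for illustration.

```scala
// Explicit form of a Numeric context bound: the compiler supplies `num`
// for any element type that has a Numeric instance (Int, Double, ...).
def sumAll[A](el: Seq[A])(implicit num: Numeric[A]): A =
  el.foldLeft(num.zero)(num.plus)

println(sumAll(List(1, 2, 3)))
println(sumAll(Vector(1.5, 2.5)))
```

Because both List and Vector are Seqs, the same function accepts either.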
import cats.{Semigroup}
import cats.implicits._
// Specify a generic Reduce Function. Use Contravariant parameter to support reduce on derived types
trait Reduce[-F[_]] {
def reduce[A](fa:F[A])(f:(A,A) => A):A
}
object Reduce {
implicit val SeqReduce = new Reduce[Seq] {
def reduce[A] (data:Seq[A])(f:(A,A) => A ):A = data reduce f
}
implicit val OptReduce = new Reduce[Option] {
def reduce[A] (data:Option[A])(f:(A,A) => A ):A = data reduce f
}
}
// Generic sum function
def sum[A:Semigroup, F[_]](container: F[A])(implicit red:Reduce[F]):A = {
red.reduce(container)(Semigroup.combine(_,_))
}
val myvec = Vector(1, 2, 3)
val mylist = List (1, 2, 3)
val mymap = Map ( 1 -> "one",
2 -> "two",
3 -> "three"
)
val myopt = Some(1)
val res0 = sum(myvec)
val res1 = sum(mylist)
val res2 = sum(myopt)
println(s"res0 = $res0")
println(s"res1 = $res1")
println(s"res2 = $res2")
This gets a little more complicated for Maps

How to implement class that supports circular iteration and deletion

I have a class, that should support circular iteration and deletion:
class CircularTest {
private val set = mutable.LinkedHashSet[String]("1", "2", "3", "4", "5")
private val circularIter: Iterator[String] = Iterator.continually(set).flatten
def selectNext: String = {
circularIter.next()
}
def remove(v: String): Unit = {
set.remove(v)
}
}
But this does not work.
Simple test, that should work:
val circularTest = new CircularTest
circularTest.selectNext shouldEqual "1"
circularTest.selectNext shouldEqual "2"
circularTest.remove("3")
circularTest.remove("5")
circularTest.selectNext shouldEqual "4" // actual "3"
circularTest.selectNext shouldEqual "1"
How to implement this functionality? Or maybe other solution with no iterator?
Well... The thing is that Iterator.continually(set).flatten is, at any given moment, traversing an iterator that was obtained from the set before you modified it. That means changing the contents of your set is not reliably reflected in the pass that is already in progress.
You can work around that by recreating the iterator in the remove method.
class CircularTest {
private var set = Set[String]("1", "2", "3", "4", "5")
private var circularIter: Iterator[String] = Iterator.continually(set).flatten
def selectNext: String = this.synchronized {
circularIter.next()
}
def remove(v: String): Unit = this.synchronized {
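set = set - v // an immutable Set has no remove method; `-` returns a new set
circularIter = Iterator.continually(set).flatten // note: this restarts iteration from the beginning of the set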
}
}
But a better approach is to implement your own iterator properly.
import scala.collection.immutable.HashSet
import scala.collection.mutable
class MyCircularIterator[T] extends Iterator[T] {
private var index: Int = 0
private var set: mutable.LinkedHashSet[T] = mutable.LinkedHashSet()
private var vector: Vector[T] = Vector()
private var vectorSize: Int = 0
override def hasNext: Boolean = this.synchronized {
set.size match {
case 0 => false
case _ => true
}
}
// Iterator does not define `next()` behavior when hasNext == false;
// here it will just throw ArithmeticException (modulo by zero, since vectorSize == 0)
override def next(): T = this.synchronized {
index = index % vectorSize
val next = vector(index)
index = index + 1
next
}
def add(t: T*): Unit = this.synchronized {
set = set ++ t
vector = Vector(set.toList: _*)
vectorSize = vector.length
}
def remove(t: T*): Unit = this.synchronized {
set = set -- t
vector = Vector(set.toList: _*)
vectorSize = vector.length
}
}
object MyCircularIterator {
def apply[T](hashSet: HashSet[T]): MyCircularIterator[T] = {
val iter = new MyCircularIterator[T]()
iter.add(hashSet.toList: _*)
iter
}
}
Now you can use it like this:
val myCircularIterator = MyCircularIterator(HashSet[Int](1, 2, 3, 4, 5))
myCircularIterator.next()
// 1
myCircularIterator.next()
// 2
myCircularIterator.remove(3, 5)
myCircularIterator.next()
// 4
myCircularIterator.next()
// 1
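As an alternative sketch that does not depend on the iteration order of a hash set, the elements can be kept in an insertion-ordered buffer together with a cursor that is adjusted on removal. The class name `CircularSelector` and its constructor are assumptions for illustration, not part of the code above.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical class: circular selection with deletion, preserving insertion order.
class CircularSelector[T](initial: Seq[T]) {
  private val items = ArrayBuffer(initial: _*)
  private var cursor = 0 // index of the element selectNext will return

  def selectNext(): T = synchronized {
    if (items.isEmpty) throw new NoSuchElementException("no elements left")
    if (cursor >= items.length) cursor = 0 // wrap around
    val result = items(cursor)
    cursor += 1
    result
  }

  def remove(v: T): Unit = synchronized {
    val idx = items.indexOf(v)
    if (idx >= 0) {
      items.remove(idx)
      // Elements after idx shift left by one; keep the cursor on the same element.
      if (idx < cursor) cursor -= 1
    }
  }
}

val selector = new CircularSelector(Seq("1", "2", "3", "4", "5"))
val first = selector.selectNext()
val second = selector.selectNext()
selector.remove("3")
selector.remove("5")
val third = selector.selectNext()
val fourth = selector.selectNext()
println(Seq(first, second, third, fourth))
```

This follows the scenario from the question's test: two selections, two removals, then two more selections.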

Pass implicit Ordering[Int] argument to Ordering[T] parameter

I want to write some mergesort function.
How to supply Ordering[T] to merge subfunction?
The overall structure of application is the following:
object Main extends App {
...
val array: Array[Int] = string.split(' ').map(_.toInt)
def mergesort[T](seq: IndexedSeq[T]): IndexedSeq[T] = {
def mergesortWithIndexes(seq: IndexedSeq[T],
startIdx: Int, endIdx: Int): IndexedSeq[T] = {
import Helpers.append
val seqLength = endIdx - startIdx
val splitLength = seq.length / 2
val (xs, ys) = seq.splitAt(splitLength)
val sortXs = mergesortWithIndexes(xs, startIdx, startIdx + seqLength)
val sortYs = mergesortWithIndexes(ys, startIdx + seqLength, endIdx)
def merge(sortXs: IndexedSeq[T], sortYs: IndexedSeq[T],
writeFun: Iterable[CharSequence] => Path)(ord: math.Ordering[T]): IndexedSeq[T] = {
...
while (firstIndex < firstLength || secondIndex < secondLength) {
if (firstIndex == firstLength)
buffer ++ sortYs
else if (secondIndex == secondLength)
buffer ++ sortXs
else {
if (ord.lteq(minFirst, minSecond)) {
...
} else {
...
}
}
}
buffer.toIndexedSeq
}
merge(sortXs, sortYs, append(output))
}
mergesortWithIndexes(seq, 0, seq.length)
}
val outSeq = mergesort(array)
Helpers.write(output)(Vector(outSeq.mkString(" ")))
}
I want to have a general merge() function definition, but in the application I use IndexedSeq[Int] and thus expect to pass the predefined Ordering[Int].
Adding an implicit Ordering[T] parameter to the outermost function should fix the problem; calling it with an element type that has no Ordering[T] available will result in a compile error.
Scala's sort functions do the same thing: https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/SeqLike.scala#L635
def mergesort[T](seq: IndexedSeq[T])(implicit ord: math.Ordering[T]): IndexedSeq[T] = {
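To show how the implicit parameter flows into the nested function, here is a minimal self-contained sketch. It is not the asker's full program (the file-writing helper and index bookkeeping are left out); the nested `merge` simply uses the `ord` that is in scope from the outer signature.

```scala
def mergesort[T](seq: IndexedSeq[T])(implicit ord: Ordering[T]): IndexedSeq[T] = {
  // merge sees `ord` from the enclosing method, so it needs no extra parameter
  def merge(xs: IndexedSeq[T], ys: IndexedSeq[T]): IndexedSeq[T] = {
    val buffer = Vector.newBuilder[T]
    var i = 0
    var j = 0
    while (i < xs.length && j < ys.length) {
      if (ord.lteq(xs(i), ys(j))) { buffer += xs(i); i += 1 }
      else { buffer += ys(j); j += 1 }
    }
    // Copy whatever remains of either half.
    while (i < xs.length) { buffer += xs(i); i += 1 }
    while (j < ys.length) { buffer += ys(j); j += 1 }
    buffer.result()
  }

  if (seq.length <= 1) seq
  else {
    val (left, right) = seq.splitAt(seq.length / 2)
    // The recursive calls pick up the implicit `ord` automatically.
    merge(mergesort(left), mergesort(right))
  }
}

println(mergesort(Vector(3, 1, 2)))     // Ordering[Int] is resolved implicitly
println(mergesort(Vector("b", "a")))    // so is Ordering[String]
```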

Filtering a collection based on an arbitrary number of options

How can the following Scala function be refactored to use idiomatic best practices?
def getFilteredList(ids: Seq[Int],
idsMustBeInThisListIfItExists: Option[Seq[Int]],
idsMustAlsoBeInThisListIfItExists: Option[Seq[Int]]): Seq[Int] = {
var output = ids
if (idsMustBeInThisListIfItExists.isDefined) {
output = output.intersect(idsMustBeInThisListIfItExists.get)
}
if (idsMustAlsoBeInThisListIfItExists.isDefined) {
output = output.intersect(idsMustAlsoBeInThisListIfItExists.get)
}
output
}
Expected IO:
val ids = Seq(1,2,3,4,5)
val output1 = getFilteredList(ids, None, Some(Seq(3,5))) // 3, 5
val output2 = getFilteredList(ids, None, None) // 1,2,3,4,5
val output3 = getFilteredList(ids, Some(Seq(1,2)), None) // 1,2
val output4 = getFilteredList(ids, Some(Seq(1)), Some(Seq(5))) // empty: no id is in both lists
Thank you for your time.
Here's a simple way to do this:
implicit class SeqAugmenter[T](val seq: Seq[T]) extends AnyVal {
def intersect(opt: Option[Seq[T]]): Seq[T] = {
opt.fold(seq)(seq intersect _)
}
}
def getFilteredList(ids: Seq[Int],
idsMustBeInThisListIfItExists: Option[Seq[Int]],
idsMustAlsoBeInThisListIfItExists: Option[Seq[Int]]
): Seq[Int] = {
ids intersect
idsMustBeInThisListIfItExists intersect
idsMustAlsoBeInThisListIfItExists
}
Yet another way without for comprehensions and implicits:
def getFilteredList(ids: Seq[Int],
idsMustBeInThisListIfItExists: Option[Seq[Int]],
idsMustAlsoBeInThisListIfItExists: Option[Seq[Int]]): Seq[Int] = {
val output1 = ids.intersect(idsMustBeInThisListIfItExists.getOrElse(ids))
val output2 = output1.intersect(idsMustAlsoBeInThisListIfItExists.getOrElse(output1))
output2
}
Another similar way, without implicits.
def getFilteredList[A](ids: Seq[A],
idsMustBeInThisListIfItExists: Option[Seq[A]],
idsMustAlsoBeInThisListIfItExists: Option[Seq[A]]): Seq[A] = {
val a = intersect(Some(ids), idsMustBeInThisListIfItExists)(ids)
val b = intersect(Some(a), idsMustAlsoBeInThisListIfItExists)(a)
b
}
def intersect[A](ma: Option[Seq[A]], mb: Option[Seq[A]])(default: Seq[A]) = {
(for {
a <- ma
b <- mb
} yield {
a.intersect(b)
}).getOrElse(default)
}
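Since the title asks about an arbitrary number of options, one possible generalization is to take the optional filters as varargs and fold over the ones that are defined. This is a sketch under that assumption; the varargs signature is not from the answers above.

```scala
// Each defined filter narrows the result; None filters are skipped by flatten.
def getFilteredList[A](ids: Seq[A], filters: Option[Seq[A]]*): Seq[A] =
  filters.flatten.foldLeft(ids)(_ intersect _)

val ids = Seq(1, 2, 3, 4, 5)
println(getFilteredList(ids, None, Some(Seq(3, 5))))
println(getFilteredList(ids, None, None))
println(getFilteredList(ids, Some(Seq(1, 2)), None))
println(getFilteredList(ids, Some(Seq(1)), Some(Seq(5))))
```

As with the original function, two filters with no ids in common produce an empty result.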

In Scala, is there a way to get the currently evaluated items in a Stream?

In Scala, is there a way to get the currently evaluated items in a Stream? For example in the Stream
val s: Stream[Int] = Stream.cons(1, Stream.cons(2, Stream.cons(3, s.map(_+1))))
the method should return only List(1,2,3).
In 2.8, there is a protected method called tailDefined that will return false when you get to the point in the stream that has not yet been evaluated.
This isn't too useful (unless you want to write your own Stream class) except that Cons itself makes the method public. I'm not sure why it's protected in Stream and not in Cons--I would think one or the other might be a bug. But for now, at least, you can write a method like so (writing a functional equivalent is left as an exercise to the reader):
def streamEvalLen[T](s: Stream[T]) = {
if (s.isEmpty) 0
else {
var i = 1
var t = s
while (t match {
case c: Stream.Cons[_] => c.tailDefined
case _ => false
}) {
i += 1
t = t.tail
}
i
}
}
Here you can see it in action:
scala> val s = Stream.iterate(0)(_+1)
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> streamEvalLen(s)
res0: Int = 1
scala> s.take(3).toList
res1: List[Int] = List(0, 1, 2)
scala> s
res2: scala.collection.immutable.Stream[Int] = Stream(0, 1, 2, ?)
scala> streamEvalLen(s)
res3: Int = 3
The solution based on Rex's answer:
import scala.collection.immutable.Stream.{Cons, Empty}
def evaluatedItems[T](stream: => Stream[T]): List[T] = {
@scala.annotation.tailrec
def inner(s: => Stream[T], acc: List[T]): List[T] = s match {
case Empty => acc
case c: Cons[T] => if (c.tailDefined) {
inner(c.tail, acc ++ List(c.head))
} else { acc ++ List(c.head) }
}
inner(stream, List())
}
Type that statement into the interactive shell and you will see that it evaluates to s: Stream[Int] = Stream(1, ?). So, in fact, the other two elements of 2 and 3 are not yet known.
As you access further elements, more of the stream is calculated. So, now put s(3) into the shell, which will return res0: Int = 2. Now put s into the shell and you will see the new value res1: Stream[Int] = Stream(1, 2, 3, 2, ?).
The only method I could find that contained the information that you wanted was, unfortunately, s.toString. With some parsing you will be able to get the elements back out of the string. This is a barely acceptable solution with just ints and I couldn't imagine any generic solution using the string parsing idea.
Using scanLeft
lazy val s: Stream[Int] = 1 #:: s.scanLeft(2) { case (a, _) => 1 + a }