Scala generic "string split" method - scala

If I were splitting a string, I would be able to do
"123,456,789".split(",")
to get
Seq("123","456","789")
Thinking of a string as a sequence of characters, how could this be generalized to other sequences of objects?
val x = Seq(One(),Two(),Three(),Comma(),Five(),Six(),Comma(),Seven(),Eight(),Nine())
x.split {
case _: Comma => true
case _ => false
}
split doesn't exist in this case. It reminds me of span, partition, and groupBy, but only span seems close, and it doesn't handle leading or trailing commas gracefully.

implicit class SplitSeq[T](seq: Seq[T]){
import scala.collection.mutable.ListBuffer
def split(sep: T): Seq[Seq[T]] = {
val buffer = ListBuffer(ListBuffer.empty[T])
seq.foreach {
case `sep` => buffer += ListBuffer.empty
case elem => buffer.last += elem
}; buffer.filter(_.nonEmpty)
}
}
It can then be used like x.split(Comma()).

The following is one solution, though not the most elegant:
def split[A](x: Seq[A], edge: A => Boolean): Seq[Seq[A]] = {
val init = (Seq[Seq[A]](), Seq[A]())
val (result, last) = x.foldLeft(init) { (cum, n) =>
val (total, prev) = cum
if (edge(n)) {
(total :+ prev, Seq.empty)
} else {
(total, prev :+ n)
}
}
result :+ last
}
Example result -
scala> split(Seq(1,2,3,0,4,5,0,6,7), (_:Int) == 0)
res53: Seq[Seq[Int]] = List(List(1, 2, 3), List(4, 5), List(6, 7))
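One thing worth noting about this fold is that it keeps an empty segment for every leading, trailing, or repeated separator. The sketch below repeats the split function from this answer (with an explicit Seq.empty[A]) and adds a helper, splitNonEmpty, that discards those empty segments. The names FoldSplit and splitNonEmpty are assumptions made for illustration, not part of the original answer.

```scala
object FoldSplit {
  // Same fold as in the answer: each separator closes the current segment.
  def split[A](x: Seq[A], edge: A => Boolean): Seq[Seq[A]] = {
    val init = (Seq.empty[Seq[A]], Seq.empty[A])
    val (result, last) = x.foldLeft(init) { case ((total, prev), n) =>
      if (edge(n)) (total :+ prev, Seq.empty[A])
      else (total, prev :+ n)
    }
    result :+ last
  }

  // Hypothetical helper: drop the empty segments produced by
  // leading, trailing, or consecutive separators.
  def splitNonEmpty[A](x: Seq[A], edge: A => Boolean): Seq[Seq[A]] =
    split(x, edge).filter(_.nonEmpty)
}
```

Whether empty segments should be kept depends on the use case, so it may be reasonable to expose both behaviours.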

This is how I've solved it in the past, but I suspect there is a better / more elegant way.
def break[A](xs:Seq[A], p:A => Boolean): (Seq[A], Seq[A]) = {
// Guard against an empty input: xs.head would throw on an empty Seq.
if (xs.isEmpty || p(xs.head)) {
xs.span(p)
}
else {
xs.span(a => !p(a))
}
}
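The break helper only performs a single cut. One way the span idea could be extended into a full split is to alternate dropWhile (to skip separators) and span (to take one segment) until the input is exhausted. This is a minimal sketch under that assumption; SpanSplit and splitBy are hypothetical names, and this variant deliberately drops empty segments.

```scala
object SpanSplit {
  /** Splits `xs` at every element matching `isSep`, dropping empty segments. */
  def splitBy[A](xs: Seq[A])(isSep: A => Boolean): Seq[Seq[A]] = {
    @scala.annotation.tailrec
    def loop(rest: Seq[A], acc: Vector[Seq[A]]): Vector[Seq[A]] = {
      val trimmed = rest.dropWhile(isSep) // skip any run of separators
      if (trimmed.isEmpty) acc
      else {
        // Take everything up to the next separator as one segment.
        val (segment, tail) = trimmed.span(a => !isSep(a))
        loop(tail, acc :+ segment)
      }
    }
    loop(xs, Vector.empty)
  }
}
```

It could be called as SpanSplit.splitBy(Seq(0, 1, 2, 0, 0, 3, 0))(_ == 0); putting the predicate in a second parameter list lets the compiler infer its argument type.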

Related

How to get the index of the duplicate pair in a list using Scala

I have a Scala list below :
val numList = List(1,2,3,4,5,1,2)
I want to get the indices of each pair of equal elements in the list. The output should look like (0,5),(1,6)
How can I achieve this using map?
def catchDuplicates(num : List[Int]) : (Int , Int) = {
val count = 0;
val emptyMap: HashMap[Int, Int] = HashMap.empty[Int, Int]
for (i <- num)
if (emptyMap.contains(i)) {
emptyMap.put(i, (emptyMap.get(i)) + 1) }
else {
emptyMap.put(i, 1)
}
}
Let's make the challenge a little more interesting.
val numList = List(1,2,3,4,5,1,2,1)
Now the result should be something like (0, 5, 7),(1, 6), which makes it pretty clear that returning one or more tuples is not going to be feasible. Returning a List of List[Int] would make much more sense.
def catchDuplicates(nums: List[Int]): List[List[Int]] =
nums.zipWithIndex //List[(Int,Int)]
.groupMap(_._1)(_._2) //Map[Int,List[Int]]
.values //Iterable[List[Int]]
.filter(_.lengthIs > 1)
.toList //List[List[Int]]
You might also add a .view in order to minimize the number of traversals and intermediate collections created.
def catchDuplicates(nums: List[Int]): List[List[Int]] =
nums.view
.zipWithIndex
.groupMap(_._1)(_._2)
.collect{case (_,vs) if vs.sizeIs > 1 => vs.toList}
.toList
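Because the groups come out of a Map, their order is unspecified. If a stable order matters, one option is to sort the groups by the index of the first occurrence. The sketch below assumes Scala 2.13 or later (for groupMap and lengthIs); the object name DuplicateIndices is hypothetical.

```scala
object DuplicateIndices {
  // Requires Scala 2.13+ for groupMap and lengthIs.
  def catchDuplicates(nums: List[Int]): List[List[Int]] =
    nums.zipWithIndex          // pair each element with its index
      .groupMap(_._1)(_._2)    // element -> indices, in traversal order
      .values
      .filter(_.lengthIs > 1)  // keep only elements seen more than once
      .toList
      .sortBy(_.head)          // order groups by first occurrence
}
```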
How can I achieve using map?
You can't, because you only want to return the indexes of the elements that appear twice, which is a very different kind of transformation from the one that map performs.
You can use foldLeft, though.
object catchDuplicates {
final case class Result[A](elem: A, firstIdx: Int, secondIdx: Int)
private final case class State[A](seenElements: Map[A, Int], duplicates: List[Result[A]]) {
def next(elem: A, idx: Int): State[A] =
seenElements.get(key = elem).fold(
ifEmpty = this.copy(seenElements = this.seenElements + (elem -> idx))
) { firstIdx =>
State(
seenElements = this.seenElements.removed(key = elem),
duplicates = Result(elem, firstIdx, secondIdx = idx) :: this.duplicates
)
}
}
private object State {
def initial[A]: State[A] =
State(
seenElements = Map.empty,
duplicates = List.empty
)
}
def apply[A](data: List[A]): List[Result[A]] =
data.iterator.zipWithIndex.foldLeft(State.initial[A]) {
case (acc, (elem, idx)) =>
acc.next(elem, idx)
}.duplicates // You may add a reverse here if order is important.
}
Which can be used like this:
val numList = List(1,2,3,4,5,1,2)
val result = catchDuplicates(numList)
// result: List[Result] = List(Result(2,1,6), Result(1,0,5))
I think returning a tuple is not a good option; instead, you could return a Map, like this:
import scala.annotation.tailrec
object FindIndexOfDupElement extends App {
val numList = List(1, 2, 3, 4, 5, 1, 2)
@tailrec
def findIndex(elems: List[Int], res: Map[Int, List[Int]] = Map.empty, index: Int = 0): Map[Int, List[Int]] = {
elems match {
case head :: rest =>
if (res.get(head).isEmpty) {
findIndex(rest, res ++ Map(head -> (index :: Nil)), index + 1)
} else {
val updatedMap: Map[Int, List[Int]] = res.map {
case (key, indexes) if key == head => (key, (indexes :+ index))
case (key, indexes) => (key, indexes)
}
findIndex(rest, updatedMap, index + 1)
}
case _ => res
}
}
println(findIndex(numList).filter(x => x._2.size > 1))
}
You can clearly see each number (key) and its respective indices in the map:
HashMap(1 -> List(0, 5), 2 -> List(1, 6))
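The version above rebuilds the whole map with res.map every time a repeated element is found. A possible alternative, sketched here with hypothetical names (IndexMap, indicesByValue, duplicates), is a single foldLeft that updates only the entry for the current element.

```scala
object IndexMap {
  // Collect, for every distinct element, the indices at which it occurs.
  def indicesByValue[A](xs: Seq[A]): Map[A, Vector[Int]] =
    xs.zipWithIndex.foldLeft(Map.empty[A, Vector[Int]]) { case (acc, (x, i)) =>
      // Only the entry for `x` is touched; the other entries are reused as-is.
      acc.updated(x, acc.getOrElse(x, Vector.empty[Int]) :+ i)
    }

  // Keep only the elements that occur more than once.
  def duplicates[A](xs: Seq[A]): Map[A, Vector[Int]] =
    indicesByValue(xs).filter { case (_, idxs) => idxs.size > 1 }
}
```

A Vector is used for the index lists here because appending to it is cheap, whereas :+ on a List copies the list each time.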

How to create an Akka Stream Source that generates items recursively

I'm trying to figure out how to create an Akka Streams source that generates many Seq[Int].
Basically, given an int n I want to generate every ordering (permutation) of the numbers 1 to n, each as a Seq[Int]
Here's some code that does this:
def combinations(n: Int): Seq[Seq[Int]] = {
def loop(acc: (Seq[Int], Seq[Seq[Int]]),
remaining: Seq[Int]): Seq[Seq[Int]] = {
remaining match {
case s if s.size == 1 => {
val total: Seq[Seq[Int]] = acc._2
val current: Seq[Int] = acc._1
total :+ (current :+ s.head)
}
case _ => {
for {
x <- remaining
comb <- loop((acc._1 :+ x, acc._2), remaining.filter(_ != x))
} yield comb
}
}
}
loop((Seq(), Seq()), (1 to n))
}
This works fine up to 10... then it blows up because it runs out of memory. Since I just want to process each of them and don't need to keep them all in memory, I thought... Akka Streams. But I'm at a loss for how to turn this into a Source that produces each combination so I can process them. Basically, at the point where it appends to total, I would instead emit another item onto the stream.
Here is a solution that uses the Johnson-Trotter algorithm for permutations. tcopermutations creates a LazyList that can be evaluated as needed. For more permutations, just pass a different value to printNIterations.
The reason for using the Johnson-Trotter algorithm is that it breaks the recursive structure of the permutation finding algorithm. That's important for being able to evaluate successive instances of the permutation and storing them in some kind of lazy list or stream.
object PermutationsTest {
def main(args: Array[String]) = {
printNIterations(50, tcopermutations(5).iterator)
}
def printNIterations(n: Int, it: Iterator[Seq[Int]]): Unit = {
if (n<=0) ()
else {
if (it.hasNext) {
println(it.next())
printNIterations(n - 1, it)
} else ()
}
}
def naivepermutations(n: Int): Seq[Seq[Int]] = {
def loop(acc: Seq[Int], remaining: Seq[Int]): Seq[Seq[Int]] = {
remaining match {
case s if s.size == 1 => {
val current: Seq[Int] = acc
Seq((current :+ s.head))
}
case _ => {
for {
x <- remaining
comb <- loop(acc :+ x, remaining.filter(_ != x))
} yield comb
}
}
}
loop(Seq(), (1 to n))
}
def tcopermutations(n: Int): LazyList[Seq[Int]] = {
val start = (1 to n).map(Element(_, Left))
def loop(v: Seq[Element]): LazyList[Seq[Element]] = {
johnsonTrotter(v) match {
case Some(s) => v #:: loop(s)
case None => LazyList(v)
}
}
loop(start).map(_.map(_.i))
}
def checkIfMobile(seq: Seq[Element], i: Int): Boolean = {
val e = seq(i)
def getAdjacent(s: Seq[Element], d: Direction, j: Int): Int = {
val adjacentIndex = d match {
case Left => j - 1
case Right => j + 1
}
s(adjacentIndex).i
}
if (e.direction == Left && i == 0) false
else if (e.direction == Right && i == seq.size - 1) false
else if (getAdjacent(seq, e.direction, i) < e.i) true
else false
}
def findLargestMobile(seq: Seq[Element]): Option[Int] = {
val mobiles = (0 until seq.size).filter{j => checkIfMobile(seq, j)}
if (mobiles.isEmpty) None
else {
val folded = mobiles.map(x=>(x,seq(x).i)).foldLeft(None: Option[(Int, Int)]){ case (acc, elem) =>
acc match {
case None => Some(elem)
case Some((i, value)) => if (value > elem._2) Some((i, value)) else Some(elem)
}
}
folded.map(_._1)
}
}
def swapLargestMobile(seq: Seq[Element], index: Int): (Seq[Element], Int) = {
val dir = seq(index).direction
val value = seq(index).i
dir match {
case Right =>
val folded = seq.foldLeft((None, Seq()): (Option[Element], Seq[Element])){(acc, elem) =>
val matched = elem.i == value
val newAccOpt = if (matched) Some(elem) else None
val newAccSeq = acc._1 match {
case Some(swapMe) => acc._2 :+ elem :+ swapMe
case None => if (matched) acc._2 else acc._2 :+ elem
}
(newAccOpt, newAccSeq)
}
(folded._2, index + 1)
case Left =>
val folded = seq.foldRight((None, Seq()): (Option[Element], Seq[Element])){(elem, acc) =>
val matched = elem.i == value
val newAccOpt = if (matched) Some(elem) else None
val newAccSeq = acc._1 match {
case Some(swapMe) => swapMe +: elem +: acc._2
case None => if (matched) acc._2 else elem +: acc._2
}
(newAccOpt, newAccSeq)
}
(folded._2, index - 1)
}
}
def revDirLargerThanMobile(seq: Seq[Element], mobile: Int) = {
def reverse(e: Element) = {
e.direction match {
case Left => Element(e.i, Right)
case Right => Element(e.i, Left)
}
}
seq.map{ elem =>
if (elem.i > seq(mobile).i) reverse(elem)
else elem
}
}
def johnsonTrotter(curr: Seq[Element]): Option[Seq[Element]] = {
findLargestMobile(curr).map { m =>
val (swapped, newMobile) = swapLargestMobile(curr, m)
revDirLargerThanMobile(swapped, newMobile)
}
}
trait Direction
case object Left extends Direction
case object Right extends Direction
case class Element(i: Int, direction: Direction)
}
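As a point of comparison, the standard library already offers a permutations method on sequences that returns an Iterator, so permutations are produced one at a time rather than held in memory together. The sketch below relies only on that method; the names StdlibPermutations and lazyPermutations are hypothetical. If Akka Streams is on the classpath, such an iterator could presumably be wrapped with Source.fromIterator(() => lazyPermutations(n)), but that part is an assumption and is not exercised here.

```scala
object StdlibPermutations {
  // `permutations` returns an Iterator, so each permutation is built
  // only when it is requested.
  def lazyPermutations(n: Int): Iterator[Seq[Int]] =
    (1 to n).permutations

  def main(args: Array[String]): Unit =
    // Only five of the 12! permutations are ever materialised.
    lazyPermutations(12).take(5).foreach(println)
}
```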

Break loop if Either function returns Left

In the following code, what I need is to stop processing the loop if either either1 or either2 return Left, and if that happens then mainFunction has to return Left as well. Also, the string returned by either1.Left or either2.Left needs to be returned by mainFunction.Left. How to make this work?
def either1 (i:Int): Future[Either[String,Int]] = Future {
if (i<3)
Right(i*2)
else
Left("error 1")
}
def either2 (i:Int): Future[Either[String,Int]] = Future {
if (i>3)
Right(i*2)
else
Left("error 2")
}
val seq = Seq ( 1,1,2,2,3,4,5 )
def mainFunction: Future[Either[String,Int]] = Future {
val seq2 = seq.map { number =>
if (number % 2 == 0)
either1(number) // <-- this needs to break the loop if it returns Left
else
either2(number) // <-- this needs to break the loop if it returns Left
}
Right(seq2.length) // <-- seq2 is a sequence of Futures
}
The code below keeps iterating over the sequence until it encounters the first error, and returns the error message, or the fixed number 42 (that's the "doesn't matter what it returns"-requirement).
import scala.concurrent._
import scala.util._
import scala.concurrent.ExecutionContext.Implicits.global
def either1(i: Int): Future[Either[String,Int]] = Future {
if (i < 3) Right(i * 2)
else Left("error 1")
}
def either2 (i:Int): Future[Either[String,Int]] = Future {
if (i > 3) Right(i * 2)
else Left("error 2")
}
val seq = Seq(1, 1, 2, 2, 3, 4, 5)
val doesntMatter = 42
/** Returns either first error message returned by `either1` or
* `either2`, or the fixed number `doesntMatter`.
*/
def mainFunction: Future[Either[String, Int]] = {
def recHelper(remaining: List[Int]): Future[Either[String, Int]] = {
remaining match {
case Nil => Future { Right(doesntMatter) }
case h :: t => (if (h % 2 == 0) either1(h) else either2(h)).flatMap {
headEither =>
headEither match {
case Left(s) => Future { Left(s) }
case Right(n) => recHelper(t)
}
}
}
}
recHelper(seq.toList)
}
val res = mainFunction
Thread.sleep(2000)
println(res) // Future(Success(Left(error 2)))
If you do this significantly more often than once, consider taking a look at Scala Cats' EitherT, and also at the method tailRecM defined specifically for such use cases on all monadic typeclasses.
In Scala, the standard collections don’t provide a method for that.
You can either use scala.util.control.Breaks or write the recursion
yourself, something like this:
val seq = Seq(1, 1, 2, 2, 3, 4, 5)
def either1(i: Int): Either[String, Int] = {
if (i < 3) Right(i * 2)
else Left("error 1")
}
def either2(i: Int): Either[String, Int] = {
if (i > 3) Right(i * 2)
else Left("error 2")
}
def rec(seq: Seq[Int], acc: Seq[Either[String, Int]]): Seq[Either[String, Int]] = seq match {
case Nil => acc
case x :: xs =>
val xx = if (x % 2 == 0) either1(x) else either2(x)
xx match {
case Left(_) => acc
case Right(value) => rec(xs, acc :+ Right(value))
}
}
rec(seq, Seq())
I generally avoid recursive functions if a library function will do what I want.
In this case we can use takeWhile to take all the leading elements that are Right. However the map call will still process every element of the Seq so you need to use view to evaluate this lazily:
val seq2 = seq.view.map { number =>
if (number % 2 == 0)
either1(number)
else
either2(number)
}.takeWhile(_.isRight)
You still have the problem that your either functions actually return a Future and therefore can't be tested for Left or Right until they complete.
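One way to cope with the Future-returning versions without explicit recursion is to chain them sequentially with foldLeft and flatMap, so that once a Left has been produced no further either call is started. This is a minimal sketch under that assumption: FirstLeft and run are hypothetical names, and the Right value is simply a count of the elements processed, since the question does not specify what the success value should be.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object FirstLeft {
  def either1(i: Int): Future[Either[String, Int]] =
    Future(if (i < 3) Right(i * 2) else Left("error 1"))

  def either2(i: Int): Future[Either[String, Int]] =
    Future(if (i > 3) Right(i * 2) else Left("error 2"))

  // Each step starts only after the previous Future has completed.
  def run(seq: Seq[Int]): Future[Either[String, Int]] =
    seq.foldLeft(Future.successful[Either[String, Int]](Right(0))) { (accF, n) =>
      accF.flatMap {
        case Left(err) => Future.successful(Left(err)) // propagate, call nothing else
        case Right(count) =>
          val next = if (n % 2 == 0) either1(n) else either2(n)
          next.map(_.map(_ => count + 1))
      }
    }
}
```

The fold still walks the whole sequence, but after the first Left each remaining step is only a cheap flatMap that passes the error along.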

Cartesian product stream scala

I had a simple task: find the result that occurs most often when we roll four six-sided dice and remove the one with the fewest points.
So, the question is: are there any Scala core classes for generating streams of Cartesian products? If not, how can this be implemented in the simplest and most efficient way?
Here is the code and comparison with naive implementation in Scala:
object D extends App {
def dropLowest(a: List[Int]) = {
a diff List(a.min)
}
def cartesian(to: Int, times: Int): Stream[List[Int]] = {
def stream(x: List[Int]): Stream[List[Int]] = {
if (hasNext(x)) x #:: stream(next(x)) else Stream(x)
}
def hasNext(x: List[Int]) = x.exists(n => n < to)
def next(x: List[Int]) = {
def add(current: List[Int]): List[Int] = {
if (current.head == to) 1 :: add(current.tail) else current.head + 1 :: current.tail // here is a possible bug when we get maximal value, don't reuse this method
}
add(x.reverse).reverse
}
stream(Range(0, times).map(t => 1).toList)
}
def getResult(list: Stream[List[Int]]) = {
list.map(t => dropLowest(t).sum).groupBy(t => t).map(t => (t._1, t._2.size)).toMap
}
val list1 = cartesian(6, 4)
val list = for (i <- Range(1, 7); j <- Range(1,7); k <- Range(1, 7); l <- Range(1, 7)) yield List(i, j, k, l)
println(getResult(list1))
println(getResult(list.toStream) equals getResult(list1))
}
Thanks in advance
I think you can simplify your code by using flatMap :
val stream = (1 to 6).toStream
def cartesian(times: Int): Stream[Seq[Int]] = {
if (times == 0) {
Stream(Seq())
} else {
stream.flatMap { i => cartesian(times - 1).map(i +: _) }
}
}
Maybe a little bit more efficient (memory-wise) would be using Iterators instead:
val pool = (1 to 6)
def cartesian(times: Int): Iterator[Seq[Int]] = {
if (times == 0) {
Iterator(Seq())
} else {
pool.iterator.flatMap { i => cartesian(times - 1).map(i +: _) }
}
}
or even more concise by replacing the recursive calls by a fold :
def cartesian[A](list: Seq[Seq[A]]): Iterator[Seq[A]] =
list.foldLeft(Iterator(Seq[A]())) {
case (acc, l) => acc.flatMap(i => l.map(_ +: i))
}
and then:
cartesian(Seq.fill(4)(1 to 6)).map(dropLowest).toSeq.groupBy(i => i.sorted).mapValues(_.size).toSeq.sortBy(_._2).foreach(println)
(Note that you cannot use groupBy on Iterators, so Streams or even Lists are the way to go in any case; the code above is still valid, since toSeq on an Iterator actually returns a lazy Stream).
If you are considering stats on the sums of dice instead of combinations, you can update the dropLowest function:
def dropLowest(l: Seq[Int]) = l.sum - l.min
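If only the distribution of sums is needed, the grouping step can be avoided by folding the iterator straight into a Map of counts, so no roll is retained after it has been tallied. The following is a sketch under that assumption; DiceStats and sumHistogram are hypothetical names, and cartesian is a List-based variant of the fold shown above.

```scala
object DiceStats {
  // Lazily enumerates the Cartesian product of the given pools.
  def cartesian[A](pools: Seq[Seq[A]]): Iterator[List[A]] =
    pools.foldLeft(Iterator(List.empty[A])) { (acc, pool) =>
      acc.flatMap(partial => pool.map(_ :: partial))
    }

  // Counts how often each "sum after dropping the lowest die" occurs.
  def sumHistogram(sides: Int, dice: Int): Map[Int, Int] =
    cartesian[Int](Seq.fill(dice)(1 to sides))
      .map(roll => roll.sum - roll.min)
      .foldLeft(Map.empty[Int, Int]) { (counts, s) =>
        counts.updated(s, counts.getOrElse(s, 0) + 1)
      }
}
```

The most frequent sum could then be read off with DiceStats.sumHistogram(6, 4).maxBy(_._2).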

MatchError when match receives an IndexedSeq but not a LinearSeq

Is there a reason that match written against Seq would work differently on IndexedSeq types than the way it does on LinearSeq types? To me it seems like the code below should do the exact same thing regardless of the input types. Of course it doesn't or I wouldn't be asking.
import collection.immutable.LinearSeq
object vectorMatch {
def main(args: Array[String]) {
doIt(Seq(1,2,3,4,7), Seq(1,4,6,9))
doIt(List(1,2,3,4,7), List(1,4,6,9))
doIt(LinearSeq(1,2,3,4,7), LinearSeq(1,4,6,9))
doIt(IndexedSeq(1,2,3,4,7), IndexedSeq(1,4,6,9))
doIt(Vector(1,2,3,4,7), Vector(1,4,6,9))
}
def doIt(a: Seq[Long], b: Seq[Long]) {
try {
println("OK! " + m(a, b))
}
catch {
case ex: Exception => println("m(%s, %s) failed with %s".format(a, b, ex))
}
}
@annotation.tailrec
def m(a: Seq[Long], b: Seq[Long]): Seq[Long] = {
a match {
case Nil => b
case firstA :: moreA => b match {
case Nil => a
case firstB :: moreB if (firstB < firstA) => m(moreA, b)
case firstB :: moreB if (firstB > firstA) => m(a, moreB)
case firstB :: moreB if (firstB == firstA) => m(moreA, moreB)
case _ => throw new Exception("Got here: a: " + a + " b: " + b)
}
}
}
}
Running this on 2.9.1 final, I get the following output:
OK! List(2, 3, 4, 7)
OK! List(2, 3, 4, 7)
OK! List(2, 3, 4, 7)
m(Vector(1, 2, 3, 4, 7), Vector(1, 4, 6, 9)) failed with scala.MatchError: Vector(1, 2, 3, 4, 7) (of class scala.collection.immutable.Vector)
m(Vector(1, 2, 3, 4, 7), Vector(1, 4, 6, 9)) failed with scala.MatchError: Vector(1, 2, 3, 4, 7) (of class scala.collection.immutable.Vector)
It runs fine for List-y things, but fails for Vector-y things. Am I missing something? Is this a compiler bug?
The scalac -print output for m looks like:
@scala.annotation.tailrec def m(a: Seq, b: Seq): Seq = {
<synthetic> val _$this: object vectorMatch = vectorMatch.this;
_m(_$this,a,b){
<synthetic> val temp6: Seq = a;
if (immutable.this.Nil.==(temp6))
{
b
}
else
if (temp6.$isInstanceOf[scala.collection.immutable.::]())
{
<synthetic> val temp8: scala.collection.immutable.:: = temp6.$asInstanceOf[scala.collection.immutable.::]();
<synthetic> val temp9: Long = scala.Long.unbox(temp8.hd$1());
<synthetic> val temp10: List = temp8.tl$1();
val firstA$1: Long = temp9;
val moreA: List = temp10;
{
<synthetic> val temp1: Seq = b;
if (immutable.this.Nil.==(temp1))
{
a
}
else
if (temp1.$isInstanceOf[scala.collection.immutable.::]())
{
<synthetic> val temp3: scala.collection.immutable.:: = temp1.$asInstanceOf[scala.collection.immutable.::]();
<synthetic> val temp4: Long = scala.Long.unbox(temp3.hd$1());
<synthetic> val temp5: List = temp3.tl$1();
val firstB: Long = temp4;
if (vectorMatch.this.gd1$1(firstB, firstA$1))
body%11(firstB){
_m(vectorMatch.this, moreA, b)
}
else
{
val firstB: Long = temp4;
val moreB: List = temp5;
if (vectorMatch.this.gd2$1(firstB, moreB, firstA$1))
body%21(firstB,moreB){
_m(vectorMatch.this, a, moreB)
}
else
{
val firstB: Long = temp4;
val moreB: List = temp5;
if (vectorMatch.this.gd3$1(firstB, moreB, firstA$1))
body%31(firstB,moreB){
_m(vectorMatch.this, moreA, moreB)
}
else
{
body%41(){
throw new java.lang.Exception("Got here: a: ".+(a).+(" b: ").+(b))
}
}
}
}
}
else
{
body%41()
}
}
}
else
throw new MatchError(temp6)
}
};
You can't use :: for anything other than List. The Vector is failing to match because :: is a case class that extends List, so its unapply method does not work for Vector.
val a :: b = List(1,2,3) // fine
val a :: b = Vector(1,2,3) // error
But you can define your own extractor that works for all sequences:
object +: {
def unapply[T](s: Seq[T]) =
s.headOption.map(head => (head, s.tail))
}
So you can do:
val a +: b = List(1,2,3) // fine
val a +: b = Vector(1,2,3) // fine
The following pattern match works for List, Seq, LinearSeq, IndexedSeq, and Vector.
Vector(1,2) match {
case a +: as => s"$a + $as"
case _ => "empty"
}
Scala 2.10 introduced the +: extractor object in the standard library. Since then, for every SeqLike, you can do:
@annotation.tailrec
def m(a: Seq[Long], b: Seq[Long]): Seq[Long] = {
a match {
case Nil => b
case firstA +: moreA => b match {
case Nil => a
case firstB +: moreB if (firstB < firstA) => m(moreA, b)
case firstB +: moreB if (firstB > firstA) => m(a, moreB)
case firstB +: moreB if (firstB == firstA) => m(moreA, moreB)
case _ => throw new Exception("Got here: a: " + a + " b: " + b)
}
}
}
Code run at Scastie.
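To see the difference between the two extractors side by side, here is a small sketch (assuming Scala 2.10 or later, where +: is available in the standard library). SeqPatterns and describe are hypothetical names used only for this illustration.

```scala
object SeqPatterns {
  def describe(xs: Seq[Int]): String = xs match {
    case head :: _ => s"matched :: with head $head" // only a non-empty List gets here
    case head +: _ => s"matched +: with head $head" // any other non-empty Seq
    case _         => "empty"
  }
}
```

A List is caught by the first case, while a Vector falls through to the +: case instead of raising a MatchError.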