Accumulate list based on a condition using dropWhile - Scala

val list = List("1","10","12","30","40","50")
Based on a parameter n ("12" in this example), the elements after "12" should be collected into a
nested list List("30,40,50"), and the final list should look like this:
Expected Output
List("1","10","12",List("30,40,50") )
list.dropWhile(_!="12").tail gives `List("30", "40", "50")`, but I am not able to achieve the desired output.

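partition will give the closest output to what you are looking for. Note that _ <= "12" compares strings lexicographically, which happens to work for these particular values.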
scala> list.partition(_ <= "12")
res21: (List[String], List[String]) = (List(1, 10, 12),List(30, 40, 50))
All elements of a List must have the same type. splitAt or partition can accomplish this, albeit with a different return type than the one you want. I suspect that the desired return type, something like List[String, ..., List[String]], is a "type smell" that may indicate another issue.
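As a rough illustration of the splitAt route, here is a minimal sketch that splits on the position of the marker instead of comparing strings. splitAfter is a hypothetical helper name, not a library method, and returning the whole list in the first half when the marker is missing is just one possible choice.

```scala
// Hypothetical helper: split a list just after the first occurrence of `marker`.
def splitAfter(items: List[String], marker: String): (List[String], List[String]) = {
  val i = items.indexOf(marker)
  // indexOf returns -1 when the marker is absent; keep everything in the first half then.
  if (i < 0) (items, Nil) else items.splitAt(i + 1)
}

val parts = splitAfter(List("1", "10", "12", "30", "40", "50"), "12")
// parts._1 == List("1", "10", "12")
// parts._2 == List("30", "40", "50")
```

Both halves stay List[String], so no common supertype such as Any is needed.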

Maybe span could help:
val (a, b) = list.span(_ != "12")
val h :: t = b
val res = a :+ h :+ List(t.mkString(","))
For the input List("123", "10", "12", "30", "40", "50") this produces:
List("123", "10", "12", List("30,40,50"))
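The pattern val h :: t = b throws a MatchError when the marker is not in the list. A slightly more defensive sketch of the same span idea could look like the following. nestAfter is a hypothetical name, and this version assumes the trailing elements should stay a real nested list rather than one comma-joined string.

```scala
// Hypothetical helper built on span: everything after the marker becomes one nested list.
// The result is List[Any] because it mixes String elements with a List[String].
def nestAfter(items: List[String], marker: String): List[Any] =
  items.span(_ != marker) match {
    // Marker found: keep the prefix and the marker, then append the rest as a single element.
    case (before, found :: after) => before ::: List(found, after)
    // Marker absent: return the list unchanged instead of failing.
    case (all, Nil) => all
  }

val nested = nestAfter(List("1", "10", "12", "30", "40", "50"), "12")
// nested == List("1", "10", "12", List("30", "40", "50"))
```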

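If you can live with the output type (which is List[java.io.Serializable]), here is an acc method that takes the desired parameter, a String s in this case. Note that indexOf returns -1 when s is absent, in which case the whole list ends up in the nested element: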
def acc(list:List[String],s:String) = {
val i = list.indexOf(s)
list.take(i+1):+List(list.drop(i+1).mkString(","))
}
In the Scala REPL:
scala> val list = List("1","10","12","30","40","50")
list: List[String] = List(1, 10, 12, 30, 40, 50)
scala> acc(list,"12")
res29: List[java.io.Serializable] = List(1, 10, 12, List(30,40,50))
scala> acc(list,"10")
res30: List[java.io.Serializable] = List(1, 10, List(12,30,40,50))
scala> acc(list,"40")
res31: List[java.io.Serializable] = List(1, 10, 12, 30, 40, List(50))
scala> acc(list,"30")
res32: List[java.io.Serializable] = List(1, 10, 12, 30, List(40,50))

Grouping fs2 streams into sub-streams based on predicate

I need a combinator that solves the following problem:
test("groupUntil") {
val s = Stream(1, 2, 3, 4, 1, 2, 3, 3, 1, 2, 2).covary[IO]
val grouped: Stream[IO, Stream[IO, Int]] = s.groupUntil(_ == 1)
val result =
for {
group <- grouped
element <- group.fold(0)(_ + _)
} yield element
assertEquals(result.compile.toList.unsafeRunSync(), List(10, 9, 5))
}
The inner streams must also be lazy. (note, groupUntil is the imaginary combinator I'm asking for).
NOTE: I must deal with every element of the internal stream as soon as they arrive at the original stream, i.e. I cannot wait to chunk an entire group.
One way to achieve laziness here is to use Stream as the container in the fold function:
import cats.effect.IO
import fs2.Stream
val s = Stream(1, 2, 3, 4, 1, 2, 3, 3, 1, 2, 2).covary[IO]
val acc: Stream[IO, Stream[IO, Int]] = Stream.empty
val grouped: Stream[IO, Stream[IO, Int]] = s.fold(acc) {
case (streamOfStreams, nextInt) if nextInt == 1 =>
Stream(Stream(nextInt).covary[IO]).append(streamOfStreams)
case (streamOfStreams, nextInt) =>
streamOfStreams.head.map(_.append(Stream(nextInt).covary[IO])) ++
streamOfStreams.tail
}.flatten
val result: Stream[IO, IO[Int]] = for {
group <- grouped
element = group.compile.foldMonoid
} yield element
assertEquals(result.map(_.unsafeRunSync()).compile.toList.unsafeRunSync().reverse, List(10, 9, 5))
Be careful: the result is a reversed stream. Working with the last element of a stream is not a good idea, so it is better to take the head, but that requires reversing the list at the end of processing.
Another way is to use groupAdjacentBy and group elements by some predicate:
val groupedOnceAndOthers: fs2.Stream[IO, (Boolean, Chunk[Int])] =
s.groupAdjacentBy(x => x == 1)
Here you get the groups as pairs:
(true,Chunk(1)), (false,Chunk(2, 3, 4)),
(true,Chunk(1)), (false,Chunk(2, 3, 3)),
(true,Chunk(1)), (false,Chunk(2, 2))
To concatenate each group containing 1 with the group that follows it, we can use chunkN (like grouped on a Scala List), then map the result to get rid of the Boolean keys and flatMap to flatten the Chunks:
val grouped = groupedOnceAndOthers
.chunkN(2, allowFewer = true)
.map(ch => ch.flatMap(_._2).toList)
The resulting groups are:
List(1, 2, 3, 4) List(1, 2, 3, 3) List(1, 2, 2)
Full working sample:
import cats.effect.IO
import fs2.Stream
val s = Stream(1, 2, 3, 4, 1, 2, 3, 3, 1, 2, 2).covary[IO]
val grouped: Stream[IO, Stream[IO, Int]] = s.groupAdjacentBy(x => x == 1)
.chunkN(2, allowFewer = true)
.map(ch => Stream.chunk(ch.flatMap(_._2)))
val result: Stream[IO, IO[Int]] = for {
group <- grouped
element = group.compile.foldMonoid
} yield element
assertEquals(result.map(_.unsafeRunSync()).compile.toList.unsafeRunSync(), List(10, 9, 5))
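To make the expected grouping concrete, here is a minimal sketch on a plain List rather than an fs2 Stream. It is strict, so it does not meet the laziness requirement from the question; it only illustrates which elements should end up in which group. groupUntil here is a hypothetical helper, not an fs2 or standard-library method.

```scala
// Hypothetical helper: starts a new group at every element matching `startsGroup`.
// Strict (works on List), so it only illustrates the grouping, not the streaming behaviour.
def groupUntil[A](items: List[A])(startsGroup: A => Boolean): List[List[A]] = {
  // Walk from the right, tracking the group under construction and the finished groups.
  val (pending, finished) =
    items.foldRight((List.empty[A], List.empty[List[A]])) {
      case (item, (current, done)) =>
        if (startsGroup(item)) (Nil, (item :: current) :: done)
        else (item :: current, done)
    }
  // Elements before the first group start (if any) form their own leading group.
  if (pending.isEmpty) finished else pending :: finished
}

val sums = groupUntil(List(1, 2, 3, 4, 1, 2, 3, 3, 1, 2, 2))(_ == 1).map(_.sum)
// sums == List(10, 9, 5)
```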

Behaviour of Options inside for comprehension in Scala

Two newbie questions.
It seems that for comprehension knows about Options and can skip automatically None and unwrap Some, e.g.
val x = Map("a" -> List(1,2,3), "b" -> List(4,5,6), "c" -> List(7,8,9))
val r = for {map_key <- List("WRONG_KEY", "a", "b", "c")
map_value <- x get map_key } yield map_value
outputs:
r: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
Where do the Options go? Can someone please shed some light on how this works? Can we always rely on this behaviour?
The second question is: why does this not compile?
val x = Map("a" -> List(1,2,3), "b" -> List(4,5,6), "c" -> List(7,8,9))
val r = for {map_key <- List("WRONG_KEY", "a", "b", "c")
map_value <- x get map_key
list_value <- map_value
} yield list_value
It gives
Error:(57, 26) type mismatch;
found : List[Int]
required: Option[?]
list_value <- map_value
^
Looking at the type of the first example, I am not sure why we need to have an Option here?
For comprehensions are converted into a sequence of map and flatMap calls.
Your for loop is equivalent to
List("WRONG_KEY", "a", "b", "c").flatMap(
map_key => x.get(map_key).flatMap(map_value => map_value)
)
flatMap in Option is defined as
@inline final def flatMap[B](f: A => Option[B]): Option[B]
So you cannot pass it a function that returns a List, which is what the compiler is telling you.
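For the first example from the question (the one that does compile), the rewrite can be sketched as below. The names lookup, keys, viaFor and viaFlatMap are made up for this illustration. The outer generator becomes List.flatMap and the last generator becomes Option.map; List.flatMap accepts a function returning an Option, and a None simply contributes no elements, which is where the Options "go".

```scala
val lookup = Map("a" -> List(1, 2, 3), "b" -> List(4, 5, 6), "c" -> List(7, 8, 9))
val keys = List("WRONG_KEY", "a", "b", "c")

// The for comprehension from the question ...
val viaFor = for {
  key <- keys
  value <- lookup.get(key)
} yield value

// ... and roughly what it is rewritten to: flatMap on the List, map on the Option.
val viaFlatMap = keys.flatMap(key => lookup.get(key).map(value => value))
// Both are List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
```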
I think the difference is due to the way for comprehensions are expanded into map and flatMap method calls.
For conciseness, let's define a variable:
val keys = List("WRONG_KEY", "a", "b", "c")
Your first case is equivalent to:
val r = keys.flatMap(x.get(_))
whereas your second case is equivalent to:
val r= keys.flatMap(x.get(_).flatMap{ case y => y })
I think the issue is that Option.flatMap expects a function that returns an Option. The first case never calls Option.flatMap, so there is no problem; in the second case the function passed to x.get(_).flatMap returns a List[Int], which does not match.
These for-comprehension translation rules are explained in further detail in chapter 7 of "Programming Scala" by Wampler & Payne.
Maybe this small difference, setting parenthesis and calling flatten, makes it clear:
val r = for {map_key <- List("WRONG_KEY", "a", "b", "c")
| } yield x get map_key
r: List[Option[List[Int]]] = List(None, Some(List(1, 2, 3)), Some(List(4, 5, 6)), Some(List(7, 8, 9)))
val r = (for {map_key <- List("WRONG_KEY", "a", "b", "c")
| } yield x get map_key).flatten
r: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
That's equivalent to:
scala> List("WRONG_KEY", "a", "b", "c").map (x get _)
res81: List[Option[List[Int]]] = List(None, Some(List(1, 2, 3)), Some(List(4, 5, 6)), Some(List(7, 8, 9)))
scala> List("WRONG_KEY", "a", "b", "c").map (x get _).flatten
res82: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
The intermediate value (map_key) vanished as _ in the second block.
You are mixing up two different monads (List and Option) inside the for statement. This sometimes works as expected, but not always. In any case, you can transform options into lists yourself:
for {
map_key <- List("WRONG_KEY", "a", "b", "c")
list_value <- x get map_key getOrElse Nil
} yield list_value
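An equivalent sketch uses Option.toList, which makes the conversion explicit so that every generator is a List. The names table and flattened are only for this illustration.

```scala
val table = Map("a" -> List(1, 2, 3), "b" -> List(4, 5, 6), "c" -> List(7, 8, 9))

val flattened = for {
  key    <- List("WRONG_KEY", "a", "b", "c")
  values <- table.get(key).toList // Option[List[Int]] becomes List[List[Int]] with 0 or 1 elements
  value  <- values
} yield value
// flattened == List(1, 2, 3, 4, 5, 6, 7, 8, 9)
```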

Error: type mismatch flatMap

I am new to Spark programming and Scala, and I am not able to understand the difference between map and flatMap.
I tried the code below expecting both calls to work, but got an error.
scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))
scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
found : (String, Int)
required: scala.collection.GenTraversableOnce[?]
b.flatMap(x => (x,1))
As I understand it, flatMap flattens an RDD into a collection of String/Int elements.
I expected both to work without any error in this case. Please let me know where I am making the mistake.
Thanks
You need to look at how the signatures of these methods are defined:
def map[U: ClassTag](f: T => U): RDD[U]
map takes a function from type T to type U and returns an RDD[U].
On the other hand, flatMap:
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
Expects a function taking type T to a TraversableOnce[U], which is a trait Tuple2 doesn't implement, and returns an RDD[U]. Generally, you use flatMap when you want to flatten a collection of collections, i.e. if you had an RDD[List[List[Int]]] and you wanted to produce an RDD[List[Int]], you could flatMap it using identity.
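The same idea can be tried without a Spark context. The sketch below uses plain Scala Lists, on the assumption that List.flatMap behaves analogously to RDD.flatMap for this purpose; the value names are made up for the example.

```scala
val words = List("1", "2", "4", "5")

// A tuple is not a collection, so wrap it in one to satisfy flatMap.
val pairs = words.flatMap(w => List((w, 1)))
// pairs == List(("1", 1), ("2", 1), ("4", 1), ("5", 1))

// flatMap with identity removes exactly one level of nesting.
val nestedTwice = List(List(List(1, 2), List(3)), List(List(4)))
val nestedOnce = nestedTwice.flatMap(identity)
// nestedOnce == List(List(1, 2), List(3), List(4))
```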
map(func) Return a new distributed dataset formed by passing each element of the source through a function func.
flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
The following example might be helpful.
scala> val b = List("1", "2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x=>Set(x,1))
res69: List[scala.collection.immutable.Set[Any]] =
List(Set(1, 1), Set(2, 1), Set(4, 1), Set(5, 1))
scala> b.flatMap(x=>Set(x,1))
res70: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x,1))
res71: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x+1))
res75: List[String] = List(11, 21, 41, 51) // concat
scala> val x = sc.parallelize(List("aa bb cc dd", "ee ff gg hh"), 2)
scala> val y = x.map(x => x.split(" ")) // split(" ") returns an array of words
scala> y.collect
res0: Array[Array[String]] = Array(Array(aa, bb, cc, dd), Array(ee, ff, gg, hh))
scala> val y = x.flatMap(x => x.split(" "))
scala> y.collect
res1: Array[String] = Array(aa, bb, cc, dd, ee, ff, gg, hh)
The function passed to map returns a single value of type U, whereas the function passed to flatMap returns a TraversableOnce[U] (that is, a collection):
val b = List("1", "2", "4", "5")
val mapRDD = b.map { input => (input, 1) }
mapRDD.foreach(f => println(f._1 + " " + f._2))
val flatmapRDD = b.flatMap { input => List((input, 1)) }
flatmapRDD.foreach(f => println(f._1 + " " + f._2))
map does a 1-to-1 transformation, while flatMap converts a list of lists to a single list:
scala> val b = List(List(1,2,3), List(4,5,6), List(7,8,90))
b: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 90))
scala> b.map(x => (x,1))
res1: List[(List[Int], Int)] = List((List(1, 2, 3),1), (List(4, 5, 6),1), (List(7, 8, 90),1))
scala> b.flatMap(x => x)
res2: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 90)
Also, flatMap is useful for filtering out None values if you have a list of Options:
scala> val c = List(Some(1), Some(2), None, Some(3), Some(4), None)
c: List[Option[Int]] = List(Some(1), Some(2), None, Some(3), Some(4), None)
scala> c.flatMap(x => x)
res3: List[Int] = List(1, 2, 3, 4)

Can we call a method having more than one argument with list.map?

I was trying to perform multiply operation on list in Scala like:
val list = List(1,2,3,4,5)
list.map(_*2)
res0: List[Int] = List(2, 4, 6, 8, 10) // Output
Now, I have created a separate method for the multiply operation like:
val list = List(1,2,3,4,5)
def multiplyListContents(x: Int) = {
x * 2
}
list.map(multiplyListContents)
res1: List[Int] = List(2, 4, 6, 8, 10) // Output
Now I want to pass a custom multiplier instead of using the default multiplier 2, like:
val list = List(1,2,3,4,5)
val multiplier = 3
def multiplyListContents(x: Int, multiplier: Int) = {
x * multiplier
}
list.map(multiplyListContents(multiplier))
res1: List[Int] = List(3, 6, 9, 12, 15) // Output should be this
Any idea how to do this?
scala> list.map(multiplyListContents(_, multiplier))
res0: List[Int] = List(3, 6, 9, 12, 15)
This translates to list.map(x => multiplyListContents(x, multiplier)).
(see scala placeholder syntax for more information).
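If you control the method definition, another option is to give it two parameter lists, which makes the call from the question work almost as written. This is only a sketch; multiplyBy is a hypothetical name and the multiplier is deliberately placed first.

```scala
// Hypothetical curried variant: the multiplier comes first, in its own parameter list.
def multiplyBy(multiplier: Int)(x: Int): Int = x * multiplier

val numbers = List(1, 2, 3, 4, 5)
// multiplyBy(3) is a partially applied method, usable wherever an Int => Int is expected.
val tripled = numbers.map(multiplyBy(3))
// tripled == List(3, 6, 9, 12, 15)
```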

Replace element in List with scala

How do you replace an element by index in an immutable List?
E.g.
val list = 1 :: 2 :: 3 :: 4 :: List()
list.replace(2, 5)
If you want to replace index 2, then
list.updated(2,5) // Gives 1 :: 2 :: 5 :: 4 :: Nil
If you want to find every place where there's a 2 and put a 5 in instead,
list.map { case 2 => 5; case x => x } // 1 :: 5 :: 3 :: 4 :: Nil
In both cases, you're not really "replacing", you're returning a new list that has a different element(s) at that (those) position(s).
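One thing to watch: updated throws an IndexOutOfBoundsException when the index is outside the list. A minimal sketch of a guarded variant, assuming you prefer an Option over an exception (replaceAt is a hypothetical helper, not a library method):

```scala
// Hypothetical helper: like updated, but returns None for an out-of-range index.
def replaceAt[A](items: List[A], index: Int, value: A): Option[List[A]] =
  if (items.isDefinedAt(index)) Some(items.updated(index, value)) else None

val replaced = replaceAt(List(1, 2, 3, 4), 2, 5)
// replaced == Some(List(1, 2, 5, 4))
val outOfRange = replaceAt(List(1, 2, 3, 4), 10, 5)
// outOfRange == None
```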
In addition to what has been said before, you can use patch function that replaces sub-sequences of a sequence:
scala> val list = List(1, 2, 3, 4)
list: List[Int] = List(1, 2, 3, 4)
scala> list.patch(2, Seq(5), 1) // replaces one element of the initial sequence
res0: List[Int] = List(1, 2, 5, 4)
scala> list.patch(2, Seq(5), 2) // replaces two elements of the initial sequence
res1: List[Int] = List(1, 2, 5)
scala> list.patch(2, Seq(5), 0) // adds a new element
res2: List[Int] = List(1, 2, 5, 3, 4)
You can use list.updated(2,5) (which is a method on Seq).
It's probably better to use a scala.collection.immutable.Vector for this purpose, because updates on Vector take effectively constant time, whereas on a List they are linear in the index.
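The call looks the same on a Vector, and the original stays untouched. A short sketch with made-up value names:

```scala
val original = Vector(1, 2, 3, 4)
// updated returns a new Vector that shares most of its structure with the old one.
val changed = original.updated(2, 5)
// changed == Vector(1, 2, 5, 4); original is still Vector(1, 2, 3, 4)
```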
You can use map to generate a new list, like this:
@ list
res20: List[Int] = List(1, 2, 3, 4, 4, 5, 4)
@ list.map(e => if(e==4) 0 else e)
res21: List[Int] = List(1, 2, 3, 0, 0, 5, 0)
It can also be achieved using the patch function:
scala> var l = List(11,20,24,31,35)
l: List[Int] = List(11, 20, 24, 31, 35)
scala> l.patch(2,List(27),1)
res35: List[Int] = List(11, 20, 27, 31, 35)
where 2 is the position at which the replacement starts, List(27) holds the replacement values, and 1 is the number of elements to be replaced in the original list.
If you do a lot of such replacements, it is better to use a mutable class or Array.
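A minimal sketch of that approach, assuming the changes arrive as (index, value) pairs: copy the list into an Array once, update it in place, and convert back at the end. replaceAll is a hypothetical helper name, and it does no bounds checking.

```scala
// Hypothetical helper: apply many index-based replacements using a temporary Array.
def replaceAll(items: List[Int], changes: List[(Int, Int)]): List[Int] = {
  val scratch = items.toArray // one copy up front
  // Each in-place update is constant time; an out-of-range index would throw here.
  changes.foreach { case (index, value) => scratch(index) = value }
  scratch.toList // back to an immutable List
}

val patched = replaceAll(List(11, 20, 24, 31, 35), List(2 -> 27, 4 -> 40))
// patched == List(11, 20, 27, 31, 40)
```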
The following is a simple example of String replacement in a Scala List; you can do something similar for other types of data:
scala> val original: List[String] = List("a","b")
original: List[String] = List(a, b)
scala> val replace = original.map(x => if(x.equals("a")) "c" else x)
replace: List[String] = List(c, b)