Reducing iterator based on value - scala

I have an iterator as such:
Iterator(List(1, 2012), List(2, 2015), List(5, 2017), List(7, 2020))
I'm trying to return an iterator, but with the values slightly changed. The values for all multiples of 5 must be added to the previous row. So the result would be:
Iterator(List(1, 2012), List(2, 4032), List(7, 2020))
I've tried using the following method:
val a = Iterator(List(1, 2012), List(2, 2015), List(5, 2017), List(7, 2020))
val aTransformed = a.reduce((x,y) => if (y(0)%5 == 0) List(x(0),x(1)+y(1)) else x)
but it gives me the final value val aTransformed: List[Int] = List(1, 4029)
What can I do to get an iterator in my desired format? Is there a method to just check the previous/next row without folding it all into one final value?
I know this is possible by converting the iterator to a List, traversing, mutating and converting back to an iterator, but is there a more elegant solution?
Edit for clarification:
Consecutive multiples of 5 will get collated into one sum
Ex:
Iterator(List(1, 2012), List(2, 2015), List(5, 2017), List(10, 2025))
should become
Iterator(List(1, 2012), List(2, 6057))

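Since we can't directly access the previous element of an Iterator, we need a buffer that holds the last row seen. After the fold finishes, whatever is still in the buffer has to be appended to the final result.
Here I append an empty List[Int] as a sentinel element at the end of the input, so the last buffered row is flushed during the fold and no separate final check is needed.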
def convert(xs: Iterator[List[Int]]): Iterator[List[Int]] = {
  val res = (xs ++ Iterator(List[Int]())).foldLeft(Iterator[List[Int]](), List[Int]())((x, y) => {
    if (y.nonEmpty && y(0) % 5 == 0) {
      if (x._2.nonEmpty) {
        (x._1, List(x._2(0), x._2(1) + y(1)))
      } else {
        (x._1, y)
      }
    } else {
      if (x._2.nonEmpty) {
        (x._1 ++ Iterator(x._2), y)
      } else {
        (x._1, y)
      }
    }
  })
  res._1
}
Test:
scala> val xs1 = Iterator(List(1, 2012), List(2, 2015), List(5, 2017), List(7, 2020))
val xs1: Iterator[List[Int]] = <iterator>
scala> val xs2 = Iterator(List(1, 2012), List(2, 2015), List(5, 2017), List(10, 2025))
val xs2: Iterator[List[Int]] = <iterator>
scala> convert(xs1)
val res44: Iterator[List[Int]] = <iterator>
scala> res44.toList
val res45: List[List[Int]] = List(List(1, 2012), List(2, 4032), List(7, 2020))
scala> convert(xs2)
val res47: Iterator[List[Int]] = <iterator>
scala> res47.toList
val res48: List[List[Int]] = List(List(1, 2012), List(2, 6057))

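The following is one possible way to get the expected result, where itr is the input iterator. I haven't checked every edge case. Note that it produces a List, so call .iterator on the result if an Iterator is required.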
val interResult = itr.foldLeft((List.empty[List[Int]], List.empty[Int])) { (acc, curr) =>
  if (curr.size != 2)
    acc
  else if (acc._2.isEmpty)
    (acc._1, curr)
  else if (curr.headOption.exists(_ % 5 == 0))
    (acc._1, List(acc._2.head, acc._2.last + curr.last))
  else
    (acc._1 :+ acc._2, curr)
}
interResult._1 :+ interResult._2
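Both answers above fold over the whole input before returning anything. If the result should stay lazy, one option is to wrap the input in a BufferedIterator and peek at the next row with head. This is only a sketch: the name mergeMultiplesOfFive is made up for illustration, and it assumes every row has exactly two elements (key, value). A leading multiple of 5 has no previous row, so it simply starts its own row.

```scala
// Hypothetical helper (name not from the question): lazily merges every row
// whose key is a multiple of 5 into the row that precedes it.
// Assumes each row is a two-element List(key, value).
def mergeMultiplesOfFive(rows: Iterator[List[Int]]): Iterator[List[Int]] = {
  val buffered = rows.buffered // BufferedIterator lets us peek with .head
  new Iterator[List[Int]] {
    def hasNext: Boolean = buffered.hasNext
    def next(): List[Int] = {
      val first = buffered.next()
      var sum = first(1)
      // Absorb all directly following rows whose key is a multiple of 5.
      while (buffered.hasNext && buffered.head(0) % 5 == 0) {
        sum += buffered.next()(1)
      }
      List(first(0), sum)
    }
  }
}

val merged = mergeMultiplesOfFive(
  Iterator(List(1, 2012), List(2, 2015), List(5, 2017), List(7, 2020))
).toList
// merged == List(List(1, 2012), List(2, 4032), List(7, 2020))
```

Only one source row is buffered at a time, so this also works on inputs that do not fit in memory.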

Related

scala - if/else in a for loop

If I use two for loops like this, I get a List[List[Int]], but how can I get a List[Int]?
I don't know how to write an if/else statement inside a single for loop. Can someone help me?
def example: (List[(Int, Int)], Int,Int) => List[Int] ={
(list, p, counter) =>
if (counter >=0)
for(x<-list(i._1); if ( x._1 ==p))yield x._2
for(x<-list(i._1); if ( x._1 !=p))yield example((x._1,x._2+i._2):: Nil,p,counter-1)
else { ....}
}
First off, as written, the code you posted is not even a valid definition. If you have something that works but returns a different type than what is desired, post that working code.
That being said, if you have a List[List[Int]] and want a List[Int], the method for that is flatten.
Usage:
scala> val nestedList = List(List(1, 2), List(3, 4), List(5, 6))
nestedList: List[List[Int]] = List(List(1, 2), List(3, 4), List(5, 6))
scala> val flattenedList = nestedList.flatten
flattenedList: List[Int] = List(1, 2, 3, 4, 5, 6)
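Since the posted code does not compile, the following is only a guess at the general shape of the problem, with made-up sample data (pairs, p). It shows that if/else is an expression and can go directly in the yield of a single for, and that flatMap avoids the nested list in the first place.

```scala
// Hypothetical sample data: (key, value) pairs and a key to match.
val pairs = List((1, 10), (2, 20), (1, 30))
val p = 1

// if/else is an expression, so it can sit directly in the yield.
// Each element yields a List, so the result is nested.
val nested: List[List[Int]] =
  for ((k, v) <- pairs) yield if (k == p) List(v) else List(v, v + 1)

// flatMap applies the same function and concatenates the results,
// which is equivalent to nested.flatten.
val flat: List[Int] =
  pairs.flatMap { case (k, v) => if (k == p) List(v) else List(v, v + 1) }
```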

Remove one element from Scala List

For example, if I have the list List(1,2,1,3,2) and I want to remove only one 1, so that I get List(2,1,3,2). It would also be fine if the other 1 were removed instead.
My solution is:
scala> val myList = List(1,2,1,3,2)
myList: List[Int] = List(1, 2, 1, 3, 2)
scala> myList.patch(myList.indexOf(1), List(), 1)
res7: List[Int] = List(2, 1, 3, 2)
But I feel like I am missing a simpler solution, if so what am I missing?
Surely not simpler:
def rm(xs: List[Int], value: Int): List[Int] = xs match {
  case `value` :: tail => tail
  case x :: tail => x :: rm(tail, value)
  case _ => Nil
}
Usage:
scala> val xs = List(1, 2, 1, 3)
xs: List[Int] = List(1, 2, 1, 3)
scala> rm(xs, 1)
res21: List[Int] = List(2, 1, 3)
scala> rm(rm(xs, 1), 1)
res22: List[Int] = List(2, 3)
scala> rm(xs, 2)
res23: List[Int] = List(1, 1, 3)
scala> rm(xs, 3)
res24: List[Int] = List(1, 2, 1)
You can zipWithIndex and filter out the index you want to drop.
scala> val myList = List(1,2,1,3,2)
myList: List[Int] = List(1, 2, 1, 3, 2)
scala> myList.zipWithIndex.filter(_._2 != 0).map(_._1)
res1: List[Int] = List(2, 1, 3, 2)
A filter followed by a map can be written as a single collect:
scala> myList.zipWithIndex.collect { case (elem, index) if index != 0 => elem }
res2: List[Int] = List(2, 1, 3, 2)
To remove the first occurrence of an element, you can split the list at the first occurrence, drop the element, and merge the two parts back together.
list.span(_ != 1) match { case (before, atAndAfter) => before ::: atAndAfter.drop(1) }
The following is the expanded answer:
val list = List(1, 2, 1, 3, 2)
// split AT the first occurrence
val elementToRemove = 1
val (beforeFirstOccurrence, atAndAfterFirstOccurrence) = list.span(_ != elementToRemove)
beforeFirstOccurrence ::: atAndAfterFirstOccurrence.drop(1) // shouldBe List(2, 1, 3, 2)
Resources:
How to remove an item from a list in Scala having only its index?
How should I remove the first occurrence of an object from a list in Scala?
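The span approach can be wrapped in a small generic helper. The name removeFirst is only illustrative. For comparison, the standard-library diff method also removes just one occurrence per element of its argument, which may be the simpler solution the question asks about.

```scala
// Hypothetical helper: removes the first occurrence of elem, if any.
def removeFirst[A](xs: List[A], elem: A): List[A] = {
  val (before, atAndAfter) = xs.span(_ != elem)
  // atAndAfter is empty when elem is absent, so drop(1) is then a no-op.
  before ::: atAndAfter.drop(1)
}

val viaSpan = removeFirst(List(1, 2, 1, 3, 2), 1)

// diff removes one occurrence for each element of its argument.
val viaDiff = List(1, 2, 1, 3, 2).diff(List(1))
```

Both values are List(2, 1, 3, 2). When the element is not in the list, both approaches return the list unchanged.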
List is immutable, so you can’t delete elements from it, but you can filter out the elements you don’t want while you assign the result to a new variable:
scala> val originalList = List(5, 1, 4, 3, 2)
originalList: List[Int] = List(5, 1, 4, 3, 2)
scala> val newList = originalList.filter(_ > 2)
newList: List[Int] = List(5, 4, 3)
Rather than continually assigning the result of operations like this to a new variable, you can declare your variable as a var and reassign the result of the operation back to itself:
scala> var x = List(5, 1, 4, 3, 2)
x: List[Int] = List(5, 1, 4, 3, 2)
scala> x = x.filter(_ > 2)
x: List[Int] = List(5, 4, 3)

Error: type mismatch flatMap

I am new to Spark programming and Scala, and I am not able to understand the difference between map and flatMap.
I tried the code below, expecting both calls to work, but got an error.
scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))
scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
found : (String, Int)
required: scala.collection.GenTraversableOnce[?]
b.flatMap(x => (x,1))
As per my understanding, flatMap flattens an RDD into a collection of String/Int elements.
I was thinking that in this case both should work without any error. Please let me know where I am making the mistake.
Thanks
You need to look at how the signatures of these methods are defined:
def map[U: ClassTag](f: T => U): RDD[U]
map takes a function from type T to type U and returns an RDD[U].
On the other hand, flatMap:
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
Expects a function taking type T to a TraversableOnce[U], which is a trait Tuple2 doesn't implement, and returns an RDD[U]. Generally, you use flatMap when you want to flatten a collection of collections, i.e. if you had an RDD[List[List[Int]]] and you wanted to produce an RDD[List[Int]], you could flatMap it using identity.
map(func) Return a new distributed dataset formed by passing each element of the source through a function func.
flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
The following example might be helpful.
scala> val b = List("1", "2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x=>Set(x,1))
res69: List[scala.collection.immutable.Set[Any]] =
List(Set(1, 1), Set(2, 1), Set(4, 1), Set(5, 1))
scala> b.flatMap(x=>Set(x,1))
res70: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x,1))
res71: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x+1))
res75: List[String] = List(11, 21, 41, 51) // concat
scala> val x = sc.parallelize(List("aa bb cc dd", "ee ff gg hh"), 2)
scala> val y = x.map(x => x.split(" ")) // split(" ") returns an array of words
scala> y.collect
res0: Array[Array[String]] = Array(Array(aa, bb, cc, dd), Array(ee, ff, gg, hh))
scala> val y = x.flatMap(x => x.split(" "))
scala> y.collect
res1: Array[String] = Array(aa, bb, cc, dd, ee, ff, gg, hh)
The function passed to map returns a U, whereas the function passed to flatMap returns a TraversableOnce[U] (i.e. a collection):
val b = List("1", "2", "4", "5")
val mapRDD = b.map { input => (input, 1) }
mapRDD.foreach(f => println(f._1 + " " + f._2))
val flatmapRDD = b.flatMap { input => List((input, 1)) }
flatmapRDD.foreach(f => println(f._1 + " " + f._2))
map does a 1-to-1 transformation, while flatMap converts a list of lists to a single list:
scala> val b = List(List(1,2,3), List(4,5,6), List(7,8,90))
b: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 90))
scala> b.map(x => (x,1))
res1: List[(List[Int], Int)] = List((List(1, 2, 3),1), (List(4, 5, 6),1), (List(7, 8, 90),1))
scala> b.flatMap(x => x)
res2: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 90)
Also, flatMap is useful for filtering out None values if you have a list of Options:
scala> val c = List(Some(1), Some(2), None, Some(3), Some(4), None)
c: List[Option[Int]] = List(Some(1), Some(2), None, Some(3), Some(4), None)
scala> c.flatMap(x => x)
res3: List[Int] = List(1, 2, 3, 4)
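One way to keep the two apart is to think of flatMap as map followed by flatten. The sketch below uses plain Scala collections rather than an RDD, and evensTwice is a made-up function that returns zero or two outputs per input, to show the "0 or more output items" behaviour described above.

```scala
// Hypothetical function: emits an even number twice and drops odd numbers.
def evensTwice(n: Int): List[Int] = if (n % 2 == 0) List(n, n) else Nil

val nums = List(1, 2, 4, 5)

// map keeps one result per input, so the lists stay nested.
val mapped: List[List[Int]] = nums.map(evensTwice)

// flatMap concatenates the per-element results.
val flatMapped: List[Int] = nums.flatMap(evensTwice)

// Same thing in two steps.
val mapThenFlatten: List[Int] = nums.map(evensTwice).flatten
```

Here mapped is List(List(), List(2, 2), List(4, 4), List()), while flatMapped and mapThenFlatten are both List(2, 2, 4, 4).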

Grouping a list

I want to group elements of a list such as :
val lst = List(1,2,3,4,5)
On transformation it should return a new list as:
val newlst = List(List(1), List(1,2), List(1,2,3), List(1,2,3,4), List(1,2,3,4,5))
You can do it this way:
lst.inits.toList.reverse.tail
(1 to lst.size map lst.take).toList should do it.
Not as pretty or short as others, but gotta have some tail recursion for the soul:
import scala.annotation.tailrec
def createFromElements(list: List[Int]): List[List[Int]] = {
  @tailrec
  def createFromElements(l: List[Int], p: List[List[Int]]): List[List[Int]] =
    l match {
      case x :: xs =>
        // extend the most recent prefix by one element and prepend it
        createFromElements(xs, (p.headOption.getOrElse(List()) ++ List(x)) :: p)
      case Nil => p.reverse
    }
  createFromElements(list, Nil)
}
And now:
scala> createFromElements(List(1,2,3,4,5))
res10: List[List[Int]] = List(List(1), List(1, 2), List(1, 2, 3), List(1, 2, 3, 4), List(1, 2, 3, 4, 5))
Doing a foldLeft seems to be more efficient, though ugly:
(lst.foldLeft((List[List[Int]](), List[Int]()))((x, y) => {
  val z = x._2 :+ y
  (x._1 :+ z, z)
}))._1
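The foldLeft above carries the running prefix along by hand. scanLeft does the same bookkeeping itself, because it returns every intermediate accumulator. A minimal sketch using the lst from the question:

```scala
val lst = List(1, 2, 3, 4, 5)

// scanLeft returns the seed followed by every intermediate result,
// so tail drops the empty seed prefix.
val prefixes: List[List[Int]] = lst.scanLeft(List.empty[Int])(_ :+ _).tail
```

This gives List(List(1), List(1, 2), List(1, 2, 3), List(1, 2, 3, 4), List(1, 2, 3, 4, 5)), the same as lst.inits.toList.reverse.tail.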

Scala: grouped from right?

In Scala, grouped works from left to right.
val list = List(1,2,3,4,5)
list.grouped(2).toList
=> List[List[Int]] = List(List(1, 2), List(3, 4), List(5))
But what if I want:
=> List[List[Int]] = List(List(1), List(2, 3), List(4, 5))
?
Well I know this works:
list.reverse.grouped(2).map(_.reverse).toList.reverse
This seems inefficient, however.
Then you could implement it by yourself:
def rgrouped[T](xs: List[T], n: Int): List[List[T]] = {
  val diff = xs.length % n
  if (diff == 0) xs.grouped(n).toList else {
    val (head, toGroup) = xs.splitAt(diff)
    // keep every remaining group, not just the first one
    head :: toGroup.grouped(n).toList
  }
}
Quite ugly, but should work.
Here is my attempt:
def rightGrouped[T](ls: List[T], s: Int) = {
  val a = ls.length % s match {
    case 0 => ls.grouped(s)
    case x => List(ls.take(x)) ++ ls.takeRight(ls.length - x).grouped(s)
  }
  a.toList
}
Usage:
scala> rightGrouped(List(1,2,3,4,5),3)
res6: List[List[Int]] = List(List(1, 2), List(3, 4, 5))
I initially tried without pattern matching, but it was wrong when the list length was an exact multiple of the group size:
val ls = List(1,2,3,4,5,6)
val s = 3
val x = ls.length % s
List(ls.take(x)) ++ ls.takeRight(ls.length-x).grouped(s)
produced:
List(List(), List(1, 2, 3), List(4, 5, 6))
val l =List(list.head)::(list.tail grouped(2) toList)
EDIT:
After @gzm0 pointed out my mistake I have fixed the solution, though it works only for n=2:
def group2[T](list: List[T]) = {
  (list.size % 2 == 0) match {
    case true => list.grouped(2).toList
    case false => List(list.head) :: list.tail.grouped(2).toList
  }
}
println(group2(List()))
println(group2(List(1,2,3,4,5)))
println(group2(List(1,2,3,4,5,6)))
List()
List(List(1), List(2, 3), List(4, 5))
List(List(1, 2), List(3, 4), List(5, 6))
Staying consistent with idiomatic use of the Scala Collections Library such that it also works on things like String, here's an implementation.
def groupedRight[T](seqT: Seq[T], width: Int): Iterator[Seq[T]] =
  if (width > 0) {
    val remainder = seqT.length % width
    if (remainder == 0)
      seqT.grouped(width)
    else
      (seqT.take(remainder) :: seqT.drop(remainder).grouped(width).toList).iterator
  }
  else
    throw new IllegalArgumentException(s"width [$width] must be greater than 0")
val x = groupedRight(List(1,2,3,4,5,6,7), 3).toList
// => val x: List[Seq[Int]] = List(List(1), List(2, 3, 4), List(5, 6, 7))
val sx = groupedRight("12345", 3).toList
// => val sx: List[Seq[Char]] = List(12, 345)
val sx = groupedRight("12345", 3).toList.map(_.mkString)
// => val sx: List[String] = List(12, 345)
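To gain some confidence in a split-based version, it can be cross-checked against the double-reverse one-liner from the question over a range of lengths and widths. The two function names below are made up for this sketch, and the check covers only small inputs, so it is a sanity test rather than a proof.

```scala
// Reference version from the question: reverse, group, reverse back.
def groupedRightViaReverse[T](xs: List[T], n: Int): List[List[T]] =
  xs.reverse.grouped(n).map(_.reverse).toList.reverse

// Split-based version: the short group (if any) goes first.
def groupedRightViaSplit[T](xs: List[T], n: Int): List[List[T]] = {
  val (head, rest) = xs.splitAt(xs.length % n)
  val groups = rest.grouped(n).toList
  if (head.isEmpty) groups else head :: groups
}

// Compare both versions for list lengths 0 to 7 and widths 1 to 4.
val agree: Boolean = (0 to 7).forall { len =>
  val xs = (1 to len).toList
  (1 to 4).forall(n => groupedRightViaSplit(xs, n) == groupedRightViaReverse(xs, n))
}
```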