Using Stream as a substitute for loop - scala

scala> val s = for(i <- (1 to 10).toStream) yield { println(i); i }
1
s: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> s.take(8).toList
2
3
4
5
6
7
8
res15: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8)
scala> s.take(8).toList
res16: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8)
Looks like Scala is storing stream values in memory to optimize access. Which means that Stream can't be used as a substitute for imperative loop as it will allocate memory. Things like (for(i <- (1 to 1000000).toStream, j <- (1 to 1000000).toStream) yield ...).reduceLeft(...) will fail because of memory. Was it wrong idea to use streams this way?

You should use Iterator instead of Stream if you want a loop-like collection construct. Stream is specifically for when you want to store past results. If you don't, then, for instance:
scala> Iterator.from(1).map(x => x*x).take(10).sum
res23: Int = 385
Methods continually, iterate, and tabulate are among the more useful ones in this regard (in the Iterator companion object).

Related

How to remove duplicates from list without using in inbuilt libraries such as distinct, groupBy(identity), toSet.. Etc

I wanted to write a Scala program that takes command-line args as list input and provide the output list without duplicates.
I want to know the custom implementation of this without using any libraries.
Input : 4 3 7 2 8 4 2 7 3
Output :4 3 7 2 8
val x= List(4, 3, 7, 2, 8, 4, 2, 7, 3)
x.foldLeft(List[Int]())((l,v)=> if (l.contains(v)) l else v :: l)
if you can't use contains you can do another fold
x.foldLeft(List[Int]())((l,v)=> if (l.foldLeft(false)((contains,c)=>if (c==v ) contains | true else contains | false)) l else v :: l)
Here's a way you could do this using recursion. I've tried to lay it out in a way that's easiest to explain:
import scala.annotation.tailrec
#tailrec
def getIndividuals(in: List[Int], out: List[Int] = List.empty): List[Int] = {
if(in.isEmpty) out
else if(!out.contains(in.head)) getIndividuals(in.tail, out :+ in.head)
else getIndividuals(in.tail, out)
}
val list = List(1, 2, 3, 4, 5, 4, 3, 5, 6, 0, 7)
val list2 = List(1)
val list3 = List()
val list4 = List(3, 3, 3, 3)
getIndividuals(list) // List(1, 2, 3, 4, 5, 6, 0, 7)
getIndividuals(list2) // List(1)
getIndividuals(list3) // List()
getIndividuals(list4) // List(3)
This function takes two parameters, in and out, and iterates through every element in the in List until it's empty (by calling itself with the tail of in). Once in is empty, the function outputs the out List.
If the out List doesn't contain the value of in you are currently looking at, the function calls itself with the tail of in and with that value of in added on to the end of the out List.
If out does contain the value of in you are currently looking at, it just calls itself with the tail of in and the current out List.
Note: This is an alternative to the fold method that Arnon proposed. I personally would write a function like mine and then maybe refactor it into a fold function if necessary. I don't naturally think in a functional, fold-y way so laying it out like this helps me picture what's going on as I'm trying to work out the logic.

Why copying a Scala array using keyword yield gives me a vector

I tried to copy a Scala array using the yield keyword, but I got an vector in the end. Why and how can I get an copied array using yield?
scala> val s=Array(1,2,3,4,5); val copied_s=for (i<-0 until s.size) yield s(i)
The console returns
s: Array[Int] = Array(1, 2, 3, 4, 5)
copied_s: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 3, 4, 5)
Use clone instead:
val c = s.clone
0 until ... creates a Range, and this is the source, from which the Vector is considered a good fit, not the Array.
scala> 0 until 4
res4: scala.collection.immutable.Range = Range(0, 1, 2, 3)
A big (...) helps too:
(for (i<-0 until s.size) yield s(i)).toArray
but clone is much smaller.

RDD intersections

I have a query about intersection between two RDDs.
My first RDD has a list of elements like this:
A = List(1,2,3,4), List(4,5,6), List(8,3,1),List(1,6,8,9,2)
And the second RDD is like this:
B = (1,2,3,4,5,6,8,9)
(I could store B in memory as a Set but not the first one.)
I would like to do an intersection of each element in A with B
List(1,2,3,4).intersect((1,2,3,4,5,6,8,9))
List(4,5,6).intersect((1,2,3,4,5,6,8,9))
List(8,3,1).intersect((1,2,3,4,5,6,8,9))
List(1,6,8,9,2).intersect((1,2,3,4,5,6,8,9))
How can I do this in Scala?
val result = rdd.map( x => x.intersect(B))
Both B and rdd have to have the same type (in this case List[Int]). Also, note that if B is big but fits in memory, you would want to probably broadcast it as documented here.
scala> val B = List(1,2,3,4,5,6,8,9)
B: List[Int] = List(1, 2, 3, 4, 5, 6, 8, 9)
scala> val rdd = sc.parallelize(Seq(List(1,2,3,4), List(4,5,6), List(8,3,1),List(1,6,8,9,2)))
rdd: org.apache.spark.rdd.RDD[List[Int]] = ParallelCollectionRDD[0] at parallelize at <console>:21
scala> rdd.map( x => x.intersect(B)).collect.mkString("\n")
res3: String =
List(1, 2, 3, 4)
List(4, 5, 6)
List(8, 3, 1)
List(1, 6, 8, 9, 2)

Complement method for .last when working with List objects?

Working with Lists in Scala I would like a simple way to get all elements but the last element. Is there a complementary method for .last similar to .head/.tail complement?
I'd rather not dirty up code with something like:
val x: List[String] = List("abc", "def", "ghi")
val allButLast: List[String] = x.reverse.tail.reverse
// List(abc, def)
Thanks.
init selects all elements but the last one.
List API for init.
scala> List(1,2,3,4,5)
res0: List[Int] = List(1, 2, 3, 4, 5)
scala> res0.init
res1: List[Int] = List(1, 2, 3, 4)
The 4 related methods here are head, tail, init, and last.
head and last get the first and final member, whereas
tail and init exclude the first and final members.
scala> val list = (0 to 10).toList
list: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> list.head
res0: Int = 0
scala> list.tail
res1: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> list.init
res2: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
scala> list.last
res3: Int = 10
You should also take care, because all 4 of them are unsafe on the empty list and will throw exceptions.
These methods are defined on GenTraversableLike, which List implements.
That's init.
link to Scaladoc: http://www.scala-lang.org/api/2.11.5/index.html#scala.collection.immutable.List#init:Repr
def init: List[A]
Selects all elements except the last.
Also, note that it's defined in GenTraversableLike, so pretty much any Scala collection has this method.
For dropping off any number of items from the end of a list consider dropRight,
val xs = (1 to 5).toList
xs.dropRight(1)
List(1, 2, 3, 4)
xs.dropRight(2)
List(1, 2, 3)
xs.dropRight(10)
List()

Scala Stream confusion

Running:
lazy val s: Stream[Int] = 1 #:: 2 #:: {val x = s.tail.map(_+1); println("> " + x.head); x}
s.take(5).toList
I'd expect:
> List(2, 3)
> List(2, 3, 4)
List(1, 2, 3, 4, 5)
And I get:
> 3
List(1, 2, 3, 4, 5)
Could you explain it to me?
The reason that you're getting an Int instead of a List is that s is a stream of integers, so it contains integers, not lists.
The reason why you get 3 is that the tail of (1,2,3,4,5,...) (i.e. s) is (2,3,4,5,...) and if you map +1 over that, you will get (3,4,5,6,7,...) and the head of that is 3.
The reason why only one integer is printed is that the expression is only evaluated once to get the stream for the tail. After that only the stream returned by s.tail.map(_+1) is evaluated (which doesn't contain any print statements).