What does to function do in:
rdd.flatMap(x => x.to(3))
?
rdd is composed of {1, 2, 3, 3} and the above function returns {1, 2, 3, 2, 3, 3, 3}
I have been googling "scala number to function" and its variants, but can't seem to find what it does.
To function
To create a Range in Scala, use the predefined methods to and by.
Example:
1 to 3 will give Range(1, 2, 3)
What does to function do in RDD
a) Looking into map function:
sc.range(1L, 6L).map(x => x to 3).collect.foreach(println)
This prints
NumericRange(1, 2, 3)
NumericRange(2, 3)
NumericRange(3)
NumericRange() // 4 to 3 returns an empty Range
NumericRange() // 5 to 3 returns an empty Range
b) Looking into flatMap function:
sc.range(1L, 6L).flatMap(x => x to 3).collect.foreach(println)
This prints
1
2
3
2
3
3
It combines mapping and flattening. flatMap takes a function that works on the nested lists and then concatenates the results back together.
An important thing to understand about flatMap is that anything that looks like an empty array will disappear. so NumericRange() doesn't appears in flatMap result .
rdd.flatMap(x => x.to(3)) works as follows:
1. rdd.map(x => x.to(3))
a) fetch the first element of {1, 2, 3, 3}, that is 1
b) apply to x => x.to(3), that is 1.to(3),
that is also explained as 1 to 3, it will generate the range {1, 2, 3}
c) fetch the second element of {1, 2, 3, 3}, that is 2
d) apply to x => x.to(3), that is 2.to(3), that is also explained as 2 to 3,
it will generate the range {2, 3}
e) repeat above, 3 to 3 will get {3}, the final 3 will get {3}
f) so finally, you get {{1, 2, 3}, {2, 3}, {3}, {3}}
2. flatMap is combination of map and flattern
so flatmap will make {{1, 2, 3}, {2, 3}, {3}, {3}} become {1, 2, 3, 2, 3, 3, 3}
So, x.to(y) just to generate a range, [x, y], you can use repl to verify it.
C:\Windows\System32>scala
Welcome to Scala version 2.10.6 (Java HotSpot(TM) Client VM, Java 1.7.0_55).
Type in expressions to have them evaluated.
Type :help for more information.
scala> 2 to 5
res0: scala.collection.immutable.Range.Inclusive = Range(2, 3, 4, 5)
Related
I wanted to write a Scala program that takes command-line args as list input and provide the output list without duplicates.
I want to know the custom implementation of this without using any libraries.
Input : 4 3 7 2 8 4 2 7 3
Output :4 3 7 2 8
val x= List(4, 3, 7, 2, 8, 4, 2, 7, 3)
x.foldLeft(List[Int]())((l,v)=> if (l.contains(v)) l else v :: l)
if you can't use contains you can do another fold
x.foldLeft(List[Int]())((l,v)=> if (l.foldLeft(false)((contains,c)=>if (c==v ) contains | true else contains | false)) l else v :: l)
Here's a way you could do this using recursion. I've tried to lay it out in a way that's easiest to explain:
import scala.annotation.tailrec
#tailrec
def getIndividuals(in: List[Int], out: List[Int] = List.empty): List[Int] = {
if(in.isEmpty) out
else if(!out.contains(in.head)) getIndividuals(in.tail, out :+ in.head)
else getIndividuals(in.tail, out)
}
val list = List(1, 2, 3, 4, 5, 4, 3, 5, 6, 0, 7)
val list2 = List(1)
val list3 = List()
val list4 = List(3, 3, 3, 3)
getIndividuals(list) // List(1, 2, 3, 4, 5, 6, 0, 7)
getIndividuals(list2) // List(1)
getIndividuals(list3) // List()
getIndividuals(list4) // List(3)
This function takes two parameters, in and out, and iterates through every element in the in List until it's empty (by calling itself with the tail of in). Once in is empty, the function outputs the out List.
If the out List doesn't contain the value of in you are currently looking at, the function calls itself with the tail of in and with that value of in added on to the end of the out List.
If out does contain the value of in you are currently looking at, it just calls itself with the tail of in and the current out List.
Note: This is an alternative to the fold method that Arnon proposed. I personally would write a function like mine and then maybe refactor it into a fold function if necessary. I don't naturally think in a functional, fold-y way so laying it out like this helps me picture what's going on as I'm trying to work out the logic.
For example I have following Scala list, I want get a sublist until there is a requirement can be met.
val list = Seq(1,2,3,4,5,5,4,1,2,5)
The requirement is the number is 5, so I want the result as:
Seq(1,2,3,4)
Currently I use Scala collection's indexWhere and splitAt to return:
list.splitAt(list.indexWhere(x => x == 5))
(Seq(1,2,3,4), Seq(5,5,4,1,2,5))
I am not sure there are more better ways to achieve the same with better Scala collection's method I didn't realise?
You can use takeWhile:
scala> val list = Seq(1,2,3,4,5,5,4,1,2,5)
list: Seq[Int] = List(1, 2, 3, 4, 5, 5, 4, 1, 2, 5)
scala> list.takeWhile(_ != 5)
res30: Seq[Int] = List(1, 2, 3, 4)
Use span like this,
val (l,r) = list.span(_ != 5)
which delivers
l: List(1, 2, 3, 4)
r: List(5, 5, 4, 1, 2, 5)
Alternatively, you can write
val l = list.span(_ != 5)._1
to access only the first element of the resulting tuple.
This bisects the list at the first element that does not hold the condition.
I have some experience with R language and now I wanted to try Scala language. In R language I can assign one value to many elements of a vector, e.g.
(xs <- 1:10)
#[1] 1 2 3 4 5 6 7 8 9 10
k <- 3
xs[1:k] <- xs[k+1]
xs
# 4 4 4 4 5 6 7 8 9 10
It assigns value of k+1 element to all elements of indices from 1 to k. Is it also possible to do it without a loop in Scala (I mean Array in Scale)? I know there is slice method, but it only returns values of an Array, one cannot modify elements of the Array using this method.
What is even more, should I use Array or ArrayBuffer if I only want to change values of elements and I do not want to add/remove elements from a collection?
Check out the java.util.Arrays.fill methods.
scala> val xs = (1 to 9).toArray
xs: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
scala> val k = 6
k: Int = 6
scala> java.util.Arrays.fill(xs, 0, k, xs(k))
scala> xs
res10: Array[Int] = Array(7, 7, 7, 7, 7, 7, 7, 8, 9)
For your second question, if not resizing the collection but editing the elements, stick with array. ArrayBuffer is much like the Java ArrayList, it resizes it self when it needs to, so insertion is amortized constant, not just constant.
For your first question, I'm not aware of any method in the collections library that would allow you to do this. It's obviously syntactic sugar for looping, so if you really care (do you really find yourself needing to do this often?), you can define an implicit class and yourself define a method which loops, and then use that. Write a comment if you would like to see example of such code, otherwise try doing it yourself, it's gonna be good training.
Scala has the Range class. You can convert the Range to an Array if you wish.
scala> val n = 10
n: Int = 10
scala> Range(1,n)
res22: scala.collection.immutable.Range = Range(1, 2, 3, 4, 5, 6, 7, 8, 9)
scala> res22.toArray
res23: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
│
ArrrayBuffer has constant time update and would be good for updating values.
This question already has answers here:
How to get a set of all elements that occur multiple times in a list in Scala?
(2 answers)
Closed 8 years ago.
I find a lot about how to remove duplicates, but what is the most elegant way to remove unique items first and then the remaining duplicates.
E.g. a sequence (1, 2, 5, 2, 3, 4, 4, 0, 2) should be converted into (2, 4).
I can think of using a for-loop to add a count to each distinct item, but I could imagine that Scala has a more elegant way to achieve this.
distinct and diff will works for you:
val a = List(1, 2, 5, 2, 3, 4, 4, 0, 2)
> a: List[Int] = List(1, 2, 5, 2, 3, 4, 4, 0, 2)
val b = a diff a.distinct
> b: List[Int] = List(2, 4, 2)
val c = (a diff a.distinct).distinct
> c: List[Int] = List(2, 4)
In place distinct you can use toSet as well.
Also keep in mind that i => i can be replaced by identity and map(_._1) by keys, like this:
Seq(1, 2, 5, 2, 3, 4, 4, 0, 2).groupBy(identity).filter(_._2.size > 1).keys.toSeq
This is where a countByKey method, such as the one that can be found in Spark's API, would be useful.
Pretty straight forward:
Seq(1, 2, 5, 2, 3, 4, 4, 0, 2).groupBy(i => i).filter(_._2.size > 1).map(_._1).toSeq
Using the link from Ende Neu I think your code would become this:
Seq(1, 2, 5, 2, 3, 4, 4, 0, 2).groupBy(identity).collect { case (v, l) if l.length > 1 => v } toSeq
UserGuide of scalacheck project mentioned sized generators. The explanation code
def matrix[T](g:Gen[T]):Gen[Seq[Seq[T]]] = Gen.sized {size =>
val side = scala.Math.sqrt(size).asInstanceOf[Int] //little change to prevent compile-time exception
Gen.vectorOf(side, Gen.vectorOf(side, g))
}
explained nothing for me. After some exploration I understood that length of generated sequence does not depend on actual size of generator (there is resize method in Gen object that "Creates a resized version of a generator" according to javadoc (maybe that means something different?)).
val g = Gen.choose(1,5)
val g2 = Gen.resize(15, g)
println(matrix(g).sample) // (1)
println(matrix(g2).sample) // (2)
//1,2 produce Seq with same length
Could you explain me what had I missed and give me some examples how you use them in testing code?
The vectorOf (which now is replaced with listOf) generates lists with a size that depends (linearly) on the size parameter that ScalaCheck sets when it evaluates a generator. When ScalaCheck tests a property it will increase this size parameter for each test, resulting in properties that are tested with larger and larger lists (if listOf is used).
If you create a matrix generator by just using the listOf generator in a nested fashion, you will get matrices with a size that depends on the square of the size parameter. Hence when using such a generator in a property you might end up with very large matrices, since ScalaCheck increases the size parameter for each test run. However, if you use the resize generator combinator in the way it is done in the ScalaCheck User Guide, your final matrix size depend linearly on the size parameter, resulting in nicer performance when testing your properties.
You should really not have to use the resize generator combinator very often. If you need to generate lists that are bounded by some specific size, it's much better to do something like the example below instead, since there is no guarantee that the listOf/ containerOf generators really use the size parameter the way you expect.
def genBoundedList(maxSize: Int, g: Gen[T]): Gen[List[T]] = {
Gen.choose(0, maxSize) flatMap { sz => Gen.listOfN(sz, g) }
}
The vectorOf method that you use is deprecated , and you should use the listOf method. This generates a list of random length where the maximum length is limited by the size of the generator. You should therefore resize the generator that
actually generates the actual list if you want control over the maximum elements that are generated:
scala> val g1 = Gen.choose(1,5)
g1: org.scalacheck.Gen[Int] = Gen()
scala> val g2 = Gen.listOf(g1)
g2: org.scalacheck.Gen[List[Int]] = Gen()
scala> g2.sample
res19: Option[List[Int]] = Some(List(4, 4, 4, 4, 2, 4, 2, 3, 5, 1, 1, 1, 4, 4, 1, 1, 4, 5, 5, 4, 3, 3, 4, 1, 3, 2, 2, 4, 3, 4, 3, 3, 4, 3, 2, 3, 1, 1, 3, 2, 5, 1, 5, 5, 1, 5, 5, 5, 5, 3, 2, 3, 1, 4, 3, 1, 4, 2, 1, 3, 4, 4, 1, 4, 1, 1, 4, 2, 1, 2, 4, 4, 2, 1, 5, 3, 5, 3, 4, 2, 1, 4, 3, 2, 1, 1, 1, 4, 3, 2, 2))
scala> val g3 = Gen.resize(10, g2)
g3: java.lang.Object with org.scalacheck.Gen[List[Int]] = Gen()
scala> g3.sample
res0: Option[List[Int]] = Some(List(1))
scala> g3.sample
res1: Option[List[Int]] = Some(List(4, 2))
scala> g3.sample
res2: Option[List[Int]] = Some(List(2, 1, 2, 4, 5, 4, 2, 5, 3))