I am new to Spark and am trying to code in Scala. I have an RDD whose lines have the form:
1: 2 3 5
2: 5 6 7
3: 1 8 9
4: 1 2 4
and another list of the form [1,4,8,9].
I need to filter the RDD so that it keeps those lines where either the value before ':' or any of the values after ':' is present in the list.
I have written the following code:
val links = linksFile.filter(t => {
val l = t.split(": ")
root.contains(l(0).toInt) ||
for(x<-l(0).split(" ")){
root.contains(x.toInt)
}
})
linksFile is the RDD and root is the list.
But this doesn't work. Any suggestions?
You're close: the for loop just doesn't use the value computed inside it, so it evaluates to Unit rather than a Boolean. Use the exists method instead. Also, I think you want l(1), not l(0), for the second check:
val links = linksFile.filter(t => {
val l = t.split(": ")
root.contains(l(0).toInt) ||
l(1).split(" ").exists { x =>
root.contains(x.toInt)
}
})
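The same predicate can be tried without a cluster on an ordinary Scala collection. This is only a minimal sketch: LinkFilter and keepLine are made-up names, and a Seq[String] with the sample lines from the question stands in for the RDD (the function passed to filter would be the same).

```scala
object LinkFilter {
  // True when the key before ": " or any value after it is in `root`.
  def keepLine(line: String, root: Set[Int]): Boolean = {
    val parts = line.split(": ")
    root.contains(parts(0).toInt) ||
      parts(1).split(" ").exists(x => root.contains(x.toInt))
  }

  def main(args: Array[String]): Unit = {
    val root = Set(1, 4, 8, 9)
    val lines = Seq("1: 2 3 5", "2: 5 6 7", "3: 1 8 9", "4: 1 2 4")
    // Keeps the lines keyed 1, 3 and 4; the line keyed 2 has no match.
    lines.filter(keepLine(_, root)).foreach(println)
  }
}
```

Turning root into a Set first keeps each membership check cheap, which matters more as the list grows.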
A for-comprehension without a yield doesn't ... well ... yield :)
But you don't really need a for-comprehension (or any "loop", for that matter) here.
Something like this:
val rootSet = root.toSet
linksFile.filter(
  // Split on colons and spaces so every token is a number.
  _.split("[: ]+").map(_.toInt).exists(rootSet)
)
should do it.
I'm new to Spark and Scala and I would like to select several columns from a dataset.
I loaded my data file into an RDD using:
val dataset = sc.textFile(args(0))
Then I split my line
val resu = dataset.map(line => line.split("\001"))
But my dataset has a lot of features and I just want to keep some of them (columns 2 and 3).
I tried this (which works with PySpark) but it doesn't work:
val resu = dataset.map(line => line.split("\001")[2,3])
I know this is a newbie question, but can someone help me? Thanks.
I just want to keep some of them (columns 2 and 3)
If you want columns 2 and 3 in tuple form you can do
val resu = dataset.map(line => {
val array = line.split("\001")
(array(2), array(3))
})
But if you want column 2 and 3 in array form then you can do
val resu = dataset.map(line => {
val array = line.split("\001")
Array(array(2), array(3))
})
In Scala, you access specific elements of a collection with parentheses, not square brackets.
In your case you want a sub-array, so you can try the slice(i, j) function. It extracts the elements from index i up to index j-1. So in your case, you may use:
val resu = dataset.map(line => line.split("\001").slice(2,4))
Hope it helps.
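For comparison, here is a small sketch of both approaches on a plain String, so it can be tried without Spark. ColumnSelect, asTuple and asArray are made-up names. One assumption to flag: newer Scala versions reject octal escapes such as "\001", so the sketch writes the same character as "\u0001".

```scala
object ColumnSelect {
  // Field separator: the control character U+0001 (written "\001" in the question).
  val Sep = "\u0001"

  // Columns 2 and 3 as a tuple; throws if the line has fewer than 4 fields.
  def asTuple(line: String): (String, String) = {
    val fields = line.split(Sep)
    (fields(2), fields(3))
  }

  // Columns 2 and 3 as an array; slice(2, 4) covers indices 2 and 3.
  def asArray(line: String): Array[String] =
    line.split(Sep).slice(2, 4)
}
```

Note the difference on short lines: indexing with fields(3) throws an ArrayIndexOutOfBoundsException, while slice just returns fewer elements.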
I evaluated the following lines of Scala code in the spark-shell:
val a = sc.parallelize(Array(1,2,3,4,5,6,7,8,9,10))
val b = a.coalesce(1)
b.foreachPartition { p =>
p.map(_ + 1).foreach(println)
p.map(_ * 2).foreach(println)
}
The output is the following:
2
3
4
5
6
7
8
9
10
11
Why does the partition p become empty after the first map?
It does not look strange to me: p is an Iterator, so once you have walked through it with map and foreach it has no more values. Taking into account that length is a shortcut for size, which is implemented like this:
def size: Int = {
var result = 0
for (x <- self) result += 1
result
}
you get 0.
The answer is in the Scaladoc: http://www.scala-lang.org/api/2.11.8/#scala.collection.Iterator. It explicitly states that an iterator (p is an iterator) must be discarded after the map method has been called on it.
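To make this concrete outside Spark, here is a minimal sketch with a plain Scala Iterator standing in for the partition p. IteratorOnce and its two methods are made-up names. The first method reproduces the behaviour from the question; the second shows one possible workaround, which is to materialise the iterator once before traversing it twice.

```scala
object IteratorOnce {
  // Reproduces the question: the first traversal drains p,
  // so a second traversal finds nothing left.
  def secondPassIsEmpty(p: Iterator[Int]): Boolean = {
    p.map(_ + 1).foreach(_ => ()) // first pass consumes every element
    p.map(_ * 2).isEmpty          // p is already exhausted here
  }

  // Workaround: copy the elements into a List, which can be traversed repeatedly.
  def plusOneAndDoubled(p: Iterator[Int]): (List[Int], List[Int]) = {
    val values = p.toList
    (values.map(_ + 1), values.map(_ * 2))
  }
}
```

Buffering with toList holds the whole partition in memory, so this assumes the partition is small enough for that. Otherwise, doing both computations in a single pass over p avoids the copy.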
I am new to Scala and I am having trouble constructing a Map from inputs.
Here is my problem:
I am reading input that describes elevators. It consists of n lines, each one containing the elevatorFloor number and the elevatorPosition on that floor.
Example:
0 5
1 3
4 5
So here I have 3 elevators: the first one is on floor 0 at position 5, the second one on floor 1 at position 3, etc.
Is there a way in Scala to put this in a Map without using var?
What I have so far is a Vector of all the elevators' information:
val elevators = {
for{i <- 0 until n
j <- readLine split " "
} yield j.toInt
}
I would like to be able to split each line into two variables, "elevatorFloor" and "elevatorPos", and group them in a data structure (my guess is that Map would be the appropriate choice). I would like to get something looking like:
elevators: SomeDataStructure[Int,Int] = ( 0->5, 1 -> 3, 4 -> 5)
I would like to clarify that I know I could write Java-style code, initialise a Map and then add the values to it, but I am trying to stay as close to functional programming as possible.
Thanks for any help or comments.
You can do:
import scala.io.Source

val res: Map[Int, Int] =
  Source.fromFile("myfile.txt")
    .getLines()
    .map { line =>
      // Destructure the two space-separated fields of each line.
      val Array(floor, position) = line.split(' ')
      floor.toInt -> position.toInt
    }.toMap
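The same idea works on any Iterator[String], not only on a file. A minimal sketch, with Elevators and parse as made-up names and the three sample lines from the question as input:

```scala
object Elevators {
  // Parse "floor position" lines into an immutable Map, without any var.
  def parse(lines: Iterator[String]): Map[Int, Int] =
    lines.map { line =>
      // Pattern-match the two whitespace-separated fields of the line.
      val Array(floor, position) = line.trim.split("\\s+")
      floor.toInt -> position.toInt
    }.toMap

  def main(args: Array[String]): Unit = {
    val elevators = parse(Iterator("0 5", "1 3", "4 5"))
    println(elevators) // one floor -> position entry per input line
  }
}
```

To read n lines from standard input instead, something like Iterator.fill(n)(scala.io.StdIn.readLine()) could be passed to parse. Keep in mind that a Map holds one value per key, so if two elevators share a floor, only the last one survives toMap.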
I have an input file that looks something like:
1: 3 5 7
3: 6 9
2: 5
......
I hope to get two lists:
the first list is made up of the numbers before ":", the second list is made up of the numbers after ":".
The two lists for the above example are:
1 3 2
3 5 7 6 9 5
I have written the following code so far:
val rdd = sc.textFile("input.txt");
val s = rdd.map(_.split(":"));
But I do not know how to implement the rest. Thanks.
I would use flatMap!
So,
val rdd = sc.textFile("input.txt")
val s = rdd.map(_.split(": ")) // I recommend adding a space after the colon
val before_colon = s.map(x => x(0))
val after_colon = s.flatMap(x => x(1).split(" "))
Now you have two RDDs, one with the items from before the colon, and one with the items after the colon!
If the part of the text before the colon can contain multiple numbers, as in 1 2 3: 4 5 6, I would write val before_colon = s.flatMap(x => x(0).split(" "))
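Here is the same map/flatMap split as a small sketch on an ordinary Seq, so the result can be checked without Spark. SplitColumns and split are made-up names, and the Seq[String] stands in for the RDD.

```scala
object SplitColumns {
  // Returns (numbers before the colon, numbers after the colon).
  def split(lines: Seq[String]): (Seq[String], Seq[String]) = {
    val parts = lines.map(_.split(": "))
    // map keeps exactly one element per line.
    val before = parts.map(p => p(0))
    // flatMap flattens the per-line arrays into one sequence.
    val after = parts.flatMap(p => p(1).split(" ").toSeq)
    (before, after)
  }
}
```

With the three sample lines from the question, before holds 1 3 2 and after holds 3 5 7 6 9 5, matching the lists the question asks for.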
I'm a Scala beginner and trying to understand how val works in Scala. I read that vals cannot be modified. When I do the following:
for( line <- Source.fromFile(args(0)).getLines() ) {
val currentLine = line
println(currentLine)
}
currentLine is updated in each iteration, while I expect it to be initialized with the first line and hold it till the end, or at least give a re-initialization error of some sort. Why is this so? Is the val created and destroyed in each iteration? My second question: I would like to use x outside if in the following code.
if( some condition is satisfied) val x = 2 else val x = 3
As of now, I'm getting an "Illegal start of simple expression" error. Is there a way to use x outside if?
Yes, the val is created and destroyed on each iteration.
val x = if(condition) 2 else 3 would do what you want.
Edit: You could rewrite your second snippet as if(condition) {val x = 2} else {val x = 3} (to make it compile), but that would accomplish nothing: each x is visible only inside its own braces, and the if as a whole evaluates to Unit, so the variable cannot be used outside the if.
For Loop
You can break it down into a foreach operation.
for( line <- Source.fromFile(args(0)).getLines() ) {
val currentLine = line
println(currentLine)
}
So this code transforms to (a for loop without yield becomes foreach; only a for with yield becomes map)
Source.fromFile(args(0)).getLines().foreach( line => block )
block could be any expression. In your case block is:
{
val currentLine = line
println(currentLine)
}
Here currentLine is local to block and is created anew for each value of line passed to the foreach operation.
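A small sketch of that rewriting, with both forms side by side. ForDesugar and its methods are made-up names, and the lines are collected into a List instead of printed so the two results can be compared.

```scala
object ForDesugar {
  // A for loop without yield...
  def withFor(lines: Iterator[String]): List[String] = {
    val seen = List.newBuilder[String]
    for (line <- lines) {
      val currentLine = line // a fresh val on every iteration
      seen += currentLine
    }
    seen.result()
  }

  // ...behaves like the equivalent explicit foreach call.
  def withForeach(lines: Iterator[String]): List[String] = {
    val seen = List.newBuilder[String]
    lines.foreach { line =>
      val currentLine = line // local to this function body
      seen += currentLine
    }
    seen.result()
  }
}
```

In both versions currentLine is never reassigned: each call of the loop body defines its own val, which is why no re-initialization error appears.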
If-Else
Again, the following is also wrong:
if( some condition is satisfied) val x = 2 else val x = 3
Essentially, if-else in Scala is an expression that returns a value. So it should be:
if( condition ) expression1 else expression2
In your case it can be:
if( condition ) { val x = 2 } else { val x = 3 }
However, a block that ends in a val definition evaluates to Unit (or void, if you want an analogy with Java / C++), and x is not visible outside the braces. So you can simply take the value of the if-else like so:
val x = if( condition ) { 2 } else { 3 }
// OR
val x = if( condition ) 2 else 3
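As a quick check of the difference, here is a minimal sketch with made-up names (IfExpression, pick, blockResult). The first method binds the result of the if-else to a val; the second shows that the version with val definitions inside the braces compiles but only produces Unit.

```scala
object IfExpression {
  // if/else is an expression, so its result can initialise a val directly.
  def pick(condition: Boolean): Int = {
    val x = if (condition) 2 else 3
    x
  }

  // Each block ends in a val definition, so the whole if/else has type Unit
  // and neither x is visible outside its own braces.
  def blockResult(condition: Boolean): Unit =
    if (condition) { val x = 2 } else { val x = 3 }
}
```

Calling IfExpression.pick(true) gives 2 and IfExpression.pick(false) gives 3, while blockResult gives back nothing usable either way.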
No answer mentioned it, so in addition to what was said:
The val goes out of scope at the end of each iteration (and thus is not accessible from the next loop iteration), and the value it referred to becomes eligible for garbage collection once nothing else references it. This is due to the scope of variables, which in Scala is limited to the enclosing block (same as Java).
As stated by @Kigyo, val x = if(condition) 2 else 3 would do what you want, because there is only one assignment. If you put the val definition inside the blocks, the scope of the val is that block, so it is not usable the way you want.
1st question: yes, a new val is created in every iteration.
2nd question: you could rewrite it as
val x = if (some condition is satisfied)
2
else
3