I have an input file that looks something like this:
1: 3 5 7
3: 6 9
2: 5
......
I want to get two lists: the first made up of the numbers before the ":", and the second made up of the numbers after the ":".
For the example above, the two lists are:
1 3 2
3 5 7 6 9 5
So far I have written the following code:
val rdd = sc.textFile("input.txt");
val s = rdd.map(_.split(":"));
But I don't know how to implement the rest. Thanks.
I would use flatMap!
So:
val rdd = sc.textFile("input.txt")
val s = rdd.map(_.split(": ")) // I recommend adding a space after the colon
val before_colon = s.map(x => x(0))
val after_colon = s.flatMap(x => x(1).split(" "))
Now you have two RDDs, one with the items from before the colon, and one with the items after the colon!
If it is possible for the part of the text before the colon to have multiple numbers, as in an example like 1 2 3: 4 5 6, I would write val before_colon = s.flatMap(x => x(0).split(" ")) instead.
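As a quick check (assuming input.txt contains the three example lines), collecting both RDDs gives the two lists from the question, as strings; add a .map(_.toInt) if you need actual numbers:
before_colon.collect()  // Array(1, 3, 2)
after_colon.collect()   // Array(3, 5, 7, 6, 9, 5)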
Related
How can you build a Seq of Strings read from a file using a for loop in Scala?
So I have a file bla.txt:
1 2 3
4 5 6
7 8 9
import scala.io.Source

val lines = Source.fromResource("bla.txt").getLines
val result = (for {
line <- lines
l = line.split(" ")
} yield {
new String(l(0) + l(1) + l(2))
}).toSeq
Printing result gives me: events: Stream(123, ?) - Why does it not process the second and third lines of the file?
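What is happening here: getLines returns an Iterator, and calling .toSeq on an Iterator builds a lazy Stream, so only the head is evaluated when it is printed (the ? is the unevaluated tail). Forcing the collection, for example with .toList, processes every line. A minimal sketch of that fix:
import scala.io.Source

val lines = Source.fromResource("bla.txt").getLines
val result = (for {
  line <- lines
  l = line.split(" ")
} yield l(0) + l(1) + l(2)).toList  // toList is strict, unlike the Stream that toSeq produces

println(result)  // List(123, 456, 789)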
I am new to Spark and am trying to code in Scala. I have an RDD which consists of data in the form:
1: 2 3 5
2: 5 6 7
3: 1 8 9
4: 1 2 4
and another list in the form [1,4,8,9]
I need to filter the RDD such that it keeps those lines in which either the value before ':' is present in the list or any of the values after ':' are present in the list.
I have written the following code:
val links = linksFile.filter(t => {
val l = t.split(": ")
root.contains(l(0).toInt) ||
for(x<-l(0).split(" ")){
root.contains(x.toInt)
}
})
linksFile is the RDD and root is the list.
But this doesn't work. Any suggestions?
You're close: the for-loop just doesn't actually produce the value computed inside it (a for without a yield returns Unit). Use the exists method instead. Also, I think you want l(1), not l(0), for the second check:
val links = linksFile.filter(t => {
val l = t.split(": ")
root.contains(l(0).toInt) ||
l(1).split(" ").exists { x =>
root.contains(x.toInt)
}
})
For-comprehension without a yield doesn't ... well ... yield :)
But you don't really need a for-comprehension (or any "loop", for that matter) here.
Something like this:
linksFile.filter(
  _.split("[: ]+")       // split on the colon and the spaces in one pass
   .map(_.toInt)
   .exists(root.toSet)   // a Set[Int] works directly as an Int => Boolean predicate
)
should do it.
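As a sanity check with root = List(1, 4, 8, 9): the lines "1: 2 3 5", "3: 1 8 9" and "4: 1 2 4" pass the filter (each contains at least one of 1, 4, 8, 9), while "2: 5 6 7" is dropped.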
I have an RDD of type RDD[Array[Array[Double]]]. So essentially each element is a matrix. I need to do element-wise addition.
So if the first element of the RDD is
1 2
3 4
and the second element is
5 6
7 8
at the end I need to have
6 8
10 12
I have looked into zip but am not sure if I can use it for this case.
Yes, you can use zip, but you'll have to use it twice, once for the rows and once for the columns:
val rdd = sc.parallelize(List(Array(Array(1.0, 2.0), Array(3.0, 4.0)),
                              Array(Array(5.0, 6.0), Array(7.0, 8.0))))
rdd.reduce((a, b) => a.zip(b).map { case (rowA, rowB) =>  // pair up corresponding rows
  rowA.zip(rowB).map { case (x, y) => x + y }             // pair up corresponding cells and add
})
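For the two example matrices, this returns Array(Array(6.0, 8.0), Array(10.0, 12.0)), as wanted. One caveat worth knowing: zip truncates to the shorter of the two arrays, so this quietly assumes every matrix in the RDD has the same dimensions.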
I am new to Scala and I am having trouble constructing a Map from inputs.
Here is my problem :
I am getting elevator information as input. It consists of n lines, each of which has the elevatorFloor number and the elevatorPosition on the floor.
Example:
0 5
1 3
4 5
So here I have 3 elevators: the first one is on floor 0 at position 5, the second one on floor 1 at position 3, and so on.
Is there a way in Scala to put it in a Map without using var ?
What I have so far is a Vector of all the elevators' information:
val elevators = {
for{i <- 0 until n
j <- readLine split " "
} yield j.toInt
}
I would like to be able to split the lines into two variables "elevatorFloor" and "elevatorPos" and group them in a data structure (my guess is that Map would be the appropriate choice). I would like to get something looking like:
elevators: SomeDataStructure[Int, Int] = (0 -> 5, 1 -> 3, 4 -> 5)
I would like to clarify that I know I could write Java-ish code, initialise a Map and then add the values to it, but I am trying to stay as close to functional programming as possible.
Thanks for the help or comments
You can do:
import scala.io.Source

val res: Map[Int, Int] =
  Source.fromFile("myfile.txt")
    .getLines
    .map { line =>
      val Array(floor, position) = line.split(' ')  // note the val: this destructures the two fields
      floor.toInt -> position.toInt
    }.toMap
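If you would rather keep reading from standard input, as in your snippet, the same destructuring pattern works there too. A sketch, assuming n lines of "floor position" pairs on stdin:
val elevators: Map[Int, Int] =
  (for (_ <- 0 until n) yield {
    val Array(floor, position) = scala.io.StdIn.readLine().split(' ')
    floor.toInt -> position.toInt
  }).toMap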
I want to use a member of an RDD element as a key. How can I do this?
This is my data:
2 1
4 1
1 2
6 3
7 3
7 6
6 7
3 7
I want to create key/value pairs such that the key is the first element and the value is the next element.
I wrote this code:
def main(args: Array[String])
{
System.setProperty("hadoop.home.dir","C:\\spark-1.5.1-bin-hadoop2.6\\winutil")
val conf = new SparkConf().setAppName("test").setMaster("local[4]")
val sc = new SparkContext(conf)
val lines = sc.textFile("followers.txt")
.flatMap{x => (x.indexOfSlice(x1),x.indexOfSlice(x2))}
}
but it is not right, and it won't determine the index of the elements; every two numbers form one line.
Maybe I'm misunderstanding your question, but if you are simply looking to split your data into key-value pairs, you just need to do this:
val lines = sc.textFile("followers.txt").map(s => {
val substrings = s.split(" ")
(substrings(0), substrings(1))
})
Does this solve your problem?
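Once the data is in key/value form, the usual pair-RDD operations are available. For instance, a hypothetical follow-up that groups every second-column value under its key:
val grouped = lines.groupByKey()  // RDD[(String, Iterable[String])]
grouped.collect().foreach { case (k, vs) => println(s"$k -> ${vs.mkString(", ")}") }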