Scala comprehension from input - scala

I am new to Scala and I am having trouble constructing a Map from input.
Here is my problem:
I am reading input that describes elevators. It consists of n lines, each containing an elevatorFloor number and the elevatorPosition on that floor.
Example:
0 5
1 3
4 5
So here I have 3 elevators: the first one is on floor 0 at position 5, the second one on floor 1 at position 3, and so on.
Is there a way in Scala to put this in a Map without using var?
What I have so far is a Vector of all the elevators' information:
val elevators = {
for{i <- 0 until n
j <- readLine split " "
} yield j.toInt
}
I would like to be able to split each line into two variables, "elevatorFloor" and "elevatorPos", and group them in a data structure (my guess is that a Map would be the appropriate choice). I would like to get something that looks like:
elevators: SomeDataStructure[Int,Int] = ( 0->5, 1 -> 3, 4 -> 5)
I would like to clarify that I know I could write Java-style code, initialise a Map and then add the values to it, but I am trying to stay as close to functional programming as possible.
Thanks for the help or comments

You can do:
import scala.io.Source
val res: Map[Int, Int] =
Source.fromFile("myfile.txt")
.getLines
.map { line =>
val Array(floor, position) = line.split(' ')
floor.toInt -> position.toInt
}.toMap
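The question reads from standard input rather than from a file, so here is a minimal sketch of the same idea over any iterator of lines. The names ElevatorParsing and parseElevators are made up for illustration. Note that a Map keyed by floor keeps only one elevator per floor; if two elevators can share a floor, a List of pairs may be a better fit.

```scala
object ElevatorParsing {
  // Hypothetical helper: turns lines such as "0 5" into floor -> position pairs.
  // Lines that do not contain exactly two fields are skipped.
  def parseElevators(lines: Iterator[String]): Map[Int, Int] =
    lines
      .map(_.trim.split("\\s+"))
      .collect { case Array(floor, position) => floor.toInt -> position.toInt }
      .toMap
}
```

With stdin you could call it as ElevatorParsing.parseElevators(scala.io.Source.stdin.getLines().take(n)), assuming n is already known.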

Related

loop inside spark RDD filter

I am new to Spark and am trying to code in scala. I have an RDD which consists of data in the form :
1: 2 3 5
2: 5 6 7
3: 1 8 9
4: 1 2 4
and another list in the form [1,4,8,9]
I need to filter the RDD such that it takes those lines in which either the value before ':' is present in the list or if any of the values after ':' are present in the list.
I have written the following code:
val links = linksFile.filter(t => {
val l = t.split(": ")
root.contains(l(0).toInt) ||
for(x<-l(0).split(" ")){
root.contains(x.toInt)
}
})
linksFile is the RDD and root is the list.
But this doesn't work. Any suggestions?
You're close: the for-loop just doesn't actually use the value computed inside it. You should use the exists method instead. Also I think you want l(1), not l(0) for the second check:
val links = linksFile.filter(t => {
val l = t.split(": ")
root.contains(l(0).toInt) ||
l(1).split(" ").exists { x =>
root.contains(x.toInt)
}
})
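The predicate can be tried out without Spark. Below is a small sketch on plain Scala strings, assuming every line has the "key: v1 v2 ..." shape from the question. LinkFilter and keep are illustrative names, not part of any API.

```scala
object LinkFilter {
  // Returns true when the key before ": " or any value after it is in `root`.
  def keep(line: String, root: Set[Int]): Boolean =
    line.split(": ") match {
      case Array(head, tail) =>
        root.contains(head.toInt) || tail.split(" ").exists(x => root.contains(x.toInt))
      case _ => false // line does not match the assumed format
    }
}
```

The same function could then be passed to the RDD, for example linksFile.filter(t => LinkFilter.keep(t, root.toSet)).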
A for-comprehension without a yield doesn't ... well ... yield :)
But you don't really need a for-comprehension (or any "loop" for that matter) here.
Something like this:
linksFile.map(
_.split(":? ").map(_.toInt)
).filter(_.exists(root.toSet))
.map(_.mkString(" "))
should do it (note that the resulting lines no longer contain the ':' separator).

Scala for loop multiple counters

I'm new to Scala, and I'm trying to convert this for loop from Java:
for(int x=1, y=2; x<=5; x++, y+=2)
System.out.println(x+y);
I'm trying to zip the values in Scala since I can't find a way to have multiple counters which are non-nested:
val a = Seq(1 to 5)
val b = Seq(2 to 10 by 2)
for((x,y) <- a.zip(b))
println(x+y)
But the above code is giving this error:
type mismatch; found: scala.collection.immutable.Range required: String
Does anyone know how to fix this? I would prefer to do with for loop only, not while loop.
Try this, no need to wrap the Range in a Seq:
val a = 1 to 5
val b = 2 to 10 by 2
for(
(x,y) <- a.zip(b)
)
println(x+y)
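To see why the original version fails: Seq(1 to 5) is a Seq[Range] holding a single Range, not a sequence of five Ints, so zipping pairs one Range with another and x+y is no longer integer addition. That is most likely where the "required: String" message comes from. A minimal sketch (ZipCounters is just an illustrative name):

```scala
object ZipCounters {
  // Wrapping a Range in Seq(...) gives a one-element Seq[Range].
  val wrapped: Seq[Range] = Seq(1 to 5)

  // Zipping the Ranges directly pairs the counters element by element.
  val sums: Seq[Int] =
    for ((x, y) <- (1 to 5).zip(2 to 10 by 2)) yield x + y
}
```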
You might try . . .
((1 to 5) zip (2 to 10 by 2)).foreach(x => println(x._1+x._2))
Because Scala for comprehensions are sufficiently different from for() loops in other languages, it's often a good idea for beginners to avoid them until they've gained a sufficient knowledge of map, flatMap, and foreach.
In your example you want x to range from 1 to 5 and y is always 2*x. Using for loops is easy for those coming from Java:
for(x <- 1 to 5; y = x*2) {
println(s"x = $x, y = $y, x+y = ${x+y}")
}
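For readers who want to connect this to map and foreach: a for with a value definition such as y = x*2 behaves roughly like a map over the range. The sketch below shows both forms producing the same sums (Desugar is an illustrative name, and the second form is a hand-written equivalent, not the exact compiler output).

```scala
object Desugar {
  // for-comprehension with a value definition in the generator list
  val viaFor: Seq[Int] = for (x <- 1 to 5; y = x * 2) yield x + y

  // hand-written equivalent using map
  val viaMap: Seq[Int] = (1 to 5).map { x =>
    val y = x * 2
    x + y
  }
}
```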
Here is a solution to a more generic problem: iterating over the elements of a collection using multiple counters (indices or pointers), for example if you want to compare every pair of elements:
val c = List("a", "b", "c", "d") //or any collection
val end = c.length - 1
for(i <- 0 to end-1; j <- i+1 to end)
//compare or operate with each pair
println(c(i)+c(j))
... prints:
ab
ac
ad
bc
bd
cd
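If the indices themselves are not needed, the standard combinations method gives the same pairs without manual counters. A small sketch, with the caveat that combinations treats equal elements as interchangeable, so it only matches the index-based loop when the elements are distinct (Pairs is an illustrative name):

```scala
object Pairs {
  val c = List("a", "b", "c", "d")

  // Every 2-element combination, joined into a string such as "ab".
  val pairs: List[String] = c.combinations(2).map(_.mkString).toList
}
```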

Unexpected behavior inside the foreachPartition method of a RDD

I evaluated through the spark-shell the following lines of scala codes:
val a = sc.parallelize(Array(1,2,3,4,5,6,7,8,9,10))
val b = a.coalesce(1)
b.foreachPartition { p =>
p.map(_ + 1).foreach(println)
p.map(_ * 2).foreach(println)
}
The output is the following:
2
3
4
5
6
7
8
9
10
11
Why does the partition p become empty after the first map?
This does not look strange to me: p is an Iterator, so once you walk through it with map and foreach it has no more values. The same applies to length, which is a shortcut for size and is implemented like this:
def size: Int = {
var result = 0
for (x <- self) result += 1
result
}
you get 0.
The answer is in the Scala docs: http://www.scala-lang.org/api/2.11.8/#scala.collection.Iterator. They explicitly state that an iterator (and p is an iterator) must be discarded after the map method has been called on it.
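The effect can be reproduced without Spark. The sketch below first reuses an exhausted iterator, which the docs say you should not do and which in practice yields nothing, and then shows one possible fix: materialising the elements once. This assumes the partition fits in memory; IteratorOnce is an illustrative name.

```scala
object IteratorOnce {
  // Reusing the iterator: the second traversal finds it already exhausted.
  def reused(): (List[Int], List[Int]) = {
    val p = Iterator(1, 2, 3)
    val first = p.map(_ + 1).toList
    val second = p.map(_ * 2).toList
    (first, second)
  }

  // Materialising once lets both transformations see every element.
  def materialised(): (List[Int], List[Int]) = {
    val items = Iterator(1, 2, 3).toList
    (items.map(_ + 1), items.map(_ * 2))
  }
}
```

Inside foreachPartition the same idea would be val items = p.toList followed by the two map calls on items.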

transforming from native matrix format, scalding

This question is related to the question "Transforming matrix format, scalding".
But now I want to do the reverse operation. I can do it this way:
Tsv(in, ('row, 'col, 'v))
.read
.groupBy('row) { _.sortBy('col).mkString('v, "\t") }
.mapTo(('row, 'v) -> ('c)) { res : (Long, String) =>
val (row, v) = res
v }
.write(Tsv(out))
But here we have a problem with zeros. As we know, Scalding skips fields with zero values. For example, take the matrix:
1 0 8
4 5 6
0 8 9
In Scalding format it is:
1 1 1
1 3 8
2 1 4
2 2 5
2 3 6
3 2 8
3 3 9
Using the function I wrote above, we can only get:
1 8
4 5 6
8 9
And that's incorrect. So, how can I deal with it? I see two possible options:
Find a way to add the zeros (actually, I don't know how to insert data)
Write my own operations on my own matrix format (not preferable, because I'm interested in Scalding's matrix operations and don't want to write all of them myself)
Maybe there are some methods that let me avoid skipping the zeros in the matrix?
Scalding stores a sparse representation of the data. If you want to output a dense matrix (first of all, that won't scale, because the rows will be bigger than can fit in memory at some point), you will need to enumerate all the rows and columns:
// First, I highly suggest you use the TypedPipe api, as it is easier to get
// big jobs right generally
val mat = // has your matrix in 'row1, 'col1, 'val1
def zero: V = // the zero of your value type
val rows = IterableSource(0 to 1000, 'row)
val cols = IterableSource(0 to 2000, 'col)
rows.crossWithTiny(cols)
.leftJoinWithSmaller(('row, 'col) -> ('row1, 'col1), mat)
.map('val1 -> 'val1) { v: V =>
if(v == null) // this value should be 0 in your type:
zero
else
v
}
.groupBy('row) {
_.toList[(Int, V)](('col, 'val1) -> 'cols)
}
.map('cols -> 'cols) { cols: List[(Int, V)] =>
cols.sortBy(_._1).map(_._2).mkString("\t")
}
.write(TypedTsv[(Int, String)]("output"))
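To make the "enumerate all the rows and columns" step concrete, here is the same idea on a small in-memory collection. This is plain Scala, not the Scalding API, and it assumes 1-based indices and Int values as in the question's example. Densify and toDenseRows are illustrative names.

```scala
object Densify {
  // entries are (row, col, value) triples of a sparse matrix.
  // Every (row, col) cell is enumerated; missing cells become 0.
  def toDenseRows(entries: Seq[(Int, Int, Int)], nRows: Int, nCols: Int): Seq[String] = {
    val lookup = entries.map { case (r, c, v) => (r, c) -> v }.toMap
    (1 to nRows).map { r =>
      (1 to nCols).map(c => lookup.getOrElse((r, c), 0)).mkString("\t")
    }
  }
}
```

The getOrElse plays the role of the null check after the left join in the Scalding version above.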

Scala split argument over several lines and parse to Int

This is probably going to end up being very simple, but I ask more to help me learn better Scala idioms (Python guy by trade looking to learn some scala tricks.)
I'm doing some Hacker Rank problems, and the required method of input is reading lines from stdin. The spec is quoted below:
The first line contains the number of test cases T. T test cases
follow. Each case contains two integers N and M.
So the input passed to the script looks something like this:
4
2 2
3 2
2 3
4 4
I'm wondering what would be the proper, idiomatic way to do this. I've thought of a few:
Use io.Source.stdin.getLines.zipWithIndex, then from within a foreach, if the index is greater than 0, split on whitespace and map to (_.toInt)
Use the same getLines call to get the input and then pattern match against the index.
Split on whitespace and newlines to make a single list of digits, map toInt, pop the first element (problem size) and then modulo 2 to make tuples of arguments for my problem function.
I'm wondering what more experienced scala programmers would consider the best way to parse these args, where the 2 element lines would be args to a function and the first, single digit line is just the number of problems to solve.
Maybe you're looking for something like this?
def f(x: Int, y: Int) = { f"do something with $x and $y" }
io.Source.stdin.getLines
.map(_.trim.split("\\s+").map(_.toInt)) // split and convert to ints
.collect { case Array(a, b) => f(a, b) } // pass to f if there are two arguments
.foreach(println) // print the result of each function call
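If you would rather honour the count on the first line than rely on the shape of each line, one possible sketch is to read T first and then take exactly T lines. ParseCases and parse are illustrative names; lines that do not hold exactly two integers are skipped here.

```scala
object ParseCases {
  // First line is the number of test cases T; the next T lines hold "N M".
  def parse(lines: Iterator[String]): List[(Int, Int)] = {
    val t = lines.next().trim.toInt
    lines
      .take(t)
      .map(_.trim.split("\\s+").map(_.toInt))
      .collect { case Array(n, m) => (n, m) }
      .toList
  }
}
```

With real input this would be called as ParseCases.parse(io.Source.stdin.getLines()).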
Another way to read the input for Hacker Rank problems is with scala.io.StdIn:
import scala.io.StdIn
import scala.collection.mutable.ArrayBuffer
object Solution {
def main(args: Array[String]) = {
val q = StdIn.readInt
val lines = ArrayBuffer[Array[Int]]()
(1 to q).foreach(_ => lines += StdIn.readLine.split(" ").map(_.toInt))
for (a <- lines){
val n = a(0)
val m = a(1)
val ans = n * m
println(ans)
}
}
}
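The mutable ArrayBuffer is not strictly needed. As a sketch, List.fill can collect the q lines directly. The reader is passed in as a function here (an assumption made so the logic can be exercised without real stdin), and the n * m computation is simply the one from the example above. ImmutableSolution is an illustrative name.

```scala
object ImmutableSolution {
  // readLine is any line source, e.g. () => scala.io.StdIn.readLine()
  def solve(readLine: () => String): List[Int] = {
    val q = readLine().trim.toInt
    // List.fill evaluates its argument q times, reading one line each time.
    List
      .fill(q)(readLine().trim.split("\\s+").map(_.toInt))
      .map(a => a(0) * a(1))
  }
}
```

On Hacker Rank this could be used as ImmutableSolution.solve(() => scala.io.StdIn.readLine()).foreach(println).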
I have tested it on Hacker Rank platform today and the output is:
4
6
6
16