I have two arrays in Scala, both with the same number of elements:
val v = myGraph.vertices.collect.map(_._1)
which gives:
Array[org.apache.spark.graphx.VertexId] = Array(-7023794695707475841, -44591218498176864, 757355101589630892, 21619280952332745)
and another
val w = myGraph.vertices.collect.map(_._2._2)
which gives:
Array[String] = Array(2, 3, 1, 2)
and I want to create a string using
val z = v.map("{id:" + _ + "," + "group:" + "1" + "}").mkString(",")
which gives:
String = {id:-7023794695707475841,group:1},{id:-44591218498176864,group:1},{id:757355101589630892,group:1},{id:21619280952332745,group:1}
But now, instead of the hardcoded group of "1", I want to map in the numbers from the w array to give:
String = {id:-7023794695707475841,group:2},{id:-44591218498176864,group:3},{id:757355101589630892,group:1},{id:21619280952332745,group:2}
How do I do this?
There's a method in Scala collections called zip, which pairs up two collections just the way you need:
val v = Array(-37581, -44864, 757102, 21625)
val w = Array(2, 3, 1, 2)
val z = v.zip(w).map {
  case (id, group) => "{id:" + id + "," + "group:" + group + "}"
}.mkString(",")
Value z becomes:
{id:-37581,group:2},{id:-44864,group:3},{id:757102,group:1},{id:21625,group:2}
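Note that zip pairs elements positionally and truncates to the shorter of the two collections, so this assumes v and w have the same length (which they do here, since both come from the same vertices collection):

Array(1, 2, 3).zip(Array("a", "b"))
// Array((1,a), (2,b)) -- the unmatched 3 is dropped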
I can't understand reduceByKey(_ + _) in the first example of Spark with Scala
object WordCount {
def main(args: Array[String]): Unit = {
val inputPath = args(0)
val outputPath = args(1)
val sc = new SparkContext()
val lines = sc.textFile(inputPath)
val wordCounts = lines.flatMap {line => line.split(" ")}
.map(word => (word, 1))
.reduceByKey(_ + _) **I can't understand this line**
wordCounts.saveAsTextFile(outputPath)
}
}
Reduce takes two elements and produces a third by applying a function to the two parameters.
The code you have shown is equivalent to the following:
reduceByKey((x,y)=> x + y)
Instead of defining dummy variables and writing out a lambda, Scala is smart enough to figure out that what you are trying to achieve is to apply a function (a sum in this case) to any two parameters it receives, hence the syntax
reduceByKey(_ + _)
reduceByKey takes a function of two parameters, applies it pairwise to all the values that share a key, and returns the combined value per key.
reduceByKey(_ + _) is equivalent to reduceByKey((x,y)=> x + y)
Example:
val numbers = Array(1, 2, 3, 4, 5)
val sum = numbers.reduceLeft[Int](_+_)
println("The sum of the numbers one through five is " + sum)
Results:
The sum of the numbers one through five is 15
numbers: Array[Int] = Array(1, 2, 3, 4, 5)
sum: Int = 15
Similarly, reduceByKey(_ ++ _) is equivalent to reduceByKey((x, y) => x ++ y).
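If it helps, here is a rough local analogy of what reduceByKey(_ + _) does, sketched with plain Scala collections. This is only a mental model; Spark actually combines partial results per partition before shuffling:

val pairs = Seq(("the", 1), ("cat", 1), ("the", 1))
val counts = pairs
  .groupBy(_._1)  // Map(the -> Seq((the,1), (the,1)), cat -> Seq((cat,1)))
  .map { case (word, ps) => (word, ps.map(_._2).reduce(_ + _)) }
// counts: Map(the -> 2, cat -> 1)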
I am trying to parallelize the recursive calls of the sudoku solver from 25 lines Sudoku solver in Scala. I've changed their fold into a reduce:
def reduce(f: (Int, Int) => Int, accu: Int, l: Int, u: Int): Int = {
accu + (l until u).toArray.reduce(f(accu, _) + f(accu, _))
}
which if run sequentially works fine, but when I change it into
accu + (l until u).toArray.par.reduce(f(accu, _) + f(accu, _))
the recursion reaches the bottom much more often and generates false solutions. I thought that it would execute the bottom-level recursion and work its way up, but it doesn't seem to do so.
I've also tried futures
def parForFut2(f: (Int, Int) => Int, accu: Int, l: Int, u: Int): Int = {
var sum: Int = accu
val vals = l until u
vals.foreach(t => scala.actors.Futures.future(sum + f(accu, t)))
sum
}
which appears to have the same problem as the par.reduce. I would appreciate any comment. The whole code is here:
object SudokuSolver extends App {
// The board is represented by an array of string
val source = scala.io.Source.fromFile("./puzzle")
val lines = (source.getLines).toArray
var m: Array[Array[Char]] = for (
str <- lines;
line: Array[Char] = str.toArray
) yield line
source.close()
// For printing m
def print = {
Console.println("");
refArrayOps(m) map (carr => Console.println(new String(carr)))
}
// The test for validity of n on position x,y
def invalid(i: Int, x: Int, y: Int, n: Char): Boolean =
i < 9 && (m(y)(i) == n || m(i)(x) == n ||
m(y / 3 * 3 + i / 3)(x / 3 * 3 + i % 3) == n || invalid(i + 1, x, y, n))
// Looping over a half-closed range of consecutive integers [l..u)
// is factored out into a higher-order function
def parReduce(f: (Int, Int) => Int, accu: Int, l: Int, u: Int): Int = {
accu + (l until u).toArray.par.reduce(f(accu, _) + f(accu, _))
}
// The search function examines each position on the board in turn,
// trying the numbers 1..9 in each unfilled position
// The function is itself a higher-order fold, accumulating the value
// accu by applying the given function f to it whenever a solution m
// is found
def search(x: Int, y: Int, f: (Int) => Int, accu: Int): Int = Pair(x, y) match {
case Pair(9, y) => search(0, y + 1, f, accu) // next row
case Pair(0, 9) => f(accu) // found a solution - print it and continue
case Pair(x, y) => if (m(y)(x) != '0') search(x + 1, y, f, accu) else
parReduce((accu: Int, n: Int) =>
if (invalid(0, x, y, (n + 48).asInstanceOf[Char])) accu else {
m(y)(x) = (n + 48).asInstanceOf[Char];
val newaccu = search(x + 1, y, f, accu);
m(y)(x) = '0';
newaccu
}, accu, 1, 10)
}
// The main part of the program uses the search function to accumulate
// the total number of solutions
Console.println("\n" + search(0, 0, i => { print; i + 1 }, 0) + " solution(s)")
}
As far as I can tell, recursion is inherently sequential (i.e. not parallelizable).
I'd reason it like this: recursion means (put simply) 'call myself'. A function call always happens within one and exactly one thread.
If you are telling the function to call itself, then you are staying in that thread - you're not dividing the work (i.e. making it parallel).
Recursion and parallelism are related though, but not at the function-call level. They are related in the sense that tasks can be recursively decomposed into smaller parts, that can be performed in parallel.
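For example, a sum over a range can be written recursively and still run in parallel, because each half of the split is an independent task. A minimal sketch using scala.concurrent.Future (available since Scala 2.10; the threshold of 1000 is arbitrary):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

def parSum(l: Int, u: Int): Future[Long] =
  if (u - l <= 1000) Future((l until u).map(_.toLong).sum) // small enough: sequential
  else {
    val mid = l + (u - l) / 2
    val left = parSum(l, mid)   // the two halves are independent,
    val right = parSum(mid, u)  // so they may run on different threads
    for (a <- left; b <- right) yield a + b
  }

Await.result(parSum(0, 1000000), 10.seconds) // 499999500000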
After Andreas' comment I changed the m: Array[Array[Char]] into m: List[List[Char]], which prevents any unnecessary and unwanted changes to it. The final looping method is
def reduc(f: (Int, Int) => Int,
accu: Int, l: Int, u: Int, m1: List[List[Char]]):Int =
accu + (l until u).toArray.par.reduce(f(accu, _) + f(accu, _))
and I had to pass m as an argument to each function that uses it, so that every one of them has its own instance of it. The whole code:
object SudokuSolver extends App{
// The board is represented by a list of rows (Lists of Chars).
val source = scala.io.Source.fromFile("./puzzle")
val lines = source.getLines.toList
val m: List[List[Char]] = for (
str <- lines;
line: List[Char] = str.toList
) yield line
source.close()
// For printing m
def printSud(m: List[List[Char]]) = {
Console.println("")
m map (println)
}
Console.println("\nINPUT:")
printSud(m)
def invalid(i:Int, x:Int, y:Int, n:Char,m1: List[List[Char]]): Boolean =
i < 9 && (m1(y)(i) == n || m1(i)(x) == n ||
m1(y / 3 * 3 + i / 3)(x / 3 * 3 + i % 3) == n ||
invalid(i + 1, x, y, n, m1))
def reduc(f: (Int, Int) => Int, accu: Int, l: Int, u: Int,
m1: List[List[Char]]): Int =
accu + (l until u).toArray.par.reduce(f(accu, _) + f(accu, _))
def search(x: Int, y: Int, accu: Int, m1: List[List[Char]]): Int =
Pair(x, y) match {
case Pair(9, y) => search(0, y + 1, accu, m1) // next row
case Pair(0, 9) => { printSud(m1); accu + 1 } // found a solution
case Pair(x, y) =>
if (m1(y)(x) != '0')
search(x + 1, y, accu, m1) // place is filled, we skip it.
else // box is not filled, we try all n in {1,...,9}
reduc((accu: Int, n: Int) => {
if (invalid(0, x, y, (n + 48).asInstanceOf[Char], m1))
accu
else { // n fits here
val line = List(m1(y).patch(x, Seq((n + 48).asInstanceOf[Char]), 1))
val m2 = m1.patch(y, line, 1)
val newaccu = search(x + 1, y, accu, m2);
// no undo needed: m1 is immutable, so this branch worked on its own copy m2
newaccu
}
}, accu, 1, 10, m1)
}
Console.println("\n" + search(0, 0, 0, m) + " solution(s)")
}
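The key piece that makes the immutable version safe is List.patch, which builds an updated copy of the list instead of mutating it in place, so parallel branches of the search never see each other's writes. A minimal illustration:

val row = List('1', '0', '3')
val updated = row.patch(1, Seq('2'), 1) // replace 1 element at index 1
// updated: List('1', '2', '3'); row is still List('1', '0', '3')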
I have an infinite Stream and naturally need to "pull" from it only up to some element.
This is the first step.
But only part of the "pulled" elements will be used in the second step, e.g. only the even
elements.
Is it possible to avoid processing the odd elements by means of laziness?
The best way to explain what I am asking is to show the code:
Welcome to Scala version 2.9.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_26).
Type in expressions to have them evaluated.
Type :help for more information.
scala> var n=0; def numbers:Stream[Int] = {n += 1; println("n= " + n); n #:: numbers}
n: Int = 0
numbers: Stream[Int]
scala> numbers.map{z => println("z^2= " + z*z) ; z*z}.take(10)(2)
n= 1
z^2= 1
n= 2
z^2= 4
n= 3
z^2= 9
res0: Int = 9
scala> var n=0; def numbers:Stream[Int] = {n += 1; println("n= " + n); n #:: numbers}
n: Int = 0
numbers: Stream[Int]
scala> numbers.map{lazy z => println("z^2= " + z*z) ; z}.take(10)(2)
<console>:1: error: lazy not allowed here. Only vals can be lazy
numbers.map{lazy z => println("z^2= " + z*z) ; z*z}.take(10)(2)
^
scala>
Since the result of take(10)(2) is res0: Int = 9, only the z^2= 9 calculation is really needed.
If you want to defer the calculation of z^2 (the map operation), a view should suffice:
object Worksheet {
var n = 0 //> n : Int = 0
def numbers: Stream[Int] = { n += 1; println("n= " + n); n #:: numbers }
//> numbers: => Stream[Int]
numbers.view.map { z => println("z^2= " + z * z); z }.take(10)(2)
//> n= 1
//| n= 2
//| n= 3
//| z^2= 9
//| res0: Int = 3
}
If you also want to defer the generation of the numbers in the stream until they are needed, you can do so manually:
object sheet {
var n = 0 //> n : Int = 0
def numbers: Stream[() => Int] = (() => { n += 1; println("n= " + n); n }) #:: numbers
//> numbers: => Stream[() => Int]
numbers.view.map { z => val x = z(); println("z^2= " + x * x); x }.take(10)(2)
//> n= 1
//| z^2= 1
//| res0: Int = 1
}
The issue here seems to be that the streams in the Scala library are not lazy in their head. I wouldn't be surprised if the Scalaz library already had a stream that is lazy in its head as well.
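The strict head is easy to see: with Stream, hd #:: tl evaluates hd at construction time and only defers the tail:

def loud(i: Int) = { println("evaluating " + i); i }
val s = loud(1) #:: loud(2) #:: Stream.empty[Int]
// prints "evaluating 1" immediately
s(1)
// only now prints "evaluating 2"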
I'm trying to generate a list in scala according to the formula:
for n > 1, f(n) = 4*n^2 - 6*n + 6, and for n == 1, f(n) = 1
currently I have:
def lGen(end: Int): List[Int] = {
  for { n <- List.range(3, end + 1, 2) } yield { 4*n*n - 6*n + 6 }
}
For end = 5 this would give the list:
List(24 , 76)
Right now I'm stuck trying to find a graceful way to make this function give
List(1, 24, 76)
Any suggestions would be greatly appreciated.
-Lee
I'd separate out the "formula" from the list generation:
val f : Int => Int = {
case 1 => 1
case x if x > 1 => 4*x*x - 6*x + 6
}
def lGen(end: Int) = (1 to end by 2 map f).toList
or
def lGen(end: Int) = List.range(1, end + 1, 2) map f
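Either version should give the expected result:

scala> lGen(5)
res0: List[Int] = List(1, 24, 76)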
How about this:
scala> def lGen(end: Int): List[Int] =
1 :: List.range(3, end+1, 2).map(n => 4*n*n - 6*n + 6)
scala> lGen(5)
res0: List[Int] = List(1, 24, 76)