Why is there a syntax error in the following Scala code?

def sortAndCountInv[T](vec: Vector[T]): (Int, Vector[T]) = {
  val n = vec.length
  if (n <= 1) { // <= 1 so that an empty vector also terminates
    (0, vec)
  }
  else {
    val (left, right) = vec.splitAt(n / 2)
    val (leftInversions, sortedLeft) = sortAndCountInv(left)
    val (rightInversions, sortedRight) = sortAndCountInv(right)
    // merge the sorted halves, not the original left/right
    val (splitInversions, sortedArray) = countSplitInvAndMerge(sortedLeft, sortedRight)
    (leftInversions + rightInversions + splitInversions, sortedArray)
  }
}
This code is for counting the number of inversions in a vector.
When I try to compile it, Scala IDE for Eclipse gives me the following error: "illegal start of simple expression" for the line val (left, right) ...
Why does this happen?

It's missing a final closing brace. In general, "stale" errors will show up in an IDE when code is incorrect; when in doubt, it's best to look at just the first error from a command-line compile (Maven or similar).

If it works in the REPL, this is an IDE bug. Try IntelliJ IDEA Community Edition with the Scala plugin. I found it quite nice, although it still has some trouble understanding some complicated structures.
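The question does not show countSplitInvAndMerge, so here is a self-contained sketch of the whole algorithm, assuming the helper is meant to merge the two sorted halves and count the split inversions. Its body is a hypothetical implementation, not the asker's. Three changes from the posted code matter: the merge receives sortedLeft and sortedRight (not left and right), the base case is n <= 1 so an empty vector also terminates, and the context bound T: Ordering gives the merge a way to compare elements.

```scala
object Inversions {
  // Hypothetical implementation of the helper the question calls but does not show:
  // merges two sorted vectors and counts the "split" inversions, i.e. pairs
  // (l, r) with l taken from `left`, r taken from `right`, and l > r.
  def countSplitInvAndMerge[T](left: Vector[T], right: Vector[T])(implicit ord: Ordering[T]): (Int, Vector[T]) = {
    val merged = Vector.newBuilder[T]
    var i = 0
    var j = 0
    var inversions = 0
    while (i < left.length && j < right.length) {
      if (ord.lteq(left(i), right(j))) {
        merged += left(i)
        i += 1
      } else {
        // right(j) is smaller than every element still waiting in `left`
        inversions += left.length - i
        merged += right(j)
        j += 1
      }
    }
    merged ++= left.drop(i)  // at most one of these two is non-empty
    merged ++= right.drop(j)
    (inversions, merged.result())
  }

  // [T: Ordering] supplies the implicit Ordering that the merge needs.
  def sortAndCountInv[T: Ordering](vec: Vector[T]): (Int, Vector[T]) = {
    val n = vec.length
    if (n <= 1) (0, vec) // also stops on an empty vector
    else {
      val (left, right) = vec.splitAt(n / 2)
      val (leftInversions, sortedLeft) = sortAndCountInv(left)
      val (rightInversions, sortedRight) = sortAndCountInv(right)
      // merge the sorted halves, not the original left/right
      val (splitInversions, sorted) = countSplitInvAndMerge(sortedLeft, sortedRight)
      (leftInversions + rightInversions + splitInversions, sorted)
    }
  }
}
```

For example, Inversions.sortAndCountInv(Vector(2, 4, 1, 3, 5)) returns (3, Vector(1, 2, 3, 4, 5)); the three inversions are (2, 1), (4, 1) and (4, 3).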

Related

Generator Ordering Causing Infinite Recursion in For Comprehension in Scala

I'm seeing what seems to be a very obvious bug with scalacheck, such that if it's really there I can't see how people use it for recursive data structures.
This program fails with a StackOverflowError before scalacheck takes over, while constructing the Arbitrary value. Note that the Tree type and the generator for Trees are taken verbatim from this scalacheck tutorial.
package treegen
import org.scalacheck._
import Prop._
class TreeProperties extends Properties("Tree") {
  trait Tree
  case class Node(left: Tree, right: Tree) extends Tree
  case class Leaf(x: Int) extends Tree
  val ints = Gen.choose(-100, 100)
  def leafs: Gen[Leaf] = for {
    x <- ints
  } yield Leaf(x)
  def nodes: Gen[Node] = for {
    left <- trees
    right <- trees
  } yield Node(left, right)
  def trees: Gen[Tree] = Gen.oneOf(leafs, nodes)
  implicit lazy val arbTree: Arbitrary[Tree] = Arbitrary(trees)
  property("vacuous") = forAll { t: Tree => true }
}
object Main extends App {
  (new TreeProperties).check
}
What's stranger is that changes that shouldn't affect anything seem to alter the program so that it works. For example, if you change the definition of trees to this, it passes without any problem:
def trees: Gen[Tree] = for {
  x <- Gen.oneOf(0, 1)
  t <- if (x == 0) {leafs} else {nodes}
} yield t
Even stranger, if you alter the binary tree structure so that the value is stored on Nodes and not on Leafs, and alter the leafs and nodes definitions to be:
def leafs: Gen[Leaf] = Gen.value(Leaf())
def nodes: Gen[Node] = for {
  x <- ints // Note: be sure to ask for x first, or it'll StackOverflow later, inside scalacheck code!
  left <- trees
  right <- trees
} yield Node(left, right, x)
It also then works fine.
What's going on here? Why is constructing the Arbitrary value initially causing a stack overflow? Why does it seem that scalacheck generators are so sensitive to minor changes that shouldn't affect the control flow of the generators?
Why isn't my expression above with oneOf(0, 1) exactly equivalent to the original oneOf(leafs, nodes)?
The problem is that when Scala evaluates trees, it ends up in an endless recursion, since trees is defined in terms of itself (via nodes): the for-expression in nodes starts with trees.flatMap(...), so evaluating nodes evaluates trees first. However, when you put an expression other than trees as the first part of your for-expression in nodes, Scala delays the evaluation of the rest of the for-expression (wrapped up in chains of map and flatMap calls), and the infinite recursion does not happen.
Just as pedrofurla says, if oneOf were non-strict this would probably not happen (since Scala wouldn't evaluate the arguments immediately). However, you can use Gen.lzy to be explicit about the laziness: lzy takes any generator and delays its evaluation until it is really used. So the following change solves your problem:
def trees: Gen[Tree] = Gen.lzy(Gen.oneOf(leafs, nodes))
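To see why strictness matters without pulling in ScalaCheck, here is a deliberately tiny stand-in for Gen. ToyGen, StrictTrees and LazyTrees are hypothetical names, and this models only the evaluation order, not ScalaCheck's actual implementation: oneOf takes its generators by value, so merely calling trees forces nodes, whose for-expression desugars to trees.flatMap(...) and forces trees again; lzy takes its argument by name, so building trees returns immediately.

```scala
import scala.util.Random

// Hypothetical toy model, NOT ScalaCheck's Gen: a generator is just a
// function from a random source to a value.
final class ToyGen[+A](val run: Random => A) {
  def map[B](f: A => B): ToyGen[B] = new ToyGen[B](r => f(run(r)))
  def flatMap[B](f: A => ToyGen[B]): ToyGen[B] = new ToyGen[B](r => f(run(r)).run(r))
}

object ToyGen {
  // Strict parameters, like Gen.oneOf: both generators are built
  // before oneOf's body runs.
  def oneOf[A](g1: ToyGen[A], g2: ToyGen[A]): ToyGen[A] =
    new ToyGen[A](r => if (r.nextBoolean()) g1.run(r) else g2.run(r))

  // By-name parameter, like Gen.lzy: `g` is only evaluated when a value is drawn.
  def lzy[A](g: => ToyGen[A]): ToyGen[A] = new ToyGen[A](r => g.run(r))
}

sealed trait Tree
final case class Node(left: Tree, right: Tree) extends Tree
final case class Leaf(x: Int) extends Tree

object StrictTrees {
  val ints: ToyGen[Int] = new ToyGen[Int](r => r.nextInt(201) - 100)
  def leafs: ToyGen[Leaf] = ints.map(Leaf(_))
  // Desugars to trees.flatMap(left => trees.map(right => Node(left, right))):
  // the receiver `trees` is evaluated immediately.
  def nodes: ToyGen[Node] = for { left <- trees; right <- trees } yield Node(left, right)
  // trees calls nodes, which calls trees, ... until the stack overflows.
  def trees: ToyGen[Tree] = ToyGen.oneOf(leafs, nodes)
}

object LazyTrees {
  val ints: ToyGen[Int] = new ToyGen[Int](r => r.nextInt(201) - 100)
  def leafs: ToyGen[Leaf] = ints.map(Leaf(_))
  def nodes: ToyGen[Node] = for { left <- trees; right <- trees } yield Node(left, right)
  // lzy returns at once without touching nodes, which breaks the cycle.
  def trees: ToyGen[Tree] = ToyGen.lzy(ToyGen.oneOf(leafs, nodes))
}
```

Building LazyTrees.trees returns at once, while calling StrictTrees.trees throws StackOverflowError before any value is drawn, which matches the failure described in the question.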
Even though following Rickard Nilsson's answer above got rid of the constant StackOverflowError on program startup, I'd still hit a StackOverflowError about one time out of three once I actually asked scalacheck to check the properties. (I changed Main above to run .check 40 times, and would see it succeed twice, then fail with a stack overflow, then succeed twice, etc.)
Eventually I had to put a hard limit on the depth of the recursion, and this is what I guess I'll be doing when using scalacheck on recursive data structures in the future:
def leafs: Gen[Leaf] = for {
  x <- ints
} yield Leaf(x)
def genNode(level: Int): Gen[Node] = for {
  left <- genTree(level)
  right <- genTree(level)
} yield Node(left, right)
def genTree(level: Int): Gen[Tree] =
  if (level >= 100) {leafs}
  else {leafs | genNode(level + 1)}
lazy val trees: Gen[Tree] = genTree(0)
With this change, scalacheck never runs into a StackOverflowError.
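A likely reason the Gen.lzy version still overflowed some of the time: if each draw picks a leaf or a node with equal probability (as Gen.oneOf does with two generators), each node's two subtrees are themselves nodes half the time, so on average exactly one child recurses further. Such trees are finite, but their depth has a long tail, and an occasional draw is deep enough to exhaust the stack. The sketch below shows the capped version with plain scala.util.Random instead of ScalaCheck so it runs on its own; DepthLimited is a hypothetical name, and the 50/50 choice below the cap is an assumption rather than a statement about how leafs | genNode chooses.

```scala
import scala.util.Random

sealed trait Tree
final case class Node(left: Tree, right: Tree) extends Tree
final case class Leaf(x: Int) extends Tree

object DepthLimited {
  // Same idea as genTree(level) above: below the cap, choose a leaf or a
  // node with equal probability; once `level` reaches the cap, only a leaf.
  def genTree(level: Int, maxLevel: Int, r: Random): Tree =
    if (level >= maxLevel || r.nextBoolean()) Leaf(r.nextInt(201) - 100)
    else Node(genTree(level + 1, maxLevel, r), genTree(level + 1, maxLevel, r))

  // Number of Node levels on the longest path (a single Leaf has depth 0).
  def depth(t: Tree): Int = t match {
    case Leaf(_)    => 0
    case Node(l, r) => 1 + math.max(depth(l), depth(r))
  }
}
```

Whatever the random choices, the recursion depth can never exceed maxLevel, so the stack usage is bounded up front.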
A slight generalization of the approach in Daniel Martin's own answer is to use sized. Something like (untested):
// explicit result types are required: genTree0 and genNode are mutually recursive
def genTree(): Gen[Tree] = Gen.sized { size => genTree0(size) }
def genTree0(maxDepth: Int): Gen[Tree] =
  if (maxDepth == 0) leafs else Gen.oneOf(leafs, genNode(maxDepth))
def genNode(maxDepth: Int): Gen[Node] = for {
  depthL <- Gen.choose(0, maxDepth - 1)
  depthR <- Gen.choose(0, maxDepth - 1)
  left <- genTree0(depthL)
  right <- genTree0(depthR)
} yield Node(left, right)
def leafs: Gen[Leaf] = for {
  x <- ints
} yield Leaf(x)
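The depth guarantee of the sized approach can be checked without ScalaCheck. In this plain-Random sketch (SizedTrees and its methods are hypothetical names), the size passed to sized plays the role of the value Gen.sized would supply, and each subtree receives a random budget strictly below its parent's, mirroring Gen.choose(0, maxDepth - 1), so the depth can never exceed the size.

```scala
import scala.util.Random

sealed trait Tree
final case class Node(left: Tree, right: Tree) extends Tree
final case class Leaf(x: Int) extends Tree

object SizedTrees {
  private def leaf(r: Random): Tree = Leaf(r.nextInt(201) - 100)

  // Mirrors genTree0: with budget 0 only a leaf is possible; otherwise
  // a leaf or a node, chosen with equal probability.
  def genTree0(maxDepth: Int, r: Random): Tree =
    if (maxDepth == 0 || r.nextBoolean()) leaf(r) else genNode(maxDepth, r)

  // Mirrors genNode: each subtree gets its own budget in 0 .. maxDepth - 1
  // (nextInt(n) is exclusive of n, Gen.choose is inclusive of both ends).
  def genNode(maxDepth: Int, r: Random): Tree = {
    val depthL = r.nextInt(maxDepth)
    val depthR = r.nextInt(maxDepth)
    Node(genTree0(depthL, r), genTree0(depthR, r))
  }

  // Stand-in for Gen.sized: the caller supplies the size, used as the depth budget.
  def sized(size: Int, r: Random): Tree = genTree0(size, r)

  def depth(t: Tree): Int = t match {
    case Leaf(_)    => 0
    case Node(l, r) => 1 + math.max(depth(l), depth(r))
  }
}
```

Unlike a fixed cap, this bound follows ScalaCheck's size parameter, which ScalaCheck varies over a run (between minSize and maxSize, 0 and 100 by default), so the trees get deeper as the size grows.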

Error while finding lines starting with H or I using Scala

I am trying to learn Spark and Scala. I am working on a scenario to identify the lines that start with H or I. Below is my code:
def startWithHorI(s:String):String=
{
if(s.startsWith("I")
return s
if(s.startsWith("H")
return s
}
val fileRDD=sc.textFile("wordcountsample.txt")
val checkRDD=fileRDD.map(startWithHorI)
checkRDD.collect
It throws an error while defining the function: Found: Unit, Required: Boolean.
From my research I understood that it does not recognize the return, since Unit means void. Could someone help me?
There are a few things wrong with your def; we will start there.
It throws the error because, according to the code posted, your syntax is incomplete and the def is defined improperly:
def startWithHorI(s: String): String =
{
  if (s.startsWith("I")) // closing paren was missing in the original post
    s // no return statement needed
  if (s.startsWith("H")) // closing paren was missing in the original post
    s // no return statement needed
}
This will still not compile, because the method promises a String in every case. An if without an else implicitly yields () (of type Unit) when the condition is false, so what should be returned when s does not start with H or I? (Without an expected type, such an if is typed as Any.) The fix is an else branch that ultimately returns a String. Note also that only the last expression of a block is its result, so the two checks must be chained with else if rather than written as two separate if expressions:
def startWithHorI(s: String): String = {
  if (s.startsWith("I")) s
  else if (s.startsWith("H")) s
  else "no H or I"
}
If you don't want to return anything for lines that don't match, then Option is worth considering as the return type.
Finally, we can achieve what you are doing via filter; there is no need to map with a def:
val fileRDD = sc.textFile("wordcountsample.txt")
val checkRDD = fileRDD.filter(s => s.startsWith("H") || s.startsWith("I"))
checkRDD.collect
When passing any function to rdd.map(fn), make sure that fn covers all possible cases.
If you want to drop strings that do not start with either H or I entirely, use flatMap and return Option[String] from your function.
Example:
def startWithHorI(s:String): Option[String]=
{
if(s.startsWith("I") || s.startsWith("H")) Some(s)
else None
}
Then,
sc.textFile("wordcountsample.txt").flatMap(startWithHorI)
This will remove all rows not starting with H or I.
In general, to minimize run-time errors, try to write total functions that handle all possible values of their arguments.
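Since RDD.filter and RDD.flatMap mirror the methods of the same name on Scala's own collections, the Option-returning approach can be tried without Spark. In this sketch a List stands in for the RDD, the sample lines are made up, and StartsWithHOrI is a hypothetical name.

```scala
object StartsWithHOrI {
  // Total function: returns a value (Some or None) for every String, including "".
  def startWithHorI(s: String): Option[String] =
    if (s.startsWith("I") || s.startsWith("H")) Some(s) else None

  // Made-up sample lines; the List plays the role of the RDD.
  val lines: List[String] = List("Hello ", "Hello World", "Instagram", "Good Morning", "")

  // flatMap keeps the contents of each Some and drops every None ...
  val viaFlatMap: List[String] = lines.flatMap(startWithHorI)

  // ... which gives the same result as filtering with a Boolean predicate.
  val viaFilter: List[String] = lines.filter(s => s.startsWith("H") || s.startsWith("I"))
}
```

Both yield List("Hello ", "Hello World", "Instagram"); the empty line is handled without an exception because startsWith is safe on "".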
Would something like the following work for you?
val fileRDD=sc.textFile("wordcountsample.txt")
fileRDD.collect
Array[String] = Array("Hello ", Hello World, Instragram, Good Morning)
val filterRDD = fileRDD.filter(x => x.nonEmpty && (x(0) == 'H' || x(0) == 'I')) // nonEmpty guards blank lines, where x(0) would throw
filterRDD.collect()
Array[String] = Array("Hello ", Hello World, Instragram)

Explanation of what is happening with REPL

I've been learning/experimenting with Scala in the REPL.
By the way, and as a side note, I'm quite impressed so far. I'm finding the language beautiful.
The following happened, and I need an explanation of what is going on.
Thanks in advance for any help given.
At the REPL entered:
def withMarks(mark: String)(body: => Unit){
  println(mark + " Init")
  body
  println(mark + " End")
}
val a = "Testing closure with parameter by name as control structure"
withMarks("***"){
  println(a)
  println("more expressions")
}
Everything worked as expected.
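withMarks works as a control structure because body is a by-name parameter: the block is not evaluated at the call site but each time body is referenced inside the method. The sketch below restates it with an explicit ": Unit =" result type (procedure syntax is deprecated in Scala 2.13 and dropped in Scala 3) and adds a hypothetical repeat helper to show that a by-name argument is re-evaluated on every reference.

```scala
object ByName {
  // `body: => Unit` is a by-name parameter: the caller's block is not run
  // before the call, but each time `body` is referenced below.
  def withMarks(mark: String)(body: => Unit): Unit = {
    println(mark + " Init")
    body                      // the caller's block runs here
    println(mark + " End")
  }

  // Hypothetical helper: references `body` `times` times, so the block runs `times` times.
  def repeat(times: Int)(body: => Unit): Unit =
    for (_ <- 1 to times) body
}
```

With a by-value parameter (body: Unit) instead, the block would run exactly once, before "*** Init" is printed.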
Then something happened that I consider weird, probably out of ignorance on my part. I entered some more stuff:
class FileAsIterable{
  def iterator = scala.io.Source.fromFile("/Users/MacBookProRetina/Google Drive/NewControl.scala").getLines()
}
val newIterator = new FileAsIterable with Iterable[String]
When evaluating the last line the REPL prints:
newIterator: FileAsIterable with Iterable[String] = (def withMarks(mark: String)(body: => Unit){, println(mark + " Init"), body, println(mark + " End"), }, val a = "Hola Mundo", withMarks("***"){, println(a), })
I keep getting the same result even after restarting the terminal in the Mac, and running the scala REPL at different directory locations.
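I don't know how the newIterator val got connected to the withMarks def.
Never mind. I was just confused by the contents of the file: NewControl.scala itself contains that code, and the REPL prints an Iterable by listing its elements, which here are the lines of that file.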
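The REPL displays a value using its toString, and for any Iterable that lists the elements, i.e. whatever iterator returns. A minimal stand-in that reads from a String instead of a file shows the same effect; LinesAsIterable and IterableDemo are hypothetical names.

```scala
// Hypothetical stand-in for FileAsIterable that reads lines from a String
// instead of a file, so no file on disk is needed.
class LinesAsIterable(text: String) {
  def iterator: Iterator[String] = text.split("\n").iterator
}

object IterableDemo {
  // Same trick as `new FileAsIterable with Iterable[String]`: the class
  // supplies the concrete `iterator` that Iterable leaves abstract.
  val lines = new LinesAsIterable("def withMarks(...)\nval a = \"Hola Mundo\"") with Iterable[String]
}
```

Because iterator is a def, every traversal (including the one toString performs) creates a fresh iterator; for FileAsIterable that means the file is read again each time.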

Scala Parallel Collections- How to return early?

I have a list of possible input values
val inputValues = List(1,2,3,4,5)
I have a function that takes a really long time to compute and gives me a result
def reallyLongFunction( input: Int ) : Option[String] = { ..... }
Using scala parallel collections, I can easily do
inputValues.par.map( reallyLongFunction( _ ) )
to get all the results in parallel. The problem is, I don't really want all the results; I only want the FIRST result. As soon as one of my inputs is a success, I want my output, and I want to move on with my life. This did a lot of extra work.
So how do I get the best of both worlds? I want to
Get the first result that returns something from my long function
Stop all my other threads from useless work.
Edit -
I solved it like a dumb Java programmer by having
@volatile var done = false
which is set and checked inside my reallyLongFunction. This works, but does not feel very Scala-like. I would like a better way to do this.
(Update: no, it doesn't work; it doesn't do the map.)
Would it work to do something like:
inputValues.par.find({ v => reallyLongFunction(v); true })
The implementation uses this:
protected[this] class Find[U >: T](pred: T => Boolean, protected[this] val pit: IterableSplitter[T]) extends Accessor[Option[U], Find[U]] {
  @volatile var result: Option[U] = None
  def leaf(prev: Option[Option[U]]) = { if (!pit.isAborted) result = pit.find(pred); if (result != None) pit.abort }
  protected[this] def newSubtask(p: IterableSplitter[T]) = new Find(pred, p)
  override def merge(that: Find[U]) = if (this.result == None) result = that.result
}
which looks pretty similar in spirit to your @volatile, except you don't have to look at it ;-)
I interpreted your question in the same way as huynhjl, but if you just want to search and discard Nones, you could do something like this to avoid repeating the computation once a suitable outcome is found:
class Computation[A,B](value: A, function: A => B) {
lazy val result = function(value)
}
def f(x: Int) = { // your function here
Thread.sleep(100 - x)
if (x > 5) Some(x * 10)
else None
}
val list = List.range(1, 20) map (i => new Computation(i, f))
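// On Scala 2.13+, .par needs the scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
val found = list.par find (_.result.isDefined)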
//found is Option[Computation[Int,Option[Int]]]
val result = found map (_.result.get)
//result is Option[Int]
However, find on parallel collections seems to do a lot of unnecessary work (see this question), so this might not work well, at least with current versions of Scala.
Volatile flags are used in the parallel collections (take a look at the source for find, exists, and forall), so I think your idea is a good one. It's actually better if you can include the flag in the function itself. It kills referential transparency on your function (i.e. for certain inputs your function now sometimes returns None rather than Some), but since you're discarding the stopped computations, this shouldn't matter.
If you're willing to use a non-core library, I think Futures would be a good match for this task. For instance:
Akka's Futures include Futures.firstCompletedOf
Twitter's Futures include Future.select
...both of which appear to enable the functionality you're looking for.
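Scala's standard library also has Future.firstCompletedOf (in scala.concurrent), but it completes with whichever future finishes first, even if that result is None. A minimal stdlib-only sketch of "first Some wins" can be built on a Promise instead. firstSome and slow are hypothetical names, and the AtomicBoolean plays the role of the volatile flag discussed above: computations must poll it themselves, since nothing here forcibly interrupts a running thread.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FirstSome {
  // Starts f on every input and completes with the first Some produced,
  // or with None once every input has produced None (or failed).
  // `stop` is a cooperative cancellation flag: f must poll it itself.
  def firstSome[A, B](inputs: Seq[A])(f: (A, AtomicBoolean) => Option[B]): Future[Option[B]] = {
    val stop = new AtomicBoolean(false)
    val promise = Promise[Option[B]]()
    val remaining = new AtomicInteger(inputs.size)
    if (inputs.isEmpty) promise.trySuccess(None)
    inputs.foreach { a =>
      Future(f(a, stop)).onComplete { attempt =>
        attempt.toOption.flatten match {
          case found @ Some(_) =>
            stop.set(true)            // ask the remaining computations to give up
            promise.trySuccess(found) // only the first trySuccess has any effect
          case None =>
            if (remaining.decrementAndGet() == 0) promise.trySuccess(None)
        }
      }
    }
    promise.future
  }

  // Hypothetical stand-in for reallyLongFunction: works in small steps,
  // checks the flag between steps, and returns None if told to stop.
  def slow(x: Int, stop: AtomicBoolean): Option[Int] = {
    var step = 0
    while (step < 20 && !stop.get()) {
      Thread.sleep(5)
      step += 1
    }
    if (stop.get() || x <= 5) None else Some(x * 10)
  }
}
```

For example, Await.result(FirstSome.firstSome(1 to 20)(FirstSome.slow), 10.seconds) yields Some of one of the values 60, 70, ..., 200, depending on which computation finishes first; with inputs 1 to 5 it yields None.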