Scala: what is the inner class method for dynamic scoping?

I'm trying to evaluate all three methods of dynamic scoping described here (https://wiki.scala-lang.org/display/SYGN/Dynamic-scope), and I understand all but the "inner class method". It is described as follows:
It is possible to achieve a similar effect to dynamic scoping using nested class definitions. By defining the entire state-consuming code as inner classes of a state object, and instantiating that object each time a new global state is required, all the contained code gets direct access to the state variables via the parent reference.
To avoid defining the entire program in a single file, this approach for most purposes mandates the use of component mixins in order to compose the program into a single class.
I don't quite understand this - is it possible for someone to give some example code showing this? The second approach of implicit parameters makes sense to me, but the article also suggests that it can be combined with the inner class method and I don't quite see that either. Thanks!

Like this:
case class Board(rows: Int, columns: Int) {
case class Pos(row: Int, column: Int) {
require(0 <= row && row < rows && 0 <= column && column < columns)
def neighbors = for {
nRow <- Set(row - 1, row, row + 1)
if 0 <= nRow && nRow < rows
nColumn <- Set(column - 1, column, column + 1)
if 0 <= nColumn && nColumn < columns
if (nRow, nColumn) != (row, column)
} yield Pos(nRow, nColumn)
}
}
Here, Pos refers to a "context" that is on Board: rows and columns. For example:
scala> val board = Board(5, 5)
board: Board = Board(5,5)
scala> val pos = board.Pos(0, 0)
pos: board.Pos = Pos(0,0)
scala> println(pos.neighbors)
Set(Pos(0,1), Pos(1,0), Pos(1,1))
Each Pos sees the rows and columns of the Board instance it belongs to, not those of other boards:
scala> val board2 = Board(2, 2)
board2: Board = Board(2,2)
scala> println(board.Pos(1,1).neighbors+"\n"+board2.Pos(1, 1).neighbors)
Set(Pos(1,0), Pos(1,2), Pos(2,0), Pos(2,1), Pos(0,0), Pos(2,2), Pos(0,1), Pos(0,2))
Set(Pos(0,0), Pos(0,1), Pos(1,0))
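The quoted wiki text also mentions component mixins. A minimal sketch of how that might look, assuming the state lives in one trait and each consumer lives in its own trait with a self-type (all names here, such as Settings, Formatting, Reporting and Program, are made up for illustration and are not taken from the wiki):

```scala
// Hypothetical names throughout; a sketch of the idea, not the wiki's code.
trait Settings {
  def indent: Int // the shared "dynamic" state
}

trait Formatting { self: Settings =>
  // Inner class: reads `indent` from the enclosing instance.
  class Formatter {
    def format(line: String): String = (" " * indent) + line
  }
}

trait Reporting { self: Settings with Formatting =>
  class Report(lines: List[String]) {
    private val formatter = new Formatter
    def render: String = lines.map(line => formatter.format(line)).mkString("\n")
  }
}

// All components are composed into one state-carrying class.
class Program(val indent: Int) extends Settings with Formatting with Reporting

object InnerClassDemo {
  val narrow = new Program(2)
  val wide = new Program(4)
  val narrowText = new narrow.Report(List("a", "b")).render
  val wideText = new wide.Report(List("a")).render
}
```

Each Program instance acts as a fresh "global state": the Report created from narrow indents by two spaces, the one created from wide by four, and neither needs the value passed in explicitly.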


Declaration never used - but it is?

I am writing in Scala and within this if statement I have a for loop and I initialized i=0 and used i in the for loop. It is telling me that declaration is not used, but I am using it in the for loop. top is always equal to 5 also.
else {
var i = 0
for (i <- 0 until (top))
For loops work a little differently in Scala than in other languages. In Scala, a for comprehension is syntactic sugar over the higher-order functions foreach, withFilter, map and flatMap.
When you write
for (i <- 0 until top)
The compiler re-writes this to
(0 until top).foreach {
i => ...
}
Here, the foreach body is an anonymous function whose parameter i has the element type of the range (Int). This i is a new binding, so you can just remove the declaration of i from your code snippet.
As mentioned above, for in Scala is a very different animal from for in e.g. Java. Iterating over a range is only one of its many potential uses.
For your code: The first and the second i are not the same. In fact, the variable in the for expression is ephemeral and doesn't leave the for's scope, but actually shadows the outer one.
var i = 0 // you don't need this
for (i <- 0 until (top)) // loops from 0 to whatever top is regardless what's in the outer i
This will print 0 1 2 3 4:
val top = 5
for (i <- 0 until (top)) {
println(i)
}
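To check the shadowing claim, here is a small sketch (the object name ShadowDemo and the buffer names are made up for illustration). It runs the loop in both forms and then reads the outer i again:

```scala
object ShadowDemo {
  val top = 5
  var i = 42 // outer variable, never touched by the loops below

  val viaFor = scala.collection.mutable.ListBuffer.empty[Int]
  for (i <- 0 until top) viaFor += i // this `i` is a fresh parameter

  val viaForeach = scala.collection.mutable.ListBuffer.empty[Int]
  (0 until top).foreach { i => viaForeach += i } // what the loop desugars to

  val outerAfterLoops = i // the outer variable again
}
```

Both buffers end up holding 0 to 4, and outerAfterLoops is still 42, which is why the compiler reports the outer declaration as unused.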
One really cool aspect of for comes from the yield keyword (example from the first link):
val names = List("adam", "david", "frank")
val ucNames = for (name <- names) yield name.capitalize
Also it is used to kind of map over multiple collections/monads etc. at once like in Haskell, a task rather tedious otherwise:
val names = List("adam", "david")
val numbers = List(1, 2)
val lst = for {
name <- names
number <- numbers
} yield s"$name: $number"
lst will now hold the cartesian product of the two lists as a List[String]: List("adam: 1", "adam: 2", "david: 1", "david: 2")
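For reference, a sketch of what a two-generator comprehension with yield is rewritten to: a flatMap over the first collection with a map over the second inside it (DesugarDemo is just a wrapper name chosen here):

```scala
object DesugarDemo {
  val names = List("adam", "david")
  val numbers = List(1, 2)

  // The comprehension from above, on one line.
  val viaFor = for { name <- names; number <- numbers } yield s"$name: $number"

  // The equivalent explicit calls.
  val viaFlatMap = names.flatMap(name => numbers.map(number => s"$name: $number"))
}
```

Both values hold the same list.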

Scala/functional way of doing things

I am using Scala to write a Spark application that reads data from CSV files using dataframes (none of these details really matter; my question can be answered by anyone who is good at functional programming).
I'm used to sequential programming and it's taking a while to think of things in the functional way.
I basically want to read two columns (a, b) from a CSV file and keep track of those rows where b < 0.
I implemented this, but it's pretty much how I would do it in Java, and I would like to utilize Scala's features instead:
val ValueDF = fileDataFrame.select("colA", "colB")
val ValueArr = ValueDF.collect()
for ( index <- 0 until (ValueArr.length)){
var row = ValueArr(index)
var A = row(0).toString()
var B = row(1).toString().toDouble
if (B < 0){
//write A and B somewhere
}
}
Converting the dataframe to an array defeats the purpose of distributed computation.
So how could I possibly get the same results but instead of forming an array and traversing through it, I would rather want to perform some transformations of the data frame itself (such as map/filter/flatmap etc).
I should get going soon hopefully, just need some examples to wrap my head around it.
You are doing basically a filtering operation (ignore if not (B < 0)) and mapping (from each row, get A and B / do something with A and B).
You could write it like this:
val valueDF = fileDataFrame.select("colA", "colB")
val valueArr = valueDF.collect()
val result = valueArr.filter(_(1).toString().toDouble < 0).map{row => (row(0).toString(), row(1).toString().toDouble)}
// do something with result
You also can do first the mapping and then the filtering:
val result = valueArr.map{row => (row(0).toString(), row(1).toString().toDouble)}.filter(_._2 < 0)
Scala also offers more convenient versions for this kind of operations (thanks Sascha Kolberg), called withFilter and collect. withFilter has the advantage over filter that it doesn't create a new collection, saving you one pass, see this answer for more details. With collect you also map and filter in one pass, passing a partial function which allows to do pattern matching, see e.g. this answer.
In your case collect would look like this:
val valueDF = fileDataFrame.select("colA", "colB")
val valueArr = valueDF.collect()
val result = valueArr.collect{
case row if row(1).toString().toDouble < 0 => (row(0).toString(), row(1).toString().toDouble)
}
// do something with result
(I think there's a more elegant way to express this but that's left as an exercise ;))
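To try withFilter and collect without a Spark setup, here is a sketch on stand-in data. It assumes plain (String, String) tuples in place of the collected rows; CollectDemo and its value names are invented for the example:

```scala
object CollectDemo {
  // Stand-in data: (colA, colB) pairs as strings instead of Spark rows.
  val rows: Seq[(String, String)] = Seq(("a", "1.5"), ("b", "-2.0"), ("c", "-0.5"))

  // withFilter is lazy: the predicate is applied during the following map.
  val viaWithFilter =
    rows.withFilter(_._2.toDouble < 0).map { case (a, b) => (a, b.toDouble) }

  // collect takes a partial function: the guard filters, the body maps.
  val viaCollect =
    rows.collect { case (a, b) if b.toDouble < 0 => (a, b.toDouble) }
}
```

Both produce the pairs for "b" and "c" only.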
Also, there's a lightweight notation called "sequence comprehensions". With this you could write:
val result = for (row <- valueArr if row(1).toString().toDouble < 0) yield (row(0).toString(), row(1).toString().toDouble)
Or a more flexible variant:
val result = for {
row <- valueArr
double = row(1).toString().toDouble // parse once, reuse in guard and result
if double < 0
} yield (row(0).toString(), double)
Alternatively, you can use foldLeft:
val valueDF = fileDataFrame.select("colA", "colB")
val valueArr = valueDF.collect()
val result = valueArr.foldLeft(Seq[(String, Double)]()) {(s, row) =>
val a = row(0).toString()
val b = row(1).toString().toDouble
if (b < 0){
s :+ (a, b) // append tuple with A and B to results sequence
} else {
s // let results sequence unmodified
}
}
// do something with result
All of these are considered functional... which one you prefer is for the most part a matter of taste. The first 2 examples (filter/map, map/filter) do have a performance disadvantage compared to the rest because they iterate through the sequence twice.
Note that in FP it's very important to minimize side effects / isolate them from the main logic. I/O ("write A and B somewhere") is a side effect. So you normally will write your functions such that they don't have side effects - just input -> output logic without affecting or retrieving data from the surroundings. Once you have a final result, you can do side effects. In this concrete case, once you have result (which is a sequence of A and B tuples), you can loop through it and print it. This way you can for example change easily the way to print (you may want to print to the console, send to a remote place, etc.) without touching the main logic.
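A minimal sketch of that separation, assuming the rows have already been turned into (String, Double) pairs; PureDemo, negatives and report are names made up for the example:

```scala
object PureDemo {
  // Pure part: input -> output, no I/O.
  def negatives(rows: Seq[(String, Double)]): Seq[(String, Double)] =
    rows.filter(_._2 < 0)

  // Side effect kept at the edge; the sink is a parameter, so it can be
  // println, a logger, or a buffer in a test.
  def report(result: Seq[(String, Double)], sink: String => Unit): Unit =
    result.foreach { case (a, b) => sink(s"$a: $b") }
}
```

Calling PureDemo.report(PureDemo.negatives(rows), println) prints to the console; swapping println for another function changes the destination without touching negatives.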
Also you should prefer immutable values (val) wherever possible, which is safer. Even in your loop, row, A and B are not modified so there's no reason to use var.
(Btw, I corrected the values names to start with lower case, see conventions).

Scala Seq's find - wrong number of parameters; expected = 1

I have a Seq val that is populated with case class instances. I am then trying to use the find method in order to find the first option matching my criteria. Here is the code:
val week = weeks.find(now >= _.start && now <= _.end).headOption.map( _.week).getOrElse{0}
This is giving me an error:
wrong number of parameters; expected = 1
am I using the find method incorrectly above? The case class in the event it helps that weeks is populated with has the following definition:
case class Period(week: Int, start: DateTime, end: DateTime)
Each _ in an anonymous function stands for a separate parameter, so Scala thinks you're giving find a function that takes two parameters, and it's telling you that find only accepts a function with one parameter. This should work instead:
val week = weeks.find(p => now >= p.start && now <= p.end).headOption
.map( _.week).getOrElse{0}
As a side note, you don't need to use headOption because find is already returning an option of the first instance that matches your predicate. Additionally, instead of map and getOrElse you should use a fold as it has much stronger type safety:
val week2 = weeks.find(p => now >= p.start && now <= p.end).fold(0)( _.week)
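A self-contained sketch of the same idea, assuming Int timestamps in place of DateTime so that it runs without a date library (FindDemo, weekOf and lessOrEqual are illustrative names):

```scala
object FindDemo {
  // Int stands in for DateTime here, purely to keep the example dependency-free.
  case class Period(week: Int, start: Int, end: Int)

  val weeks = Seq(Period(1, 0, 6), Period(2, 7, 13))

  // One named parameter, used twice.
  def weekOf(now: Int): Int =
    weeks.find(p => now >= p.start && now <= p.end).fold(0)(_.week)

  // Two underscores mean two parameters: this is a function of two Ints.
  val lessOrEqual: (Int, Int) => Boolean = _ <= _
}
```

The last definition shows why the original call failed: two placeholders produce a two-argument function, which find cannot accept.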

What is the fastest way to subtract two arrays in scala

I have two arrays (that I have pulled out of a matrix, an Array[Array[Int]]) and I need to subtract one from the other.
At the moment I am using this method however, when I profile it, it is the bottleneck.
def subRows(a: Array[Int], b: Array[Int], sizeHint: Int): Array[Int] = {
val l: Array[Int] = new Array(sizeHint)
var i = 0
while (i < sizeHint) {
l(i) = a(i) - b(i)
i += 1
}
l
}
I need to do this billions of times so any improvement in speed is a plus.
I have tried using a List instead of an Array to collect the differences and it is MUCH faster but I lose all benefit when I convert it back to an Array.
I did modify the downstream code to take a List to see if that would help but I need to access the contents of the list out of order so again there is loss of any gains there.
It seems like any conversion of one type to another is expensive and I am wondering if there is some way to use a map etc. that might be faster.
Is there a better way?
EDIT
Not sure what I did the first time!?
So the code I used to test it was this:
def subRowsArray(a: Array[Int], b: Array[Int], sizeHint: Int): Array[Int] = {
val l: Array[Int] = new Array(sizeHint)
var i = 0
while (i < sizeHint) {
l(i) = a(i) - b(i)
i += 1
}
l
}
def subRowsList(a: Array[Int], b: Array[Int], sizeHint: Int): List[Int] = {
var l: List[Int] = Nil
var i = 0
while (i < sizeHint) {
l = a(i) - b(i) :: l
i += 1
}
l
}
val a = Array.fill(100, 100)(scala.util.Random.nextInt(2))
val loops = 30000 * 10000
def runArray = for (i <- 1 to loops) subRowsArray(a(scala.util.Random.nextInt(100)), a(scala.util.Random.nextInt(100)), 100)
def runList = for (i <- 1 to loops) subRowsList(a(scala.util.Random.nextInt(100)), a(scala.util.Random.nextInt(100)), 100)
def optTimer(f: => Unit) = {
val s = System.currentTimeMillis
f
System.currentTimeMillis - s
}
The results I thought I got the first time I did this were the exact opposite... I must have misread or mixed up the methods.
My apologies for asking a bad question.
That code is the fastest you can manage single-threaded using a standard JVM. If you think List is faster, you're either fooling yourself or not actually telling us what you're doing. Putting an Int into List requires two object creations: one to create the list element, and one to box the integer. Object creations take about 10x longer than an array access. So it's really not a winning proposition to do it any other way.
If you really, really need to go faster, and must stay with a single thread, you should probably switch to C++ or the like and explicitly use SSE instructions. See this question, for example.
If you really, really need to go faster and can use multiple threads, then the easiest is to package up a chunk of work like this (i.e. a sensible number of pairs of vectors that need to be subtracted--probably at least a few million elements per chunk) into a list as long as the number of processors on your machine, and then call list.par.map(yourSubtractionRoutineThatActsOnTheChunkOfWork).
Finally, if you can be destructive,
a(i) -= b(i)
in the inner loop is, of course, faster. Likewise, if you can reuse space (e.g. with System.arraycopy), you're better off than if you have to keep allocating it. But that changes the interface from what you've shown.
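A sketch of those two variants, assuming the caller is free to change the interface (subInPlace and subInto are names invented here, not from the question):

```scala
object SubtractDemo {
  // Destructive: overwrites `a` with a - b and allocates nothing.
  def subInPlace(a: Array[Int], b: Array[Int]): Unit = {
    var i = 0
    while (i < a.length) {
      a(i) -= b(i)
      i += 1
    }
  }

  // Reuses a caller-supplied buffer instead of allocating a result per call.
  def subInto(a: Array[Int], b: Array[Int], out: Array[Int]): Unit = {
    var i = 0
    while (i < out.length) {
      out(i) = a(i) - b(i)
      i += 1
    }
  }
}
```

With subInto, one output array can be allocated once and passed to every call, provided each result is consumed before the next call overwrites it.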
You can use ScalaMeter to benchmark the two implementations. It requires at least JRE 7 update 4 and Scala 2.10 to run; I used Scala 2.10 RC2.
Compile with scalac -cp scalameter_2.10-0.2.jar RangeBenchmark.scala.
Run with scala -cp scalameter_2.10-0.2.jar:. RangeBenchmark.
Here's the code I used:
import org.scalameter.api._
object RangeBenchmark extends PerformanceTest.Microbenchmark {
val limit = 100
val a = new Array[Int](limit)
val b = new Array[Int](limit)
val array: Array[Int] = new Array(limit)
var list: List[Int] = Nil
val ranges = for {
size <- Gen.single("size")(limit)
} yield 0 until size
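measure method "subRowsArray" in {
using(ranges) curve("Range") in {
// These statements run once, when the block is evaluated
var i = 0
while (i < limit) {
array(i) = a(i) - b(i)
i += 1
}
// Only this function literal is handed to the benchmark as the timed snippet
r => array
}
}
measure method "subRowsList" in {
using(ranges) curve("Range") in {
// These statements run once, when the block is evaluated
var i = 0
while (i < limit) {
list = a(i) - b(i) :: list
i += 1
}
// Only this function literal is handed to the benchmark as the timed snippet
r => list
}
}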
}
Here's the results:
::Benchmark subRowsArray::
Parameters(size -> 100): 8.26E-4
::Benchmark subRowsList::
Parameters(size -> 100): 7.94E-4
You can draw your own conclusions. :)
The stack blew up on larger values of limit. I'll guess it's because it's measuring the performance many times.

Creating immutable instances and modifying copies in an idiomatic way

I would like to conditionally create copies of an object instance depending on information external to that instance. Most of the information in the copies will be the same as the original, but some of the information will need to change. This information is being passed around between actors, so I need the objects to be immutable in order to avoid strange concurrency-related behavior. The following toy code is a simple example of what I would like some help with.
If I have the following code:
case class Container(condition:String,amount:Int,ID:Long)
I can do the following:
val a = new Container("Hello",10,1234567890)
println("a = " + a)
val b = a.copy(amount = -5)
println("b = " + b)
println("amount in b is " + b.amount)
and the output is
a = Container(Hello,10,1234567890)
b = Container(Hello,-5,1234567890)
amount in b is -5
I can also conditionally create copies of the object doing the following:
import scala.math._
val max = 3
val c = if(abs(b.amount) >= max) b.copy(amount = max,condition="Goodbye") else if(abs(b.amount) < max) b.copy(amount = abs(b.amount))
println("c = " + c)
If I set the amount in the b object to -5, then the output is
c = Container(Goodbye,3,1234567890)
and if I set the amount in the b object to -2, then the output is
c = Container(Hello,2,1234567890)
However, when I try to print out c.amount, it gets flagged by the compiler with the following message
println("amount in c is " + c.amount)
value amount is not a member of Any
If I change the c object creation line to
val c:Container = if(abs(b.amount) >= max) b.copy(amount = max,condition="Goodbye") else if(abs(b.amount) < max) b.copy(amount = abs(b.amount))
I get the compiler error
type mismatch; found: Unit required: Container
What is the best, idiomatic way of conditionally creating immutable instances of case classes by copying existing instances and modifying a value or two?
Thanks,
Bruce
You are not including a final else clause. Thus the type of c is Any, the only common supertype of Container and Unit, where Unit is the type of the implicit empty else branch. If you try to force the result type to be Container by writing c: Container =, the compiler tells you that the Unit produced by the missing else clause is not assignable to Container.
Thus
val c = if (abs(b.amount) >= max) {
b.copy(amount = max, condition = "Goodbye")
} else if (abs(b.amount) < max) {
b.copy(amount = abs(b.amount))
} else b // leave untouched !
works. The compiler isn't smart enough to figure out that the last else clause cannot logically be reached (it would need to know what abs, >= and < mean, that the two conditions are mutually exclusive and exhaustive, and that abs is purely functional, as is b.amount).
In other words, since you know that the two clauses are mutually exclusive and exhaustive, you can simplify
val c = if (abs(b.amount) >= max) {
b.copy(amount = max, condition = "Goodbye")
} else { // i.e. abs(b.amount) < max
b.copy(amount = abs(b.amount))
}
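Wrapped up as a function with an explicit return type, the same logic might look like the sketch below (clamp and CopyDemo are names chosen for the example). The declared Container return type makes the compiler reject any version where a branch is missing:

```scala
object CopyDemo {
  case class Container(condition: String, amount: Int, ID: Long)

  // Every path returns a Container, so the result type is Container, not Any.
  def clamp(c: Container, max: Int): Container = {
    val magnitude = math.abs(c.amount)
    if (magnitude >= max) c.copy(amount = max, condition = "Goodbye")
    else c.copy(amount = magnitude)
  }
}
```

Since the result is typed as Container, CopyDemo.clamp(b, 3).amount compiles without complaint.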