Will scala compiler hoist regular expressions - scala

I wonder if this:
object Foo {
val regex = "some complex regex".r
def foo() {
// use regex
}
}
and this:
object Foo {
def foo() {
val regex = "some complex regex".r
// use regex
}
}
will have any performance difference. i.e., will scala compiler recognize that "some complex regex".r is a constant and cache it, so that it will not recompile every time?

It will have a difference in runtime. Expression from first example will be calculated only once. Expression from second - every time you call Foo.foo(). Calculation here means applying implicitly added function "r" (from scala-library) to the string:
scala> ".*".r
res40: scala.util.matching.Regex = .*
This function actually compiles the regular expression every time you call it (no caching).
Btw, any naive caching of regexps in runtime is vulnerable to OutOfMemory - however, I believe it's possible to implement it safely with WeakHashMap, but current Java's Pattern implementation (which is underlying to scala's Regex) doesn't implement it actually, probably because such implementation may not have predictable effect on performance (GC may have to remove most of cached values every time it's running). Cache with eviction is more predictable, but still not so easy way (who's gonna choose timeout/size for it?). Talking about scala-way, some smart macro could do optimization in compile-time (do caching only for 'string constant'-based regexps), but by default:
Scala compiler also doesn't have any optimizations about regexps because regexp is not a part of the scala language.
So it's better to move static "".r constructions out of the function.

Related

Writing macros which generate statements

With intention of reducing the boilerplate for the end-user when deriving instances of a certain typeclass (let's take Showable for example), I aim to write a macro, which will autogenerate the instance names. E.g.:
// calling this:
Showable.derive[Int]
Showable.derive[String]
// should expand to this:
// implicit val derivedShowableInstance0 = new Showable[Int] { ... }
// implicit val derivedShowableInstance1 = new Showable[String] { ... }
I tried to approach the problem the following way, but the compiler complained that the expression should return a < no type > instead of Unit:
object Showable {
def derive[a] = macro Macros.derive[a]
object Macros {
private var instanceNameIndex = 0
def derive[ a : c.WeakTypeTag ]( c : Context ) = {
import c.universe._
import Flag._
val newInstanceDeclaration = ...
c.Expr[Unit](
ValDef(
Modifiers(IMPLICIT),
newTermName("derivedShowableInstance" + nameIndex),
TypeTree(),
newInstanceDeclaration
)
)
}
}
}
I get that a val declaration is not exactly an expression and hence Unit might not be appropriate, but then how to make it right?
Is this at all possible? If not then why, will it ever be, and are there any workarounds?
Yes, that's right. Declarations/definitions aren't expressions, so they need to be wrapped into Unit-returning blocks to become ones. Typically Scala does this automatically, but in this particular case you need to do it yourself.
However if you wrap a definition in a block, then it becomes invisible from the outside. That's the limitation of the current macro system that strictly follows the metaphor of "macro application is very much similar to a typed function call". Function calls don't bring new members into scope, so neither do def macros - both for technical and philosophical reasons. As shown in the example #3 of my recent "What Are Macros Good For?" talk, by using structural types def macros can work around this limitation, but this doesn't look particularly related to your question, so I'll omit the details.
On the other hand, there are some ideas how to overcome this limitation with new macro flavors. Macro annotations show that it's technically feasible for the macro engine to hook into new member creation, but we'd like to get more experience with macro annotations before bringing them into trunk. Some details about this can be found in my "Philosophy of Scala Macros" presentation. This can be useful in your situation, but I still won't go into details, because I think there's a much better solution for this particular case (feel free to ask for elaboration in comments, though!).
What I'd like to recommend you is to use materialization as described in http://docs.scala-lang.org/overviews/macros/implicits.html. With materializing macros you can automatically generate type class instances for the user without having the user write any code at all. Would that fit your use case?

Is everything a function or expression or object in scala?

I am confused.
I thought everything is expression because the statement returns a value. But I also heard that everything is an object in scala.
What is it in reality? Why did scala choose to do it one way or the other? What does that mean to a scala developer?
I thought everything is expression because the statement returns a value.
There are things that don't have values, but for the most part, this is correct. That means that we can basically drop the distinction between "statement" and "expression" in Scala.
The term "returns a value" is not quite fitting, however. We say everything "evaluates" to a value.
But I also heard that everything is an object in scala.
That doesn't contradict the previous statement at all :) It just means that every possible value is an object (so every expression evaluates to an object). By the way, functions, as first-class citizens in Scala, are objects, too.
Why did scala choose to do it one way or the other?
It has to be noted that this is in fact a generalization of the Java way, where statements and expressions are distinct things and not everything is an object. You can translate every piece of Java code to Scala without a lot of adaptions, but not the other way round. So this design decision makes Scala is in fact more powerful in terms of conciseness and expressiveness.
What does that mean to a scala developer?
It means, for example, that:
You often don't need return, because you can just put the return value as the last expression in a method
You can exploit the fact that if and case are expressions to make your code shorter
An example would be:
def mymethod(x: Int) = if (x > 2) "yay!" else "too low!"
// ...
println(mymethod(10)) // => prints "yay!"
println(mymethod(0)) // => prints "too low!"
We can also assign the value of such a compound expression to a variable:
val str = value match {
case Some(x) => "Result: " + x
case None => "Error!"
}
The distinction here is that the assertion "everything is a expression" is being made about blocks of code, whereas "everything is an object" is being made about the values in your program.
Blocks of Code
In the Java language, there are both expressions and statements. That is, any "block" of code is either an expression or a statement;
//the if-else-block is a statement whilst (x == 4) is an expression
if (x == 4) {
foo();
}
else {
bar();
}
Expressions have a type; statements do not; statements are invoked purely for their side-effects.
In scala, every block of code is an expression; that is, it has a type:
if (x == 4) foo else bar //has the type lub of foo and bar
This is extremely important! I cannot stress this enough; it's one of the things which makes scala a pleasure to work with because I can assign a value to an expression.
val x = if (x == 4) foo else bar
Values
By value, I mean something that we might reference in the program:
int i = foo(); //i is a reference to a value
java.util.TimeUnit.SECONDS;
In Java, the i above is not an object - it is a primitive. Furthermore I can access the field SECONDS of TimeUnit, but TimeUnit is not an object either; it is a static (for want of a better phrase). In scala:
val j = 4
Map.empty
As far as the language is concerned, j is an object (and I may dispatch methods to it), as is Map - the module (or companion object) for the type scala.collection.immutable.Map.
Is everything a function or expression or object in scala?
None of it.
There are things that are not objects. e.g. classes, methods.
There are things that are not expressions. e.g. class Foo { } is a statement, and does not evaluate to any value. (This is basically the same point as above.)
There are things that are not functions. I don't need to mention any examples for this one as there would be plenty in sight in any Scala code.
In other words, "Everything is a X" is nothing more than a sales pitch (in case of Scala).

Scala doesn't do type inference on return statements? Why not? [duplicate]

Why does Scala fail to infer the return type of the method when there's an explicit return statement used in the method?
For instance, why does the following code compile?
object Main {
def who = 5
def main(args: Array[String]) = println(who)
}
But the following doesn't.
object Main {
def who = return 5
def main(args: Array[String]) = println(who)
}
The return type of a method is either the type of the last statement in the block that defines it, or the type of the expression that defines it, in the absence of a block.
When you use return inside a method, you introduce another statement from which the method may return. That means Scala can't determine the type of that return at the point it is found. Instead, it must proceed until the end of the method, then combine all exit points to infer their types, and then go back to each of these exit points and assign their types.
To do so would increase the complexity of the compiler and slow it down, for the sole gain of not having to specify return type when using return. In the present system, on the other hand, inferring return type comes for free from the limited type inference Scala already uses.
So, in the end, in the balance between compiler complexity and the gains to be had, the latter was deemed to be not worth the former.
It would increase the complexity of the compiler (and language). It's just really funky to be doing type inference on something like that. As with anything type inference related, it all works better when you have a single expression. Scattered return statements effectively create a lot of implicit branching that gets to be very sticky to unify. It's not that it's particularly hard, just sticky. For example:
def foo(xs: List[Int]) = xs map { i => return i; i }
What, I ask you, does the compiler infer here? If the compiler were doing inference with explicit return statements, it would need to be Any. In fact, a lot of methods with explicit return statements would end up returning Any, even if you don't get sneaky with non-local returns. Like I said, sticky.
And on top of that, this isn't a language feature that should be encouraged. Explicit returns do not improve code clarity unless there is just one explicit return and that at the end of the function. The reason is pretty easy to see if you view code paths as a directed graph. As I said earlier, scattered returns produce a lot of implicit branching that produces weird leaves on your graph, as well as a lot of extra paths in the main body. It's just funky. Control flow is much easier to see if your branches are all explicit (pattern matching or if expressions) and your code will be much more functional if you don't rely on side-effecting return statements to produce values.
So, like several other "discouraged" features in Scala (e.g. asInstanceOf rather than as), the designers of the language made a deliberate choice to make things less pleasant. This combined with the complexity that it introduces into type inference and the practical uselessness of the results in all but the most contrived of scenarios. It just doesn't make any sense for scalac to attempt this sort of inference.
Moral of the story: learn not to scatter your returns! That's good advice in any language, not just Scala.
Given this (2.8.Beta1):
object Main {
def who = return 5
def main(args: Array[String]) = println(who)
}
<console>:5: error: method who has return statement; needs result type
def who = return 5
...it seems not inadvertent.
I'm not sure why. Perhaps just to discourage the use of the return statement. :)

why use foldLeft instead of procedural version?

So in reading this question it was pointed out that instead of the procedural code:
def expand(exp: String, replacements: Traversable[(String, String)]): String = {
var result = exp
for ((oldS, newS) <- replacements)
result = result.replace(oldS, newS)
result
}
You could write the following functional code:
def expand(exp: String, replacements: Traversable[(String, String)]): String = {
replacements.foldLeft(exp){
case (result, (oldS, newS)) => result.replace(oldS, newS)
}
}
I would almost certainly write the first version because coders familiar with either procedural or functional styles can easily read and understand it, while only coders familiar with functional style can easily read and understand the second version.
But setting readability aside for the moment, is there something that makes foldLeft a better choice than the procedural version? I might have thought it would be more efficient, but it turns out that the implementation of foldLeft is actually just the procedural code above. So is it just a style choice, or is there a good reason to use one version or the other?
Edit: Just to be clear, I'm not asking about other functions, just foldLeft. I'm perfectly happy with the use of foreach, map, filter, etc. which all map nicely onto for-comprehensions.
Answer: There are really two good answers here (provided by delnan and Dave Griffith) even though I could only accept one:
Use foldLeft because there are additional optimizations, e.g. using a while loop which will be faster than a for loop.
Use fold if it ever gets added to regular collections, because that will make the transition to parallel collections trivial.
It's shorter and clearer - yes, you need to know what a fold is to understand it, but when you're programming in a language that's 50% functional, you should know these basic building blocks anyway. A fold is exactly what the procedural code does (repeatedly applying an operation), but it's given a name and generalized. And while it's only a small wheel you're reinventing, but it's still a wheel reinvention.
And in case the implementation of foldLeft should ever get some special perk - say, extra optimizations - you get that for free, without updating countless methods.
Other than a distaste for mutable variable (even mutable locals), the basic reason to use fold in this case is clarity, with occasional brevity. Most of the wordiness of the fold version is because you have to use an explicit function definition with a destructuring bind. If each element in the list is used precisely once in the fold operation (a common case), this can be simplified to use the short form. Thus the classic definition of the sum of a collection of numbers
collection.foldLeft(0)(_+_)
is much simpler and shorter than any equivalent imperative construct.
One additional meta-reason to use functional collection operations, although not directly applicable in this case, is to enable a move to using parallel collection operations if needed for performance. Fold can't be parallelized, but often fold operations can be turned into commutative-associative reduce operations, and those can be parallelized. With Scala 2.9, changing something from non-parallel functional to parallel functional utilizing multiple processing cores can sometimes be as easy as dropping a .par onto the collection you want to execute parallel operations on.
One word I haven't seen mentioned here yet is declarative:
Declarative programming is often defined as any style of programming that is not imperative. A number of other common definitions exist that attempt to give the term a definition other than simply contrasting it with imperative programming. For example:
A program that describes what computation should be performed and not how to compute it
Any programming language that lacks side effects (or more specifically, is referentially transparent)
A language with a clear correspondence to mathematical logic.
These definitions overlap substantially.
Higher-order functions (HOFs) are a key enabler of declarativity, since we only specify the what (e.g. "using this collection of values, multiply each value by 2, sum the result") and not the how (e.g. initialize an accumulator, iterate with a for loop, extract values from the collection, add to the accumulator...).
Compare the following:
// Sugar-free Scala (Still better than Java<5)
def sumDoubled1(xs: List[Int]) = {
var sum = 0 // Initialized correctly?
for (i <- 0 until xs.size) { // Fenceposts?
sum = sum + (xs(i) * 2) // Correct value being extracted?
// Value extraction and +/* smashed together
}
sum // Correct value returned?
}
// Iteration sugar (similar to Java 5)
def sumDoubled2(xs: List[Int]) = {
var sum = 0
for (x <- xs) // We don't need to worry about fenceposts or
sum = sum + (x * 2) // value extraction anymore; that's progress
sum
}
// Verbose Scala
def sumDoubled3(xs: List[Int]) = xs.map((x: Int) => x*2). // the doubling
reduceLeft((x: Int, y: Int) => x+y) // the addition
// Idiomatic Scala
def sumDoubled4(xs: List[Int]) = xs.map(_*2).reduceLeft(_+_)
// ^ the doubling ^
// \ the addition
Note that our first example, sumDoubled1, is already more declarative than (most would say superior to) C/C++/Java<5 for loops, because we haven't had to micromanage the iteration state and termination logic, but we're still vulnerable to off-by-one errors.
Next, in sumDoubled2, we're basically at the level of Java>=5. There are still a couple things that can go wrong, but we're getting pretty good at reading this code-shape, so errors are quite unlikely. However, don't forget that a pattern that's trivial in a toy example isn't always so readable when scaled up to production code!
With sumDoubled3, desugared for didactic purposes, and sumDoubled4, the idiomatic Scala version, the iteration, initialization, value extraction and choice of return value are all gone.
Sure, it takes time to learn to read the functional versions, but we've drastically foreclosed our options for making mistakes. The "business logic" is clearly marked, and the plumbing is chosen from the same menu that everyone else is reading from.
It is worth pointing out that there is another way of calling foldLeft which takes advantages of:
The ability to use (almost) any Unicode symbol in an identifier
The feature that if a method name ends with a colon :, and is called infix, then the target and parameter are switched
For me this version is much clearer, because I can see that I am folding the expr value into the replacements collection
def expand(expr: String, replacements: Traversable[(String, String)]): String = {
(expr /: replacements) { case (r, (o, n)) => r.replace(o, n) }
}

What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?

What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?
A common pattern in I/O programming is to do things like this:
while ((bytesRead = in.read(buffer)) != -1) { ...
But this is not possible in Scala because...
bytesRead = in.read(buffer)
.. returns Unit, not the new value of bytesRead.
Seems like an interesting thing to leave out of a functional language.
I am wondering why it was done so?
I advocated for having assignments return the value assigned rather than unit. Martin and I went back and forth on it, but his argument was that putting a value on the stack just to pop it off 95% of the time was a waste of byte-codes and have a negative impact on performance.
I'm not privy to inside information on the actual reasons, but my suspicion is very simple. Scala makes side-effectful loops awkward to use so that programmers will naturally prefer for-comprehensions.
It does this in many ways. For instance, you don't have a for loop where you declare and mutate a variable. You can't (easily) mutate state on a while loop at the same time you test the condition, which means you often have to repeat the mutation just before it, and at the end of it. Variables declared inside a while block are not visible from the while test condition, which makes do { ... } while (...) much less useful. And so on.
Workaround:
while ({bytesRead = in.read(buffer); bytesRead != -1}) { ...
For whatever it is worth.
As an alternate explanation, perhaps Martin Odersky had to face a few very ugly bugs deriving from such usage, and decided to outlaw it from his language.
EDIT
David Pollack has answered with some actual facts, which are clearly endorsed by the fact that Martin Odersky himself commented his answer, giving credence to the performance-related issues argument put forth by Pollack.
This happened as part of Scala having a more "formally correct" type system. Formally-speaking, assignment is a purely side-effecting statement and therefore should return Unit. This does have some nice consequences; for example:
class MyBean {
private var internalState: String = _
def state = internalState
def state_=(state: String) = internalState = state
}
The state_= method returns Unit (as would be expected for a setter) precisely because assignment returns Unit.
I agree that for C-style patterns like copying a stream or similar, this particular design decision can be a bit troublesome. However, it's actually relatively unproblematic in general and really contributes to the overall consistency of the type system.
Perhaps this is due to the command-query separation principle?
CQS tends to be popular at the intersection of OO and functional programming styles, as it creates an obvious distinction between object methods that do or do not have side-effects (i.e., that alter the object). Applying CQS to variable assignments is taking it further than usual, but the same idea applies.
A short illustration of why CQS is useful: Consider a hypothetical hybrid F/OO language with a List class that has methods Sort, Append, First, and Length. In imperative OO style, one might want to write a function like this:
func foo(x):
var list = new List(4, -2, 3, 1)
list.Append(x)
list.Sort()
# list now holds a sorted, five-element list
var smallest = list.First()
return smallest + list.Length()
Whereas in more functional style, one would more likely write something like this:
func bar(x):
var list = new List(4, -2, 3, 1)
var smallest = list.Append(x).Sort().First()
# list still holds an unsorted, four-element list
return smallest + list.Length()
These seem to be trying to do the same thing, but obviously one of the two is incorrect, and without knowing more about the behavior of the methods, we can't tell which one.
Using CQS, however, we would insist that if Append and Sort alter the list, they must return the unit type, thus preventing us from creating bugs by using the second form when we shouldn't. The presence of side effects therefore also becomes implicit in the method signature.
I'd guess this is in order to keep the program / the language free of side effects.
What you describe is the intentional use of a side effect which in the general case is considered a bad thing.
It is not the best style to use an assignment as a boolean expression. You perform two things at the same time which leads often to errors. And the accidential use of "=" instead of "==" is avoided with Scalas restriction.
By the way: I find the initial while-trick stupid, even in Java. Why not somethign like this?
for(int bytesRead = in.read(buffer); bytesRead != -1; bytesRead = in.read(buffer)) {
//do something
}
Granted, the assignment appears twice, but at least bytesRead is in the scope it belongs to, and I'm not playing with funny assignment tricks...
You can have a workaround for this as long as you have a reference type for indirection. In a naïve implementation, you can use the following for arbitrary types.
case class Ref[T](var value: T) {
def := (newval: => T)(pred: T => Boolean): Boolean = {
this.value = newval
pred(this.value)
}
}
Then, under the constraint that you’ll have to use ref.value to access the reference afterwards, you can write your while predicate as
val bytesRead = Ref(0) // maybe there is a way to get rid of this line
while ((bytesRead := in.read(buffer)) (_ != -1)) { // ...
println(bytesRead.value)
}
and you can do the checking against bytesRead in a more implicit manner without having to type it.