Specs2 - Unit specification style should not be used in concurrent environments - scala

Specs2 promotes a functional style for Acceptance specifications (and even for Unit specifications, if we want).
The risks of using the old style (the mutable style) are mentioned in the Specs2 philosophy guide and concern potential unwanted side-effects:
The important things to know are:
side-effects are only used to build the specification fragments, by
mutating a variable they are also used to short-circuit the execution
of an example as soon as there is a failure (by throwing an exception).
If you build fragments in the body of examples or execute the same
specification concurrently, the sky should fall down. "context"
management is to be done with case classes or traits (see
org.specs2.examples.MutableSpec)
I can't figure out how the same specification could be run concurrently, since each specification is distinct from the others (separate class instances), even if we run the same one twice or more simultaneously.
Indeed, specFragments (mutable variable):
protected[mutable] var specFragments: Fragments = new Fragments()
is declared in a trait called FragmentBuilder, not in an object (in the Scala sense, i.e. a singleton) or any other shared thing, so specFragments is local to each Specification instance.
So what scenario would actually put this at risk from concurrency?
I can't really think of a realistic scenario that proves the benefit of the Specs2 functional style.

The issues with a mutable specification can only be seen when the specification is being built, not when it is executed. When building a mutable specification it's easy to have unexpected side-effects:
import org.specs2._

val spec = new mutable.Specification {
  "example" >> ok
}
import spec._

def addTitle(): Unit = {
  // WHOOPS, forgot to remove this line!
  // test, add also an example
  "this is only for testing" >> ok
  "new title".title
}
addTitle()
And the output is:
new title
+ example
+ this is only for testing
Total for specification new title
Finished in 0 ms
2 examples, 0 failure, 0 error
So, you're right, the highlighted sentence in the guide ("execute the same specification concurrently") is ambiguous. The construction of the specification itself might be unsafe if several threads were building the same specification object but not if they were running it (the whole process being called "execute" in that sentence).
Your other question is: what are the benefits of the "functional style"? The main benefit, from a user point of view, is that it is another style of writing specifications where all the text comes first and all the code is put elsewhere.
In conclusion, do not fear the mutable style of Specification if you like it!
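To see the mechanism without pulling in specs2, here is a minimal sketch in plain Scala. MiniSpec, example, title and render are hypothetical names made up for this illustration and are not the specs2 API; the only point is that fragments are accumulated by mutating a variable, so any helper that runs during construction can silently add to the specification.

```scala
// Hypothetical stand-in for a mutable specification; NOT the specs2 API.
class MiniSpec {
  // Fragments are collected by mutating these variables.
  private var fragments: Vector[String] = Vector.empty
  private var specTitle: String = "untitled"

  def example(description: String): Unit = {
    fragments = fragments :+ description
  }

  def title(t: String): Unit = {
    specTitle = t
  }

  def render: String = (specTitle +: fragments.map("+ " + _)).mkString("\n")
}

object MiniSpecDemo {
  val spec = new MiniSpec
  spec.example("example")

  def addTitle(): Unit = {
    // Leftover line: it adds an extra example as a side effect.
    spec.example("this is only for testing")
    spec.title("new title")
  }

  addTitle()
}
```

Rendering MiniSpecDemo.spec gives the title followed by both examples, which mirrors the output shown above. If two threads called example on the same MiniSpec instance at the same time, updates could be lost, and that is the construction-time risk the guide hints at.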

ScalaTest: where Checkers are used and assertions are used

I am going through the Coursera functional programming course and have an assignment where the ScalaTest tests are written using FunSuite and Checkers.
This test framework is new to me, but I have a basic idea of how to use assertions, as I have developed PigUnit tests for a user-defined function using assert.
As Google didn't give me a clear picture of how Checkers are used and how they differ from assert, could anyone clarify where Checkers can be used and why assert should not be used instead?
Thanks
As you know, an assertion is a way of testing that a certain condition holds. These are pretty simple in ScalaTest, as you only need to use assert. For example:
assert(List(1, 2, 3).length == 3)
"Checkers," or, as they are more often called, properties, are a bit different. They are a way to assert that a condition holds for all possible inputs instead of for a single case. For example, here is a property that tests that a list always has a nonnegative length:
check((ls: List[Int]) => ls.length >= 0)
At this point, ScalaTest defers to ScalaCheck to do the heavy lifting. ScalaCheck generates random values for ls in an effort to find one that fails the test. This concept is called property-based testing. You can read more about how to use it in the ScalaTest documentation on property-based testing.
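To make the idea concrete without any test library, here is a hand-rolled sketch of what "try many random inputs and look for a counterexample" means. MiniCheck and its check method are hypothetical names for this illustration only; this is not how ScalaCheck is implemented (it has proper generators, shrinking and reporting), just the bare concept.

```scala
import scala.util.Random

object MiniCheck {
  // Hypothetical helper: evaluates `property` on `runs` random lists and
  // returns the first counterexample found, or None if every sample passed.
  def check(
      property: List[Int] => Boolean,
      runs: Int = 100,
      seed: Long = 42L
  ): Option[List[Int]] = {
    val random = new Random(seed)
    // Each sample is a list of random length (0 to 19) with random elements.
    val samples = Iterator.fill(runs) {
      List.fill(random.nextInt(20))(random.nextInt())
    }
    samples.find(ls => !property(ls))
  }
}
```

MiniCheck.check(ls => ls.length >= 0) finds no counterexample, while a false claim such as MiniCheck.check(ls => ls.length < 5) is refuted by the first generated list that has five or more elements. A single assert, by contrast, only ever looks at the one input you wrote down.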

Why future has side effects?

I am reading the book FPiS, and on page 107 the author says:
We should note that Future doesn’t have a purely functional interface.
This is part of the reason why we don’t want users of our library to
deal with Future directly. But importantly, even though methods on
Future rely on side effects, our entire Par API remains pure. It’s
only after the user calls run and the implementation receives an
ExecutorService that we expose the Future machinery. Our users
therefore program to a pure interface whose implementation
nevertheless relies on effects at the end of the day. But since our
API remains pure, these effects aren’t side effects.
Why doesn't Future have a purely functional interface?
The problem is that creating a Future that induces a side-effect is in itself also a side-effect, due to Future's eager nature.
This breaks referential transparency. I.e. if you create a Future that only prints to the console, the future will be run immediately and run the side-effect without you asking it to.
An example:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

for {
  x <- Future { println("Foo") }
  y <- Future { println("Foo") }
} yield ()
This results in "Foo" being printed twice. Now if Future was referentially transparent we should be able to get the same result in the non-inlined version below:
val printFuture = Future { println("Foo") }
for {
  x <- printFuture
  y <- printFuture
} yield ()
However, this prints "Foo" only once and, even more problematically, it prints it whether or not you include the for-expression.
With a referentially transparent expression we should be able to inline any expression without changing the semantics of the program. Future cannot guarantee this, so it breaks referential transparency and is inherently effectful.
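The difference between the two versions can be checked mechanically. The sketch below replaces println with a counter so that the number of executed effects can be observed; EagerFutureDemo, inlined and shared are names made up for this illustration, and it assumes the global ExecutionContext from the standard library.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object EagerFutureDemo {
  // Inlined version: two separate Futures, so the effect runs twice.
  def inlined(): Int = {
    val counter = new AtomicInteger(0)
    val result = for {
      _ <- Future { counter.incrementAndGet() }
      _ <- Future { counter.incrementAndGet() }
    } yield ()
    Await.result(result, 5.seconds)
    counter.get()
  }

  // Extracted version: one Future reused twice, so the effect runs once.
  def shared(): Int = {
    val counter = new AtomicInteger(0)
    val effect = Future { counter.incrementAndGet() }
    val result = for {
      _ <- effect
      _ <- effect
    } yield ()
    Await.result(result, 5.seconds)
    counter.get()
  }
}
```

inlined() returns 2 and shared() returns 1, even though the second function is only the first one with a subexpression pulled out into a val. That change in behaviour is exactly what referential transparency forbids.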
A basic premise of FP is referential transparency. In other words, avoiding side effects.
What's a side effect? From Wikipedia:
In computer science, a function or expression is said to have a side effect if it modifies some state outside its scope or has an observable interaction with its calling functions or the outside world. (Except, by convention, returning a value: returning a value has an effect on the calling function, but this is usually not considered as a side effect.)
And what is a Scala future? From the documentation page:
A Future is a placeholder object for a value that may not yet exist.
So a future can transition from a not-yet-existing-value to an existing-value without any interaction from or with the rest of the program, and, as you quoted: "methods on Future rely on side effects."
It would appear that Scala futures do not maintain referential transparency.
As far as I know, a Future runs its computation automatically when it's created. Even if its nested computation has no side effects, it still breaks the flatMap composition rule, because its state changes over time:
someFuture.flatMap(Future(_)) == someFuture // can be false
Equality implementation questions aside, we can have a race condition here: the new Future runs for a tiny fraction of time right after creation, so its isCompleted can differ from that of someFuture if the latter is already done.
To be pure with respect to the effect it represents, a Future should defer its computation and run it only when explicitly asked to, as is the case with Par (or scalaz's Task).
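As a rough illustration of what "defer and run only when asked" looks like, here is a minimal sketch of a lazy computation type. Deferred and DeferredDemo are hypothetical names; this is neither the Par from FPiS nor scalaz's Task (there is no asynchrony or error handling here), only the deferral idea.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical minimal deferred computation: building one runs nothing.
final class Deferred[A](thunk: () => A) {
  // The only place where the wrapped computation is actually executed.
  def run(): A = thunk()

  def map[B](f: A => B): Deferred[B] =
    new Deferred(() => f(thunk()))

  def flatMap[B](f: A => Deferred[B]): Deferred[B] =
    new Deferred(() => f(thunk()).run())
}

object Deferred {
  // By-name parameter, so the argument is not evaluated at construction time.
  def apply[A](a: => A): Deferred[A] = new Deferred(() => a)
}

object DeferredDemo {
  val counter = new AtomicInteger(0)

  // A description of an effect; nothing has happened yet.
  val effect: Deferred[Int] = Deferred(counter.incrementAndGet())

  // Reusing the same value twice describes the effect twice.
  val program: Deferred[Unit] = for {
    _ <- effect
    _ <- effect
  } yield ()
}
```

Nothing is incremented while DeferredDemo is being built; calling DeferredDemo.program.run() increments the counter twice. Here, extracting effect into a val or inlining it gives the same result, which is the property the eager Future lacks.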
To complement the other points and explain the relationship between referential transparency (a requirement) and side-effects (mutation that might break this requirement), here is a somewhat simplistic but pragmatic view of what's happening:
a newly created Future immediately submits a Callable task into your pool's queue. Given that the queue is a mutable collection, this is basically a side-effect.
any subscription (from onComplete to map) does the same and also uses an additional mutable collection of subscribers per Callable.
By the way, subscriptions violate not only the Monad laws, as noted by @P.Frolov (for flatMap); the Functor law f.map(identity) == f is broken too, especially in light of the fact that the Future newly created by map isn't equivalent to the original: it has its own separate subscriptions and Callable.
This "fire and subscribe" allows you to do stuff like:
val f = Future{...}
val f2 = f.map(...)
val f3 = f.map(...)//twice or more
Every line of this code produces a side-effect that might potentially break referential transparency, and actually does, as many have mentioned.
The reason why many authors prefer the term "referential transparency" is probably that, from a low-level perspective, we always perform some side-effects; however, only a subset of those (usually the more high-level ones) actually makes your code "non-functional".
As for futures, breaking referential transparency is especially disruptive because it also leads to non-determinism; here the two lines may be printed in either order:
val f1 = Future {
  println("1")
}
val f2 = Future {
  println("2")
}
It gets worse when this is combined with Monads, including the for-comprehension cases mentioned by @Luka Jacobowitz. In practice, monads are used not only to flatten and merge compatible containers, but also to guarantee a [con]sequential relation. This is probably because even in abstract algebra Monads generalize consequence operators, which are meant as a general characterization of the notion of deduction.
This simply means that it's hard to reason about non-deterministic logic, even harder than about code that is merely not referentially transparent:
analyzing logs produced by Futures, or even worse by actors, is hell. No matter how many labels and how much thread-local propagation you have, everything breaks eventually.
non-deterministic (aka "sometimes appearing") bugs are the most annoying and stay in production for years(!); even extensive high-load testing (including performance tests) doesn't always catch them.
So, even in the absence of other criteria, code that is easier to reason about is essentially more functional, and Futures often lead to code that isn't.
P.S. In conclusion, if your project tolerates scalaz/cats/monix/fs2 and so on, it's better to use Tasks/Streams/Iteratees. Those libraries introduce some risk of overdesign, of course; however, IMO it's better to spend time simplifying incomprehensible scalaz code than debugging an incomprehensible bug.

How do I set the default number of threads for Scala 2.10 parallel collections?

In Scala before 2.10, I could set the parallelism in the defaultForkJoinPool (as in this answer: scala parallel collections degree of parallelism). In Scala 2.10, that API no longer exists. It is well documented that we can set the parallelism on a single collection (http://docs.scala-lang.org/overviews/parallel-collections/configuration.html) by assigning to its tasksupport property.
However, I use parallel collections all over my codebase and would not like to add an extra two lines to every single collection instantiation. Is there some way to configure the global default thread pool size so that someCollection.par.map(f(_)) automatically uses the default number of threads?
I know that the question is over a month old, but I've just had exactly the same question. Googling wasn't helpful and I couldn't find anything that looked halfway sane in the new API.
Setting -Dscala.concurrent.context.maxThreads=n as suggested here: Set the parallelism level for all collections in Scala 2.10? seemingly had no effect at all, but I'm not sure if I used it correctly (I run my application with 'java' in an environment without 'scala' installed explicitly, it might be the cause).
I don't know why scala-people removed this essential setter from the appropriate package object.
However, it's often possible to use reflection to work around an incomplete/weird interface:
def setParallelismGlobally(numThreads: Int): Unit = {
  val parPkgObj = scala.collection.parallel.`package`
  val defaultTaskSupportField = parPkgObj.getClass.getDeclaredFields.find {
    _.getName == "defaultTaskSupport"
  }.get

  defaultTaskSupportField.setAccessible(true)
  defaultTaskSupportField.set(
    parPkgObj,
    new scala.collection.parallel.ForkJoinTaskSupport(
      new scala.concurrent.forkjoin.ForkJoinPool(numThreads)
    )
  )
}
For those not familiar with the more obscure features of Scala, here is a short explanation:
scala.collection.parallel.`package`
accesses the package object with the defaultTaskSupport variable (it looks somewhat like Java's static variable, but it's actually a member variable of the package object). The backticks are required for the identifier, because package is a reserved keyword. Then we get the private final field that we want (getField("defaultTaskSupport") didn't work, presumably because getField only returns public fields), make it accessible so that we can modify it, and then replace its value with our own ForkJoinTaskSupport.
I don't yet understand the exact mechanism of the creation of parallel collections, but the source code of the Combiner trait suggests that the value of defaultTaskSupport should percolate to the parallel collections somehow.
Notice that the question is qualitatively of the same sort as a much older question: "I have Math.random() all over my codebase, how can I set the seed to a fixed number for debugging purposes?" (See e.g. : Set seed on Math.random() ). In both cases, we have some sort of global "static" variable that we implicitly use in a million different places, we want to change it, but there are no setters for this variable => we use reflection.
Ugly as hell, but it seems to work just fine. If you need to limit the total number of threads, don't forget that the garbage collector runs on a separate thread.

How do you do dependency injection with the Cake pattern without hardcoding?

I just read and enjoyed the Cake pattern article. However, to my mind, one of the key reasons to use dependency injection is that you can vary the components being used by either an XML file or command-line arguments.
How is that aspect of DI handled with the Cake pattern? The examples I've seen all involve mixing traits in statically.
Since mixing in traits is done statically in Scala, if you want to vary the traits mixed in to an object, create different objects based on some condition.
Let's take a canonical cake pattern example. Your modules are defined as traits, and your application is constructed as a simple Object with a bunch of functionality mixed in
val application =
  new Object
    with Communications
    with Parsing
    with Persistence
    with Logging
    with ProductionDataSource
application.startup
Now all of those modules have nice self-type declarations which define their inter-module dependencies, so that line only compiles if all your inter-module dependencies exist, are unique, and are well-typed. In particular, the Persistence module has a self-type which says that anything implementing Persistence must also implement DataSource, an abstract module trait. Since ProductionDataSource inherits from DataSource, everything's great, and that application construction line compiles.
But what if you want to use a different DataSource, pointing at some local database for testing purposes? Assume further that you can't just reuse ProductionDataSource with different configuration parameters, loaded from some properties file. What you would do in that case is define a new trait TestDataSource which extends DataSource, and mix it in instead. You could even do so dynamically based on a command line flag.
val application =
  if (test)
    new Object
      with Communications
      with Parsing
      with Persistence
      with Logging
      with TestDataSource
  else
    new Object
      with Communications
      with Parsing
      with Persistence
      with Logging
      with ProductionDataSource
application.startup
Now that looks a bit more verbose than we would like, particularly if your application needs to vary its construction on multiple axes. On the plus side, you usually only have one chunk of conditional construction logic like that in an application (or at worst one per identifiable component lifecycle), so at least the pain is minimized and fenced off from the rest of your logic.
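For a version that compiles on its own, here is a cut-down sketch of the same idea with only two modules. The trait bodies, the url member, the describe method and CakeDemo are all made up for this illustration; only the shape (a self-type on Persistence and a runtime flag choosing which DataSource trait to mix in) follows the description above.

```scala
// Abstract module: something that knows where the data lives.
trait DataSource {
  def url: String
}

// Two interchangeable implementations (hypothetical values).
trait ProductionDataSource extends DataSource {
  def url: String = "jdbc:prod"
}

trait TestDataSource extends DataSource {
  def url: String = "jdbc:test"
}

// The self-type says: whoever mixes in Persistence must also be a DataSource.
trait Persistence { self: DataSource =>
  def describe: String = s"persisting to $url"
}

object CakeDemo {
  // The choice of traits is static in each branch; the branch is picked at runtime.
  def build(test: Boolean): Persistence with DataSource =
    if (test) new Persistence with TestDataSource
    else new Persistence with ProductionDataSource
}
```

CakeDemo.build(test = true).describe yields "persisting to jdbc:test", and the flag could just as well come from a command-line argument or a properties file. Leaving out the DataSource trait in either branch is a compile error, which is the safety the self-type buys.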
Scala is also a scripting language, so your configuration XML can be replaced by a Scala script. It is type-safe and not a different language.
Simply look at startup:
scala -cp first.jar:second.jar startupScript.scala
is not so different than:
java -cp first.jar:second.jar com.example.MyMainClass context.xml
You can always use DI, but you have one more tool.
The short answer is that Scala doesn't currently have any built-in support for dynamic mixins.
I am working on the autoproxy-plugin to support this, although it's currently on hold until the 2.9 release, when the compiler will have new features making it a much easier task.
In the meantime, the best way to achieve almost exactly the same functionality is by implementing your dynamically added behavior as a wrapper class, then adding an implicit conversion back to the wrapped member.
Until the AutoProxy plugin becomes available, one way to achieve the effect is to use delegation:
trait Module {
  def foo: Int
}

trait DelegatedModule extends Module {
  var delegate: Module = _
  def foo = delegate.foo
}

class Impl extends Module {
  def foo = 1
}

// later
val composed: Module with ... with ... = new DelegatedModule with ... with ...
composed.delegate = choose() // choose is linear in the number of `Module` implementations
But beware, the downside of this is that it's more verbose, and you have to be careful about the initialization order if you use vars inside a trait. Another downside is that if there are path dependent types within Module above, you won't be able to use delegation that easily.
But if there is a large number of different implementations that can be varied, it will probably cost you less code than listing cases with all possible combinations.
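Here is a small self-contained variant of the delegation idea with the elided parts filled in by placeholders. ImplA, ImplB, the string key and DelegationDemo are hypothetical names for this illustration; it assumes Scala 2, where `var delegate: Module = _` is the usual way to leave a field uninitialized.

```scala
trait Module {
  def foo: Int
}

// Forwards every call to whichever implementation was plugged in.
trait DelegatedModule extends Module {
  var delegate: Module = _ // must be set before foo is called
  def foo: Int = delegate.foo
}

class ImplA extends Module {
  def foo: Int = 1
}

class ImplB extends Module {
  def foo: Int = 2
}

object DelegationDemo {
  // Hypothetical selection logic; the key could come from a config file.
  def choose(name: String): Module = name match {
    case "a" => new ImplA
    case _   => new ImplB
  }

  def compose(name: String): Module = {
    val composed = new DelegatedModule {}
    composed.delegate = choose(name)
    composed
  }
}
```

DelegationDemo.compose("a").foo is 1 and any other key gives 2. The cost is one forwarding method per member of Module, but the number of cases in choose grows linearly with the implementations rather than with their combinations.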
Lift has something along those lines built in. It's mostly in scala code, but you have some runtime control. http://www.assembla.com/wiki/show/liftweb/Dependency_Injection

In Scala, what is an "early initializer"?

In Martin Odersky's recent post about levels of programmer ability in Scala, in the Expert library designer section, he includes the term "early initializers".
These are not mentioned in Programming in Scala. What are they?
Early initializers are part of the constructor of a subclass that is intended to run before its superclass. For example:
abstract class X {
  val name: String
  val size = name.size
}

class Y extends {
  val name = "class Y"
} with X
If the code was written instead as
class Z extends X {
  val name = "class Z"
}
then a null pointer exception would occur when Z got initialized, because size is initialized before name in the normal ordering of initialization (superclass before class).
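The failure is easy to reproduce. The sketch below uses made-up class names (Named, EagerNamed, LazyNamed) and, instead of an early initializer, shows a different and commonly used workaround: implementing the abstract val as a lazy val in the subclass, so that it is computed on first access even if that access happens inside the superclass constructor.

```scala
abstract class Named {
  val name: String
  // Runs in the superclass constructor, before subclass vals are assigned.
  val size: Int = name.size
}

// Plain val: `name` is still null when Named's constructor reads it.
class EagerNamed extends Named {
  val name = "class Z"
}

// Swapped-in alternative to an early initializer: a lazy val is
// initialized on first access, so Named's constructor sees the real value.
class LazyNamed extends Named {
  lazy val name = "class Z"
}
```

Constructing EagerNamed throws a NullPointerException, while new LazyNamed().size is 7. The early initializer in class Y solves the same ordering problem by running the subclass definitions first.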
As far as I can tell, the motivation (as given in the link above) is:
"Naturally when a val is overridden, it is not initialized more than once. So though x2 in the above example is seemingly defined at every point, this is not the case: an overridden val will appear to be null during the construction of superclasses, as will an abstract val."
I don't see why this is natural at all. It is completely possible that the r.h.s. of an assignment might have a side effect. Note that such code structure is completely impossible in either C++ or Java (and I will guess Smalltalk, although I can't speak for that language). In fact you have to make such dual assignments implicit...ticilpmi...EXplicit in those languages via constructors. In the light of the r.h.s. side effect uncertainty, it really doesn't seem like much of a motivation at all: the ability to sidestep superclass side effects (thereby voiding superclass invariants) via ASSIGNMENT? Ick!
Are there other "killer" motivations for allowing such unsafe code structure? Object-oriented languages have done without such a mechanism for about 40 years (30-odd years, if you count from the creation of the language), why include it now?
It...just...seems...dangerous.
On second thought, a year later...
This is just cake. Literally.
Not an early ANYTHING. Just cake (mixins).
Cake is a term/pattern coined by The Grand Pooh-bah himself, one that employs Scala's trait system, which is halfway between a class and an interface. It is far better than Java's decoration pattern.
The so-called "interface" is merely an unnamed base class, and what used to be the base class is acting as a trait (which I frankly did not know could be done). It is unclear to me if a "with'd" class can take arguments (traits can't), will try it and report back.
This question and its answer has stepped into one of Scala's coolest features. Read up on it and be in awe.