How do I set the default number of threads for Scala 2.10 parallel collections? - scala

In Scala before 2.10, I can set the parallelism in the defaultForkJoinPool (as in this answer scala parallel collections degree of parallelism). In Scala 2.10, that API no longer exists. It is well documented that we can set the parallelism on a single collection (http://docs.scala-lang.org/overviews/parallel-collections/configuration.html) by assigning to its taskSupport property.
However, I use parallel collections all over my codebase and would not like to add an extra two lines to every single collection instantiation. Is there some way to configure the global default thread pool size so that someCollection.par.map(f(_)) automatically uses the default number of threads?

I know that the question is over a month old, but I've just had exactly the same question. Googling wasn't helpful and I couldn't find anything that looked halfway sane in the new API.
Setting -Dscala.concurrent.context.maxThreads=n as suggested here: Set the parallelism level for all collections in Scala 2.10? seemingly had no effect at all, but I'm not sure if I used it correctly (I run my application with 'java' in an environment without 'scala' installed explicitly, it might be the cause).
I don't know why scala-people removed this essential setter from the appropriate package object.
However, it's often possible to use reflection to work around an incomplete/weird interface:
def setParallelismGlobally(numThreads: Int): Unit = {
val parPkgObj = scala.collection.parallel.`package`
val defaultTaskSupportField = parPkgObj.getClass.getDeclaredFields.find{
_.getName == "defaultTaskSupport"
}.get
defaultTaskSupportField.setAccessible(true)
defaultTaskSupportField.set(
parPkgObj,
new scala.collection.parallel.ForkJoinTaskSupport(
new scala.concurrent.forkjoin.ForkJoinPool(numThreads)
)
)
}
For those not familiar with the more obscure features of Scala, here is a short explanation:
scala.collection.parallel.`package`
accesses the package object with the defaultTaskSupport variable (it looks somewhat like Java's static variable, but it's actually a member variable of the package object). The backticks are required for the identifier, because package is a reserved keyword. Then we get the private final field that we want (getField("defaultTaskSupport") didn't work for some reason?...), tell it to be accessible in order to be able to modify it, and then replace it's value by our own ForkJoinTaskSupport.
I don't yet understand the exact mechanism of the creation of parallel collections, but the source code of the Combiner trait suggests that the value of defaultTaskSupport should percolate to the parallel collections somehow.
Notice that the question is qualitatively of the same sort as a much older question: "I have Math.random() all over my codebase, how can I set the seed to a fixed number for debugging purposes?" (See e.g. : Set seed on Math.random() ). In both cases, we have some sort of global "static" variable that we implicitly use in a million different places, we want to change it, but there are no setters for this variable => we use reflection.
Ugly as hell, but seems to work just fine. If you need to limit the total number of threads, don't forget that the garbage collector runs on separate thread.

Related

How to find max date from stream in scala using CompareTo?

I am new to Scala and trying to explore how I can use Java functionalities with Scala.
I am having stream of LocalDate which is a Java class and I am trying to find maximum date out of my list.
var processedResult : Stream[LocalDate] =List(javaList)
.toStream
.map { s => {
//some processing
LocalDate.parse(str, formatter)
}
}
I know we can do easily by using .compare() and .compareTo() in Java but I am not sure how do I use the same thing over here.
Also, I have no idea how Ordering works in Scala when it comes to sorting.
Can anyone suggest how can get this done?
First of all, a lot of minor details that I will point out since it seems you are pretty new to the language and I expect those to help you with your learning path.
First, avoid var at all costs, especially when learning.
While mutability has its place and is not always wrong, forcing you to avoid it while learning will help you. Particularly, avoid it when it doesn't provide any value; like in this case.
Second, this List(javaList) doesn't do what you think it does. It creates a single element Scala List whose unique element is a Java List. What you probably want is to transform that Java List into a Scala one, for that you can use the CollectionConverters.
import scala.jdk.CollectionConverters._ // This works if you are in 2.13
// if you are in 2.12 or lower use: import scala.collection.JavaConverters._
val scalaList = javaList.asScala.toList
Third, not sure why you want to use a Scala Stream, a Stream is for infinite or very large collections where you want all the transformations to be made lazily and only produce elements as they are consumed (also, btw, it was deprecated in 2.13 in favour of LazyList).
Maybe, you are confused because in Java you need a "Stream" to apply functional operations like map? If so, note that in Scala all collections provide the same rich API.
Fourth, Ordering is a Typeclass which is a functional pattern for Polymorphism. On its own, this is a very broad question so I won't answer it here, but I hope the two links provide insight.
The TL;DR; is simple, it is just that an Ordering for a type T knows how to order (sort) elements of type T. Thus operations like max will work for any collection of any type if, and only if, the compiler can prove the existence of an Ordering for that type if it can then it will pass such value implicitly to the method call for you; again the implicits topic is very broad and deserves its own question.
Now for your particular question, you can just call max or maxOption in the List or Stream and that is all.
Note that max will throw if the List is empty, whereas maxOption returns an Option which will be empty (None) for an empty input; idiomatic Scala favour the latter over the former.
If you really want to use compareTo then you can provide your own Ordering.
scalaList.maxOption(Ordering.fromLessThan[LocalDate]((d1, d2) => d1.compareTo(d2) < 0))
Ordering[A] is a type class which defines how to compare 2 elements of type A. So to compare LocalDates you need Ordering[LocalDate] instance.
LocalDate extends Comparable in Java and Scala conveniently provides instances for Comparables so when you invoke:
Ordering[java.time.LocalDate]
in REPL you'll see that Scala is able to provide you the instance without you needing to do anything (you could take a look at the list of methods provided by this typeclass).
Since you have and Ordering in implicit scope which types matches the Stream's type (e.g. Stream[LocalDate] needs Ordering[LocalDate]) you can call .max method... and that's it.
val processedResult : Stream[LocalDate] = ...
val newestDate: LocalDate = processedResult.max

Is it Scala style to use a for loop in Scala/Spark?

I have heard that it is a good practice in Scala to eliminate for loops and do things "the Scala way". I even found a Scala style checker at http://www.scalastyle.org. Are for loops a no-no in Scala? In a course at https://www.udemy.com/course/apache-spark-with-scala-hands-on-with-big-data/learn/lecture/5363798#overview I found this example, which makes me thing that for looks are okay to use, but using the Scala format and syntax of course, in a single line and not like the traditional Java for looks in multiple lines of code. See this example I found from that Udemy course:
val shipList = List("Enterprise", "Defiant", "Voyager", "Deep Space Nine")
for (ship <- shipList) {println(ship)}
That for loop prints this result, as expected:
Enterprise Defiant Voyager Deep Space Nine
I was wondering if using for as in the example above is acceptable Scala style code, or it if is a no-no and why. Thank you!
There is no problem in this for loop, but you can use functions form List object for your work in more functional way.
e.g. instead of using
val shipList = List("Enterprise", "Defiant", "Voyager", "Deep Space Nine")
for (ship <- shipList) {println(ship)}
You can use
val shipList = List("Enterprise", "Defiant", "Voyager", "Deep Space Nine")
shipList.foreach(element => println(element) )
or
shipList.foreach(println)
You can use for loops in Scala, there is no problem with that. But the difference is that this for-loop is not an expression and does not return a value, so you need to use a variable in order to return any value. Scala gives preference to work with immutable types.
In your example you print messages in the console, you need to perform a "side effect" to extract the value breaking the referencial transparency, I mean, you depend on the IO operation to extract a value, or you have mutate a variable which is in the scope which maybe is being accessed by another thread or another concurrent task thereby there is no guarantee that the value that you collect wont be what you are expecting. Obviously, all these hypothesis are related to concurrent/parallel programming and there is where Scala and the immutable style help.
To show the elements of a collection you can use a for loop, but if you want to count the total number of chars in Scala you do that using a expression like:
val chars = shipList.foldLeft(0)((a, b) => a + b.length)
To sum up, most of the times the Scala code that you will read uses immutable style of programming although not always because Scala supports the other way of coding too, but it is weird to find something using a classic Java OOP style, mutating object instances and using getters and setters.

Everything's an object in Scala

I am new to Scala and heard a lot that everything is an object in Scala. What I don't get is what's the advantage of "everything's an object"? What are things that I cannot do if everything is not an object? Examples are welcome. Thanks
The advantage of having "everything" be an object is that you have far fewer cases where abstraction breaks.
For example, methods are not objects in Java. So if I have two strings, I can
String s1 = "one";
String s2 = "two";
static String caps(String s) { return s.toUpperCase(); }
caps(s1); // Works
caps(s2); // Also works
So we have abstracted away string identity in our operation of making something upper case. But what if we want to abstract away the identity of the operation--that is, we do something to a String that gives back another String but we want to abstract away what the details are? Now we're stuck, because methods aren't objects in Java.
In Scala, methods can be converted to functions, which are objects. For instance:
def stringop(s: String, f: String => String) = if (s.length > 0) f(s) else s
stringop(s1, _.toUpperCase)
stringop(s2, _.toLowerCase)
Now we have abstracted the idea of performing some string transformation on nonempty strings.
And we can make lists of the operations and such and pass them around, if that's what we need to do.
There are other less essential cases (object vs. class, primitive vs. not, value classes, etc.), but the big one is collapsing the distinction between method and object so that passing around and abstracting over functionality is just as easy as passing around and abstracting over data.
The advantage is that you don't have different operators that follow different rules within your language. For example, in Java to perform operations involving objects, you use the dot name technique of calling the code (static objects still use the dot name technique, but sometimes the this object or the static object is inferred) while built-in items (not objects) use a different method, that of built-in operator manipulation.
Number one = Integer.valueOf(1);
Number two = Integer.valueOf(2);
Number three = one.plus(two); // if only such methods existed.
int one = 1;
int two = 2;
int three = one + two;
the main differences is that the dot name technique is subject to polymorphisim, operator overloading, method hiding, and all the good stuff that you can do with Java objects. The + technique is predefined and completely not flexible.
Scala circumvents the inflexibility of the + method by basically handling it as a dot name operator, and defining a strong one-to-one mapping of such operators to object methods. Hence, in Scala everything is an object means that everything is an object, so the operation
5 + 7
results in two objects being created (a 5 object and a 7 object) the plus method of the 5 object being called with the parameter 7 (if my scala memory serves me correctly) and a "12" object being returned as the value of the 5 + 7 operation.
This everything is an object has a lot of benefits in a functional programming environment, for example, blocks of code now are object too, making it possible to pass back and forth blocks of code (without names) as parameters, yet still be bound to strict type checking (the block of code only returns Long or a subclass of String or whatever).
When it boils down to it, it makes some kinds of solutions very easy to implement, and often the inefficiencies are mitigated by the lack of need to handle "move into primitives, manipulate, move out of primitives" marshalling code.
One specific advantage that comes to my mind (since you asked for examples) is what in Java are primitive types (int, boolean ...) , in Scala are objects that you can add functionality to with implicit conversions. For example, if you want to add a toRoman method to ints, you could write an implicit class like:
implicit class RomanInt(i:Int){
def toRoman = //some algorithm to convert i to a Roman representation
}
Then, you could call this method from any Int literal like :
val romanFive = 5.toRoman // V
This way you can 'pimp' basic types to adapt them to your needs
In addition to the points made by others, I always emphasize that the uniform treatment of all values in Scala is in part an illusion. For the most part it is a very welcome illusion. And Scala is very smart to use real JVM primitives as much as possible and to perform automatic transformations (usually referred to as boxing and unboxing) only as much as necessary.
However, if the dynamic pattern of application of automatic boxing and unboxing is very high, there can be undesirable costs (both memory and CPU) associated with it. This can be partially mitigated with the use of specialization, which creates special versions of generic classes when particular type parameters are of (programmer-specified) primitive types. This avoids boxing and unboxing but comes at the cost of more .class files in your running application.
Not everything is an object in Scala, though more things are objects in Scala than their analogues in Java.
The advantage of objects is that they're bags of state which also have some behavior coupled with them. With the addition of polymorphism, objects give you ways of changing the implicit behavior and state. Enough with the poetry, let's go into some examples.
The if statement is not an object, in either scala or java. If it were, you could be able to subclass it, inject another dependency in its place, and use it to do stuff like logging to a file any time your code makes use of the if statement. Wouldn't that be magical? It would in some cases help you debug stuff, and in other cases it would make your hairs grow white before you found a bug caused by someone overwriting the behavior of if.
Visiting an objectless, statementful world: Imaging your favorite OOP programming language. Think of the standard library it provides. There's plenty of classes there, right? They offer ways for customization, right? They take parameters that are other objects, they create other objects. You can customize all of these. You have polymorphism. Now imagine that all the standard library was simply keywords. You wouldn't be able to customize nearly as much, because you can't overwrite keywords. You'd be stuck with whatever cases the language designers decided to implement, and you'd be helpless in customizing anything there. Such languages exist, you know them well, they're the sequel-like languages. You can barely create functions there, but in order to customize the behavior of the SELECT statement, new versions of the language had to appear which included the features most desired. This would be an extreme world, where you'd only be able to program by asking the language designers for new features (which you might not get, because someone else more important would require some feature incompatible with what you want)
In conclusion, NOT everything is an object in scala: Classes, expressions, keywords and packages surely aren't. More things however are, like functions.
What's IMHO a nice rule of thumb is that more objects equals more flexibility
P.S. in Python for example, even more things are objects (like the classes themselves, the analogous concept for packages (that is python modules and packages). You'd see how there, black magic is easier to do, and that brings both good and bad consequences.

Scala Case Class Map Expansion

In groovy one can do:
class Foo {
Integer a,b
}
Map map = [a:1,b:2]
def foo = new Foo(map) // map expanded, object created
I understand that Scala is not in any sense of the word, Groovy, but am wondering if map expansion in this context is supported
Simplistically, I tried and failed with:
case class Foo(a:Int, b:Int)
val map = Map("a"-> 1, "b"-> 2)
Foo(map: _*) // no dice, always applied to first property
A related thread that shows possible solutions to the problem.
Now, from what I've been able to dig up, as of Scala 2.9.1 at least, reflection in regard to case classes is basically a no-op. The net effect then appears to be that one is forced into some form of manual object creation, which, given the power of Scala, is somewhat ironic.
I should mention that the use case involves the servlet request parameters map. Specifically, using Lift, Play, Spray, Scalatra, etc., I would like to take the sanitized params map (filtered via routing layer) and bind it to a target case class instance without needing to manually create the object, nor specify its types. This would require "reliable" reflection and implicits like "str2Date" to handle type conversion errors.
Perhaps in 2.10 with the new reflection library, implementing the above will be cake. Only 2 months into Scala, so just scratching the surface; I do not see any straightforward way to pull this off right now (for seasoned Scala developers, maybe doable)
Well, the good news is that Scala's Product interface, implemented by all case classes, actually doesn't make this very hard to do. I'm the author of a Scala serialization library called Salat that supplies some utilities for using pickled Scala signatures to get typed field information
https://github.com/novus/salat - check out some of the utilities in the salat-util package.
Actually, I think this is something that Salat should do - what a good idea.
Re: D.C. Sobral's point about the impossibility of verifying params at compile time - point taken, but in practice this should work at runtime just like deserializing anything else with no guarantees about structure, like JSON or a Mongo DBObject. Also, Salat has utilities to leverage default args where supplied.
This is not possible, because it is impossible to verify at compile time that all parameters were passed in that map.

scala - is it possible to force immutability on an object?

I mean if there's some declarative way to prevent an object from changing any of it's members.
In the following example
class student(var name:String)
val s = new student("John")
"s" has been declared as a val, so it will always point to the same student.
But is there some way to prevent s.name from being changed by just declaring it like immutable???
Or the only solution is to declare everything as val, and manually force immutability?
No, it's not possible to declare something immutable. You have to enforce immutability yourself, by not allowing anyone to change it, that is remove all ways of modifying the class.
Someone can still modify it using reflection, but that's another story.
Scala doesn't enforce that, so there is no way to know. There is, however, an interesting compiler-plugin project named pusca (I guess it stands for Pure-Scala). Pure is defined there as not mutating a non-local variable and being side-effect free (e.g. not printing to the console)—so that calling a pure method repeatedly will always yield the same result (what is called referentially transparent).
I haven't tried out that plug-in myself, so I can't say if it's any stable or usable already.
There is no way that Scala could do this generally.
Consider the following hypothetical example:
class Student(var name : String, var course : Course)
def stuff(course : Course) {
magically_pure_val s = new Student("Fredzilla", course)
someFunctionOfStudent(s)
genericHigherOrderFunction(s, someFunctionOfStudent)
course.someMethod()
}
The pitfalls for any attempt to actually implement that magically_pure_val keyword are:
someFunctionOfStudent takes an arbitrary student, and isn't implemented in this compilation unit. It was written/compiled knowing that Student consists of two mutable fields. How do we know it doesn't actually mutate them?
genericHigherOrderFunction is even worse; it's going to take our Student and a function of Student, but it's written polymorphically. Whether or not it actually mutates s depends on what its other arguments are; determining that at compile time with full generality requires solving the Halting Problem.
Let's assume we could get around that (maybe we could set some secret flags that mean exceptions get raised if the s object is actually mutated, though personally I wouldn't find that good enough). What about that course field? Does course.someMethod() mutate it? That method call isn't invoked from s directly.
Worse than that, we only know that we'll have passed in an instance of Course or some subclass of Course. So even if we are able to analyze a particular implementation of Course and Course.someMethod and conclude that this is safe, someone can always add a new subclass of Course whose implementation of someMethod mutates the Course.
There's simply no way for the compiler to check that a given object cannot be mutated. The pusca plugin mentioned by 0__ appears to detect purity the same way Mercury does; by ensuring that every method is known from its signature to be either pure or impure, and by raising a compiler error if the implementation of anything declared to be pure does anything that could cause impurity (unless the programmer promises that the method is pure anyway).[1]
This is quite a different from simply declaring a value to be completely (and deeply) immutable and expecting the compiler to notice if any of the code that could touch it could mutate it. It's also not a perfect inference, just a conservative one
[1]The pusca README claims that it can infer impurity of methods whose last expression is a call to an impure method. I'm not quite sure how it can do this, as checking if that last expression is an impure call requires checking if it's calling a not-declared-impure method that should be declared impure by this rule, and the implementation might not be available to the compiler at that point (and indeed could be changed later even if it is). But all I've done is look at the README and think about it for a few minutes, so I might be missing something.