Throwaway values - what is the best practice in Scala? - scala

WartRemover's NonUnitStatements requires that statements that aren't returning unit must have an assignment. OK, but sometimes we have to use annoying Java APIs that both mutate and return a value, and we don't actually hardly ever care about the returned value.
So I end up trying this:
val _ = mutateSomething(foo)
But if I have multiple of these, _ is actually a legit val that has been assigned to, so I cannot reassign. Wartremover also rightly will admonish for gratuitous var-usage, so I can't just do var _ =.
I can do the following (need the ; to avoid Scala thinking it is a continue definition unless I add a full newline everytime I do this).
;{val _ = mutateSomething(foo)}
Is there a better way?

My general thoughts about linting tools is that you should not jump through hoops to satisfy them.
The point is to make your code better, both with fewer bugs and stylistically. But just assigning to var _ = does not achieve this. I would first be sure I really really don't care about the return value, not even asserting that it is what I expect it to be. If I don't, I would just add a #SuppressWarnings(Array("org.wartremover.warts.NonUnitStatements")) and maybe a comment about why and be done with it.
Scala is somewhat unique in that it's a somewhat opinionated language that also tries to integrate with another not-so-opinionated language. This leads to pain points. There are different philosophies to dealing with this, but I tend to not sweat it, and just be aware of and try to isolate the boundaries.

I'm posting an answer, but the real credit goes to Shane Delmore for pointing this out:
def discard(evaluateForSideEffectOnly: Any): Unit = {
val _: Any = evaluateForSideEffectOnly
() //Return unit to prevent warning due to discarding value
}
Alternatively (or see #som-snytt's comments below):
#specialized def discard[A](evaluateForSideEffectOnly: A): Unit = {
val _: A = evaluateForSideEffectOnly
() //Return unit to prevent warning due to discarding value
}
Then use it like: discard{ badMutateFun(foo) }.
Unlike the ;{ val _ = ... } solution, it looks way better, and also works in the beginning of a block without needing to alter style (a ; can't come at the start of a block, it must come after a statement).

Related

(Scala) Am I using Options correctly?

I'm currently working on my functional programming - I am fairly new to it. Am i using Options correctly here? I feel pretty insecure on my skills currently. I want my code to be as safe as possible - Can any one point out what am I doing wrong here or is it not that bad? My code is pretty straight forward here:
def main(args: Array[String]): Unit =
{
val file = "myFile.txt"
val myGame = Game(file) //I have my game that returns an Option here
if(myGame.isDefined) //Check if I indeed past a .txt file
{
val solutions = myGame.get.getAllSolutions() //This returns options as well
if(solutions.isDefined) //Is it possible to solve the puzzle(crossword)
{
for(i <- solutions.get){ //print all solutions to the crossword
i.solvedCrossword foreach println
}
}
}
}
-Thanks!! ^^
When using Option, it is recommended to use match case instead of calling 'isDefined' and 'get'
Instead of the java style for loop, use higher-order function:
myGame match {
case Some(allSolutions) =>
val solutions = allSolutions.getAllSolutions
solutions.foreach(_.solvedCrossword.foreach(println))
case None =>
}
As a rule of thumb, you can think of Option as a replacement for Java's null pointer. That is, in cases where you might want to use null in Java, it often makes sense to use Option in Scala.
Your Game() function uses None to represent errors. So you're not really using it as a replacement for null (at least I'd consider it poor practice for an equivalent Java method to return null there instead of throwing an exception), but as a replacement for exceptions. That's not a good use of Option because it loses error information: you can no longer differentiate between the file not existing, the file being in the wrong format or other types of errors.
Instead you should use Either. Either consists of the cases Left and Right where Right is like Option's Some, but Left differs from None in that it also takes an argument. Here that argument can be used to store information about the error. So you can create a case class containing the possible types of errors and use that as an argument to Left. Or, if you never need to handle the errors differently, but just present them to the user, you can use a string with the error message as the argument to Left instead of case classes.
In getAllSolutions you're just using None as a replacement for the empty list. That's unnecessary because the empty list needs no replacement. It's perfectly fine to just return an empty list when there are no solutions.
When it comes to interacting with the Options, you're using isDefined + get, which is a bit of an anti pattern. get can be used as a shortcut if you know that the option you have is never None, but should generally be avoided. isDefined should generally only be used in situations where you need to know whether an option contains a value, but don't need to know the value.
In cases where you need to know both whether there is a value and what that value is, you should either use pattern matching or one of Option's higher-order functions, such as map, flatMap, getOrElse (which is kind of a higher-order function if you squint a bit and consider by-name arguments as kind-of like functions). For cases where you want to do something with the value if there is one and do nothing otherwise, you can use foreach (or equivalently a for loop), but note that you really shouldn't do nothing in the error case here. You should tell the user about the error instead.
If all you need here is to print it in case all is good, you can use for-comprehension which is considered quite idiomatic Scala way
for {
myGame <- Game("mFile.txt")
solutions <- myGame.getAllSolutions()
solution <- solutions
crossword <- solution.solvedCrossword
} println(crossword)

Is it safe to remove elements from a collection.mutable.HashSet during iteration?

A mutable Set's retain method is implemented as follows:
def retain(p: A => Boolean): Unit =
for (elem <- this.toList) // SI-7269 toList avoids ConcurrentModificationException
if (!p(elem)) this -= elem
But if I implement my own method that doesn't make a copy for iterating, nothing blows up.
def dumbRetain[A](self: mutable.Set[A], p: A => Boolean): Unit =
for (elem <- self)
if (!p(elem)) self -= elem
dumbRetain(mutable.HashSet(1,2,3,4,5,6), Set(2,4,6))
// everything is ok
I see that SI-7269's test case uses the JavaConversions wrapper around a java Set/Map, and it seems like the issue arises from the underlying java collection.
I know there will never be a java collection passed to my algorithm, so can I use dumbRetain without worrying about the ConcurrentModificationException? Or is this "coincidental behavior" that I shouldn't rely on?
edit to clarify, I would be using dumbRetain as an implementation detail in an algorithm which would be in full control of what it passes to dumbRetain. And this would be run in a single-threaded context.
This seems to rely on the specific implementation of mutable.HashSet, and there is nothing in the API that guarantees that it would work for all other implementations of mutable.Set, even if we exclude all wrappers for the Java collections.
The for-loop
for (elem <- self) {
...
}
is desugared into foreach, which for mutable.HashSet is implemented as follows:
override def foreach[U](f: A => U) {
var i = 0
val len = table.length
while (i < len) {
val curEntry = table(i)
if (curEntry ne null) f(entryToElem(curEntry))
i += 1
}
}
Essentially, it simply loops through the Array of the underlying FlatHashTable, and invokes the passed function f on every element. The whole foreach simply does not have any lines which could throw anything, it doesn't check for concurrent [footnote-1] modifications at all.
A ConcurrentModificationException seems to be the less troubling case: at least, your program fails fast, and even returns a detailed stack trace that points to the line in which the problem occurred. It would be actually much worse if it simply deteriorated into undefined behavior without throwing anything. This would be the worst case. However, this worst case shouldn't occur for collections from the standard library: Throw ConcurrentModificationException exception's in scala collections? #188
Quote:
In scala/scala#5295 (merged in to 2.12.x) I made sure that removing the element last returned from an iterator would not cause a problem for the iterator.
So, as long as you clearly state in the documentation that only the collections from standard library are supported, you will most likely not have any problems using it in your own code. But if you use it in a public interface, this would be an invitation for a bug analogous to "SI-7269" quoted in your question.
[footnote-1] "concurrent" as in "ConcurrentModificationException", not as in "concurrently executed threads".
EDIT: I've tried to choose less ambiguous formulations. Great Thanks #Dima for the feedback and the numerous suggestions.
Yeah, you can do it, as long as you are sure this is the scala's native HashSet implementation, not a wrapper around java ... and with understanding, that this is not thread-safe, and should never be used concurrently (the original HashSet.retain is that way too as well as the other mutators).
Better yet, just use immutable Set.filter, unless you actually have real hard evidence (not just intuition) demonstrating that your specific case absolutely requires mutable container.

scala programming practice with option

May be my design is flawed (most probably it is) but I have been thinking about the way Option is used in Scala and I am not so very happy about it. Let's say I have 3 methods calling one another like this:
def A(): reads a file and returns something
def B(): returns something
def C(): Side effect (writes into DB)
and C() calls B() and in turn B() calls A()
Now, as A() is dependent on I/O ops, I had to handle the exceptions and return and Option otherwise it won't compile (if A() does not return anything). As B() receives an Option from A() and it has to return something, it is bound to return another Option to C(). So, you can possibly imagine that my code is flooded with match/case Some/case None (don't have the liberty to use getOrElse() always). And, if C() is dependent on some other methods which also return Option, you would be scared to look at the definition of C().
So, am I missing something? Or how flawed is my design? How can I improve it?
Using match/case on type Option is often useful when you want to throw away the Option and produce some value after processing the Some(...) but a different value of the same type if you have a None. (Personally, I usually find fold to be cleaner for such situations.)
If, on the other hand, you're passing the Option along, then there are other ways to go about it.
def a():Option[DataType] = {/*read new data or fail*/}
def b(): Optioon[DataType] = {
... //some setup
a().map{ inData =>
... //inData is real, process it for output
}
}
def c():Unit = {
... //some setup
b().foreach{ outData =>
... //outData is real, write it to DB
}
}
am I missing something?
Option is one design decision, but there can be others. I.e what happens when you want to describe the error returned by the API? Option can only tell you two kinds of state, either I have successfully read a value, or I failed. But sometimes you really want to know why you failed. Or more so, If I return None, is it because the file isn't there or because I failed on an exception (i.e. I don't have permission to read the file?).
Whichever path you choose, you'll usually be dealing with one or more effects. Option is one such effect which representing a partial function, i.e. this operation may not yield a result. While using pattern matching with Option, as other said, is one way of handling it, there are other operations which decrease the verbosity.
For example, if you want to invoke an operation in case the value exists and another in case it isn't and they both have the same return type, you can use Option.fold:
scala> val maybeValue = Some(1)
maybeValue: Some[Int] = Some(1)
scala> maybeValue.fold(0)(x => x + 1)
res0: Int = 2
Generally, there are many such combinators defined on Option and other effects, and they might seem cumbersome at the beginning, later they come to grow on you and you see their real power when you want to compose operations one after the other.

Scala: Try .getOrElse vs if/else

I'm a fairly new Scala developer. I am an experienced Java developer and so far I've been enjoying Scala's simplicity. I really like the functional constructs and quite often they force you to write cleaner code. However recently I noticed due to comfort and simplicity I end up using constructs I wouldn't necessarily use in Java and would actually be considered a bad practice e.g.
private def convertStringToSourceIds(value: String) : Seq[Integer] = {
Try(value.split(",").toSeq.map(convertToSourceId(_))).getOrElse(Seq())
}
The same code snippet can be written as
private def convertStringToSourceIds(value: String) : Seq[Integer] = {
if(value!=null) value.split(",").toSeq.map(convertToSourceId(_)) else Seq()
}
A part of me realizes that the Try/getOrElse block is designed with Options in mind but quite often it makes code more readable and handles cases you might have missed (which of course isn't always a good thing).
I would be interested to know what is the opinion of an experienced Scala developer on the matter.
I am not claiming any "experience" title but I much prefer your second construct for a few reasons
Throwing an exception (an NPE in this case) is expensive and best avoided; it should remain just that, exceptional
if are expressions in Scala, which avoids declaring "dangling" variables to hold the result of the test (just like ternary operators). Alternatively the match..case construct provides for very readable code.
I would personally return an Option[Seq[Integer]] to "pass back" the information that the values was null and facilitate further chaining of your function.
Something like
private def convertStringToSourceIds(value: String) : Option[Seq[Integer]] = value match {
case null => None
case _ => Some(value.split(",").map(convertToSourceId(_)))
}
note 1: not sure you need the toSeq
note 2: for good or bad, looks a bit Haskellish
The combination of Scala + FP makes it almost doubly certain you will get different opinions :)
Edit
Please read comments below for additional reasons and alternatives, i.e,
def convertStringToSourceIds(value: String): Option[Array[String]] = Option(value).map(_.split(",").map(convertToSourceId(_)))
If you can, use Options instead of null to show when a value is missing.
Assuming that you can't use Options, a more readable way to handle this could be
private def convertStringToSourceIds(value: String) : Seq[Integer] = value match {
case null => Seq();
case s => s.split(",").toSeq.map(convertToSourceId(_));
}

What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?

What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?
A common pattern in I/O programming is to do things like this:
while ((bytesRead = in.read(buffer)) != -1) { ...
But this is not possible in Scala because...
bytesRead = in.read(buffer)
.. returns Unit, not the new value of bytesRead.
Seems like an interesting thing to leave out of a functional language.
I am wondering why it was done so?
I advocated for having assignments return the value assigned rather than unit. Martin and I went back and forth on it, but his argument was that putting a value on the stack just to pop it off 95% of the time was a waste of byte-codes and have a negative impact on performance.
I'm not privy to inside information on the actual reasons, but my suspicion is very simple. Scala makes side-effectful loops awkward to use so that programmers will naturally prefer for-comprehensions.
It does this in many ways. For instance, you don't have a for loop where you declare and mutate a variable. You can't (easily) mutate state on a while loop at the same time you test the condition, which means you often have to repeat the mutation just before it, and at the end of it. Variables declared inside a while block are not visible from the while test condition, which makes do { ... } while (...) much less useful. And so on.
Workaround:
while ({bytesRead = in.read(buffer); bytesRead != -1}) { ...
For whatever it is worth.
As an alternate explanation, perhaps Martin Odersky had to face a few very ugly bugs deriving from such usage, and decided to outlaw it from his language.
EDIT
David Pollack has answered with some actual facts, which are clearly endorsed by the fact that Martin Odersky himself commented his answer, giving credence to the performance-related issues argument put forth by Pollack.
This happened as part of Scala having a more "formally correct" type system. Formally-speaking, assignment is a purely side-effecting statement and therefore should return Unit. This does have some nice consequences; for example:
class MyBean {
private var internalState: String = _
def state = internalState
def state_=(state: String) = internalState = state
}
The state_= method returns Unit (as would be expected for a setter) precisely because assignment returns Unit.
I agree that for C-style patterns like copying a stream or similar, this particular design decision can be a bit troublesome. However, it's actually relatively unproblematic in general and really contributes to the overall consistency of the type system.
Perhaps this is due to the command-query separation principle?
CQS tends to be popular at the intersection of OO and functional programming styles, as it creates an obvious distinction between object methods that do or do not have side-effects (i.e., that alter the object). Applying CQS to variable assignments is taking it further than usual, but the same idea applies.
A short illustration of why CQS is useful: Consider a hypothetical hybrid F/OO language with a List class that has methods Sort, Append, First, and Length. In imperative OO style, one might want to write a function like this:
func foo(x):
var list = new List(4, -2, 3, 1)
list.Append(x)
list.Sort()
# list now holds a sorted, five-element list
var smallest = list.First()
return smallest + list.Length()
Whereas in more functional style, one would more likely write something like this:
func bar(x):
var list = new List(4, -2, 3, 1)
var smallest = list.Append(x).Sort().First()
# list still holds an unsorted, four-element list
return smallest + list.Length()
These seem to be trying to do the same thing, but obviously one of the two is incorrect, and without knowing more about the behavior of the methods, we can't tell which one.
Using CQS, however, we would insist that if Append and Sort alter the list, they must return the unit type, thus preventing us from creating bugs by using the second form when we shouldn't. The presence of side effects therefore also becomes implicit in the method signature.
I'd guess this is in order to keep the program / the language free of side effects.
What you describe is the intentional use of a side effect which in the general case is considered a bad thing.
It is not the best style to use an assignment as a boolean expression. You perform two things at the same time which leads often to errors. And the accidential use of "=" instead of "==" is avoided with Scalas restriction.
By the way: I find the initial while-trick stupid, even in Java. Why not somethign like this?
for(int bytesRead = in.read(buffer); bytesRead != -1; bytesRead = in.read(buffer)) {
//do something
}
Granted, the assignment appears twice, but at least bytesRead is in the scope it belongs to, and I'm not playing with funny assignment tricks...
You can have a workaround for this as long as you have a reference type for indirection. In a naïve implementation, you can use the following for arbitrary types.
case class Ref[T](var value: T) {
def := (newval: => T)(pred: T => Boolean): Boolean = {
this.value = newval
pred(this.value)
}
}
Then, under the constraint that you’ll have to use ref.value to access the reference afterwards, you can write your while predicate as
val bytesRead = Ref(0) // maybe there is a way to get rid of this line
while ((bytesRead := in.read(buffer)) (_ != -1)) { // ...
println(bytesRead.value)
}
and you can do the checking against bytesRead in a more implicit manner without having to type it.