Scala mutable.Map put/get handles nulls in an unexpected way?

I know one is not supposed to use nulls in Scala, but it sometimes happens when interoperating with Java. The way a Scala mutable map handles this seems off, though:
scala> import scala.collection.mutable
import scala.collection.mutable
scala> val m: mutable.Map[String, String] = mutable.Map.empty
m: scala.collection.mutable.Map[String,String] = Map()
scala> m.put("Bogus", null)
res0: Option[String] = None
scala> m.get("Bogus")
res1: Option[String] = Some(null)
scala> m.getOrElse("Bogus", "default")
res2: String = null
I would have expected m.get to return None in this case. It almost seems like a bug, as if somewhere in the code there were a Some(v) instead of an Option(v).
Is there any discussion about changing this behavior?

I would have expected m.get to return None in this case.
Why? None would mean the key "Bogus" is not in the map, but you just put it in (with value null).
Java's Map API has problems distinguishing "the value for this key is null" from "this key is not in the map", but Scala's doesn't.

Null is subtype of String:
scala> implicitly[Null <:< String]
res3: Null <:< String = <function1>
Therefore null is a valid String value:
scala> val s: String = null
s: String = null
If you want to store a null as a String in a map, that is your right.
Compared to Java's Map#get (let's call it javaLikeGet), Scala's get behaves roughly as follows:
def get(k: K) = if (containsKey(k)) {
  Some(this.javaLikeGet(k))
} else {
  None
}
and not like what you have assumed:
def get(k: K) = Option(this.javaLikeGet(k))
The latter version (presumably what you thought) would get a null for an existing key, pass it to Option(...), and return None. But the former version (which imitates how the real implementation works) would notice that the key exists, and wrap the null returned by javaLikeGet into a Some.
The simple and consistent rule is:
If the key k exists, then get(k) returns Some[V], otherwise it returns None.
This is much less surprising than the strange behavior of Java's get that returns null in two entirely different situations.
This is the Billion-Dollar Mistake, but Scala is not the language that is likely to fix it, because it has to interop with Java. My guess is that there is and will be no discussion about changing this behavior, at least not until something fundamentally changes in the entire programming landscape.
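To make the difference between the two lookup styles concrete, here is a minimal sketch. The helper name getNonNull is hypothetical (it is not part of the standard library); it just chains flatMap onto get, for callers who would rather treat a stored null as absent.

```scala
import scala.collection.mutable

object NullValueLookup {
  // Hypothetical helper (not a standard-library method):
  // collapses a stored null into None and keeps every other value.
  def getNonNull[K, V](m: mutable.Map[K, V], k: K): Option[V] =
    m.get(k).flatMap(v => Option(v))

  def main(args: Array[String]): Unit = {
    val m = mutable.Map.empty[String, String]
    m.put("Bogus", null)

    println(m.contains("Bogus"))    // the key is present
    println(m.get("Bogus"))         // Some(null): key present, value is null
    println(m.get("Missing"))       // None: key absent
    println(getNonNull(m, "Bogus")) // None: the null is collapsed on request
  }
}
```

The map itself keeps the "present but null" information; you only throw it away at the call sites where you decide it does not matter.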

Related

Read a tuple from a file in Scala

my Task is to read registrations from a file given like:
Keri,345246,2
Ingar,488058,2
Almeta,422016,1
and insert them into a list(Tuple of (String, Int, Int).
So far I wrote this:
The problem is that I don't understand why I can't cast value2 and value3 to Int, even though they should be Strings because they come from an Array of Strings. Could someone tell me what my mistake is? I am relatively new to Scala.
What is the point of using Scala if you are going to write Java code?
This is how you would properly read a file as a List of case classes.
import scala.io.Source
import scala.util.Using
// Use proper names for the fields.
final case class Registration(field1: String, field2: Int, field3: Int)
// You may change the error handling logic.
def readRegistrationsFromFile(fileName: String): List[Registration] =
  Using(Source.fromFile(fileName)) { source =>
    source.getLines().map(line => line.split(',').toList).flatMap {
      case field1Raw :: field2Raw :: field3Raw :: Nil =>
        for {
          field2 <- field2Raw.toIntOption
          field3 <- field3Raw.toIntOption
        } yield Registration(field1 = field1Raw.trim, field2, field3)
      case _ =>
        None
    }.toList
  }.getOrElse(default = List.empty)
(feel free to ask any question you may have about this code)
In Scala, converting a String to an Int requires an explicit conversion; a String is never an Int, so there is nothing to cast.
If you are sure the string can be parsed into an integer, you can do it like this:
val value2 = values(1).toInt
If you cannot trust the input (and you probably should not), you can use .toIntOption, which gives you an Option[Int]: Some(n) if the value was converted successfully, or None if the string did not represent an integer.
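As a rough sketch of how .toIntOption fits the registration format from the question: the helper name parseLine and the choice of a plain tuple are assumptions for illustration, and .toIntOption itself requires Scala 2.13 or later.

```scala
object ParseFields {
  // Hypothetical helper: parses one "name,number,number" line.
  // Returns None if the line does not have exactly three fields
  // or if either numeric field is not a valid Int.
  def parseLine(line: String): Option[(String, Int, Int)] =
    line.split(',') match {
      case Array(name, a, b) =>
        for {
          x <- a.trim.toIntOption
          y <- b.trim.toIntOption
        } yield (name.trim, x, y)
      case _ =>
        None
    }

  def main(args: Array[String]): Unit = {
    println(parseLine("Keri,345246,2")) // well-formed line
    println(parseLine("Keri,oops,2"))   // second field is not a number
  }
}
```

Malformed lines become None instead of throwing a NumberFormatException, so the caller decides whether to skip them or report them.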
The previous answers are correct. I would add a few more points.
saveContent is declared as a val. This means it cannot be changed (assigned another value). You can use the Scala REPL (command-line tool) to check:
scala> val saveContent = Nil
val saveContent: collection.immutable.Nil.type = List()
scala> saveContent = 3
^
error: reassignment to val
Instead, you could use a var, although it would be more idiomatic to have an overall pattern like the one provided by Luis Miguel's answer - with pattern-matching and a for-comprehension.
You can use the Scala REPL to check the types of the variables, too. Splitting a String will always lead to more Strings, not Ints, etc.
> val values = "a,2,3".split(",")
val values: Array[String] = Array(a, 2, 3)
> values(2)
val res3: String = 3
This is why a conversion like the one in Gael's answer is necessary.
In Scala, array elements are accessed with parentheses, not square brackets. See above, and http://scalatutorials.com/tour/interactive_tour_of_scala_lists for more details.

When should I use Option.empty[A] and when should I use None in Scala?

In Scala, when I want to set something to None, I have a couple of choices: using None or Option.empty[A].
Should I just pick one and use it consistently, or are there times when I should be using one over the other?
Example:
scala> def f(str: Option[String]) = str
f: (str: Option[String])Option[String]
scala> f(None)
res0: Option[String] = None
scala> f(Option.empty)
res1: Option[String] = None
I would stick to None whenever possible, which is almost always. It is shorter and widely used. Option.empty allows you to specify the type of the underlying value, so use it when you need to help type inference. If the type is already known to the compiler, None works as expected; however, when defining a new variable,
var a = None
would cause a to be inferred as None.type, which is unlikely to be what you wanted.
You can then use one of a couple of ways to help the compiler infer what you need:
# var a = Option.empty[String]
a: Option[String] = None
# var a: Option[String] = None
a: Option[String] = None
# var a = None: Option[String] // this one is rather uncommon
a: Option[String] = None
Another place where the compiler needs help:
List(1, 2, 3).foldLeft(Option.empty[String])((a, e) => a.map(s => s + e.toString))
(The code makes no sense; it is just an example.) If you were to omit the type, or replace it with None, the type of the accumulator would be inferred as Option[Nothing] and None.type respectively.
Personally, this is the place where I would go with Option.empty; in other cases I stick with None whenever possible.
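For a fold that does something meaningful, here is a small sketch. The function firstEven is a made-up example, not something from the question; it only shows where the explicit type on the initial accumulator is needed.

```scala
object EmptyOptionInference {
  // Hypothetical example: find the first even number with a fold.
  // Option.empty[Int] fixes the accumulator type to Option[Int];
  // with a bare None it would be None.type and the lambda would not type-check.
  def firstEven(xs: List[Int]): Option[Int] =
    xs.foldLeft(Option.empty[Int]) { (acc, e) =>
      acc.orElse(if (e % 2 == 0) Some(e) else None)
    }

  def main(args: Array[String]): Unit = {
    var a = Option.empty[String] // inferred as Option[String]
    a = Some("later")            // compiles, because a is an Option[String]
    // var b = None
    // b = Some("later")         // would not compile: b has type None.type
    println(a)
    println(firstEven(List(1, 3, 4, 6)))
  }
}
```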
Short answer: use None when talking about a value, for example when passing a parameter to a function; use Option.empty[T] when defining something.
var something = Option.empty[String] means something is None for now but can become Some("hede") in the future. On the other hand, var something = None gives something the type None.type, so you can't reassign it with Some("hede"); the compiler will complain:
found : Some[String]
required: None.type
So None and Option.empty[T] are not always interchangeable. You can pass None wherever an Option[T] is expected, but you can't assign a Some[T] to something typed as None.type.
Given that Option.empty[A] simply returns None:
/** An Option factory which returns `None` in a manner consistent with
* the collections hierarchy.
*/
def empty[A] : Option[A] = None
I'd say:
As you said, be consistent throughout the codebase. Consistency means that programmers entering your codebase have one less thing to worry about: "Should I use None or Option.empty? Well, I see @cdmckay is using X throughout the codebase, so I'll use that as well."
Readability - think what conveys the point you want the most. If you were to read a particular method, would it make more sense to you if it returned an empty Option (let's disregard for a moment the fact that the underlying implementation is simply returning None) or an explicit None? IMO, I think of None as a non-existent value, as the documentation specifies:
/** This case object represents non-existent values.
*
* @author Martin Odersky
* @version 1.0, 16/07/2003
*/
The following are worksheet exports using Scala and Scalaz.
def f(str: Option[String]) = str //> f: (str: Option[String])Option[String]
f(None) //> res1: Option[String] = None
var x:Option[String]=None //> x : Option[String] = None
x=Some("test")
x //> res2: Option[String] = Some(test)
x=None
x
Now using Scalaz:
def fz(str: Option[String]) = str //> fz: (str: Option[String])Option[String]
fz(none) //> res4: Option[String] = None
var xz:Option[String]=none //> xz : Option[String] = None
xz=some("test")
xz //> res5: Option[String] = Some(test)
xz=none
xz
Note that all the statements evaluate the same way irrespective of whether you use None or Option.empty. How?
As you can see, it is important to let Scala know your intentions via the type annotation in the var x:Option[String]=None statement. This allows a later assignment of a Some. However, a simple var x=None will fail in later lines because it makes the variable x resolve to None.type and not Option[T].
I would think that one should follow the convention. For assignments I would go for the var x:Option[String]=None option. Also, whenever using None it is good to declare the type (in this case Option[String]) so that the assignment does not resolve to None.type.
Only in cases where I have no way to provide a type and I need some assignment done will I go for Option.empty.
As everyone else pointed out, it's more a matter of personal taste, in which most of the people prefer None, or, in some cases, you explicitly need to put the type because the compiler can't infer.
This question can be extrapolated to other Scala classes, such as Sequences, Map, Set, List and so on. In all of them you have several ways to define empty state. Using sequence:
Seq()
Seq.empty
Seq.empty[Type]
Of the three, I prefer the second, because:
The first (Seq()) is error-prone. It looks as if someone wanted to create a sequence and forgot to add the elements.
The second (Seq.empty) is explicit about the desire to have an empty sequence.
While the third (Seq.empty[Type]) is as explicit as the second, it is more verbose, so I don't typically use it.

Scala Syntactic Sugar for converting to `Option`

When working in Scala, I often want to parse a field of type A and convert it to an Option[A], with a single case (for example, "NA" or "") being converted to None, and the other cases being wrapped in Some.
Right now, I'm using the following matching syntax.
x match { // x stands for the value being parsed (the name is assumed here)
case "" => None
case s: String => Some(s)
}
// converts an empty String to None, and otherwise wraps it in a Some.
Is there any more concise / idiomatic way to write this?
There are more concise ways. Either of:
Option(x).filter(_ != "")
Option(x).filterNot(_ == "")
will do the trick, though it's a bit less efficient since it creates an Option and then may throw it away.
If you do this a lot, you probably want to create an extension method (or just a method, if you don't mind having the method name first):
implicit class ToOptionWithDefault[A](private val underlying: A) extends AnyVal {
  def optNot(not: A) = if (underlying == not) None else Some(underlying)
}
Now you can
scala> 47.toString optNot ""
res1: Option[String] = Some(47)
(And, of course, you can always create a method whose body is your match solution, or an equivalent one with if, so you can reuse it for that particular case.)
I'd probably use filterNot here:
scala> Option("hey").filterNot(_ == "NA")
res0: Option[String] = Some(hey)
scala> Option("NA").filterNot(_ == "NA")
res1: Option[String] = None
It requires you to think of Option as a collection with one or zero elements, but if you get into that habit it's reasonably clear.
A simple and intuitive approach includes this expression,
if (s.isEmpty) None else Some(s)
This assumes s labels the value to be otherwise matched (thanks to @RexKerr for the note).
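If more than one marker should count as "missing", the filterNot approach extends naturally. This is only a sketch: the helper name nonMissing and the particular marker set ("" and "NA") are assumptions taken from the examples in the question, and the trimming step is an extra that you may not want.

```scala
object BlankToNone {
  // Assumed set of markers that mean "no value"; adjust to your data.
  private val missing: Set[String] = Set("", "NA")

  // Hypothetical helper: null, "" and "NA" become None,
  // anything else is trimmed and wrapped in Some.
  def nonMissing(s: String): Option[String] =
    Option(s).map(_.trim).filterNot(missing)

  def main(args: Array[String]): Unit = {
    println(nonMissing("hey"))
    println(nonMissing("NA"))
    println(nonMissing(""))
  }
}
```

A Set[String] is itself a String => Boolean function, which is why it can be passed to filterNot directly.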

Why Some(null) isn't considered None?

I am curious:
scala> Some(null) == None
res10: Boolean = false
Why isn't Some(null) transformed to None?
You should use Option(null) to reach the desired effect and return None.
Some(null) just creates a new Option with a defined value (hence Some) which is actually null, and there are few valid reasons to ever create one like this in real code.
Unfortunately, null is a valid value for any AnyRef type -- a consequence of Scala's interoperability with Java. So a method that takes an object of type A and, internally, stores it inside an Option might well need to store a null inside that Option.
For example, let's say you have a method that takes the head of a list, checks whether that head corresponds to a key in a store, and returns true if it does. One might implement it like this:
def isFirstAcceptable(list: List[String], keys: Set[String]): Boolean =
  list.headOption map keys getOrElse false
So, here's the thing... if the list and keys come from some Java API, they both may well contain null! If Some(null) wasn't possible, then isFirstAcceptable(List[String](null), Set[String](null)) would return false instead of true.
I think the others in the thread do a good job explaining why Some(null) "should" exist, but if you happen to be getting Some(null) somewhere and want a quick way to turn it into None, I've done this before:
scala> val x: Option[String] = Some(null)
x: Option[String] = Some(null)
scala> x.flatMap(Option(_))
res8: Option[String] = None
And when the starting Option is a legit non-null value things work as you probably want:
scala> val y: Option[String] = Some("asdf")
y: Option[String] = Some(asdf)
scala> y.flatMap(Option(_))
res9: Option[String] = Some(asdf)
Many of Scala's WTFs can be attributed to its need for compatibility with Java. null is often used in Java as a value, perhaps indicating the absence of a value. For example, hashMap.get(key) will return null if the key is not matched.
With this in mind, consider the following possible values from wrapping a null-returning method in an Option:
if (b) Some(hashMap.get(key)) else None
// becomes -->
None // the method was not invoked;
Some(value) // the method was invoked and a value returned; or
Some(null) // the method was invoked and null was returned.
Some(null) seems sufficiently distinct from None in this case to warrant allowing it in the language.
Of course if this is not desirable in your case then simply use:
if (b) Option(hashMap.get(key)) else None
// becomes -->
None // the method was not invoked or the mapped value was null; or
Some(value) // the method was invoked and a value returned
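The two wrapping styles can be compared directly against a java.util.HashMap. This is a minimal sketch; the helper names lookup and lookupKeepingNull are made up for illustration, and the map is fixed to String keys and values to keep it short.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

object JavaMapLookup {
  // Option(...) collapses both "key absent" and "value is null" into None.
  def lookup(m: JMap[String, String], k: String): Option[String] =
    Option(m.get(k))

  // Some(...) guarded by containsKey keeps the two cases apart.
  def lookupKeepingNull(m: JMap[String, String], k: String): Option[String] =
    if (m.containsKey(k)) Some(m.get(k)) else None

  def main(args: Array[String]): Unit = {
    val m = new JHashMap[String, String]()
    m.put("present", "value")
    m.put("nulled", null)

    println(lookup(m, "nulled"))            // None
    println(lookupKeepingNull(m, "nulled")) // Some(null)
    println(lookupKeepingNull(m, "absent")) // None
  }
}
```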
As a simple thought experiment, consider two lists of Strings, one of length 5 and one of length 20.
Because we're running on the JVM, it's possible to insert null as a valid element into one of these lists, so put that in the long list as element #10.
What, then, should the difference be in the values returned from the two following expressions?
EDIT: Exchanged get for lift, I was thinking of maps...
shortList.lift(10) //this element doesn't exist
longList.lift(10) //this element exists, and contains null
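The thought experiment is easy to run. In this sketch the filler value "x" and the way the lists are built are arbitrary choices; only the two lift calls come from the answer above.

```scala
object LiftWithNull {
  // A list of length 5, and a list of length 20 whose element at index 10 is null.
  val shortList: List[String] = List.fill(5)("x")
  val longList: List[String] = List.fill(20)("x").updated(10, null)

  def main(args: Array[String]): Unit = {
    println(shortList.lift(10)) // None: index 10 does not exist
    println(longList.lift(10))  // Some(null): index 10 exists and holds null
  }
}
```

If Some(null) were silently turned into None, the two results would be indistinguishable, even though only one of the lists actually has an element at that index.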
Because Option is considered to be a Functor, and being a Functor means that it:
Has a unit function (apply, or just Option("blah") in Scala)
Has a map function which transforms the value with a function T => B but does not change the context
Obeys the 2 Functor laws: the identity law and the associative (composition) law
In this topic the main part is #2: Option(1).map(t => null) cannot transform the context. Some should remain; otherwise it breaks the associative law!
Just consider the following laws example:
def identity[T](v: T) = v
def f1(v: String) = v.toUpperCase
def f2(v: String) = v + v
def fNull(v: String): String = null
val opt = Option("hello")
//identity law
opt.map(identity) == opt //Some(hello) == Some(hello)
//associative law
opt.map(f1 _ andThen f2) == opt.map(f1).map(f2) //Some(HELLOHELLO) == Some(HELLOHELLO)
opt.map(fNull _ andThen f2) == opt.map(fNull).map(f2) //Some(nullnull) == Some(nullnull)
But what if Option("hello").map(t=>null) produced None? Associative law would be broken:
opt.map(fNull _ andThen f2) == opt.map(fNull).map(f2) //Some(nullnull) != None
That is my thought, might be wrong

Is this the proper way to initialize null references in Scala?

Let's say I have a MyObject instance which is not initialized:
var a:MyObject = null
Is this the proper way to initialize it to null?
Alternatives
Use null as a last resort. As already mentioned, Option replaces most usages of null. If you are using null to implement deferred initialisation of a field with some expensive calculation, you should use a lazy val instead.
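A minimal sketch of the lazy val alternative follows. The class name Report, the field summary and the evaluation counter are invented for illustration; the counter is only there to show that the body runs once, on first access.

```scala
object LazyInit {
  final class Report {
    // Counts how often the initializer runs (for demonstration only).
    var evaluations: Int = 0

    // Not computed at construction time; computed once on first access,
    // so no null placeholder is needed.
    lazy val summary: String = {
      evaluations += 1
      "expensive summary"
    }
  }

  def main(args: Array[String]): Unit = {
    val r = new Report
    println(r.evaluations) // 0: nothing has been computed yet
    println(r.summary)     // triggers the computation
    println(r.summary)     // reuses the cached value
    println(r.evaluations) // 1: the initializer ran exactly once
  }
}
```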
Canonical initialisation to null
That said, Scala does support null. I personally use it in combination with Spring Dependency Injection.
Your code is perfectly valid. However, I suggest that you use var t: T = _ to initialize t to its default value. If T is a primitive, you get the default specific to the type. Otherwise you get null.
Not only is this more concise, but it is necessary when you don't know in advance what T will be:
scala> class A[T] { var t: T = _ }
defined class A
scala> new A[String].t
res0: String = null
scala> new A[Object].t
res1: java.lang.Object = null
scala> new A[Int].t
res2: Int = 0
scala> new A[Byte].t
res3: Byte = 0
scala> new A[Boolean].t
res4: Boolean = false
scala> new A[Any].t
res5: Any = null
Advanced
Using var t: T = null is a compile error if T is unbounded:
scala> class A[T] { var t: T = null }
<console>:5: error: type mismatch;
found : Null(null)
required: T
class A[T] { var t: T = null }
You can add an implicit parameter as evidence that T is nullable -- a subtype of AnyRef, not a subtype of NotNull. This isn't fully baked, even in Scala 2.8, so just consider it a curiosity for now.
scala> class A[T](implicit ev: Null <:< T) { var t: T = null }
defined class A
The canonical answer is don't use null. Instead, use an option type:
var a = None : Option[MyObject]
When you want to set it:
a = Some(foo)
And when you want to read from it, test for None:
a match {
  case None => Console.println("not here")
  case Some(value) => Console.println("got: "+value)
}
As David and retronym have already mentioned, it's a good idea to use Option in most cases, as Option makes it more obvious that you have to handle a no-result situation. However, returning Some(x) requires an object creation, and calling .get or .getOrElse can be more expensive than an if-statement. Thus, in high-performance code, using Option is not always the best strategy (especially in collection-lookup code, where you may look up a value very many times and do not want correspondingly many object creations). Then again, if you're doing something like returning the text of an entire web page (which might not exist), there's no reason not to use Option.
Also, just to add to retronym's point on generics with null, you can do this in a fully-baked way if you really mean it should be null:
class A[T >: Null] { var t: T = null }
and this works in 2.7 and 2.8. It's a little less general than the <:< method, because it doesn't obey NotNull AFAIK, but it otherwise does exactly what you'd hope it would do.
I came across this question since scalastyle told me to not use null when initialising an object within my test with null.
My solution without changing any type that satisfied scalastyle:
var a: MyObject = (None: Option[MyObject]).orNull