Conversion of breakOut - use iterator or view? - scala

The Scala 2.13 migration guide contains a note on how to port collection.breakOut:
collection.breakOut no longer exists, use .view and .to(Collection) instead.
A few paragraphs below, an overview table has this entry:
Description: collection.breakOut no longer exists
Old Code: val xs: List[Int] = ys.map(f)(collection.breakOut)
New Code: val xs = ys.iterator.map(f).to(List)
Automatic Migration Rule: Collection213Upgrade
The scala-collection-migration rewrite rule uses .iterator. What is the difference between the two? Is there a reason to prefer one to the other?

When used like that there is no real difference.
A View can be reused while an Iterator must be discarded after it's been used once.
val list = List(1,2,3,4,5)
val view = list.view
val viewPlus1 = view.map(_ + 1).toList
view.foreach(println) // works as expected
val it = list.iterator
val itPlus1 = it.map(_ + 1).toList
it.foreach(println) // undefined behavior
In its simplest form a View[A] is a wrapper around a function () => Iterator[A], so all its methods can create a fresh Iterator[A] and delegate to the appropriate method on that iterator.
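The "wrapper around () => Iterator[A]" idea can be sketched as follows. SimpleView is a hypothetical, stripped-down class written only for illustration; it is not the standard library's View implementation, just a minimal model of why a view can be traversed more than once.

```scala
object ViewSketch {
  // Hypothetical stand-in for a View: it stores a recipe for building an
  // iterator, never the elements themselves.
  final class SimpleView[A](mkIterator: () => Iterator[A]) {
    // map composes a new recipe; nothing is evaluated yet
    def map[B](f: A => B): SimpleView[B] =
      new SimpleView(() => mkIterator().map(f))

    // every terminal operation asks for a fresh iterator
    def toList: List[A] = mkIterator().toList
  }

  val list = List(1, 2, 3, 4, 5)
  val plusOne = new SimpleView(() => list.iterator).map(_ + 1)

  // Both traversals succeed because each one starts from a new iterator.
  val firstPass = plusOne.toList
  val secondPass = plusOne.toList
}
```

Because each terminal call builds its own iterator, reusing the wrapper is safe, whereas a single Iterator would already be consumed after the first pass.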

Related

What is the difference between List.view and LazyList?

I am new to Scala and I just learned that LazyList was created to replace Stream, and at the same time they added the .view methods to all collections.
So, I am wondering why was LazyList added to Scala collections library, when we can do List.view?
I just looked at the Scaladoc, and it seems that the only difference is that LazyList has memoization, while View does not. Am I right or wrong?
Stream elements are realized lazily except for the 1st (head) element. That was seen as a deficiency.
A List view is evaluated lazily (and re-evaluated on each traversal) but, as far as I know, the underlying List has to be completely realized first.
def bang :Int = {print("BANG! ");1}
LazyList.fill(4)(bang) //res0: LazyList[Int] = LazyList(<not computed>)
Stream.fill(3)(bang) //BANG! res1: Stream[Int] = Stream(1, <not computed>)
List.fill(2)(bang).view //BANG! BANG! res2: SeqView[Int] = SeqView(<not computed>)
In 2.13, you can't force your way back from a view to the original collection type:
scala> import scala.util.chaining._
import scala.util.chaining._
scala> case class C(n: Int) { def bump = new C(n+1).tap(i => println(s"bump to $i")) }
defined class C
scala> List(C(42)).map(_.bump)
bump to C(43)
res0: List[C] = List(C(43))
scala> List(C(42)).view.map(_.bump)
res1: scala.collection.SeqView[C] = SeqView(<not computed>)
scala> .force
       ^
       warning: method force in trait View is deprecated (since 2.13.0): Views no longer know about their underlying collection type; .force always returns an IndexedSeq
bump to C(43)
res2: scala.collection.IndexedSeq[C] = Vector(C(43))
scala> LazyList(C(42)).map(_.bump)
res3: scala.collection.immutable.LazyList[C] = LazyList(<not computed>)
scala> .force
bump to C(43)
res4: res3.type = LazyList(C(43))
A function taking a view and optionally returning a strict realization would have to also take a "forcing function" such as _.toList, if the caller needs to choose the result type.
I don't do this sort of thing at my day job, but this behavior surprises me.
The difference is that a LazyList can be generated from a huge or infinite sequence, so you can do something like:
val xs = (1 to 1_000_000_000).to(LazyList)
And that won't run out of memory. After that you can operate on the lazy list with transformers. You won't be able to do the same by creating a List and taking a view from it. Having said that, SeqView has a much richer set of methods than LazyList, and that's why you can actually take a view of a LazyList like:
val xs = (1 to 1_000_000_000).to(LazyList)
val listView = xs.view
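The memoization difference raised in the question can be checked with a small counting experiment. This is a sketch for Scala 2.13 or later; the object name and the counters are made up for illustration.

```scala
object MemoizationDemo {
  var viewCalls = 0
  var lazyCalls = 0

  // Neither line evaluates the mapping function yet.
  val view = List(1, 2, 3).view.map { n => viewCalls += 1; n * 2 }
  val lazyList = LazyList(1, 2, 3).map { n => lazyCalls += 1; n * 2 }

  // Traverse each collection twice.
  val viewSums = (view.sum, view.sum)
  val lazySums = (lazyList.sum, lazyList.sum)
}
```

After both traversals, viewCalls is 6 (the view recomputes on every pass) while lazyCalls is 3 (the LazyList remembers its elements once computed).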

Best way to handle Error on basic Array

val myArray = Array("1", "2")
val error = myArray(5) // throws an ArrayIndexOutOfBoundsException
The size of myArray is not known in advance, which is why a call like the one on the second line above might happen.
First, I never really understood the reasons to use error handling for expected errors. Am I wrong to consider this practice as bad, resulting from poor coding skills or an inclination towards laziness?
What would be the best way to handle the above case?
What I am leaning towards: basic implementation (condition) to prevent accessing the data like depicted;
use Option;
use Try or Either;
use a try-catch block.
1 Avoid addressing elements through index
Scala offers a rich set of collection operations that are applied to Arrays through ArrayOps implicit conversions. This lets us use combinators like map, flatMap, take, drop, ... on arrays instead of addressing elements by index.
2 Prevent access out of range
An example I've seen often when parsing CSV-like data (in Spark):
case class Record(id:String, name: String, address:String)
val RecordSize = 3
val csvData = // some comma separated data
val records = csvData.map(line => line.split(","))
  .collect { case arr if arr.size == RecordSize =>
    Record(arr(0), arr(1), arr(2))
  }
3 Use checks that fit in the current context
If we are using monadic constructs to compose access to some resource, use a fitting way of lifting errors into the application flow.
For example, imagine we are retrieving user preferences from some repository and we want the first one:
Option
def getUserById(id: ID): Option[User]
def getPreferences(user: User): Option[Array[Preferences]]
val topPreference = for {
  user <- getUserById(id)
  preferences <- getPreferences(user)
  topPreference <- preferences.lift(0)
} yield topPreference
(or even better, applying advice #1):
val topPreference = for {
  user <- getUserById(id)
  preferences <- getPreferences(user)
  topPreference <- preferences.headOption
} yield topPreference
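To make the Option variant concrete, here is a self-contained sketch. The User and Preference case classes and the two in-memory maps are hypothetical stand-ins for the repository, which the answer leaves unspecified.

```scala
object TopPreferenceSketch {
  // Hypothetical domain types and in-memory data, for illustration only.
  final case class User(id: Int, name: String)
  final case class Preference(label: String)

  private val users = Map(1 -> User(1, "ann"), 2 -> User(2, "bob"))
  private val prefs = Map(
    1 -> Array(Preference("dark-mode"), Preference("compact")),
    2 -> Array.empty[Preference]
  )

  def getUserById(id: Int): Option[User] = users.get(id)
  def getPreferences(user: User): Option[Array[Preference]] = prefs.get(user.id)

  // A missing user, missing preferences, or an empty array all collapse to None.
  def topPreference(id: Int): Option[Preference] = for {
    user        <- getUserById(id)
    preferences <- getPreferences(user)
    top         <- preferences.headOption
  } yield top
}
```

Here topPreference(1) yields the first preference, while an unknown user (3) and a user with an empty array (2) both yield None without any index access.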
Try
def getUserById(id: ID): Try[User]
def getPreferences(user: User): Try[Array[Preferences]]
val topPreference = for {
  user <- getUserById(id)
  preferences <- getPreferences(user)
  topPreference <- Try(preferences(0))
} yield topPreference
As general guidance: Use the principle of least power.
If possible, use error-free combinators: array.drop(4).take(1)
If all that matters is having an element or not, use Option
If we need to preserve the reason why we could not find an element, use Try.
Let the types and context of the program guide you.
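The three levels of that guidance can be put side by side on the array from the question. This is a small sketch; the object and value names are chosen for illustration.

```scala
import scala.util.Try

object SafeIndexing {
  val myArray = Array("1", "2")

  // Error-free combinators: out-of-range access simply yields an empty array.
  val viaCombinators: Array[String] = myArray.drop(4).take(1)

  // Option: records only whether an element exists at that index.
  val viaLift: Option[String] = myArray.lift(5)

  // Try: also keeps the exception that explains the failure.
  val viaTry: Try[String] = Try(myArray(5))
}
```

Each step up preserves more information about the failure, at the cost of a heavier type for the caller to handle.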
If indexing myArray can be expected to error on occasion, then it sounds like Option would be the way to go.
myArray.lift(1) // Option[String] = Some(2)
myArray.lift(5) // Option[String] = None
You could use Try() but why bother if you already know what the error is and you're not interested in catching or reporting it?
Use arr.lift (available in the standard library), which returns an Option instead of throwing an exception.
Otherwise, define a safely helper.
It accesses the element safely, to avoid accidentally throwing exceptions in the middle of the code.
implicit class ArrUtils[T](arr: Array[T]) {
  import scala.util.Try
  def safely(index: Int): Option[T] = Try(arr(index)).toOption
}
Usage:
arr.safely(4)
REPL
scala> val arr = Array(1, 2, 3)
arr: Array[Int] = Array(1, 2, 3)
scala> implicit class ArrUtils[T](arr: Array[T]) {
         import scala.util.Try
         def safely(index: Int): Option[T] = Try(arr(index)).toOption
       }
defined class ArrUtils
scala> arr.safely(4)
res5: Option[Int] = None
scala> arr.safely(1)
res6: Option[Int] = Some(2)

Scala - Multiple ways of initializing containers

I am new to Scala and was wondering what is the difference between initializing a Map data structure using the following three ways:
private val currentFiles: HashMap[String, Long] = new HashMap[String, Long]()
private val currentJars = new HashMap[String, Long]
private val currentVars = Map[String, Long]
There are two different parts to your question.
First, the difference between using an explicit type or not (cases 1 and 2) applies to any class, not just containers.
val x = 1
Here the type is not explicit, and the compiler will try to figure it out using type inference. The type of x will be Int.
val x: Int = 1
Same as above, but now the type is explicit. If the expression on the right of = does not conform to Int, you will get a compiler error.
val x: Any = 1
Here we will still store a 1, but the type of the variable will be a parent class, using polymorphism.
The second part of your question is about initialization. The basic initialization is as in Java:
val x = new scala.collection.mutable.ListBuffer[Int]()
This calls the class constructor and returns a new instance of exactly that class.
Now, there is a special method called .apply that you can define and call with just parentheses, like this:
val x = Seq[Int]()
This is a shortcut for this:
val x = Seq.apply[Int]()
Notice that this is a method on the Seq object. Its return type is whatever the method declares; it is just another method. It is mostly used to return a new instance of the given type, but there are no guarantees, so check the method's documentation to be sure of the contract.
That said, val x = Map[String, Long]() returns an immutable Map; the concrete class depends on the number of entries (a specialized empty map here, immutable.HashMap once the map holds more than four entries).
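The apply shortcut is easiest to see with a class of your own. In this sketch, Box and its companion object are hypothetical names introduced only to show the mechanism; the last two lines apply the same idea to the standard Map companion.

```scala
object ApplyDemo {
  class Box(val value: Int)

  // The companion object's apply is what Box(2) resolves to.
  object Box {
    def apply(value: Int): Box = new Box(value)
  }

  val viaNew      = new Box(1)   // calls the constructor directly
  val viaSugar    = Box(2)       // sugar for Box.apply(2)
  val viaExplicit = Box.apply(3) // the desugared form

  // Same mechanism on the standard library's Map companion.
  val inferred = Map[String, Long]()
  val explicit: Map[String, Long] = Map.apply[String, Long]()
}
```

The two Map values are equal and both have the static type Map[String, Long]; which concrete class backs them is left to the library.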
Map and HashMap are almost equivalent, but not exactly the same thing.
Map is a trait, and HashMap is a class, although under the hood they may be the same thing (scala.collection.immutable.HashMap); more on that later.
When using
private val currentVars = Map[String, Long]()
You get a Map instance. In Scala, the () is syntactic sugar: under the hood you are actually calling the apply() method of the Map object. This is equivalent to:
private val currentVars = Map.apply[String, Long]()
Using
private val currentJars = new HashMap[String, Long]()
You get a HashMap instance.
In the third statement:
private val currentJars: HashMap[String, Long] = new HashMap[String, Long]()
You are simply no longer relying on type inference. This is exactly the same as the second statement:
private val currentJars: HashMap[String, Long] = new HashMap[String, Long]()
private val currentJars = new HashMap[String, Long]() // same thing
When / Which I use / Why
As for type inference, I would recommend relying on it. IMHO in this case it removes verbosity from the code where it is not really needed. But if you really miss Java-like code, then include the type :) .
Now, about the two constructors...
Map vs HashMap
Short answer
You should probably always go with Map(): it is shorter, already imported, and returns a trait (like a Java interface). This last reason is nice because when passing this Map around you won't rely on implementation details, since Map is just an interface for what you want or need.
On the other side, HashMap is an implementation.
Long answer
Map is not always a HashMap.
As seen in Programming in Scala, Map.apply[K, V]() can return a different class depending on how many key-value pairs you pass to it:
Number of elements    Implementation
0                     scala.collection.immutable.EmptyMap
1                     scala.collection.immutable.Map1
2                     scala.collection.immutable.Map2
3                     scala.collection.immutable.Map3
4                     scala.collection.immutable.Map4
5 or more             scala.collection.immutable.HashMap
When you have fewer than 5 elements, you get a special class for each of these small collections, and when you have an empty Map, you get a singleton object.
This is done mostly to get better performance.
You can try it out in the REPL:
import scala.collection.immutable.HashMap
val m2 = Map(1 -> 1, 2 -> 2)
m2.isInstanceOf[HashMap[Int, Int]]
// false
val m5 = Map(1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4, 5 -> 5, 6 -> 6)
m5.isInstanceOf[HashMap[Int, Int]]
// true
If you are really curious you can even take a look at the source code.
So, even for performance you should also probably stick with Map().

Using getOrElseUpdate of TrieMap in Scala

I am using the getOrElseUpdate method of scala.collection.concurrent.TrieMap (from 2.11.6)
// simplified for clarity
val trie = new TrieMap[Int, Future[String]]
def foo(): String = ... // a very long process
val fut: Future[String] = trie.getOrElseUpdate(id, Future(foo()))
As I understand it, if I invoke getOrElseUpdate from multiple threads without any synchronization, foo is invoked just once.
Is that correct?
In the current implementation, each call invokes it zero or one times. It may, however, be invoked without the result being inserted, so across several threads foo can run more than once. (This is standard behavior for CAS-based maps as opposed to ones that use synchronized.)
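The behavior described in the answer can be observed with a small experiment. This is a sketch only: the object name, the run helper, the key 42, and the sleep used to widen the race window are all made up for illustration, and the number of evaluations will vary from run to run.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TrieMapDemo {
  /** Returns (how often the by-name argument ran, distinct values the callers saw). */
  def run(callers: Int): (Int, Set[String]) = {
    val evaluations = new AtomicInteger(0)
    val trie = new TrieMap[Int, String]

    // Stand-in for the "very long process"; each run produces a distinct value.
    def expensive(): String = {
      val n = evaluations.incrementAndGet()
      Thread.sleep(10)
      s"result-$n"
    }

    // Several threads race to populate the same key.
    val futures = (1 to callers).map(_ => Future(trie.getOrElseUpdate(42, expensive())))
    val seen = Await.result(Future.sequence(futures), 10.seconds).toSet
    (evaluations.get, seen)
  }
}
```

Every caller ends up with the one value that was actually inserted, but the evaluation count may be anywhere between 1 and the number of callers. If foo must run at most once, that guarantee has to come from somewhere other than getOrElseUpdate alone.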

How to convert Enumeration to Seq/List in scala?

I'm writing a servlet, and need to get all parameters from the request. I found request.getParameterNames returns a java.util.Enumeration, so I have to write code as:
val names = request.getParameterNames
while(names.hasMoreElements) {
val name = names.nextElement
}
Is there any way to convert an Enumeration to a Seq/List, so that I can use the map method?
Use JavaConverters
See https://stackoverflow.com/a/5184386/133106
Use a wrapper Iterator
You could build up a wrapper:
val nameIterator = new Iterator[SomeType] { def hasNext = names.hasMoreElements; def next = names.nextElement }
Use JavaConversions wrapper
val nameIterator = new scala.collection.JavaConversions.JEnumerationWrapper(names)
Using JavaConversions implicits
If you import
import scala.collection.JavaConversions._
you can do it implicitly (and you’ll also get implicit conversions for other Java collections)
request.getParameterNames.map(println)
Use Iterator.continually
You might be tempted to build an iterator using Iterator.continually like an earlier version of this answer proposed:
val nameIterator = Iterator.continually((names, names.nextElement)).takeWhile(_._1.hasMoreElements).map(_._2)
but it's incorrect, as the last element of the enumeration will be discarded.
The reason is that the hasMoreElements call in the takeWhile is executed after calling nextElement in the continually, thus discarding the last value.
Current best practice (since 2.8.1) is to use scala.collection.JavaConverters
This class differs from JavaConversions slightly, in that the conversions are not fully automatic, giving you more control (this is a good thing):
import collection.JavaConverters._
val names = ...
val nameIterator = names.asScala
Using this mechanism, you'll get appropriate and type-safe conversions for most collection types via the asScala/asJava methods.
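A runnable sketch of the explicit asScala conversion is shown here. It assumes Scala 2.13 or later, where the same extension methods are provided by scala.jdk.CollectionConverters; on older versions the import is collection.JavaConverters._ as in the answer. The sample list stands in for request.getParameterNames.

```scala
import scala.jdk.CollectionConverters._

object EnumerationDemo {
  private val javaList = new java.util.ArrayList[String]
  javaList.add("hello")
  javaList.add("world")

  // Stand-in for request.getParameterNames: a fresh Enumeration on each call.
  def names: java.util.Enumeration[String] =
    java.util.Collections.enumeration(javaList)

  // asScala turns the Enumeration into a Scala Iterator.
  val asList: List[String] = names.asScala.toList
  val upperCased: List[String] = names.asScala.map(_.toUpperCase).toList
}
```

Once the enumeration is an Iterator, map, filter, toList and the rest of the collection API are available without a hand-written while loop.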
I don't disagree with any of the other answers but I had to add a type cast to get this to compile in Scala 2.9.2 and Java 7.
import scala.collection.JavaConversions._
...
val names = request.getParameterNames.asInstanceOf[java.util.Enumeration[String]].toSet
This is a comment on Debilski's answer: the Iterator.continually approach is wrong because it misses the last entry. Here's my test:
val list = new java.util.ArrayList[String]
list.add("hello")
list.add("world")
val en = java.util.Collections.enumeration(list)
val names = Iterator.continually((en, en.nextElement)).takeWhile(_._1.hasMoreElements).map(_._2)
  .foreach { name => println("name=" + name) }
Output is
name=hello
The second item (name=world) is missing!
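For comparison, here is a sketch of the failing pattern next to one possible reordering that tests hasMoreElements before calling nextElement. The reordered variant is an illustration, not something proposed in the answers; the object and helper names are invented.

```scala
object ContinuallyDemo {
  // Builds a fresh two-element Enumeration for each experiment.
  private def freshEnumeration(): java.util.Enumeration[String] = {
    val list = new java.util.ArrayList[String]
    list.add("hello")
    list.add("world")
    java.util.Collections.enumeration(list)
  }

  // Consumes an element first, then asks whether more remain: drops the last one.
  val buggy: List[String] = {
    val en = freshEnumeration()
    Iterator.continually((en, en.nextElement()))
      .takeWhile(_._1.hasMoreElements)
      .map(_._2)
      .toList
  }

  // Asks whether an element remains first, and only then consumes it.
  val reordered: List[String] = {
    val en = freshEnumeration()
    Iterator.continually(en)
      .takeWhile(_.hasMoreElements)
      .map(_.nextElement())
      .toList
  }
}
```

The first variant yields only "hello", matching the output above, while the reordered one yields both elements. The library conversions remain the simpler choice.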
I got this to work by using JavaConversions.enumerationAsScalaIterator as mentioned by others.
Note I don't have enough rep to comment on Debilski's post directly.