Convert tuple to array in Scala - scala

What is the best way to convert a tuple into an array in Scala? Here "best" means in as few lines of code as possible. I was shocked to search Google and StackOverflow only to find nothing on this topic, which seems like it should be trivial and common. Lists have a a toArray function; why don't tuples?

Use productIterator, immediately followed by toArray:
(42, 3.14, "hello", true).productIterator.toArray
gives:
res0: Array[Any] = Array(42, 3.14, hello, true)
The type of the result shows the main reason why it's rarely used: in tuples, the types of the elements can be heterogeneous, in arrays they must be homogeneous, so that often too much type information is lost during this conversion. If you want to do this, then you probably shouldn't have stored your information in tuples in the first place.
There is simply almost nothing you can (safely) do with an Array[Any], except printing it out, or converting it to an even more degenerate Set[Any]. Instead you could use:
Lists of case classes belonging to a common sealed trait,
shapeless HLists,
a carefully chosen base class with a bit of inheritance,
or something that at least keeps some kind of schema at runtime (like Apache Spark Datasets)
they would all be better alternatives.
In the somewhat less likely case that the elements of the "tuples" that you are processing frequently turn out to have an informative least upper bound type, then it might be because you aren't working with plain tuples, but with some kind of traversable data structure that puts restrictions on the number of substructures in the nodes. In this case, you should consider implementing something like Traverse interface for the structure, instead of messing with some "tuples" manually.

Related

Scala - covariant type in mutable collections

I am new in Scala world and now I am reading the book called "Scala in Action" (by Nilanjan Raychaudhuri), namely the part called "Mutable object need to be invariant" on page 97 and I don't understand the following part which is taken directly from the mentioned book.
Assume ListBuffer is covariant and the following code snippet works without any compilation problem:
scala> val mxs: ListBuffer[String] = ListBuffer("pants")
mxs: scala.collection.mutable.ListBuffer[String] =
ListBuffer(pants)
scala> val everything: ListBuffer[Any] = mxs
scala> everything += 1
res4: everything.type = ListBuffer(1, pants)
Can you spot the problem? Because everything is of the type Any, you can store an
integer value into a collection of strings. This is a disaster waiting to happen. To avoid these kinds of problems, it’s always a
good idea to make mutable objects invariant.
I would have the following questions..
1) What type of everything is in reality? String or Any? The declaration is "val everything: ListBuffer[Any]" and hence I would expect Any and because everything should be type of Any then I don't see any problems to have Integer and String in one ListBuffer[Any]. How can I store integer value into collection of strings how they write??? Why disaster??? Why should I use List (which is immutable) instead of ListBuffer (which is mutable)? I see no difference. I found a lot of answers that mutably collections should have type invariant and that immutable collections should have covariant type but why?
2) What does the last part "res4: everything.type = ListBuffer(1, pants)" mean? What does "everything.type" mean? I guess that everything does not have any method/function or variable called type.. Why is there no ListBuffer[Any] or ListBuffer[String]?
Thanks a lot,
Andrew
1 This doesn't look like a single question, so I have to subdivide it further:
"In reality" everything is ListBuffer[_], with erased parameter type. Depending on the JVM, it holds either 32 or 64 bit references to some objects. The types ListBuffer[String] and ListBuffer[Any] is what the compiler knows about it at compile time. If it "knows" two contradictory things, then it's obviously very bad.
"I don't see any problems to have Integer and String in
one ListBuffer[Any]". There is no problem to have Int and String in ListBuffer[Any], because ListBuffer is invariant. However, in your hypothetical example, ListBuffer is covariant, so you are storing an Int in a ListBuffer[String]. If someone later gets an Int from a ListBuffer[String], and tries to interpret it as String, then it's obviously very bad.
"How can I store integer value into collection
of strings how they write?" Why would you want to do something that is obviously very bad, as explained above?
"Why disaster???" It wouldn't be a major disaster. Java has been living with covariant arrays forever. It's does not lead to cataclysms, it's just bad and annoying.
"Why should I use List (which is immutable) instead of ListBuffer (which is mutable)?" There is no absolute imperative that tells you to always use List and to never use ListBuffer. Use both when it is appropriate. In 99.999% of cases, List is of course more appropriate, because you use Lists to represent data way more often than you design complicated algorithms that require local mutable state of a ListBuffer.
"I found a lot of answers that mutably collections
should have type invariant and that immutable collections should
have covariant type but why?". This is wrong, you are over-simplifying. For example, intensional immutable sets should be neither covariant, nor invariant, but contravariant. You should use covariance, contravariance, and invariance when it's appropriate. This little silly illustration has proven unreasonably effective for explaining the difference, maybe you too find it useful.
2 This is a singleton type, just like in the following example:
scala> val x = "hello"
x: String = hello
scala> val y: x.type = x
y: x.type = hello
Here is a longer discussion about the motivation for this.
I agree with most of what #Andrey is saying I would just add that covariance and contravariance belong exclusively to immutable structures, the exercisce that the books proposes is just a example so people can understand but it is not possible to implement a mutable structure that is covariant, you won't be able to make it compile.
As an exercise you could try to implement a MutableList[+A], you'll find out that there is not way to do this without tricking the compiler putting asInstanceOf everywhere

Arrays.asList() in Scala

Is there an equivalent to Arrays.asList() in Scala?
Or rather, how would you take a String and convert it into an Array, and then a List in Scala?
Any advice would be appreciated.
One common use of Arrays.asList is to produce a list containing the given elements:
Arrays.asList(x, y, z);
The Scala equivalent to that is just
Seq(x, y, z)
The other is to turn an existing array into list:
Arrays.asList(array);
In Scala, this is
array.toSeq
(note that I use Seq instead of List here; in Scala, List is a specific implementation, not an interface. Depending on what you want to do with it, some other type can be appropriate).
Or in many cases, nothing at all. Because Array[A] is implicitly convertible to IndexedSeq[A], collection operations can be done directly on it without converting first.
The same applies to String, with a caveat that operations Lists are good at are quite uncommon for strings, so string.toList is even less likely to be appropriate.

Scala - encapsulating data in objects

Motivations
This question is about working with Lists of data in Scala, and about resorting to either tuples or class objects for holding data. Perhaps some of my assumptions are wrong, so there it goes.
My current approach
As I understand, tuples do not afford the possibility of elegantly addressing their elements beyond the provided ._1, ._2, etc. I can use them, but code will be a bit unpleasant wherever data is extracted far from the lines of code that had defined it.
Also, as I understand, a Scala Map can only use a single type declaration for its values, so it can't diversify the value type of its values except for the case of type inheritance. (to the later point, considering the use of a type hierarchy for Map values "diversity" - may seem to be very artificial unless a class hierarchy fits any "model" intuition to begin with).
So, when I need to have lists where each element contains two or more named data entities, e.g. as below one of type String and one of type List, each accessible through an intelligible name, I resort to:
case class Foo (name1: String, name2: List[String])
val foos: List[Foo] = ...
Then I can later access instances of the list using .name1 and .name2.
Shortcomings and problems I see here
When the list is very large, should I assume this is less performant or more memory consuming than using a tuple as the List's type? alternatively, is there a different elegant way of accomplishing struct semantics in Scala?
In terms of performance, I don't think there is going to be any distinction between a tuple and an instance of a cases class. In fact, a tuple is an instance of a case class.
Secondly, if you're looking for another, more readable way to get the data out of the tuple, I suggest you consider pattern matching:
val (name1, name2) = ("first", List("second", "third"))

Should Scala immutable case classes be defined to hold Seq[T], immutable.Seq[T], List[T] or Vector[T]?

If we want to define a case class that holds a single object, say a tuple, we can do it easily:
sealed case class A(x: (Int, Int))
In this case, retrieving the "x" value will take a small constant amount of time, and this class will only take a small constant amount of space, regardless of how it was created.
Now, let's assume we want to hold a sequence of values instead; we could it like this:
sealed final case class A(x: Seq[Int])
This might seem to work as before, except that now storage and time to read all of x is proportional to x.length.
However, this is not actually the case, because someone could do something like this:
val hugeList = (1 to 1000000000).toList
val a = A(hugeList.view.filter(_ == 500000000))
In this case, the a object looks like an innocent case class holding a single int in a sequence, but in fact it requires gigabytes of memory, and it will take on the order of seconds to access that single element every time.
This could be fixed by specifying something like List[T] as the type instead of Seq[T]; however, this seems ugly since it adds a reference to a specific implementation, while in fact other well behaved implementations, like Vector[T], would also do.
Another worrying issue is that one could pass a mutable Seq[T], so it seems that one should at least use immutable.Seq instead of scala.collection.Seq (although the compiler can't actually enforce the immutability at the moment).
Looking at most libraries it seems that the common pattern is to use scala.collection.Seq[T], but is this really a good idea?
Or perhaps Seq is being used just because it's the shortest to type, and in fact it would be best to use immutable.Seq[T], List[T], Vector[T] or something else?
New text added in edit
Looking at the class library, some of the most core functionality like scala.reflect.api.Trees does in fact use List[T], and in general using a concrete class seems a good idea.
But then, why use List and not Vector?
Vector has O(1)/O(log(n)) length, prepend, append and random access, is asymptotically smaller (List is ~3-4 times bigger due to vtable and next pointers), and supports cache efficient and parallelized computation, while List has none of those properties except O(1) prepend.
So, personally I'm leaning towards Vector[T] being the correct choice for something exposed in a library data structure, where one doesn't know what operations the library user will need, despite the fact that it seems less popular.
First of all, you talk both about space and time requirements. In terms of space, your object will always be as large as the collection. It doesn't matter whether you wrap a mutable or immutable collection, that collection for obvious reasons needs to be in memory, and the case class wrapping it doesn't take any additional space (except its own small object reference). So if your collection takes "gigabytes of memory", that's a problem of your collection, not whether you wrap it in a case class or not.
You then go on to argue that a problem arises when using views instead of eager collections. But again the question is what the problem actually is? You use the example of lazily filtering a collection. In general running a filter will be an O(n) operation just as if you were iterating over the original list. In that example it would be O(1) for successive calls if that collection was made strict. But that's a problem of the calling site of your case class, not the definition of your case class.
The only valid point I see is with respect to mutable collections. Given the defining semantics of case classes, you should really only use effectively immutable objects as arguments, so either pure immutable collections or collections to which no instance has any more write access.
There is a design error in Scala in that scala.Seq is not aliased to collection.immutable.Seq but a general seq which can be either mutable or immutable. I advise against any use of unqualified Seq. It is really wrong and should be rectified in the Scala standard library. Use collection.immutable.Seq instead, or if the collection doesn't need to be ordered, collection.immutable.Traversable.
So I agree with your suspicion:
Looking at most libraries it seems that the common pattern is to use scala.collection.Seq[T], but is this really a good idea?
No! Not good. It might be convenient, because you can pass in an Array for example without explicit conversion, but I think a cleaner design is to require immutability.

How do I deal with Scala collections generically?

I have realized that my typical way of passing Scala collections around could use some improvement.
def doSomethingCool(theFoos: List[Foo]) = { /* insert cool stuff here */ }
// if I happen to have a List
doSomethingCool(theFoos)
// but elsewhere I may have a Vector, Set, Option, ...
doSomethingCool(theFoos.toList)
I tend to write my library functions to take a List as the parameter type, but I'm certain that there's something more general I can put there to avoid all the occasional .toList calls I have in the application code. This is especially annoying since my doSomethingCool function typically only needs to call map, flatMap and filter, which are defined on all the collection types.
What are my options for that 'something more general'?
Here are more general traits, each of which extends the previous one:
GenTraversableOnce
GenTraversable
GenIterable
GenSeq
The traits above do not specify whether the collection is sequential or parallel. If your code requires that things be executed sequentially (typically, if your code has side effects of any kind), they are too general for it.
The following traits mandate sequential execution:
TraversableOnce
Traversable
Iterable
Seq
LinearSeq
The first one, TraversableOnce only allows you to call one method on the collection. After that, the collection has been "used". In exchange, it is general enough to accept iterators as well as collections.
Traversable is a pretty general collection that has most methods. There are some things it cannot do, however, in which case you need to go to Iterable.
All Iterable implement the iterator method, which allows you to get an Iterator for that collection. This gives it the capability for a few methods not present in Traversable.
A Seq[A] implements the function Int => A, which means you can access any element by its index. This is not guaranteed to be efficient, but it is a guarantee that each element has an index, and that you can make assertions about what that index is going to be. Contrast this with Map and Set, where you cannot tell what the index of an element is.
A LinearSeq is a Seq that provides fast head, tail, isEmpty and prepend. This is as close as you can get to a List without actually using a List explicitly.
Alternatively, you could have an IndexedSeq, which has fast indexed access (something List does not provide).
See also this question and this FAQ based on it.
The most obvious one is to use Traversable as the most general trait which will have the goodies you want. However, I think you are generally better sticking to:
Seq
IndexedSeq
Set
Map
A Seq will cover List, Vector etc, IndexedSeq will cover Vector etc etc. I found myself not using Iterable because I often need (or want) to know the size of the thing I have and back pre scala-2.8 Iterable did not provide access to this, so I kept having to turn things into sequences anyway!
Looks like Traversable and Iterable now have size methods so maybe I should go back to using them! Of course you could start "going mad" with GenTraversableOnce but that is not likely to aid in readability.