Use of the word Canonical - scala

This might be a bit ambiguous but I'm struggling to get a tangible explanation of the use of the word canonical when reading about Scala and FP in general.
Some statements I have read:
Vector is the canonical concrete type for Seq
What is the canonical way to do this in Scala?
My understanding of the canonical form (in computing) is that it represents -
The default unique representation where more than one possible representation is possible.
Is it acceptable to say that asking what is the canonical way of doing something, is really just the same as asking what is the idiomatic way of doing something?
Is there a way to discover what the canonical type is for any abstract type in the Scala hierarchy in a particular context?

As #SarveshKumarSingh points out, generally in Scala, the word canonical means the same as it does elsewhere in English. So yes as to point 1, you can use canonical like that.
Point 2 is more interesting because the Scala standard library strongly suggests that certain concrete collections are the canonical implementations of abstract traits by having the apply method on the companion object return a specific concrete class that is just upcast to the abstract type.
To tell which concrete class is actually created when you do say Seq(1, 2, 3), if you have a little bit of familiarity with the collections hierarchy, you can just take a look at the source. In this case you'll see the Builder for a Seq is a ListBuffer, which means Seq.apply will give you back a List (not a Vector, you get that from IndexedSeq).

Related

Convert tuple to array in Scala

What is the best way to convert a tuple into an array in Scala? Here "best" means in as few lines of code as possible. I was shocked to search Google and StackOverflow only to find nothing on this topic, which seems like it should be trivial and common. Lists have a a toArray function; why don't tuples?
Use productIterator, immediately followed by toArray:
(42, 3.14, "hello", true).productIterator.toArray
gives:
res0: Array[Any] = Array(42, 3.14, hello, true)
The type of the result shows the main reason why it's rarely used: in tuples, the types of the elements can be heterogeneous, in arrays they must be homogeneous, so that often too much type information is lost during this conversion. If you want to do this, then you probably shouldn't have stored your information in tuples in the first place.
There is simply almost nothing you can (safely) do with an Array[Any], except printing it out, or converting it to an even more degenerate Set[Any]. Instead you could use:
Lists of case classes belonging to a common sealed trait,
shapeless HLists,
a carefully chosen base class with a bit of inheritance,
or something that at least keeps some kind of schema at runtime (like Apache Spark Datasets)
they would all be better alternatives.
In the somewhat less likely case that the elements of the "tuples" that you are processing frequently turn out to have an informative least upper bound type, then it might be because you aren't working with plain tuples, but with some kind of traversable data structure that puts restrictions on the number of substructures in the nodes. In this case, you should consider implementing something like Traverse interface for the structure, instead of messing with some "tuples" manually.

The object-functional impedance mismatch

In OOP it is good practice to talk to interfaces not to implementations. So, e.g., you write something like this (by Seq I mean scala.collection.immutable.Seq :)):
// talk to the interface - good OOP practice
doSomething[A](xs: Seq[A]) = ???
not something like the following:
// talk to the implementation - bad OOP practice
doSomething[A](xs: List[A]) = ???
However, in pure functional programming languages, such as Haskell, you don't have subtype polymorphism and use, instead, ad hoc polymorphism through type classes. So, for example, you have the list data type and a monadic instance for list. You don't need to worry about using an interface/abstract class because you don't have such a concept.
In hybrid languages, such as Scala, you have both type classes (through a pattern, actually, and not first-class citizens as in Haskell, but I digress) and subtype polymorphism. In scalaz, cats and so on you have monadic instances for concrete types, not for the abstract ones, of course.
Finally the question: given this hybridism of Scala do you still respect the OOP rule to talk to interfaces or just talk to concrete types to take advantage of functors, monads and so on directly without having to convert to a concrete type whenever you need to use them? Put differently, is in Scala still good practice to talk to interfaces even if you want to embrace FP instead of OOP? If not, what if you chose to use List and, later on, you realized that a Vector would have been a better choice?
P.S.: In my examples I used a simple method, but the same reasoning applies to user defined types. E.g.:
case class Foo(bars: Seq[Bar], ...)
What I would attack here is your "concrete vs. interface" concept. Look at it this way: every type has an interface, in the general sense of the term "interface." A "concrete" type is just a limiting case.
So let's look at Haskell lists from this angle. What's the interface of a list? Well, lists are an algebraic data type, and all such data types have the same general form of interface and contract:
You can construct instances of the type using its constructors according to their arities and argument types;
You can observe instances of the type by matching against their constructors according to their arities and argument types;
Construction and observation are inverses—when you pattern match against a value, what you get out is exactly what was put into it.
If you look at it in these terms, I think the following rule works pretty well in either paradigm:
Choose types whose interfaces and contracts match exactly with your requirements.
If their contract is weaker than your requirements, then they won't maintain invariants that you need;
If their contracts are stronger than your requirements, you may unintentionally couple yourself to the "extra" details and limit your ability to change the program later on.
So you no longer ask whether a type is "concrete" or "abstract"—just whether it fits your requirements.
These are my two cents on this subject. In Haskell you have data types (ADTs). You have both lists (linked lists) and vectors (int-indexed arrays) but they don't share a common supertype. If your function takes a list you cannot pass it a vector.
In Scala, being it a hybrid OOP-FP language, you have subtype polymorphism too so you may not care if the client code passes a List or a Vector, just require a Seq (possibly immutable) and you're done.
I guess to answer to this question you have to ask yourself another question: "Do I want to embrace FP in toto?". If the answer is yes then you shouldn't use Seq or any other abstract superclass in the OOP sense. Of course, the exception to this rule is the use of a trait/abstract class when defining ADTs in Scala. For example:
sealed trait Tree[+A]
case object Empty extends Tree[Nothing]
case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]
In this case one would require Tree[A] as a type, of course, and then use, e.g., pattern matching to determine if it's either Empty or Node[A].
I guess my feeling about this subject is confirmed by the red book (Functional Programming in Scala). There they never use Seq, but List, Vector and so on. Also, haskellers, don't care about these problems and use lists whenever they need linked-list semantic and vectors whenever they need int-indexed-array semantic.
If, on the other hand, you want to embrace OOP and use Scala as a better Java then OK, you should follow the OOP best practice to talk to interfaces not to implementations.
If you're thinking: "I'd rather opt for mostly functional" then you should read Erik Meijer's The Curse of the Excluded Middle.

Scala: when to use explicit type annotations

I've been reading a lot of other people's Scala code recently, and one of the things that I have difficultly with (coming from Java) is a lack of explicit type annotations.
It's certainly convenient when writing code to be able to leave out type annotations -- however when reading code I often find that explicit type annotations help me to understand at a glance what code is doing more easily.
The Scala style guide (http://docs.scala-lang.org/style/types.html) doesn't seem to provide any definitive guidance on this, stating:
Use type inference where possible, but put clarity first, and favour explicitness in public APIs.
To my mind, this is a bit contradictory. While it's clearly obvious what type this variable is:
val tokens = new HashMap[String, Int]
It's not so obvious what type this one is:
val tokens = readTokens()
So, if I was putting clarity first I would probably annotate all variables where the type is not already declared on the same line.
Do any Scala practitioners have guidance on this? Am I crazy to be considering adding type annotations to my local variables? I'm particularly interested in hearing from folks who spend a lot of time reading scala code (for example, in code reviews), as well as writing it.
It's not so obvious what type this one is:
val tokens = readTokens()
Good names are important: the name is plural, ergo it returns some collection of some kind. The most general collection types in Scala are Traversable and Iterator, and they mostly share a common interface, so it's not really important which one of the two it is. The name also talks about "reading tokens", ergo it obviously should return Tokens in some fashion. And last but not least, the method call has parentheses, which according to the style guide means it has side-effects, so I wouldn't count on being able to traverse the collection more than once.
Ergo, the return type is something like
Traversable[Token]
or
Iterator[Token]
and which of the two it is doesn't really matter because their client interfaces are mostly identical.
Note also that the latter constraint (only traversing the collection once) isn't even captured in the type, even if you were providing an explicit type, you would still have to look at the name and the style!

Scalaz Bind[Seq] typeclass

I'm currently porting some code from traditional Scala to Scalaz style.
It's fairly common through most of my code to use the Seq trait in my exposed API signatures rather than a concrete type (i.e. List, Vector) directly. However, this poses some problem with Scalaz, since it doesn't provide an implementation of a Bind[Seq] typeclass.
i.e. This will work correctly.
List(1,2,3,4) >>= bindOperation
But this will not
Seq(1,2,3,4) >>= bindOperation
failing with the error could not find implicit value for parameter F0: scalaz.Bind[Seq]
I assume this is an intentional design decision in Scalaz - however am unsure about intended/best practice on how to precede.
Should I instead write my code directly to List/Vector as appropriate instead of using the more flexible Seq interface? Or should I simply define my own Bind[Seq] typeclass?
The collections library does backflips to accommodate subtyping: when you use map on a specific collection type (list, map, etc.), you'll (usually) get the same type back. It manages this through the use of an extremely complex inheritance hierarchy together with type classes like CanBuildFrom. It gets the job done (at least arguably), but the complexity doesn't feel very principled. It's a mess. Lots of people hate it.
The complexity is generally pretty easy to avoid as a library user, but for a library designer it's a nightmare. If I provide a monad instance for Seq, that means all of my users' types get bumped up the hierarchy to Seq every type they use a monadic operation.
Scalaz folks tend not to like subtyping very much, anyway, so for the most part Scalaz stays around the leaves of the hierarchy—List, Vector, etc. You can see some discussion of this decision on the mailing list, for example.
When I first started using Scalaz I wrote a lot of utility code that tried to provide instances for Seq, etc. and make them usable with CanBuildFrom. Then I stopped, and now I tend to follow Scalaz in only ever using List, Vector, Map, and Set in my own code. If you're committed to "Scalaz style", you should do that as well (or even adopt Scalaz's own IList, ISet, ==>>, etc.). You're not going to find clear agreement on best practices more generally, though, and both approaches can be made to work, so you'll just need to experiment to find which you prefer.

What are the important features of the shapeless API (in Scala), and what do they do?

I'm trying to learn shapeless (2.0.0). It seems like an amazing tool and I'm very excited about it, but I am having problems moving forward. Because there is not yet much documentation, I've been poring over examples and the source code. I am having difficulties because most examples use multiple shapeless concepts and, in the source code, one shapeless type will often make use of others, so I end up going down the shapeless rabbit hole, so to speak. I think it would be helpful to have a list of the important features of the shapeless API along with a simple description of what each one does. As I'm clearly unqualified to make such a list, I am asking you, the humans of Stack Overflow!
For each feature, please include as much as you can of the following:
The feature's name and how to import it.
A short, simple description of what it does.
Why is this feature important / why would someone bother to use it?
A simple example that uses as few other shapeless or advanced Scala concepts as possible.
By a feature of the API, I mean a single thing (e.g., a type, a function, an object, etc.), or small set of closely coupled such things, that is defined by shapeless 2.0 and can be imported and used in a program. I am not referring to general concepts such as higher order polymorphism or type-level recursion. And please only include one feature per answer. Maybe if there are enough answers and enough others also use this list, we can use the votes on the answers to rank the importance of the different features.
Note: I am aware of this feature list. I think it's great, and it has helped me a lot. However, I'm looking for something more similar to API documentation than a list of things you can do. I can understand many of the examples and infer the purposes of some features from them, but I will often get tripped up on some particular piece and be unable to figure out its function.
HList
An HList is a list-like data structure that can hold objects of multiple types. HList is actually a trait. A given HList will have a more specific type that fully specifies the types of its contents. HLists are immutable. The usual way to import HList functionality is via
import shapeless._
HLists are useful when you need an immutable collection of heterogenous objects that isn't a tuple.
HLists are constructed using HNil, which is the empty HList, and the :: operator. The following example shows how to create an HList that counts to "cat":
val hl = 1 :: 2 :: "cat" :: HNil
The type of hl above includes two Int types and a String type. Shapeless includes many useful operations on HLists, which should be the subjects of other answers.