Writing a function that can operate on RDD and Seq in Scala - scala

I am trying to write functions that can receive both Spark RDD and Scala native Seq, so that I can showcase the performance difference of the two approaches. However, I couldn't figure out a common type or interface for the aforesaid function parameters. Let's imagine something simple like computing the mean using a map operation. Both RDD and Seq have this operation. I've tried using the type Either[RDD[Int], Seq[Int]] but it just doesn't typecheck :/.
Any pointer would be very appreciated :)

Basically, you can't. They don't share any common superclass, besides AnyRef I guess. Their map functions have completely different signatures (parameters etc.) even though they share a name (and purpose).
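Although there is no common supertype, one possible workaround is ad-hoc polymorphism through a type class. The following is only a minimal sketch: MeanOf and meanOf are hypothetical names, and only the Seq instance is written out so that the code compiles without Spark on the classpath. An RDD instance would have to be added separately.

```scala
// Hypothetical type class: "a collection type C whose mean we can compute".
trait MeanOf[C] {
  def mean(xs: C): Double
}

object MeanOf {
  // Instance for plain Scala sequences, using map and sum as in the question.
  implicit val seqIntMean: MeanOf[Seq[Int]] = new MeanOf[Seq[Int]] {
    def mean(xs: Seq[Int]): Double =
      if (xs.isEmpty) Double.NaN else xs.map(_.toDouble).sum / xs.size
  }

  // An instance for RDD[Int] would be declared here in the same way.
  // It is omitted (an assumption of this sketch) because it needs Spark.

  // Single entry point for every type that has an instance.
  def meanOf[C](xs: C)(implicit ev: MeanOf[C]): Double = ev.mean(xs)
}
```

With this in place, MeanOf.meanOf(Seq(1, 2, 3, 4)) resolves the Seq instance at compile time. Note that the instance is looked up by the static type, so a value typed as List[Int] would need a type ascription to Seq[Int].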

Related

How to find max date from stream in scala using CompareTo?

I am new to Scala and trying to explore how I can use Java functionality from Scala.
I have a stream of LocalDate, which is a Java class, and I am trying to find the maximum date in my list.
var processedResult : Stream[LocalDate] =List(javaList)
.toStream
.map { s => {
//some processing
LocalDate.parse(str, formatter)
}
}
I know we can do this easily by using .compare() and .compareTo() in Java, but I am not sure how to use the same thing here.
Also, I have no idea how Ordering works in Scala when it comes to sorting.
Can anyone suggest how I can get this done?
First of all, a lot of minor details that I will point out since it seems you are pretty new to the language and I expect those to help you with your learning path.
First, avoid var at all costs, especially when learning.
While mutability has its place and is not always wrong, forcing you to avoid it while learning will help you. Particularly, avoid it when it doesn't provide any value; like in this case.
Second, List(javaList) doesn't do what you think it does. It creates a single-element Scala List whose only element is a Java List. What you probably want is to transform that Java List into a Scala one; for that you can use the CollectionConverters.
import scala.jdk.CollectionConverters._ // This works if you are in 2.13
// if you are in 2.12 or lower use: import scala.collection.JavaConverters._
val scalaList = javaList.asScala.toList
Third, I am not sure why you want to use a Scala Stream. A Stream is for infinite or very large collections where you want all the transformations to be made lazily and elements produced only as they are consumed (also, by the way, it was deprecated in 2.13 in favour of LazyList).
Maybe, you are confused because in Java you need a "Stream" to apply functional operations like map? If so, note that in Scala all collections provide the same rich API.
Fourth, Ordering is a Typeclass which is a functional pattern for Polymorphism. On its own, this is a very broad question so I won't answer it here, but I hope the two links provide insight.
The TL;DR is simple: an Ordering for a type T knows how to order (sort) elements of type T. Thus operations like max will work for any collection of any type if, and only if, the compiler can prove the existence of an Ordering for that type. If it can, it will pass that value implicitly to the method call for you. Again, the topic of implicits is very broad and deserves its own question.
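To make the implicit passing concrete, here is a small sketch. The Release type and its ordering are hypothetical and exist only for illustration; the point is that max receives the Ordering either implicitly or explicitly.

```scala
object OrderingSketch {
  // Hypothetical type used only for illustration.
  final case class Release(name: String, major: Int, minor: Int)

  // The Ordering instance: compare by major version, then by minor version.
  implicit val releaseOrdering: Ordering[Release] =
    Ordering.by((r: Release) => (r.major, r.minor))

  val releases: List[Release] = List(
    Release("a", 1, 4),
    Release("b", 2, 0),
    Release("c", 1, 9)
  )

  // The compiler finds releaseOrdering and passes it to max implicitly.
  val newest: Release = releases.max

  // The same call with the Ordering passed explicitly.
  val newestExplicit: Release = releases.max(releaseOrdering)
}
```

Without an Ordering[Release] in scope, the call to max would not compile, which is the "compiler can prove the existence" part of the explanation.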
Now for your particular question, you can just call max or maxOption on the List or Stream and that is all.
Note that max will throw if the List is empty, whereas maxOption returns an Option which will be empty (None) for an empty input; idiomatic Scala favours the latter over the former.
If you really want to use compareTo then you can provide your own Ordering.
scalaList.maxOption(Ordering.fromLessThan[LocalDate]((d1, d2) => d1.compareTo(d2) < 0))
Ordering[A] is a type class which defines how to compare 2 elements of type A. So to compare LocalDates you need Ordering[LocalDate] instance.
LocalDate extends Comparable in Java and Scala conveniently provides instances for Comparables so when you invoke:
Ordering[java.time.LocalDate]
in REPL you'll see that Scala is able to provide you the instance without you needing to do anything (you could take a look at the list of methods provided by this typeclass).
Since you have an Ordering in implicit scope whose type matches the Stream's element type (e.g. Stream[LocalDate] needs Ordering[LocalDate]), you can call the .max method... and that's it.
val processedResult : Stream[LocalDate] = ...
val newestDate: LocalDate = processedResult.max
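Putting the pieces from both answers together, an end-to-end version might look like the sketch below. It assumes Scala 2.13 or later (for scala.jdk.CollectionConverters, maxOption, and the built-in Ordering[LocalDate]) and ISO-formatted input strings; the question does not show its formatter, so ISO_LOCAL_DATE is an assumption, and newestDate is a hypothetical helper name.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.jdk.CollectionConverters._ // Scala 2.13+

object MaxDateSketch {
  // Assumed format (yyyy-MM-dd); the original formatter is not shown.
  val formatter: DateTimeFormatter = DateTimeFormatter.ISO_LOCAL_DATE

  // Parse every string of a Java list and return the latest date, if any.
  def newestDate(javaList: java.util.List[String]): Option[LocalDate] =
    javaList.asScala.toList
      .map(s => LocalDate.parse(s, formatter))
      .maxOption // None for an empty list instead of an exception
}
```

A plain List is used here instead of a Stream because nothing in the example needs laziness.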

Is there a way to iterate on the contents of a shapeless HMap (heterogeneous map)?

I'm trying to implement some sort of polymorphic cache when translating a source AST1 to a target AST2 (building a DSL compiler in Scala). Since I wanted the cache to retain precise types for the translation results, I had a go with shapeless HMap. It works as intended. However, at some point I need to iterate over the cache contents to dump them to a file that must document the translation process and would later be used to construct a back-translation from AST2 to AST1. Looking at the source code of HMap, I saw that there is an underlying HashMap[Any, Any] that I cannot access, since it is not a val in the HMap. I also saw that the HMap is in fact a polymorphic function value, which means I can apply it over an HList whose types correspond to a subset of the HMap's key types. What I would really like to do, though, is fold a polymorphic function that accepts polymorphic (key, value) arguments over this HMap, to retrieve its contents in another form (for instance, sliced into a tuple of standard HashMaps). Is there any way to do that?
Best.

Scalaz Bind[Seq] typeclass

I'm currently porting some code from traditional Scala to Scalaz style.
It's fairly common through most of my code to use the Seq trait in my exposed API signatures rather than a concrete type (i.e. List, Vector) directly. However, this poses some problem with Scalaz, since it doesn't provide an implementation of a Bind[Seq] typeclass.
i.e. This will work correctly.
List(1,2,3,4) >>= bindOperation
But this will not
Seq(1,2,3,4) >>= bindOperation
failing with the error could not find implicit value for parameter F0: scalaz.Bind[Seq]
I assume this is an intentional design decision in Scalaz; however, I am unsure about the intended/best practice on how to proceed.
Should I instead write my code directly to List/Vector as appropriate instead of using the more flexible Seq interface? Or should I simply define my own Bind[Seq] typeclass?
The collections library does backflips to accommodate subtyping: when you use map on a specific collection type (list, map, etc.), you'll (usually) get the same type back. It manages this through the use of an extremely complex inheritance hierarchy together with type classes like CanBuildFrom. It gets the job done (at least arguably), but the complexity doesn't feel very principled. It's a mess. Lots of people hate it.
The complexity is generally pretty easy to avoid as a library user, but for a library designer it's a nightmare. If I provide a monad instance for Seq, that means all of my users' types get bumped up the hierarchy to Seq every time they use a monadic operation.
Scalaz folks tend not to like subtyping very much, anyway, so for the most part Scalaz stays around the leaves of the hierarchy—List, Vector, etc. You can see some discussion of this decision on the mailing list, for example.
When I first started using Scalaz I wrote a lot of utility code that tried to provide instances for Seq, etc. and make them usable with CanBuildFrom. Then I stopped, and now I tend to follow Scalaz in only ever using List, Vector, Map, and Set in my own code. If you're committed to "Scalaz style", you should do that as well (or even adopt Scalaz's own IList, ISet, ==>>, etc.). You're not going to find clear agreement on best practices more generally, though, and both approaches can be made to work, so you'll just need to experiment to find which you prefer.
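The point about types being "bumped up the hierarchy" can be illustrated without Scalaz. In the sketch below, MiniBind and bindWith are hypothetical stand-ins written for this example; they are not Scalaz's actual definitions. Only a Seq instance exists, so a List that goes in comes back with the static type Seq.

```scala
object BindSeqSketch {
  // Hypothetical stand-in for a Bind type class (not Scalaz's definition).
  trait MiniBind[F[_]] {
    def bind[A, B](fa: F[A])(f: A => F[B]): F[B]
  }

  // The only instance provided is for Seq.
  implicit val seqBind: MiniBind[Seq] = new MiniBind[Seq] {
    def bind[A, B](fa: Seq[A])(f: A => Seq[B]): Seq[B] = fa.flatMap(f)
  }

  def bindWith[F[_], A, B](fa: F[A])(f: A => F[B])(implicit F: MiniBind[F]): F[B] =
    F.bind(fa)(f)

  val input: List[Int] = List(1, 2, 3)

  // F has to be fixed to Seq for the instance to be found, so the static
  // type of the result is Seq[Int] even though a List went in.
  val output: Seq[Int] = bindWith[Seq, Int, Int](input)(x => Seq(x, x * 10))
}
```

Calling bindWith(input)(...) without the explicit type arguments would infer F as List and fail to find an instance, which mirrors the "could not find implicit value" error from the question.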

What are the differences between mapcat in Clojure and flatMap in Scala in terms of what they operate on?

I understand the equivalent to flatMap in Scala is mapcat in Clojure.
I have an inkling that mapcat in clojure only works with sequences, unlike flatMap in Scala which is more flexible.
My question is - what are the differences between mapcat in Clojure and flatMap in Scala in terms of what they operate on?
Assumptions:
I understand that Scala has a rich type system and Clojure has optional typing. I'm interested to know if there is a limitation in the parameters that mapcat accepts that makes it only have a subset of flatMap's functionality.
I know a little about Scala but it seems to me flatMap is the Scala bind function in a monad and mapcat is a possible implementation of the bind function for the sequence monad in Clojure. So they are the same for sequences.
But Scala, for example, has a flatMap function for Futures: it takes a future and a mapping function and returns a future that will complete after the input future completes. This operation doesn't seem to be a simple mapcat in Clojure. It may be realized this way instead:
(defn flat-map [f mv] (mapcat (fn [v] (future (f @v))) mv))
So, no. They are not the same, not even in terms of what they operate on. In Scala, flatMap is the common name for different functions; for example, Future's flatMap coordinates input and output futures. A simple mapcat in Clojure won't work because it won't return a future.
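On the Scala side, the contrast between the two uses of flatMap can be sketched as follows. The lookupId and lookupScore functions are hypothetical placeholders for asynchronous steps; blocking with Await is done only to keep the example small.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FlatMapSketch {
  // flatMap on a List: map each element to a List and concatenate the results.
  val pairs: List[Int] = List(1, 2, 3).flatMap(x => List(x, x))

  // Hypothetical asynchronous steps.
  def lookupId(name: String): Future[Int] = Future(name.length)
  def lookupScore(id: Int): Future[Int] = Future(id * 10)

  // flatMap on a Future: start the second step once the first has completed.
  def score(name: String): Future[Int] = lookupId(name).flatMap(lookupScore)

  // Blocks until the chained future completes (for demonstration only).
  def run(): Int = Await.result(score("scala"), 5.seconds)
}
```

The List version concatenates collections, while the Future version sequences computations; they share a name and a shape, not an implementation.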
They seem very similar and appear to work on the same kind of things. From looking at the documentation and examples I can't see a functional difference.
mapcat works on sequences, and just about every clojure data type can be a sequence. If you pass something that is not already a seq to mapcat it will call seq on it automatically, so in practice you can pass just about all clojure values to mapcat. If you want to iterate over a tree you would need to call prewalk or postwalk to specify the traversal order.
In the standard Scala library, flatMap is also defined on Responder, Future, Parser, and ControlContext. None of them are sequences or particularly sequence-like. There is also a slight variation in ParseResult.
The real difference is that flatMap is polymorphic on the type, and mapcat isn't. So any type can decide to provide a "flatMap" like behaviour. That's how you get things like Futures being flatMapable.
In Clojure, mapcat is specific to the seqable type. Any seqable can be coerced into a sequence, and all sequences can be mapped and concatenated. The mapcat implementation will check if the input is seqable; if so, it will call seq on it to coerce it to a sequence, and then it will map and cat that sequence and give you back a sequence. You don't get back a result of the original type.
In Scala, if you implement IterableLike trait (I think that's the right interface), you get the default flatMap implementation which is a bit like the Clojure one minus the coercion to sequence. But, many types also provide a custom implementation of flatMap, making it generic in that way.
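As a rough illustration of flatMap being provided per type, the sketch below calls it on Option, List, and Either. It assumes Scala 2.13 or later (for toIntOption), and parse is a hypothetical helper. In each case the result has the same kind of type as the receiver, unlike mapcat, which always returns a sequence.

```scala
object FlatMapTypesSketch {
  // Hypothetical helper: parse a string into an Int if possible (Scala 2.13+).
  def parse(s: String): Option[Int] = s.toIntOption

  // Option in, Option out.
  val fromOption: Option[Int] = Option("42").flatMap(parse)

  // List in, List out; strings that fail to parse are dropped.
  val fromList: List[Int] = List("1", "x", "3").flatMap(s => parse(s))

  // Either in, Either out; a failed parse becomes a Left.
  val input: Either[String, String] = Right("7")
  val fromEither: Either[String, Int] =
    input.flatMap(s => parse(s).toRight(s"not a number: $s"))
}
```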

Create an immutable list from a java.util.Iterator

I'm using a library (JXPath) to query a graph of beans in order to extract matching elements. However, JXPath returns groups of matching elements as an instance of java.util.Iterator and I'd rather convert it into an immutable Scala list. Is there any simpler way of doing this than iterating over the iterator and creating a new immutable list at each iteration step?
You might want to rethink the need for a List. Although it feels very familiar when coming from Java, and List is the default implementation of an immutable Seq, it often isn't the best choice of collection.
The operations that list is optimal for are those already available via an iterator (basically taking consecutive head elements and prepending elements). If an iterator doesn't already give you what you need, then I can pretty much guarantee that a List won't be your best choice - a vector would be more appropriate.
Having got that out the way... The recommended technique to convert between Java and Scala collections (since Scala 2.8.1) is via scala.collection.JavaConverters. This gives you more control than JavaConversions and avoids some possible implicit conflicts.
You won't have a direct implicit conversion this way. Instead, you get asScala and asJava methods pimped onto collections, allowing you to perform the conversions explicitly.
To convert a Java iterator to a Scala iterator:
javaIterator.asScala
To convert a Java iterator to a Scala List (via the scala iterator):
javaIterator.asScala.toList
You may also want to consider converting toSeq instead of toList. In the case of iterators, this'll return a Stream - allowing you to retain the lazy behaviour of iterators within the richer Seq interface.
EDIT:
There was no toVector method at the time of writing (one was added in Scala 2.10), but (as Daniel pointed out) there's a toIndexedSeq method that returns a Vector as the default IndexedSeq subclass (just as List is the default Seq).
javaIterator.asScala.toIndexedSeq
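For reference, a self-contained version of these conversions might look like the sketch below. It assumes Scala 2.13 or later, where the converters live in scala.jdk.CollectionConverters; on older versions the import would be scala.collection.JavaConverters._ as described above. The javaIterator method is a stand-in for whatever the Java library returns.

```scala
import scala.jdk.CollectionConverters._ // Scala 2.13+

object IteratorConversionSketch {
  // Stand-in for the iterator returned by a Java library.
  def javaIterator(): java.util.Iterator[String] =
    java.util.Arrays.asList("a", "b", "c").iterator()

  // An iterator can only be consumed once, so each conversion gets a fresh one.
  val asList: List[String] = javaIterator().asScala.toList
  val asIndexed: IndexedSeq[String] = javaIterator().asScala.toIndexedSeq
}
```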
EDIT: You should probably look at Kevin Wright's answer, which provides a better solution available since Scala 2.8.1, with less implicit magic.
You can import the implicit conversions from scala.collection.JavaConversions and then create a new Scala collection seamlessly, e.g. like this:
import collection.JavaConversions._
println(List() ++ javaIterator)
Your Java iterator is converted to a Scala iterator by JavaConversions.asScalaIterator. A Scala iterator with elements of type A implements TraversableOnce[A], which is the argument type needed to concatenate collections with ++.
If you need another collection type, just change List() to whatever you need (e.g., IndexedSeq() or collection.mutable.Seq(), etc.).