Spark Notebook: Does GeoPointsChart accept a Dataframe? - scala

I have a DataFrame with two columns, latitude and longitude. I passed it to GeoPointsChart. The output says "showing 1000 rows", but nothing is actually displayed. Has anyone faced the same issue? Is this a syntax mistake?

I have not worked with this notebook, but it looks like you have an API call somewhere producing a java.util.List. That class does not have a toSeq method. You want to convert java.util.List into its Scala equivalent.
First, import this:
import scala.collection.JavaConverters._
This import enriches (or pimps) the Java collections with an asScala method to do the conversion:
val testAsScala = test.asScala.toSeq
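I would note, though, that the call to toSeq is unnecessary, since the result (a mutable.Buffer) already mixes in Seq. In Scala 2.13, scala.Seq means immutable.Seq, so there you would still need a conversion such as toSeq or toList if you want an immutable Seq. Either way, with asScala you can now work entirely with Scala collections, which are so much easier.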
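Here is a minimal, self-contained sketch of the conversion. It assumes Scala 2.12 or 2.13. In 2.13, JavaConverters still works but is deprecated in favour of scala.jdk.CollectionConverters. javaApiCall is a hypothetical stand-in for whatever call returns the java.util.List:

```scala
import java.util.{ArrayList => JArrayList, List => JList}
import scala.collection.JavaConverters._
import scala.collection.mutable

object AsScalaDemo {
  // Hypothetical stand-in for the API call that produces a java.util.List.
  def javaApiCall(): JList[String] = {
    val l = new JArrayList[String]()
    l.add("lat")
    l.add("lon")
    l
  }

  // asScala returns a mutable.Buffer view backed by the Java list (no copy).
  def view(test: JList[String]): mutable.Buffer[String] = test.asScala

  // Once converted, the usual Scala collection methods are available;
  // toList takes an immutable snapshot.
  def upperCased(test: JList[String]): List[String] =
    test.asScala.map(_.toUpperCase).toList
}
```

Because asScala wraps rather than copies, later changes to the Java list show through the Buffer. Call toList when you want a value that can't change under you.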

Related

No implicits found for parameter evidence

I have a line of code in a scala app that takes a dataframe with one column and two rows, and assigns them to variables start and end:
val Array(start, end) = datesInt.map(_.getInt(0)).collect()
This code works fine in a REPL, but when I put the same line in a Scala object in IntelliJ, it inserts a grey (?: Encoder[Int]) before the .collect() call and shows an inline error: No implicits found for parameter evidence$6: Encoder[Int]
I'm pretty new to scala and I'm not sure how to resolve this.
Spark needs to know how to serialize JVM types, for example to send them between the executors and the driver. For some types the encoders can be generated automatically, and for others Spark's developers wrote explicit implementations. Either way, you can have them passed implicitly. If your SparkSession is named spark, you are missing the following line:
import spark.implicits._
As you are new to Scala: implicits are parameters that you don't have to pass explicitly. In your example, the map function requires an Encoder[Int]. Adding this import brings one into scope, so it is passed to map automatically. This is likely also why the line works in the REPL: spark-shell performs this import for you at startup.
Check Scala documentation to learn more.
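You don't need Spark to see the mechanism. Below is a plain-Scala sketch in which a hypothetical MyEncoder type class stands in for Spark's Encoder, and an object named myImplicits plays the role of spark.implicits. mapAll mimics the way map asks for an implicit encoder of its result type. It illustrates the implicit lookup only and is not Spark's API:

```scala
object ImplicitsDemo {
  // Hypothetical stand-in for Spark's Encoder[T]; not the real Spark type.
  trait MyEncoder[T] { def name: String }

  // Plays the role of spark.implicits: its members only become implicit
  // candidates where they are imported.
  object myImplicits {
    implicit val intEncoder: MyEncoder[Int] = new MyEncoder[Int] { def name = "int" }
  }

  // Like Dataset.map, this needs an implicit encoder for the result type B.
  def mapAll[A, B](xs: Seq[A])(f: A => B)(implicit enc: MyEncoder[B]): (String, Seq[B]) =
    (enc.name, xs.map(f))

  def run(): (String, Seq[Int]) = {
    // Without this import the compiler reports that it could not find an
    // implicit value for parameter enc, the analogue of the IntelliJ error.
    import myImplicits._
    mapAll(Seq("1", "2"))(_.toInt)
  }
}
```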

Converting a list of either to a cats ValidatedNel

Given:
def convert[T](list: List[Either[String, T]]): Validated[NonEmptyList[String], NonEmptyList[T]] =
  NonEmptyList.fromList(list)
    .toRight("list is empty")
    .flatMap(...
How do I flat map the NonEmptyList[Either[String, T]] so ultimately I end up with my Validated return value?
Is there anything in the cats library to account for this scenario? Or do I need to do this manually following something like: Best way to turn a Lists of Eithers into an Either of Lists?
I'd write this as follows:
import cats.data.{ NonEmptyList, Validated, ValidatedNel }
import cats.instances.list._, cats.syntax.list._
import cats.syntax.either._
import cats.syntax.option._
import cats.syntax.traverse._
def convert[T](list: List[Either[String, T]]): ValidatedNel[String, NonEmptyList[T]] =
  list.traverse(_.toValidatedNel).andThen(_.toNel.toValidNel("list is empty"))
First we flip the whole thing inside out while transforming the Eithers to Validateds (with traverse and toValidatedNel), to get a ValidatedNel[String, List[T]], and then we handle the case where the result is empty (with andThen and toNel).
The andThen is probably one of the pieces you're missing—it's essentially flatMap for Validated (but without the implications and syntactic sugar baggage that flatMap brings). If you wanted you could probably pretty easily change my version to do the empty list check first, as in your sketch, but the way I've written it feels a little more natural to me.
Footnote: I have no idea why the enrichment method for Option is named toValidNel while the one for Either is toValidatedNel—I hadn't noticed this before, probably because I hadn't used them in the same line before. This seems unfortunate, especially since we're stuck with it for a while now that Cats 1.0 is out.
Another footnote: you'll need the -Ypartial-unification compiler option enabled for traverse to work without type parameters if you're on Scala 2.11 (2.11.9 or later) or 2.12. It is enabled by default in 2.13.
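To make the semantics of traverse/toValidatedNel followed by andThen concrete, here is a dependency-free sketch. It assumes plain Either and List are acceptable stand-ins for cats' ValidatedNel and NonEmptyList, which means the non-emptiness guarantees are lost. ConvertSketch is a hypothetical name, and in real code you would use the cats version above:

```scala
object ConvertSketch {
  // Mirrors the two steps of the cats version:
  // 1. traverse + toValidatedNel: collect *every* Left (Validated accumulates
  //    errors, whereas traversing with Either would stop at the first one);
  // 2. andThen + toNel: only if there were no errors, reject an empty result.
  def convert[T](list: List[Either[String, T]]): Either[List[String], List[T]] = {
    val errors = list.collect { case Left(e) => e }
    val values = list.collect { case Right(v) => v }
    if (errors.nonEmpty) Left(errors)
    else if (values.isEmpty) Left(List("list is empty"))
    else Right(values)
  }
}
```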

forEach in scala shows expected: Consumer[_ >:Path] actual: (Path) => Boolean

I get a type error when recursively deleting files in Scala:
Files.walk(path, FileVisitOption.FOLLOW_LINKS)
.sorted(Comparator.reverseOrder())
.forEach(Files.deleteIfExists)
The issue is that you're trying to pass a Scala-style function to a method expecting a Java 8-style function. There are a couple of libraries that can do the conversion, or you could write it yourself (it's not complicated), but probably the simplest option is to convert the Java stream to a Scala iterator, whose foreach method takes a Scala-style function as its argument:
import java.nio.file.{FileVisitOption, Files, Path}
import java.util.Comparator
import scala.collection.JavaConverters._
Files.walk(path, FileVisitOption.FOLLOW_LINKS)
  .sorted(Comparator.reverseOrder[Path]())
  .iterator().asScala
  .foreach(Files.deleteIfExists)
In Scala 2.12, I expect this to work:
...forEach(Files.deleteIfExists(_: Path))
You need to specify the argument type because the expected type is Consumer[_ >: Path], not Consumer[Path] as it would be in Scala.
If it doesn't work (can't test at the moment), try
import java.util.function.Consumer
val deleteIfExists: Consumer[Path] = Files.deleteIfExists(_)
...forEach(deleteIfExists)
Before Scala 2.12, Joe K's answer is the correct one.
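Here is a self-contained sketch of the 2.12 approach. It assumes Scala 2.12+ for SAM conversion of a lambda to java.util.function.Consumer, and RecursiveDelete and deleteRecursively are hypothetical names. I left out FileVisitOption.FOLLOW_LINKS on purpose. With that option, the walk descends into symlinked directories, so the contents of their targets would be deleted too. The stream is closed explicitly because Files.walk keeps directory handles open until it is:

```scala
import java.nio.file.{Files, Path}
import java.util.Comparator
import java.util.function.Consumer

object RecursiveDelete {
  // Explicit parameter type, so the lambda converts to Consumer[Path];
  // the Boolean returned by deleteIfExists is deliberately discarded.
  val deleteIfExists: Consumer[Path] = (p: Path) => { Files.deleteIfExists(p); () }

  def deleteRecursively(root: Path): Unit = {
    val stream = Files.walk(root)
    try {
      // A parent path always sorts before its children, so reverse order
      // deletes every directory's contents before the directory itself.
      stream.sorted(Comparator.reverseOrder[Path]()).forEach(deleteIfExists)
    } finally stream.close()
  }
}
```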

How to convert Dataset to a Scala Iterable?

Is there a way to convert a org.apache.spark.sql.Dataset to a scala.collection.Iterable? It seems like this should be simple enough.
You can do myDataset.collect or myDataset.collectAsList.
But then it will no longer be distributed. If you want to be able to spread your computations out over multiple machines, you need to use one of the distributed data structures, such as RDD, DataFrame or Dataset.
You can also use toLocalIterator if you just need to iterate over the contents on the driver. It has the advantage of loading only one partition at a time into memory, instead of the entire dataset. An Iterator is not an Iterable (although it is a TraversableOnce), but depending on what you are doing it may be what you want.
You could try something like this:
import org.apache.spark.sql.Dataset
def toLocalIterable[T](dataset: Dataset[T]): Iterable[T] = new Iterable[T] {
  def iterator = scala.collection.JavaConverters.asScalaIterator(dataset.toLocalIterator)
}
The conversion via JavaConverters.asScalaIterator is necessary because the toLocalIterator method of Dataset returns a java.util.Iterator instead of a scala.collection.Iterator (which is what the toLocalIterator on RDD returns.) I suspect this is a bug.
In Scala 2.11 you can do the following:
import scala.collection.JavaConverters._
dataset.toLocalIterator.asScala.toIterable
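To try the toLocalIterable idea without a Spark cluster, here is a sketch that takes a hypothetical factory of java.util.Iterators instead of a Dataset. With Spark you would pass () => dataset.toLocalIterator(). Calling the factory on every traversal is what lets the Iterable be iterated more than once. The flip side, presumably, is that each traversal asks Spark for a fresh local iterator, so the Dataset gets recomputed unless it is cached:

```scala
import scala.collection.JavaConverters._

object LocalIterable {
  // Wraps a factory of Java iterators; every call to `iterator` starts a new
  // traversal, so the result behaves like a proper (re-iterable) Iterable.
  def fromJavaIterators[T](newIterator: () => java.util.Iterator[T]): Iterable[T] =
    new Iterable[T] {
      def iterator: Iterator[T] = newIterator().asScala
    }
}
```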

Scala Unit Testing - Mocking an implicitly wrapped function

I have a question about unit tests that I'm trying to write using Mockito in Scala. I've also looked at ScalaMock, but it doesn't seem to provide this feature either. I suspect I may be approaching this too narrowly and there is a different perspective or approach to what I'm doing, so all your opinions are welcome.
Basically, I want to mock a function that is available on the object through an implicit conversion, and I have no control over how that is done, since I'm just a user of the library. The concrete example is similar to the following scenario:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
val rdd: RDD[T] = ???             // existing RDD
val sqlContext: SQLContext = ???  // existing SQLContext
import sqlContext.implicits._
rdd.toDF()
// toDF() doesn't exist on RDD itself; it is added implicitly by importing sqlContext.implicits._
Now, in the test, I'm mocking the rdd and the sqlContext, and I want to mock the toDF() function. I can't mock toDF() directly, since it doesn't exist at the RDD level. Even with a simple trick, importing the mocked sqlContext.implicits._, I get an error that a function that is not publicly available on the object can't be mocked. I even tried to mock the code that is implicitly executed on the way to toDF(), but I get stuck with final/private (inaccessible) classes that I also can't mock. Your suggestions are more than welcome. Thanks in advance :)