Spark cannot find case class on classpath - scala

I have an issue where Spark is failing to generate code for a case class. Here is the Spark error:
Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 52, Column 43: Identifier expected instead of '.'
Here is the referenced line in the generated code:
/* 052 */ private com.avro.message.video.public.MetricObservation MapObjects_loopValue34;
It should be noted that com.avro.message.video.public.MetricObservation is a nested case class that is part of a larger hierarchy. It is also used in other places in the code without problems. It should also be noted that this pipeline works fine if I use the RDD API, but I want to use the Dataset API because I want to write the Dataset out as parquet. Has anyone seen this issue before?
I'm using Scala 2.11 and Spark 2.1.0. I was able to upgrade to Spark 2.2.1 and the issue is still there.

Do you think that SI-7555 or something like it has any bearing on this? I have noticed in the past that Scala reflection has had issues generating TypeTags for statically nested classes. Do you think something like that is going on, or is this strictly a Catalyst issue in Spark? You might want to file a Spark ticket too.

So it turns out that changing the package name of the affected class "fixes" the problem (i.e., makes it go away). I really have no idea why this is or even how to reproduce it in a small test case. What worked for me was moving the class to a higher-level package: com.avro.message.video.public -> com.avro.message.publicVideo.
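A plausible explanation, for what it's worth: public is a reserved word in Java but not in Scala, so scalac accepts the package name while the Java source that Spark's codegen emits cannot legally reference it. A minimal sketch of the failing shape (the fields here are invented; the real class is nested in a larger hierarchy):

// file: com/avro/message/video/public/MetricObservation.scala
// `public` is a legal Scala package name but an illegal Java identifier,
// so Dataset codegen (which emits Java source) chokes on it, while the
// RDD API (which generates no Java) works fine.
package com.avro.message.video.public

case class MetricObservation(id: Long, value: Double)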

Related

Spark giving multiple datasource error on saving parquet file

I am trying to learn Spark and Scala. When I try to write my result DataFrame to a parquet file by calling the parquet method, I get the following error.
Code that fails:
df2.write.mode(SaveMode.Overwrite).parquet(outputPath)
This fails too:
df2.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").mode(SaveMode.Overwrite).parquet(outputPath)
Error log:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Multiple sources found for parquet (org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2, org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat), please specify the fully qualified class name.;
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:707)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:733)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:967)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:304)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:848)
However, if I call another method to perform the save, the code works properly.
This works fine:
df2.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").mode(SaveMode.Overwrite).save(outputPath)
Although I have a workaround for the issue, I'd like to understand why the first approach is not working and how I can fix it.
The versions I am using are:
Scala 2.12.9
Java 1.8
Spark 2.4.4
P.S. This issue is only seen with spark-submit.
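For reference, a self-contained sketch of the working variant; the SparkSession setup and sample data are assumptions, since the question does not show them. One observation, offered as a hint rather than a diagnosis: ParquetDataSourceV2, the second source named in the error, belongs to Spark 3.x rather than 2.4.4, which points toward mixed Spark versions on the spark-submit classpath.

import org.apache.spark.sql.{SaveMode, SparkSession}

object WriteParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("write-parquet").getOrCreate()
    import spark.implicits._

    val df2 = Seq((1, "a"), (2, "b")).toDF("id", "value") // placeholder data
    val outputPath = "/tmp/out.parquet"                   // placeholder path

    // Naming the FileFormat class explicitly and calling save() bypasses
    // the ambiguous "parquet" short-name lookup that fails above.
    df2.write
      .format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat")
      .mode(SaveMode.Overwrite)
      .save(outputPath)

    spark.stop()
  }
}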

Getting a scala stacktrace

When my scala-js code throws an error, I'd like to send a sensible stacktrace back to my server to put in the logs. By "sensible stacktrace" I mean something that gives the Scala methods, filenames, and line numbers rather than the transpiled javascript code.
I've made good progress by getting the source map and using the Javascript source-map library (https://github.com/mozilla/source-map) to translate each element of the stacktrace from javascript to the corresponding Scala code.
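For illustration, the lookup might be wired up from Scala.js roughly like this. The facade below is a hypothetical hand-rolled binding for the library's older synchronous SourceMapConsumer API, not a published facade:

import scala.scalajs.js
import scala.scalajs.js.annotation.JSImport

// Hypothetical minimal facade for mozilla/source-map's SourceMapConsumer
// (the older synchronous API of the library linked above).
@js.native
@JSImport("source-map", "SourceMapConsumer")
class SourceMapConsumer(rawSourceMap: js.Object) extends js.Object {
  def originalPositionFor(position: js.Object): js.Dynamic = js.native
}

object StackTranslator {
  // Map a JS line/column pair back to the original Scala source position.
  def toScalaPosition(consumer: SourceMapConsumer, line: Int, column: Int): String = {
    val orig = consumer.originalPositionFor(js.Dynamic.literal(line = line, column = column))
    s"${orig.source}:${orig.line}:${orig.column}"
  }
}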
My issue: I need the column number of the javascript code that threw the error but don't see how to obtain it. Printing a StackTraceElement gives a result similar to
oat.browser.views.query.QueryRunView$.renderParamsTable$1(https://localhost:9443/assets/browser-fastopt.js:34787:188)
I need the "188" at the end of the line but don't see how to get it other than calling toString and parsing the result. Looking at the StackTraceElement code, the column number is a private variable with nothing in the API to access it.
Is there another approach to this that I'm completely overlooking? Anything built into scala-js that converts a javascript stacktrace to a Scala stacktrace?
I subsequently found the StackTraceJS library which does what I needed. I combined a ScalaJS facade for it with a facade for JSNlog to come up with a package that meets my needs pretty well. See jsnlog-facade. It logs to the browser console and/or the server, with Scala stack traces. Demo code included.
There is nothing in the public API to access the column number because this is a Java API, and Scala.js cannot add public members to Java APIs.
To work around this issue in the case of StackTraceElement, we export getColumnNumber(): Int to JavaScript. You can therefore use the following code to retrieve the column number:
import scala.scalajs.js

def columnNumberOfStackTraceElement(ste: StackTraceElement): Int =
  ste.asInstanceOf[js.Dynamic].getColumnNumber().asInstanceOf[Int]
Note that this "feature" is undocumented, and might change without notice in a future major version of Scala.js. If it disappears, it will be replaced by something reliable. In the meantime, the above should get you going.
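A quick usage sketch with the helper above:

// Print each frame with the JS column number recovered via the export.
val t = new RuntimeException("boom")
t.getStackTrace.foreach { ste =>
  println(s"$ste @ column ${columnNumberOfStackTraceElement(ste)}")
}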

Play! Framework Templating Engine issues importing long class names

I've got a List of classes that I want to send down to a Scala template in Play! Framework 2.2.3; however, I ran into some issues while trying to do so.
The class I want the list to contain is an arbitrary class type that comes from a package outside of my workspace, but not natively from Java.
Note: I do not have a project/Build.scala file.
The first line of my Scala template references this class (the original post shows it only as a screenshot). I have tried to use #import as well (#import com.***.***.type._, com.***.***.type.Version, etc.) but to no avail. Play! Framework's error message was likewise captured only as a screenshot.
Is there an issue with the namespacing? Everything works fine when using classes located in my workspace.
The paths are correct; I've double-checked that. For confidentiality reasons I cannot disclose more code from this region; if more information is required, please ask and I'll edit the post.
The problem is related to the package named type. This word is reserved in Scala as a language keyword, so you need to escape it with backticks, like this:
#import List[com.your.package.`type`.Version]
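The same escaping works anywhere a Scala identifier collides with a keyword; here is a plain-Scala sketch (the package and class names are hypothetical):

// Backticks let Scala treat a reserved word as an ordinary identifier.
import com.example.`type`.Version

object Demo {
  def show(v: Version): String = v.toString
}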

Problems compiling routes after migrating to Play 2.1

After migrating to Play 2.1 I ran into a problem: the routes compiler stopped working for my routes file. It was completely fine with Play 2.0.4, but now I'm getting a build error and can't find any workaround for it.
In my project I'm using the cake pattern, so controller actions are visible not as <package>.<controller class>.<action> but as <package>.<component registry>.<controller instance>.<action>. The new Play routes compiler uses all action path components except the last two to form the package name for its managed sources (as far as I can tell from the code at https://github.com/playframework/Play20/blob/2.1.0/framework/src/routes-compiler/src/main/scala/play/router/RoutesCompiler.scala). In my case this means <package>.<component registry> is chosen as the package name, which results in an error during the build:
[error] server/target/scala-2.10/src_managed/main/com/grumpycats/mmmtg/componentsRegistry/routes.java:5: componentsRegistry is already defined as object componentsRegistry
[error] package com.grumpycats.mmmtg.componentsRegistry;
I made the sample project to demonstrate this problem: https://github.com/rmihael/play-2.1-routes-problem
Is it possible to work around this problem somehow without dropping the cake pattern for controllers? It's a pity that I can't proceed with Play 2.1 due to this problem.
Because of reputation I cannot create a comment.
The convention is that classes and objects start with upper case, and this convention applies to pattern matching as well. Looking at a string, there seems to be no difference between a package object and a normal object (apart from the case). I am not sure how Play 2.1 handles things; that's why this is a comment rather than an answer.
You could try the new @ syntax in the router. That allows you to create an instance from the Global class. You would still specify <package>.<controller class>.<action>, but in Global you get it from somewhere else (for example, a component registry).
You can find a bit of extra information under 'Managed Controller classes instantiation' here: http://www.playframework.com/documentation/2.1.0/Highlights
This demo project shows its usage: https://github.com/guillaumebort/play20-spring-demo
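A rough sketch of that wiring, assuming a hypothetical component-registry object Registry that can hand out controller instances:

// conf/routes — the '@' prefix asks Global for the controller instance:
//   GET  /  @controllers.Application.index()

// app/Global.scala
import play.api.GlobalSettings

object Global extends GlobalSettings {
  // Delegate controller instantiation to the (hypothetical) registry.
  override def getControllerInstance[A](controllerClass: Class[A]): A =
    Registry.instanceOf(controllerClass)
}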

Casbah Scala MongoDB driver - a strange error

I am trying to use Casbah, and I get a strange error right at the beginning, on this line:
val mongoDB = MongoConnection("MyDatabase")
the error on MongoConnection says:
class file needed by MongoConnection is missing. reference type
MongoOptions of package com.mongodb refers to nonexisting symbol.
I do not know what to do with this. The jars that I have attached to my projects are:
casbah-commons_2.9.1-3.0.0-SNAPSHOT.jar
casbah-core_2.9.1-3.0.0-SNAPSHOT.jar
casbah-gridfs_2.9.1-3.0.0-SNAPSHOT.jar
casbah-query_2.9.1-3.0.0-SNAPSHOT.jar
casbah-util_2.9.1-3.0.0-SNAPSHOT.jar
which looks like a full setup of Casbah, and I do not understand what it might be yearning for. So, question number one: what do I have to do to resolve this problem?
Question number two: the Casbah tutorial says that I could import just one thing and get the mongoConn() method, which is also not true. mongoConn() simply does not get found if I follow the instructions. So, how can I make everything work as in the tutorial?
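For reference, the single-import setup the tutorial describes would look roughly like this (a sketch based on the Casbah 2.x-era API; it will not compile if the dependencies mentioned in the answer below are missing):

import com.mongodb.casbah.Imports._  // the one import the tutorial mentions

val mongoConn = MongoConnection()        // defaults to localhost:27017
val mongoDB   = mongoConn("MyDatabase")  // select a database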
I don't know the details of your setup, but it seems like you are not referencing the dependencies of the casbah-commons module.
According to the docs, those are:
mongo-java-driver, scalaj-collection, scalaj-time, JodaTime, slf4j-api
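With a build tool instead of hand-attached jars, those transitive dependencies are pulled in automatically. A minimal sbt sketch; the organization and version are illustrative, not taken from the question:

// build.sbt — one coordinate brings in casbah and its transitive
// dependencies (mongo-java-driver, scalaj-collection, scalaj-time,
// JodaTime, slf4j-api).
scalaVersion := "2.9.1"

libraryDependencies += "org.mongodb" %% "casbah" % "2.3.0"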