Getting class not found for spark/internal/Logging$class - Scala

I am trying to understand the Logging.scala code in Spark 2.4.1 (Scala 2.11) and Spark 3.1.2 (Scala 2.12). When I inspect the decompiled class files for the two versions in JD (Java Decompiler), I see multiple classes generated for the Logging interface in the 2.4.1 build.
I am trying to understand where these classes come from and why they do not appear in Spark 3.1.2.
Is this because of a change in the Spark logging code, or is it an artifact of the Java decompiler?
I have an application where I have moved to Spark 3.2, and it is throwing an error for spark/internal/Logging$class.
I am relatively new to Spark and Scala.
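For context on what the decompiler is showing, the same difference can be reproduced with any plain trait (my own illustration, not code from the Spark source):
trait Greeter {
  // A concrete trait method.
  // Scala 2.11: scalac emits the interface Greeter plus a synthetic class
  // Greeter$class that holds this body as a static method; those synthetic
  // $class holders are the extra classes visible in the decompiled 2.4.1 jars.
  // Scala 2.12+: the body is compiled as a Java 8 default method on the
  // Greeter interface itself, so no Greeter$class file is produced at all.
  def greet(name: String): Unit = println(s"Hello, $name")
}
This encoding change is also why an application compiled against a Scala 2.11 Spark build fails with a class-not-found error for org/apache/spark/internal/Logging$class when run on a Scala 2.12 build such as Spark 3.x.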

Related

Spark giving multiple datasource error on saving parquet file

I am trying to learn Spark and Scala. When I try to write my result DataFrame to a parquet file by calling the parquet method, I get the error below.
Code that fails:-
df2.write.mode(SaveMode.Overwrite).parquet(outputPath)
This fails too:-
df2.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").mode(SaveMode.Overwrite).parquet(outputPath)
Error Log:-
Exception in thread "main" org.apache.spark.sql.AnalysisException: Multiple sources found for parquet (org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2, org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat), please specify the fully qualified class name.;
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:707)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:733)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:967)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:304)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:848)
However, if I call save instead of parquet, the code works properly.
This works fine:-
df2.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").mode(SaveMode.Overwrite).save(outputPath)
Although I have a workaround for the issue, I'd like to understand why the first approach is not working and how I can fix it.
The details of the setup I am using are:-
Scala 2.12.9
Java 1.8
Spark 2.4.4
P.S. This issue is only seen with spark-submit.
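Not a fix, but one way to see where the two parquet providers come from at runtime: Spark resolves format names through Java's ServiceLoader, so the same lookup can be replayed inside the application submitted with spark-submit (a debugging sketch; nothing here is specific to this job):
import java.util.ServiceLoader
import org.apache.spark.sql.sources.DataSourceRegister
import scala.collection.JavaConverters._

// Print every data source registered under the short name "parquet" and the
// jar it was loaded from; two hits would explain the "Multiple sources" error.
ServiceLoader.load(classOf[DataSourceRegister]).asScala
  .filter(_.shortName().equalsIgnoreCase("parquet"))
  .foreach { reg =>
    val location = reg.getClass.getProtectionDomain.getCodeSource.getLocation
    println(s"${reg.getClass.getName} from $location")
  }
If this prints a ParquetDataSourceV2 from a Spark 3.x jar alongside ParquetFileFormat, a mixed-version classpath on the cluster is the likely culprit, which would also explain why the problem only appears under spark-submit.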

Scala worksheet evaluation results not showing up in IntelliJ

I am trying out Scala for the first time and am following this page to set up my first Scala project:
https://docs.scala-lang.org/getting-started-intellij-track/getting-started-with-scala-in-intellij.html
However, despite this, when I create a simple worksheet with println("hello"), no results come up upon evaluation.
What am I doing wrong?
This seems to be an issue that only happens on Scala 2.13. (I have not exhaustively tested it, though.)
I have used 2.12.9 successfully.
I opened an issue on the Scala docs project too:
https://github.com/scala/docs.scala-lang/issues/1486

Spark cannot find case class on classpath

I have an issue where Spark is failing to generate code for a case class. Here is the Spark error:
Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 52, Column 43: Identifier expected instead of '.'
Here is the referenced line in the generated code:
/* 052 */ private com.avro.message.video.public.MetricObservation MapObjects_loopValue34;
It should be noted that com.avro.message.video.public.MetricObservation is a nested case class that is part of a larger hierarchy. It is also used successfully in other places in the code. It should also be noted that this pipeline works fine if I use the RDD API, but I want to use the Dataset API because I want to write out the Dataset as parquet. Has anyone seen this issue before?
I'm using Scala 2.11 and Spark 2.1.0. I was able to upgrade to Spark 2.2.1 and the issue is still there.
Do you think that SI-7555 or something like it has any bearing on this? I have noticed in the past that Scala reflection has had issues generating TypeTags for statically nested classes. Do you think something like that is going on, or is this strictly a Catalyst issue in Spark? You might want to file a Spark ticket too.
So it turns out that changing the package name of the affected class "fixes" the problem (i.e., makes it go away). I really have no idea why this is, or even how to reproduce it in a small test case. What worked for me was creating a higher-level package. Specifically, com.avro.message.video.public -> com.avro.message.publicVideo.
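One plausible explanation (an assumption on my part, not confirmed in this thread): public is a reserved word in Java but a perfectly legal identifier in Scala, so the package compiles fine in Scala while the Java source that Catalyst generates through Janino cannot legally reference it, which is exactly where "Identifier expected instead of '.'" points. A hypothetical reproduction:
// "public" is not a Scala keyword, so this package declaration compiles:
package com.example.public

// Placeholder case class standing in for the real MetricObservation.
case class MetricObservation(videoId: String, metric: Double)

// Catalyst, however, must spell the package out in generated Java, e.g.
//   private com.example.public.MetricObservation MapObjects_loopValue34;
// and Janino rejects it, because "public" cannot appear inside a Java
// package name. Renaming the package away from a Java keyword avoids this.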

Spark word count in Scala (running in Apache Sandbox)

I am trying to do a word-count lab in Spark with Scala. I am able to successfully load the text file into a variable (RDD), but when I apply .flatMap, .map, and .reduceByKey, I receive the attached error message. I am new to this, so any type of help would be greatly appreciated.
Your program is failing because it was not able to find the file on HDFS.
You need to specify the file in the following format:
sc.textFile("hdfs://namenodedetails:8020/input.txt")
You need to give the fully qualified path of the file. Since Spark builds a dependency graph and evaluates it lazily when an action is called, you only hit the error at the point where an action is triggered.
It is better to debug right after reading the file from HDFS, using the .first or .take(n) methods.
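Putting it together, a minimal word-count sketch using the path format above (the namenode address is a placeholder):
val lines = sc.textFile("hdfs://namenodedetails:8020/input.txt")

// Fail fast: a small action here surfaces a bad path immediately,
// before the rest of the pipeline is built.
lines.take(1).foreach(println)

val counts = lines
  .flatMap(_.split("\\s+"))  // split each line into words
  .map(word => (word, 1))    // pair every word with a count of 1
  .reduceByKey(_ + _)        // sum the counts for each word

counts.take(10).foreach(println)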

How can the Kryo complaint under Spark 2.0.2 about object 13994 be fixed?

In the process of porting a Spark 1.6 app to Spark 2.0.2, there's this complaint in the log:
com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 13994
This sometimes occurs after a couple of batches have run, not the first time through. The setup is using:
sparkConf.set("spark.kryo.registrationRequired", "false")
Others have seen the same issue, but I haven't seen any resolutions. For example, Kryo exception: Encountered unregistered class ID: 13994
I've tried switching to:
sparkConf.set("spark.kryo.registrationRequired", "true")
and registering classes as suggested in one thread, but ran into How to register Receiver[] with Kryo?.
Apparently there's no way to disable Kryo for certain classes, according to some blog chatter. Is there a way to determine what class it's not recognizing so that I can register it? Recompiling from source and changing the error message is one possible route, but perhaps there's an easier way. Better yet, is there a known fix for this?
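For anyone trying the registration route, the setup looks roughly like this (a sketch; MyEvent is a hypothetical placeholder for the application's own types):
import org.apache.spark.SparkConf

// Placeholder for whatever class is actually being serialized.
case class MyEvent(id: Long, payload: String)

val sparkConf = new SparkConf()
  // Fail fast on anything Kryo has not been told about: the error then
  // names the offending class instead of showing an opaque numeric ID.
  .set("spark.kryo.registrationRequired", "true")
  // Register application classes, including array forms, which Kryo
  // treats as separate classes.
  .registerKryoClasses(Array(
    classOf[MyEvent],
    classOf[Array[MyEvent]]
  ))
With registrationRequired set to true, the "Class is not registered" failures at least identify the class by name, which addresses the "how do I determine what class it's not recognizing" part; the classes it reports can then be added to registerKryoClasses one by one.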