IllegalAccessException .. can not access a member of class with modifiers "protected" - scala

This is my Scala code. I am trying to ingest a GeoTIFF file into HDFS using the GeoTrellis library.
package RasterDataIngest.RasterDataIngestIntoHadoop
import geotrellis.spark._
import geotrellis.spark.ingest._
import geotrellis.spark.io.hadoop._
import geotrellis.spark.io.index._
import geotrellis.spark.tiling._
import geotrellis.spark.utils.SparkUtils
import geotrellis.vector._
import org.apache.hadoop.fs.Path
import org.apache.spark._
import com.quantifind.sumac.ArgMain
import com.quantifind.sumac.validation.Required
class HadoopIngestArgs extends IngestArgs {
  @Required var catalog: String = _
  def catalogPath = new Path(catalog)
}

object HadoopIngest extends ArgMain[HadoopIngestArgs] with Logging {
  def main(args: HadoopIngestArgs): Unit = {
    System.setProperty("com.sun.media.jai.disableMediaLib", "true")

    implicit val sparkContext = SparkUtils.createSparkContext("Ingest")
    val conf = sparkContext.hadoopConfiguration
    conf.set("io.map.index.interval", "1")

    // Catalog backed by HDFS and the source GeoTiff RDD to ingest
    val catalog = HadoopRasterCatalog(args.catalogPath)
    val source = sparkContext.hadoopGeoTiffRDD(args.inPath)
    val layoutScheme = ZoomedLayoutScheme()

    // Tile and reproject the source, then write each zoom level into the catalog
    Ingest[ProjectedExtent, SpatialKey](source, args.destCrs, layoutScheme, args.pyramid) { (rdd, level) =>
      catalog
        .writer[SpatialKey](RowMajorKeyIndexMethod, args.clobber)
        .write(LayerId(args.layerName, level.zoom), rdd)
    }
  }
}
When I run this code, I get the following error. Please help me solve it.
java.lang.IllegalAccessException: Class org.osgeo.proj4j.Registry can not access a member of class org.osgeo.proj4j.proj.Projection with modifiers "protected"

I believe the problem is related to a bad sbt cache or Java version mismatch. Try the latest stable GeoTrellis version: 0.10.3 (Scala 2.10/2.11, Java 8, Spark 1.6.x). If you plan to use GeoTrellis with Spark 2, take a look at the GeoTrellis snapshot (version 1.0.0 will support Spark 2+, Java 8, and Scala 2.11).
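If the project is built with sbt, pinning the stable version there is the quickest check. Here is a minimal build.sbt sketch, assuming the 0.10.x artifact coordinates (verify them against the GeoTrellis documentation for your setup):

// build.sbt sketch -- coordinates assumed from the GeoTrellis 0.10.x release line
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "com.azavea.geotrellis" %% "geotrellis-spark" % "0.10.3",
  "org.apache.spark"      %% "spark-core"       % "1.6.3" % "provided"
)

After bumping the version, run sbt update (or clear the stale entries from your local ivy cache) so the bad-cache possibility mentioned above is ruled out as well.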

Related

ArrayIndexOutOfBoundsException while reading from dataframe in scala

I'm working on a Scala streaming unit test, but while reading from a CSV file I get an ArrayIndexOutOfBoundsException.
Code:
import org.scalatest.matchers.should.Matchers
import org.scalatest.wordspec.AnyWordSpecLike
import org.apache.spark.sql.SparkSession
class StreamingTest extends AnyWordSpecLike with Matchers {
  val sparkses = SparkSession.builder.appName("MyApp").config("spark.master", "local").getOrCreate()
  val df = sparkses.read.format("csv").load("file.csv")
  df.printSchema()
}
The code works fine without extending AnyWordSpecLike with Matchers, but we need it to work with EmbeddedKafka.
Any guidance would be helpful.

Zeppelin with Spark interpreter ignores imports declared outside of class/function definition

I'm trying to use some Scala code in Zeppelin 0.8.0 with the Spark interpreter:
%spark
import scala.beans.BeanProperty
class Node(@BeanProperty val parent: Option[Node]) {
}
But the import does not seem to be taken into account:
import scala.beans.BeanProperty
<console>:14: error: not found: type BeanProperty
@BeanProperty val parent: Option[Node]) {
^
EDIT: I found out that the following code works:
class Node(@scala.beans.BeanProperty val parent: Option[Node]) {
}
This also works fine:
def loadCsv(CSVPATH: String): DataFrame = {
  import org.apache.spark.sql.types._
  // [...] some code
  val schema = StructType(
    firstRow.map(s => StructField(s, StringType))
  )
  // [...] some code again
}
So I guess everything works fine if the import is placed inside the braces, or if the fully qualified path.to.package.Class is spelled out at the point of use.
QUESTION: How do I import outside of a class/function definition?
Importing by path.to.package.Class works well in Zeppelin. You can try it by importing and using java.sql.Date:
import java.sql.Date
val date = Date.valueOf("2019-01-01")
The problem is related to the Zeppelin context. If you try the following code snippet in Zeppelin, you will see that it works fine:
object TestImport {
  import scala.beans.BeanProperty
  class Node(@BeanProperty val parent: Option[Node]) {}
}

val testObj = new TestImport.Node(None)
testObj.getParent
// prints Option[Node] = None
I hope it helps!

value toDF is not a member of Seq[(Int,String)]

I am trying to execute the following code but I am getting this error:
value toDF is not a member of Seq[(Int,String)]
I have the case class outside main and I have imported the implicits too, but I am still getting this error. Can someone help me resolve it? I am using Spark 2.1.0 (the Scala 2.11 build) and Scala 2.11.8.
import org.apache.spark.sql._
import org.apache.spark.ml.clustering._
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark._
final case class Email(id: Int, text: String)

object SampleKMeans {
  def main(args: Array[String]) = {
    val spark = SparkSession.builder.appName("SampleKMeans")
      .master("yarn")
      .getOrCreate()
    import spark.implicits._

    val emails = Seq(
      "This is an email from...",
      "SPAM SPAM spam",
      "Hello, We'd like to offer you")
      .zipWithIndex.map(_.swap).toDF("id", "text").as[Email]
  }
}
You already have a SparkSession; just importing spark.implicits._ will work in your case:
val spark = SparkSession.builder.appName("SampleKMeans")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._
Now the toDF method works as expected. If the error still exists, you need to check that the versions of the Spark and Scala libraries you are using match each other.
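As a sketch of that version check (versions taken from the question, Spark 2.1.0 built for Scala 2.11; adjust to whatever you actually run), keeping every Spark module on the same version in build.sbt avoids mixed-classpath surprises:

// build.sbt sketch -- keep the Scala binary version and all Spark modules aligned
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.1.0",
  "org.apache.spark" %% "spark-sql"   % "2.1.0",
  "org.apache.spark" %% "spark-mllib" % "2.1.0"
)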
Hope this helps!

Apache Zeppelin 0.6.1: Run Spark 2.0 Twitter Stream App

I have a cluster with Spark 2.0 and Zeppelin 0.6.1 installed. Since the TwitterUtils class was moved from the Spark project to Apache Bahir, I can't use TwitterUtils in my Zeppelin notebook anymore.
Here are the snippets from my notebook:
Dependency loading:
%dep
z.reset
z.load("org.apache.bahir:spark-streaming-twitter_2.11:2.0.0")
DepInterpreter(%dep) deprecated. Remove dependencies and repositories through GUI interpreter menu instead.
DepInterpreter(%dep) deprecated. Load dependency through GUI interpreter menu instead.
res1: org.apache.zeppelin.dep.Dependency = org.apache.zeppelin.dep.Dependency@4793109a
And the Spark part:
import org.apache.spark.streaming.twitter
import org.apache.spark.streaming._
import org.apache.spark.storage.StorageLevel
import scala.io.Source
import scala.collection.mutable.HashMap
import java.io.File
import org.apache.log4j.Logger
import org.apache.log4j.Level
import sys.process.stringSeqToProcess
import org.apache.spark.SparkConf
// ********************************* Configures the Oauth Credentials for accessing Twitter ****************************
def configureTwitterCredentials(apiKey: String, apiSecret: String, accessToken: String, accessTokenSecret: String) {...}
// ***************************************** Configure Twitter credentials ********************************************
val apiKey = ...
val apiSecret = ...
val accessToken = ...
val accessTokenSecret = ...
configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret)
// ************************************************* The logic itself *************************************************
val ssc = new StreamingContext(sc, Seconds(2))
val tweets = TwitterUtils.createStream(ssc, None)
val twt = tweets.window(Seconds(60))
When I try to run the Spark part of the notebook after importing the dependency, I get the following exception:
<console>:44: error: object twitter is not a member of package org.apache.spark.streaming
import org.apache.spark.streaming.twitter
What am I doing wrong here? The Bahir documentation also uses the import org.apache.spark.streaming.twitter._ statement, see http://bahir.apache.org/docs/spark/2.0.0/spark-streaming-twitter/
Well, %dep is not exactly stable, and since it is deprecated anyway, why not use a supported method? If you don't want to modify either the Spark or the Zeppelin configuration files, you can add the dependencies to the interpreter configuration (I omitted the other properties for clarity):
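For example (the coordinate below is the one from the %dep call above; the menu path assumes a stock Zeppelin install): open Interpreter -> spark -> edit, add org.apache.bahir:spark-streaming-twitter_2.11:2.0.0 under Dependencies, then save and restart the interpreter. After that, the plain import from the notebook should resolve:

// with the Bahir artifact registered on the spark interpreter, this compiles in a %spark paragraph
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._

val ssc = new StreamingContext(sc, Seconds(2))
val tweets = TwitterUtils.createStream(ssc, None)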

Class org.apache.spark.sql.types.SQLUserDefinedType not found - continuing with a stub

I have a basic Spark MLlib program as follows.
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.mllib.linalg.Vectors
class Sample {
  val conf = new SparkConf().setAppName("helloApp").setMaster("local")
  val sc = new SparkContext(conf)

  val data = sc.textFile("data/mllib/kmeans_data.txt")
  val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

  // Cluster the data into two classes using KMeans
  val numClusters = 2
  val numIterations = 20
  val clusters = KMeans.train(parsedData, numClusters, numIterations)

  // Export to PMML
  println("PMML Model:\n" + clusters.toPMML)
}
I have manually added spark-core, spark-mllib and spark-sql (all version 1.5.0) to the project classpath through IntelliJ.
I am getting the error below when I run the program. Any idea what's wrong?
Error:scalac: error while loading Vector, Missing dependency 'bad
symbolic reference. A signature in Vector.class refers to term types
in package org.apache.spark.sql which is not available. It may be
completely missing from the current classpath, or the version on the
classpath might be incompatible with the version used when compiling
Vector.class.', required by
/home/fazlann/Downloads/spark-mllib_2.10-1.5.0.jar(org/apache/spark/mllib/linalg/Vector.class
DesirePRG, I have met the same problem as you. The solution is to add a jar that assembles Spark and Hadoop, such as spark-assembly-1.4.1-hadoop2.4.0.jar; then it works properly.
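If you would rather keep individual artifacts instead of the assembly jar, another option is to let the build tool resolve matching versions so spark-sql is guaranteed to be on the classpath. A minimal sbt sketch, assuming the Scala 2.10 / Spark 1.5.0 combination implied by the spark-mllib_2.10-1.5.0.jar in the error:

// build.sbt sketch -- every Spark module on the same version so the sql types are resolvable
scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.5.0",
  "org.apache.spark" %% "spark-sql"   % "1.5.0",
  "org.apache.spark" %% "spark-mllib" % "1.5.0"
)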