Can we create a table using the spark.sql API from the IDE - Scala

I'm on IntelliJ and my spark session looks like this -
val spark = SparkSession.builder()
  .appName("Spark SQL")
  .config("spark.master", "local")
  .config("spark.sql.warehouse.dir", "src/main/resources/warehouse") // user-defined warehouse for storing tables
  .config("spark.network.timeout", "10000000s") // to avoid heartbeat exceptions
  .getOrCreate()
While I can create a database using
spark.sql("create database newdb")
which creates a directory under src/main/resources/warehouse,
attempting to create a table in the same manner,
spark.sql("create table testing(id int, name string)")
fails with:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Hive support is required to CREATE Hive TABLE (AS SELECT);;
'CreateTable `testing`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, ErrorIfExists
Thereafter, I added enableHiveSupport() while creating the Spark session, but that leads me to this exception:
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Can anyone help me with this?
EDIT
This is my build.sbt:
name := "spark-essentials"
version := "0.1"
scalaVersion := "2.12.10"
val sparkVersion = "3.0.0-preview"
val vegasVersion = "0.3.11"
val postgresVersion = "42.2.2"
resolvers ++= Seq(
"bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven",
"Typesafe Simple Repository" at "https://repo.typesafe.com/typesafe/simple/maven-releases",
"MavenRepository" at "https://mvnrepository.com"
)
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-hive" % sparkVersion,
// logging
"org.apache.logging.log4j" % "log4j-api" % "2.4.1",
"org.apache.logging.log4j" % "log4j-core" % "2.4.1",
// postgres for DB connectivity
"org.postgresql" % "postgresql" % postgresVersion
)
EDIT:
Below is the stack trace:
20:41:08 WARN ObjectStore:568 - Failed to get database default, returning NoSuchObjectException
20:41:08 WARN Hive:168 - Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:203)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:127)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:300)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:421)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:314)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:68)
at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:67)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:221)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:221)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:147)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:137)
at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:170)
at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:165)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$2(HiveSessionStateBuilder.scala:56)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:92)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:92)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:741)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:781)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:725)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:765)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:757)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$1(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:84)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$2(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:376)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:214)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:374)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$1(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:84)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$2(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:376)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:214)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:374)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$1(AnalysisHelper.scala:87)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:84)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:757)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:694)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:130)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:89)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:127)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:119)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:119)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:168)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:162)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:122)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:98)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:98)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:146)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:145)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:66)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:63)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:63)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:55)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:95)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
at part4sql.SparkSql$.delayedEndpoint$part4sql$SparkSql$1(SparkSql.scala:29)
at part4sql.SparkSql$delayedInit$body.apply(SparkSql.scala:7)
at scala.Function0.apply$mcV$sp(Function0.scala:39)
at scala.Function0.apply$mcV$sp$(Function0.scala:39)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
at scala.App.$anonfun$main$1$adapted(App.scala:80)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.App.main(App.scala:80)
at scala.App.main$(App.scala:78)
at part4sql.SparkSql$.main(SparkSql.scala:7)
at part4sql.SparkSql.main(SparkSql.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
... 94 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
... 100 more
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:src/main/resources/warehouse

The last line in the stack trace --
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:src/main/resources/warehouse
-- hints that the proper way to set spark.sql.warehouse.dir is to supply an absolute path for the warehouse directory, something like file:///<project_root_dir>/src/main/resources/warehouse.
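A minimal sketch of what that could look like in the session builder, assuming the warehouse folder sits under the project root (enableHiveSupport() is included because the analysis error says Hive support is required for CREATE TABLE):

import java.nio.file.Paths
import org.apache.spark.sql.SparkSession

// Resolve the warehouse folder to an absolute file: URI so Hive does not
// reject it as a "relative path in absolute URI".
val warehouseDir = Paths.get("src/main/resources/warehouse").toAbsolutePath.toUri.toString

val spark = SparkSession.builder()
  .appName("Spark SQL")
  .config("spark.master", "local")
  .config("spark.sql.warehouse.dir", warehouseDir)
  .enableHiveSupport()
  .getOrCreate()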

Try including /usr/hdp/current/spark-client/conf/hive-site.xml (or the respective hive-site.xml for your distribution) when you submit the Spark job.
You can also try adding the configuration below when submitting the job:
--conf "spark.sql.catalogImplementation=hive" \

Related

Spark with Scala read json and insert the data in MongoDB

I'm trying to read a JSON file and insert the data into MongoDB. This is my code:
Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
val spark = SparkSession
  .builder()
  .master("local[*]")
  .appName("SparkByExamples.com")
  .config("spark.mongodb.input.uri", "mongodb://doss:doss123#localhost:27017/spark?authSource=test")
  .config("spark.mongodb.output.uri", "mongodb://doss:doss123#localhost:27017/spark?authSource=test")
  .getOrCreate()

val dataset = spark.read
  .option("multiline", true)
  .json("src/main/scala/tes.json")

dataset.write.format("com.mongodb.spark.sql.DefaultSource")
  .option("database", "jsonDB")
  .option("collection", "jsonData")
  .mode("append").save()
but I get an error:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: com.mongodb.spark.sql.DefaultSource. Please find packages at https://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:587)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:675)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:725)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:864)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
at Main$.main(Main.scala:33)
at Main.main(Main.scala)
Caused by: java.lang.ClassNotFoundException: com.mongodb.spark.sql.DefaultSource.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:661)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:661)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:661)
... 6 more
I don't know what the problem is. Here is the build.sbt file:
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.1"
// https://mvnrepository.com/artifact/org.mongodb.spark/mongo-spark-connector
libraryDependencies += "org.mongodb.spark" % "mongo-spark-connector" % "10.0.5"
Please help me.

java.lang.NoSuchMethodError: com.facebook.fb303.FacebookService$Client.sendBaseOneway(Ljava/lang/String;Lorg/apache/thrift/TBase;)V

I have the following code:
val warehouseLocation = new File("spark-warehouse").getAbsolutePath

implicit val spark = SparkSession
  .builder
  .appName("test")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .config("hive.execution.engine", "spark")
  .enableHiveSupport()
  .getOrCreate

spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS person(name string) STORED AS PARQUET LOCATION '/user/my_user/data/my_data.parquet'")
I obtain the following error
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: User class threw exception: java.lang.NoSuchMethodError: com.facebook.fb303.FacebookService$Client.sendBaseOneway(Ljava/lang/String;Lorg/apache/thrift/TBase;)V
at com.facebook.fb303.FacebookService$Client.send_shutdown(FacebookService.java:436)
at com.facebook.fb303.FacebookService$Client.shutdown(FacebookService.java:430)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.close(HiveMetaStoreClient.java:492)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy33.close(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.close(Hive.java:291)
at org.apache.hadoop.hive.ql.metadata.Hive.access$000(Hive.java:137)
at org.apache.hadoop.hive.ql.metadata.Hive$1.remove(Hive.java:157)
at org.apache.hadoop.hive.ql.metadata.Hive.closeCurrent(Hive.java:261)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:231)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:208)
at org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:768)
at org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:739)
at org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1394)
at org.apache.hadoop.hive.ql.session.SessionState.getUserFromAuthenticator(SessionState.java:987)
at org.apache.hadoop.hive.ql.metadata.Table.getEmptyTable(Table.java:177)
at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:119)
at org.apache.spark.sql.hive.client.HiveClientImpl$.toHiveTable(HiveClientImpl.scala:898)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:470)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:468)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:468)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:274)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:212)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:211)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:257)
at org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:468)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:258)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:119)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:307)
at org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:128)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3253)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
at fr.enedis.ctd.TestHive2$.delayedEndpoint$fr$enedis$ctd$TestHive2$1(TestHive2.scala:23)
at fr.enedis.ctd.TestHive2$delayedInit$body.apply(TestHive2.scala:6)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at fr.enedis.ctd.TestHive2$.main(TestHive2.scala:6)
at fr.enedis.ctd.TestHive2.main(TestHive2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
Here are my dependencies:
"org.apache.spark" %% "spark-core" % "2.3.2"% "provided",
"org.apache.spark" %% "spark-sql" % "2.3.2"% "provided",
"org.apache.phoenix" % "phoenix-core" % "4.7.0.2.6.5.102-5",
"org.apache.phoenix" % "phoenix-spark2" % "4.7.0.2.6.5.0-292"
I tried adding the following, but it doesn't work:
"org.apache.thrift" % "libfb303" % "0.9.3"
Adding this one resolved my problem: "org.apache.thrift" % "libfb303" % "0.9.2"
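For reference, the dependency list with that fix applied would look roughly like this (same coordinates as above):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.2" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.3.2" % "provided",
  "org.apache.phoenix" % "phoenix-core" % "4.7.0.2.6.5.102-5",
  "org.apache.phoenix" % "phoenix-spark2" % "4.7.0.2.6.5.0-292",
  // pin libfb303 explicitly so the fb303 Thrift client matches what the Hive metastore client expects
  "org.apache.thrift" % "libfb303" % "0.9.2"
)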

FileTailSource throws null pointer exception

I wrote the following Alpakka code
val fs = FileSystems.getDefault
val resource = getClass.getResource("countrycapital.csv")
val source = FileTailSource.lines(Paths.get(resource.toURI), maxLineSize = 8092, pollingInterval = 10000 seconds)
I have a file called "countrycapital.csv" in the src/main/resources folder.
My objective is to build an Akka Streams Source for this file and process all of its records.
When I run this code, the last line throws an exception:
[error] (run-main-0) java.lang.NullPointerException
java.lang.NullPointerException
at com.abhi.ElasticAkkaStreams$.delayedEndpoint$com$abhi$ElasticAkkaStreams$1(ElasticAkkaStreams.scala:37)
at com.abhi.ElasticAkkaStreams$delayedInit$body.apply(ElasticAkkaStreams.scala:28)
at scala.Function0.apply$mcV$sp(Function0.scala:34)
at scala.Function0.apply$mcV$sp$(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App.$anonfun$main$1$adapted(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:389)
I have imported the following libraries in my build.sbt
"com.lightbend.akka" %% "akka-stream-alpakka-elasticsearch" % "0.13",
"com.lightbend.akka" %% "akka-stream-alpakka-csv" % "0.13",
"com.lightbend.akka" %% "akka-stream-alpakka-file" % "0.13"
OK, I solved this myself.
I was missing a leading / in the resource path.
This works without the null pointer exception:
val resource = getClass.getResource("/countrycapital.csv")
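A minimal sketch of the corrected lookup, with a guard for a missing resource (the error message and polling interval are illustrative):

import java.nio.file.Paths
import scala.concurrent.duration._
import akka.stream.alpakka.file.scaladsl.FileTailSource

// getResource returns null when the file is not on the classpath,
// so fail fast with a readable message instead of a NullPointerException.
val resource = Option(getClass.getResource("/countrycapital.csv"))
  .getOrElse(sys.error("countrycapital.csv not found on the classpath"))

val source = FileTailSource.lines(Paths.get(resource.toURI), maxLineSize = 8092, pollingInterval = 10.seconds)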

Exception when using the saveToPhoenix method to load/save a RDD on Hbase

I would like to use the apache-phoenix framework.
The problem is that I keep getting an exception telling me that the class HBaseConfiguration can't be found.
Here is the code I want to use:
import org.apache.spark.SparkContext
import org.apache.spark.sql._
import org.apache.phoenix.spark._
// Load INPUT_TABLE
object MainTest2 extends App {
val sc = new SparkContext("local", "phoenix-test")
val sqlContext = new SQLContext(sc)
val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "INPUT_TABLE",
"zkUrl" -> "localhost:3888"))
}
Here is the build.sbt I'm using:
name := "spark-to-hbase"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.3.0",
"org.apache.phoenix" % "phoenix-core" % "4.11.0-HBase-1.3",
"org.apache.spark" % "spark-core_2.11" % "2.1.1",
"org.apache.spark" % "spark-sql_2.11" % "2.1.1",
"org.apache.phoenix" % "phoenix-spark" % "4.11.0-HBase-1.3"
)
And here is the exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl$1.call(ConfigurationFactory.java:49)
at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl$1.call(ConfigurationFactory.java:46)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
at org.apache.phoenix.util.PhoenixContextExecutor.callWithoutPropagation(PhoenixContextExecutor.java:91)
at org.apache.phoenix.query.ConfigurationFactory$ConfigurationFactoryImpl.getConfiguration(ConfigurationFactory.java:46)
at org.apache.phoenix.jdbc.PhoenixDriver.initializeConnectionCache(PhoenixDriver.java:151)
at org.apache.phoenix.jdbc.PhoenixDriver.(PhoenixDriver.java:142)
at org.apache.phoenix.jdbc.PhoenixDriver.(PhoenixDriver.java:69)
at org.apache.phoenix.spark.PhoenixRDD.(PhoenixRDD.scala:43)
at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:52)
at org.apache.spark.sql.execution.datasources.LogicalRelation.(LogicalRelation.scala:40)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:389)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:965)
at MainTest2$.delayedEndpoint$MainTest2$1(MainTest2.scala:9)
at MainTest2$delayedInit$body.apply(MainTest2.scala:6)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at MainTest2$.main(MainTest2.scala:6)
at MainTest2.main(MainTest2.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 26 more
I've already tried changing the HADOOP_CLASSPATH in hadoop-env.sh as suggested in this previous post.
What can I do to overcome this problem?
I found a solution to my problem. As the exception says, the compiler can't find the class HBaseConfiguration. HBaseConfiguration lives in the org.apache.hadoop.hbase library and is needed at compile time. I noticed that the class wasn't present in the org.apache.hadoop library as I had thought. For the HBase 1.3.1 version installed on my machine, I found this class in the hbase-common-1.3.1 jar located in my HBASE_HOME/lib folder.
Then I included this dependency in my build.sbt:
"org.apache.hbase" % "hbase-common" % "1.3.1"
And the exception was gone.
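For completeness, the dependency block with that addition would look something like this (versions taken from the question and the installed HBase):

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.3.0",
  "org.apache.phoenix" % "phoenix-core" % "4.11.0-HBase-1.3",
  "org.apache.spark" % "spark-core_2.11" % "2.1.1",
  "org.apache.spark" % "spark-sql_2.11" % "2.1.1",
  "org.apache.phoenix" % "phoenix-spark" % "4.11.0-HBase-1.3",
  // HBaseConfiguration lives in hbase-common, which was missing from the classpath
  "org.apache.hbase" % "hbase-common" % "1.3.1"
)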

Play Framework 2.4 with Akka

I have a Play Framework app that throws the following error when I try to run it:
[info] Set current project to inland24 (in build file:/Users/MyUser/Desktop/MyProj/)
[info] Updating {file:/Users/MyUser/Desktop/MyProj/}root...
[info] Resolving jline#jline;2.12.1 ...
[info] Done updating.
[warn] There may be incompatibilities among your library dependencies.
[warn] Here are some of the libraries that were evicted:
[warn] * com.typesafe.akka:akka-actor_2.11:2.3.13 -> 2.4-SNAPSHOT
[warn] Run 'evicted' to see detailed eviction warnings
--- (Running the application, auto-reloading is enabled) ---
java.lang.ClassNotFoundException: akka.event.slf4j.Slf4jLoggingFilter
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at akka.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(DynamicAccess.scala:67)
at akka.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(DynamicAccess.scala:66)
at scala.util.Try$.apply(Try.scala:191)
at akka.actor.ReflectiveDynamicAccess.getClassFor(DynamicAccess.scala:66)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:612)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:143)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:127)
at play.api.libs.concurrent.ActorSystemProvider$.start(Akka.scala:291)
at play.core.server.DevServerStart$$anonfun$mainDev$1.apply(DevServerStart.scala:205)
at play.core.server.DevServerStart$$anonfun$mainDev$1.apply(DevServerStart.scala:61)
at play.utils.Threads$.withContextClassLoader(Threads.scala:21)
at play.core.server.DevServerStart$.mainDev(DevServerStart.scala:60)
at play.core.server.DevServerStart$.mainDevHttpMode(DevServerStart.scala:50)
at play.core.server.DevServerStart.mainDevHttpMode(DevServerStart.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at play.runsupport.Reloader$.startDevMode(Reloader.scala:223)
at play.sbt.run.PlayRun$$anonfun$playRunTask$1$$anonfun$apply$2$$anonfun$apply$3.devModeServer$lzycompute$1(PlayRun.scala:74)
at play.sbt.run.PlayRun$$anonfun$playRunTask$1$$anonfun$apply$2$$anonfun$apply$3.play$sbt$run$PlayRun$$anonfun$$anonfun$$anonfun$$devModeServer$1(PlayRun.scala:74)
at play.sbt.run.PlayRun$$anonfun$playRunTask$1$$anonfun$apply$2$$anonfun$apply$3.apply(PlayRun.scala:100)
at play.sbt.run.PlayRun$$anonfun$playRunTask$1$$anonfun$apply$2$$anonfun$apply$3.apply(PlayRun.scala:53)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) java.lang.reflect.InvocationTargetException
[error] Total time: 13 s, completed Oct 9, 2015 8:06:18 PM
Here is what I have as dependencies:
libraryDependencies += "com.typesafe.akka" % "akka-actor_2.11" % "2.4-SNAPSHOT"
libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging" % "3.1.0"
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.1.2"
libraryDependencies += "com.typesafe.akka" %% "akka-slf4j" % "2.3.6"
scalaVersion := "2.11.6"
Is there anything that I should add?
After adding the following resolvers:
resolvers += "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/"
resolvers += "Typesafe Snapshots" at "http://repo.typesafe.com/typesafe/snapshots/"
It worked!