I am trying to create a HiveContext, but it's throwing an error.
Is it just because I don't have winutils.exe? Or how else can I solve this issue?
The reason I want to create a HiveContext is that I am planning to use the collect_set function, which is available as a UDF in Hive.
Error log:
16/11/23 14:12:09 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/11/23 14:12:10 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/11/23 14:12:10 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/11/23 14:12:14 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/11/23 14:12:14 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/11/23 14:12:15 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/11/23 14:12:15 INFO ObjectStore: Initialized ObjectStore
16/11/23 14:12:15 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/11/23 14:12:16 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/11/23 14:12:16 WARN : Your hostname, INNR90GLH5G resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:c0a8:1584%net6, but we couldn't find any external IP address!
16/11/23 14:12:17 INFO HiveMetaStore: Added admin role in metastore
16/11/23 14:12:17 INFO HiveMetaStore: Added public role in metastore
16/11/23 14:12:17 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/11/23 14:12:18 INFO HiveMetaStore: 0: get_all_databases
16/11/23 14:12:18 INFO audit: ugi=1554161 ip=unknown-ip-addr cmd=get_all_databases
16/11/23 14:12:18 INFO HiveMetaStore: 0: get_functions: db=default pat=*
16/11/23 14:12:18 INFO audit: ugi=1554161 ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/11/23 14:12:18 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at com.scb.cnc.payments.CockpitChannelStatus$.main(CockpitChannelStatus.scala:13)
at com.scb.cnc.payments.CockpitChannelStatus.main(CockpitChannelStatus.scala)
Caused by: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:547)
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 12 more
If the issue is a missing winutils.exe, even though Hadoop and Spark are installed on your workstation, then try this article to solve your problem:
solution for spark
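As a concrete starting point, here is a minimal sketch (not a verified fix; the C:\hadoop path and the events/user_id/page names are placeholders): unpack winutils.exe into C:\hadoop\bin, set hadoop.home.dir before the contexts are created, and collect_set is then usable through plain HiveQL, since it is a built-in Hive UDAF.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextOnWindows {
  def main(args: Array[String]): Unit = {
    // Placeholder path: C:\hadoop\bin must contain winutils.exe.
    // This must run before anything initializes Hadoop's Shell class.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    val conf = new SparkConf().setAppName("HiveContextOnWindows").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // collect_set needs no registration; "events", "user_id" and "page" are hypothetical names.
    hiveContext.sql("SELECT user_id, collect_set(page) FROM events GROUP BY user_id").show()
  }
}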
Related
I'm new to everything related to Java/Scala/Maven/sbt/Spark, so please bear with me.
I managed to get everything up and running, or at least it seems so when running the spark-shell locally. The SparkContext gets initialized properly and I can create RDDs.
However, when I call spark-submit locally, I get a SparkContext error.
%SPARK_HOME%\bin\spark-submit --master local --class example.bd.MyApp target\scala-2.12\sparkapp_2.12-1.5.5.jar
This is my code.
package example.bd

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object MyApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("My first Spark application")
    val sc = new SparkContext(conf)
    println("hi everyone")
  }
}
These are my error logs:
21/10/24 18:38:45 WARN Shell: Did not find winutils.exe: {}
java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:548)
at org.apache.hadoop.util.Shell.getHadoopHomeDir(Shell.java:569)
at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:592)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:689)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:78)
at org.apache.hadoop.conf.Configuration.getTimeDurationHelper(Configuration.java:1814)
at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1791)
at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
at org.apache.hadoop.util.ShutdownHookManager$HookEntry.<init>(ShutdownHookManager.java:207)
at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:302)
at org.apache.spark.util.SparkShutdownHookManager.install(ShutdownHookManager.scala:181)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks$lzycompute(ShutdownHookManager.scala:50)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks(ShutdownHookManager.scala:48)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153)
at org.apache.spark.util.ShutdownHookManager$.<init>(ShutdownHookManager.scala:58)
at org.apache.spark.util.ShutdownHookManager$.<clinit>(ShutdownHookManager.scala)
at org.apache.spark.util.Utils$.createTempDir(Utils.scala:326)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:343)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:468)
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:439)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:516)
... 21 more
21/10/24 18:38:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/10/24 18:38:45 INFO SparkContext: Running Spark version 3.1.2
21/10/24 18:38:45 INFO ResourceUtils: ==============================================================
21/10/24 18:38:45 INFO ResourceUtils: No custom resources configured for spark.driver.
21/10/24 18:38:45 INFO ResourceUtils: ==============================================================
21/10/24 18:38:45 INFO SparkContext: Submitted application: My first Spark application
21/10/24 18:38:45 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
21/10/24 18:38:45 INFO ResourceProfile: Limiting resource is cpu
21/10/24 18:38:45 INFO ResourceProfileManager: Added ResourceProfile id: 0
21/10/24 18:38:45 INFO SecurityManager: Changing view acls to: User
21/10/24 18:38:45 INFO SecurityManager: Changing modify acls to: User
21/10/24 18:38:45 INFO SecurityManager: Changing view acls groups to:
21/10/24 18:38:45 INFO SecurityManager: Changing modify acls groups to:
21/10/24 18:38:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(User); groups with view permissions: Set(); users with modify permissions: Set(User); groups with modify permissions: Set()
21/10/24 18:38:46 INFO Utils: Successfully started service 'sparkDriver' on port 56899.
21/10/24 18:38:46 INFO SparkEnv: Registering MapOutputTracker
21/10/24 18:38:46 INFO SparkEnv: Registering BlockManagerMaster
21/10/24 18:38:46 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/10/24 18:38:46 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/10/24 18:38:46 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/10/24 18:38:46 INFO DiskBlockManager: Created local directory at C:\Users\User\AppData\Local\Temp\blockmgr-8df572ec-4206-48ae-b517-bd56242fca4c
21/10/24 18:38:46 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
21/10/24 18:38:46 INFO SparkEnv: Registering OutputCommitCoordinator
21/10/24 18:38:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/10/24 18:38:46 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://LAPTOP-RKJJV5E4:4040
21/10/24 18:38:46 INFO SparkContext: Added JAR file:/C:/[path]/sparkapp/target/scala-2.12/sparkapp_2.12-1.5.5.jar at spark://LAPTOP-RKJJV5E4:56899/jars/sparkapp_2.12-1.5.5.jar with timestamp 1635093525663
21/10/24 18:38:46 INFO Executor: Starting executor ID driver on host LAPTOP-RKJJV5E4
21/10/24 18:38:46 INFO Executor: Fetching spark://LAPTOP-RKJJV5E4:56899/jars/sparkapp_2.12-1.5.5.jar with timestamp 1635093525663
21/10/24 18:38:46 INFO TransportClientFactory: Successfully created connection to LAPTOP-RKJJV5E4/192.168.0.175:56899 after 30 ms (0 ms spent in bootstraps)
21/10/24 18:38:46 INFO Utils: Fetching spark://LAPTOP-RKJJV5E4:56899/jars/sparkapp_2.12-1.5.5.jar to C:\Users\User\AppData\Local\Temp\spark-9c13f31f-92a7-4fc5-a87d-a0ae6410e6d2\userFiles-5ac95e31-3656-4a4d-a205-e0750c041bcb\fetchFileTemp7994667570718611461.tmp
21/10/24 18:38:46 ERROR SparkContext: Error initializing SparkContext.
java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:736)
at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:271)
at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:1120)
at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:1106)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:563)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:953)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:945)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:945)
at org.apache.spark.executor.Executor.<init>(Executor.scala:247)
at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
at example.bd.MyApp$.main(MyApp.scala:10)
at example.bd.MyApp.main(MyApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:548)
at org.apache.hadoop.util.Shell.getHadoopHomeDir(Shell.java:569)
at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:592)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:689)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:78)
at org.apache.hadoop.conf.Configuration.getTimeDurationHelper(Configuration.java:1814)
at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1791)
at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
at org.apache.hadoop.util.ShutdownHookManager$HookEntry.<init>(ShutdownHookManager.java:207)
at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:302)
at org.apache.spark.util.SparkShutdownHookManager.install(ShutdownHookManager.scala:181)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks$lzycompute(ShutdownHookManager.scala:50)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks(ShutdownHookManager.scala:48)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153)
at org.apache.spark.util.ShutdownHookManager$.<init>(ShutdownHookManager.scala:58)
at org.apache.spark.util.ShutdownHookManager$.<clinit>(ShutdownHookManager.scala)
at org.apache.spark.util.Utils$.createTempDir(Utils.scala:326)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:343)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
... 6 more
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:468)
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:439)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:516)
... 21 more
21/10/24 18:38:46 INFO SparkUI: Stopped Spark web UI at http://LAPTOP-RKJJV5E4:4040
21/10/24 18:38:46 ERROR Utils: Uncaught exception in thread main
java.lang.NullPointerException
at org.apache.spark.scheduler.local.LocalSchedulerBackend.org$apache$spark$scheduler$local$LocalSchedulerBackend$$stop(LocalSchedulerBackend.scala:173)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.stop(LocalSchedulerBackend.scala:144)
at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:881)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2370)
at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:2069)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1419)
at org.apache.spark.SparkContext.stop(SparkContext.scala:2069)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:671)
at example.bd.MyApp$.main(MyApp.scala:10)
at example.bd.MyApp.main(MyApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/10/24 18:38:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/10/24 18:38:46 INFO MemoryStore: MemoryStore cleared
21/10/24 18:38:46 INFO BlockManager: BlockManager stopped
21/10/24 18:38:46 INFO BlockManagerMaster: BlockManagerMaster stopped
21/10/24 18:38:46 WARN MetricsSystem: Stopping a MetricsSystem that is not running
21/10/24 18:38:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/10/24 18:38:46 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:736)
at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:271)
at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:1120)
at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:1106)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:563)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:953)
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:945)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:945)
at org.apache.spark.executor.Executor.<init>(Executor.scala:247)
at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
at example.bd.MyApp$.main(MyApp.scala:10)
at example.bd.MyApp.main(MyApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:548)
at org.apache.hadoop.util.Shell.getHadoopHomeDir(Shell.java:569)
at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:592)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:689)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:78)
at org.apache.hadoop.conf.Configuration.getTimeDurationHelper(Configuration.java:1814)
at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1791)
at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
at org.apache.hadoop.util.ShutdownHookManager$HookEntry.<init>(ShutdownHookManager.java:207)
at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:302)
at org.apache.spark.util.SparkShutdownHookManager.install(ShutdownHookManager.scala:181)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks$lzycompute(ShutdownHookManager.scala:50)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks(ShutdownHookManager.scala:48)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153)
at org.apache.spark.util.ShutdownHookManager$.<init>(ShutdownHookManager.scala:58)
at org.apache.spark.util.ShutdownHookManager$.<clinit>(ShutdownHookManager.scala)
at org.apache.spark.util.Utils$.createTempDir(Utils.scala:326)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:343)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
... 6 more
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:468)
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:439)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:516)
... 21 more
21/10/24 18:38:47 ERROR Utils: Uncaught exception in thread shutdown-hook-0
java.lang.NullPointerException
at org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:332)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:222)
at org.apache.spark.executor.Executor.stop(Executor.scala:332)
at org.apache.spark.executor.Executor.$anonfun$new$2(Executor.scala:76)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
21/10/24 18:38:47 INFO ShutdownHookManager: Shutdown hook called
21/10/24 18:38:47 INFO ShutdownHookManager: Deleting directory C:\Users\User\AppData\Local\Temp\spark-ac076276-e6ab-4cfb-a153-56ad35c46d56
21/10/24 18:38:47 INFO ShutdownHookManager: Deleting directory C:\Users\User\AppData\Local\Temp\spark-9c13f31f-92a7-4fc5-a87d-a0ae6410e6d2
As far as I can tell the errors are directly related to Hadoop not being set up, but that shouldn't be an issue given that I am trying to run it locally, no?
Any help will be greatly appreciated.
I have set up the Hive metastore in MySQL, and it can be accessed through Hive to create databases and tables. If I access a Hive table through the spark-shell, I get the table info correctly, fetched from the MySQL Hive metastore. But it does not fetch from MySQL when I execute from Eclipse.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@2b9e69fb
scala> sqlContext.sql("show databases");
res0: org.apache.spark.sql.DataFrame = [databaseName: string]
But if I try to access it through Eclipse, it does not point to MySQL; instead it uses Derby. Please see the log below and the hive-site.xml for an idea.
Note: hive-site.xml is the same in the hive/conf and spark/conf paths.
Spark code that is executed from Eclipse:
package events

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql

object DataContent {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    conf.setAppName("Word Count2").setMaster("local")
    val sc = new SparkContext(conf)
    println("Hello to Spark World")
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val query = sqlContext.sql("show databases")
    query.collect()
    println("Bye to Spark example2")
  }
}
Spark output log:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/02/17 12:03:15 INFO SparkContext: Running Spark version 2.0.0
18/02/17 12:03:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/17 12:03:18 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.189.136 instead (on interface ens33)
18/02/17 12:03:18 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/02/17 12:03:18 INFO SecurityManager: Changing view acls to: vm4learning
18/02/17 12:03:18 INFO SecurityManager: Changing modify acls to: vm4learning
18/02/17 12:03:18 INFO SecurityManager: Changing view acls groups to:
18/02/17 12:03:18 INFO SecurityManager: Changing modify acls groups to:
18/02/17 12:03:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vm4learning); groups with view permissions: Set(); users with modify permissions: Set(vm4learning); groups with modify permissions: Set()
18/02/17 12:03:20 INFO Utils: Successfully started service 'sparkDriver' on port 45637.
18/02/17 12:03:20 INFO SparkEnv: Registering MapOutputTracker
18/02/17 12:03:20 INFO SparkEnv: Registering BlockManagerMaster
18/02/17 12:03:20 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-37db59bd-c12a-4603-ba9f-8e8fec88cc29
18/02/17 12:03:20 INFO MemoryStore: MemoryStore started with capacity 881.4 MB
18/02/17 12:03:21 INFO SparkEnv: Registering OutputCommitCoordinator
18/02/17 12:03:23 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/02/17 12:03:23 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.189.136:4040
18/02/17 12:03:23 INFO Executor: Starting executor ID driver on host localhost
18/02/17 12:03:23 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38313.
18/02/17 12:03:23 INFO NettyBlockTransferService: Server created on 192.168.189.136:38313
18/02/17 12:03:23 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.189.136, 38313)
18/02/17 12:03:23 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.189.136:38313 with 881.4 MB RAM, BlockManagerId(driver, 192.168.189.136, 38313)
18/02/17 12:03:23 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.189.136, 38313)
Hello to Spark World
18/02/17 12:03:30 INFO HiveSharedState: Warehouse path is 'file:/home/vm4learning/workspace/Acumen/spark-warehouse'.
18/02/17 12:03:30 INFO SparkSqlParser: Parsing command: show databases
18/02/17 12:03:32 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/02/17 12:03:34 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/02/17 12:03:34 INFO ObjectStore: ObjectStore, initialize called
18/02/17 12:03:36 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/02/17 12:03:36 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/02/17 12:03:41 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/02/17 12:03:47 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/02/17 12:03:47 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/02/17 12:03:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/02/17 12:03:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/02/17 12:03:49 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
18/02/17 12:03:49 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/02/17 12:03:49 INFO ObjectStore: Initialized ObjectStore
18/02/17 12:03:50 INFO HiveMetaStore: Added admin role in metastore
18/02/17 12:03:50 INFO HiveMetaStore: Added public role in metastore
18/02/17 12:03:50 INFO HiveMetaStore: No user is added in admin role, since config is empty
18/02/17 12:03:51 INFO HiveMetaStore: 0: get_all_databases
18/02/17 12:03:51 INFO audit: ugi=vm4learning ip=unknown-ip-addr cmd=get_all_databases
18/02/17 12:03:51 INFO HiveMetaStore: 0: get_functions: db=default pat=*
18/02/17 12:03:51 INFO audit: ugi=vm4learning ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/02/17 12:03:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/02/17 12:03:52 INFO SessionState: Created local directory: /tmp/5c825a73-be72-4bd1-8bfc-966d8a095919_resources
18/02/17 12:03:52 INFO SessionState: Created HDFS directory: /tmp/hive/vm4learning/5c825a73-be72-4bd1-8bfc-966d8a095919
18/02/17 12:03:52 INFO SessionState: Created local directory: /tmp/vm4learning/5c825a73-be72-4bd1-8bfc-966d8a095919
18/02/17 12:03:52 INFO SessionState: Created HDFS directory: /tmp/hive/vm4learning/5c825a73-be72-4bd1-8bfc-966d8a095919/_tmp_space.db
18/02/17 12:03:52 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/home/vm4learning/workspace/Acumen/spark-warehouse
18/02/17 12:03:52 INFO SessionState: Created local directory: /tmp/32b99842-2ac2-491e-934d-9726a6213c37_resources
18/02/17 12:03:52 INFO SessionState: Created HDFS directory: /tmp/hive/vm4learning/32b99842-2ac2-491e-934d-9726a6213c37
18/02/17 12:03:52 INFO SessionState: Created local directory: /tmp/vm4learning/32b99842-2ac2-491e-934d-9726a6213c37
18/02/17 12:03:52 INFO SessionState: Created HDFS directory: /tmp/hive/vm4learning/32b99842-2ac2-491e-934d-9726a6213c37/_tmp_space.db
18/02/17 12:03:52 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/home/vm4learning/workspace/Acumen/spark-warehouse
18/02/17 12:03:53 INFO HiveMetaStore: 0: create_database: Database(name:default, description:default database, locationUri:file:/home/vm4learning/workspace/Acumen/spark-warehouse, parameters:{})
18/02/17 12:03:53 INFO audit: ugi=vm4learning ip=unknown-ip-addr cmd=create_database: Database(name:default, description:default database, locationUri:file:/home/vm4learning/workspace/Acumen/spark-warehouse, parameters:{})
18/02/17 12:03:55 INFO HiveMetaStore: 0: get_databases: *
18/02/17 12:03:55 INFO audit: ugi=vm4learning ip=unknown-ip-addr cmd=get_databases: *
18/02/17 12:03:56 INFO CodeGenerator: Code generated in 1037.20509 ms
Bye to Spark example2
18/02/17 12:03:56 INFO SparkContext: Invoking stop() from shutdown hook
18/02/17 12:03:57 INFO SparkUI: Stopped Spark web UI at http://192.168.189.136:4040
18/02/17 12:03:57 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/02/17 12:03:57 INFO MemoryStore: MemoryStore cleared
18/02/17 12:03:57 INFO BlockManager: BlockManager stopped
18/02/17 12:03:57 INFO BlockManagerMaster: BlockManagerMaster stopped
18/02/17 12:03:57 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/02/17 12:03:57 INFO SparkContext: Successfully stopped SparkContext
18/02/17 12:03:57 INFO ShutdownHookManager: Shutdown hook called
18/02/17 12:03:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-6addf0da-f076-4dd1-a5eb-38dca93a2ad6
hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/hcatalog?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>vm4learning</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>hive.hwi.listen.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
  </property>
  <property>
    <name>hive.hwi.war.file</name>
    <value>lib/hive-hwi-0.11.0.war</value>
  </property>
</configuration>
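One workaround worth trying (a hedged sketch, not a confirmed fix): when the job is launched from Eclipse, the spark/conf directory is usually not on the application classpath, so hive-site.xml is never read and the HiveContext silently falls back to an embedded Derby metastore. Either add the directory containing hive-site.xml to the Eclipse run configuration's classpath, or set the Thrift URI from the hive-site.xml above explicitly in code:
// Sketch only: thrift://localhost:9083 is copied from the hive-site.xml above,
// and sc is the SparkContext from the code earlier in this question.
// Set the URI before the first query so the lazily created metastore client sees it.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.setConf("hive.metastore.uris", "thrift://localhost:9083")
hiveContext.sql("show databases").show()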
This question already has answers here:
How to connect Spark SQL to remote Hive metastore (via thrift protocol) with no hive-site.xml? (11 answers; closed 2 years ago)
Hive 0.14
Spark 1.6
I am trying to connect to a Hive table from Spark programmatically. I have already put my hive-site.xml in the Spark conf folder, but every time I run this code it connects to the underlying default metastore, i.e. Derby. I googled a lot, and everywhere I got the suggestion to put hive-site.xml in the Spark configuration folder, which I have already done. Please suggest a solution. Below is my code.
FYI: My existing Hive installation uses MySQL as the metastore.
I am running this code directly from Eclipse, not through the spark-submit utility.
package org.scala.spark

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.hive.HiveContext

object HiveToHdfs {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("HDFS to Local").setMaster("local")
    val sc = new SparkContext(conf)
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import hiveContext.implicits._
    hiveContext.sql("load data local inpath '/home/cloudera/Documents/emp_table.txt' into table employee")
    sc.stop()
  }
}
Below is my Eclipse error log:
16/11/18 22:09:03 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/11/18 22:09:03 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/11/18 22:09:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/11/18 22:09:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
**16/11/18 22:09:06 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY**
16/11/18 22:09:06 INFO ObjectStore: Initialized ObjectStore
16/11/18 22:09:06 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/11/18 22:09:06 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/11/18 22:09:07 INFO HiveMetaStore: Added admin role in metastore
16/11/18 22:09:07 INFO HiveMetaStore: Added public role in metastore
16/11/18 22:09:07 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/11/18 22:09:07 INFO HiveMetaStore: 0: get_all_databases
16/11/18 22:09:07 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_all_databases
16/11/18 22:09:07 INFO HiveMetaStore: 0: get_functions: db=default pat=*
16/11/18 22:09:07 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/11/18 22:09:07 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at org.scala.spark.HiveToHdfs$.main(HiveToHdfs.scala:15)
at org.scala.spark.HiveToHdfs.main(HiveToHdfs.scala)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 12 more
16/11/18 22:09:07 INFO SparkContext: Invoking stop() from shutdown hook
Please let me know if any other information is needed to rectify it.
Check this link -> https://issues.apache.org/jira/browse/SPARK-15118
The metastore might be using a MySQL db.
The above error comes from this property:
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
  <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>
Give write permission to /tmp/hive.
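For example, a sketch of doing that programmatically (assuming /tmp/hive lives on the default filesystem; this is the equivalent of running hadoop fs -chmod 733 /tmp/hive from the shell):
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

// Open the default filesystem (HDFS if core-site.xml points there, local otherwise)
// and widen the Hive scratch dir to the 733 permissions SessionState expects.
val fs = FileSystem.get(new Configuration())
fs.setPermission(new Path("/tmp/hive"), new FsPermission("733"))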
I have a problem with the Cloudera VM and Spark. First of all, I'm completely new to Spark, and my boss asked me to run Spark on Scala in a virtual machine for some tests.
I downloaded the virtual machine in a VirtualBox environment, then opened Eclipse and created a new Maven project.
Obviously, I had previously started the Cloudera environment and all the services, like Spark, Yarn, Hive and so on.
All services work fine, and all the checks in the Cloudera services are green. I did some tests with Impala and it works perfectly.
With the Eclipse and Scala-Maven environment, things get worse. This is my very simple code in Scala:
package org.test.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object TestSelectAlgorithm {
  def main(args: Array[String]) = {
    val conf = new SparkConf()
      .setAppName("TestSelectAlgorithm")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.sql("SELECT * FROM products").show()
  }
}
The test is very simple, because the table "products" exists: if I copy and paste the same query into Impala, it works fine!
In the Eclipse environment, on the other hand, I have a problem:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/06/30 05:43:17 INFO SparkContext: Running Spark version 1.6.0
16/06/30 05:43:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/06/30 05:43:18 WARN Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface eth0)
16/06/30 05:43:18 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/06/30 05:43:18 INFO SecurityManager: Changing view acls to: cloudera
16/06/30 05:43:18 INFO SecurityManager: Changing modify acls to: cloudera
16/06/30 05:43:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera)
16/06/30 05:43:19 INFO Utils: Successfully started service 'sparkDriver' on port 53730.
16/06/30 05:43:19 INFO Slf4jLogger: Slf4jLogger started
16/06/30 05:43:19 INFO Remoting: Starting remoting
16/06/30 05:43:19 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.0.2.15:39288]
16/06/30 05:43:19 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 39288.
16/06/30 05:43:19 INFO SparkEnv: Registering MapOutputTracker
16/06/30 05:43:19 INFO SparkEnv: Registering BlockManagerMaster
16/06/30 05:43:19 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-7d685fc0-ea88-423a-9335-42ca12db85da
16/06/30 05:43:19 INFO MemoryStore: MemoryStore started with capacity 1619.3 MB
16/06/30 05:43:20 INFO SparkEnv: Registering OutputCommitCoordinator
16/06/30 05:43:20 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/06/30 05:43:20 INFO SparkUI: Started SparkUI at http://10.0.2.15:4040
16/06/30 05:43:20 INFO Executor: Starting executor ID driver on host localhost
16/06/30 05:43:20 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 57294.
16/06/30 05:43:20 INFO NettyBlockTransferService: Server created on 57294
16/06/30 05:43:20 INFO BlockManagerMaster: Trying to register BlockManager
16/06/30 05:43:20 INFO BlockManagerMasterEndpoint: Registering block manager localhost:57294 with 1619.3 MB RAM, BlockManagerId(driver, localhost, 57294)
16/06/30 05:43:20 INFO BlockManagerMaster: Registered BlockManager
Exception in thread "main" org.apache.spark.sql.AnalysisException: Table not found: products;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:306)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:315)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:310)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:265)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:305)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:54)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:310)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:300)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:36)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:36)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at org.test.spark.TestSelectAlgorithm$.main(TestSelectAlgorithm.scala:18)
at org.test.spark.TestSelectAlgorithm.main(TestSelectAlgorithm.scala)
16/06/30 05:43:22 INFO SparkContext: Invoking stop() from shutdown hook
16/06/30 05:43:22 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
16/06/30 05:43:22 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/06/30 05:43:22 INFO MemoryStore: MemoryStore cleared
16/06/30 05:43:22 INFO BlockManager: BlockManager stopped
16/06/30 05:43:22 INFO BlockManagerMaster: BlockManagerMaster stopped
16/06/30 05:43:22 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/06/30 05:43:22 INFO SparkContext: Successfully stopped SparkContext
16/06/30 05:43:22 INFO ShutdownHookManager: Shutdown hook called
16/06/30 05:43:22 INFO ShutdownHookManager: Deleting directory /tmp/spark-29d381e9-b5e7-485c-92f2-55dc57ca7d25
The main error is (for me):
Exception in thread "main" org.apache.spark.sql.AnalysisException: Table not found: products;
I searched other sites and the documentation, and I found that the problem is connected with Hive tables... but I don't use a Hive table, I use Spark SQL...
Can anyone help me, please?
Thank you for any reply.
In Spark there is no direct support for Impala the way there is for Hive, so you have to load the file yourself. If it is CSV, you can use spark-csv:
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("your .csv file location")

import sqlContext.implicits._
import sqlContext._

df.registerTempTable("products")
sqlContext.sql("select * from products").show()
pom dependency for spark-csv
<!-- https://mvnrepository.com/artifact/com.databricks/spark-csv_2.10 -->
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-csv_2.10</artifactId>
  <version>1.4.0</version>
</dependency>
For Avro there is spark-avro:
import com.databricks.spark.avro._

val sqlContext = new SQLContext(sc)
val df = sqlContext.read.avro("your .avro file location")

import sqlContext.implicits._
import sqlContext._

df.registerTempTable("products")
val result = sqlContext.sql("select * from products")
result.show()

result.write
  .format("com.databricks.spark.avro")
  .save("Your output location")
pom dependency for avro
<!-- http://mvnrepository.com/artifact/com.databricks/spark-avro_2.10 -->
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-avro_2.10</artifactId>
  <version>2.0.1</version>
</dependency>
And for Parquet, Spark has built-in support:
val sqlContext = new SQLContext(sc)
val parquetFile = sqlContext.read.parquet("your parquet file location")
parquetFile.registerTempTable("products")
sqlContext.sql("select * from products").show()
Can you check whether the /user/cloudera/.sparkStaging/stagingArea location exists and whether it contains an .avro file? And please replace "Your output location" with an actual directory location.
Please check the Avro GitHub page for more detail: https://github.com/databricks/spark-avro
I'm parsing a JSON with Spark SQL and it works really well: it finds the schema and I can run queries against it.
Now I need to "flatten" the JSON, and I have read in the forum that the best way is to explode it with Hive (LATERAL VIEW), so I'm trying to do the same with it. But I can't even create the context... Spark gives me an error and I can't find how to fix it.
As I said, at this point I'm only trying to create the context:
println ("Create Spark Context:")
val sc = new SparkContext( "local", "Simple", "$SPARK_HOME")
println ("Create Hive context:")
val hiveContext = new HiveContext(sc)
And it gives me this error:
Create Spark Context:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/26 15:13:44 INFO Remoting: Starting remoting
15/12/26 15:13:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.80.136:40624]
Create Hive context:
15/12/26 15:13:50 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/12/26 15:13:50 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/12/26 15:13:56 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:13:56 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:13:58 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:13:58 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:13:59 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:14:01 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/12/26 15:14:01 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:183)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:179)
at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:226)
at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185)
at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:392)
at org.apache.spark.sql.hive.HiveContext.defaultOverrides(HiveContext.scala:174)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:177)
at pebd.emb.Bicing$.main(Bicing.scala:73)
at pebd.emb.Bicing.main(Bicing.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.OutOfMemoryError: PermGen space
Process finished with exit code 1
I know it is a very simple question, but I don't really know the reason for this error. Thank you in advance, everyone.
Here's the relevant part of the exception:
Caused by: java.lang.OutOfMemoryError: PermGen space
You need to increase the amount of PermGen memory that you give to the JVM. By default (SPARK-1879), Spark's own launch scripts increase this to 128 MB, so I think you'll have to do something similar in your IntelliJ run configuration. Try adding -XX:MaxPermSize=128m to the "VM options" list.
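If you run the same program through sbt instead of IntelliJ, the analogous change (a sketch, assuming a standard sbt build) is to fork the run and pass the flag in build.sbt:
// build.sbt: run the app in a forked JVM with a larger PermGen.
// (PermGen flags apply to Java 7 and earlier; Java 8 removed PermGen entirely.)
fork := true
javaOptions += "-XX:MaxPermSize=128m"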