Hive Table Creation Using MongoDB Hadoop Driver - mongodb

I am trying to connect from a Hive Database to a collection in MongoDB using a driver (jars) provided on the wiki site. Here are the steps I did: -
I created a collection in MongoDB called "Diamond" under a database called "Moe" and it has got 20 documents:
I wanted to connect from Hive via the Hadoop MongoDB Driver and view these documents via Hive.
I have both MongoDB and Hive installed on the same server and configured. However I don't see any variable called the HIVE_CLASPATH I wonder where that is.
So I installed 3 divers on the server: -
mongo-hadoop-core-1.5.2.jar;
mongo-hadoop-hive-1.5.2.jar;
mongo-java-driver-3.0.0.jar;
Now, I connect to Hive, and then add these 2 jar's to my classpath by the following commands: - (they get added successfully)
add jar /hadoopgdc/hadoop-2.6.0/share/hadoop/common/lib/mongo-hadoop-hive-1.5.2.jar;
add jar /hadoopgdc/hadoop-2.6.0/share/hadoop/common/lib/mongo-hadoop-core-1.5.2.jar;
add jar /hadoopgdc/hadoop-2.6.0/share/hadoop/common/lib/mongo-java-driver-3.0.0.jar;
Now I create a table in HIVE: -
CREATE TABLE Diamond
(
carat DOUBLE,
cut STRING,
color STRING,
clarity STRING,
depth DOUBLE,
table DOUBLE,
price DOUBLE,
xcord DOUBLE,
ycord DOUBLE,
zcord DOUBLE
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"carat":"carat","cut":"cut",
"color":"color", "clarity":"clarity", "depth":"depth", "table":"table",
"price":"price", "xcord":"x", "ycord":"y", "zcord":"z"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/Moe.Diamond');
However when I execute the above command in Hive I get the error below: -
java.lang.NoClassDefFoundError: com/mongodb/util/JSON
at com.mongodb.hadoop.hive.BSONSerDe.initialize(BSONSerDe.java:110)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:210)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:268)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:261)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:587)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:573)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3784)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:256)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:155)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1355)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1139)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: com.mongodb.util.JSON
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 23 more
FAILED: Execution Error, return code -101 from
org.apache.hadoop.hive.ql.exec.DDLTask
I have tried the following: -
- placing the jars in every possible directory with no effect
- The class that is supposed to be missing, is pretty much present in the jar file.
- oh yes and the MongoStorageHandler class is very much in the jar.
I am done breaking my head with this !! If anyone can shed some light on what I could do to alleviate my anxiety, it would be great.
Thanks again.
Mario

I identified what the issue was. To connect from HIVE to MongoDB, the MongoDb Driver uses invokes a java class in a hive jar library
**
java.lang.ClassNotFoundException:
org.apache.hadoop.hive.ql.hooks.PreExecutePrinter
**
Now this class is supposed to be found in the jar file - hive-exec-0.11.0.1.3.2.0-111.jar. However it is available only in more recent versions of HIVE and not older ones.
It is not available in 0.11.0.1.3.2.0-111 but is visibly detectable in 0.13.0.2.1.7.0-784.
The solution here was to connect to a version of HIVE that is supported by the driver. MongoDB does state that its driver supports a certain version of Hadoop, but doesn't drill down to the individual Application (HIVE / SQOOP).

Related

AWS GLUE: Cassandra connection using SSL is not working

I wanted to connect to Cassandra using Spark, when trying to connect Cassandra using the default port it is working, but when I try accessing it via SSL the job fails, below is the code:
val spark: SparkSession = SparkSession.builder()
.config("spark.cassandra.connection.host","server.abc")
.config("spark.cassandra.connection.port","9142")
.config("spark.cassandra.connection.ssl.enabled",true)
.config("spark.cassandra.connection.ssl.trustStore.path","s3:/dev-code/certs/trust.jks")
.config("spark.cassandra.connection.ssl.trustStore.password","mypass")
.config("spark.cassandra.auth.username","myuser")
.config("spark.cassandra.auth.password","userpass")
.appName("CassandraIntegration").getOrCreate()
FYI: it has access to the S3 bucket, I am able to read the CSV file from the same location. Also, both the ports are enabled 9042 and 9142. Closed 9042 and kept only 9142 port still the error persists.
Below is the error:
ERROR [main] glue.ProcessLauncher (Logging.scala:logError(94)): Exception in User Class
java.io.IOException: Failed to open native connection to Cassandra at {server.abc:9142} :: Error instantiating class com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory (specified by advanced.ssl-engine-factory.class): Cannot initialize SSL Context
at com.datastax.spark.connector.cql.CassandraConnector$.createSession(CassandraConnector.scala:173)
at com.datastax.spark.connector.cql.CassandraConnector$.$anonfun$sessionCache$1(CassandraConnector.scala:161)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:32)
at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:103)
at com.datastax.spark.connector.datasource.CassandraCatalog$.com$datastax$spark$connector$datasource$CassandraCatalog$$getMetadata(CassandraCatalog.scala:455)
at com.datastax.spark.connector.datasource.CassandraCatalog$.getTableMetaData(CassandraCatalog.scala:421)
at org.apache.spark.sql.cassandra.DefaultSource.getTable(DefaultSource.scala:68)
at org.apache.spark.sql.cassandra.DefaultSource.inferSchema(DefaultSource.scala:72)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:296)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:266)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:226)
at MyCsvToCassandrsJob$.main(csv-to-cassanra-job:63)
at MyCsvToCassandrsJob.main(csv-to-cassanra-job-job)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.amazonaws.services.glue.SparkProcessLauncherPlugin.invoke(ProcessLauncher.scala:47)
at com.amazonaws.services.glue.SparkProcessLauncherPlugin.invoke$(ProcessLauncher.scala:47)
at com.amazonaws.services.glue.ProcessLauncher$$anon$1.invoke(ProcessLauncher.scala:75)
at com.amazonaws.services.glue.ProcessLauncher.launch(ProcessLauncher.scala:123)
at com.amazonaws.services.glue.ProcessLauncher$.main(ProcessLauncher.scala:29)
at com.amazonaws.services.glue.ProcessLauncher.main(ProcessLauncher.scala)
Caused by: java.lang.IllegalArgumentException: Error instantiating class com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory (specified by advanced.ssl-engine-factory.class): Cannot initialize SSL Context
at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:253)
at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:108)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.buildSslEngineFactory(DefaultDriverContext.java:414)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.lambda$new$4(DefaultDriverContext.java:279)
at com.datastax.oss.driver.internal.core.util.concurrent.LazyReference.get(LazyReference.java:55)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.getSslEngineFactory(DefaultDriverContext.java:733)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.buildSslHandlerFactory(DefaultDriverContext.java:470)
at com.datastax.oss.driver.internal.core.util.concurrent.LazyReference.get(LazyReference.java:55)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.getSslHandlerFactory(DefaultDriverContext.java:799)
at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.init(DefaultSession.java:348)
at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.access$1100(DefaultSession.java:300)
at com.datastax.oss.driver.internal.core.session.DefaultSession.lambda$init$0(DefaultSession.java:146)
at com.datastax.oss.driver.shaded.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at com.datastax.oss.driver.shaded.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
at com.datastax.oss.driver.shaded.netty.channel.DefaultEventLoop.run(DefaultEventLoop.java:54)
at com.datastax.oss.driver.shaded.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at com.datastax.oss.driver.shaded.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Cannot initialize SSL Context
at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.<init>(DefaultSslEngineFactory.java:74)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:246)
... 18 more
Caused by: java.nio.file.NoSuchFileException: s3:/dev-code/certs/trust.jks
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
at java.nio.file.Files.newInputStream(Files.java:152)
at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.buildContext(DefaultSslEngineFactory.java:119)
at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.<init>(DefaultSslEngineFactory.java:72)
... 23 more
Big help if there is any workaround for this problem.
At the bottom of your error message, I see this:
NoSuchFileException: s3:/dev-code/certs/trust.jks
Alex is right, in that you need to provide a path to that file that the Spark connector can actually get to. From the looks of it, S3 won't work here.
Added the .jks s3 file into "Referenced files path" of Glue Job and then just try to access just provide the file name. As the file will be automatically be placed under /tmp folder. But it will still not solve the issue.
From the this website, I understood that we need to provide all the default values as well:
Below is my final code:
val spark: SparkSession = SparkSession.builder()
.config("spark.cassandra.connection.host","server.abc")
.config("spark.cassandra.connection.port","9142")
.config("spark.cassandra.connection.ssl.enabled",true)
.config("spark.cassandra.connection.ssl.enabledAlgorithms", "TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA")
.config("spark.cassandra.connection.ssl.trustStore.path","trust.jks")
.config("spark.cassandra.connection.ssl.trustStore.password","mypass")
.config("spark.cassandra.connection.ssl.trustStore.type","JKS")
.config("spark.cassandra.connection.ssl.protocol","TLS")
.config("spark.cassandra.auth.username","myuser")
.config("spark.cassandra.auth.password","userpass")
.appName("CassandraIntegration").getOrCreate()

Spark executor is throwing error "java.lang.ClassNotFoundException: oracle.jdbc.OracleDriver"

I am trying to import a table from my oracle database using spark and here I am using Scala to import the table.
My jdbc driver is ojdbc7.jar and it's added in both the parameter spark.driver.extraClassPath and spark.executor.extraClassPath in configuration file
spark.driver.extraClassPath :/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/s
hare/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/ojdbc7.jar
spark.driver.extraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
spark.executor.extraClassPath :/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/s
hare/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/ojdbc7.jar
spark.executor.extraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
I can successfully import the table. I can print the schema of the table. But while performing any operations like Count,show() it throws below error
`
Caused by: java.lang.ClassNotFoundException: oracle.jdbc.OracleDriver
at java.lang.ClassLoader.findClass(ClassLoader.java:530) at
org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at
org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at
org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
at
org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:77)
... 21 more
`
This error was because Spark was not able to locate the ojdbc7.jar from every core node. So placing this jar in a shared location like /usr/lib/spark/jars will resolve this issue.
You can also do few other things including adding jar file full location as a dependency under spark section in the interpreter as an artifact
If you just want %jdbc to work, update the jdbc section under interpreter, add the jar file full location as a artifact under the dependencies and also update the default.driver, default.url, default.user, default.password accordingly

Is it possible to deploy Orbeon on Oracle Weblogic 12c?

I am trying to deploy Orbeon PE 2016.3 on Weblogic 12c with no luck.
I have tried to deploy the orbeon.war as is to weblogic, also tried along with the other wars that were included in the zip package and the solution suggested in Orbeon documentation to create a custom EAR to avoid some weblogic 11g dependencies.
The stacktrace I get is:
Root cause of ServletException.
java.lang.NoClassDefFoundError: Could not initialize class org.orbeon.oxf.util.DateUtils$
at org.orbeon.oxf.common.PEVersion$LicenseInfo$$anonfun$8.apply(PEVersion.scala:169)
at org.orbeon.oxf.common.PEVersion$LicenseInfo$$anonfun$8.apply(PEVersion.scala:169)
at scala.Option.map(Option.scala:146)
at org.orbeon.oxf.common.PEVersion$LicenseInfo$.apply(PEVersion.scala:169)
at org.orbeon.oxf.common.PEVersion$LicenseInfo$$anonfun$tryApply$1.apply(PEVersion.scala:175)
If I add joda.time to the prefered application packages in weblogic-application.xml this error disappears, but not sure if that is all.
Also when Orbeon is loaded in Weblogic I get the following error:
<Error> <Weblogic-Validation> <BEA-2156404> <Unable to find a Validation Context. java.lang.RuntimeException: javax.naming.NamingException: String index out of range: -1
java.lang.RuntimeException: javax.naming.NamingException: String index out of range: -1
at weblogic.validation.ValidationContextManager.getValidationContext(ValidationContextManager.java:72)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
Truncated. see log file for complete stacktrace
Caused By: javax.naming.NamingException: String index out of range: -1
at weblogic.jndi.Environment.getContext(Environment.java:311)
at weblogic.jndi.Environment.getContext(Environment.java:288)
at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:117)
at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:684)
at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:307)
I also noticed that the ASCII art in the logs is shown 1 time, then 2 times together (Orbeon Forms, Orbeon Forms) and then 3 times.

'new HiveContext' is wanting an X11 display? com.trend.iwss.jscan?

Spark 1.6.2 (YARN master)
Package name: com.example.spark.Main
Basic SparkSQL code
val conf = new SparkConf()
conf.setAppName("SparkSQL w/ Hive")
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._
// val rdd = <some RDD making>
val df = rdd.toDF()
df.write.saveAsTable("example")
And stacktrace...
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:204)
at java.awt.Window.<init>(Window.java:536)
at java.awt.Frame.<init>(Frame.java:420)
at java.awt.Frame.<init>(Frame.java:385)
at com.trend.iwss.jscan.runtime.BaseDialog.getActiveFrame(BaseDialog.java:75)
at com.trend.iwss.jscan.runtime.AllowDialog.make(AllowDialog.java:32)
at com.trend.iwss.jscan.runtime.PolicyRuntime.showAllowDialog(PolicyRuntime.java:325)
at com.trend.iwss.jscan.runtime.PolicyRuntime.stopActionInner(PolicyRuntime.java:240)
at com.trend.iwss.jscan.runtime.PolicyRuntime.stopAction(PolicyRuntime.java:172)
at com.trend.iwss.jscan.runtime.PolicyRuntime.stopAction(PolicyRuntime.java:165)
at com.trend.iwss.jscan.runtime.NetworkPolicyRuntime.checkURL(NetworkPolicyRuntime.java:284)
at com.trend.iwss.jscan.runtime.NetworkPolicyRuntime._preFilter(NetworkPolicyRuntime.java:164)
at com.trend.iwss.jscan.runtime.PolicyRuntime.preFilter(PolicyRuntime.java:132)
at com.trend.iwss.jscan.runtime.NetworkPolicyRuntime.preFilter(NetworkPolicyRuntime.java:108)
at org.apache.commons.logging.LogFactory$5.run(LogFactory.java:1346)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.commons.logging.LogFactory.getProperties(LogFactory.java:1376)
at org.apache.commons.logging.LogFactory.getConfigurationFile(LogFactory.java:1412)
at org.apache.commons.logging.LogFactory.getFactory(LogFactory.java:455)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
at org.apache.hadoop.hive.shims.HadoopShimsSecure.<clinit>(HadoopShimsSecure.java:60)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:146)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:141)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
at org.apache.spark.sql.hive.client.ClientWrapper.overrideHadoopShims(ClientWrapper.scala:116)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:69)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:249)
at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:345)
at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:255)
at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:459)
at org.apache.spark.sql.hive.HiveContext.defaultOverrides(HiveContext.scala:233)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:236)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at com.example.spark.Main1$.main(Main.scala:52)
at com.example.spark.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/etc/hive/2.5.3.0-37/0/ivysettings.xml will be used
This same code was working a week ago on a fresh HDP cluster, and it works fine in the sandbox... the only thing I remember doing was trying to change around the JAVA_HOME variable, but I am fairly sure I undid those changes.
I'm at a loss - not sure how to start tracking down the issue.
The cluster is headless, so of-course it has no X11 display, but what piece of new HiveContext even needs to pop-up any JFrame?
Based on the logs, I'd say it's a Java configuration issue I messed up and something within org.apache.hadoop.hive.shims.HadoopShimsSecure.<clinit>(HadoopShimsSecure.java:60) got triggered, therefore a Java security dialog is appearing, but I don't know.
Can't do X11 forwarding, and tried to do export SPARK_OPTS="-Djava.awt.headless=true" before a spark-submit, and that didn't help.
Tried these, but again, can't forward and don't have a display
Getting a HeadlessException: No X11 DISPLAY variable was set
"No X11 DISPLAY variable" - what does it mean?
The error seems to be reproducible on two of the Spark clients.
Only on one machine did I try changing JAVA_HOME.
Did an Ambari Hive service-check. Didn't fix it.
Can connect fine to Hive database via Hive/Beeline CLI
As far as the Spark code is concerned, this seemed to mitigate the error.
val conf = SparkConf()
conf.set("spark.executor.extraJavaOptions" , "-Djava.awt.headless=true")
Original answer
Found this post. Spring 3.0.5 - java.awt.HeadlessException - com.trend.iwss.jscan
Basically, Trend Micro is inserting some com.trend.iwss.jscan package into the JAR files that are downloaded via Maven through a company firewall , and I have no control over that.
(link not working) http://esupport.trendmicro.com/Pages/IWSx-3x-Some-files-and-folders-are-added-to-the-Jar-files-after-passin.aspx
Wayback Machine to the rescue...
If anyone else has input, I would also like to hear it.
Problem:
When downloading some .JAR files via IWSA, a directory filled with .class file, which is not related to what is being downloaded, is added to the jar file (com\trend\iwss\jscan\runtime\).
Solution:
This happens because if a JAR file is originally unsigned, IWSA will insert some code into the applet to monitor and restrict potential harmful actions.
For IWSS/IWSA, every "get" request is the same so it will not know if you are trying to download an archive or an applet, which will be executed by your browser.
This code is added for security reason to monitor the behavior of the "possible" applet to be sure that it does not do any harm to the machine and its environment.
To prevent this issue, please follow these steps:
Log on to the IWSS web console.
Go to HTTP > Applets and ActiveX > Policies > Java Applet Security Rules.
Under Java Applet Security, change the value of "No signature" to either "Pass" or "Block", depending on what you want to do with the unsigned .JAR files.
Click Save.

MongoDB hadoop connector fails to query on mongo hive table

I am using MongoDB hadoop connector to query mongoDB using hive table in hadoop.
I am able to execute
select * from mongoDBTestHiveTable;
But when I try to execute following query
select id from mongoDBTestHiveTable;
it throws following exception.
Following class exist in hive lib folder.
Exception stacktrace:
Diagnostic Messages for this Task:
Error: java.io.IOException: Cannot create an instance of InputSplit class = com.mongodb.hadoop.hive.input.HiveMongoInputFormat$MongoHiveInputSplit:Class com.mongodb.hadoop.hive.input.HiveMongoInputFormat$MongoHiveInputSplit not found
at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:147)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:370)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.hive.input.HiveMongoInputFormat$MongoHiveInputSplit not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:144)
... 10 more
Container killed by the ApplicationMaster.
Please advice.
You also need to add mongo-hadoop-* and well as mongo driver jars to MR1/MR2 classpath on all workers