Spark Scala warnings: do they need to be resolved? - scala

I get the screen below when I run Scala (using Scala version 2.10.5) along with Spark version 1.6.1. The book that I am learning from has a screenshot, and it doesn't show so many warnings. Why am I getting these warnings? Do I need to resolve them for smooth execution?
c:\spark-1.6.1-bin-hadoop2.6\spark-1.6.1-bin-hadoop2.6>bin\spark-shell
Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
16/03/16 08:12:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
16/03/16 08:13:09 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/c:/spark-1.6.1-bin-hadoop2.6/spark-1.6.1-bin-hadoop2.6/bin/../lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/spark-1.6.1-bin-hadoop2.6/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar."
16/03/16 08:13:09 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/spark-1.6.1-bin-hadoop2.6/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/c:/spark-1.6.1-bin-hadoop2.6/spark-1.6.1-bin-hadoop2.6/bin/../lib/datanucleus-api-jdo-3.2.6.jar."
16/03/16 08:13:09 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/spark-1.6.1-bin-hadoop2.6/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/c:/spark-1.6.1-bin-hadoop2.6/spark-1.6.1-bin-hadoop2.6/bin/../lib/datanucleus-core-3.2.10.jar."
16/03/16 08:13:09 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/03/16 08:13:09 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/03/16 08:13:14 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/03/16 08:13:14 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/03/16 08:13:15 WARN : Your hostname, njog-MOBL1 resolves to a loopback/non-reachable address: 192.168.56.1, but we couldn't find any external IP address!
16/03/16 08:13:22 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/c:/spark-1.6.1-bin-hadoop2.6/spark-1.6.1-bin-hadoop2.6/bin/../lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/spark-1.6.1-bin-hadoop2.6/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar."
16/03/16 08:13:22 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/spark-1.6.1-bin-hadoop2.6/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/c:/spark-1.6.1-bin-hadoop2.6/spark-1.6.1-bin-hadoop2.6/bin/../lib/datanucleus-api-jdo-3.2.6.jar."
16/03/16 08:13:22 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/spark-1.6.1-bin-hadoop2.6/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/c:/spark-1.6.1-bin-hadoop2.6/spark-1.6.1-bin-hadoop2.6/bin/../lib/datanucleus-core-3.2.10.jar."
16/03/16 08:13:22 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/03/16 08:13:22 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.

Related

Issue after Spark Installation on Windows 10

This is the cmd log that I see after running the spark-shell command (C:\Spark>spark-shell). As I understand it, it's mainly an issue with Hadoop. I use Windows 10. Can somebody please help with the issue below?
C:\Users\mac>cd c:\
c:\>winutils\bin\winutils.exe chmod 777 \tmp\hive
c:\>cd c:\spark
c:\Spark>spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/05/14 13:21:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/14 13:21:34 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/c:/Spark/bin/../jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/jars/datanucleus-rdbms-3.2.9.jar."
17/05/14 13:21:34 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/c:/Spark/bin/../jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/jars/datanucleus-core-3.2.10.jar."
17/05/14 13:21:34 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/c:/Spark/bin/../jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/jars/datanucleus-api-jdo-3.2.6.jar."
17/05/14 13:21:48 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.1.9:4040
Spark context available as 'sc' (master = local[*], app id = local-1494764489031).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
There's no issue in your output. These WARN messages can simply be ignored.
In other words, it looks like you've installed Spark 2.1.1 on Windows 10 properly.
To make sure you installed it properly (so I can remove "looks" from the sentence above), do the following:
spark.range(1).show
That by default will trigger loading Hive classes, which may or may not end up with exceptions on Windows due to Hadoop's requirements (hence the need for winutils.exe to work around them).
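If the remaining WARNs are just noise, they can also be silenced from inside the shell; a minimal sketch (sc is the SparkContext the shell has already created):

// Raise the driver's log threshold so WARN messages are hidden;
// pass "WARN" or "INFO" later to bring them back.
sc.setLogLevel("ERROR")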

Can't find class in uber jar

I am on Hortonworks Distribution 2.4 (effectively Hadoop 2.7.1 and Spark 1.6.1).
I am packaging my own version of Spark (2.1.0) in the uber jar while the cluster is on 1.6.1. In the process, I am shipping all required libraries through a fat jar (built using Maven's uber-jar concept).
However, spark-submit (through the Spark 2.1.0 client) fails, citing a NoClassDefFoundError on the jersey client. Upon listing my uber jar's contents, I can see the exact class file in the jar, yet Spark/YARN can't find it.
Here goes:
The error message:
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:151)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
And here is my attempt to find the class in the jar file:
jar -tf uber-xxxxx-something.jar | grep jersey | grep ClientCon
com/sun/jersey/api/client/ComponentsClientConfig.class
com/sun/jersey/api/client/config/ClientConfig.class
... Other files
What could be going on here? Suggestions? Ideas, please.
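One way to see which classpath entry actually wins at runtime, a minimal sketch (the resource path is taken from the class named in the stack trace above; run it in a spark-shell launched the same way as the failing job):

// Ask the classloader where it would load ClientConfig from.
// A jar URL points at the winning jar; None means the class isn't visible
// on this classloader at all, matching the NoClassDefFoundError.
val url = getClass.getClassLoader
  .getResource("com/sun/jersey/api/client/config/ClientConfig.class")
println(Option(url).getOrElse("ClientConfig is not visible on this classloader"))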
EDIT
The jersey-client section of the POM goes here:
<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-client</artifactId>
    <version>1.19.3</version>
</dependency>
EDIT
I also wanted to point out that my code is compiled with Scala 2.12, with the compatibility level set to 2.11. However, the cluster is perhaps on 2.10. I say "perhaps" because I believe cluster nodes don't necessarily need Scala binaries installed; YARN just launches the components' jar/class files without using Scala binaries. I wonder if that's playing a role here.
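For the Scala-version doubt, one quick check from a spark-shell on a cluster node (scala.util.Properties ships with scala-library, so no extra dependency is needed):

// Prints the Scala version the running Spark was built against,
// e.g. "version 2.10.5" on a Spark 1.6.1 cluster.
println(scala.util.Properties.versionString)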

Warnings when starting Spark application on Ubuntu 16.04 [duplicate]

On Mac OS X, I compiled Spark from the sources using the following command:
jacek:~/oss/spark
$ SPARK_HADOOP_VERSION=2.4.0 SPARK_YARN=true SPARK_HIVE=true SPARK_GANGLIA_LGPL=true xsbt
...
[info] Set current project to root (in build file:/Users/jacek/oss/spark/)
> ; clean ; assembly
...
[info] Packaging /Users/jacek/oss/spark/examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.4.0.jar ...
[info] Done packaging.
[info] Done packaging.
[success] Total time: 1964 s, completed May 9, 2014 5:07:45 AM
When I started ./bin/spark-shell I noticed the following WARN message:
WARN NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
What might be the issue?
jacek:~/oss/spark
$ ./bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/05/09 21:11:17 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/05/09 21:11:17 INFO SecurityManager: Changing view acls to: jacek
14/05/09 21:11:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jacek)
14/05/09 21:11:17 INFO HttpServer: Starting HTTP Server
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.0-SNAPSHOT
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0)
Type in expressions to have them evaluated.
Type :help for more information.
...
14/05/09 21:11:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
...
The Supported Platforms section of the Native Libraries Guide in the Apache Hadoop documentation reads:
The native hadoop library is supported on *nix platforms only. The
library does not work with Cygwin or the Mac OS X platform.
The native hadoop library is mainly used on the GNU/Linux platform and
has been tested on these distributions:
RHEL4/Fedora
Ubuntu
Gentoo
On all the above distributions a 32/64 bit native hadoop library will work with a respective 32/64 bit JVM.
It appears that the WARN message can be disregarded on Mac OS X, as the native library simply doesn't exist for the platform.
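To double-check that conclusion, a small sketch from spark-shell (it assumes the Hadoop classes bundled with Spark are on the classpath):

// Ask Hadoop directly whether its native library was loaded; `false` here
// corresponds to the "using builtin-java classes" WARN seen at startup.
import org.apache.hadoop.util.NativeCodeLoader
println(s"native-hadoop loaded: ${NativeCodeLoader.isNativeCodeLoaded}")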
In my experience, if you cd into /sparkDir/conf, rename spark-env.sh.template to spark-env.sh, and then set JAVA_OPTS and HADOOP_DIR, it works.
You will also have to edit this /etc/profile line:
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH

Play framework bind error

I have a Play application which I'm actively working on in the development environment. Somehow it started to give me this error:
ProvisionException: Unable to provision, see the following errors:
1) Error in custom provider, org.jboss.netty.channel.ChannelException: Failed to bind to: /127.0.0.1:2552
while locating play.api.libs.concurrent.ActorSystemProvider
while locating akka.actor.ActorSystem
for parameter 6 at play.api.DefaultApplication.<init>(Application.scala:240)
at play.api.DefaultApplication.class(Application.scala:240)
while locating play.api.DefaultApplication
while locating play.api.Application
I verified that the port is not used by any other application, but on the console I see:
Caused by: java.net.BindException: Address already in use
Play version 2.4.3 and Scala version 2.11.7.
I found the answer; it's a simple mistake, but I want to document it here to help others like me.
The Play application had a dependency on another Akka-based module. After some changes, that jar file was packaged with an application.conf which configures akka-remote on port 2552.
Excluding application.conf from the dependency solved the problem.
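For reference, one way to keep a module's application.conf out of its published jar, as a minimal build.sbt sketch (sbt 0.13-style keys; it assumes you control the Akka module's build):

// Filter application.conf out of the packaged artifact so downstream
// projects don't inherit this module's akka-remote port setting.
mappings in (Compile, packageBin) ~= { files =>
  files.filterNot { case (_, pathInJar) => pathInJar == "application.conf" }
}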

"org.datanucleus" is already registered under Spring Source Toosuite

Exception in thread "main" Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/zakaria/.m2/repository/org/datanucleus/datanucleus-core/1.1.6/datanucleus-core-1.1.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/zakaria/springsource/sts-2.3.3.M2/plugins/com.google.appengine.eclipse.sdkbundle.1.3.5_1.3.5.v201007021040/appengine-java-sdk-1.3.5/lib/user/orm/datanucleus-core-1.1.5.jar."
org.datanucleus.exceptions.NucleusException: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/zakaria/.m2/repository/org/datanucleus/datanucleus-core/1.1.6/datanucleus-core-1.1.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/zakaria/springsource/sts-2.3.3.M2/plugins/com.google.appengine.eclipse.sdkbundle.1.3.5_1.3.5.v201007021040/appengine-java-sdk-1.3.5/lib/user/orm/datanucleus-core-1.1.5.jar."
at org.datanucleus.plugin.NonManagedPluginRegistry.registerBundle(NonManagedPluginRegistry.java:434)
at org.datanucleus.plugin.NonManagedPluginRegistry.registerBundle(NonManagedPluginRegistry.java:340)
at org.datanucleus.plugin.NonManagedPluginRegistry.registerExtensions(NonManagedPluginRegistry.java:222)
at org.datanucleus.plugin.NonManagedPluginRegistry.registerExtensionPoints(NonManagedPluginRegistry.java:153)
at org.datanucleus.plugin.PluginManager.registerExtensionPoints(PluginManager.java:82)
at org.datanucleus.OMFContext.<init>(OMFContext.java:160)
at org.datanucleus.enhancer.DataNucleusEnhancer.<init>(DataNucleusEnhancer.java:172)
at org.datanucleus.enhancer.DataNucleusEnhancer.<init>(DataNucleusEnhancer.java:150)
at org.datanucleus.enhancer.DataNucleusEnhancer.main(DataNucleusEnhancer.java:1157)
Can you help me please? I tried to re-enter the following command in the built-in Roo Shell, but to no avail:
persistence setup --provider DATANUCLEUS --database HYPERSONIC_IN_MEMORY
Thanks,
Regards.
As the message says, you have two "datanucleus-core" jars on the CLASSPATH (1.1.6 from the Maven repository and 1.1.5 from the App Engine SDK bundle), so check your CLASSPATH and remove one of them.
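If it isn't obvious which classpath entries are doing the registering, a small diagnostic sketch (run on the same CLASSPATH; it relies on DataNucleus discovering plugins through a plugin.xml at each plugin jar's root, which is what NonManagedPluginRegistry in the stack trace scans for):

import scala.collection.JavaConverters._

// Each URL printed is a classpath entry contributing a plugin.xml; two
// datanucleus-core jars showing up here is exactly what the error reports.
val urls = getClass.getClassLoader.getResources("plugin.xml").asScala.toList
urls.foreach(println)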