SparkContext not initializing in zeppelin - pyspark

When I try to import SparkContext from pyspark in Apache Zeppelin, the following error is shown:

You should use the spark.pyspark interpreter instead of the python one if you need a spark context.
Try:
%spark.pyspark
print(sc)

Related

NoSuchMethodError in google dataproc cluster for excel files

While reading an Excel file on a Dataproc cluster, I get the error java.lang.NoSuchMethodError.
Note: the schema is printed, but not the actual data.
Error:
py4j.protocol.Py4JJavaError: An error occurred while calling o74.showString.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at com.crealytics.spark.excel.ExcelRelation.buildScan(ExcelRelation.scala:74)
Code:
from pyspark.sql import SparkSession
from pyspark import SparkConf, SparkContext
from google.cloud import storage
from google.cloud import bigquery
import pyspark
client = storage.Client()
bucket_name = "test_bucket"
path=f"gs://{bucket_name}/test_file.xlsx"
def make_spark_session(app_name, jars=None):
    # spark.jars expects a comma-separated list of jar paths
    configuration = (SparkConf()
                     .set("spark.jars", ','.join(jars or [])))
    spark = SparkSession.builder.appName(app_name) \
        .config(conf=configuration).getOrCreate()
    return spark
app_name = 'test_app'
jars = ['gs://bucket/spark-excel_2.11_uber-0.12.0.jar']
spark = make_spark_session(app_name,jars)
df = spark.read.format("com.crealytics.spark.excel") \
    .option("useHeader","true") \
    .load(path)
df.show(1)
This appears to be a Scala version mismatch between your job jars and the cluster. Both Dataproc 1.5 and 2.0 come with Scala 2.12. The gs://bucket/spark-excel_2.11_uber-0.12.0.jar in your code appears to be built for Scala 2.11, so you might want to use spark-excel_2.12_... instead. In addition, make sure your Spark application is also built with Scala 2.12.
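A quick way to catch this kind of mismatch before submitting the job is to compare the Scala suffix in each jar name with the cluster's Scala version. The sketch below is only an illustration: the helper names scala_binary_version and mismatched_jars are made up here, the cluster version is supplied by hand (2.12, per the answer above), and it assumes the jar follows the usual artifact_<scala version> naming convention.

```python
import re

def scala_binary_version(jar_path):
    """Return the Scala suffix in a jar name (e.g. '2.11'), or None if there is none."""
    name = jar_path.rsplit("/", 1)[-1]
    # Look for "_<major>.<minor>" followed by "_", "-" or "."
    match = re.search(r"_(\d+\.\d+)(?=[_\-.])", name)
    return match.group(1) if match else None

def mismatched_jars(jars, cluster_scala):
    """Return the jars whose Scala suffix differs from cluster_scala.

    Jars without a recognizable suffix are skipped, since nothing can be
    concluded from their names.
    """
    return [j for j in jars
            if scala_binary_version(j) not in (None, cluster_scala)]

jars = ['gs://bucket/spark-excel_2.11_uber-0.12.0.jar']
print(mismatched_jars(jars, cluster_scala="2.12"))
```

With the jar from the question and a Scala 2.12 cluster, the list that comes back contains that jar, which is the mismatch described above.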

How to fix 'Exception: Java gateway process exited before sending its port number' in Eclipse IDE

I am trying to connect to MySQL using pyspark in the PyDev environment of the Eclipse IDE.
I get the error below:
Exception: Java gateway process exited before sending its port number
I have checked that Java is properly installed, and I have set PYSPARK_SUBMIT_ARGS to --master local[*] --jars path\mysql-connector-java-5.1.44-bin.jar pyspark-shell under Window -> Preferences -> PyDev -> Python Interpreter -> Environment.
The Java path is also set. I tried setting it via code as well, but no luck.
#import os
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql.context import SQLContext
#os.environ['JAVA_HOME']= 'C:/Program Files/Java/jdk1.8.0_141/'
#os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars D:/Softwares/mysql-connector-java-5.1.44.tar/mysql-connector-java-5.1.44/mysql-connector-java-5.1.44-bin.jar pyspark-shell'
conf = SparkConf().setMaster('local').setAppName('MySQLdataread')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "XXXXX").option("user", "root").option("password", "XXXX").load()
dataframe_mysql.show()
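For anyone debugging the same setup, here is a minimal sketch of two checks that can be run before creating the SparkContext: building the PYSPARK_SUBMIT_ARGS string (which has to end with pyspark-shell) and looking for a java binary. The helper names build_submit_args and find_java are hypothetical, and the lookup order (JAVA_HOME first, then PATH) is an assumption about how the launcher resolves Java, not something taken from the question.

```python
import os
import shutil

def build_submit_args(jars, master="local[*]"):
    """Compose a PYSPARK_SUBMIT_ARGS value that ends with 'pyspark-shell'."""
    parts = ["--master", master]
    if jars:
        # --jars takes a comma-separated list; paths with spaces would need quoting
        parts += ["--jars", ",".join(jars)]
    parts.append("pyspark-shell")
    return " ".join(parts)

def find_java(env=None):
    """Return a path to a java binary (JAVA_HOME first, then PATH), or None."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        for name in ("java", "java.exe"):
            candidate = os.path.join(java_home, "bin", name)
            if os.path.isfile(candidate):
                return candidate
    return shutil.which("java")

print(build_submit_args(["mysql-connector-java-5.1.44-bin.jar"]))
print(find_java())
```

If find_java() prints None, the gateway cannot start at all, so that is worth ruling out before looking at the Spark configuration itself.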
My problem was slightly different: I am running Spark in Spyder on Windows.
When I used
from pyspark.sql import SQLContext, SparkSession
I hit this error, and the links I found through Google did not solve it.
Then I changed the imports to:
from pyspark.sql import SparkSession
from pyspark import SQLContext
and the error message disappeared.
I am running on Windows with anaconda3, python3.7 and Spyder. Hope it is helpful to someone.
Edit:
Later, I found the real cause. The same exception shows up whenever any part of the configuration is invalid. Previously I had used 28gb and 4gb instead of 28g and 4g, and that caused all the problems I had. The working version is:
from pyspark.sql import SparkSession
from pyspark import SQLContext
spark = SparkSession.builder \
    .master('local') \
    .appName('muthootSample1') \
    .config('spark.executor.memory', '28g') \
    .config('spark.driver.memory', '4g') \
    .config("spark.cores.max", "6") \
    .getOrCreate()
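To guard against the same slip, a small helper can normalize memory strings to the single-letter form (28g, 4g) before they go into the config. This is only a sketch: the name jvm_memory is made up for this example, and the assumption that the two-letter suffix is what breaks the gateway rests solely on the experience described above.

```python
import re

# A number followed by k/m/g/t, with an optional trailing "b" (e.g. "28gb")
_SIZE = re.compile(r"^(\d+)\s*([kmgt])b?$", re.IGNORECASE)

def jvm_memory(value):
    """Normalize strings such as '28gb' or '4 GB' to the short form '28g' / '4g'."""
    match = _SIZE.match(value.strip())
    if not match:
        raise ValueError(f"unrecognized memory size: {value!r}")
    number, unit = match.groups()
    return f"{number}{unit.lower()}"

print(jvm_memory("28gb"))
print(jvm_memory("4g"))
```

It could then be used as .config('spark.driver.memory', jvm_memory('4gb')), so a mistyped suffix is either corrected or rejected with a clear error instead of the generic gateway exception.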

HiveException when running a sql example in Spark shell

I'm a newbie to Apache Spark. I am using Spark 2.4.0 and Scala version 2.11.12, and I'm trying to run the following code in my spark shell:
import org.apache.spark.sql.SparkSession
import spark.implicits._
var df = spark.read.json("storesales.json")
df.createOrReplaceTempView("storesales")
spark.sql("SELECT * FROM storesales")
And I get the following error -
2018-12-18 07:05:03 WARN Hive:168 - Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:183)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:117)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
I also saw the question "Issues trying out example in Spark-shell" and, as per the accepted answer, I tried starting my spark shell like so:
~/spark-2.4.0-bin-hadoop2.7/bin/spark-shell --conf spark.sql.warehouse.dir=file:///tmp/spark-warehouse
However, it did not help and the issue persists.

How to use mesos master url in a self-contained Scala Spark program

I am creating a self-contained Scala program that uses Spark for parallelization in some parts. In my specific situation, the Spark cluster is available through mesos.
I create spark context like this:
val conf = new SparkConf().setMaster("mesos://zk://<mesos-url1>,<mesos-url2>/spark/mesos-rtspark").setAppName("foo")
val sc = new SparkContext(conf)
I found out from searching around that you have to specify MESOS_NATIVE_JAVA_LIBRARY env var to point to the libmesos library, so when running my Scala program I do this:
MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.dylib sbt run
But, this results in a SparkException:
ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Could not parse Master URL: 'mesos://zk://<mesos-url1>,<mesos-url2>/spark/mesos-rtspark'
At the same time, using spark-submit seems to work fine after exporting the MESOS_NATIVE_JAVA_LIBRARY env var.
MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.dylib spark-submit --class <MAIN CLASS> ./target/scala-2.10/<APP_JAR>.jar
Why?
How can I make the standalone program run like spark-submit?
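Add the spark-mesos jar to your classpath. spark-submit runs with the Spark distribution's own jars on the classpath, whereas sbt run only sees the dependencies declared in your build, so the Mesos support may simply be missing there.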

spark 1.5.2 - Programmatically launching spark on yarn-client mode

We were using Spark 1.3.1 and launching our Spark jobs in yarn-client mode programmatically by creating the SparkConf and SparkContext objects manually. This was inspired by the self-contained application example in the Spark docs:
https://spark.apache.org/docs/1.5.2/quick-start.html#self-contained-applications
The only additional configuration we provided was YARN-related: executor instances, cores, etc.
However, after upgrading to Spark 1.5.2, the application breaks on the line val sparkContext = new SparkContext(sparkConf)
It throws the following in the driver application:
16/01/28 17:38:35 ERROR util.Utils: Uncaught exception in thread main
java.lang.NullPointerException
at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152)
at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1228)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:100)
at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1749)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1748)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:593)
So is this approach still supposed to work? Or must I use the SparkLauncher class with Spark 1.5.2 to launch a Spark job programmatically in yarn-client mode?