Spark Streaming StreamingContext error - scala

Hi, I have started learning Spark Streaming, but I can't run a simple application.
Here is my code:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
val conf = new SparkConf().setMaster("spark://beyhan:7077").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
I am getting the following error:
scala> val newscc = new StreamingContext(conf, Seconds(1))
15/10/21 13:41:18 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
Thanks

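If you are using spark-shell, and it seems that you are, you should not instantiate StreamingContext from a SparkConf object, because that constructor creates a second SparkContext next to the one the shell already started. Pass the shell-provided sc directly instead.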
This means:
val conf = new SparkConf().setMaster("spark://beyhan:7077").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
becomes:
val ssc = new StreamingContext(sc, Seconds(1))

It looks like you are working in the Spark shell.
There is already a SparkContext defined for you there, so you don't need to create a new one. The SparkContext in the shell is available as sc.
If you need a StreamingContext, you can create one from the existing SparkContext:
val ssc = new StreamingContext(sc, Seconds(1))
You only need to create the SparkConf and SparkContext yourself if you write a standalone application.
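For the standalone-application case, a minimal sketch might look like the following. It assumes a local master (`local[2]`) instead of the cluster URL from the question, and the object name `NetworkWordCountApp` and the helper `wordCounts` are made up for illustration. The point it tries to show is that exactly one SparkContext is created and the StreamingContext is built on top of it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object NetworkWordCountApp {
  // Hypothetical helper: builds the word-count pipeline on an existing StreamingContext.
  def wordCounts(ssc: StreamingContext, host: String, port: Int): DStream[(String, Int)] =
    ssc.socketTextStream(host, port)
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Assumption: local master with two threads (one for the receiver, one for processing).
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val sc = new SparkContext(conf)                 // the only SparkContext in this JVM
    val ssc = new StreamingContext(sc, Seconds(1))  // reuses sc, does not create a new one
    wordCounts(ssc, "localhost", 9999).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

To feed it some input, something like `nc -lk 9999` in another terminal should work, as in the Spark Streaming quick example.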

Related

How to create SparkSession from existing SparkContext

I have a Spark application that uses the new Spark 2.0 API with SparkSession.
I am building this application on top of another application that uses SparkContext. I would like to pass that SparkContext to my application and initialize the SparkSession from the existing SparkContext.
However, I could not find a way to do that. The SparkSession constructor that takes a SparkContext is private, so I can't initialize it that way, and the builder does not offer any setSparkContext method. Is there a workaround?
Deriving the SparkSession object from a SparkContext, or even from a SparkConf, is easy, although you might find the API slightly convoluted. Here's an example (I'm using Spark 2.4, but this should work in the older 2.x releases as well):
// If you already have SparkContext stored in `sc`
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
// Another example which builds a SparkConf, SparkContext and SparkSession
val conf = new SparkConf().setAppName("spark-test").setMaster("local[2]")
val sc = new SparkContext(conf)
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
Hope that helps!
As noted above, you cannot instantiate a SparkSession directly because its constructor is private.
Instead, you can create a SQLContext from the SparkContext and then get the SparkSession from the SQLContext like this:
val sqlContext = new SQLContext(sparkContext)
val spark = sqlContext.sparkSession
Hope this helps
Apparently there is no way to initialize a SparkSession from an existing SparkContext.
public JavaSparkContext getSparkContext()
{
    SparkConf conf = new SparkConf()
        .setAppName("appName")
        .setMaster("local[*]");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    return jsc;
}
public SparkSession getSparkSession()
{
    sparkSession = new SparkSession(getSparkContext().sc());
    return sparkSession;
}
You can also try using the builder:
public SparkSession getSparkSession()
{
    SparkConf conf = new SparkConf()
        .setAppName("appName")
        .setMaster("local");
    SparkSession sparkSession = SparkSession
        .builder()
        .config(conf)
        .getOrCreate();
    return sparkSession;
}
val sparkSession = SparkSession.builder.config(sc.getConf).getOrCreate()
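To check that the builder really reuses the existing context instead of starting a second one, a small sketch along these lines may help. The object name `SessionFromContext` and the helper `sessionFor` are hypothetical, and a local master is assumed. As far as I can tell, `getOrCreate` falls back to the active SparkContext, so the reference comparison should hold.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

object SessionFromContext {
  // Hypothetical helper: wraps an already-running SparkContext in a SparkSession.
  def sessionFor(sc: SparkContext): SparkSession =
    SparkSession.builder.config(sc.getConf).getOrCreate()

  def main(args: Array[String]): Unit = {
    // Assumption: local master, chosen only so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("session-from-context").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val spark = sessionFor(sc)
      // Compares references: the session should sit on top of the context created above.
      println(s"shares context: ${spark.sparkContext eq sc}")
    } finally {
      sc.stop()
    }
  }
}
```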

Can SparkContext and StreamingContext co-exist in the same program?

I am trying to set up Spark Streaming code that reads lines from a Kafka server and processes them using rules written in a separate local file. I am creating a StreamingContext for the streaming data and a SparkContext for all the other Spark features, such as string manipulation and reading local files.
val sparkConf = new SparkConf().setMaster("local[*]").setAppName("ReadLine")
val ssc = new StreamingContext(sparkConf, Seconds(15))
ssc.checkpoint("checkpoint")
val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
val sentence = lines.toString
val conf = new SparkConf().setAppName("Bi Gram").setMaster("local[2]")
val sc = new SparkContext(conf)
val stringRDD = sc.parallelize(Array(sentence))
But this throws the following error:
Exception in thread "main" org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:874)
org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:81)
One application can have only ONE SparkContext. A StreamingContext is created on top of a SparkContext, so you just need to create the StreamingContext ssc from the existing SparkContext:
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(15))
If you use the following constructor:
StreamingContext(conf: SparkConf, batchDuration: Duration)
it internally creates another SparkContext:
this(StreamingContext.createNewSparkContext(conf), null, batchDuration)
The SparkContext can be obtained from the StreamingContext with:
ssc.sparkContext
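As a rough sketch of that last option, the program can construct only the StreamingContext and take the SparkContext from it for the non-streaming work. The object name `SharedContext` and the helper `batchWordCount` are invented for this example, and the input lines stand in for whatever the rules file would contain.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SharedContext {
  // Hypothetical helper: runs a plain batch job on the context owned by the StreamingContext.
  def batchWordCount(ssc: StreamingContext, lines: Seq[String]): Long = {
    val sc: SparkContext = ssc.sparkContext // no second SparkContext is created here
    sc.parallelize(lines).flatMap(_.split(" ")).count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("ReadLine")
    // Only the StreamingContext is constructed; it creates the SparkContext internally.
    val ssc = new StreamingContext(conf, Seconds(15))
    try {
      println(batchWordCount(ssc, Seq("a b", "c d")))
    } finally {
      ssc.stop(stopSparkContext = true)
    }
  }
}
```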
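Yes, you can do it.
First start a Spark session, then use its SparkContext to create the StreamingContext. Note that only one StreamingContext can be active in a JVM at a time, although a stopped one can be replaced by a new one on the same SparkContext: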
val spark = SparkSession.builder().appName("someappname").
config("spark.sql.warehouse.dir",warehouseLocation).getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(1))
Simple!!!

Importing Spark libraries using Intellij IDEA

I would like to use Spark SQL in an IntelliJ IDEA SBT project.
Even though I have added the library to the project, the import does not resolve in my code.
Spark Core seems to be working however.
You can't create a DataFrame from a Scala List[A]. You first need to create an RDD[A] and then transform that into a DataFrame. You also need an SQLContext:
val conf = new SparkConf()
.setMaster("local[*]")
.setAppName("test")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val test = sc.parallelize(List(1,2,3,4)).toDF
For reference, this is what the Spark 2.0 boilerplate with Spark SQL should look like:
import org.apache.spark.sql.SparkSession
object Test {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("some name")
      .getOrCreate()
    // Brings toDF/toDS and the implicit encoders into scope.
    import spark.implicits._
  }
}
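With the Spark 2.x boilerplate, my understanding is that a local collection can be turned into a DataFrame directly once `spark.implicits._` is in scope, without going through an RDD first. The sketch below assumes the spark-sql artifact is on the classpath; the object name `LocalSeqToDF`, the helper `toDataFrame`, and the column name `value` are chosen for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LocalSeqToDF {
  // Hypothetical helper: converts a local Seq[Int] into a single-column DataFrame.
  def toDataFrame(spark: SparkSession, values: Seq[Int]): DataFrame = {
    import spark.implicits._ // provides the encoder and the toDF extension method
    values.toDF("value")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("test")
      .getOrCreate()
    try {
      toDataFrame(spark, List(1, 2, 3, 4)).show()
    } finally {
      spark.stop()
    }
  }
}
```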

More than one spark context error

I have this spark code below:
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.{ HBaseConfiguration, HTableDescriptor }
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import kafka.serializer.StringDecoder
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
object Hbase {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Spark-Hbase").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    ...
    val ssc = new StreamingContext(sparkConf, Seconds(3))
    val kafkaBrokers = Map("metadata.broker.list" -> "localhost:9092")
    val topics = List("test").toSet
    val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaBrokers, topics)
  }
}
Now the error I am getting is:
Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.
Is there anything wrong with my code above? I do not see where I am creating the context again...
These two lines each create a SparkContext, which is not allowed:
val sc = new SparkContext(sparkConf)
val ssc = new StreamingContext(sparkConf, Seconds(3))
You should create the streaming context from the original context.
val ssc = new StreamingContext(sc, Seconds(3))
You are initializing two Spark contexts in the same JVM: one through new SparkContext(sparkConf) and a second one inside new StreamingContext(sparkConf, ...). That's why you are getting this exception. You can set spark.driver.allowMultipleContexts = true in the config, but running multiple Spark contexts is discouraged and can give unexpected results.

How to stop a running SparkContext before opening the new one

I am executing tests in Scala with Spark creating a SparkContext as follows:
val conf = new SparkConf().setMaster("local").setAppName("test")
val sc = new SparkContext(conf)
After the first execution there was no error. But now I am getting this message (and a failed test notification):
Only one SparkContext may be running in this JVM (see SPARK-2243).
It looks like I need to check if there is any running SparkContext and stop it before launching a new one (I do not want to allow multiple contexts).
How can I do this?
UPDATE:
I tried this, but I get the same error (I am running the tests from IntelliJ IDEA, and I build the code before executing it):
val conf = new SparkConf().setMaster("local").setAppName("test")
// also tried: .set("spark.driver.allowMultipleContexts", "true")
UPDATE 2:
class TestApp extends SparkFunSuite with TestSuiteBase {
  // use longer wait time to ensure job completion
  override def maxWaitTimeMillis: Int = 20000
  System.clearProperty("spark.driver.port")
  System.clearProperty("spark.hostPort")
  var ssc: StreamingContext = _
  val config: SparkConf = new SparkConf().setMaster("local").setAppName("test")
    .set("spark.driver.allowMultipleContexts", "true")
  val sc: SparkContext = new SparkContext(config)
  //...
  test("Test1") {
    sc.stop()
  }
}
To stop an existing context, you can call the stop method on the given SparkContext instance.
import org.apache.spark.{SparkContext, SparkConf}
val conf: SparkConf = ???
val sc: SparkContext = new SparkContext(conf)
...
sc.stop()
To reuse an existing context or create a new one, you can use the SparkContext.getOrCreate method.
val sc1 = SparkContext.getOrCreate(conf)
...
val sc2 = SparkContext.getOrCreate(conf)
In test suites, the two methods serve different purposes:
stop - stops the context in the afterAll method (see for example MLlibTestSparkContext.afterAll)
getOrCreate - gets the active instance in individual test cases (see for example QuantileDiscretizerSuite)
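Outside of a particular test framework, the two methods can be combined in a small loan-pattern helper, sketched below. The object name `ContextLifecycle` and the helper `withContext` are hypothetical. The idea is that each run obtains a context with getOrCreate and always stops it afterwards, so the next run does not hit the "Only one SparkContext may be running" error.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ContextLifecycle {
  // Hypothetical helper: lends a SparkContext to `body` and stops it when `body` finishes.
  def withContext[T](conf: SparkConf)(body: SparkContext => T): T = {
    val sc = SparkContext.getOrCreate(conf) // reuses the active context if there is one
    try body(sc)
    finally sc.stop()                       // frees the JVM for the next context
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("test")
    val total = withContext(conf)(sc => sc.parallelize(1 to 10).reduce(_ + _))
    println(total)
    // A second run starts a fresh context because the first one was stopped.
    val count = withContext(conf)(sc => sc.parallelize(1 to 3).count())
    println(count)
  }
}
```

In a real suite the same two calls would usually live in the framework's setup and teardown hooks instead of a helper like this.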