Can SparkContext and StreamingContext co-exist in the same program? - scala

I am trying to set up Spark Streaming code that reads lines from a Kafka server but processes them using rules written in another local file. I am creating a StreamingContext for the streaming data and a SparkContext for all the other Spark features, like string manipulation, reading local files, etc.:
val sparkConf = new SparkConf().setMaster("local[*]").setAppName("ReadLine")
val ssc = new StreamingContext(sparkConf, Seconds(15))
ssc.checkpoint("checkpoint")
val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
val sentence = lines.toString
val conf = new SparkConf().setAppName("Bi Gram").setMaster("local[2]")
val sc = new SparkContext(conf)
val stringRDD = sc.parallelize(Array(sentence))
But this throws the following error
Exception in thread "main" org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:874)
org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:81)

One application can only have ONE SparkContext. A StreamingContext is created on top of a SparkContext, so you just need to create the StreamingContext from that SparkContext:
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(15))
If you use the following constructor:
StreamingContext(conf: SparkConf, batchDuration: Duration)
it internally creates another SparkContext:
this(StreamingContext.createNewSparkContext(conf), null, batchDuration)
The SparkContext can be obtained from the StreamingContext by
ssc.sparkContext
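For reference, here is a minimal sketch of the question's setup restructured around a single SparkContext. The Kafka settings (ZooKeeper quorum, consumer group, topic, thread count) are placeholder values rather than the asker's real ones, and the bigram step merely stands in for the "rules in a local file" processing:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object SingleContextExample {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("ReadLine")
    val sc  = new SparkContext(sparkConf)            // the one and only SparkContext
    val ssc = new StreamingContext(sc, Seconds(15))  // built on sc, no second context
    ssc.checkpoint("checkpoint")

    // Placeholder Kafka settings -- substitute your own.
    val zkQuorum = "localhost:2181"
    val group    = "my-consumer-group"
    val topicMap = "test".split(",").map((_, 1)).toMap

    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)

    lines.foreachRDD { rdd =>
      // rdd is an ordinary RDD, so all batch features (string manipulation,
      // joins with data read via sc, etc.) are available here.
      val bigrams = rdd.flatMap(_.split(" ").sliding(2).map(_.mkString(" ")))
      bigrams.foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}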

Yes, you can do it.
You have to first start a SparkSession and then use its SparkContext to create the StreamingContext:
val spark = SparkSession.builder().appName("someappname").
config("spark.sql.warehouse.dir",warehouseLocation).getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(1))
Simple!!!
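For context, here is a minimal end-to-end sketch of this SparkSession-based variant; the warehouse location and the socket source are placeholders chosen so the snippet is self-contained:

import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder value -- substitute your own warehouse directory.
val warehouseLocation = "/tmp/spark-warehouse"

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("someappname")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .getOrCreate()

// The StreamingContext reuses the session's SparkContext, so only one
// SparkContext ever exists in the JVM.
val ssc = new StreamingContext(spark.sparkContext, Seconds(1))

val lines = ssc.socketTextStream("localhost", 9999)
lines.print()

ssc.start()
ssc.awaitTermination()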

Related

Can not read MongoDB secondary node by Spark in scala

I wrote Spark code to read data from MongoDB via Scala. Part of the code is as follows:
val mongoConfig = new Configuration()
mongoConfig.set("mongo.input.uri","mongodb://secondarydb.test.local/testdb.test?readPreference=secondary")
val sparkConf = new SparkConf().setMaster("local[5]")
val sc = new SparkContext(sparkConf)
val documents = sc.newAPIHadoopRDD(mongoConfig,classOf[MongoInputFormat],classOf[Object], classOf[BSONObject])
I did add readPreference=secondary, but I still get the following exception:
Exception in thread "main" com.mongodb.MongoNotPrimaryException:
The server is not the primary and did not execute the operation

Saving data in to a different host in spark scala application

I'm trying to persist a data frame created from Kafka topic data to a different host.
The code I've used:
val topicMaps = Map("topic" -> 2)
val conf = new Configuration()
conf.set("fs.defaultFS","maprfs://host-2:7222")
val fs =FileSystem.get(conf)
val messages = KafkaUtils.createStream[String, String,StringDecoder,StringDecoder](ssc, kafkaConf, topicMaps, StorageLevel.MEMORY_ONLY_SER)
messages.foreachRDD(rdd=>
{
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val dataframe =sqlContext.read.json(rdd.map(_._2))
val myDF =dataframe.toDF()
import org.apache.spark.sql.SaveMode
myDF.write.format("parquet").mode(org.apache.spark.sql.SaveMode.Append).save("maprfs://host-2:7222/hdfs/path")
})
The above code has created a path in the host directory, but the data is not being written whatsoever.
Any help is appreciated.

More than one spark context error

I have this spark code below:
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.{ HBaseConfiguration, HTableDescriptor }
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import kafka.serializer.StringDecoder
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
object Hbase {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Spark-Hbase").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    ...
    val ssc = new StreamingContext(sparkConf, Seconds(3))
    val kafkaBrokers = Map("metadata.broker.list" -> "localhost:9092")
    val topics = List("test").toSet
    val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaBrokers, topics)
  }
}
Now the error I am getting is:
Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.
Is there anything wrong with my code above? I do not see where I am creating the context again...
These are the two SparkContexts you're creating; this is not allowed:
val sc = new SparkContext(sparkConf)
val ssc = new StreamingContext(sparkConf, Seconds(3))
You should create the streaming context from the original context.
val ssc = new StreamingContext(sc, Seconds(3))
You are initializing two Spark contexts in the same JVM (the SparkContext and the StreamingContext); that's why you are getting this exception. You can set spark.driver.allowMultipleContexts = true in the config, but multiple Spark contexts are discouraged and can give unexpected results.
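For completeness, a sketch of the snippet above with that fix applied (the elided body is left out):

val sparkConf = new SparkConf().setAppName("Spark-Hbase").setMaster("local[2]")
val sc = new SparkContext(sparkConf)
// Reuse sc instead of passing sparkConf again, so no second SparkContext is created.
val ssc = new StreamingContext(sc, Seconds(3))
val kafkaBrokers = Map("metadata.broker.list" -> "localhost:9092")
val topics = List("test").toSet
val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaBrokers, topics)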

How to stop a running SparkContext before opening the new one

I am executing tests in Scala with Spark, creating a SparkContext as follows:
val conf = new SparkConf().setMaster("local").setAppName("test")
val sc = new SparkContext(conf)
After the first execution there was no error. But now I am getting this message (and a failed test notification):
Only one SparkContext may be running in this JVM (see SPARK-2243).
It looks like I need to check if there is any running SparkContext and stop it before launching a new one (I do not want to allow multiple contexts).
How can I do this?
UPDATE:
I tried this, but I get the same error (I am running the tests from IntelliJ IDEA, and I build the code before executing it):
val conf = new SparkConf().setMaster("local").setAppName("test")
// also tried: .set("spark.driver.allowMultipleContexts", "true")
UPDATE 2:
class TestApp extends SparkFunSuite with TestSuiteBase {

  // use longer wait time to ensure job completion
  override def maxWaitTimeMillis: Int = 20000

  System.clearProperty("spark.driver.port")
  System.clearProperty("spark.hostPort")

  var ssc: StreamingContext = _

  val config: SparkConf = new SparkConf().setMaster("local").setAppName("test")
    .set("spark.driver.allowMultipleContexts", "true")
  val sc: SparkContext = new SparkContext(config)

  //...

  test("Test1") {
    sc.stop()
  }
}
To stop an existing context you can use the stop method on a given SparkContext instance.
import org.apache.spark.{SparkContext, SparkConf}
val conf: SparkConf = ???
val sc: SparkContext = new SparkContext(conf)
...
sc.stop()
To reuse an existing context or create a new one you can use the SparkContext.getOrCreate method.
val sc1 = SparkContext.getOrCreate(conf)
...
val sc2 = SparkContext.getOrCreate(conf)
When used in test suites both methods can be used to achieve different things:
stop - stopping the context in the afterAll method (see for example MLlibTestSparkContext.afterAll)
getOrCreate - to get an active instance in individual test cases (see for example QuantileDiscretizerSuite)
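Putting both together, here is a minimal sketch of such a test suite; it assumes plain ScalaTest (FunSuite with BeforeAndAfterAll) rather than the Spark-internal SparkFunSuite/TestSuiteBase traits used in the question:

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class SparkContextLifecycleSuite extends FunSuite with BeforeAndAfterAll {

  private val conf = new SparkConf().setMaster("local").setAppName("test")

  // getOrCreate returns the active context if one exists, otherwise creates one,
  // so repeated runs in the same JVM do not trip SPARK-2243.
  private lazy val sc: SparkContext = SparkContext.getOrCreate(conf)

  override def afterAll(): Unit = {
    sc.stop()  // release the context so the next suite can create its own
    super.afterAll()
  }

  test("counts elements") {
    assert(sc.parallelize(1 to 10).count() === 10)
  }
}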

Spark Streaming StreamingContext error

Hi, I have started learning Spark Streaming but I can't run a simple application.
My code is here:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
val conf = new SparkConf().setMaster("spark://beyhan:7077").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
And I am getting an error like the following:
scala> val newscc = new StreamingContext(conf, Seconds(1))
15/10/21 13:41:18 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
Thanks
If you are using spark-shell, and it seems like you are, you should not instantiate a StreamingContext using a SparkConf object; you should pass the shell-provided sc directly.
This means:
val conf = new SparkConf().setMaster("spark://beyhan:7077").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
becomes,
val ssc = new StreamingContext(sc, Seconds(1))
It looks like you are working in the Spark Shell.
There is already a SparkContext defined for you there, so you don't need to create a new one. The SparkContext in the shell is available as sc.
If you need a StreamingContext you can create one using the existing SparkContext:
val ssc = new StreamingContext(sc, Seconds(1))
You only need the SparkConf and SparkContext if you create an application.
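For reference, a minimal sketch of the word count continued in the shell, using the shell-provided sc; the socket host and port are the question's placeholder values:

// Paste into spark-shell; sc already exists there.
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)
val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
wordCounts.print()

ssc.start()
ssc.awaitTermination()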