How to add the "--deploy-mode cluster" option to my Scala code

Hello,
I want to add the option "--deploy-mode cluster" to my Scala code:
val sparkConf = new SparkConf().setMaster("spark://192.168.60.80:7077")
without using the shell (the spark-submit command). In other words, I want to set "spark.submit.deployMode" from Scala.

With SparkConf:
// set up the Spark configuration and create the context
val sparkConf = new SparkConf()
  .setAppName("SparkApp")
  .setMaster("spark://192.168.60.80:7077")
  .set("spark.submit.deployMode", "cluster")
val sc = new SparkContext(sparkConf)
With SparkSession:
val spark = SparkSession
  .builder()
  .appName("SparkApp")
  .master("spark://192.168.60.80:7077")
  .config("spark.submit.deployMode", "cluster")
  .enableHiveSupport()
  .getOrCreate()

You can use:
val sparkConf = new SparkConf()
  .setMaster("spark://192.168.60.80:7077")
  .set("spark.submit.deployMode", "cluster")
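For completeness, here is a minimal self-contained sketch of the SparkConf route (the master URL and app name are placeholders from the question, and whether a programmatically set spark.submit.deployMode is honored still depends on how the application is launched):

import org.apache.spark.{SparkConf, SparkContext}

object SparkApp {
  def main(args: Array[String]): Unit = {
    // Build the configuration before creating the context; the master URL is a placeholder.
    val sparkConf = new SparkConf()
      .setAppName("SparkApp")
      .setMaster("spark://192.168.60.80:7077")
      .set("spark.submit.deployMode", "cluster")

    val sc = new SparkContext(sparkConf)

    // Sanity check: the property is visible on the context's configuration.
    println(sc.getConf.get("spark.submit.deployMode"))

    sc.stop()
  }
}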

Related

Adding Mongo config to active spark session

I am trying to add configurations to an active Spark session. Below is my code:
val spark = SparkSession.getActiveSession.get
spark.conf.set("spark.mongodb.input.uri",
  "mongodb://hello_admin:hello123#localhost:27017/testdb.products?authSource=admin")
spark.conf.set("spark.mongodb.input.partitioner", "MongoPaginateBySizePartitioner")

import com.mongodb.spark._

val sc = spark.sparkContext
val customRdd = MongoSpark.load(sc)
println(customRdd.count())
println(customRdd.first.toJson)
customRdd.collect().foreach(println)
But I am getting an error:
java.lang.IllegalArgumentException: Missing database name. Set via the
'spark.mongodb.input.uri' or 'spark.mongodb.input.database' property
Whereas when I write the code like this:
val spark = SparkSession.builder()
  .master("local")
  .appName("MongoSparkConnectorIntro")
  .config("spark.mongodb.input.uri", "mongodb://hello_admin:hello123#localhost:27017/testdb.products?authSource=admin")
  // .config("spark.mongodb.output.uri", "mongodb://hello_admin:hello123#localhost:27017/testdb.products?authSource=admin")
  .config("spark.mongodb.input.partitioner", "MongoPaginateBySizePartitioner")
  .getOrCreate()
val sc = spark.sparkContext
val customRdd = MongoSpark.load(sc)
println(customRdd.count())
println(customRdd.first.toJson)
customRdd.collect().foreach(println)
My code executes fine.
Kindly let me know what changes I need to make in the first snippet.
You can define the SparkSession with a SparkConf like this (I don't know if this helps you):
def sparkSession(conf: SparkConf): SparkSession = SparkSession
  .builder()
  .config(conf)
  .getOrCreate()

val sparkConf = new SparkConf()
sparkConf.set("prop", "value")
val ss = sparkSession(sparkConf)
Or you can try using SparkEnv (I use SparkEnv in a lot of places to change properties):
SparkEnv.get.conf.set("prop", "value")
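Applied to the Mongo question above, the same idea means putting the connector properties on the SparkConf (or on the builder) before the session and its SparkContext are created, rather than on an already-active session. A sketch, assuming the URI from the question and the MongoDB Spark connector on the classpath:

import com.mongodb.spark._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .set("spark.mongodb.input.uri",
    "mongodb://hello_admin:hello123#localhost:27017/testdb.products?authSource=admin")
  .set("spark.mongodb.input.partitioner", "MongoPaginateBySizePartitioner")

// MongoSpark.load(sc) reads its settings from the SparkContext's conf,
// so they have to be in place before getOrCreate() builds that context.
val spark = SparkSession.builder()
  .master("local")
  .config(conf)
  .getOrCreate()

val customRdd = MongoSpark.load(spark.sparkContext)
println(customRdd.count())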

How to config gcs-connector in local environment properly

I'm trying to configure the gcs-connector in my Scala project, but I always get java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found.
Here is my project config:
val sparkConf = new SparkConf()
.set("spark.executor.memory", "4g")
.set("spark.executor.cores", "2")
.set("spark.driver.memory", "4g")
.set("temporaryGcsBucket", "some-bucket")
val spark = SparkSession.builder()
.config(sparkConf)
.master("spark://spark-master:7077")
.getOrCreate()
val hadoopConfig = spark.sparkContext.hadoopConfiguration
hadoopConfig.set("fs.gs.auth.service.account.enable", "true")
hadoopConfig.set("fs.gs.auth.service.account.json.keyfile", "./path-to-key-file.json")
hadoopConfig.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
hadoopConfig.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
I tried to add the gcs-connector using both:
.set("spark.jars.packages", "com.google.cloud.bigdataoss:gcs-connector:hadoop2-2.1.6")
.set("spark.driver.extraClassPath", ":/home/celsomarques/Desktop/gcs-connector-hadoop2-2.1.6.jar")
But neither of them loads the specified class onto the classpath.
Could you point out what I'm doing wrong, please?
The following config worked:
val sparkConf = new SparkConf()
.set("spark.executor.memory", "4g")
.set("spark.executor.cores", "2")
.set("spark.driver.memory", "4g")
val spark = SparkSession.builder()
.config(sparkConf)
.master("local")
.getOrCreate()
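One more thing worth checking (a sketch, not verified against this setup): spark.driver.extraClassPath is only read when the driver JVM starts, and --packages-style resolution is normally handled by spark-submit, so when the job is run straight from sbt or the IDE in local mode it can be simpler to put the connector on the compile classpath through the build definition, reusing the coordinates from the question:

// build.sbt -- hypothetical fragment; coordinates taken from the question above
libraryDependencies += "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-2.1.6"

With the class on the driver classpath from the start, the fs.gs.impl and keyfile settings on hadoopConfiguration can stay exactly as in the question.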

How to create SparkSession from existing SparkContext

I have a Spark application which uses the new Spark 2.0 API with SparkSession.
I am building this application on top of another application which uses SparkContext. I would like to pass the SparkContext to my application and initialize the SparkSession using the existing SparkContext.
However, I could not find a way to do that. I found that the SparkSession constructor taking a SparkContext is private, so I can't initialize it that way, and the builder does not offer any setSparkContext method. Do you think there is some workaround?
Deriving the SparkSession object out of a SparkContext, or even a SparkConf, is easy; you might just find the API slightly convoluted. Here's an example (I'm using Spark 2.4, but this should work in the older 2.x releases as well):
// If you already have SparkContext stored in `sc`
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
// Another example which builds a SparkConf, SparkContext and SparkSession
val conf = new SparkConf().setAppName("spark-test").setMaster("local[2]")
val sc = new SparkContext(conf)
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
Hope that helps!
As in the above example, you cannot create the SparkSession directly because its constructor is private.
Instead, you can create a SQLContext using the SparkContext, and later get the SparkSession from the SQLContext like this:
val sqlContext = new SQLContext(sparkContext)
val spark = sqlContext.sparkSession
Hope this helps
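To spell that route out end to end, here is a short sketch (it creates a local SparkContext purely for illustration; in practice you would use the existing context handed to you):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("from-existing-sc").setMaster("local[2]"))

// The SQLContext constructor is deprecated in 2.x but still works,
// and it exposes the session it wraps.
val sqlContext = new SQLContext(sc)
val spark = sqlContext.sparkSession

println(spark.sparkContext eq sc) // true: the session wraps the same context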
Apparently there is no way to initialize a SparkSession from an existing SparkContext.
public JavaSparkContext getSparkContext() {
    SparkConf conf = new SparkConf()
        .setAppName("appName")
        .setMaster("local[*]");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    return jsc;
}

public SparkSession getSparkSession() {
    SparkSession sparkSession = new SparkSession(getSparkContext().sc());
    return sparkSession;
}
You can also try using the builder:
public SparkSession getSparkSession() {
    SparkConf conf = new SparkConf()
        .setAppName("appName")
        .setMaster("local");
    SparkSession sparkSession = SparkSession
        .builder()
        .config(conf)
        .getOrCreate();
    return sparkSession;
}
val sparkSession = SparkSession.builder.config(sc.getConf).getOrCreate()

How to load data from Cassandra table

I am working with Spark version 2.0.1 and Cassandra 3.9. I want to read data from a table in Cassandra via CassandraSQLContext. However, Spark 2.0 changed to use SparkSession instead. I am trying to use SparkSession, and the following is my code.
Could you please review it and give your advice?
def main(args: Array[String], date_filter: String): Unit = {
  val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
  val sc = new SparkContext(conf)
  val sparkSession = SparkSession.builder
    .master("local")
    .appName("my-spark-app")
    .config(conf)
    .getOrCreate()
  import sparkSession.implicits._
  import org.apache.spark.sql._

  val rdd = sparkSession
    .read
    .format("org.apache.spark.sql.cassandra")
    .options(Map("table" -> "users", "keyspace" -> "monita"))
    .load()
  println("count: " + rdd.count())
}
Your code looks OK. You don't need to create the SparkContext yourself. You can set the Cassandra connection properties in the config like below:
val sparkSession = SparkSession
.builder
.master("local")
.appName("my-spark-app")
.config("spark.cassandra.connection.host", "127.0.0.1")
.config("spark.cassandra.connection.port", "9042")
.getOrCreate()
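With that session, the read from the question stays the same (table and keyspace names taken from the question):

val df = sparkSession
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "users", "keyspace" -> "monita"))
  .load()
println("count: " + df.count())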

Importing Spark libraries using Intellij IDEA

I would like to use Spark SQL in an IntelliJ IDEA SBT project.
Even though I have imported the library, the code does not seem to pick it up.
Spark Core seems to be working, however.
You can't create a DataFrame from a Scala List[A]. You first need to create an RDD[A] and then transform that into a DataFrame. You also need an SQLContext:
val conf = new SparkConf()
.setMaster("local[*]")
.setAppName("test")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val test = sc.parallelize(List(1,2,3,4)).toDF
For reference, this is what the Spark 2.0 boilerplate with Spark SQL should look like:
import org.apache.spark.sql.SparkSession
object Test {
  def main(args: Array[String]) {
    val spark = SparkSession.builder()
      .master("local")
      .appName("some name")
      .getOrCreate()

    import spark.sqlContext.implicits._
  }
}
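Putting the two answers together, the same List-to-DataFrame test can live inside that boilerplate; a sketch (the object name TestWithData and the column name "value" are arbitrary choices):

import org.apache.spark.sql.SparkSession

object TestWithData {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("some name")
      .getOrCreate()

    import spark.implicits._

    // Same idea as the SQLContext example above: List -> RDD -> DataFrame.
    val test = spark.sparkContext.parallelize(List(1, 2, 3, 4)).toDF("value")
    test.show()

    spark.stop()
  }
}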