Passing sqlContext as an implicit parameter in Spark - Scala

I have a function in a Scala object which has the following signature
def f(v1:Int)(implicit sqlContext: SQLContext)
When I call this function from spark-shell, I call it as
f(1)
and I expect the existing sqlContext to be passed to it implicitly, but it isn't. How can I make it work so that the sqlContext gets passed to this function automatically?
UPDATE
I tried importing sqlContext.implicits._ in spark-shell before calling my function, but it didn't help.

You just need to have an implicit SQLContext in scope at the point where you call your function:
implicit val sqlContext = new SQLContext(sc) // just an example
// and then
f(1)
If you are using Apache Spark, you can also add this import:
import sqlContext.implicits._
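In spark-shell specifically, a sqlContext value already exists, so a minimal sketch (assuming f is defined exactly as in the question) is to re-expose that existing value as an implicit instead of constructing a new one:
import org.apache.spark.sql.SQLContext

// f is assumed to be defined as in the question:
// def f(v1: Int)(implicit sqlContext: SQLContext): Unit = ...

// spark-shell already provides `sqlContext`; marking it implicit under a new
// name lets the compiler fill in the implicit parameter list automatically.
implicit val implicitSqlContext: SQLContext = sqlContext

f(1) // resolves to f(1)(implicitSqlContext)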

Related

Passing sparkSession to function in Scala - Spark 2.1

I'm migrating from Spark 1.6 to 2.1 and, looking at the function below in Scala, I am trying to figure out how we pass a SparkSession variable to the function instead of a sqlContext:
private def readHiveTable(sqlContext: HiveContext, hiveTableNm: String,
hiveWorkDb: String): DataFrame
I mean, would sqlContext change to SparkSession? What is the right way to pass this variable in Spark 2.1? Maybe something like:
private def readHiveTable(spark: SparkSession, hiveTableNm: String,
hiveWorkDb: String): DataFrame
UPDATE
Resolved it by passing spark: SparkSession instead of sqlContext: HiveContext.
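For completeness, a minimal sketch of what the migrated method might look like (the body is an assumption for illustration, not from the original post, and the method is assumed to live inside the caller's class):
import org.apache.spark.sql.{DataFrame, SparkSession}

// In Spark 2.x, SparkSession (built with enableHiveSupport()) replaces
// both SQLContext and HiveContext.
private def readHiveTable(spark: SparkSession,
                          hiveTableNm: String,
                          hiveWorkDb: String): DataFrame = {
  // Hypothetical body: read the Hive table through the session's table API.
  spark.table(s"$hiveWorkDb.$hiveTableNm")
}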

How to pass HiveContext as an argument from one function to another function using Spark Scala

I have a scenario where I need to pass a HiveContext as an argument to another function. Below is my code, and where I am stuck with the issue:
object Sample {
  def main(args: Array[String]) {
    val fileName = "SampleFile.txt"
    val conf = new SparkConf().setMaster("local").setAppName("LoadToHivePart")
    conf.set("spark.ui.port", "4041")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    hc.setConf("hive.metastore.uris", "thrift://127.0.0.1:9083")
    test(hc, fileName)
    sc.stop()
  }
  def test(hc: String, fileName: String) {
    //code.....
  }
}
As per the above code, I am unable to pass the HiveContext variable "hc" from main to another function. I also tried:
def test(hc: HiveContext, fileName: String){}
but it shows an error in both cases.
def test(hc: HiveContext, fileName: String) {
  //code.....
}
Note: HiveContext lives in org.apache.spark.sql.hive, so import it with import org.apache.spark.sql.hive.HiveContext.
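Putting both pieces together, a minimal corrected sketch of the program (the body of test is left as a placeholder) could look like this:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext // needed so the HiveContext type resolves

object Sample {
  def main(args: Array[String]): Unit = {
    val fileName = "SampleFile.txt"
    val conf = new SparkConf().setMaster("local").setAppName("LoadToHivePart")
    conf.set("spark.ui.port", "4041")
    val sc = new SparkContext(conf)
    val hc = new HiveContext(sc)
    hc.setConf("hive.metastore.uris", "thrift://127.0.0.1:9083")
    test(hc, fileName) // hc is passed as a HiveContext, not a String
    sc.stop()
  }

  // declare the parameter with the HiveContext type
  def test(hc: HiveContext, fileName: String): Unit = {
    // code.....
  }
}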

How to convert RDD of custom Java class objects to a DataFrame with toDF()?

I am trying to convert a Spark RDD to a Spark SQL dataframe with toDF(). I have used this function successfully many times, but in this case I'm getting a compiler error:
error: value toDF is not a member of org.apache.spark.rdd.RDD[com.example.protobuf.SensorData]
Here is my code below:
// SensorData is an auto-generated class
import com.example.protobuf.SensorData
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

def loadSensorDataToRdd: RDD[SensorData] = ???

object MyApplication {
  def main(argv: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setAppName("My application")
    conf.set("io.compression.codecs", "com.hadoop.compression.lzo.LzopCodec")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    val sensorDataRdd = loadSensorDataToRdd
    val sensorDataDf = sensorDataRdd.toDF() // <-- CAUSES COMPILER ERROR
  }
}
I am guessing that the problem is with the SensorData class, which is a Java class that was auto-generated from a Protocol Buffer. What can I do in order to convert the RDD to a dataframe?
The reason for the compilation error is that there is no Encoder in scope to convert an RDD of com.example.protobuf.SensorData into a Dataset of com.example.protobuf.SensorData.
Encoders (ExpressionEncoders to be exact) are used to convert InternalRow objects into JVM objects according to the schema (usually a case class or a Java bean).
There is hope: you can create an Encoder for the custom Java class using the org.apache.spark.sql.Encoders object's bean method, whose documentation reads: "Creates an encoder for Java Bean of type T."
Something like the following:
import org.apache.spark.sql.Encoders
implicit val SensorDataEncoder = Encoders.bean(classOf[com.example.protobuf.SensorData])
If SensorData uses unsupported types, you will have to map the RDD[SensorData] to an RDD of some simpler type(s), e.g. a tuple of the fields, and only then expect toDF to work.
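As a sketch of that fallback, assuming SensorData exposes hypothetical getters such as getId and getTemperature (these are illustrative, not from the original post), the mapping could look like:
import sqlContext.implicits._

// Substitute the accessors your generated SensorData class actually provides;
// getId and getTemperature are placeholders. Tuples of simple types have
// built-in encoders, so toDF works on the mapped RDD.
val sensorDataDf = sensorDataRdd
  .map(sd => (sd.getId, sd.getTemperature))
  .toDF("id", "temperature")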

44: error: value read is not a member of object org.apache.spark.sql.SQLContext

I am using Spark 1.6.1 and Scala 2.10.5. I am trying to read a CSV file through the com.databricks spark-csv package.
While launching spark-shell, I use the line below as well:
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0 --driver-class-path path to/sqljdbc4.jar
and below is the whole code:
import java.util.Properties
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
val conf = new SparkConf().setAppName("test").setMaster("local").set("spark.driver.allowMultipleContexts", "true");
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val df = SQLContext.read().format("com.databricks.spark.csv").option("inferScheme","true").option("header","true").load("path_to/data.csv");
I am getting the error below:
error: value read is not a member of object org.apache.spark.sql.SQLContext,
and the "^" is pointing toward "SQLContext.read().format" in the error message.
I tried the suggestions available on Stack Overflow, as well as other sites, but nothing seems to be working.
Writing SQLContext.read refers to the SQLContext companion object, i.e. to "static" members. read is an instance method, not a static one, so you should call it on your sqlContext variable instead.
So the code should be:
val df = sqlContext.read.format("com.databricks.spark.csv").option("inferSchema","true").option("header","true").load("path_to/data.csv");
(Note also that the option is spelled inferSchema, not inferScheme.)

How to share SparkContext with methods that need it implicitly

I have the following method:
def loadData(a:String, b:String)(implicit sparkContext: SparkContext) : RDD[Result]
I am trying to test it using this SharedSparkContext: https://github.com/holdenk/spark-testing-base/wiki/SharedSparkContext.
So, I made my test class extend SharedSparkContext:
class Ingest$Test extends FunSuite with SharedSparkContext
And within the test method I made this call:
val res: RDD[Result] = loadData("x", "y")
However, I am getting this error:
Error:(113, 64) could not find implicit value for parameter sparkContext: org.apache.spark.SparkContext
val result: RDD[Result] = loadData("x", "y")
So how can I make the SparkContext from the test visible to the method?
EDIT:
I don't see how this question is related to "Understanding implicit in Scala".
What is the variable name of your SparkContext? If it is sc, as is typically the case, you will have to expose it as an implicit value, e.g. implicit val sparkContext = sc, and then call your method in the same scope.
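A minimal sketch of such a test (assuming loadData and Result are defined as in the question, a conventional IngestTest class name, and spark-testing-base's com.holdenkarau.spark.testing.SharedSparkContext, which exposes the shared context as sc):
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.scalatest.FunSuite
import com.holdenkarau.spark.testing.SharedSparkContext

class IngestTest extends FunSuite with SharedSparkContext {
  test("loadData picks up the shared SparkContext implicitly") {
    // SharedSparkContext provides `sc`; re-declaring it as an implicit value
    // lets the compiler supply it for loadData's implicit parameter.
    implicit val sparkContext: SparkContext = sc

    val res: RDD[Result] = loadData("x", "y")
    assert(res != null) // placeholder assertion
  }
}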