cannot resolve symbol sqlcontext in Spark - scala

My Spark version is 2.3.2.
While importing
import sqlContext.implicits
I am getting the error:
cannot resolve symbol sqlContext
I am using IntelliJ and Scala.
Scala version 2.11.8
Kindly share your thoughts.

The import you're trying will not work because implicits is an object defined inside the class SQLContext, so you first need an instance of SQLContext to import from:
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)   // sc is an existing SparkContext
import sqlContext.implicits._
Take a look at Why import implicit SqlContext.implicits._ after initializing SQLContext in a scala spark application
Hope this helps!

You have to create the sqlContext object first from the SQLContext class, see https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/SQLContext.implicits$.html. Otherwise, as the error says, the compiler doesn't know about such an object.
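
Since this is Spark 2.3.2, an alternative is to go through SparkSession, which has been the recommended entry point since Spark 2.0 and exposes the same implicits. A minimal sketch (master and app name are placeholders):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")            // placeholder master
  .appName("ImplicitsExample")   // placeholder app name
  .getOrCreate()

import spark.implicits._         // enables toDF/toDS and the $"col" syntax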

Related

How to initialise SparkSession in Spark 3.x

I've been trying to learn Spark & Scala, and have an environment set up in IntelliJ.
I'd previously been using SparkContext to initialise my Spark instance successfully, using the following code:
import org.apache.spark._
val sc = new SparkContext("local[*]", "SparkTest")
When I tried to start loading .csv data in, most information I found used spark.read.format("csv").load("filename.csv") but this requires initialising a SparkSession object using:
val spark = SparkSession
  .builder()
  .master("local")
  .appName("Test")
  .getOrCreate()
But when I tried to use this, there didn't seem to be any SparkSession in org.apache.spark._ in my version of Spark 3.x.
As far as I'm aware, SparkContext is the Spark 1.x approach, while SparkSession is the Spark 2.x+ entry point, with spark.sql built into the SparkSession object.
My question is whether I'm incorrectly trying to load SparkSession or if there's a separate way to approach initialising Spark (and loading .csv files) in Spark 3?
Spark version: 3.3.0
Scala version: 2.13.8
If you are using a Maven project, try adding the Spark dependencies to the POM file. Otherwise, for the sake of troubleshooting, create a new Maven project, add the dependencies, and check whether you still have the same issue.
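
For what it's worth, SparkSession lives in org.apache.spark.sql rather than org.apache.spark, so it needs its own import and the spark-sql artifact (spark-sql_2.13 for Spark 3.3.0 with Scala 2.13) on the classpath. A minimal sketch, with a placeholder file name:
import org.apache.spark.sql.SparkSession

object SparkSessionTest extends App {
  val spark = SparkSession
    .builder()                    // builder() comes first, then master/appName
    .master("local[*]")
    .appName("Test")
    .getOrCreate()

  // "filename.csv" is a placeholder; the header option is optional
  val df = spark.read
    .format("csv")
    .option("header", "true")
    .load("filename.csv")
  df.show()

  spark.stop()
}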

not found: type SparkContext || object apache is not a member of package org

I am trying to write a simple program in Scala, but when I use SparkContext in IntelliJ it throws an error. Can someone give me a solution?
Scala 3.1.1
Spark version 3.2.1
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object Wordcount extends App {
  val sc = new SparkContext("local[*]", "wordcount")   // master URL must be lowercase "local"
}
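
For context, "object apache is not a member of package org" usually means spark-core is not on the compile classpath at all. Spark 3.2.1 is published for Scala 2.12/2.13, not Scala 3, so a Scala 3.1.1 sbt project would have to pull in the 2.13 artifacts via cross-versioning. A sketch of the relevant build.sbt lines (an assumption about the setup, not a confirmed fix):
// build.sbt (sketch)
scalaVersion := "3.1.1"

// use the Scala 2.13 build of Spark from a Scala 3 project
libraryDependencies += ("org.apache.spark" %% "spark-core" % "3.2.1")
  .cross(CrossVersion.for3Use2_13)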

error not found value spark import spark.implicits._ import spark.sql

I am using Hadoop 2.7.2, HBase 1.4.9, Spark 2.2.0, Scala 2.11.8 and Java 1.8 on a Hadoop cluster composed of one master and two slaves.
When I run spark-shell after starting the cluster, it works fine.
I am trying to connect to HBase using Scala by following this tutorial: https://www.youtube.com/watch?v=gGwB0kCcdu0
But when I try, as he does, to run spark-shell with those jars passed as an argument, I get this error:
spark-shell --jars
"hbase-annotations-1.4.9.jar,hbase-common-1.4.9.jar,hbase-protocol-1.4.9.jar,htrace-core-3.1.0-incubating.jar,zookeeper-3.4.6.jar,hbase-client-1.4.9.jar,hbase-hadoop2-compat-1.4.9.jar,metrics-json-3.1.2.jar,hbase-server-1.4.9.jar"
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
^
After that, even if I log out and run spark-shell again, I have the same issue.
Can anyone please tell me what the cause is and how to fix it?
In your import statement, spark should be an object of type SparkSession. That object should have been created for you previously, or you need to create it yourself (see the Spark docs). I didn't watch your tutorial video.
The point is that it doesn't have to be called spark. It could, for instance, be called sparkSession, and then you would do import sparkSession.implicits._
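
A minimal sketch of that, assuming you build the session yourself (the app name is a placeholder):
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
  .appName("hbase-example")       // placeholder name
  .getOrCreate()

import sparkSession.implicits._   // same effect as the shell's import spark.implicits._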

Apache Spark and Scala required jars

I am new to Scala.
Can anyone suggest which jar files are required to run Apache Spark with Scala in a Linux environment? The code below is a piece of the original code. I am getting exceptions like java.lang.NoSuchMethodError: org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(Ljava/util/concurrent/Executor;I)V
java -cp ".:/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p1876.1944/jars/:./"
TestAll.scala
import org.apache.spark.SparkContext._
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import java.io._
import java.sql.{Connection, DriverManager}
import scala.collection._
import scala.collection.mutable.MutableList

object TestAll {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Testing App").setMaster("local")
    val sc = new SparkContext(conf)
    println("Hello, world!")
  }
}
You need to download Spark from the Apache Spark downloads page (https://spark.apache.org/downloads.html). Choose the "Pre-built for Hadoop" option. Then you can follow the directions of the Quick Start; this will get you through the Hello World. I am not sure which IDE you are using, but the most friendly one for Scala is IntelliJ IDEA.
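
In terms of which artifacts the code above actually needs: SparkContext and SparkConf come from spark-core, and SQLContext from spark-sql. A build.sbt sketch (the versions are assumptions; match them to the Spark that ships with your CDH parcel):
// build.sbt (sketch, assumed versions)
scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",  // SparkContext, SparkConf
  "org.apache.spark" %% "spark-sql"  % "1.6.0" % "provided"   // SQLContext
)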

Spark - "sbt package" - "value $ is not a member of StringContext" - Missing Scala plugin?

When running "sbt package" from the command line for a small Spark Scala application, I'm getting the "value $ is not a member of StringContext" compilation error on the following line of code:
val joined = ordered.join(empLogins, $"login" === $"username", "inner")
.orderBy($"count".desc)
.select("login", "count")
IntelliJ 13.1 gives me the same error message. The same .scala source code compiles without any issue in Eclipse 4.4.2, and it also builds fine with Maven in a separate Maven project from the command line.
It looks like sbt doesn't recognize the $ sign because I'm missing some plugin in my project/plugins.sbt file or some setting in my build.sbt file.
Are you familiar with this issue? Any pointers will be appreciated. I can provide build.sbt and/or project/plugins.sbt if need be.
You need to make sure you import sqlContext.implicits._
This gets you implicit class StringToColumn extends AnyRef,
which is commented as:
Converts $"col name" into a Column.
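
A minimal sketch of that for Spark 1.x, assuming an existing SparkContext sc and that ordered and empLogins are the DataFrames from the question:
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)   // sc: existing SparkContext
import sqlContext.implicits._         // brings StringToColumn into scope, enabling $"..."

val joined = ordered.join(empLogins, $"login" === $"username", "inner")
  .orderBy($"count".desc)
  .select("login", "count")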
In Spark 2.0+
the $-notation for columns can be used by importing the implicits on the SparkSession object (spark):
val spark = org.apache.spark.sql.SparkSession.builder
  .master("local")
  .appName("App name")
  .getOrCreate()
import spark.implicits._
then your code with the $ notation works:
val joined = ordered.join(empLogins, $"login" === $"username", "inner")
  .orderBy($"count".desc)
  .select("login", "count")
Great answers. If resolving the import is a concern, then this will work:
import org.apache.spark.sql.{SparkSession, SQLContext}
import org.apache.spark.sql.functions.not   // needed for not(...)

val ss = SparkSession.builder().appName("test").getOrCreate()
val dataDf = ...
import ss.sqlContext.implicits._

dataDf.filter(not($"column_name1" === "condition"))