Facing Ambiguous Reference to SQL Error in Spark Scala

I am facing the error below in my Spark Scala code:
error: reference to sql is ambiguous;
it is imported twice in the same scope by
import org.apache.spark._
and import sqlContext.{sql, table}
Below are the imports I am using:
import spark.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.Row
import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.storage.StorageLevel
import scala.collection.JavaConversions._
import org.apache.spark.sql.functions.spark_partition_id
import org.apache.spark.sql.functions.{trim,ltrim,rtrim,col}
The failure occurs where I create a temporary table or view and write SQL on top of it:
createOrReplaceTempView
Thanks,
Naveen

Where do you import the sqlContext? I don't see
import sqlContext.{sql, table}
in your list. If you want to avoid the naming conflict, you can always rename the import:
import org.apache.spark.{sql => whatever_you_want_to_name}
But I doubt you are importing the right package, since sqlContext.sql is rarely used if you are already working with org.apache.spark.sql.
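A minimal sketch of that rename trick, assuming sqlContext is in scope (as in the spark-shell) and that your DataFrame is called df; sparkSql and my_table are placeholder names:
// Rename the org.apache.spark.sql package on import so it no longer clashes
// with the sql method brought in by the sqlContext import.
import org.apache.spark.{sql => sparkSql}
import sqlContext.{sql, table}

df.createOrReplaceTempView("my_table")
val result = sql("SELECT * FROM my_table")   // unambiguously sqlContext.sql
result.show()
// The renamed package is still reachable, e.g. sparkSql.functions or sparkSql.Row.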

Related

SQLAlchemy: split project into different files

I am implementing a simple database; right now my goal is to split the project into different modules, as it should be done. So far, my project consists of a bunch of files:
base.py
from sqlalchemy.orm import declarative_base
Base = declarative_base()
a file implementing the tables:
classes.py
from sqlalchemy import Column, Integer, String, Date, ForeignKey
from sqlalchemy.orm import relationship, validates
from base import Base
class Book(Base):
...
class Owner(Base):
...
creation of the engine, session, etc.:
database.py
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from datetime import date
from base import Base
engine = create_engine("sqlite+pysqlite:///mydb.db")
session = Session(engine)
from operations import *
import parser_file
Base.metadata.create_all(engine)
if __name__ == '__main__':
parser_file.main()
Here I import the session from database.py:
operations.py
from classes import Book, Owner
from database import session
def add_book(title, author, year_published=None):
...
# and many more
parser_file.py
import argparse
from datetime import date
from operations import *
def main():
...
I am not sure about the imports. operations.py, parser_file.py, and database.py all import from each other. It used to throw an error, but I moved from operations import * and import parser_file to after the creation of the Session. It feels sketchy having imports in the middle of the code, since I am used to imports at the top of the file, and some posts mention that modules should not depend on each other. On the other hand, the code is now nicely split and it feels better this way. What is the correct way to handle these imports?
Edit: From PEP8 guide on imports
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants
So it seems like what I did is considered bad.

Error while trying to import Array._ and import org.apache.spark.sql.functions._ in Spark Scala [duplicate]

This question already has an answer here:
Ambiguous imports in Scala
(1 answer)
Closed 3 years ago.
I was running the code below,
import Array._
import org.apache.spark.sql.functions._
df.withColumn(name, concat(substring(col(name),1,4),substring(col(name),6,2), substring(col(name),9,2) ))
and getting an import error,
Error:(188, 26) reference to concat is ambiguous;
it is imported twice in the same scope by
import Array._
and import org.apache.spark.sql.functions._
df.withColumn(name, concat(substring(col(name),1,4),substring(col(name),6,2), substring(col(name),9,2) ))
How can I overcome this? I need to use both imports.
The Scala Array object contains a concat method,
and the Spark SQL object org.apache.spark.sql.functions also contains a concat method.
If you need both concats, rename one with an import alias:
import Array.{concat => concatArray}
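For example, a short sketch (df and name are assumed to be defined as in your snippet; concatArray is just an arbitrary alias):
import Array.{concat => concatArray}   // Array.concat is now only reachable as concatArray
import org.apache.spark.sql.functions._

// concat now unambiguously resolves to org.apache.spark.sql.functions.concat
val reformatted = df.withColumn(name,
  concat(substring(col(name), 1, 4), substring(col(name), 6, 2), substring(col(name), 9, 2)))

// The Array version is still available through the alias:
val merged = concatArray(Array(1, 2), Array(3, 4))   // Array(1, 2, 3, 4)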

Apache Spark: Can't resolve constructor StreamingContext

I've been trying to establish a StreamingContext in my program but I can't for the life of me figure out what's going on. I added the spark-streaming jar file to the dependencies and imported it in the code but I can't help feeling like I'm missing some small detail somewhere. How should I proceed?
[picture of code]
You forgot to import the StreamingContext class itself.
Use
import org.apache.spark.streaming.StreamingContext
not
import org.apache.spark.streaming.StreamingContext._
The latter imports the members of the companion object, not the class, so the constructor cannot be resolved.
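With that import in place, a minimal sketch (the app name, local master, and 10-second batch interval are placeholder choices):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// StreamingContext now refers to the class, so its constructors resolve.
val conf = new SparkConf().setAppName("StreamingExample").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(10))

// ... define your DStreams here ...
ssc.start()
ssc.awaitTermination()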

Which import of package is better? [duplicate]

This question already has answers here:
Why is using a wild card with a Java import statement bad?
(18 answers)
Closed 5 years ago.
import java.util.ArrayList;
import java.util.Collections;
...
or
import java.util.*;
Is there any execution-time difference between them?
Which one should I prefer to use?
Whether you use
import java.util.ArrayList;
import java.util.Collections;
or
import java.util.*;
both result in the same bytecode after compilation, so there is no execution-time difference. But you should prefer the first option, as it helps when two or more packages contain a class with the same name.
For example, if the packages java.xyz and java.abc both contain a class Sample and you wildcard-import both packages in your class, the compiler will raise an error when you reference Sample and ask you to resolve the ambiguity.

Dataframe methods within SBT project

I have the following code, which works in the spark-shell:
df1.withColumn("tags_splitted", split($"tags", ",")).withColumn("tag_exploded", explode($"tags_splitted")).select("id", "tag_exploded").show()
But fails in sbt with the following errors:
not found: value split
not found: value explode
My Scala code has the following:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("Books").getOrCreate()
import spark.implicits._
Can someone give me a pointer to what is wrong in the sbt environment?
Thanks
The split and explode functions are defined on the org.apache.spark.sql.functions object.
So you need to import both:
import org.apache.spark.sql.functions.split
import org.apache.spark.sql.functions.explode
or simply:
import org.apache.spark.sql.functions._
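Putting it together, a minimal sketch of a standalone application with those imports (df1 here is a made-up two-column DataFrame and .master("local[*]") is just an assumption for a local run; your real data and build will differ):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

object Books {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("Books").master("local[*]").getOrCreate()
    import spark.implicits._

    // Stand-in for your real df1: an id column and a comma-separated tags column.
    val df1 = Seq((1, "scala,spark"), (2, "sql")).toDF("id", "tags")

    df1.withColumn("tags_splitted", split($"tags", ","))
       .withColumn("tag_exploded", explode($"tags_splitted"))
       .select("id", "tag_exploded")
       .show()

    spark.stop()
  }
}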
Hope this helps!