Scala ignores imported members - scala

I have the following code snippet:
package org.test.test.datahelper
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
class WeatherHelper(sparkSession: SparkSession, weather: DataFrame) {
def prepareRRRColumn: DataFrame = {
import org.apache.spark.sql.functions
weather.withColumn("Year", year(col("DateTime")))
weather
}
}
The problem is that Scala (or perhaps IntelliJ IDEA) does not see the methods year and col (Cannot resolve symbol year and Cannot resolve symbol col, respectively), even though the necessary import is just one line above (it doesn't work even if the import is at the top of the file). Looking at the source of org.apache.spark.sql.functions, I found the following lines:
def col(colName : scala.Predef.String) : org.apache.spark.sql.Column = { /* compiled code */ }
def year(e : org.apache.spark.sql.Column) : org.apache.spark.sql.Column = { /* compiled code */ }
i.e. both methods do exist. What am I doing wrong?

This is a Scala import syntax issue.
To bring the methods (col, year) into scope inside the class or method, you have to import the members of the functions object, not just the object itself:
import org.apache.spark.sql.functions._
// Or import only specific functions
import org.apache.spark.sql.functions.{col, year}
Instead of
import org.apache.spark.sql.functions
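The difference between the three import forms can be tried without Spark. The sketch below is only an illustration: sql.functions is a hypothetical stand-in object (it is not the real org.apache.spark.sql.functions), and its col and year just build strings.

```scala
object sql {
  // Hypothetical stand-in for org.apache.spark.sql.functions.
  object functions {
    def col(name: String): String = s"col($name)"
    def year(e: String): String = s"year($e)"
  }
}

object ImportDemo {
  // Importing the object itself only brings the name `functions` into scope,
  // so its members still need the `functions.` prefix.
  def qualified: String = {
    import sql.functions
    functions.year(functions.col("DateTime"))
  }

  // A wildcard import brings every member of the object into scope.
  def wildcard: String = {
    import sql.functions._
    year(col("DateTime"))
  }

  // A selector import brings in only the listed members.
  def selected: String = {
    import sql.functions.{col, year}
    year(col("DateTime"))
  }
}
```

All three methods produce the same value; only the first one has to spell out the object name at every call.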

Related

how to fix Scala error with "Not found type"

I'm new to Scala and am learning it with Spark. I'm writing a Scala app that loads a CSV file from Hadoop into a DataFrame, and I want to add a new column to that DataFrame. A function populates the new column; for testing, it just uppercases the value read from the CSV file, which contains a single string column, emp_id. The function is defined in the object TestService. My IDE is Eclipse. I now get the error: not found: type TestService
I would very much appreciate any help.
// This is the main:
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.functions._
import com.poc.spark.service.TestService;
object SparkIntTest {
def main(args:Array[String]){
sys.props.+=(("hadoop.home.dir","C:\\OpenSource\\Hadoop"))
val sparkConf = new SparkConf().setMaster("local").setAppName("employee").set("spark.testing.memory", "2147480000")
val sparkContext = new SparkContext(sparkConf)
val spark = SparkSession.builder().appName("employee").getOrCreate()
val df = spark.read.option("header", "true").csv(".\\src\\main\\resources\\employee.csv")
df.show();
println(df.schema);
val df_Applied = df.withColumn("award_rule",runAllRulesUDF(df("emp_id")))
df_Applied.show();
println(df_Applied.schema)
}
def runAllRulesUDF = udf(new TestService().runAllRulesForUDF(_:String))
}
Here is the Object TestService:
package com.poc.spark.service
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.functions._
object TestService {
def runAllRulesForUDF(empid: String): String = {
empid.toUpperCase();
}
}
TestService is an object, which means it is a singleton that Scala creates for you; it is not a class, so there is no TestService type to instantiate with new. So instead of
new TestService()
you can just write
TestService
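A minimal sketch of calling the singleton directly, without Spark. TestService mirrors the object from the question; SingletonDemo and the sample inputs are made up for illustration.

```scala
object TestService {
  def runAllRulesForUDF(empid: String): String = empid.toUpperCase
}

object SingletonDemo {
  // No `new`: the object name itself refers to the single instance.
  val direct: String = TestService.runAllRulesForUDF("e001")

  // The method can also be turned into a function value, which is the
  // shape the question passes to udf(...).
  val asFunction: String => String = TestService.runAllRulesForUDF(_: String)
}
```

Under that assumption, the definition in the question would read udf(TestService.runAllRulesForUDF(_: String)).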

scala import library wildcard

I am new to Scala. Please be gentle.
The import below imports everything (every class, trait and object) under ml.
import org.apache.spark.ml._
but NOT ParamMap, which is under
import org.apache.spark.ml.param._
In other words, for the code below, if I do:
import org.apache.spark.ml.param._
import org.apache.spark.ml._
class Kmeans extends Transformer {
def copy(extra: ParamMap): Unit = {
defaultCopy(extra)
}}
Then I have no import errors, but if I comment out import org.apache.spark.ml.param._:
//import org.apache.spark.ml.param._
import org.apache.spark.ml._
class Kmeans extends Transformer {
def copy(extra: ParamMap): Unit = {
defaultCopy(extra)
}}
It gives an error on ParamMap.
Question
Why isn't org.apache.spark.ml.param.ParamMap covered by import org.apache.spark.ml._?
Scala imports are not recursive: import org.apache.spark.ml._ imports everything directly under the ml package, but not the members of its sub-packages.
Since ParamMap is in one of ml's sub-packages (ml.param), you have to import that package or the ParamMap class directly.
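The same rule can be reproduced with nested objects, which follow the same import semantics as packages. This is a sketch only: ml, Transformer, param and ParamMap below are hypothetical stand-ins, not Spark's classes.

```scala
// Hypothetical stand-ins mirroring the org.apache.spark.ml layout.
object ml {
  class Transformer
  object param {
    case class ParamMap(entries: Map[String, Int] = Map.empty)
  }
}

object WildcardDemo {
  import ml._

  // Transformer is a direct member of ml, so the wildcard brings it in.
  val transformer = new Transformer

  // The wildcard also imports the name `param`, so a relative path works.
  val viaPrefix = param.ParamMap()

  // val direct = ParamMap()  // would not compile: ParamMap is one level deeper

  def withSecondImport: Int = {
    import ml.param._ // now ParamMap itself is in scope
    ParamMap(Map("k" -> 3)).entries("k")
  }
}
```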

Error when parsing a line from the data into the class. Spark Mllib

I have implemented this code:
scala> import org.apache.spark._
scala> import org.apache.spark.rdd.RDD
import org.apache.spark.rdd.RDD
scala> import org.apache.spark.util.IntParam
import org.apache.spark.util.IntParam
scala> import org.apache.spark.graphx._
import org.apache.spark.graphx._
scala> import org.apache.spark.graphx.util.GraphGenerators
import org.apache.spark.graphx.util.GraphGenerators
scala> case class Transactions(ID:Long,Chain:Int,Dept:Int,Category:Int,Company:Long,Brand:Long,Date:String,ProductSize:Int,ProductMeasure:String,PurchaseQuantity:Int,PurchaseAmount:Double)
defined class Transactions
When I try to run this:
def parseTransactions(str:String): Transactions = {
| val line = str.split(",")
| Transactions(line(0),line(1),line(2),line(3),line(4),line(5),line(6),line(7),line(8),line(9),line(10))
| }
I get this error: <console>:38: error: type mismatch;
found : String
required: Long
Does anyone know why I'm getting this error? I am doing a social network analysis over the schema shown above.
Many thanks!
str.split(",") returns an Array[String], so every element of line is a String. Convert each element to the type the case class expects before passing it to the constructor, for example:
val line = str.split(",")
line(0).toLong
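Applied to every field, the conversion might look like the sketch below. The case class is the one from the question; the TransactionParser wrapper and the sample line are made up for illustration, and the code assumes well-formed input (a malformed number would throw NumberFormatException).

```scala
case class Transactions(ID: Long, Chain: Int, Dept: Int, Category: Int,
                        Company: Long, Brand: Long, Date: String,
                        ProductSize: Int, ProductMeasure: String,
                        PurchaseQuantity: Int, PurchaseAmount: Double)

object TransactionParser {
  def parseTransactions(str: String): Transactions = {
    val line = str.split(",")
    // Each field is converted from String to the declared parameter type.
    Transactions(
      line(0).toLong, line(1).toInt, line(2).toInt, line(3).toInt,
      line(4).toLong, line(5).toLong, line(6),
      line(7).toInt, line(8),
      line(9).toInt, line(10).toDouble
    )
  }

  // Made-up sample row, only to exercise the parser.
  val sample: Transactions =
    parseTransactions("86246,205,7,707,1078778070,12564,2012-03-02,12,OZ,1,7.59")
}
```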

value lookup is not a member of org.apache.spark.rdd.RDD[(String, String)]

I ran into a problem when I tried to compile my Scala program with SBT.
I have imported the class I need. Here is part of my code:
import java.io.File
import java.io.FileWriter
import java.io.PrintWriter
import java.io.IOException
import org.apache.spark.{SparkConf,SparkContext}
import org.apache.spark.rdd.PairRDDFunctions
import scala.util.Random
......
val data=sc.textFile(path)
val kv=data.map{s=>
val a=s.split(",")
(a(0),a(1))
}.cache()
kv.first()
val start=System.currentTimeMillis()
for(tg<-target){
kv.lookup(tg.toString)
}
The error is:
value lookup is not a member of org.apache.spark.rdd.RDD[(String, String)]
[error] kv.lookup(tg.toString)
What confuses me is that I have imported org.apache.spark.rdd.PairRDDFunctions,
but it doesn't work. When I run the same code in spark-shell, it runs fine.
You need to add
import org.apache.spark.SparkContext._
to have access to the implicits that let you use PairRDDFunctions on an RDD of type (K, V).
There's no need to import PairRDDFunctions directly.
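The mechanism can be sketched in plain Scala: the extra methods live in a wrapper class, and they only become available once the implicit conversion to that wrapper is in scope. MiniRDD, MiniContext and PairFunctions below are hypothetical names, not Spark's real classes.

```scala
// Hypothetical miniature of an RDD; not Spark's class.
class MiniRDD[T](val items: Seq[T])

// Hypothetical stand-in for the object that holds the implicits.
object MiniContext {
  // Adds lookup only to collections whose elements are key/value pairs.
  implicit class PairFunctions[K, V](rdd: MiniRDD[(K, V)]) {
    def lookup(key: K): Seq[V] =
      rdd.items.collect { case (k, v) if k == key => v }
  }
}

object LookupDemo {
  import MiniContext._ // without this import, kv.lookup does not compile

  val kv = new MiniRDD(Seq(("a", "1"), ("b", "2"), ("a", "3")))
  val found: Seq[String] = kv.lookup("a")
}
```

Importing the wrapper class by name would not help here either; what matters is that the implicit conversion is visible where lookup is called.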

Compiled Queries in Slick

I need to compile a query in Slick with Play and PostgreSQL:
val bioMaterialTypes: TableQuery[Tables.BioMaterialType] = Tables.BioMaterialType
def getAllBmts() = for{ bmt <- bioMaterialTypes } yield bmt
val queryCompiled = Compiled(getAllBmts _)
but in Scala IDE I get this error on the apply of Compiled:
Multiple markers at this line
- Computation of type () => scala.slick.lifted.Query[models.Tables.BioMaterialType,models.Tables.BioMaterialTypeRow,Seq]
cannot be compiled (as type C)
- not enough arguments for method apply: (implicit compilable: scala.slick.lifted.Compilable[() =>
scala.slick.lifted.Query[models.Tables.BioMaterialType,models.Tables.BioMaterialTypeRow,Seq],C], implicit driver:
scala.slick.profile.BasicProfile)C in object Compiled. Unspecified value parameters compilable, driver.
These are my imports:
import scala.concurrent.Future
import scala.slick.jdbc.StaticQuery.staticQueryToInvoker
import scala.slick.lifted.Compiled
import scala.slick.driver.PostgresDriver
import javax.inject.Inject
import javax.inject.Singleton
import models.BioMaterialType
import models.Tables
import play.api.Application
import play.api.db.slick.Config.driver.simple.TableQuery
import play.api.db.slick.Config.driver.simple.columnExtensionMethods
import play.api.db.slick.Config.driver.simple.longColumnType
import play.api.db.slick.Config.driver.simple.queryToAppliedQueryInvoker
import play.api.db.slick.Config.driver.simple.queryToInsertInvoker
import play.api.db.slick.Config.driver.simple.stringColumnExtensionMethods
import play.api.db.slick.Config.driver.simple.stringColumnType
import play.api.db.slick.Config.driver.simple.valueToConstColumn
import play.api.db.slick.DB
import play.api.db.slick.DBAction
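As the error says, a function value of type () => Query cannot be compiled. Since the query takes no parameters, you can simply pass the query itself: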
val queryCompiled = Compiled(bioMaterialTypes)