value lookup is not a member of org.apache.spark.rdd.RDD[(String, String)] - scala

I have got a problem when I tried to compile my Scala program with SBT. I have imported the classes I need. Here is part of my code:
import java.io.File
import java.io.FileWriter
import java.io.PrintWriter
import java.io.IOException
import org.apache.spark.{SparkConf,SparkContext}
import org.apache.spark.rdd.PairRDDFunctions
import scala.util.Random
......
val data = sc.textFile(path)
val kv = data.map { s =>
  val a = s.split(",")
  (a(0), a(1))
}.cache()
kv.first()
val start = System.currentTimeMillis()
for (tg <- target) {
  kv.lookup(tg.toString)
}
The error detail is:
value lookup is not a member of org.apache.spark.rdd.RDD[(String, String)]
[error] kv.lookup(tg.toString)
What confuses me is that I have imported org.apache.spark.rdd.PairRDDFunctions, but it doesn't work. And when I run the same code in spark-shell, it runs well.

Add
import org.apache.spark.SparkContext._
to get access to the implicits that let you use PairRDDFunctions on an RDD of type (K, V). There's no need to import PairRDDFunctions directly: importing the class alone does not bring the implicit conversions into scope. (The spark-shell imports these implicits for you automatically, which is why the same code runs well there.)
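A minimal sketch of the fixed program; the path "data.csv" and the key "someKey" are placeholders for the question's path and target values:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // brings the RDD-to-PairRDDFunctions implicit into scope

object LookupApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LookupApp"))
    val kv = sc.textFile("data.csv").map { s =>
      val a = s.split(",")
      (a(0), a(1))
    }.cache()
    kv.lookup("someKey") // compiles now: lookup comes from the implicit PairRDDFunctions wrapper
  }
}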

Packaging scala class on databricks (error: not found: value dbutils)

Trying to make a package with a class
package x.y.Log

import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{lit, explode, collect_list, struct}
import org.apache.spark.sql.types.{StructField, StructType}
import java.util.Calendar
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions._
import spark.implicits._

class Log {
  ...
}
Everything runs fine in the same notebook, but once I try to create a package that I could use in other notebooks, I get errors:
<notebook>:11: error: not found: object spark
import spark.implicits._
^
<notebook>:21: error: not found: value dbutils
val notebookPath = dbutils.notebook.getContext().notebookPath.get
^
<notebook>:22: error: not found: value dbutils
val userName = dbutils.notebook.getContext.tags("user")
^
<notebook>:23: error: not found: value dbutils
val userId = dbutils.notebook.getContext.tags("userId")
^
<notebook>:41: error: not found: value spark
var rawMeta = spark.read.format("json").option("multiLine", true).load("/FileStore/tables/xxx.json")
^
<notebook>:42: error: value $ is not a member of StringContext
.filter($"Name".isin(readSources))
Does anyone know how to package this class with these libs?
Assuming you are running Spark 2.x, the statement import spark.implicits._ only works when you have a SparkSession object in scope. The object implicits is defined inside the SparkSession class and extends SQLImplicits from previous versions of Spark; you can check the SparkSession source on GitHub to verify. So get (or create) the session inside the class and import from it:
package x.y.Log

import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{lit, explode, collect_list, struct}
import org.apache.spark.sql.types.{StructField, StructType}
import java.util.Calendar
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession

class Log {
  val spark: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
  import spark.implicits._
  ...[rest of the code below]
}
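The not found: value dbutils errors need a separate fix, since dbutils is also a notebook-bound value. A minimal sketch, assuming the Databricks dbutils-api library is on the compile classpath (verify the dependency coordinates against your Databricks runtime):
import com.databricks.dbutils_v1.DBUtilsHolder.dbutils

// resolves to the live dbutils instance when the packaged jar runs on a Databricks cluster
val notebookPath = dbutils.notebook.getContext().notebookPath.get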

value na is not a member of?

Hello, I just started to learn Scala and have been following a tutorial on Udemy. I wrote the same code as the tutorial, but it gives me an error, and I have no idea about that error. This is my code:
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession
import org.apache.log4j._
import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel}
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.Row
Logger.getLogger("org").setLevel(Level.ERROR)
val spark = SparkSession.builder().getOrCreate()
val data = spark.read.option("header","true").
option("inferSchema","true").
option("delimiter","\t").
format("csv").
load("dataset.tsv").
withColumn("subject", split($"subject", " "))
val logRegDataAll = (data.select(data("label")).as("label"),$"subject")
val logRegData = logRegDataAll.na.drop()
and it gives me an error like this:
scala> :load LogisticRegression.scala
Loading LogisticRegression.scala...
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession
import org.apache.log4j._
import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel}
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.Row
spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@1efcba00
data: org.apache.spark.sql.DataFrame = [label: string, subject: array<string>]
logRegDataAll: (org.apache.spark.sql.Dataset[org.apache.spark.sql.Row], org.apache.spark.sql.ColumnName) = ([label: string],subject)
<console>:43: error: value na is not a member of (org.apache.spark.sql.Dataset[org.apache.spark.sql.Row], org.apache.spark.sql.ColumnName)
val logRegData = logRegDataAll.na.drop()
^
Thanks for helping!
You can see clearly that
val logRegDataAll = (data.select(data("label")).as("label"),$"subject")
returns
(org.apache.spark.sql.Dataset[org.apache.spark.sql.Row], org.apache.spark.sql.ColumnName)
which is a tuple, and a tuple has no na member. There is a misplaced parenthesis in data.select(data("label")): it should actually be data.select(data("label").as("label"), $"subject").
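A minimal sketch of the corrected lines, keeping the question's own column names:
// select both columns in one call, so the result is a DataFrame, not a tuple
val logRegDataAll = data.select(data("label").as("label"), $"subject")
val logRegData = logRegDataAll.na.drop() // na is available on a DataFrame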

Error when parsing a line from the data into the class. Spark Mllib

I have this code implemented:
scala> import org.apache.spark._
scala> import org.apache.spark.rdd.RDD
import org.apache.spark.rdd.RDD
scala> import org.apache.spark.util.IntParam
import org.apache.spark.util.IntParam
scala> import org.apache.spark.graphx._
import org.apache.spark.graphx._
scala> import org.apache.spark.graphx.util.GraphGenerators
import org.apache.spark.graphx.util.GraphGenerators
scala> case class Transactions(ID:Long,Chain:Int,Dept:Int,Category:Int,Company:Long,Brand:Long,Date:String,ProductSize:Int,ProductMeasure:String,PurchaseQuantity:Int,PurchaseAmount:Double)
defined class Transactions
When I try to run this:
def parseTransactions(str: String): Transactions = {
  val line = str.split(",")
  Transactions(line(0), line(1), line(2), line(3), line(4), line(5), line(6), line(7), line(8), line(9), line(10))
}
I am obtaining this error:
:38: error: type mismatch;
found : String
required: Long
Does anyone know why I'm getting this error? I am doing a social network analysis over the schema that I put above.
Many thanks!
You are creating an array from the ","-separated values, which returns an Array[String]. Cast each element to the appropriate type before assigning it to the case class arguments:
val line = str.split(",")
line(0).toLong
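A minimal sketch of the full parser with every cast filled in, following the field types of the Transactions case class above:
def parseTransactions(str: String): Transactions = {
  val line = str.split(",")
  // cast each String field to the type declared in the case class
  Transactions(line(0).toLong, line(1).toInt, line(2).toInt, line(3).toInt,
    line(4).toLong, line(5).toLong, line(6), line(7).toInt,
    line(8), line(9).toInt, line(10).toDouble)
}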

org.apache.spark.ml.feature.IDF error

As mentioned in http://spark.apache.org/docs/latest/ml-features.html, I tried
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
but Spark displays:
scala> import org.apache.spark.ml.feature.IDF
<console>:13: error: object IDF is not a member of package org.apache.spark.ml.feature
import org.apache.spark.ml.feature.IDF
Whereas import org.apache.spark.mllib.feature.IDF works fine.
Any reason for the error? I am new to Spark and Scala.
The reason for the error is that the ml.feature.IDF class was only introduced into spark-ml with Spark 1.4; on earlier versions the import fails with the object IDF is not a member of package org.apache.spark.ml.feature error.
You can use the spark-mllib IDF class instead.
This is not reproducible in spark-1.4.1. Which version are you using?
scala> import org.apache.spark.ml.feature.IDF
import org.apache.spark.ml.feature.IDF
scala> import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
EDIT 1:
Spark 1.2.x contains only: org.apache.spark.mllib.feature.IDF
Try searching for IDF here: https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.mllib.feature.IDF
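If you are stuck on a pre-1.4 version, a minimal sketch of the spark-mllib TF-IDF flow from that 1.2.0 API; the file path and whitespace tokenization are placeholders:
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// each document as a sequence of terms
val docs: RDD[Seq[String]] = sc.textFile("docs.txt").map(_.split(" ").toSeq)
val tf: RDD[Vector] = new HashingTF().transform(docs)
tf.cache()
val idfModel = new IDF().fit(tf) // computes inverse document frequencies over the corpus
val tfidf: RDD[Vector] = idfModel.transform(tf)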

Compiled Queries in Slick

I need to compile a query in Slick with Play and PostgreSQL:
val bioMaterialTypes: TableQuery[Tables.BioMaterialType] = Tables.BioMaterialType
def getAllBmts() = for { bmt <- bioMaterialTypes } yield bmt
val queryCompiled = Compiled(getAllBmts _)
but in Scala IDE I get this error at the apply of Compiled:
Multiple markers at this line
- Computation of type () => scala.slick.lifted.Query[models.Tables.BioMaterialType,models.Tables.BioMaterialTypeRow,Seq]
cannot be compiled (as type C)
- not enough arguments for method apply: (implicit compilable: scala.slick.lifted.Compilable[() =>
scala.slick.lifted.Query[models.Tables.BioMaterialType,models.Tables.BioMaterialTypeRow,Seq],C], implicit driver:
scala.slick.profile.BasicProfile)C in object Compiled. Unspecified value parameters compilable, driver.
These are my imports:
import scala.concurrent.Future
import scala.slick.jdbc.StaticQuery.staticQueryToInvoker
import scala.slick.lifted.Compiled
import scala.slick.driver.PostgresDriver
import javax.inject.Inject
import javax.inject.Singleton
import models.BioMaterialType
import models.Tables
import play.api.Application
import play.api.db.slick.Config.driver.simple.TableQuery
import play.api.db.slick.Config.driver.simple.columnExtensionMethods
import play.api.db.slick.Config.driver.simple.longColumnType
import play.api.db.slick.Config.driver.simple.queryToAppliedQueryInvoker
import play.api.db.slick.Config.driver.simple.queryToInsertInvoker
import play.api.db.slick.Config.driver.simple.stringColumnExtensionMethods
import play.api.db.slick.Config.driver.simple.stringColumnType
import play.api.db.slick.Config.driver.simple.valueToConstColumn
import play.api.db.slick.DB
import play.api.db.slick.DBAction
Compiled needs either a query or a function from lifted column parameters to a query; the eta-expanded getAllBmts _ is a plain () => Query, for which no Compilable instance exists, which is exactly what the missing-implicit error is telling you. Since your query takes no parameters, you can simply do
val queryCompiled = Compiled(bioMaterialTypes)
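A minimal sketch of executing it, assuming Slick 2.x (the scala.slick packages in your imports) and Play's session management; verify the .run call against your exact Slick version:
DB.withSession { implicit session =>
  // the compiled query reuses the generated SQL across invocations
  val allBmts = queryCompiled.run
}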