org.apache.spark.ml.feature.IDF error - scala

As mentioned in http://spark.apache.org/docs/latest/ml-features.html, I try to
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
but Spark displays:
scala> import org.apache.spark.ml.feature.IDF
<console>:13: error: object IDF is not a member of package org.apache.spark.ml.feature
import org.apache.spark.ml.feature.IDF
Meanwhile, import org.apache.spark.mllib.feature.IDF works fine.
Any reason for the error? I am new to Spark and Scala.

The reason for the error is that the IDF class was only introduced into spark-ml in Spark 1.4, so on earlier versions the import fails with object IDF is not a member of package org.apache.spark.ml.feature.
You can try to use the spark-mllib IDF class instead.
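For reference, a minimal TF-IDF sketch against the spark-mllib API (the package that exists in all 1.x versions), assuming an existing SparkContext sc as in spark-shell; docs.txt is a placeholder path:
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// one whitespace-tokenized document per line
val documents: RDD[Seq[String]] = sc.textFile("docs.txt").map(_.split(" ").toSeq)

val hashingTF = new HashingTF()
val tf: RDD[Vector] = hashingTF.transform(documents)
tf.cache() // IDF.fit and IDF.transform both iterate over tf

val idfModel = new IDF().fit(tf)
val tfidf: RDD[Vector] = idfModel.transform(tf)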

This is not reproducible in spark-1.4.1. Which version are you using?
scala> import org.apache.spark.ml.feature.IDF
import org.apache.spark.ml.feature.IDF
scala> import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
EDIT1
Spark 1.2.x contains only: org.apache.spark.mllib.feature.IDF
Try searching for IDF here: https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.mllib.feature.IDF

Related

Packaging scala class on databricks (error: not found: value dbutils)

Trying to make a package with a class
package x.y.Log
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.{DataFrame}
import org.apache.spark.sql.functions.{lit, explode, collect_list, struct}
import org.apache.spark.sql.types.{StructField, StructType}
import java.util.Calendar
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions._
import spark.implicits._
class Log{
...
}
Everything runs fine in the same notebook, but once I try to create a package that I could use in other notebooks I get errors:
<notebook>:11: error: not found: object spark
import spark.implicits._
^
<notebook>:21: error: not found: value dbutils
val notebookPath = dbutils.notebook.getContext().notebookPath.get
^
<notebook>:22: error: not found: value dbutils
val userName = dbutils.notebook.getContext.tags("user")
^
<notebook>:23: error: not found: value dbutils
val userId = dbutils.notebook.getContext.tags("userId")
^
<notebook>:41: error: not found: value spark
var rawMeta = spark.read.format("json").option("multiLine", true).load("/FileStore/tables/xxx.json")
^
<notebook>:42: error: value $ is not a member of StringContext
.filter($"Name".isin(readSources))
Does anyone know how to package this class with these libraries?
Assuming you are running Spark 2.x, the statement import spark.implicits._ only works when you have a SparkSession object in scope: the Implicits object is defined inside SparkSession, and it extends the SQLImplicits of earlier Spark versions (you can check the SparkSession source on GitHub to verify). So create the session inside the class and import its implicits:
package x.y.Log
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{lit, explode, collect_list, struct}
import org.apache.spark.sql.types.{StructField, StructType}
import java.util.Calendar
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession
class Log{
val spark: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
import spark.implicits._
...[rest of the code below]
}
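Once packaged, other notebooks only need the class itself. A hypothetical usage sketch, assuming the library built from the package above has been attached to the cluster:
import x.y.Log.Log

// getOrCreate() inside the class returns the notebook's already-running SparkSession
val log = new Log()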

Spark JSON DStream Print() / saveAsTextFiles not working

Issue Description:
Spark Version: 1.6.2
Execution: Spark-shell (REPL) master = local[2] (tried local[*])
example.json is as below:
{"name":"D2" ,"lovesPandas":"Y"}
{"name":"D3" ,"lovesPandas":"Y"}
{"name":"D4" ,"lovesPandas":"Y"}
{"name":"D5" ,"lovesPandas":"Y"}
Code executing in Spark-shell local mode:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._
import org.apache.spark.sql._
import org.json4s._
import org.json4s.jackson.JsonMethods._
import _root_.kafka.serializer.StringDecoder
import _root_.kafka.serializer.Decoder
import _root_.kafka.utils.VerifiableProperties
import org.apache.hadoop.hbase._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
val ssc = new StreamingContext(sc, Seconds(2) )
val messages = ssc.textFileStream("C:\\pdtemp\\test\\example.json")
messages.print()
I tried saveAsTextFiles, but it is not saving any files either.
This does not work -- it shows no output. I tried the same reading the stream from Kafka in spark-shell.
I tried the following too; it does not work:
messages.foreachRDD(rdd => rdd.foreach(print))
I also tried parsing the schema and converting to a DataFrame, but nothing seems to work.
Normal JSON parsing works, and I can print the contents of a normal RDD/DataFrame to the console in spark-shell.
Can anyone help, please?
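For comparison, a minimal sketch of how textFileStream is usually driven in spark-shell: it watches a directory (not a single file) and only picks up files created there after the context has been started. The directory below reuses the question's path purely for illustration:
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(2))          // sc comes from spark-shell
val messages = ssc.textFileStream("C:\\pdtemp\\test\\") // a directory, not example.json itself
messages.print()

ssc.start()            // nothing is processed until the context is started
ssc.awaitTermination() // then drop new files into the watched directory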

Why recommendProductsForUsers is not a member of org.apache.spark.mllib.recommendation.MatrixFactorizationModel

I have built a recommendation system using Spark MLlib with ALS collaborative filtering.
My code snippet:
bestModel.get
.predict(toBePredictedBroadcasted.value)
Everything is OK, but I need to change the code to fulfil a requirement. I read in the Scala doc here that
I need to use def recommendProducts,
but when I tried it in my code:
bestModel.get.recommendProductsForUsers(100)
I get an error when compiling:
value recommendProductsForUsers is not a member of org.apache.spark.mllib.recommendation.MatrixFactorizationModel
[error] bestModel.get.recommendProductsForUsers(100)
Can anyone help me? Thanks.
NB: I use Spark 1.5.0.
My imports:
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import java.io.File
import scala.io.Source
import org.apache.log4j.Logger
import org.apache.log4j.Level
import org.apache.spark.rdd._
import org.apache.spark.mllib.recommendation.{ALS, Rating, MatrixFactorizationModel}
import org.apache.spark.sql.SQLContext
import org.apache.spark.broadcast.Broadcast
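For reference, a sketch of the documented call, assuming the same MLlib ALS pipeline; the rank and iteration counts are placeholder values. recommendProductsForUsers was added in Spark 1.4.0, so a compile error on 1.5.0 may mean the build resolves an older spark-mllib artifact at compile time, though that is only a guess:
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

def topProductsPerUser(ratings: RDD[Rating]): RDD[(Int, Array[Rating])] = {
  val model: MatrixFactorizationModel = ALS.train(ratings, 10, 10)
  // for every user, the 100 highest-scoring products
  model.recommendProductsForUsers(100)
}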

value lookup is not a member of org.apache.spark.rdd.RDD[(String, String)]

I have got a problem when I tried to compile my Scala program with SBT.
I have imported the classes I need. Here is part of my code.
import java.io.File
import java.io.FileWriter
import java.io.PrintWriter
import java.io.IOException
import org.apache.spark.{SparkConf,SparkContext}
import org.apache.spark.rdd.PairRDDFunctions
import scala.util.Random
......
val data = sc.textFile(path)
val kv = data.map { s =>
  val a = s.split(",")
  (a(0), a(1))
}.cache()
kv.first()
val start = System.currentTimeMillis()
for (tg <- target) {
  kv.lookup(tg.toString)
}
The error detail is :
value lookup is not a member of org.apache.spark.rdd.RDD[(String, String)]
[error] kv.lookup(tg.toString)
What confuses me is that I have imported org.apache.spark.rdd.PairRDDFunctions,
but it doesn't work. And when I run this in spark-shell, it runs fine.
Add
import org.apache.spark.SparkContext._
to get access to the implicits that let you use PairRDDFunctions on an RDD of type (K, V). There's no need to import PairRDDFunctions directly.
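A minimal self-contained sketch of the suggested fix, assuming an sbt project built against the same Spark version; the file path and lookup key are placeholders:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // brings the (K, V) pair implicits into scope

object LookupExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LookupExample").setMaster("local[*]"))
    val kv = sc.textFile("data.csv").map { s =>
      val a = s.split(",")
      (a(0), a(1))
    }.cache()
    println(kv.lookup("someKey")) // compiles: lookup comes from PairRDDFunctions
    sc.stop()
  }
}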

Compiled Queries in Slick

I need to compile a query in Slick with Play and PostgreSQL
val bioMaterialTypes: TableQuery[Tables.BioMaterialType] = Tables.BioMaterialType
def getAllBmts() = for{ bmt <- bioMaterialTypes } yield bmt
val queryCompiled = Compiled(getAllBmts _)
but in Scala IDE I get this error at the apply of Compiled:
Multiple markers at this line
- Computation of type () => scala.slick.lifted.Query[models.Tables.BioMaterialType,models.Tables.BioMaterialTypeRow,Seq]
cannot be compiled (as type C)
- not enough arguments for method apply: (implicit compilable: scala.slick.lifted.Compilable[() =>
scala.slick.lifted.Query[models.Tables.BioMaterialType,models.Tables.BioMaterialTypeRow,Seq],C], implicit driver:
scala.slick.profile.BasicProfile)C in object Compiled. Unspecified value parameters compilable, driver.
These are my imports:
import scala.concurrent.Future
import scala.slick.jdbc.StaticQuery.staticQueryToInvoker
import scala.slick.lifted.Compiled
import scala.slick.driver.PostgresDriver
import javax.inject.Inject
import javax.inject.Singleton
import models.BioMaterialType
import models.Tables
import play.api.Application
import play.api.db.slick.Config.driver.simple.TableQuery
import play.api.db.slick.Config.driver.simple.columnExtensionMethods
import play.api.db.slick.Config.driver.simple.longColumnType
import play.api.db.slick.Config.driver.simple.queryToAppliedQueryInvoker
import play.api.db.slick.Config.driver.simple.queryToInsertInvoker
import play.api.db.slick.Config.driver.simple.stringColumnExtensionMethods
import play.api.db.slick.Config.driver.simple.stringColumnType
import play.api.db.slick.Config.driver.simple.valueToConstColumn
import play.api.db.slick.DB
import play.api.db.slick.DBAction
Compiled cannot be applied to a plain () => Query function -- there is no implicit Compilable instance for it, which is what the error is saying. Pass the query value itself (or a function whose parameters are lifted column types). You can simply do
val queryCompiled = Compiled(bioMaterialTypes)
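For completeness, a sketch of how the compiled query might then be executed with Slick 2.x's lifted embedding; it assumes the wildcard import of the driver's simple._ (for the compiled-query invoker implicits) and Play's implicit Application, rather than the single-symbol imports above:
import play.api.Play.current
import play.api.db.slick.Config.driver.simple._
import play.api.db.slick.DB
import models.Tables

val bioMaterialTypes: TableQuery[Tables.BioMaterialType] = Tables.BioMaterialType
val queryCompiled = Compiled(bioMaterialTypes)

def getAllBmts(): Seq[Tables.BioMaterialTypeRow] =
  DB.withSession { implicit session =>
    queryCompiled.run // the query is compiled once and reused on every call
  }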