Scala: case class runtime error

This demo runs OK. But when I move the code into a function of another class (my former project) and call that function, it fails to compile.
import org.apache.spark.SparkContext
import org.bson.BasicBSONObject

object DFMain {
  case class Person(name: String, age: Double, t: String)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "Scala Word Count")
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    val bsonRDD = sc.parallelize(("foo", 1, "female") ::
        ("bar", 2, "male") ::
        ("baz", -1, "female") :: Nil)
      .map(tuple => {
        val bson = new BasicBSONObject()
        bson.put("name", "bfoo")
        bson.put("value", 0.1)
        bson.put("t", "female")
        (null, bson)
      })
    val tDf = bsonRDD.map(_._2)
      .map(f => Person(f.get("name").toString,
        f.get("value").toString.toDouble,
        f.get("t").toString)).toDF()
    tDf.limit(1).show()
  }
}
The same code in 'MySQLDao.insertIntoMySQL()' fails to compile:
import org.apache.spark.SparkContext
import org.bson.BasicBSONObject

object MySQLDao {
  private val sc = new SparkContext("local", "Scala Word Count")
  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  import sqlContext.implicits._

  case class Person(name: String, age: Double, t: String)

  def insertIntoMySQL(): Unit = {
    val bsonRDD = sc.parallelize(("foo", 1, "female") ::
        ("bar", 2, "male") ::
        ("baz", -1, "female") :: Nil)
      .map(tuple => {
        val bson = new BasicBSONObject()
        bson.put("name", "bfoo")
        bson.put("value", 0.1)
        bson.put("t", "female")
        (null, bson)
      })
    val tDf = bsonRDD.map(_._2).map(f => Person(f.get("name").toString,
      f.get("value").toString.toDouble,
      f.get("t").toString)).toDF()
    tDf.limit(1).show()
  }
}
Well, when I call 'MySQLDao.insertIntoMySQL()' I get the error:
value typedProductIterator is not a member of object scala.runtime.ScalaRunTime
case class Person(name: String, age: Double, t: String)

I suppose the case class isn't visible inside the closure passed to the map function. Move it to the package level:
case class Person(name: String, age: Double, t: String)

object MySQLDao {
  ...
}
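For completeness, a minimal sketch of the reworked file with Person moved out of the object (same imports as the snippets above; the rest of the logic is unchanged):
import org.apache.spark.SparkContext
import org.bson.BasicBSONObject

// Person now lives at the package level, where the compiler can build the
// implicit machinery toDF() needs for it.
case class Person(name: String, age: Double, t: String)

object MySQLDao {
  private val sc = new SparkContext("local", "Scala Word Count")
  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  import sqlContext.implicits._

  def insertIntoMySQL(): Unit = {
    val bsonRDD = sc.parallelize(("foo", 1, "female") :: ("bar", 2, "male") :: ("baz", -1, "female") :: Nil)
      .map { _ =>
        val bson = new BasicBSONObject()
        bson.put("name", "bfoo")
        bson.put("value", 0.1)
        bson.put("t", "female")
        (null, bson)
      }
    val tDf = bsonRDD.map(_._2)
      .map(f => Person(f.get("name").toString, f.get("value").toString.toDouble, f.get("t").toString))
      .toDF()
    tDf.limit(1).show()
  }
}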

Related

Run Object notebook in Databricks

I am trying to execute this code on Databricks in Scala. Everything is in an object; then I have a case class, a def main, and other def functions.
I tried working with "package cells", but I got: Warning: classes defined within packages cannot be redefined without a cluster restart. Compilation successful.
Removing the object didn't work either.
package x.y.z

import java.text.SimpleDateFormat
import java.util.Date
import java.io.File
import java.io.PrintWriter

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object Meter {
  val dateFormat = new SimpleDateFormat("yyyyMMdd")

  case class Forc(cust: String, Num: String, date: String, results: Double)

  def main(args: Array[String]): Unit = {
    val inputFile = "sv" //
    val outputFile = "ssv" //
    val fileSystem = getFileSystem(inputFile)
    val inputData = readLines(fileSystem, inputFile, skipHeader = true).toSeq
    val filtinp = inputData.filter(x => x.nonEmpty)
      .map(x => Forc(x(6), x(5), x(0), x(8).toDouble))

    def getTimestamp(date: String): Long = dateFormat.parse(date).getTime

    def getDate(timeStampInMills: Long): String = {
      val time = new Date(timeStampInMills)
      dateFormat.format(time)
    }

    def getFileSystem(path: String): FileSystem = {
      val hconf = new Configuration()
      new Path(path).getFileSystem(hconf)
    }

    override def next(): String = {
      val result = line
      line = inputData.readLine()
      if (line == null) {
        inputData.close()
      }
      result
    }
  }
}
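For what it's worth, a minimal sketch of how an object compiled in a package cell is usually invoked from a separate notebook cell (assuming the package cell above compiles; as the warning says, redefining its classes still needs a cluster restart):
// In a separate Databricks cell, after the package cell has been run:
import x.y.z.Meter

Meter.main(Array.empty[String])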

Why does one Kafka stream block the other one from getting started?

I am working with the new Kafka Scala streams API recently open-sourced by Lightbend, and I am trying to run two streams. But the two of them don't run simultaneously, and I am not getting the desired output.
package in.internity

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.lightbend.kafka.scala.streams.{KStreamS, StreamsBuilderS}
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.kstream.Produced
import org.apache.kafka.streams.{StreamsConfig, _}
import org.json4s.DefaultFormats
import org.json4s.native.JsonMethods.parse
import org.json4s.native.Serialization.write

import scala.util.Try

/**
 * @author Shivansh <shiv4nsh@gmail.com>
 * @since 8/1/18
 */
object Boot extends App {

  implicit val formats: DefaultFormats.type = DefaultFormats
  val config: Properties = {
    val p = new Properties()
    p.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application")
    p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    p.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
    p.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)
    p
  }

  val streams1 = wordSplit("lines", "wordCount")
  val streams2 = readAndWriteJson("person", "personName")

  private def wordSplit(intopic: String, outTopic: String) = {
    val builder = new StreamsBuilderS()
    val produced = Produced.`with`(Serdes.String(), Serdes.String())
    val textLines: KStreamS[String, String] = builder.stream(intopic)
    val data: KStreamS[String, String] = textLines.flatMapValues(value => value.toLowerCase.split("\\W+").toIterable)
    data.to(outTopic, produced)
    val streams: KafkaStreams = new KafkaStreams(builder.build(), config)
    streams
  }

  private def readAndWriteJson(intopic: String, outTopic: String) = {
    val builder = new StreamsBuilderS()
    val produced = Produced.`with`(Serdes.String(), Serdes.String())
    val textLines: KStreamS[String, String] = builder.stream(intopic)
    val data: KStreamS[String, String] = textLines.mapValues(value => {
      val person = Try(parse(value).extract[Person]).toOption
      println("1::", person)
      val personNameAndEmail = person.map(a => PersonNameAndEmail(a.name, a.email))
      println("2::", personNameAndEmail)
      write(personNameAndEmail)
    })
    data.to(outTopic, produced)
    val streams: KafkaStreams = new KafkaStreams(builder.build(), config)
    streams
  }

  streams1.start()
  streams2.start()

  Runtime.getRuntime.addShutdownHook(new Thread(() => {
    streams2.close(10, TimeUnit.SECONDS)
    streams1.close(10, TimeUnit.SECONDS)
  }))
}

case class Person(name: String, age: Int, email: String)
case class PersonNameAndEmail(name: String, email: String)
When I run this and produce messages on the topic person, they do not get consumed.
But when I change the order in which they are started, i.e.
streams2.start()
streams1.start()
it works fine. So why does starting one stream block the other? Can't we run multiple streams at the same time?
Got it working. It seems I was trying to initialize a separate stream in each of the two methods (silly of me :P).
Working code:
package in.internity

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.lightbend.kafka.scala.streams.{KStreamS, StreamsBuilderS}
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.kstream.Produced
import org.apache.kafka.streams.{StreamsConfig, _}
import org.json4s.DefaultFormats
import org.json4s.native.JsonMethods.parse
import org.json4s.native.Serialization.write

import scala.util.Try

/**
 * @author Shivansh <shiv4nsh@gmail.com>
 * @since 8/1/18
 */
object Boot extends App {

  implicit val formats: DefaultFormats.type = DefaultFormats
  val config: Properties = {
    val p = new Properties()
    p.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application")
    p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    p.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
    p.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)
    p
  }

  val builder = new StreamsBuilderS()

  private def wordSplit(intopic: String, outTopic: String) = {
    val produced = Produced.`with`(Serdes.String(), Serdes.String())
    val textLines: KStreamS[String, String] = builder.stream(intopic)
    val data: KStreamS[String, String] = textLines.flatMapValues(value => value.toLowerCase.split("\\W+").toIterable)
    data.to(outTopic, produced)
  }

  private def readAndWriteJson(intopic: String, outTopic: String) = {
    val produced = Produced.`with`(Serdes.String(), Serdes.String())
    val textLines: KStreamS[String, String] = builder.stream(intopic)
    val data: KStreamS[String, String] = textLines.mapValues(value => {
      val person = Try(parse(value).extract[Person]).toOption
      println("1::", person)
      val personNameAndEmail = person.map(a => PersonNameAndEmail(a.name, a.email))
      println("2::", personNameAndEmail)
      write(personNameAndEmail)
    })
    data.to(outTopic, produced)
  }

  wordSplit("lines", "wordCount")
  readAndWriteJson("person", "personName")

  val streams: KafkaStreams = new KafkaStreams(builder.build(), config)
  streams.start()

  Runtime.getRuntime.addShutdownHook(new Thread(() => {
    streams.close(10, TimeUnit.SECONDS)
  }))
}

case class Person(name: String, age: Int, email: String)
case class PersonNameAndEmail(name: String, email: String)

Implicits in a Spark Scala program not working

I am not able to perform an implicit conversion from an RDD to a DataFrame in a Scala program, although I am importing spark.implicits._.
Any help would be appreciated.
Main Program with the implicits:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object spark1 {
  def main(args: Array[String]) {
    val spark = SparkSession.builder().appName("e1").config("o1", "sv").getOrCreate()
    import spark.implicits._
    val conf = new SparkConf().setMaster("local").setAppName("My App")
    val sc = spark.sparkContext
    val data = sc.textFile("/TestDataB.txt")
    val allSplit = data.map(line => line.split(","))
    case class CC1(LAT: Double, LONG: Double)
    val allData = allSplit.map(p => CC1(p(0).trim.toDouble, p(1).trim.toDouble))
    val allDF = allData.toDF()
    // ... other code
  }
}
Error is as follows:
Error:(40, 25) value toDF is not a member of org.apache.spark.rdd.RDD[CC1]
val allDF = allData.toDF()
When you define the case class CC1 inside the main method, you hit https://issues.scala-lang.org/browse/SI-6649; toDF() then fails to locate the appropriate implicit TypeTag for that class at compile time.
You can see this in this simple example:
import scala.reflect.runtime.universe.TypeTag

case class Out()

object TestImplicits {
  def main(args: Array[String]) {
    case class In()
    val typeTagOut = implicitly[TypeTag[Out]] // compiles
    val typeTagIn = implicitly[TypeTag[In]] // does not compile: Error:(23, 31) No TypeTag available for In
  }
}
Spark's relevant implicit conversion has this type parameter: [T <: Product : TypeTag] (see newProductEncoder here), which means an implicit TypeTag[CC1] is required.
To fix this, simply move the definition of CC1 out of the method, or out of the object entirely:
import org.apache.spark.sql.SparkSession

case class CC1(LAT: Double, LONG: Double)

object spark1 {
  def main(args: Array[String]) {
    val spark = SparkSession.builder().appName("e1").config("o1", "sv").getOrCreate()
    import spark.implicits._
    val data = spark.sparkContext.textFile("/TestDataB.txt")
    val allSplit = data.map(line => line.split(","))
    val allData = allSplit.map(p => CC1(p(0).trim.toDouble, p(1).trim.toDouble))
    val allDF = allData.toDF()
    // ... other code
  }
}
I thought toDF was in sqlContext.implicits._, so you would need to import that rather than spark.implicits._. At least that is the case in Spark 1.6.
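For reference, a rough sketch of the two import styles (the object name here is made up): in Spark 1.6 the implicits live on SQLContext, while in Spark 2.x the SparkSession's spark.implicits._ provides toDF as well.
import org.apache.spark.sql.SparkSession

object ImplicitsImportSketch {
  def main(args: Array[String]): Unit = {
    // Spark 2.x: the SparkSession carries the toDF/toDS implicits.
    val spark = SparkSession.builder().master("local").appName("implicits-sketch").getOrCreate()
    import spark.implicits._
    val df = spark.sparkContext.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "label")
    df.show()

    // Spark 1.6: the equivalent implicits lived on SQLContext instead:
    //   val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    //   import sqlContext.implicits._
  }
}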

Spark Task not serializable (Case Classes)

Spark throws "Task not serializable" when I use a case class, or a class/object that extends Serializable, inside a closure.
object WriteToHbase extends Serializable {

  def main(args: Array[String]) {
    val csvRows: RDD[Array[String]] = ...
    val dateFormatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
    val usersRDD = csvRows.map(row => {
      new UserTable(row(0), row(1), row(2), row(9), row(10), row(11))
    })
    processUsers(sc, usersRDD, dateFormatter)
  }

  def processUsers(sc: SparkContext, usersRDD: RDD[UserTable], dateFormatter: DateTimeFormatter): Unit = {
    usersRDD.foreachPartition(part => {
      val conf = HBaseConfiguration.create()
      val table = new HTable(conf, tablename)
      part.foreach(userRow => {
        val id = userRow.id
        val date1 = dateFormatter.parseDateTime(userRow.date1)
      })
      table.flushCommits()
      table.close()
    })
  }
}
My first attempt was to use a case class:
case class UserTable(id: String, name: String, address: String, ...) extends Serializable
My second attempt was to use a class instead of a case class:
class UserTable(val id: String, val name: String, val address: String, ...) extends Serializable {
}
My third attempt was to use a companion object in the class:
object UserTable extends Serializable {
  def apply(id: String, name: String, address: String, ...) = new UserTable(id, name, address, ...)
}
Most likely the function "doSomething" is defined on your class, which isn't serializable. Instead, move the "doSomething" function to a companion object (i.e. make it static).
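To illustrate that suggestion with a minimal sketch (the names Job, Pipeline and doSomething are hypothetical, not from the question): a method called via an object is effectively static and drags no enclosing instance into the closure, whereas an instance method of the surrounding class does.
import org.apache.spark.rdd.RDD

// Helper object: calling Job.doSomething from inside a closure references the
// static module, so no enclosing instance has to be serialized.
object Job {
  def doSomething(s: String): String = s.toUpperCase
}

class Pipeline {
  // If doSomething were an instance method of Pipeline, this closure would
  // capture `this` and Spark would try to serialize the whole Pipeline.
  def transform(rdd: RDD[String]): RDD[String] = rdd.map(line => Job.doSomething(line))
}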
It was the dateFormatter; I placed it inside the partition loop and it works now.
usersRDD.foreachPartition(part => {
  val dateFormatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
  part.foreach(userRow => {
    val id = userRow.id
    val date1 = dateFormatter.parseDateTime(userRow.date1)
  })
})
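Put back into processUsers from the question, the fix looks roughly like this; a sketch under the question's own assumptions (UserTable, HBaseConfiguration and HTable defined elsewhere), with tablename passed in here only to keep the signature self-contained and the now-unneeded dateFormatter parameter dropped:
def processUsers(sc: SparkContext, usersRDD: RDD[UserTable], tablename: String): Unit = {
  usersRDD.foreachPartition(part => {
    // Everything created here is instantiated on the executor, so nothing
    // non-serializable has to be shipped from the driver.
    val dateFormatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
    val conf = HBaseConfiguration.create()
    val table = new HTable(conf, tablename)
    part.foreach(userRow => {
      val id = userRow.id
      val date1 = dateFormatter.parseDateTime(userRow.date1)
      // ... build and buffer the HBase Put for this row
    })
    table.flushCommits()
    table.close()
  })
}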

Scala: How to access a class property dynamically by name?

How can I look up the value of an object's property dynamically by name in Scala 2.10.x?
E.g. Given the class (it can't be a case class):
class Row(val click: Boolean,
          val date: String,
          val time: String)
I want to do something like:
val fields = List("click", "date", "time")
val row = new Row(click=true, date="2015-01-01", time="12:00:00")
fields.foreach(f => println(row.getProperty(f))) // how to do this?
class Row(val click: Boolean,
          val date: String,
          val time: String)

val row = new Row(click = true, date = "2015-01-01", time = "12:00:00")

row.getClass.getDeclaredFields foreach { f =>
  f.setAccessible(true)
  println(f.getName)
  println(f.get(row))
}
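If you only need the specific properties in the fields list, the same reflection approach can look them up by name; a minimal sketch (getDeclaredField throws NoSuchFieldException for unknown names, hence the Try):
import scala.util.Try

val fields = List("click", "date", "time")
fields.foreach { name =>
  val value = Try {
    val f = row.getClass.getDeclaredField(name)
    f.setAccessible(true) // the backing fields of Scala vals are private
    f.get(row)
  }
  println(s"$name = $value")
}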
You could also use the bean functionality from java/scala:
import scala.beans.BeanProperty
import java.beans.Introspector

object BeanEx extends App {
  case class Stuff(@BeanProperty val i: Int, @BeanProperty val j: String)

  val info = Introspector.getBeanInfo(classOf[Stuff])
  val instance = Stuff(10, "Hello")
  info.getPropertyDescriptors.map { p =>
    println(p.getReadMethod.invoke(instance))
  }
}