Spark throws "Task not serializable" when I use a case class, or a class/object that extends Serializable, inside a closure:
object WriteToHbase extends Serializable {
  def main(args: Array[String]): Unit = {
    val csvRows: RDD[Array[String]] = ...
    val dateFormatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
    val usersRDD = csvRows.map(row => {
      new UserTable(row(0), row(1), row(2), row(9), row(10), row(11))
    })
    processUsers(sc, usersRDD, dateFormatter)
  }
  def processUsers(sc: SparkContext, usersRDD: RDD[UserTable], dateFormatter: DateTimeFormatter): Unit = {
    usersRDD.foreachPartition(part => {
      val conf = HBaseConfiguration.create()
      val table = new HTable(conf, tablename)
      part.foreach(userRow => {
        val id = userRow.id
        val date1 = dateFormatter.parseDateTime(userRow.date1)
      })
      table.flushCommits()
      table.close()
    })
  }
}
My first attempt was to use a case class:
case class UserTable(id: String, name: String, address: String, ...) extends Serializable
My second attempt was to use a class instead of a case class:
class UserTable(val id: String, val name: String, val address: String, ...) extends Serializable {
}
My third attempt was to use a companion object in the class:
object UserTable extends Serializable {
def apply(id: String, name: String, address: String, ...) = new UserTable(id, name, address, ...)
}
Most likely the function "doSomething" is defined on your class, which isn't serializable. Instead, move the "doSomething" function to a companion object (i.e. make it effectively static).
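For illustration, a minimal sketch of that refactor; the Pipeline and doSomething names are hypothetical, not taken from the code above:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Not serializable: calling doSomething inside the closure captures `this`,
// so Spark fails with "Task not serializable".
class Pipeline(sc: SparkContext) {
  def doSomething(s: String): String = s.trim
  def run(rdd: RDD[String]): RDD[String] = rdd.map(doSomething)
}

// Moving the helper to an object (effectively static) means the closure only
// references the function, not the enclosing instance.
object PipelineHelpers {
  def doSomething(s: String): String = s.trim
}

class FixedPipeline(sc: SparkContext) {
  def run(rdd: RDD[String]): RDD[String] = rdd.map(PipelineHelpers.doSomething)
}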
It was the dateFormatter; I moved it inside the partition loop and now it works:
usersRDD.foreachPartition(part => {
  val dateFormatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
  part.foreach(userRow => {
    val id = userRow.id
    val date1 = dateFormatter.parseDateTime(userRow.date1)
  })
})
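An alternative that avoids constructing a formatter per partition is to keep it in a helper object, so each executor initializes its own copy and nothing non-serializable is captured by the closure. A sketch, assuming a helper object named Formats:

import org.joda.time.format.{DateTimeFormat, DateTimeFormatter}

object Formats {
  // Initialized independently in each executor JVM, so it is never serialized.
  val dateFormatter: DateTimeFormatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
}

usersRDD.foreachPartition(part => {
  part.foreach(userRow => {
    val date1 = Formats.dateFormatter.parseDateTime(userRow.date1)
  })
})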
Let's say we have the following two case classes:
case class Person(name: String, age: Int, createdAt: LocalDate, lastModified: LocalDate)
case class Address(street: String, zip: Int, createdAt: LocalDate, lastModified: LocalDate)
I want to be able to do as follows:
for {
  p <- query[Person]
         .changedAfter(lift(LocalDate.of(2022, 1, 1)))
  a <- query[Address].join(a => a.ownerId == p.id)
         .changedAfter(lift(LocalDate.of(2022, 1, 1)))
} yield (p, a)
Where .changedAfter() will work for any entity containing createdAt and lastModified fields.
How would I go about creating such an extension?
You can create an extension method like this.
import io.getquill._
import java.time._
val ctx = new SqlMirrorContext(PostgresDialect, SnakeCase)
import ctx._
trait EntityLike {
  val createdAt: LocalDate
  val lastModified: LocalDate
}

case class Person(name: String, age: Int, createdAt: LocalDate, lastModified: LocalDate) extends EntityLike

implicit class EntityOps[A <: EntityLike](q: Query[A]) {
  val changedAfter = quote { (d: LocalDate) =>
    q.filter(e => infix"${e.lastModified} > ${d}".as[Boolean])
  }
}

val d = LocalDate.of(2000, 1, 1)
val m = ctx.run(query[Person].changedAfter(lift(d)))
println(m)
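Because the extension is defined against EntityLike, it works for any conforming entity, for example the Address from the question (ownerId omitted here for brevity):

case class Address(street: String, zip: Int, createdAt: LocalDate, lastModified: LocalDate) extends EntityLike

val addresses = ctx.run(query[Address].changedAfter(lift(d)))
println(addresses)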
class Person {
  val studentName = "Arpana"

  def changeName(id: String, name: String) = {
    val studentName = name
    useName(id)
  }

  def useName(id: String) = {
    println(s"use name is $id, by $studentName")
  }
}

object Person {
  def main(args: Array[String]): Unit = {
    (new Person).changeName("2", "Shubham")
  }
}
I don't want to use var in the code. Can this be done with a keyword? I tried super, protected, private, and final, but none of them worked.
In reality, I want to apply this to the code below.
abstract class BaseRepository[T <: BaseModel : ClassTag : StriveSerializer] {
  self: BaseConnection =>

  val tableName: String = implicitly[ClassTag[T]].runtimeClass.getSimpleName
  private val serializer = implicitly[StriveSerializer[T]]

  private def executeInserts(query: String): Future[Boolean] = Future {
    val preparedStatement = self.connection.prepareStatement(query)
    preparedStatement.execute()
  }

  def exist(id: String, name: String): Future[Boolean] = {
    val tableName = name
    val promise = Promise[Boolean]()
    queryById(id).onComplete {
      case Success(_) => promise.success(true)
      case Failure(ex) => promise.failure(ex)
    }
    promise.future
  }

  def queryById(id: String): Future[T] = {
    val getSql = s"SELECT * FROM $tableName WHERE id = '$id';"
    executeReads(getSql).map(serializer.fromResultSet)
  }
}
When I call the exist function, I want the table name passed to it to override the tableName that queryById uses.
It seems like a bit of a mix of Java and Scala style. I refactored it a little, assuming the intention behind the code. See whether this achieves what you want:
class Person(_id: String, _studentName: String) {
  private val id: String = _id
  private val studentName: String = _studentName

  def useName() = {
    println(s"use name is $id, by $studentName")
  }
}

object Person extends App {
  new Person("2", "Shubham").useName()
}
I think you should use a case class for the model:
case class Student(id: String, name: String)

def changeId(student: Student, newId: String): Student =
  student.copy(id = newId)

val s1 = Student("1", "A")
val newS1 = changeId(s1, "2")
I think it is okay to use mutable state in a class,
e.g.
class MySuperService {
  var lastHeartbeat: Option[Timestamp] = None

  def setLastHeartbeat(ts: Timestamp): Unit = {
    lastHeartbeat = Some(ts)
  }
}

val mss1 = new MySuperService()
mss1.setLastHeartbeat(???)
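Coming back to the BaseRepository from the question: instead of shadowing the tableName val, the same idea is to pass the table name as a parameter, with the field as the default. A minimal sketch of how the two methods could look inside BaseRepository, assuming an implicit ExecutionContext is in scope as in the original code:

// Replaces queryById/exist inside BaseRepository; the rest of the class stays as shown above.
def queryById(id: String, table: String = tableName): Future[T] = {
  val getSql = s"SELECT * FROM $table WHERE id = '$id';"
  executeReads(getSql).map(serializer.fromResultSet)
}

def exist(id: String, name: String): Future[Boolean] =
  queryById(id, table = name).map(_ => true)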
class MyTable(tag: Tag) extends Table[MyEntity](tag, "1970Table") {
  def id = column[Int]("id")
  override def * = (id) <> (MyEntity.tupled, MyEntity.unapply)
}
val myTable = TableQuery[MyTable]
class MyRepository(val config: DatabaseConfig[JdbcProfile])
  extends MyRepository[MyTable, String] {

  override val table: config.profile.api.TableQuery[MyTable] = myTable

  def insert(me: MyEntity): Future[Int] = {
    db.run(table += me)
  }
}
I use this in my other classes like this:
val myRepository = new MyRepository(dbConfig)
myRepository.insert(myrecord)
Question
I would like not to have a hardcoded table name, but rather make it dynamic.
I would like to change the insert method so that it accepts a year (Int) parameter and chooses the right table based on it: if the year passed in is 1970 the table is 1970Table, and if it is 1980 the table is 1980Table.
Try
class MyRepository(val config: DatabaseConfig[JdbcProfile]) {
  import config._
  import profile.api._

  abstract class MyTable(tag: Tag, name: String) extends Table[MyEntity](tag, name) {
    def id = column[Int]("id")
    override def * = (id) <> (MyEntity.tupled, MyEntity.unapply)
  }

  class Table1970(tag: Tag) extends MyTable(tag, "1970Table")
  class Table1980(tag: Tag) extends MyTable(tag, "1980Table")

  val table1970 = TableQuery[Table1970]
  val table1980 = TableQuery[Table1980]

  def insert(me: MyEntity, year: Int): Future[Int] = db.run {
    year match {
      case 1970 => table1970 += me
      case 1980 => table1980 += me
    }
  }
}
Now
val myRepository = new MyRepository(dbConfig)
myRepository.insert(myrecord, "1970")
There are two apply methods on TableQuery. The first one, val myTable = TableQuery[MyTable], uses a macro to create MyTable.
The other one is defined like this:
def apply[E <: AbstractTable[_]](cons: Tag => E): TableQuery[E] =
  new TableQuery[E](cons)
So you can do something like this:
class MyTable(tag: Tag, tableName: String) extends Table[MyEntity](tag, tableName)
...
def myTable(name: String) = TableQuery[MyTable](tag => new MyTable(tag, name))
Now you can predefine all the tables you need and use them, or do something like this:
class MyRepository(val config: DatabaseConfig[JdbcProfile])
  extends MyRepository[MyTable, String] {

  override def table(year: Int): config.profile.api.TableQuery[MyTable] = myTable(s"${year}Table")

  def insert(me: MyEntity, year: Int): Future[Int] = {
    db.run(table(year) += me)
  }
}
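For completeness, the "predefine all the tables you need" option might look like this inside the repository, reusing db and MyEntity from the question (insertFor is just an illustrative name):

// Predefined per-year queries, built with the myTable helper above.
val table1970 = myTable("1970Table")
val table1980 = myTable("1980Table")

def insertFor(year: Int, me: MyEntity): Future[Int] =
  db.run((if (year == 1970) table1970 else table1980) += me)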
I have looked at lots of pages about the Scala reflection library, but none of them gives a straightforward answer on how to instantiate an instance of a class at runtime.
For example, I have the following code:
trait HydraTargetTable {
  val inputTables = Seq.empty[Int]
  val tableType: String
  val tableName: String
  val schema: StructType

  def className: String = this.getClass.toString
}

trait HydraIntermediateTable extends HydraTargetTable {
  val tableType = "Intermediate"

  def build(textRDD: RDD[String]): DataFrame = {
    DataframeUtils.safeParseFromSchema(textRDD, schema)
  }
}

class Table1 extends HydraIntermediateTable {
  override val inputTables: Seq[Int] = Seq(1, 2)
  override val tableName: String = ""
  override val schema: StructType = new StructType()
}
At runtime, I want to be able to instantiate an object of Table1 given the class name as a String value. Here is my reflection code.
object ReflectionTestApp {

  def tableBuilder(name: String): HydraIntermediateTable = {
    Class.forName("hearsay.hydra.dataflow.api." + name).newInstance()
      .asInstanceOf[HydraIntermediateTable]
  }

  def hydraTableBuilder(name: String): HydraTargetTable = {
    val action = Class
      .forName("hearsay.hydra.dataflow.api." + name).newInstance()
    action.asInstanceOf[HydraTargetTable]
  }

  def main(args: Array[String]): Unit = {
    hydraTableBuilder("Table1").inputTables.foreach(println)
  }
}
Here is how you can do this via reflection for both a class and an object.
package reflection
trait Table {
  val id: Int
}

class ActivityTable extends Table {
  val id = 10
}

object ActivityTable2 extends Table {
  val id = 10
}

object Reflection extends App {

  val obj = activityTableBuilder("ActivityTable")
  println(obj.id) // output 10

  val obj2 = objectBuilder("ActivityTable2$")
  println(obj2.id) // output 10

  /*
   class reflection
   */
  def activityTableBuilder(name: String): Table = {
    val action = Class.forName("reflection." + name).newInstance()
    action.asInstanceOf[Table]
  }

  /*
   object reflection
   */
  def objectBuilder(name: String): Table = {
    val action = Class.forName("reflection." + name)
    action.getField("MODULE$").get(null).asInstanceOf[Table]
  }
}
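If you prefer the scala-reflect API to raw Class.forName, a sketch of the same two lookups with a runtime mirror (Scala 2, requires the scala-reflect artifact; instantiateClass and lookupObject are illustrative names, not a library API):

import scala.reflect.runtime.{universe => ru}

object ReflectAlternative extends App {
  private val mirror = ru.runtimeMirror(getClass.getClassLoader)

  // Instantiate a class via its primary (no-arg) constructor.
  def instantiateClass[T](fqcn: String): T = {
    val classSym = mirror.staticClass(fqcn)
    val ctor = classSym.primaryConstructor.asMethod
    mirror.reflectClass(classSym).reflectConstructor(ctor)().asInstanceOf[T]
  }

  // Look up a singleton object; note there is no trailing $ here.
  def lookupObject[T](fqcn: String): T = {
    val moduleSym = mirror.staticModule(fqcn)
    mirror.reflectModule(moduleSym).instance.asInstanceOf[T]
  }

  println(instantiateClass[reflection.Table]("reflection.ActivityTable").id) // 10
  println(lookupObject[reflection.Table]("reflection.ActivityTable2").id)    // 10
}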
How can I look up the value of an object's property dynamically by name in Scala 2.10.x?
E.g. Given the class (it can't be a case class):
class Row(val click: Boolean,
          val date: String,
          val time: String)
I want to do something like:
val fields = List("click", "date", "time")
val row = new Row(click=true, date="2015-01-01", time="12:00:00")
fields.foreach(f => println(row.getProperty(f))) // how to do this?
class Row(val click: Boolean,
          val date: String,
          val time: String)

val row = new Row(click = true, date = "2015-01-01", time = "12:00:00")

row.getClass.getDeclaredFields foreach { f =>
  f.setAccessible(true)
  println(f.getName)
  println(f.get(row))
}
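If you want the lookup keyed by name, as in the fields list from the question, a small helper along these lines should work (getProperty is just an illustrative name):

// Look a single field up by name via Java reflection.
def getProperty(obj: AnyRef, name: String): Any = {
  val f = obj.getClass.getDeclaredField(name)
  f.setAccessible(true)
  f.get(obj)
}

val fields = List("click", "date", "time")
fields.foreach(f => println(getProperty(row, f)))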
You could also use the bean functionality from java/scala:
import scala.beans.BeanProperty
import java.beans.Introspector

object BeanEx extends App {
  case class Stuff(@BeanProperty val i: Int, @BeanProperty val j: String)

  val info = Introspector.getBeanInfo(classOf[Stuff])
  val instance = Stuff(10, "Hello")

  info.getPropertyDescriptors.map { p =>
    println(p.getReadMethod.invoke(instance))
  }
}