Scala: provide class type in parameter

I have a method that takes Class objects as parameters, like below.
val hBaseRDD = spark.sparkContext.newAPIHadoopFile(path,
  classOf[org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat[ImmutableBytesWritable, Result]],
  classOf[ImmutableBytesWritable],
  classOf[Result], conf)
I want to write a method that takes the class types as parameters, so that I can call this line inside it, like below.
case class SequenceInput(conf: Configuration,
                         path: String,
                         storageClass: String,
                         keyClass: String,
                         valueClass: String) {
  override def read(sparkSession: SparkSession): DataFrame = {
    val rdd = sparkSession.sparkContext.newAPIHadoopFile(path,
      classOf[storageClass],
      classOf[keyClass],
      classOf[valueClass], conf)
    rdd
  }
}
but the compiler asks me to create classes named storageClass, keyClass, and valueClass, whereas these are the variables that hold the class names. How can I do this?

You're writing a constructor, not a method, but change
storageClass: String,
keyClass: String,
valueClass: String
to be Class objects, not Strings.
Then your function can
return sparkSession.sparkContext.newAPIHadoopFile(path,
  storageClass,
  keyClass,
  valueClass, conf)
Then
val storageClass = Class.forName(config.get("storage_class"))
...
// remove path from the constructor since you should be able to use multiple paths
val df = SequenceInput(storageClass,...).read(spark, path)
Keep in mind that Class.forName expects the fully qualified name, e.g. "org.apache.hadoop.hbase.io.ImmutableBytesWritable", not just "ImmutableBytesWritable".
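For illustration, here is a minimal sketch of that Class-object version (the type parameters and the RDD return type are assumptions; producing a DataFrame would be a separate conversion step):
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

case class SequenceInput[K, V, F <: NewInputFormat[K, V]](conf: Configuration,
                                                          storageClass: Class[F],
                                                          keyClass: Class[K],
                                                          valueClass: Class[V]) {
  // path is passed to read, as suggested above, so one instance can read several paths
  def read(spark: SparkSession, path: String): RDD[(K, V)] =
    spark.sparkContext.newAPIHadoopFile(path, storageClass, keyClass, valueClass, conf)
}

// e.g.:
// SequenceInput(conf,
//   classOf[SequenceFileInputFormat[ImmutableBytesWritable, Result]],
//   classOf[ImmutableBytesWritable],
//   classOf[Result]).read(spark, path)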

If I understand correctly, you need to convert a String into a Class. You can do this with Class.forName(String):
case class SequenceInput(conf: Configuration,
                         path: String,
                         storageClass: String,
                         keyClass: String,
                         valueClass: String) {
  def read(sparkSession: SparkSession): RDD[(AnyRef, AnyRef)] = {
    // Class.forName returns Class[_], so casts are needed to satisfy the type
    // parameters of newAPIHadoopFile (InputFormat here is
    // org.apache.hadoop.mapreduce.InputFormat); they are only checked at runtime.
    sparkSession.sparkContext.newAPIHadoopFile(
      path,
      Class.forName(storageClass).asInstanceOf[Class[InputFormat[AnyRef, AnyRef]]],
      Class.forName(keyClass).asInstanceOf[Class[AnyRef]],
      Class.forName(valueClass).asInstanceOf[Class[AnyRef]],
      conf)
  }
}
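For completeness, a call site for this String-based version might look like the following (the fully qualified Hadoop/HBase class names are shown purely as an illustration):
val input = SequenceInput(conf, path,
  "org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat",
  "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
  "org.apache.hadoop.hbase.client.Result")
val rdd = input.read(spark)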

Related

Factory with companion object where each type of object takes a common parameter

I have a class like this:
class Cache(
  tableName: String,
  TTL: Int) {
  // Creates a cache
}
I have a companion object that returns different types of caches. It has functions that require a base table name and can construct the cache.
object Cache {
  def getOpsCache(baseTableName: String): Cache = {
    new Cache(s"${baseTableName}_ops", OpsTTL)
  }
  def getSnapshotCache(baseTableName: String): Cache = {
    new Cache(s"${baseTableName}_snaps", SnapshotTTL)
  }
  def getMetadataCache(baseTableName: String): Cache = {
    new Cache(s"${baseTableName}_metadata", MetadataTTL)
  }
}
The object does a few more things, and the Cache class has more parameters, which makes it useful to have a companion object to create the different types of caches. The baseTableName parameter is the same for all of the caches. Is there a way I can pass this parameter only once and then just call the functions to get the different types of caches?
Alternative to this is to create a factory class and pass the baseTableName parameter to constructor and then call the functions. But I am wondering if it could be done in any way with the Companion object.
The simplest way is to put your factory in a case class:
case class CacheFactory(baseTableName: String) {
  lazy val getOpsCache: Cache =
    Cache(s"${baseTableName}_ops", OpsTTL)
  lazy val getSnapshotCache: Cache =
    Cache(s"${baseTableName}_snaps", SnapshotTTL)
  lazy val getMetadataCache: Cache =
    Cache(s"${baseTableName}_metadata", MetadataTTL)
}
As I like case classes I changed your Cache also to a case class:
case class Cache(tableName: String, TTL: Int)
As you can see, I also adjusted your Java-style code to correct Scala.
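For illustration, usage might then look like this (a hypothetical sketch, assuming the TTL constants are in scope):
val factory = CacheFactory("users")
factory.getOpsCache      // Cache("users_ops", OpsTTL)
factory.getSnapshotCache // Cache("users_snaps", SnapshotTTL)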
If you want to put it in the companion object, you could use implicits, like:
object Cache {
  def getOpsCache(implicit baseTableName: String): Cache =
    Cache(s"${baseTableName}_ops", OpsTTL)
  def getSnapshotCache(implicit baseTableName: String): Cache =
    Cache(s"${baseTableName}_snaps", SnapshotTTL)
  def getMetadataCache(implicit baseTableName: String): Cache =
    Cache(s"${baseTableName}_metadata", MetadataTTL)
}
Then your client looks like:
implicit val baseTableName: String = "baseName"
Cache.getSnapshotCache
Cache.getMetadataCache
Consider creating an algebraic data type, like so:
sealed abstract class Cache(tablePostfix: String, ttl: Int) {
  val tableName = s"baseTableName_$tablePostfix"
}
case object OpsCache extends Cache("ops", 60)
case object SnapshotCache extends Cache("snaps", 120)
case object MetadataCache extends Cache("metadata", 180)
OpsCache.tableName // res0: String = baseTableName_ops

assign any val scala pureconfig during configuration read

I know this is going against the very nature of Scala pureconfig ... however ...
Is it even feasible, when reading configuration with Scala pureconfig for the case class below, for the constructor parameter "variable" to have type Any, or at least to accept String, Integer, Double, Array[String], Array[Integer], and Array[Double], instead of a strongly typed value (a String)?
case class Filter(
  field: String,
  operator: String,
  variable: String // should support Int, Double, List[String], List[Int]
)
To my poor understanding, neither the CoproductHint nor the custom reader approach will work ...
By default pureconfig doesn't provide a way to read Any. If for a specific class you would like to read Any then you can define a codec for Any in the context of that class:
case class Filter(field: String, operator: String, variable: Any)
implicit val readFilter = {
  implicit val readAny = new ConfigReader[Any] {
    def from(config: ConfigValue): Either[ConfigReaderFailures, Any] = {
      Right(config.unwrapped())
    }
  }
  ConfigReader[Filter]
}
and then you can read Filter
val config = ConfigFactory.parseString(
"""
{
field: "foo"
operator: "bar"
variable: []
}
""")
println(pureconfig.loadConfig[Filter](config))
// will print Right(Filter(foo,bar,[]))
unwrapped converts a ConfigValue to Any recursively.
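As a quick illustration of what unwrapped produces (Typesafe Config is a Java library, so the results are Java types; this snippet is illustrative and not part of the original answer):
import com.typesafe.config.ConfigFactory

val cfg = ConfigFactory.parseString("""{ a: 1, b: [true, false], c: "x" }""")
cfg.getValue("a").unwrapped() // java.lang.Integer = 1
cfg.getValue("b").unwrapped() // java.util.List[AnyRef] = [true, false]
cfg.getValue("c").unwrapped() // java.lang.String = x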
So the answer is yes, it is possible to tell pureconfig how to read Any.
The reason why pureconfig doesn't provide a codec for Any by default is that Any is the ancestor of all classes in Scala, and it's impossible to create a codec for absolutely anything (e.g. database connections). When you know that you are expecting a restricted set of types, like the ones you listed, you can wrap everything in a coproduct:
sealed abstract class MySupportedType
final case class MyInt(value: Int) extends MySupportedType
final case class MyDouble(value: Double) extends MySupportedType
final case class MyListOfString(value: List[String]) extends MySupportedType
final case class MyListOfInt(value: List[Int]) extends MySupportedType
final case class Filter2(field: String, operator: String, variable: MySupportedType)
and then use the default way to extract the coproduct value or a custom codec for MySupportedType
val config = ConfigFactory.parseString(
"""
{
field: "foo"
operator: "bar"
variable: {
type: mylistofint
value: []
}
}
""")
println(pureconfig.loadConfig[Filter2](config))
// will print Right(Filter2(foo,bar,MyListOfInt(List())))
Using a coproduct instead of Any limits the possible values that variable can have and lets the compiler help you if something is wrong with what you are doing.
You can make that field Any:
Example:
scala> case class Filter(
| field: String,
| operator: String,
| variable: Any // should support Int , Double , List[String], List[Int]
| )
defined class Filter
scala> Filter("anurag","data",List(12,34))
res5: Filter = Filter(anurag,data,List(12, 34))
scala> Filter("anurag","data",List(12.12,34.12))
res6: Filter = Filter(anurag,data,List(12.12, 34.12))
scala> Filter("anurag","data",List("Hello","Hii"))
res8: Filter = Filter(anurag,data,List(Hello, Hii))

Passing different object models as a parameter to a method in scala

I've really struggled with type relationships in Scala and how to use them effectively. I am currently trying to understand how I would use them to edit only certain fields in a Mongo collection. This means passing a particular object containing only those fields to a method, which (after reading about variance) I thought I could do like this:
abstract class DocClass
case class DocPart1(oId: Option[BSONObjectID], name: String, other: String) extends DocClass
case class DocPart2(city: String, country: String) extends DocClass
with the method that calls a more generic method defined as:
def updateMultipleFields(oId: Option[BSONObjectID], dataModel: DocClass): Future[Result] = serviceClientDb.updateFields[T](collectionName, dataModel, oId)
// updateFields updates the collection by passing *dataModel* into the collection, i.e. Json.obj("$set" -> dataModel)
So dataModel can be a DocPart1 or a DocPart2 object. I'm eager not to use a type parameter on updateMultipleFields (as this interesting article may suggest), as that leads me to further issues when passing these to this method from other files in the project. I'm doing this to abide by DRY and to maintain efficient database operations.
I've gone round in circles with this one - can anyone shed any light on this?
Edited after #SerGr's comments
So to be completely clear; I'm using Play/Scala/ReactiveMongo Play JSON (as documented here) and I have a MongoDB collection with lots of fields.
case class Doc(oId: Option[BSONObjectID], name: String, city: String, country: String, continent: String, region: String, latitude: Long, longitude: Long)
To create a new document I have auto-mapped Doc (above) to the collection structure (in Play - like this) and created a form (to insert/update the collection) - all working well!
But when editing a document; I would like to update only some fields (so that all of the fields are not updated). I have therefore created multiple case classes to divide these fields into smaller models (like the examples of DocPart1 & DocPart2) and mapped the form data to just one. This has led me to pass these as a parameter to the updateMultipleFields method as shown above. I hope that this makes more sense.
I'm not sure if I understand correctly what you need. Still, here is some code that might be it. Assume we have our FullDoc class defined as:
case class FullDoc(_id: Option[BSONObjectID], name: String, other: String)
and we have 2 partial updates defined as:
sealed trait BaseDocPart
case class DocPart1(name: String) extends BaseDocPart
case class DocPart2(other: String) extends BaseDocPart
Also assume we have an accessor to our Mongo collection:
def docCollection: Future[JSONCollection] = ...
So if I understand your requirements, what you need is something like this:
def update[T <: BaseDocPart](oId: BSONObjectID, docPart: T)(implicit format: OFormat[T]) = {
  docCollection.flatMap(_.update(BSONDocument("_id" -> oId),
    JsObject(Seq("$set" -> Json.toJson(docPart)))))
}
Essentially the main trick is to use generic T <: BaseDocPart and pass implicit format: OFormat[T] so that we can convert our specific child of BaseDocPart to JSON even after type erasure.
And here is some additional test code (that I used in my console application)
implicit val fullFormat = Json.format[FullDoc]
implicit val part1Format = Json.format[DocPart1]
implicit val part2Format = Json.format[DocPart2]
def insert(id: Int) = {
  val fullDoc = FullDoc(None, s"fullDoc_$id", s"other_$id")
  val insF: Future[WriteResult] = docCollection.flatMap(_.insert(fullDoc))
  val insRes = Await.result(insF, 2 seconds)
  println(s"insRes = $insRes")
}

def loadAndPrintAll() = {
  val readF = docCollection.flatMap(_.find(Json.obj()).cursor[FullDoc](ReadPreference.primaryPreferred).collect(100, Cursor.FailOnError[Vector[FullDoc]]()))
  val readRes = Await.result(readF, 2 seconds)
  println(s"readRes =\n${readRes.mkString("\n")}")
}

def loadRandomDocument(): FullDoc = {
  val readF = docCollection.flatMap(_.find(Json.obj()).cursor[FullDoc](ReadPreference.primaryPreferred).collect(100, Cursor.FailOnError[Vector[FullDoc]]()))
  val readRes = Await.result(readF, 2 seconds)
  readRes(Random.nextInt(readRes.length))
}

def updateWrapper[T <: BaseDocPart](oId: BSONObjectID, docPart: T)(implicit writer: OFormat[T]) = {
  val updateRes = Await.result(update(oId, docPart), 2 seconds)
  println(s"updateRes = $updateRes")
}
// pre-fill with some data
insert(1)
insert(2)
insert(3)
insert(4)
val newId: Int = ((System.currentTimeMillis() - 1511464148000L) / 100).toInt
println(s"newId = $newId")
val doc21: FullDoc = loadRandomDocument()
println(s"doc21 = $doc21")
updateWrapper(doc21._id.get, DocPart1(s"p1_modified_$newId"))
val doc22: FullDoc = loadRandomDocument()
println(s"doc22 = $doc22")
updateWrapper(doc22._id.get, DocPart2(s"p2_modified_$newId"))
loadAndPrintAll()

Is it possible to implement a class constructor type converter in Scala

As an example, I have this class:
import java.sql.Timestamp
class Service(name: String, stime: Timestamp, etime: Timestamp)
How can I make it accept the following in an implicit way? Let's call the converter stringToTimestampConverter.
val s = new Service("service1", "2015-2-15 07:15:43", "2015-2-15 10:15:43")
The times are passed as strings. How can I implement such a converter?
You have two ways. The first is having a String => Timestamp implicit conversion in scope:
// Just have this in scope before you instantiate the object
implicit def toTimestamp(s: String): Timestamp = Timestamp.valueOf(s) // convert to timestamp
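With that conversion in scope (plus import scala.language.implicitConversions), the original call site compiles unchanged; a minimal sketch:
import java.sql.Timestamp
import scala.language.implicitConversions

implicit def toTimestamp(s: String): Timestamp = Timestamp.valueOf(s)

// The strings are implicitly converted to Timestamps at the call site.
val s = new Service("service1", "2015-2-15 07:15:43", "2015-2-15 10:15:43")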
The other one is adding another constructor to the class:
class Service(name: String, stime: Timestamp, etime: Timestamp) {
  def this(name: String, stime: String, etime: String) = {
    this(name, Service.toTimestamp(stime), Service.toTimestamp(etime))
  }
}

object Service {
  def toTimestamp(s: String): Timestamp = Timestamp.valueOf(s) // convert to timestamp
}
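With the auxiliary constructor, the same call then works with no implicits in scope:
val s = new Service("service1", "2015-2-15 07:15:43", "2015-2-15 10:15:43")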

Scala - Add member variable to class from outside

Is it possible to add a member variable to a class from outside the class? (Or mimic this behavior?)
Here's an example of what I'm trying to do. I already use an implicit conversion to add additional functions to RDDs, so I added a variable to ExtendedRDDFunctions. I'm guessing this doesn't work because the variable is lost after the conversion in an rdd.setMember(string) call.
Is there any way to get this kind of functionality? Is this the wrong approach?
implicit def toExtendedRDDFunctions(rdd: RDD[Map[String, String]]): ExtendedRDDFunctions = {
  new ExtendedRDDFunctions(rdd)
}

class ExtendedRDDFunctions(rdd: RDD[Map[String, String]]) extends Logging with Serializable {
  var member: Option[String] = None

  def getMember(): String = {
    if (member.isDefined) {
      return member.get
    } else {
      return ""
    }
  }

  def setMember(field: String): Unit = {
    member = Some(field)
  }

  def queryForResult(query: String): String = {
    // Uses member here
  }
}
EDIT:
I am using these functions as follows: I first call rdd.setMember("state"), then rdd.queryForResult(expression).
Because the implicit conversion is applied each time you invoke a method defined in ExtendedRDDFunctions, there is a new instance of ExtendedRDDFunctions created for every call to setMember and queryForResult. Those instances do not share any member variables.
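To make the failure mode concrete, here is a hypothetical trace of the question's two calls:
rdd.setMember("state")         // wraps rdd in a fresh ExtendedRDDFunctions, sets its member, then discards it
rdd.queryForResult(expression) // wraps rdd in another fresh instance whose member is None again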
You have basically two options:
1. Maintain a Map[RDD, String] in ExtendedRDDFunctions's companion object, which you use in setMember to assign the member value to an RDD. This is the evil option, as you introduce global state and open pitfalls for a whole range of errors.
2. Create a wrapper class that contains your member value and is returned by the setMember method:
case class RDDWithMember(rdd: RDD[Map[String, String]], member: String) extends RDD[Map[String, String]] {
  def queryForResult(query: String): String = {
    // Uses member here
  }
  // methods of the RDD interface, just delegate to rdd
}
implicit class ExtendedRDDFunctions(rdd: RDD[Map[String, String]]) {
  def setMember(field: String): RDDWithMember = {
    RDDWithMember(rdd, field)
  }
}
Besides avoiding the global state, this approach is also more type-safe because you cannot call queryForResult on instances that do not have a member. The only downsides are that you have to delegate all members of RDD and that queryForResult is not defined on RDD itself.
The first issue can probably be addressed with some macro magic (search for "delegate" or "proxy" and "macro").
The latter issue can be resolved by defining an additional extension method in ExtendedRDDFunctions that checks whether the RDD is an RDDWithMember:
implicit class ExtendedRDDFunctions(rdd: RDD[Map[String, String]]) {
  def setMember(field: String): RDDWithMember = // ...

  def queryForResult(query: String): Option[String] = rdd match {
    case wm: RDDWithMember => Some(wm.queryForResult(query))
    case _ => None
  }
}
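Hypothetical usage, mirroring the calls from the question's edit:
val withMember: RDDWithMember = rdd.setMember("state")
val result: String = withMember.queryForResult(expression)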
import ExtendedRDDFunctions._
will import all attributes and functions from the companion object for use in the body of your class.
For your usage, look at the delegate pattern.