I'm trying to come up with a CSV Parser that can be called like this:
parser parse "/path/to/csv/file" using parserConfiguration
Where the parser will be a class that contains the target case class into which the CSV file will be parsed into:
class CSVParser[A] {
def parse(path: String) = Source.fromFile(fromFilePath).getLines().mkString("\n")
def using(cfg: ParserConfig) = ??? How do I chain this optionally???
}
val parser = CSVParser[SomeCaseClass]
I managed to get up to the point where I can call:
parser parse "/the/path/to/the/csv/file/"
But I do not want to run the parse method yet as I want to apply the configuration using the using like DSL as mentioned above! So there are two rules here. If the caller does not supply a parserConfig, I should be able to run with the default, but if the user supplies a parserConfig, I want to apply the config and then run the parse method. I tried it with a combination of implicits, but could not get them to work properly!
Any suggestions?
EDIT: So the solution looks like this as per comments from "Cyrille Corpet":
class CSVReader[A] {
def parse(path: String) = ReaderWithFile[A](path)
case class ReaderWithFile[A](path: String) {
def using(cfg: CSVParserConfig): Seq[A] = {
val lines = Source.fromFile(path).getLines().mkString("\n")
println(lines)
println(cfg)
null
}
}
object ReaderWithFile {
implicit def parser2parsed[A](parser: ReaderWithFile[A]): Seq[A] = parser.using(defaultParserCfg)
}
}
object CSVReader extends App {
def parser[A] = new CSVReader[A]
val sss: Seq[A] = parser parse "/csv-parser/test.csv" // assign this to a val so that the implicit conversion gets applied!! Very important to note!
}
I guess I need to get the implicit in scope at the location where I call the parser parse, but at the same time I do not want to mess up the structure that I have above!
If you replace using with an operator with a higher precedence than parse you can get it to work without needing extra type annotations. Take for instance <<:
object parsedsl {
class ParserConfig
object ParserConfig {
val default = new ParserConfig
}
case class ParseUnit(path: String, config: ParserConfig)
object ParseUnit {
implicit def path2PU(path: String) = ParseUnit(path, ParserConfig.default)
}
implicit class ConfigSyntax(path: String) {
def <<(config: ParserConfig) = ParseUnit(path, config)
}
class CSVParser {
def parse(pu: ParseUnit) = "parsing"
}
}
import parsedsl._
val parser = new CSVParser
parser parse "path" << ParserConfig.default
parser parse "path"
Your parse method should just give a partial result, without doing anything at all. To deal with default implem, you can use implicit conversion to output type:
class CSVParser[A] {
def parse(path: String) = ParserWithFile[A](path)
}
case class ParserWithFile[A](path: String) {
def using(cfg: ParserConfig): A = ???
}
object ParserWithFile {
implicit def parser2parsed[A](parser: ParserWithFile[A]): A = parser.using(ParserConfig.default)
}
val parser = CSVParser[SomeCaseClass]
Related
I am broadcasting a value in Spark Streaming application . But I am not sure how to access that variable in a different class than the class where it was broadcasted.
My code looks as follows:
object AppMain{
def main(args: Array[String]){
//...
val broadcastA = sc.broadcast(a)
//..
lines.foreachRDD(rdd => {
val obj = AppObject1
rdd.filter(p => obj.apply(p))
rdd.count
}
}
object AppObject1: Boolean{
def apply(str: String){
AnotherObject.process(str)
}
}
object AnotherObject{
// I want to use broadcast variable in this object
val B = broadcastA.Value // compilation error here
def process(): Boolean{
//need to use B inside this method
}
}
Can anyone suggest how to access broadcast variable in this case?
There is nothing particularly Spark specific here ignoring possible serialization issues. If you want to use some object it has to be available in the current scope and you can achieve this the same way as usual:
you can define your helpers in a scope where broadcast is already defined:
{
...
val x = sc.broadcast(1)
object Foo {
def foo = x.value
}
...
}
you can use it as a constructor argument:
case class Foo(x: org.apache.spark.broadcast.Broadcast[Int]) {
def foo = x.value
}
...
Foo(sc.broadcast(1)).foo
method argument
case class Foo() {
def foo(x: org.apache.spark.broadcast.Broadcast[Int]) = x.value
}
...
Foo().foo(sc.broadcast(1))
or even mixed-in your helpers like this:
trait Foo {
val x: org.apache.spark.broadcast.Broadcast[Int]
def foo = x.value
}
object Main extends Foo {
val sc = new SparkContext("local", "test", new SparkConf())
val x = sc.broadcast(1)
def main(args: Array[String]) {
sc.parallelize(Seq(None)).map(_ => foo).first
sc.stop
}
}
Just a short take on performance considerations that were introduced earlier.
Options proposed by zero233 are indeed very elegant way of doing this kind of things in Scala. At the same time it is important to understand implications of using certain patters in distributed system.
It is not the best idea to use mixin approach / any logic that uses enclosing class state. Whenever you use a state of enclosing class within lambdas Spark will have to serialize outer object. This is not always true but you'd better off writing safer code than one day accidentally blow up the whole cluster.
Being aware of this, I would personally go for explicit argument passing to the methods as this would not result in outer class serialization (method argument approach).
you can use classes and pass the broadcast variable to classes
your psudo code should look like :
object AppMain{
def main(args: Array[String]){
//...
val broadcastA = sc.broadcast(a)
//..
lines.foreach(rdd => {
val obj = new AppObject1(broadcastA)
rdd.filter(p => obj.apply(p))
rdd.count
})
}
}
class AppObject1(bc : Broadcast[String]){
val anotherObject = new AnotherObject(bc)
def apply(str: String): Boolean ={
anotherObject.process(str)
}
}
class AnotherObject(bc : Broadcast[String]){
// I want to use broadcast variable in this object
def process(str : String): Boolean = {
val a = bc.value
true
//need to use B inside this method
}
}
I want to write implicit PathBinder for this url /repo/:owner/:name and I my controller should be like this:
case class GitHubRepositoryId(owner: String, name: String)
def get(repoId: GitHubRepositoryId) = {}
Is it possible to write one ? From play docs I cannot find solution for that. Only QueryStringBindable can access multiple variables from URL and construct POJO from those.
Thank in advance
Change your route to GET /repo/*repoId controllers.Controller.get(repoId: GitHubRespositoryId)
Then define the PathBindable so that it manually parses out the / between owner and name. Something like this:
implicit val pathBinder = new PathBindable[GitHubRepositoryId] {
override def bind(key: String, value: String): Either[String, GitHubRepositoryId] = {
val parts = value.split('/')
if (parts.size != 2) {
Left("Not found")
} else {
Right(GitHubRepositoryId(parts(0), parts(1)))
}
}
override def unbind(key: String, repoId: GitHubRepositoryId): String = {
s"${repoId.owner}/${repoId.name}"
}
}
I'm new to the concept of using the Option type but I've tried to use it multiple places in this class to avoid these errors.
The following class is used to store data.
class InTags(val tag35: Option[String], val tag11: Option[String], val tag_109: Option[String], val tag_58: Option[String])
This following code takes a string and converts it into a Int -> String map by seperating on an equals sign.
val message= FIXMessage("8=FIX.4.29=25435=D49=REDACTED56=REDACTED115=REDACTED::::::::::CENTRAL34=296952=20151112-17:11:1111=Order7203109=CENTRAL1=TestAccount63=021=155=CSCO48=CSCO.O22=5207=OQ54=160=20151112-17:11:1338=5000040=244=2815=USD59=047=A13201=CSCO.O13202=510=127
")
val tag58 = message.fields(Some(58)).getOrElse("???")
val in_messages= new InTags(message.fields(Some(35)), message.fields(Some(11)), message.fields(Some(109)), Some(tag58))
println(in_messages.tag_109.getOrElse("???"))
where the FIXMessage object is defined as follows:
class FIXMessage (flds: Map[Option[Int], Option[String]]) {
val fields = flds
def this(fixString: String) = this(FIXMessage.parseFixString(Some(fixString)))
override def toString: String = {
fields.toString
}
}
object FIXMessage{
def apply(flds: Map[Option[Int], Option[String]]) = {
new FIXMessage(flds)
}
def apply(flds: String) = {
new FIXMessage(flds)
}
def parseFixString(fixString: Option[String]): Map[Option[Int], Option[String]] = {
val str = fixString.getOrElse("str=???")
val parts = str.split(1.toChar)
(for {
part <- parts
p = part.split('=')
} yield Some(p(0).toInt) -> Some(p(1))).toMap
}
}
The error I'm getting is ERROR key not found: Some(58) but doesnt the option class handle this? Which basically means that the string passed into the FIXMessage object doesnt contain a substring of the format 58=something(which is true) What is the best way to proceed?
You are using the apply method in Map, which returns the value or throw NoSuchElementException if key is not present.
Instead you could use getOrElse like
message.fields.getOrElse(Some(58), Some("str"))
I'm trying to write a custom SPickler / Unpickler pair to work around some the current limitations of scala-pickling.
The data type I'm trying to pickle is a case class, where some of the fields already have their own SPickler and Unpickler instances.
I'd like to use these instances in my custom pickler, but I don't know how.
Here's an example of what I mean:
// Here's a class for which I want a custom SPickler / Unpickler.
// One of its fields can already be pickled, so I'd like to reuse that logic.
case class MyClass[A: SPickler: Unpickler: FastTypeTag](myString: String, a: A)
// Here's my custom pickler.
class MyClassPickler[A: SPickler: Unpickler: FastTypeTag](
implicit val format: PickleFormat) extends SPickler[MyClass[A]] with Unpickler[MyClass[A]] {
override def pickle(
picklee: MyClass[A],
builder: PBuilder) {
builder.beginEntry(picklee)
// Here we save `myString` in some custom way.
builder.putField(
"mySpecialPickler",
b => b.hintTag(FastTypeTag.ScalaString).beginEntry(
picklee.myString).endEntry())
// Now we need to save `a`, which has an implicit SPickler.
// But how do we use it?
builder.endEntry()
}
override def unpickle(
tag: => FastTypeTag[_],
reader: PReader): MyClass[A] = {
reader.beginEntry()
// First we read the string.
val myString = reader.readField("mySpecialPickler").unpickle[String]
// Now we need to read `a`, which has an implicit Unpickler.
// But how do we use it?
val a: A = ???
reader.endEntry()
MyClass(myString, a)
}
}
I would really appreciate a working example.
Thanks!
Here is a working example:
case class MyClass[A](myString: String, a: A)
Note that the type parameter of MyClass does not need context bounds. Only the custom pickler class needs the corresponding implicits:
class MyClassPickler[A](implicit val format: PickleFormat, aTypeTag: FastTypeTag[A],
aPickler: SPickler[A], aUnpickler: Unpickler[A])
extends SPickler[MyClass[A]] with Unpickler[MyClass[A]] {
private val stringUnpickler = implicitly[Unpickler[String]]
override def pickle(picklee: MyClass[A], builder: PBuilder) = {
builder.beginEntry(picklee)
builder.putField("myString",
b => b.hintTag(FastTypeTag.ScalaString).beginEntry(picklee.myString).endEntry()
)
builder.putField("a",
b => {
b.hintTag(aTypeTag)
aPickler.pickle(picklee.a, b)
}
)
builder.endEntry()
}
override def unpickle(tag: => FastTypeTag[_], reader: PReader): MyClass[A] = {
reader.hintTag(FastTypeTag.ScalaString)
val tag = reader.beginEntry()
val myStringUnpickled = stringUnpickler.unpickle(tag, reader).asInstanceOf[String]
reader.endEntry()
reader.hintTag(aTypeTag)
val aTag = reader.beginEntry()
val aUnpickled = aUnpickler.unpickle(aTag, reader).asInstanceOf[A]
reader.endEntry()
MyClass(myStringUnpickled, aUnpickled)
}
}
In addition to the custom pickler class, we also need an implicit def which returns a pickler instance specialized for concrete type arguments:
implicit def myClassPickler[A: SPickler: Unpickler: FastTypeTag](implicit pf: PickleFormat) =
new MyClassPickler
ZeroC Ice for Java translates every Slice interface Simple into (among other things) a proxy interface SimplePrx and a proxy SimplePrxHelper. If I have an ObjectPrx (the base interface for all proxies), I can check whether it actually has interface Simple by using a static method on SimplePrxHelper:
val obj : Ice.ObjectPrx = ...; // Get a proxy from somewhere...
val simple : SimplePrx = SimplePrxHelper.checkedCast(obj);
if (simple != null)
// Object supports the Simple interface...
else
// Object is not of type Simple...
I wanted to write a method castTo so that I could replace the second line with
val simple = castTo[SimplePrx](obj)
or
val simple = castTo[SimplePrxHelper](obj)
So far as I can see, Scala's type system is not expressive enough to allow me to define castTo. Is this correct?
Should be able to do something with implicits, along these lines:
object Casting {
trait Caster[A] {
def checkedCast(obj: ObjectPrx): Option[A]
}
def castTo[A](obj: ObjectPrx)(implicit caster: Caster[A]) =
caster.checkedCast(obj)
implicit object SimplePrxCaster extends Caster[SimplePrx] {
def checkedCast(obj: ObjectPrx) = Option(SimplePrxHelper.checkedCast(obj))
}
}
Then you just bring things into scope where you want to use them:
package my.package
import Casting._
...
def whatever(prx: ObjectPrx) {
castTo[SimplePrx](prx) foreach (_.somethingSimple())
}
...
You can get something like what you want with structural types:
def castTo[A](helper: { def checkedCast(o: Object): A })(o: Object) = {
helper.checkedCast(o)
}
class FooPrx { }
object FooPrxHelper {
def checkedCast(o: Object): FooPrx = o match {
case fp : FooPrx => fp
case _ => null
}
}
scala> val o: Object = new FooPrx
o: java.lang.Object = FooPrx#da8742
scala> val fp = castTo(FooPrxHelper)(o)
fp: FooPrx = FooPrx#da8742