How to get all config parameters from a .scala file?

I want to access all the parameters from BeamConfig.scala in another Scala class. The parameters in BeamConfig.scala are defined like this:
case class WarmStart(
enabled: scala.Boolean,
path: java.lang.String
)
object WarmStart {
def apply(c: com.typesafe.config.Config): BeamConfig.Beam.WarmStart = {
BeamConfig.Beam.WarmStart(
enabled = c.hasPathOrNull("enabled") && c.getBoolean("enabled"),
path = if (c.hasPathOrNull("path")) c.getString("path") else "output"
)
}
}
There are more than 100 parameter objects like the one above in BeamConfig.scala. To read a parameter from this file, I do this:
beam.warmStart.enabled
beam.warmStart.path
Here beam is the root class. Is there any way to read all the parameters in bulk, or to store all the objects in a Map or something similar?
Thanks

There are a couple of different ways you could do this:
Using Typesafe Config directly, in a somewhat unsafe manner:
https://github.com/lightbend/config#api-example
This gives you map-like access, but it fails at runtime if a name is wrong or a type doesn't line up.
Using PureConfig (a wrapper around Typesafe Config that allows automatic derivation of case-class-based config decoders, somewhat like circe for JSON):
https://pureconfig.github.io/docs/
You'd have to write your large case class with 100 fields once, but you get safe decoding of the config into that case class, and after that you have normal named properties with their correct types.
(Note that this is not safe under rename refactoring: renaming a field also changes the config key it is decoded from.)
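If the config is already loaded into a tree of case classes and the goal is only bulk, map-like access, one option is to walk the tree generically. The following is a minimal sketch, not BEAM's actual code: WarmStart and Beam are cut-down stand-ins, sampleSize is a made-up field, and productElementNames needs Scala 2.13 or later. Option and collection values are Products too, so a real version would have to special-case them.

```scala
object FlattenSketch {
  // Hypothetical stand-ins for the generated BeamConfig case classes.
  case class WarmStart(enabled: Boolean, path: String)
  case class Beam(warmStart: WarmStart, sampleSize: Int)

  // Recursively walks a case class and collects its leaf values
  // under dotted keys such as "beam.warmStart.enabled".
  // productElementNames requires Scala 2.13 or later.
  def flatten(prefix: String, value: Any): Map[String, Any] = value match {
    case p: Product if p.productArity > 0 =>
      p.productElementNames.zip(p.productIterator).flatMap {
        case (name, child) => flatten(s"$prefix.$name", child)
      }.toMap
    case leaf => Map(prefix -> leaf)
  }

  val beam: Beam = Beam(WarmStart(enabled = true, path = "output"), sampleSize = 10)

  // One flat map holding every parameter of the tree.
  val all: Map[String, Any] = flatten("beam", beam)
}
```

The values come back as Any, so this trades type safety for convenience, the same trade-off as the raw Typesafe Config access above.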

Firstly, I would separate the code that reads the config from the code that processes the results. In this case the default value "output" is embedded in the code that reads the config when it should probably be done in a separate pass.
Secondly, I would use a package to automatically populate a case class from a config entry. You then need one line per config object, and you get the results checked for you. E.g.
object BeamConfig {
val warmStart = Config[WarmStart]("warmStart")
val config2 = Config[Config2]("config2")
...
}
If you need some processing you can do this
val warmStart = ProcessWarmStart(Config[WarmStart]("warmStart"))
This approach still requires a bit of boilerplate code, but it has better type safety than a bulk import of the config.
I would also consider combining the objects into fewer, nested objects with matching nested case classes.
Here is a cut-down version of Config using json4s and jackson:
import com.typesafe.config._
import org.json4s._
import org.json4s.jackson.JsonMethods._
object Config {
private val cfgFile = "configFile"
private val conf = ConfigFactory.load(cfgFile).withFallback(ConfigFactory.load())
private val jData = parse(conf.root.render(ConfigRenderOptions.concise))
def apply[T](name: String)(implicit formats: Formats = DefaultFormats, mf: Manifest[T]): T =
Extraction.extract(jData \\ name)(formats, mf)
}
This will throw an exception if the particular config object does not exist or does not match the format of class T.


How to iterate over result of Future List in Scala?

I am new to Scala and am trying my hand at Akka. I am trying to access data from MongoDB in Scala and want to convert it into JSON and XML format.
The code below uses the path /getJson and calls the getJson() function to get the data in the form of a future.
get {
concat(
path("getJson"){
val f = Patterns.ask(actor1,getJson(),10.seconds)
val res = Await.result(f,10.seconds)
val result = res.toString
complete(res.toString)
}
)
}
The getJson() method is as follows:
def getJson()= {
val future = collection.find().toFuture()
future
}
I have a Greeting Case class in file Greeting.scala:
case class Greeting(msg:String,name:String)
And a MyJsonProtocol.scala file for marshalling Scala objects to JSON, as follows:
trait MyJsonProtocol extends SprayJsonSupport with DefaultJsonProtocol {
implicit val templateFormat = jsonFormat2(Greeting)
}
I am getting output of complete(res.toString) in Postman as :
Future(Success(List(
Iterable(
(_id,BsonObjectId{value=5fc73944986ced2b9c2527c4}),
(msg,BsonString{value='Hiiiiii'}),
(name,BsonString{value='Ruchirrrr'})
),
Iterable(
(_id,BsonObjectId{value=5fc73c35050ec6430ec4b211}),
(msg,BsonString{value='Holaaa Amigo'}),
(name,BsonString{value='Pablo'})),
Iterable(
(_id,BsonObjectId{value=5fc8c224e529b228916da59d}),
(msg,BsonString{value='Demo'}),
(name,BsonString{value='RuchirD'}))
)))
Can someone please tell me how to iterate over this output and to display it in JSON format?
When working with Scala, it's very important to know your way around types. The first step towards this is at least knowing the types of your variables and values.
If you look at this method,
def getJson() = {
val future = collection.find().toFuture()
future
}
It lacks type information at all levels, which is a really bad practice.
I am assuming that you are using mongo-scala-driver and that your collection is actually a MongoCollection[Document].
This means that the output of collection.find() should be a FindObservable[Document], hence collection.find().toFuture() should be a Future[Seq[Document]]. So your getJson method should be written as:
def getJson(): Future[Seq[Document]] =
collection.find().toFuture()
Now, this means that you are passing a Future[Seq[Document]] to your actor1, which is again a bad practice. You should never send any kind of Future value between actors. It looks like your actor1 does nothing but send the same message back. Why is this actor1 even required when it does nothing?
That makes your f effectively a Future[Future[Seq[Document]]]. You then use Await.result to get the result of f, which is again an anti-pattern, since Await blocks your thread.
Now, your res is a Future[Seq[Document]]. You convert it to a String and send that string back with complete.
Your JsonProtocol is not working because you are not even passing it any Greeting objects.
You have to do the following:
Read the raw Bson objects from mongo.
Convert the raw Bson objects to your Greeting objects.
Complete your result with these Greeting objects. The JsonProtocol should take care of converting these Greeting objects to JSON.
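The steps above can be sketched without MongoDB or Akka, just to show the shape of the transformation. This is a simplified illustration under assumptions: RawDoc and fetchRaw are hypothetical stand-ins for the driver's Document and for collection.find().toFuture(), and the sample rows are copied from the output in the question.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object GreetingSketch {
  case class Greeting(msg: String, name: String)

  // Hypothetical stand-in for a raw MongoDB document: field name -> value.
  type RawDoc = Map[String, String]

  // Stand-in for collection.find().toFuture().
  def fetchRaw(): Future[Seq[RawDoc]] = Future.successful(Seq(
    Map("_id" -> "5fc73944986ced2b9c2527c4", "msg" -> "Hiiiiii", "name" -> "Ruchirrrr"),
    Map("_id" -> "5fc73c35050ec6430ec4b211", "msg" -> "Holaaa Amigo", "name" -> "Pablo")
  ))

  // Steps 1 and 2: transform inside the Future with map.
  // No Await and no toString; the result stays a Future until the route completes it.
  def fetchGreetings(): Future[Seq[Greeting]] =
    fetchRaw().map(docs => docs.map(doc => Greeting(doc("msg"), doc("name"))))
}
```

The point is only that the conversion happens with map on the Future, so nothing blocks and the route receives typed Greeting values instead of a string.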
The easiest way to do all this is by using the mongo driver's CodecRegistries.
case class Greeting(msg:String, name:String)
Now, your MongoDAL object will look like following (it might be missing some imports, fill any missing imports as you did in your own code).
import org.mongodb.scala.bson.codecs.Macros
import org.mongodb.scala.bson.codecs.DEFAULT_CODEC_REGISTRY
import org.bson.codecs.configuration.CodecRegistries
import org.mongodb.scala.{MongoClient, MongoCollection, MongoDatabase}
import scala.concurrent.Future
object MongoDAL {
val greetingCodecProvider = Macros.createCodecProvider[Greeting]()
val codecRegistry = CodecRegistries.fromRegistries(
CodecRegistries.fromProviders(greetingCodecProvider),
DEFAULT_CODEC_REGISTRY
)
val mongoClient: MongoClient = ... // however you are connecting to mongo and creating a mongo client
val mongoDatabase: MongoDatabase =
mongoClient
.getDatabase("database_name")
.withCodecRegistry(codecRegistry)
val greetingCollection: MongoCollection[Greeting] =
mongoDatabase.getCollection[Greeting]("greeting_collection_name")
def fetchAllGreetings(): Future[Seq[Greeting]] =
greetingCollection.find().toFuture()
}
Now, your route can be defined as
get {
concat(
path("getJson") {
val greetingSeqFuture: Future[Seq[Greeting]] = MongoDAL.fetchAllGreetings()
// I don't see any need for that actor thing,
// but if you really need it, you can use flatMap
// to chain future computations (an implicit ExecutionContext must be in scope).
// ask returns an untyped future, hence the mapTo.
val actorResponseFuture: Future[Seq[Greeting]] =
greetingSeqFuture
.flatMap(greetingSeq => Patterns.ask(actor1, greetingSeq, 10.seconds).mapTo[Seq[Greeting]])
// complete can handle futures just fine:
// it waits for the future to complete,
// then converts the Seq of Greetings to JSON using your JsonProtocol
complete(actorResponseFuture)
}
)
}
First of all, don't call toString in complete(res.toString).
As the Akka HTTP JSON support guide says, if you set everything up correctly, your case class will be converted to JSON automatically.
But as I see in the output, your res is not an object of type Greeting. It looks related to Greeting and has the same structure, but it seems to be the raw output of the MongoDB request. If that assumption is correct, you should convert the raw output from MongoDB to your Greeting case class.
I guess this could be done in getJson() after collection.find().

How do I test code that requires an Environment Variable?

I have some code that requires an Environment Variable to run correctly. But when I run my unit tests, it bombs out once it reaches that point unless I specifically export the variable in the terminal. I am using Scala and sbt. My code does something like this:
class something() {
val envVar = sys.env("ENVIRONMENT_VARIABLE")
println(envVar)
}
How can I mock this in my unit tests so that whenever sys.env("ENVIRONMENT_VARIABLE") is called, it returns a string or something like that?
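If you can't wrap the existing code, you can use reflection in your tests to modify the unmodifiable map returned by System.getenv(). Note that this relies on a JDK implementation detail (the private field m), and on recent JDKs the reflective access is refused unless java.util is opened with --add-opens.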
def setEnv(key: String, value: String) = {
val field = System.getenv().getClass.getDeclaredField("m")
field.setAccessible(true)
val map = field.get(System.getenv()).asInstanceOf[java.util.Map[java.lang.String, java.lang.String]]
map.put(key, value)
}
setEnv("ENVIRONMENT_VARIABLE", "TEST_VALUE1")
If you need to test console output, you can redirect it to a separate PrintStream.
You can also implement your own PrintStream.
val baos = new java.io.ByteArrayOutputStream
val ps = new java.io.PrintStream(baos)
Console.withOut(ps)(
// your test code
println(sys.env("ENVIRONMENT_VARIABLE"))
)
// Get output and verify
val output: String = baos.toString(java.nio.charset.StandardCharsets.UTF_8.toString)
println("Test Output: [%s]".format(output))
assert(output.contains("TEST_VALUE1"))
Ideally, environment access should be rewritten to retrieve the data in a safe manner. Either with a default value ...
scala> scala.util.Properties.envOrElse("SESSION", "unknown")
res70: String = Lubuntu
scala> scala.util.Properties.envOrElse("SECTION", "unknown")
res71: String = unknown
... or as an option ...
scala> scala.util.Properties.envOrNone("SESSION")
res72: Option[String] = Some(Lubuntu)
scala> scala.util.Properties.envOrNone("SECTION")
res73: Option[String] = None
... or both [see envOrSome()].
I don't know of any way to make it look like any/all random env vars are set without actually setting them before running your tests.
You shouldn't test that in a unit test.
Just extract it:
class Foo(val param: String) {
...
}
In your production code you do:
new Foo(sys.env("ENVIRONMENT_VARIABLE"))
I would encapsulate the configuration in an abstraction that does not expose the implementation, maybe a class ConfigValue.
I would put the implementation in a class ConfigValueInEnvVar extends ConfigValue.
This allows me to test the code that relies on the ConfigValue without having to set or clear environment variables.
It also allows me to test the base implementation of storing a value in an environment variable as a separate feature.
It also allows me to store the configuration in a database, a file or anything else, without changing my business logic.
I select implementation in the application layer.
I put the environment variable logic in a supporting domain.
I put the business logic and the traits/interfaces in the core domain.
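A minimal sketch of that layering, under assumptions: ConfigValue and ConfigValueInEnvVar are the names used above, while FixedConfigValue and the Something class with its describe method are hypothetical names added for illustration.

```scala
object ConfigSketch {
  // Core domain: the only thing the business logic knows about.
  trait ConfigValue {
    def value: String
  }

  // Supporting domain: the implementation backed by an environment variable.
  class ConfigValueInEnvVar(name: String) extends ConfigValue {
    def value: String =
      sys.env.getOrElse(name, sys.error(s"Environment variable $name is not set"))
  }

  // Test double: a fixed value, no environment needed (hypothetical name).
  class FixedConfigValue(val value: String) extends ConfigValue

  // Business logic depends on the trait, not on sys.env.
  class Something(config: ConfigValue) {
    def describe: String = s"env=${config.value}"
  }
}
```

In a test you would write new Something(new FixedConfigValue("TEST_VALUE1")), while the application layer wires in new ConfigValueInEnvVar("ENVIRONMENT_VARIABLE").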

When exactly a Spark task can be serialized?

I read some related questions about this topic, but still cannot understand the following. I have this simple Spark application which reads some JSON records from a file:
object Main {
// implicit val formats = DefaultFormats // OK: here it works
def main(args: Array[String]) {
val conf = new SparkConf().setMaster("local").setAppName("Spark Test App")
val sc = new SparkContext(conf)
val input = sc.textFile("/home/alex/data/person.json")
implicit val formats = DefaultFormats // Exception: Task not serializable
val persons = input.flatMap { line ⇒
// implicit val formats = DefaultFormats // OK: here it also works
try {
val json = parse(line)
Some(json.extract[Person])
} catch {
case e: Exception ⇒ None
}
}
}
}
I suppose the implicit formats is not serializable, since it includes a ThreadLocal for the date format. But why does it work when placed as a member of the object Main or inside the flatMap closure, and not as an ordinary val inside the main function?
Thanks in advance.
If the formats is inside the flatMap, it's only created as part of executing the mapping function. So the mapper can be serialized and sent to the cluster, since it doesn't contain a formats yet. The flipside is that this will create formats anew every time the mapper runs (i.e. once for every row) - you might prefer to use mapPartitions rather than flatMap so that you can have the value created once for each partition.
If formats is outside the flatMap then it's created once on the master machine, and you're attempting to serialize it and send it to the cluster.
I don't understand why formats as a field of Main would work. Maybe objects are magically pseudo-serializable because they're singletons (i.e. their fields aren't actually serialized, rather the fact that this is a reference to the single static Main instance is serialized)? That's just a guess though.
The best way to answer your question I think is in three short answers:
1) Why does it work when placed as a member of the object Main? The code works because the val is inside an object, not necessarily the Main object. Why? A Scala object is compiled to something like a Java class with static members, so the closure reaches formats through a static reference to the singleton instead of capturing the value. Nothing has to be serialized for it: each executor loads the class from the jar and initializes the field itself. This does not hold if you use a class instead of an object.
2) Why does it work if it's inside flatMap?
When you run transformations on an RDD (filter, flatMap, etc.), your transformation code is serialized on the driver node and sent to the workers, where it is deserialized and executed. As in 1), the code itself is serialized "automatically"; formats is only created when the function runs on the worker, so it never has to be serialized.
3) Why does it not work as an ordinary val inside the main function? Because the closure captures that val, and the val is not serialized "automatically". You can test this with something like: val yourVal = new yourVal with Serializable
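The three cases can be reproduced without Spark by serializing the closures with plain Java serialization, which is the same kind of check that produces "Task not serializable". This is a sketch under assumptions: Formats here is a hypothetical non-serializable class standing in for the json4s formats, and the behaviour described is that of compiled code on Scala 2.12 or later.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.util.Try

object SerializationSketch {
  // Deliberately NOT Serializable; hypothetical stand-in for the formats value.
  class Formats { def parse(s: String): Int = s.length }

  // Member of an object: reached through the singleton, never captured.
  object Holder { val formats = new Formats }

  // True if the function can be written with Java serialization.
  def canSerialize(f: String => Int): Boolean =
    Try(new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f)).isSuccess

  // Case 3: a local val is captured by the closure and must be serialized with it.
  def viaLocalVal: Boolean = {
    val formats = new Formats
    canSerialize(line => formats.parse(line))
  }

  // Case 1: the closure only contains a static reference to Holder.
  def viaObjectMember: Boolean =
    canSerialize(line => Holder.formats.parse(line))

  // Case 2: the value is created when the function runs, so nothing is captured.
  def viaInsideClosure: Boolean =
    canSerialize { line =>
      val formats = new Formats
      formats.parse(line)
    }
}
```

Only viaLocalVal should fail, because it is the only closure that has to carry a Formats instance with it.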

How to write efficient type bounded code if the types are unrelated in Scala

I want to improve the following Cassandra related Scala code. I have two unrelated user defined types which are actually in Java source files (leaving out the details).
public class Blob { .. }
public class Meta { .. }
So here is how I use them currently from Scala:
private val blobMapper: Mapper[Blob] = mappingManager.mapper(classOf[Blob])
private val metaMapper: Mapper[Meta] = mappingManager.mapper(classOf[Meta])
def save(entity: Object) = {
entity match {
case blob: Blob => blobMapper.saveAsync(blob)
case meta: Meta => metaMapper.saveAsync(meta)
case _ => // exception
}
}
While this works, how can I avoid the following problems?
repetition when adding new user defined type classes like Blob or Meta
pattern matching repetition when adding new methods like save
having Object as parameter type
You can definitely use Mapper as a typeclass, doing:
def save[A](entity: A)(implicit mapper: Mapper[A]) = mapper.saveAsync(entity)
Now you have a generic method able to perform a save operation on every type A for which a Mapper[A] is in scope.
Also, the mappingManager.mapper implementation could be probably improved to avoid classOf, but it's hard to tell from the question in the current state.
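To make the typeclass idea concrete, here is a dependency-free sketch. The Mapper trait below is a hypothetical stand-in for the driver's Mapper (saveAsync returns a String instead of a future), and Blob and Meta are reduced to one made-up field each. In real code each implicit instance would wrap mappingManager.mapper(classOf[...]).

```scala
object TypeclassSketch {
  // Hypothetical stand-in for the driver's Mapper; only saveAsync is modelled.
  trait Mapper[A] {
    def saveAsync(entity: A): String
  }

  case class Blob(id: Int)
  case class Meta(key: String)

  // Instances live in the companion, so they are always in implicit scope.
  object Mapper {
    implicit val blobMapper: Mapper[Blob] = new Mapper[Blob] {
      def saveAsync(entity: Blob): String = s"saved blob ${entity.id}"
    }
    implicit val metaMapper: Mapper[Meta] = new Mapper[Meta] {
      def saveAsync(entity: Meta): String = s"saved meta ${entity.key}"
    }
  }

  // One generic method; the compiler picks the mapper from the entity type.
  def save[A](entity: A)(implicit mapper: Mapper[A]): String =
    mapper.saveAsync(entity)
}
```

Adding a new entity type then means adding one implicit instance, there is no pattern match to extend, and calling save with a type that has no Mapper is a compile error rather than a runtime exception.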
A few questions:
Is mappingManager.mapper(cls) expensive?
How much do you care about handling subclasses of Blob or Meta?
Can something like this work for you?
def save[T: Manifest](entity: T) = {
mappingManager.mapper(manifest[T].runtimeClass.asInstanceOf[Class[T]]).saveAsync(entity)
}
If you do care about making sure that subclasses of Meta grab the proper mapper then you may find isAssignableFrom helpful in your .mapper (and store found sub-classes in a HashMap so you only have to look once).
EDIT: Then maybe you want something like this (ignoring threading concerns):
private[this] val mapperMap = mutable.HashMap[Class[_], Mapper[_]]()
def save[T: Manifest](entity: T) = {
val cls = manifest[T].runtimeClass
mapperMap.getOrElseUpdate(cls, mappingManager.mapper(cls))
.asInstanceOf[Mapper[T]]
.saveAsync(entity)
}

Optional boolean parameters in Scala

I've lately been working on a DSL-style library wrapper over Apache POI functionality and faced a challenge which I can't seem to find a good solution for.
One of the goals of the library is to give the user the ability to build a spreadsheet model as a collection of immutable objects, e.g.
val headerStyle = CellStyle(fillPattern = CellFill.Solid, fillForegroundColor = Color.AquaMarine, font = Font(bold = true))
val italicStyle = CellStyle(font = Font(italic = true))
with the following assumptions:
User can optionally specify any parameter (that means, that you can create CellStyle with no parameters as well as with the full list of explicitly specified parameters);
If the parameter hasn't been specified explicitly by the user it is considered undefined and the default environment value (default value for the format we're converting to) will be used;
The second point is important, as I want to convert this data model into multiple formats, and e.g. the default font in Excel doesn't have to be the same as the default font in an HTML browser (and if the user doesn't define the font family explicitly, I'd like them to see the data using those defaults).
To deal with the requirements I've used the variation of the null pattern described here: Pattern for optional-parameters in Scala using null and also suggested here Scala default parameters and null (below a simplified example).
object ModelObject {
def apply(modelParam : String = null) : ModelObject = ModelObject(
modelParam = Option(modelParam)
)
}
case class ModelObject private(modelParam : Option[String])
Since null is used only internally in the companion object and very localized I decided to accept the null-sacrifice for the sake of the simplicity of the solution. The pattern works well with all the reference classes.
However, null cannot be specified for Scala's primitive value types. This is an especially big problem with Boolean, for which I effectively need 3 states (true, false and undefined). Wanting to provide an interface where the user is still able to write bold = true, I decided to reach for the Java wrappers, which accept nulls.
object ModelObject {
def apply(boolParam : java.lang.Boolean = null) : ModelObject = ModelObject(
boolParam = Option(boolParam).map(_.booleanValue)
)
}
case class ModelObject private(boolParam : Option[Boolean])
This however doesn't feel right, and I've been wondering whether there is a better approach to the problem. I've been thinking about defining union types (with an additional object denoting the undefined value): How to define "type disjunction" (union types)? However, since the undefined state shouldn't be used explicitly, the parameter type exposed to the user by the IDE would be very confusing (ideally I'd like it to be Boolean).
Is there any better approach to the problem?
Further information:
More DSL API examples: https://github.com/norbert-radyk/spoiwo/blob/master/examples/com/norbitltd/spoiwo/examples/quickguide/SpoiwoExamples.scala
Sample implementation of the full class: https://github.com/norbert-radyk/spoiwo/blob/master/src/main/scala/com/norbitltd/spoiwo/model/CellStyle.scala
You can use a variation of the pattern I described here: How to provide helper methods to build a Map
To sum it up, you can use some helper generic class to represent optional arguments (much like an Option).
abstract sealed class OptArg[+T] {
def toOption: Option[T]
}
object OptArg{
implicit def autoWrap[T]( value: T ): OptArg[T] = SomeArg(value)
implicit def toOption[T]( arg: OptArg[T] ): Option[T] = arg.toOption
}
case class SomeArg[+T]( value: T ) extends OptArg[T] {
def toOption = Some( value )
}
case object NoArg extends OptArg[Nothing] {
val toOption = None
}
You can simply use it like this:
scala> case class ModelObject(boolParam: OptArg[Boolean] = NoArg)
defined class ModelObject
scala> ModelObject(true)
res12: ModelObject = ModelObject(SomeArg(true))
scala> ModelObject()
res13: ModelObject = ModelObject(NoArg)
However, as you can see, the OptArg now leaks into the ModelObject class itself (boolParam is typed as OptArg[Boolean] instead of Option[Boolean]).
Fixing this (if it is important to you) just requires defining a separate factory, as you have done yourself:
scala> :paste
// Entering paste mode (ctrl-D to finish)
case class ModelObject private(boolParam: Option[Boolean])
object ModelObject {
def apply(boolParam: OptArg[Boolean] = NoArg): ModelObject = new ModelObject(
boolParam = boolParam.toOption
)
}
// Exiting paste mode, now interpreting.
defined class ModelObject
defined module ModelObject
scala> ModelObject(true)
res22: ModelObject = ModelObject(Some(true))
scala> ModelObject()
res23: ModelObject = ModelObject(None)
UPDATE: The advantage of this pattern over simply defining several overloaded apply methods, as shown by @drexin, is that in the latter case the number of overloads grows very fast with the number of arguments (2^N). If ModelObject had 4 parameters, that would mean 16 overloads to write by hand!
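To illustrate the point about the number of overloads, here is a sketch of the same pattern with three optional parameters and a single apply. It is an illustration under assumptions: the Font class and its bold, italic and family fields are modelled loosely on the question's examples, not on the library's actual code.

```scala
import scala.language.implicitConversions

object OptArgSketch {
  sealed abstract class OptArg[+T] { def toOption: Option[T] }
  object OptArg {
    // Lets callers pass a plain value where an OptArg is expected.
    implicit def autoWrap[T](value: T): OptArg[T] = SomeArg(value)
  }
  final case class SomeArg[+T](value: T) extends OptArg[T] {
    def toOption: Option[T] = Some(value)
  }
  case object NoArg extends OptArg[Nothing] {
    def toOption: Option[Nothing] = None
  }

  // Hypothetical model class: stores Options, never OptArgs.
  case class Font private (bold: Option[Boolean], italic: Option[Boolean], family: Option[String])

  object Font {
    // A single factory covers all 2^3 combinations of given and omitted parameters.
    def apply(
        bold: OptArg[Boolean] = NoArg,
        italic: OptArg[Boolean] = NoArg,
        family: OptArg[String] = NoArg
    ): Font = new Font(bold.toOption, italic.toOption, family.toOption)
  }
}
```

With this in place, Font(bold = true), Font(family = "Arial", italic = false) and Font() all go through the same apply, and unspecified parameters end up as None.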