Is this instance created only once in my example? - scala

I have the following code. In another class, I create the S3ClientClass2 object with val s3 = new S3ClientClass2(), and then call the readFromS3 method on s3 for every single request.
In this scenario, I am wondering whether amazonS3Client is created only once or created again for every request. I think it is created only once.
Is this right?
class S3ClientClass2 {
lazy val amazonS3Client = this.getS3Client()
private def getS3Client() = {
AmazonS3ClientBuilder
.standard()
.withRegion(Regions.AP_NORTHEAST_1)
.build()
}
def readFromS3(s3Bucket: String, filepath: String): String = {
var s3object: S3Object = null
try {
s3object = amazonS3Client.getObject(s3Bucket, filepath)
readFromS3(s3object)
}
finally {
if (s3object != null) {
s3object.close()
}
}
}
def readFromS3(obj: S3Object): String = {
val reader = new BufferedReader(new InputStreamReader(obj.getObjectContent))
reader.lines().collect(Collectors.joining())
}
}

Yes, a lazy val is initialised only once, when it is first used. That means the first time you use amazonS3Client the getS3Client method will be called; every subsequent use of amazonS3Client will return the cached value.
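To see this for yourself, here is a minimal sketch with no AWS dependency. Holder, client and builds are made-up names standing in for S3ClientClass2, amazonS3Client and the call to getS3Client(); the counter exists only to make the initialisation visible. Keep in mind that a lazy val is cached per instance, so the client is built once only as long as every request goes through the same s3 object.

```scala
object LazyValDemo {
  // Hypothetical stand-in for S3ClientClass2.
  final class Holder {
    var builds = 0 // counts how often the initialiser body runs

    // Stand-in for `lazy val amazonS3Client = this.getS3Client()`.
    lazy val client: String = {
      builds += 1
      "client"
    }
  }

  // Reads the lazy val three times on one instance and reports the build count.
  def buildsAfterThreeReads(): Int = {
    val holder = new Holder
    val reads = List(holder.client, holder.client, holder.client)
    require(reads.forall(_ == "client"))
    holder.builds
  }

  // A second instance has its own lazy val, so the initialiser runs again.
  def buildsAcrossTwoInstances(): Int = {
    val a = new Holder
    val b = new Holder
    val reads = List(a.client, b.client)
    require(reads.size == 2)
    a.builds + b.builds
  }
}
```

If new S3ClientClass2() were called inside the request handler, you would be in the second situation and get a new client per request.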
Some other hints. You are mixing Java APIs into the readFromS3(obj: S3Object) method for no good reason; it could easily be rewritten in pure Scala:
def readFromS3(obj: S3Object): String = {
scala.io.Source.fromInputStream(obj.getObjectContent).mkString
}
Regarding readFromS3(s3Bucket: String, filepath: String): you should never use null in Scala. If you are working with something that might or might not have a value, see Option; for things that might fail with an error, see scala.util.Either and scala.util.Try. Also, what is the expected behaviour of this function when an exception is thrown in the try block? In the current design it will be rethrown and propagate up your call stack.
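As a rough sketch of what that could look like, assuming Scala 2.13 or later (for scala.util.Using): the resource is closed for you and any exception ends up in a Try instead of a null check plus finally. SafeRead and readAll are hypothetical names, and a plain InputStream stands in for the S3 object's content so the example runs without the AWS SDK.

```scala
import java.io.InputStream
import scala.io.Source
import scala.util.{Try, Using}

object SafeRead {
  // Opens the stream, reads it fully as UTF-8 text and closes it again.
  // A failure while opening or reading is returned as Failure, not thrown.
  def readAll(open: => InputStream): Try[String] =
    Using(open)(in => Source.fromInputStream(in, "UTF-8").mkString)
}
```

The caller then decides what a failure means, for example with readAll(...).toOption or a match on Success and Failure, rather than having the exception escape implicitly.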

Related

How to declare static global values and define them later in Scala?

Primary goal
I want to use some static vals in a class so that I don't have to pass them as function parameters.
My approach
Since I want them to be static, I am declaring them in the companion object. But I cannot assign them values when I declare them, for some reasons. So I am following the below approach.
case class DemoParams(name: String)
class Demo {
def foo = {
println("Demo params name is: ", Demo.demoParams.name) // Works fine
anotherFoo(Demo.demoParams.name) // Throws NPE !
}
def anotherFoo(someName: String) = {
// some code
}
}
object Demo {
var demoParams: DemoParams = _ // Declare here
def apply() = new Demo()
def run = {
demoParams = DemoParams(name = "Salmon") // Define here
val demoObj = Demo()
demoObj.foo
}
def main() = {
run
}
}
Demo.main()
I am able to print Demo.demoParams but surprisingly, this throws a NullPointerException when I pass Demo.demoParams to another function, while running the Spark app on a cluster.
Questions
Firstly, is this the right way of declaring static values and defining them later? I would prefer to not use vars and use immutable vals. Is there a better alternative?
Second, could you think of any reason I would be getting a NPE while passing Demo.demoParams.name to another function?
Your code works fine and doesn't throw anything (after fixing a few compile errors).
But ... Don't do this, it's ... yucky :/
How about passing params to the class as ... well ... params instead?
case class Demo(params: DemoParams) {
def foo() = {
println("Demo params name is: " + params.name)
}
}
object Demo {
def run() = {
val demoParams = DemoParams(name = "Salmon")
val demoObj = Demo(demoParams)
demoObj.foo()
}
}
Not sure this is the best alternative, but consider using a trait, which still keeps you in the FP zone by avoiding the use of var:
case class DemoParams(name: String)
trait Demo {
val demoParams: DemoParams
}
Then just define it where you need it, and it's ready for use:
object MainApp extends App {
val demoObj = new Demo {
override val demoParams: DemoParams = DemoParams(name = "Salmon")
}
println("Demo params name is: ", demoObj.demoParams.name) // (Demo params name is: ,Salmon)
anotherFoo(demoObj.demoParams.name) // Salmon
def anotherFoo(name: String): Unit = println(name)
}
About the second question: without the actual code one can only guess (this sample code does not throw an NPE). Probably you are using it somewhere before it has been defined, because var demoParams: DemoParams = _ just initializes demoParams to the default value of the reference type DemoParams, which is null, and you get an NPE when you try to access the name field of a null reference. This is one reason why using var is discouraged.
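A small sketch of that default-initialisation behaviour, with an Option-based alternative next to it. Unsafe and describe are names made up for this illustration; only DemoParams comes from the question.

```scala
object DefaultInitDemo {
  final case class DemoParams(name: String)

  object Unsafe {
    // `= _` gives a reference type its default value, which is null.
    var demoParams: DemoParams = _
  }

  // With Option the "not set yet" case is visible in the type
  // and has to be handled by the caller.
  def describe(params: Option[DemoParams]): String =
    params.map(_.name).getOrElse("<not set>")
}
```

Reading Unsafe.demoParams.name before anything has been assigned is exactly the kind of access that throws the NullPointerException, whereas describe(None) simply returns the fallback text.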

RichSinkFunction for Cassandra in Flink

I read the advantages of using RichSinkFunction over directly calling the DB methods. Therefore, I decided to write my own RichSinkFunction.
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.{RichSinkFunction, SinkFunction}
import com.datastax.driver.core.{Session, Cluster}
class CassandraAsSink extends RichSinkFunction {
override def open(parameters: Configuration): Unit = {
val cluster = Cluster.builder().addContactPoint("localhost").build()//
val session = cluster.connect("example")
}
override def invoke(value: Nothing, context: SinkFunction.Context): Unit = {
session.execute(
s"""
INSERT INTO users (name, credits, user_id)
VALUES ($name, $credits, $userId)
"""
)
}
override def close(): Unit = {
//something like session.close()
}
}
However, I am not able to develop it fully. I want to use this sink from a separate class, which should supply the 3 values that are inserted in the code above. The record is in JSON format; I can manage that by parsing it and extracting the attributes. But how do I pass them to the invoke method, and how can I share the session object throughout the class? Also, is this a correct way of doing it? I am new to both Flink and Scala.
Will stream/string.new CassandraAsSink().invoke(name,credits,user_id) work when it comes to the calling part?
Modified:
class CassandraSink extends RichSinkFunction[String] {
var cluster: Cluster = _
var session: Session = _
println("inside....")
override def open(parameters: Configuration): Unit = {
cluster = Cluster.builder().addContactPoint("localhost").build() //
session = cluster.connect("example")
println("Connected....")
}
override def invoke(value: String): Unit = {
println("inside invoke: " + value)
session.execute(
s"""
INSERT INTO jsondata1(records_b)
VALUES ($value)
"""
)
}
override def close(): Unit = {
session.close()
println("Session Closed...")
//something like session.close()
}
}
Calling part:
val datastreamFromString:DataStream[String]=env.fromElements(data) // where data is string
datastreamFromString.addSink(new CassandraAsSink())
I figured out that there is some problem with my DataStream created from String. The class is working fine. I have initialized the env variable as the second line in the class.
Flink already has a Cassandra sink; it has valuable features you haven't attempted to support, especially checkpointing.
As for your questions:
You can make session a member variable that can be initialized in open and used in invoke.
Flink will call the invoke method for every stream record coming into the sink. This record is passed to invoke as the value parameter. You'll need to extract the fields like name, etc. from that value.
You'll need to attach the sink to your job graph; overall it will end up being something like this:
val env = StreamExecutionEnvironment.getExecutionEnvironment
env
.addSource(source)
... // some processing
.addSink(new CassandraAsSink())
env.execute()
By the way, there are training lessons with examples and exercises included in the Flink documentation to help you get started.
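To make the lifecycle from the points above concrete without pulling in Flink or the Cassandra driver, here is a dependency-free sketch. LifecycleSink and FakeSession are hypothetical stand-ins for RichSinkFunction and Session, not real API; the only thing it is meant to show is a member variable that is assigned in open, used once per record in invoke, and released in close.

```scala
import scala.collection.mutable.ArrayBuffer

object SinkLifecycleSketch {
  // Hypothetical stand-in for a Cassandra Session; it only records what it is given.
  final class FakeSession {
    val executed = ArrayBuffer.empty[String]
    var closed = false
    def execute(statement: String): Unit = executed += statement
    def close(): Unit = closed = true
  }

  // Hypothetical stand-in for the open/invoke/close contract of a rich sink.
  trait LifecycleSink[A] {
    def open(): Unit
    def invoke(value: A): Unit
    def close(): Unit
  }

  final class RecordingSink extends LifecycleSink[String] {
    // Member variable: assigned in open(), read in invoke(), released in close().
    private var session: FakeSession = _
    def currentSession: FakeSession = session

    override def open(): Unit = session = new FakeSession
    override def invoke(value: String): Unit = session.execute(s"record=$value")
    override def close(): Unit = if (session != null) session.close()
  }

  // Mimics what the runtime does: open once, invoke per record, close once.
  def run(records: Seq[String]): FakeSession = {
    val sink = new RecordingSink
    sink.open()
    try records.foreach(sink.invoke) finally sink.close()
    sink.currentSession
  }
}
```

In a real job you never call open, invoke or close yourself; you only hand the sink to addSink and the framework drives these calls.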

Trying to understand Scala enumerator/iteratees

I am new to Scala and Play!, but have a reasonable amount of experience of building webapps with Django and Python and of programming in general.
I've been doing an exercise of my own to try to improve my understanding - simply pull some records from a database and output them as a JSON array. I'm trying to use the Enumerator/Iteratee functionality to do this.
My code follows:
TestObjectController.scala:
def index = Action {
db.withConnection { conn=>
val stmt = conn.createStatement()
val result = stmt.executeQuery("select * from datatable")
logger.debug(result.toString)
val resultEnum:Enumerator[TestDataObject] = Enumerator.generateM {
logger.debug("called enumerator")
result.next() match {
case true =>
val obj = TestDataObject(result.getString("name"), result.getString("object_type"),
result.getString("quantity").toInt, result.getString("cost").toFloat)
logger.info(obj.toJsonString)
Future(Some(obj))
case false =>
logger.warn("reached end of iteration")
stmt.close()
null
}
}
val consume:Iteratee[TestDataObject,Seq[TestDataObject]] = {
Iteratee.fold[TestDataObject,Seq[TestDataObject]](Seq.empty[TestDataObject]) { (result,chunk) => result :+ chunk }
}
val newIteree = Iteratee.flatten(resultEnum(consume))
val eventuallyResult:Future[Seq[TestDataObject]] = newIteree.run
eventuallyResult.onSuccess { case x=> println(x)}
Ok("")
}
}
TestDataObject.scala:
package models
case class TestDataObject (name: String, objtype: String, quantity: Int, cost: Float){
def toJsonString: String = {
val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)
mapper.writeValueAsString(this)
}
}
I have two main questions:
How do I signal that the input is complete from the Enumerator callback? The documentation says "this method takes a callback function e: => Future[Option[E]] that will be called each time the iteratee this Enumerator is applied to is ready to take some input." but I am unable to pass any kind of EOF that I've found because it's the wrong type. Wrapping it in a Future does not help, and instinctively I am not sure that's the right approach.
How do I get the final result out of the Future to return from the controller view? My understanding is that I would effectively need to pause the main thread to wait for the subthreads to complete, but the only examples I've seen and the only thing I've found in the Future class is the onSuccess callback - but how can I then return that from the view? Does Iteratee.run block until all input has been consumed?
A couple of sub-questions as well, to help my understanding:
Why do I need to wrap my object in Some() when it's already in a Future? What exactly does Some() represent?
When I run the code for the first time, I get a single record logged from logger.info and then it reports "reached end of iteration". Subsequent runs in the same session call nothing. I am closing the statement though, so why do I get no results the second time around? I was expecting it to loop indefinitely as I don't know how to signal the correct termination for the loop.
Thanks a lot in advance for any answers, I thought I was getting the hang of this but obviously not yet!
How do I signal that the input is complete from the Enumerator callback?
You return a Future(None).
How do I get the final result out of the Future to return from the controller view?
You can use Action.async (doc):
def index = Action.async {
db.withConnection { conn=>
...
val eventuallyResult:Future[Seq[TestDataObject]] = newIteree.run
eventuallyResult map { data =>
Ok(...)
}
}
}
Why do I need to wrap my object in Some() when it's already in a Future? What exactly does Some() represent?
The Future represents the (potentially asynchronous) processing to obtain the next element. The Option represents the availability of the next element: Some(x) if another element is available, None if the enumeration is completed.
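The Future[Option[E]] protocol can be illustrated without Play at all. The sketch below uses only the standard library; drain is a made-up helper that keeps calling a callback until it yields None, which is the same contract the callback passed to Enumerator.generateM has to follow. It is not how Play implements it, just the idea in miniature.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PullUntilNone {
  // Calls `next` repeatedly: Some(a) means "here is another element",
  // None means "the input is complete".
  def drain[A](next: () => Future[Option[A]])(implicit ec: ExecutionContext): Future[Vector[A]] = {
    def loop(acc: Vector[A]): Future[Vector[A]] =
      next().flatMap {
        case Some(a) => loop(acc :+ a)
        case None    => Future.successful(acc)
      }
    loop(Vector.empty)
  }

  def demo(): Vector[Int] = {
    import ExecutionContext.Implicits.global
    val rows = Iterator(1, 2, 3) // stands in for the JDBC result set
    val all = drain[Int](() => Future(if (rows.hasNext) Some(rows.next()) else None))
    // Await is used only so this demo can return a plain value; in a controller
    // you would map over the Future and return it through Action.async instead.
    Await.result(all, 5.seconds)
  }
}
```

Returning null in place of Future(None), as in the question, breaks this contract, because the consumer expects a Future it can wait on.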

Pass values between 2 methods of same Object in Scala

I wish to pass the value of var/val from one method to another.
eg, I have
object abc {
def onStart = {
val startTime = new java.sql.Timestamp( new Date())
}
def onEnd = {
//use startTime here
}
}
calling:
onStart()
executeReports(reportName, sqlContexts)
onEnd()
Here onStart() and onEnd() are job monitoring functions for executeReports().
executeReports() runs in a loop for 5 reports.
I have tried using global variables like
object abc{
var startTime : java.sql.Timestamp = _
def onStart = {
startTime = new java.sql.Timestamp( new Date())
}
def onEnd = {
//use startTime here
}
}
but the catch with this is when the loop executes for the next report, the startTime does not change.
I also tried using Singleton Class that did not work for me either.
My requirement is to have a startTime for every iteration i.e, for every report.
Any ideas are welcome here. I'll be happy to provide more clarification on my requirement if needed.
The common Scala solution to this is to write a function that wraps other functions and performs the setup and shutdown internally.
def timeit[T]( fun: => T ): T = {
val start = System.currentTimeMillis //Do your start stuff
val res = fun
println (s"Time ${System.currentTimeMillis - start}") // Do your end stuff
res
}
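Applied to the loop from the question, it could look roughly like this. timed is a variant of timeit that returns the elapsed time instead of printing it, and executeReport is a hypothetical stand-in for executeReports(reportName, sqlContexts).

```scala
object TimedReports {
  // Same idea as timeit, but hands the elapsed nanoseconds back to the caller.
  def timed[T](body: => T): (T, Long) = {
    val start = System.nanoTime() // fresh start time on every call
    val result = body
    (result, System.nanoTime() - start)
  }

  // Hypothetical stand-in for the real report execution.
  def executeReport(reportName: String): String = s"$reportName done"

  // Each iteration gets its own start time because `start` is local to `timed`.
  def runAll(reportNames: Seq[String]): Seq[(String, Long)] =
    reportNames.map(name => timed(executeReport(name)))
}
```

Since the start time never leaves the wrapper, there is no shared variable that could go stale between reports.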
RussS has the better solution, but if for some reason you're wedded to the design you've described, you might try using a mutable val, i.e. a mutable collection.
I got this to compile and pass some small tests.
object abc {
private val q = collection.mutable.Queue[java.sql.Timestamp]()
def onStart = {
q.enqueue(new java.sql.Timestamp(java.util.Calendar.getInstance().getTime.getTime))
}
def onEnd = {
val startTime = q.dequeue
}
}
Based on your requirements, it might be better to do it this way.
case class Job(reports: List[Report]) {
def execute // does the looping over the reports by calling start and then end to generate monitoring data
private def start // iterates over each Report and calls its execute method
private def end // iterates over each Report and uses startTime and executionTime to generate monitoring data
}
abstract class Report {
var startTime: DateTime // time the report was started
def doReport // unimplemented method that does the report generation
def execute // first sets startTime to now, then calls doReport, and lastly calculates executionTime
}
Each subtype of Report should implement doReport, which does the actual reporting.
You can also change the Job.execute method to accept
reports: List[Report]
so that you can have a singleton Job (start and end will surely be the same for every Job you have).
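A runnable sketch of that outline, under a few assumptions of mine: times are plain epoch milliseconds rather than DateTime, Job.execute returns one monitoring line per report, and NoOpReport and monitoringLine are names invented for the example.

```scala
object ReportMonitoring {
  abstract class Report(val name: String) {
    private var startTime: Long = 0L     // set anew on every execute()
    private var executionTime: Long = 0L // duration of the last run in ms

    // Implemented by each concrete report.
    def doReport(): Unit

    // Records the start time, runs the report, then records the duration.
    final def execute(): Unit = {
      startTime = System.currentTimeMillis()
      doReport()
      executionTime = System.currentTimeMillis() - startTime
    }

    def monitoringLine: String = s"$name started=$startTime tookMs=$executionTime"
  }

  final class Job(reports: List[Report]) {
    // Runs every report, then collects the monitoring data.
    def execute(): List[String] = {
      reports.foreach(_.execute())
      reports.map(_.monitoringLine)
    }
  }

  // Trivial report used only to exercise the sketch.
  final class NoOpReport(name: String) extends Report(name) {
    var ran = false
    override def doReport(): Unit = ran = true
  }
}
```

Because startTime lives on each Report and is overwritten at the start of execute, every report in the loop gets its own value.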

WebSocket Action call Ignored in runtime

Hi geeks. I am coding a live notification module in my project. I am trying to call a WebSocket Action method from a function to pass the notification data over the connection to the client.
Here's my code..
def liveNotification(data: String) = WebSocket.using[JsValue] { implicit request =>
val iteratee = Iteratee.ignore[JsValue]
val enumerator = Enumerator[JsValue](Json.toJson(data))
(iteratee,enumerator)
}
def createNotification(notificationTo: BigInteger, notiParams:Tuple5[String,String,BigInteger,BigInteger,BigInteger]) = {
val retData = NotificationModel.createNotification(notificationTo,notiParams)
val strData = write(retData)
liveNotification(strData)
}
The problem is that the liveNotification() call is simply ignored. Please help me with any suggestions as to what I am doing wrong.
Be sure to invoke it with a Json value, at least an empty object. The parser will only match against something that it recognizes as Json.