Implicits over function closures in Scala - scala

I've been trying to understand implicits for Scala and trying to use them at work - one particular place im stuck at is trying to pass implicits in the following manner
object DBUtils {
case class DB(val jdbcConnection: Connection) {
def execute[A](op: =>Unit): Any = {
implicit val con = jdbcConnection
op
}
}
object DB {
def SQL(query: String)(implicit jdbcConnection: Connection): PreparedStatement = {
jdbcConnection.prepareStatement(query)
}
}
val someDB1 = DB(jdbcConnection)
val someDB2 = DB(jdbcConnection2)
val someSQL = SQL("SOME SQL HERE")
someDB1.execute{someSQL}
someDB2.execute{someSQL}
Currently i get an execption saying that the SQL() function cannot find the implicit jdbcConnection.What gives and what do i do to make it work in the format i need?
Ps-:Im on a slightly older version of Scala(2.10.4) and cannot upgrade
Edit: Changed the problem statement to be more clear - I cannot use a single implicit connection in scope since i can have multiple DBs with different Connections

At the point where SQL is invoked there is no implicit value of type Connection in scope.
In your code snippet the declaration of jdbcConnection is missing, but if you change it from
val jdbcConnection = //...
to
implicit val jdbcConnection = // ...
then you will have an implicit instance of Connection in scope and the compiler should be happy.

Try this:
implicit val con = jdbcConnection // define implicit here
val someDB = DB(jdbcConnection)
val someSQL = SQL("SOME SQL HERE") // need implicit here
someDB.execute{someSQL}
The implicit must be defined in the scope where you need it. (In reality, it's more complicated, because there are rules for looking elsewhere, as you can find in the documentation. But the simplest thing is to make sure the implicit is available in the scope where you need it.)

Make the following changes
1) execute method take a function from Connection to Unit
2) Instead of this val someDB1 = DB(jdbcConnection) use this someDB1.execute{implicit con => someSQL}
object DBUtils {
case class DB(val jdbcConnection: Connection) {
def execute[A](op: Connection =>Unit): Any = {
val con = jdbcConnection
op(con)
}
}
Here is the complete code.
object DB {
def SQL(query: String)(implicit jdbcConnection: Connection): PreparedStatement = {
jdbcConnection.prepareStatement(query)
}
}
val someDB1 = DB(jdbcConnection)
val someDB2 = DB(jdbcConnection2)
val someSQL = SQL("SOME SQL HERE")
someDB1.execute{implicit con => someSQL}
someDB2.execute{implicit con => someSQL}

Related

Scala: Invokation of methods with/without () with overridable implicits

Here is a definition of method, that uses ExecutionContext implicitly, and allows client to override it. Two execution contexts are used to test it:
val defaultEc = ExecutionContext.fromExecutor(
Executors.newFixedThreadPool(5))
Names of threads look like: 'pool-1-thread-1' to 'pool-1-thread-5'
And the 2nd one from Scala:
scala.concurrent.ExecutionContext.Implicits.global
Names of threads look like: 'scala-execution-context-global-11'
Client can override default implicit via:
implicit val newEc = scala.concurrent.ExecutionContext.Implicits.global
Unfortunately it is overridable only, when a method with implicit is invoked without ():
val r = FutureClient.f("testDefault") //prints scala-execution-context-global-11
not working:
val r = FutureClient.f("testDefault")() //still prints: pool-1-thread-1
The question is WHY it works this way? Cause it makes it much more complicated for clients of API
Here is a full code to run it and play:
object FutureClient {
//thread names will be from 'pool-1-thread-1' to 'pool-1-thread-5'
val defaultEc = ExecutionContext.fromExecutor(
Executors.newFixedThreadPool(5))
def f(beans: String)
(implicit executor:ExecutionContext = defaultEc)
: Future[String] = Future {
println("thread: " + Thread.currentThread().getName)
TimeUnit.SECONDS.sleep(Random.nextInt(3))
s"$beans"
}
}
class FutureTest {
//prints thread: pool-1-thread-1
#Test def testFDefault(): Unit ={
val r = FutureClient.f("testDefault")
while (!r.isCompleted) {
TimeUnit.SECONDS.sleep(2)
}
}
//thread: scala-execution-context-global-11
#Test def testFOverridable(): Unit ={
implicit val newEc = scala.concurrent.ExecutionContext.Implicits.global
val r = FutureClient.f("testDefault")
while (!r.isCompleted) {
TimeUnit.SECONDS.sleep(2)
}
}
//prints pool-1-thread-1, but not 'scala-execution-context-global-11'
//cause the client invokes f with () at the end
#Test def testFOverridableWrong(): Unit ={
implicit val newEc = scala.concurrent.ExecutionContext.Implicits.global
val r = FutureClient.f("testDefault")()
while (!r.isCompleted) {
TimeUnit.SECONDS.sleep(2)
}
}
}
I have already discussed a couple of related topics, but they are related to API definition, so it is a new issue, not covered by those topics.
Scala Patterns To Avoid: Implicit Arguments With Default Values
f("testDefault") (or f("testDefault")(implicitly)) means that implicit argument is taken from implicit context.
f("testDefault")(newEc) means that you specify implicit argument explicitly. If you write f("testDefault")() this means that you specify implicit argument explicitly but since the value isn't provided it should be taken from default value.

Access Spark broadcast variable in different classes

I am broadcasting a value in Spark Streaming application . But I am not sure how to access that variable in a different class than the class where it was broadcasted.
My code looks as follows:
object AppMain{
def main(args: Array[String]){
//...
val broadcastA = sc.broadcast(a)
//..
lines.foreachRDD(rdd => {
val obj = AppObject1
rdd.filter(p => obj.apply(p))
rdd.count
}
}
object AppObject1: Boolean{
def apply(str: String){
AnotherObject.process(str)
}
}
object AnotherObject{
// I want to use broadcast variable in this object
val B = broadcastA.Value // compilation error here
def process(): Boolean{
//need to use B inside this method
}
}
Can anyone suggest how to access broadcast variable in this case?
There is nothing particularly Spark specific here ignoring possible serialization issues. If you want to use some object it has to be available in the current scope and you can achieve this the same way as usual:
you can define your helpers in a scope where broadcast is already defined:
{
...
val x = sc.broadcast(1)
object Foo {
def foo = x.value
}
...
}
you can use it as a constructor argument:
case class Foo(x: org.apache.spark.broadcast.Broadcast[Int]) {
def foo = x.value
}
...
Foo(sc.broadcast(1)).foo
method argument
case class Foo() {
def foo(x: org.apache.spark.broadcast.Broadcast[Int]) = x.value
}
...
Foo().foo(sc.broadcast(1))
or even mixed-in your helpers like this:
trait Foo {
val x: org.apache.spark.broadcast.Broadcast[Int]
def foo = x.value
}
object Main extends Foo {
val sc = new SparkContext("local", "test", new SparkConf())
val x = sc.broadcast(1)
def main(args: Array[String]) {
sc.parallelize(Seq(None)).map(_ => foo).first
sc.stop
}
}
Just a short take on performance considerations that were introduced earlier.
Options proposed by zero233 are indeed very elegant way of doing this kind of things in Scala. At the same time it is important to understand implications of using certain patters in distributed system.
It is not the best idea to use mixin approach / any logic that uses enclosing class state. Whenever you use a state of enclosing class within lambdas Spark will have to serialize outer object. This is not always true but you'd better off writing safer code than one day accidentally blow up the whole cluster.
Being aware of this, I would personally go for explicit argument passing to the methods as this would not result in outer class serialization (method argument approach).
you can use classes and pass the broadcast variable to classes
your psudo code should look like :
object AppMain{
def main(args: Array[String]){
//...
val broadcastA = sc.broadcast(a)
//..
lines.foreach(rdd => {
val obj = new AppObject1(broadcastA)
rdd.filter(p => obj.apply(p))
rdd.count
})
}
}
class AppObject1(bc : Broadcast[String]){
val anotherObject = new AnotherObject(bc)
def apply(str: String): Boolean ={
anotherObject.process(str)
}
}
class AnotherObject(bc : Broadcast[String]){
// I want to use broadcast variable in this object
def process(str : String): Boolean = {
val a = bc.value
true
//need to use B inside this method
}
}

Always execute code at beginning and end of method

I'm connecting to MongoDb using following code :
def insert() = {
val mc = new com.mongodb.MongoClient("localhost", 27017);
val db = mc.getDatabase("MyDb");
//My insert code
mc.close();
} //> insert: ()Unit
I have various methods which open and close the connection.
Can the lines :
val mc = new com.mongodb.MongoClient("localhost", 27017);
val db = mc.getDatabase("MyDb");
mc.close();
be extracted so that they are implicitly called at beginning and end of method.
Does Scala implicits cater for this scenario or is reflection required ?
A common pattern would be to use a call-by-name method where you can pass a function that accepts a DB and does something with it. The call-by-name method can facilitate the creation of the client, etc, and execute the code within.
def withDB[A](block: DB => A): A = {
val mc = new com.mongodb.MongoClient("localhost", 27017);
val db = mc.getDatabase("MyDb");
try block(db) finally mc.close()
}
And use it:
def insert() = withDB { db =>
// do something with `db`
}
However, a look at the documentation says:
A MongoDB client with internal connection pooling. For most applications, you should have one MongoClient instance for the entire JVM.
Which makes the above approach look like a bad idea, assuming that is the version you are using. I can definitely see some concurrency issues trying to do this and having too many connections open.
But, you can follow the same pattern, stuffing the connection creating into a singleton object. You would need to manage the closing of the client when your application is shutdown, however.
object Mongo {
lazy val mc = new com.mongodb.MongoClient("localhost", 27017);
lazy val db = mc.getDatabase("MyDb");
def withDB[A](block: DB => A): A = block(db)
def close(): Unit = mc.close()
}
You can define a method that executes some 'work' function e.g.
def withMongoDb[T](work: DB => T): T = {
val mc = new com.mongodb.MongoClient("localhost", 27017)
// I don't actually know what type `db` is so I'm calling it `DB`
val db: DB = mc.getDatabase("MyDb")
try { work(db) }
finally { mc.close() }
}
Then you can use it like:
withMongoDb { db =>
db.insert(...)
db.query(...)
}
This is a similar approach to the one used in Slick, pre-3.0, i.e. withSession and withTransaction.
Now, if you implemented some convenience methods e.g.
def insertStuff(values: Seq[Int])(implicit db: DB) = {
db.insert(values)
}
Then you could call mark the db as implicit in your withMongoDb call, effectively making sure you never accidentally call insertStuff outside of that block.
withMongoDb { implicit db =>
insertStuff(Seq(1,2,3,4))
}
insertStuff(Seq(1,2,3,4)) // compile error
Instead of implicits you could do something like this:
def mongoConn(ip:String, port:Int, dbName:String):(Database => Unit) => Unit = {
f => {
val mc = new com.mongodb.MongoClient(ip, port)
val db = mc.getDatabase(dbName)
f(db)
mc.close()
}
}
val conn = mongoConn("localhost", 27017, "MyDb")
conn(db => {
//insert code
})

Can a scala self-type be satisfied via delegation?

Suppose I have (and this is rather contrived)
trait DbConnection {
val dbName: String
val dbHost: String
}
class Query {
self: DbConnection =>
def doQuery(sql: String) {
// connect using dbName, dbHost
// perform query
}
}
class HasADbConnection(override val dbName: String,
override val dbHost: String) extends DbConnection {
self =>
def doSomething() {
doSomethingElseFirst()
}
def doSomethingElseFirst() = {
val query = new Query() with DbConnection {
override val dbName = self.dbName
override val dbHost = self.dbHost
}
query.doQuery("")
}
}
Is there a way to avoid the redundant "override val dbName = self.dbName, override val dbHost = self.dbHost" in the new Query() creation, and instead indicate that the new Query object should inherit from / delegate to the HasADbConnection instance for these fields?
I realize it may be more appropriate for Query to take a DbConnection as a constructor argument. I'm interested though in other ways of satisfying the Query self-type. Perhaps there is no way of propagating the HasADbconnection fields onto the new Query instance, which is a completely valid answer.
Not sure exactly what you are trying to do, but this seems like a possible match to your intent:
trait C extends B {
def myA = new A() with C { // Note that this could be "with B" rather than "with C" and it would still work.
val name: String = C.this.name // explicit type not actually required - String type can be inferred.
}
}
then, for example:
scala> val myC = new C() { val name = "myC-name" }
myC: C = $anon$1#7f830771
scala> val anA = myC.myA
anA: A with C = C$$anon$1#249c38d5
scala> val aName = anA.name
aName: String = myC-name
Hopefully that can at least help guide your eventual solution. Might be able to give further help if you clarify what you want to do (or why you want to do it) further.
EDIT - after update to question:
I suggest you may be thinking about this the wrong way. I would not like tying the Query class down to knowing how to forge a connection. Rather, either pass a ready-made connection as a parameter to your call that uses the connection, or (as shown below) set up the Query class to supply a function from a connection to a result, then call that function using a connection established elsewhere.
Here is a how I would think about solving this (note that this example doesn't create an actual DB connection per se, I just take your DbConnection type at face value - really you have actually defined a 'DbConnConfig' type):
trait DbConnection {
val dbName: String
val dbHost: String
}
class Query(sql: String) {
def doQuery: DbConnection => String = { conn: DbConnection =>
// No need here to know how to: connect using dbName, dbHost
// perform query, using provided connection:
s"${conn.dbName}-${sql}-${conn.dbHost}" // <- Dummy implementation only here, so you can, for example, try it out in the REPL.
}
}
class HasADbConnection(override val dbName: String,
override val dbHost: String) extends DbConnection {
// You can create your actual connection here...
val conn = makeConnection
def doSomething(query: Query) = {
// ... or here, according to your program's needs.
val conn = makeConnection
query.doQuery(conn)
}
def makeConnection = this // Not a real implementation, just a quick cheat for this example.
}
In reality, doQuery (which could be named better) should have a type of DbConnection => ResultSet or similar. Example usage:
scala> val hasConn = new HasADbConnection("myDb", "myHost")
hasConn: HasADbConnection = HasADbConnection#6c1e5d2f
scala> hasConn.doSomething(new Query("#"))
res2: String = myDb-#-myHost

How could I know if a database table is exists in ScalaQuery

I'm trying ScalaQuery, it is really amazing. I could defined the database table using Scala class, and query it easily.
But I would like to know, in the following code, how could I check if a table is exists, so I won't call 'Table.ddl.create' twice and get a exception when I run this program twice?
object Users extends Table[(Int, String, String)]("Users") {
def id = column[Int]("id")
def first = column[String]("first")
def last = column[String]("last")
def * = id ~ first ~ last
}
object Main
{
val database = Database.forURL("jdbc:sqlite:sample.db", driver = "org.sqlite.JDBC")
def main(args: Array[String]) {
database withSession {
// How could I know table Users is alrady in the DB?
if ( ??? ) {
Users.ddl.create
}
}
}
}
ScalaQuery version 0.9.4 includes a number of helpful SQL metadata wrapper classes in the org.scalaquery.meta package, such as MTable:
http://scalaquery.org/doc/api/scalaquery-0.9.4/#org.scalaquery.meta.MTable
In the test code for ScalaQuery, we can see examples of these classes being used. In particular, see org.scalaquery.test.MetaTest.
I wrote this little function to give me a map of all the known tables, keyed by table name.
import org.scalaquery.meta.{MTable}
def makeTableMap(dbsess: Session) : Map[String, MTable] = {
val tableList = MTable.getTables.list()(dbsess);
val tableMap = tableList.map{t => (t.name.name, t)}.toMap;
tableMap;
}
So now, before I create an SQL table, I can check "if (!tableMap.contains(tableName))".
This thread is a bit old, but maybe someone will find this useful. All my DAOs include this:
def create = db withSession {
if (!MTable.getTables.list.exists(_.name.name == MyTable.tableName))
MyTable.ddl.create
}
Here's a full solution that checks on application start using a PostGreSQL DB for PlayFramework
import globals.DBGlobal
import models.UsersTable
import org.scalaquery.meta.MTable
import org.scalaquery.session.Session
import play.api.GlobalSettings
import play.api.Application
object Global extends GlobalSettings {
override def onStart(app: Application) {
DBGlobal.db.withSession { session : Session =>
import org.scalaquery.session.Database.threadLocalSession
import org.scalaquery.ql.extended.PostgresDriver.Implicit._
if (!makeTableMap(session).contains("tableName")) {
UsersTable.ddl.create(session)
}
}
}
def makeTableMap(dbsess: Session): Map[String, MTable] = {
val tableList = MTable.getTables.list()(dbsess)
val tableMap = tableList.map {
t => (t.name.name, t)
}.toMap
tableMap
}
}
With java.sql.DatabaseMetaData (Interface). Depending on your Database, more or less functions might be implemented.
See also the related discussion here.I personally prefer hezamu's suggestion and extend it as follows to keep it DRY:
def createIfNotExists(tables: TableQuery[_ <: Table[_]]*)(implicit session: Session) {
tables foreach {table => if(MTable.getTables(table.baseTableRow.tableName).list.isEmpty) table.ddl.create}
}
Then you can just create your tables with the implicit session:
db withSession {
implicit session =>
createIfNotExists(table1, table2, ..., tablen)
}
You can define in your DAO impl the following method (taken from Slick MTable.getTables always fails with Unexpected exception[JdbcSQLException: Invalid value 7 for parameter columnIndex [90008-60]]) that gives you a true o false depending if there a defined table in your db:
def checkTable() : Boolean = {
val action = MTable.getTables
val future = db.run(action)
val retVal = future map {result =>
result map {x => x}
}
val x = Await.result(retVal, Duration.Inf)
if (x.length > 0) {
true
} else {
false
}
}
Or, you can check if some "GIVENTABLENAME" or something exists with println method:
def printTable() ={
val q = db.run(MTable.getTables)
println(Await.result(q, Duration.Inf).toList(0)) //prints first MTable element
println(Await.result(q, Duration.Inf).toList(1))//prints second MTable element
println(Await.result(q, Duration.Inf).toList.toString.contains("MTable(MQName(public.GIVENTABLENAME_pkey),INDEX,null,None,None,None)"))
}
Don't forget to add
import slick.jdbc.meta._
Then call the methods from anywhere with the usual #Inject(). Using
play 2.4 and play-slick 1.0.0.
Cheers,