Has anyone tried to use for-comprehensions with the decline config/command-line library? Using mapN with their Opts class to produce a config case class gets really unreadable and brittle if it has a lot of members. I'd like to use a for-comprehension instead, something like this:
val databaseConfig: Opts[DatabaseConfig] = {
  for {
    username       <- Opts.envWithDefault[String]("POSTGRES_USER", "Postgres username", "postgres")
    password       <- Opts.envWithDefault[String]("POSTGRES_PASSWORD", "Postgres password", "postgres")
    hostname       <- Opts.envWithDefault[String]("POSTGRES_HOSTNAME", "Postgres hostname", "localhost")
    database       <- Opts.envWithDefault[String]("POSTGRES_DATABASE", "Postgres database", "thebean")
    port           <- Opts.envWithDefault[Int]("POSTGRES_PORT", "Postgres port", 5432)
    threadPoolSize <- Opts.envWithDefault[Int]("POSTGRES_THREAD_POOL_SIZE", "Postgres thread pool size", 4)
  } yield DatabaseConfig(username, password, hostname, database, port, threadPoolSize)
}
But that seems to be impossible because Opts doesn't have flatMap defined, and I don't see a good way to implement it (which isn't to say there isn't one). Any suggestions? Did I miss the magical import?
Edit:
The problematic code looks like this (the real problem code has more members, but this gives the idea):
(
  Opts.envWithDefault[String]("POSTGRES_USER", "Postgres username", "postgres"),
  Opts.envWithDefault[String]("POSTGRES_PASSWORD", "Postgres password", "postgres"),
  Opts.envWithDefault[String]("POSTGRES_HOSTNAME", "Postgres hostname", "localhost"),
  Opts.envWithDefault[String]("POSTGRES_DATABASE", "Postgres database", "thebean"),
  Opts.envWithDefault[Int]("POSTGRES_PORT", "Postgres port", 5432),
  Opts.envWithDefault[Int]("POSTGRES_THREAD_POOL_SIZE", "Postgres thread pool size", 4)
).mapN(DatabaseConfig.apply)
If you want to know what environment variable is used to set, say, the port, you have to count -- port is the 5th member of the case class, so you have to find the 5th environment variable created in the tuple. That's not great when there are a lot of these.
The following code, suggested in a comment, does improve things:
val username = Opts.envWithDefault[String]("POSTGRES_USER", "Postgres username", "postgres")
val password = Opts.envWithDefault[String]("POSTGRES_PASSWORD", "Postgres password", "postgres")
val hostname = Opts.envWithDefault[String]("POSTGRES_HOSTNAME", "Postgres hostname", "localhost")
val database = Opts.envWithDefault[String]("POSTGRES_DATABASE", "Postgres database", "thebean")
val port = Opts.envWithDefault[Int]("POSTGRES_PORT", "Postgres port", 5432)
val threadPoolSize = Opts.envWithDefault[Int]("POSTGRES_THREAD_POOL_SIZE", "Postgres thread pool size", 4)
(username, password, hostname, database, port, threadPoolSize).mapN(DatabaseConfig.apply)
But isn't this exactly what for-comprehensions are intended for? It seems like using one would be a bit cleaner, so I'm wondering if I'm missing an import or something, or if the library has genuinely decided to make it impossible to flatMap over Opts.
So I'm wondering [...] if the library has genuinely decided to make it impossible to flatMap over Opts.
Yes, they have deliberately decided to avoid flatMap, because there should be no causal relation between the arguments passed to earlier options and the option specifications that come after them. For example, allowing something like
for
  username <- Opts.envWithDefault[String]("X", "Postgres username", "W")
  password <- Opts.envWithDefault[String]("Y", s"Password of ${username}", "Z")
yield SomeConfig(username, password)
would result in the absurd conclusion that one needs to know the username before one can display help for password, because the password's description depends on the validated username argument. That is how the IO monad can behave in an interactive dialogue, but it is unsuitable for Opts.
It's intentionally made applicative, and not monadic. Therefore, there is no flatMap, and it would be very strange if they attempted to force it into the monadic interface, which is unnecessarily restrictive for this use case.
So, instead of
for
  x <- m1
  y <- m2
  z <- m3
yield Foo(x, y, z)
for monadic ms just use
val x = a1
val y = a2
val z = a3
(x, y, z).mapN(Foo.apply)
for applicative as.
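To see the distinction without pulling in decline or cats, here is a dependency-free sketch (the names `map2` and `Foo` are invented for illustration): a monadic flatMap lets the second computation's shape depend on the first result, while the applicative `map2` fixes both computations up front, which is exactly what lets decline render --help without running anything.

```scala
// A minimal sketch (hypothetical names, no external libraries) of the
// applicative/monadic distinction described above, using Option.
object ApplicativeVsMonadic {
  // map2 is the applicative combinator: both Options exist *before*
  // combination, so their structure could be inspected statically
  // (this is what lets decline build help text without any input).
  def map2[A, B, C](fa: Option[A], fb: Option[B])(f: (A, B) => C): Option[C] =
    (fa, fb) match {
      case (Some(a), Some(b)) => Some(f(a, b))
      case _                  => None
    }

  case class Foo(x: Int, y: Int)

  // Applicative style: both "options" are fixed up front.
  val applicativeResult: Option[Foo] =
    map2(Some(1), Some(2))(Foo.apply)

  // Monadic style: the *second* computation is chosen based on the
  // first result -- its shape cannot be known without running it.
  val monadicResult: Option[Foo] =
    Some(1).flatMap(x => (if (x > 0) Some(2) else None).map(y => Foo(x, y)))
}
```

Both styles produce the same value here, but only the applicative one can be analyzed without executing anything, which is the property decline protects by omitting flatMap.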
I am porting the following 10 lines of Python code to Scala:
import psycopg2
def execute(user, password, database, host, port, *queries):
    connection = psycopg2.connect(user=user, password=password, host=host, port=port, database=database)
    cursor = connection.cursor()
    for sql in queries:
        print(sql)
        cursor.execute(sql)
    connection.commit()
    cursor.close()
    connection.close()
I have the following equivalent Scala code:
def execute(user: String, password: String, database: String, host: String, port: Int, queries: String*): Unit = {
  ???
}
I want to execute (and print) a bunch of SQL statements in a single transaction against the database (assume it is Postgres) and be done.
How do I do that using doobie?
Note:
I cannot change the interface of my execute() (including adding type or implicit params). It must take in String user, password, etc. and a vararg of queries: String*, thus keeping the interface the same as the Python one.
Please also mention all the imports needed.
You can run multiple queries in one transaction in doobie using a for-comprehension, for example:
val query = for {
  _  <- sql"insert into person (name, age) values ($name, $age)".update.run
  id <- sql"select lastval()".query[Long].unique
} yield id
But this solution won't work in your case, because you've got a dynamic list of queries. Fortunately, we can use traverse from cats:
import cats.effect.ContextShift
import doobie._
import doobie.implicits._
import cats.effect._
import scala.concurrent.ExecutionContext
import cats.implicits._
import cats._
import cats.data._
def execute(user: String, password: String, database: String, host: String, port: Int, queries: String*): Unit = {
  // you can use another executor if you want;
  // it would be better to pass the context shift as an implicit argument to the method
  implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)
  // let's create a transactor
  val xa = Transactor.fromDriverManager[IO](
    "org.postgresql.Driver",
    s"jdbc:postgresql://$host:$port/$database", // remember to change the url or make it dynamic if you run it against another database
    user,
    password
  )
  val batch = queries
    .toList // we need to change String* to a List, since String* doesn't have the necessary typeclass instances for Applicative
    .traverse(query => Update0(query, None).run) // we lift each String to an Update0 and run it, turning List[ConnectionIO[Int]] into ConnectionIO[List[Int]]
  // the above can also be done in two steps using map and sequence
  batch // now we've got a single ConnectionIO which will run in one transaction
    .transact(xa) // let's make it an IO[List[Int]]
    .unsafeRunSync() // we need to block since your method returns Unit
}
Your IDE will probably mark this code as invalid, but it compiles; IDEs often struggle with this kind of implicit-heavy Scala.
You might also consider using unsafeRunTimed instead of unsafeRunSync to add the time limit.
Also, remember to add the postgresql JDBC driver and cats to your build.sbt. Doobie uses cats under the hood, but an explicit dependency might still be necessary.
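The traverse step is the heart of that answer: it flips a List[F[A]] into an F[List[A]], yielding the first failure if any element fails. Here is a dependency-free sketch of the same idea using Either in place of ConnectionIO (the names `traverseEither` and `runQuery` are made up for illustration):

```scala
object TraverseSketch {
  // Hand-rolled traverse for Either, standing in for cats' traverse
  // over ConnectionIO: run f over every element, collect the results,
  // and keep only the first Left if anything fails.
  def traverseEither[A, B](as: List[A])(f: A => Either[String, B]): Either[String, List[B]] =
    as.foldRight(Right(Nil): Either[String, List[B]]) { (a, acc) =>
      for { b <- f(a); bs <- acc } yield b :: bs
    }

  // Pretend each "query" returns an update count, like Update0(...).run.
  def runQuery(sql: String): Either[String, Int] =
    if (sql.nonEmpty) Right(sql.length) else Left("empty query")

  val allGood = traverseEither(List("a", "bb", "ccc"))(runQuery) // Right(List(1, 2, 3))
  val oneBad  = traverseEither(List("a", "", "ccc"))(runQuery)   // Left("empty query")
}
```

With ConnectionIO the same shape means all the lifted updates end up in one ConnectionIO[List[Int]], which is why a single .transact(xa) runs them in one transaction.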
Try solving it for just one query in a transaction and seeing what that function signature looks like.
Then look at how to get from there to your final destination.
I've been using doobie (cats) to connect to a postgresql database from a scalatra application. Recently I noticed that the app was creating a new connection pool for every transaction. I eventually worked around it (see below), but this approach is quite different from the one taken in the 'managing connections' section of the book of doobie. I was hoping someone could confirm whether it is sensible, or whether there is a better way of setting up the connection pool.
Here's what I had initially - this works but creates a new connection pool on every connection:
import com.zaxxer.hikari.HikariDataSource
import doobie.hikari.hikaritransactor.HikariTransactor
import doobie.imports._
val pgTransactor = HikariTransactor[IOLite](
  "org.postgresql.Driver",
  s"jdbc:postgresql://${postgresDBHost}:${postgresDBPort}/${postgresDBName}",
  postgresDBUser,
  postgresDBPassword
)
// every query goes via this function
def doTransaction[A](update: ConnectionIO[A]): Option[A] = {
  val io = for {
    xa  <- pgTransactor
    res <- update.transact(xa) ensuring xa.shutdown
  } yield res
  io.unsafePerformIO
}
My initial assumption was that the problem was having ensuring xa.shutdown on every request, but removing it results in connections quickly being used up until there are none left.
This was an attempt to fix the problem - enabled me to remove ensuring xa.shutdown, but still resulted in the connection pool being repeatedly opened and closed:
val pgTransactor: HikariTransactor[IOLite] = HikariTransactor[IOLite](
  "org.postgresql.Driver",
  s"jdbc:postgresql://${postgresDBHost}:${postgresDBPort}/${postgresDBName}",
  postgresDBUser,
  postgresDBPassword
).unsafePerformIO

def doTransaction[A](update: ConnectionIO[A]): Option[A] = {
  val io = update.transact(pgTransactor)
  io.unsafePerformIO
}
Finally, I got the desired behaviour by creating a HikariDataSource object and then passing it into the HikariTransactor constructor:
val dataSource = new HikariDataSource()
dataSource.setJdbcUrl(s"jdbc:postgresql://${postgresDBHost}:${postgresDBPort}/${postgresDBName}")
dataSource.setUsername(postgresDBUser)
dataSource.setPassword(postgresDBPassword)

val pgTransactor: HikariTransactor[IOLite] = HikariTransactor[IOLite](dataSource)

def doTransaction[A](update: ConnectionIO[A], operationDescription: String): Option[A] = {
  val io = update.transact(pgTransactor)
  io.unsafePerformIO
}
You can do something like this:
val xa = HikariTransactor[IOLite](dataSource).unsafePerformIO
and pass it to your repositories.
.transact applies the transaction boundaries, like Slick's .transactionally.
E.g.:
def interactWithDb = {
  val q: ConnectionIO[Int] = sql"""..."""
  q.transact(xa).unsafePerformIO
}
Yes, the response from Radu gets at the problem. The HikariTransactor (really the underlying HikariDataSource) has internal state, so constructing it is a side effect, and you want to do it once when your program starts and pass it around as needed. So your solution works; just note the side effect.
Also, as noted, I don't monitor SO … try the Gitter channel or open an issue if you have questions. :-)
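The "construct once, pass it around" point can be illustrated without Hikari at all. In this sketch (FakePool and the counter are invented for illustration), a def stands in for building a transactor per request, while a lazy val stands in for constructing the pool once at startup:

```scala
object PoolSketch {
  var poolsCreated = 0

  // Stand-in for HikariDataSource: constructing it is the side effect we count.
  class FakePool {
    poolsCreated += 1
    def query(sql: String): Int = sql.length
  }

  // Anti-pattern: a def builds a brand-new pool on every call,
  // which is effectively what creating the transactor per request did.
  def freshPool: FakePool = new FakePool

  // Fix: construct the pool once and reuse it everywhere.
  lazy val sharedPool: FakePool = new FakePool

  val perCallCount: Int = {
    poolsCreated = 0
    (1 to 3).foreach(_ => freshPool.query("select 1"))
    poolsCreated // 3 pools for 3 queries
  }

  val sharedCount: Int = {
    poolsCreated = 0
    (1 to 3).foreach(_ => sharedPool.query("select 1"))
    poolsCreated // 1 pool for 3 queries
  }
}
```

The same reasoning explains why the accepted fix above builds the HikariDataSource once and passes it into the transactor constructor.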
Background
I have Map[String,String] of configuration values. I want to extract a series of keys and provide meaningful error messages if any of them are missing. For example:
val a = Map("url"->"http://example.com", "user"->"bob", "password"->"12345")
Say I want to transform this into a case class:
case class HttpConnectionParams(url:String, user:String, password: String)
Now, I can simply use a for loop to extract the values:
for (url <- a.get("url");
     user <- a.get("user");
     password <- a.get("password")) yield {
  HttpConnectionParams(url, user, password)
}
to get an Option[HttpConnectionParams]. This is nice and clean, except that if I get a None then I don't know what was missing. I'd like to provide that information.
Validation with Scalaz
Enter scalaz. I'm using version 7.1.3.
From what I've been able to put together (a good reference is here) I can use disjunctions:
for (url <- a.get("url") \/> "Url must be supplied";
     user <- a.get("user") \/> "Username must be supplied";
     password <- a.get("password") \/> "Password must be supplied") yield {
  HttpConnectionParams(url, user, password)
}
This is nice because now I get an error message, but it is railway-oriented: it stops at the first failure. What if I want to collect all of the errors? Let's use Validation and the applicative builder (aka "|#|"):
val result = a.get("url").toSuccess("Url must be supplied") |#|
             a.get("user").toSuccess("Username must be supplied") |#|
             a.get("password").toSuccess("Password must be supplied")

result.tupled match {
  case Success((url, user, password)) => HttpConnectionParams(url, user, password)
  case Failure(m) => println("There was a failure" + m)
}
Questions
This does what I expect, but I have some questions about the usage:
Is there an easy-to-use alternative to scalaz for this use-case? I'd prefer not to open Pandora's box and introduce scalaz if I don't have to.
One reason I'd like not to use scalaz is that it's really hard to figure out what to do if you don't, like me, know the entire framework. For example, what is the list of implicits you need to get the above code to work? import scalaz._ somehow didn't work for me.[1] How can I figure this out from the API docs?
Is there a more succinct way to express the validation use-case? I stumbled my way through until I arrived at something that worked, and I have no idea if there are other, better ways of doing the same thing in scalaz.
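One scalaz-free data point for the first question: since Scala 2.12 the standard library's Either is right-biased, so the fail-fast variant can be written with toRight alone, no library needed (the `conf` map below is illustrative, deliberately missing two keys):

```scala
object FailFast {
  // The same fail-fast behaviour as \/> but with only the standard
  // library: Option#toRight plays the role of scalaz's \/>.
  val conf = Map("url" -> "http://example.com") // "user" and "password" missing

  val result: Either[String, (String, String, String)] =
    for {
      url      <- conf.get("url").toRight("Url must be supplied")
      user     <- conf.get("user").toRight("Username must be supplied")
      password <- conf.get("password").toRight("Password must be supplied")
    } yield (url, user, password)
}
```

As with the disjunction version, only the first missing key is reported; accumulating all errors still needs an applicative like Validation (or cats' Validated).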
[1] After much consternation I arrived at this set of imports for the applicative use-case. Hopefully this helps somebody:
import scalaz.std.string._
import scalaz.syntax.std.option._
import scalaz.syntax.apply._
import scalaz.Success
import scalaz.Failure
You can do this a little more nicely by defining a helper method and skipping the .tupled step by using .apply:
import scalaz._, Scalaz._
def lookup[K, V](m: Map[K, V], k: K, message: String): ValidationNel[String, V] =
  m.get(k).toSuccess(NonEmptyList(message))

val validated: ValidationNel[String, HttpConnectionParams] = (
  lookup(a, "url", "Url must be supplied") |#|
  lookup(a, "user", "Username must be supplied") |#|
  lookup(a, "password", "Password must be supplied")
)(HttpConnectionParams.apply)
Also, please don't be ashamed to use import scalaz._, Scalaz._. We all do it and it's just fine in the vast majority of cases. You can always go back and refine your imports later. I also still stand by this answer I wrote years ago—you shouldn't feel like you need to have a comprehensive understanding of Scalaz (or cats) in order to be able to use pieces of it effectively.
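The error-accumulating behaviour of ValidationNel can also be sketched with no library at all, which may help demystify what scalaz is doing. In this hand-rolled stand-in (all names here are invented for illustration), the applicative combinator `map2` inspects both sides even when the first has already failed:

```scala
object ValidatedSketch {
  // A minimal stand-in for scalaz's Validation: Bad accumulates all
  // error messages instead of stopping at the first one.
  sealed trait Validated[+A]
  case class Good[A](a: A) extends Validated[A]
  case class Bad(errors: List[String]) extends Validated[Nothing]

  // The applicative combinator: unlike flatMap, it can look at *both*
  // sides even when the first side has already failed.
  def map2[A, B, C](va: Validated[A], vb: Validated[B])(f: (A, B) => C): Validated[C] =
    (va, vb) match {
      case (Good(a), Good(b)) => Good(f(a, b))
      case (Bad(e1), Bad(e2)) => Bad(e1 ++ e2)
      case (Bad(e), _)        => Bad(e)
      case (_, Bad(e))        => Bad(e)
    }

  def lookup(m: Map[String, String], k: String, msg: String): Validated[String] =
    m.get(k).fold(Bad(List(msg)): Validated[String])(Good(_))

  case class HttpConnectionParams(url: String, user: String, password: String)

  val conf = Map("url" -> "http://example.com") // "user" and "password" missing

  val result: Validated[HttpConnectionParams] = {
    val urlV  = lookup(conf, "url", "Url must be supplied")
    val userV = lookup(conf, "user", "Username must be supplied")
    val passV = lookup(conf, "password", "Password must be supplied")
    map2(map2(urlV, userV)((_, _)), passV)((up, p) => HttpConnectionParams(up._1, up._2, p))
  }
}
```

Both missing keys end up in the error list, which is exactly what the |#| builder does for you, with NonEmptyList in place of the plain List here.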
I have read a blog post about Reader monad.
The post is truly great and explains the topic in detail, but I did not get why I should use the Reader monad in that case.
The post says: Suppose there is a function query: String => Connection => ResultSet
def query(sql: String) = (conn: Connection) => conn.createStatement.executeQuery(sql)
We can run a few queries as follows:
def doSomeQueries(conn: Connection) = {
  val rs1 = query("SELECT COUNT(*) FROM Foo")(conn)
  val rs2 = query("SELECT COUNT(*) FROM Bar")(conn)
  rs1.getInt(1) + rs2.getInt(1)
}
So far so good, but the post suggests use the Reader monad instead:
class Reader[E, A](run: E => A) {
  def map[B](f: A => B): Reader[E, B] =
    new Reader(e => f(run(e)))
  def flatMap[B](f: A => Reader[E, B]): Reader[E, B] =
    new Reader(e => f(run(e)).run(e))
}
def query(sql: String): Reader[Connection, ResultSet] =
  new Reader(conn => conn.createStatement.executeQuery(sql))
def doSomeQueries(conn: Connection) = for {
  rs1 <- query("SELECT COUNT(*) FROM Foo")
  rs2 <- query("SELECT COUNT(*) FROM Bar")
} yield rs1.getInt(1) + rs2.getInt(1)
Ok, I get that I don't need to thread the connection through the calls explicitly. So what?
Why is the solution with the Reader monad better than the previous one?
UPDATE: Fixed the typo in def query: = should be =>
The most important reason is that the reader monad allows you to build up complex computations compositionally. Consider the following line from your non-reader example:
val rs1 = query("SELECT COUNT(*) FROM Foo")(conn)
The fact that we're passing in conn manually means that this line doesn't really make sense on its own—it can only be understood and reasoned about in the context of the doSomeQueries method that gives us conn.
Often this is just fine—there's obviously nothing wrong about defining and using local variables (at least in the val sense). Sometimes, though, it's more convenient (or desirable for other reasons) to build up computations out of stand-alone, composable pieces, and the reader monad can help with this.
Consider query("SELECT COUNT(*) FROM Foo") in your second example. Assuming we know what query is, this is an entirely self-contained expression—there are no variables like conn that need to be bound by some enclosing scope. This means you can reuse and refactor more confidently, and that you don't have quite so much stuff to hold in your head when you're reasoning about it.
Again, this isn't ever necessary—it's largely a matter of style. If you decide to give it a try (and I'd suggest that you do), you'll probably pretty quickly develop preferences and intuitions about where it makes your code more intelligible and where it doesn't.
One other advantage is that you can compose different kinds of "effects" using ReaderT (or by adding Reader into some other stack). That set of issues probably deserves its own question and answer, though.
One last note: you probably want your doSomeQueries to look like this:
def doSomeQueries: Reader[Connection, Int] = for {
  rs1 <- query("SELECT COUNT(*) FROM Foo")
  rs2 <- query("SELECT COUNT(*) FROM Bar")
} yield rs1.getInt(1) + rs2.getInt(1)
Or, if this really is the end of the line:
def doSomeQueries(conn: Connection) = (
  for {
    rs1 <- query("SELECT COUNT(*) FROM Foo")
    rs2 <- query("SELECT COUNT(*) FROM Bar")
  } yield rs1.getInt(1) + rs2.getInt(1)
).run(conn)
In your current version you're not actually using conn.
For the general benefits of using the Reader monad, I recommend Travis Brown's excellent answer: its strength lies in its compositionality and the other extras that monads provide (e.g. ReaderT and friends). You get the most benefit out of it if you write your other code in monadic style too.
You've also asked specifically what's so desirable in not having to pass the connection around explicitly. I'll try to answer this part of your question here.
First, a few words less to type / less to read is already an improvement. The more complex the whole codebase is, the more I appreciate that. When I read a long method (not written by me, of course ;) ) I find it easier when its logic isn't interwoven with dumb argument passing.
Second, the Reader monad gives you a guarantee that the connection is the same object all the way down. Most often you want exactly that. In your first example it's very easy to call
query("SELECT COUNT(*) FROM Bar")(anotherConnectionWhereverItCameFrom)
regardless of whether it's done on purpose or by mistake. When I read a long method and see the Reader monad used, I know that only one connection will be used. No nasty surprises caused by some "tactical solution" in the 219th line of the method.
Note that those benefits can also be achieved without the Reader monad, even though it does a good job in that area. You could for example just write:
class Query(val connection: Connection) {
  def apply(sql: String) = connection.createStatement.executeQuery(sql)
}

def doSomeQueries(query: Query) = {
  val rs1 = query("SELECT COUNT(*) FROM Foo")
  val rs2 = query("SELECT COUNT(*) FROM Bar")
  rs1.getInt(1) + rs2.getInt(1)
}

doSomeQueries(new Query(connection))
It would have neither the composability nor the other nice features of monads, but it would achieve the Reader monad's goal of not passing the argument (the connection) explicitly.
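To make the Reader example from the post concrete and runnable, here is a self-contained demo that keeps the post's Reader class but swaps the real Connection for a stub environment (FakeConnection, countOf, and the table counts are all invented for illustration):

```scala
object ReaderDemo {
  // The Reader from the post, reproduced with `run` made public
  // so the environment can be supplied at the edge.
  class Reader[E, A](val run: E => A) {
    def map[B](f: A => B): Reader[E, B] =
      new Reader(e => f(run(e)))
    def flatMap[B](f: A => Reader[E, B]): Reader[E, B] =
      new Reader(e => f(run(e)).run(e))
  }

  // A stub environment standing in for Connection: table name -> row count.
  type FakeConnection = Map[String, Int]

  def countOf(table: String): Reader[FakeConnection, Int] =
    new Reader(conn => conn(table))

  // The composed program; note that no connection appears anywhere here.
  val doSomeQueries: Reader[FakeConnection, Int] = for {
    rs1 <- countOf("Foo")
    rs2 <- countOf("Bar")
  } yield rs1 + rs2

  // The environment is supplied exactly once, at the edge of the program.
  val total: Int = doSomeQueries.run(Map("Foo" -> 3, "Bar" -> 4))
}
```

Every binding in the for-comprehension sees the same environment, which is the single-connection guarantee described above.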
Looking at an IO Monad example from Functional Programming in Scala, I created an SBT project to test out IO.scala:
def ReadLine: IO[String] = IO { readLine }
def PrintLine(msg: String): IO[Unit] = IO { println(msg) }

def converter: IO[Unit] = for {
  _ <- PrintLine("Enter a temperature in degrees fahrenheit: ")
  d <- ReadLine.map(_.toDouble)
  _ <- PrintLine(fahrenheitToCelsius(d).toString)
} yield ()
But, when I run console from SBT to access the above class with REPL, I tried:
scala> val echo = Util.ReadLine.flatMap(Util.PrintLine)
echo: common.I01.IO[Unit] = common.I01$IO$$anon$2#71c6b580
I'm expecting to be prompted to type in text (via readLine), but as I understand it, what I see is simply an anonymous function/class.
How can I test out the above code?
Calling flatMap on ReadLine just produces an IO[Unit] value that has not been interpreted. At some point, you have to call IO#run (or IO#unsafePerformIO in scalaz) to make the side effects happen.
To preserve referential transparency, the general idea is to build up an IO[A] (where A is typically Unit) and at the "outermost" part of your program, call run on the value -- for example, from the main entry point of the application. That's not always easy/possible though depending on the environment you are running in -- e.g., some form of framework or container.
Because loss of referential transparency is generally considered a pretty serious disadvantage, it is common to defer running of the IO value as long as possible. Hence, it is common to say that IO is evaluated at the end of the universe.
In this case, the end of the universe is the REPL session, so try calling echo.run from the REPL.
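The "nothing happens until run" behaviour is easy to demonstrate with a minimal IO in the spirit of the book's (this sketch is illustrative, not the book's exact definition; the canned input and the log buffer are stand-ins for the console):

```scala
object IODemo {
  // A minimal IO: building values is pure description; `run` is the
  // only place effects actually happen.
  case class IO[A](run: () => A) {
    def map[B](f: A => B): IO[B] = IO(() => f(run()))
    def flatMap[B](f: A => IO[B]): IO[B] = IO(() => f(run()).run())
  }

  // Stand-ins for the console, so the effects are observable in a test.
  val log = scala.collection.mutable.Buffer[String]()
  def printLine(msg: String): IO[Unit] = IO(() => log += msg)
  def readLine: IO[String] = IO(() => "212.0") // canned input for the demo

  def fahrenheitToCelsius(f: Double): Double = (f - 32) * 5.0 / 9.0

  val converter: IO[Unit] = for {
    _ <- printLine("Enter a temperature in degrees fahrenheit: ")
    d <- readLine.map(_.toDouble)
    _ <- printLine(fahrenheitToCelsius(d).toString)
  } yield ()

  // Nothing has happened yet: the log stays empty until run() is called.
  val before = log.size
  val ran    = converter.run()
  val after  = log.toList
}
```

Inspecting `converter` before `run()` shows an uninterpreted value, exactly like the `common.I01$IO$$anon$2@...` printed in the REPL session above; the prompt and result only appear once `run()` is invoked.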