How to include a file in production mode for Play framework - Scala

An overview of my environments:
Mac OS X Yosemite, Play framework 2.3.7, sbt 0.13.7, IntelliJ IDEA 14, Java 1.8.0_25
I tried to run a simple Spark program in the Play framework, so I just created a Play 2 project in IntelliJ and changed some files as follows:
app/controllers/Application.scala:
package controllers

import play.api._
import play.api.libs.iteratee.Enumerator
import play.api.mvc._

object Application extends Controller {

  def index = Action {
    Ok(views.html.index("Your new application is ready."))
  }

  def trySpark = Action {
    Ok.chunked(Enumerator(utils.TrySpark.runSpark))
  }

}
app/utils/TrySpark.scala:
package utils

import org.apache.spark.{SparkConf, SparkContext}

object TrySpark {
  def runSpark: String = {
    val conf = new SparkConf().setAppName("trySpark").setMaster("local[4]")
    val sc = new SparkContext(conf)
    val data = sc.textFile("public/data/array.txt")
    val array = data.map(line => line.split(' ').map(_.toDouble))
    val sum = array.first().reduce((a, b) => a + b)
    sum.toString
  }
}
public/data/array.txt:
1 2 3 4 5 6 7
conf/routes:
GET / controllers.Application.index
GET /spark controllers.Application.trySpark
GET /assets/*file controllers.Assets.at(path="/public", file)
build.sbt:
name := "trySpark"
version := "1.0"
lazy val `tryspark` = (project in file(".")).enablePlugins(PlayScala)
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(jdbc, anorm, cache, ws,
  "org.apache.spark" % "spark-core_2.10" % "1.2.0")
unmanagedResourceDirectories in Test <+= baseDirectory ( _ /"target/web/public/test" )
I type activator run to run this app in development mode, then open localhost:9000/spark in the browser, and it shows the result 28 as expected. However, when I type activator start to run this app in production mode, it shows the following error message:
[info] play - Application started (Prod)
[info] play - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
[error] application -
! #6kik15fee - Internal server error, for (GET) [/spark] ->
play.api.Application$$anon$1: Execution exception[[InvalidInputException: Input path does not exist: file:/Path/to/my/project/target/universal/stage/public/data/array.txt]]
at play.api.Application$class.handleError(Application.scala:296) ~[com.typesafe.play.play_2.10-2.3.7.jar:2.3.7]
at play.api.DefaultApplication.handleError(Application.scala:402) [com.typesafe.play.play_2.10-2.3.7.jar:2.3.7]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$14$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:205) [com.typesafe.play.play_2.10-2.3.7.jar:2.3.7]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$14$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:202) [com.typesafe.play.play_2.10-2.3.7.jar:2.3.7]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) [org.scala-lang.scala-library-2.10.4.jar:na]
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Path/to/my/project/target/universal/stage/public/data/array.txt
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.2.0.jar:na]
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.2.0.jar:na]
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) ~[org.apache.spark.spark-core_2.10-1.2.0.jar:1.2.0]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) ~[org.apache.spark.spark-core_2.10-1.2.0.jar:1.2.0]
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) ~[org.apache.spark.spark-core_2.10-1.2.0.jar:1.2.0]
It seems that my array.txt file is not loaded in production mode. How can I solve this problem?

The problem here is that the public directory will not be available in your root project dir when you run in production. It is packaged as a jar (usually in STAGE_DIR/lib/PROJ_NAME-VERSION-assets.jar), so you will not be able to access the files this way.
I can see two solutions here:
1) Place the file in the conf directory. This will work, but seems very dirty, especially if you intend to use more data files;
2) Place those files in some directory and tell sbt to package it as well. You can keep using the public directory, although it seems better to use a different dir, especially if you want to have many more files.
Supposing array.txt is placed in a dir named datafiles in your project root, you can add this to build.sbt:
mappings in Universal ++=
  (baseDirectory.value / "datafiles" * "*" get) map
    (x => x -> ("datafiles/" + x.getName))
Don't forget to change the paths in your app code:
// (...)
val data = sc.textFile("datafiles/array.txt")
Then just do a clean, and when you run start, stage or dist, those files will be available.
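A minimal sketch of the adjusted loader with a fail-fast check (the require line is an extra safeguard, not required for the fix; the relative path assumes the app is started from the stage directory, where the Universal mapping above places the file):

import java.io.File

// With the Universal mapping above, the file sits at STAGE_DIR/datafiles/array.txt,
// so this relative path assumes the process was started from the stage directory.
val dataFile = new File("datafiles/array.txt")
require(dataFile.exists, s"Data file not found at ${dataFile.getAbsolutePath}")
val data = sc.textFile(dataFile.getPath)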

Related

sbt: finding correct path to files/folders under resources directory

I've a simple project structure:
WordCount
|
|------------ project
|----------------|---assembly.sbt
|
|------------ resources
|------------------|------ Message.txt
|
|------------ src
|--------------|---main
|--------------------|---scala
|--------------------------|---org
|-------------------------------|---apache
|----------------------------------------|---spark
|----------------------------------------------|---Counter.scala
|
|------------ build.sbt
here's how Counter.scala looks:
package org.apache.spark

object Counter {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf())
    val path: String = getClass.getClassLoader.getResource("Message.txt").getPath
    println(s"path = $path")
    // val lines = sc.textFile(path)
    // val wordsCount = lines
    //   .flatMap(line => line.split("\\s", 2))
    //   .map(word => (word, 1))
    //   .reduceByKey(_ + _)
    //
    // wordsCount.foreach(println)
  }
}
Notice that the commented lines are actually correct; it's the path variable that is not. After building the fat jar with sbt assembly and running it with spark-submit to see the value of path, I get:
path = file:/home/me/WordCount/target/scala-2.11/Counter-assembly-0.1.jar!/Message.txt
You can see that path is assigned the jar location and, mysteriously, followed by !/ and then the file name Message.txt!
On the other hand, when I'm inside the WordCount folder and run the REPL with sbt console and then write
scala> getClass.getClassLoader.getResource("Message.txt").getPath
I get the correct path (without the file:/ prefix)
res1: String = /home/me/WordCount/target/scala-2.11/classes/Message.txt
Question:
1 - Why are there two different outputs from the same command (i.e. getClass.getClassLoader.getResource("...").getPath)?
2 - How can I use the correct path, the one that appears in the console, inside my source file Counter.scala?
for anyone who wants to try it, here's my build.sbt:
name := "Counter"
version := "0.1"
scalaVersion := "2.11.8"
resourceDirectory in Compile := baseDirectory.value / "resources"
// allows us to include spark packages
resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
resolvers += "Typesafe Simple Repository" at "http://repo.typesafe.com/typesafe/simple/maven-releases/"
resolvers += "MavenRepository" at "https://mvnrepository.com/"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0" % "provided"
and the spark-submit command is:
spark-submit --master local --deploy-mode client --class org.apache.spark.Counter /home/me/WordCount/target/scala-2.11/Counter-assembly-0.1.jar
1 - Why are there two different outputs from the same command?
By "command", I am assuming you mean getClass.getClassLoader.getResource("Message.txt").getPath. So I would rephrase the question as: why does the same getResource(...) call on the classloader return two different results depending on sbt console vs. spark-submit?
The answer is that they use different classloaders, each with a different classpath. console uses your directories as the classpath, while spark-submit uses the fat JAR, which includes the resources. When a resource is found inside a JAR, the classloader returns a JAR URL, which looks like jar:file:/home/me/WordCount/target/scala-2.11/Counter-assembly-0.1.jar!/Message.txt.
The whole point of using Apache Spark is to distribute some work across multiple computers, so I don't think you want to see your machine's local path in production.
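As for question 2: a file inside a JAR has no plain filesystem path, so sc.textFile can't consume the getPath result. A hedged sketch (not from the original answer) that works for a small bundled file like Message.txt is to read the resource as a stream from the classpath and parallelize the lines yourself, inside main from Counter.scala above where sc is already defined:

import scala.io.Source

// Look the resource up on the classpath; this works both from sbt console
// (classes directory) and from the assembled JAR.
val stream = getClass.getClassLoader.getResourceAsStream("Message.txt")
val lines = Source.fromInputStream(stream).getLines().toList

// Hand the in-memory lines to Spark instead of a path.
val linesRdd = sc.parallelize(lines)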

Building jars properly with sbt

I have a map reduce .scala file like this:
import org.apache.spark._

object WordCount {
  def main(args: Array[String]) {
    val inputDir = args(0)
    //val inputDir = "/Users/eksi/Desktop/sherlock.txt"
    val outputDir = args(1)
    //val outputDir = "/Users/eksi/Desktop/out.txt"

    val cnf = new SparkConf().setAppName("Example MapReduce Spark Job")
    val sc = new SparkContext(cnf)

    val textFile = sc.textFile(inputDir)
    val counts = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile(outputDir)
    sc.stop()
  }
}
When I run my code with the setMaster("local[1]") parameter, it works fine.
I want to put this code in a .jar and push it to S3 to use with AWS EMR. Therefore, I use the following build.sbt to do so.
name := "word-count"
version := "0.0.1"
scalaVersion := "2.11.7"
// additional libraries
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "1.0.2"
)
It generates a jar file; however, none of my Scala code is in there. All I see when I extract the .jar is a manifest file.
When I run sbt package this is what I get:
[myMacBook-Pro] > sbt package
[info] Loading project definition from /Users/lele/bigdata/wordcount/project
[info] Set current project to word-count (in build file:/Users/lele/bigdata/wordcount/)
[info] Packaging /Users/lele/bigdata/wordcount/target/scala-2.11/word-count_2.11-0.0.1.jar ...
[info] Done packaging.
[success] Total time: 0 s, completed Jul 27, 2016 10:33:26 PM
What should I do to create a proper jar file that works like
WordCount.jar WordCount
Ref: It generates a jar file, however none of my scala code is in there. What I see is just a manifest file when I extract the .jar
Make sure your WordCount.scala is in the root or in src/main/scala
From http://www.scala-sbt.org/1.0/docs/Directories.html
Source code can be placed in the project’s base directory as with hello/hw.scala. However, most people don’t do this for real projects; too much clutter.
sbt uses the same directory structure as Maven for source files by default (all paths are relative to the base directory):
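Abridged from that same page, the default layout is:
src/
  main/
    resources/
      <files to include in main jar here>
    scala/
      <main Scala sources>
    java/
      <main Java sources>
  test/
    resources/
      <files to include in test jar here>
    scala/
      <test Scala sources>
    java/
      <test Java sources>
If your WordCount.scala lives anywhere else (e.g. in a top-level folder of its own), sbt package won't compile it, which is why the jar ends up containing only a manifest.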

How to do Slick configuration via application.conf from within custom sbt task?

I want to create an sbt task which creates a database schema with Slick. For that, I have a task object like the following in my project:
import scala.concurrent.Await
import scala.concurrent.duration.Duration
import slick.driver.H2Driver.api._ // or the api of whichever driver you use

object CreateSchema {
  val instance = Database.forConfig("localDb")

  def main(args: Array[String]) {
    val createFuture = instance.run(createActions)
    ...
    Await.ready(createFuture, Duration.Inf)
  }
}
and in my build.sbt I define a task:
lazy val createSchema = taskKey[Unit]("CREATE database schema")
fullRunTask(createSchema, Runtime, "sbt.CreateSchema")
which gets executed as expected when I run sbt createSchema from the command line.
However, the problem is that application.conf doesn't seem to be taken into account (I've also tried different scopes like Compile or Test). As a result, the task fails with com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'localDb'.
How can I fix this so the configuration is available?
I found a lot of questions here that deal with using the application.conf inside the build.sbt itself, but that is not what I need.
I have set up a little demo using SBT 0.13.8 and Slick 3.0.0, which is working as expected. (And even without modifying "-Dconfig.resource".)
Files
./build.sbt
name := "SO_20150915"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies ++= Seq(
  "com.typesafe" % "config" % "1.3.0" withSources() withJavadoc(),
  "com.typesafe.slick" %% "slick" % "3.0.0",
  "org.slf4j" % "slf4j-nop" % "1.6.4",
  "com.h2database" % "h2" % "1.3.175"
)
lazy val createSchema = taskKey[Unit]("CREATE database schema")
fullRunTask(createSchema, Runtime, "somefun.CallMe")
./project/build.properties
sbt.version = 0.13.8
./src/main/resources/reference.conf
hello {
  world = "buuh."
}

h2mem1 = {
  url = "jdbc:h2:mem:test1"
  driver = org.h2.Driver
  connectionPool = disabled
  keepAliveConnection = true
}
./src/main/scala/somefun/CallMe.scala
package somefun

import com.typesafe.config.Config
import com.typesafe.config.ConfigFactory
import slick.driver.H2Driver.api._

/**
 * SO_20150915
 * Created by martin on 15.09.15.
 */
object CallMe {
  def main(args: Array[String]): Unit = {
    println("Hello")
    val settings = new Settings()
    println(s"Settings read from hello.world: ${settings.hw}")

    val db = Database.forConfig("h2mem1")
    try {
      // ...
      println("Do something with your database.")
    } finally db.close
  }
}

class Settings(val config: Config) {
  // This verifies that the Config is sane and has our
  // reference config. Importantly, we specify the "hello"
  // path so we only validate settings that belong to this
  // library. Otherwise, we might throw mistaken errors about
  // settings we know nothing about.
  config.checkValid(ConfigFactory.defaultReference(), "hello")

  // This uses the standard default Config, if none is provided,
  // which simplifies apps willing to use the defaults
  def this() {
    this(ConfigFactory.load())
  }

  val hw = config.getString("hello.world")
}
Result
If I run sbt createSchema from the console, I obtain the output
[info] Loading project definition from /home/.../SO_20150915/project
[info] Set current project to SO_20150915 (in build file:/home/.../SO_20150915/)
[info] Running somefun.CallMe
Hello
Settings read from hello.world: buuh.
Do something with your database.
[success] Total time: 1 s, completed 15.09.2015 10:42:20
Ideas
Please verify that this unmodified demo project also works for you.
Then try changing the SBT version in the demo project and see if that changes anything.
Alternatively, recheck your project setup and try to use a newer version of SBT.
Answer
So, even though your code resides in your src folder, it is called from within SBT. That means you are trying to load your application.conf from within the classpath context of SBT.
Slick uses Typesafe Config internally. (So the approach described under "Background information" below is not applicable, as you cannot modify the Config loading mechanism itself.)
Instead, try to set the path to your application.conf explicitly via config.resource; see the Typesafe Config documentation (search for config.resource). Note that config.resource expects a resource name on the classpath, whereas config.file takes a filesystem path.
Option 1
Either set config.resource (via -Dconfig.resource=...) before starting sbt
Option 2
Or from within build.sbt as Scala code
sys.props("config.resource") = "./src/main/resources/application.conf"
Option 3
Or create a Task in SBT via
lazy val configPath = TaskKey[Unit]("configPath", "Set path for application.conf")
and add
configPath := {
  sys.props("config.resource") = "./src/main/resources/application.conf"
}
to your sequence of settings, making sure it runs before your other task (e.g. sbt configPath createSchema).
Please let me know if that worked.
Background information
Recently, I was writing a custom plugin for SBT, where I also tried to access a reference.conf. Unfortunately, I was not able to access any .conf file placed within the project subfolder using the default ClassLoader.
In the end I created a testenvironment.conf in the project folder and used the following code to load the (typesafe) config:
def getConfig: Config = {
  val classLoader = new java.net.URLClassLoader(Array(new java.io.File("./project/").toURI.toURL))
  ConfigFactory.load(classLoader, "testenvironment")
}
or, for loading a general application.conf from ./src/main/resources:
def getConfig: Config = {
  val classLoader = new java.net.URLClassLoader(Array(new java.io.File("./src/main/resources/").toURI.toURL))
  // no .conf basename given, so look for reference.conf and application.conf
  // using the specific classLoader
  ConfigFactory.load(classLoader)
}
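For completeness, a hypothetical call site for the helper above (the db.url key is made up for illustration; use whatever keys your .conf actually defines):

val config = getConfig
// Read any key defined in the loaded .conf; this one is hypothetical.
val dbUrl = config.getString("db.url")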

Play framework: Running separate module of multi-module application

I'm trying to create a multi-module application and run one of its modules separately from the others (from another machine).
Project structure looks like this:
      main
     /    \
module1  module2
I want to run module1 as a separate jar file (or is there a better way of doing this?), which I will run from another machine (I want to connect it to the main app using Akka remoting).
What I'm doing:
1) Running the "play dist" command.
2) Unzipping module1.zip from the universal folder.
3) Setting +x mode on the bin/module1 executable.
4) Setting my main class (will paste it below): instead of play.core.server.NettyServer, I'm putting my main class: declare -r app_mainclass="module1.foo.Launcher"
5) Running with an external application.conf file.
Here is my main class:
import akka.actor.{Actor, ActorSystem, Props}

class LauncherActor extends Actor {
  def receive = {
    case a => println(s"Received msg: $a ")
  }
}

object Launcher extends App {
  val system = ActorSystem("testsystem")
  val listener = system.actorOf(Props[LauncherActor], name = "listener")
  println(listener.path)
  listener ! "hi!"
  println("Server ready")
}
Here is the console output:
#pavel bin$ ./module1 -Dconfig.file=/Users/pavel/projects/foobar/conf/application.conf
[WARN] [10/18/2013 18:56:03.036] [main] [EventStream(akka://testsystem)] [akka.event-handlers] config is deprecated, use [akka.loggers]
akka://testsystem/user/listener
Server ready
Received msg: hi!
#pavel bin$
So the system shuts down as soon as it reaches the last line of the main method. If I run this code without Play, it works as expected: the object is loaded and waits for messages, which is the expected behavior.
Maybe I'm doing something wrong? Or should I set some options in the module1 executable? Any other ideas?
Thanks in advance!
Update:
Versions:
Scala - 2.10.3
Play! - 2.2.0
SBT - 0.13.0
Akka - 2.2.1
Java 1.7 and 1.6 (tried both)
Build properties:
lazy val projectSettings = buildSettings ++ play.Project.playScalaSettings ++
  Seq(resolvers := buildResolvers, libraryDependencies ++= dependencies) ++
  Seq(
    scalacOptions += "-language:postfixOps",
    javaOptions in run ++= Seq(
      "-XX:MaxPermSize=1024m",
      "-Xmx4048m"
    ),
    Keys.fork in run := true
  )

lazy val common = play.Project("common", buildVersion, dependencies, path = file("modules/common"))

lazy val root = play.Project(appName, buildVersion, settings = projectSettings).settings(
  resolvers ++= buildResolvers
).dependsOn(common, module1, module2).aggregate(common, module1, module2)

lazy val module1 = play.Project("module1", buildVersion, path = file("modules/module1")).dependsOn(common).aggregate(common)

lazy val module2: Project = play.Project("module2", buildVersion, path = file("modules/module2")).dependsOn(common).aggregate(common)
So I found a dirty workaround, and I will use it until I find a better solution. In case someone is interested, I've added this code at the bottom of the Server object:
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val shutdown = Future {
  readLine("Press 'ENTER' key to shutdown")
}.map { q =>
  println("**** Shutting down ****")
  System.exit(0)
}

Await.result(shutdown, 100 days)
And now the system runs until I hit the ENTER key in the console. Dirty, I agree, but I didn't find a better solution.
If something better comes up, I will of course mark it as the answer.
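For what it's worth, with the versions listed above (Akka 2.2.1), a slightly cleaner sketch would be to block on the actor system itself instead of stdin, so the process stays alive until the system is shut down:

object Launcher extends App {
  val system = ActorSystem("testsystem")
  val listener = system.actorOf(Props[LauncherActor], name = "listener")
  listener ! "hi!"
  println("Server ready")
  // Blocks the main thread until system.shutdown() is invoked,
  // e.g. by an actor handling a remote "stop" message.
  system.awaitTermination()
}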

sbt Task classpath

I'm working on an sbt task and I would like to have access to some of the application's classes and dependencies.
(Specifically, I'd like to generate the database DDL using ScalaQuery.)
Is there any way to add those dependencies to the task, or do I need to create a plugin for this?
object ApplicationBuild extends Build {

  val appName = "test"
  val appVersion = "1.0-SNAPSHOT"

  val appDependencies = Seq(
    "org.scalaquery" % "scalaquery_2.9.0-1" % "0.9.5")

  val ddl = TaskKey[Unit]("ddl", "Generates the ddl in the evolutions folder")

  val ddlTask = ddl <<= (baseDirectory, fullClasspath in Runtime) map { (bs, cp) =>
    val f = bs / "conf/evolutions/default"

    // Figures out the last sql number used
    def nextFileNumber = { ... }

    // writes to file
    def printToFile(f: java.io.File)(op: java.io.PrintWriter => Unit) { ... }

    def createDdl = {
      import org.scalaquery.session._
      import org.scalaquery.ql._
      import org.scalaquery.ql.TypeMapper._
      import org.scalaquery.ql.extended.H2Driver.Implicit._
      import org.scalaquery.ql.extended.{ ExtendedTable => Table }

      import models._

      printToFile(new java.io.File(nextFileNumber, f))(p => {
        models.Table.ddl.createStatements.foreach(p.println)
      })
    }

    createDdl
    None
  }

  val main = PlayProject(appName, appVersion, appDependencies, mainLang = SCALA).settings(
    ddlTask)
}
The error I get is
[test] $ reload
[info] Loading global plugins from /home/asal/.sbt/plugins
[info] Loading project definition from /home/asal/myapps/test/project
[error] /home/asal/myapps/test/project/Build.scala:36: object scalaquery is not a member of package org
[error] import org.scalaquery.session._
[error] ^
[error] one error found
Thanks in advance
You have to add ScalaQuery (and everything else your build depends on) as a build dependency. That means that, basically, you have to add it "as an sbt plugin".
This is described in some detail in the Using Plugins section of the sbt wiki. It all boils down to a very simple thing, though: just add a line defining your dependency in project/plugins.sbt, like this:
libraryDependencies += "org.scalaquery" % "scalaquery_2.9.0-1" % "0.9.5"
Now, the problem with using application classes in the build is that you can't really add build products as build dependencies. So you would probably have to create a separate project that builds your DDL module, and add that as a dependency to the build of this project.
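A hedged alternative sketch, assuming the DDL generation can live in the application code itself rather than in the build: expose it as a plain main class under src/main/scala and run it on the application's own Runtime classpath via fullRunTask, which sidesteps the chicken-and-egg problem entirely. The tools.GenerateDdl class name here is hypothetical:

// In the build definition: no build-level ScalaQuery dependency needed,
// because the main class runs with the application's classpath, where
// ScalaQuery and your models are already available.
val ddl = TaskKey[Unit]("ddl", "Generates the ddl in the evolutions folder")
val ddlTask = fullRunTask(ddl, Runtime, "tools.GenerateDdl") // hypothetical main class

val main = PlayProject(appName, appVersion, appDependencies, mainLang = SCALA).settings(
  ddlTask)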