I'm trying to work out the right process for deploying my REST API, which I have already tested locally, to the cloud. Assume infrastructure as a service such as Amazon, not a platform as a service such as Heroku.
My local environment is set up with sbt and running, but how should I deploy this to a production environment?
Is it sane to define a process in which the devops person pulls the most recent changes from the git repo and then simply executes sbt run?
I want to know how teams that use Scala + spray + sbt deploy their APIs to a production environment.
The heart of our services is Scala + Akka + spray + Mongo. We use GitHub for version control. After reviewed PRs are merged to the master branch, Jenkins automatically tests and builds the project. If all tests pass, Jenkins runs a couple of scripts:
Increment the project version (currently written in shell, but will be moved to sbt)
Run the assembly task with sbt-assembly
Run the deploy script (written in Python with Fabric), which deploys our jar to EC2
Basically, for the third step you have a few choices:
Make a runnable jar using IO/Spray boot file:
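import akka.actor.{ActorSystem, Props}
import akka.io.IO
import spray.can.Http
object Boot extends App {
  implicit val system = ActorSystem("ServiceName")
  val log = system.log
  // Service is the application's own actor; host and port come from your configuration
  val service = system.actorOf(Props[Service], name = "serviceActor")
  IO(Http) ! Http.Bind(service, interface = host, port = port)
}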
Make a runnable jar as Akka's microkernel:
In this case you should extend the Bootable trait and override the startup and shutdown methods:
import akka.kernel.Bootable
class Kernel extends Bootable {
  // many lines of code
  def startup(): Unit = { /* create the actor system and start the service */ }
  def shutdown(): Unit = { /* shut the actor system down */ }
}
Using a Typesafe start script:
I can't show an example, but it has a good intro on GitHub =)
We use all of these approaches in different cases.
You should build a jar with the plugin sbt-assembly
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.0")
Then you can run the jar in production with java -jar
If you give your project a version number, this is a rather classic process.
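As a rough illustration of the build settings this relies on, here is a minimal build.sbt sketch. It assumes a recent sbt-assembly (1.x or later, which uses slash syntax; the 0.9.0 release shown above uses different key names). The project name, version and main class `com.example.Boot` are hypothetical.

```scala
// build.sbt (sketch; all names below are illustrative, not from the original project)
name := "my-service"   // hypothetical project name
version := "0.1.0"     // an explicit version, so each built jar is identifiable

// Entry point written into the jar manifest, so `java -jar` knows what to run.
assembly / mainClass := Some("com.example.Boot") // hypothetical main class

// Predictable artifact name that a deploy script can reference.
assembly / assemblyJarName := s"${name.value}-${version.value}.jar"
```

With settings like these, `sbt assembly` would write the jar under target/scala-<version>/, and that file is what `java -jar` is pointed at in production.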
Hope it helps.
I never went to production with spray and Akka, only pet projects, so take my suggestions here as inspiration. I know some of the options I present are costly to maintain or error-prone.
I have only used maven-shade-plugin (no experience with sbt), but I guess there's a similar solution for sbt.
Packaging Issues
There are a few issues with this approach, though. Akka and many of the spray modules follow the reference.conf and application.conf convention. When you assemble or shade all your dependencies into one jar, these resources (since they have the same names) may overwrite one another, and the application will then fail to start.
The quick and dirty solution I found was to copy and paste the application.conf and reference.conf contents of the dependencies into a single file that I control.
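With sbt-assembly, the same clash can be handled by concatenating the configuration files instead of letting one overwrite another. The following is a minimal sketch, assuming sbt-assembly 1.x or later. Recent versions already concatenate reference.conf by default, so this mostly makes the rule explicit.

```scala
// build.sbt (sketch; assumes sbt-assembly 1.x+ is enabled in project/plugins.sbt)
assembly / assemblyMergeStrategy := {
  // Akka and spray each ship a reference.conf: append them instead of keeping only one.
  case "reference.conf"   => MergeStrategy.concat
  case "application.conf" => MergeStrategy.concat
  // Fall back to the plugin's default strategy for every other path.
  case other =>
    val defaultStrategy = (assembly / assemblyMergeStrategy).value
    defaultStrategy(other)
}
```

For maven-shade-plugin, the AppendingTransformer resource transformer plays the same role.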
To speed up our development workflow, we split the tests and run each part on multiple agents in parallel. However, compiling the test sources seems to take most of the time in the testing steps.
To avoid this, we pre-compile the tests using sbt test:compile and build a docker image containing the compiled targets.
Later, this image is used on each agent to run the tests. However, sbt seems to recompile the tests and application sources even though the compiled classes exist.
Is there a way to make sbt use existing compiled targets?
Update: To give more context
The question strictly relates to scala and sbt (hence the sbt tag).
Our CI process is broken down into multiple phases. It's roughly something like this.
stage 1: Use SBT to compile the Scala project into Java bytecode using sbt compile. We compile the test sources in the same stage using sbt test:compile. The targets are bundled in a docker image and pushed to the remote repository.
stage 2: We use multiple agents to split and run tests in parallel. The tests run from the built docker image, so the environment is the same. However, running sbt test causes the project to recompile even though the compiled bytecode exists.
To make this clear, I basically want to compile on one machine and run the compiled test sources on another without recompiling.
I don't think https://stackoverflow.com/a/37440714/8261 is the same problem because, unlike it, I don't mount volumes or build on the host machine. Everything is compiled and run within docker, but in two build stages. Because of this, the file modification times and paths stay the same.
The debug output contains something like this:
Initial source changes:
added: Set()
modified: Set()
Invalidated products: Set(/app/target/scala-2.12/classes/Class1.class, /app/target/scala-2.12/classes/graph/Class2.class, ...)
External API changes: API Changes: Set()
Modified binary dependencies: Set()
Initial directly invalidated classes: Set()
Sources indirectly invalidated by:
product: Set(/app/Class4.scala, /app/Class5.scala, ...)
binary dep: Set()
external source: Set()
All initially invalidated classes: Set()
All initially invalidated sources:Set(/app/Class4.scala, /app/Class5.scala, ...)
Recompiling all 304 sources: invalidated sources (266) exceeded 50.0% of all sources
Compiling 302 Scala sources and 2 Java sources to /app/target/scala-2.12/classes ...
It has no Initial source changes, but products are invalidated.
Update: Minimal project to reproduce
I created a minimal sbt project to reproduce the issue.
As you can see, nothing changes between the build stages, other than the second stage running in a new step (a new container).
While https://stackoverflow.com/a/37440714/8261 pointed in the right direction, the underlying issue and the solution were different.
SBT seems to recompile everything when it runs in different stages of a docker build. This is because docker compresses the images created in each stage, which strips the millisecond portion of the lastModifiedDate from the sources.
SBT relies on lastModifiedDate to determine whether sources have changed, and since it differs (in the milliseconds part), the build triggers a full recompilation.
Java 8:
Set -Dsbt.io.jdktimestamps=true when running SBT, as recommended in https://github.com/sbt/sbt/issues/4168#issuecomment-417655678, to work around this issue.
Follow the recommendation in https://github.com/sbt/sbt/issues/4168#issuecomment-417658294
I solved the issue by setting the SBT_OPTS environment variable in the Dockerfile like this:
ENV SBT_OPTS="${SBT_OPTS} -Dsbt.io.jdktimestamps=true"
The test project has been updated with this workaround.
Using SBT:
I think there is already an answer to this here: https://stackoverflow.com/a/37440714/8261
It looks tricky to get exactly right. Good luck!
Avoiding SBT:
If the above approach is too difficult (i.e. getting sbt test to accept that your test classes do not need recompiling), you could avoid sbt altogether and run your test suite using java directly.
If you can get sbt to log the java command that it is using to run your test suite (e.g. using debug logging), then you could run that command on your test runner agents directly, which would completely preclude sbt re-compiling things.
(You might need to write the java command into a script file, if the classpath is too long to pass as a command-line argument in your shell. I have previously had to do that for a large project.)
This would be a much hackier approach than the one above, but it might be quicker to get working.
A possible solution might be to define your own sbt task without dependencies, or to try changing the test task. For example, you could create a task that runs a JUnit runner, if that is your testing framework. To define a task, see the section on Implementing Tasks.
You could even go as far as compiling, sending the code, and running it on the remotes from the same task, since a task can contain any Scala code you want. From the sbt reference manual:
You could be defining your own task, or you could be planning to redefine an existing task. Either way looks the same; use := to associate some code with the task key
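As a rough sketch of such a task, the following build.sbt fragment runs already-compiled tests in a forked JVM without depending on the compile tasks. It assumes sbt 1.x, a single-module build and ScalaTest as the test framework (the JUnit case would need a different runner class). The task name `testPrecompiled` is hypothetical.

```scala
// build.sbt (sketch; assumes sbt 1.x, a single module, and ScalaTest on the test classpath)
import scala.sys.process.Process

lazy val testPrecompiled =
  taskKey[Unit]("Run already-compiled tests without invoking compile") // hypothetical name

testPrecompiled := {
  // Only library jars are resolved here. Unlike Test / fullClasspath,
  // these keys do not depend on the compile tasks.
  val libraryJars = (Test / externalDependencyClasspath).value.map(_.data)
  val mainClasses = (Compile / classDirectory).value
  val testClasses = (Test / classDirectory).value

  val classpath = (Seq(testClasses, mainClasses) ++ libraryJars)
    .map(_.getAbsolutePath)
    .mkString(java.io.File.pathSeparator)

  // ScalaTest's command-line runner: -R sets the runpath to scan, -o reports to stdout.
  val exitCode = Process(Seq(
    "java", "-cp", classpath,
    "org.scalatest.tools.Runner", "-R", testClasses.getAbsolutePath, "-o"
  )).!

  // Treat a non-zero exit code from the forked JVM as a failed task.
  if (exitCode != 0) sys.error(s"Tests exited with code $exitCode")
}
```

Because the task reads the class directories directly, it trusts whatever is already in target/ and will not notice stale classes, which is the trade-off for skipping sbt's incremental compiler.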
Using Scala Play Framework 2.5,
I build the app into a jar using the sbt plugin PlayScala,
and then build and push a docker image from it using the sbt plugin DockerPlugin.
The source code repository contains conf/development.conf (in the same place as application.conf).
The last line in application.conf says include development, which means that if development.conf exists, its entries override some of the entries in application.conf. These overrides provide all the default values needed to run the application locally, right out of the box, after the source is cloned from source control, with zero extra configuration. This technique lets every new developer slip right into a working application without wasting time on configuration.
The only missing piece to complete that design is a way to exclude development.conf from the final runtime of the app. Otherwise these overrides leak into the production runtime and, obviously, the application fails to run.
That can be achieved in various ways.
One way could be to somehow inject logic into the build task (provided as part of the sbt plugin PlayScala, I assume) to exclude the file from the jar artifact.
Another way could be to inject logic into the docker image creation process. This logic could manually delete development.conf from the existing jar before executing it (assuming that's possible).
If you ever implemented one of the ideas offered,
or maybe some different architectural approach that gives the same "works out of the box" feature, please be kind enough to share :)
I usually have the inverse logic:
I use the application.conf file (which Play uses by default) with everything needed to run locally. I then have a production.conf file that starts by including application.conf and then overrides the necessary settings.
For deploying to production (or staging), I specify the production.conf or staging.conf file to be used.
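One way to specify that file for a packaged application is through the start script's JVM options. This is a minimal sketch, assuming the app is packaged with sbt-native-packager (which Play's sbt plugin builds on) and that production.conf sits in conf/ on the classpath. It uses the older `in` syntax to match the Play 2.5 era; on sbt 1.x the key can be written as `Universal / javaOptions`.

```scala
// build.sbt (sketch; assumes sbt-native-packager is used for packaging)
// Pass the config choice to the generated start script, so the packaged
// application (and a Docker image built from it) loads production.conf.
javaOptions in Universal ++= Seq(
  "-Dconfig.resource=production.conf" // file name taken from the answer above
)
```

Local runs with sbt run are unaffected by this setting and keep using application.conf.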
This is how I solved it eventually.
conf/application.conf is the production-ready configuration. It contains placeholders for environment variables whose values are injected at runtime by k8s according to the service's deployment.yaml file.
Right next to it is conf/development.conf. Its first line is include application.conf, and the rest are overrides that make the application run out of the box right after git clone with a simple sbt run.
What makes the above work is the following addition to build.sbt:
PlayKeys.devSettings := Seq(
  "config.resource" -> "development.conf"
)
Works like a charm :)
This can be done via the mappings config key of sbt-native-packager:
mappings in Universal ~= (_.filterNot(_._1.name == "development.conf"))
See here.
I'm very (very!) new to Spark and Scala. I've been trying to implement what I thought would be the easy task of connecting to a Linux machine that has Spark on it and running some simple code.
When I write simple Scala code, build a jar from it, place it on the machine and run spark-submit, everything works and I get a result.
(like the "SimpleApp" example here: http://spark.apache.org/docs/latest/quick-start.html)
My question is:
Are all of these steps mandatory? Must I compile, build, copy the jar to the machine and then manually run it every time I change it?
Assume that the jar is already on the machine. Is there a way to run it (calling spark-submit) directly from different code through my IDE?
Taking it a bit further, if, let's say, I want to run different tasks, do I have to create different jars and place all of them on the machine? Are there any other approaches?
Any help will be appreciated!
There are two ways of running your code: either by submitting your job to the server, or by running in local mode, which requires no Spark cluster to be set up. Most people use local mode for building and testing their application on small data sets, and then build and submit the tasks as jobs for production.
Running in Local Mode
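import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf().setMaster("local").setAppName("wordCount Example")
Setting the master to "local" runs Spark inside the same JVM as your application.
If you have already built your jar, you can submit the job to a remote cluster by specifying the Spark master's URL and adding the required jars:
val conf = new SparkConf()
  .setAppName("SubmitJobToCluster Example")
  .setMaster("spark://<master-host>:7077")  // placeholder: your cluster's master URL
  .setJars(Seq("/path/to/your-app.jar"))    // placeholder: the jar(s) to ship to the cluster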
Using the spark conf you can initialize SparkContext in your application and use it either in a local or cluster setup.
val sc = new SparkContext(conf)
This is an old project, spark-examples, with sample programs that you can run directly from your IDE.
So, answering your questions:
Are all of these steps mandatory? Must I compile, build, copy the jar to the machine and then manually run it every time I change it?
Assume that the jar is already on the machine. Is there a way to run it (calling spark-submit) directly from different code through my IDE?
Yes, you can. The above example does it.
Taking it a bit further, if, let's say, I want to run different tasks, do I have to create different jars and place all of them on the machine? Are there any other approaches?
You just need one jar containing all your tasks and dependencies; you specify the class to run when submitting the job to Spark. When doing it programmatically, you have complete control over it.
I have a Scalatra web service that runs with embedded Jetty. I'd now like to write integration tests that:
start the service (using the main method of the application)
run the tests (driving the HTTP interface)
stop the service.
This should all be triggered by an SBT command.
How should I go about this?
You could write such integration tests on top of BDD test frameworks like Specs. The Unfiltered project has many such examples, which should also work for other web frameworks like Scalatra.
For example, take a look at ServerSpec:
"A Server" should {
  "respond to requests" in {
    http(host as_str) must_== "test"
  }
}
It starts up a test server specified in setup and hits it using Dispatch in the specification. The key part is implemented in the unfiltered.spec.jetty.Served trait, which does what you described: starting and stopping the service. There's also a Specs2 version: unfiltered.specs2.jetty.Served.
Another trick you could use is sbt-revolver, which is my favorite plugin for any web development, especially in conjunction with JRebel. This plugin can load your web server in the background. I haven't tried it together with tests, but it could work if you don't have to change the server side during the test.
Scalatra offers a DSL for writing tests. There is support for specs2 and ScalaTest.
By default an embedded Jetty is used for testing. If you want to provide your own server, you can reuse the EmbeddedJettyContainer implementation and override start, stop and servletContextHandler.
start is called before the tests are executed, which lets you start your server if required. stop is called after the tests. servletContextHandler is required in order to add your servlets using addServlet(..).
This is from the specs2 integration:
trait BaseScalatraSpec extends SpecificationStructure with FragmentsBuilder with ScalatraTests {
  override def map(fs: => Fragments) = Step(start()) ^ super.map(fs) ^ Step(stop())
}
trait ScalatraTests extends EmbeddedJettyContainer with HttpComponentsClient { }
Alternatively you can provide your own Container implementation.
I need some advice on configuring a project so it works in development, staging and production environments:
I have a web app project, MainProject, that contains two sub-projects, ProjectA and ProjectB, as well as some common code, Common. It's in a Subversion repository. It's nearly all HTML, CSS and JavaScript.
In our current development environment we check MainProject out, then set up Apache virtual hosts to point at each of the sub-project's directories, as paths within each project are relative to their root. We also have a build process that then compiles each of the sub-projects into their own deliverable package, with the common code copied into each.
So, I'm trying to make development of this project a bit easier. At the moment there is a lot of file path configuration in the Apache httpd.conf files, in the build.xml file, and in a couple of other places too.
Ideally I'd like the project to be checked out of SVN onto a fresh computer, with a web server as part of the project, fully configured, that can then be run from the checkout directory with very little extra configuration, either on a PC or Mac. And I'd like anyone to be able to run the build to compile it too.
I'd love to hear from anyone who has done something like this, and any advice you have.
If you can add Python as a dependency, you can get a minimal HTTP server running in less than ten lines of code. If you have basic server-side code, there is a CGI server as well.
The following snippet is copied directly from the BaseHTTPServer documentation:
import BaseHTTPServer
def run(server_class=BaseHTTPServer.HTTPServer,
        handler_class=BaseHTTPServer.BaseHTTPRequestHandler):
    server_address = ('', 8000)
    httpd = server_class(server_address, handler_class)
    httpd.serve_forever()
I've done this with Jetty, from within Java. Basically you write a simple Java class that starts Jetty (which is a small web server), and you can then run it via an Ant task. I used it with automated tests: Java code made requests to the server and checked the results in various ways.
Not sure it's appropriate here because you don't mention Java at all, so apologies if it's not the kind of thing you're looking for.