Can SBT scopes be used for custom libraryDependencies for specific code blocks? - scala

I've a simple SBT project, in which one code block reads from HDFS (needs a certain version of Hadoop's libraryDependencies) and another code block (needs another version of Hadoop's libraryDependencies) writes the filtered result to Cassandra.
Can SBT scopes be used to assign a different libraryDependencies to the two code blocks?

You can do this, but you have to split your code along one of the scope axes: project, configuration, or task. The only axis that can be used for your purpose is the "project" axis. So you have to create a multi-project sbt build and split your code across its subprojects.
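A minimal sketch of such a multi-project build. The subproject names (hdfsReader, cassandraWriter), directory names, and Hadoop versions are illustrative assumptions and do not come from the question:

```scala
// build.sbt (sketch; names and versions are assumptions)

// Subproject that reads from HDFS and pins one Hadoop version
lazy val hdfsReader = (project in file("hdfs-reader"))
  .settings(
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.7.3"
  )

// Subproject that writes to Cassandra and pins a different Hadoop version
lazy val cassandraWriter = (project in file("cassandra-writer"))
  .settings(
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0"
  )

// Root project that only aggregates the two subprojects
lazy val root = (project in file("."))
  .aggregate(hdfsReader, cassandraWriter)
```

Each subproject then compiles against its own Hadoop version, but the two are still separate classpaths at build time only.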
But this will not solve your problem, because you will not be able to run the resulting application. The Java class loader has no way to decide when to use one version of Hadoop and when to use the other. It will load one version of the classes in question and then use it in all cases.
For this task you have to use a context-aware class loader. An example is an OSGi container, like Apache Felix. OSGi is version aware and can load different versions of the same library in the same Java process. It then resolves references to the classes of the correct library version, depending on the context in which the library is used.
To be more precise: you must convert the different versions of your Hadoop library into OSGi bundles. Then you must split your code into multiple OSGi bundles, each declaring a dependency on the correct version of the Hadoop bundle in its metadata (manifest file). When you want to start your application, you must run it in an OSGi container.
This can be done, but is quite complex. Better to clean up your code, so you only depend on one version of the Hadoop library.

Related

sbt-assembly: Generate a minimal JAR file

I've been using sbt-assembly to generate a standalone JAR file for my Scala project. However, I would like to reduce the size of my JAR file (it's currently around 150MB and there's definitely room for improvement there).
I used the following command to list the contents of the JAR file that's produced:
jar tf <JAR file>
This revealed that there are lots of classes in the generated JAR file that are not used in the project. I believe these classes get included as part of third-party JARs.
Questions
(a) Is there an option that I can use to instruct sbt-assembly to generate a minimal JAR file that does not include the third-party classes that are not used in my project?
(b) I could use AssemblyStrategy to manually specify which files need to be excluded. Is this a sound strategy? I'm a bit concerned that with this approach the JAR file might end up throwing unexpected ClassNotFound exceptions.
Thanks in advance.
It's not easy to say what's used in your project and what is not. If you include a dependency into a project it might bring a few other ones in. Those child dependencies might also require their own dependencies and so on.
By default, if you include some dependency in your project, you intend to use it. The author of a dependency usually does the same thing. Thus, there is usually not much you can throw away; it's there for a reason. There are a couple of cases when this is not true:
Dependency author includes additional dependencies that will be used only in some settings, and that does not apply to your project
You are using a mega-dependency when you actually need only one of its libraries/features.
There are counter examples to this as well: Scalatest does not ship pegdown for generating html test reports because you don't need it usually. But it might be needed if you try to use -h flag to generate html.
Imagine the case when you use Apache Tika for pdf parsing. It wraps PDFBox to do the parsing. You don't need a bloat of all other libraries in that case that parse MS documents. The best thing to do is not to exclude files manually via sbt exclude or sbt-assembly rules because there is a risk you get it wrong and get run time class loading exception. Instead you need to use the right dependency like PDFBox directly. Unfortunately this is a lot of manual work in many cases to figure out all dependencies that you need, so it's your choice: easy and fat JAR, or painful and lean.
There are two ways to exclude dependencies:
Exclude transitive dependencies with exclude. See the docs here.
Don't use the top level dependency and manually add its subdependencies as you need them.
Ok, one more less fun option: use provided and make sure libraries are copied to your target environment and are on classpath. If you have many jars using the same libraries this helps to share those.
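The options above can be sketched in build.sbt roughly as follows. The coordinates and versions are illustrative assumptions based on the Tika/PDFBox example; check which transitive modules your own project actually pulls in before excluding anything:

```scala
// build.sbt (sketch; coordinates and versions are assumptions)

// Option 1: keep the top-level dependency, drop one transitive module with exclude.
libraryDependencies += ("org.apache.tika" % "tika-parsers" % "1.14").exclude("org.apache.poi", "poi-ooxml")

// Option 2 (alternative to option 1): skip the top-level dependency and
// depend on the sub-library you need directly.
// libraryDependencies += "org.apache.pdfbox" % "pdfbox" % "2.0.4"

// Option 3: mark a library as provided, so it is on the compile classpath
// but is expected to be supplied by the target environment at run time.
libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.25" % "provided"
```

With option 1, an excluded class that turns out to be needed only shows up as a class loading error at run time, which is the risk described above.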
You can visualize your dependency tree with this plugin: https://github.com/jrudolph/sbt-dependency-graph. It's very helpful when trying to figure out what you are using and what you can remove. There are some tools like tattletale and loosejar that people suggest but I haven't tried them. If anyone has experience with those please share.
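A minimal sketch of enabling that plugin, assuming version 0.9.2 (pick whichever release matches your sbt version):

```scala
// project/plugins.sbt (sketch; the version number is an assumption)
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")
```

Running `sbt dependencyTree` afterwards should print the resolved dependencies as a tree, which makes it easier to see which top-level dependency drags in which transitive ones.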
What you might want to look at are tree shakers.
For Java there's the following (I have not tried/used it):
http://proguard.sourceforge.net/

Generate a JAR from one Scala source file

I have no Scala experience, but I need to create a JAR to include on a project's classpath from a single Scala source file.
I'm thinking there is a relatively straightforward way to do this, but I can't seem to figure it out.
The Scala file is here: http://pastebin.com/MYqjNkac
The JAR doesn't need to be executable, it just needs to be able to be referenced from another program.
The most convenient way is to use some build tool like Sbt or Maven. For maven there is the maven-scala-plugin plugin, and for Sbt here is a tutorial.
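For the sbt route, a minimal build definition might look like the sketch below. The project name, the Scala and Spark versions, and the Spark dependency itself are assumptions; the dependency is only needed if the source file imports Spark classes:

```scala
// build.sbt (sketch; name, versions, and the dependency are assumptions)
name := "pythonconverters"

version := "0.1"

scalaVersion := "2.10.4"

// "provided": available for compilation, not bundled into the JAR
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"
```

With the source file placed under src/main/scala (in directories matching its package), `sbt package` writes a plain, non-executable JAR under target/scala-2.10/.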
If you don't want to use any build tool, you may want to compile the code with scalac and then create the jar file manually by using zip on the resulting class files and renaming it to jar. But you have to preserve the directory structure. In your pastebin you use the package org.apache.spark.examples.pythonconverters, so make sure the directories match.
Btw, if you just want to integrate this piece of code with your Java project and you are using Maven, you can keep the Scala code in the same project as well (in src/main/scala). Just use the maven-scala-plugin and hook it to the compile phase, or to an earlier phase if your Java code depends on it. However, I don't recommend mixing multiple languages in one project; I would split it into two separate ones.

Scala Play messages file to inline or reuse the version in build.sbt

I have a Scala Play project and currently I show the current application version at some location in my main template. The version I can easily define in the conf/messages file. However, since I have an automated build for creating releases, the release iterations will update the build.sbt increasing the version according to the release there e.g. version := "1.0.6-SNAPSHOT"
I could use the same mechanics during the release to update my conf/messages file as well, but instead I would prefer to have my conf/messages file include the version information from build.sbt, e.g. à la application.version=${sbt.application.version}.
How can I accomplish this? Is it possible at all?
UPDATE: it is worth mentioning that in Maven these build settings become Java system properties and can be easily used.
You can use sbt-buildinfo plugin to generate a Scala source based on the build.sbt.
The plugin generates a BuildInfo object, which contains information you can then use to display the application version.
Otherwise I don't think you can access sbt information from your configuration.
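A minimal sketch of the sbt-buildinfo setup, assuming plugin version 0.7.0, a Play project that enables PlayScala, and a hypothetical package name buildinfo for the generated object:

```scala
// project/plugins.sbt (sketch; the version number is an assumption)
addSbtPlugin("com.eed3si9n" % "sbt-buildinfo" % "0.7.0")

// build.sbt
lazy val root = (project in file("."))
  .enablePlugins(PlayScala, BuildInfoPlugin)
  .settings(
    // Settings to expose as fields on the generated BuildInfo object
    buildInfoKeys := Seq[BuildInfoKey](name, version, scalaVersion, sbtVersion),
    // Package of the generated object (hypothetical name)
    buildInfoPackage := "buildinfo"
  )
```

The template can then reference `buildinfo.BuildInfo.version` instead of a message key, so the release process only has to touch build.sbt.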
You can use the xsbt-filter plugin to achieve this. It basically works like Maven's resource filtering mechanism, and exposes the project's name, version, etc. by default. You can further configure it to expose other properties.

Setting up actions for multiple test folders in SBT using 'Simple' configuration

This is actually a duplicate of sorts of Setting up actions for multiple test folders in SBT, however the answer in that one specifically uses the Scala syntax for SBT.
In our project - currently at SBT 0.10.1, but I hope we can upgrade to 0.11 soon - we use the 'simple' configuration using SBT's own DSL.
How can I create separate testing tasks / commands in SBT for different folders? In my specific case, I'd like a batch of regular unit tests and a batch of integration tests.
A secondary question, is it possible - with SBT - to alter a Java property? For the integration tests, I'd like to set a property called 'env' to 'testing' (or 'integration-testing' soon), so that a different MongoDB database is accessed. When starting up the application, I can do this using -Denv=testing, but is it possible to do this in SBT instead?
You can use the simple configuration in conjunction with the Scala-based configuration, details are here. So you should be able to use the advice in the other question and leave your build.sbt untouched or only make minimal changes. I do this dual configuration frequently to define sub-projects and project dependencies, but keep the simplicity of adding library dependencies.
As for your second question, maybe you should make that a separate question, as I would like to know that as well :)
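For what it's worth, both parts can be sketched in a single build.sbt on newer sbt releases. This assumes sbt 1.x syntax (not 0.10/0.11), sbt's built-in IntegrationTest configuration, and integration tests kept under src/it/scala:

```scala
// build.sbt (sketch for sbt 1.x; not valid for sbt 0.10/0.11)
lazy val root = (project in file("."))
  .configs(IntegrationTest)
  .settings(
    // Adds compile/test tasks for sources under src/it
    Defaults.itSettings,
    // System properties only reach the tests when they run in a forked JVM
    IntegrationTest / fork := true,
    IntegrationTest / javaOptions += "-Denv=testing"
  )
```

Unit tests still run with `sbt test`, while `sbt IntegrationTest/test` runs the integration batch with env set to testing.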

Selectively include dependencies in JAR

I have a library that I wrote in Scala that uses Bouncy Castle and has a whole bunch of dependencies. When I roll a jar, I can either roll a "fat" jar that has all the dependencies (including scala), which weighs in around 19 MB, or I can roll a skinny jar, which doesn't have dependencies, but is only a few hundred KB.
The problem is that I need to include the Bouncy Castle classes/jar with my library, because if it's not on the classpath at runtime, all kinds of exceptions get thrown.
So, I think the ideal situation is if there is some way that I can get either Maven or SBT to include some but not all dependencies in the jar that gets rolled. Some dependencies are needed at compile-time, but not at run time, such as the Scala standard libraries. Is there some way to get that to happen?
Thanks!
I would try out the sbt proguard plugin from https://github.com/nuttycom/sbt-proguard-plugin . It should be able to weed out the classes that are not in use.
If it is sufficient to explicitly define which dependencies should be added (on the artifact level, i.e., single JARs), you can define an assembly (in case of a single project) or an additional assembly project (in case of a multi-module project). Assembly descriptors can explicitly exclude/include artifacts from the dependencies.
Here is some good documentation on this topic (section 8.5.4), here is the official documentation.
Note that you can include all artifacts that belong to one group by using the wildcard notation in dependencySets, e.g. hibernate:*:jar would include all JAR files belonging to the hibernate group.
Covering maven...
Because you declare your project to be dependent upon bouncy castle in your maven pom, anybody using maven to depend upon your library will by default pull in bouncy castle as a transitive dependency.
You should set the appropriate scope on your dependencies, e.g. compile for stuff needed at compile and runtime, test for dependencies only needed in testing and provided for stuff you expect to be provided by the environment.
Whether your library's dependencies are packaged into dependent projects when they are built is a question of how those projects are configured, and setting the scopes will influence the default behaviour.
For example, jar type packaging by default does not include dependencies, whereas war will include those in compile scope (but not test or provided). The design aim here was to have packaging plugins behave in the most commonly required way without needing configuration, but of course packaging plugins in maven can be configured to have different behaviour if needed. The plugins themselves which do packaging are well documented at the apache maven site.
If users of your library are unlikely to be using maven to build their projects, an option is to use the shade plugin which will allow you to produce an "uber-jar" which contains all the dependencies you wish. You can configure particular includes or excludes.
This can be a problematic way to deliver, for example where your library includes dependencies which version clash with the direct dependencies of projects using it, i.e. they use a different version of the same libraries yours does.
However if you can it is best that you leave this to maven to manage so that projects using your library can decide whether they want your dependencies or to specify particular versions giving them more flexibility. This is the idiomatic approach.
For more information on dependencies and scopes in maven, see the reference guide published by Sonatype.
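Since the question also mentions sbt, the same scope idea can be sketched there. This assumes the fat JAR is built with the sbt-assembly plugin, which leaves provided and test dependencies out of the assembled JAR; the coordinates and versions are illustrative:

```scala
// build.sbt (sketch; coordinates and versions are assumptions)
libraryDependencies ++= Seq(
  // Default (compile) scope: bundled into the assembled JAR
  "org.bouncycastle" % "bcprov-jdk15on" % "1.56",
  // provided: on the compile classpath, expected from the environment at run time
  "org.slf4j" % "slf4j-api" % "1.7.25" % "provided",
  // test: only on the test classpath, never packaged
  "org.scalatest" %% "scalatest" % "3.0.1" % "test"
)
```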
I'm not a scala guy, but I have played around with assembling stuff in Java + Maven.
Have you tried looking into creating your own assembly descriptor for the assembly plugin? https://maven.apache.org/plugins/maven-assembly-plugin/assembly.html
You can copy / paste the jar-with-dependencies descriptor, then just add some excludes to your <dependencySet>. I'm not a Maven expert, but you should be able to configure it so different profiles kick off different assembly builds.