I have no Scala experience, but I need to create a JAR to include on a project's classpath from a single Scala source file.
I'm thinking there is a relatively straightforward way to do this, but I can't seem to figure it out.
The Scala file is here: http://pastebin.com/MYqjNkac
The JAR doesn't need to be executable; it just needs to be usable as a dependency from another program.
The most convenient way is to use a build tool such as sbt or Maven. For Maven there is the maven-scala-plugin; for sbt, the official documentation has a getting-started tutorial.
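For instance, with sbt you only need a minimal build.sbt next to your source tree. A sketch (the project name and versions below are placeholders; pick the Scala version the consuming project expects, and add whatever libraries the file actually imports):

    name := "spark-python-converters"

    version := "0.1.0"

    scalaVersion := "2.10.4"

    // this file looks like it uses Spark classes, so something like this (version is an assumption):
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0" % "provided"

Put the source file under src/main/scala and run sbt package; the JAR ends up under target/scala-2.10/.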
If you don't want to use any build tool, you can compile the code with scalac and then create the JAR manually, either with the JDK's jar tool or by zipping the resulting class files and renaming the archive to .jar. Either way you have to preserve the directory structure: your pastebin uses the package org.apache.spark.examples.pythonconverters, so the class files must sit under org/apache/spark/examples/pythonconverters/ inside the JAR.
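Roughly like this (the source file name is a guess, and if the code imports third-party classes, such as Spark's, you also need to pass those JARs via -classpath):

    # compile; class files land under ./classes in their package directories
    mkdir classes
    scalac -d classes Converters.scala

    # the JDK's jar tool zips them up with the right layout in one step
    jar cf converters.jar -C classes .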
Btw, if you just want to integrate this piece of code with your Java project and you are using Maven, you can keep the Scala code in the same project as well (under src/main/scala). Just use the maven-scala-plugin and bind it to the compile phase, or to an earlier phase if your Java code depends on it. However, I don't recommend mixing multiple languages in one project; I would split it into two separate ones.
Related
I've been using sbt-assembly to generate a standalone JAR file for my Scala project. However, I would like to reduce the size of my JAR file (it's currently around 150 MB, and there's definitely room for improvement there).
I used the following command to list the contents of the JAR file that's produced:
jar tf <JAR file>
This revealed that there are lots of classes in the generated JAR file that are not used in the project. I believe these classes get included as part of third-party JARs.
Questions
(a) Is there an option that I can use to instruct sbt-assembly to generate a minimal JAR file that does not include the third-party classes that are not used in my project?
(b) I could use AssemblyStrategy to manually specify which files need to be excluded. Is this a sound strategy? I'm a bit concerned that with this approach the JAR file might end up throwing unexpected ClassNotFoundExceptions at runtime.
Thanks in advance.
It's not easy to say what is used in your project and what is not. If you include a dependency in a project, it might bring a few other ones in, and those transitive dependencies might in turn require their own dependencies, and so on.
Generally, if you include a dependency in your project, you intend to use it, and the author of each dependency usually did the same. Thus there is usually not much you can throw away; it's there for a reason. There are a couple of cases when this is not true:
The dependency's author includes additional dependencies that are only used in some configurations, and those configurations do not apply to your project.
You are using a mega-dependency when you actually need only one of its libraries/features.
There are counterexamples to this as well: ScalaTest does not ship pegdown for generating HTML test reports because you usually don't need it, but it becomes necessary if you pass the -h flag to generate HTML.
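As an illustration, here is a hedged sketch of the sbt side of that ScalaTest case (the pegdown version is a guess; check for a current one):

    // pegdown is only needed when ScalaTest generates HTML reports
    libraryDependencies += "org.pegdown" % "pegdown" % "1.6.0" % "test"

    // pass -h so ScalaTest writes HTML reports into target/html
    testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest, "-h", "target/html")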
Imagine you use Apache Tika for PDF parsing. It wraps PDFBox to do the actual parsing, so in that case you don't need the bloat of all the other libraries it pulls in for parsing MS Office documents. The best thing to do is not to exclude files manually via sbt's exclude or sbt-assembly rules, because there is a risk you get it wrong and hit a runtime class-loading exception. Instead, use the right dependency, such as PDFBox directly. Unfortunately, figuring out exactly which dependencies you need is often a lot of manual work, so it's your choice: easy and fat JAR, or painful and lean.
There are two ways to exclude dependencies:
Exclude transitive dependencies with exclude; see the sbt documentation on exclusions and the sketch after this list.
Don't use the top-level dependency at all, and manually add only the sub-dependencies you actually need.
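In build.sbt form, both options look roughly like this (coordinates and versions are illustrative, not a recommendation):

    // 1. keep the umbrella dependency but exclude a transitive piece you know you don't use
    libraryDependencies += ("org.apache.tika" % "tika-parsers" % "1.13")
      .exclude("org.apache.poi", "poi")

    // 2. skip the umbrella dependency entirely and depend on what you actually use
    libraryDependencies += "org.apache.pdfbox" % "pdfbox" % "2.0.0"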
OK, one more, less fun, option: use provided and make sure the libraries are copied to your target environment and are on the classpath there. If you have many JARs using the same libraries, this helps to share them.
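For example (a sketch; Spark here is just a stand-in for any library the target environment already provides):

    // compiled against, but left out of the assembly; the runtime
    // environment must supply this JAR on the classpath
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0" % "provided"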
You can visualize your dependency tree with this plugin: https://github.com/jrudolph/sbt-dependency-graph. It's very helpful when you are trying to figure out what you are using and what you can remove. There are some tools like Tattletale and loosejar that people suggest, but I haven't tried them. If anyone has experience with those, please share.
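If you want to try the plugin, the setup is a single line in project/plugins.sbt (the version below is an assumption; check the project page for the current one):

    addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.8.2")

After reloading, running sbt dependencyTree prints the resolved tree.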
What you might want to look at are tree shakers.
For Java there's the following (I have not tried/used it):
http://proguard.sourceforge.net/
I am new to Spark but am trying to do some development. I am following the "Reducing Build Times" instructions from the Spark developer page. After creating the normal assembly, I have written some classes that depend on one specific JAR. I test my package in the spark-shell, where I have been able to include my JAR by defining SPARK_CLASSPATH, but the problem lies in actually compiling my code. What I want to achieve is to include that JAR when compiling my added package (with build/sbt compile). Could I do that by adding a path to my JAR in the build/sbt file or in sbt-launch-lib.bash, and if so, how?
(Side note: I do not yet want to include the JAR in the assembly, because I keep making changes to it as I go, so that would be inconvenient. I am using Spark 1.4.)
Any help is appreciated!
Based on the answer in the comments above, it looks like you are trying to add your JAR as a dependency to the mllib project as you do development on mllib itself. You can accomplish this by modifying the pom.xml file in the mllib directory within the Spark distribution.
You can find instructions on how to add a local file as a dependency here: http://blog.valdaris.com/post/custom-jar/. I haven't used this approach for including a local file as a dependency myself, but I think it should work.
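I haven't tested this against the Spark build specifically, but since you compile with build/sbt, the sbt-side equivalent would be to add the JAR as an unmanaged dependency (the path below is a placeholder):

    // sbt 0.13 syntax: put a local JAR on the compile classpath
    unmanagedJars in Compile += file("/path/to/your-dependency.jar")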
I am new to Scala and I have to learn Scala and sbt. I read the sbt tutorial, but I am unable to understand what sbt is used for. Even after reading that tutorial,
I am still confused. Can anyone explain it in simple words, and suggest a tutorial for the simple build tool?
When you write small programs that consist of only one, or just two or three source files, then it's easy enough to compile those source files by typing scalac MyProgram.scala in the command line.
But when you start working on a bigger project with dozens or maybe even hundreds of source files, then it becomes too tedious to compile all those source files manually. You will then want to use a build tool to manage compiling all those source files.
sbt is such a tool. There are other tools too; some other well-known build tools that come from the Java world are Ant and Maven.
How it works is that you create a project file that describes what your project looks like; when you use sbt, this file is called build.sbt. That file doesn't list every source file; it holds settings such as your project's name, version, Scala version, and library dependencies, while sbt finds the source files by convention under src/main/scala. sbt reads the file and then knows what to do to compile the complete project.
Besides managing your project, some build tools, including sbt, can automatically manage dependencies for you. This means that if you need to use some libraries written by others, sbt can automatically download the right versions of those libraries and include them in your project for you.
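For example, a minimal build.sbt might look like this (the name and version numbers are placeholders):

    name := "my-app"

    version := "0.1.0"

    scalaVersion := "2.11.8"

    // sbt downloads this library, and everything it depends on, for you
    libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.6" % "test"

With that file in place, sbt compile builds everything under src/main/scala, and sbt test runs the tests.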
I am designing a tool, that takes an sbt project path as a parameter. I would like to be able to build that given project on the fly, and be able to get its classpath.
I previously designed my tool as an sbt plugin to achieve this, but that is not flexible enough for my purpose: I don't want to have to configure anything in the sbt config files of the project I am studying.
I would like to use sbt externally: construct a project from an sbt directory path and compile it in my Scala code, without invoking sbt in a console. This is a reproduction in code of what happens when "sbt" is typed in a given directory in the console. Is there a straightforward way to achieve this?
I think you need to look at the sbt JAR file and source code. Find the main class and call it programmatically. The code is here: https://github.com/sbt/sbt. The main class is xsbt.boot.Boot. I got it from the sbt JAR file by unzipping it and looking at META-INF/MANIFEST.MF. So you can see how sbt passes command-line arguments to it and take it from there. Here is the Boot class just in case: https://github.com/sbt/sbt/blob/0.13/launch/src/main/scala/xsbt/boot/Boot.scala. Have fun! :)
p.s. in your code just call Boot.main(<your sbt commands>).
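A sketch of what that call could look like (assuming sbt 0.13.x, the sbt launcher JAR on your classpath, and the working directory set to the project you want to build):

    object BuildExternally {
      def main(args: Array[String]): Unit = {
        // same effect as typing `sbt compile` in the project directory
        xsbt.boot.Boot.main(Array("compile"))
      }
    }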
I have a vaguely similar requirement. I produce a command-line tool as one of the deliverables from my project. The script launches the Scala runtime itself and naturally needs the effective classpath for the project's dependencies. To get that in an external form, I use the SBT-Start-Script plug-in. While that plug-in does produce an actual launcher, I need to do more than it provides, so I just use it to externalize the project's (current) classpath, which I extract into a shell array initialization in a separate source file that is sourced by the main launcher script.
This question is not limited to lex and yacc: how can I add a custom script compiler as part of a project? For example, I have the following files in the project:
grammar.y
grammar.l
test.script
The binary 'script_compiler' will be generated from grammar.y and grammar.l, compiled by lex, yacc, and g++. Then I want to use that generated script_compiler to compile test.script into CompiledScript.java, which should be compiled along with the rest of the Java files in the project. This setup is possible with Xcode or make, but is it also possible with Eclipse alone? If not, how about together with a Maven plugin?
(I might set up the script compiler as a separate project, but it would be nice if they could be put in the same project so that changes to the grammar files are applied immediately.)
Thanks in advance for your help!
You can add a custom "Builder" from the project properties dialog. This can be an Ant script (with an optional target) or any other script or executable.
There are also Maven plugins for Ant and other scripting languages.
If you just want to run an external program in Maven this is what you want: http://mojo.codehaus.org/exec-maven-plugin/ -- you can then run Maven targets from your IDE or command line and it should do the right thing either way.
To integrate with the normal compilation, bind the plugin to the "generate-sources" phase and add the location where the Java files are generated to the "sourceRoot" option of the exec plugin. That way the compiler will pick them up.
Ideally you generate the code into a folder "target/generated-sources/MY_SCRIPT_NAME". That is the standard location for generated sources in the Maven world and e.g. IntelliJ IDEA will pick up source files inside of that location. Note that this doesn't work if the files are directly in "target/generated-sources".
The other option is to write your own Maven plugin, which is actually quite easy as well. See e.g. https://github.com/peterbecker/maven-code-generator